Kaitai [Virtual] Machine #103
Comments
I was thinking about something like this suggestion. It would be good if we could export everything the compiler knows about the format's model before generating the output code. We could use the same initial ksy model that we use for format ksys, but extend it with new nodes. I was thinking about adding the following information:
But I am not sure this is the right approach, and in the long run it may hurt the goals of the project if everybody writes their own homemade code instead of extending Kaitai Struct. First we should find out what underlying problem we want to solve here. I can describe my case, which may help the thinking: currently the generated C# code is less than optimal for my purposes. My main concerns are:
So I wanted to modify the Kaitai compiler to fix the first two issues (as I believe this has only pros, no cons), but I simply gave up during the process. It was not the first time I tried to modify the compiler, but its logic is still too complicated for me. But if I had this intermediate format representation, I am pretty sure I could write a compiler / code generator (even from scratch) in a few hours that generates the code I want.
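To give a rough idea of how small such a generator could be, here is a minimal sketch. Everything about the model is an assumption made for illustration: the `MODEL` dictionary, its keys (`class`, `fields`, `kind`, `fmt`) and the `generate` function are hypothetical and are not something the Kaitai Struct compiler exports today. It emits Python rather than C# only so that the result can be run directly.

```python
import io
import struct

# Hypothetical exported model. The keys and values here are invented for this
# sketch; they are not an actual Kaitai Struct export format.
MODEL = {
    "class": "Header",
    "fields": [
        {"id": "magic", "kind": "bytes", "size": 4},
        {"id": "version", "kind": "int", "fmt": "<H"},
    ],
}


def generate(model):
    """Turn the model into Python source text for a simple parser class."""
    lines = [
        "import struct",
        "",
        f"class {model['class']}:",
        "    def __init__(self, stream):",
    ]
    for field in model["fields"]:
        name = field["id"]
        if field["kind"] == "bytes":
            lines.append(f"        self.{name} = stream.read({field['size']})")
        elif field["kind"] == "int":
            # The byte count is derived from the struct format string.
            size = struct.calcsize(field["fmt"])
            lines.append(
                f"        (self.{name},) = "
                f"struct.unpack({field['fmt']!r}, stream.read({size}))"
            )
        else:
            raise ValueError(f"unknown field kind: {field['kind']}")
    return "\n".join(lines) + "\n"


# Generate the source, load it, and parse a small sample buffer with it.
source = generate(MODEL)
namespace = {}
exec(source, namespace)
header = namespace["Header"](io.BytesIO(b"KSVM\x02\x00"))
```

A generator for another target language would keep the same loop over the fields and only change the strings it emits.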
Can you post your ideas for improving the C# compiler in a new issue? I could probably take a look at making some of those changes when I get some time.
Technically, it shouldn't be too hard now. Compilation is now more or less clearly separated in 3 steps:
After (1) and (2) are done, the simplest form of "exporting everything the compiler knows about" is basically doing
"llvm like code" usually means LLVM intermediate representation (IR), which is almost as low-level as it gets. It's a generic abstraction of assembly language that uses SSA (static single assignment) form, so you don't have to bother with details such as how many registers you have, which combinations are possible, and how exactly the stack and flags work. For example, take this sample C code:
void int_to_binary(int x, char** dst) {
    char* s = *dst;
    int i = 0;
    while (x != 0) {
        int bit = x & 1;
        s[i] = '0' + bit;
        x >>= 1;
        i++;
    }
}
And this is the equivalent in LLVM IR:
define void @int_to_binary(i32 %x, i8** nocapture readonly %dst) #0 {
  %1 = load i8*, i8** %dst, align 8, !tbaa !1
  %2 = icmp eq i32 %x, 0
  br i1 %2, label %._crit_edge, label %.lr.ph
.lr.ph: ; preds = %0, %.lr.ph
  %indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %0 ]
  %.02 = phi i32 [ %7, %.lr.ph ], [ %x, %0 ]
  %3 = and i32 %.02, 1
  %4 = or i32 %3, 48
  %5 = trunc i32 %4 to i8
  %6 = getelementptr inbounds i8, i8* %1, i64 %indvars.iv
  store i8 %5, i8* %6, align 1, !tbaa !5
  %7 = ashr i32 %.02, 1
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %8 = icmp eq i32 %7, 0
  br i1 %8, label %._crit_edge, label %.lr.ph
._crit_edge: ; preds = %.lr.ph, %0
  ret void
}
That
I would much rather embrace anybody creating alternative implementations of Kaitai Struct using our specifications and test suite. The bad thing would be if someone forked KS and started adding things in a way that is not compatible with our implementation.
I have a somewhat strange idea that needs some discussion. Let's assume that someone is making an app that uses KS, for example a hex editor with KS support, like the Kaitai IDE, but in native code. They could modify the KS compiler and library to compile ksy right into the code the app needs. IMHO the best way here is to embed some interpreter into the app, provide some API, and make KS-generated code use it through its runtime library. But there is another way. It may be worth compiling ksy into an intermediate representation (a kind of bytecode?) which can be either interpreted by the target application (the hex editor in our example) or transformed into actual code in the target language (something like LLVM). The good part is that there is no need to modify the compiler (the frontend) for every language; only a backend (the actual interpreter/JIT compiler embedded in the application) needs to be created, and that should be easy because the intermediate representation would be high-level. So, what do you think: do we really need an intermediate representation target, or are scripting language targets enough?
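To make the "interpreted by the target application" option a bit more concrete, here is a minimal sketch of what such a backend might look like. The instruction set is purely an assumption for illustration: the opcode names (`read_bytes`, `read_int`, `read_str`), the `IR` list and the `interpret` function are all hypothetical and do not correspond to anything the KS compiler emits.

```python
import io
import struct

# Hypothetical high-level IR: one instruction per field, in read order.
# The opcode names and keys are invented for this sketch.
IR = [
    {"op": "read_bytes", "id": "magic", "size": 4},
    {"op": "read_int", "id": "version", "fmt": "<H"},
    {"op": "read_int", "id": "name_len", "fmt": "B"},
    {"op": "read_str", "id": "name", "size_from": "name_len", "encoding": "ascii"},
]


def read_exact(stream, n):
    """Read exactly n bytes or fail, so truncated input is not silently accepted."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"wanted {n} bytes, got {len(data)}")
    return data


def interpret(ir, stream):
    """Execute the IR against a byte stream and return the parsed fields."""
    fields = {}
    for insn in ir:
        op = insn["op"]
        if op == "read_bytes":
            value = read_exact(stream, insn["size"])
        elif op == "read_int":
            size = struct.calcsize(insn["fmt"])
            (value,) = struct.unpack(insn["fmt"], read_exact(stream, size))
        elif op == "read_str":
            # The length comes from a field parsed earlier in the same run.
            size = fields[insn["size_from"]]
            value = read_exact(stream, size).decode(insn["encoding"])
        else:
            raise ValueError(f"unknown op: {op}")
        fields[insn["id"]] = value
    return fields


# Sample input: 4-byte magic, little-endian u2 version, u1 length, then the name.
sample = b"KSVM" + struct.pack("<H", 1) + bytes([5]) + b"hello"
parsed = interpret(IR, io.BytesIO(sample))
```

With an IR at this level, the host application only needs the dispatch loop above; the same instruction list could just as well be walked once to emit source code instead of being executed.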