Kaitai [Virtual] Machine #103

Open
KOLANICH opened this issue Feb 1, 2017 · 3 comments

KOLANICH commented Feb 1, 2017

I have a slightly strange idea that needs some discussion. Suppose someone is making an app that uses KS, for example a hex editor with KS support, like the Kaitai IDE but in native code. They could modify the KS compiler and library to compile ksy straight into the code the app needs. IMHO the best way here is to embed an interpreter into the app, provide an API, and make KS-generated code use it through its runtime library. But there is another way. It may be worth compiling ksy into an intermediate representation (a kind of bytecode?) which can either be interpreted by the target application (the hex editor in our example) or transformed into actual code in the target language (something like LLVM). The good part is that there is no need to modify the compiler (the frontend) for every language; only a backend (the actual interpreter/JIT compiler embedded in the application) needs to be created, and that should be easy because the intermediate representation would be high-level. So, what do you think: do we really need an intermediate representation target, or are scripting language targets enough?
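
To make the "interpreted by the target application" option more concrete, here is a minimal sketch in Python of what a host-side interpreter could look like. Everything in it is an assumption made up for illustration: the instruction tuples, the kind names (`u1`, `u2le`, `bytes`, `str`) and the `interpret` function are hypothetical and are not an existing Kaitai Struct format or API.

```python
import struct
from io import BytesIO

# Hypothetical high-level IR: a list of (field_name, kind, arg) instructions.
# None of these instruction names come from Kaitai Struct itself.
HEADER_IR = [
    ("magic", "bytes", 4),
    ("version", "u2le", None),
    ("name_len", "u1", None),
    ("name", "str", "name_len"),  # size refers to an earlier field
]

# struct format strings for the fixed-size integer kinds used above
INT_FORMATS = {"u1": "<B", "u2le": "<H", "u4le": "<I"}


def read_exact(stream, size):
    """Read exactly `size` bytes or fail loudly."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError(f"wanted {size} bytes, got {len(data)}")
    return data


def interpret(ir, stream):
    """Walk the IR in order and return a dict of parsed fields."""
    result = {}
    for name, kind, arg in ir:
        if kind in INT_FORMATS:
            fmt = INT_FORMATS[kind]
            (value,) = struct.unpack(fmt, read_exact(stream, struct.calcsize(fmt)))
        elif kind in ("bytes", "str"):
            # Size is either a literal or the name of an already-parsed field
            size = result[arg] if isinstance(arg, str) else arg
            value = read_exact(stream, size)
            if kind == "str":
                value = value.decode("ascii")
        else:
            raise ValueError(f"unknown IR instruction kind: {kind}")
        result[name] = value
    return result


parsed = interpret(HEADER_IR, BytesIO(b"KSVM\x01\x00\x03abc"))
print(parsed)
```

The point of the sketch is only that the host needs a loop over instructions plus a handful of primitive readers; a real IR would also have to cover expressions, conditionals, repetitions and nested types.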

koczkatamas commented Feb 1, 2017

I was thinking about something like this suggestion; it would be good if we could export everything the compiler knows about the format's model before generating the output code.

We could use the same initial ksy model that we use for format ksy files, but extend it with new nodes.

I was thinking about adding the following information:

  • expressions as a parsed AST
  • type / enum / etc. names should be resolved to a reference in the ksy (YAML supports references, but we could use string path references, like "types/header/enums/header_type")
  • type / field parsing logic as a code AST or LLVM-like code (I don't know LLVM exactly; my only criterion here is not to use logic that is too low-level, like goto jumps for loops, etc.)
  • type information (like EnumType(BitsType)), if available
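
As a rough illustration of the string path references mentioned in the list above, a consumer of the exported model could resolve them with a few lines of Python. The `MODEL` dictionary and the `resolve` helper are hypothetical; they only assume that the export keeps the nested shape of a ksy file.

```python
# Hypothetical exported model, shaped like a nested ksy document.
MODEL = {
    "meta": {"id": "example"},
    "types": {
        "header": {
            "seq": [{"id": "kind", "type": "u1", "enum": "header_type"}],
            "enums": {"header_type": {1: "text", 2: "binary"}},
        }
    },
}


def resolve(model, path):
    """Follow a slash-separated path such as 'types/header/enums/header_type'."""
    node = model
    for part in path.split("/"):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"cannot resolve {part!r} in path {path!r}")
        node = node[part]
    return node


header_type = resolve(MODEL, "types/header/enums/header_type")
print(header_type)
```

String paths keep the export readable in both YAML and JSON, at the cost of one lookup per reference on the consumer side.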

But I am not sure this is the right approach, and in the long run it may hurt the goals of the project if everybody creates his/her homemade code instead of extending Kaitai Struct.

First we should find out what underlying problem we want to solve here.

I can tell you my case, in case it helps: currently the generated C# code is less than optimal for my purposes. My main concerns are:

  • every class inherits from KaitaiStruct, which only contains the "io" property, so I cannot make this (partial) class inherit from any other class if needed. Instead of the current approach, I would prefer adding this field directly to every class and implementing an IKaitaiStruct interface (containing an "io" getter)
  • currently we generate backing fields for every property; in modern C# code, auto-properties serve this purpose
  • what happens if I want to add change notification? I cannot easily extend the resulting code unless I modify the Kaitai compiler

So I wanted to modify the Kaitai compiler to fix the first two issues (as I believe this has only pros, no cons), but I simply gave up during the process. It was not the first time I tried to modify the compiler, but its logic is still too complicated for me.

But if I had this intermediate format representation, I am pretty sure I could write a compiler / code generator (even from scratch) in a few hours that would generate the code I want.

LogicAndTrick commented

Can you post your ideas for improving the C# compiler in a new issue? I could probably take a look at making some of those changes when I get some time.

KOLANICH changed the title from "Kaitai Machine" to "Kaitai [Virtual] Machine" on Feb 4, 2017

GreyCat commented Mar 3, 2017

> I was thinking about something like this suggestion; it would be good if we could export everything the compiler knows about the format's model before generating the output code.

Technically, it shouldn't be too hard now. Compilation is now more or less clearly separated into 3 steps:

  1. Initial YAML parsing (including expression language → AST parsing)
  2. Precompilation (type inference, determining _parents, determining fully qualified names of the classes, validation, etc.); this process is language-agnostic
  3. Actual compilation into target language(s)

After (1) and (2) are done, the simplest form of "exporting everything the compiler knows" is basically calling topLevelClass.toString: this yields a lengthy Scala-generated ClassSpec(...) dump that recursively prints the whole structure. It isn't terribly hard to add toYaml or toJson methods to all our model structures (i.e. ClassSpec, AttrSpec, InstanceSpec, EnumSpec, *Type, etc.). Actually, it might even be as easy as writing a trait that adds this method (implemented using reflection) to any object you mix the trait into.
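
The reflection idea can be sketched outside of Scala as well. The Python mixin below plays the role of such a trait: it walks an object's attributes at runtime instead of requiring a hand-written export method per class. The `Exportable` mixin and the two-field `ClassSpec` / `AttrSpec` classes are simplified stand-ins invented for this example; the real model classes in the compiler carry far more information.

```python
import json


class Exportable:
    """Mixin: serialises any object by inspecting its attributes at runtime."""

    def to_plain(self):
        out = {"_class": type(self).__name__}
        for key, value in vars(self).items():
            out[key] = _plain(value)
        return out

    def to_json(self):
        return json.dumps(self.to_plain(), indent=2, sort_keys=True)


def _plain(value):
    """Recursively convert model objects and containers to JSON-friendly data."""
    if isinstance(value, Exportable):
        return value.to_plain()
    if isinstance(value, (list, tuple)):
        return [_plain(v) for v in value]
    if isinstance(value, dict):
        return {str(k): _plain(v) for k, v in value.items()}
    return value


# Simplified stand-ins; the real ClassSpec and AttrSpec have many more fields.
class AttrSpec(Exportable):
    def __init__(self, id, data_type):
        self.id = id
        self.data_type = data_type


class ClassSpec(Exportable):
    def __init__(self, name, seq, types=None):
        self.name = name
        self.seq = seq
        self.types = types or {}


top = ClassSpec(
    "example",
    [AttrSpec("magic", "bytes(4)")],
    types={"header": ClassSpec("header", [AttrSpec("kind", "u1")])},
)
exported = json.loads(top.to_json())
print(top.to_json())
```

Because the mixin never names individual fields, adding a new attribute to a model class would make it show up in the export automatically, which is the main attraction of the reflection approach.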

> type / field parsing logic as a code AST or LLVM-like code (I don't know LLVM exactly; my only criterion here is not to use logic that is too low-level, like goto jumps for loops, etc.)

"LLVM-like code" usually means LLVM intermediate representation (IR), which is almost as low-level as it gets. It's a generic abstraction of assembly language using SSA (static single assignment) form, which lets you ignore details such as how many registers you have, which combinations are possible, and exactly how the stack and flags work. For example, take this sample C code:

void int_to_binary(int x, char** dst) {
  char* s = *dst;
  int i = 0;
  while (x != 0) {
    int bit = x & 1;
    s[i] = '0' + bit;
    x >>= 1;
    i++;
  }
}

And this is its equivalent in LLVM IR:

define void @int_to_binary(i32 %x, i8** nocapture readonly %dst) #0 {
  %1 = load i8*, i8** %dst, align 8, !tbaa !1
  %2 = icmp eq i32 %x, 0
  br i1 %2, label %._crit_edge, label %.lr.ph

.lr.ph:                                           ; preds = %0, %.lr.ph
  %indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %0 ]
  %.02 = phi i32 [ %7, %.lr.ph ], [ %x, %0 ]
  %3 = and i32 %.02, 1
  %4 = or i32 %3, 48
  %5 = trunc i32 %4 to i8
  %6 = getelementptr inbounds i8, i8* %1, i64 %indvars.iv
  store i8 %5, i8* %6, align 1, !tbaa !5
  %7 = ashr i32 %.02, 1
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %8 = icmp eq i32 %7, 0
  br i1 %8, label %._crit_edge, label %.lr.ph

._crit_edge:                                      ; preds = %.lr.ph, %0
  ret void
}

That br is exactly a conditional jump.
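
To show the same contrast at a smaller scale, here is a hedged Python sketch of a structured loop node being lowered into labels and conditional jumps. The node names (`while_nonzero`, `exec`, `br_if_zero`, `jmp`, `label`) are invented for this example and belong neither to LLVM nor to Kaitai Struct.

```python
import itertools


def lower(nodes, ids=None):
    """Flatten structured 'while_nonzero' nodes into labels and jumps."""
    ids = ids if ids is not None else itertools.count()
    flat = []
    for node in nodes:
        if node[0] == "while_nonzero":
            _, var, body = node
            n = next(ids)
            head, done = f"head{n}", f"done{n}"
            flat += [("label", head), ("br_if_zero", var, done)]
            flat += lower(body, ids)  # nested loops share the id counter
            flat += [("jmp", head), ("label", done)]
        else:
            flat.append(node)
    return flat


def run_flat(flat, env):
    """Execute the flat form with an explicit program counter."""
    labels = {ins[1]: i for i, ins in enumerate(flat) if ins[0] == "label"}
    pc = 0
    while pc < len(flat):
        ins = flat[pc]
        if ins[0] == "br_if_zero" and env[ins[1]] == 0:
            pc = labels[ins[2]]  # conditional jump taken
        elif ins[0] == "jmp":
            pc = labels[ins[1]]  # unconditional jump
        elif ins[0] == "exec":
            ins[1](env)
        pc += 1  # labels are no-ops, so stepping past them is safe
    return env


def emit_bit(env):
    """Loop body mirroring the C sample: emit the lowest bit, then shift."""
    env["out"].append(str(env["x"] & 1))
    env["x"] >>= 1


structured = [("while_nonzero", "x", [("exec", emit_bit)])]
flat = lower(structured)
env = run_flat(flat, {"x": 6, "out": []})
print([ins[0] for ins in flat], env["out"])
```

The structured form keeps the loop as a single node that a code generator can map directly to a `while` statement in a target language; after lowering, that shape has to be rediscovered from the jumps.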

> But I am not sure this is the right approach, and in the long run it may hurt the goals of the project if everybody creates his/her homemade code instead of extending Kaitai Struct.

I would rather totally embrace anybody creating alternative implementations of Kaitai Struct using our specifications and test suite. The bad thing would be if someone made a KS fork and started adding things in a way that is incompatible with our implementation.
