Kaitai [Virtual] Machine #103

KOLANICH · 2017-02-01T19:59:23Z

I have a somewhat strange idea that needs some discussion. Let's assume that someone is making an app that uses KS, for example a hex editor with KS support, like the Kaitai IDE, but in native code. He could then modify the KS compiler and library to compile ksy right into the code the app needs. IMHO the best way here is to embed some interpreter into the app, provide some API, and make KS-generated code use it through its runtime library. But there is another way. It may be worth compiling ksy into an intermediate representation (a kind of bytecode?) which can either be interpreted by the target application (the hex editor in our example) or transformed into actual code in the target language (something like LLVM). The good part is that there is no need to modify the compiler (the frontend) for every language; only a backend (the actual interpreter/JIT compiler embedded in the application) needs to be created, and that should be easy because the intermediate representation would be high-level. So, what do you think: do we really need an intermediate representation target, or are scripting language targets enough?
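To make the idea of an interpreted intermediate representation a bit more concrete, here is a minimal sketch in Python. The instruction set (`"read"` and `"repeat"` tuples), the `HEADER_IR` program, and the `interpret` function are all made up for illustration; nothing like this exists in Kaitai Struct as far as this thread says.

```python
import io
import struct

# Hypothetical high-level IR: each instruction is a tuple.
#   ("read", field_name, struct_format)         read one fixed-size value
#   ("repeat", field_name, count_field, body)   run `body` as many times as `count_field` says
HEADER_IR = [
    ("read", "magic", "<H"),
    ("read", "count", "<B"),
    ("repeat", "entries", "count", [
        ("read", "id", "<B"),
        ("read", "size", "<H"),
    ]),
]


def interpret(ir, stream):
    """Run the IR against a binary stream and return the parsed fields as a dict."""
    result = {}
    for instr in ir:
        op = instr[0]
        if op == "read":
            _, name, fmt = instr
            size = struct.calcsize(fmt)
            data = stream.read(size)
            if len(data) != size:
                raise EOFError(f"unexpected end of stream while reading {name!r}")
            (result[name],) = struct.unpack(fmt, data)
        elif op == "repeat":
            _, name, count_field, body = instr
            # The repeat count comes from a field parsed earlier at the same level.
            result[name] = [interpret(body, stream) for _ in range(result[count_field])]
        else:
            raise ValueError(f"unknown IR instruction: {op!r}")
    return result


parsed = interpret(HEADER_IR, io.BytesIO(b"\x34\x12\x02\x01\x10\x00\x02\x20\x00"))
```

Because the loop is a single `repeat` node rather than a pair of jumps, a backend like this stays small; the same instruction list could just as well be walked by a code generator instead of an interpreter.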

koczkatamas · 2017-02-01T22:07:57Z

I was thinking about something like this suggestion, it would be good if we could export everything the compiler knows about the format's model before generating the output code.

We could use the same initial ksy model as we use for format ksys, but extend it with new nodes.

I was thinking about adding the following information:

- expressions as a parsed AST
- type / enum / etc names should be resolved to a reference in the ksy (yaml supports references, but we could use string path references, like "types/header/enums/header_type")
- type / field parsing logic with code AST or llvm like code (I don't know llvm exactly, my only criteria here is not to use too low level logic, like goto jumps for loops, etc)
- type information (like EnumType(BitsType)) if available
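The string path references mentioned in the list above could be resolved with very little code. The following is only a sketch: the `MODEL` dict is a hypothetical exported model shaped like a parsed ksy file, and `resolve` is an illustrative helper, not part of any Kaitai Struct API.

```python
# Hypothetical exported model, shaped like a parsed .ksy file (nested dicts).
MODEL = {
    "meta": {"id": "example"},
    "types": {
        "header": {
            "seq": [
                {"id": "kind", "type": "u1", "enum": "types/header/enums/header_type"},
            ],
            "enums": {"header_type": {0: "plain", 1: "compressed"}},
        }
    },
}


def resolve(model, path):
    """Follow a slash-separated path through nested dicts and return the node it names."""
    node = model
    for key in path.split("/"):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"cannot resolve {path!r}: no {key!r} at this level")
        node = node[key]
    return node


field = MODEL["types"]["header"]["seq"][0]
header_type = resolve(MODEL, field["enum"])
```

With fully qualified paths like this, a consumer of the exported model never has to reimplement the compiler's name lookup rules.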

But I am not sure this is the right approach, and in the long run it may hurt the goals of the project if everybody creates his/her homemade code instead of extending Kaitai Struct.

First we should find out what underlying problem we want to solve here.

I can tell you my case, maybe it helps the thinking: currently the generated C# code is less than optimal for my purposes. My main concerns are:

- every class inherits from KaitaiStruct, which only contains the "io" property, so I cannot inherit this (partial) class from another class if needed; instead of the current approach, I would prefer adding this field manually to every class and implementing an IKaitaiStruct interface (containing an "io" getter)
- currently we generate backing fields for every property; in modern C# code we use auto-properties for this purpose
- what happens if I want to add change notification? I cannot easily extend the resulting code unless I modify the Kaitai compiler

So I wanted to modify the Kaitai compiler to fix the first two issues (as I believe this has only pros, no cons), but I simply gave up during the process. It was not the first time I tried to modify the compiler but the logic of the compiler is still too complicated for me.

But if I had this intermediate format representation, I am pretty sure I could write a compiler / code generator (even from scratch) in a few hours that would generate the code I wanted.
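As a rough illustration of how small such a generator could be, here is a sketch in Python that emits a C# class with auto-properties and an interface instead of a base class. Everything here is an assumption made for the example: the `MODEL` shape, the `CSHARP_TYPES` mapping, the property naming, and the `Io` property of type `KaitaiStream` are stand-ins, not what the real compiler produces.

```python
# Hypothetical, already-resolved model of one type: (field id, primitive type) pairs.
MODEL = {"name": "Header", "fields": [("magic", "u2"), ("format_version", "u1")]}

# Assumed mapping from format primitives to C# types (illustrative, not exhaustive).
CSHARP_TYPES = {"u1": "byte", "u2": "ushort", "u4": "uint"}


def pascal_case(name):
    """Turn a snake_case field id into a PascalCase C# property name."""
    return "".join(part.capitalize() for part in name.split("_"))


def generate_class(model):
    """Emit a partial C# class using auto-properties and an interface, not a base class."""
    lines = [f"public partial class {model['name']} : IKaitaiStruct", "{"]
    lines.append("    public KaitaiStream Io { get; private set; }")
    for field_id, field_type in model["fields"]:
        cs_type = CSHARP_TYPES[field_type]
        lines.append(f"    public {cs_type} {pascal_case(field_id)} {{ get; private set; }}")
    lines.append("}")
    return "\n".join(lines)


generated = generate_class(MODEL)
print(generated)
```

The sketch only covers declarations; the parsing logic itself would still have to come from the exported model, which is the harder part.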

LogicAndTrick · 2017-02-01T23:05:20Z

Can you post your ideas for improving the C# compiler in a new issue? I could probably take a look at making some of those changes when I get some time.

GreyCat · 2017-03-03T14:53:22Z

> I was thinking about something like this suggestion, it would be good if we could export everything the compiler knows about the format's model before generating the output code.

Technically, it shouldn't be too hard now. Compilation is now more or less clearly separated into 3 steps:

1. Initial YAML parsing (including expression language → AST parsing)
2. Precompilation (type inference, determining _parents, determining fully qualified names of the classes, validation, etc.); this process is language-agnostic
3. Actual compilation into target language(s)

After (1) and (2) are done, the simplest form of "exporting everything the compiler knows about" is basically calling topLevelClass.toString, which will yield a lengthy Scala-generated ClassSpec(...) dump, recursively dumping the whole structure. It isn't terribly hard to add toYaml or toJson methods to all our model structures (i.e. ClassSpec, AttrSpec, InstanceSpec, EnumSpec, *Type, etc). Actually, it might even be as easy as writing some sort of trait that adds this method (implemented using reflection) to any object you mix the trait into.
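The reflection-based trait idea can be sketched outside Scala as well. Below is a Python analogue, assuming dataclasses as the model classes: a mixin discovers fields by reflection and dumps them recursively. The `ClassSpec` and `AttrSpec` classes here are heavily simplified stand-ins that only borrow the names; the real model structures carry far more information.

```python
import json
from dataclasses import dataclass, field, fields


class JsonDumpable:
    """Mixin that adds to_json() to any dataclass, discovering fields by reflection."""

    def to_plain(self):
        plain = {"_class": type(self).__name__}
        for f in fields(self):
            plain[f.name] = _plain(getattr(self, f.name))
        return plain

    def to_json(self):
        return json.dumps(self.to_plain(), indent=2)


def _plain(value):
    """Recursively convert model objects and containers into JSON-friendly values."""
    if isinstance(value, JsonDumpable):
        return value.to_plain()
    if isinstance(value, (list, tuple)):
        return [_plain(v) for v in value]
    if isinstance(value, dict):
        return {str(k): _plain(v) for k, v in value.items()}
    return value


# Simplified stand-ins for the compiler's model classes.
@dataclass
class AttrSpec(JsonDumpable):
    id: str
    data_type: str


@dataclass
class ClassSpec(JsonDumpable):
    name: str
    seq: list = field(default_factory=list)
    types: dict = field(default_factory=dict)


top = ClassSpec(
    "example",
    seq=[AttrSpec("magic", "u2le")],
    types={"header": ClassSpec("header", seq=[AttrSpec("kind", "u1")])},
)
dumped = json.loads(top.to_json())
```

Adding a new model class then costs nothing extra: it only has to include the mixin, and its fields show up in the dump automatically.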

> type / field parsing logic with code AST or llvm like code (I don't know llvm exactly, my only criteria here is not to use too low level logic, like goto jumps for loops, etc)

"llvm like code" usually means LLVM intermediate representation (IR), which is almost as low level as it gets. It's a generic abstraction of assembler using SSA (static single assignment form) that lets you not bother with details such as how many registers you have, which combinations are possible, and how exactly the stack and flags work. For example, take this sample C code:

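void int_to_binary(int x, char** dst) {
  char* s = *dst;        // destination buffer supplied by the caller
  int i = 0;
  while (x != 0) {
    int bit = x & 1;     // lowest bit of x
    s[i] = '0' + bit;    // digits are written least-significant bit first
    x >>= 1;
    i++;
  }
}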

And this is its equivalent in LLVM IR:

define void @int_to_binary(i32 %x, i8** nocapture readonly %dst) #0 {
  %1 = load i8*, i8** %dst, align 8, !tbaa !1
  %2 = icmp eq i32 %x, 0
  br i1 %2, label %._crit_edge, label %.lr.ph

.lr.ph:                                           ; preds = %0, %.lr.ph
  %indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %0 ]
  %.02 = phi i32 [ %7, %.lr.ph ], [ %x, %0 ]
  %3 = and i32 %.02, 1
  %4 = or i32 %3, 48
  %5 = trunc i32 %4 to i8
  %6 = getelementptr inbounds i8, i8* %1, i64 %indvars.iv
  store i8 %5, i8* %6, align 1, !tbaa !5
  %7 = ashr i32 %.02, 1
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %8 = icmp eq i32 %7, 0
  br i1 %8, label %._crit_edge, label %.lr.ph

._crit_edge:                                      ; preds = %.lr.ph, %0
  ret void
}

That br is exactly a conditional jump.
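The difference between the two levels can also be shown without LLVM. The sketch below is in Python and is only an analogy: the first function keeps the loop as one structured construct, while the second mimics the shape of the IR above with named blocks and explicit conditional jumps. The block names and both function names are made up for the example, and both versions assume a non-negative x.

```python
def int_to_binary_structured(x):
    """Structured form: the loop is a single high-level construct."""
    out = []
    while x != 0:
        out.append(str(x & 1))
        x >>= 1
    return "".join(out)


def int_to_binary_branches(x):
    """Branch-based form: basic blocks linked by explicit conditional jumps."""
    out = []
    block = "entry"
    while True:  # dispatcher that only follows the jumps between blocks
        if block == "entry":
            # Corresponds to the first `br`: skip the loop entirely when x is 0.
            block = "exit" if x == 0 else "loop"
        elif block == "loop":
            out.append(str(x & 1))
            x >>= 1
            # Corresponds to the second `br`: jump back or leave the loop.
            block = "exit" if x == 0 else "loop"
        else:
            return "".join(out)


results = [(int_to_binary_structured(n), int_to_binary_branches(n)) for n in (0, 1, 6, 10)]
```

A backend consuming the second form has to rediscover the loop from the jumps before it can emit a `while` in a high-level target language, which is the extra work the "no goto jumps" criterion is trying to avoid.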

> But I am not sure this is the right approach, and in the long run it may hurt the goals of the project if everybody creates his/her homemade code instead of extending Kaitai Struct.

I would rather totally embrace anybody creating alternative implementations of Kaitai Struct using our specifications and test suite. The bad thing would be if someone made a KS fork and started adding things in a way that is not compatible with our implementation.

KOLANICH changed the title ~~Kaitai Machine~~ Kaitai [Virtual] Machine Feb 4, 2017

KOLANICH mentioned this issue May 24, 2018

Kaitai-powered Photorec cgsecurity/testdisk#33

Open

GreyCat added the question label Mar 12, 2019

KOLANICH mentioned this issue Oct 15, 2019

Upcoming IDE: Hexalepis #635

Open
