Investigate using opaque MLIR as our AST #101

rengolin · 2020-05-07T18:10:27Z

Weigh the pros and cons of using opaque MLIR as our AST instead of having a hand-coded AST structure.

Acceptance criteria:

In-depth analysis of the pros and cons of each choice
Examples for each side, showing benefits and issues
A consensus on which path to take

Not included:

Implementation of any kind

plietar · 2020-05-07T23:27:17Z

I’m quite skeptical of this, as function bodies is only a fraction of the AST. Stuff like signatures, types, classes and interfaces, modules and imports... also need to be represented.

rengolin · 2020-05-08T11:11:29Z

I'm sceptical too, but it's worth investigating, even if to quickly conclude it's not a good idea.

The assumption is that complex types and signatures can be represented as text or attributes, and classes and interfaces to be an annotation on a lexical block. Modules could be done via name mangling (like namespace) and imports would probably be done before lowering to MLIR.

The reason behind this is that keeping a formal AST with rich semantics is yet-another internal representation. Also, there was the idea of doing the type inference directly in MLIR, which this path would make it much simpler.

That is, of course, if it's at all possible. This task is just to investigate, discuss and come back with a consensus. Whatever the direction later, we'll create the appropriate issues.

mjp41 · 2020-05-11T06:59:04Z

I am also sceptical and suggested we try it. I just think MLIR regions might make this possible as they allow nesting regions of code. But symbol resolution probably will be the problem as they can only refer to sibling and parent regions. If we use regions to represent classes, then the methods won't be exposed. But perhaps there is another encoding trick.

The main reason I like this, is that having a "single" format for all the stages would make some tooling nicer. Probably not worth the pain, but worth investigating.

rengolin · 2020-05-11T15:47:06Z

Very rough draft, types are going to be hard without at least a shell dialect.
https://github.com/rengolin/verona/tree/mlir-ast

mjp41 · 2020-05-12T13:18:58Z

I prefer the fib-verona-type.mlir. Regions are very good for capturing the closure nature of when.

I think the example does not reflect one of the challenges @plietar raised. Fields: etc, Where we will need something to declare the shape of the classes. This is also going to have to take account of generics. Something like

https://github.com/microsoft/verona/blob/master/testsuite/demo/run-pass/bank1.verona

is about the simplist thing with fields. We need something in the MLIR dialect to declare them.

rengolin · 2020-05-12T13:42:18Z

Right, me too. I'm working on moving the standalone example from MLIR as a skeleton for the Verona dialect, so we can have opaque !verona<> types for now at least. But I need to plug the CMake files in the right or risk breaking the rest of the build.

I'm hoping we can get any additional requirements as attributes, like fib-attr.mlir tries to do with actual types. If we can refer to SSA values in attributes, perhaps we can also create a reference chain for more complex patterns.

rengolin · 2020-05-12T16:22:24Z

Stub Verona dialect: https://github.com/rengolin/verona/tree/mlir-ast/mlir

rengolin · 2020-05-14T13:37:17Z

Turns out there's an option to enable opaque operands and types without the need of a stub dialect: -allow-unregistered-dialect.

The new fib.mlir works when that option is used with mlir-opt:
https://github.com/rengolin/verona/blob/mlir-ast/testsuite/demo/run-pass/fib.mlir

rengolin · 2020-05-14T13:38:08Z

Now we need to show that we can represent all concepts from the language we want only using opaque operations, types and attribute lists.

davidchisnall · 2020-05-15T08:56:16Z

Keep in mind a few things that are not present in the current interpreter (I don't think any these will be impossible in MLIR, but I could be wrong):

We have different kinds of region. We need to support different semantics for different region kinds. For example, tracing GC, refcounting, bump-allocated, sandboxed (parameterised over different sandboxing mechanisms, e.g. process, wasm, CHERI)
@sylvanc's proposal for user-defined operator syntax. Depending on how the lowering is done, this may involve some rewrites later on to determine which things are actually the called things (and what the types of any of the things are).
Value types
Constraints on function types (both in terms of region and type restrictions), which will affect overloading.
Generics

I think the last ones are the ones most likely to cause problems, because the callee won't be known at early lowering time and will need to be transmuted and possibly synthesised after type checking. I think MLIR supports this, it's a bit painful to do with LLVM IR.

rengolin · 2020-05-15T10:04:21Z

A few ways we could do those:

Have region annotations (attribute lists)
Generics can be used with !verona<"typename T"> or something

The others, I really don't know how to do. We talked about this yesterday and the consensus is that it may be too early to go straight to MLIR.

First, it's clear we don't know how to lower most of the advanced concepts of Verona, and that's the whole point of creating a new language. :)

Second, even if we did lower everything (ex. as attribute lists and opaque ops/types), the manipulation of this "AST" would be tedious and under-performing. AST nodes are small structures with clear semantics, translating that to an opaque representation of lists of lists with string comparisons isn't going to make it fast.

Finally, Sylvan already has an AST prototype (#111) that can parse some variation of the Verona grammar, and will allow us to work on different levels of the compiler at the same time, so it's a better early development strategy.

I'll close this ticket now but, once we have a better view of the AST/MLIR dialects, and we want to revisit this idea, we can either reopen it or create a new one.

rengolin · 2020-05-26T08:54:35Z

For reference, this is the last version of the "opaque" MLIR file:

module {

  // Class Fib
  // fib(x: U64 & imm): cown[U64Obj] & imm
  // FIXME: What is the semantics of the cown declaration above?
  func @"Fib::fib"(%x: !verona<"U64 & imm">) -> !verona<"Fib"> {

    // Avoid redeclaring constants (auto-generated probably would)
    %one = "verona.constant"() { "1" } : () -> (!verona<"U64 & imm">)
    %two = "verona.constant"() { "2" } : () -> (!verona<"U64 & imm">)

    // var pw = Promise::create();
    // Like "Fib::fib", we prefix all methods with the class mangled name
    %pw = "verona.Promise::create"() : () -> (!verona<"Promise">)

    // var pr = (mut-view pw).wait_handle();
    // FIXME: What is the semantics of this?
    %mv_pr = "verona.mut-view" (%pw) : (!verona<"Promise">) -> (!verona<"mut-view?">)
    %mv_pr_wh = "verona.wait_handle"() : () -> (!verona<"wait_handle?">)

    // if (x < 2)
    // Comparing verona types can't be standard "cmp", but the result _must_ be i1
    %eq = "verona.cmp_lt"(%x, %two) : (!verona<"U64 & imm">, !verona<"U64 & imm">) -> (i1)
    cond_br %eq, ^bb1, ^bb2

  ^bb1:                                 // pred: ^bb0
    // pw.fulfill(U64Obj.create(1));
    // object method call represented as class method with "this" as first argument
    %obj = "verona.U64Obj::create"(%one) : (!verona<"U64 & imm">) -> (!verona<"[U64Obj] & imm">)
    "verona.Promise::fullfill"(%pw, %obj) : (!verona<"Promise">, !verona<"[U64Obj] & imm">) -> ()
    br ^bb3

  ^bb2:                                 // pred: ^bb0:
    // when ()
    // Arguments on first basic block
    "when"() ({
      //   var p1 = Fib.fib(x - 1);
      // Same argument for "cmp" above applies to "sub" and "add" below
      %x1 = "verona.sub"(%x, %one) : (!verona<"U64 & imm">, !verona<"U64 & imm">) -> (!verona<"U64 & imm">)
      %p1 = call @"Fib::fib"(%x1) : (!verona<"U64 & imm">) -> (!verona<"Fib">)
     
      //   var p2 = Fib.fib(x - 2);
      %x2 = "verona.sub"(%x, %two) : (!verona<"U64 & imm">, !verona<"U64 & imm">) -> (!verona<"U64 & imm">)
      %p2 = call @"Fib::fib"(%x2) : (!verona<"U64 & imm">) -> (!verona<"Fib">)

      // when()
      // Arguments on first basic block
      "when"(%p1, %p2) ({
        // var r = U64Obj.create(p1.v + p2.v);
        // FIXME: what is the semantics of p1.v/p2.v?
        %p1v = "verona.getValue"(%p1) : (!verona<"Fib">) -> (!verona<"U64 & imm">)
        %p2v = "verona.getValue"(%p2) : (!verona<"Fib">) -> (!verona<"U64 & imm">)
        %sum = "verona.add"(%p1v, %p2v) : (!verona<"U64 & imm">, !verona<"U64 & imm">) -> (!verona<"U64 & imm">)
        %r = "verona.U64Obj::create"(%sum) : (!verona<"U64 & imm">) -> (!verona<"[U64Obj] & imm">)

        // pw.fulfill(r)
        // object method call represented as class method with "this" as first argument
        "verona.Promise::fullfill"(%pw, %r) : (!verona<"Promise">, !verona<"[U64Obj] & imm">) -> ()
      }) : (!verona<"Fib">, !verona<"Fib">) -> ()
    }) : () -> ()
    br ^bb3

  ^bb3:                                 // pred: ^bb1, ^bb2:
    // pr
    // FIXME: "pr" what now?
    %ret = "verona.pr"() : () -> (!verona<"Fib">)
    return %ret : !verona<"Fib">
  }

  // Class Main
  // main()
  func @"Main::main"() {
    // when (var uo = Fib.fib(12))
    %twelve = "verona.constant"() { "12" } : () -> (!verona<"U64 & imm">)
    %uo = call @"Fib::fib"(%twelve) : (!verona<"U64 & imm">) -> (!verona<"Fib">)
    "when"(%uo) ({
      // Builtin.print1("result={}\n", uo.v);
      // FIXME: what is the semantics of uo.v?
      %fmt = "verona.constant"() { "result={}\n" } : () -> (!verona<"string">)
      %uo_v = "verona.getValue"(%uo) : (!verona<"Fib">) -> (!verona<"U64">)
      "verona.Builtin::printl"(%fmt, %uo_v) : (!verona<"string">, !verona<"U64">) -> ()
    }) : (!verona<"Fib">) -> ()
  }
}

rengolin mentioned this issue May 7, 2020

Implement basic Parser #103

Closed

rengolin mentioned this issue May 11, 2020

Design a Verona MLIR dialect #106

Closed

rengolin self-assigned this May 11, 2020

rengolin closed this as completed May 15, 2020

mjp41 mentioned this issue Nov 10, 2020

Typed AST #344

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate using opaque MLIR as our AST #101

Investigate using opaque MLIR as our AST #101

rengolin commented May 7, 2020

plietar commented May 7, 2020

rengolin commented May 8, 2020

mjp41 commented May 11, 2020

rengolin commented May 11, 2020 •

edited

Loading

mjp41 commented May 12, 2020

rengolin commented May 12, 2020

rengolin commented May 12, 2020

rengolin commented May 14, 2020

rengolin commented May 14, 2020

davidchisnall commented May 15, 2020

rengolin commented May 15, 2020

rengolin commented May 26, 2020

Investigate using opaque MLIR as our AST #101

Investigate using opaque MLIR as our AST #101

Comments

rengolin commented May 7, 2020

plietar commented May 7, 2020

rengolin commented May 8, 2020

mjp41 commented May 11, 2020

rengolin commented May 11, 2020 • edited Loading

mjp41 commented May 12, 2020

rengolin commented May 12, 2020

rengolin commented May 12, 2020

rengolin commented May 14, 2020

rengolin commented May 14, 2020

davidchisnall commented May 15, 2020

rengolin commented May 15, 2020

rengolin commented May 26, 2020

rengolin commented May 11, 2020 •

edited

Loading