Add Moore MIR dialect and move codegen to CIRCT #235

Open
1 of 6 tasks
fabianschuiki opened this issue Nov 21, 2021 · 9 comments
Labels
A-codegen Area: Code generation. C-enhancement Category: Adding or improving on features. L-vlog Language: Verilog and SystemVerilog. P-high Priority: High.

Comments

@fabianschuiki
Owner

fabianschuiki commented Nov 21, 2021

#234 implements code generation by linking against the CIRCT project and using MLIR to generate and emit assembly. As a first step towards moving more of Moore's code over to MLIR, we should add a "Moore MIR" dialect to CIRCT.

This dialect should aim to model the mir::Rvalue and mir::Lvalue representations, and add new MLIR ops to represent the remaining SV statements and declarations (modules, processes, instances, variables, nets, assigns, conditionals, loops -- basically everything that codegen.rs knows how to emit code for, but that isn't currently captured as part of the mir module). This is likely to require adding a full implementation of svlog/ty.rs in CIRCT.

Once this dialect exists, we can raise the level of abstraction in codegen.rs and emit the Moore MIR dialect instead of all the low-level HW/Comb/LLHD/Standard dialects. The translation from MIR to those low-level dialects can then move into the CIRCT project as a dedicated lowering pass. This is phenomenal because MIR being an MLIR dialect will allow us to write very concise tests that check code generation for specific SV features and semantics without the whole parser and type checker in the loop. The MIR then ends up representing a full SV design with all types and implicit operations resolved to explicit ones -- making it essentially an SV semantics dialect.

Todo

@fabianschuiki fabianschuiki added L-vlog Language: Verilog and SystemVerilog. C-enhancement Category: Adding or improving on features. A-codegen Area: Code generation. P-high Priority: High. labels Nov 21, 2021
@fabianschuiki
Owner Author

@maerhart What do you think about this plan?

@maerhart
Collaborator

Sounds great to me!

Just wondering whether we can use (part of) the SV dialect in the lowering chain from SV AST to HW/Comb/LLHD. And whether this would make sense at all since the SV dialect is targeted at printing and we are focussed on lowering.

What would such a Moore MIR dialect look like?
As far as I understand, the goal of MIR is to make all types and casts explicit. We would have an operation for each expression and statement in SV (including the constructs that are directly lowered from HIR to LLHD) with a strong type system. lvalues and rvalues are probably best represented in the type system (wrapping the real type inside a lvalue/rvalue type?). The assign ops then only accept lvalues as first argument and only rvalues as second argument. Do you already have concrete ideas on this?
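
A minimal sketch of the lvalue/rvalue idea in Rust, assuming types are reduced to a bit width. The names `Ty`, `LValue`, and `verify_assign` are hypothetical and are not part of Moore or CIRCT; the point is only that an assign op's verifier can reject a non-lvalue destination or an lvalue source by looking at the types alone.

```rust
// Hypothetical names throughout; this is not Moore's or CIRCT's actual API.

/// A fully resolved type, reduced to a bit width for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    /// A plain value that can be read (an rvalue).
    Bits(u32),
    /// A wrapper marking a location that can be assigned to (an lvalue).
    LValue(Box<Ty>),
}

/// Verifier for a hypothetical assign op: the destination must be an lvalue,
/// the source must be an rvalue, and the wrapped type must match the source.
pub fn verify_assign(dst: &Ty, src: &Ty) -> Result<(), String> {
    match (dst, src) {
        (_, Ty::LValue(_)) => Err("source of an assign must be an rvalue".into()),
        (Ty::LValue(inner), _) if **inner == *src => Ok(()),
        (Ty::LValue(inner), _) => Err(format!("type mismatch: {:?} vs {:?}", inner, src)),
        _ => Err("destination of an assign must be an lvalue".into()),
    }
}
```

With this shape, reading from a signal would be an explicit op that turns an `LValue` into its wrapped type, so the distinction stays visible in the IR.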

What exactly is the goal of HIR?
Just getting rid of syntactic sugar and ambiguities?
As the amount of expressions/statements in SV is quite large, it could make sense to have most of them only in one dialect and use other dialects as extensions to reduce the amount of redundancy once we also want to port over HIR, e.g., have an HIR dialect that only models the syntactic sugar and ambiguities part. Would that be possible? Or do we need a complete IR to model and infer the unresolved types?

@fabianschuiki
Owner Author

> Just wondering whether we can use (part of) the SV dialect in the lowering chain from SV AST to HW/Comb/LLHD. And whether this would make sense at all since the SV dialect is targeted at printing and we are focussed on lowering.

I think you're right and it'll probably be difficult to use the SV dialect directly from the start. That dialect is being driven largely by what is needed for good Verilog emission, which may be a use case very distinct from representing SV from a frontend point of view. I'm not saying that they shouldn't share as much as possible, but given the different design goals it sounds like a better approach to first build out the Moore-specific dialect, and then try to nudge that and the SV dialect in CIRCT closer together, and increase the reuse between them.

> We would have an operation for each expression and statement in SV (including the constructs that are directly lowered from HIR to LLHD) with a strong type system. lvalues and rvalues are probably best represented in the type system (wrapping the real type inside a lvalue/rvalue type?). The assign ops then only accept lvalues as first argument and only rvalues as second argument. Do you already have concrete ideas on this?

Yeah I was thinking about something like you describe. MIR should be the "final bastion" of SV and be a truthful representation of the semantics of the input file, with all ambiguities and implicitness removed, and all types fully known. So pretty much what you describe: operations for all the constructs, expressions, and statements that SV has to offer. This would include all the classes, verification craziness like properties and sequences, clocking blocks, programs, interfaces, packages, assertions, and much more.

> What exactly is the goal of HIR? Just getting rid of syntactic sugar and ambiguities? As the amount of expressions/statements in SV is quite large, it could make sense to have most of them only in one dialect and use other dialects as extensions to reduce the amount of redundancy once we also want to port over HIR, e.g., have an HIR dialect that only models the syntactic sugar and ambiguities part. Would that be possible? Or do we need a complete IR to model and infer the unresolved types?

The initial goal of HIR in Moore was to get around a few limitations in the early days of the AST and query system. It was intended to offer a way to resolve syntactic ambiguities (for example the cast foo'(someExpr), where you don't know if foo is a type or an expression during AST construction). I'm not sure this is still needed. I have reworked the AST in the meantime, which is now much easier to work with, and I've been pushing queries to work directly on the AST where possible, instead of on HIR. There was also the addition of the RST (and the ambiguity resolution queries), which basically just maps a few of the AST constructs (the ambiguous ones) to one of the concrete possibilities after names have been resolved (scope and name table construction runs on the AST).
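
To make the foo'(someExpr) ambiguity concrete, here is a small Rust sketch of resolving the cast once a name table exists. `Symbol`, `Cast`, and `resolve_cast` are hypothetical names, not Moore's actual RST types, and the scope is simplified to a flat map.

```rust
use std::collections::HashMap;

/// What a name refers to once scopes and name tables have been built.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Symbol {
    Type,
    Value,
}

/// The resolved form of `foo'(expr)`.
#[derive(Debug, PartialEq)]
pub enum Cast {
    /// `foo` names a type: convert the operand to that type.
    ToType(String),
    /// `foo` names a constant value: resize the operand to that many bits.
    ToSize(String),
}

/// Pick the concrete cast by looking `name` up in the (flattened) scope.
/// Returns `None` if the name is not declared.
pub fn resolve_cast(name: &str, scope: &HashMap<String, Symbol>) -> Option<Cast> {
    match scope.get(name)? {
        Symbol::Type => Some(Cast::ToType(name.to_string())),
        Symbol::Value => Some(Cast::ToSize(name.to_string())),
    }
}
```

The parser only has to record "some cast with prefix foo"; the decision between the two variants is deferred until name resolution has run.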

I could totally see operations survive all the way from the AST down to the MIR. In the Rust world I was very careful to prevent mutation of the ops in the different IRs, to enforce safety and make passes purely additive. But in MLIR with the mutation galore and rampant unsafety of C++, we can basically start to mutate operations as we see fit. For example, we might just have one single moore.expr.add operation to represent + in SV:

  • After parsing, this operation would be created from the AST, maybe with a !moore.unresolved marker type
  • Name resolution would then update moore.expr.ident nodes to contain a pointer (symbol or something) to the thing they are referring to
  • Type checking could then go in and update moore.expr.add with the correct resolved type, and also convert types from their AST construct (like logic [3:0] being a Type { name: Named("logic"), dims: Range(3, 0) }) to a corresponding type in the IR
  • MIR lowering would then go through the expressions and insert casts where appropriate. For example, if a moore.expr.add operates on a 3-bit and a 4-bit value and is assigned to a 5-bit signal, it would have an operation type of 5 bits and would need casts inserted for its operands.
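
The last step above can be sketched in Rust as follows. This is an illustration under simplifying assumptions: types are plain bit widths, `None` stands in for the !moore.unresolved marker, and the `Expr`, `cast_to`, and `lower` names are hypothetical. It also ignores the full SV sizing rules and simply pushes the width demanded by the context down to the operands.

```rust
/// Expression tree; `ty` is `None` while the type is still unresolved.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Ident { name: String, ty: Option<u32> },
    Add { lhs: Box<Expr>, rhs: Box<Expr>, ty: Option<u32> },
    Cast { arg: Box<Expr>, ty: u32 },
}

impl Expr {
    /// Bit width of the expression, if already resolved.
    pub fn width(&self) -> Option<u32> {
        match self {
            Expr::Ident { ty, .. } | Expr::Add { ty, .. } => *ty,
            Expr::Cast { ty, .. } => Some(*ty),
        }
    }
}

/// Wrap `expr` in an explicit cast unless it already has the target width.
pub fn cast_to(expr: Expr, width: u32) -> Expr {
    if expr.width() == Some(width) {
        expr
    } else {
        Expr::Cast { arg: Box::new(expr), ty: width }
    }
}

/// Give an expression the width demanded by its context and make the
/// conversions this implies explicit on the operands of each add.
pub fn lower(expr: Expr, width: u32) -> Expr {
    match expr {
        Expr::Add { lhs, rhs, .. } => Expr::Add {
            lhs: Box::new(lower(*lhs, width)),
            rhs: Box::new(lower(*rhs, width)),
            ty: Some(width),
        },
        other => cast_to(other, width),
    }
}
```

For the 3-bit plus 4-bit example assigned to a 5-bit signal, `lower(add, 5)` yields an add of type 5 whose two operands are each wrapped in a cast to 5 bits.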

You are totally right that replicating a lot of the ops just for the sake of providing a few restrictions on them (like "here the types need to be known") is probably wasteful. We could also just declare "MIR" as being a subset of all the Moore dialect operations, with certain additional restrictions on types.
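
One way to read "MIR as a subset with restrictions" is as a predicate over the same ops rather than a second set of ops. The Rust sketch below assumes a toy op representation; `Op`, `Ty`, and `is_mir` are hypothetical, and the only restriction checked is that no type is still unresolved.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    /// Placeholder used right after parsing, before type checking.
    Unresolved,
    /// A resolved type, reduced to a bit width for illustration.
    Bits(u32),
}

#[derive(Debug, Clone)]
pub struct Op {
    pub name: &'static str,
    pub ty: Ty,
    pub operands: Vec<Op>,
}

/// An op tree counts as "MIR" here if no type in it is still unresolved.
pub fn is_mir(op: &Op) -> bool {
    op.ty != Ty::Unresolved && op.operands.iter().all(is_mir)
}
```

The same moore.expr.add op could then exist before and after type checking, and only the later form would pass the check that guards the lowering to HW/Comb/LLHD.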

@fabianschuiki
Owner Author

My suggestion would be to start with the minimum that is needed to represent the MIR and move codegen over to CIRCT. I'm pretty sure this will already inform quite a few design decisions, and it requires implementing the fully resolved SV type system as a start. Then we can look into having implicit casts inserted on the CIRCT side, and start to move monomorphization over to CIRCT as well 😄

@maerhart
Collaborator

Thank you for the detailed description! I completely agree with that.

I already started with a skeleton dialect for Moore MIR, some types and three ops forming a simple example plus lowering to HW/LLHD here.

@fabianschuiki
Owner Author

Wow this is some seriously amazing work! I love it 🎉! Let me add some comments right to the commit itself. It's great that you went for a minimal working example. I would suggest that we try to merge this into upstream CIRCT as soon as possible, to keep the PRs small and easy for people to digest. Then it can evolve within CIRCT.

@maerhart
Collaborator

Thanks for the quick feedback! I did some cleanup and addressed your comments. The diff against main is here. Let me know if there's anything else I should change or if it's ready for a PR. We just have to wait until the LLVM Submodule update PR is merged as I used the new type assembly format feature for convenience.

@fabianschuiki
Owner Author

Cool thanks a lot, this looks great! Since we're only working with 3 types at the moment (and the LLVM submodule update upstream might take a while to get merged), would it make sense to just use the old-school manual type parsing approach instead to unblock this PR?

@maerhart
Collaborator

Yeah that's actually an easy change. I rebased to the old LLVM version and opened the PR in CIRCT.
