
[Arc] MemoryReadPortOp: remove enable and clock operand, make it pure #5194

Merged · 1 commit merged into main on May 14, 2023

Conversation

@maerhart (Member):

  • Remove the optional enable and clock operands from arc.memory_read_port, making it a combinational read port. Latencies can be added using the arc.state operation.
  • Adjust InferMemories accordingly
  • Make arc.memory_read_port pure so it can be moved into arc.define ops. This means the memory value has to be passed in via an argument; at this stage of the lowering this can be thought of as passing the whole memory by value (like an hw.array). The LegalizeStateUpdates pass should make it easy to switch to pointer semantics for memories.
  • Remove top-level sv.macro.decl ops in StripSV (sorry for this unrelated change, but it's just 4 lines :) )
  • Remove the optional stride parameter from the !arc.memory type and add a member function to conveniently compute the stride on demand. Keeping the parameter caused some issues in the lowering to LLVM that would otherwise have required bigger changes to AllocateState.
  • Fix a bug in the ClockDomainOp canonicalizer that was revealed by the regression-test changes required by the removal of the memory_read_port operands.
  • Change the memory read port lowering in LowerState such that it no longer reorders read ops but just converts them. Arcs that contain reads are now lowered to functions early on, because arcs don't allow ops with memory side effects; alternatively, we could split the arcs and pull the read ops out again.
  • Add the memory read/write reordering to LegalizeStateUpdates instead. This is just a temporary implementation and fails on more complex cases, but we need to re-implement this pass anyway, as it is very slow (~4 seconds on rocket, hundreds of seconds on largeBoom).
  • Add support for lowering the func.call op in LowerArcToLLVM, and give lowered arcs LLVM internal linkage so that LLVM can remove those functions once they are inlined, reducing the object file size to roughly a third for rocket (see the sketch after this list).
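
A minimal sketch of the kind of func lowering the last item describes, assuming the upstream MLIR FuncToLLVM patterns are used; this is generic MLIR API usage for illustration only, not the actual LowerArcToLLVM code, and the function name lowerFuncsToLLVM is hypothetical.

#include "mlir/Conversion/FuncToLLVM/ConvertFuncToLLVM.h"
#include "mlir/Conversion/LLVMCommon/TypeConverter.h"
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Transforms/DialectConversion.h"

// Hedged sketch: lower func.func/func.call/func.return to the LLVM dialect by
// reusing the upstream FuncToLLVM conversion patterns.
static mlir::LogicalResult lowerFuncsToLLVM(mlir::ModuleOp module) {
  mlir::MLIRContext *ctx = module.getContext();
  mlir::LLVMTypeConverter converter(ctx);

  mlir::RewritePatternSet patterns(ctx);
  mlir::populateFuncToLLVMConversionPatterns(converter, patterns);

  // Only the func dialect is marked illegal here; other dialects are left
  // untouched by the partial conversion.
  mlir::ConversionTarget target(*ctx);
  target.addLegalDialect<mlir::LLVM::LLVMDialect>();
  target.addIllegalDialect<mlir::func::FuncDialect>();
  return mlir::applyPartialConversion(module, target, std::move(patterns));
}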

Sorry for touching a lot of things in this PR. Let me know if I should try to split it up a bit more.

@maerhart added the Arc label (Involving the `arc` dialect) on May 13, 2023
@fabianschuiki (Contributor) left a comment:

LGTM! Thanks for tackling all of this 🥳!

Resolved review threads:
  • include/circt/Dialect/Arc/ArcOps.td
  • lib/Conversion/ArcToLLVM/LowerArcToLLVM.cpp
  • lib/Dialect/Arc/Transforms/InferMemories.cpp
Comment on lines +619 to +647
SymbolTableCollection symbolTable;
mlir::SymbolUserMap userMap(symbolTable, getOperation());
for (auto defOp : arcsToLower) {
  // Replace the arc's terminator with a func.return carrying the same operands.
  auto *terminator = defOp.getBodyBlock().getTerminator();
  builder.setInsertionPoint(terminator);
  builder.create<func::ReturnOp>(terminator->getLoc(),
                                 terminator->getOperands());
  terminator->erase();

  // Create a func.func with LLVM internal linkage and move the arc body into it.
  builder.setInsertionPoint(defOp);
  auto funcOp = builder.create<func::FuncOp>(defOp.getLoc(), defOp.getName(),
                                             defOp.getFunctionType());
  funcOp->setAttr("llvm.linkage",
                  LLVM::LinkageAttr::get(builder.getContext(),
                                         LLVM::linkage::Linkage::Internal));
  funcOp.getBody().takeBody(defOp.getBody());

  // Rewrite every user of the arc to call the new function instead.
  for (auto *user : userMap.getUsers(defOp)) {
    builder.setInsertionPoint(user);
    ValueRange results = builder
                             .create<func::CallOp>(
                                 user->getLoc(), funcOp,
                                 cast<CallOpInterface>(user).getArgOperands())
                             ->getResults();
    user->replaceAllUsesWith(results);
    user->erase();
  }

  defOp.erase();
}
Review comment (Contributor):

I'm wondering if we can avoid the need for a FuncOp in the future if we made MemoryReadOp not have a MemoryRead side-effect, since LegalizeStateUpdate guarantees that reads do the right thing even if they are misordered according to their side-effects. Sounds ugly though...

Alternatively, we could think about converting all arcs into funcs at this point?

@maerhart (Member, Author) replied:

On the one hand, I would love to not have func.func operations at that stage already; on the other hand, it would allow us to move read and write operations into the function, reducing the number of argument and result values per call as well as the overall number of functions, because we could merge arcs that are only separated by an operation with a memory effect.

Changing MemoryReadOp to not have a memory read effect is likely not safe, though. It's definitely fine before LegalizeStateUpdate, but we also use it after that pass, and then passes such as CSE could change the semantics. Another option could be lowering memories as part of LegalizeStateUpdate; I'm not sure I like that approach though.

Lowering all arcs to functions here already could totally be a way to go. I wasn't really sure what the best way is to deal with the memory side effect added in LowerState, so it would be great if we could discuss this in more detail at some point. My reasoning for not lowering all arcs was that func.call has arbitrary side effects while arc.call doesn't have any, so keeping arc.call could allow for more canonicalizations/optimizations; but having both at the same time might indeed not be nice.
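
To illustrate the trade-off mentioned above, a generic MLIR sketch (not code from this PR) of how a pass can check declared memory effects before duplicating or reordering an operation; the helper name is hypothetical.

#include "mlir/IR/Operation.h"
#include "mlir/Interfaces/SideEffectInterfaces.h"

// Passes such as CSE only merge operations that are provably free of memory
// effects. Per the discussion above, arc.call declares no effects and stays
// optimizable, while func.call has unknown effects, so this conservatively
// returns false for it.
static bool canSafelyDuplicateOrReorder(mlir::Operation *op) {
  return mlir::isMemoryEffectFree(op);
}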

@maerhart merged commit 43e69a8 into main on May 14, 2023.
@maerhart deleted the dev/maerhart/arc-memreadport-remove-clockandenable branch on May 14, 2023 at 07:22.