[Synth][CutRewriter] Migrate CutRewriter to use Flat IR (#9805)
Conversation
} else if (numFanins == 2) {
  // Two-way merge (common case for AND gates)
Profiling showed this merge sort is on one of the critical paths, so these parts are written manually instead of relying on the STL.
fabianschuiki left a comment
This is really really cool! Amazing speedup, and great to see that the cut rewriting is in the same ballpark as ABC! 🥳
…work (#9804) This commit introduces a data structure that reduces CutRewriter run time quite a bit (see the benchmark results in #9805). It is well known that MLIR is too slow when used naively for specific logic-synthesis transformations like cut rewriting, and the previous implementation was impractical when the cut input size > 4. For these tasks, a cache-friendly flat IR data structure (like those in ABC or mockturtle) is much more efficient. This PR changes CutRewriter to first create a read-only, in-memory IR that the CutRewriter then operates on. It allows 2-input AND, 3-input majority, and 2-input XOR to coexist in the same data structure, which lets us represent AIG, MIG, XOR-AND, and XOR-MAJ together. This is slightly different from ABC (AIG-only) and mockturtle's template-based implementation (AIG/MIG/XOR-MAJ/XOR-AND). This PR introduces only the data structure; test changes are included in #9805. AI-assisted-by: Claude Sonnet 4.6, Opus 4.6
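A flat IR in this spirit can be sketched as below. The names (`LogicNetwork`, `Node`, `Kind`) and the edge encoding are assumptions for illustration, not the PR's actual API; the key ideas are a contiguous node array indexed by integers and a complemented-edge bit, as in ABC and mockturtle:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a flat, cache-friendly logic network: nodes live in
// one contiguous array and refer to fanins by index. The low bit of an edge
// encodes inversion, so AND2, XOR2, and MAJ3 nodes (and hence AIG, MIG,
// XOR-AND, XOR-MAJ networks) share one representation.
enum class Kind : uint8_t { Input, And2, Xor2, Maj3 };

struct Node {
  Kind kind;
  uint32_t fanin[3]; // edges: (nodeIndex << 1) | invertedBit
};

struct LogicNetwork {
  std::vector<Node> nodes;

  static uint32_t edge(uint32_t node, bool inverted) {
    return (node << 1) | (inverted ? 1u : 0u);
  }

  uint32_t addInput() {
    nodes.push_back({Kind::Input, {0, 0, 0}});
    return static_cast<uint32_t>(nodes.size() - 1);
  }
  uint32_t addAnd(uint32_t a, uint32_t b) {
    nodes.push_back({Kind::And2, {a, b, 0}});
    return static_cast<uint32_t>(nodes.size() - 1);
  }

  // Recursively evaluate one node given input values (read-only traversal).
  bool eval(uint32_t idx, const std::vector<bool> &inputs) const {
    // Dereference an edge: evaluate its target, then apply the invert bit.
    auto ev = [&](uint32_t e) {
      bool v = eval(e >> 1, inputs);
      return (e & 1) ? !v : v;
    };
    const Node &n = nodes[idx];
    switch (n.kind) {
    case Kind::Input: return inputs[idx];
    case Kind::And2:  return ev(n.fanin[0]) && ev(n.fanin[1]);
    case Kind::Xor2:  return ev(n.fanin[0]) != ev(n.fanin[1]);
    case Kind::Maj3:
      return (ev(n.fanin[0]) + ev(n.fanin[1]) + ev(n.fanin[2])) >= 2;
    }
    return false;
  }
};
```

Because the network is read-only during rewriting, nodes can be laid out densely and traversed by plain integer indices, which is what makes this layout so much faster than walking MLIR operations.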
Switch cut enumeration and truth table handling to LogicNetwork indices, update downstream mappers to consume the new cut API, and refresh Synth tests for the new ordering/operation support without area-recovery changes.
Amazing stuff @uenoku - sorry I was too slow to review haha - we've been running circt-synth on some really large test cases and my collaborators have complained about speed, so hopefully this might offer some nice benefits right away.

No worries! Thank you for taking a look! Do your collaborators have performance issues around LUT mapping? If so, this PR will immediately improve the compilation time. Otherwise I'm happy to investigate the regression.
Switch cut enumeration and truth table handling to LogicNetwork indices, update downstream mappers to the new cut API, and refresh Synth tests for the updated ordering and operations. PR stacked on #9804.
This is NFCI except for a small bug fix in the NPNClass utilities; only operand orders change.
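Cut enumeration over flat network indices can be sketched as below. This is a toy, not the PR's implementation: `enumerateCuts` is a hypothetical name, and each cut's leaf set is packed into one `uint64_t` bitmask for brevity (so it assumes fewer than 64 nodes), where a real implementation would use sorted index lists and prune dominated cuts:

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

// Hypothetical sketch of k-feasible cut enumeration over node indices.
// A node's cut set is the cross product of its fanins' cut sets, keeping
// only merges with at most k leaves, plus the trivial cut {node}.
std::vector<uint64_t> enumerateCuts(const std::vector<uint64_t> &cutsA,
                                    const std::vector<uint64_t> &cutsB,
                                    uint32_t nodeIdx, unsigned k) {
  std::vector<uint64_t> result;
  for (uint64_t ca : cutsA)
    for (uint64_t cb : cutsB) {
      uint64_t merged = ca | cb; // union of the two leaf sets
      if (std::bitset<64>(merged).count() <= k)
        result.push_back(merged);
    }
  // The trivial cut: the node itself as its only leaf.
  result.push_back(1ull << nodeIdx);
  return result;
}
```

Representing leaves as integer node indices (rather than MLIR values) is what lets the cut API stay allocation-light and cache-friendly.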
The following is a runtime comparison of LUT mapping for the hyp test case (0.2M AIG nodes, the largest test case in the non-MtM category) of the EPFL Combinational Benchmark Suite. LUT mapping is a good benchmark for cut-enumeration performance, since generally only cut enumeration is performed. Note that this is not a perfectly fair comparison against ABC, because ABC performs more optimizations, such as area recovery and applying several cut-selection strategies. At k=6 the runtimes are generally the same; for k > 6 we are missing fast truth-table computation (the table no longer fits in a single 64-bit integer), so the gap is reasonable. At the very least, it got 8-10 times faster than the previous implementation.

before:
after:
AI-assisted-by: Claude Sonnet 4.6, Opus 4.6
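The k <= 6 fast path mentioned above works because a function of up to 6 variables has at most 64 truth-table rows, so the whole table fits in one `uint64_t` and each gate becomes a single bitwise instruction. A minimal sketch, with hypothetical helper names (`varTable`, `andTable`, etc.):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the k <= 6 truth-table fast path: one uint64_t
// holds all 2^6 = 64 rows, so gate semantics are plain bitwise ops.

// Canonical truth table of input variable i, i.e. its column pattern over
// all 64 rows (the standard ABC/mockturtle variable masks).
uint64_t varTable(unsigned i) {
  static const uint64_t masks[6] = {
      0xAAAAAAAAAAAAAAAAull, 0xCCCCCCCCCCCCCCCCull, 0xF0F0F0F0F0F0F0F0ull,
      0xFF00FF00FF00FF00ull, 0xFFFF0000FFFF0000ull, 0xFFFFFFFF00000000ull};
  return masks[i];
}

uint64_t andTable(uint64_t a, uint64_t b) { return a & b; }
uint64_t xorTable(uint64_t a, uint64_t b) { return a ^ b; }
uint64_t majTable(uint64_t a, uint64_t b, uint64_t c) {
  return (a & b) | (b & c) | (a & c);
}
```

For k > 6 a table spans multiple 64-bit words and every operation becomes a loop over them, which is the missing fast path the comparison above refers to.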