New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
llvm-tblgen for X86GenInstrInfo.inc.tmp takes a long time due to lots of instregex use #35303
Comments
We're largely abusing instregex in the scheduler models. Most of them match a single instruction. We should convert them to Instrs or combine the regular expressions to make them useful. |
Updating the Btver2 model will be pretty trivial - I'm not sure whether there are any instregex worth keeping or not - most match 1 or 2 instructions only and should definitely go. We have some that match more but not many, and those should benefit from merging the similar InstRW<> defs. Anyone want to guess at what point the binary regexp search actually helps? Craig/Gadi - IIRC most of Intel's scheduler models are auto-generated is that right? Can they be updated to issue instrs and/or instregex ? |
After r323277 this should be a lot faster. Getting rid of instregex would be nice though. |
Possible things to look at next: 1 - throw an warning/error if a regex only has a single match so could be replaced with an instr 2 - merge multiple InstRW lines to the same WriteRes to reduce time in CodeGenSchedModels::collectSchedClasses() (this might reduce code size as well?): from: to: Craig - any thoughts? Is there any reason not to start (2) right away? |
Anything we can do to reduce the number InstRWs should tend to reduce code size if the set of instructions is grouped on every CPU and use the same SchedRW/itinerary. We could also merge AVX regular expressions into their SSE counterparts by adding "(V)?" to the beginning of the regular expressions. Should we be deliberate with the merging by focusing on combining lines with similar instructions rather than just anything that happens to have the same latency/throughput/ports on one CPU. Logical groups like that might guide us to better SchedRWs? |
That does make sense if we're going to keep the regex, I'm still a little unclear of when the cost of matching a regex to multiple instruction is better than having all the instructions listed in instrs() block, although sometimes regexs catches missed instructions (e.g. the *_Int versions for instance get matched for free currently). I'm going to start merging some of the instregex and see where that gets us.
We get a related metric from CodeGenSchedModels::createInstRWClass, when it splits all the entries from a InstRW into entries in its ClassInstrs vector, which maps them back to their original scheduler classes. If we kept track of how those 'original' classes get redistributed by InstRW entries then we might be able to flag classes that need splitting/refactoring. I've already been playing with this to help recognise unnecessary InstRW entries. |
rL327974 merges all the InstRW that belong to the same SchedWriteRes in the SandyBridge model. It also merges some of the VEX/non-VEX instructions into the same instregex, but there are a lot more that could be done. Hopefully we can get this done for the other models as well. |
This reduces diffs between znver1/znver2 and should marginally speed up tlbgen build time (Issue #35303) Found by adding a temp check inside InstRegexOp::apply inside single matches
Extended Description
In a highly-parallel build (-j250), full build time is limited by llvm-tblgen taking 1 minute to generate X86GenInstrInfo.inc.tmp and X86GenSubtargetInfo.inc.tmp each.
Command:
cd /Users/thakis/src/llvm-build-goma/lib/Target/X86 && time /Users/thakis/src/llvm-build-goma/bin/llvm-tblgen -gen-instr-info -I /Users/thakis/src/llvm-rw/lib/Target/X86 -I /Users/thakis/src/llvm-rw/include -I /Users/thakis/src/llvm-rw/lib/Target /Users/thakis/src/llvm-rw/lib/Target/X86/X86.td -o /Users/thakis/src/llvm-build-goma/lib/Target/X86/X86GenInstrInfo.inc.tmp
Profile:
43.05 s 100.0% 0 s llvm-tblgen (26142)
43.05 s 100.0% 0 s Main Thread 0x7659e
43.05 s 99.9% 0 s start
43.04 s 99.9% 0 s main
43.04 s 99.9% 0 s llvm::TableGenMain(char*, bool ()(llvm::raw_ostream&, llvm::RecordKeeper&))
40.53 s 94.1% 0 s (anonymous namespace)::LLVMTableGenMain(llvm::raw_ostream&, llvm::RecordKeeper&)
40.53 s 94.1% 0 s llvm::EmitInstrInfo(llvm::RecordKeeper&, llvm::raw_ostream&)
38.05 s 88.3% 0 s llvm::CodeGenTarget::getSchedModels() const
38.05 s 88.3% 0 s llvm::CodeGenSchedModels::CodeGenSchedModels(llvm::RecordKeeper&, llvm::CodeGenTarget const&)
37.93 s 88.1% 3.00 ms llvm::CodeGenSchedModels::collectSchedClasses()
37.88 s 87.9% 12.00 ms llvm::CodeGenSchedModels::createInstRWClass(llvm::Record)
37.85 s 87.9% 38.00 ms llvm::SetTheory::expand(llvm::Record*)
37.77 s 87.7% 28.67 s (anonymous namespace)::InstRegexOp::apply(llvm::SetTheory&, llvm::DagInit*, llvm::SmallSetVector<llvm::Record*, 16u>&, llvm::ArrayRefllvm::SMLoc)
9.02 s 20.9% 1.45 s llvm::Regex::match(llvm::StringRef, llvm::SmallVectorImplllvm::StringRef*)
41.00 ms 0.0% 0 s llvm::Regex::Regex(llvm::StringRef, unsigned int)
Looking through http://llvm-cs.pcc.me.uk/utils/TableGen/CodeGenSchedule.cpp#60 and its callers on that stack suggests that reordering could help a lot here.
All the instregex calls are pretty new (r315175, r316492, etc); maybe you could try to speed this up some?
The text was updated successfully, but these errors were encountered: