llvm-tblgen for X86GenInstrInfo.inc.tmp takes a long time due to lots of instregex use #35303

nico · 2018-01-15T21:16:37Z


Bugzilla Link	35955
Version	trunk
OS	All
Blocks	#33939
CC	@adibiagio,@d0k,@legrosbuffle,@topperc,@dwblaikie,@RKSimon

Extended Description

In a highly-parallel build (-j250), full build time is limited by llvm-tblgen taking 1 minute to generate X86GenInstrInfo.inc.tmp and X86GenSubtargetInfo.inc.tmp each.

Command:

cd /Users/thakis/src/llvm-build-goma/lib/Target/X86 && time /Users/thakis/src/llvm-build-goma/bin/llvm-tblgen -gen-instr-info -I /Users/thakis/src/llvm-rw/lib/Target/X86 -I /Users/thakis/src/llvm-rw/include -I /Users/thakis/src/llvm-rw/lib/Target /Users/thakis/src/llvm-rw/lib/Target/X86/X86.td -o /Users/thakis/src/llvm-build-goma/lib/Target/X86/X86GenInstrInfo.inc.tmp

Profile:

43.05 s 100.0% 0 s llvm-tblgen (26142)
43.05 s 100.0% 0 s Main Thread 0x7659e
43.05 s 99.9% 0 s start
43.04 s 99.9% 0 s main
43.04 s 99.9% 0 s llvm::TableGenMain(char*, bool ()(llvm::raw_ostream&, llvm::RecordKeeper&))
40.53 s 94.1% 0 s (anonymous namespace)::LLVMTableGenMain(llvm::raw_ostream&, llvm::RecordKeeper&)
40.53 s 94.1% 0 s llvm::EmitInstrInfo(llvm::RecordKeeper&, llvm::raw_ostream&)
38.05 s 88.3% 0 s llvm::CodeGenTarget::getSchedModels() const
38.05 s 88.3% 0 s llvm::CodeGenSchedModels::CodeGenSchedModels(llvm::RecordKeeper&, llvm::CodeGenTarget const&)
37.93 s 88.1% 3.00 ms llvm::CodeGenSchedModels::collectSchedClasses()
37.88 s 87.9% 12.00 ms llvm::CodeGenSchedModels::createInstRWClass(llvm::Record)
37.85 s 87.9% 38.00 ms llvm::SetTheory::expand(llvm::Record*)
37.77 s 87.7% 28.67 s (anonymous namespace)::InstRegexOp::apply(llvm::SetTheory&, llvm::DagInit*, llvm::SmallSetVector<llvm::Record*, 16u>&, llvm::ArrayRefllvm::SMLoc)
9.02 s 20.9% 1.45 s llvm::Regex::match(llvm::StringRef, llvm::SmallVectorImplllvm::StringRef*)
41.00 ms 0.0% 0 s llvm::Regex::Regex(llvm::StringRef, unsigned int)

Looking through http://llvm-cs.pcc.me.uk/utils/TableGen/CodeGenSchedule.cpp#60 and its callers on that stack suggests that reordering could help a lot here.

All the instregex calls are pretty new (r315175, r316492, etc); maybe you could try to speed this up some?

topperc · 2018-01-15T21:29:51Z

We're largely abusing instregex in the scheduler models. Most of them match a single instruction. We should convert them to Instrs or combine the regular expressions to make them useful.

RKSimon · 2018-01-16T18:33:50Z

Updating the Btver2 model will be pretty trivial - I'm not sure whether there are any instregex worth keeping or not - most match 1 or 2 instructions only and should definitely go. We have some that match more but not many, and those should benefit from merging the similar InstRW<> defs.

Anyone want to guess at what point the binary regexp search actually helps?

Craig/Gadi - IIRC most of Intel's scheduler models are auto-generated is that right? Can they be updated to issue instrs and/or instregex ?

d0k · 2018-01-24T11:28:50Z

After r323277 this should be a lot faster. Getting rid of instregex would be nice though.

RKSimon · 2018-03-19T21:26:58Z

Possible things to look at next:

1 - throw an warning/error if a regex only has a single match so could be replaced with an instr

2 - merge multiple InstRW lines to the same WriteRes to reduce time in CodeGenSchedModels::collectSchedClasses() (this might reduce code size as well?):

from:
def SBWriteResGroup134 : SchedWriteRes<[SBPort0,SBPort23,SBPort05]> {
let Latency = 52;
let NumMicroOps = 4;
let ResourceCycles = [2,1,1];
}
def: InstRW<[SBWriteResGroup134], (instregex "VDIVPDYrm")>;
def: InstRW<[SBWriteResGroup134], (instregex "VSQRTPDYm")>;

to:
def SBWriteResGroup134 : SchedWriteRes<[SBPort0,SBPort23,SBPort05]> {
let Latency = 52;
let NumMicroOps = 4;
let ResourceCycles = [2,1,1];
}
def: InstRW<[SBWriteResGroup134], (instregex "VDIVPDYrm",
"VSQRTPDYm")>;

Craig - any thoughts? Is there any reason not to start (2) right away?

topperc · 2018-03-19T21:57:22Z

Anything we can do to reduce the number InstRWs should tend to reduce code size if the set of instructions is grouped on every CPU and use the same SchedRW/itinerary.

We could also merge AVX regular expressions into their SSE counterparts by adding "(V)?" to the beginning of the regular expressions.

Should we be deliberate with the merging by focusing on combining lines with similar instructions rather than just anything that happens to have the same latency/throughput/ports on one CPU. Logical groups like that might guide us to better SchedRWs?

RKSimon · 2018-03-19T22:23:30Z

Anything we can do to reduce the number InstRWs should tend to reduce code
size if the set of instructions is grouped on every CPU and use the same
SchedRW/itinerary.

We could also merge AVX regular expressions into their SSE counterparts by
adding "(V)?" to the beginning of the regular expressions.

That does make sense if we're going to keep the regex, I'm still a little unclear of when the cost of matching a regex to multiple instruction is better than having all the instructions listed in instrs() block, although sometimes regexs catches missed instructions (e.g. the *_Int versions for instance get matched for free currently).

I'm going to start merging some of the instregex and see where that gets us.

Should we be deliberate with the merging by focusing on combining lines with
similar instructions rather than just anything that happens to have the same
latency/throughput/ports on one CPU. Logical groups like that might guide us
to better SchedRWs?

We get a related metric from CodeGenSchedModels::createInstRWClass, when it splits all the entries from a InstRW into entries in its ClassInstrs vector, which maps them back to their original scheduler classes. If we kept track of how those 'original' classes get redistributed by InstRW entries then we might be able to flag classes that need splitting/refactoring. I've already been playing with this to help recognise unnecessary InstRW entries.

RKSimon · 2018-03-20T12:29:52Z

rL327974 merges all the InstRW that belong to the same SchedWriteRes in the SandyBridge model. It also merges some of the VEX/non-VEX instructions into the same instregex, but there are a lot more that could be done.

Hopefully we can get this done for the other models as well.

This reduces diffs between znver1/znver2 and should marginally speed up tlbgen build time (Issue #35303) Found by adding a temp check inside InstRegexOp::apply inside single matches

RKSimon mentioned this issue Sep 13, 2017

[meta] Reduce build time of llvm+clang #33939

Open

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021

RKSimon added the backend:X86 Scheduler Models Accuracy of X86 scheduler models label Apr 30, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

llvm-tblgen for X86GenInstrInfo.inc.tmp takes a long time due to lots of instregex use #35303

llvm-tblgen for X86GenInstrInfo.inc.tmp takes a long time due to lots of instregex use #35303

nico commented Jan 15, 2018

topperc commented Jan 15, 2018

RKSimon commented Jan 16, 2018

d0k commented Jan 24, 2018

RKSimon commented Mar 19, 2018

topperc commented Mar 19, 2018

RKSimon commented Mar 19, 2018

RKSimon commented Mar 20, 2018

llvm-tblgen for X86GenInstrInfo.inc.tmp takes a long time due to lots of instregex use #35303

llvm-tblgen for X86GenInstrInfo.inc.tmp takes a long time due to lots of instregex use #35303

Comments

nico commented Jan 15, 2018

Extended Description

topperc commented Jan 15, 2018

RKSimon commented Jan 16, 2018

d0k commented Jan 24, 2018

RKSimon commented Mar 19, 2018

topperc commented Mar 19, 2018

RKSimon commented Mar 19, 2018

RKSimon commented Mar 20, 2018