Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llvm-tblgen for X86GenInstrInfo.inc.tmp takes a long time due to lots of instregex use #35303

Open
nico opened this issue Jan 15, 2018 · 7 comments
Labels
backend:X86 Scheduler Models Accuracy of X86 scheduler models backend:X86 bugzilla Issues migrated from bugzilla

Comments

@nico
Copy link
Contributor

nico commented Jan 15, 2018

Bugzilla Link 35955
Version trunk
OS All
Blocks #33939
CC @adibiagio,@d0k,@legrosbuffle,@topperc,@dwblaikie,@RKSimon

Extended Description

In a highly-parallel build (-j250), full build time is limited by llvm-tblgen taking 1 minute to generate X86GenInstrInfo.inc.tmp and X86GenSubtargetInfo.inc.tmp each.

Command:

cd /Users/thakis/src/llvm-build-goma/lib/Target/X86 && time /Users/thakis/src/llvm-build-goma/bin/llvm-tblgen -gen-instr-info -I /Users/thakis/src/llvm-rw/lib/Target/X86 -I /Users/thakis/src/llvm-rw/include -I /Users/thakis/src/llvm-rw/lib/Target /Users/thakis/src/llvm-rw/lib/Target/X86/X86.td -o /Users/thakis/src/llvm-build-goma/lib/Target/X86/X86GenInstrInfo.inc.tmp

Profile:

43.05 s 100.0% 0 s llvm-tblgen (26142)
43.05 s 100.0% 0 s Main Thread 0x7659e
43.05 s 99.9% 0 s start
43.04 s 99.9% 0 s main
43.04 s 99.9% 0 s llvm::TableGenMain(char*, bool ()(llvm::raw_ostream&, llvm::RecordKeeper&))
40.53 s 94.1% 0 s (anonymous namespace)::LLVMTableGenMain(llvm::raw_ostream&, llvm::RecordKeeper&)
40.53 s 94.1% 0 s llvm::EmitInstrInfo(llvm::RecordKeeper&, llvm::raw_ostream&)
38.05 s 88.3% 0 s llvm::CodeGenTarget::getSchedModels() const
38.05 s 88.3% 0 s llvm::CodeGenSchedModels::CodeGenSchedModels(llvm::RecordKeeper&, llvm::CodeGenTarget const&)
37.93 s 88.1% 3.00 ms llvm::CodeGenSchedModels::collectSchedClasses()
37.88 s 87.9% 12.00 ms llvm::CodeGenSchedModels::createInstRWClass(llvm::Record
)
37.85 s 87.9% 38.00 ms llvm::SetTheory::expand(llvm::Record*)
37.77 s 87.7% 28.67 s (anonymous namespace)::InstRegexOp::apply(llvm::SetTheory&, llvm::DagInit*, llvm::SmallSetVector<llvm::Record*, 16u>&, llvm::ArrayRefllvm::SMLoc)
9.02 s 20.9% 1.45 s llvm::Regex::match(llvm::StringRef, llvm::SmallVectorImplllvm::StringRef*)
41.00 ms 0.0% 0 s llvm::Regex::Regex(llvm::StringRef, unsigned int)

Looking through http://llvm-cs.pcc.me.uk/utils/TableGen/CodeGenSchedule.cpp#60 and its callers on that stack suggests that reordering could help a lot here.

All the instregex calls are pretty new (r315175, r316492, etc); maybe you could try to speed this up some?

@topperc
Copy link
Collaborator

topperc commented Jan 15, 2018

We're largely abusing instregex in the scheduler models. Most of them match a single instruction. We should convert them to Instrs or combine the regular expressions to make them useful.

@RKSimon
Copy link
Collaborator

RKSimon commented Jan 16, 2018

Updating the Btver2 model will be pretty trivial - I'm not sure whether there are any instregex worth keeping or not - most match 1 or 2 instructions only and should definitely go. We have some that match more but not many, and those should benefit from merging the similar InstRW<> defs.

Anyone want to guess at what point the binary regexp search actually helps?

Craig/Gadi - IIRC most of Intel's scheduler models are auto-generated is that right? Can they be updated to issue instrs and/or instregex ?

@d0k
Copy link
Member

d0k commented Jan 24, 2018

After r323277 this should be a lot faster. Getting rid of instregex would be nice though.

@RKSimon
Copy link
Collaborator

RKSimon commented Mar 19, 2018

Possible things to look at next:

1 - throw an warning/error if a regex only has a single match so could be replaced with an instr

2 - merge multiple InstRW lines to the same WriteRes to reduce time in CodeGenSchedModels::collectSchedClasses() (this might reduce code size as well?):

from:
def SBWriteResGroup134 : SchedWriteRes<[SBPort0,SBPort23,SBPort05]> {
let Latency = 52;
let NumMicroOps = 4;
let ResourceCycles = [2,1,1];
}
def: InstRW<[SBWriteResGroup134], (instregex "VDIVPDYrm")>;
def: InstRW<[SBWriteResGroup134], (instregex "VSQRTPDYm")>;

to:
def SBWriteResGroup134 : SchedWriteRes<[SBPort0,SBPort23,SBPort05]> {
let Latency = 52;
let NumMicroOps = 4;
let ResourceCycles = [2,1,1];
}
def: InstRW<[SBWriteResGroup134], (instregex "VDIVPDYrm",
"VSQRTPDYm")>;

Craig - any thoughts? Is there any reason not to start (2) right away?

@topperc
Copy link
Collaborator

topperc commented Mar 19, 2018

Anything we can do to reduce the number InstRWs should tend to reduce code size if the set of instructions is grouped on every CPU and use the same SchedRW/itinerary.

We could also merge AVX regular expressions into their SSE counterparts by adding "(V)?" to the beginning of the regular expressions.

Should we be deliberate with the merging by focusing on combining lines with similar instructions rather than just anything that happens to have the same latency/throughput/ports on one CPU. Logical groups like that might guide us to better SchedRWs?

@RKSimon
Copy link
Collaborator

RKSimon commented Mar 19, 2018

Anything we can do to reduce the number InstRWs should tend to reduce code
size if the set of instructions is grouped on every CPU and use the same
SchedRW/itinerary.

We could also merge AVX regular expressions into their SSE counterparts by
adding "(V)?" to the beginning of the regular expressions.

That does make sense if we're going to keep the regex, I'm still a little unclear of when the cost of matching a regex to multiple instruction is better than having all the instructions listed in instrs() block, although sometimes regexs catches missed instructions (e.g. the *_Int versions for instance get matched for free currently).

I'm going to start merging some of the instregex and see where that gets us.

Should we be deliberate with the merging by focusing on combining lines with
similar instructions rather than just anything that happens to have the same
latency/throughput/ports on one CPU. Logical groups like that might guide us
to better SchedRWs?

We get a related metric from CodeGenSchedModels::createInstRWClass, when it splits all the entries from a InstRW into entries in its ClassInstrs vector, which maps them back to their original scheduler classes. If we kept track of how those 'original' classes get redistributed by InstRW entries then we might be able to flag classes that need splitting/refactoring. I've already been playing with this to help recognise unnecessary InstRW entries.

@RKSimon
Copy link
Collaborator

RKSimon commented Mar 20, 2018

rL327974 merges all the InstRW that belong to the same SchedWriteRes in the SandyBridge model. It also merges some of the VEX/non-VEX instructions into the same instregex, but there are a lot more that could be done.

Hopefully we can get this done for the other models as well.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@RKSimon RKSimon added the backend:X86 Scheduler Models Accuracy of X86 scheduler models label Apr 30, 2022
RKSimon added a commit that referenced this issue Nov 24, 2022
This reduces diffs between znver1/znver2 and should marginally speed up tlbgen build time (Issue #35303)

Found by adding a temp check inside InstRegexOp::apply inside single matches
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 Scheduler Models Accuracy of X86 scheduler models backend:X86 bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

4 participants