-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[RISCV][LLD] Add RISCV zcmt optimise in linker relaxation #77884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
use `DenseMap` and `CachedHashStringRef`
I tried linking several common applications and libraries using this patch to compare Zcmt's contribution to reducing Code Szie. The results can be found in the Sheet or in the image below. Besides that, I found a problem that needs to be discussed. In However, the linker still adds the symbol Thus, I tried to delay adding the symbol But it will crash with error placeholder symbol reached writer at |
✅ With the latest revision this PR passed the C/C++ code formatter. |
Let me express my gratitude for the dedicated work you put into measuring the impact. This has been useful to figure out how useful an extension is. As an established code size reduction feature, global pointer relaxation seems to have a saving larger than this but still needed a lot of discussions whether it was justified (since the numbers aren't great either). The zcmt numbers seem smaller while the implementation is much heavier in assembler/linker and table jump seems to get questions on hardware side whether the little code size saving justifies probably significant performance overhead. I also heard that zcmt is incompatible with another extension. It would be greatly beneficial to see a more substantial commitment from hardware vendors, given the drawbacks, minimal saving, and the considerable implementation complexity. |
The RISC-V code size reduction work-group did some analysis of expected code size improvements from zcmt https://docs.google.com/spreadsheets/d/1bFMyGkuuulBXuIaMsjBINoCWoLwObr1l9h5TAWN8s7k/edit#gid=1679419155 - it would be worth understanding why the results seem so different. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In TableJumpSection::finalizeContents(), the linker will abort the tbljal optimization and empty the Jump table if tbljal optimization would cause a negative optimization (i.e., the size reduction caused by the Table jump Inst is less than the size increase caused by making the Jump Table).
Add a testcase to demonstrate that?
I'm trying remove the symble |
// an increase in code size (i.e. the reduction from instruction conversion | ||
// does not cover the code size gain from adding a table entry). | ||
SmallVector<llvm::detail::DenseMapPair<const Symbol *, int>, 0> | ||
TableJumpSection::finalizeEntry(llvm::DenseMap<const Symbol *, int> EntryMap, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finalizeEntry seems not modifying any section members and the input EntryMap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we actually do following 3 things
- sort the EntryMap as decrease order by size reduction of each item in EntryMap
- drop rest if EntryMap larger then maxSize
- drop the item that have a negative effect
This PR is tracked here: riscv-admin/dev-partners#4 Unfortunately, there were no updates for more than half a year. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried integrating this into a LLVM build locally, and there are a couple of issues that need fixing which I've commented on the appropriate lines.
const auto jalr = sec.contentMaybeDecompress().data()[r.offset + 4]; | ||
const uint8_t rd = extractBits(jalr, 11, 7); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The loading value of this
jalr
register is indexing an array ofuint8_t
s so actually is only getting the least significant bit ofrd
, so anyjalr
using anything other than zero/ra will be converted. This isn't being picked up in tests because the encoding of ra is 1. - This assumes that the
jalr
instruction is 4 bytes after the relocation offset. This is true forR_RISCV_CALL
, butR_RISCV_JAL
relocations are also processed via this function. This means the instruction after thejal
instruction is both processed for its bits in therd
field and deleted if the branch target is in the table.
The same issues appear in scanTableJumpEntries
, but commented once for brevity.
|
||
TableJumpSection::TableJumpSection() | ||
: SyntheticSection(SHF_ALLOC | SHF_EXECINSTR, SHT_PROGBITS, | ||
config->wordsize, ".riscv.jvt") {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the RISC-V unprivileged specification (27.14.3. jvt CSR) the jvt section should be aligned to 64 bytes.
"If jvt is writable, the set of values the register may hold can vary by implementation. The value in the
BASE field must always be aligned on a 64-byte boundary."
This is relevant to my interests, is there anyone working on it? I am happy to help. |
This patch implements optimizations for the zcmt extension in lld.
A new TableJumpSectio has been added.
Scans each
R_RISCV_CALL/R_RISCV_CALL_PLT
relocType in each section before the linker relaxation, recording the name of the symbol.In
finalizeContents
the recorded symbol names are sorted in descending order by the number of jumps.Optimise and insert a new
cm.jt/cm.jalt
during the relax process. in the process, we reused theR_RISCV_JAL
relocTypeco-author: @ScottEgerton