-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TableGen] fix tlbgen for EncodingByHwMode #84906
base: main
Are you sure you want to change the base?
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
For more details, please refer to: https://discourse.llvm.org/t/rfc-fix-tablegen-for-hwmode/77625 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me, but I'd wait for someone else to review this to see if it meets their needs.
A couple of suggestions regarding style:
- use structured bindings in place of "auto &KV : some-map",
- modify "append" to take "const std::string &", it would get rid of the "c_str" and make the code easier to read.
// In the default case, this instruction is duplicated into two Alt decoder | ||
// tables (ModeA and ModeB). | ||
// In the suppressed case, this instruction appears in a single decoder table. | ||
// Instructions whose DecoderNamespace dosen't contain valid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: dosen't -> doesn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@superZWT123 Please give me a day or two to review this code. From a very quick scan, it actually undoes some of the work I did recently (or at least does it differently)-- which may be fine-- but I need to examine carefully. I'm using this feature in a downstream production backend with a complicated ISA (generously speaking) and even more complicated encoding schemes (100+ decoder tables). It tends to exercise the encode/decode heavily. Also, my previous patch to suppress duplicates had a command line option so that existing downstream backends could opt-in to that feature-- at least temporarily. That is, a downstream backend could rebase and their disassembler would not need any changes due to decoder table naming. Something like that would probably be good to keep initially. All of that said, I am certainly glad to see others helping to production-harden this feature. |
Thanks kparzysz, I can’t assert that my modification is the optimal solution, but what I can say with certainty is that this logic is definitely better than before and can solve many current problems. One important point is that the current changes don’t require any modifications from the backends that are already using hwmode, offering good compatibility. |
Thanks nvjle, If you could help with validating my patch on your backend, that would be fantastic. Finally, I am truly thankful for the time you spent reviewing my code! |
15ee710
to
ffcba0a
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
By using 'HwMode', we can let one instruction have different encodings. But there are some defects in the original TableGen logic in llvm framework. 1. If we use HwModes to control one instruction's encodings, there will generate multiple copies of the original DecoderTables to where the instruction belongs. But for instructions who are not controlled by HwModes, tablegen still generate multiple copies of DecoderTables for them, which is an absolutely redundant behavior. 2. If we use HwModes to control one instruction's encodings, when get binary code for instruction, framework will check all subtargets' HwModes, but we only set HwModes for subtargets whose instructions' encodings conflict, those subtargets without HwModes will cause 'llvm_unreachable'. 3. RegInfoByHwMode and EncodingByHwModes cannot coexist, and resolve it by store HwModes Id in bits. 4. Fix DefaultMode logic for EncodingByHwMode in Decoder and CodeEmitter.
ffcba0a
to
629b8fd
Compare
Hi nvjle. |
Hi @superZWT123, Patches to the decode and encode can be subtle-- especially in larger backends. Here are some specific requests and observations:
I would say 3 days of review time is not terrible in LLVM time, especially for this sort of patch. I will certainly try to turn around as quickly as possible-- with the emitter changes being likely quicker to turnaround. FWIW, in recent changes to TableGen, I've been very careful myself about making narrow, isolated changes. This helps reviewers more easily understand and review the code (lessen cognitive load for people who largely work full time and need to context switch). It also makes it easier to pinpoint any problems later and revert/change relatively small amounts of code. This does require a bit more effort to upstream patches from downstream/private backends, but this is exactly what all of us "living downstream" are doing. |
Hi nvjle, I have carefully read your comments and discussed your suggestions with our team members. Thank you very much for taking the time to review this patch and providing valuable feedback.
What worries me the most is what you mentioned about seeing miscompiles in your backend. If you can provide some error information, I believe I could better pinpoint where the problem lies. Alternatively, after I split the patch into two parts, it might be easier to reproduce and locate the problem on your backend, but this would probably require your continued assistance in reviewing the split patches. In fact, I enabled the current patch in my backend and assigned the 'EncodingByHwMode' attribute to seven or eight instructions to resolve encoding conflicts. So far, I have not encountered any problems. Also, I wanted to ask if I should split this pull request into two new pull requests, or just split the commit of this pull request into two? Since this branch has already fallen significantly behind the main branch, I might rebase it and then resubmit two new pull requests. I’m not sure if this is the correct approach. If there are any points that I haven't replied to, please point them out. Thank you very much! |
Hi @superZWT123 ,
As mentioned earlier, I have not yet had time to nail down precisely what is happening here, and whether related to the encoder changes per-se or something on our end. It has taken all of my time so far just integrating and reviewing all of the changes-- another reason independent/small changes are important. I should have this nailed down soon.
Submitting two new pull requests (after rebasing) is what I had in mind. Two logically independent patches which can be reviewed separately and on different timelines.
Regarding the |
Thank you very much for your response nvjle, which I saw in the middle of the night. Regarding your third point, I roughly understand what you mean. You meant that the suppression was already implemented in your previous patch, and there is no need to repeat that. I have carefully considered and compared the implementation of our two patches, and I will explain my specific ideas. Firstly, if we define two hwmodes(Mode1 and Mode2) for instruction A, the old logic will split all DecoderTables into two form, each corresponding to one of the two hwmodes. Except for the DecoderTable containing instruction A, the rest of the tables do not need to be duplicated and as far as I'm concerned must be preserved in their original form. My patch accomplishes this by removing unnecessary table duplications and retaining necessary table duplications for hwmode enablement. As for your patch, you first add an ‘AllModes’ suffix to the DecoderTables that do not need any modification(no hwmodes in them), which is unnecessary in my opinion. Leaving that aside, instruction A is then individually selected, and two DecoderTables, one for hwmode 1 and another for hwmode 2, are created storing the different encodings infos for instruction A. All the other instructions under the same 'DecoderNamespace' but without the hwmode attribute as instruction A are saved in an 'AllModes' table. Overall, I see that your suppression method separates the instructions assigned with hwmode attributes from their original 'DecoderNamespace', which is not suitable since hwmode should not change the DecoderNamespace of a given instruction. Once you extract hwmode-specific instructions using your approach, you will no longer be able to check whether the encoding of instruction A under hwmode 1 and hwmode 2 conflicts when disassembling with other instructions that belong to the same DecoderNamespace. Actually I have considered extracting instructions with hwmode attributes into separate tables before I seeing your patch, this can minimize table duplications to the greatest extent possible, but I felt that this way might break the logic of DecoderNamespace, so I just give it up. My view is that if users do not specify a new DecoderNamespace, the encodings of instructions under different hwmodes should still remain within the previous DecoderNamespace. Of course, your suppression method is more radical and can significantly reduce the size of the DecoderTable if users are made aware of it and knowing completely what this command line is doing. But still want to say , if you can remove the ‘AllModes’ suffix , it would be better. |
Hi nvjle, kparzysz. I might take a few days to organize my patch and perform some more in-depth checks (such as enabling the EncodingByHwMode functionality on other backends like RISCV, and build tablegen in debug mode to dump some records) to ensure the functionality is correct, although we have already conducted numerous CI tests on this patch in our local project. I will try to decouple the code of 'DecoderEmitter' part within one or two days to submit a pull request first. Thank you very much for your patient review and valuable comments. |
Sounds good @superZWT123, and doing the |
By using 'HwMode', we can let one instruction have different encodings. But there are some defects in the original TableGen logic in llvm framework.