Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JITLink][AArch32] Add TableGen Backend for Instr Encodings #76996

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

eymay
Copy link
Contributor

@eymay eymay commented Jan 4, 2024

This TableGen backend uses the Target info in lib/Target/ARM to
produce instruction encodings necessary for JITLink AArch32 backend.
Currently opcode, opcode mask, register mask and immediate mask are
generated. These were used to replace the mov related instruction
information in aarch32.h.

Copy link

github-actions bot commented Jan 4, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@eymay
Copy link
Contributor Author

eymay commented Jan 4, 2024

The current state is not very clean with the build system. I would appreciate any comment on including the generated file into the header file itself. Though being a temporary solution, you can find the generated file at llvm/unittests/ExecutionEngine/JITLink/JITLinkAArch32.inc.

@tschuett tschuett requested a review from MaskRay January 4, 2024 22:09
Copy link
Contributor

@weliveindetail weliveindetail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello everyone, before we start with detailed reviews, it might be worth mentioning that this is more of a design sketch and not ready to land yet. @eymay Can we mark it as Draft please?

I really like the idea of pulling all kinds of data from TableGen! And it's good to see that it can work already for opcodes, masks, register- and immediate bits. Thanks for working on this @eymay! As explained inline, I think the generated structures must still be extensible and one question is how much noise a macro-based approach would add.

Another question is how much linker-specific information can be distilled from TableGen. For example, can TableGen tell us the position of the BLX or H bits in an ARM BL instruction? And otherwise, would it be interesting to add this data to the .td files? @smithp35 What do you think?

llvm/lib/ExecutionEngine/JITLink/aarch32.cpp Outdated Show resolved Hide resolved
llvm/unittests/ExecutionEngine/JITLink/JITLinkAArch32.inc Outdated Show resolved Hide resolved
@eymay eymay marked this pull request as draft January 5, 2024 13:20
@eymay
Copy link
Contributor Author

eymay commented Jan 6, 2024

Hello everyone, before we start with detailed reviews, it might be worth mentioning that this is more of a design sketch and not ready to land yet. @eymay Can we mark it as Draft please?

Thanks for the note! Yes it is a proof of concept at the moment.

static constexpr uint32_t Opcode = 0x03000000;
};
template <>
struct FixupInfo<Arm_MovwAbsNC> : public FixupInfoArmMov<Arm_MovwAbsNC> {};

template <> struct FixupInfo<Thumb_Jump24> : public FixupInfoThumb {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thumb Mov's are not modified yet since we can discuss the method of extracting further. Arm Fixup changes provide a baseline of what is possible currently.

Copy link
Contributor

@weliveindetail weliveindetail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's pretty neat. Thanks for the update!

I realize that we pull in a lot more instructions (~3.7K) than we actually need (<50). We only include the header in 6 compile units, but still they will be copied into each of them. Is there any way to filter out unrelated instructions from the .inc file? For example, all arithmetic instructions?

/// FixupInfo checks for Arm edge kinds work on 32-bit words
template <EdgeKind_aarch32 Kind> struct FixupInfoArm : public FixupInfoArmBase {
static constexpr uint32_t Opcode =
InstrTable[getInstrFromJITLinkEdgeKind(Kind)].Opcode;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we turn these into something like getTableGenInfoArm(Kind).Opcode? Then getInstrFromJITLinkEdgeKind() could become an implementation detail and for Thumb we can also convert to HalfWords in there.

@@ -212,19 +272,18 @@ template <> struct FixupInfo<Arm_Call> : public FixupInfoArmBranch {
static constexpr uint32_t BitBlx = 0x10000000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we could infer this as:

BitBLX = InstrTable[BLXi] & ~InstrTable[BL] == 0xfa000000 & 0x14000000 == 0x10000000

Opcode and OpcodeMask don't match exactly what we have. They seem to include the condition mask and H bit. Is there a way to get this info from TableGen? And if so, can we integrate it without adding another InstrInfo entry for each line in the generated .inc?

Otherwise, we will have to do some adjustments in the implementation. In general, I'd be happy to follow whatever TableGen can give us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is my attempt at integrating the TableGen info of branch instructions with our bit level info. I realized as we diverge more from TableGen in terms of information, it gets harder to inspect and make sense of the resulting encodings. I plan to revert this branch specific TableGen integration since more complexity is added compared to simply using the previous values.

This TableGen backend uses the Target info in lib/Target/ARM to
produce instruction encodings necessary for JITLink AArch32 backend.
Currently opcode, opcode mask, register mask and immediate mask are
generated. These were used to replace the `mov` related instruction
information in aarch32.h.
Generating an includable file in header requires the TableGen backend
to be independent of the headers, if not a cyclic dependency occurs.
A macro based generation is prioritised and its manipulation is done
in the header file by constexpr lookups.
@eymay
Copy link
Contributor Author

eymay commented Jan 26, 2024

Wow, that's pretty neat. Thanks for the update!

I realize that we pull in a lot more instructions (~3.7K) than we actually need (<50). We only include the header in 6 compile units, but still they will be copied into each of them. Is there any way to filter out unrelated instructions from the .inc file? For example, all arithmetic instructions?

Absolutely! We now emit only Move and Branch instructions, around 20 instructions.

@weliveindetail
Copy link
Contributor

We discussed this in our last call. The approach is interesting and it would be great to integrate TableGen, but we are not sure about the details yet. In particular, it seems useful to have the specific byte-sequences at hand and easily discoverable. The TableGen approach will hide them from the code. For the moment we leave this here as a proposal for further consideration. Some situations to consider:

  • in tests we write assembly to produce the expected opcodes: TableGen names are similar to assembly instructions, but not always obvious -- we still need to compare instruction bytes
  • we often work with disassembly when debugging and compare instruction bytes
  • instruction bytes are very relevant for reviews as well
  • we must be able to match byte sequences for a range of instructions, for example B/B.W/BL/BLX (we'd need to combine TableGen results with bitwise operations)
  • encode/decode functions are hand-written (and we probably want to keep that)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants