-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LoongArch] Insert nops and emit align reloc when handle alignment directive #72962
Conversation
@llvm/pr-subscribers-backend-loongarch @llvm/pr-subscribers-mc Author: Jinyang He (MQ-mengqing) ChangesRefer to RISCV, we will fix up the alignment if linker relaxation changes code size and breaks alignment. Insert enough Nops and emit R_LARCH_ALIGN relocation type so that linker could satisfy the alignment by removing Nops. It does so only in sections with the SHF_EXECINSTR flag. Full diff: https://github.com/llvm/llvm-project/pull/72962.diff 4 Files Affected:
diff --git a/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.cpp b/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.cpp
index a35916d2ad2197d..6625ad6bc94d188 100644
--- a/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.cpp
+++ b/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.cpp
@@ -160,6 +160,54 @@ void LoongArchAsmBackend::applyFixup(const MCAssembler &Asm,
}
}
+// Linker relaxation may change code size. We have to insert Nops
+// for .align directive when linker relaxation enabled. So then Linker
+// could satisfy alignment by removing Nops.
+// The function returns the total Nops Size we need to insert.
+bool LoongArchAsmBackend::shouldInsertExtraNopBytesForCodeAlign(
+ const MCAlignFragment &AF, unsigned &Size) {
+ // Calculate Nops Size only when linker relaxation enabled.
+ const MCSubtargetInfo *STI = AF.getSubtargetInfo();
+ if (!STI->hasFeature(LoongArch::FeatureRelax))
+ return false;
+
+ const unsigned MinNopLen = 4;
+ Size = AF.getAlignment().value() - MinNopLen;
+ return AF.getAlignment() > MinNopLen;
+}
+
+// We need to insert R_LARCH_ALIGN relocation type to indicate the
+// position of Nops and the total bytes of the Nops have been inserted
+// when linker relaxation enabled.
+// The function inserts fixup_loongarch_align fixup which eventually will
+// transfer to R_LARCH_ALIGN relocation type.
+bool LoongArchAsmBackend::shouldInsertFixupForCodeAlign(
+ MCAssembler &Asm, const MCAsmLayout &Layout, MCAlignFragment &AF) {
+ // Insert the fixup only when linker relaxation enabled.
+ const MCSubtargetInfo *STI = AF.getSubtargetInfo();
+ if (!STI->hasFeature(LoongArch::FeatureRelax))
+ return false;
+
+ // Calculate total Nops we need to insert. If there are none to insert
+ // then simply return.
+ unsigned Count;
+ if (!shouldInsertExtraNopBytesForCodeAlign(AF, Count))
+ return false;
+
+ const MCExpr *Dummy = MCConstantExpr::create(0, Asm.getContext());
+ // Create fixup_loongarch_align fixup.
+ MCFixup Fixup = MCFixup::create(
+ 0, Dummy, MCFixupKind(LoongArch::fixup_loongarch_align), SMLoc());
+
+ uint64_t FixedValue = 0;
+ MCValue NopBytes = MCValue::get(Count);
+
+ Asm.getWriter().recordRelocation(Asm, Layout, &AF, Fixup, NopBytes,
+ FixedValue);
+
+ return true;
+}
+
bool LoongArchAsmBackend::shouldForceRelocation(const MCAssembler &Asm,
const MCFixup &Fixup,
const MCValue &Target) {
diff --git a/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.h b/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.h
index f840f9fa2b6a007..1f4743f34dad2a7 100644
--- a/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.h
+++ b/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.h
@@ -40,6 +40,15 @@ class LoongArchAsmBackend : public MCAsmBackend {
uint64_t Value, bool IsResolved,
const MCSubtargetInfo *STI) const override;
+ // Return Size with extra Nop Bytes for alignment directive in code section.
+ bool shouldInsertExtraNopBytesForCodeAlign(const MCAlignFragment &AF,
+ unsigned &Size) override;
+
+ // Insert target specific fixup type for alignment directive in code section.
+ bool shouldInsertFixupForCodeAlign(MCAssembler &Asm,
+ const MCAsmLayout &Layout,
+ MCAlignFragment &AF) override;
+
bool shouldForceRelocation(const MCAssembler &Asm, const MCFixup &Fixup,
const MCValue &Target) override;
diff --git a/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchFixupKinds.h b/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchFixupKinds.h
index ba2d6718cdf9a27..8d773b649107ed6 100644
--- a/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchFixupKinds.h
+++ b/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchFixupKinds.h
@@ -106,7 +106,9 @@ enum Fixups {
// 20-bit fixup corresponding to %gd_pc_hi20(foo) for instruction pcalau12i.
fixup_loongarch_tls_gd_pc_hi20,
// 20-bit fixup corresponding to %gd_hi20(foo) for instruction lu12i.w.
- fixup_loongarch_tls_gd_hi20
+ fixup_loongarch_tls_gd_hi20,
+ // Generate an R_LARCH_ALIGN which indicates the linker may fixup align here.
+ fixup_loongarch_align = FirstLiteralRelocationKind + ELF::R_LARCH_ALIGN
};
} // end namespace LoongArch
} // end namespace llvm
diff --git a/llvm/test/MC/LoongArch/Relocations/relax-align.s b/llvm/test/MC/LoongArch/Relocations/relax-align.s
new file mode 100644
index 000000000000000..36524182777cb02
--- /dev/null
+++ b/llvm/test/MC/LoongArch/Relocations/relax-align.s
@@ -0,0 +1,28 @@
+# RUN: llvm-mc --filetype=obj --triple=loongarch64 %s -o %t
+# RUN: llvm-readobj -r %t | FileCheck %s
+# RUN: llvm-mc --filetype=obj --triple=loongarch64 -mattr=+relax %s -o %t
+# RUN: llvm-readobj -r %t | FileCheck %s --check-prefix=CHECKR
+
+# CHECKR: Relocations [
+# CHECKR-NEXT: Section ({{.*}}) .rela.text {
+# CHECKR-NEXT: 0x0 R_LARCH_ALIGN - 0xC
+# CHECKR-NEXT: 0x10 R_LARCH_ALIGN - 0x1C
+# CHECKR-NEXT: 0x2C R_LARCH_ALIGN - 0xC
+# CHECKR-NEXT: }
+# CHECKR-NEXT: ]
+
+# CHECK: Relocations [
+# CHECK-NEXT: ]
+
+.text
+.p2align 4
+nop
+.p2align 5
+.p2align 4
+nop
+## Not emit R_LARCH_ALIGN if code alignment great than alignment directive.
+.p2align 2
+.p2align 1
+.p2align 0
+## Not emit R_LARCH_ALIGN if alignment directive with specific padding value.
+.p2align 4, 1
|
Add: @SixWeining @wangleiat @xry111 @xen0n @MaskRay . |
…rective Refer to RISCV, we will fix up the alignment if linker relaxation changes code size and breaks alignment. Insert enough Nops and emit R_LARCH_ALIGN relocation type so that linker could satisfy the alignment by removing Nops. It does so only in sections with the SHF_EXECINSTR flag. In LoongArch psABI v2.30, R_LARCH_ALIGN requires symbol index. The lowest 8 bits of addend represent alignment and the other bits of addend represent the maximum number of bytes to emit.
8e63c73
to
4985669
Compare
Based on psABI v2.30 [1]. In LoongArch psABI v2.30, [1] https://github.com/loongson/la-abi-specs/blob/v2.30/laelf.adoc |
bool LoongArchAsmBackend::shouldInsertExtraNopBytesForCodeAlign( | ||
const MCAlignFragment &AF, unsigned &Size) { | ||
// Calculate Nops Size only when linker relaxation enabled. | ||
const MCSubtargetInfo *STI = AF.getSubtargetInfo(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can inline one-time-used variable STI
.
if (!STI->hasFeature(LoongArch::FeatureRelax)) | ||
return false; | ||
|
||
// Ignore alignment if the minimum Nop size is less than the MaxBytesToEmit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems the comment should be:
// Ignore alignment if MaxBytesToEmit is less than the minimum Nop size.
@@ -0,0 +1,22 @@ | |||
# RUN: llvm-mc --filetype=obj --triple=loongarch64 --mattr=+relax %s \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be better to add a description about the aim of this test. For example:
## Check R_LARCH_ALIGN is not generated for non executable section.
.L1: | ||
la.pcrel $t0, sym | ||
.p2align 3 | ||
.L2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the aim of this test is to check R_LARCH_ALIGN is not generated for non executable section, seems .L2 - .L1
is unnecessary. Use any other simple instruction is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that. Here we need to check the ADDSUB relocation are emitted (see llvm/test/MC/RISCV/align-non-executable.s). Because in AttemptToFoldSymbolOffsetDifference(), the aligment fragments in a section which has instructions cannot be folded. Maybe it can be improved in the future.
@@ -0,0 +1,53 @@ | |||
# RUN: llvm-mc --filetype=obj --triple=loongarch64 --mattr=-relax %s \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Ditto. Add a file-level comment.
- Use -o to save the output file so that we can copy and execute the command easily.
- Is is better to use
llvm-objdump -dr
to check the result? - --check-prefix
|
||
uint64_t FixedValue = 0; | ||
unsigned Lo = Log2_64(Count) + 1; | ||
unsigned Hi = AF.getMaxBytesToEmit() >= Count ? 0 : AF.getMaxBytesToEmit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this is not the same as GAS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make things easier. They behave same when static-link.
// The improved R_LARCH_ALIGN requires symbol index. The lowest 8 bits of | ||
// addend represent alignment and the other bits of addend represent the | ||
// maximum number of bytes to emit. The maximum number of bytes is zero | ||
// means ignore the emit limit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// The maximum number of bytes is zero means ignore the emit limit.
Seems this is not the same as GAS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The psABI docs don't mention it, but GAS and LD really does it.
bfd/elfnn-loongarch.c:
if (max > 0 && need_nop_bytes > max)
return loongarch_relax_delete_bytes (abfd, sec, rel->r_offset,
addend, link_info);
The one of conditions is "max > 0".
gas/config/tc-loongarch.c:
ex.X_add_number = (max << 8) | n;
It emit unmodified max.
nop | ||
.p2align 4, 1, 12 | ||
|
||
# RELAX-INST: <.text>: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got different output:
0000000000000000 <.text2>:
0: 00 00 40 03 nop
4: 00 00 40 03 nop
8: 00 00 40 03 nop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't check <.text2>
, because <.text2>
is used to check symbol index.
# RELAX-EMPTY: | ||
# RELAX: 0000000000000000 0000000200000066 R_LARCH_ALIGN 0000000000000000 <null> + 4 | ||
# RELAX-EMPTY: | ||
# RELAX: Symbol table '.symtab' contains 3 entries: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we should check symbol table? To check the temp symbol .Lla-relax-align
is generated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned earlier, we need to create the symbol for R_LARCH_ALIGN. It seems that this symbol needs to be kept in the same section as the alignment directive (see [1]). So this test case wants to check that two symbols are created and that they are in different sections in the relocation table. The llvm-objdump -dr
cannot check that case, is there any other suitable method?
[1] https://sourceware.org/pipermail/binutils/2024-January/131615.html
# RELAX-INST-COUNT-3: 01 01 01 01 | ||
|
||
.section .text2, "ax" | ||
.p2align 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the last directive also take effect while not when relax is off. But this match RISCV and GAS.
But we can add an extra instruction at the end of line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, it is both the first and the last instruction in this section. There are some cases which alignment directive can not-generate R_LARCH_ALIGN. It may be improved in the future. But now the alignment directive is generated almost unconditionally when relax is enabled. I'll add an extra instruction at the end of line.
@@ -75,6 +87,9 @@ class LoongArchAsmBackend : public MCAsmBackend { | |||
std::unique_ptr<MCObjectTargetWriter> | |||
createObjectTargetWriter() const override; | |||
const MCTargetOptions &getTargetOptions() const { return TargetOptions; } | |||
DenseMap<MCSection *, const MCSymbolRefExpr *> &getSecToAlignSym() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not match binutils. Binutils does not generate .Lla-relax-align
for each section.
$ cat a.s
.section .text2, "ax"
.p2align 4
.text
.p2align 4
- readelf -rSs
There are 10 section headers, starting at offset 0x178:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
000000000000000c 0000000000000000 AX 0 0 16
[ 2] .rela.text RELA 0000000000000000 00000108
0000000000000018 0000000000000018 I 7 1 8
[ 3] .data PROGBITS 0000000000000000 0000004c
0000000000000000 0000000000000000 WA 0 0 1
[ 4] .bss NOBITS 0000000000000000 0000004c
0000000000000000 0000000000000000 WA 0 0 1
[ 5] .text2 PROGBITS 0000000000000000 00000050
000000000000000c 0000000000000000 AX 0 0 16
[ 6] .rela.text2 RELA 0000000000000000 00000120
0000000000000018 0000000000000018 I 7 5 8
[ 7] .symtab SYMTAB 0000000000000000 00000060
0000000000000090 0000000000000018 8 6 8
[ 8] .strtab STRTAB 0000000000000000 000000f0
0000000000000012 0000000000000000 0 0 1
[ 9] .shstrtab STRTAB 0000000000000000 00000138
000000000000003d 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
D (mbind), p (processor specific)
Relocation section '.rela.text' at offset 0x108 contains 1 entry:
Offset Info Type Sym. Value Sym. Name + Addend
000000000000 000500000066 R_LARCH_ALIGN 0000000000000000 .Lla-relax-align + 4
Relocation section '.rela.text2' at offset 0x120 contains 1 entry:
Offset Info Type Sym. Value Sym. Name + Addend
000000000000 000500000066 R_LARCH_ALIGN 0000000000000000 .Lla-relax-align + 4
Symbol table '.symtab' contains 6 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
2: 0000000000000000 0 SECTION LOCAL DEFAULT 3 .data
3: 0000000000000000 0 SECTION LOCAL DEFAULT 4 .bss
4: 0000000000000000 0 SECTION LOCAL DEFAULT 5 .text2
5: 0000000000000000 0 NOTYPE LOCAL DEFAULT 5 .Lla-relax-align
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think there may be some potential errors if the section of the symbol ".Lla-relax-align" is inconsistent with the section of the alignment directive. And I submitted patch before (see [1]). For example if the section of the ".Lla-relax-align" is discard. And another example for linux kernel [2].
[1] https://sourceware.org/pipermail/binutils/2024-January/131615.html
[2] https://lore.kernel.org/loongarch/2abbb633-a10e-71cc-a5e1-4d9e39074066@loongson.cn/T/#t
const MCSymbolRefExpr *MCSym = getSecToAlignSym()[Sec]; | ||
if (MCSym == nullptr) { | ||
// Create a symbol and make the value of symbol is zero. | ||
MCSymbol *Sym = Ctx.createTempSymbol(".Lla-relax-align", false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about to use
MCSymbol *Sym = Ctx.createNamedTempSymbol("la-relax-align");
so that we can check the symbol name in ELF?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks.
Looks like this breaks check-llvm: http://45.33.8.238/macm1/77360/step_11.txt , http://45.33.8.238/linux/128902/step_12.txt Please take a look and revert for now if it takes a while to fix. |
Sorry for that. I have confirmed and will fix it right now (hope in half an hour). |
) To support the third parameter of the alignment directive, R_LARCH_ALIGN relocations need a non-zero symbol index. In many cases we don't need the third parameter and can set the symbol index to 0. This patch will remove a lot of .Lla-relax-align* symbols and mitigate the size regression due to #72962. Co-authored-by: Jinyang He <hejinyang@loongson.cn> Co-authored-by: Weining Lu <luweining@loongson.cn>
…m#93775) To support the third parameter of the alignment directive, R_LARCH_ALIGN relocations need a non-zero symbol index. In many cases we don't need the third parameter and can set the symbol index to 0. This patch will remove a lot of .Lla-relax-align* symbols and mitigate the size regression due to llvm#72962. Co-authored-by: Jinyang He <hejinyang@loongson.cn> Co-authored-by: Weining Lu <luweining@loongson.cn>
Refer to RISCV, we will fix up the alignment if linker relaxation changes code size and breaks alignment. Insert enough Nops and emit R_LARCH_ALIGN relocation type so that linker could satisfy the alignment by removing Nops.
It does so only in sections with the SHF_EXECINSTR flag.
In LoongArch psABI v2.30, R_LARCH_ALIGN requires symbol index. The lowest 8 bits of addend represent alignment and the other bits of addend represent the maximum number of bytes to emit.