Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LoongArch] Insert nops and emit align reloc when handle alignment directive #72962

Merged
merged 3 commits into from
Jan 24, 2024

Conversation

MQ-mengqing
Copy link
Contributor

@MQ-mengqing MQ-mengqing commented Nov 21, 2023

Refer to RISCV, we will fix up the alignment if linker relaxation changes code size and breaks alignment. Insert enough Nops and emit R_LARCH_ALIGN relocation type so that linker could satisfy the alignment by removing Nops.
It does so only in sections with the SHF_EXECINSTR flag.

In LoongArch psABI v2.30, R_LARCH_ALIGN requires symbol index. The lowest 8 bits of addend represent alignment and the other bits of addend represent the maximum number of bytes to emit.

@llvmbot llvmbot added mc Machine (object) code backend:loongarch labels Nov 21, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 21, 2023

@llvm/pr-subscribers-backend-loongarch

@llvm/pr-subscribers-mc

Author: Jinyang He (MQ-mengqing)

Changes

Refer to RISCV, we will fix up the alignment if linker relaxation changes code size and breaks alignment. Insert enough Nops and emit R_LARCH_ALIGN relocation type so that linker could satisfy the alignment by removing Nops. It does so only in sections with the SHF_EXECINSTR flag.


Full diff: https://github.com/llvm/llvm-project/pull/72962.diff

4 Files Affected:

  • (modified) llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.cpp (+48)
  • (modified) llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.h (+9)
  • (modified) llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchFixupKinds.h (+3-1)
  • (added) llvm/test/MC/LoongArch/Relocations/relax-align.s (+28)
diff --git a/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.cpp b/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.cpp
index a35916d2ad2197d..6625ad6bc94d188 100644
--- a/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.cpp
+++ b/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.cpp
@@ -160,6 +160,54 @@ void LoongArchAsmBackend::applyFixup(const MCAssembler &Asm,
   }
 }
 
+// Linker relaxation may change code size. We have to insert Nops
+// for .align directive when linker relaxation enabled. So then Linker
+// could satisfy alignment by removing Nops.
+// The function returns the total Nops Size we need to insert.
+bool LoongArchAsmBackend::shouldInsertExtraNopBytesForCodeAlign(
+    const MCAlignFragment &AF, unsigned &Size) {
+  // Calculate Nops Size only when linker relaxation enabled.
+  const MCSubtargetInfo *STI = AF.getSubtargetInfo();
+  if (!STI->hasFeature(LoongArch::FeatureRelax))
+    return false;
+
+  const unsigned MinNopLen = 4;
+  Size = AF.getAlignment().value() - MinNopLen;
+  return AF.getAlignment() > MinNopLen;
+}
+
+// We need to insert R_LARCH_ALIGN relocation type to indicate the
+// position of Nops and the total bytes of the Nops have been inserted
+// when linker relaxation enabled.
+// The function inserts fixup_loongarch_align fixup which eventually will
+// transfer to R_LARCH_ALIGN relocation type.
+bool LoongArchAsmBackend::shouldInsertFixupForCodeAlign(
+    MCAssembler &Asm, const MCAsmLayout &Layout, MCAlignFragment &AF) {
+  // Insert the fixup only when linker relaxation enabled.
+  const MCSubtargetInfo *STI = AF.getSubtargetInfo();
+  if (!STI->hasFeature(LoongArch::FeatureRelax))
+    return false;
+
+  // Calculate total Nops we need to insert. If there are none to insert
+  // then simply return.
+  unsigned Count;
+  if (!shouldInsertExtraNopBytesForCodeAlign(AF, Count))
+    return false;
+
+  const MCExpr *Dummy = MCConstantExpr::create(0, Asm.getContext());
+  // Create fixup_loongarch_align fixup.
+  MCFixup Fixup = MCFixup::create(
+      0, Dummy, MCFixupKind(LoongArch::fixup_loongarch_align), SMLoc());
+
+  uint64_t FixedValue = 0;
+  MCValue NopBytes = MCValue::get(Count);
+
+  Asm.getWriter().recordRelocation(Asm, Layout, &AF, Fixup, NopBytes,
+                                   FixedValue);
+
+  return true;
+}
+
 bool LoongArchAsmBackend::shouldForceRelocation(const MCAssembler &Asm,
                                                 const MCFixup &Fixup,
                                                 const MCValue &Target) {
diff --git a/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.h b/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.h
index f840f9fa2b6a007..1f4743f34dad2a7 100644
--- a/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.h
+++ b/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.h
@@ -40,6 +40,15 @@ class LoongArchAsmBackend : public MCAsmBackend {
                   uint64_t Value, bool IsResolved,
                   const MCSubtargetInfo *STI) const override;
 
+  // Return Size with extra Nop Bytes for alignment directive in code section.
+  bool shouldInsertExtraNopBytesForCodeAlign(const MCAlignFragment &AF,
+                                             unsigned &Size) override;
+
+  // Insert target specific fixup type for alignment directive in code section.
+  bool shouldInsertFixupForCodeAlign(MCAssembler &Asm,
+                                     const MCAsmLayout &Layout,
+                                     MCAlignFragment &AF) override;
+
   bool shouldForceRelocation(const MCAssembler &Asm, const MCFixup &Fixup,
                              const MCValue &Target) override;
 
diff --git a/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchFixupKinds.h b/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchFixupKinds.h
index ba2d6718cdf9a27..8d773b649107ed6 100644
--- a/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchFixupKinds.h
+++ b/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchFixupKinds.h
@@ -106,7 +106,9 @@ enum Fixups {
   // 20-bit fixup corresponding to %gd_pc_hi20(foo) for instruction pcalau12i.
   fixup_loongarch_tls_gd_pc_hi20,
   // 20-bit fixup corresponding to %gd_hi20(foo) for instruction lu12i.w.
-  fixup_loongarch_tls_gd_hi20
+  fixup_loongarch_tls_gd_hi20,
+  // Generate an R_LARCH_ALIGN which indicates the linker may fixup align here.
+  fixup_loongarch_align = FirstLiteralRelocationKind + ELF::R_LARCH_ALIGN
 };
 } // end namespace LoongArch
 } // end namespace llvm
diff --git a/llvm/test/MC/LoongArch/Relocations/relax-align.s b/llvm/test/MC/LoongArch/Relocations/relax-align.s
new file mode 100644
index 000000000000000..36524182777cb02
--- /dev/null
+++ b/llvm/test/MC/LoongArch/Relocations/relax-align.s
@@ -0,0 +1,28 @@
+# RUN: llvm-mc --filetype=obj --triple=loongarch64 %s -o %t
+# RUN: llvm-readobj -r %t | FileCheck %s
+# RUN: llvm-mc --filetype=obj --triple=loongarch64 -mattr=+relax %s -o %t
+# RUN: llvm-readobj -r %t | FileCheck %s --check-prefix=CHECKR
+
+# CHECKR:       Relocations [
+# CHECKR-NEXT:    Section ({{.*}}) .rela.text {
+# CHECKR-NEXT:      0x0 R_LARCH_ALIGN - 0xC
+# CHECKR-NEXT:      0x10 R_LARCH_ALIGN - 0x1C
+# CHECKR-NEXT:      0x2C R_LARCH_ALIGN - 0xC
+# CHECKR-NEXT:    }
+# CHECKR-NEXT:  ]
+
+# CHECK:       Relocations [
+# CHECK-NEXT:  ]
+
+.text
+.p2align 4
+nop
+.p2align 5
+.p2align 4
+nop
+## Not emit R_LARCH_ALIGN if code alignment great than alignment directive.
+.p2align 2
+.p2align 1
+.p2align 0
+## Not emit R_LARCH_ALIGN if alignment directive with specific padding value.
+.p2align 4, 1

@MQ-mengqing
Copy link
Contributor Author

…rective

Refer to RISCV, we will fix up the alignment if linker relaxation changes
code size and breaks alignment. Insert enough Nops and emit R_LARCH_ALIGN
relocation type so that linker could satisfy the alignment by removing Nops.
It does so only in sections with the SHF_EXECINSTR flag.

In LoongArch psABI v2.30, R_LARCH_ALIGN requires symbol index. The
lowest 8 bits of addend represent alignment and the other bits of
addend represent the maximum number of bytes to emit.
@MQ-mengqing
Copy link
Contributor Author

MQ-mengqing commented Jan 16, 2024

Based on psABI v2.30 [1]. In LoongArch psABI v2.30, R_LARCH_ALIGN requires symbol index. The lowest 8 bits of addend represent alignment and the other bits of addend represent the maximum number of bytes to emit. The symbol is created by createTempSymbol() which calls createSymbol() with CanBeUnnamed is true. And the symbols is in the same section of alignment directive.

[1] https://github.com/loongson/la-abi-specs/blob/v2.30/laelf.adoc

bool LoongArchAsmBackend::shouldInsertExtraNopBytesForCodeAlign(
const MCAlignFragment &AF, unsigned &Size) {
// Calculate Nops Size only when linker relaxation enabled.
const MCSubtargetInfo *STI = AF.getSubtargetInfo();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can inline one-time-used variable STI.

if (!STI->hasFeature(LoongArch::FeatureRelax))
return false;

// Ignore alignment if the minimum Nop size is less than the MaxBytesToEmit.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems the comment should be:

// Ignore alignment if MaxBytesToEmit is less than the minimum Nop size.

@@ -0,0 +1,22 @@
# RUN: llvm-mc --filetype=obj --triple=loongarch64 --mattr=+relax %s \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be better to add a description about the aim of this test. For example:

## Check R_LARCH_ALIGN is not generated for non executable section.

.L1:
la.pcrel $t0, sym
.p2align 3
.L2:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the aim of this test is to check R_LARCH_ALIGN is not generated for non executable section, seems .L2 - .L1 is unnecessary. Use any other simple instruction is enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that. Here we need to check the ADDSUB relocation are emitted (see llvm/test/MC/RISCV/align-non-executable.s). Because in AttemptToFoldSymbolOffsetDifference(), the aligment fragments in a section which has instructions cannot be folded. Maybe it can be improved in the future.

@@ -0,0 +1,53 @@
# RUN: llvm-mc --filetype=obj --triple=loongarch64 --mattr=-relax %s \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Ditto. Add a file-level comment.
  • Use -o to save the output file so that we can copy and execute the command easily.
  • Is is better to use llvm-objdump -dr to check the result?
  • --check-prefix


uint64_t FixedValue = 0;
unsigned Lo = Log2_64(Count) + 1;
unsigned Hi = AF.getMaxBytesToEmit() >= Count ? 0 : AF.getMaxBytesToEmit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this is not the same as GAS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make things easier. They behave same when static-link.

// The improved R_LARCH_ALIGN requires symbol index. The lowest 8 bits of
// addend represent alignment and the other bits of addend represent the
// maximum number of bytes to emit. The maximum number of bytes is zero
// means ignore the emit limit.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// The maximum number of bytes is zero means ignore the emit limit.

Seems this is not the same as GAS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The psABI docs don't mention it, but GAS and LD really does it.

bfd/elfnn-loongarch.c:
  if (max > 0 && need_nop_bytes > max)
    return loongarch_relax_delete_bytes (abfd, sec, rel->r_offset,
                                          addend, link_info);
The one of conditions is  "max > 0".

gas/config/tc-loongarch.c:
ex.X_add_number = (max << 8) | n;
It emit unmodified max.

nop
.p2align 4, 1, 12

# RELAX-INST: <.text>:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got different output:

0000000000000000 <.text2>:
       0: 00 00 40 03   nop
       4: 00 00 40 03   nop
       8: 00 00 40 03   nop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't check <.text2>, because <.text2> is used to check symbol index.

# RELAX-EMPTY:
# RELAX: 0000000000000000 0000000200000066 R_LARCH_ALIGN 0000000000000000 <null> + 4
# RELAX-EMPTY:
# RELAX: Symbol table '.symtab' contains 3 entries:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we should check symbol table? To check the temp symbol .Lla-relax-align is generated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned earlier, we need to create the symbol for R_LARCH_ALIGN. It seems that this symbol needs to be kept in the same section as the alignment directive (see [1]). So this test case wants to check that two symbols are created and that they are in different sections in the relocation table. The llvm-objdump -dr cannot check that case, is there any other suitable method?

[1] https://sourceware.org/pipermail/binutils/2024-January/131615.html

# RELAX-INST-COUNT-3: 01 01 01 01

.section .text2, "ax"
.p2align 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the last directive also take effect while not when relax is off. But this match RISCV and GAS.

But we can add an extra instruction at the end of line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, it is both the first and the last instruction in this section. There are some cases which alignment directive can not-generate R_LARCH_ALIGN. It may be improved in the future. But now the alignment directive is generated almost unconditionally when relax is enabled. I'll add an extra instruction at the end of line.

@@ -75,6 +87,9 @@ class LoongArchAsmBackend : public MCAsmBackend {
std::unique_ptr<MCObjectTargetWriter>
createObjectTargetWriter() const override;
const MCTargetOptions &getTargetOptions() const { return TargetOptions; }
DenseMap<MCSection *, const MCSymbolRefExpr *> &getSecToAlignSym() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not match binutils. Binutils does not generate .Lla-relax-align for each section.

$ cat a.s
.section .text2, "ax"
.p2align 4

.text
.p2align 4
  • readelf -rSs
There are 10 section headers, starting at offset 0x178:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00000040
       000000000000000c  0000000000000000  AX       0     0     16
  [ 2] .rela.text        RELA             0000000000000000  00000108
       0000000000000018  0000000000000018   I       7     1     8
  [ 3] .data             PROGBITS         0000000000000000  0000004c
       0000000000000000  0000000000000000  WA       0     0     1
  [ 4] .bss              NOBITS           0000000000000000  0000004c
       0000000000000000  0000000000000000  WA       0     0     1
  [ 5] .text2            PROGBITS         0000000000000000  00000050
       000000000000000c  0000000000000000  AX       0     0     16
  [ 6] .rela.text2       RELA             0000000000000000  00000120
       0000000000000018  0000000000000018   I       7     5     8
  [ 7] .symtab           SYMTAB           0000000000000000  00000060
       0000000000000090  0000000000000018           8     6     8
  [ 8] .strtab           STRTAB           0000000000000000  000000f0
       0000000000000012  0000000000000000           0     0     1
  [ 9] .shstrtab         STRTAB           0000000000000000  00000138
       000000000000003d  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), p (processor specific)

Relocation section '.rela.text' at offset 0x108 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000000  000500000066 R_LARCH_ALIGN     0000000000000000 .Lla-relax-align + 4

Relocation section '.rela.text2' at offset 0x120 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000000  000500000066 R_LARCH_ALIGN     0000000000000000 .Lla-relax-align + 4

Symbol table '.symtab' contains 6 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .text
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 .data
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 .bss
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 .text2
     5: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    5 .Lla-relax-align

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think there may be some potential errors if the section of the symbol ".Lla-relax-align" is inconsistent with the section of the alignment directive. And I submitted patch before (see [1]). For example if the section of the ".Lla-relax-align" is discard. And another example for linux kernel [2].

[1] https://sourceware.org/pipermail/binutils/2024-January/131615.html
[2] https://lore.kernel.org/loongarch/2abbb633-a10e-71cc-a5e1-4d9e39074066@loongson.cn/T/#t

const MCSymbolRefExpr *MCSym = getSecToAlignSym()[Sec];
if (MCSym == nullptr) {
// Create a symbol and make the value of symbol is zero.
MCSymbol *Sym = Ctx.createTempSymbol(".Lla-relax-align", false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about to use

MCSymbol *Sym = Ctx.createNamedTempSymbol("la-relax-align");

so that we can check the symbol name in ELF?

Copy link
Contributor

@SixWeining SixWeining left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@SixWeining SixWeining merged commit c51ab48 into llvm:main Jan 24, 2024
3 of 4 checks passed
@nico
Copy link
Contributor

nico commented Jan 24, 2024

Looks like this breaks check-llvm: http://45.33.8.238/macm1/77360/step_11.txt , http://45.33.8.238/linux/128902/step_12.txt

Please take a look and revert for now if it takes a while to fix.

@SixWeining
Copy link
Contributor

Looks like this breaks check-llvm: http://45.33.8.238/macm1/77360/step_11.txt , http://45.33.8.238/linux/128902/step_12.txt

Please take a look and revert for now if it takes a while to fix.

Sorry for that. I have confirmed and will fix it right now (hope in half an hour).

@SixWeining
Copy link
Contributor

SixWeining commented Jan 24, 2024

Should be fixed by baba7e4.

Seems that this fix was submitted after llvmorg-19-init was tagged. So maybe we should cherry-pick it to release/18.x.

UPDATE: The cherry-pick is: #79253

SixWeining added a commit that referenced this pull request Jun 4, 2024
)

To support the third parameter of the alignment directive, R_LARCH_ALIGN
relocations need a non-zero symbol index.
In many cases we don't need the third parameter and can set the symbol
index to 0.
This patch will remove a lot of .Lla-relax-align* symbols and mitigate
the size regression due to
#72962.

Co-authored-by: Jinyang He <hejinyang@loongson.cn>
Co-authored-by: Weining Lu <luweining@loongson.cn>
vedantparanjape-amd pushed a commit to vedantparanjape-amd/llvm-project that referenced this pull request Jun 7, 2024
…m#93775)

To support the third parameter of the alignment directive, R_LARCH_ALIGN
relocations need a non-zero symbol index.
In many cases we don't need the third parameter and can set the symbol
index to 0.
This patch will remove a lot of .Lla-relax-align* symbols and mitigate
the size regression due to
llvm#72962.

Co-authored-by: Jinyang He <hejinyang@loongson.cn>
Co-authored-by: Weining Lu <luweining@loongson.cn>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:loongarch mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants