Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LoongArch] Insert nops and emit align reloc when handle alignment directive #72962

Merged
merged 3 commits into from
Jan 24, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
69 changes: 69 additions & 0 deletions llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,11 @@
#include "llvm/MC/MCAssembler.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCELFObjectWriter.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCSection.h"
#include "llvm/MC/MCValue.h"
#include "llvm/Support/EndianStream.h"
#include "llvm/Support/MathExtras.h"

#define DEBUG_TYPE "loongarch-asmbackend"

Expand Down Expand Up @@ -174,6 +177,72 @@ void LoongArchAsmBackend::applyFixup(const MCAssembler &Asm,
}
}

// Linker relaxation may change code size. We have to insert Nops
// for .align directive when linker relaxation enabled. So then Linker
// could satisfy alignment by removing Nops.
// The function returns the total Nops Size we need to insert.
bool LoongArchAsmBackend::shouldInsertExtraNopBytesForCodeAlign(
const MCAlignFragment &AF, unsigned &Size) {
// Calculate Nops Size only when linker relaxation enabled.
const MCSubtargetInfo *STI = AF.getSubtargetInfo();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can inline one-time-used variable STI.

if (!STI->hasFeature(LoongArch::FeatureRelax))
return false;

// Ignore alignment if the minimum Nop size is less than the MaxBytesToEmit.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems the comment should be:

// Ignore alignment if MaxBytesToEmit is less than the minimum Nop size.

const unsigned MinNopLen = 4;
if (AF.getMaxBytesToEmit() < MinNopLen)
return false;
Size = AF.getAlignment().value() - MinNopLen;
return AF.getAlignment() > MinNopLen;
}

// We need to insert R_LARCH_ALIGN relocation type to indicate the
// position of Nops and the total bytes of the Nops have been inserted
// when linker relaxation enabled.
// The function inserts fixup_loongarch_align fixup which eventually will
// transfer to R_LARCH_ALIGN relocation type.
// The improved R_LARCH_ALIGN requires symbol index. The lowest 8 bits of
// addend represent alignment and the other bits of addend represent the
// maximum number of bytes to emit. The maximum number of bytes is zero
// means ignore the emit limit.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// The maximum number of bytes is zero means ignore the emit limit.

Seems this is not the same as GAS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The psABI docs don't mention it, but GAS and LD really does it.

bfd/elfnn-loongarch.c:
  if (max > 0 && need_nop_bytes > max)
    return loongarch_relax_delete_bytes (abfd, sec, rel->r_offset,
                                          addend, link_info);
The one of conditions is  "max > 0".

gas/config/tc-loongarch.c:
ex.X_add_number = (max << 8) | n;
It emit unmodified max.

bool LoongArchAsmBackend::shouldInsertFixupForCodeAlign(
MCAssembler &Asm, const MCAsmLayout &Layout, MCAlignFragment &AF) {
// Insert the fixup only when linker relaxation enabled.
const MCSubtargetInfo *STI = AF.getSubtargetInfo();
if (!STI->hasFeature(LoongArch::FeatureRelax))
return false;

// Calculate total Nops we need to insert. If there are none to insert
// then simply return.
unsigned Count;
if (!shouldInsertExtraNopBytesForCodeAlign(AF, Count))
return false;

MCSection *Sec = AF.getParent();
MCContext &Ctx = Asm.getContext();
const MCExpr *Dummy = MCConstantExpr::create(0, Ctx);
// Create fixup_loongarch_align fixup.
MCFixup Fixup =
MCFixup::create(0, Dummy, MCFixupKind(LoongArch::fixup_loongarch_align));
const MCSymbolRefExpr *MCSym = getSecToAlignSym()[Sec];
if (MCSym == nullptr) {
// Create a symbol and make the value of symbol is zero.
MCSymbol *Sym = Ctx.createTempSymbol(".Lla-relax-align", false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about to use

MCSymbol *Sym = Ctx.createNamedTempSymbol("la-relax-align");

so that we can check the symbol name in ELF?

Sym->setFragment(&*Sec->getBeginSymbol()->getFragment());
Asm.registerSymbol(*Sym);
MCSym = MCSymbolRefExpr::create(Sym, Ctx);
getSecToAlignSym()[Sec] = MCSym;
}

uint64_t FixedValue = 0;
unsigned Lo = Log2_64(Count) + 1;
unsigned Hi = AF.getMaxBytesToEmit() >= Count ? 0 : AF.getMaxBytesToEmit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this is not the same as GAS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make things easier. They behave same when static-link.

MCValue Value = MCValue::get(MCSym, nullptr, Hi << 8 | Lo);
Asm.getWriter().recordRelocation(Asm, Layout, &AF, Fixup, Value, FixedValue);

return true;
}

bool LoongArchAsmBackend::shouldForceRelocation(const MCAssembler &Asm,
const MCFixup &Fixup,
const MCValue &Target,
Expand Down
15 changes: 15 additions & 0 deletions llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.h
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,9 @@
#include "MCTargetDesc/LoongArchFixupKinds.h"
#include "MCTargetDesc/LoongArchMCTargetDesc.h"
#include "llvm/MC/MCAsmBackend.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCFixupKindInfo.h"
#include "llvm/MC/MCSection.h"
#include "llvm/MC/MCSubtargetInfo.h"

namespace llvm {
Expand All @@ -27,6 +29,7 @@ class LoongArchAsmBackend : public MCAsmBackend {
uint8_t OSABI;
bool Is64Bit;
const MCTargetOptions &TargetOptions;
DenseMap<MCSection *, const MCSymbolRefExpr *> SecToAlignSym;

public:
LoongArchAsmBackend(const MCSubtargetInfo &STI, uint8_t OSABI, bool Is64Bit,
Expand All @@ -45,6 +48,15 @@ class LoongArchAsmBackend : public MCAsmBackend {
uint64_t Value, bool IsResolved,
const MCSubtargetInfo *STI) const override;

// Return Size with extra Nop Bytes for alignment directive in code section.
bool shouldInsertExtraNopBytesForCodeAlign(const MCAlignFragment &AF,
unsigned &Size) override;

// Insert target specific fixup type for alignment directive in code section.
bool shouldInsertFixupForCodeAlign(MCAssembler &Asm,
const MCAsmLayout &Layout,
MCAlignFragment &AF) override;

bool shouldForceRelocation(const MCAssembler &Asm, const MCFixup &Fixup,
const MCValue &Target,
const MCSubtargetInfo *STI) override;
Expand Down Expand Up @@ -75,6 +87,9 @@ class LoongArchAsmBackend : public MCAsmBackend {
std::unique_ptr<MCObjectTargetWriter>
createObjectTargetWriter() const override;
const MCTargetOptions &getTargetOptions() const { return TargetOptions; }
DenseMap<MCSection *, const MCSymbolRefExpr *> &getSecToAlignSym() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not match binutils. Binutils does not generate .Lla-relax-align for each section.

$ cat a.s
.section .text2, "ax"
.p2align 4

.text
.p2align 4
  • readelf -rSs
There are 10 section headers, starting at offset 0x178:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00000040
       000000000000000c  0000000000000000  AX       0     0     16
  [ 2] .rela.text        RELA             0000000000000000  00000108
       0000000000000018  0000000000000018   I       7     1     8
  [ 3] .data             PROGBITS         0000000000000000  0000004c
       0000000000000000  0000000000000000  WA       0     0     1
  [ 4] .bss              NOBITS           0000000000000000  0000004c
       0000000000000000  0000000000000000  WA       0     0     1
  [ 5] .text2            PROGBITS         0000000000000000  00000050
       000000000000000c  0000000000000000  AX       0     0     16
  [ 6] .rela.text2       RELA             0000000000000000  00000120
       0000000000000018  0000000000000018   I       7     5     8
  [ 7] .symtab           SYMTAB           0000000000000000  00000060
       0000000000000090  0000000000000018           8     6     8
  [ 8] .strtab           STRTAB           0000000000000000  000000f0
       0000000000000012  0000000000000000           0     0     1
  [ 9] .shstrtab         STRTAB           0000000000000000  00000138
       000000000000003d  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), p (processor specific)

Relocation section '.rela.text' at offset 0x108 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000000  000500000066 R_LARCH_ALIGN     0000000000000000 .Lla-relax-align + 4

Relocation section '.rela.text2' at offset 0x120 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000000  000500000066 R_LARCH_ALIGN     0000000000000000 .Lla-relax-align + 4

Symbol table '.symtab' contains 6 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .text
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 .data
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 .bss
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 .text2
     5: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    5 .Lla-relax-align

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think there may be some potential errors if the section of the symbol ".Lla-relax-align" is inconsistent with the section of the alignment directive. And I submitted patch before (see [1]). For example if the section of the ".Lla-relax-align" is discard. And another example for linux kernel [2].

[1] https://sourceware.org/pipermail/binutils/2024-January/131615.html
[2] https://lore.kernel.org/loongarch/2abbb633-a10e-71cc-a5e1-4d9e39074066@loongson.cn/T/#t

return SecToAlignSym;
}
};
} // end namespace llvm

Expand Down
2 changes: 2 additions & 0 deletions llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchFixupKinds.h
Original file line number Diff line number Diff line change
Expand Up @@ -109,6 +109,8 @@ enum Fixups {
fixup_loongarch_tls_gd_hi20,
// Generate an R_LARCH_RELAX which indicates the linker may relax here.
fixup_loongarch_relax = FirstLiteralRelocationKind + ELF::R_LARCH_RELAX,
// Generate an R_LARCH_ALIGN which indicates the linker may fixup align here.
fixup_loongarch_align = FirstLiteralRelocationKind + ELF::R_LARCH_ALIGN,
// 36-bit fixup corresponding to %call36(foo) for a pair instructions:
// pcaddu18i+jirl.
fixup_loongarch_call36 = FirstLiteralRelocationKind + ELF::R_LARCH_CALL36,
Expand Down
22 changes: 22 additions & 0 deletions llvm/test/MC/LoongArch/Relocations/align-non-executable.s
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# RUN: llvm-mc --filetype=obj --triple=loongarch64 --mattr=+relax %s \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be better to add a description about the aim of this test. For example:

## Check R_LARCH_ALIGN is not generated for non executable section.

# RUN: | llvm-readobj -r - | FileCheck --check-prefixes=CHECK,RELAX %s
# RUN: llvm-mc --filetype=obj --triple=loongarch64 --mattr=-relax %s \
# RUN: | llvm-readobj -r - | FileCheck %s

.section ".dummy", "a"
.L1:
la.pcrel $t0, sym
.p2align 3
.L2:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the aim of this test is to check R_LARCH_ALIGN is not generated for non executable section, seems .L2 - .L1 is unnecessary. Use any other simple instruction is enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that. Here we need to check the ADDSUB relocation are emitted (see llvm/test/MC/RISCV/align-non-executable.s). Because in AttemptToFoldSymbolOffsetDifference(), the aligment fragments in a section which has instructions cannot be folded. Maybe it can be improved in the future.

.dword .L2 - .L1

# CHECK: Relocations [
# CHECK-NEXT: Section ({{.*}}) .rela.dummy {
# CHECK-NEXT: 0x0 R_LARCH_PCALA_HI20 sym 0x0
# RELAX-NEXT: 0x0 R_LARCH_RELAX - 0x0
# CHECK-NEXT: 0x4 R_LARCH_PCALA_LO12 sym 0x0
# RELAX-NEXT: 0x4 R_LARCH_RELAX - 0x0
# RELAX-NEXT: 0x8 R_LARCH_ADD64 .L2 0x0
# RELAX-NEXT: 0x8 R_LARCH_SUB64 .L1 0x0
# CHECK-NEXT: }
# CHECK-NEXT: ]
15 changes: 13 additions & 2 deletions llvm/test/MC/LoongArch/Relocations/relax-addsub.s
Original file line number Diff line number Diff line change
Expand Up @@ -28,12 +28,23 @@

# RELAX: Relocations [
# RELAX-NEXT: Section ({{.*}}) .rela.text {
# RELAX-NEXT: 0x4 R_LARCH_ALIGN {{.*}} 0x4
# RELAX-NEXT: 0x10 R_LARCH_PCALA_HI20 .L1 0x0
# RELAX-NEXT: 0x10 R_LARCH_RELAX - 0x0
# RELAX-NEXT: 0x14 R_LARCH_PCALA_LO12 .L1 0x0
# RELAX-NEXT: 0x14 R_LARCH_RELAX - 0x0
# RELAX-NEXT: }
# RELAX-NEXT: Section ({{.*}}) .rela.data {
# RELAX-NEXT: 0x10 R_LARCH_ADD8 .L3 0x0
# RELAX-NEXT: 0x10 R_LARCH_SUB8 .L2 0x0
# RELAX-NEXT: 0x11 R_LARCH_ADD16 .L3 0x0
# RELAX-NEXT: 0x11 R_LARCH_SUB16 .L2 0x0
# RELAX-NEXT: 0x13 R_LARCH_ADD32 .L3 0x0
# RELAX-NEXT: 0x13 R_LARCH_SUB32 .L2 0x0
# RELAX-NEXT: 0x17 R_LARCH_ADD64 .L3 0x0
# RELAX-NEXT: 0x17 R_LARCH_SUB64 .L2 0x0
# RELAX-NEXT: 0x1F R_LARCH_ADD_ULEB128 .L3 0x0
# RELAX-NEXT: 0x1F R_LARCH_SUB_ULEB128 .L2 0x0
# RELAX-NEXT: 0x20 R_LARCH_ADD8 .L4 0x0
# RELAX-NEXT: 0x20 R_LARCH_SUB8 .L3 0x0
# RELAX-NEXT: 0x21 R_LARCH_ADD16 .L4 0x0
Expand All @@ -57,7 +68,7 @@

# RELAX: Hex dump of section '.data':
# RELAX-NEXT: 0x00000000 04040004 00000004 00000000 00000004
# RELAX-NEXT: 0x00000010 0c0c000c 0000000c 00000000 0000000c
# RELAX-NEXT: 0x00000010 00000000 00000000 00000000 00000000
# RELAX-NEXT: 0x00000020 00000000 00000000 00000000 00000000
# RELAX-NEXT: 0x00000030 00000000 00000000 00000000 000000

Expand All @@ -78,7 +89,7 @@
.word .L2 - .L1
.dword .L2 - .L1
.uleb128 .L2 - .L1
## TODO Handle alignment directive.
## With relaxation, emit relocs because the .align makes the diff variable.
.byte .L3 - .L2
.short .L3 - .L2
.word .L3 - .L2
Expand Down
53 changes: 53 additions & 0 deletions llvm/test/MC/LoongArch/Relocations/relax-align.s
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
# RUN: llvm-mc --filetype=obj --triple=loongarch64 --mattr=-relax %s \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Ditto. Add a file-level comment.
  • Use -o to save the output file so that we can copy and execute the command easily.
  • Is is better to use llvm-objdump -dr to check the result?
  • --check-prefix

# RUN: | llvm-readelf -rs - | FileCheck %s --check-prefix=NORELAX
# RUN: llvm-mc --filetype=obj --triple=loongarch64 --mattr=+relax %s \
# RUN: | llvm-readelf -rs - | FileCheck %s --check-prefix=RELAX
# RUN: llvm-mc --filetype=obj --triple=loongarch64 --mattr=+relax %s \
# RUN: | llvm-objdump -d - | FileCheck -check-prefix=RELAX-INST %s

# NORELAX: There are no relocations in this file.
# NORELAX: Symbol table '.symtab' contains 1 entries:

# RELAX: 0000000000000000 0000000100000066 R_LARCH_ALIGN 0000000000000000 {{.*}} + 4
# RELAX-NEXT: 0000000000000010 0000000100000066 R_LARCH_ALIGN 0000000000000000 {{.*}} + 5
# RELAX-NEXT: 000000000000002c 0000000100000066 R_LARCH_ALIGN 0000000000000000 {{.*}} + 4
# RELAX-NEXT: 000000000000003c 0000000100000066 R_LARCH_ALIGN 0000000000000000 {{.*}} + b04
# RELAX-NEXT: 0000000000000048 0000000100000066 R_LARCH_ALIGN 0000000000000000 {{.*}} + 4
# RELAX-EMPTY:
# RELAX: 0000000000000000 0000000200000066 R_LARCH_ALIGN 0000000000000000 <null> + 4
# RELAX-EMPTY:
# RELAX: Symbol table '.symtab' contains 3 entries:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we should check symbol table? To check the temp symbol .Lla-relax-align is generated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned earlier, we need to create the symbol for R_LARCH_ALIGN. It seems that this symbol needs to be kept in the same section as the alignment directive (see [1]). So this test case wants to check that two symbols are created and that they are in different sections in the relocation table. The llvm-objdump -dr cannot check that case, is there any other suitable method?

[1] https://sourceware.org/pipermail/binutils/2024-January/131615.html

# RELAX: 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
# RELAX-NEXT: 1: 0000000000000000 0 NOTYPE LOCAL DEFAULT 2
# RELAX-NEXT: 2: 0000000000000000 0 NOTYPE LOCAL DEFAULT 4

.text
.p2align 4 # A = 0x0
nop
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use non-nop instruction as a anchor.

.p2align 5 # B = A + 3 * NOP + NOP = 0x10
.p2align 4 # C = B + 7 * NOP = 0x2C
nop
.p2align 4, , 11 # D = C + 3 * NOP + NOP = 0x3C
## Not emit the third parameter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ## Not emit the third parameter. for this case?
Oh, I see. You have The maximum number of bytes is zero means ignore the emit limit., but this doesn't match GAS.

.p2align 4, , 12 # E = D + 3 * NOP = 0x48
# END = E + 3 * NOP = 0x54 = 21 * NOP

## Not emit R_LARCH_ALIGN if code alignment great than alignment directive.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## Not emit R_LARCH_ALIGN if code alignment great than alignment directive.
## Not emit R_LARCH_ALIGN if alignment directive is less than or equal to minimum code alignment(a.k.a 4).

.p2align 2
.p2align 1
.p2align 0
## Not emit instructions if max emit bytes less than min nop size.
.p2align 4, , 2
## Not emit R_LARCH_ALIGN if alignment directive with specific padding value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## Not emit R_LARCH_ALIGN if alignment directive with specific padding value.
## Not emit R_LARCH_ALIGN if alignment directive with specific padding value.
## The behavior is the same as GNU assembler.

.p2align 4, 1
nop
.p2align 4, 1, 12

# RELAX-INST: <.text>:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got different output:

0000000000000000 <.text2>:
       0: 00 00 40 03   nop
       4: 00 00 40 03   nop
       8: 00 00 40 03   nop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't check <.text2>, because <.text2> is used to check symbol index.

# RELAX-INST-COUNT-21: nop
# RELAX-INST-COUNT-3: 01 01 01 01
# RELAX-INST-NEXT: nop
# RELAX-INST-COUNT-3: 01 01 01 01

.section .text2, "ax"
.p2align 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the last directive also take effect while not when relax is off. But this match RISCV and GAS.

But we can add an extra instruction at the end of line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, it is both the first and the last instruction in this section. There are some cases which alignment directive can not-generate R_LARCH_ALIGN. It may be improved in the future. But now the alignment directive is generated almost unconditionally when relax is enabled. I'll add an extra instruction at the end of line.