Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MC] Fix clang integrated assembler generates incorrect relocations… #83115

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

yingopq
Copy link
Contributor

@yingopq yingopq commented Feb 27, 2024

… for mips32

When mips asm parse instruction la, check whether .rdata section was accessed and parsed, if it was not, then check the following statement, check if local symbol(eg "hello:") was in .rdata section, if it was in it, IsLocalSym is true.

Fix #65020

Copy link

github-actions bot commented Feb 27, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@brad0
Copy link
Contributor

brad0 commented Mar 8, 2024

@wzssyqa

CurrentASMContent.take_front(NewlineIndex).take_back(SymbolLength);

// Get and check if ".rdata" section exist.
size_t RdataIndex = CurrentASMContent.find(".rdata");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GCC uses ".rodata.str1.4" for this code.

char *g = "xxxx";
char *x() {return g;}

Maybe the section name won't require to be something like ".rodata".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I think that we need a more generic solution.
The naive scan string may have lots of bug.
maybe something like, in the next lines, we may have:

do_read_rdata:
      lw ...
      lw ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, MipsAsmParser.cpp use following code to parse .rdata,so the name, maybe I think, basically it won’t change.

 if (IDVal == ".rdata") {
    parseRSectionDirective(".rodata");
    return false;
  }

And I have considered the situation you mentioned, the patch contains this situation, it use containsto check if rdata section contains local symbol, instead of checking the next line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find is not a good way here: you may meet some strange strings from user's asmcode, may be from comment?

# this is a comment about .rdata 

// For O32, "$"-prefixed symbols are recognized as temporary while
// .L-prefixed symbols are not (PrivateGlobalPrefix is "$"). Recognize ".L"
// manually.
if (ABI.IsO32() && Res.getSymA()->getSymbol().getName().starts_with(".L"))
IsLocalSym = true;
else {
if (HasParseRdata == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be

else if (HasParseRdata == false)

So we won't need 2-level nest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I would change it.

@wzssyqa
Copy link
Contributor

wzssyqa commented Mar 12, 2024

In fact we can update the generic asm scan for all ports, instead mips only.
See MCSymbol *MCContext::createSymbol(StringRef Name, bool AlwaysAddSuffix, bool CanBeUnnamed) in llvm/lib/MC/MCContext.cpp

@yingopq
Copy link
Contributor Author

yingopq commented Mar 15, 2024

In fact we can update the generic asm scan for all ports, instead mips only. See MCSymbol *MCContext::createSymbol(StringRef Name, bool AlwaysAddSuffix, bool CanBeUnnamed) in llvm/lib/MC/MCContext.cpp

Can you explain it in more detail?

size_t CommaIndex = CurrentASMContent.find_first_of(',');
size_t SymbolLength = NewlineIndex - CommaIndex - 2;
StringRef LocalSymbol =
CurrentASMContent.take_front(NewlineIndex).take_back(SymbolLength);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the asm code is like

# more spaces and windows style newline
LA   $a0,                          localsymbolname         \r\n

It won't work.

CurrentASMContent.take_front(NewlineIndex).take_back(SymbolLength);

// Get and check if ".rdata" section exist.
size_t RdataIndex = CurrentASMContent.find(".rdata");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find is not a good way here: you may meet some strange strings from user's asmcode, may be from comment?

# this is a comment about .rdata 

@wzssyqa
Copy link
Contributor

wzssyqa commented Mar 15, 2024

In fact we can update the generic asm scan for all ports, instead mips only. See MCSymbol *MCContext::createSymbol(StringRef Name, bool AlwaysAddSuffix, bool CanBeUnnamed) in llvm/lib/MC/MCContext.cpp

Can you explain it in more detail?

Integrated assembler scan the asm code in MCParser/MCAsmParser.cpp, and if it find a symbol, it will call createSymbol just now, so that it cannot know that this symbol is defined in the same asm code later.

That's the real problem, and this problem effects all architectures.

@yingopq
Copy link
Contributor Author

yingopq commented Mar 15, 2024

In fact we can update the generic asm scan for all ports, instead mips only. See MCSymbol *MCContext::createSymbol(StringRef Name, bool AlwaysAddSuffix, bool CanBeUnnamed) in llvm/lib/MC/MCContext.cpp

Can you explain it in more detail?

Integrated assembler scan the asm code in MCParser/MCAsmParser.cpp, and if it find a symbol, it will call createSymbol just now, so that it cannot know that this symbol is defined in the same asm code later.

That's the real problem, and this problem effects all architectures.

OK,I would find another solution.

@yingopq
Copy link
Contributor Author

yingopq commented Mar 29, 2024

@wzssyqa Please help review this patch yingopq@dcc86c6

…or mips32

Modify asm scan starting point to ensure scaning .rdata section before
.text section.
Save begin location, check if have .rdata section, confirm where to
begin parse. If have it and .rdata after .text, begin parse from .rdata,
when go to Eof, jump to begin location and then continue parse. If did
not have .rdata or .rdata is before .text, jump to begin location and
then parse.

Fix llvm#65020
@yingopq
Copy link
Contributor Author

yingopq commented Apr 3, 2024

@topperc @wzssyqa Could you help review this patch and give me some advice?
This patch would generate some fail testcase which I am searching.
I did not konw if we should change asm scan beginning point or scan twice unconditionally.
Thanks!

@wzssyqa
Copy link
Contributor

wzssyqa commented Apr 4, 2024

I guess that we can also CC @MaskRay

@brad0
Copy link
Contributor

brad0 commented Apr 11, 2024

@MaskRay

@yingopq
Copy link
Contributor Author

yingopq commented Apr 23, 2024

@wzssyqa @topperc @MaskRay Could you give me some advice?
I did not konw if we should change asm scan beginning point or scan twice unconditionally.
Thanks!

@wzssyqa
Copy link
Contributor

wzssyqa commented Apr 23, 2024

@wzssyqa @topperc @MaskRay Could you give me some advice? I did not konw if we should change asm scan beginning point or scan twice unconditionally. Thanks!

I think that you can try to figure out a patch first. So that we can run some test.
Please go ahead.

@yingopq
Copy link
Contributor Author

yingopq commented Apr 25, 2024

@wzssyqa @topperc @MaskRay Could you give me some advice? I did not konw if we should change asm scan beginning point or scan twice unconditionally. Thanks!

I think that you can try to figure out a patch first. So that we can run some test. Please go ahead.

@wzssyqa I tested with this patch about scaning twice, the result of readelf -r would occur double entries than normal, could you have some advice? Is it better to scan it once?

diff --git a/llvm/lib/MC/MCParser/AsmParser.cpp b/llvm/lib/MC/MCParser/AsmParser.cpp
index 76a3e501f459..36c3f550e320 100644
--- a/llvm/lib/MC/MCParser/AsmParser.cpp
+++ b/llvm/lib/MC/MCParser/AsmParser.cpp
@@ -997,6 +997,9 @@ bool AsmParser::Run(bool NoInitialTextSection, bool NoFinalize) {
 
   getTargetParser().onBeginOfFile();
 
+  int cnt = 0;
+  SMLoc IDLoc_start = getTok().getLoc();
+A:
   // While we have input, parse each statement.
   while (Lexer.isNot(AsmToken::Eof)) {
     ParseStatementInfo Info(&AsmStrRewrites);
@@ -1017,6 +1020,13 @@ bool AsmParser::Run(bool NoInitialTextSection, bool NoFinalize) {
       eatToEndOfStatement();
   }
 
+  cnt++;
+  if (cnt == 1) {
+    jumpToLoc(IDLoc_start);
+    Lex();
+    goto A;
+  }
+
   getTargetParser().onEndOfFile();
   printPendingErrors();
 
diff --git a/llvm/lib/MC/MCStreamer.cpp b/llvm/lib/MC/MCStreamer.cpp
index 176d55aa890b..e6071abc02ed 100644
--- a/llvm/lib/MC/MCStreamer.cpp
+++ b/llvm/lib/MC/MCStreamer.cpp
@@ -42,7 +42,7 @@
 #include <utility>
 
 using namespace llvm;
-
+#define DEBUG_TYPE "mips-mcstreamer"
 MCTargetStreamer::MCTargetStreamer(MCStreamer &S) : Streamer(S) {
   S.setTargetStreamer(this);
 }
@@ -424,9 +424,11 @@ void MCStreamer::assignFragment(MCSymbol *Symbol, MCFragment *Fragment) {
 void MCStreamer::emitLabel(MCSymbol *Symbol, SMLoc Loc) {
   Symbol->redefineIfPossible();
 
-  if (!Symbol->isUndefined() || Symbol->isVariable())
-    return getContext().reportError(Loc, "symbol '" + Twine(Symbol->getName()) +
-                                             "' is already defined");
+  if (!Symbol->isUndefined() || Symbol->isVariable()) {
+    LLVM_DEBUG(dbgs() << DEBUG_TYPE << "symbol '" + Twine(Symbol->getName()) +
+                                             "' is already defined" << "\n");
+    return;
+  }
 
   assert(!Symbol->isVariable() && "Cannot emit a variable symbol!");
   assert(getCurrentSectionOnly() && "Cannot emit before setting section!");
$ readelf -r 3.o

Relocation section '.rel.text' at offset 0x1b0 contains 11 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000000  00000405 R_MIPS_HI16       00000000   _gp_disp
00000004  00000406 R_MIPS_LO16       00000000   _gp_disp
0000001c  0000050b R_MIPS_CALL16     00000000   printf
00000020  00000525 R_MIPS_JALR       00000000   printf
0000003c  00000405 R_MIPS_HI16       00000000   _gp_disp
00000040  00000406 R_MIPS_LO16       00000000   _gp_disp
00000018  00000209 R_MIPS_GOT16      00000000   .rodata
00000054  00000209 R_MIPS_GOT16      00000000   .rodata
00000058  00000206 R_MIPS_LO16       00000000   .rodata
0000005c  0000050b R_MIPS_CALL16     00000000   printf
00000060  00000525 R_MIPS_JALR       00000000   printf

Relocation section '.rel.pdr' at offset 0x208 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000000  00000302 R_MIPS_32         0000003c   main
00000020  00000302 R_MIPS_32         0000003c   main

Copy link
Member

@MaskRay MaskRay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have more time spent on investigating the issue, but adding a lot of mips-specific code to the generic AsmParser.cpp is not acceptable.

@wzssyqa
Copy link
Contributor

wzssyqa commented Apr 27, 2024

I don't have more time spent on investigating the issue, but adding a lot of mips-specific code to the generic AsmParser.cpp is not acceptable.

Although her code is not correct, while I don't think that it is mips-specific, instead it is a generic problem.
With our current logic, if there are 2 symbols in a single ASM source file, and the previous symbol uses the later one,
the previous symbol don't treat the later symbol as local symbol.

GNU GAS doesn't have this problem.

@yingopq yingopq changed the title [Mips] Fix clang integrated assembler generates incorrect relocations… [MC] Fix clang integrated assembler generates incorrect relocations… Apr 28, 2024
@yingopq
Copy link
Contributor Author

yingopq commented Apr 28, 2024

I don't have more time spent on investigating the issue, but adding a lot of mips-specific code to the generic AsmParser.cpp is not acceptable.

@MaskRay Now we have two modification plans.
The first one is in the files changed in this PR, which changes the starting point of the scan to ensure that the .rdata section is in front of the .text section; the second one is scan twice as recommended by @wzssyqa.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

clang integrated assembler generates incorrect relocations for mips32
4 participants