[FreeBSD] mold output sometimes doesn't work if stripped #456

Open
koobs opened this issue Apr 21, 2022 · 25 comments

@koobs

koobs commented Apr 21, 2022

Summary

After upgrading the FreeBSD devel/mimalloc port version from 2.0.5 to 2.0.6 on a recent FreeBSD 14-CURRENT build, mold fails to run, outputting the following on invocation:

[koobs@140-CURRENT-amd64-564d:/usr/home/koobs/repos/freebsd/ports/devel/mimalloc] mold
Mapsize overflow
Mapsize overflow
zsh: exec format error: mold

This appears to be related to a recent FreeBSD base commit, bf83941638, by @kostikbel at the end of last year, made via a review [1] that is not publicly available.

Reproduction Environment / Details

  • FreeBSD 14.0-CURRENT main-n254374-4fd141c7d7a GENERIC-NODEBUG amd64
  • mold 1.2
  • mimalloc 2.0.6

Note

  • The issue does not appear reproducible with mold 1.2 with mimalloc 2.0.5
  • The issue does not appear reproducible with mold 1.1.1 with mimalloc 2.0.6

Steps to Reproduce

  • Build mold 1.2 with mimalloc 2.0.6 on FreeBSD CURRENT
  • Run the mold command

References

[1] https://reviews.freebsd.org/D33359

@rui314
Owner

rui314 commented Apr 22, 2022

Thank you for your report. Since it's not easy to set up an environment to reproduce the issue, do you mind if I ask you to help me debug this?

  1. Can you get a stacktrace of mold when it crashes?
  2. If 1.1.1 is OK but 1.2.0 isn't, there might be a regression introduced at some point between the two. Can you find it using git bisect?

@kostikbel

The 'Mapsize overflow' error comes from the image activator. Basically, for some binary, the total size of the segments to mmap is too large.

Find the binary that causes the problem and put it somewhere so that I can take a look at it.
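As a rough model of the kind of check described here, assume the activator adds up the memory sizes of the segments it is about to map and gives up when the sum overflows or exceeds some limit. The function name, the plain array of sizes, and the `limit` parameter are all hypothetical; this sketch does not reproduce FreeBSD's imgact_elf.c.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not FreeBSD's code: add up the p_memsz values of the
 * segments to be mapped and reject the image with ENOEXEC if the total
 * would wrap around 2^64 or exceed `limit`. */
int total_map_size(const uint64_t *memsz, size_t n, uint64_t limit,
                   uint64_t *total_out)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Adding memsz[i] would overflow the 64-bit running total. */
        if (memsz[i] > UINT64_MAX - total)
            return ENOEXEC;
        total += memsz[i];
        if (total > limit)
            return ENOEXEC;
    }
    *total_out = total;
    return 0;
}
```

Both conditions are unsigned comparisons made before (or right after) a safe addition, so the check itself cannot overflow.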

@rui314
Owner

rui314 commented Apr 22, 2022

mold exclusively uses mmap for all file I/O, and it can handle multi-gibibyte input and output files in a few seconds. This usage pattern may be unique.
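For readers unfamiliar with the pattern, here is a minimal POSIX sketch of reading a whole file through mmap instead of read(2). It is not mold's actual code; `map_file_readonly` is a hypothetical helper.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map an entire file read-only. Returns NULL on failure, including for
 * empty files (mmap cannot map zero bytes). On success the caller owns
 * the mapping and must release it with munmap(ptr, *size). */
void *map_file_readonly(const char *path, size_t *size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *size = (size_t)st.st_size;
    return p;
}
```

Closing the descriptor right after mmap is deliberate: the mapping keeps its own reference to the file until munmap.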

@koobs
Author

koobs commented Apr 22, 2022

@rui314 I'll do what I can on my end. It may be worth testing a branch with the third-party mimalloc updated to 2.0.6, to see whether anything interesting comes up in CI on other platforms.

Also, I'm not yet sure that it's a crash; the error appears to be FreeBSD's ELF handling simply aborting the load via:

+				uprintf("Mapsize overflow\n");
+				error = ENOEXEC;

I'll run mold under gdb/truss to see if I can identify anything interesting, but @kostikbel should be able to provide some expert insight.

@koobs
Author

koobs commented Apr 22, 2022

> The 'Mapsize overflow' error comes from the image activator. Basically, for some binary, the total size of the segments to mmap is too large.
>
> Find the binary that causes the problem and put it somewhere so that I can take a look at it.

@kostikbel It's reproducible using devel/mold built against devel/mimalloc (updated to 2.0.6 today) from ports.

@kostikbel

I need a binary that causes the problem. It happens during execve(2) of the binary, due to some peculiarity in the binary format that the in-kernel image activator rejects. It does not occur at runtime, because runtime simply does not happen.
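A small illustration of where this failure surfaces, assuming a POSIX system: execv(2) only returns if the kernel refuses the new image, and errno then says why. ENOEXEC is the code for an image the activator rejects, which zsh above reports as "exec format error". `exec_or_errno` is a hypothetical helper.

```c
#include <errno.h>
#include <unistd.h>

/* Replace the current process with `path`. execv() never returns on
 * success, so reaching the return statement means the kernel rejected
 * the exec before any code in the new image ran; errno tells why,
 * e.g. ENOEXEC for a malformed executable or ENOENT for a missing file. */
int exec_or_errno(const char *path, char *const argv[])
{
    execv(path, argv);
    return errno;
}
```

A caller would typically report the returned code with strerror(), which is how shells turn ENOEXEC into the message seen in this thread.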

@koobs
Author

koobs commented Apr 22, 2022

> I need a binary that causes the problem. It happens during execve(2) of the binary, due to some peculiarity in the binary format that the in-kernel image activator rejects. It does not occur at runtime, because runtime simply does not happen.

Did you miss my #456 (comment)? Install devel/mold from the latest ports tree. That is the binary triggering the error, or is that not appropriate for testing this case?

@X547

X547 commented Apr 22, 2022

> Did you miss my #456 (comment)? Install devel/mold from the latest ports tree. That is the binary triggering the error, or is that not appropriate for testing this case?

Can you provide a direct link to download the problematic mold executable, or upload it somewhere (even attach it here if it's not too big)? Some people may not have access to a FreeBSD installation. Inspecting the executable may help identify the issue.

The "Mapsize overflow" error seems to be caused by a too-large p_memsz field in an ELF program header, which causes an integer overflow.

@rui314
Owner

rui314 commented Apr 22, 2022

@koobs Did you link mold using mold? If so, the problem might not be in mimalloc but in the mold executable that linked the problematic mold executable.

@koobs
Author

koobs commented Apr 22, 2022

@rui I'll test both cases (linked with mold, and linked with base lld) and upload the binaries here.

@koobs
Author

koobs commented Apr 22, 2022

mold 1.2, linked against mimalloc 2.0.6 and linked with mold 1.2:

readelf -p .comment /usr/local/bin/mold

String dump of section '.comment':
  [     1]  FreeBSD clang version 13.0.0 (git@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303)
  [    64]  mold 1.2.0 (compatible with GNU ld)
/usr/local/bin/mold
Mapsize overflow
zsh: exec format error: /usr/local/bin/mold

File: mold.zip

@koobs
Author

koobs commented Apr 22, 2022

I can't reproduce it with mold 1.2 linked against mimalloc 2.0.6 and linked with lld.

File: mold-lld.zip

@kostikbel

This is the excerpt from the program headers dump:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x00000000004c4040 0x00000000004c4040
                 0x00000000000002a0 0x00000000000002a0  R      0x8
  INTERP         0x00000000000002e0 0x00000000000002e0 0x0000000000000000
                 0x0000000000000015 0x0000000000000015  R      0x1
      [Requesting program interpreter: /libexec/ld-elf.so.1]
  NOTE           0x00000000000002f8 0x00000000000002f8 0x0000000000000000
                 0x0000000000000048 0x0000000000000048  R      0x4
  LOAD           0x0000000000000000 0x00000000004c4000 0x00000000004c4000
                 0x000000000004d920 0xffffffffffb89920  R      0x1000
...

I cut the output; the last pasted loadable segment is the obvious culprit: its MemSiz is nonsensical.
Something is broken in the linker.
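The arithmetic behind "nonsensical" can be checked directly. In unsigned 64-bit math, this segment's end address VirtAddr + MemSiz wraps past 2^64 and lands at 0x4d920, below its start (and, as it happens, equal to the segment's FileSiz). The constants are copied from the dump above; the helper names are made up for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from the last LOAD segment in the readelf excerpt above. */
const uint64_t seg_vaddr  = 0x4c4000;
const uint64_t seg_filesz = 0x4d920;
const uint64_t seg_memsz  = 0xffffffffffb89920;

/* Unsigned arithmetic in C is modulo 2^64, so this wraps instead of trapping. */
uint64_t segment_end(uint64_t vaddr, uint64_t memsz)
{
    return vaddr + memsz;
}

/* A segment whose end address is below its start has wrapped around. */
bool segment_wraps(uint64_t vaddr, uint64_t memsz)
{
    return segment_end(vaddr, memsz) < vaddr;
}
```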

@rui314
Owner

rui314 commented Apr 23, 2022

@koobs I cannot reproduce it, so it must be a subtle bug. Can you share not only the executable but object files that you use to create that executable? You can simply zip the entire mold directory.

@wahjava
Contributor

wahjava commented Jun 19, 2022

Hi @rui314,

I maintain the FreeBSD port devel/mold, and the problem seems to happen after the executable is stripped, e.g. on 13.1-RELEASE (amd64) with 1.3.0 (the update commit diff wahjava/freebsd-ports@ac3ce36 also contains a couple of patches that make it build on FreeBSD, which I'm planning to send upstream):

❯ work/stage/usr/local/bin/mold
mold: fatal: -m option is missing
❯ strip work/stage/usr/local/bin/mold
strip: moving loadable section .interp, is this intentional?
strip: moving loadable section .note.tag, is this intentional?
strip: moving loadable section .hash, is this intentional?
strip: moving loadable section .gnu.hash, is this intentional?
strip: moving loadable section .dynsym, is this intentional?
strip: moving loadable section .dynstr, is this intentional?
strip: moving loadable section .gnu.version, is this intentional?
strip: moving loadable section .gnu.version_r, is this intentional?
strip: moving loadable section .rela.dyn, is this intentional?
strip: moving loadable section .rela.plt, is this intentional?
❯ work/stage/usr/local/bin/mold
Mapsize overflow
zsh: exec format error: work/stage/usr/local/bin/mold

The executable work/stage/usr/local/bin/mold was linked with, whereas /usr/local/bin/mold is linked with LLVM LLD 13.0.0:

c++ out/compress.o out/demangle.o out/filepath.o out/glob.o out/hyperloglog.o out/main.o out/multi-glob.o out/perf.o out/strerror.o out/tar.o out/uuid.o out/elf/arch-arm32.o out/elf/arch-arm64.o out/elf/arch-i386.o out/elf/arch-riscv64.o out/elf/arch-x86-64.o out/elf/cmdline.o out/elf/dwarf.o out/elf/gc-sections.o out/elf/icf.o out/elf/input-files.o out/elf/input-sections.o out/elf/linker-script.o out/elf/lto.o out/elf/main.o out/elf/mapfile.o out/elf/output-chunks.o out/elf/passes.o out/elf/relocatable.o out/elf/subprocess.o out/macho/arch-arm64.o out/macho/arch-x86-64.o out/macho/cmdline.o out/macho/dead-strip.o out/macho/input-files.o out/macho/input-sections.o out/macho/lto.o out/macho/main.o out/macho/mapfile.o out/macho/output-chunks.o out/macho/tapi.o out/macho/yaml.o -o mold -pthread -lz -lm -ldl -lmimalloc out/tbb/libs/libtbb.a -L/usr/local/lib -lcrypto -fuse-ld=/usr/local/bin/mold -L/usr/local/lib -Wl,-rpath,/usr/local/lib -fstack-protector-strong

Please let me know if you need more information to get to the bottom of this.

Thanks!

@rui314
Owner

rui314 commented Jun 19, 2022

@wahjava

> The executable work/stage/usr/local/bin/mold was linked with, whereas /usr/local/bin/mold is linked with LLVM LLD 13.0.0:

Looks like the word after "linked with" is missing. Was that linked with mold?

@wahjava
Contributor

wahjava commented Jun 19, 2022

Sorry for the lack of clarity on my part. The command line I posted is the link-stage command line, and it contains -fuse-ld=/usr/local/bin/mold.

@rui314
Owner

rui314 commented Jun 19, 2022

Do you mind if I ask you to build some other program with -fuse-ld=/usr/local/bin/mold, strip the resulting binary and run it to see if the same error occurs?

@wahjava
Contributor

wahjava commented Jun 19, 2022

@rui314, of course not. I did try a simple hello world program, though, and wasn't able to reproduce it with that. Anyway, which one would you like me to try?

@rui314
Owner

rui314 commented Jun 19, 2022

I set up a FreeBSD 13 machine on AWS, built mold using mold on it, and stripped the resulting binary. The issue indeed occurred. Here is a comparison of the unstripped and stripped binaries.

--- /tmp//sh-np.ADiUGu  2022-06-19 11:33:40.654452000 +0000
+++ /tmp//sh-np.KoLSrN  2022-06-19 11:33:40.659622000 +0000
@@ -3,116 +3,106 @@
   Class:                             ELF64
   Data:                              2's complement, little endian
   Version:                           1 (current)
   OS/ABI:                            NONE
   ABI Version:                       0
   Type:                              EXEC (Executable file)
   Machine:                           Advanced Micro Devices x86-64
   Version:                           0x1
   Entry point address:               0x209fd0
   Start of program headers:          64 (bytes into file)
-  Start of section headers:          10585472 (bytes into file)
+  Start of section headers:          5168200 (bytes into file)
   Flags:                             0
   Size of this header:               64 (bytes)
   Size of program headers:           56 (bytes)
   Number of program headers:         12
   Size of section headers:           64 (bytes)
-  Number of section headers:         49
-  Section header string table index: 37
+  Number of section headers:         39
+  Section header string table index: 36

 Elf file type is EXEC (Executable file)
 Entry point 0x209fd0
 There are 12 program headers, starting at offset 64

 Program Headers:
   Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
   PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x0002a0 0x0002a0 R   0x8
-  INTERP         0x0002e0 0x00000000002002e0 0x00000000002002e0 0x000015 0x000015 R   0x1
-      [Requesting program interpreter: /libexec/ld-elf.so.1]
-  NOTE           0x0002f8 0x00000000002002f8 0x00000000002002f8 0x000048 0x000048 R   0x4
-  LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x008898 0x008898 R   0x1000
+  INTERP         0x0002e0 0x00000000002002e0 0x00000000002002e0 0x000035 0x000015 R   0x1
+      [Requesting program interpreter: ]
+  NOTE           0x0002f8 0x00000000002002f8 0x00000000002002f8 0x000068 0x000048 R   0x4
+  LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x0088b8 0x008898 R   0x1000
   LOAD           0x009000 0x0000000000209000 0x0000000000209000 0x42f41c 0x42f41c R E 0x1000
   LOAD           0x439000 0x0000000000639000 0x0000000000639000 0x08a260 0x08a260 R   0x1000
-  LOAD           0x4c4000 0x00000000006c4000 0x00000000006c4000 0x026f28 0x088479 RW  0x1000
-  TLS            0x4c4000 0x00000000006c4000 0x00000000006c4000 0x000008 0x000119 RW  0x10
+  LOAD           0x4c4000 0x00000000006c4000 0x00000000006c4000 0x026f28 0x026f28 RW  0x1000
+  TLS            0x4c4000 0x00000000006c4000 0x00000000006c4000 0x000008 0x000008 RW  0x10
   DYNAMIC        0x4c4150 0x00000000006c4150 0x00000000006c4150 0x000270 0x000270 RW  0x8
   GNU_EH_FRAME   0x4415cc 0x00000000006415cc 0x00000000006415cc 0x0019ec 0x0019ec R   0x4
   GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
-  GNU_RELRO      0x4c4000 0x00000000006c4000 0x00000000006c4000 0x0020c1 0x003000 R   0x40
+  GNU_RELRO      0x4c4000 0x00000000006c4000 0x00000000006c4000 0x0020c1 0x0020c1 R   0x40

As you can see, strip messed up the INTERP segment. Isn't this an issue with FreeBSD's strip command? Nothing seems to be obviously wrong with mold's output, and the output works if we do not strip it.
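One mechanical way to spot the damage in the stripped column: the ELF gABI says a loadable segment's file size may not be larger than its memory size, and after strip the first R LOAD has FileSiz 0x0088b8 against MemSiz 0x008898 (INTERP and NOTE grew by the same 0x20). A sketch of that check using the standard <elf.h> types as provided by glibc (FreeBSD ships an equivalent header); `first_bad_load` is a hypothetical name, and it applies the rule only to PT_LOAD, where the gABI states it.

```c
#include <elf.h>
#include <stddef.h>

/* Return the index of the first PT_LOAD header whose p_filesz exceeds
 * its p_memsz (forbidden by the ELF gABI), or -1 if there is none. */
long first_bad_load(const Elf64_Phdr *ph, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type == PT_LOAD && ph[i].p_filesz > ph[i].p_memsz)
            return (long)i;
    }
    return -1;
}
```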

rui314 changed the title mold (1.2) fails to run with mimalloc (2.0.6) with Mapsize overflow error on FreeBSD 14-CURRENT [FreeBSD] mold output sometimes doesn't work if stripped Jun 19, 2022
@wahjava
Contributor

wahjava commented Jun 19, 2022

I tried the test again, this time with binutils' strip (GNU strip (GNU Binutils) 2.37), and that spared the executable, so something is indeed wrong with FreeBSD's strip (strip (elftoolchain r3769)):

--- mold.pre    2022-06-19 12:29:06.778384000 +0000
+++ mold.post   2022-06-19 12:29:31.152217000 +0000
@@ -10,14 +10,14 @@
   Version:                           0x1
   Entry point address:               0x20a000
   Start of program headers:          64 (bytes into file)
-  Start of section headers:          108098168 (bytes into file)
+  Start of section headers:          10577632 (bytes into file)
   Flags:                             0
   Size of this header:               64 (bytes)
   Size of program headers:           56 (bytes)
   Number of program headers:         12
   Size of section headers:           64 (bytes)
-  Number of section headers:         48
-  Section header string table index: 36
+  Number of section headers:         38
+  Section header string table index: 37

 Elf file type is EXEC (Executable file)
 Entry point 0x20a000
@@ -41,7 +41,7 @@
                  0x0000000000083150 0x0000000000083150  R      0x1000
   LOAD           0x00000000009b2000 0x0000000000bb2000 0x0000000000bb2000
                  0x0000000000023e48 0x000000000002acf9  RW     0x1000
-  TLS            0x0000000000000000 0x0000000000bb2000 0x0000000000bb2000
+  TLS            0x00000000009b2000 0x0000000000bb2000 0x0000000000bb2000
                  0x0000000000000000 0x0000000000000108  RW     0x10
   DYNAMIC        0x00000000009b2128 0x0000000000bb2128 0x0000000000bb2128
                  0x0000000000000280 0x0000000000000280  RW     0x8
@@ -49,8 +49,8 @@
                  0x00000000000010c4 0x00000000000010c4  R      0x4
   GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                  0x0000000000000000 0x0000000000000000  RW     0
-  GNU_RELRO      0x0000000000000000 0x0000000000bb2000 0x0000000000bb2000
-                 0x0000000000000e81 0x0000000000001000  R      0x40
+  GNU_RELRO      0x00000000009b2000 0x0000000000bb2000 0x0000000000bb2000
+                 0x0000000000001000 0x0000000000001000  R      0x40
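The program-header hunks in this GNU strip diff touch only TLS and GNU_RELRO: their p_offset moves from 0 in mold.pre to 0x9b2000 in mold.post, the offset of the RW LOAD segment that starts at the same address 0xbb2000 (GNU_RELRO's FileSiz also changes from 0xe81 to 0x1000). A heuristic consistency check, sketched under the assumption that a segment lying inside a PT_LOAD should keep that PT_LOAD's address-to-offset delta, flags both "before" entries shown. The check is arguably moot for the TLS entry, whose FileSiz is 0, and `offset_matches_load` is a hypothetical helper, not part of any tool in this thread.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Heuristic: if `seg` starts inside some PT_LOAD, its file offset should
 * sit at the same distance from that PT_LOAD's p_offset as its address
 * sits from the PT_LOAD's p_vaddr. Returns true if consistent, or if no
 * PT_LOAD contains seg->p_vaddr. An offset below the PT_LOAD's offset
 * wraps to a huge unsigned difference that won't match a realistic delta. */
bool offset_matches_load(const Elf64_Phdr *ph, size_t n, const Elf64_Phdr *seg)
{
    for (size_t i = 0; i < n; i++) {
        const Elf64_Phdr *ld = &ph[i];
        if (ld->p_type != PT_LOAD)
            continue;
        /* Skip PT_LOADs that do not contain the segment's start address. */
        if (seg->p_vaddr < ld->p_vaddr || seg->p_vaddr - ld->p_vaddr >= ld->p_memsz)
            continue;
        return seg->p_vaddr - ld->p_vaddr == seg->p_offset - ld->p_offset;
    }
    return true;
}
```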

@kostikbel

@emaste

@aokblast
Contributor

aokblast commented Feb 4, 2024

Hello, I have reworked the mold port in this patch, and I think I have fixed this issue in the recent commit. Please help me test it to confirm there are no other problems now!

@wahjava
Contributor

wahjava commented Feb 4, 2024

> Hello, I have reworked the mold port in this patch, and I think I have fixed this issue in the recent commit. Please help me test it to confirm there are no other problems now!

Thanks @aokblast. I wonder why you didn't cc the maintainer in the differential? I can test it later.

@aokblast
Contributor

aokblast commented Feb 6, 2024

> Thanks @aokblast. I wonder why you didn't cc the maintainer in the differential? I can test it later.

Sorry, I've added you now. This is the first (or second?) time I've worked on a port, so I'm not familiar with the process.
