Cross architecture support for dynamic linker (ld) #198

Closed
ningziwen opened this issue Jan 31, 2023 · 2 comments

@ningziwen
Member

ningziwen commented Jan 31, 2023

Describe the bug
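Using the dynamic linker (ld.so) to run a command directly fails in an amd64 container on an arm64 host: the command hangs or, as in the log below, crashes with a segmentation fault.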

Steps to reproduce

> finch run --rm --platform=amd64 -it public.ecr.aws/amazonlinux/amazonlinux:2
# /lib64/ld-linux-x86-64.so.2 /bin/whoami

Expected behavior
It should print "root", as it does when the same command is run in the arm64 AL2 image.

> finch run --rm -it public.ecr.aws/amazonlinux/amazonlinux:2
# /lib64/ld-linux-aarch64.so.1 /bin/whoami
root

Screenshots or logs
Setting QEMU_STRACE=1 prints the syscall trace:

# QEMU_STRACE=1 /lib64/ld-linux-x86-64.so.2 /bin/whoami
10 brk(NULL) = 0x0000004000226000
10 openat(AT_FDCWD,"/bin/whoami",O_RDONLY|O_CLOEXEC) = 3
10 read(3,0x2a266f8,832) = 832
10 fstat(3,0x0000004002a26590) = 0
Segmentation fault (core dumped)

Additional context

➜  ~ finch version
Client:
 Version:	v0.3.0
 OS/Arch:	linux/arm64
 GitCommit:	d4a056654991ebb7ef169717c8dcb79105fb53ba
 nerdctl:
  Version:	v1.1.0
  GitCommit:	18944bc70784dfa83010d37054d75487a58ab581
 buildctl:
  Version:	v0.10.6
  GitCommit:	0c9b5aeb269c740650786ba77d882b0259415ec7

Server:
 containerd:
  Version:	v1.6.12
  GitCommit:	a05d175400b1145e5e6a735a6710579d181e7fb0
 runc:
  Version:	1.1.4
  GitCommit:	v1.1.4-0-g5fd4c4d1

Running on an M1 Mac.

@ningziwen ningziwen added the bug Something isn't working label Jan 31, 2023
@ningziwen ningziwen self-assigned this Jan 31, 2023
@ningziwen
Member Author

It works in Ubuntu:

➜  ~ finch run --rm --platform=amd64 -it public.ecr.aws/ubuntu/ubuntu
root@aef08829d000:/# QEMU_STRACE=1 /lib64/ld-linux-x86-64.so.2 /bin/whoami
14 brk(NULL) = 0x0000004000038000
14 arch_prctl(12289,274920080944,274878025568,2048,3,1) = -1 errno=22 (Invalid argument)
14 openat(-100,"/bin/whoami",O_RDONLY|O_CLOEXEC) = 3
14 read(3,0x2838048,832) = 832
14 mmap(NULL,33224,PROT_READ,MAP_PRIVATE|MAP_DENYWRITE,3,0) = 0x000000400283b000
14 mprotect(0x000000400283d000,20480,PROT_NONE) = 0
14 mmap(0x000000400283d000,12288,PROT_EXEC|PROT_READ,MAP_PRIVATE|MAP_DENYWRITE|MAP_FIXED,3,0x2000) = 0x000000400283d000
14 mmap(0x0000004002840000,4096,PROT_READ,MAP_PRIVATE|MAP_DENYWRITE|MAP_FIXED,3,0x5000) = 0x0000004002840000
14 mmap(0x0000004002842000,8192,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_DENYWRITE|MAP_FIXED,3,0x6000) = 0x0000004002842000
14 close(3) = 0
14 access("/etc/ld.so.preload",R_OK) = -1 errno=2 (No such file or directory)
14 openat(-100,"/etc/ld.so.cache",O_RDONLY|O_CLOEXEC) = 3
14 newfstatat(3,"",0x00000040028377a0,0x1000) = 0
14 mmap(NULL,5073,PROT_READ,MAP_PRIVATE,3,0) = 0x0000004002844000
14 close(3) = 0
14 openat(-100,"/lib/x86_64-linux-gnu/libc.so.6",O_RDONLY|O_CLOEXEC) = 3
14 read(3,0x28379c8,832) = 832
14 pread64(3,274920076768,784,64,274920077735,0) = 784
14 newfstatat(3,"",0x0000004002837860,0x1000) = 0
14 pread64(3,274920076448,784,64,59647,0) = 784
14 mmap(NULL,2117488,PROT_READ,MAP_PRIVATE|MAP_DENYWRITE,3,0) = 0x0000004002846000
14 mmap(0x0000004002868000,1544192,PROT_EXEC|PROT_READ,MAP_PRIVATE|MAP_DENYWRITE|MAP_FIXED,3,0x22000) = 0x0000004002868000
14 mmap(0x00000040029e1000,356352,PROT_READ,MAP_PRIVATE|MAP_DENYWRITE|MAP_FIXED,3,0x19b000) = 0x00000040029e1000
14 mmap(0x0000004002a38000,24576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_DENYWRITE|MAP_FIXED,3,0x1f1000) = 0x0000004002a38000
14 mmap(0x0000004002a3e000,53104,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED,-1,0) = 0x0000004002a3e000
14 close(3) = 0
14 mmap(NULL,8192,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS,-1,0) = 0x0000004002a4b000
14 arch_prctl(4098,274922258752,-274922261200,144,144,0) = 0
14 set_tid_address(274922259472,274922258752,274878132432,144,144,0) = 14
14 set_robust_list(274922259488,24,274878132432,144,144,0) = -1 errno=38 (Function not implemented)
14 Unknown syscall 334
14 mprotect(0x0000004002a38000,16384,PROT_READ) = 0
14 mprotect(0x0000004002842000,4096,PROT_READ) = 0
14 mprotect(0x0000004000034000,8192,PROT_READ) = 0
14 prlimit64(0,3,0,274920080288,65535,1) = 0
14 munmap(0x0000004002844000,5073) = 0
14 getrandom(274922222712,8,1,274920869888,274922199680,0) = 8
14 brk(NULL) = 0x0000004000038000
14 brk(0x0000004000059000) = 0x0000004000059000
14 geteuid() = 0
14 socket(PF_UNIX,SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK,IPPROTO_IP) = 3
14 connect(3,0x2838170,110) = -1 errno=2 (No such file or directory)
14 close(3) = 0
14 socket(PF_UNIX,SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK,IPPROTO_IP) = 3
14 connect(3,0x2838360,110) = -1 errno=2 (No such file or directory)
14 close(3) = 0
14 newfstatat(-100,"/etc/nsswitch.conf",0x0000004002838270,0) = 0
14 newfstatat(-100,"/",0x00000040028383b0,0) = 0
14 openat(-100,"/etc/nsswitch.conf",O_RDONLY|O_CLOEXEC) = 3
14 newfstatat(3,"",0x0000004002838190,0x1000) = 0
14 read(3,0x38a40,4096) = 494
14 read(3,0x38a40,4096) = 0
14 newfstatat(3,"",0x0000004002838270,0x1000) = 0
14 close(3) = 0
14 openat(-100,"/etc/passwd",O_RDONLY|O_CLOEXEC) = 3
14 newfstatat(3,"",0x00000040028382f0,0x1000) = 0
14 lseek(3,0,SEEK_SET) = 0
14 read(3,0x38a40,4096) = 840
14 close(3) = 0
14 newfstatat(1,"",0x00000040028384c0,0x1000) = 0
root
14 write(1,0x38a40,5) = 5
14 close(1) = 0
root@aef08829d000:/#

@pendo324
Member

pendo324 commented Apr 4, 2023

First of all, I would strongly advise anyone who stumbles upon this issue not to use the dynamic linker to invoke executables directly. This is not an intended use of the linker (it exists mainly to support debugging), and it is certainly not the "most tested path" to rely on in production. Without direct invocation through the linker, this issue would not occur. The linker does not load executables into memory the same way the kernel does, and these subtle differences are what cause obscure issues like this one. This explains why shelling into the AL2 container works (that just executes /bin/bash) and why a normally invoked whoami works, while direct invocation through the linker (/lib64/ld-linux-x86-64.so.2 /bin/whoami) fails.

As it turns out, this is a very specific issue that only manifests on operating systems whose user-space executables are built as ET_EXEC (linked at fixed addresses) instead of the now more common ET_DYN (position independent). Because the conditions needed to reproduce it are so specific, Amazon Linux 2 is the only actively maintained operating system I could find, among commonly used deployment targets and base layers, that is affected.
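To check whether a given binary is ET_EXEC or ET_DYN, the `e_type` field of its ELF header can be read directly. The following is a minimal sketch, not part of Finch or QEMU; the function names `elf_type` and `elf_type_of_file` are made up for illustration, and only the ELF header layout itself comes from the ELF specification.

```python
import struct

# e_type values defined by the ELF specification.
ELF_TYPES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def elf_type(header):
    """Return the e_type name given the first 18+ bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is the 16-bit field that follows the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return ELF_TYPES.get(e_type, "unknown ({})".format(e_type))


def elf_type_of_file(path):
    """Hypothetical helper: read just enough of a file to classify it."""
    with open(path, "rb") as f:
        return elf_type(f.read(18))
```

Calling `elf_type_of_file("/bin/whoami")` inside each container would show which images are affected, assuming the analysis in this comment holds: binaries reported as ET_EXEC are the ones at risk when loaded through the dynamic linker.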

The root cause of the segfault is that our QEMU user mode packages vend statically linked executables that are compiled with PIE (Position Independent Executable) disabled. This can be seen in the output of the file command on any of the qemu-<arch>-static binaries, which reports a plain "executable":

$ file /usr/bin/qemu-aarch64-static
/usr/bin/qemu-aarch64-static: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=faa4b597732e1ccf9e1ead6d9cdff7620ff0d212, for GNU/Linux 3.2.0, not stripped, too many notes (256)

Because of this, when the linker loads a program into memory via direct invocation, it overwrites parts of the QEMU user mode executable. The QEMU user mode executable is essentially a translator from a foreign architecture's machine code to native machine code, so once it is corrupted in memory, the entire process crashes.
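The collision described above can be illustrated by comparing the PT_LOAD address ranges of two ELF files. This is a rough sketch under stated assumptions: it handles only little-endian ELF64 images, the function names `load_ranges` and `overlapping` are hypothetical, and it assumes guest addresses map one-to-one onto host addresses, which QEMU does not guarantee because it may apply an offset. The comparison is only meaningful for ET_EXEC files, since ET_DYN addresses are relative to a load base chosen at run time.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def load_ranges(image):
    """Return (start, end) virtual address ranges of all PT_LOAD segments."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    # ELF64 header offsets: e_phoff at 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", image, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 54)
    ranges = []
    for i in range(e_phnum):
        # ELF64 program header: p_type, p_flags, p_offset, p_vaddr,
        # p_paddr, p_filesz, p_memsz, p_align.
        fields = struct.unpack_from("<IIQQQQQQ", image, e_phoff + i * e_phentsize)
        p_type, p_vaddr, p_memsz = fields[0], fields[3], fields[6]
        if p_type == PT_LOAD:
            ranges.append((p_vaddr, p_vaddr + p_memsz))
    return ranges


def overlapping(a, b):
    """Return every pair of ranges, one from each list, that intersect."""
    return [(x, y) for x in a for y in b if x[0] < y[1] and y[0] < x[1]]
```

Reading both files with `open(path, "rb").read()` and passing the results through `load_ranges` and `overlapping` gives a quick indication of whether a fixed-address guest binary would land on top of a non-PIE emulator binary. An empty result does not prove the absence of a conflict, because it ignores heap, stack, and any address offset applied by the emulator.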

There are a few workarounds for this (besides the obvious ones of "don't use AL2 / an ET_EXEC user space" and "don't directly invoke via the dynamic linker"):

  1. Use something like Rosetta 2, which does ahead-of-time binary translation. Because the translation process is complete before the target program is executed, it doesn't matter if the Rosetta code itself is overwritten in memory. This was actually already implemented recently by exp(feat): enable Virtualization.framework and Rosetta #282
  2. Use different QEMU user mode binaries. There are two main paths to achieve this:
    1. Enable (or rather, don't disable) PIE when building the QEMU static binaries. Because the QEMU user mode binary is invoked by the VM's kernel (not the container's kernel), the QEMU binary can be loaded high in the virtual address space, where it's unlikely that the ET_EXEC AL2 binary will overwrite it
    2. Manually specify segment addresses using LDFLAGS such as --section-start, or a custom linker script. I think this is more brittle, but also viable

In the unlikely scenario that someone else runs into this issue, please feel free to reopen it. For now, I'm going to close it with the recommendation to use Rosetta 2 if possible. In the future, Finch may also install customized QEMU user mode binaries to mitigate this where Rosetta is not available (for example on Intel Macs, or on Apple Silicon Macs running macOS < 13.x).

@pendo324 pendo324 closed this as completed Apr 4, 2023