Focal armhf and arm64 builds fail with semop: function not implemented: #36

Closed
sloretz opened this issue Mar 20, 2020 · 12 comments · Fixed by #37

Comments

@sloretz
Contributor

sloretz commented Mar 20, 2020

Opening a ticket with notes copied from #35. See also @clalancette's comment #35 (comment). @j-rivero also says similar errors are happening on http://build.osrfoundation.org

The Noetic Focal armhf and arm64 jobs all started failing 2 days ago.

Random Notes

  • Only armhf and arm64 jobs on Ubuntu Focal are failing
    • Focal amd64, Buster amd64, and Buster arm64 jobs are not failing
  • The last successful job was 6 days ago, and the first failure was 2 days ago, so it must be from a change after March 13th but before March 17th.
  • For all failing jobs, it shows in the console as:
dpkg-source: info: using options from ros-noetic-pcl-msgs-0.3.0/debian/source/options: --auto-commit
 fakeroot debian/rules clean
semop(1): encountered an error: Function not implemented
dpkg-buildpackage: error: fakeroot debian/rules clean subprocess returned exit status 1
E: Building failed
Traceback (most recent call last):
  File "/tmp/ros_buildfarm/ros_buildfarm/binarydeb_job.py", line 138, in build_binarydeb
    subprocess.check_call(cmd, cwd=source_dir)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['apt-src', 'build', 'ros-noetic-pcl-msgs']' returned non-zero exit status 1.
+ docker version
Client: Docker Engine - Community
 Version:           19.03.2
 API version:       1.40
 Go version:        go1.12.8
 Git commit:        6a30dfc
 Built:             Thu Aug 29 05:28:19 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.2
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.8
  Git commit:       6a30dfc
  Built:            Thu Aug 29 05:26:54 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
  • @clalancette says the path to investigate is fakeroot -> container glibc -> qemu -> host glibc -> kernel
  • Successful job had container glibc version 2.30-0ubuntu3
  • Failing job had container glibc version 2.31-0ubuntu6
@sloretz
Contributor Author

sloretz commented Mar 20, 2020

@j-rivero Is this one of the failing jobs? https://build.osrfoundation.org/view/all/job/ign-rendering-debbuilder/139/consoleFull

Edit: Oops, disregard, that's waaaay too old.

@j-rivero
Contributor

Edit: Oops, disregard, that's waaaay too old.

This one was the first I see failing: https://build.osrfoundation.org/view/All/job/ign-math6-debbuilder/396/console

+ mk-build-deps -r -i debian/control --tool 'apt-get --yes -o Debug::pkgProblemResolver=yes -o  Debug::BuildDeps=yes'
semop(1): encountered an error: Function not implemented
Error in the build process: exit status 1
dpkg: error: cannot access archive 'ignition-math6-build-deps_6.4.0-1~focal_all.deb': No such file or directory

@sloretz
Contributor Author

sloretz commented Apr 2, 2020

To reproduce locally:

docker run --rm -it osrf/ubuntu_arm64:focal /bin/sh -c 'apt update && apt install -y fakeroot && fakeroot'
Get:1 http://ports.ubuntu.com focal InRelease [255 kB]
Hit:2 http://ports.ubuntu.com focal-updates InRelease
Hit:3 http://ports.ubuntu.com focal-backports InRelease
Get:4 http://ports.ubuntu.com focal/universe arm64 Packages [11.1 MB]
Get:5 http://ports.ubuntu.com focal/restricted arm64 Packages [1546 B]                                                                                                                                          
Get:6 http://ports.ubuntu.com focal/multiverse arm64 Packages [139 kB]                                                                                                                                          
Get:7 http://ports.ubuntu.com focal/main arm64 Packages [1239 kB]                                                                                                                                               
Fetched 12.7 MB in 1min 24s (152 kB/s)                                                                                                                                                                          
Reading package lists... Done
Building dependency tree       
Reading state information... Done
31 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libfakeroot
The following NEW packages will be installed:
  fakeroot libfakeroot
0 upgraded, 2 newly installed, 0 to remove and 31 not upgraded.
Need to get 88.0 kB of archives.
After this operation, 389 kB of additional disk space will be used.
Get:1 http://ports.ubuntu.com focal/main arm64 libfakeroot arm64 1.24-1 [26.0 kB]
Get:2 http://ports.ubuntu.com focal/main arm64 fakeroot arm64 1.24-1 [61.9 kB]
Fetched 88.0 kB in 2s (55.9 kB/s)   
Selecting previously unselected package libfakeroot:arm64.
(Reading database ... 10885 files and directories currently installed.)
Preparing to unpack .../libfakeroot_1.24-1_arm64.deb ...
Unpacking libfakeroot:arm64 (1.24-1) ...
Selecting previously unselected package fakeroot.
Preparing to unpack .../fakeroot_1.24-1_arm64.deb ...
Unpacking fakeroot (1.24-1) ...
Setting up libfakeroot:arm64 (1.24-1) ...
Setting up fakeroot (1.24-1) ...
update-alternatives: using /usr/bin/fakeroot-sysv to provide /usr/bin/fakeroot (fakeroot) in auto mode
Processing triggers for libc-bin (2.31-0ubuntu6) ...
semop(1): encountered an error: Function not implemented

Also, running sudo perf trace -e semop during the call to fakeroot shows nothing on the console. @clalancette does this mean the semop syscall is not making it all the way to the linux kernel?

@sloretz
Contributor Author

sloretz commented Apr 2, 2020

#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main() {
	struct sembuf sops[1];
	int semid = semget(12345, 1, IPC_CREAT|0600);

	sops[0].sem_num = 0;
	sops[0].sem_op = -1;
	sops[0].sem_flg = SEM_UNDO;

	if (semop(semid, sops, 1) == -1) {
		perror("semop");
		return 1;
	}

	return 0;
}

Expected behavior: it should block forever because "If sem_op is less than zero [...] semncnt (the counter of threads waiting for this semaphore's value to increase) is incremented by one and the thread sleeps"

Actual behavior:

$ gcc semop.c 
$ ./a.out 
semop: Function not implemented
$
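
Because the program above blocks forever on a healthy system, it is awkward to use in an automated check. A minimal non-blocking sketch, assuming Linux SysV semaphores (where a freshly created semaphore starts at 0): with IPC_NOWAIT a working semop() fails with EAGAIN, while the broken setup in this issue would report ENOSYS instead. The helper name `probe_semop_errno` is made up for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical helper: returns the errno produced by a non-blocking semop()
 * on a fresh private semaphore, 0 if semop() unexpectedly succeeds, or -1
 * if the semaphore set could not be created. */
int probe_semop_errno(void) {
	struct sembuf sop = { .sem_num = 0, .sem_op = -1, .sem_flg = IPC_NOWAIT };
	int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
	int result = 0;

	if (semid == -1)
		return -1;

	/* Assumption: Linux initializes new semaphores to 0, so the decrement
	 * cannot proceed and IPC_NOWAIT turns the sleep into EAGAIN. */
	if (semop(semid, &sop, 1) == -1)
		result = errno;

	semctl(semid, 0, IPC_RMID); /* remove the semaphore set again */
	return result;
}
```

Calling `probe_semop_errno()` from a small main() and comparing the result against EAGAIN and ENOSYS should be enough to tell a working environment from an affected one.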

@sloretz
Contributor Author

sloretz commented Apr 7, 2020

Running the program above under qemu-aarch64-static -strace gives some clues

qemu-aarch64-static -strace ./a.out 
3910 brk(NULL) = 0x0000004000012000
3910 uname(0x4000811da8) = 0
3910 faccessat(AT_FDCWD,"/etc/ld.so.preload",R_OK,AT_SYMLINK_NOFOLLOW|0x50) = -1 errno=2 (No such file or directory)
3910 openat(AT_FDCWD,"/etc/ld.so.cache",O_RDONLY|O_CLOEXEC) = 3
3910 fstat(3,0x0000004000811330) = 0
3910 mmap(NULL,15241,PROT_READ,MAP_PRIVATE,3,0) = 0x0000004000847000
3910 close(3) = 0
3910 openat(AT_FDCWD,"/lib/aarch64-linux-gnu/libc.so.6",O_RDONLY|O_CLOEXEC) = 3
3910 read(3,0x8114f0,832) = 832
3910 fstat(3,0x0000004000811390) = 0
3910 mmap(NULL,1510480,PROT_EXEC|PROT_READ,MAP_PRIVATE|MAP_DENYWRITE,3,0) = 0x000000400084b000
3910 mprotect(0x00000040009a4000,61440,PROT_NONE) = 0
3910 mmap(0x00000040009b3000,24576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_DENYWRITE|MAP_FIXED,3,0x158000) = 0x00000040009b3000
3910 mmap(0x00000040009b9000,11344,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED,-1,0) = 0x00000040009b9000
3910 close(3) = 0
3910 mmap(NULL,8192,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS,-1,0) = 0x00000040009bc000
3910 mprotect(0x00000040009b3000,12288,PROT_READ) = 0
3910 mprotect(0x0000004000010000,4096,PROT_READ) = 0
3910 mprotect(0x0000004000844000,4096,PROT_READ) = 0
3910 munmap(0x0000004000847000,15241) = 0
3910 semget(12345,1,896,274877909196,0,8354166038063564516) = 0
3910 semtimedop(0,274886370816,1,0,0,8354166038063564516) = -1 errno=38 (Function not implemented)
3910 dup(2,4222427270,32,274888117504,0,8354166038063564516) = 3
3910 fcntl(3,F_GETFL) = 2
3910 brk(NULL) = 0x0000004000012000
3910 brk(0x0000004000033000) = 0x0000004000012000
3910 mmap(NULL,1048576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS,-1,0) = 0x00000040009be000
3910 fstat(3,0x00000040008117e8) = 0
3910 write(3,0x9be480,32)semop: Function not implemented
 = 32
3910 close(3) = 0
3910 exit_group(1)

It looks like the syscall returning ENOSYS is semtimedop(). I guess this comes from glibc 2.31 inside the container implementing semop as a call to __semtimedop. This man page says

semtimedop() behaves identically to semop() except that in those cases where the calling thread would sleep, the duration of that sleep is limited by the amount of elapsed time specified by the timespec structure whose address is passed in the timeout argument.
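
To illustrate the difference the man page describes, here is a rough sketch assuming Linux and glibc: the same decrement on a zero-valued semaphore, but with a timeout, so a working semtimedop() gives up with EAGAIN once the time limit is reached instead of sleeping forever. `timed_wait_errno` is a hypothetical helper name.

```c
#define _GNU_SOURCE  /* needed for the semtimedop() declaration */
#include <assert.h>
#include <errno.h>
#include <time.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical helper: waits up to timeout_ms on a fresh private semaphore
 * and returns the resulting errno (0 on unexpected success, -1 if the
 * semaphore set could not be created). */
int timed_wait_errno(long timeout_ms) {
	struct sembuf sop = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
	struct timespec timeout;
	int semid;
	int result = 0;

	assert(timeout_ms >= 0);
	timeout.tv_sec = timeout_ms / 1000;
	timeout.tv_nsec = (timeout_ms % 1000) * 1000000L;

	semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
	if (semid == -1)
		return -1;

	/* Nothing ever increments the semaphore, so the call can only end
	 * by timing out (EAGAIN) or by failing outright (e.g. ENOSYS). */
	if (semtimedop(semid, &sop, 1, &timeout) == -1)
		result = errno;

	semctl(semid, 0, IPC_RMID); /* remove the semaphore set again */
	return result;
}
```

Passing NULL instead of `&timeout` makes the call behave like plain semop(), which is the case that matters in this issue.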

Since semtimedop() takes a struct timespec as an argument, I wonder if this bit in the GNU C Library 2.31 changelog is related

  • System call wrappers for time system calls now use the new time64 system
    calls when available. On 32-bit targets, these wrappers attempt to call
    the new system calls first and fall back to the older 32-bit time system
    calls if they are not present. This may cause issues in environments
    that cannot handle unsupported system calls gracefully by returning
    -ENOSYS. Seccomp sandboxes are affected by this issue.
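
The try-first-then-fall-back pattern from the changelog can be sketched for the semaphore case like this. It is not glibc's code, just an illustration under the assumption that the target defines both SYS_semtimedop and SYS_semop (aarch64 and x86_64 do); `semop_with_fallback` is a made-up name.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/syscall.h>

/* Hypothetical wrapper: try semtimedop with a NULL timeout first and fall
 * back to the plain semop syscall only when the first one reports ENOSYS. */
int semop_with_fallback(int semid, struct sembuf *sops, size_t nsops) {
#ifdef SYS_semtimedop
	long ret = syscall(SYS_semtimedop, semid, sops, nsops, NULL);
	if (ret != -1 || errno != ENOSYS)
		return (int)ret; /* success, or a real error such as EAGAIN */
#endif
#ifdef SYS_semop
	return (int)syscall(SYS_semop, semid, sops, nsops);
#else
	errno = ENOSYS; /* neither syscall is available on this target */
	return -1;
#endif
}
```

The fallback only helps when whatever sits between the program and the kernel answers the unknown syscall with ENOSYS, which is what the strace output above shows happening here.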

@sloretz
Contributor Author

sloretz commented Apr 7, 2020

Minimal example calling semtimedop directly. Same behavior as before: this works outside the container (blocks forever as expected) but errors when run inside the container.

#define _GNU_SOURCE  // Compiler warns semtimedop is implicitly declared without this

#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main() {
	struct sembuf sops[1];
	int semid = semget(12345, 1, IPC_CREAT|0600);

	sops[0].sem_num = 0;
	sops[0].sem_op = -1;
	sops[0].sem_flg = SEM_UNDO;

	if (semtimedop(semid, sops, 1, NULL) == -1) {
		perror("semtimedop");
		return 1;
	}

	return 0;
}

root@c9b5a3fddeba:/tmp# gcc semop.c 
root@c9b5a3fddeba:/tmp# ./a.out 
semtimedop: Function not implemented

@sloretz
Contributor Author

sloretz commented Apr 7, 2020

Adding a sleep to the above program and running sudo perf trace -a -p <pid> shows the semget() call, but nothing for the semtimedop() call. I think that means qemu must be the one returning ENOSYS.

         ? (     ?   ): a.out/23838  ... [continued]: clock_nanosleep()) = 0
     0.282 ( 0.008 ms): a.out/23838 semget(key: 12345, nsems: 1, semflg: 896                              ) = 0
     0.610 ( 0.006 ms): a.out/23838 dup(fildes: 2                                                         ) = 3
     0.888 ( 0.008 ms): a.out/23838 fcntl(fd: 3</dev/pts/0>, cmd: GETFL                                   ) = RDWR|LARGEFILE
     3.903 ( 0.030 ms): a.out/23838 mmap(addr: 0x4000012000, len: 135168, flags: PRIVATE|ANONYMOUS|NORESERVE) = 0x7fc2a3bef000
     3.940 ( 0.021 ms): a.out/23838 mmap(addr: 0x7fc2a3bef000, len: 135168, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS|FIXED) = 0x7fc2a3bef000
     3.991 ( 0.007 ms): a.out/23838 brk(brk: 0x62ed3000                                                   ) = 0x62ed3000
     4.035 ( 0.014 ms): a.out/23838 munmap(addr: 0x7fc2a3bef000, len: 135168                              ) = 0
     4.395 ( 0.012 ms): a.out/23838 mmap(addr: 0x40009be000, len: 1048576, flags: PRIVATE|ANONYMOUS|NORESERVE) = 0x40009be000
     4.413 ( 0.014 ms): a.out/23838 mmap(addr: 0x40009be000, len: 1048576, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS|FIXED) = 0x40009be000
     9.497 ( 0.023 ms): a.out/23838 fstat(fd: 3</dev/pts/0>, statbuf: 0x7ffd8a99cb10                      ) = 0
    11.430 ( 0.007 ms): a.out/23838 write(fd: 3</dev/pts/0>, buf: 0x40009be480, count: 37                 ) = 37
    12.015 ( 0.003 ms): a.out/23838 close(fd: 3</dev/pts/0>                                               ) = 0
    13.150 (     ?   ): a.out/23838 exit_group(error_code: 1                                              )
         ? (     ?   ): a.out/23840  ... [continued]: futex()) = -1 (null) INTERNAL ERROR: strerror_r(512, [buf], 128)=22

Sleep-first code:

#define _GNU_SOURCE  // Compiler warns semtimedop is implicitly declared without this

#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>

int main() {
	sleep(15);
	struct sembuf sops[1];
	int semid = semget(12345, 1, IPC_CREAT|0600);

	sops[0].sem_num = 0;
	sops[0].sem_op = -1;
	sops[0].sem_flg = SEM_UNDO;

	if (semtimedop(semid, sops, 1, NULL) == -1) {
		perror("semtimedop");
		return 1;
	}

	return 0;
}

@sloretz
Contributor Author

sloretz commented Apr 7, 2020

I can reproduce the issue outside of a docker container by building semop.c above with the following steps:

  1. Launch container described above
  2. Build semop.c statically: gcc -ggdb -static semop.c
  3. Copy it out of the container with docker cp <container id>:/tmp/a.out /tmp/a.out.aarch64
  4. Run qemu-aarch64-static /tmp/a.out.aarch64 (using qemu-user-static/bionic-updates,bionic-security,now 1:2.11+dfsg-1ubuntu7.23 amd64)

Output:

qemu-aarch64-static /tmp/a.out.aarch64 
qemu: Unsupported syscall: 192
semtimedop: Function not implemented

If I build qemu master from source with

../configure --target-list=aarch64-linux-user --static --disable-system --enable-linux-user
make -j8

Then in the aarch64-linux-user directory run ./qemu-aarch64 /tmp/a.out.aarch64 I get the output:

> ./qemu-aarch64 /tmp/a.out.aarch64
semtimedop: Function not implemented

@sloretz
Contributor Author

sloretz commented Apr 7, 2020

Getting the example out of the docker container allowed me to run gdb with gdb --args ./qemu-aarch64 -strace /tmp/a.out.aarch64. I couldn't do it before because gdb in the container needs ptrace() but qemu doesn't implement that, and gdb outside the container wasn't able to attach to a.out, probably because it's a different architecture.

It looks like the root cause is semtimedop is unimplemented for aarch64-linux-user in qemu.

#0  do_syscall1 (cpu_env=cpu_env@entry=0xc97db0, num=num@entry=192, arg1=arg1@entry=0, arg2=arg2@entry=365080608512, arg3=1, arg4=0, arg5=0, arg6=-8853709054069353085, arg8=0, arg7=0)
    at /tmp/qemu/linux-user/syscall.c:12413
#1  0x00000000004725c9 in do_syscall (cpu_env=cpu_env@entry=0xc97db0, num=192, arg1=0, arg2=365080608512, arg3=<optimized out>, arg4=<optimized out>, arg5=0, arg6=-8853709054069353085, arg7=0, arg8=0)
    at /tmp/qemu/linux-user/syscall.c:12448
#2  0x000000000047e958 in cpu_loop (env=env@entry=0xc97db0) at /tmp/qemu/linux-user/aarch64/cpu_loop.c:90
#3  0x000000000040affc in main (argc=<optimized out>, argv=0x7fffffffe008, envp=<optimized out>) at /tmp/qemu/linux-user/main.c:872

I think what happened is we were using glibc 2.30 which implements semop() as a syscall to semop(), but then Ubuntu Focal bumped the version to glibc 2.31 which implements semop() as a syscall to semtimedop(). This is a problem for us because qemu 3.1 (and master) implements semop() in do_syscall1(), but does not implement semtimedop(). I think the fix will have to be to make qemu implement semtimedop() in linux-user.
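
One way to check which of the two syscalls an emulator actually handles is to bypass glibc and issue both directly. This is a sketch, assuming a target where SYS_semop and SYS_semtimedop are both defined; the helper names are hypothetical. With IPC_NOWAIT a handled syscall should come back with EAGAIN, an unhandled one with ENOSYS.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/syscall.h>

/* Issue one raw, non-blocking semaphore syscall on a fresh private
 * semaphore and return the resulting errno (0 on success, -1 if the
 * semaphore set could not be created). */
static int try_raw(long nr, int pass_timeout) {
	struct sembuf sop = { .sem_num = 0, .sem_op = -1, .sem_flg = IPC_NOWAIT };
	int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
	long ret;
	int result = 0;

	if (semid == -1)
		return -1;

	/* semtimedop takes a fourth argument (the timeout); semop does not. */
	ret = pass_timeout ? syscall(nr, semid, &sop, (size_t)1, NULL)
	                   : syscall(nr, semid, &sop, (size_t)1);
	if (ret == -1)
		result = errno;

	semctl(semid, 0, IPC_RMID); /* remove the semaphore set again */
	return result;
}

int raw_semop_errno(void) {
#ifdef SYS_semop
	return try_raw(SYS_semop, 0);
#else
	return ENOSYS; /* this target has no separate semop syscall */
#endif
}

int raw_semtimedop_errno(void) {
#ifdef SYS_semtimedop
	return try_raw(SYS_semtimedop, 1);
#else
	return ENOSYS; /* this target has no semtimedop syscall number */
#endif
}

/* Print one line per syscall so the two results can be compared. */
void report_raw_sem_syscalls(void) {
	printf("semop:      %s\n", strerror(raw_semop_errno()));
	printf("semtimedop: %s\n", strerror(raw_semtimedop_errno()));
}
```

If the analysis above is right, running `report_raw_sem_syscalls()` under an affected qemu-aarch64 should show ENOSYS only on the semtimedop line.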

@clalancette
Contributor

Nice analysis, @sloretz .

The "correct" way to fix this does seem to be to implement semtimedop in qemu. So I suggest we follow that path.

However, we may be able to do a short-term workaround as well. If we do an LD_PRELOAD inside of the qemu userland, and redirect the implementation of semop so it calls our own semop (which then calls the semop syscall instead of semtimedop), then I think we can work around this.

@sloretz
Contributor Author

sloretz commented Apr 7, 2020

However, we may be able to do a short-term workaround as well. If we do an LD_PRELOAD inside of the qemu userland, and redirect the implementation of semop so it calls our own semop (which then calls the semop syscall instead of semtimedop), then I think we can work around this.

It looks like that works! Now to add it to the Dockerfile

#define _GNU_SOURCE
#include <unistd.h>
#include <asm/unistd.h>
#include <sys/sem.h>      /* struct sembuf and the semop() prototype */
#include <sys/syscall.h>


/* glibc 2.31 wraps semop() as a call to semtimedop() with the timespec set to NULL
 * qemu 3.1 doesn't support semtimedop(), so this wrapper syscalls the real semop()
 */
int semop(int semid, struct sembuf *sops, size_t nsops)
{
  /* nsops is size_t to match the prototype declared in <sys/sem.h> */
  return syscall(__NR_semop, semid, sops, nsops);
}

gcc -fPIC -shared -o libpreload-semop.so wrap_semop.c
LD_PRELOAD=/tmp/libpreload-semop.so ./a.out

@M-Reimer

M-Reimer commented May 3, 2020

I'm happy to have found this discussion. I'm using qemu to autobuild packages for Arch Linux ARM. I use qemu 4.2 for my builds which still seems to have this problem:

https://travis-ci.com/github/VDR4Arch/vdr4arch/jobs/326884620#L1596

I've filed an upstream bug to find out the current status of fixing this directly in qemu:

https://bugs.launchpad.net/qemu/+bug/1876568

Edit: Your workaround does fix my issue, so I think it is proven that qemu 4.2 still has the issue.

This is the fix I use for my build tool:
https://github.com/M-Reimer/repo-make/blob/master/repo-make-ci.sh#L252-L274
