osrf/ubuntu_armhf:focal fails to run on native ARM hardware #38

Open
nuclearsandwich opened this issue Apr 27, 2020 · 14 comments

Comments

@nuclearsandwich
Member

armhf builds on ci.ros2.org currently fail in the Build Dockerfile step with an error like the following:

00:00:01.726 Step 13/58 : RUN apt-get update && apt-get install --no-install-recommends -y   lsb-release net-tools sudo   curl   gnupg2   apt-transport-https
00:00:01.865  ---> Running in eacf1551b457
00:00:02.777 Get:1 http://ports.ubuntu.com focal InRelease [265 kB]
00:00:03.149 Get:2 http://ports.ubuntu.com focal-updates InRelease [89.1 kB]
00:00:03.202 Err:1 http://ports.ubuntu.com focal InRelease
00:00:03.202   At least one invalid signature was encountered.
00:00:03.229 Get:3 http://ports.ubuntu.com focal-backports InRelease [89.2 kB]
00:00:03.289 Err:2 http://ports.ubuntu.com focal-updates InRelease
00:00:03.289   At least one invalid signature was encountered.
00:00:03.374 Err:3 http://ports.ubuntu.com focal-backports InRelease
00:00:03.374   At least one invalid signature was encountered.
00:00:03.380 Fetched 443 kB in 49710d 6h 26min 44s (0 B/s)
00:00:03.380 Reading package lists...
00:00:04.773 W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: http://ports.ubuntu.com focal InRelease: At least one invalid signature was encountered.
00:00:04.773 W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: http://ports.ubuntu.com focal-updates InRelease: At least one invalid signature was encountered.
00:00:04.773 W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: http://ports.ubuntu.com focal-backports InRelease: At least one invalid signature was encountered.
00:00:04.773 W: Failed to fetch http://ports.ubuntu.com/dists/focal/InRelease  At least one invalid signature was encountered.
00:00:04.773 W: Failed to fetch http://ports.ubuntu.com/dists/focal-updates/InRelease  At least one invalid signature was encountered.
00:00:04.773 W: Failed to fetch http://ports.ubuntu.com/dists/focal-backports/InRelease  At least one invalid signature was encountered.
00:00:04.773 W: Some index files failed to download. They have been ignored, or old ones used instead.

When running the image on ARM hardware, such as an EC2 a1.xlarge instance, operations such as touch /tmp/foo fail with:

touch: setting times of '/tmp/foo': Operation not permitted

Running the image on an amd64 host (which is possible with binfmt and the bundled qemu-arm-static binary present on the image) does not exhibit these issues.

Using my local workstation to run the image via qemu, I created an additional layer which has strace installed:

FROM osrf/ubuntu_armhf:focal
RUN apt update && apt install -y strace

I pushed that to a container registry to try on my ARM host and strace the issue:

docker run -ti --cap-add SYS_PTRACE nuclearsandwich/ubuntu_armhf:focal
#  strace touch /tmp/foo
execve("/usr/bin/touch", ["touch", "/tmp/foo"], 0xffc43564 /* 10 vars */) = 0
brk(NULL)                               = 0x266b000
uname({sysname="Linux", nodename="ea0637374bb7", ...}) = 0
access("/etc/ld.so.preload", R_OK)      = 0
openat(AT_FDCWD, "/etc/ld.so.preload", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=25, ...}) = 0
mmap2(NULL, 25, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xf7b17000
close(3)                                = 0
openat(AT_FDCWD, "/opt/libpreload-semop.so", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\10\3\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=7116, ...}) = 0
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7b15000
mmap2(NULL, 69680, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xf7add000
mprotect(0xf7ade000, 61440, PROT_NONE)  = 0
mmap2(0xf7aed000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0xf7aed000
close(3)                                = 0
munmap(0xf7b17000, 25)                  = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=14689, ...}) = 0
mmap2(NULL, 14689, PROT_READ, MAP_PRIVATE, 3, 0) = 0xf7b11000
close(3)                                = 0
openat(AT_FDCWD, "/lib/arm-linux-gnueabihf/libc.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\331\252\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=972500, ...}) = 0
mmap2(NULL, 1038540, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xf79df000
mprotect(0xf7ac7000, 65536, PROT_NONE)  = 0
mmap2(0xf7ad7000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe8000) = 0xf7ad7000
mmap2(0xf7adb000, 6348, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xf7adb000
close(3)                                = 0
set_tls(0xf7b15f50)                     = 0
mprotect(0xf7ad7000, 8192, PROT_READ)   = 0
mprotect(0xf7aed000, 4096, PROT_READ)   = 0
mprotect(0xb14000, 4096, PROT_READ)     = 0
mprotect(0xf7b18000, 4096, PROT_READ)   = 0
munmap(0xf7b11000, 14689)               = 0
brk(NULL)                               = 0x266b000
brk(0x268c000)                          = 0x268c000
openat(AT_FDCWD, "/tmp/foo", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK|O_LARGEFILE, 0666) = 3
dup2(3, 0)                              = 0
close(3)                                = 0
syscall_0x19c(0, 0, 0, 0, 0, 0)         = -1 EPERM (Operation not permitted)
close(0)                                = 0
write(2, "touch: ", 7touch: )                  = 7
write(2, "setting times of '/tmp/foo'", 27setting times of '/tmp/foo') = 27
write(2, ": Operation not permitted", 25: Operation not permitted) = 25
write(2, "\n", 1
)                       = 1
close(1)                                = 0
close(2)                                = 0
exit_group(1)                           = ?
+++ exited with 1 +++

I have no idea what syscall_0x19c is supposed to be, but it is the call that generates the EPERM.
Running apt update under strace gives a longer trace with a similar failure, but on syscall_0x193.

#37 recently introduced an ld preload wrapper to resolve a glibc/system call issue with qemu, but I've built an image without that addition and the problem remains.

I've tried the image on an ARM host running 16.04 and 18.04 but not with 20.04 yet.

@zmichaels11

zmichaels11 commented Apr 28, 2020

I'm not sure if it's related, but the APT sources differ from the ones used by arm32v7/ubuntu:focal

arm32v7/ubuntu:focal

root@8dcf34e5b8ed:/# cat /etc/apt/sources.list
# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to
# newer versions of the distribution.
deb http://ports.ubuntu.com/ubuntu-ports/ focal main restricted
# deb-src http://ports.ubuntu.com/ubuntu-ports/ focal main restricted

## Major bug fix updates produced after the final release of the
## distribution.
deb http://ports.ubuntu.com/ubuntu-ports/ focal-updates main restricted
# deb-src http://ports.ubuntu.com/ubuntu-ports/ focal-updates main restricted

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team. Also, please note that software in universe WILL NOT receive any
## review or updates from the Ubuntu security team.
deb http://ports.ubuntu.com/ubuntu-ports/ focal universe
# deb-src http://ports.ubuntu.com/ubuntu-ports/ focal universe
deb http://ports.ubuntu.com/ubuntu-ports/ focal-updates universe
# deb-src http://ports.ubuntu.com/ubuntu-ports/ focal-updates universe

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.
deb http://ports.ubuntu.com/ubuntu-ports/ focal multiverse
# deb-src http://ports.ubuntu.com/ubuntu-ports/ focal multiverse
deb http://ports.ubuntu.com/ubuntu-ports/ focal-updates multiverse
# deb-src http://ports.ubuntu.com/ubuntu-ports/ focal-updates multiverse

## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
## Also, please note that software in backports WILL NOT receive any review
## or updates from the Ubuntu security team.
deb http://ports.ubuntu.com/ubuntu-ports/ focal-backports main restricted universe multiverse
# deb-src http://ports.ubuntu.com/ubuntu-ports/ focal-backports main restricted universe multiverse

## Uncomment the following two lines to add software from Canonical's
## 'partner' repository.
## This software is not part of Ubuntu, but is offered by Canonical and the
## respective vendors as a service to Ubuntu users.
# deb http://archive.canonical.com/ubuntu focal partner
# deb-src http://archive.canonical.com/ubuntu focal partner

deb http://ports.ubuntu.com/ubuntu-ports/ focal-security main restricted
# deb-src http://ports.ubuntu.com/ubuntu-ports/ focal-security main restricted
deb http://ports.ubuntu.com/ubuntu-ports/ focal-security universe
# deb-src http://ports.ubuntu.com/ubuntu-ports/ focal-security universe
deb http://ports.ubuntu.com/ubuntu-ports/ focal-security multiverse
# deb-src http://ports.ubuntu.com/ubuntu-ports/ focal-security multiverse

nuclearsandwich/ubuntu_armhf:focal

root@95d4d9343fb4:/# cat /etc/apt/sources.list
deb http://ports.ubuntu.com focal main restricted universe multiverse
deb http://ports.ubuntu.com focal-updates main restricted universe multiverse
deb http://ports.ubuntu.com focal-backports main restricted universe multiverse

Edit:
This seems irrelevant, since apt-get update still performs as expected after changing /etc/apt/sources.list in arm32v7/ubuntu:focal.

@zmichaels11

I found a similar bug here: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1867675
Following the conversation, it's mentioned that a new syscall was introduced in Linux 5.x, and my host is running Linux 4.15.
I'm attempting to update the host to Ubuntu Focal.

@zmichaels11

Updating to Ubuntu:Focal on an arm64 host did not fix the issue.

@zmichaels11

Found a workaround: add --security-opt seccomp=unconfined to the docker run command.
The root cause seems to be a Docker issue with seccomp (see: moby/moby#40734).

@thomas-moulard

@nuclearsandwich This seems to be an infra issue on the Open Robotics side. Is there any additional action needed on our end?

@nuclearsandwich
Member Author

@zmichaels11 thanks for identifying a workaround. Do you happen to know if seccomp must be entirely disabled with unconfined or is there a specific set of system calls we can lift the restrictions on?

This seems to be an infra issue on the Open Robotics side. Is there any additional action needed on our end?

I think it's back on us to find out if running with seccomp=unconfined is going to work on our farms. IDing the issue was a big help.

@zmichaels11

I don't know which syscall is being blocked by seccomp, and I also don't know if it's possible to partially disable seccomp.

@clalancette
Contributor

I've taken a look at what is going on here.

Looking at the strace output from #38 (comment) , the problem is clearly that the syscall 0x19c failed. Looking through the linked ticket, likely what is happening is that the newer libc in Ubuntu focal is calling that syscall, but the older kernel we are using on our buildfarm (probably Xenial-era) doesn't understand it. Seccomp then likely denies it.

Looking a bit deeper, I found syscall 0x19c (decimal 412) here: https://github.com/torvalds/linux/blob/c45e8bccecaf633480d378daff11e122dfd5e96d/arch/arm64/include/asm/unistd32.h#L836 . It's the utimensat_time64 system call, which is used for setting the timestamps on a file. This is part of the new 64-bit time support for 32-bit platforms in the kernel (to solve the year 2038 problem).
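
For decoding other raw numbers that strace prints on an older host, a small lookup table may help. This is only a sketch and not part of any tooling in this repository: the helper name time64_syscall_name is made up for illustration, and the table holds just a handful of the 32-bit time64 entries, with numbers taken from the kernel's 32-bit ARM syscall table linked above.

```c
#include <stddef.h>

/* Partial, illustrative table of 32-bit ARM time64 system call numbers.
 * Extend it from the kernel's unistd32.h as further numbers show up. */
struct syscall_name {
    long nr;
    const char *name;
};

static const struct syscall_name time64_syscalls[] = {
    { 403, "clock_gettime64" },
    { 404, "clock_settime64" },
    { 407, "clock_nanosleep_time64" },
    { 412, "utimensat_time64" },
    { 422, "futex_time64" },
};

/* Hypothetical helper: return the name for nr, or NULL if it is not
 * in the partial table above. */
const char *time64_syscall_name(long nr)
{
    size_t i;

    for (i = 0; i < sizeof(time64_syscalls) / sizeof(time64_syscalls[0]); i++) {
        if (time64_syscalls[i].nr == nr)
            return time64_syscalls[i].name;
    }
    return NULL;
}
```

With this table, time64_syscall_name(0x19c) gives utimensat_time64, and the 0x193 reported earlier in the thread (decimal 403) maps to clock_gettime64.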

However, we don't actually care too much about year 2038 support here. We won't be running on the old Xenial kernel forever, and I hope this buildfarm will be long retired before 2038. So I think an alternate workaround here may be to do an LD_PRELOAD that captures that syscall and redirects it to the utimensat (non-64bit) system call. This may be a bit of a game of whack-a-mole, as we may have to do this for several system calls in this area, but the end result is that we won't have to disable seccomp across the board.

I'll spend a little time here trying to come up with something. If I can't finish it today, I'll leave my findings here later.

@clalancette
Contributor

I went down a couple of blind alleys here, but here's what I have so far. First, it looks like touch uses futimens, which eventually gets around to calling utimensat_time64 as appropriate. So that's what we need to overload. Here's the code:

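#include <unistd.h>
#include <asm/unistd.h>
#include <sys/syscall.h>
#include <time.h>

/* Override libc's futimens so that it uses the older utimensat system call
 * instead of utimensat_time64, which is the call being denied. */
int futimens(int fd, const struct timespec tsp[2])
{
    /* A NULL path makes utimensat operate on fd itself. tsp is passed
     * through unchanged, so a NULL tsp still means "set both times to now". */
    return syscall(__NR_utimensat, fd, NULL, tsp, 0);
}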

Putting it in a file like wrap_futimens.c, we can compile it:

gcc -fPIC -shared -o wrap_futimens.so wrap_futimens.c

Then, using the same procedure as

chroot $chroot_dir echo /opt/libpreload-semop.so > $chroot_dir/etc/ld.so.preload

we can register it to overload the calls as necessary.

That's as far as I was able to get today. I haven't tested on Docker to see if it lets touch get further, but I'm running out of time and knowledge. I'll follow up with @nuclearsandwich next week and see if we can test this.
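
If the whack-a-mole approach is pursued, the next wrapper might follow the same pattern as wrap_futimens.c above. The sketch below is untested on the buildfarm and rests on two assumptions: that the 0x193 call seen under apt is clock_gettime64 (decimal 403), and that intercepting glibc's public clock_gettime is enough. The file name wrap_clock_gettime.c is hypothetical. Callers inside glibc that reach the system call without going through the public symbol would not be caught by a preload like this.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <time.h>

/* Hypothetical wrap_clock_gettime.c: route clock_gettime through the
 * older clock_gettime system call rather than clock_gettime64.
 * Assumes struct timespec matches the layout the old call expects,
 * which holds when time_t is the platform's native width. */
int clock_gettime(clockid_t clk, struct timespec *ts)
{
    return syscall(__NR_clock_gettime, clk, ts);
}
```

It would be compiled and registered the same way as the futimens wrapper, with one line per library in /etc/ld.so.preload.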

@nuclearsandwich
Member Author

Thanks to @zmichaels11 for finding the seccomp behavior. The mystery system calls in my strace results were the new time64 system calls. The EPERM failures are still present when running the focal containers on a focal host, but there the system call names are properly resolved when they are denied. Docker has an updated seccomp profile which allows these calls, and moby/moby#40769 backports it to 19.03. We can deploy that profile to our agents without waiting for a release that includes it and without interfering with operation on other platforms. I like that solution much better than allowing containers to make arbitrary system calls. However, when actually testing with the updated profile, we still hit the EPERM issue:

$ docker run -ti --security-opt seccomp=/tmp/seccomp-time64.json osrf/ubuntu_armhf:focal touch /tmp/foo
touch: setting times of '/tmp/foo': Operation not permitted

There is also mysterious behavior that isn't outright failure.

ubuntu@ip-10-0-2-231:~$ date
Tue May  5 13:10:55 UTC 2020
ubuntu@ip-10-0-2-231:~$ docker run osrf/ubuntu_armhf:focal date
Thu May 14 11:30:03 UTC 1970

Both of these anomalies disappear when running with seccomp=unconfined. The next step is to figure out why using the explicit profile isn't working, whether it's operator error or something missing from the profile.
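
One way to narrow this down might be to issue one of the time64 calls by number from inside the container and look at the errno directly, instead of going through touch or date. The probe below is a minimal sketch under stated assumptions: probe_clock_gettime64 is a made-up name, 403 is the clock_gettime64 number on 32-bit ARM, and the binary is built for armhf so that the number means what we expect.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical probe: issue clock_gettime64 (403 on 32-bit ARM) by number.
 * Returns 0 if the call succeeded, otherwise the errno it failed with. */
int probe_clock_gettime64(void)
{
    /* The time64 call writes 64-bit seconds and 64-bit nanoseconds. */
    long long ts[2] = {0, 0};

    errno = 0;
    if (syscall(403, 0 /* CLOCK_REALTIME */, ts) == 0)
        return 0;
    return errno;
}
```

A return of EPERM would match the denial seen in the strace output above, while ENOSYS would suggest that the kernel itself does not implement the call for this binary. Comparing the result under the default profile, the explicit profile and seccomp=unconfined should show whether the explicit profile is being applied at all.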

@ruffsl
Member

ruffsl commented May 18, 2020

Running the image on an amd64 host (which is possible with binfmt and the bundled qemu-arm-static binary present on the image) does not exhibit these issues.

I've tried the image on an ARM host running 16.04 and 18.04 but not with 20.04 yet.

Alternatively, would it be possible to just stick with an amd64 host for the Docker daemon with static QEMU installed, or to update the ARM build farm workers to use the same kernel version as the targeted LTS?

https://discourse.ros.org/t/announcing-ros-docker-images-for-arm-and-debian/2467

@matthews-jca

Just piping up to note that this bug is affecting our ability to port existing code and work on new projects where we are running ROS2 on arm32 (and arm64, amd64) embedded hardware platforms. Are there any known workarounds?

jacobperron added a commit to ros2/ros_buildfarm_config that referenced this issue Jun 22, 2020
This can be reenabled when build issues with Ubuntu Focal on armhf are resolved.
See osrf/multiarch-docker-image-generation#38.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@AchmadFathoni

@matthews-jca downgrade to Ubuntu 18.04 and install ROS2 Eloquent. It's been a few months; do you have a better workaround?

@matthews-jca

Unfortunately, no workaround yet. We are looking at doing our own builds of Foxy internally from a Debian 10 base in the next couple of months.
