osrf/ubuntu_armhf:focal fails to run on native ARM hardware #38
I'm not sure if it's related, but the APT sources differ from the ones used by arm32v7/ubuntu:focal nuclearsandwich/ubuntu_armhf:focal Edit:
I found a similar bug here: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1867675
Updating to Ubuntu Focal on an arm64 host did not fix the issue.
Found a workaround: add
@nuclearsandwich This seems to be an infra issue on the Open Robotics side. Is there any additional action needed on our end?
@zmichaels11 Thanks for identifying a workaround. Do you happen to know whether seccomp must be entirely disabled with unconfined, or is there a specific set of system calls we can lift the restrictions on?
I think it's back on us to find out whether running with seccomp=unconfined is going to work on our farms. Identifying the issue was a big help.
I don't know which syscall is being blocked by seccomp, and I also don't know whether it's possible to partially disable seccomp.
I've taken a look at what is going on here. Looking at the strace output from #38 (comment), the problem is clearly that syscall 0x19c failed. Judging from the linked ticket, what is likely happening is that the newer libc in Ubuntu Focal calls that syscall, but the older kernel we are using on our buildfarm (probably Xenial-era) doesn't understand it, and seccomp then likely denies it. Looking a bit deeper, I found syscall 0x19c (decimal 412) here: https://github.com/torvalds/linux/blob/c45e8bccecaf633480d378daff11e122dfd5e96d/arch/arm64/include/asm/unistd32.h#L836
It's the

However, we don't actually care too much about year 2038 support here. We won't be running on the old Xenial kernel forever, and I hope this buildfarm will be long retired before 2038. So I think an alternate workaround may be to do an LD_PRELOAD that captures that syscall and redirects it to the

I'll spend a little time trying to come up with something. If I can't finish it today, I'll leave my findings here later.
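The hex-to-decimal step above can be checked with a small sketch. It assumes strace's fallback spelling syscall_0x<hex> for calls it cannot name. The two-entry table is a partial excerpt of the 32-bit ARM numbering from the kernel's syscall table, added here for illustration (412 is utimensat_time64, and 403, the number behind the syscall_0x193 seen with apt, is clock_gettime64). decode_raw_syscall is a hypothetical helper name.

```python
# Hypothetical helper: decode the raw syscall numbers that strace prints as
# "syscall_0x19c" when it cannot resolve a name.
# Partial, illustrative excerpt of the 32-bit ARM compat numbering
# (arch/arm64/include/asm/unistd32.h); only the two calls seen in this issue.
ARM32_TIME64_SYSCALLS = {
    403: "clock_gettime64",
    412: "utimensat_time64",
}


def decode_raw_syscall(token: str) -> tuple:
    """Turn a token such as 'syscall_0x19c' into (number, name)."""
    prefix = "syscall_"
    if not token.startswith(prefix):
        raise ValueError(f"unexpected token: {token!r}")
    # int() with base 16 accepts the leading '0x'.
    number = int(token[len(prefix):], 16)
    return number, ARM32_TIME64_SYSCALLS.get(number, "unknown")


if __name__ == "__main__":
    for raw in ("syscall_0x19c", "syscall_0x193"):
        print(raw, "->", decode_raw_syscall(raw))
```

Since touch sets file timestamps through utimensat, a denied utimensat_time64 would be consistent with touch /tmp/foo failing.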
I went down a couple of blind alleys here, but here's what I have so far. First, it looks like Putting it in a file like Then using the same procedure as
That's as far as I was able to get today. I haven't tested on Docker to see if it lets
Thanks to @zmichaels11 for finding the seccomp behavior. The mystery system calls in my strace results were the new time64 system calls. The EPERM failures are still present when running the focal containers on a focal host, but the system call names are now properly resolved when they are denied. Docker has an updated seccomp profile that allows these calls, and moby/moby#40769 backports it to 19.03. We can deploy that profile to our agents without waiting for a release that includes it, and without interfering with operation on other platforms. I like that solution much better than allowing containers to make arbitrary system calls. However, when actually testing with the updated profile, we still hit the EPERM issue. There is also mysterious behavior that isn't an outright failure. Both of these anomalies disappear when running with seccomp=unconfined. The next step is to figure out why the explicit profile isn't working: whether it's operator error or something missing from the profile.
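To make the profile approach concrete, here is a minimal sketch, assuming Docker's JSON seccomp profile layout (a top-level "defaultAction" and a "syscalls" list of rules with "names" and "action"). It copies a profile and appends one allow rule for any listed time64 syscall that no existing SCMP_ACT_ALLOW rule already names. The helper name allow_syscalls and the syscall subset are illustrative choices, not taken from moby/moby#40769, and the sketch does not by itself explain why the updated profile still produced EPERM.

```python
import copy
import json

# Illustrative subset of the 32-bit time64 syscalls (assumption: extend this
# list with whatever names strace reports as denied).
TIME64_SYSCALLS = [
    "clock_gettime64",
    "clock_nanosleep_time64",
    "futex_time64",
    "utimensat_time64",
]


def allow_syscalls(profile: dict, names: list) -> dict:
    """Return a copy of a Docker-style seccomp profile that allows `names`.

    Simplification: any existing SCMP_ACT_ALLOW rule that lists a name counts
    as covering it, even if that rule also carries argument filters.
    """
    patched = copy.deepcopy(profile)  # leave the caller's profile untouched
    rules = patched.setdefault("syscalls", [])
    already_allowed = {
        name
        for rule in rules
        if rule.get("action") == "SCMP_ACT_ALLOW"
        for name in rule.get("names", [])
    }
    missing = [name for name in names if name not in already_allowed]
    if missing:
        # One extra rule covering everything that was not yet allowed.
        rules.append({"names": missing, "action": "SCMP_ACT_ALLOW"})
    return patched


if __name__ == "__main__":
    base = {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": ["read", "clock_gettime64"], "action": "SCMP_ACT_ALLOW"}],
    }
    print(json.dumps(allow_syscalls(base, TIME64_SYSCALLS)["syscalls"][-1]))
```

The patched dictionary could then be written out with json.dump and handed to a container with docker run --security-opt seccomp=<path to the JSON file>.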
Alternatively, would it be possible to just stick with an amd64 host running the Docker daemon with static QEMU installed, or to update the ARM build farm workers to use the same kernel version as the targeted LTS? https://discourse.ros.org/t/announcing-ros-docker-images-for-arm-and-debian/2467
Just piping up to note that this bug is affecting our ability to port existing code and work on new projects where we run ROS2 on arm32 (as well as arm64 and amd64) embedded hardware platforms. Are there any known workarounds?
This can be reenabled when build issues with Ubuntu Focal on armhf are resolved. See osrf/multiarch-docker-image-generation#38. Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@matthews-jca Downgrade to Ubuntu 18.04 and install ROS2 Eloquent. It's been a few months; do you have a better workaround?
Unfortunately, no workaround yet. We are looking at doing our own internal builds of Foxy from a Debian 10 base in the next couple of months.
armhf builds on ci.ros2.org currently fail in the Build Dockerfile step with an error like the following:
When running the image on ARM hardware, such as an EC2 a1.xlarge instance, operations such as touch /tmp/foo would fail with:
Running the image on an amd64 host (which is possible with binfmt and the bundled qemu-arm-static binary present on the image) does not exhibit these issues.
Using my local workstation to run the image via QEMU, I created an additional layer that has strace installed.
I pushed that to a container registry to try on my ARM host and strace the issue:
I have no idea what syscall_0x19c is supposed to be when it generates the EPERM.
Using strace with apt update gives a longer but similar failure, this time with syscall_0x193.
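To pull the denied calls out of a longer trace such as the apt one, a minimal sketch could scan strace output for calls returning -1 EPERM. It assumes the usual strace line shape name(args) = -1 EPERM (...), with unresolved calls spelled syscall_0x<hex>; eperm_syscalls is a hypothetical helper, and the pattern is searched rather than anchored so PID or timestamp prefixes are tolerated.

```python
import re

# Assumed strace line shape (a "[pid N]" or timestamp prefix is tolerated
# because the pattern is searched, not anchored):
#   <name>(<args>) = -1 EPERM (Operation not permitted)
# Unresolved calls appear with names like syscall_0x19c, which this also matches.
EPERM_CALL = re.compile(r"\b([a-z_][a-z0-9_]*)\(.*\)\s*=\s*-1\s+EPERM\b")


def eperm_syscalls(lines):
    """Return each syscall that failed with EPERM, once, in first-seen order."""
    seen = []
    for line in lines:
        match = EPERM_CALL.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

For example, after capturing a trace with strace -f -o trace.log (trace.log being an arbitrary file name), passing the opened file to eperm_syscalls would list each denied call once, in the order it was first seen.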
#37 recently introduced an LD_PRELOAD wrapper to resolve a glibc/system call issue with QEMU, but I've built an image without that addition and the problem remains.
I've tried the image on an ARM host running 16.04 and 18.04 but not with 20.04 yet.