Only do PCL workarounds for x86 #73
Hm, that is pretty weird. We need these workarounds for aarch64 as well, and they do work on my local Pine64 board. I'll take a look at this today and see if I can figure something out. |
Are you able to build the dockerfile with this invocation locally? I suspect it would have the same issue but haven't checked myself; I've just been investigating on CI jobs. These are the jobs I was investigating with: http://ci.ros2.org/view/All/job/test_ci_turtlebot-demo_linux-aarch64/ (they will be deleted after this issue is resolved). For the first 10 jobs I was messing around with the docker cache on the host (eventually I just completely removed Realising that the cache probably wasn't the issue, for jobs 11-20 I was playing around with the job configuration. In those jobs, the only ones that successfully built+ran the docker image were ones that either:
That's what led me to the conclusion that it's the dockerfile that's causing it to not build properly, not docker issues on the host. |
Hm, this is getting stranger. If I log into the packet.net server, and run:
by hand, it builds the docker image just fine. I did have to make a few edits to the Dockerfile to make this work; namely, I had to edit FROM ubuntu:xenial to be FROM aarch64/ubuntu:xenial, and I had to comment out everything having to do with RTI, but other than that it seemed to work. I'm not sure what is going on in the context of jenkins that is causing it to fail. Still looking into it. |
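For anyone repeating the by-hand build, the base-image edit described above can be scripted rather than done in an editor. This is only a rough sketch: the function name `retarget_base_image` is made up for illustration, and it assumes the `FROM` line has the plain `FROM ubuntu:xenial` form mentioned in the comment.

```python
def retarget_base_image(dockerfile_text, old="ubuntu:xenial", new="aarch64/ubuntu:xenial"):
    """Return the Dockerfile text with `FROM <old>` rewritten to `FROM <new>`.

    Hypothetical helper, not part of the repository; it only touches FROM
    lines whose image name matches `old` exactly.
    """
    out = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        # Only rewrite a FROM instruction that names exactly the old image.
        if len(parts) >= 2 and parts[0].upper() == "FROM" and parts[1] == old:
            parts[1] = new
            line = " ".join(parts)
        out.append(line)
    return "\n".join(out)
```

Commenting out the RTI steps would still have to be done separately; this only covers the `FROM` change.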
OK. So having the RTI stuff commented out actually materially affects this issue in ways I don't understand. If I have all of the RTI stuff in the Dockerfile, I can see the problem clearly when running by hand on the packet.net server. Oddly, if I comment out the very last ADD of the rticonnext-dds_tools debian file, then things start working again. Similarly, if I comment out one or the other of the
(it doesn't matter which one), it also starts working. I can't say I understand any of this, but that last point lends itself to a workaround; a single RUN statement that does all of the workarounds at once. I don't like it, because I don't understand the underlying issue, but that could be a workaround for now. I'm going to try that out in a CI build. |
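To make the "single RUN statement" idea concrete, here is a small sketch of collapsing consecutive `RUN` instructions into one, so that they produce one image layer instead of several. It is not the actual change made for this issue: the helper name `merge_run_steps` is hypothetical, and it assumes each `RUN` sits on a single line with no backslash continuations.

```python
def merge_run_steps(dockerfile_text):
    """Collapse each streak of consecutive single-line RUN instructions into one RUN.

    Hypothetical helper for illustration; multi-line RUN instructions
    (backslash continuations) are not handled.
    """
    merged = []
    pending = []  # shell commands collected from the current streak of RUN lines

    def flush():
        # Emit the collected commands as a single RUN joined with '&&'.
        if pending:
            merged.append("RUN " + " && \\\n    ".join(pending))
            pending.clear()

    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("RUN "):
            pending.append(stripped[len("RUN "):].strip())
        else:
            flush()
            merged.append(line)
    flush()
    return "\n".join(merged)
```

Joining with `&&` keeps the fail-fast behaviour of separate `RUN` steps: if one command fails, the combined step fails too.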
It irks me that the issue is worked around by reducing the number of If we hit other errors that seem to be related to image layering, looking closer at aufs bugs might be worthwhile. |
@nuclearsandwich It also concerns me a lot. There is clearly something wrong, because removing some combination of ADD and RUN statements makes the thing work. Further, this only happens on aarch64, so I'm pretty convinced it is a bug in the lower layers (Docker, aufs, or the kernel). Unfortunately, I can't afford to spend another day mucking around with this, so I'd like to go with the workaround for now. |
All right, the turtlebot builds that actually use PCL work (the unstable bit of that is because of a slight packaging bug in cartographer, which I will address later). One more CI, just to check that we don't affect the "regular" jobs, then I'll open a new PR with my changes: |
Yeah that's fine, I just wanted to put a breadcrumb down before I even forgot that I once knew of errors like that. |
You might be seeing this one: ros-infrastructure/ros_buildfarm#377 |
@dirk-thomas Oh, interesting. Let me try that out. |
The magic fix didn't work. Also, looking at the linked github issue, the symptoms are different. I think we'll still have to go with my workaround for now. |
Looks like #71 broke the new turtlebot CI job for aarch64: ![Build Status](https://camo.githubusercontent.com/8982af223fe23b80427b65aacd37db476757e9df0d25ce30246a7cc3db3aca21/687474703a2f2f63692e726f73322e6f72672f6275696c645374617475732f69636f6e3f6a6f623d63695f747572746c65626f742d64656d6f5f6c696e75782d61617263683634266275696c643d3132)
It's a docker mounting error (`error creating aufs mount to /var/lib/docker/aufs/mnt/25641437ee2e6f787c6877d8df5ca9441c85594145074c71f3d0ca27082f484d: invalid argument`) but it persists even with a clean cache. From testing it seems to only have issues mounting if these lines are run (e.g. if `INSTALL_TURTLEBOT2_DEMO_DEPS` is false it's fine).
I don't have quite enough context to know if this is the appropriate fix (maybe these lines need to be completely reworked?), but it's one thing that works. @clalancette could you provide input please?
Turtlebot CI jobs:
linux: ![Build Status](https://camo.githubusercontent.com/0eba54fa7890a2eeb3171e5503e8332eb551873b34ba8a6e873fd6f8bfe970b1/687474703a2f2f63692e726f73322e6f72672f6275696c645374617475732f69636f6e3f6a6f623d63695f747572746c65626f742d64656d6f5f6c696e7578266275696c643d3130)
linux-aarch64: ![Build Status](https://camo.githubusercontent.com/b076594cf482b8d573d8168b42b53921bd126f4d82faaa4c9d677ac2dc8f67c4/687474703a2f2f63692e726f73322e6f72672f6275696c645374617475732f69636f6e3f6a6f623d63695f747572746c65626f742d64656d6f5f6c696e75782d61617263683634266275696c643d3131)