
Latest YCM devel of robotology-superbuild on Travis with Docker blocks during bootstrap on "Performing build step for 'YCM'" #254

Closed
traversaro opened this issue May 3, 2019 · 27 comments

Comments

@traversaro
Member

See https://travis-ci.org/robotology/robotology-superbuild/builds/527715806 . Until yesterday, everything went fine. I already restarted a job, and the same problem appeared again.

I wonder if the problem is related to this commit: 912e0d6

@traversaro
Member Author

Apparently the problem is there also with v0.10.2 (see https://travis-ci.org/robotology/robotology-superbuild/builds/527850737?utm_source=github_status&utm_medium=notification), so the problem is probably some weird Travis problem in accessing one of the remote repos used in the YCM bootstrap (ref #105).

traversaro added a commit to robotology/robotology-superbuild that referenced this issue May 3, 2019
This commit is just to debug on Travis robotology/ycm-cmake-modules#254
@traversaro
Member Author

I am unable to reproduce the problem on my local machine.

@traversaro
Member Author

traversaro commented May 3, 2019

I enabled YCM_BOOTSTRAP_VERBOSE in a test build (robotology/robotology-superbuild#201) and apparently the problem is some strange interaction between CMake's git repository (even if apparently we are using a GitHub-based mirror) and Travis. The line

 [ 16%] Downloading file Copyright.txt from CMake git repository (ref v3.14.3)

took ~2-3 minutes, and now the build seems stuck at:

[ 17%] Downloading file Modules/BasicConfigVersion-SameMajorVersion.cmake.in from CMake git repository (ref v3.14.3)

see https://travis-ci.org/robotology/robotology-superbuild/jobs/527863239 .

After a while, downloading this file resulted in a timeout:

[ 17%] Downloading file Modules/BasicConfigVersion-SameMajorVersion.cmake.in from CMake git repository (ref v3.14.3)
-- Cannot download file https://raw.githubusercontent.com/Kitware/CMake/v3.14.3/Modules/BasicConfigVersion-SameMajorVersion.cmake.in
  Network problem or not existing file.
  CMake Error at /home/travis/build/robotology/robotology-superbuild/build/robotology/YCM/cmake-next/CMakeFiles/cmake-v3.14.3.dir/ycm_download_Modules_BasicConfigVersion_SameMajorVersion_cmake_in_real.cmake:9 (file):
  file DOWNLOAD HASH mismatch
    for file: [/home/travis/build/robotology/robotology-superbuild/build/robotology/YCM/cmake-next/CMakeFiles/cmake-v3.14.3.dir/downloads/Modules/BasicConfigVersion-SameMajorVersion.cmake.in]
      expected hash: [f89dc13283012256d1cb702ffb0bc235cee5cc5a]
        actual hash: [da39a3ee5e6b4b0d3255bfef95601890afd80709]
             status: [28;"Timeout was reached"].
  Retrying.
[ 18%] Downloading file Modules/BasicConfigVersion-SameMinorVersion.cmake.in from CMake git repository (ref v3.14.3)
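Incidentally, the "actual hash" in that log is the well-known SHA-1 of the empty input, which means the timeout left a zero-byte file behind rather than a partial download. This is easy to confirm from a shell:

```shell
# SHA-1 of zero bytes: this matches the "actual hash" reported in the
# log above, i.e. nothing at all was downloaded before the timeout fired.
printf '' | sha1sum
# da39a3ee5e6b4b0d3255bfef95601890afd80709  -
```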

@traversaro
Member Author

traversaro commented May 3, 2019

Related tweets:

> "@TravisCI getting a lot of 'No output has been received in the last 10m0s' as of this morning. Any elevated error rates? seems to be happening at different stages of my builds."

> "anyone else having travis CI issues since mid-day yesterday? all of our builds since then are failing with timeout errors after ~20 minutes 😬"

@beaugunderson

i'm in contact with travis support and will let y'all know if i hear anything useful from them :)

@traversaro
Member Author

I tried replicating the issue on YCM's Travis, but apparently everything is working correctly there. Probably the issue is related to the fact that the robotology-superbuild Travis build is working inside a Docker container.

@traversaro
Member Author

> i'm in contact with travis support and will let y'all know if i hear anything useful from them :)

Thanks a lot @beaugunderson !

@beaugunderson

if you want to contact their support as well you can reference our support ticket so they can cross-reference it; it's 6994

@zak10

zak10 commented May 3, 2019

I'm not familiar at all with this repository, but my builds have also been getting stuck and timing out. Strangely, it happens during different stages and it is only for one of my repositories. I reran builds from old commits that were previously fine and those are also breaking, so I'm assuming this is a Travis problem.

Running a python 3.6 container, tests via docker compose.

@beaugunderson

beaugunderson commented May 3, 2019

fwiw we also use docker/docker-compose in our tests...

edit: verified that we're still pinned to the same docker-compose version, and docker-ce also has an identical version between old successful and now failing builds (18.06.3~ce~3-0~ubuntu)

@zak10

zak10 commented May 3, 2019

@beaugunderson do your tests run in multiple threads?

@beaugunderson

beaugunderson commented May 3, 2019

@zak10 yup, we use gnu parallel

edit: and also pytest + multiple xdist workers for the python tests that run inside one of the containers, if that's what you were asking

@alex

alex commented May 4, 2019

I'm also experiencing this issue -- also in docker builds.

@traversaro traversaro changed the title Latest YCM devel on Travis of robotology-superbuild blocks during Bootstrap on "Performing build step for 'YCM'" Latest YCM devel of robotology-superbuild on Travis with Docker blocks during bootstrap on "Performing build step for 'YCM'" May 4, 2019
@zak10

zak10 commented May 4, 2019

We're using pytest + xdist to run tests in parallel. I'm attempting now to remove xdist completely to see if that alleviates the issue.

edit: I was unable to alleviate any issues and now it seems the issue has spread to a second repository with a similar configuration. Still no word from travis - I think it may be time for me to pull the trigger and switch to a different CI provider.

@beaugunderson

still no response from travis since last thursday :(

@beaugunderson

currently trying to force dist: xenial and see if that has any effect, will report back...

@beaugunderson

beaugunderson commented May 6, 2019

Reply from Travis CI support:

> Hi Beau,
>
> Thank you for getting in touch, and we apologize for the frustration caused by this ongoing issue with slow or stalling downloads from within Docker containers.
>
> We've been receiving similar reports since late on Friday and have proceeded to open an incident to acknowledge the situation here: https://www.traviscistatus.com/incidents/kyf149kl6bvp. We'll be posting updates to the status as we continue to investigate this issue.
>
> In the meantime, you might be able to stabilize your builds and prevent having to manually restart them by adding travis_retry to your failing commands, e.g.
>
>     travis_retry docker build [...]
>
> This will effectively retry the command up to 3 times if it fails.
>
> Once again we apologize for the inconvenience caused as we work diligently towards a resolution. In the meantime, if you have any questions or concerns, please do not hesitate to get in touch. We'd be happy to assist.
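For context, `travis_retry` is a shell function that Travis injects into the build environment; it reruns the wrapped command up to three times if it exits nonzero. A minimal sketch of how their suggestion might look in a `.travis.yml` (the image name and build step are placeholders, not from this thread):

```yaml
# Hypothetical .travis.yml fragment: wrap the flaky step in travis_retry
# so a transient network stall does not immediately fail the whole job.
services:
  - docker

script:
  - travis_retry docker build -t myimage .   # "myimage" is a placeholder
```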

@beaugunderson

i feel like this fundamentally misunderstands the issue, sadly

@beaugunderson

dist: xenial did not help, FYI

@zak10

zak10 commented May 6, 2019

yep - got the same canned response from them. at least they're acknowledging the issue now. unfortunately for them, I've already begun migrating to Circle.

best of luck!

@traversaro
Member Author

traversaro commented May 6, 2019

I tried the --network=host workaround mentioned in https://www.traviscistatus.com/incidents/kyf149kl6bvp in https://travis-ci.org/robotology/robotology-superbuild/jobs/528940445, and it seems to be working.
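The workaround amounts to running the container on the host's network stack instead of Docker's default bridge network, which is where the downloads were stalling. A sketch of how it might be passed in a `.travis.yml` (image name and build script are placeholders):

```yaml
# Hypothetical .travis.yml fragment: run the build container with
# --network=host so downloads inside it use the host network stack
# directly, bypassing the bridge network affected by the incident.
services:
  - docker

script:
  - docker run --rm --network=host myimage ./build.sh   # placeholders
```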


@beaugunderson

beaugunderson commented May 6, 2019

awesome @traversaro; i'm trying that now as well (slightly more complicated with docker-compose, but it seems like network_mode: ${NETWORK_MODE:-bridge} inside the build: option would work, and then specifying NETWORK_MODE=host for travis builds)
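A sketch of the docker-compose arrangement described above (service name is a placeholder; Compose substitutes `${NETWORK_MODE:-bridge}` with `bridge` by default, and exporting `NETWORK_MODE=host` on Travis switches it). Note that `network_mode` is a service-level key affecting the running container, while the separate `build.network` key (Compose file format 3.4+) controls the network used during `docker build`:

```yaml
# Hypothetical docker-compose.yml fragment: default to the normal
# bridge network locally, override with NETWORK_MODE=host on Travis.
version: "3.4"
services:
  app:                                      # "app" is a placeholder
    build:
      context: .
      network: ${NETWORK_MODE:-bridge}      # build-time network
    network_mode: ${NETWORK_MODE:-bridge}   # run-time network
```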

@beaugunderson

this is more of a pain than i thought, especially if you already use networks...

@traversaro
Member Author

Related issue: ros-industrial/industrial_ci#364 .

@beaugunderson

travis is claiming that this issue is fixed but we're still getting 100% failures... around the time they were testing fixes we got a few passes, so it certainly improved for a small period of time, but now it's broken again and i'm having a hard time getting their support to do anything about it

@traversaro
Member Author

traversaro commented May 8, 2019

On our side, in the last day all the builds that were affected by this (see https://travis-ci.org/robotology/robotology-superbuild/jobs/529722072) are working correctly. Given @beaugunderson's input, it probably makes sense to keep monitoring for possible failures a bit longer.

@traversaro
Member Author

On our side, we have not observed Travis failures related to this in the past week. I think we can close the issue, at least as far as YCM and its bootstrap are concerned.

4 participants