Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build Docker images on GH Actions #33099

Closed
mkoeppe opened this issue Dec 30, 2021 · 38 comments
Closed

Build Docker images on GH Actions #33099

mkoeppe opened this issue Dec 30, 2021 · 38 comments

Comments

@mkoeppe
Copy link
Contributor

mkoeppe commented Dec 30, 2021

As suggested in https://groups.google.com/g/sage-devel/c/jIwovtSXtdw/m/2HR5RraTCwAJ, we set up a GH Actions workflow for building Docker images, replacing the GitLab workflow.

This ticket is focused on replicating the features of the GitLab workflow, in particular the incremental build and the distinction of sagemath and sagemath-dev builds.

See also:

Depends on #29784

CC: @saraedum @koffie @dimpase @tobiasdiez

Component: docker

Author: Matthias Koeppe, ...

Branch/Commit: u/mkoeppe/build_docker_images_on_gh_actions @ a4f05e2

Issue created by migration from https://trac.sagemath.org/ticket/33099

@mkoeppe mkoeppe added this to the sage-9.5 milestone Dec 30, 2021
@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 30, 2021

Dependencies: #29784

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 30, 2021

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 30, 2021

Commit: 5e0ae74

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 30, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

5e0ae74.github/workflows/docker.yml: New

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 30, 2021

comment:4

This gets me to:

Step 30/50 : RUN git fetch "$HOME/sage-context" HEAD     && if [ -e docker/.commit ]; then           git reset `cat docker/.commit`           || ( echo "Could not find commit `cat docker/.commit` in your local Git history. Please merge in the latest built develop branch to fix this: git fetch trac && git merge `cat docker/.commit`." && exit 1 )        else           echo "You are building from $ARTIFACT_BASE which has no docker/.commit file. That's a bug unless you are building from source-clean or something similar."           && git reset FETCH_HEAD           && git checkout -f FETCH_HEAD;        fi     && git merge --ff-only FETCH_HEAD     && git log -1 --format=%H > docker/.commit     && rm -rf .git
 ---> Running in f1263c0d1624
warning: reject HEAD because shallow roots are not allowed to be updated
fatal: Could not parse object 'fbca269f627bf6a8bc6f0a611ed7e26260ebc994'.

(https://github.com/mkoeppe/sage/runs/4669053076?check_suite_focus=true)

Help welcome

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 30, 2021

Author: Matthias Koeppe, ...

@koffie
Copy link

koffie commented Dec 30, 2021

comment:5

The problem is clearly this fetch. The fetch occurs between ( $HOME/sage-context and SAGE_ROOT=/home/sage/sage ) come from. SAGE_ROOT is the sage repo that was previously build (i.e. the $ARTIFACT_BASE which I'm guessing is sagemath/sagemath-dev:develop at this point). And the code of the github branch that triggered this action should be in $HOME/sage-context if I am not mistaken. Either of these repositories is shallow (and I am not 100% sure which one, but I am guessing $HOME/sage-context) and the rest of this is written with that in mind.

Before attempting to fetch from sage-context we should first test whether it is shallow by setting that repo as our WORKDIR and do

git rev-parse --is-shallow-repository

If I am correct this should evaluate to True. If so we should unshallow it. My preferred way would be to make sure that we get an unshallow repo by using by passing the correct arguments to the actions/checkout@v2 command, however I don't know whether that is possible. Otherwise we should unshallow the relevant repository first by fetching all of the history using:

git fetch --unshallow

This unshallowing will of course only work if the shallow clone has a remote configured that is not shallow, so if unshallowing fails we should probably first set the remote of the sage-context to something sensible (I guess preferrably the github mirror since we are on github anyway).

Hope this helps.

p.s. this entire source-from-context target in the Dockerfile feels super hacky, and I am wondering whether build times can be kept down to something reasonable with other means.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 30, 2021

comment:6

Thanks. I'm trying if setting the fetch-depth parameter helps in https://github.com/mkoeppe/sage/runs/4670244759?check_suite_focus=true

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 30, 2021

Changed commit from 5e0ae74 to cd9ec81

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 30, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

cd9ec81.github/workflows/docker.yml: Set fetch-depth

@koffie
Copy link

koffie commented Dec 30, 2021

comment:8

Ok, if the value of 5000 is not enough then 2147483647 = 231 - 1 is the largest value that git can still handle.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 30, 2021

comment:9

The error is cleared but it looks like it is building the version that comes with the latest published sagemath-dev image (some 9.3.beta)

@koffie
Copy link

koffie commented Dec 30, 2021

comment:10

I'm thinking that this is to be expected because according to: https://hub.docker.com/r/sagemath/sagemath-dev/tags?page=1

Julian (saraedum) only pushed sage 9.4 to the latest and the 9.4 tags and not the develop tag when he manually updated the docker images while our gitlab CI was broken. I think this is not a problem caused by the CI code, but should be fixable by pushing the correct build to docker hub once manually to bootstrap the entire process.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 30, 2021

comment:11

Well that's the correct base but it should surely be building the current sources!

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 30, 2021

Changed commit from cd9ec81 to 931926d

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 30, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

931926d.github/workflows/docker.yml: Also set CI_COMMIT_SHA

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 31, 2021

comment:13

This whole patching business does not work:

Step 32/50 : RUN if git status --porcelain | read CHANGES; then         git -c user.name=docker-build -c user.email=docker-build@sage.invalid stash -u         && git stash show -p > "$HOME"/sage-context.patch;     else         touch "$HOME"/sage-context.patch;     fi
 ---> Running in 5a7e47032090
Removing intermediate container 5a7e47032090
 ---> ac803f77917f
Step 33/50 : WORKDIR $SAGE_ROOT
 ---> Running in e5f79bbb0c52
Removing intermediate container e5f79bbb0c52
 ---> f51686240290
Step 34/50 : RUN patch -p1 --verbose < "$HOME"/sage-context.patch
 ---> Running in 8973a252aee1
Hmm...  I can't seem to find a patch in there anywhere.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 31, 2021

Changed commit from 931926d to 86eca94

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 31, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

86eca94docker/Dockerfile: Use patch --verbose

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 31, 2021

Changed commit from 86eca94 to be139ca

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 31, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

be139ca.github/workflows/docker.yml: Try with fetch-depth=0

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 31, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

ab0c87ddocker/Dockerfile: Use V=0 with make
58d3b53docker/Dockerfile: Install gettext for bootstrap
4f03c06docker/Dockerfile: Use DEBIAN_FRONTEND=noninteractive with apt-get install

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 31, 2021

Changed commit from be139ca to 4f03c06

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 31, 2021

comment:17

OK, with fetch-depth=0 it seems to be working

Step 30/50 : RUN git fetch "$HOME/sage-context" HEAD     && if [ -e docker/.commit ]; then           git reset `cat docker/.commit`           || ( echo "Could not find commit `cat docker/.commit` in your local Git history. Please merge in the latest built develop branch to fix this: git fetch trac && git merge `cat docker/.commit`." && exit 1 )        else           echo "You are building from $ARTIFACT_BASE which has no docker/.commit file. That's a bug unless you are building from source-clean or something similar."           && git reset FETCH_HEAD           && git checkout -f FETCH_HEAD;        fi     && git merge --ff-only FETCH_HEAD     && git log -1 --format=%H > docker/.commit     && rm -rf .git
 ---> Running in 8fdb2d75b70f
From /home/sage/sage-context
 * branch                  HEAD       -> FETCH_HEAD

It took 2.59 seconds to enumerate unstaged changes after reset.  You can
use '--quiet' to avoid this.  Set the config setting reset.quiet to true
to make this the default.
Updating fbca269f62..4f03c06d41
Fast-forward
 .github/workflows/ci-cygwin-minimal.yml            |    48 +-
 .github/workflows/ci-cygwin-standard.yml           |    48 +-
 .github/workflows/docker.yml                       |    39 +
 .github/workflows/extract-sage-local.sh            |     4 +-
 .github/workflows/lint.yml                         |    16 +-
 ...

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 31, 2021

comment:18

And now I'm running into what this warning warns about:

# Warning: If you set ARTIFACT_BASE to something else than source-clean, the   #
# build is not going to use the build-time-dependencies target but rely on     #
# whatever tools are installed in ARTIFACT_BASE.                               #

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 31, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

d1db07edocker/Dockerfile (make-build): Make sure build deps are present
a4f05e2.github/workflows/docker.yml: Add deployment step

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 31, 2021

Changed commit from 4f03c06 to a4f05e2

@tobiasdiez
Copy link
Contributor

comment:20

Why is this whole patching business needed in the first place? Installing relevant system packages + compiling all other packages + building sage should be below the github action timelimit, so a fresh build without caching upon every push to develop should work.

I would also suggest to use the docker action to build and publish the image: https://github.com/marketplace/actions/build-and-push-docker-images

@koffie
Copy link

koffie commented Dec 31, 2021

comment:21

I heard from Julian that right now the docker build process takes 8 hours on his computer due to #32998 . I am all for simplifying the docker file so that it fits within the github actions CI timeout.

The max timeout for a single job on github actions is 6 hours so if #32998 is fixed this should be possible. If that is not enough, we could have multiple jobs for different parts of the build process split up according to different docker targets, pushing the different stages separately to dockerhub.

Also right now the Dockerfile installs almost no system packages iirc, so it could also be sped up by using more of those instead of building everything ourselves.

@dimpase
Copy link
Member

dimpase commented Dec 31, 2021

comment:22

what is the point here of using fewer system packages than possible?

@koffie
Copy link

koffie commented Dec 31, 2021

comment:23

Replying to @dimpase:

what is the point here of using fewer system packages than possible?

The only thing I know of why, is cause of this discussion #32998 comment:5 , which you are probably also aware of.

Personally I don't care how many system packages we use, and only care about efficient and maintainable docker file. So I am all in favor of using more system packages, especially if that makes it easier to fix this issue.

Sadly enough I don't have time myself to work on this right now aside from leaving some comments that are hopefully useful.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 31, 2021

comment:24

Replying to @tobiasdiez:

Why is this whole patching business needed in the first place? Installing relevant system packages + compiling all other packages + building sage should be below the github action timelimit, so a fresh build without caching upon every push to develop should work.

To my understanding, this is designed for providing very fast incremental builds so that all branches on trac can be tested with it.

Of course for just building the Docker images for release tags, a much simpler workflow would do the job.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 31, 2021

comment:25

Replying to @koffie:

The max timeout for a single job on github actions is 6 hours so if #32998 is fixed this should be possible.

Yes, we know that of course from our existing portability CI on GH Actions. ubuntu-focal-minimal (https://github.com/sagemath/sage/runs/4497390653?check_suite_focus=true) is around 6 hours and sometimes times out, but with system packages installed, ubuntu-focal-standard (https://github.com/sagemath/sage/runs/4497390666?check_suite_focus=true) finishes within 3.5 hours.

@mkoeppe

This comment has been minimized.

@tobiasdiez
Copy link
Contributor

comment:27

Replying to @mkoeppe:

Replying to @tobiasdiez:

Why is this whole patching business needed in the first place? Installing relevant system packages + compiling all other packages + building sage should be below the github action timelimit, so a fresh build without caching upon every push to develop should work.

To my understanding, this is designed for providing very fast incremental builds so that all branches on trac can be tested with it.

Of course for just building the Docker images for release tags, a much simpler workflow would do the job.

While I'm very much in favor of having a github action that runs on all branches, I'm not sure if docker is the right technology for it. Especially in view of the recent progress in the build script it should be easier to reuse partial build outputs from the develop branch also for other branches. For example, one could cache sage-local using the built-in github action cache (with hashing the package directory to invalidate the cache once the packages change). I think this is easier to maintain than the ingenious but somewhat hacky git-diff code in the docker build.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Jan 1, 2022

comment:28

Well, if the workflow does not use docker, then it does not create Docker images, which are what you need in #32749.

@saraedum
Copy link
Member

saraedum commented Jan 3, 2022

comment:29

As the original author of these Dockerfile hacks…

If we can retain the possibility to create a docker image for every branch that could be quite beneficial at it would allow people to play with code changes in a review on binder instead of having to rebuild locally. However, we never integrated a button for this in trac, even though all the other infrastructure for it exists.

I am not sure if this is enough of a reason to repair these hacks. It's more important to have working docker images again than to have a feature that nobody is using anyway.

@mkoeppe mkoeppe removed this from the sage-9.5 milestone Jan 5, 2022
@mkoeppe mkoeppe added the pending label Jan 5, 2022
@mkoeppe
Copy link
Contributor Author

mkoeppe commented Oct 28, 2023

Closing as completed in #36047, #36385.

@mkoeppe mkoeppe closed this as completed Oct 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants