single dockerfile and single step in the docker building CI #863

Merged
39 commits merged on May 7, 2023

Member

@shimwell shimwell commented Mar 2, 2023

Description

This PR is an alternative to #822.
In addition to moving the scripts into a single dockerfile, this PR also changes the CI for publishing dockerfiles.
Until now we have had a branching and caching approach to the CI dockerfile building that is efficient but complex.
This approach reduces the complexity of the CI and allows all the permutations to run in parallel.

This might actually end up being quicker to build, and quicker to fail when something is broken, because the CI no longer pauses until every docker build has reached the same stage before proceeding to the next one. Also, if caching can be figured out, this workflow could make use of pre-built images and be very fast.

Motivation and Context

simpler is nice for maintainers 😄

Changes

refactored CI

Behavior

single dockerfile instead of scripts and simpler CI

Member Author

shimwell commented Mar 2, 2023

Action is running on my branch with 24 jobs in parallel
https://github.com/shimwell/DAGMC/actions/runs/4315972934
Screenshot from 2023-03-02 16-13-40

Member Author

shimwell commented Mar 2, 2023

All jobs passed in the CI on my fork; it took 3 hours 28 minutes.

Member

@gonuke gonuke left a comment

Thanks for getting this started @shimwell.

I think it went a little too far/too fast and has lost the narrative of what we want to do:

  1. build docker images to be used by CI later on that only have appropriate dependencies - up to moab (this part can proceed in parallel)
  2. push those images to the registry to be used later using a temporary tag
  3. build and test DAGMC using those images as the container (this part can proceed in parallel)
  4. rename the docker images in the registry to a more stable tag

BONUS: figure out how to populate a cache for the first docker build using the existing docker images available in the registry. (separate PR?)
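
To make the four steps above concrete, here is a minimal Python sketch of the bookkeeping they imply: one job per matrix combination, each built up to the `moab` stage, pushed under a temporary tag, used for testing, and then promoted to a stable tag. The image name pattern and the `ci_testing` and `stable` tags are taken from this thread; the matrix values, the `plan` and `image_name` helpers, and the dictionary keys are hypothetical and only illustrate the idea, not the actual workflow.

```python
from itertools import product

# Hypothetical matrix. Only ubuntu 22.04, gcc, HDF5 1.10.4 and MOAB 5.3.0
# appear in this thread; the other values are placeholders.
MATRIX = {
    "ubuntu": ["20.04", "22.04"],
    "compiler": ["clang", "gcc"],
    "hdf5": ["1.10.4"],
    "moab": ["5.3.0"],
}
AXES = ("ubuntu", "compiler", "hdf5", "moab")


def image_name(owner, ubuntu, compiler, hdf5, moab):
    """Image name pattern copied from the `tags:` lines in this review."""
    return (
        f"ghcr.io/{owner}/dagmc-ci-ubuntu-{ubuntu}-{compiler}"
        f"-ext-hdf5_{hdf5}-moab_{moab}"
    )


def plan(owner, matrix=MATRIX, temp_tag="ci_testing", stable_tag="stable"):
    """Return one independent job description per matrix combination."""
    jobs = []
    for combo in product(*(matrix[axis] for axis in AXES)):
        name = image_name(owner, *combo)
        jobs.append(
            {
                "build_target": "moab",                # step 1: dependencies only
                "push_as": f"{name}:{temp_tag}",       # step 2: temporary tag
                "test_in": f"{name}:{temp_tag}",       # step 3: build/test DAGMC
                "promote_to": f"{name}:{stable_tag}",  # step 4: stable tag
            }
        )
    return jobs


if __name__ == "__main__":
    for job in plan("svalinn"):
        print(job["push_as"], "->", job["promote_to"])
```

Because each job depends only on its own matrix combination, steps 1 and 3 can run in parallel across combinations, which is the property the plan relies on.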

- name: Set up QEMU
uses: docker/setup-qemu-action@v1
uses: docker/setup-qemu-action@v2
Member

We don't need QEMU

Member Author

QEMU has been removed from the yml file so I think this is resolved

with:
file: CI/Dockerfile
target: external_deps
target: dagmc
Member

Recall that this should only build the images to be used in CI for building/testing DAGMC.

We should only build to moab, and then test using this image for building DAGMC as we will later be doing in CI.

Member Author

server-stage of multi stage build is set to moab so I think this is now resolved

uses: actions/checkout@v3
MOAB=${{ matrix.moab_versions }}
build_mw_reg_tests=ON
tags: ghcr.io/${{ github.repository_owner }}/dagmc-ci-ubuntu-${{ matrix.ubuntu_versions }}-${{ matrix.compiler}}-ext-hdf5_${{ matrix.hdf5_versions}}-moab_${{ matrix.moab_versions }}-dagmc:ci_testing
Member

If this is push: false then tags don't matter, but I think we should push.

Member Author

pushing now taken care of by the multistage docker build action so I think this is resolved

- name: Set up QEMU
uses: docker/setup-qemu-action@v1
if: ${{ github.repository_owner == 'svalinn' }}
uses: docker/setup-qemu-action@v2
Member

Also don't need QEMU here

Member Author

QEMU has been removed from the yml file so I think this is resolved


build-housekeeping-img:
needs: build-base-img
build-test-dagmc-img:
Member

rename based on comments below

Member Author

this has now been renamed to build-dependency-img so I think this is resolved

Member Author

shimwell commented Mar 3, 2023

Had a go at all those comments and pushed, CI is running on my fork https://github.com/shimwell/DAGMC/actions/runs/4325712183

Member Author

shimwell commented Mar 3, 2023

CI on fork passed https://github.com/shimwell/DAGMC/actions/runs/4325712183

2nd stage only took a few seconds so the local stage cache appears to have worked.

This used the local stage cache instead of the container repo. But the layer has the same tag locally and in the container repo.

Perhaps this is not what we want

Member

@gonuke gonuke left a comment

This definitely looks simpler and probably easier to maintain over the long term. I think there is still a discussion point about how to use the docker image cache.

Comment on lines 37 to 38
OFF,
ON,
Member

We don't appear to be passing this into the docker build or the tag naming and therefore have duplicate images being built and tagged. We should discuss whether we really need the statically linked version. Not sure I remember who requires/desires that.
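
The duplication described here can be illustrated with a small Python sketch: if a matrix axis is not part of the tag, every value of that axis produces the same tag, so several jobs build and push identical image names. The `find_tag_collisions` helper and the two-axis matrix are hypothetical; `static_exe` with values `OFF` and `ON` is assumed to be the axis under discussion.

```python
from collections import Counter
from itertools import product


def find_tag_collisions(matrix, tag_axes):
    """Return the tags that more than one matrix job would produce.

    matrix   -- dict mapping axis name to its list of values
    tag_axes -- the axes that actually end up in the image tag
    """
    axes = list(matrix)
    counts = Counter()
    for combo in product(*(matrix[axis] for axis in axes)):
        job = dict(zip(axes, combo))
        tag = "-".join(str(job[axis]) for axis in tag_axes)
        counts[tag] += 1
    return {tag: n for tag, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical, reduced matrix for illustration only.
    matrix = {"compiler": ["gcc", "clang"], "static_exe": ["OFF", "ON"]}
    # static_exe is missing from the tag, so each tag is built twice.
    print(find_tag_collisions(matrix, ["compiler"]))
    # Including every axis in the tag removes the collisions.
    print(find_tag_collisions(matrix, ["compiler", "static_exe"]))
```

Either adding the axis to the tag or dropping it from the matrix removes the duplicate builds.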

Member Author

static_exe has been removed from the matrix so I think this is resolved

- name: Set up QEMU
uses: docker/setup-qemu-action@v1
build_mw_reg_tests=ON
dependency_image_location=ghcr.io/${{ github.repository_owner }}/dagmc-ci-ubuntu-${{ matrix.ubuntu_versions }}-${{ matrix.compiler}}-ext-hdf5_${{ matrix.hdf5_versions}}-moab_${{ matrix.moab_versions }}:ci_testing
Member

happy to see this working, but wonder if we are better off with a different strategy that uses the cache instead? something to discuss.

Member Author

@shimwell shimwell Apr 27, 2023

dependency_image_location has now been removed and the PR has moved to a multistage build so I think this is resolved

Member

@gonuke gonuke left a comment

a couple of additional comments

@@ -141,14 +63,17 @@ jobs:
UBUNTU_VERSION=${{ matrix.ubuntu_versions }}
COMPILER=${{ matrix.compiler }}
HDF5=${{ matrix.hdf5_versions }}
tags: ghcr.io/${{ github.repository_owner }}/dagmc-ci-ubuntu-${{ matrix.ubuntu_versions }}-${{ matrix.compiler}}-ext-hdf5_${{ matrix.hdf5_versions}}:ci_testing
MOAB=${{ matrix.moab_versions }}
build_mw_reg_tests=ON
Member

don't we want this off?

Member Author

this build arg has been removed from the yml and dockerfile so I think this is now resolved

steps:
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
build_mw_reg_tests=ON
Member

Don't we want this off?

Member Author

this build arg has been removed from the yml and dockerfile so I think this is now resolved

with:
file: CI/Dockerfile
target: moab
target: dagmc_test
context: .
Member

I'm thinking we may not need this line in this case. The action documentation indicates that the right version of the repository will be checked out in the Docker image by default. This isn't true in the case of our dependencies though, so I can see why we'd need it there.

Member Author

this line has been removed from the yml file so I think this one is resolved

Contributor

bquan0 commented Apr 20, 2023

I just opened a PR to the branch this PR is on to replace the docker build actions with the multistage build action.

Use multistage docker build action
Member

gonuke commented Apr 24, 2023

@shimwell - this appears to need a rebase...

Member

gonuke commented Apr 24, 2023

This also failed here (https://github.com/shimwell/DAGMC/actions/runs/4752350398/jobs/8442595961) - apparently it failed to push to GHCR? and it was successful for @bquan0 here (https://github.com/bquan0/DAGMC/actions/runs/4749240986/jobs/8436860572) but was based on cached version??

Contributor

bquan0 commented Apr 25, 2023

I ran into that problem a few times on my workflow runs too and I mentioned it in the PR. I solved it by going to the settings of the package it was trying to push to, then checking the DAGMC repo under the "Manage Actions access" section.

Member

gonuke commented Apr 30, 2023

Another PR for some final cleanup

Member

gonuke commented Apr 30, 2023

Another PR for some final cleanup

Github actions for this PR are successful

Two small changes to clean things up a little further
Member Author

Thanks Paul, I've merged that in.

Member

gonuke commented Apr 30, 2023

Confirmation of success here

Member

@gonuke gonuke left a comment

I think this is in good shape and is a useful model for many projects going forward. I'd love another review since I had a hand in authoring this in the end

@gonuke gonuke requested a review from pshriwise May 4, 2023 13:06

Member

@pshriwise pshriwise left a comment

Some small line comments from me, but this looks really nice! I was expecting a much larger changeset to accomplish this.

Q: Which of the stages gets uploaded in the end?

src: ghcr.io/${{ github.repository_owner }}/dagmc-ci-ubuntu-${{ matrix.ubuntu_versions }}-${{ matrix.compiler}}-ext-hdf5_${{ matrix.hdf5_versions}}-moab_${{ matrix.moab_versions }}/dagmc:refs_heads_${{ github.ref_name }}-bk0
Member

Is the bk0 suffix specific to something or arbitrary?

Member Author

not sure, but I see bk0 appears in two places. perhaps @gonuke or @bquan0 know where that suffix came from

Member

@gonuke gonuke May 4, 2023

It's generated by the multistage-build-action; not sure why the author uses that string.

Co-authored-by: Patrick Shriwise <pshriwise@gmail.com>
Member

gonuke commented May 4, 2023

Q: Which of the stages gets uploaded in the end?

Every stage that is explicitly referenced in a multistage-docker-build-action is pushed to the repo:

  • initially: base, external_deps, hdf5, moab
  • during testing: dagmc, dagmc_test

They each get pushed with the custom tag, and then we also push the dagmc stage with the tags stable and latest, but only when running from the svalinn repo.
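
As an illustration of the tag promotion described above, here is a hedged Python sketch that derives the source and destination references for the `dagmc` stage. The `refs_heads_<branch>-bk0` source pattern is copied from the `src:` line quoted earlier in this review; the `promotion_copies` function and its parameters are hypothetical and only model the naming, not the action that performs the copy.

```python
def promotion_copies(
    owner,
    image,
    ref_name,
    stage="dagmc",
    suffix="bk0",
    upstream_owner="svalinn",
    tags=("stable", "latest"),
):
    """Return (source, destination) image references to copy.

    Promotion only happens when the workflow runs in the upstream
    organisation, so forks get an empty list.
    """
    if owner != upstream_owner:
        return []
    # Source: the per-stage image pushed by the multistage build.
    src = f"ghcr.io/{owner}/{image}/{stage}:refs_heads_{ref_name}-{suffix}"
    # Destinations: the same image name with each long-lived tag.
    return [(src, f"ghcr.io/{owner}/{image}:{tag}") for tag in tags]


if __name__ == "__main__":
    image = "dagmc-ci-ubuntu-22.04-gcc-ext-hdf5_1.10.4-moab_5.3.0"
    for src, dst in promotion_copies("svalinn", image, "develop"):
        print(src, "->", dst)
```

If the source reference does not exist in the registry (for example because the stage was pushed under a different name), the copy step has nothing to fetch, so it is worth checking the generated names against what the build actually pushed.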

Co-authored-by: Paul Wilson <paul.wilson@wisc.edu>
@pshriwise
Member

I'd love to approve and merge this, but a runner isn't picking up the Mac testing job, unfortunately.

Member

gonuke commented May 5, 2023

Saw that...☹️

Member

gonuke commented May 6, 2023

GitHub has turned off the macOS 10.15 runners, so I made this PR to move us forward.

Member

gonuke commented May 7, 2023

This is still passing here: https://github.com/shimwell/DAGMC/actions/runs/4904019862

Member

@pshriwise pshriwise left a comment

Looking forward to utilizing these improvements in future PRs!! Thanks @gonuke and @shimwell for all the detective work 🤩 📈

@pshriwise pshriwise merged commit 8af1efe into svalinn:develop May 7, 2023
Member

gonuke commented May 7, 2023

Thanks for the teamwork @shimwell @bquan0 & @pshriwise !

Member Author

shimwell commented May 7, 2023

Delighted this one got in, a real team effort. Nice work all. What shall we do next 😁

@pshriwise
Member

Member Author

shimwell commented May 7, 2023

posting error message here so we don't lose it when the CI gets old

[ 55%] Building CXX object test/CMakeFiles/mbcn_test.dir/mbcn_test.cpp.o
Installing collected packages: pymoab
  Found existing installation: pymoab 5.4.1
    Can't uninstall 'pymoab'. No files were found to uninstall.
  Running setup.py develop for pymoab
    Complete output from command /usr/bin/python3 -c "import setuptools, tokenize;__file__='/root/build_dir/moab/bld/pymoab/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" develop --no-deps --prefix=/root/build_dir/moab/bld/pymoab:
    running develop
    error: can't create or remove files in install directory
    
    The following error occurred while trying to add or remove files in the
    installation directory:
    
        [Errno 2] No such file or directory: '/root/build_dir/moab/bld/pymoab/lib/python3.6/site-packages/test-easy-install-5760.write-test'
    
    The installation directory you specified (via --install-dir, --prefix, or
    the distutils default setting) was:
    
        /root/build_dir/moab/bld/pymoab/lib/python3.6/site-packages
    
    This directory does not currently exist.  Please create it and try again, or
    choose a different installation directory (using the -d or --install-dir
    option).
    
    
    ----------------------------------------
  Can't roll back pymoab; was not uninstalled
Command "/usr/bin/python3 -c "import setuptools, tokenize;__file__='/root/build_dir/moab/bld/pymoab/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" develop --no-deps --prefix=/root/build_dir/moab/bld/pymoab" failed with error code 1 in /root/build_dir/moab/bld/pymoab/
pymoab/CMakeFiles/pymoab-local-install.dir/build.make:58: recipe for target 'pymoab/CMakeFiles/pymoab-local-install' failed
make[2]: *** [pymoab/CMakeFiles/pymoab-local-install] Error 1
CMakeFiles/Makefile2:1238: recipe for target 'pymoab/CMakeFiles/pymoab-local-install.dir/all' failed
make[1]: *** [pymoab/CMakeFiles/pymoab-local-install.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 55%] Linking CXX executable ../bin/mbcn_test
[ 55%] Built target mbcn_test
make: *** [all] Error 2
Makefile:140: recipe for target 'all' failed
Error: Process completed with exit code 2.

Member Author

shimwell commented May 7, 2023

e08954f49424: Pull complete
Digest: sha256:3c332407c190c017dbe049e8e9c9d54c5e44166b484c56fd414821bc54e5c233
Status: Downloaded newer image for akhilerm/repo-copy:latest
crane [copy ghcr.io/svalinn/dagmc-ci-ubuntu-22.04-gcc-ext-hdf5_1.10.4-moab_5.3.0/dagmc:refs_heads_develop-bk0 ghcr.io/svalinn/dagmc-ci-ubuntu-22.04-gcc-ext-hdf5_1.10.4-moab_5.3.0:stable]
2023/05/07 17:46:34 Copying from ghcr.io/svalinn/dagmc-ci-ubuntu-22.04-gcc-ext-hdf5_1.10.4-moab_5.3.0/dagmc:refs_heads_develop-bk0 to ghcr.io/svalinn/dagmc-ci-ubuntu-22.04-gcc-ext-hdf5_1.10.4-moab_5.3.0:stable
Error: fetching "ghcr.io/svalinn/dagmc-ci-ubuntu-22.04-gcc-ext-hdf5_1.10.4-moab_5.3.0/dagmc:refs_heads_develop-bk0": GET https://ghcr.io/v2/svalinn/dagmc-ci-ubuntu-22.04-gcc-ext-hdf5_1.10.4-moab_5.3.0/dagmc/manifests/refs_heads_develop-bk0: MANIFEST_UNKNOWN: manifest unknown
panic: exit status 1

goroutine 1 [running]:
main.main()
	/go/src/github.com/akhilerm/repo-copy/main.go:20 +0x15a
Error: The process '/usr/bin/docker' failed with exit code 2

Member

gonuke commented May 8, 2023

Yep - working on that, too ☹️
