Cache the pulled Docker image with GHA? #23

Closed
firasm opened this issue Feb 20, 2022 · 5 comments
Labels
question Further information is requested

Comments

@firasm

firasm commented Feb 20, 2022

First of all, thanks for creating these examples, they've been a Godsend for me! Much appreciated.

This may be beyond the scope of this project, but is there a way to cache the pulled docker images within a GitHub Action?

I am trying to cut down on the run-time of my GitHub Action, and everything is very fast except the initial step of pulling the Docker image, which takes ~55% of the total time. Ideally the first run would take the full time, and subsequent runs would use a cached Docker image.

[Image: Screen Shot 2022-02-19 at 10 18 03 PM]

I have been trying to read up about this, and it sounds like there is some new functionality that makes this possible, but my attempts to make it work haven't yielded much fruit.

Some references

Adding this to my GHA should work, but I don't think I've got the placement quite right:

      - name: Cache Docker layers
        uses: actions/cache@v2
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-single-buildx-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-single-buildx

Let me know if I can provide any more information

@firasm firasm changed the title Is there a way to cache the pulled Docker image on GHA? Cache the pulled Docker image with GHA? Feb 20, 2022
@alerque
Collaborator

alerque commented Feb 20, 2022

I think you are on the wrong track with all of those leads actually. Except for the last one, all of them have to do with caching Docker builds. You are not hung up building layers, just pulling them from a registry. The final lead is about caching pulled layers, but it only operates on things run inside the job runner (i.e. like the earlier ones, this would be useful if your project was a Docker build), not on steps injected into it by Actions.

Actions actually already caches pulls used for its workflow steps behind the scenes, but it also has lots of different runners and you are not guaranteed to even be in the same data center between runs.

Also, from your logs it looks like you may not be pulling our upstream images; you seem to be pulling from a fork of your own. One thing you could do to speed that up just a touch is set up the action.yaml file in your fork to pull explicitly from the ghcr.io registry and send the builds you use for your jobs there. That will get the images needed for your builds as close to edge-cached next to the Actions runner as you are going to get.
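For readers following along, here is a minimal sketch of what pulling explicitly from ghcr.io can look like in a workflow step. It is only an illustration: `your-user`, `your-image`, and `your-tag` are placeholders, not names from this thread, and the sketch assumes the image is public and has pandoc as its entrypoint.

```yaml
# Sketch only: the image reference below is a placeholder, not a real image.
jobs:
  build_pdf:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      # The docker:// prefix makes the runner pull a prebuilt image
      # instead of building one; the ghcr.io host selects the registry.
      - name: Run pandoc from an image hosted on ghcr.io
        uses: docker://ghcr.io/your-user/your-image:your-tag
        with:
          args: >-
            -s ./chapters/01-large_classes.md
            -o ./output/01-large_classes.pdf
```

Without a registry host in the reference (as in `docker://firasm/pandoclatex:2022-02`), the pull goes to Docker Hub instead.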

@firasm
Author

firasm commented Feb 20, 2022

Oh! You're right - thanks for leading me away from the path I was about to walk through blindly. Explanation was also much appreciated.

Yes, I had to install some LaTeX packages using tlmgr for my use case, and I thought forking the pandoclatex Docker image would be better than installing the packages each time.


I will try to see what the GitHub Container Registry is all about, and see if I can do as you're suggesting. It looks like I'll need to first figure out how to store my Docker image in GHCR.

For now, I will include my GHA in case anyone else finds it useful. The only change/addition is a section at the end that automatically commits the pandoc'd file to the repo (rather than having it as an artifact). As you can see, I was also trying to loop through the files_list, but that's still a WIP, so for now I've just copied the step over twice (which doesn't seem to be much slower than processing only one of the .md files).

name: Build PDF

on:
  workflow_dispatch:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
    
jobs:
  create_custom_pandoc_docker:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    
    - name: File List
      id: files_list
      run: |
        echo "::set-output name=files::$(printf '"%s" ' chapters/*.md | sed -e 's/chapters\///g')" # sed removes chapters/
    - uses: docker://firasm/pandoclatex:2022-02
      with:
        args: >-
          -s  ./chapters/01-large_classes.md
          -V  colorlinks=true
          -V  linkcolor=blue
          -V  urlcolor=blue
          --bibliography  ./bib/references.bib
          --citeproc
          --template=./templates/eisvogel.tex
          --csl ./templates/institute-of-physics-harvard.csl
          -o  ./output/01-large_classes.pdf
          
    - uses: docker://firasm/pandoclatex:2022-02
      with:
        args: >-
          -s  ./chapters/02-assessments.md
          -V  colorlinks=true
          -V  linkcolor=blue
          -V  urlcolor=blue
          --bibliography  ./bib/references.bib
          --citeproc
          --template=./templates/eisvogel.tex
          --csl ./templates/institute-of-physics-harvard.csl
          -o  ./output/02-assessments.pdf
    - name: GH Add and Commit
      uses: EndBug/add-and-commit@v7
      with:
        message: 'AUTO: Generated PDF in output/'
        branch: main

I'll try and follow up with my solution, but if I don't it means I didn't figure it out; feel free to close if the issue becomes stale.

@alerque
Collaborator

alerque commented Feb 22, 2022

I would suggest that forking the pandoclatex repo is the wrong move. I would set up a very simple repo for your builder that just has a Dockerfile and an action.yml config. The Dockerfile should pull the pandoclatex image you want, then make the package changes/additions/whatever. Then build that Dockerfile and push the image to the container registry for the repo, and set up the action.yml file so that your project can be used as an action, specifically using the syntax that pulls a tagged image from ghcr.io rather than the default one that builds the local Dockerfile on each run. Then when you use your repo as an action step you should get pretty fast cached image pulls.

If you need examples of projects set up this way I can probably find some. Just note that you need to save the correct tag in the action file, then tag the repo with that tag, then push the tagged image to the registry. That way running the action from the tag will actually pull the correct prebuilt image.
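As a rough sketch of the build-and-push part, a workflow in the builder repo could look something like the following. This is an assumption about one possible setup, not the configuration used by anyone in this thread: the file name, the `v*` tag pattern, and the pinned action versions are all illustrative (newer major versions of these actions exist).

```yaml
# .github/workflows/publish-image.yml (hypothetical file name)
name: Publish image

on:
  push:
    tags: [ 'v*' ]   # assumed convention: build whenever a v* tag is pushed

permissions:
  contents: read
  packages: write    # allows GITHUB_TOKEN to push to ghcr.io

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Log in to ghcr.io
        uses: docker/login-action@v1
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build the Dockerfile and push the tagged image
        uses: docker/build-push-action@v2
        with:
          context: .
          push: true
          # Image tag mirrors the git tag, e.g. ghcr.io/owner/repo:v1
          tags: ghcr.io/${{ github.repository }}:${{ github.ref_name }}
```

The image tag here mirrors the git tag, which is one way to keep the tag saved in the action file, the repo tag, and the pushed image in step. Note that image names on ghcr.io must be lowercase, so this only works as written when the owner and repo names are lowercase.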

Also note for your "looping" you should probably switch to using the designated host container method rather than the step method in your final workflow. There is an example sitting in #11.
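For illustration, the job-level container approach could be sketched roughly as below. This is not the example from #11; it assumes the image (a placeholder reference here) ships a POSIX shell with pandoc on the PATH, and it reuses the paths and flags from the workflow posted earlier in this thread.

```yaml
# Sketch only: the image reference is a placeholder.
jobs:
  build_pdfs:
    runs-on: ubuntu-latest
    # Every step in this job runs inside this container,
    # so the image is pulled once for all chapters.
    container:
      image: ghcr.io/your-user/your-image:your-tag
    steps:
      - uses: actions/checkout@v2

      - name: Convert every chapter to PDF
        run: |
          mkdir -p output
          for f in chapters/*.md; do
            pandoc -s "$f" \
              -V colorlinks=true \
              --bibliography ./bib/references.bib \
              --citeproc \
              --template=./templates/eisvogel.tex \
              --csl ./templates/institute-of-physics-harvard.csl \
              -o "output/$(basename "$f" .md).pdf"
          done
```

Because the loop lives in an ordinary shell step, there is no need to copy the pandoc step once per file.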

@firasm
Author

firasm commented Feb 25, 2022

I did it (I think!)!! There was a bunch of stuff I didn't fully understand, and I used the boilerplate GHA to set up the repo action, so hopefully I did it correctly.

Thanks @alerque - those instructions were amazing. In case anyone is trying to reproduce this in the future, here's what I did:

[Image: Screen Shot 2022-02-24 at 10 27 48 PM]

  • I'm not 100% sure I did the tagging right to make sure I'm accessing the right image like you said.
uses: docker://ghcr.io/firasm/pandoc_image:@sha256-3eb14009b8180bca91fb2f22a6d93d69253fc18ebaa0d0916025e8f88ad2e218.sig

I'll update this when I figure it out. If you have an example handy that'd be great; otherwise I'm sure I'll eventually plod through it.

@alerque
Collaborator

alerque commented Feb 25, 2022

The Docker image creation / tagging / signing is a little more complex than it needs to be, but it does seem to get the right job done in the end.

The only thing I see missing is the "easy" way to run such an action. You have a way to call it via the very specific Docker invocation, but you could also add an action.yml to the repo that has that docker tag call (and optionally tag the image repository itself if you wanted access to versions other than whatever you mark as the current one). Subsequently, projects would be able to use the action as just uses: firasm/pandoc_image@master rather than the verbose uses: above. You can also set up default arguments that way. Here is a sample config for setting up a repository for use as an action.
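The linked sample is not reproduced on this page, so the following is only a sketch of what such an action.yml might contain. The image name is taken from the invocation earlier in the thread; the `v1` tag, the `source` and `output` inputs, and their defaults are hypothetical, and the sketch assumes the image's entrypoint is pandoc.

```yaml
# action.yml at the root of firasm/pandoc_image (sketch, not the linked sample)
name: pandoc_image
description: Run pandoc from a prebuilt image hosted on ghcr.io

inputs:
  source:            # hypothetical input
    description: Markdown file to convert
    required: true
  output:            # hypothetical input
    description: Path of the generated file
    required: false
    default: ./output/out.pdf

runs:
  using: docker
  # docker:// pulls the prebuilt image instead of building a local Dockerfile.
  # "v1" is a placeholder tag.
  image: docker://ghcr.io/firasm/pandoc_image:v1
  # Default arguments; each list item is passed as one argument.
  args:
    - -s
    - ${{ inputs.source }}
    - --citeproc
    - -o
    - ${{ inputs.output }}
```

A consuming workflow would then use `uses: firasm/pandoc_image@master` with `source:` and `output:` under `with:`, instead of spelling out the full `docker://` reference.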
