Cache the pulled Docker image with GHA? #23

firasm · 2022-02-20T06:27:50Z

First of all, thanks for creating these examples, they've been a Godsend for me! Much appreciated.

This may be beyond the scope of this project, but is there a way to cache the pulled docker images within a GitHub Action?

I am trying to cut down on the run-time of my github action, and everything is very fast, except the initial step of pulling the docker image - which takes ~55% of the total time. Ideally the first time it would take the full time, and then subsequently, it would use a cached Docker image.

I have been trying to read up about this, and it sounds like there is some new functionality that makes this possible, but my attempts to make it work haven't yielded much fruit.

Some references

Here is the most promising source about this

Adding this to my GHA should work, but I don't think I've got the placement quite right:

      - name: Cache Docker layers
        uses: actions/cache@v2
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-single-buildx-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-single-buildx

Another GHA that claims to cache (untested)
A potential lead

Let me know if I can provide any more information


alerque · 2022-02-20T09:21:02Z

I think you are on the wrong track with all of those leads actually. Except for the last one, all of them have to do with caching Docker builds. You are not hung up building layers, just pulling them from a registry. The final lead is about caching pulled layers, but it only operates on things run inside the job runner (i.e. like the earlier ones, this would be useful if your project was a Docker build), not on steps injected into it by Actions.
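For anyone wondering where that cache step would go: as far as I can tell it belongs in a workflow that builds an image, placed before the build step, roughly as sketched below. This is only an illustration of the build-caching case, not a fix for slow pulls. The image name `example/image:latest` and the action versions are placeholders I picked, not anything from this thread.

```yaml
# Sketch only: this caches Docker *build* layers. It does not speed up
# the image pull behind a `uses: docker://...` step.
name: Build image with layer cache
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      # Restore the layer directory before the build step runs.
      - name: Cache Docker layers
        uses: actions/cache@v4
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-single-buildx-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-single-buildx
      - name: Build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: false
          tags: example/image:latest  # hypothetical image name
          cache-from: type=local,src=/tmp/.buildx-cache
          cache-to: type=local,dest=/tmp/.buildx-cache
```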

Actions actually already caches pulls used for its workflow steps behind the scenes, but it also has lots of different runners and you are not guaranteed to even be in the same data center between runs.

Also, from your logs it looks like you may not be pulling our upstream images; you seem to be pulling from a fork of your own. One thing you could do to speed that up just a touch is set up the action.yaml file in your fork to pull explicitly from the ghcr.io registry and send the builds you use for your jobs there. That will get the images needed for your builds as close to edge-cached next to the Actions runner as you are going to get.

firasm · 2022-02-20T21:42:13Z

Oh! You're right - thanks for leading me away from the path I was about to walk through blindly. Explanation was also much appreciated.

Yes, I had to install some latex packages using tlmgr for my use-case and I thought forking the pandoclatex docker image would be better than installing the packages each time.

I will try to see what the GitHub Container Registry is all about, and see if I can do as you're suggesting. It looks like I'll need to first figure out how to store my Docker image in GHCR.

For now, I will include my GHA in case anyone else finds it useful. The only change is a section at the end that automatically commits the pandoc'd file to the repo (rather than having it as an artifact). As you can see, I was also trying to loop through the files_list, but that's still a WIP, so for now I've just copied the step over twice (which doesn't seem to be much slower than processing only one of the .md files).

name: Build PDF

on:
  workflow_dispatch:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
    
jobs:
  create_custom_pandoc_docker:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    
    - name: File List
      id: files_list
      run: |
        echo "::set-output name=files::$(printf '"%s" ' chapters/*.md | sed -e 's/chapters\///g')" # sed removes chapters/
    - uses: docker://firasm/pandoclatex:2022-02
      with:
        args: >-
          -s  ./chapters/01-large_classes.md
          -V  colorlinks=true
          -V  linkcolor=blue
          -V  urlcolor=blue
          --bibliography  ./bib/references.bib
          --citeproc
          --template=./templates/eisvogel.tex
          --csl ./templates/institute-of-physics-harvard.csl
          -o  ./output/01-large_classes.pdf
          
    - uses: docker://firasm/pandoclatex:2022-02
      with:
        args: >-
          -s  ./chapters/02-assessments.md
          -V  colorlinks=true
          -V  linkcolor=blue
          -V  urlcolor=blue
          --bibliography  ./bib/references.bib
          --citeproc
          --template=./templates/eisvogel.tex
          --csl ./templates/institute-of-physics-harvard.csl
          -o  ./output/02-assessments.pdf
    - name: GH Add and Commit
      uses: EndBug/add-and-commit@v7
      with:
        message: 'AUTO: Generated PDF in output/'
        branch: main

I'll try and follow up with my solution, but if I don't it means I didn't figure it out; feel free to close if the issue becomes stale.

alerque · 2022-02-22T07:52:25Z

I would suggest forking the pandoclatex repo is the wrong move. I would set up a very simple repo for your builder that just has a Dockerfile and an action.yml config. The Dockerfile should pull the pandoclatex image you want, then make the package changes/additions/whatever. Then build that Dockerfile and push the result to the container registry for the repo. Finally, set up the action.yml file so that your project can be used as an action, specifically using the syntax that pulls a tagged image from ghcr.io rather than the default of building the local Dockerfile on each run. Then when you use your repo as an action step you should get pretty fast cached image pulls.
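A minimal sketch of what such an action metadata file might look like. The owner, image name, and tag (`example-owner/pandoc-builder:v1`) are hypothetical placeholders; substitute whatever you actually push to the registry.

```yaml
# action.yml (sketch): all names here are hypothetical
name: Pandoc with extra LaTeX packages
description: Run pandoc from a prebuilt image hosted on ghcr.io
runs:
  using: docker
  # The docker:// prefix pulls a prebuilt image. Writing "Dockerfile" here
  # instead would rebuild the image from the local Dockerfile on every run.
  image: docker://ghcr.io/example-owner/pandoc-builder:v1
```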

If you need examples of projects set up this way I can probably find some. Just note the order: first save the correct tag in the action file, then tag the repo with that tag, then push the tagged image to the registry. That way running the action from the tag will actually pull the correct prebuilt image.
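One way to keep the image tag and the git tag in step is to publish the image whenever a tag is pushed. The workflow below is a sketch under assumptions: the image path `ghcr.io/example-owner/pandoc-builder` is a placeholder, the action versions are ones I believe are current, and it presumes the action file in the tagged commit already references the same tag name.

```yaml
# Sketch: build the Dockerfile and push it to ghcr.io under the git tag name.
name: Publish image
on:
  push:
    tags: ['v*']
permissions:
  contents: read
  packages: write  # lets GITHUB_TOKEN push to the container registry
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          # github.ref_name is the pushed tag, e.g. v1 (image path is hypothetical)
          tags: ghcr.io/example-owner/pandoc-builder:${{ github.ref_name }}
```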

Also note for your "looping" you should probably switch to using the designated host container method rather than the step method in your final workflow. There is an example sitting in #11.
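Roughly, the container method runs the whole job inside the image, so an ordinary shell loop can replace the repeated steps. The sketch below assumes the image ships a POSIX shell and has `pandoc` on the PATH; the image path is a hypothetical placeholder, and the pandoc flags are trimmed down for brevity.

```yaml
# Sketch: run every step inside the container and loop over the chapters.
name: Build PDFs
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/example-owner/pandoc-builder:v1  # hypothetical image
      # If the image's pandoc entrypoint gets in the way, overriding it
      # may be necessary, e.g.:
      # options: --entrypoint=sh
    steps:
      - uses: actions/checkout@v4
      - name: Convert each chapter
        run: |
          mkdir -p output
          for f in chapters/*.md; do
            pandoc -s "$f" -o "output/$(basename "$f" .md).pdf"
          done
```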

firasm · 2022-02-25T06:49:18Z

I did it (I think!) !! There was a bunch of stuff I didn't fully understand, and I used the boilerplate GHA to set up the repo action, so hopefully I did it correctly.

Thanks @alerque - those instructions were amazing. In case anyone is trying to reproduce this in the future, here's what I did:

Create a new empty repo. Mine is here
- Add a Dockerfile
- Add an actions .yml file
The action should now run and create a project for you
There you should click on "Package settings" and add your repo so that it can use it as an action (not sure if this step is needed, I think it is...)

I'm not 100% sure I did the tagging right to make sure I'm accessing the right image like you said.

uses: docker://ghcr.io/firasm/pandoc_image:@sha256-3eb14009b8180bca91fb2f22a6d93d69253fc18ebaa0d0916025e8f88ad2e218.sig

I'll update this when I figure it out, if you have an example handy that'd be great otherwise I'm sure I'll eventually plod through it.

alerque · 2022-02-25T10:39:24Z

The Docker image creation / tagging / signing is a little more complex than it needs to be, but it does seem to get the right job done in the end.

The only thing I see missing is the "easy" way to run such an action. You have a way to call it via the very specific Docker invocation, but you could also add an action.yml to the repo that has that docker tag call (and optionally tag the image repository itself if you wanted access to versions other than whatever you mark as the current one). Projects would then be able to use the action as just uses: firasm/pandoc_image@master rather than the verbose uses: above. You can also set up default arguments that way. Here is a sample config for setting up a repository for use as an action.
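From the consuming project's side, the short form might look something like the sketch below. The repository reference `example-owner/pandoc-builder@v1` is a placeholder, and this assumes that repo has an action.yml pointing at a prebuilt image whose entrypoint is pandoc.

```yaml
# Sketch: calling the builder repo as an action instead of a docker:// URL.
name: Build PDF
on: push
jobs:
  build_pdf:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: example-owner/pandoc-builder@v1  # hypothetical repo and tag
        with:
          # args are passed to the container's entrypoint
          args: >-
            -s ./chapters/01-large_classes.md
            -o ./output/01-large_classes.pdf
```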

firasm changed the title ~~Is there a way to cache the pulled Docker image on GHA?~~ Cache the pulled Docker image with GHA? Feb 20, 2022

alerque closed this as completed Feb 25, 2022

alerque added the question Further information is requested label Feb 25, 2022

cagix mentioned this issue Feb 25, 2022

[Tooling] Multi-Arch Docker Image Programmiermethoden-CampusMinden/Prog2-Lecture#43

Closed
