Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert cache to use github.ref instead of github.sha #5460

Merged
merged 1 commit into from
May 2, 2023

Conversation

kamilkrzyskow
Copy link
Collaborator

@kamilkrzyskow kamilkrzyskow commented May 2, 2023

Fixes my mistake from #5448.
I added a code annotation to the Publishing Guide, explaining when should the users expect a cache update.
I considered to add another sentence "If your project requires more frequent cache updates, then use github.sha", but I guess this is a too much of an edge-case, that people might just overuse "just in case". Also using a YYYY-MM-DD date instead would be better than github.sha, because it would limit caches to 1 day.

Sorry for the PR chaos 馃檹馃槄

@squidfunk squidfunk merged commit f08bbb4 into squidfunk:master May 2, 2023
3 checks passed
@squidfunk
Copy link
Owner

Thanks for the quick fix and no bad feelings! 鈽猴笍

@kamilkrzyskow
Copy link
Collaborator Author

@squidfunk
This will probably come of as some bad joke, and I'm embarrassed to say it, but the action saved the refs/heads/master github.ref
https://github.com/squidfunk/mkdocs-material/actions/runs/4861591365/jobs/8666814586#step:14:5

This means that even for pull request merges the github.ref uses the branch rules 馃槰
https://docs.github.com/en/actions/learn-github-actions/contexts#github-context

The fully-formed ref of the branch or tag that triggered the workflow run. For workflows triggered by push, this is the branch or tag ref that was pushed. For workflows triggered by pull_request, this is the pull request merge branch. For workflows triggered by release, this is the release tag created. For other triggers, this is the branch or tag ref that triggered the workflow run. This is only set if a branch or tag is available for the event type. The ref given is fully-formed, meaning that for branches the format is refs/heads/<branch_name>,

for pull requests it is refs/pull/<pr_number>/merge, and for tags it is refs/tags/<tag_name>. For example, refs/heads/feature-branch-1.

After rereading that I figured that the pull request github.ref would work out, but I was mistaken, yet again...


So I'm currently out of ideas, because I do think that github.sha might be used to abuse the cache creation system with 5 million users and 10GB storage cap, and adding a step to setup a date variable seems to be too much, but maybe the lesser evil 馃

    - name: Set Date String
      run: echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_ENV
    # ...
    key: mkdocs-material-${{ env.date }}
    # ...

Sorry for the confusion once again 馃槥

@squidfunk
Copy link
Owner

squidfunk commented May 2, 2023

Haha, no worries. Many thanks for taking the time and investigating, much appreciated. Should we revert the last commit? I'm not sure what to do here and would follow your recommendation. I'm not sure whether date is a good idea though. It might lead to flaky builds that resolve on the next day 馃槄

@kamilkrzyskow
Copy link
Collaborator Author

kamilkrzyskow commented May 2, 2023

It might lead to flaky builds that resolve on the next day

I'm not sure if I understand this sentence.
Are you saying that adding a date might create a scenario where a build doesn't work one day and works on the next day?

The cache is being restored from the previous key with a matching prefix, therefore using github.sha could allow any commit to corrupt the cache and then the user would have to look over a long list of caches, and delete them 1-by-1, to find out which one was corrupted first.
Assuming someone works from the GitHub web interface, each change would invoke a CI run adding one cache. 10 such commits a day would mean 70 commits/caches per week, which would be close to 10~15MB x 70 ~= 1GB of cache, but that is only for 1 repository. Taking it further, people could use the github.sha in their other projects, because "it works" (just like me in #5448), therefore I came to the conclusion that "advertising" this approach in such a popular repository isn't ideal, and ...

Should we revert the last commit?

... therefore I dobut reverting to the previous commit is the way to go.


Using the approach with a date would limit the amount of caches to 7 per week (assuming daily commits), and would allow the users to delete them, more easily, in case of any issues. The downside is of course the rather non-straightforward step inside of the workflow file.


Another approach could be splitting the cache into restore and saving steps (which I ignored previously):
https://github.com/actions/cache/blob/main/save/README.md#re-evaluate-cache-key-while-saving
Who would have thought that this would allow to force save into the same cache 馃槄

uses: actions/cache/restore@v3
id: restore-cache # step id required to get the output, and use same hash
with:
    restore-keys: mkdocs-material-

Save approach #1, which uses the same key returned from the restore step:

uses: actions/cache/save@v3
with:
    path: .cache
    key: ${{ steps.restore-cache.outputs.cache-matched-key }}

Save approach #2, which generated the hash based on the contents of .cache, I believe you intended to do it like that previously:

uses: actions/cache/save@v3
with:
    path: .cache
    key: mkdocs-material-${{ hashfiles('.cache/**') }}

This would be the "cleanest" way to properly manage caches, but it once again adds more steps to copy / paste, and for the user to understand, in a way it's worse than the date. I'll see if I can manage to introduce it in a good way to the current documentation.

@squidfunk
Copy link
Owner

Thanks again! I'm having a little hard time wrapping my head around this (currently fighting other battles), so I'll trust your recommendation on this. If you wish to follow up on this, feel free to take the necessary time to come up with a solution that is simple to follow for the user (= less maintenance for us maintainers because of fewer questions) and works reasonably well. Maybe someone else with more experience can share some ideas as well 鈽猴笍

It might lead to flaky builds that resolve on the next day

I'm not sure if I understand this sentence.
Are you saying that adding a date might create a scenario where a build doesn't work one day and works on the next day?

As we both know, caching is hard. We had some problems a while ago with stale Python dependencies that were tricky to resolve. For this reason, I deactivated caching of pip-installed dependencies. Scoping caches to days introduces a new axis of fragility. When we run into a caching issue on day 1, and try to debug it on day 2, it might be hard to reproduce. When it's scoped to a ref, it's likely easier to debug, as we don't have time working against us. That's basically what I meant. I'm all for solutions that are simple, yet solve the widest range of problems while being easy to debug and extend. Maybe we can find such a solution here as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants