
Update cache key creation to use date suffix #5463

Merged

Conversation

kamilkrzyskow
Collaborator

@kamilkrzyskow kamilkrzyskow commented May 3, 2023

Continuation of #5460
3rd attempt at choosing the correct caching strategy, wish me luck πŸ™‹

In my last comment on the previous PR I wrongly assumed that using the separate cache/restore and cache/save actions would allow overwriting an already created cache. I tried it, and unfortunately it turned out that caches are read-only once created 😞

Another suggestion, when using the separate actions, was to use ${{ hashFiles('.cache/**') }} to only update the cache when it actually changes. I tried it and ultimately I personally don't like it; this approach has multiple "traps":

  • If ${{ hashFiles('.cache/**') }} returns an empty string (no cache created yet), it would create a cache named with only the prefix, mkdocs-material-. I believe such a cache would take priority over mkdocs-material-xxx, because the restore-keys first look for an identical match and only then act as prefixes. Therefore there is a chance that newer caches wouldn't be used. At least I think so 😡
    EDIT: So, I read the action log incorrectly. If the .cache path doesn't exist, no cache is created at all; the mkdocs-material- cache was created because I tried to access an incorrect GitHub context property, and it kept being accessed afterwards 🀦
  • More chances for the user to make an error: 2 separate steps to copy and paste, 2 separate paths and keys to fill out. Showing people how to use the env could mitigate that, but I believe some people would still miss parts of the process.

Also, this approach restores the caches only via the restore-keys, so it can happen that ${{ hashFiles('.cache/**') }} returns the same hash as before, but the save action will still try to save and fail on the already existing cache key. Putting this all together, it seems to be a bad design choice, a "code smell" in a way, because most cache attempts will warn about a failure somewhere.
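For context, the rejected split-action variant would have looked roughly like this (a sketch only; the key prefix, action version, and path follow the mkdocs-material- convention used in this thread):

```yaml
# Restore the newest cache matching the prefix; hashFiles() is empty
# here if .cache does not exist yet.
- uses: actions/cache/restore@v3
  with:
    key: mkdocs-material-${{ hashFiles('.cache/**') }}
    path: .cache
    restore-keys: |
      mkdocs-material-
# ... build steps run here and (possibly) write to .cache ...
# The key expression is re-evaluated when this step runs, so it hashes
# the post-build contents. If nothing changed, the key already exists
# and the save step fails with a warning.
- uses: actions/cache/save@v3
  with:
    key: mkdocs-material-${{ hashFiles('.cache/**') }}
    path: .cache
```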


Having tried the separate actions and ultimately decided against using them, I turned to the date approach as my last resort. By the way, I found this cache workarounds guide; to update the cache it shows an example using github.run_id, which is basically the same as using github.sha πŸ˜…, but there is a comment # Can use time based key as well, so I think it's the right way 🀘.

I tried it and very much liked the simplicity of adding only one line to the workflow file, admittedly a Linux command, which could be considered cryptic, but oh well. Once familiar with the command, users can also modify the date format string to change the frequency of the caches to their liking. I set the frequency to 1 week. πŸŽ‰
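The merged approach boils down to roughly the following (a sketch; exact step wording in the publishing guide may differ slightly). `%V` is the ISO week number, so the primary key rotates weekly, while restore-keys fall back to the newest older cache in between:

```yaml
# The one added line: store the current ISO week number in the env.
- run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
- uses: actions/cache@v3
  with:
    # Primary key changes once per week, forcing a fresh save...
    key: mkdocs-material-${{ env.cache_id }}
    path: .cache
    # ...while older weekly caches are still restored via the prefix.
    restore-keys: |
      mkdocs-material-
```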


As a side note, trying to run the documentation.yml workflow on my fork revealed that some dependencies were missing in the Install Python dependencies step. Seems like they come with the Insiders files πŸ˜„

@kamilkrzyskow
Collaborator Author

I hope that the added explanation in the publishing guide is succinct enough. I won't be able to make changes till the evening.

@squidfunk
Owner

@kamilkrzyskow I was busy working on the social cards, which are now released, and we removed the caching section from the docs, since this should be part of the publishing guide (it's already mentioned there). Could you rebase your PR?

@kamilkrzyskow
Collaborator Author

kamilkrzyskow commented May 12, 2023

I think I did it correctly @squidfunk; the cache workflow example is still on the privacy plugin page πŸ€”

@squidfunk
Owner

Thanks again for taking the time to explain your rationale behind the proposed changes and update the PR. Your work is very much appreciated! I think this is now good to merge, so let's give it a spin!

@squidfunk squidfunk merged commit 93a63b8 into squidfunk:master May 14, 2023
3 checks passed
@squidfunk
Owner

squidfunk commented Sep 21, 2023

Coming back to this topic – I think the current solution is not ideal. The problem is that the cache is not updated if a hit occurred on the primary key (which you already acknowledged), which means plugins like the git-committers-plugin will run again and again, doing unnecessary work for data that did not change between commits.

I think the week number is too coarse. We need to update the cache more often. 😐

[Screenshot: Bildschirmfoto 2023-09-21 um 17 47 20]

@kamilkrzyskow
Collaborator Author

kamilkrzyskow commented Sep 21, 2023

Then using the whole date to make it update daily would have been better in your case πŸ€”, or perhaps an even more aggressive time-based cache, with hours and some modulo magic added to update it even more frequently.

Defining a cache name specific to certain changes is rather context-specific; I didn't think it would apply to the general public, so I just used the time, as updating on each commit basically defeats the idea of a cache.
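To illustrate the granularity options, the cache_id line could be varied like this (a sketch, assuming GNU date as on the ubuntu runners; the last variant uses integer division into 6-hour buckets rather than literal modulo):

```yaml
# Weekly, as merged in this PR: ISO week number, e.g. "19"
- run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
# Daily: full date, e.g. "2023-09-21"
- run: echo "cache_id=$(date --utc '+%F')" >> $GITHUB_ENV
# Every 6 hours: date plus the hour bucketed into 0-3, e.g. "2023-09-21-2"
- run: echo "cache_id=$(date --utc '+%F')-$(( $(date --utc '+%-H') / 6 ))" >> $GITHUB_ENV
```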

Perhaps you'd need to make use of the separate cache/restore and cache/save actions, hmm.
This might allow you to restore via the mkdocs-material- prefix and then save a new cache after the plugin detected changes, but I don't know how to turn this change detection into a proper unique cache name that isn't a commit hash.

@kamilkrzyskow
Collaborator Author

Yeah, you could probably use the #2 approach from my comment here #5460 (comment): save the cache under a key derived from hashing the .cache directory.
Though I have the impression that I did try it in some similar way, and I encountered errors when the cache name either wasn't unique (was already created before) or was in use by the restore action πŸ€”

@squidfunk
Owner

Thanks for your input! I'm not sure #2 would work, because the .cache folder is not committed, so it would hash the files of a non-existent directory to resolve the cache. Thus, the hash would always be the same. Only when the cache is saved at the end are there files in it, leading to a different hash. I think this might be a problem for builds that don't write into the .cache folder at all and then save it at the end: it would be a hit on the primary restore key each time with an empty folder, so it would never fall back to the alternate restore key. I might be missing something, though.

@kamilkrzyskow
Collaborator Author

Hmm, in your case the .cache directory should be re-created after the cache/restore action, since you already have a previously saved cache. The directory should also be created even without a previous cache, since you never disable cache creation. So there should be no issue with an empty or nonexistent directory.

But let's assume there is a case where the cache directory doesn't exist. Perhaps it's possible to use some sort of conditional in the workflow based on environment variables.

https://github.com/actions/cache/blob/main/save/README.md#always-save-cache

Like there, instead of the if: always(), maybe you can compute the cache hash twice, before building with MkDocs and after, and compare the hashes in the conditional.
That's just a loose idea; I don't know whether the if can access environment variables, but perhaps you could write a text file containing some information for the conditional.

I am not able to check it myself at the moment.
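The loose idea above might look something like the following (purely a sketch with illustrative key names; whether hashFiles() and the env context are both usable in a step-level if would need verifying):

```yaml
- uses: actions/cache/restore@v3
  with:
    # run_id makes each saved key unique; restores go via the prefix.
    key: mkdocs-material-${{ github.run_id }}
    path: .cache
    restore-keys: |
      mkdocs-material-
# Record the hash of the restored cache contents before the build.
- run: echo "hash_before=${{ hashFiles('.cache/**') }}" >> $GITHUB_ENV
# ... mkdocs build runs here ...
- uses: actions/cache/save@v3
  # Only save when the build actually changed the cache contents.
  if: hashFiles('.cache/**') != env.hash_before
  with:
    key: mkdocs-material-${{ github.run_id }}
    path: .cache
```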

@squidfunk
Owner

Okay, so I think I misread. If we split caching into both actions, we'll always save the cache and there should be no problem. If we don't do that, then GitHub will only save the cache if it was not able to restore. I'll give that a try ☺️

Thanks again for your time!
