Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip timestamp append by default #138

Open
lukasz-mitka opened this issue Mar 21, 2023 · 16 comments
Open

Skip timestamp append by default #138

lukasz-mitka opened this issue Mar 21, 2023 · 16 comments

Comments

@lukasz-mitka
Copy link

Turns out for busy repos the default timestamp costs quite a lot since cache gets dropped a lot, thus increasing build times.
Please consider setting append-timestamp to false by default.
I would expect a major release with it to save people any issues.

Also what was the purpose of the timestamp? Does it give any benefit I'm missing?

@hendrikmuhs
Copy link
Owner

Can you show a comparison with and without this setting?

cache gets dropped a lot, thus increasing build times.

The setting controls how the ccache directory is up and dowloaded from github, not ccache itself. This has nothing todo with cache hits and misses in ccache.

All your questions are answered and discussed in the relevant issues and PRs, see
#126 and #117

As stated in #126 (comment) changing the default will break most existing workflows. This new setting is only useful for workflows that use their own logic for creating the cache key.

@lukasz-mitka
Copy link
Author

Can you show a comparison with and without this setting?

No, sorry. To busy to play around with it. But scenario is: multiple builds, up to 1GB for each build, multiple branches/PRs active at the same time.
Doing series of commits to a single branch creates new cache for each push, overflowing the GitHub cache (10GB limit).
Without timestamp existing caches are overwritten filling the limit more slowly.
Keep in mind each branch has a separate cache regardless of the key used.

The setting controls how the ccache directory is up and dowloaded from github, not ccache itself. This has nothing todo with cache hits and misses in ccache.

yeap, I'm talking about GitHub cache. Once over limit it will be removed resulting in lost potential.

All your questions are answered and discussed in the relevant issues and PRs, see
#126 and #117

Thank you for linking them. I missed them somehow.
So to summarize: timestamp is a result of someone on the internet using it for no apparent reason and you copying it?
Unless there's a real justification I suggest removing it.

changing the default will break most existing workflows

Probably it won't brake anything. Anyway, that's why I suggested adding it and releasing under new major version.

@hendrikmuhs
Copy link
Owner

No need to get rude.

I am always open for suggestions. I am not willing to change something based on gut feelings. I gave you the answer, it clearly stated that overwriting with the same key has a race condition:

As stated in #126 (comment) changing the default will break most existing workflows. This new setting is only useful for workflows that use their own logic for creating the cache key.

The official documentation contains:

You cannot change the contents of an existing cache. Instead, you can create a new cache with a new key.

Nevertheless I merged the option to disable time-stamping for users that want more control and know what they are doing.

@lukasz-mitka
Copy link
Author

Sorry, wasn't trying to be rude, just direct.

based on gut feelings

IMO using timestamp seems to be a 'gut feeling' of someone. I'd love to be proven wrong.
Switched all my workflows to append-timestamp: false, I'll let you know if I run into any issues.

I merged the option to disable time-stamping

I appreciate that!

You cannot change the contents of an existing cache.

Yes, it's being overwritten instead.

Final note: Please include timestamp info in main README.md.

@hendrikmuhs
Copy link
Owner

Happy to hear about your experience.

@vadz you indicated problems in #126 (comment), maybe you can share your experience, too.

I understand the documentation that every entry is immutable, further on the documentation describes that in case of an exact match, the creation of a new entry doesn't execute. But we want to create a new entry on every run, see Cache Hits and Misses.

Adding documentation and writing about this repeatedly coming up concern is still on my list. I did not find the time for it yet.

@lukasz-mitka
Copy link
Author

Cache Hits and Misses

That's very interesting, but it works a bit differently than described. Cache get saved even after cache hit.
Log extract:

# SETUP
Run hendrikmuhs/ccache-action@fba817f3c0db4f854d7a3ee688241d6754da166e
/usr/bin/docker exec  f8abbfaf3bce46645f87ab7480cb1017857e7cb8633b18ff88a9f0b54924156c sh -c "cat /etc/*release | grep ^ID"
Restore cache
  Received 243269632 of 859941342 (28.3%), 230.4 MBs/sec
  Received 595591168 of 859941342 (69.3%), 282.0 MBs/sec
  Received 855747038 of 859941342 (99.5%), 270.6 MBs/sec
  Received 859941342 of 859941342 (100.0%), 149.7 MBs/sec
  Cache Size: ~820 MB (859941342 B)
  /usr/bin/tar -xf /__w/_temp/7c7db733-b2e9-4c18-b01e-9f15ce7a0aa8/cache.tgz -P -C /__w/repo/repo -z
  Cache restored successfully
  Restored from cache key "ccache-Linux-key-".

# POST
Post job cleanup.
/usr/bin/docker exec  f8abbfaf3bce46645f87ab7480cb1017857e7cb8633b18ff88a9f0b54924156c sh -c "cat /etc/*release | grep ^ID"
ccache stats
Save cache using key "ccache-Linux-key-".
/usr/bin/tar --posix -cf cache.tgz --exclude cache.tgz -P -C /__w/repo/repo --files-from manifest.txt -z
Cache Size: ~1093 MB (1146236412 B)
Cache saved successfully

@vadz
Copy link
Contributor

vadz commented Mar 24, 2023

Maybe something has changed since January, but saving the cache under the same name definitely failed for me back then. I'll try to retest it again but won't be able to do it immediately.

FWIW I do agree that if not using timestamps works it should be the default.

@lukasz-mitka
Copy link
Author

It's been a month since I'm running 'no timestamp' in a couple of repos.
No issues noticed and no complaints from devs.

@hendrikmuhs please consider reopening this issue and making the change.

@hendrikmuhs hendrikmuhs reopened this Apr 21, 2023
@hansfn
Copy link
Contributor

hansfn commented May 23, 2023

Maybe something has changed since January, but saving the cache under the same name definitely failed for me back then. I'll try to retest it again but won't be able to do it immediately.

I just tested turning off appending of timestamps and immediately got:

Failed to save: Unable to reserve cache with key ..., another job may be creating this cache. More details: Cache already exists

This was the first job after turning it off so there was no old existing cache without timestamp. However, looking at repo/actions/caches I can see a (new) cache without timestamp. It seems it was save after all.

Is there some other way we can fix this - that the cache space is used up too quickly? Maybe keep using timestamps but remove old caches (same key, older timestamp) automatically? Is that doable?

Added: A final thought- could the error message be wrong? @lukasz-mitka Could you check the output of the "Post Load ccache" step? Maybe you have the error message all the time, but it still works?

@hendrikmuhs
Copy link
Owner

Thanks for letting us know! Seems the issue is still present.

Is there some other way we can fix this - that the cache space is used up too quickly? Maybe keep using timestamps but remove old caches (same key, older timestamp) automatically? Is that doable?

I would be very surprised if githubs cache does not handle that internally. Usually a cache works LRU-based, older cache entries should get evicted automatically when the cache space is full and a new entry is pushed. I think we should not try to optimize on that level.

However ccache has a lot of options besides the upper size, e.g. it is possible to evict entries on age, change compression, etc.

@hansfn
Copy link
Contributor

hansfn commented May 23, 2023

The reason I think we could do better than the default automatic eviction, is that as developers we know more about the selected workflow. It's not always true that the oldest cache should be evicted first. In a typical PR workflow, you would like to keep the cache for the PR as long as possible while multiple old caches for the master / main branch has no value. The point is that we are using up the space so fast that PR caches that we want to preserve is evicted.

I wonder if a new "replace-cache: true" setting could work? By replace, I mean after the new cache is saved with the new timestamp, older caches are deleted. I assume you can't delete the last cache for the same reason it failed to save when not appending timestamps.

@hendrikmuhs
Copy link
Owner

Caches are separated by branch, however the resource budget is per repository. I hoped to find the eviction logic in action/cache, but that seems to be just the client component. So the exact eviction logic seems githubs secret(if anyone has other information, please let us know). Ideally it is not just deleting the oldest entry, but takes the branch name into account.

However we don't know and I don't like implementing something based on vague assumptions. It would be nice if someone who faces the problem can instrument it's CI, e.g. by listing the caches as described here. This would allow us to reverse-engineer the logic by seeing which caches github choose when evicting entries.

It still sounds wrong to me to try fix cache eviction from the client. When we have proof that something works wrong w.r.t. cache evictions we can also contact github to fix it server side.

FWIW ccache itself has some interesting options as well, e.g. the cache could be pruned by age after every run to keep the individual ccaches smaller.

@hansfn
Copy link
Contributor

hansfn commented Oct 27, 2023

Just for the record: We still struggle with this, that "wrong" caches are evicted, when there is a lot of activity in our mono repo. I think what happens is that long living PR loose their cache, because a lot of other (short lived) PRs are merged to master. We just spent our entire included minutes quota on two days. It normally takes over 20 ...

Assuming a lot of people use the PR workflow, fixing this should be useful for other people - not just us. To repeat myself: Is there some way we could avoid saving all these unnecessary caches for the main / master branch?

Added:

To be honest: I don't understand what is going on. The action loads a cache, so it's not evicted, but ccache reports 100% miss. Any clues?

@hendrikmuhs
Copy link
Owner

@hansfn do you limit the cache size?

This might help to prevent eviction. In addition I wonder if the --evict-older-than ccache option could help here. It should be possible to try this as part of your workflows and if it turns out useful integrate it into the action as another option.

The eviction itself isn't controlled by the action, so I don't see how this can be fixed in this repository.

To be honest: I don't understand what is going on. The action loads a cache, so it's not evicted, but ccache reports 100% miss. Any clues?

Is it possible that the compiler suite/binary changed due to an update of the base image?

@Alexander-Wilms
Copy link

I encountered this as well:

Post job cleanup.
ccache stats
Save cache using key "ccache-linux-gcc-test-no-PR-".
/usr/bin/tar --posix -cf cache.tzst --exclude cache.tzst -P -C /home/runner/work/vcmi/vcmi --files-from manifest.txt --use-compress-program zstdmt
Failed to save: Unable to reserve cache with key ccache-linux-gcc-test-no-PR-, another job may be creating this cache. More details: Cache already exists. Scope: refs/heads/develop, Key: ccache-linux-gcc-test-no-PR-, Version: 15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295

I append the PR ID or "-no-PR" to the key so each PR uses its own cache so they don't interfere with each other. Of course this leads to a huge cache (~100 GB) and to avoid that the cache of older PRs gets evicted because newer PRs have a bunch of timestamped caches, I wanted to use append-timestamp: false

    # ensure the ccache for each PR is separate so they don't interfere with each other
    # fall back to ccache of the vcmi/vcmi repo if no PR-specific ccache is found
    - name: Ccache for PRs
      uses: hendrikmuhs/ccache-action@v1.2
      if: ${{ github.event.number != '' }}
      with:
        key: ${{ matrix.preset }}-PR-${{ github.event.number }}
        restore-keys: |
          ${{ matrix.preset }}-PR-${{ github.event.number }}
          ${{ matrix.preset }}-no-PR
        max-size: "5G" # actual cache takes up less space, at most ~1 GB
        verbose: 2
        append-timestamp: false

    - name: Ccache for everything but PRs
      uses: hendrikmuhs/ccache-action@v1.2
      if: ${{ github.event.number == '' }}
      with:
        key: ${{ matrix.preset }}-no-PR
        restore-keys: |
          ${{ matrix.preset }}-no-PR
        max-size: "5G" # actual cache takes up less space, at most ~1 GB
        verbose: 2
        append-timestamp: false

@hansfn
Copy link
Contributor

hansfn commented Oct 31, 2023

Thanks a lot for replying, @hendrikmuhs. Yes, we limit cache size as much as we can. (Typically 2-300 MB.)

Is it possible that the compiler suite/binary changed due to an update of the base image?

No, we compile using our own Docker image.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants