Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix JENKINS-66898] make the cache thread safe #3

Merged
merged 8 commits into from Jul 29, 2022

Conversation

mawinter69
Copy link
Contributor

fix JENKINS-66898

This is the same as jenkinsci/workflow-cps-global-lib-plugin#151

there are several conditions where we have race conditions in
the shared library cache :

  • when multiple builds start at the same time and try to use an expired
    cached shared library. The first one will start deleting the cache
    dir. The second one will see that the cachedir was modified and
    consider it up-to-date while the first is still deleting

  • when multiple builds start at the same time with no cache entry
    available. The first has created the cachedir but not yet the lock
    file, the second will not see the lockfile but the cachedir

  • the background cleaner is about to clean when a new build starts

  • an administrator decides to clear the cache right before a new build
    wants to use the cache library

All these can lead to the situtation that the library for a build is
copied only partially and fails the build

This PR fixes the race conditions by using a ReadWriteLock to ensure nobody reads when a cache entry is created/updated/deleted

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests - that demonstrates feature works or fixes the issue

there are several conditions where we have race conditions in
the shared library cache :

- when multiple builds start at the same time and try to use an expired
  cached shared library. The first one will start deleting the cache
  dir. The second one will see that the cachedir was modified and
  consider it up-to-date while the first is still deleting

- when multiple builds start at the same time with no cache entry
  available. The first has created the cachedir but not yet the lock
  file, the second will not see the lockfile but the cachedir

- the background cleaner is about to clean when a new build starts

- an administrator decides to clear the cache right before a new build
  wants to use the cache library

All these can lead to the situtation that the library for a build is
copied only partially and fails the build

This PR fixes the race conditions by using a ReadWriteLock to ensure nobody reads when a cache entry is created/updated/deleted
@sboardwell
Copy link

This PR looks great. Looking forward to the feature.

add the possibility to force delete the cache dir ignoring any possible
problems this can cause.
@sboardwell
Copy link

Is anything holding this up?

Do not explicitly test the output. In rare cases job 2 starts before job
1, then the outputs are wrong. In any case the builds succeed which they
didn't before the fix
@mawinter69
Copy link
Contributor Author

@dwnusbaum could look into this?

Copy link
Member

@jtnord jtnord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than a trivial typo I beleive this is good to go

@@ -65,7 +66,13 @@
@Extension public class LibraryAdder extends ClasspathAdder {

private static final Logger LOGGER = Logger.getLogger(LibraryAdder.class.getName());

private static ConcurrentHashMap<String, ReentrantReadWriteLock> cacheRetrieveLock = new ConcurrentHashMap<>();
Copy link
Member

@jtnord jtnord Jul 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will over time grow unbounded, however I am not expecting a lot of changes in the name here (name, version, trusted, source) so using caffeine or WeakHashMap with synchronization is probably overkill (let alone the potential for premature eviction due to the use of a non interned string as a key).

…CachingConfiguration/help-forceDelete.html

Co-authored-by: James Nord <jtnord@users.noreply.github.com>
@car-roll car-roll added the bug Something isn't working label Jul 29, 2022
@car-roll car-roll requested review from jglick and removed request for jglick July 29, 2022 15:58
@car-roll car-roll enabled auto-merge July 29, 2022 16:01
@car-roll car-roll merged commit 84da9c5 into jenkinsci:master Jul 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
5 participants