[fix JENKINS-66898] make the cache thread safe #3
There are several race conditions in the shared library cache:
- Multiple builds start at the same time and try to use an expired cached shared library. The first one starts deleting the cache dir. The second one sees that the cache dir was modified and considers it up to date while the first is still deleting.
- Multiple builds start at the same time with no cache entry available. The first has created the cache dir but not yet the lock file; the second sees the cache dir but no lock file.
- The background cleaner is about to clean when a new build starts.
- An administrator decides to clear the cache right before a new build wants to use the cached library.
All of these can lead to a situation where the library for a build is copied only partially, which fails the build.
This PR fixes the race conditions by using a ReadWriteLock to ensure nobody reads while a cache entry is being created, updated, or deleted.
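The locking idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from this PR: `CacheEntrySketch`, `refresh`, `copyForBuild`, and `countPartialCopies` are hypothetical names, and an in-memory list stands in for the cache directory on disk. Builds copy the entry under the read lock, while anything that deletes or repopulates it takes the write lock.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical in-memory stand-in for one cached library directory. */
class CacheEntrySketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Not thread-safe on its own; every access goes through the lock.
    private final List<String> files = new ArrayList<>();

    /** Writer path: delete and repopulate the entry while no reader is active. */
    void refresh(List<String> newFiles) {
        lock.writeLock().lock();
        try {
            files.clear();               // stands in for deleting the cache dir
            for (String f : newFiles) {  // stands in for retrieving the library again
                files.add(f);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Reader path: a build copies the entry and cannot observe a half-written state. */
    List<String> copyForBuild() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(files);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Runs one refreshing thread against several readers and counts partial copies. */
    static int countPartialCopies(int readers, int rounds) {
        List<String> full = Arrays.asList("vars/a.groovy", "vars/b.groovy", "src/C.groovy");
        CacheEntrySketch entry = new CacheEntrySketch();
        entry.refresh(full);
        AtomicInteger partial = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        threads.add(new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                entry.refresh(full);
            }
        }));
        for (int r = 0; r < readers; r++) {
            threads.add(new Thread(() -> {
                for (int i = 0; i < rounds; i++) {
                    if (entry.copyForBuild().size() != full.size()) {
                        partial.incrementAndGet();
                    }
                }
            }));
        }
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return partial.get();
    }

    public static void main(String[] args) {
        System.out.println("partial copies: " + countPartialCopies(4, 10_000));
    }
}
```

Several builds can hold the read lock at once, so concurrent builds that only copy a valid cache entry do not block each other; only creation, update, and deletion are exclusive.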
src/main/java/org/jenkinsci/plugins/workflow/libs/LibraryCachingConfiguration.java
Without the fix in this PR, the test fails almost always.
This PR looks great. Looking forward to the feature.
Add the possibility to force-delete the cache dir, ignoring any problems this can cause.
…-lib-plugin into JENKINS-66898
Is anything holding this up?
Do not explicitly test the output. In rare cases job 2 starts before job 1, and then the expected outputs are wrong. In any case the builds succeed, which they didn't before the fix.
@dwnusbaum could you look into this?
...ources/org/jenkinsci/plugins/workflow/libs/LibraryCachingConfiguration/help-forceDelete.html
Other than a trivial typo, I believe this is good to go.
@@ -65,7 +66,13 @@
@Extension public class LibraryAdder extends ClasspathAdder {

    private static final Logger LOGGER = Logger.getLogger(LibraryAdder.class.getName());

    private static ConcurrentHashMap<String, ReentrantReadWriteLock> cacheRetrieveLock = new ConcurrentHashMap<>();
Over time this map will grow without bound. However, I am not expecting many distinct keys here (name, version, trusted, source), so using Caffeine or a WeakHashMap with synchronization is probably overkill (let alone the potential for premature eviction due to the use of a non-interned string as a key).
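To illustrate the trade-off in this comment, here is a minimal sketch of a per-key lock registry. It is an assumption about how such a map might be used, not the PR's actual code: `LockRegistrySketch`, `lockFor`, and the key strings are hypothetical, and the diff above does not show how `cacheRetrieveLock` is populated. A `ConcurrentHashMap` compares keys with `equals`, so two equal but non-interned strings resolve to the same lock, and the map holds one entry per distinct key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical registry handing out one read/write lock per cache key. */
class LockRegistrySketch {
    private final ConcurrentHashMap<String, ReentrantReadWriteLock> locks =
            new ConcurrentHashMap<>();

    /** Returns the lock for the key, creating it atomically on first use. */
    ReentrantReadWriteLock lockFor(String cacheKey) {
        return locks.computeIfAbsent(cacheKey, k -> new ReentrantReadWriteLock());
    }

    /** Number of distinct keys seen so far; entries are never removed. */
    int size() {
        return locks.size();
    }

    public static void main(String[] args) {
        LockRegistrySketch registry = new LockRegistrySketch();
        // Deliberately non-interned keys, as a key built at runtime would be.
        String first = new String("mylib@main");
        String second = new String("mylib@main");
        System.out.println("same lock: " + (registry.lockFor(first) == registry.lockFor(second)));
        System.out.println("entries: " + registry.size());
    }
}
```

A `WeakHashMap` would behave differently here: it drops an entry once its key object is no longer strongly reachable, so a key string built at runtime could be collected while a build still holds the lock, and a later lookup with an equal string would then create a second, unrelated lock for the same cache entry.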
…CachingConfiguration/help-forceDelete.html Co-authored-by: James Nord <jtnord@users.noreply.github.com>
fix JENKINS-66898
This is the same as jenkinsci/workflow-cps-global-lib-plugin#151