[JENKINS-38992] add ability to cache shared libraries #85

agabrys · 2019-09-26T11:26:46Z

Hello,
We hit the same problem as Julien Duchesne in #50. We have a lot of jobs that use our library. Unfortunately, too many requests to our git server end with 403 permission denied (our server classifies our requests as DoS 😉). I analyzed the comments in #50 and tried to prepare a new implementation proposal.

How it works:

- A TTL is used to determine when cache entries should be removed/refreshed (that is, how long the cache is valid). It works differently than in [JENKINS-38992] Allow caching library versions #50 because it uses creation time instead of access time.
- The operations responsible for reading the cache are in LibraryRetriever (as was suggested in [JENKINS-38992] Allow caching library versions #50). This means it works for libraries added in the Jenkins panel and for libraries loaded by the library step (developers can set additionalKey to prevent different git libraries from overwriting each other's cache).
- A file locking mechanism has been introduced to manage the cache properly when a lot of jobs are executed at the same time. It works differently than in [JENKINS-38992] Allow caching library versions #50, which falls back to non-cache mode ASAP. In this proposal one slave downloads the library, and all the others use it. An example:
  - we have 3 slaves (1, 2, 3), which are executed at the same time
  - 1 and 2 try to read the cache; the cache is outdated, so they try to update it (first read)
  - 2 tries to get the write lock and gets it, so it starts updating the cache (write)
  - 1 tries to get the write lock, but it is taken by 2, so it waits for the read lock one more time (second read)
  - 3 tries to get the first read lock, but the write operation is in progress, so it waits (first read)
  - 2 finishes updating the library, loads it and ends the job
  - 1 tries to get the read lock, it is available, so it reads the cache and ends (second read)
  - 3 tries to get the read lock, it is available, so it reads the cache and ends (first read)
- If slaves are not able to read the cache, they fall back to non-cache mode.
- It is possible to define which versions should be excluded from caching by using a regular expression (as was suggested in [JENKINS-38992] Allow caching library versions #50).
- All parameters related to polling (waiting for locks), cleaning (deleting all entries), and the cache storage implementation are configurable.
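
The flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the names `LibraryCacheSketch`, `isFresh`, `isCacheable` and `load` are hypothetical, the cache holds a single entry (a string standing in for the library directory), and an in-JVM `ReentrantReadWriteLock` is used in place of the file locks purely to keep the example short.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;
import java.util.regex.Pattern;

/** Hypothetical sketch of the cache flow; not the PR's actual classes. */
final class LibraryCacheSketch {
    private final Duration ttl;
    private final Pattern excluded;        // versions matching this regex are never cached
    private final long lockWaitMillis;     // how long a reader waits for a running writer
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String cached;                 // stands in for the cached library directory
    private Instant createdAt;             // creation time, not last access time

    LibraryCacheSketch(Duration ttl, String excludedVersionsRegex, long lockWaitMillis) {
        this.ttl = ttl;
        this.excluded = Pattern.compile(excludedVersionsRegex);
        this.lockWaitMillis = lockWaitMillis;
    }

    /** An entry is valid until createdAt + ttl, regardless of how often it was read. */
    static boolean isFresh(Instant createdAt, Duration ttl, Instant now) {
        return createdAt != null && now.isBefore(createdAt.plus(ttl));
    }

    boolean isCacheable(String version) {
        return !excluded.matcher(version).matches();
    }

    String load(String version, Supplier<String> download) {
        if (!isCacheable(version)) {
            return download.get();                  // excluded version: always non-cache mode
        }
        String hit = readIfFresh();                 // first read
        if (hit != null) {
            return hit;
        }
        if (lock.writeLock().tryLock()) {           // only one caller becomes the writer
            try {
                if (!isFresh(createdAt, ttl, Instant.now())) {
                    cached = download.get();
                    createdAt = Instant.now();
                }
                return cached;
            } finally {
                lock.writeLock().unlock();
            }
        }
        hit = readIfFresh();                        // second read: waits for the writer
        return hit != null ? hit : download.get();  // fall back to non-cache mode
    }

    private String readIfFresh() {
        try {
            if (!lock.readLock().tryLock(lockWaitMillis, TimeUnit.MILLISECONDS)) {
                return null;                        // writer took too long
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        try {
            return isFresh(createdAt, ttl, Instant.now()) ? cached : null;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With a 30-second TTL, repeated calls such as `cache.load("1.0", download)` would invoke `download` only once until the entry expires, while a version matching the exclusion regex is downloaded on every call. A real cache would also key entries by library name and version (plus something like additionalKey).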

Stuff to improve:

- In [JENKINS-38992] Allow caching library versions #50 it was proposed to add SCM#getKey to the cache entry id. I didn't do it because it is available only for legacySCM. Modern SCM provides only SCMSource#getId, which according to the documentation is not guaranteed to return the same value for the same repository. Please let me know what I should do. I can add both, or add it only for legacySCM.
- We use Kubernetes, and I see that the library is always downloaded before the slave container is created. It seems to me that libraries are always downloaded by code executed on master. If this is true, then I should be able to remove the file-based locking and use Java locks. What do you think?
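
For readers weighing file locks against in-JVM Java locks, a file-based lock might look like the sketch below. It assumes `java.nio.channels.FileLock`; the locking code in this PR is not shown here and may differ, and `FileLockSketch` and `runExclusively` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of a file-based lock; not the PR's actual code. */
final class FileLockSketch {

    /**
     * Runs the action while holding an exclusive lock on lockFile.
     * Returns false without running the action if the lock is unavailable.
     */
    static boolean runExclusively(Path lockFile, Runnable action) {
        try (FileChannel channel = FileChannel.open(
                lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock;
            try {
                lock = channel.tryLock();   // non-blocking attempt
            } catch (OverlappingFileLockException e) {
                return false;               // already held somewhere in this JVM
            }
            if (lock == null) {
                return false;               // held by another process
            }
            try {
                action.run();
                return true;
            } finally {
                lock.release();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A file lock like this coordinates separate processes, which matters if libraries can be requested from more than one JVM. According to the `FileLock` documentation, such locks are held on behalf of the entire Java virtual machine and are not suitable for controlling access by multiple threads within the same JVM, so threads inside one process would still need a Java lock.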

Missing stuff:

- I didn't write tests and documentation because I don't know if you accept the main idea. If you think it has potential and could be merged, then I'll add all the missing stuff (we are now testing it on our live system 😉).
- The commit message is ugly, but I'll add a nice detailed message if you accept the main idea.

Comments:
We are now testing it on an environment which schedules an additional 120 jobs for a single build. All those jobs need our library. The cache is quite stable for TTL >= 30 seconds; for TTL = 3 seconds it often switches to non-cache mode. We haven't hit a problem with a broken cache yet (it is possible when two threads ask for a write lock at exactly the same time and then write to the same directory).

Please let me know what I should improve. We really need this feature, so we have capacity for adjusting the PR to your comments.

Kind regards

agabrys · 2019-10-10T09:37:57Z

Hello,
I did a test and this is no longer valid:

> We use Kubernetes, and I see that the library is always downloaded before the slave container is created. It seems to me that libraries are always downloaded by code executed on master. If this is true, then I should be able to remove the file-based locking and use Java locks. What do you think?

The library step may be executed within steps, so it is of course possible to request the library on a slave.

One more improvement idea is to remove all the Jenkins command-line parameters and create a new view to manage libraries. Then the cleaner etc. could be configured from the UI (or by using an XML configuration file).

agabrys · 2019-12-20T11:01:38Z

Anybody? 🙏 We are open to introducing all necessary changes 🙂

agabrys · 2021-06-22T10:22:15Z

Closed in favour of #50.

agabrys force-pushed the feature/JENKINS-38992 branch from ae7755d to c32c050 · May 13, 2020 19:24

proposal (c5a352d)

agabrys force-pushed the feature/JENKINS-38992 branch from c32c050 to c5a352d · May 13, 2020 19:31

agabrys closed this Jun 22, 2021
