[JENKINS-38992] add ability to cache shared libraries #85

Closed
wants to merge 1 commit into from


@agabrys agabrys commented Sep 26, 2019

Hello,
We hit the same problem as Julien Duchesne in #50. We have a lot of jobs that use our library. Unfortunately, too many requests to our git server end with 403 permission denied (our server classifies our requests as a DoS attack 😉). I analyzed the comments in #50 and tried to prepare a new implementation proposal.

How it works:

  • a TTL on each cache entry determines when the entry is removed or refreshed, that is, how long the entry is valid. Unlike [JENKINS-38992] Allow caching library versions #50, the TTL is measured from creation time instead of access time
  • the operations responsible for reading the cache are in LibraryRetriever (as was suggested in [JENKINS-38992] Allow caching library versions #50). This means caching works both for libraries added in the Jenkins panel and for libraries loaded by the library step (developers can set additionalKey to prevent different git libraries from overwriting each other's cache)
  • a file locking mechanism has been introduced to manage the cache properly when a lot of jobs are executed at the same time. Unlike [JENKINS-38992] Allow caching library versions #50, which falls back to non-cache mode ASAP, in this proposal one slave downloads the library and all the others use it. An example:
    • we have 3 slaves (1, 2, 3) that are executed at the same time
    • 1 and 2 try to read the cache; the cache is outdated, so they try to update it (first read)
    • 2 tries to get the write lock and gets it, so it starts updating the cache (write)
    • 1 tries to get the write lock, but it is taken by 2, so it waits for the read lock one more time (second read)
    • 3 tries to get the first read lock, but the write operation is in progress, so it waits (first read)
    • 2 finishes updating the library, loads it, and ends the job
    • 1 tries to get the read lock; it is available, so it reads the cache and ends (second read)
    • 3 tries to get the read lock; it is available, so it reads the cache and ends (first read)
  • if slaves are not able to read the cache, they fall back to non-cache mode
  • versions can be excluded from caching with a regular expression (as was suggested in [JENKINS-38992] Allow caching library versions #50)
  • all parameters related to polling (waiting for locks), cleaning (deleting all entries), and the cache storage implementation are configurable
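
The first read / write / second read flow from the example above can be sketched roughly as follows. This is not the code from the PR: it assumes a single JVM and uses `ReentrantReadWriteLock` as an in-process stand-in for the file locks, and the names `CachedLibraryLoader`, `ttlMillis`, `waitMillis` and `downloader` are hypothetical.

```java
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical sketch: in-process stand-in for the file-lock based cache.
final class CachedLibraryLoader {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final long ttlMillis;              // how long an entry stays valid after creation
    private final long waitMillis;             // how long a reader waits for the read lock
    private final Supplier<String> downloader; // stands in for the real SCM checkout

    private String cached;        // guarded by lock
    private long createdAtMillis; // guarded by lock

    final AtomicInteger downloads = new AtomicInteger(); // for observation only

    CachedLibraryLoader(long ttlMillis, long waitMillis, Supplier<String> downloader) {
        this.ttlMillis = ttlMillis;
        this.waitMillis = waitMillis;
        this.downloader = downloader;
    }

    String load() {
        // First read: use the cache if it is still valid.
        Optional<String> hit = readIfFresh();
        if (hit.isPresent()) {
            return hit.get();
        }
        // Cache is outdated: exactly one caller wins the write lock and refreshes it.
        if (lock.writeLock().tryLock()) {
            try {
                if (!isFresh()) { // re-check, another writer may have just finished
                    cached = download();
                    createdAtMillis = System.currentTimeMillis();
                }
                return cached;
            } finally {
                lock.writeLock().unlock();
            }
        }
        // Second read: the lock was busy, so wait for it to be released and read again.
        hit = readIfFresh();
        if (hit.isPresent()) {
            return hit.get();
        }
        // Fallback to non-cache mode: download without touching the cache.
        return download();
    }

    private Optional<String> readIfFresh() {
        try {
            if (!lock.readLock().tryLock(waitMillis, TimeUnit.MILLISECONDS)) {
                return Optional.empty(); // timed out waiting for a writer
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
        try {
            return isFresh() ? Optional.of(cached) : Optional.empty();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Must be called while holding the read or the write lock.
    private boolean isFresh() {
        return cached != null && System.currentTimeMillis() - createdAtMillis < ttlMillis;
    }

    private String download() {
        downloads.incrementAndGet();
        return downloader.get();
    }
}
```

With a long TTL, repeated `load()` calls trigger a single download. With a TTL of 0 every call refreshes the entry, which is the degenerate case of a TTL that is too short.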

Stuff to improve:

  • [JENKINS-38992] Allow caching library versions #50 proposed adding SCM#getKey to the cache entry id. I didn't do it because it is available only for legacySCM. Modern SCM provides only SCMSource#getId, which according to the documentation does not guarantee the same value for the same repository. Please let me know what I should do. I can add both, or add it only for legacySCM
  • we use Kubernetes, and I see that the library is always downloaded before the slave container is created. It seems to me that libraries are always downloaded by code executed on master. If this is true, then I should be able to remove the file-based locking and use Java locks. What do you think?
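
To make the cache entry id question more concrete, here is one hypothetical way such an id could be composed. It is only an illustration, not the scheme used in this PR: the class `CacheEntryId` and the choice of SHA-256 are assumptions, and an SCM key could be appended as a further component if a stable one is available.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: derive a directory-safe cache entry id from its components.
final class CacheEntryId {
    static String of(String libraryName, String version, String additionalKey) {
        // Join the components with a separator that is unlikely to appear in any of them.
        String raw = libraryName + '\n' + version + '\n'
                + (additionalKey == null ? "" : additionalKey);
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(raw.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

Two libraries with the same name and version but a different additionalKey get different ids, so they do not overwrite each other's cache directory.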

Missing stuff:

  • I didn't write tests and documentation because I don't know whether you accept the main idea. If you think it has potential and could be merged, I'll add everything that is missing (we are now testing it on our live system 😉)
  • the commit message is ugly, but I'll write a nice, detailed message if you accept the main idea

Comments:
We are now testing it in an environment where a single build schedules 120 additional jobs. All of those jobs need our library. The cache is quite stable for TTL >= 30 seconds; for TTL = 3 seconds it often switches to non-cache mode. We haven't hit a problem with a broken cache yet (it is possible when two threads ask for the write lock at exactly the same time and then write to the same directory).
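
As a rough illustration of why the TTL value matters, the sketch below counts how often a creation-time TTL forces a refresh for a given sequence of job start times. It is a simplified model and not a measurement from our system: it ignores download time, lock waits and fallbacks, and the even 500 ms spacing of the 120 jobs is an assumption made only for the example. The class name `CreationTimeTtl` is hypothetical.

```java
// Hypothetical sketch: count cache refreshes under a creation-time TTL.
final class CreationTimeTtl {
    // An entry created at createdAtMillis is valid until createdAtMillis + ttlMillis (exclusive).
    static boolean isValid(long createdAtMillis, long ttlMillis, long nowMillis) {
        return nowMillis - createdAtMillis < ttlMillis;
    }

    // arrivalMillis must be sorted in ascending order.
    static int countRefreshes(long[] arrivalMillis, long ttlMillis) {
        int refreshes = 0;
        boolean hasEntry = false;
        long createdAtMillis = 0;
        for (long now : arrivalMillis) {
            if (!hasEntry || !isValid(createdAtMillis, ttlMillis, now)) {
                refreshes++;           // this job has to download the library
                createdAtMillis = now; // the TTL restarts at creation, not at access
                hasEntry = true;
            }
        }
        return refreshes;
    }

    public static void main(String[] args) {
        long[] arrivals = new long[120];
        for (int i = 0; i < arrivals.length; i++) {
            arrivals[i] = i * 500L; // assumed: one job every 500 ms
        }
        System.out.println(countRefreshes(arrivals, 30_000)); // prints 2
        System.out.println(countRefreshes(arrivals, 3_000));  // prints 20
    }
}
```

Under these assumptions a 30 second TTL needs 2 downloads for 120 jobs, while a 3 second TTL needs 20, so far more jobs compete for the write lock.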

Please let me know what I should improve. We really need this feature, so we have the capacity to adjust the PR to your comments.

Kind regards

@agabrys
Author

agabrys commented Oct 10, 2019

Hello,
I ran a test, and this is no longer valid:

we use Kubernetes, and I see that library is always downloaded before the slave container is created. It sounds to me that libraries are always downloaded by code executed on master. If this is true, then I should be able to remove locking based on files and use Java locks. What do you think?

The library step may be executed in steps, so it is of course possible to request the library on a slave.

One more improvement idea is to remove all Jenkins command-line parameters and create a new view to manage libraries. Then the cleaner etc. could be configured from the UI (or through an XML configuration file).

@agabrys
Author

agabrys commented Dec 20, 2019

Anybody? 🙏 We are open to introducing all necessary changes 🙂

@agabrys
Author

agabrys commented Jun 22, 2021

Closed in favour of #50.

@agabrys agabrys closed this Jun 22, 2021