Artifact Caching Proxy (ACP) does not cache artifacts from Maven Central #3969
Comments
Started working on this topic with @smerle33. We are working on this ACP setup and we'll update the issue accordingly.
Yes, I've found a setup which works, but it will require changes to the settings.xml (due to the relative path). Additionally, I've found that disabling IPv6 resolution decreases the number of system calls made by Nginx when resolving DNS names. I believe this could help in production on any of the dual-stack networks we have. I need more work to formally prove there is an improvement, but it's going in a good direction.
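For illustration, the IPv6 resolution tuning mentioned above can be done with Nginx's `resolver` directive. This is a minimal sketch, not the actual ACP configuration; the resolver address and upstream name are placeholder assumptions:

```nginx
# Sketch: disable AAAA (IPv6) lookups so Nginx makes fewer DNS-related
# system calls on dual-stack networks. The resolver IP is illustrative.
resolver 10.0.0.10 valid=30s ipv6=off;

server {
    listen 8080;

    location / {
        # Using a variable forces Nginx to resolve the upstream name at
        # request time via the resolver configured above.
        set $upstream "https://artifactory.example.com";
        proxy_pass $upstream;
    }
}
```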
Another interesting performance point with the current setup: I initially wanted to introduce the "central caching" as a new feature to avoid breaking the existing setups during the deployments, but as I'm seeing a 2x to 10x performance boost in dependency resolution on my local ACP test, I believe we should try delegating the fallback mechanism fully to Nginx.
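A fallback fully delegated to Nginx could be built with a named location and `error_page`, along these lines. This is a hedged sketch under assumed names and ports, not the actual helm-chart change:

```nginx
# Sketch: try the primary ACP upstream first and, on a miss or upstream
# error, retry the request against Maven Central directly.
# "acp_primary" and its server are illustrative placeholders.
upstream acp_primary {
    server artifactory.example.com:443;
}

server {
    listen 8080;

    location / {
        proxy_pass https://acp_primary;
        # Let Nginx handle upstream error statuses with error_page below.
        proxy_intercept_errors on;
        error_page 404 502 504 = @central_fallback;
    }

    location @central_fallback {
        # Fall back to Maven Central when the primary upstream fails.
        proxy_pass https://repo.maven.apache.org;
    }
}
```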
First step: update of the ACP helm chart where the Nginx configuration is located - jenkins-infra/helm-charts#1110. The local tests are really promising performance-wise: I get between 5x and 10x faster dependency resolution.
Next steps: the following PRs have been opened in draft with the new expected configuration.
The performance improvement is compelling, but the (potential) downside is that now these builds are using a different implementation of the fallback mechanism than local builds. As long as the two stay in sync, I think this should be fine. In other words, I believe the performance benefit is worth the (small) risk of diverging from local builds, simply mentioning it for completeness.
Thanks for the feedback and the confirmation. I believe we'll have to carefully check the Maven plugin and the incremental mechanism. Proceeding to the deployment of this new setup; we'll check the results in the next 24h.
This PR is not expected to be merged or reviewed and will stay as a draft as a matter of caution. It is aimed at testing jenkins-infra/helpdesk#3969
…#847 This PR is not expected to be merged or reviewed and will stay as a draft as a matter of caution. It is aimed at testing jenkins-infra/helpdesk#3969 and relies on jenkins-infra/pipeline-library#847
Deployed, messages added (status, ci.jenkins.io homepage, IRC). Email to developers incoming. First tests show no errors on the build of jenkinsci/jenkins-infra-test. I've triggered a BOM build to "load" the cache (note it is distributed, so we need a few builds before hitting a fully cached state). I've opened jenkins-infra/pipeline-library#847 to be used to print the Maven transfer information if needed for debugging (example: jenkinsci/jenkins-infra-test-plugin#120).
First results on the branch
=> At first superficial sight, caching Maven Central seems beneficial
=> I've triggered multiple builds of BOM and Core to warm up the caches everywhere (BOM has the
Update: watching the logs of DigitalOcean's ACP instances, removing the
Confirmed: @smerle33 is in the ingress access logs and maps to all the 404 requests ✅
Update:
I'll check the results in the same way on the Core builds and the git-plugin build later today
I watched some plugin builds today, and performance was back to the level I remember it being a few months ago. |
All the plugin builds that I watched today were performing very well. Builds where the compile stage previously took more than 5 minutes were consistently at 2 minutes or less.
I detected one surprise from the artifact caching proxy on DigitalOcean. https://ci.jenkins.io/job/Core/job/jenkins/job/PR-9099/1/ reports on the Java 11 build that an artifact unexpectedly had less content than was expected. I've marked the build as "keep forever" in case you want to investigate further. https://ci.jenkins.io/job/Core/job/jenkins/job/PR-9099/1/pipeline-console/?selected-node=96 is the specific task that shows the message.
The ACP received this "connection closed by JFrog" error: `21#21: *37961 upstream prematurely closed connection while reading upstream`, while the second build went through: https://ci.jenkins.io/job/Core/job/jenkins/job/PR-9099/2/
Thanks so much for investigating @smerle33 . I'm closing this as "not planned". |
And reopening because I failed to read that this was the entire issue, not just my report of a transient failure. Apologies for the noise of closing and reopening. |
I believe we can close this particular issue as "Done" under the following assumptions:
=> Without any objection, I'll close this in the next few hours
I agree that this issue can be closed. Is this another transient issue? |
Oh, it says |
Looks like a transient issue at first sight:
A few notes: there are only 2 lines of error logs for the whole ACP service (6 replicas across the 3 clouds) in the past 24h, as per the Datadog access log collection:
=> Both happened in DigitalOcean, and both artifacts are associated with a lot of warning messages such as
which I saw yesterday on certain big artifacts such as Jenkins WARs. Currently checking the metrics to see how the Nginx memory buffers are behaving: I believe setting http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_max_temp_file_size to
… gzip with upstream

This change is motivated by jenkins-infra/helpdesk#3969 (comment). The goal is to decrease or remove the warning messages:

```
an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/<...> while reading upstream
```

It includes the 2 following elements:

- Increasing the proxy buffer size given the size of the files served by ACP. It maps to the JFrog Artifactory recommended settings (https://jfrog.com/help/r/artifactory-how-to-enable-tls-within-the-jfrog-platform/artifactory-nginx-configuration).
  - Note: this will increase memory usage, but the ACP metrics show that there is enough memory to handle this change.
- Enabling gzip for requests to the upstream to decrease the size of the in-memory buffers required for responses.
  - Note: this will increase CPU usage, but the ACP metrics show that the CPU is doing almost nothing.

Signed-off-by: Damien Duportal <damien.duportal@gmail.com>
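A configuration along the lines of that commit message could look like the following sketch. The directive values and upstream name are illustrative assumptions, not the actual contents of the PR:

```nginx
# Sketch only: buffer sizes and upstream are placeholders.
location / {
    proxy_pass https://artifactory.example.com;

    # Larger in-memory buffers so big artifacts (e.g. Jenkins WARs)
    # are less likely to spill to /var/cache/nginx/proxy_temp
    # (the source of the "buffered to a temporary file" warnings).
    proxy_buffer_size 128k;
    proxy_buffers 40 128k;
    proxy_busy_buffers_size 256k;

    # Ask the upstream for compressed responses to shrink what has to
    # be buffered (trades a little CPU for memory).
    proxy_set_header Accept-Encoding gzip;
}
```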
For info: jenkins-infra/helm-charts#1119 has been opened to fine-tune the buffers.
Service(s)
Artifact-caching-proxy
Summary
The `mirrorOf` directive in our Artifact Caching Proxy (ACP) configuration in https://github.com/jenkins-infra/jenkins-infra/blob/ac501efc3cee3b613761e3026dd9798ddf827ec4/dist/profile/templates/jenkinscontroller/casc/artifact-caching-proxy.yaml.erb#L14 contains `!central`, which means that Central artifacts are downloaded directly from https://repo.maven.apache.org and not via ACP. This introduces a degree of fragility into our builds. While https://repo.maven.apache.org has not gone down completely, its performance fluctuates widely: sometimes it is quite fast, while other times it exhibits latency high enough to triple or quadruple build times, in some cases long enough for builds to time out and fail outright.

For local builds that use a persistent `~/.m2/repository` without ACP, the performance of Central is not very important. Building core or a plugin for the first time on a clean system will indeed result in a lot of Central artifacts being downloaded from https://repo.maven.apache.org, which could take a long time if you get unlucky and latency happens to be high, but at least the result will be cached for subsequent builds. Over time, the cache will fill up with commonly used artifacts, and the only Central artifacts downloaded from https://repo.maven.apache.org will be a handful of new releases.

In contrast, CI builds do not have a persistent `~/.m2/repository`, so every CI run (which includes multiple builds on different JDK versions and platforms) will download all Central artifacts from https://repo.maven.apache.org each time, exposing us to build fragility caused by performance problems in infrastructure that is out of our control. The design of CI builds not having a persistent `~/.m2/repository` is predicated on high-performance artifact downloading, which is in conflict with the use of an uncached third-party server in each build. To resolve this conflict, Central artifacts for CI builds should be cached in ACP, where they can be reliably downloaded with low latency in CI builds without a persistent `~/.m2/repository`.

Reproduction steps
Run a plugin, core, or BOM build 6 times a day for 6 days and observe the large amount of variance in build times based on the current performance of a third-party service that is out of our control.
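For reference, the `mirrorOf` mechanism at the heart of this issue lives in the Maven `settings.xml`: a mirror whose `mirrorOf` value contains `!central` excludes Maven Central from the mirror, while dropping the exclusion routes Central through the mirror as well. A minimal sketch, with an illustrative mirror id and URL (not the actual ACP values):

```xml
<!-- Sketch of a Maven settings.xml mirror entry; id and URL are placeholders. -->
<settings>
  <mirrors>
    <mirror>
      <id>artifact-caching-proxy</id>
      <url>https://acp.example.jenkins.io/public/</url>
      <!-- "*,!central" mirrors every repository EXCEPT Maven Central,
           so Central artifacts bypass ACP (the behavior described above). -->
      <!-- Using just "*" would send Central traffic through ACP too
           (the behavior this issue proposes). -->
      <mirrorOf>*,!central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
```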