Retry HTTP connections to remote repositories during dependency resolution #4629
Comments
|
My situation is much the same. One of my machines fails 90% of the time, another about 30%, and the other two work perfectly fine. I am getting the same error, each time on a different dependency. I tried increasing timeouts with Then I got a message about the downloaded dependency: And immediately I got an error: Why am I getting an error about a dependency that was already downloaded? The full end of the log: |
|
Problem fixed after increasing RAM in the CI machine. |
|
Unfortunately, resources weren't the problem. Builds started failing again, no matter how many resources I gave them. Also, among the logs I found some failures with Retrying HTTP connections would be a solution here. |
|
I'm frequently getting This happens because JitPack builds artifacts on the fly, meaning that dependency resolution will take however long the build takes. The build in question is pretty simple, and took only 23 seconds to complete (see https://jitpack.io/network/bisq/bisq-p2p/-b1528bf3fd-1/build.log), but this was still long enough to cause Gradle to time out as seen above. It's worth noting here that I'm running into this now because I just migrated my project's builds from Maven to Gradle, and Maven never caused these timeout issues. I'd be happy to see the retry solutions discussed above, but perhaps it's worth considering simply increasing Gradle's default UPDATE: I just searched around a bit and saw #3370 and #3371, in which the
The original intent of these properties may have been for testing, but in actual fact they're useful in the real world for the reasons detailed above. I guess I'll use them anyway, but will make sure to feel bad about it now that they're "internal" ;) |
Problem: Gradle's default 30 second HTTP timeouts often cause bisq-* component builds to fail when resolving dependencies built on the fly via JitPack, e.g.: https://travis-ci.org/bisq-network/bisq-core/builds/356777615#L518-L525. Solution: Increase timeout values to 120 seconds, which should be more than sufficient. See: - gradle/gradle#3370 - gradle/gradle#3371 - gradle/gradle#4629
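A minimal gradle.properties sketch of the 120-second timeout fix described in this commit message. The property names are the internal, unsupported ones discussed elsewhere in this thread; whether the bisq commit used exactly these properties is an assumption.

```properties
# Internal, unsupported Gradle properties; values are in milliseconds.
# 120000 ms = the 120 seconds proposed in the commit message above.
systemProp.org.gradle.internal.http.connectionTimeout=120000
systemProp.org.gradle.internal.http.socketTimeout=120000
```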
|
Is it possible to get any traction on this or for gradle devs to provide a work-around? We have a similar situation to those above, and while we also fiddle with the connection/socket knobs, we need something like a proper retry mechanism around dependency resolution. Having something like failsafe built in, or at least copying the semantics, would be a huge step forward IMO. |
|
We are using Gradle 4.7 with a local Maven repository proxy (set up in Nexus 3) of the corporate repository located on the other continent. We are running into timeout problems with this setup. The scenario seems to be: Gradle asks the proxy for data; the data is not yet fetched and cached inside the proxy, so the proxy starts fetching it from the repository on the other continent; Gradle waits for the proxy's answer while the proxy is fetching the data; Gradle times out as it gets no data from the proxy. Is there a way to fine-tune the Gradle HTTP timeout and retry values (ideally per selected repository)? Or does anyone have ideas for bypassing the problem? We are thinking about forcing regular proxy synchronisation, so it usually has the artifacts cached and response times are short. |
|
@wszeboreq they do have internal props you can play with, as is noted in THIS pull request. These are the typical connection property configs, and do not take into account retries or anything of that nature, but it's something. |
|
@cdancy Thank you very much! Early testing suggests that passing '-Dorg.gradle.internal.http.socketTimeout=300000' (a socket timeout of 5 minutes) as a Gradle wrapper ('gradlew') parameter works and has helped. Should we file a feature request for the ability to configure various timeout connection properties officially, ideally also per repository? Also, I have no low-level network programming knowledge, but I would expect Nexus to send some kind of "keepalive" packets to the client (Gradle) while the proxy is fetching the remote artifacts, and the client to honor them (not timing out). |
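For reference, a sketch of the wrapper invocation this comment describes: the internal timeout properties are passed as JVM system properties on the command line (values in milliseconds; the 300000 value comes from the comment above, the build task name is illustrative):

```shell
# Socket timeout of 5 minutes (300000 ms) via the internal property;
# connectionTimeout can be raised the same way.
./gradlew build \
  -Dorg.gradle.internal.http.socketTimeout=300000 \
  -Dorg.gradle.internal.http.connectionTimeout=300000
```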
|
@wszeboreq if you follow the code/issue/pull-request trail it would seem they don't want to advertise these properties for various reasons. In either event we use them here anyway and just set them within our projects |
Problem: Gradle's default 30 second HTTP timeouts often cause bisq-* component builds to fail when resolving dependencies built on the fly via JitPack, e.g.: https://travis-ci.org/bisq-network/bisq-core/builds/356777615#L518-L525. Solution: Increase timeout values to 120 seconds, which should be more that sufficient. See: - gradle/gradle#3370 - gradle/gradle#3371 - gradle/gradle#4629
|
Expanding on this: it would also be nice if Gradle did this for certain 'obviously retryable' HTTP error codes (e.g. |
|
Another common issue with Nexus repositories with LDAP backing is an LDAP blip causing an intermittent 401 error when fetching dependencies. If there was a retry parameter this would not be an issue. Is there an undocumented retry for the http client that downloads the deps? I like the idea of introducing failsafe around the deps http fetch based on a gradle system property or something similar to that. Configurable retry with exponential backoffs, etc. |
|
Running into a similar problem with Artifactory SaaS. This is a major issue specifically in the case when the Gradle caches are cleaned and you redownload the dependencies. |
|
Definitely could use a retry feature. We are using a Nexus repo, and pretty regularly we see dependency downloads stall and cause a failure. Increasing the timeout doesn't help in this case because the download just stops partway through and never resumes. |
|
An undisclosed Gradle Enterprise customer also reports and votes for this issue. Reference 1918. |
|
Please add this ASAP. The lack of retries on these typically transient issues is the largest contributing factor to the decreased reliability of our CI builds in a large enterprise environment. Increasing the read timeout helps mitigate the issue but does not resolve it. Maybe I need to dig into it a bit more, but I never had this issue with Maven and am now wondering how Maven solved it. |
|
We also see several builds failing each day when trying to resolve dependencies. This blocks all following steps in our continuous deployment pipeline. Manually rescheduling failed builds is a time-consuming step we would like to avoid. |
This commit reworks the strategy used to blacklist repositories. When an error occurs while trying to access a remote resource, and the error is not a missing resource, we retry twice before actually blacklisting. Between tries we wait, and the wait increases exponentially with each attempt. There are two internal parameters which allow tweaking the behavior:
- `org.gradle.internal.repository.max.retries` (default 3) is the number of tries, initial attempt included
- `org.gradle.internal.repository.initial.backoff` (default 125) is the initial wait before retrying, in milliseconds

Fixes #4629
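The retry-with-exponential-backoff strategy this commit describes can be sketched as follows. This is an illustration of the algorithm, not Gradle's actual code: the class name and `withRetry` method are hypothetical; only the two property semantics (3 tries including the initial one, 125 ms initial backoff, doubling each time) come from the commit message.

```java
import java.util.concurrent.Callable;

public class ExponentialBackoffRetry {
    private final int maxRetries;       // cf. org.gradle.internal.repository.max.retries (default 3)
    private final long initialBackoff;  // cf. org.gradle.internal.repository.initial.backoff (default 125 ms)

    public ExponentialBackoffRetry(int maxRetries, long initialBackoffMillis) {
        this.maxRetries = maxRetries;
        this.initialBackoff = initialBackoffMillis;
    }

    // Runs the action up to maxRetries times (initial attempt included),
    // doubling the wait between attempts: 125 ms, 250 ms, ...
    public <T> T withRetry(Callable<T> action) throws Exception {
        long backoff = initialBackoff;
        Exception last = null;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    Thread.sleep(backoff); // wait before the next attempt
                    backoff *= 2;          // exponential backoff
                }
            }
        }
        throw last; // all attempts failed: the repository would now be blacklisted
    }
}
```

With the defaults, the total extra wait before blacklisting is 125 + 250 = 375 ms, which is the aggressiveness debated in the comments below this commit.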
|
@eskatos awesome and thanks!!! |
|
So 125ms, 250ms, 500ms? That seems way too aggressive for a default backoff? |
|
@nddipiazza agreed. If a network resource is not available, chances are it's not going to be available damn near immediately after the initial failure :) Maybe start at 500ms or 1 sec? IDK. At least it's configurable, so I won't complain too much. |
|
A bit of context on why the backoff is chosen like that: blacklisting is implemented in order to avoid builds hanging for too long when there are connectivity issues (and for reproducibility). With longer backoffs, we reduce the ability to interrupt builds early. So this is just a matter of finding the right tradeoff. It's likely people from China will have to tweak it to be higher, while if your connection is mostly stable a smaller backoff would make sense. In any case it's configurable via an internal property, and we'll work in the future in making it more configurable. Said differently, we prefer to fix and discuss the details later :) |
|
@melix we have a VERY heavy CI workload (lots and lots of containers, each running Gradle), all of which is banging away at an Artifactory server which, more times than we care, gives us network hiccups that don't recover as quickly as we'd like. But again ... this is getting something in place, it's been a long time coming, and I am just grateful you guys got something out there for devs to use. |
|
Yeah, by defaulting to 3 retries you might end up indirectly causing Nexus/Artifactory/etc. issues for administrators unexpectedly after their users start making the Gradle 5 upgrade. I am surprised not to see this as an option defaulting to retry = 1. |
|
A Gradle 5 upgrade wouldn't cause more issues if you don't have any networking problem. I guess you are saying that if you have a hammered Artifactory server that is not capable of handling the load, having short backoffs could make it worse. That's a possibility, but it wouldn't change the fact that the server is under load. Actually, whatever the backoff, you would have retries for the same number of requests. I'm happy to consider a longer default backoff, though, as long as it's reasonable. Also, as explained here, the default is configurable, but intentionally not supported (internal property), as we might want to configure it differently in the future. |
|
+1 for making the default longer in 5.0. It's odd to have a 125ms backoff combined with a 30s timeout. They should be in the same order of magnitude I think. |
|
Given the feedback here I'm increasing the default backoff to 1s. |
|
Having the same problem with version 5.5. Can somebody help with how to set the timeout to more than 30s? |
The default is 30 seconds; this uses 10 minutes to avoid things like:

```
* What went wrong:
A problem occurred configuring root project 'org.fdroid.fdroid'.
> Could not resolve all files for configuration ':classpath'.
> Could not download auto-value.jar (com.google.auto.value:auto-value:1.5.2)
> Could not get resource 'https://repo.maven.apache.org/maven2/com/google/auto/value/auto-value/1.5.2/auto-value-1.5.2.jar'.
> Read timed out
```

* https://stackoverflow.com/a/49646993
* gradle/gradle#4629 (comment)
* https://github.com/gradle/gradle/pull/3371/files
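A gradle.properties sketch of the 10-minute timeout this commit describes (600000 ms). The property names are the internal ones from the linked pull request; treating them as the mechanism this commit used is an assumption.

```properties
# Internal, unsupported Gradle properties; values in milliseconds.
# 600000 ms = the 10 minutes mentioned in the commit message above.
systemProp.org.gradle.internal.http.connectionTimeout=600000
systemProp.org.gradle.internal.http.socketTimeout=600000
```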
|
Hi, is the workaround mentioned here still valid? Looking at the answers and commits linked to this issue, I understood that one needs to update gradle.properties:

systemProp.org.gradle.internal.http.connectionTimeout=120000
systemProp.org.gradle.internal.http.socketTimeout=120000
systemProp.org.gradle.internal.repository.max.retries=10
systemProp.org.gradle.internal.repository.initial.backoff=500

One can verify that this is taken into account by setting this to This should retry on the conditions listed here. Could this be added to the user manual and exposed as public properties, or will they stay internal? Final set of properties in |
Should tentative be tentatives in the final properties set? |
Use case
At LinkedIn we run 100K Gradle builds per day and every day we have 20+ failed builds due to flakiness of connection to our Artifactory proxies (Gradle forum post with other details). Example build failure: gist.
While we work on making our infra more reliable, it would be great to have retry logic in Gradle, similar to what you already implemented for communication with the distributed cache.
Expected Behavior
Example implementation (up to the design):
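The example implementation referenced above is not included here. As a purely hypothetical sketch of the idea (all names are illustrative, and this builds on the "obviously retryable HTTP error codes" suggestion earlier in the thread): retry a download only when the status code indicates a transient server-side failure, and fail fast otherwise.

```java
import java.util.function.IntSupplier;

public class RetryableStatus {
    // 5xx responses are server-side and often transient; 4xx responses
    // (e.g. 404 for a genuinely missing artifact) are not worth retrying.
    static boolean isRetryable(int status) {
        return status >= 500 && status <= 599;
    }

    // Invokes the caller-supplied fetch up to maxAttempts times while the
    // response stays retryable; returns the final status code observed.
    static int fetchWithRetry(IntSupplier fetch, int maxAttempts) {
        int status = fetch.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && isRetryable(status); attempt++) {
            status = fetch.getAsInt();
        }
        return status;
    }
}
```

A real implementation would combine this with a backoff between attempts, as in the blacklisting rework merged earlier in this thread.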
Current Behavior
Build fails because a dependency cannot be resolved (example gist). The error message from Gradle shows the URL that failed. If I try this URL directly in my browser, it works. If I re-run the build, it works.
Suggested next steps
Gradle folks, what do you think about the idea and the example implementation?