Retry HTTP connections to remote repositories during dependency resolution #4629

Closed
mockitoguy opened this issue Mar 7, 2018 · 29 comments
Labels
a:feature A new functionality in:dependency-management DO NOT USE

@mockitoguy
Contributor

Use case

At LinkedIn we run 100K Gradle builds per day and every day we have 20+ failed builds due to flakiness of connection to our Artifactory proxies (Gradle forum post with other details). Example build failure: gist.

While we work on making our infra more reliable, it would be great to have retry logic in Gradle, similar to what you already implemented for communication with the distributed cache.

Expected Behavior

Example implementation (up to the design):

  • given timeout exception during dependency resolution, retry 2 more times.
  • if it still fails
    • don't attempt further retries during this build
    • in the exception message, write how many times we retried
  • if it works, emit a "lifecycle" message that HTTP request was successful with x retries
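The behaviour in the list above can be sketched roughly as follows. This is only an illustration in Python, not Gradle code: `RetryingFetcher`, `RetriesExhausted`, and the `log` callable (standing in for a "lifecycle" message) are hypothetical names, and a timeout is modelled as Python's built-in `TimeoutError`.

```python
class RetriesExhausted(Exception):
    """Raised when a request still times out after all allowed retries."""


class RetryingFetcher:
    """Hypothetical helper: retry timed-out requests a limited number of times."""

    def __init__(self, max_retries=2, log=print):
        self.max_retries = max_retries
        self.log = log
        # Set once a request has used up its retries, so that later
        # requests in the same build fail fast instead of retrying again.
        self.retries_disabled = False

    def fetch(self, request):
        allowed = 0 if self.retries_disabled else self.max_retries
        retries = 0
        while True:
            try:
                result = request()
            except TimeoutError as exc:
                if retries >= allowed:
                    self.retries_disabled = True
                    # Report how many times we retried in the error message.
                    raise RetriesExhausted(
                        f"request timed out after {retries} retries"
                    ) from exc
                retries += 1
                continue
            if retries:
                self.log(f"HTTP request was successful with {retries} retries")
            return result
```

A caller would wrap each download in `fetcher.fetch(lambda: download(url))`, where `download` is whatever function performs the HTTP request.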

Current Behavior

Build fails because a dependency cannot be resolved (example gist). The error message from Gradle shows the URL that failed. If I try this URL directly in my browser, it works. If I re-run the build, it works.

Suggested next steps

Gradle folks, what do you think about the idea and the example implementation?

@bigdaz bigdaz added in:dependency-management DO NOT USE a:feature A new functionality labels Mar 8, 2018
@neworld

neworld commented Mar 10, 2018

My situation is much the same. One of my machines fails 90% of the time, another about 30%, and the other two work perfectly fine. I am getting the same error, each time on a different dependency. I tried increasing the timeouts with -Dhttp.socketTimeout=120000 -Dhttp.connectionTimeout=120000, but the whole build fails in under 1 minute. I found something strange in the log. First, I am getting JVM status events and some connections are closed:

Emitting JVM memory status event {Maximum: 477626368, Committed: 265814016}
Emitting JVM memory status event {Maximum: 477626368, Committed: 265814016}
Emitting JVM memory status event {Maximum: 477626368, Committed: 265814016}
RMI Scheduler(0): close connection
RMI Scheduler(0): close connection
RMI TCP Connection(1)-127.0.0.1: (port 41678) connection closed
RMI TCP Connection(1)-127.0.0.1: close connection
Emitting JVM memory status event {Maximum: 477626368, Committed: 265814016}
Emitting JVM memory status event {Maximum: 477626368, Committed: 265814016}
Emitting JVM memory status event {Maximum: 477626368, Committed: 265814016}
http-outgoing-26: Close connection
http-outgoing-26: Shutdown connection
Connection discarded

Then I got a message about the downloaded dependency:

Connection released: [id: 26][route: {s}->https://plugins.gradle.org:443][total kept alive: 9; route allocated: 2 of 20; total allocated: 9 of 20]
Completing Build operation 'Download https://plugins.gradle.org/m2/org/jetbrains/kotlin/kotlin-android-extensions/1.2.30/kotlin-android-extensions-1.2.30.jar'
Downloaded artifact 'kotlin-android-extensions.jar (org.jetbrains.kotlin:kotlin-android-extensions:1.2.30)' from resolver: Gradle Central Plugin Repository
Completing Build operation 'Resolve kotlin-android-extensions.jar (org.jetbrains.kotlin:kotlin-android-extensions:1.2.30)'
Build operation 'Resolve kotlin-android-extensions.jar (org.jetbrains.kotlin:kotlin-android-extensions:1.2.30)' completed

And immediately afterwards I got an error:

* What went wrong:
A problem occurred configuring root project 'game'.
> Could not resolve all artifacts for configuration ':classpath'.
   > Could not download kotlin-android-extensions.jar (org.jetbrains.kotlin:kotlin-android-extensions:1.2.30)
      > Could not get resource 'https://plugins.gradle.org/m2/org/jetbrains/kotlin/kotlin-android-extensions/1.2.30/kotlin-android-extensions-1.2.30.jar'.
         > Could not GET 'https://plugins.gradle.org/m2/org/jetbrains/kotlin/kotlin-android-extensions/1.2.30/kotlin-android-extensions-1.2.30.jar'.
            > Read timed out

Why am I getting an error about a dependency that was already downloaded? The full end of the log:

23:13:21.849 [DEBUG] [org.gradle.process.internal.health.memory.MemoryManager] Emitting JVM memory status event {Maximum: 477626368, Committed: 265814016}
23:13:26.849 [DEBUG] [org.gradle.process.internal.health.memory.MemoryManager] Emitting JVM memory status event {Maximum: 477626368, Committed: 265814016}
23:13:31.849 [DEBUG] [org.gradle.process.internal.health.memory.MemoryManager] Emitting JVM memory status event {Maximum: 477626368, Committed: 265814016}
23:13:34.174 [DEBUG] [sun.rmi.transport.tcp] RMI Scheduler(0): close connection
23:13:34.204 [DEBUG] [sun.rmi.transport.tcp] RMI Scheduler(0): close connection
23:13:34.352 [DEBUG] [sun.rmi.transport.tcp] RMI TCP Connection(1)-127.0.0.1: (port 41678) connection closed
23:13:34.352 [DEBUG] [sun.rmi.transport.tcp] RMI TCP Connection(1)-127.0.0.1: close connection
23:13:36.849 [DEBUG] [org.gradle.process.internal.health.memory.MemoryManager] Emitting JVM memory status event {Maximum: 477626368, Committed: 265814016}
23:13:41.849 [DEBUG] [org.gradle.process.internal.health.memory.MemoryManager] Emitting JVM memory status event {Maximum: 477626368, Committed: 265814016}
23:13:46.849 [DEBUG] [org.gradle.process.internal.health.memory.MemoryManager] Emitting JVM memory status event {Maximum: 477626368, Committed: 265814016}
23:13:47.591 [DEBUG] [org.apache.http.impl.conn.DefaultManagedHttpClientConnection] http-outgoing-26: Close connection
23:13:47.592 [DEBUG] [org.apache.http.impl.conn.DefaultManagedHttpClientConnection] http-outgoing-26: Shutdown connection
23:13:47.592 [DEBUG] [org.apache.http.impl.execchain.MainClientExec] Connection discarded
23:13:47.592 [DEBUG] [org.apache.http.impl.conn.PoolingHttpClientConnectionManager] Connection released: [id: 26][route: {s}->https://plugins.gradle.org:443][total kept alive: 9; route allocated: 2 of 20; total allocated: 9 of 20]
23:13:47.593 [DEBUG] [org.gradle.internal.progress.DefaultBuildOperationExecutor] Completing Build operation 'Download https://plugins.gradle.org/m2/org/jetbrains/kotlin/kotlin-android-extensions/1.2.30/kotlin-android-extensions-1.2.30.jar'
23:13:47.595 [DEBUG] [org.gradle.api.internal.artifacts.ivyservice.ivyresolve.CachingModuleComponentRepository] Downloaded artifact 'kotlin-android-extensions.jar (org.jetbrains.kotlin:kotlin-android-extensions:1.2.30)' from resolver: Gradle Central Plugin Repository
23:13:47.595 [DEBUG] [org.gradle.internal.progress.DefaultBuildOperationExecutor] Completing Build operation 'Resolve kotlin-android-extensions.jar (org.jetbrains.kotlin:kotlin-android-extensions:1.2.30)'
23:13:47.595 [DEBUG] [org.gradle.internal.progress.DefaultBuildOperationExecutor] Build operation 'Resolve kotlin-android-extensions.jar (org.jetbrains.kotlin:kotlin-android-extensions:1.2.30)' completed
23:13:47.595 [DEBUG] [org.gradle.internal.work.DefaultWorkerLeaseService] Worker lease root.1.56.57 completed (1 worker(s) in use)
23:13:47.595 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] main: released lock on root.1.56.57
23:13:47.597 [DEBUG] [org.gradle.internal.progress.DefaultBuildOperationExecutor] Completing Build operation 'Resolve files of :classpath'
23:13:47.597 [DEBUG] [org.gradle.internal.progress.DefaultBuildOperationExecutor] Build operation 'Resolve files of :classpath' completed
23:13:47.599 [DEBUG] [org.gradle.internal.progress.DefaultBuildOperationExecutor] Completing Build operation 'Apply script build.gradle.kts to root project 'game''
23:13:47.599 [DEBUG] [org.gradle.configuration.project.BuildScriptProcessor] Timing: Running the build script took 37.687 secs
23:13:47.600 [DEBUG] [org.gradle.internal.progress.DefaultBuildOperationExecutor] Build operation 'Apply plugin org.gradle.kotlin.dsl.provider.KotlinScriptBasePlugin to root project 'game'' started
23:13:47.600 [DEBUG] [org.gradle.internal.progress.DefaultBuildOperationExecutor] Build operation 'Apply plugin org.gradle.kotlin.dsl.provider.KotlinScriptRootPlugin to root project 'game'' started
23:13:47.601 [DEBUG] [org.gradle.model.internal.registry.DefaultModelRegistry] Project : - Registering model element 'tasks.kotlinDslAccessorsSnapshot' (hidden = false)
23:13:47.602 [DEBUG] [org.gradle.internal.progress.DefaultBuildOperationExecutor] Completing Build operation 'Apply plugin org.gradle.kotlin.dsl.provider.KotlinScriptRootPlugin to root project 'game''
23:13:47.602 [DEBUG] [org.gradle.internal.progress.DefaultBuildOperationExecutor] Build operation 'Apply plugin org.gradle.kotlin.dsl.provider.KotlinScriptRootPlugin to root project 'game'' completed
23:13:47.602 [DEBUG] [org.gradle.model.internal.registry.DefaultModelRegistry] Project : - Registering model element 'tasks.kotlinDslAccessorsReport' (hidden = false)
23:13:47.603 [DEBUG] [org.gradle.internal.progress.DefaultBuildOperationExecutor] Completing Build operation 'Apply plugin org.gradle.kotlin.dsl.provider.KotlinScriptBasePlugin to root project 'game''
23:13:47.603 [DEBUG] [org.gradle.internal.progress.DefaultBuildOperationExecutor] Build operation 'Apply plugin org.gradle.kotlin.dsl.provider.KotlinScriptBasePlugin to root project 'game'' completed
23:13:47.603 [DEBUG] [org.gradle.internal.progress.DefaultBuildOperationExecutor] Completing Build operation 'Configure project :'
23:13:47.603 [DEBUG] [org.gradle.internal.progress.DefaultBuildOperationExecutor] Completing Build operation 'Configure build'
23:13:47.623 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]
23:13:47.624 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] FAILURE: Build failed with an exception.
23:13:47.624 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]
23:13:47.624 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] * What went wrong:
23:13:47.625 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] A problem occurred configuring root project 'game'.
23:13:47.626 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] > Could not resolve all artifacts for configuration ':classpath'.
23:13:47.627 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]    > Could not download kotlin-android-extensions.jar (org.jetbrains.kotlin:kotlin-android-extensions:1.2.30)
23:13:47.628 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]       > Could not get resource 'https://plugins.gradle.org/m2/org/jetbrains/kotlin/kotlin-android-extensions/1.2.30/kotlin-android-extensions-1.2.30.jar'.
23:13:47.628 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]          > Could not GET 'https://plugins.gradle.org/m2/org/jetbrains/kotlin/kotlin-android-extensions/1.2.30/kotlin-android-extensions-1.2.30.jar'.
23:13:47.629 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]             > Read timed out
23:13:47.630 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]
23:13:47.630 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] * Try:
23:13:47.631 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] Run with --stacktrace option to get the stack trace.  Run with --scan to get full insights.
23:13:47.633 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]
23:13:47.633 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] * Get more help at https://help.gradle.org
23:13:47.633 [ERROR] [org.gradle.internal.buildevents.BuildResultLogger]
23:13:47.634 [ERROR] [org.gradle.internal.buildevents.BuildResultLogger] BUILD FAILED in 1m 41s

@neworld

neworld commented Mar 10, 2018

Problem fixed after increasing RAM in the CI machine.

@neworld

neworld commented Mar 11, 2018

Unfortunately, resources weren't the problem. Builds started failing again, no matter how many resources I give the machine. Also, in the logs I found some failures with "SSL peer shut down incorrectly". Of course, retrying works fine.

Retrying HTTP connections would be a solution here.

@cbeams
Contributor

cbeams commented Mar 21, 2018

I'm frequently getting "Read timed out" errors from Gradle when resolving a new version of a dependency from JitPack, e.g.:

~/Work/bisq-network/bisq-core[master]
$ gradle clean build

FAILURE: Build failed with an exception.

* What went wrong:
Could not resolve all files for configuration ':compileClasspath'.
> Could not resolve network.bisq:bisq-p2p:-SNAPSHOT.
  Required by:
      project :
   > Could not resolve network.bisq:bisq-p2p:-SNAPSHOT.
      > Unable to load Maven meta-data from https://jitpack.io/network/bisq/bisq-p2p/-SNAPSHOT/maven-metadata.xml.
         > Could not HEAD 'https://jitpack.io/network/bisq/bisq-p2p/-SNAPSHOT/maven-metadata.xml'.
            > Read timed out

This happens because JitPack builds artifacts on the fly, meaning that dependency resolution will take however long the build takes. The build in question is pretty simple, and took only 23 seconds to complete (see https://jitpack.io/network/bisq/bisq-p2p/-b1528bf3fd-1/build.log), but this was still long enough to cause Gradle to time out as seen above.

It's worth noting here that I'm running into this now because I just migrated my project's builds from Maven to Gradle, and Maven never caused these timeout issues.

I'd be happy to see the retry solutions discussed above, but perhaps it's worth considering simply increasing Gradle's default http.connectionTimeout setting?

UPDATE: I just searched around a bit and saw #3370 and #3371, in which the connectionTimeout was in fact raised from a 10s default to a 30s default. So perhaps I somehow just crossed the threshold with my 23 second build above (I'm using Gradle 4.6 and the changes above made it into 4.4 if I understand correctly). What I really don't get, though, is why these properties were moved to the internal namespace. Why wouldn't you want these to be user-editable values? @oehme wrote in #3370 (comment):

Also, the timeout property should be prefixed with org.gradle.internal. It is only there for testing and not meant to be user-editable.

The original intent of these properties may have been for testing, but in actual fact they're useful in the real world for the reasons detailed above. I guess I'll use them anyway, but will make sure to feel bad about it now that they're "internal" ;)

cbeams added a commit to bisq-network/bisq-statsnode that referenced this issue Mar 22, 2018
Problem: Gradle's default 30 second HTTP timeouts often cause bisq-*
component builds to fail when resolving dependencies built on the fly
via JitPack, e.g.:
https://travis-ci.org/bisq-network/bisq-core/builds/356777615#L518-L525.

Solution: Increase timeout values to 120 seconds, which should be more
than sufficient.

See:
 - gradle/gradle#3370
 - gradle/gradle#3371
 - gradle/gradle#4629
cbeams added commits with the same message to bisq-network/bisq-monitor-deprecated, bisq-network/bisq-p2p, bisq-network/bisq-seednode, bisq-network/bisq-core, bisq-network/dao, cbeams/bisq-pricenode and bisq-network/bisq that referenced this issue Mar 22, 2018
@cdancy

cdancy commented Mar 29, 2018

Is it possible to get any traction on this or for gradle devs to provide a work-around? We have a similar situation to those above, and while we also fiddle with the connection/socket knobs, we need something like a proper retry mechanism around dependency resolution. Having something like failsafe built in, or at least copying the semantics, would be a huge step forward IMO.

@wszeboreq

We are using Gradle 4.7 with a local Maven repository proxy (set up in Nexus 3) for a corporate repository located on another continent. We are running into timeout problems with this setup. The scenario seems to be: Gradle asks the proxy for data; the data is not yet fetched and cached inside the proxy, so the proxy starts fetching it from the repository on the other continent; Gradle waits for the proxy's answer while the proxy is fetching the data; Gradle times out because it gets no data from the proxy.

Is there a way to fine-tune the Gradle HTTP timeout and retry values (ideally per repository)? Or does anyone have ideas for bypassing the problem? We are thinking about forcing regular proxy synchronisation, so that the proxy usually has the artifacts cached and response times are short.

@cdancy

cdancy commented May 30, 2018

@wszeboreq they do have internal props you can play with, as noted in THIS pull request. These are the typical connection property configs and do not take into account retries or anything of that nature, but it's something.

@wszeboreq

@cdancy Thank you very much! Early testing suggests that passing '-Dorg.gradle.internal.http.socketTimeout=300000' (a socket timeout of 5 minutes) as a Gradle wrapper ('gradlew') parameter works and has helped.

Should we file a feature request for the ability to configure the various connection timeout properties officially, ideally also per repository?

Also, I have no low-level network programming knowledge, but I would expect Nexus to send some kind of "keepalive" packets to the client (Gradle) while the proxy is fetching the remote artifacts, and the client to honor them (not time out).

@cdancy

cdancy commented May 30, 2018

@wszeboreq if you follow the code/issue/pull-request trail it would seem they don't want to advertise these properties for various reasons. In any event we use them here anyway and just set them within our project's gradle.properties file, as we've hit similar issues and it has helped. It would be nice to get some actual retry logic like I noted in one of my comments above, but this will have to suffice for now.
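For anyone scripting this in CI, here is a small hedged sketch in Python of how the gradle.properties entries could be generated. The function name `render_timeout_properties` is made up for illustration. It relies on two things: Gradle treats `systemProp.`-prefixed entries in gradle.properties as JVM system properties, and the two internal timeout properties discussed in this thread take values in milliseconds.

```python
def render_timeout_properties(connection_timeout_s, socket_timeout_s):
    """Return gradle.properties lines for the two internal HTTP timeouts.

    Timeouts are given in whole seconds and written out in milliseconds,
    which is the unit the properties expect.
    """
    settings = {
        "org.gradle.internal.http.connectionTimeout": connection_timeout_s,
        "org.gradle.internal.http.socketTimeout": socket_timeout_s,
    }
    # The "systemProp." prefix makes Gradle set each entry as a system property.
    return [
        f"systemProp.{name}={seconds * 1000}"
        for name, seconds in settings.items()
    ]


if __name__ == "__main__":
    # Example: 2 minute connection timeout, 5 minute socket timeout.
    print("\n".join(render_timeout_properties(120, 300)))
```

Taking seconds as input and converting in one place avoids mixing up values such as 120000 (2 minutes) and 300000 (5 minutes).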

mrosseel pushed a commit with the same message as the bisq-network/bisq-statsnode commit above to mrosseel/pricenode that referenced this issue Jun 12, 2018
@Aaron1011
Contributor

Aaron1011 commented Jul 3, 2018

Expanding off of this:

It would also be nice if gradle did this for certain 'obviously retryable' HTTP error codes (e.g. 504 Gateway Timeout), and not just client-side timeouts.
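As a rough illustration of that idea, a retry decision could treat a small set of gateway-style status codes like a client-side timeout. This is a sketch in Python, not how Gradle classifies failures: only 504 comes from the comment above, the other two codes (502, 503) are an assumption, and `is_retryable` is a hypothetical helper.

```python
# 504 is the example named in this thread; 502 and 503 are assumed here
# to be similarly transient and are included only for illustration.
RETRYABLE_STATUS_CODES = frozenset({502, 503, 504})


def is_retryable(status_code=None, timed_out=False):
    """Return True if a failed request looks worth retrying.

    A client-side timeout is always retryable; otherwise the decision
    depends on the HTTP status code returned by the server.
    """
    return timed_out or status_code in RETRYABLE_STATUS_CODES
```

With this rule, a 404 for a missing artifact is never retried, while a 504 from an overloaded proxy is.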

@nddipiazza

nddipiazza commented Jul 25, 2018

Another common issue with Nexus repositories with LDAP backing is an LDAP blip causing an intermittent 401 error when fetching dependencies. If there was a retry parameter this would not be an issue.

Is there an undocumented retry for the http client that downloads the deps?

I like the idea of introducing failsafe around the deps http fetch based on a gradle system property or something similar to that. Configurable retry with exponential backoffs, etc.

@gkohen

gkohen commented Aug 2, 2018

Running into a similar problem with Artifactory SaaS. This is a major issue specifically in the case when the Gradle caches are cleaned and you redownload the dependencies.
In our case these users are not connected to LDAP or SAML (internal users). I'd rather see the retry before timeout (fail quickly and retry). But I'd welcome both.

@designed4device

Definitely could use a retry feature. We are using a Nexus repo and pretty regularly we see dependency downloads stall and cause a failure. Increasing the timeout doesn't help in this case because the download just stops partway through and never resumes.

@george-moberly

An undisclosed Gradle Enterprise customer also reports this issue and votes for it. Reference 1918.

@Hazok

Hazok commented Sep 20, 2018

Please add this ASAP. The lack of retries on these types of issues that are typically transient is the largest contributing factor to decreasing the reliability of our CI builds in a large enterprise environment.

Increasing the read timeout does help mitigate the issue, but does not resolve the issue.

Maybe I need to dig into it a bit more, but I never had this issue with Maven and am now wondering how Maven solved it.

@acetous

acetous commented Sep 25, 2018

We also see several builds failing each day when trying to resolve dependencies. This blocks all following steps in our continuous deployment pipeline. Manually rescheduling failed builds is a time-consuming step we would like to avoid.
Please add retry support or an easy way to configure timeouts for our build environments.

@ljacomet ljacomet added this to the 5.0 RC1 milestone Oct 9, 2018
@melix melix self-assigned this Oct 10, 2018
melix added a commit that referenced this issue Oct 11, 2018
This commit reworks the strategy used to blacklist repositories. In the case
an error occurs when trying to access a remote resource, if the error is not
a missing resource, we're going to retry twice before actually blacklisting.

Between each try, we're going to wait, and the wait is increasing between
each trial exponentially. There are two internal parameters which allow
tweaking the behavior:

- `org.gradle.internal.repository.max.retries` (default 3) is the number
of retries (initial included)
- `org.gradle.internal.repository.initial.backoff` is the initial time
before retrying, in milliseconds (default 125)

Fixes #4629
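The strategy described in this commit message can be sketched as follows. This is a Python illustration, not the actual Gradle implementation: `access_with_backoff` and `MissingResource` are hypothetical names, and doubling the wait after each failure is an assumption (the message only says the wait increases exponentially).

```python
import time


class MissingResource(Exception):
    """The repository answered, but the resource does not exist."""


def access_with_backoff(access, max_tries=3, initial_backoff_ms=125,
                        sleep=time.sleep):
    """Call access(), retrying failures with an exponentially growing wait.

    max_tries counts the initial attempt, mirroring the "initial included"
    wording of the commit message.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    backoff_ms = initial_backoff_ms
    for attempt in range(1, max_tries + 1):
        try:
            return access()
        except MissingResource:
            # A missing resource is a definitive answer, so it is not retried.
            raise
        except Exception:
            if attempt == max_tries:
                # Out of tries: the caller may now blacklist the repository.
                raise
            sleep(backoff_ms / 1000)
            backoff_ms *= 2  # assumed growth factor
```

Passing `sleep` as a parameter keeps the function testable without real waiting.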
@cdancy

cdancy commented Oct 12, 2018

@eskatos awesome and thanks!!!

@eskatos
Member

eskatos commented Oct 12, 2018

@cdancy I didn't do anything, @melix fixed it!

@nddipiazza

nddipiazza commented Oct 12, 2018

So 125ms, 250ms, 500ms? That seems way too aggressive for a default backoff?

@cdancy

cdancy commented Oct 12, 2018

@nddipiazza agreed. If a network resource is not available chances are it's not going to be damn near immediately after the initial pop :) Maybe start at 500ms or 1sec? IDK. At least it's configurable so I won't complain too much.

@melix
Contributor

melix commented Oct 12, 2018

A bit of context on why the backoff is chosen like that: blacklisting is implemented in order to avoid builds hanging for too long when there are connectivity issues (and for reproducibility). With longer backoffs, we reduce the ability to interrupt builds early. So this is just a matter of finding the right tradeoff. It's likely people from China will have to tweak it to be higher, while if your connection is mostly stable a smaller backoff would make sense. In any case it's configurable via an internal property, and we'll work in the future on making it more configurable. Said differently, we prefer to fix and discuss the details later :)

@cdancy

cdancy commented Oct 12, 2018

@melix we have a VERY heavy (lots and lots of containers each running gradle) CI workload all of which is banging away at an Artifactory server which, more times than we care, does give us network hiccups which don't recover as quick as we'd like. But again ... this is getting something in place, and it's a long time coming, and am just grateful you guys got something out there for devs to use.

@nddipiazza

nddipiazza commented Oct 12, 2018

yeah by defaulting with 3 retries you might end up indirectly causing nexus/artifactory/etc issues for administrators unexpectedly after their users start making the gradle 5 upgrade. i am surprised not to see this as an option defaulting to retry = 1

@melix
Contributor

melix commented Oct 12, 2018

A Gradle 5 upgrade wouldn't cause more issues if you don't have any networking problem. I guess you are saying that if you have a hammered Artifactory server that is not capable of handling the load, having short backoffs could make it worse. That's a possibility, but it wouldn't change the fact that the server is under load. Actually, whatever the backoff, you would have retries for the same amount of requests. I'm happy to consider a longer default backoff though, as long as it's reasonable. Also as explained here the default is configurable, but intentionally not supported (internal property) as we might want to configure it differently in the future.

@oehme
Contributor

oehme commented Oct 12, 2018

+1 for making the default longer in 5.0. It's odd to have a 125ms backoff combined with a 30s timeout. They should be in the same order of magnitude I think.

@melix
Contributor

melix commented Oct 13, 2018

Given the feedback here I'm increasing the default backoff to 1s.
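To put numbers on the difference between the earlier 125 ms default and a 1 s default, here is a small Python calculation. It assumes the wait doubles after each failure and that the try count of 3 includes the initial request, as the commit message states; `backoff_schedule_ms` is a hypothetical helper, not a Gradle API.

```python
def backoff_schedule_ms(initial_backoff_ms, max_tries=3, factor=2):
    """Waits (in ms) between tries; max_tries counts the initial request."""
    return [initial_backoff_ms * factor ** i for i in range(max_tries - 1)]


for initial in (125, 1000):
    waits = backoff_schedule_ms(initial)
    print(
        f"initial={initial} ms: waits={waits}, "
        f"total wait={sum(waits)} ms, requests={len(waits) + 1}"
    )
```

Under these assumptions the repository receives the same three requests in both cases. Only the time spent waiting between them changes, from 375 ms to 3000 ms in total.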

@Asgoret

Asgoret commented Jul 9, 2019

I have the same problem in version 5.5. Can somebody explain how to set a timeout of more than 30s?

eighthave added a commit to f-droid/fdroidserver that referenced this issue Feb 14, 2020
default is 30 seconds, this uses 10 minutes to avoid things like:

* What went wrong:
A problem occurred configuring root project 'org.fdroid.fdroid'.
> Could not resolve all files for configuration ':classpath'.
   > Could not download auto-value.jar (com.google.auto.value:auto-value:1.5.2)
      > Could not get resource 'https://repo.maven.apache.org/maven2/com/google/auto/value/auto-value/1.5.2/auto-value-1.5.2.jar'.
         > Read timed out

* https://stackoverflow.com/a/49646993
* gradle/gradle#4629 (comment)
* https://github.com/gradle/gradle/pull/3371/files
@hverlin

hverlin commented Jul 5, 2022

Hi, is the workaround mentioned here still valid?

Looking at the answers and commits linked to this issue, I understood that one needs to set the following properties in gradle.properties:

gradle.properties
systemProp.org.gradle.internal.http.connectionTimeout=120000
systemProp.org.gradle.internal.http.socketTimeout=120000
systemProp.org.gradle.internal.repository.max.retries=10
systemProp.org.gradle.internal.repository.initial.backoff=500

The max.retries part was not working for me. Seems that it was renamed to max.tentatives in this commit (see code)

One can verify that this is taken into account by setting this to -1 (systemProp.org.gradle.internal.repository.max.tentatives=-1). Running any gradle command should fail (e.g. gradle as).

This should retry on the conditions listed here


Could this be added to the user manual and exposed as public properties, or will they stay internal?

Final set of properties in gradle.properties:

# Set the connection and socket timeouts to 2 minutes
systemProp.org.gradle.internal.http.connectionTimeout=120000
systemProp.org.gradle.internal.http.socketTimeout=120000

# the number of retries (initial included) (default 3)
systemProp.org.gradle.internal.repository.max.tentative=10

# the initial time before retrying, in milliseconds (default 125)
systemProp.org.gradle.internal.repository.initial.backoff=500

@yaohwu

yaohwu commented Oct 10, 2022

Hi, is the workaround mentioned here still valid?

Looking at the answers and commits linked to this issue, I understood that one needs to set the following properties in gradle.properties:

gradle.properties
The max.retries part was not working for me. Seems that it was renamed to max.tentatives in this commit (see code)

One can verify that this is taken into account by setting this to -1 (systemProp.org.gradle.internal.repository.max.tentatives=-1). Running any gradle command should fail (e.g. gradle as).

This should retry on the conditions listed here

Could this be added to the user manual and exposed as public properties, or will they stay internal?

Final set of properties in gradle.properties:

# Set the connection and socket timeouts to 2 minutes
systemProp.org.gradle.internal.http.connectionTimeout=120000
systemProp.org.gradle.internal.http.socketTimeout=120000

# the number of retries (initial included) (default 3)
systemProp.org.gradle.internal.repository.max.tentative=10

# the initial time before retrying, in milliseconds (default 125)
systemProp.org.gradle.internal.repository.initial.backoff=500

Should tentative be tentatives in the final properties set?
