Improve robustness of CD toolchain #3967

Closed
NotMyFault opened this issue Feb 29, 2024 · 3 comments · Fixed by jenkins-infra/repository-permissions-updater#3786

@NotMyFault
Member

NotMyFault commented Feb 29, 2024

Service(s)

Incrementals

Summary

Recently, PRU builds have been failing frequently on trusted.ci with a 503:

16:57:29  2024-02-29 15:57:29.216+0000 [id=1]	INFO	o.c.g.r.c.PlainObjectMetaMethodSite#doInvoke: Create/update the secret MAVEN_USERNAME for jenkinsci/bouncycastle-api-plugin encrypted with key f
16:57:32  Exception in thread "main" java.io.IOException: Server returned HTTP response code: 503 for URL: https://api.github.com/repos/jenkinsci/bouncycastle-api-plugin/actions/secrets/MAVEN_USERNAME
16:57:32  	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1924)
16:57:32  	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)
16:57:32  	at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:250)
16:57:32  	at sun.net.www.protocol.https.HttpsURLConnectionImpl$getInputStream$1.call(Unknown Source)
16:57:32  	at io.jenkins.infra.repository_permissions_updater.GitHubAPI$GitHubImpl.createOrUpdateRepositorySecret(GitHubAPI.groovy:104)
16:57:32  	at jdk.internal.reflect.GeneratedMethodAccessor61.invoke(Unknown Source)
16:57:32  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
16:57:32  	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
16:57:32  	at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
16:57:32  	at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSite.invoke(PogoMetaMethodSite.java:167)
16:57:32  	at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.call(PogoMetaMethodSite.java:70)
16:57:32  	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:151)
16:57:32  	at io.jenkins.infra.repository_permissions_updater.ArtifactoryPermissionsUpdater$_generateTokens_closure8.doCall(ArtifactoryPermissionsUpdater.groovy:466)
16:57:32  	at jdk.internal.reflect.GeneratedMethodAccessor63.invoke(Unknown Source)
16:57:32  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
16:57:32  	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
16:57:32  	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101)
16:57:32  	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
16:57:32  	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263)
16:57:32  	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041)
16:57:32  	at groovy.lang.Closure.call(Closure.java:405)
16:57:32  	at groovy.lang.Closure.call(Closure.java:421)
16:57:32  	at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2330)
16:57:32  	at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2315)
16:57:32  	at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2356)
16:57:32  	at org.codehaus.groovy.runtime.dgm$186.invoke(Unknown Source)
16:57:32  	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:244)
16:57:32  	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:53)
16:57:32  	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
16:57:32  	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:115)
16:57:32  	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:127)
16:57:32  	at io.jenkins.infra.repository_permissions_updater.ArtifactoryPermissionsUpdater.generateTokens(ArtifactoryPermissionsUpdater.groovy:438)
16:57:32  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
16:57:32  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
16:57:32  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
16:57:32  	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
16:57:32  	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101)
16:57:32  	at org.codehaus.groovy.runtime.callsite.StaticMetaMethodSite$StaticMetaMethodSiteNoUnwrapNoCoerce.invoke(StaticMetaMethodSite.java:149)
16:57:32  	at org.codehaus.groovy.runtime.callsite.StaticMetaMethodSite.callStatic(StaticMetaMethodSite.java:100)
16:57:32  	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallStatic(CallSiteArray.java:55)
16:57:32  	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callStatic(AbstractCallSite.java:196)
16:57:32  	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callStatic(AbstractCallSite.java:208)
16:57:32  	at io.jenkins.infra.repository_permissions_updater.ArtifactoryPermissionsUpdater.main(ArtifactoryPermissionsUpdater.groovy:504)

The affected repository varies, but a single 503 causes the entire (!!) pipeline to fail.

I propose catching the 503 and continuing the pipeline to improve robustness. Otherwise, no CD tokens are refreshed, and a single failure causes a CD outage of 4h, assuming the next build is green again.
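
A minimal sketch of what "catch and continue" could look like, written in plain Java rather than the project's Groovy. `TolerantSecretUpdater`, `SecretWriter`, and `updateAll` are hypothetical names for illustration and are not taken from repository-permissions-updater; the real call site is `createOrUpdateRepositorySecret` in the stack trace above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: these names do not exist in repository-permissions-updater. */
final class TolerantSecretUpdater {

    /** Stand-in for the call that creates or updates a secret for one repository. */
    interface SecretWriter {
        void write(String repository) throws IOException;
    }

    /**
     * Attempts the update for every repository. A failure on one repository is
     * logged and recorded instead of aborting the loop. Returns the repositories
     * whose update failed, so the caller can report or retry them.
     */
    static List<String> updateAll(List<String> repositories, SecretWriter writer) {
        List<String> failed = new ArrayList<>();
        for (String repository : repositories) {
            try {
                writer.write(repository);
            } catch (IOException e) {
                // A transient error (such as a 503) on one repository should not block the rest.
                System.err.println("Skipping " + repository + ": " + e.getMessage());
                failed.add(repository);
            }
        }
        return failed;
    }
}
```

Returning the failed repositories rather than swallowing the errors keeps the option of marking the build unstable or retrying only those entries on the next run.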

ref #3966 and #3962

Reproduction steps

No response

@NotMyFault NotMyFault added the triage Incoming issues that need review label Feb 29, 2024
@lemeurherve
Member

lemeurherve commented Feb 29, 2024

That would be nice indeed.

I'm currently preparing something for #2843 (comment) to publish job states on reports.jenkins.io; these errors could be reported there if needed.

@timja
Member

timja commented Mar 1, 2024

So the retrying is 'working', as in it's retrying, but I think it's getting rate limited.

If I search for 'retrying', this is the 'heatmap' I get:

image

i.e. it gets to a certain point of the build and then it just keeps failing on each one.

So at least this should handle the random 503s we get sometimes, but the 403s need more work.
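
The difference between the two cases can be sketched as a small decision function, again in plain Java. `RetryDelay` and `secondsToWait` are hypothetical names. The header names are the ones GitHub documents for rate limiting (`retry-after`, `x-ratelimit-remaining`, `x-ratelimit-reset`); the sketch assumes header keys are already lower-cased, that `retry-after` holds a number of seconds, and that the 403s seen here really are rate-limit responses, which this thread only suspects.

```java
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical sketch: decides how long to wait before retrying a GitHub API call. */
final class RetryDelay {

    /**
     * Returns the number of seconds to wait before retrying, or empty if the
     * response should not be retried. Header keys are assumed to be lower-case.
     */
    static OptionalLong secondsToWait(int status, Map<String, String> headers,
                                      long nowEpochSeconds, int attempt) {
        if (status == 403 || status == 429) {
            String retryAfter = headers.get("retry-after");
            if (retryAfter != null) {
                // The server says explicitly how long to back off.
                return OptionalLong.of(Long.parseLong(retryAfter.trim()));
            }
            String reset = headers.get("x-ratelimit-reset");
            if ("0".equals(headers.get("x-ratelimit-remaining")) && reset != null) {
                // Quota exhausted: wait until the reset time (epoch seconds, UTC).
                return OptionalLong.of(Math.max(0L, Long.parseLong(reset.trim()) - nowEpochSeconds));
            }
            // A 403 without rate-limit headers is treated as a real permission error.
            return OptionalLong.empty();
        }
        if (status >= 500 && status < 600) {
            // Transient server error: exponential backoff 1s, 2s, 4s, ... capped at 60s.
            int shift = Math.max(0, Math.min(attempt, 6));
            return OptionalLong.of(Math.min(60L, 1L << shift));
        }
        return OptionalLong.empty();
    }
}
```

With this split, a blind retry loop no longer hammers the API while the quota is exhausted, which would otherwise make every later request in the same build fail as well.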

Looking at the Jenkinsfile, it's using a bot user token.

This should be changed to a GitHub app, but the code will need some refactoring to use GitHub apps and to refresh the token appropriately.
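
As a rough illustration of the "refresh the token appropriately" part: GitHub App installation tokens expire after one hour, so a long-running job needs to fetch a new one before the old one lapses. The sketch below is plain Java with hypothetical names (`RefreshingToken`, `Token`); how the token is actually requested from GitHub is left to the `fetcher` and is not shown.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical sketch: caches a short-lived token and refreshes it before expiry. */
final class RefreshingToken {

    /** A token value together with the time at which it stops being valid. */
    static final class Token {
        final String value;
        final Instant expiresAt;

        Token(String value, Instant expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Supplier<Token> fetcher; // requests a fresh token (not shown here)
    private final Supplier<Instant> now;   // injectable clock, which keeps the class testable
    private final Duration margin;         // refresh this long before the actual expiry
    private Token current;

    RefreshingToken(Supplier<Token> fetcher, Supplier<Instant> now, Duration margin) {
        this.fetcher = fetcher;
        this.now = now;
        this.margin = margin;
    }

    /** Returns a token that is valid for at least the safety margin from now. */
    synchronized String get() {
        if (current == null || !now.get().plus(margin).isBefore(current.expiresAt)) {
            current = fetcher.get();
        }
        return current.value;
    }
}
```

Every API call would then ask `get()` for the token instead of reading a fixed credential once at startup.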

@timja
Member

timja commented Mar 1, 2024

Might be OK, let's keep an eye on it.

@timja timja closed this as completed Mar 1, 2024