EC2FleetCloud.updateStatus can cause big delays at jenkins queue #55

Closed
vitgorbunov opened this issue Mar 9, 2019 · 8 comments

vitgorbunov commented Mar 9, 2019

We have a bottleneck in the Jenkins queue because EC2FleetCloud.updateStatus holds the queue lock for a long time. See the stack trace below.
Our fleet has ~100 instances.
I see a lot of messages like this:

Mar 11, 2019 5:08:08 AM com.amazon.jenkins.ec2fleet.EC2FleetCloud terminateInstance
INFO: Attempting to terminate instance: i-079f2eabb225f6cc3
Mar 11, 2019 5:08:15 AM com.amazon.jenkins.ec2fleet.EC2FleetCloud terminateInstance
INFO: Not terminating i-079f2eabb225f6cc3 because we need a minimum of 100 instances running.

Each terminateInstance call triggers an updateStatus call, which takes 5-10 sec.
I don't know how often IdleRetentionStrategy is called, but sometimes a job did not start for 20 min even though 40 executors were available.

Here are similar bugs in other plugins:
https://issues.jenkins-ci.org/browse/JENKINS-54988
https://issues.jenkins-ci.org/browse/JENKINS-38815

What I understood from the comments on these bugs is that Jenkins can trigger this call very often, so it should be pretty fast.
Perhaps the solution from https://issues.jenkins-ci.org/browse/JENKINS-38815 can be applied here as well: execute describeInstances on a timer and cache the result, so that isTerminated can just use the cached result.
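
To make the timer-plus-cache idea concrete, here is a minimal sketch. It is not the plugin's code: the class name TerminatedInstanceCache and the fetchTerminatedIds supplier are hypothetical, and the supplier only stands in for whatever wraps the slow describeInstances call.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: keeps a cached snapshot of terminated instance IDs. */
final class TerminatedInstanceCache {
    // Assumption: stands in for the slow remote lookup (e.g. a describeInstances wrapper).
    private final Supplier<Set<String>> fetchTerminatedIds;
    // Immutable snapshot replaced as a whole, so readers never block.
    private volatile Set<String> terminated = Collections.emptySet();

    TerminatedInstanceCache(Supplier<Set<String>> fetchTerminatedIds) {
        this.fetchTerminatedIds = fetchTerminatedIds;
    }

    /** Performs the slow lookup; meant to run on a background thread, outside the queue lock. */
    void refresh() {
        terminated = Collections.unmodifiableSet(new HashSet<>(fetchTerminatedIds.get()));
    }

    /** No network I/O, so it is cheap to call while the queue lock is held. */
    boolean isTerminated(String instanceId) {
        return terminated.contains(instanceId);
    }

    /** Schedules periodic refreshes; the caller is responsible for shutting the executor down. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // Keep the previous snapshot; an uncaught exception would cancel future runs.
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

The trade-off is staleness: isTerminated may lag the real instance state by up to one refresh period.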

"jenkins.util.Timer [#6]" Id=44 Group=main RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
	at sun.security.ssl.InputRecord.read(InputRecord.java:503)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
	-  locked java.lang.Object@18d580a8
	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
	at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
	-  locked sun.security.ssl.AppInputStream@5dd71dd6
	at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
	at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
	at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
	at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
	at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
	at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
	at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1285)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1101)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:758)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:732)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:714)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:674)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:656)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:520)
	at com.amazonaws.services.ec2.AmazonEC2Client.doInvoke(AmazonEC2Client.java:19296)
	at com.amazonaws.services.ec2.AmazonEC2Client.invoke(AmazonEC2Client.java:19263)
	at com.amazonaws.services.ec2.AmazonEC2Client.invoke(AmazonEC2Client.java:19252)
	at com.amazonaws.services.ec2.AmazonEC2Client.executeDescribeInstances(AmazonEC2Client.java:9457)
	at com.amazonaws.services.ec2.AmazonEC2Client.describeInstances(AmazonEC2Client.java:9429)
	at com.amazon.jenkins.ec2fleet.EC2FleetCloud.isTerminated(EC2FleetCloud.java:396)
	at com.amazon.jenkins.ec2fleet.EC2FleetCloud.updateStatus(EC2FleetCloud.java:328)
	-  locked com.amazon.jenkins.ec2fleet.EC2FleetCloud@67d680a7
	at com.amazon.jenkins.ec2fleet.EC2FleetCloud.terminateInstance(EC2FleetCloud.java:460)
	-  locked com.amazon.jenkins.ec2fleet.EC2FleetCloud@67d680a7
	at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:59)
	-  locked com.amazon.jenkins.ec2fleet.EC2FleetCloud@67d680a7
	at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:15)
	at hudson.slaves.ComputerRetentionWork$1.run(ComputerRetentionWork.java:72)
	at hudson.model.Queue._withLock(Queue.java:1381)
	at hudson.model.Queue.withLock(Queue.java:1258)
	at hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:63)
	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

	Number of locked synchronizers = 2
	- java.util.concurrent.locks.ReentrantLock$NonfairSync@4b401c2a
	- java.util.concurrent.ThreadPoolExecutor$Worker@5150cb76
@vitgorbunov
Author

Please take a look. If the suggested fix looks OK to you, I can submit a PR.

@vitgorbunov
Author

I've updated the initial comment with a more accurate description.
FYI: We applied a forked version with the fix I proposed 3 weeks ago, and it has worked perfectly so far. I have never seen such queue performance with either the ec2-spot or the ec2-fleet plugin.

terma commented May 9, 2019

It looks like PR #59 is for this; I don't know why it wasn't linked automatically.

terma commented Jun 4, 2019

Hi, starting from version 1.2.2 the plugin uses a batch API to check whether EC2 instances are terminated, which should fix this problem.
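
As a rough illustration of why batching helps (a sketch only, not the plugin's actual implementation), the check can issue one lookup per group of instance IDs instead of one per instance. The class BatchTerminationCheck, the batchSize parameter, and the lookupTerminated function are all hypothetical; lookupTerminated stands in for a single remote call that accepts many IDs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical helper: checks many instances with a few batched lookups. */
final class BatchTerminationCheck {
    /**
     * Returns the IDs reported as terminated, calling lookupTerminated once per batch
     * rather than once per instance.
     */
    static Set<String> findTerminated(List<String> instanceIds, int batchSize,
                                      Function<List<String>, Set<String>> lookupTerminated) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Set<String> terminated = new HashSet<>();
        for (int from = 0; from < instanceIds.size(); from += batchSize) {
            int to = Math.min(from + batchSize, instanceIds.size());
            // One remote round trip covers the whole sublist.
            terminated.addAll(lookupTerminated.apply(instanceIds.subList(from, to)));
        }
        return terminated;
    }
}
```

With 100 instances and a batch size of 25, this makes 4 lookups instead of 100.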

@vitgorbunov
Author

Thanks, will check it out.

terma commented Sep 19, 2019

Hi, you can try the new 1.11.2 version, which adds near-zero delay to the Queue and takes the Queue lock only briefly, when adding or deleting nodes.
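
The general pattern behind holding a lock only briefly can be sketched as follows. This is an assumption-laden illustration, not the plugin's code: ShortLockNodeSync is a hypothetical class, a plain ReentrantLock stands in for the Jenkins Queue lock, and fetchRunningInstances stands in for the slow remote call.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical helper: does slow work without the lock, then applies changes under it. */
final class ShortLockNodeSync {
    // Assumption: stands in for the Jenkins Queue lock.
    private final ReentrantLock queueLock = new ReentrantLock();
    // Guarded by queueLock.
    private final Set<String> nodes = new HashSet<>();

    /** Fetches the desired node set with no lock held, then updates nodes under the lock. */
    void sync(Supplier<Set<String>> fetchRunningInstances) {
        Set<String> desired = new HashSet<>(fetchRunningInstances.get()); // slow part, lock not held
        queueLock.lock();
        try {
            nodes.retainAll(desired); // delete nodes whose instances are gone
            nodes.addAll(desired);    // add nodes for newly seen instances
        } finally {
            queueLock.unlock();
        }
    }

    /** Returns a copy of the current node set. */
    Set<String> snapshot() {
        queueLock.lock();
        try {
            return new HashSet<>(nodes);
        } finally {
            queueLock.unlock();
        }
    }

    /** Exposed so callers can verify the lock is not held during the fetch. */
    boolean isLockHeldByCurrentThread() {
        return queueLock.isHeldByCurrentThread();
    }
}
```

Only the in-memory set update runs inside the locked section, so other threads waiting on the lock are delayed for a negligible time regardless of how long the fetch takes.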

@vitgorbunov
Author

Hi,

Thanks for the information! We recently upgraded to 1.10.2 and it has worked well so far. Nice to know there are more improvements in the next version.

terma commented Sep 19, 2019

np, thx for feedback about 1.10.2 )))
