Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error while creating slaves #43

Closed
ktf opened this issue Jun 12, 2014 · 13 comments
Closed

Error while creating slaves #43

ktf opened this issue Jun 12, 2014 · 13 comments

Comments

@ktf
Copy link

ktf commented Jun 12, 2014

Hi,

I've a working mesos + jenkins setup. I can execute mesos jobs just fine most of the time.

However every now and then it looks like something happens in the jenkins plugin and it cannot create new slaves on demand anymore. Looking at the logs I see a few:


Jun 12, 2014 9:50:51 AM hudson.remoting.AbstractByteArrayCommandTransport$1 handle
WARNING: Failed to construct Command
java.io.EOFException
        at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
        at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
        at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
        at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
        at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:40)
        at hudson.remoting.AbstractByteArrayCommandTransport$1.handle(AbstractByteArrayCommandTransport.java:61)
        at org.jenkinsci.remoting.nio.NioChannelHub$2.run(NioChannelHub.java:529)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:111)
        at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Jun 12, 2014 9:50:51 AM org.jenkinsci.remoting.nio.NioChannelHub run
WARNING: Failed to select
java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:802)
        at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:426)
        at org.jenkinsci.remoting.nio.Closeables$1.close(Closeables.java:20)
        at org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport.closeR(NioChannelHub.java:289)
        at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport$1.call(NioChannelHub.java:226)
        at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport$1.call(NioChannelHub.java:224)
        at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:474)
        at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Jun 12, 2014 9:50:51 AM jenkins.slaves.JnlpSlaveAgentProtocol$Handler$1 onClosed

and at some point I get:

INFO: Status update: task value: "mesos-jenkins-935484f7-14aa-4715-a4b0-c3b83040586d"
 is in state TASK_RUNNING
java.lang.IllegalStateException: Unknown taskId: value: "mesos-jenkins-935484f7-14aa-4715-a4b0-c3b83040586d"

        at org.jenkinsci.plugins.mesos.JenkinsScheduler.statusUpdate(JenkinsScheduler.java:353)

and the only solution is to restart the jenkins instance. Notice that running mesos-execute by hand still works. Ideas? I'm using 0.18.2 on RHEL6.

@vinodkone
Copy link
Contributor

That exception is thrown by jenkins master when a jenkins slave is removed by the plugin when its idle. The unknown taskid is interesting. More logs around the task would help diagnose this.

@ktf
Copy link
Author

ktf commented Jun 12, 2014

Indeed, I think the task ID is the actual problem. Here is a log with:

grep -B 3 -A 3 mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139 /var/log/jenkins/jenkins.log > jenkins.log

Jun 12, 2014 9:54:41 AM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
INFO: Received offers 7
Jun 12, 2014 9:54:41 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Launching slave computer mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139
Jun 12, 2014 9:54:41 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Sending a request to start jenkins slave mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139
Jun 12, 2014 9:54:41 AM org.jenkinsci.plugins.mesos.JenkinsScheduler requestJenkinsSlave
INFO: Enqueuing jenkins slave request
Jun 12, 2014 9:54:41 AM org.jenkinsci.plugins.mesos.JenkinsScheduler terminateJenkinsSlave
--
INFO: Killing mesos task value: "mesos-jenkins-ddc1e61c-7576-42ed-890f-07a21d36ba15"

Jun 12, 2014 9:54:41 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Launching slave computer mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139
Jun 12, 2014 9:54:41 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Sending a request to start jenkins slave mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139
Jun 12, 2014 9:54:41 AM org.jenkinsci.plugins.mesos.JenkinsScheduler requestJenkinsSlave
INFO: Enqueuing jenkins slave request
Jun 12, 2014 9:54:41 AM org.jenkinsci.plugins.mesos.JenkinsScheduler terminateJenkinsSlave
--
WARNING: Asked to kill unknown mesos task value: "mesos-jenkins-06543a84-96c8-4ca5-b5ab-e158e6573cc2"

Jun 12, 2014 9:54:41 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Launching slave computer mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139
Jun 12, 2014 9:54:41 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Sending a request to start jenkins slave mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139
Jun 12, 2014 9:54:41 AM org.jenkinsci.plugins.mesos.JenkinsScheduler requestJenkinsSlave
INFO: Enqueuing jenkins slave request
Jun 12, 2014 9:54:41 AM hudson.slaves.NodeProvisioner update
--
Jun 12, 2014 9:54:42 AM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
INFO: Offer matched! Creating mesos task
Jun 12, 2014 9:54:42 AM org.jenkinsci.plugins.mesos.JenkinsScheduler createMesosTask
INFO: Launching task mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139 with URI https://YYYY/jenkins/jnlpJars/slave.jar
Jun 12, 2014 9:54:42 AM org.jenkinsci.plugins.mesos.JenkinsScheduler matches
WARNING: Ignoring disk resources from offer
Jun 12, 2014 9:54:42 AM org.jenkinsci.plugins.mesos.JenkinsScheduler matches
--
Jun 12, 2014 9:54:42 AM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
INFO: Offer matched! Creating mesos task
Jun 12, 2014 9:54:42 AM org.jenkinsci.plugins.mesos.JenkinsScheduler createMesosTask
INFO: Launching task mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139 with URI https://YYYY/jenkins/jnlpJars/slave.jar
Jun 12, 2014 9:54:42 AM org.jenkinsci.plugins.mesos.JenkinsScheduler matches
WARNING: Ignoring disk resources from offer
Jun 12, 2014 9:54:42 AM org.jenkinsci.plugins.mesos.JenkinsScheduler matches
--
Jun 12, 2014 9:54:42 AM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
INFO: Offer matched! Creating mesos task
Jun 12, 2014 9:54:42 AM org.jenkinsci.plugins.mesos.JenkinsScheduler createMesosTask
INFO: Launching task mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139 with URI https://YYYY/jenkins/jnlpJars/slave.jar
Jun 12, 2014 9:54:42 AM org.jenkinsci.plugins.mesos.JenkinsScheduler statusUpdate
INFO: Status update: task value: "mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139"
 is in state TASK_LOST
Jun 12, 2014 9:54:42 AM org.jenkinsci.plugins.mesos.JenkinsScheduler supervise
INFO: Active slaves: true | Pending slaves: false | Active tasks: true
Jun 12, 2014 9:54:42 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Finished launching slave computer mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139
Jun 12, 2014 9:54:42 AM org.jenkinsci.plugins.mesos.JenkinsScheduler statusUpdate
INFO: Status update: task value: "mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139"
 is in state TASK_LOST
java.lang.IllegalStateException: Unknown taskId: value: "mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139"

    at org.jenkinsci.plugins.mesos.JenkinsScheduler.statusUpdate(JenkinsScheduler.java:353)
Jun 12, 2014 9:54:42 AM org.jenkinsci.plugins.mesos.JenkinsScheduler$1 run
--
Jun 12, 2014 9:58:00 AM hudson.model.AsyncPeriodicWork$1 run
INFO: Finished PeriodicReincarnation. 0 ms
Jun 12, 2014 9:58:00 AM org.jenkinsci.plugins.mesos.MesosRetentionStrategy check
INFO: Disconnecting offline computer mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139
Jun 12, 2014 9:58:00 AM org.jenkinsci.plugins.mesos.MesosSlave terminate
INFO: Terminating slave mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139
Jun 12, 2014 9:58:00 AM org.jenkinsci.plugins.mesos.MesosRetentionStrategy check
INFO: Disconnecting offline computer mesos-jenkins-cce778ba-befe-451c-8988-a55adb605e8f
Jun 12, 2014 9:58:00 AM org.jenkinsci.plugins.mesos.MesosSlave terminate
--
Jun 12, 2014 9:58:00 AM org.jenkinsci.plugins.mesos.JenkinsScheduler requestJenkinsSlave
INFO: Enqueuing jenkins slave request
Jun 12, 2014 9:58:00 AM org.jenkinsci.plugins.mesos.JenkinsScheduler terminateJenkinsSlave
INFO: Terminating jenkins slave mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139
Jun 12, 2014 9:58:00 AM org.jenkinsci.plugins.mesos.JenkinsScheduler terminateJenkinsSlave
WARNING: Asked to kill unknown mesos task value: "mesos-jenkins-640f87cc-3092-4ccc-a117-0113df8f2139"

Jun 12, 2014 9:58:00 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Launching slave computer mesos-jenkins-b5197e50-0e79-4d9c-a598-e87321cdc13b

The log around the actual error looks like:

Jun 12, 2014 9:55:02 AM org.jenkinsci.plugins.mesos.JenkinsScheduler matches
INFO: Ignoring ports resources from offer
Jun 12, 2014 9:55:02 AM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
INFO: Offer matched! Creating mesos task
Jun 12, 2014 9:55:02 AM org.jenkinsci.plugins.mesos.JenkinsScheduler createMesosTask
INFO: Launching task mesos-jenkins-cce778ba-befe-451c-8988-a55adb605e8f with URI https://YYYY/jenkins/jnlpJars/slave.jar
Jun 12, 2014 9:55:02 AM org.jenkinsci.plugins.mesos.JenkinsScheduler matches
WARNING: Ignoring disk resources from offer
Jun 12, 2014 9:55:02 AM org.jenkinsci.plugins.mesos.JenkinsScheduler matches
INFO: Ignoring ports resources from offer
Jun 12, 2014 9:55:02 AM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
INFO: Offer matched! Creating mesos task
Jun 12, 2014 9:55:02 AM org.jenkinsci.plugins.mesos.JenkinsScheduler createMesosTask
INFO: Launching task mesos-jenkins-cce778ba-befe-451c-8988-a55adb605e8f with URI https://YYYY/jenkins/jnlpJars/slave.jar
Jun 12, 2014 9:55:02 AM org.jenkinsci.plugins.mesos.JenkinsScheduler statusUpdate
INFO: Status update: task value: "mesos-jenkins-cce778ba-befe-451c-8988-a55adb605e8f"
 is in state TASK_LOST
Jun 12, 2014 9:55:02 AM org.jenkinsci.plugins.mesos.JenkinsScheduler supervise
INFO: Active slaves: true | Pending slaves: false | Active tasks: false
Jun 12, 2014 9:55:02 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Finished launching slave computer mesos-jenkins-cce778ba-befe-451c-8988-a55adb605e8f
Jun 12, 2014 9:55:03 AM org.jenkinsci.plugins.mesos.JenkinsScheduler statusUpdate
INFO: Status update: task value: "mesos-jenkins-cce778ba-befe-451c-8988-a55adb605e8f"
 is in state TASK_RUNNING
java.lang.IllegalStateException: Unknown taskId: value: "mesos-jenkins-cce778ba-befe-451c-8988-a55adb605e8f"

        at org.jenkinsci.plugins.mesos.JenkinsScheduler.statusUpdate(JenkinsScheduler.java:353)
Jun 12, 2014 9:55:03 AM org.jenkinsci.plugins.mesos.JenkinsScheduler$1 run
SEVERE: The mesos driver was aborted!
Jun 12, 2014 9:55:07 AM hudson.TcpSlaveAgentListener$ConnectionHandler run
INFO: Accepted connection #408 from /128.142.135.224:57227
Jun 12, 2014 9:55:10 AM org.jenkinsci.plugins.mesos.MesosCloud restartMesos
INFO: Scheduler was down, restarting the scheduler
Jun 12, 2014 9:55:10 AM org.jenkinsci.plugins.mesos.JenkinsScheduler stop
WARNING: Unable to stop Mesos driver:  driver is null.
Jun 12, 2014 9:55:10 AM org.jenkinsci.plugins.mesos.JenkinsScheduler <init>
INFO: JenkinsScheduler instantiated with jenkins https://YYYY/jenkins/ and mesos XXXX:5050
Jun 12, 2014 9:55:10 AM org.jenkinsci.plugins.mesos.MesosCloud provision
INFO: Provisioning Jenkins Slave on Mesos with 1 executors. Remaining excess workload: 0 executors)
Jun 12, 2014 9:55:10 AM hudson.slaves.NodeProvisioner update
INFO: Started provisioning MesosCloud from MesosCloud with 1 executors. Remaining excess workload:0.0
Jun 12, 2014 9:55:10 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher <init>
INFO: Constructing MesosComputerLauncher
Jun 12, 2014 9:55:10 AM org.jenkinsci.plugins.mesos.JenkinsScheduler$1 run
INFO: Initializing the Mesos driver with options
User:
Framework Name: jenkins2
Checkpointing: false

@ktf
Copy link
Author

ktf commented Jun 12, 2014

Hi. I confirm the issue seems to be bound to the "Unknown taskId" message. It happened again and again I had the same message.

@webdev
Copy link

webdev commented Oct 16, 2014

Anything interesting in this thread?

I'm having the same issue


Oct 16, 2014 6:35:17 AM org.jenkinsci.plugins.mesos.MesosSlave <init>
INFO: Constructing Mesos slave
Oct 16, 2014 6:35:27 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Launching slave computer mesos-jenkins-9db7ff76-e6fc-4afe-a850-3a7284307ace
Oct 16, 2014 6:35:27 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
WARNING: Not launching mesos-jenkins-9db7ff76-e6fc-4afe-a850-3a7284307ace because the Mesos Jenkins scheduler is not running
Oct 16, 2014 6:35:27 AM org.jenkinsci.plugins.mesos.MesosSlave terminate
INFO: Terminating slave mesos-jenkins-9db7ff76-e6fc-4afe-a850-3a7284307ace
Oct 16, 2014 6:35:27 AM hudson.slaves.NodeProvisioner update
INFO: MesosCloud provisioningE successfully completed. We have now 2 computer(s)
Oct 16, 2014 6:35:27 AM org.jenkinsci.plugins.mesos.MesosCloud provision
INFO: Provisioning Jenkins Slave on Mesos with 1 executors. Remaining excess workload: 0 executors)
Oct 16, 2014 6:35:27 AM hudson.slaves.NodeProvisioner update
INFO: Started provisioning MesosCloud from MesosCloud with 1 executors. Remaining excess workload:0.0
Oct 16, 2014 6:35:27 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher <init>
INFO: Constructing MesosComputerLauncher
Oct 16, 2014 6:35:27 AM org.jenkinsci.plugins.mesos.JenkinsScheduler terminateJenkinsSlave
INFO: Terminating jenkins slave mesos-jenkins-9db7ff76-e6fc-4afe-a850-3a7284307ace
Oct 16, 2014 6:35:27 AM org.jenkinsci.plugins.mesos.JenkinsScheduler terminateJenkinsSlave
WARNING: Asked to kill unknown mesos task value: "mesos-jenkins-9db7ff76-e6fc-4afe-a850-3a7284307ace"

Oct 16, 2014 6:35:27 AM org.jenkinsci.plugins.mesos.MesosSlave <init>
INFO: Constructing Mesos slave

Oct 16, 2014 6:35:37 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Launching slave computer mesos-jenkins-143d71bc-1be3-4177-a62e-2fc89877d71b
Oct 16, 2014 6:35:37 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
WARNING: Not launching mesos-jenkins-143d71bc-1be3-4177-a62e-2fc89877d71b because the Mesos Jenkins scheduler is not running
Oct 16, 2014 6:35:37 AM org.jenkinsci.plugins.mesos.MesosSlave terminate
INFO: Terminating slave mesos-jenkins-143d71bc-1be3-4177-a62e-2fc89877d71b
Oct 16, 2014 6:35:37 AM hudson.slaves.NodeProvisioner update
INFO: MesosCloud provisioningE successfully completed. We have now 2 computer(s)
Oct 16, 2014 6:35:37 AM org.jenkinsci.plugins.mesos.MesosCloud provision
INFO: Provisioning Jenkins Slave on Mesos with 1 executors. Remaining excess workload: 0 executors)
Oct 16, 2014 6:35:37 AM hudson.slaves.NodeProvisioner update
INFO: Started provisioning MesosCloud from MesosCloud with 1 executors. Remaining excess workload:0.0
Oct 16, 2014 6:35:37 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher <init>
INFO: Constructing MesosComputerLauncher
Oct 16, 2014 6:35:37 AM org.jenkinsci.plugins.mesos.JenkinsScheduler terminateJenkinsSlave
INFO: Terminating jenkins slave mesos-jenkins-143d71bc-1be3-4177-a62e-2fc89877d71b
Oct 16, 2014 6:35:37 AM org.jenkinsci.plugins.mesos.JenkinsScheduler terminateJenkinsSlave
WARNING: Asked to kill unknown mesos task value: "mesos-jenkins-143d71bc-1be3-4177-a62e-2fc89877d71b"

@vinodkone
Copy link
Contributor

Looks like the scheduler is not running. Do you know why? Is it registered with the master?

@webdev
Copy link

webdev commented Oct 16, 2014

Well, Mesos thinks that it is running, but for whatever reason mesos-plugin thinks that it's not?

@geoffroyjabouley
Copy link
Contributor

Hello
i am having similar issue here. I am using Mesos 0.19.1.

It seems that Mesos framework try to create the slave twice, resulting in a crash of the Mesos driver ...
It then fails to create new slaves, and leaves the current running slaves tasks in a TASK_RUNNING state on Mesos cluster.

I have set in bold relevant information.


Oct 29, 2014 10:45:06 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Launching slave computer mesos-jenkins-5051ed7a-6e4b-4a92-9948-08acaa97c4b8
Oct 29, 2014 10:45:06 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Sending a request to start jenkins slave mesos-jenkins-5051ed7a-6e4b-4a92-9948-08acaa97c4b8
Oct 29, 2014 10:45:06 AM org.jenkinsci.plugins.mesos.JenkinsScheduler requestJenkinsSlave
INFO: Enqueuing jenkins slave request
Oct 29, 2014 10:45:06 AM hudson.remoting.AbstractByteArrayCommandTransport$1 handle
WARNING: Failed to construct Command
java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
at java.io.ObjectInputStream.(ObjectInputStream.java:299)
at hudson.remoting.ObjectInputStreamEx.(ObjectInputStreamEx.java:40)
at hudson.remoting.AbstractByteArrayCommandTransport$1.handle(AbstractByteArrayCommandTransport.java:61)
at org.jenkinsci.remoting.nio.NioChannelHub$2.run(NioChannelHub.java:584)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:111)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Oct 29, 2014 10:45:06 AM org.jenkinsci.plugins.mesos.JenkinsScheduler terminateJenkinsSlave
INFO: Terminating jenkins slave mesos-jenkins-3b6f8e0e-ab9c-46ce-9aa5-09edc6789089
Oct 29, 2014 10:45:06 AM org.jenkinsci.plugins.mesos.JenkinsScheduler terminateJenkinsSlave
INFO: Killing mesos task value: "mesos-jenkins-3b6f8e0e-ab9c-46ce-9aa5-09edc6789089"

Oct 29, 2014 10:45:06 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Launching slave computer mesos-jenkins-5051ed7a-6e4b-4a92-9948-08acaa97c4b8
Oct 29, 2014 10:45:06 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Sending a request to start jenkins slave mesos-jenkins-5051ed7a-6e4b-4a92-9948-08acaa97c4b8
Oct 29, 2014 10:45:06 AM org.jenkinsci.plugins.mesos.JenkinsScheduler requestJenkinsSlave
INFO: Enqueuing jenkins slave request
Oct 29, 2014 10:45:06 AM hudson.slaves.NodeProvisioner update
INFO: MesosCloud provisioningE successfully completed. We have now 4 computer(s)
Oct 29, 2014 10:45:07 AM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
INFO: Received offers 4
Oct 29, 2014 10:45:07 AM org.jenkinsci.plugins.mesos.JenkinsScheduler matches
WARNING: Ignoring disk resources from offer
Oct 29, 2014 10:45:07 AM org.jenkinsci.plugins.mesos.JenkinsScheduler matches
INFO: Ignoring ports resources from offer
Oct 29, 2014 10:45:07 AM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
INFO: Offer matched! Creating mesos task
Oct 29, 2014 10:45:07 AM org.jenkinsci.plugins.mesos.JenkinsScheduler createMesosTask
INFO: Launching task mesos-jenkins-5051ed7a-6e4b-4a92-9948-08acaa97c4b8 with URI http://10.195.30.11:31042/jnlpJars/slave.jar
Oct 29, 2014 10:45:07 AM org.jenkinsci.plugins.mesos.JenkinsScheduler createMesosTask
INFO: Launching in Container Mode:docker:///test/jenkins_slave_base:master
Oct 29, 2014 10:45:07 AM org.jenkinsci.plugins.mesos.JenkinsScheduler matches
WARNING: Ignoring disk resources from offer
Oct 29, 2014 10:45:07 AM org.jenkinsci.plugins.mesos.JenkinsScheduler matches
INFO: Ignoring ports resources from offer
Oct 29, 2014 10:45:07 AM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
INFO: Offer matched! Creating mesos task
Oct 29, 2014 10:45:07 AM org.jenkinsci.plugins.mesos.JenkinsScheduler createMesosTask
INFO: Launching task mesos-jenkins-5051ed7a-6e4b-4a92-9948-08acaa97c4b8 with URI http://10.195.30.11:31042/jnlpJars/slave.jar
Oct 29, 2014 10:45:07 AM org.jenkinsci.plugins.mesos.JenkinsScheduler createMesosTask
INFO: Launching in Container Mode:docker:///test/jenkins_slave_base:master
Oct 29, 2014 10:45:07 AM org.jenkinsci.plugins.mesos.JenkinsScheduler statusUpdate
INFO: Status update: task value: "mesos-jenkins-5051ed7a-6e4b-4a92-9948-08acaa97c4b8"
is in state TASK_LOST

Oct 29, 2014 10:45:07 AM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Finished launching slave computer mesos-jenkins-5051ed7a-6e4b-4a92-9948-08acaa97c4b8
Oct 29, 2014 10:45:08 AM org.jenkinsci.plugins.mesos.JenkinsScheduler statusUpdate
INFO: Status update: task value: "mesos-jenkins-5051ed7a-6e4b-4a92-9948-08acaa97c4b8"
is in state TASK_RUNNING

Exception in thread "Thread-122" java.lang.IllegalStateException: Unknown taskId: value: "mesos-jenkins-5051ed7a-6e4b-4a92-9948-08acaa97c4b8"
at org.jenkinsci.plugins.mesos.JenkinsScheduler.statusUpdate(JenkinsScheduler.java:374)
I1029 10:45:08.580153 88 sched.cpp:755] Aborting framework '20141028-200722-287228682-5050-1188-0008'

Oct 29, 2014 10:45:08 AM org.jenkinsci.plugins.mesos.JenkinsScheduler$1 run
SEVERE: The Mesos driver was aborted! Status code: 3

@maselvaraj
Copy link
Contributor

I also faced this issue and that is because of a race condition on scheduler start-up when there are multiple builds in queue. This pull request should address this issue: #70

@vinodkone
Copy link
Contributor

Thanks @maselvaraj for the PR. I'm not exactly clear what the race is here. Can you add more explanation here or in your PR?

Also, I realized that throwing an IllegalStateException on receiving status update from unknown task id is a bug. I will fix that shortly.

@maselvaraj
Copy link
Contributor

Sure Vinod. I have added more details on the PR. Please let me know if you have any questions

@sedninja
Copy link

sedninja commented Oct 31, 2016

I am seeing the same symptoms as described in this old bug report with the latest version of the jenkins-mesos plugin (0.13.1) with mesos v1.0.1 and jenkins v2.7.4. I don't get the same stack traces but the unknown task id log bits along with ultimately the jenkins framework deregistering from mesos and being unable to create any new slaves for builds.

An interesting addition to the description for me is that if we go into the Jenkins UI and save the global config settings, jenkins will then re-register with mesos and begin doing work again

@damomurf
Copy link
Contributor

I'm having exactly the same issues as @sedninja: Jenkins 1.651.3 Mesos 1.1.0 and 0.13.1 (and also a self-built 0.13.1 with the mesos library dependency increased to 1.1.0).

@chr0n1x
Copy link

chr0n1x commented Jan 19, 2017

bump. Seeing this with version 0.14 of the plugin, mesos 1.0.1 - that's the version of my running cluster and the installed native lib.

@maselvaraj

Specifically:

Jun 12, 2014 9:55:10 AM org.jenkinsci.plugins.mesos.JenkinsScheduler stop
WARNING: Unable to stop Mesos driver:  driver is null.

jeschkies pushed a commit that referenced this issue Nov 1, 2019
…43)

Summary:
* Use only MesosPodRecordRepository, thus drop InMemoryPodRepository
* Incorporate imperative USI interface
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants