Fail to create agents with GPU attached from Machine Configuration in Jenkins settings UI. Instances with guest accelerators do not support live migration. #63

dkozlov · 2019-04-08T05:48:37Z

Hello. It seems that a GPU agent can be created when an instance template is specified, but not when the instance is configured directly in the Machine Configuration UI without a template. See the related issue https://issues.jenkins-ci.org/browse/JENKINS-52708.

As a workaround, you can use the following commit: dkozlov@7b7af84

Could you please either disable GPU support in the Machine Configuration UI or fix it?


Provisioning node from config com.google.jenkins.plugins.computeengine.InstanceConfiguration@3bafb6a8 for excess workload of 1 units of label 'jenkins-gpu'

Apr 08, 2019 5:23:23 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud availableNodeCapacity

Found capacity for 99 nodes in cloud 

Apr 08, 2019 5:23:24 AM WARNING com.google.jenkins.plugins.computeengine.ComputeEngineCloud provision

Error provisioning node
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Instances with guest accelerators do not support live migration.",
    "reason" : "badRequest"
  } ],
  "message" : "Instances with guest accelerators do not support live migration."
}
	at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1067)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
	at com.google.jenkins.plugins.computeengine.client.ComputeClient.insertInstance(ComputeClient.java:374)
	at com.google.jenkins.plugins.computeengine.InstanceConfiguration.provision(InstanceConfiguration.java:319)
	at com.google.jenkins.plugins.computeengine.ComputeEngineCloud.provision(ComputeEngineCloud.java:203)
	at hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(NodeProvisioner.java:715)
	at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:320)
	at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:62)
	at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:809)
	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


rachely3n · 2019-04-10T00:32:47Z

Hmm, it seems like your code change would resolve the issue. Is there a reason you have not submitted a pull request?

dkozlov · 2019-04-10T22:17:16Z

@rachely3n it is a workaround for GCP instances with GPUs only, because GCP instances without GPUs can migrate to other hardware without downtime.

rachely3n · 2019-04-11T00:45:33Z

I see, so that (terminate for migrate) should only apply to instances with GPUs?

dkozlov · 2019-04-11T06:21:32Z

@rachely3n, yes, (terminate for migrate) should only apply to instances with GPUs.

stephenashank · 2019-09-24T21:23:05Z

Given that this can be accomplished with instance templates, closing.

dkozlov · 2019-09-25T02:39:08Z

@stephenashank, will GPU support be available only through instance templates? If so, will the GPU controls in the Machine Configuration UI (outside of instance templates) be disabled? #68 (comment)

stephenashank · 2019-09-25T20:42:37Z

I'm revisiting this today and realize that it is still possible to create instances with GPUs without using instance templates. Checking "preemptible" means, by definition, that the instance will terminate on host maintenance. The disadvantage is that a preemptible instance can be terminated at any time, even without maintenance. The other workaround is to use an instance template, which provides the desired flexibility but requires you to use multiple interfaces for configuration.

I would prefer not to remove this feature. In my opinion the ideal solution, as mentioned here and in #68, is to change the value of "onHostMaintenance" if GPUs are configured.

stephenashank · 2019-09-25T20:50:32Z

For reference, this issue now tracks the specific work of changing the value of "onHostMaintenance" in the scheduling() method of InstanceConfiguration.java when an AcceleratorConfiguration is defined. The change is small in scope, but because there are workarounds, I am moving this to the backlog for now.
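
To make the proposed change concrete, here is a minimal, self-contained sketch of the decision the scheduling() method would need to make. It is not the plugin's actual code: the class name HostMaintenancePolicy and the method resolve are hypothetical, and the two booleans stand in for the plugin's preemptible flag and for "an AcceleratorConfiguration is defined". The only facts relied on are that the Compute Engine API accepts "MIGRATE" or "TERMINATE" for onHostMaintenance, and that preemptible and GPU instances cannot live migrate.

```java
/**
 * Hypothetical helper illustrating the fix discussed in this issue.
 * Not part of the plugin; names are assumptions for illustration only.
 */
final class HostMaintenancePolicy {
    // The two values the Compute Engine API accepts for scheduling.onHostMaintenance.
    static final String MIGRATE = "MIGRATE";
    static final String TERMINATE = "TERMINATE";

    private HostMaintenancePolicy() {}

    /**
     * Picks the onHostMaintenance value for a new instance.
     *
     * @param preemptible            whether the instance is preemptible
     * @param acceleratorConfigured  whether any GPU (guest accelerator) is attached
     */
    static String resolve(boolean preemptible, boolean acceleratorConfigured) {
        // Preemptible instances and instances with guest accelerators
        // do not support live migration, so they must terminate instead.
        if (preemptible || acceleratorConfigured) {
            return TERMINATE;
        }
        // Everything else can keep live migration.
        return MIGRATE;
    }

    public static void main(String[] args) {
        System.out.println(resolve(false, false)); // plain instance
        System.out.println(resolve(false, true));  // GPU instance
        System.out.println(resolve(true, false));  // preemptible instance
    }
}
```

In the plugin, the returned string would presumably be passed to the scheduling object that is sent with the insert request, so that instances without GPUs keep live migration while GPU instances no longer trigger the 400 error above.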

verdverm · 2019-12-13T17:09:37Z

What about adding another option for host maintenance policy?

Does it make sense to have better parity with the GCP console options?

rachely3n self-assigned this Apr 10, 2019

rachely3n mentioned this issue Apr 12, 2019

Terminate on migrate for instances with GPU's attached. #68

Closed

stephenashank added the wontfix label Sep 24, 2019

stephenashank closed this as completed Sep 24, 2019

stephenashank mentioned this issue Sep 24, 2019

Update documentation #145

Open

stephenashank unassigned rachely3n Sep 25, 2019

stephenashank reopened this Sep 25, 2019

stephenashank added the backlog and good first issue labels and removed the wontfix label Sep 25, 2019
