Fail to create agents with GPU attached from Machine Configuration in Jenkins settings UI. Instances with guest accelerators do not support live migration. #63

Open
dkozlov opened this issue Apr 8, 2019 · 10 comments
Labels
backlog (Issues that we are not currently addressing), good first issue (Good for newcomers)

Comments

@dkozlov

dkozlov commented Apr 8, 2019

Hello, it seems that it is possible to create a GPU agent by specifying an instance template, but it is not possible to create one without an instance template. See the related issue https://issues.jenkins-ci.org/browse/JENKINS-52708.

As a workaround, you can use the following commit: dkozlov@7b7af84

Could you please either disable GPU support in the Machine Configuration UI or fix it?


Provisioning node from config com.google.jenkins.plugins.computeengine.InstanceConfiguration@3bafb6a8 for excess workload of 1 units of label 'jenkins-gpu'

Apr 08, 2019 5:23:23 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud availableNodeCapacity

Found capacity for 99 nodes in cloud 

Apr 08, 2019 5:23:24 AM WARNING com.google.jenkins.plugins.computeengine.ComputeEngineCloud provision

Error provisioning node
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Instances with guest accelerators do not support live migration.",
    "reason" : "badRequest"
  } ],
  "message" : "Instances with guest accelerators do not support live migration."
}
	at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1067)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
	at com.google.jenkins.plugins.computeengine.client.ComputeClient.insertInstance(ComputeClient.java:374)
	at com.google.jenkins.plugins.computeengine.InstanceConfiguration.provision(InstanceConfiguration.java:319)
	at com.google.jenkins.plugins.computeengine.ComputeEngineCloud.provision(ComputeEngineCloud.java:203)
	at hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(NodeProvisioner.java:715)
	at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:320)
	at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:62)
	at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:809)
	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

@dkozlov
Author

dkozlov commented Apr 8, 2019

Screenshot from 2019-04-08 08-51-06
Screenshot from 2019-04-08 08-34-33

@rachely3n self-assigned this Apr 10, 2019
@rachely3n
Contributor

Hmm, it seems like your code change would resolve the issue. Is there a reason you have not submitted a pull request?

@dkozlov
Author

dkozlov commented Apr 10, 2019

@rachely3n, it is only a workaround for GCP instances with GPUs. It should not be applied to every instance, because GCP instances without GPUs can migrate to other hardware without downtime.

@rachely3n
Contributor

I see, so that change (terminate instead of migrate) should only apply to instances with GPUs?

@dkozlov
Author

dkozlov commented Apr 11, 2019

@rachely3n, yes, terminate instead of migrate should only apply to instances with GPUs.

@stephenashank
Contributor

Given that this can be accomplished with instance templates, closing.

@dkozlov
Author

dkozlov commented Sep 25, 2019

@stephenashank, will GPU support be available only through instance templates? If so, will the GPU controls in the Machine Configuration UI (the ones outside of instance templates) be disabled? #68 (comment)

@stephenashank
Contributor

I'm revisiting this today and realize that it is still possible to create instances with GPUs without using instance templates. Checking "preemptible" means, by definition, that the instance will terminate on host maintenance. The disadvantage is that your instance could be terminated even when no maintenance is taking place. The other workaround is to use an instance template, which provides the desired flexibility but requires you to use multiple interfaces for configuration.

I would prefer not to remove this feature. In my opinion the ideal solution, as mentioned here and in #68, is to change the value of "onHostMaintenance" if GPUs are configured.

@stephenashank reopened this Sep 25, 2019
@stephenashank added the backlog and good first issue labels and removed the wontfix label Sep 25, 2019
@stephenashank
Contributor

For reference, this issue now tracks the specific work of changing the value of "onHostMaintenance" in the scheduling() method of InstanceConfiguration.java when an AcceleratorConfiguration is defined. This is a small change in scope, but because there are workarounds, I'm moving it to the backlog for now.
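
A minimal sketch of the rule discussed in this thread, for illustration only. It is not the plugin's actual code: `SchedulingSketch`, its nested `Scheduling` value class, and the `acceleratorCount` parameter are hypothetical stand-ins for whatever `scheduling()` in `InstanceConfiguration.java` really works with. The only facts taken from the thread are that instances with guest accelerators cannot live migrate and that preemptible instances terminate on host maintenance.

```java
/** Hypothetical stand-in for the plugin's scheduling logic; not the real InstanceConfiguration code. */
final class SchedulingSketch {
    static final String MIGRATE = "MIGRATE";
    static final String TERMINATE = "TERMINATE";

    /** Minimal value object holding the two scheduling fields discussed in this issue. */
    static final class Scheduling {
        final boolean preemptible;
        final String onHostMaintenance;

        Scheduling(boolean preemptible, String onHostMaintenance) {
            this.preemptible = preemptible;
            this.onHostMaintenance = onHostMaintenance;
        }
    }

    /**
     * Picks the host maintenance policy: TERMINATE when the instance is
     * preemptible or has at least one accelerator (GPU), MIGRATE otherwise.
     */
    static Scheduling scheduling(boolean preemptible, int acceleratorCount) {
        boolean hasGpu = acceleratorCount > 0;
        String policy = (preemptible || hasGpu) ? TERMINATE : MIGRATE;
        return new Scheduling(preemptible, policy);
    }

    public static void main(String[] args) {
        System.out.println(scheduling(false, 1).onHostMaintenance); // TERMINATE
        System.out.println(scheduling(false, 0).onHostMaintenance); // MIGRATE
        System.out.println(scheduling(true, 0).onHostMaintenance);  // TERMINATE
    }
}
```

With a rule like this, non-GPU instances keep live migration, which is the concern raised earlier about applying the workaround to every instance.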

@verdverm

What about adding another option for host maintenance policy?

Does it make sense to have better parity with the GCP console options?
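
One possible reading of this suggestion, sketched under assumptions: expose the host maintenance policy as an explicit option and validate it against the GPU setting before calling the API. The names `HostMaintenanceOption`, `Policy`, and `validate` are hypothetical and do not exist in the plugin; the error text is the message returned by the API in the log above.

```java
import java.util.Optional;

/** Hypothetical user-facing option for the host maintenance policy; not part of the plugin. */
final class HostMaintenanceOption {
    enum Policy { MIGRATE, TERMINATE }

    /**
     * Returns an error message when the chosen policy cannot work with the
     * configured accelerators, or an empty Optional when the combination is valid.
     */
    static Optional<String> validate(Policy policy, int acceleratorCount) {
        if (policy == Policy.MIGRATE && acceleratorCount > 0) {
            return Optional.of("Instances with guest accelerators do not support live migration.");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validate(Policy.MIGRATE, 1).isPresent());   // true
        System.out.println(validate(Policy.TERMINATE, 1).isPresent()); // false
    }
}
```

Validating in the form would surface the problem at configuration time instead of as a 400 response during provisioning.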
