Set up GPU device if available #62

Merged
merged 9 commits into from
Jul 29, 2019

Conversation

@ldx (Contributor) commented Jul 26, 2019

Two main changes:

  • Build AMI with a relatively recent NVIDIA driver and toolchain.
  • Set up units for GPU access if an NVIDIA GPU is detected on the host.
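For illustration, one way to detect an NVIDIA GPU on a Linux host is to scan sysfs for PCI devices with NVIDIA's vendor ID (0x10de) and a display-controller class. This is only a sketch: the PR does not say how itzo performs detection, and the function name and `sysfs_pci_root` parameter are hypothetical.

```python
from pathlib import Path

NVIDIA_PCI_VENDOR_ID = "0x10de"  # PCI vendor ID assigned to NVIDIA
DISPLAY_CLASS_PREFIX = "0x03"    # PCI base class 0x03: display controllers


def find_nvidia_gpus(sysfs_pci_root="/sys/bus/pci/devices"):
    """Return sorted PCI addresses of NVIDIA display-class devices.

    Hypothetical helper: not taken from itzo, just one possible approach.
    """
    root = Path(sysfs_pci_root)
    if not root.is_dir():
        return []
    found = []
    for dev in root.iterdir():
        try:
            vendor = (dev / "vendor").read_text().strip().lower()
            pci_class = (dev / "class").read_text().strip().lower()
        except OSError:
            # Skip entries that lack the expected attribute files.
            continue
        if vendor == NVIDIA_PCI_VENDOR_ID and pci_class.startswith(DISPLAY_CLASS_PREFIX):
            found.append(dev.name)
    return sorted(found)


if __name__ == "__main__":
    gpus = find_nvidia_gpus()
    print("NVIDIA GPUs:", ", ".join(gpus) if gpus else "none detected")
```

The class filter keeps non-GPU functions on the same card (such as the HDMI audio device) from being counted.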

I also built a dev AMI, but I have not promoted it to prod yet. We probably want to pin our excessively large customer base to the current stable itzo release and prod AMI first.

To test it, have server.yml use the milpadev AMI and set the default instance type to e.g. p2.xlarge. Then create a pod using e.g. tensorflow-gpu:

kubectl run tf --image=tensorflow/tensorflow:latest-gpu --overrides='{"apiVersion": "apps/v1", "spec": {"template": {"metadata": {"annotations": {"kubernetes.io/target-runtime":"kiyot"}}, "spec": {"nodeSelector": {"kubernetes.io/role": "milpa-worker"}}}}}' --command=true -- sleep infinity
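The inline JSON in `--overrides` is easy to mis-quote when edited by hand. As a convenience sketch, the same command can be generated from a Python dict; the values are copied from the command above, and the helper name `build_kubectl_run` is hypothetical.

```python
import json
import shlex

# Same overrides as in the command above, expressed as a dict.
OVERRIDES = {
    "apiVersion": "apps/v1",
    "spec": {
        "template": {
            "metadata": {
                "annotations": {"kubernetes.io/target-runtime": "kiyot"}
            },
            "spec": {"nodeSelector": {"kubernetes.io/role": "milpa-worker"}},
        }
    },
}


def build_kubectl_run(name, image, overrides):
    """Return a shell-safe `kubectl run` command line (hypothetical helper)."""
    args = [
        "kubectl", "run", name,
        f"--image={image}",
        f"--overrides={json.dumps(overrides)}",
        "--command=true",
        "--", "sleep", "infinity",
    ]
    # shlex.quote keeps the JSON intact when the line is pasted into a shell.
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(build_kubectl_run("tf", "tensorflow/tensorflow:latest-gpu", OVERRIDES))
```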

Once it's running, exec into the pod, and check that tensorflow is able to detect and use the GPU:

python -c "import tensorflow as tf; sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))"
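If the TensorFlow check reports no GPU, a lower-level sanity check is whether any NVIDIA device nodes are visible inside the pod at all. The sketch below assumes the conventional NVIDIA driver node names (`/dev/nvidia0`, `/dev/nvidiactl`, `/dev/nvidia-uvm`); the PR does not state which nodes itzo exposes, and both function names are hypothetical.

```python
import glob
import os
import re


def list_nvidia_device_nodes(dev_root="/dev"):
    """Return sorted names of entries under dev_root that start with 'nvidia'."""
    pattern = os.path.join(dev_root, "nvidia*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def has_gpu_node(dev_root="/dev"):
    """True if at least one per-GPU node (nvidia0, nvidia1, ...) is present."""
    return any(
        re.fullmatch(r"nvidia\d+", name)
        for name in list_nvidia_device_nodes(dev_root)
    )


if __name__ == "__main__":
    print("device nodes:", list_nvidia_device_nodes())
    print("GPU node present:", has_gpu_node())
```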

The next step is to have kiyot request GPU(s) when the pod sets a GPU resource limit, and to implement our device plugin for the kubelet that exposes GPU capabilities.
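As a rough sketch of that first step, the number of GPUs to request could be derived by summing GPU limits across a pod's containers. This assumes the resource is named `nvidia.com/gpu`, as in NVIDIA's Kubernetes device plugin; the PR does not name the resource, and `requested_gpus` is a hypothetical helper, not kiyot code.

```python
# Assumption: the GPU extended resource name; not confirmed by this PR.
GPU_RESOURCE = "nvidia.com/gpu"


def requested_gpus(pod):
    """Sum GPU limits over all containers of a pod manifest given as a dict."""
    total = 0
    for container in pod.get("spec", {}).get("containers", []):
        limits = (container.get("resources") or {}).get("limits") or {}
        # Manifests may carry the quantity as an int or as a string like "1".
        total += int(limits.get(GPU_RESOURCE, 0))
    return total


if __name__ == "__main__":
    pod = {
        "spec": {
            "containers": [
                {"name": "tf", "resources": {"limits": {GPU_RESOURCE: 1}}},
                {"name": "sidecar"},
            ]
        }
    }
    print("GPUs requested:", requested_gpus(pod))
```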

@ldx ldx requested a review from justnoise July 26, 2019 23:07
@justnoise (Contributor) left a comment

Great job getting this working!

@ldx ldx merged commit 088d616 into master Jul 29, 2019
@ldx ldx deleted the vilmos-imagebuild-nvidia branch July 29, 2019 18:23