Fix latest GPU container image tags #667
doh! though... do we need a separate latest and latest-gpu?
Please notice that the failing step on #666 is the With our current configuration, not specifying an
btw do you know if we support OSX? I presume yes?
Partially: we seem to support macOS[1] on GitHub; on GitLab, runner platforms are hardcoded, and the Mach-O loader would have a hard time trying to read ELF files.
Lines 209 to 213 in a2eedb6
Lines 149 to 151 in a2eedb6
However, who cares? Apple doesn't support CUDA on modern macOS systems, despite having NVIDIA GPU devices.
[1] OS X is a thing from the past. 😄
try {
  await exec('cuda-smi');
} catch (err) {
  gpu = false;
}
what do you think of this for 🍎 support?
could go back to
- try {
-   await exec('cuda-smi');
- } catch (err) {
-   gpu = false;
- }
+ gpu = false;
if you prefer...
What is cuda-smi supposed to do?
it's effectively what nvidia-smi is called on (some?) 🍎 systems: tmux-plugins/tmux-cpu#24
But it's a third-party tool (?)
Should we rely on that to detect CUDA?
idk, the impression I got was that CUDA can be installed on a mac without an nvidia-smi binary but with a cuda-smi binary available. Would be nice to get confirmation though.
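For illustration, the fallback being discussed could be sketched as a probe over a list of candidate binaries. This is only a sketch, not the code in this PR: `commandSucceeds` and `detectGpu` are hypothetical names, the probe is synchronous (the snippet under review uses `await exec(...)`), and whether `cuda-smi` is a reliable signal on macOS is exactly the open question above.

```javascript
const { execFileSync } = require('child_process');

// Hypothetical helper: true if `command` can be started and exits with status 0.
const commandSucceeds = (command) => {
  try {
    execFileSync(command, [], { stdio: 'ignore' });
    return true;
  } catch (err) {
    // Reached both when the binary is missing (ENOENT) and on a non-zero exit.
    return false;
  }
};

// Hypothetical helper: a GPU is assumed present if any candidate tool runs.
// `probe` is injectable so the logic can be exercised without real hardware.
const detectGpu = (
  candidates = ['nvidia-smi', 'cuda-smi'],
  probe = commandSucceeds
) => candidates.some((command) => probe(command));
```

Dropping `cuda-smi` from the candidate list gives the simpler behaviour proposed above, where a failing `nvidia-smi` alone sets `gpu` to false.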
I do not think that this change fixes #666.
The real issue is that our CML AMI does not seem to have nvidia/cuda...
If you look, the tests are the same:
- One running inside our CML Docker image
- One running directly on the bare-metal cloud runner. It's that OS that knows nothing about nvidia-smi
@DavidGOrtega, are you sure that
Ah! True, GitLab runs using the Docker executor, no?
Exactly!
I think it's fine.
Bare metal will check nvidia-smi to determine whether there is a GPU and use the proper image.
@0x2b3bfa0 before I do the merge, there is one thing we have to keep in mind: was the GPU image still working with non-GPU instances? I had some issues before.
@DavidGOrtega, there won't be any issues with GPU images on non-GPU machines unless we apply iterative/terraform-provider-iterative#151
We were pushing the latest tag twice[1, 2] with the latest GPU and non-GPU images, in parallel. Whichever image got pushed last ended up overwriting the other one. 🙈
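A rough way to picture the fix is to give each image variant its own tag, so the two parallel pushes never target the same name. The sketch below is an assumption-laden illustration, not the workflow in this repository: `tagFor`, `tagsFor` and `collide` are hypothetical helpers, and the `-gpu` suffix simply mirrors the latest / latest-gpu naming raised earlier in the conversation.

```javascript
// Hypothetical helper: derive the tag for one image variant.
const tagFor = (base, gpu) => (gpu ? `${base}-gpu` : base);

// Tags pushed by one job (assumption: one push job per variant).
const tagsFor = (gpu, bases = ['latest']) =>
  bases.map((base) => tagFor(base, gpu));

// True when two jobs would push at least one identical tag,
// which is the situation where the later push wins.
const collide = (a, b) => a.some((tag) => b.includes(tag));

const cpuTags = tagsFor(false); // ['latest']
const gpuTags = tagsFor(true); // ['latest-gpu']
console.log(collide(cpuTags, gpuTags)); // false
```

With the previous behaviour, both jobs effectively pushed `['latest']`, for which `collide` returns true.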