Conversation
Enhanced GPU discovery leveraging existing device_info
angeloskath
left a comment
There was a problem hiding this comment.
I think in principle this is very nice but there are a few ways where it deviates from standard MLX practice which may sound pedantic (and I am open to comments) but I think are slightly important.
To begin with I would deal with conditional compilation through CMake rather than ifdefs if anything to keep it the same as the rest of the codebase. Basically device_info.cpp should not exist in /mlx but rather per backend.
Secondly, I think device related functions like set_default_device and is_available so far referred to all devices, meaning GPU and CPU. I think that is nice and we should keep it this way.
A simple way to achieve that is to pass a device to mx.device_count or mx.device_info. Then the info can be implemented in the corresponding ::gpu:: or ::cpu:: namespace the same way as the is_available function.
It's basically great, I am mostly asking to move things around, wdyt?
|
What should we do with |
|
Yes I do think we should deprecate |
|
This is a really nice addition. I also agree with the above comments from @angeloskath regarding the organization / API. Otherwise looking forward to merging it! |
|
Thanks for the pointers! I'll take a refactoring pass to adjust accordingly. |
zcbenz
left a comment
There was a problem hiding this comment.
I think we should unify the filenames:
backend/cpu/device_info.cppbackend/cuda/device_info.cppbackend/metal/device_info.cppdevice_info.h
and probably get rid of the cpu/available.cpp and cuda/cuda.cpp files which only have the implementation of is_available.
zcbenz
left a comment
There was a problem hiding this comment.
This is a very useful API and I'm good with the changes, just a few more nitpickings.
Would still need a few more reviews from other maintainers to merge.
|
|
||
| // Get CPU architecture string | ||
| std::string get_cpu_architecture() { | ||
| #if defined(__aarch64__) || defined(__arm64__) |
There was a problem hiding this comment.
This implementation returns the architecture that the program was compiled against rather than the real hardware. For example a x64 binary running on arm64 platform through a compatibility layer would report the CPU architecture as x64.
I don't think it is an important thing and not needed to be fixed in this PR, but I think we should exclude it from device_info's result, in case users rely on the behavior which would make it hard to change.
There was a problem hiding this comment.
I'll work on a fast-follow to improve this.
578c634 to
5bb55ef
Compare
angeloskath
left a comment
There was a problem hiding this comment.
I think it looks great. Thanks @zcbenz for the thorough comments. Let's merge after the tests pass.
Proposed changes
Enhanced GPU discovery leveraging existing device_info. In particular for CUDA this tries to use NVML (if present) which provides an overall system view of VRAM consumption compared to cudaMemGetInfo which is scoped just to the current process when calculating consumed memory. This also exposes PCI IDs and UUIDs for devices which can help correlate with other APIs for deduplication or additional queries.
Simple example program
Output on a dual GPU linux system
Output on a Mac
Checklist
Put an
xin the boxes that apply.pre-commit run --all-filesto format my code / installed pre-commit prior to committing changes