GPU discovery #3055

Merged
awni merged 4 commits into ml-explore:main from dhiltgen:discovery
Jan 26, 2026
Conversation

@dhiltgen
Contributor

Proposed changes

Enhanced GPU discovery leveraging the existing device_info. In particular, for CUDA this tries to use NVML (if present), which provides a system-wide view of VRAM consumption, whereas cudaMemGetInfo is scoped to just the current process when calculating consumed memory. This also exposes PCI bus IDs and UUIDs for devices, which can help correlate with other APIs for deduplication or additional queries.
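As a small illustration of the deduplication use case, device records from two discovery sources can be correlated by UUID. This is a plain-Python sketch with illustrative names; it is not part of the MLX API.

```python
# Illustrative sketch: correlate device records from two discovery
# sources (e.g. device_info output and another API's listing) by UUID,
# keeping the first record seen for each device. Not an MLX API.

def dedup_by_uuid(*sources):
    seen = {}
    for records in sources:
        for dev in records:
            # setdefault keeps the first record for each UUID
            seen.setdefault(dev["uuid"], dev)
    return list(seen.values())
```

The UUID is stable across APIs on NVIDIA hardware, which makes it a better join key than the device index, whose ordering can differ between enumeration APIs.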

Simple example program

#!/usr/bin/env python3
import mlx.core as mx

def main():
    print("="*70)
    print("MLX GPU Device Discovery Demo")
    print("="*70)

    # Check backends
    print(f"\nCUDA available: {mx.cuda.is_available()}")
    print(f"Metal available: {mx.metal.is_available()}")

    # Device count
    count = mx.device_count()
    print(f"\nTotal GPU devices: {count}")

    if count == 0:
        print("No GPU devices found.")
        return

    # Enumerate devices
    for i in range(count):
        info = mx.device_info(i)
        print(f"\n{'='*70}")
        print(f"Device {i}")
        print(f"{'='*70}")
        for key, value in sorted(info.items()):
            if 'memory' in key and isinstance(value, int):
                print(f"  {key:30s}: {value:15,} bytes ({value/(1024**3):.2f} GB)")
            else:
                print(f"  {key:30s}: {value}")

if __name__ == "__main__":
    main()

Output on a dual-GPU Linux system

% ./test_device_discovery.py 
======================================================================
MLX GPU Device Discovery Demo
======================================================================

CUDA available: True
Metal available: False

Total GPU devices: 2

======================================================================
Device 0
======================================================================
  architecture                  : sm_120
  compute_capability_major      : 12
  compute_capability_minor      : 0
  device_name                   : NVIDIA RTX PRO 6000 Blackwell Workstation Edition
  free_memory                   : 101,384,192,000 bytes (94.42 GB)
  pci_bus_id                    : 0000:43:00.0
  total_memory                  : 102,641,958,912 bytes (95.59 GB)
  uuid                          : GPU-e0be9369-843f-a56c-2def-efc2848da030

======================================================================
Device 1
======================================================================
  architecture                  : sm_120
  compute_capability_major      : 12
  compute_capability_minor      : 0
  device_name                   : NVIDIA RTX PRO 6000 Blackwell Workstation Edition
  free_memory                   : 101,384,192,000 bytes (94.42 GB)
  pci_bus_id                    : 0000:6f:00.0
  total_memory                  : 102,641,958,912 bytes (95.59 GB)
  uuid                          : GPU-c681a258-86c7-673f-3f2e-77c50fd2ae0c

Output on a Mac

% test_device_discovery.py 
======================================================================
MLX GPU Device Discovery Demo
======================================================================

CUDA available: False
Metal available: True

Total GPU devices: 1

======================================================================
Device 0
======================================================================
  architecture                  : applegpu_g15s
  device_name                   : Apple M3 Max
  max_buffer_length             : 77309411328
  max_recommended_working_set_size: 103079215104
  memory_size                   : 137,438,953,472 bytes (128.00 GB)
  resource_limit                : 499000

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

Member

@angeloskath left a comment


I think in principle this is very nice, but there are a few ways in which it deviates from standard MLX practice; this may sound pedantic (and I am open to comments), but I think these points are slightly important.

To begin with I would deal with conditional compilation through CMake rather than ifdefs if anything to keep it the same as the rest of the codebase. Basically device_info.cpp should not exist in /mlx but rather per backend.
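A per-backend split driven by the build system could look roughly like the fragment below. The option names and paths are illustrative, not the actual MLX CMake files.

```cmake
# Hypothetical sketch: select a per-backend device_info implementation
# at configure time instead of guarding one shared file with ifdefs.
# Variable and path names are illustrative only.
if(MLX_BUILD_CUDA)
  target_sources(mlx PRIVATE backend/cuda/device_info.cpp)
elseif(MLX_BUILD_METAL)
  target_sources(mlx PRIVATE backend/metal/device_info.cpp)
else()
  target_sources(mlx PRIVATE backend/cpu/device_info.cpp)
endif()
```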

Secondly, I think device-related functions like set_default_device and is_available have so far referred to all devices, meaning both GPU and CPU. I think that is nice and we should keep it this way.

A simple way to achieve that is to pass a device to mx.device_count or mx.device_info. Then the info can be implemented in the corresponding ::gpu:: or ::cpu:: namespace the same way as the is_available function.
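The suggested shape, with a device argument selecting the backend implementation, can be sketched in plain Python. This has no MLX dependency and all names are illustrative, not the MLX API.

```python
# Plain-Python sketch of the dispatch shape suggested above: device_info
# takes a device type and forwards to the matching backend implementation,
# the same way is_available is organized per backend. Illustrative only.

def _cpu_device_info(index):
    # A real CPU backend would query the host here.
    return {"backend": "cpu", "index": index}

def _gpu_device_info(index):
    # A real GPU backend would query CUDA or Metal here.
    return {"backend": "gpu", "index": index}

_BACKENDS = {
    "cpu": _cpu_device_info,
    "gpu": _gpu_device_info,
}

def device_info(device_type="gpu", index=0):
    try:
        backend = _BACKENDS[device_type]
    except KeyError:
        raise ValueError(f"unknown device type: {device_type}")
    return backend(index)
```

The point of the pattern is that the top-level function stays backend-agnostic, while each backend owns its own query code in its own namespace.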

It's basically great, I am mostly asking to move things around, wdyt?

@zcbenz
Collaborator

zcbenz commented Jan 24, 2026

What should we do with mx.metal.device_info, leave it be or deprecate it?

@awni
Member

awni commented Jan 24, 2026

Yes I do think we should deprecate mx.metal.device_info and pull it into the high level mx.device_info.

@awni
Member

awni commented Jan 24, 2026

This is a really nice addition. I also agree with the above comments from @angeloskath regarding the organization / API. Otherwise looking forward to merging it!

@dhiltgen
Contributor Author

Thanks for the pointers! I'll take a refactoring pass to adjust accordingly.

Collaborator

@zcbenz left a comment


I think we should unify the filenames:

  • backend/cpu/device_info.cpp
  • backend/cuda/device_info.cpp
  • backend/metal/device_info.cpp
  • device_info.h

and probably get rid of the cpu/available.cpp and cuda/cuda.cpp files which only have the implementation of is_available.

Collaborator

@zcbenz left a comment


This is a very useful API and I'm good with the changes, just a few more nitpicks.

Would still need a few more reviews from other maintainers to merge.


// Get CPU architecture string
std::string get_cpu_architecture() {
#if defined(__aarch64__) || defined(__arm64__)
Collaborator


This implementation returns the architecture that the program was compiled against rather than the real hardware. For example a x64 binary running on arm64 platform through a compatibility layer would report the CPU architecture as x64.

I don't think it is important or needs to be fixed in this PR, but I think we should exclude it from device_info's result, in case users come to rely on the behavior, which would make it hard to change.
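The distinction can be illustrated in Python: platform.machine() is a runtime query of the OS, in contrast to compile-time macros like __aarch64__, but even runtime queries report the translated architecture under a compatibility layer. The Rosetta 2 check below relies on macOS's sysctl.proc_translated key; this is an illustrative sketch, not how the PR implements it.

```python
# Sketch of a best-effort runtime architecture query. platform.machine()
# asks the OS at run time, but under a translation layer (e.g. Rosetta 2)
# an x86_64 process still sees "x86_64"; on macOS the
# sysctl.proc_translated key reveals the translation. Illustrative only.
import platform
import subprocess
import sys

def runtime_architecture():
    arch = platform.machine()
    if sys.platform == "darwin":
        try:
            translated = subprocess.run(
                ["sysctl", "-n", "sysctl.proc_translated"],
                capture_output=True, text=True,
            ).stdout.strip()
            if translated == "1":
                arch = "arm64"  # x86_64 process translated on Apple silicon
        except OSError:
            pass
    return arch
```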

Contributor Author


I'll work on a fast-follow to improve this.

Member

@angeloskath left a comment


I think it looks great. Thanks @zcbenz for the thorough comments. Let's merge after the tests pass.

@awni awni merged commit a828e76 into ml-explore:main Jan 26, 2026
16 checks passed
@dhiltgen mentioned this pull request Jan 26, 2026