GPU discovery #3055

Merged
awni merged 4 commits into ml-explore:main from dhiltgen:discovery
Jan 26, 2026
Conversation

@dhiltgen
Contributor

Proposed changes

Enhanced GPU discovery leveraging the existing device_info. In particular, for CUDA this tries to use NVML (if present), which provides a system-wide view of VRAM consumption, whereas cudaMemGetInfo is scoped to just the current process when calculating consumed memory. This also exposes PCI bus IDs and UUIDs for devices, which can help correlate with other APIs for deduplication or additional queries.
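As a small illustration of the deduplication use case, device records from two discovery sources can be correlated by UUID. This is a plain-Python sketch with illustrative names; it is not part of the MLX API.

```python
# Illustrative sketch: correlate device records from two discovery
# sources (e.g. device_info output and another API's listing) by UUID,
# keeping the first record seen for each device. Not an MLX API.

def dedup_by_uuid(*sources):
    seen = {}
    for records in sources:
        for dev in records:
            # setdefault keeps the first record for each UUID
            seen.setdefault(dev["uuid"], dev)
    return list(seen.values())
```

The UUID is stable across APIs on NVIDIA hardware, which makes it a better join key than the device index, whose ordering can differ between enumeration APIs.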

Simple example program

#!/usr/bin/env python3
import mlx.core as mx

def main():
    print("="*70)
    print("MLX GPU Device Discovery Demo")
    print("="*70)

    # Check backends
    print(f"\nCUDA available: {mx.cuda.is_available()}")
    print(f"Metal available: {mx.metal.is_available()}")

    # Device count
    count = mx.device_count()
    print(f"\nTotal GPU devices: {count}")

    if count == 0:
        print("No GPU devices found.")
        return

    # Enumerate devices
    for i in range(count):
        info = mx.device_info(i)
        print(f"\n{'='*70}")
        print(f"Device {i}")
        print(f"{'='*70}")
        for key, value in sorted(info.items()):
            if 'memory' in key and isinstance(value, int):
                print(f"  {key:30s}: {value:15,} bytes ({value/(1024**3):.2f} GB)")
            else:
                print(f"  {key:30s}: {value}")

if __name__ == "__main__":
    main()

Output on a dual-GPU Linux system

% ./test_device_discovery.py 
======================================================================
MLX GPU Device Discovery Demo
======================================================================

CUDA available: True
Metal available: False

Total GPU devices: 2

======================================================================
Device 0
======================================================================
  architecture                  : sm_120
  compute_capability_major      : 12
  compute_capability_minor      : 0
  device_name                   : NVIDIA RTX PRO 6000 Blackwell Workstation Edition
  free_memory                   : 101,384,192,000 bytes (94.42 GB)
  pci_bus_id                    : 0000:43:00.0
  total_memory                  : 102,641,958,912 bytes (95.59 GB)
  uuid                          : GPU-e0be9369-843f-a56c-2def-efc2848da030

======================================================================
Device 1
======================================================================
  architecture                  : sm_120
  compute_capability_major      : 12
  compute_capability_minor      : 0
  device_name                   : NVIDIA RTX PRO 6000 Blackwell Workstation Edition
  free_memory                   : 101,384,192,000 bytes (94.42 GB)
  pci_bus_id                    : 0000:6f:00.0
  total_memory                  : 102,641,958,912 bytes (95.59 GB)
  uuid                          : GPU-c681a258-86c7-673f-3f2e-77c50fd2ae0c

Output on a Mac

% test_device_discovery.py 
======================================================================
MLX GPU Device Discovery Demo
======================================================================

CUDA available: False
Metal available: True

Total GPU devices: 1

======================================================================
Device 0
======================================================================
  architecture                  : applegpu_g15s
  device_name                   : Apple M3 Max
  max_buffer_length             : 77309411328
  max_recommended_working_set_size: 103079215104
  memory_size                   : 137,438,953,472 bytes (128.00 GB)
  resource_limit                : 499000

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

Member

@angeloskath left a comment


I think in principle this is very nice, but there are a few ways in which it deviates from standard MLX practice; this may sound pedantic (and I am open to comments), but I think these points are slightly important.

To begin with I would deal with conditional compilation through CMake rather than ifdefs if anything to keep it the same as the rest of the codebase. Basically device_info.cpp should not exist in /mlx but rather per backend.
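A per-backend split driven by the build system could look roughly like the fragment below. The option names and paths are illustrative, not the actual MLX CMake files.

```cmake
# Hypothetical sketch: select a per-backend device_info implementation
# at configure time instead of guarding one shared file with ifdefs.
# Variable and path names are illustrative only.
if(MLX_BUILD_CUDA)
  target_sources(mlx PRIVATE backend/cuda/device_info.cpp)
elseif(MLX_BUILD_METAL)
  target_sources(mlx PRIVATE backend/metal/device_info.cpp)
else()
  target_sources(mlx PRIVATE backend/cpu/device_info.cpp)
endif()
```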

Secondly, I think device-related functions like set_default_device and is_available have so far referred to all devices, meaning both GPU and CPU. I think that is nice and we should keep it this way.

A simple way to achieve that is to pass a device to mx.device_count or mx.device_info. Then the info can be implemented in the corresponding ::gpu:: or ::cpu:: namespace the same way as the is_available function.
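The suggested shape, with a device argument selecting the backend implementation, can be sketched in plain Python. This has no MLX dependency and all names are illustrative, not the MLX API.

```python
# Plain-Python sketch of the dispatch shape suggested above: device_info
# takes a device type and forwards to the matching backend implementation,
# the same way is_available is organized per backend. Illustrative only.

def _cpu_device_info(index):
    # A real CPU backend would query the host here.
    return {"backend": "cpu", "index": index}

def _gpu_device_info(index):
    # A real GPU backend would query CUDA or Metal here.
    return {"backend": "gpu", "index": index}

_BACKENDS = {
    "cpu": _cpu_device_info,
    "gpu": _gpu_device_info,
}

def device_info(device_type="gpu", index=0):
    try:
        backend = _BACKENDS[device_type]
    except KeyError:
        raise ValueError(f"unknown device type: {device_type}")
    return backend(index)
```

The point of the pattern is that the top-level function stays backend-agnostic, while each backend owns its own query code in its own namespace.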

It's basically great, I am mostly asking to move things around, wdyt?

@zcbenz
Collaborator

zcbenz commented Jan 24, 2026

What should we do with mx.metal.device_info, leave it be or deprecate it?

@awni
Member

awni commented Jan 24, 2026

Yes I do think we should deprecate mx.metal.device_info and pull it into the high level mx.device_info.

@awni
Member

awni commented Jan 24, 2026

This is a really nice addition. I also agree with the above comments from @angeloskath regarding the organization / API. Otherwise looking forward to merging it!

@dhiltgen
Contributor Author

Thanks for the pointers! I'll take a refactoring pass to adjust accordingly.

Collaborator

@zcbenz left a comment


I think we should unify the filenames:

  • backend/cpu/device_info.cpp
  • backend/cuda/device_info.cpp
  • backend/metal/device_info.cpp
  • device_info.h

and probably get rid of the cpu/available.cpp and cuda/cuda.cpp files which only have the implementation of is_available.

Collaborator

@zcbenz left a comment


This is a very useful API and I'm good with the changes, just a few more nitpicks.

Would still need a few more reviews from other maintainers to merge.


// Get CPU architecture string
std::string get_cpu_architecture() {
#if defined(__aarch64__) || defined(__arm64__)
Collaborator


This implementation returns the architecture that the program was compiled against rather than the real hardware. For example a x64 binary running on arm64 platform through a compatibility layer would report the CPU architecture as x64.

I don't think it is important or needs to be fixed in this PR, but I think we should exclude it from device_info's result, in case users come to rely on the behavior, which would make it hard to change.
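The distinction can be illustrated in Python: platform.machine() is a runtime query of the OS, in contrast to compile-time macros like __aarch64__, but even runtime queries report the translated architecture under a compatibility layer. The Rosetta 2 check below relies on macOS's sysctl.proc_translated key; this is an illustrative sketch, not how the PR implements it.

```python
# Sketch of a best-effort runtime architecture query. platform.machine()
# asks the OS at run time, but under a translation layer (e.g. Rosetta 2)
# an x86_64 process still sees "x86_64"; on macOS the
# sysctl.proc_translated key reveals the translation. Illustrative only.
import platform
import subprocess
import sys

def runtime_architecture():
    arch = platform.machine()
    if sys.platform == "darwin":
        try:
            translated = subprocess.run(
                ["sysctl", "-n", "sysctl.proc_translated"],
                capture_output=True, text=True,
            ).stdout.strip()
            if translated == "1":
                arch = "arm64"  # x86_64 process translated on Apple silicon
        except OSError:
            pass
    return arch
```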

Contributor Author


I'll work on a fast-follow to improve this.

Member

@angeloskath left a comment


I think it looks great. Thanks @zcbenz for the thorough comments. Let's merge after the tests pass.

@awni awni merged commit a828e76 into ml-explore:main Jan 26, 2026
16 checks passed
@dhiltgen mentioned this pull request Jan 26, 2026