NVIDIA DALI pipelines into Xarray to facilitate direct loading of data into GPU memory. #9249

negin513 · 2024-07-12T07:38:19Z

negin513
Jul 12, 2024

Is your feature request related to a problem?

I suggest the integration of NVIDIA DALI (Data Loading Library) pipelines into Xarray to enable efficient loading of data directly into GPU memory to avoid CPU-GPU transfer bottlenecks and enhance performance for AI/ML workflows across multiple GPUs.

Describe the solution you'd like

Ideally we can extend the xr.open_dataset() function to accept a new argument, for example dali_pipeline. This argument will allow users to pass a DALI pipeline object directly for Xarray dataset loading.

dali_pipeline = dali.Pipeline(batch_size=N, num_threads=Y, device_id=Z)

ds = xr.open_mfdataset(my_files,  dali_pipeline=dali_pipeline)

Here is an example of NVIDIA DALI numpy reader that can be used with Xarray.

Describe alternatives you've considered

Users manually load data into CPU memory, preprocess it using custom scripts, and then transfer it to GPU memory for ML workflows, which leads to huge latency and overhead in CPU-GPU memory transfers, especially for distributed multi-GPU ML workflows.

As more people are training ML models using with multiple GPUs, this integration will significantly improve their workflows and reduce CPU-GPU memory transfer overhead.

Additional context

I look forward to the community's feedback and am happy to assist with the implementation process.

TomNicholas · 2024-07-12T15:59:07Z

TomNicholas
Jul 12, 2024
Maintainer

This sounds really cool and powerful!

Ideally we can extend the xr.open_dataset() function to accept a new argument, for example dali_pipeline.

I think it's very unlikely that we added a new argument to open_dataset for this, but custom xarray backends can add whatever keyword arguments they want, so couldn't you just add this pipeline feature in a custom backend? Maybe in the kvikio backend?

0 replies

negin513 · 2024-07-15T20:36:20Z

negin513
Jul 15, 2024
Author

Hey @TomNicholas , Yes this should definitely be an argument for a backend rather than an argument for open_dataset. DALI's pipeline is a bit different from kvikio backend (possibly an alternative to it), but I can see both being part of a single backend too. Mostly I have added this issue to gauge community's interest in this and gather some thoughts around usefulness of this feature.
Maybe @weiji14 has explored DALI already for data streaming to GPU? @weiji14 please let me know.

One big constraint here is GDS setup requirements.

7 replies

negin513 Jul 15, 2024
Author

Thanks so much @weiji14 for your response. I like the idea of cupy-xarray as one-stop shop for the GPU engines that use GDS. I have created an issue for this on CuPy-Xarray: xarray-contrib/cupy-xarray#54

negin513 Jul 16, 2024
Author

From what I understand, DALI only uses GDS for the numpy-reader, where the header is parsed on the CPU and data is loaded directly to GPUs.

Now I am wondering how the new Zarr refactoring and codecs implementations in ZarrV3 might facilitate data loading with kvikio for a full GPU-native Zarr reader, but admittedly I don't know the details of Zarr V3 developments.

weiji14 Jul 16, 2024

Now I am wondering how the new Zarr refactoring and codecs implementations in ZarrV3 might facilitate data loading with kvikio for a full GPU-native Zarr reader, but admittedly I don't know the details of Zarr V3 developments.

@dcherian pointed me to this PR at zarr-developers/zarr-python#1967 which seems to be using some ZarrV3 mechanisms (specifically async loading), though I haven't looked too closely.

From what I understand, DALI only uses GDS for the numpy-reader, where the header is parsed on the CPU and data is loaded directly to GPUs.

The experimental kvikIO work he did at xarray-contrib/cupy-xarray#10 also has some hacks on parsing coordinate metadata/headers into CPU, and data direct to GPUs, because xarray did not support GPU-backed coordinates previously (the situation has improved, but still work to do on duck arrays I think). It remains to be seen on whether we want everything (data+metadata) on the GPU, or just the data itself, I'd assume the latter because we probably don't need GPU-operations on metadata (yet).

TomNicholas Jul 16, 2024
Maintainer

xarray did not support GPU-backed coordinates previously

I don't know exactly what the situation with that was, but I strongly suspect that the recent changes I made to Xarray to get virtualizarr.ManifestArray coordinates to work will have helped with this too.

negin513 Jul 16, 2024
Author

Good to know @weiji14 and @TomNicholas . I think at this point, we don't care about metadata on GPUs ... not much ML workflows are training with metadata (and if anything meta data is usually very small), so metadata can safely be loaded on CPU but we can about streaming the data directly to GPUs.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NVIDIA DALI pipelines into Xarray to facilitate direct loading of data into GPU memory. #9249

{{title}}

Replies: 2 comments 7 replies

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

NVIDIA DALI pipelines into Xarray to facilitate direct loading of data into GPU memory. #9249

negin513 Jul 12, 2024

Is your feature request related to a problem?

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Replies: 2 comments · 7 replies

TomNicholas Jul 12, 2024 Maintainer

negin513 Jul 15, 2024 Author

negin513 Jul 15, 2024 Author

negin513 Jul 16, 2024 Author

weiji14 Jul 16, 2024

TomNicholas Jul 16, 2024 Maintainer

negin513 Jul 16, 2024 Author

negin513
Jul 12, 2024

Replies: 2 comments 7 replies

TomNicholas
Jul 12, 2024
Maintainer

negin513
Jul 15, 2024
Author

negin513 Jul 15, 2024
Author

negin513 Jul 16, 2024
Author

TomNicholas Jul 16, 2024
Maintainer

negin513 Jul 16, 2024
Author