Replies: 2 comments 7 replies
-
This sounds really cool and powerful!
I think it's very unlikely that we added a new argument to |
Beta Was this translation helpful? Give feedback.
-
Hey @TomNicholas , Yes this should definitely be an argument for a backend rather than an argument for One big constraint here is GDS setup requirements. |
Beta Was this translation helpful? Give feedback.
-
Is your feature request related to a problem?
I suggest the integration of NVIDIA DALI (Data Loading Library) pipelines into Xarray to enable efficient loading of data directly into GPU memory to avoid CPU-GPU transfer bottlenecks and enhance performance for AI/ML workflows across multiple GPUs.
Describe the solution you'd like
Ideally we can extend the
xr.open_dataset()
function to accept a new argument, for exampledali_pipeline
. This argument will allow users to pass a DALI pipeline object directly for Xarray dataset loading.Here is an example of NVIDIA DALI numpy reader that can be used with Xarray.
Describe alternatives you've considered
Users manually load data into CPU memory, preprocess it using custom scripts, and then transfer it to GPU memory for ML workflows, which leads to huge latency and overhead in CPU-GPU memory transfers, especially for distributed multi-GPU ML workflows.
As more people are training ML models using with multiple GPUs, this integration will significantly improve their workflows and reduce CPU-GPU memory transfer overhead.
Additional context
I look forward to the community's feedback and am happy to assist with the implementation process.
Beta Was this translation helpful? Give feedback.
All reactions