
Conversation

@NicolasHug (Contributor) commented on Oct 14, 2025:

This PR addresses a bunch of TODOs related to the BETA CUDA interface. More will come later. See comments below for details.

}

if (videoParser_) {
// TODONVDEC P2: consider caching this? Does DALI do that?

@NicolasHug (author):
Answer: No, DALI doesn't cache the parser.


UniqueNppContext getNppStreamContext(const torch::Device& device) {
torch::DeviceIndex nonNegativeDeviceIndex = getNonNegativeDeviceIndex(device);
int deviceIndex = getDeviceIndex(device);

@NicolasHug (author):
Going from torch::DeviceIndex to int is a fix: cudaGetDeviceProperties and nppCtx->nCudaDeviceId below both expect an int, not a torch::DeviceIndex, which is an int8_t.
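For context, a minimal sketch of that pattern (assumed shape only; the real getNppStreamContext in torchcodec may differ): the device is converted to a plain int once, and that int is handed to both the CUDA runtime and the NPP context.

// Sketch, assuming UniqueNppContext is a unique_ptr-like wrapper around NPP's
// NppStreamContext (needs <cuda_runtime.h>, <nppdefs.h>, <memory>, torch headers).
UniqueNppContext getNppStreamContext(const torch::Device& device) {
  int deviceIndex = getDeviceIndex(device); // plain int, not int8_t

  cudaDeviceProp prop{};
  TORCH_CHECK(
      cudaGetDeviceProperties(&prop, deviceIndex) == cudaSuccess,
      "Failed to get CUDA device properties.");

  auto nppCtx = std::make_unique<NppStreamContext>();
  nppCtx->nCudaDeviceId = deviceIndex;                      // expects int
  nppCtx->nMultiProcessorCount = prop.multiProcessorCount;  // also an int
  return nppCtx;
}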

cudaGetDevice(&deviceIndex) == cudaSuccess,
"Failed to get current CUDA device.");
}
return deviceIndex;

@NicolasHug (author):

Another fix: getNonNegativeDeviceIndex() used to map -1 to 0, which is incorrect. -1 should be mapped to the "current device", which can be arbitrary and may have been set by a context manager from Python.
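For concreteness, a hedged sketch of how such a helper could look (the actual getDeviceIndex in CUDACommon.cpp may differ in details):

// Sketch: map a torch::Device to a plain int index; -1 means "whatever device
// is currently active", which may have been selected from Python via a device
// context manager.
int getDeviceIndex(const torch::Device& device) {
  int deviceIndex = static_cast<int>(device.index()); // int8_t -> int
  if (deviceIndex == -1) {
    TORCH_CHECK(
        cudaGetDevice(&deviceIndex) == cudaSuccess,
        "Failed to get current CUDA device.");
  }
  return deviceIndex;
}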

// but:
// - we should consider using std::unordered_map
// - we should consider a more sophisticated and potentially less strict
// cache key comparison logic

@NicolasHug (author):

I realized that using a std::unordered_map would force us to define our own hash function, so it's really not worth it. The O(log n) lookup complexity of std::map isn't relevant anyway, considering the very small sizes we deal with.

On the potentially less strict cache key comparison logic: I looked in more detail at DALI's implementation and they do the same thing, modulo decoder re-configuration, which is tracked in another TODO. So we can remove this one.
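To illustrate the trade-off with a hypothetical key type (not the actual cache key): std::map only needs an ordering on the key, while std::unordered_map would additionally require a user-provided hash.

#include <map>
#include <tuple>
#include <unordered_map>

// Hypothetical cache key, for illustration only.
struct CacheKey {
  int width;
  int height;
  int codecId;
  bool operator<(const CacheKey& other) const {
    return std::tie(width, height, codecId) <
        std::tie(other.width, other.height, other.codecId);
  }
};

// Works out of the box: std::map only needs operator<.
std::map<CacheKey, int> orderedCache;

// Would additionally require a custom hash functor, e.g.:
//   struct CacheKeyHash { size_t operator()(const CacheKey&) const; };
//   std::unordered_map<CacheKey, int, CacheKeyHash> hashedCache;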

# - before calling add_video_stream(device=device, device_variant=device_variant)
#
# TODONVDEC P2: Find a less clunky way to test the BETA CUDA interface. It
# will ultimately depend on how we want to publicly expose it.

@NicolasHug (author):

This was addressed by #959 and there is another TODO to remove the legacy use of cuda:0:beta, which will transitively resolve this one.

# a duration. CPU decoder fails too!
# Those are missing the `duration` field so they fail in approximate mode (on all devices).
# TODO: we should address this, see
# https://github.com/meta-pytorch/torchcodec/issues/945

@NicolasHug (author):

This isn't an NVDEC-specific TODO; it's a general one.

NVDECCache::getCache(device_.index())
.returnDecoder(&videoFormat_, std::move(decoder_));
NVDECCache::getCache(device_).returnDecoder(
&videoFormat_, std::move(decoder_));

@NicolasHug (author):

Above and everywhere else in the PR, anything related to "device index" comes from the introduction of getDeviceIndex(), which now lives in CUDACommon.cpp and replaces the old (buggy!) getNonNegativeDeviceIndex.

// Forward declaration of getDeviceIndex which exists in CUDACommon.h
// This avoids circular dependency between Cache.h and CUDACommon.cpp which also
// needs to include Cache.h
int getDeviceIndex(const torch::Device& device);

@NicolasHug (author) commented on Oct 15, 2025:

Not thrilled by this; happy to consider alternatives.

We could:

  • Include Cache.h within CUDACommon.cpp and not bother about the potential circular issue.
  • Have a forward declaration of PerGpuCache within CUDACommon.cpp instead (not sure whether that would work, since it's a template class).


Reviewer (Contributor) commented:

Forward declarations of templates are possible, but awkward. But if Cache.h needs to know CUDA stuff, then it's not really generic. The fact we have a circular dependency hints to me that we're not quite modeling things correctly. I can think of two alternatives:

  1. We add the ability to convert a torch::Device into a device-specific index to the DeviceInterface. That's still a little awkward, though, as we'd want to tell PerGpuCache the DeviceInterface so it would know where to get the index.
  2. PerGpuCache accepts the device index itself, instead of a torch::Device. That then assumes the device logic does the conversion itself, which I think is reasonable.

I prefer 2, but that might be a bigger refactor than this PR. So I'm good with this for now, and we can file a task for option 2, if you also agree it makes sense.


@NicolasHug (author) replied:

I think option 2 is reasonable; we could align both caches to just take an index as input. Let me follow up on that.
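For illustration, option 2 could look roughly like this (hedged sketch with illustrative names; the real PerGpuCache API is more involved): the cache is addressed by a plain int index, and the device-specific code performs the torch::Device-to-int conversion before touching it.

#include <cstddef>
#include <vector>

// Illustrative only: a per-GPU cache addressed by a plain int device index,
// so this header never needs to know about torch::Device or CUDA.
template <typename T>
class PerGpuCache {
 public:
  explicit PerGpuCache(std::size_t numDevices) : slots_(numDevices) {}

  T& get(int deviceIndex) {
    return slots_.at(static_cast<std::size_t>(deviceIndex));
  }

 private:
  std::vector<T> slots_;
};

// The device-specific code does the conversion itself, e.g.:
//   cache.get(getDeviceIndex(device));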

// that case we set the device index to 0. That's used in per-gpu cache
// implementation and during initialization of CUDA and FFmpeg contexts
// which require non negative indices.
deviceIndex = std::max<at::DeviceIndex>(deviceIndex, 0);

@NicolasHug (author):

Note: this was wrong! -1 should be mapped to the current device, not to 0.

// We set the device because we may be called from a different thread than
// the one that initialized the cuda context.
cudaSetDevice(nonNegativeDeviceIndex);
cudaSetDevice(deviceIndex);

@NicolasHug (author):

Here as well: cudaSetDevice expects an int, not an implicitly-cast torch::DeviceIndex.
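As a tiny illustration (sketch, not the exact call site): the conversion to int is made explicit before the runtime call.

// Sketch: obtain a plain int index and pass it to cudaSetDevice(int device),
// instead of relying on an implicit int8_t promotion.
int deviceIndex = getDeviceIndex(device_);
cudaSetDevice(deviceIndex);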

@NicolasHug NicolasHug marked this pull request as ready for review October 15, 2025 12:28
@NicolasHug NicolasHug merged commit 25e42e1 into meta-pytorch:main Oct 15, 2025
23 checks passed
NicolasHug added a commit to NicolasHug/torchcodec that referenced this pull request Oct 27, 2025
