
scotts (Contributor) commented Aug 28, 2024

Currently, VideoDecoder.cpp contains many #ifdef ENABLE_CUDA blocks directly inside function bodies. This refactor applies the following principles:

  1. The logic in VideoDecoder.cpp should not contain any #ifdef ENABLE_CUDA.
  2. CUDA code should be localized to explicit CUDA source files.
  3. Generic code should enter CUDA code only through a small, explicit API. If the build was compiled without CUDA support, calling any function in this API from generic code throws an exception.
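A minimal sketch of principle (3), assuming a hypothetical CUDA entry point named initializeCudaContext and a hypothetical query cudaSupportCompiledIn (neither is an existing function in this codebase). The #ifdef ENABLE_CUDA appears only in what would be the CUDA-specific source file, so generic decoder code calls the API unconditionally and gets an exception in a CPU-only build.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// ---- Hypothetical CUDA API header ----
// Generic code includes only these declarations; the names are illustrative.
bool cudaSupportCompiledIn();
void initializeCudaContext(int deviceIndex);

// ---- Hypothetical CUDA source file: the only place with #ifdef ----
bool cudaSupportCompiledIn() {
#ifdef ENABLE_CUDA
  return true;
#else
  return false;
#endif
}

void initializeCudaContext(int deviceIndex) {
#ifdef ENABLE_CUDA
  // A real build would create or reuse the CUDA context for deviceIndex here.
  (void)deviceIndex;
#else
  // CPU-only build: fail loudly instead of silently doing nothing.
  throw std::runtime_error("initializeCudaContext(" +
                           std::to_string(deviceIndex) +
                           "): this build was compiled without CUDA support");
#endif
}

// ---- Generic decoder code: no #ifdef needed ----
std::string prepareDevice(bool wantCuda) {
  if (wantCuda) {
    initializeCudaContext(0);  // throws in a CPU-only build
    return "cuda:0";
  }
  return "cpu";
}
```

In a build without ENABLE_CUDA, prepareDevice(false) returns "cpu" and prepareDevice(true) throws std::runtime_error, which is the behavior principle (3) describes.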

Note: this was originally #193, but I abandoned that PR because #196 temporarily moved GPU support into a feature branch.

facebook-github-bot added the CLA Signed label Aug 28, 2024
scotts marked this pull request as draft August 28, 2024 12:58
scotts marked this pull request as ready for review August 29, 2024 20:00
scotts requested a review from ahmadsharif1 August 29, 2024 20:33

ahmadsharif1 (Contributor) commented Aug 30, 2024

I like the change to the NVTX macro, and the other code changes are fine, but I don't see what problem moving the code into a new set of files solves. It makes the code harder to read, because cmd+F no longer finds it and you have to search across files instead. It may also create merge conflicts with other changes to VideoDecoder.cpp when we merge this branch back into main. My vote is to avoid large code movements on this branch so it stays mergeable into main. We can revisit this once the branch is merged back.

In other words, I don't agree with principles (1) and (2).

I agree with principle (3), but then the API should be generic rather than CUDA-specific. I suspect we will eventually land on an API like:

  1. an API to decode to an AVFrame
  2. an API to do memory allocation on- or off-device
  3. an API to color convert from AVFrame to whatever output destination with minimal copying

(2) and (3) probably need to be generic and accept a torch::Device.

(3) could then be a trampoline function that dispatches to CUDA conversion when needed. That way the API is no longer CUDA-specific, and we can later add support for other hardware.
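A minimal sketch of the three-part API and the trampoline described above. It assumes hypothetical stand-in types: DeviceType plays the role of torch::Device and DecodedFrame plays the role of an AVFrame; the function names are also hypothetical. Decoding, allocation, and conversion are separate calls, and only the trampoline (convertFrame) knows which backend to use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for torch::Device and AVFrame.
enum class DeviceType { CPU, CUDA };

struct DecodedFrame {
  int width;
  int height;
  std::vector<std::uint8_t> luma;  // one byte per pixel in this toy model
};

struct OutputBuffer {
  DeviceType device;
  int width;
  int height;
  std::vector<std::uint8_t> rgb;  // host storage; only filled for CPU here
};

// (1) Decode to a raw frame. A real implementation would call FFmpeg.
DecodedFrame decodeFrame(int width, int height) {
  std::size_t pixels = static_cast<std::size_t>(width) * height;
  return DecodedFrame{width, height,
                      std::vector<std::uint8_t>(pixels, std::uint8_t{128})};
}

// (2) Allocate output memory for the requested device, separately from decoding.
OutputBuffer allocateOutput(DeviceType device, int width, int height) {
  OutputBuffer out{device, width, height, {}};
  if (device == DeviceType::CPU) {
    out.rgb.resize(static_cast<std::size_t>(width) * height * 3);
  }
  // A CUDA backend would allocate device memory here instead.
  return out;
}

void convertOnCpu(const DecodedFrame& frame, OutputBuffer& out) {
  // Toy color conversion: replicate luma into R, G and B.
  for (std::size_t i = 0; i < frame.luma.size(); ++i) {
    out.rgb[3 * i] = frame.luma[i];
    out.rgb[3 * i + 1] = frame.luma[i];
    out.rgb[3 * i + 2] = frame.luma[i];
  }
}

void convertOnCuda(const DecodedFrame&, OutputBuffer&) {
  // Placeholder: a real backend would launch a CUDA color-conversion kernel.
  throw std::runtime_error("CUDA backend not implemented in this sketch");
}

// (3) Trampoline: one generic entry point that dispatches on the device.
// Supporting new hardware means adding a case, not changing callers.
void convertFrame(const DecodedFrame& frame, OutputBuffer& out) {
  switch (out.device) {
    case DeviceType::CPU:
      convertOnCpu(frame, out);
      return;
    case DeviceType::CUDA:
      convertOnCuda(frame, out);
      return;
  }
  throw std::logic_error("unknown device type");
}
```

Callers only ever see decodeFrame, allocateOutput, and convertFrame with a device argument, which is what keeps the API free of CUDA-specific names.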

Also, it seems a bit odd to have more ifdefs inside CUDA-specific files. If a file is already CUDA-specific, why does it need ifdefs? It seems better to put the ifdefs in the trampoline code, so the CUDA files can simply assume that CUDA is available and are not compiled at all when it is not. That may require some template magic, or some other way of writing the trampoline functions.
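One way to confine the ifdefs to the trampoline is sketched below with hypothetical function names. It uses a plain #ifdef in the dispatch function rather than template magic; that is just one option. The CUDA-only entry point is declared only when ENABLE_CUDA is defined, and its definition would live in a CUDA source file that the build compiles only in that configuration, so that file itself needs no ifdefs.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// ---- Hypothetical CUDA-only file (not shown) ----
// Compiled only when the build enables CUDA, so it contains no #ifdef.
// In a CUDA build, its definition must be linked in.
#ifdef ENABLE_CUDA
std::string convertWithCuda(const std::string& frame);
#endif

// ---- Generic CPU implementation, always compiled ----
std::string convertWithCpu(const std::string& frame) {
  return "cpu:" + frame;
}

// ---- Trampoline: the single place where the build configuration matters ----
std::string convertFrameDispatch(const std::string& frame, bool onCuda) {
  if (!onCuda) {
    return convertWithCpu(frame);
  }
#ifdef ENABLE_CUDA
  return convertWithCuda(frame);
#else
  throw std::runtime_error("CUDA requested, but this build has no CUDA support");
#endif
}
```

The frame is a string here only to keep the sketch self-contained; the point is where the #ifdef sits, not what the conversion does.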

I built this type of API in an internal diff to speed up batch decoding. Today we make inefficient copies because creating the output tensor is entangled with color conversion. By decoupling memory allocation, decoding, and color conversion, we can cut decode time in half for some frames.
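A toy illustration of why decoupling allocation from color conversion avoids copies in batch decoding. It assumes a hypothetical model in which a frame is a vector of luma bytes and "conversion" expands each byte to three RGB bytes. It does not measure anything or reproduce the reported speedup; it only shows where the per-frame allocation and the extra copy disappear.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy frame: one luma byte per pixel.
using Frame = std::vector<std::uint8_t>;

// Coupled: conversion allocates its own output, which the caller then copies.
std::vector<std::uint8_t> convertAllocating(const Frame& frame) {
  std::vector<std::uint8_t> rgb(frame.size() * 3);
  for (std::size_t i = 0; i < frame.size(); ++i) {
    rgb[3 * i] = rgb[3 * i + 1] = rgb[3 * i + 2] = frame[i];
  }
  return rgb;
}

std::vector<std::uint8_t> batchCoupled(const std::vector<Frame>& frames) {
  std::vector<std::uint8_t> batch;
  for (const Frame& f : frames) {
    std::vector<std::uint8_t> rgb = convertAllocating(f);  // per-frame allocation
    batch.insert(batch.end(), rgb.begin(), rgb.end());     // extra copy
  }
  return batch;
}

// Decoupled: the caller allocates once and conversion writes in place.
void convertInto(const Frame& frame, std::uint8_t* dst) {
  for (std::size_t i = 0; i < frame.size(); ++i) {
    dst[3 * i] = dst[3 * i + 1] = dst[3 * i + 2] = frame[i];
  }
}

std::vector<std::uint8_t> batchDecoupled(const std::vector<Frame>& frames) {
  std::size_t total = 0;
  for (const Frame& f : frames) total += f.size() * 3;
  std::vector<std::uint8_t> batch(total);  // single allocation for the batch
  std::size_t offset = 0;
  for (const Frame& f : frames) {
    convertInto(f, batch.data() + offset);  // no intermediate buffer
    offset += f.size() * 3;
  }
  return batch;
}
```

Both functions produce the same bytes; the decoupled version simply lets the caller own the output buffer, which is the property a preallocated batch tensor needs.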

Lastly, Fbcode uses Buck TARGETS files that would also need to be updated for this change, but we can worry about those later.

scotts (Contributor, Author) commented Sep 3, 2024

@ahmadsharif1, I think you make a good point that we should avoid changes on this branch until we're ready to merge into main. I also think that however we reorganize the CUDA code, it should follow the APIs you proposed, so we should wait until we have those APIs.

On how to avoid the ifdefs: I think we can do that by moving the decision to compile and link time. We build one library that always includes CUDA and one that never does, and decide at build time which library to link against. We wouldn't need a trampoline, but we would need a clear API that both libraries implement.
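A minimal sketch of the link-time approach, with hypothetical names throughout. Both libraries would implement the same small API; only the CPU-only library's implementation is shown here, and a CUDA library (not shown) would define the same functions with real CUDA calls. Because the build links exactly one of them, no source file needs an #ifdef.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// ---- Shared API header (hypothetical) ----
// Generic code sees only these declarations.
bool backendHasCuda();
std::string backendConvertOnGpu(const std::string& frame);

// ---- Hypothetical CPU-only library implementation ----
// The CUDA library would provide its own definitions of the same two
// functions; the build system picks which library to link.
bool backendHasCuda() { return false; }

std::string backendConvertOnGpu(const std::string&) {
  throw std::runtime_error("linked against the CPU-only library; no CUDA");
}

// ---- Generic decoder code: no #ifdef anywhere ----
std::string decodeAndConvert(const std::string& frame, bool onGpu) {
  if (onGpu) {
    return backendConvertOnGpu(frame);  // throws with the CPU-only library
  }
  return "cpu:" + frame;
}
```

Compared with a trampoline, the dispatch decision is made by the linker rather than by code, so the API header is the only contract the two libraries must share.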

Closing for now; we'll revisit once those APIs exist and we're already working on main.

scotts closed this Sep 3, 2024

Labels: CLA Signed

3 participants