-
Notifications
You must be signed in to change notification settings - Fork 64
Cuda refactor #197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cuda refactor #197
Conversation
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
I like the change to the NVTX macro. Even the other code changes are fine, but I don't see what problem is solved by moving the code to a new set of files. That makes reading the code harder because cmd+f doesn't work and you have to do a search to find it. It may also create merge conflicts when we make other changes to the VideoDecoder.cpp file when we are merging this branch back into the main branch. My vote is to avoid tons of code movement on this branch to keep it mergable into the main branch. We can revisit this once this branch is merged back to main. In other words I don't agree with principle (1) and (2). Principle (3) I agree with but then the API should be a generic one. I suspect eventually we will land on an API like:
(2) and (3) probably need to be generic and accept a torch::Device. (3) could then be a trampoline function to do cuda conversion if needed. That way it's not specific to CUDA anymore -- and we can later on add support for other hardware. Also it seems a bit weird to have more ifdefs inside cuda-specific files. If a file is cuda specific already, why does it need ifdefs? It seems better to put the ifdefs in the trampoline code so cuda files can just assume that cuda is available and wont even be compiled if cuda is not available. Though that may require some template magic or other way to write the trampoline functions. This type of API was done by me in an internal diff to improve performance for batch decoding because currently we are doing inefficient copies because we entangle the creation of the output tensor to color conversion. By decoupling memory allocation, decoding and color conversion we can reduce the decode time by half for some frames. Lastly, Fbcode uses buck TARGETS files that would also need to be updated with this change, but we can worry about those later. |
@ahmadsharif1, I think you make a good point that we should avoid changes on this branch until we're ready to merge into main. I also think whatever we do to refactor the organization of CUDA code should follow the APIs you proposed - we should wait until we have those APIs. On how to avoid the ifdefs: I think we can do that by pushing the decision up to compilation and linking. We create libraries that always include CUDA and always don't include CUDA. We decide which libraries to link against at build time. We wouldn't have a trampoline, but we would need a clear API to implement. Closing, and we'll revisit when we can follow those APIs and we're already working on main. |
Currently, the code in
VideoDecoder.cpp
has a lot of#ifdef ENABLE_CUDA
directly in functions. This refactor applies the following principles:VideoDecoder.cpp
should not need#ifdef ENABLE_CUDA
inside of it.Note: this was originally #193, but I abandoned that because #196 temporarily moved GPU support into a feature branch.