-
Notifications
You must be signed in to change notification settings - Fork 64
BETA CUDA interface: separate it from the default interface #931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BETA CUDA interface: separate it from the default interface #931
Conversation
torch::Tensor dummyTensorForCudaInitialization = torch::empty( | ||
{1}, torch::TensorOptions().dtype(torch::kUInt8).device(device_)); | ||
|
||
nppCtx_ = getNppStreamContext(device_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above: moved the dummy CUDA context initialization to the constructor, like how it was already done for the CudaDeviceInterface.
Also, this BETACudaInterface now needs its own NPP context. Previously, it was relying on its CudaDeviceInterface member (now removed).
at::cuda::getCurrentCUDAStream(device_.index()); | ||
|
||
frameOutput.data = convertNV12FrameToRGB( | ||
avFrame, device_, nppCtx_, nvdecStream, preAllocatedOutputTensor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above:
- we now call
convertNV12FrameToRGB()
for color conversion, instead of callingdefaultCudaInterface_->convertAVFrameToFrameOutput
.convertNV12FrameToRGB()
is now common to both interfaces. - previously, the stream synchronization between NVDEC and NPP was done within
defaultCudaInterface_->convertAVFrameToFrameOutput()
. AnddefaultCudaInterface_->convertAVFrameToFrameOutput
had to know that the NVDEC stream was:0
for theCudaDeviceInterface
, or the default stream forBetaCudaDeviceInterface
. Now, each interface explicitly specifies what the NVDEC stream is by passing it down toconvertNV12FrameToRGB
, and this is now where the stream synchronization happens. - On the TODONVDEC P2: I am doing further investigations but my current understanding is that the new BetaCudaDeviceInterface will never need to call
maybeConvertAVFrameToNV12OrRGB24
- which is a GOOD thing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
99% of this file is copy/pasted from what previously was inside of CudaDeviceInterface
.
#include <libavutil/hwcontext_cuda.h> | ||
#include <libavutil/pixdesc.h> | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below: the stuff that was removed from this file was either:
- logic that was specific for the
BetaCudaDeviceInterface
: not needed anymore now that we have removed that dependency - moved into
CUDACommon.cpp
.
// Above we checked that the AVFrame was on GPU, but that's not enough, we | ||
// also need to check that the AVFrame is in AV_PIX_FMT_NV12 format (8 bits), | ||
// because this is what the NPP color conversion routines expect. This SHOULD | ||
// be enforced by our call to maybeConvertAVFrameToNV12OrRGB24() above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below:
Note how we removed the if (avFrame->hw_frames_ctx != nullptr) ...
which was needed because this could be called from the Beta interface, where avFrame->hw_frames_ctx
would be null. Now we don't need to check for that anymore.
: "unknown"), | ||
", but we expected AV_PIX_FMT_NV12. " | ||
"That's unexpected, please report this to the TorchCodec repo."); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below: we would previously do the NVDEC / NPP stream synchronization here. Now, it's done a bit later inside of convertNV12FrameToRGB
. So this portion just becomes about figuring out what the NVDEC
stream is so we can pass it down to convertNV12FrameToRGB
.
torch::Tensor& dst = frameOutput.data; | ||
if (preAllocatedOutputTensor.has_value()) { | ||
dst = preAllocatedOutputTensor.value(); | ||
} else { | ||
dst = allocateEmptyHWCTensor(frameDims, device_); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This validation above was moved inside of convertNV12FrameToRGB
c23087f
to
68878f2
Compare
|
||
#pragma once | ||
|
||
#include <npp.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This include isn't strictly needed here as CUDACommon.h
includes it.
|
||
#include "src/torchcodec/_core/BetaCudaDeviceInterface.h" | ||
|
||
#include "src/torchcodec/_core/CUDACommon.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This include isn't strictly needed as BetaCudaDeviceInterface.h
includes it.
This PR is based on #930, so to review it, look at
68878f2
(#931)The PR looks scary but it's mostly just moving code around, and deleting / simplifying some parts. I added comments below to guide the review.
Currently in
main
, the BETA CUDA interface holds a default CUDA interface member, and uses it for the color-conversion stepconvertAVFrameToFrameOutput
. This is problematic, because:BetaCudaDeviceInterface
to do some fake dummy initialization of theCudaDeviceInterface
classCudaDeviceInterface
to know that it may be called byBetaCudaDeviceInterface
. This led to a bunch ofif/else
logic that is confusing, and a clear anti-pattern. The interfaces should be independent of each others.This PR resolves this problem by:
BetaCudaDeviceInterface
onCudaDeviceInterface
if/else
logic inCudaDeviceInterface
.CudaDeviceInterface
doesn't need to worry about the fact that it might be called by aBetaCudaDeviceInterface
.CUDACommon.cpp
file.