Add batch decoding support to CUDA #319
Conversation
Thank you @ahmadsharif1. Only minor suggestions from me.
This is not immediately related to this PR, but now that we publicly expose CUDA, we'll want to beef up our CUDA tests. They're pretty minimal right now. The test utils that I linked to below will be useful. Let's follow up with that in a separate PR (happy to help).
```cpp
      rawOutput, output, preAllocatedOutputTensor);
} else if (streamInfo.options.device.type() == torch::kCUDA) {
  // TODO: handle pre-allocated output tensor
  // TODO: we should fold preAllocatedOutputTensor into RawDecodedOutput.
```
Nit: move this TODO outside of this else/if block (just on top of it?), because this applies to CPU as well, not just to the CUDA branch. We may also want to open an issue?
Done
```
instances of ``VideoDecoder`` in parallel. Use a higher number for multi-threaded
decoding which is best if you are running a single instance of ``VideoDecoder``.
Default: 1.
device (str or torch.device, optional): The device to use for decoding.
```
Suggested change:
```diff
-device (str or torch.device, optional): The device to use for decoding.
+device (str or torch.device, optional): The device to use for decoding. Default: "cpu".
```
```
decoding which is best if you are running a single instance of ``VideoDecoder``.
Default: 1.
device (str or torch.device, optional): The device to use for decoding.
```
This wasn't introduced in this PR, but we might as well fix it here: the `.. note::` below should be part of the parameter description of the `dimension_order` parameter. Do you mind moving it back up?
done
test/utils.py
```python
# Asserts that at most percentage of the elements are different by more than abs_tolerance.
def assert_tensor_nearly_equal(frame1, frame2, percentage=0.3, abs_tolerance=20):
```
Nit regarding the name: we already have `assert_tensor_close`, which semantically conveys the same meaning as "nearly equal" to me, so the distinction between these isn't obvious. Maybe `assert_tensor_close_on_at_least(...)`?
Agreed. Even better if we can use the same utility function: I suspect that the logic we're doing in this function is quite similar to what `torch.testing.assert_close()` is already doing.
The answer also might be that we just eliminate both `assert_tensor_close()` and `assert_tensor_nearly_equal()`, and just use plain `torch.testing.assert_close()` with scenario-specific tolerances, e.g. as in the sketch below.
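A minimal sketch of what that could look like; the tensors and tolerance values here are placeholders, not values from this PR:

```python
import torch

# Hypothetical frames; a real test would compare a decoded frame against a reference.
decoded = torch.randint(0, 256, (3, 270, 480), dtype=torch.uint8)
reference = decoded.clone()

# Scenario-specific tolerances passed straight to the stock utility.
torch.testing.assert_close(decoded.float(), reference.float(), atol=20, rtol=0)
```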
We are doing something different here compared to `assert_close`. I could use `assert_close`, but the tolerances were quite high. I actually did use it in my first PR.
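For context, here is a minimal sketch of the kind of check `assert_tensor_nearly_equal` performs, based only on its signature and the comment in the diff above; the actual implementation in the PR may differ, and the reading of `percentage` as a percent (0.3 meaning 0.3%) is an assumption:

```python
import torch

def assert_tensor_nearly_equal(frame1, frame2, percentage=0.3, abs_tolerance=20):
    # Sketch based on the signature/comment above, not the PR's actual code.
    # Assumes percentage is expressed as a percent, i.e. 0.3 means 0.3% of elements.
    diff = (frame1.float() - frame2.float()).abs()
    num_different = (diff > abs_tolerance).sum().item()
    allowed = percentage / 100.0 * frame1.numel()
    assert num_different <= allowed, (
        f"{num_different} elements differ by more than {abs_tolerance}; "
        f"at most {allowed:.0f} allowed"
    )
```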
```python
# TODO: Figure out how to parameterize this test to run on both CPU and CUDA.
# The question is how to have the @needs_cuda decorator with the pytest.mark.parametrize
# decorator on the same test.
```
It's simple! We just need to define this new util, and it can be used like this (a sketch of the pattern is below). If you want, we can merge this PR as-is and follow up with that.
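A hedged sketch of that pattern; the `cpu_and_cuda` name and the exact definition of `needs_cuda` are assumptions based on the discussion, not confirmed names from the linked util:

```python
import pytest
import torch

# Assumption: the repo's @needs_cuda decorator is presumably something like this.
needs_cuda = pytest.mark.skipif(
    not torch.cuda.is_available(), reason="CUDA not available"
)

# The util: parametrize over devices, attaching needs_cuda only to the CUDA case,
# so pytest skips it on machines without a GPU.
def cpu_and_cuda():
    return ("cpu", pytest.param("cuda", marks=needs_cuda))

@pytest.mark.parametrize("device", cpu_and_cuda())
def test_get_frames_at_indices(device):
    ...  # decode on `device` and compare against reference frames
```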
I'll do that as a follow-up
```python
assert_tensor_equal(frames0and180[1], reference_frame180)

@needs_cuda
def test_get_frames_at_indices_with_cuda(self):
```
We'll also want to test `get_frames_in_range`, and all the batch APIs? I feel like we should be parametrizing a fair amount of our tests. But this can be done as a follow-up.
I'll do that as a follow-up
convertAVFrameToDecodedOutputOnCuda().

Sampler benchmark results: CUDA is only worth it for lots of decoding (and could win at throughput), and potentially for higher-resolution videos. Also, interestingly enough, the variability in CUDA is quite low.