
ahmadsharif1 (Contributor) commented on Oct 29, 2024:

  1. Allocate the batch tensor on the correct device: when a CUDA device is passed in, the tensor is now allocated on that device (see the sketch after this list).
  2. Pass a view of the batch tensor into the color conversion function convertAVFrameToDecodedOutputOnCuda().
  3. Add a test that checks frame contents.
  4. Add a TODO to eventually merge preAllocatedOutputTensor into RawDecodedOutput, because passing in two output data pointers doesn't make sense.
  5. Add a device member to the VideoDecoder class.
  6. Update the sampler benchmark to take device and video arguments from the command line.
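
For illustration, a minimal Python sketch of the allocation pattern in items 1 and 2. The actual implementation is C++; the names and tensor layout below (allocate_batch, num_frames, height, width) are placeholders, not the PR's real identifiers.

    import torch

    def allocate_batch(num_frames, height, width, device):
        # Allocate the batch output tensor directly on the requested device,
        # so CUDA decoding can write frames into device memory without an
        # extra host-to-device copy.
        return torch.empty(
            (num_frames, height, width, 3), dtype=torch.uint8, device=device
        )

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    batch = allocate_batch(num_frames=10, height=480, width=640, device=device)
    frame_view = batch[0]  # a view into the batch; the decoder writes into it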

Sampler benchmark results:

CPU:
python benchmarks/samplers/benchmark_samplers.py --device=cpu
----------
num_clips = 1
clips_at_random_indices     med = 23.16ms +- 16.18  med fps = 431.8
clips_at_regular_indices    med = 5.67ms +- 0.43  med fps = 1764.3
clips_at_random_timestamps  med = 22.54ms +- 16.21  med fps = 443.7
clips_at_regular_timestamps med = 7.46ms +- 5.66  med fps = 1339.7
----------
num_clips = 50
clips_at_random_indices     med = 2400.86ms +- 803.05  med fps = 208.3
clips_at_regular_indices    med = 1343.50ms +- 288.18  med fps = 372.2
clips_at_random_timestamps  med = 1170.24ms +- 727.77  med fps = 427.3
clips_at_regular_timestamps med = 950.92ms +- 294.30  med fps = 515.3

CUDA:
python benchmarks/samplers/benchmark_samplers.py --device=cuda:0
----------
num_clips = 1
[AVHWDeviceContext @ 0x8793680] Using current CUDA context.
clips_at_random_indices     med = 245.46ms +- 116.64  med fps = 40.7
clips_at_regular_indices    med = 284.49ms +- 39.86  med fps = 35.2
clips_at_random_timestamps  med = 264.93ms +- 115.74  med fps = 37.7
clips_at_regular_timestamps med = 283.26ms +- 9.99  med fps = 35.3
----------
num_clips = 50
[AVHWDeviceContext @ 0x8d0d680] Using current CUDA context.
clips_at_random_indices     med = 308.00ms +- 104.52  med fps = 1623.4
clips_at_regular_indices    med = 286.54ms +- 12.69  med fps = 1744.9
clips_at_random_timestamps  med = 368.12ms +- 105.73  med fps = 1358.3
clips_at_regular_timestamps med = 285.32ms +- 13.19  med fps = 1717.4

CUDA is only worth it when decoding many clips (where it can win on throughput), and potentially for higher-resolution videos.

Interestingly, the run-to-run variability on CUDA is quite low.

@facebook-github-bot added the CLA Signed label on Oct 29, 2024
@ahmadsharif1 marked this pull request as ready for review on October 29, 2024, 20:19
NicolasHug (Contributor) left a comment:

Thank you @ahmadsharif1 . Only minor suggestions from me.

This is not immediately related to this PR, but now that we publicly expose CUDA, we'll want to beef up our CUDA tests. They're pretty minimal right now. The test utils that I linked to below will be useful. Let's follow up on that in a separate PR (happy to help).

rawOutput, output, preAllocatedOutputTensor);
} else if (streamInfo.options.device.type() == torch::kCUDA) {
// TODO: handle pre-allocated output tensor
// TODO: we should fold preAllocatedOutputTensor into RawDecodedOutput.
Contributor:

Nit: move this TODO outside of this if/else block (just on top of it?), because it applies to the CPU branch as well, not just to CUDA. We may also want to open an issue?

ahmadsharif1 (Author):

Done

instances of ``VideoDecoder`` in parallel. Use a higher number for multi-threaded
decoding which is best if you are running a single instance of ``VideoDecoder``.
Default: 1.
device (str or torch.device, optional): The device to use for decoding.
Contributor:

Suggested change:
- device (str or torch.device, optional): The device to use for decoding.
+ device (str or torch.device, optional): The device to use for decoding. Default: "cpu".
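
For context, a hedged usage sketch of the parameter being documented; the file name is hypothetical, and this assumes the public VideoDecoder constructor accepts device as documented:

    from torchcodec.decoders import VideoDecoder

    # device accepts a str or a torch.device; frames are decoded on that device.
    decoder = VideoDecoder("video.mp4", device="cuda:0")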

decoding which is best if you are running a single instance of ``VideoDecoder``.
Default: 1.
device (str or torch.device, optional): The device to use for decoding.
Contributor:

This wasn't introduced in this PR, but we might as well fix it here: the .. note:: below should be part of the description of the dimension_order parameter. Do you mind moving it back up?

ahmadsharif1 (Author):

done

test/utils.py (outdated)


# Asserts that at most percentage of the elements are different by more than abs_tolerance.
def assert_tensor_nearly_equal(frame1, frame2, percentage=0.3, abs_tolerance=20):
Contributor:

Nit regarding the name: we already have assert_tensor_close, which semantically conveys the same meaning as "nearly equal" to me, so the distinction between the two isn't obvious. Maybe assert_tensor_close_on_at_least(...)?

Contributor:

Agreed. Even better if we can use the same utility function; I suspect that the logic in this function is quite similar to what torch.testing.assert_close() already does.

The answer might also be that we eliminate both assert_tensor_close() and assert_tensor_nearly_equal(), and just use plain torch.testing.assert_close() with scenario-specific tolerances.

ahmadsharif1 (Author):

We are doing something different here compared to assert_close.

I could use assert_close, but the tolerances were quite high. I actually did use it in my first PR:

#242 (comment)
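
For context, a minimal sketch of what such a percentage-based check might look like; this is an illustration only (it assumes percentage is expressed in percent), not necessarily the PR's exact implementation:

    import torch

    def assert_tensor_nearly_equal(frame1, frame2, percentage=0.3, abs_tolerance=20):
        # Fail only if more than `percentage` percent of the elements
        # differ by more than `abs_tolerance`.
        diff = (frame1.float() - frame2.float()).abs()
        num_too_far = (diff > abs_tolerance).sum().item()
        max_allowed = frame1.numel() * percentage / 100.0
        assert num_too_far <= max_allowed, (
            f"{num_too_far} elements differ by more than {abs_tolerance}; "
            f"at most {max_allowed:.0f} allowed"
        )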

Comment on lines +225 to +227
# TODO: Figure out how to parameterize this test to run on both CPU and CUDA.
# The question is how to have the @needs_cuda decorator with the pytest.mark.parametrize
# decorator on the same test.
Contributor:

It's simple!

We just need to define this new util

https://github.com/pytorch/vision/blob/e9a3213524a0abd609ac7330cf170b9e19917d39/test/common_utils.py#L122-L125

and it can be used like this

https://github.com/pytorch/vision/blob/e9a3213524a0abd609ac7330cf170b9e19917d39/test/test_utils.py#L221

If you want, we can merge this PR as-is and follow up with that.
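
For reference, the linked torchvision util is essentially the following (sketched from the links above; see them for the exact code):

    import pytest

    # Yields "cpu" unconditionally, plus "cuda" marked so that it is
    # skipped on machines without a GPU.
    def cpu_and_cuda():
        return ("cpu", pytest.param("cuda", marks=pytest.mark.needs_cuda))

    # Usage:
    # @pytest.mark.parametrize("device", cpu_and_cuda())
    # def test_frame_contents(device):
    #     ...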

ahmadsharif1 (Author):

I'll do that as a follow-up

assert_tensor_equal(frames0and180[1], reference_frame180)

@needs_cuda
def test_get_frames_at_indices_with_cuda(self):
Contributor:

We'll also want to test get_frames_in_range and all the batch APIs?
I feel like we should be parametrizing a fair amount of our tests, but this can be done as a follow-up.

ahmadsharif1 (Author):

I'll do that as a follow-up

@ahmadsharif1 merged commit dc16154 into meta-pytorch:main on Oct 30, 2024 (37 of 40 checks passed)
@ahmadsharif1 deleted the cuda13 branch on October 30, 2024, 15:57
@ahmadsharif1 mentioned this pull request on Oct 30, 2024