Rework HWC / CHW dimension order conversions #277
@@ -34,6 +34,31 @@ double ptsToSeconds(int64_t pts, const AVRational& timeBase) {
  return ptsToSeconds(pts, timeBase.den);
}

// Returns a [N]CHW *view* of a [N]HWC input tensor, if the options require so.
// The [N] leading batch-dimension is optional, i.e. the input tensor can be 3D
// or 4D.
// Calling permute() is guaranteed to return a view as per the docs:
// https://pytorch.org/docs/stable/generated/torch.permute.html
torch::Tensor MaybePermuteHWC2CHW(
    const VideoDecoder::VideoStreamDecoderOptions& options,
    torch::Tensor& hwcTensor) {
  if (options.dimensionOrder == "NHWC") {
    return hwcTensor;
  }
  auto numDimensions = hwcTensor.dim();
  auto shape = hwcTensor.sizes();
  if (numDimensions == 3) {
    TORCH_CHECK(shape[2] == 3, "Not a HWC tensor: ", shape);
    return hwcTensor.permute({2, 0, 1});
  } else if (numDimensions == 4) {
    TORCH_CHECK(shape[3] == 3, "Not a NHWC tensor: ", shape);
    return hwcTensor.permute({0, 3, 1, 2});
  } else {
    TORCH_CHECK(
        false, "Expected tensor with 3 or 4 dimensions, got ", numDimensions);
  }
}

struct AVInput {
  UniqueAVFormatContext formatContext;
  std::unique_ptr<AVIOBytesContext> ioBytesContext;
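As an illustration of the view guarantee mentioned in the comment above, here is a small standalone sketch (not part of the diff; it only assumes a LibTorch build and an arbitrary toy frame size):

#include <torch/torch.h>
#include <iostream>

int main() {
  // A fake 2x4 HWC frame with 3 channels, standing in for a decoded frame.
  torch::Tensor hwc = torch::arange(2 * 4 * 3, torch::kUInt8).reshape({2, 4, 3});

  // Same reordering that MaybePermuteHWC2CHW applies to a 3D input: HWC -> CHW.
  torch::Tensor chw = hwc.permute({2, 0, 1});

  // permute() only rearranges sizes and strides; the storage is shared, so no
  // pixel data is copied.
  std::cout << std::boolalpha
            << (chw.data_ptr() == hwc.data_ptr()) << "\n"    // true
            << chw.sizes() << " / " << chw.strides() << "\n" // [3, 2, 4] / [1, 12, 3]
            << chw.is_contiguous() << "\n";                  // false
}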
@@ -167,28 +192,13 @@ VideoDecoder::BatchDecodedOutput::BatchDecodedOutput(
    const VideoStreamDecoderOptions& options,
    const StreamMetadata& metadata)
    : ptsSeconds(torch::empty({numFrames}, {torch::kFloat64})),
      durationSeconds(torch::empty({numFrames}, {torch::kFloat64})) {
  if (options.dimensionOrder == "NHWC") {
    frames = torch::empty(
        {numFrames,
         options.height.value_or(*metadata.height),
         options.width.value_or(*metadata.width),
         3},
        {torch::kUInt8});
  } else if (options.dimensionOrder == "NCHW") {
    frames = torch::empty(
        {numFrames,
         3,
         options.height.value_or(*metadata.height),
         options.width.value_or(*metadata.width)},
        torch::TensorOptions()
            .memory_format(torch::MemoryFormat::ChannelsLast)
Contributor: I am wondering if using this is identical to permuting a NHWC tensor at the end. I am not 100% sure. Do you?

Contributor: [image not uploaded]

Contributor (Author): Sorry, I don't understand what you mean. What image did you mean to link to? Note that this PR should be strictly more efficient: [...]

Contributor: Sorry, the image wasn't uploaded properly:
I am not sure about the performance implications of doing a permute instead of using the channels-last memory format. From this page: https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html#:~:text=What%20is%20Channels%20Last,pixel%2Dper%2Dpixel)

Contributor (Author): Sorry, I still don't really understand where you're coming from. Can you please share a link? Is this relevant for this PR? Again, the change involved in this PR is: [...]

Contributor (Author): Thanks for the link. We're not concerned about memory format (contiguous vs channels-last) in this PR. That's a concern related to, but distinct from, the dimension order.

Contributor: Link in the edited comment above. It's from the channels-last page. I am not 100% sure if creating a NHWC tensor and permuting it is the same as creating a NCHW tensor with channels-last and working with that. The code that you deleted was doing the latter. A benchmark may show a difference -- or not. Do you know?
            .dtype({torch::kUInt8}));
  } else {
    TORCH_CHECK(
        false, "Unsupported frame dimensionOrder =" + options.dimensionOrder)
  }
}
      durationSeconds(torch::empty({numFrames}, {torch::kFloat64})),
      frames(torch::empty(
          {numFrames,
           options.height.value_or(*metadata.height),
           options.width.value_or(*metadata.width),
           3},
          {torch::kUInt8})) {}

VideoDecoder::VideoDecoder() {}
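Regarding the thread above: one way to compare the two allocation strategies is to look at the resulting sizes and strides. A standalone sketch (not part of the diff; sizes are arbitrary):

#include <torch/torch.h>
#include <iostream>

int main() {
  int64_t N = 2, H = 4, W = 5;

  // What this PR does: allocate NHWC, then view it as NCHW via permute.
  torch::Tensor permuted =
      torch::empty({N, H, W, 3}, torch::kUInt8).permute({0, 3, 1, 2});

  // What the deleted constructor branch did: allocate NCHW directly with the
  // channels-last memory format.
  torch::Tensor channelsLast = torch::empty(
      {N, 3, H, W},
      torch::TensorOptions()
          .dtype(torch::kUInt8)
          .memory_format(torch::MemoryFormat::ChannelsLast));

  // Both print sizes [2, 3, 4, 5] and strides [60, 1, 15, 3]: channels are
  // innermost in memory in both cases.
  std::cout << permuted.sizes() << " / " << permuted.strides() << "\n";
  std::cout << channelsLast.sizes() << " / " << channelsLast.strides() << "\n";
}

Both tensors describe the same [N, 3, H, W] logical shape over channels-innermost memory; whether downstream kernels treat a permuted view and an explicitly channels-last tensor identically is the part a benchmark would still have to confirm.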
@@ -890,22 +900,27 @@ void VideoDecoder::convertAVFrameToDecodedOutputOnCPU(
  if (output.streamType == AVMEDIA_TYPE_VIDEO) {
    if (streamInfo.colorConversionLibrary == ColorConversionLibrary::SWSCALE) {
      torch::Tensor tensor;
      int width = streamInfo.options.width.value_or(frame->width);
      int height = streamInfo.options.height.value_or(frame->height);
      if (preAllocatedOutputTensor.has_value()) {
        // TODO: check shape of preAllocatedOutputTensor?
        tensor = preAllocatedOutputTensor.value();
        auto shape = tensor.sizes();
        TORCH_CHECK(
            (shape.size() == 3) && (shape[0] == height) &&
                (shape[1] == width) && (shape[2] == 3),
            "Expected tensor of shape ",
            height,
            "x",
            width,
            "x3, got ",
            shape);
Contributor (Author): Any idea how to make this single call shorter 🤔 ?

Contributor: I think you can do: [...] But I'm not sure. The main thing I'm not sure about is if an array literal will auto-convert into the corresponding [...]

Contributor (Author): Ah, I was mainly hoping to avoid the stack :p
      } else {
        int width = streamInfo.options.width.value_or(frame->width);
        int height = streamInfo.options.height.value_or(frame->height);
        tensor = torch::empty(
            {height, width, 3}, torch::TensorOptions().dtype({torch::kUInt8}));
      }

      rawOutput.data = tensor.data_ptr<uint8_t>();
      convertFrameToBufferUsingSwsScale(rawOutput);

      if (streamInfo.options.dimensionOrder == "NCHW") {
        tensor = tensor.permute({2, 0, 1});
      }
      output.frame = tensor;
    } else if (
        streamInfo.colorConversionLibrary ==
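For reference, a condensed form along the lines of the suggestion above might look like this. This is only a sketch: it assumes c10::IntArrayRef can be compared against a std::vector<int64_t> (the auto-conversion question raised above), and it does introduce a small stack variable, which the author was hoping to avoid:

#include <torch/torch.h>
#include <vector>

int main() {
  // Stand-ins for the decoder's height/width and pre-allocated tensor.
  int height = 4, width = 5;
  torch::Tensor tensor = torch::empty({height, width, 3}, torch::kUInt8);

  // Compare all dimensions at once instead of spelling out each one.
  const std::vector<int64_t> expectedShape = {height, width, 3};
  TORCH_CHECK(
      tensor.sizes() == expectedShape,
      "Unexpected pre-allocated tensor shape: ",
      tensor.sizes());
}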
@@ -916,6 +931,14 @@ void VideoDecoder::convertAVFrameToDecodedOutputOnCPU(
          "Invalid color conversion library: " +
          std::to_string(static_cast<int>(streamInfo.colorConversionLibrary)));
    }
    if (!preAllocatedOutputTensor.has_value()) {
      // We only convert to CHW if a pre-allocated tensor wasn't passed. When a
      // pre-allocated tensor is passed, it's up to the caller (typically a
      // batch API) to do the conversion. This is more efficient as it allows
      // batch NHWC tensors to be permuted only once, instead of permuting HWC
      // tensors N times.
      output.frame = MaybePermuteHWC2CHW(streamInfo.options, output.frame);
    }
Contributor (Author): I still think there's some smell to this. Whether the tensor was pre-allocated and whether it should be permuted should be orthogonal concepts. And it should be up to the higher-level decoding entry-points (basically the moral equivalent of the public methods) to do the conversion. It's not trivial because [...]

Contributor: Agreed on your reasoning and the principles. One way to square the circle is to split the public-facing part of [...] Then all of the internal calls to [...]

Contributor: I am fine with low-level functions only dealing with HWC. AFAICT, most (all?) low-level code deals with HWC because it has better performance.

Contributor (Author): Yeah, that sounds good. I'll try to implement that in a follow-up PR.

Contributor (Author): That's not quite the case. In [...] it may still return both HWC and CHW (instead of just HWC), and this is what I want to fix as a follow-up.
  } else if (output.streamType == AVMEDIA_TYPE_AUDIO) {
    // TODO: https://github.com/pytorch-labs/torchcodec/issues/85 implement
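A rough shape of the split discussed above, with entirely hypothetical names (ToyDecoder, getNextFrame, getNextFrameInternal are illustrative, not the actual TorchCodec API): the private core always returns HWC, and only the public entry point applies the dimension-order conversion.

#include <torch/torch.h>
#include <iostream>
#include <string>

struct ToyDecoder {
  std::string dimensionOrder = "NCHW";

  torch::Tensor getNextFrame() {
    torch::Tensor hwc = getNextFrameInternal();  // always HWC
    // Public boundary: convert once, based on the user-facing option.
    return dimensionOrder == "NCHW" ? hwc.permute({2, 0, 1}) : hwc;
  }

 private:
  // Low-level path: deals exclusively with HWC, which also lets batch callers
  // fill one NHWC buffer and permute it a single time at the end.
  torch::Tensor getNextFrameInternal() {
    return torch::zeros({720, 1280, 3}, torch::kUInt8);
  }
};

int main() {
  ToyDecoder decoder;
  std::cout << decoder.getNextFrame().sizes() << "\n";  // [3, 720, 1280]
}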
@@ -1046,6 +1069,7 @@ VideoDecoder::BatchDecodedOutput VideoDecoder::getFramesAtIndices(
    }
    i++;
  }
  output.frames = MaybePermuteHWC2CHW(options, output.frames);
  return output;
}
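This is the "permute the whole batch once" pattern referred to in the comment a few hunks above; a toy sketch (not part of the diff; sizes are arbitrary):

#include <torch/torch.h>
#include <iostream>

int main() {
  int64_t numFrames = 4, H = 3, W = 5;

  // Batch path: one NHWC buffer is allocated up front and each decoded frame
  // is written into its [i] slice (a fill_ stands in for real pixel data)...
  torch::Tensor frames = torch::empty({numFrames, H, W, 3}, torch::kUInt8);
  for (int64_t i = 0; i < numFrames; ++i) {
    frames[i].fill_(static_cast<int>(i));
  }

  // ...and the whole batch is converted to NCHW with a single permute,
  // instead of permuting each HWC frame individually.
  torch::Tensor nchw = frames.permute({0, 3, 1, 2});
  std::cout << nchw.sizes() << "\n";  // [4, 3, 3, 5]
}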
@@ -1081,7 +1105,7 @@ VideoDecoder::BatchDecodedOutput VideoDecoder::getFramesInRange(
    output.ptsSeconds[f] = singleOut.ptsSeconds;
    output.durationSeconds[f] = singleOut.durationSeconds;
  }

  output.frames = MaybePermuteHWC2CHW(options, output.frames);
  return output;
}
@@ -1134,6 +1158,7 @@ VideoDecoder::getFramesDisplayedByTimestampInRange(
  // need this special case below.
  if (startSeconds == stopSeconds) {
    BatchDecodedOutput output(0, options, streamMetadata);
    output.frames = MaybePermuteHWC2CHW(options, output.frames);
    return output;
  }
@@ -1176,6 +1201,7 @@ VideoDecoder::getFramesDisplayedByTimestampInRange(
    output.ptsSeconds[f] = singleOut.ptsSeconds;
    output.durationSeconds[f] = singleOut.durationSeconds;
  }
  output.frames = MaybePermuteHWC2CHW(options, output.frames);

  return output;
}
@@ -1302,11 +1328,6 @@ torch::Tensor VideoDecoder::convertFrameToTensorUsingFilterGraph(
  torch::Tensor tensor = torch::from_blob(
      filteredFramePtr->data[0], shape, strides, deleter, {torch::kUInt8});
  StreamInfo& activeStream = streams_[streamIndex];
  if (activeStream.options.dimensionOrder == "NCHW") {
    // The docs guaranty this to return a view:
    // https://pytorch.org/docs/stable/generated/torch.permute.html
    tensor = tensor.permute({2, 0, 1});
  }
  return tensor;
}
Is this robust if the width/height is 3?

It will have a false positive for the extremely rare (and probably degenerate) case where a video width is 3. This check is the very best we can do at this stage. The alternative is to not check anything.
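To make the false-positive scenario concrete, a toy sketch (shapes are arbitrary):

#include <torch/torch.h>
#include <iostream>

int main() {
  // Degenerate case from the exchange above: a CHW frame whose width is 3.
  // Its last dimension is 3, so the "is this HWC?" heuristic cannot reject it.
  torch::Tensor chwWidth3 = torch::zeros({3, 100, 3}, torch::kUInt8);
  std::cout << std::boolalpha << (chwWidth3.sizes()[2] == 3) << "\n";  // true: false positive

  // For any ordinary width, a misordered (CHW) tensor is caught.
  torch::Tensor chw = torch::zeros({3, 100, 200}, torch::kUInt8);
  std::cout << (chw.sizes()[2] == 3) << "\n";  // false: TORCH_CHECK would fire
}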