Conversation

@NicolasHug (Contributor) commented Nov 6, 2024:

This PR adds checks for the shape of the output tensors we allocate. This should help us avoid some potential memory corruption issues.

This PR does not change any existing logic (well, not really; see details in the comments below). It only adds safety checks that validate existing assumptions.

@facebook-github-bot added the "CLA Signed" label (managed by the Meta Open Source bot) on Nov 6, 2024.
expectedOutputWidth,
"x3, got ",
shape);
}
@NicolasHug (author) commented:

The check above validates the shape of the pre-allocated output tensor. It was previously only done for swscale, but it should also be done for filtergraph, which is why I moved it up.
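As an illustration, here is a minimal plain-C++ sketch of the same idea (the `validateOutputShape` helper is hypothetical, with `std::vector<int64_t>` standing in for torch's `IntArrayRef`):

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper mirroring the TORCH_CHECK added in this PR: verify that
// a pre-allocated HWC uint8 output buffer has exactly the shape we are about
// to write into, so a bad allocation fails loudly instead of corrupting memory.
void validateOutputShape(
    const std::vector<int64_t>& shape,
    int64_t expectedHeight,
    int64_t expectedWidth) {
  bool ok = shape.size() == 3 && shape[0] == expectedHeight &&
      shape[1] == expectedWidth && shape[2] == 3;
  if (!ok) {
    throw std::runtime_error(
        "Expected pre-allocated tensor of shape " +
        std::to_string(expectedHeight) + "x" +
        std::to_string(expectedWidth) + "x3");
  }
}
```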

int VideoDecoder::convertFrameToBufferUsingSwsScale(
int streamIndex,
const AVFrame* frame,
torch::Tensor& outputTensor) {
@NicolasHug (author) commented:

I have updated the signature of convertFrameToBufferUsingSwsScale for it to be very similar to convertFrameToTensorUsingFilterGraph. The signatures are now:

int VideoDecoder::convertFrameToBufferUsingSwsScale(
    int streamIndex,
    const AVFrame* frame,
    torch::Tensor& outputTensor);

torch::Tensor VideoDecoder::convertFrameToTensorUsingFilterGraph(
    int streamIndex,
    const AVFrame* frame);

Their signatures being so different previously was a constant source of confusion for me.

I realize that this change seemingly goes against our TODO of folding the pre-allocated tensor into RawOutput. But I think that TODO will in fact be easier to address after this change, since both APIs are now more closely aligned.

A reviewer commented:

Why is this function now returning the height?

@NicolasHug (author) commented:

So that we can check against it and have the swscale check next to the filtergraph check. It makes the logic of both libraries more symmetric.
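As a toy illustration of that point (hypothetical names, plain C++; the real code gets the height from the swscale conversion), returning the height lets the caller keep the swscale check right next to the filtergraph one:

```cpp
#include <stdexcept>

// Toy stand-in for the conversion: returns the number of output rows written,
// the way sws_scale reports the height of the converted slice.
int convertFrameToBuffer(int inputHeight) {
  return inputHeight;
}

// Caller-side check, symmetric with the filtergraph path: the returned height
// must match the height of the pre-allocated output tensor.
int checkedConvert(int expectedHeight) {
  int resultHeight = convertFrameToBuffer(expectedHeight);
  if (resultHeight != expectedHeight) {
    throw std::runtime_error("conversion produced an unexpected height");
  }
  return resultHeight;
}
```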

TORCH_CHECK_EQ(filteredFrame->format, AV_PIX_FMT_RGB24);
auto frameDims = getHeightAndWidthFromOptionsOrAVFrame(
streams_[streamIndex].options, *filteredFrame.get());
auto frameDims = getHeightAndWidthFromResizedAVFrame(*filteredFrame.get());
@NicolasHug (author) commented Nov 6, 2024:

I have reverted this logic to what it was before #332. I already had some doubts in #332 (comment), and now I think it's clear that this logic for finding height and width should be treated as separate from the 2 existing ones.

I have updated the big note below, with my updated understanding.

(I know it's a bit complicated. But the complexity isn't caused by this PR!)


// TODO height and width info of output tensor comes from the metadata, which
// may not be accurate. How do we make sure we won't corrupt memory if the
// allocated tensor is too short/large?
@NicolasHug (author) commented:

@ahmadsharif1 any idea on how to do this?

A reviewer commented:

You can look at rawOutput.frame's dimensions and use those, right?

@NicolasHug (author) commented:

Are you suggesting to get height and width from the AVFrame instead of from the Metadata then? I don't mind but that's a change of logic. Do you know why it wasn't done like that before?

@NicolasHug (author) commented:

Discussed offline; I'll submit a follow-up to do that.
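A sketch of that follow-up idea, under the assumption that the decoded frame's own fields are the source of truth (the `FakeAVFrame` struct below is a stand-in for FFmpeg's `AVFrame`, whose real `width`/`height` fields describe the decoded picture):

```cpp
// Stand-in for FFmpeg's AVFrame; the real struct exposes width/height fields
// describing the decoded picture, which may disagree with stream metadata.
struct FakeAVFrame {
  int width;
  int height;
};

struct FrameDims {
  int height;
  int width;
};

// Taking dimensions from the decoded frame itself, rather than from container
// metadata, guarantees the allocation matches what will actually be written.
FrameDims getDimsFromFrame(const FakeAVFrame& frame) {
  return FrameDims{frame.height, frame.width};
}
```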

TORCH_CHECK(
(shape.size() == 3) && (shape[0] == expectedOutputHeight) &&
(shape[1] == expectedOutputWidth) && (shape[2] == 3),
"Expected pre-allocated tensor of shape ",
A reviewer commented:

You should also be able to do (shape == {expectedOutputHeight, expectedOutputWidth, 3}) if you think that's clearer. I'm not sure.
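With a plain `std::vector<int64_t>` the whole-shape comparison the comment suggests reads like this (a sketch only; with torch's `IntArrayRef`, check which comparison operators actually exist before writing `shape == {...}`):

```cpp
#include <cstdint>
#include <vector>

// Compare the full shape in one expression instead of element by element.
bool shapeMatches(
    const std::vector<int64_t>& shape,
    int64_t expectedHeight,
    int64_t expectedWidth) {
  return shape == std::vector<int64_t>{expectedHeight, expectedWidth, 3};
}
```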

}
}

FrameDims getHeightAndWidthFromResizedAVFrame(const AVFrame& resizedAVFrame) {
@scotts (Contributor) commented Nov 6, 2024:

Because the function itself has no opinion on what kind of frame it is (resized or not), I think the name should just be getHeightAndWidthFromAVFrame().

Read comment below. I get the motivation for grepping through the code. I don't think this is amazing, but it should help us get to a cleaner place, and it does make explicit how convoluted things are.

@scotts (Contributor) commented Nov 6, 2024:

The changes look good to me, but I think @ahmadsharif1 should also review.

// for pre-allocating batched output tensors (we could pre-allocate those only
// once we decode the first frame to get the info from the AVFrame, but that's
// more complex logic).
// The source of truth for height and width really is the (resized) AVFrame:
A reviewer commented:

The decoded output from FFmpeg isn't resized. The resize happens later.


// There's nothing preventing you from calling this on a non-resized frame, but
// please don't.
FrameDims getHeightAndWidthFromResizedAVFrame(const AVFrame& resizedAVFrame);
A reviewer commented:

getHeightAndWidth can just be called getFrameDims


@NicolasHug merged commit c2bea4b into meta-pytorch:main on Nov 8, 2024 (37 of 40 checks passed).