
Conversation

@NicolasHug
Contributor

@NicolasHug NicolasHug commented Oct 29, 2024

This PR is about where and when we call MaybePermuteHWC2CHW(). It's not about tensor allocation (this will come, later).


At a high level, this PR changes all conditional call patterns like:

```cpp
if (cond) {
  output.frames = MaybePermuteHWC2CHW(output.frames);
}
```

to a plain, unconditional

```cpp
output.frames = MaybePermuteHWC2CHW(output.frames);
```

This makes it a lot simpler to reason about our output shape permutation. In main, `cond` is typically input-dependent (but really, caller-dependent), which leads to state that is hard to reason about.

Another benefit of this PR is that now all low-level decoding routines (like convertAVFrameToDecodedOutputOnCPU()) have a simpler interface: they only ever take and return HWC tensors.
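
To illustrate the shape contract, here is a minimal, self-contained sketch of what an "unconditional call, decision inside the helper" permute function can look like. The types are hypothetical stand-ins (the real function operates on `torch::Tensor` and the stream's options), and the member names are invented for this sketch:

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical stand-in for torch::Tensor: we only track dimension sizes.
struct FakeTensor {
  std::array<int, 3> sizes;
};

// Hypothetical stand-in for the per-stream decoder options.
struct StreamOptions {
  std::string dimensionOrder = "NCHW"; // or "NHWC"
};

// Always called unconditionally; the "Maybe" decision lives inside.
// Input is always HWC; output is CHW only when the options ask for it.
FakeTensor maybePermuteHWC2CHW(const StreamOptions& options,
                               FakeTensor hwcFrame) {
  if (options.dimensionOrder == "NHWC") {
    return hwcFrame; // caller wants HWC: nothing to do
  }
  // HWC -> CHW: move the channel dimension to the front.
  return FakeTensor{
      {hwcFrame.sizes[2], hwcFrame.sizes[0], hwcFrame.sizes[1]}};
}
```

With this shape, every call site reads the same, and the low-level routines can safely assume HWC everywhere.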


At a lower level, the following changes were made:

  • MaybePermuteHWC2CHW() is now a method, so we can pass it a streamIndex; this also makes its interface slightly simpler.
  • It's now up to every high-level decoding function to call MaybePermuteHWC2CHW().
  • Some methods, like getFrameAtIndex() and getNextDecodedOutputNoDemux(), were used both as high-level decoding entry points and as low-level subroutines of other entry points. I split those into getFrameAtIndex()/getFrameAtIndexInternal() and getNextFrame()/getNextDecodedOutputNoDemux() to clearly distinguish the public entry point from the underlying private helper. Note that this isn't just a "nice-to-have" or a nit-pick; it's a necessary change for the goal of this PR.
  • getNextFrame() is the new public entry point; getNextDecodedOutputNoDemux() is now private.

A follow-up of this PR will be to unify the tensor allocation. I think it'd make sense for tensors to always be pre-allocated by the high-level decoding entry points. It will allow us to unify the allocation logic in a single place.
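
The public/private split described above can be sketched as follows. This is a simplified, hypothetical model (the function names mirror the PR, but the `Frame` and `Options` types and their members are invented stand-ins for `DecodedOutput` and the stream options):

```cpp
#include <cassert>

// Hypothetical stand-in for DecodedOutput: we only track the layout.
struct Frame {
  bool isHWC = true; // decoding always produces HWC frames
};

// Hypothetical stand-in for the per-stream output-format option.
struct Options {
  bool wantCHW = true;
};

// The "Maybe" decision lives inside the helper, not at the call sites.
Frame maybePermuteHWC2CHW(const Options& options, Frame frame) {
  if (options.wantCHW) {
    frame.isHWC = false; // stand-in for the actual tensor permute
  }
  return frame;
}

// Low-level subroutine: only ever takes and returns HWC frames.
Frame getFrameAtIndexInternal(int /*streamIndex*/, long /*frameIndex*/) {
  return Frame{};
}

// High-level entry point: unconditionally applies the output permutation,
// so the permute logic lives in exactly one layer.
Frame getFrameAtIndex(const Options& options, int streamIndex,
                      long frameIndex) {
  Frame frame = getFrameAtIndexInternal(streamIndex, frameIndex);
  return maybePermuteHWC2CHW(options, frame);
}
```

Under this convention, only entry points ever see a CHW frame; everything below them stays in HWC, which is also what makes a single, up-leveled allocation site possible later.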

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 29, 2024
```python
raise ValueError(f"No GLIBCXX symbols found in {symbol_matches}. Something is wrong.")
raise ValueError(
    f"No GLIBCXX symbols found in {symbol_matches}. Something is wrong."
)
```
Contributor Author

Not sure why, but my linter started wanting to format this. If the linter in our CI job is OK with it, can we let this in? Otherwise I'll have to manually revert it on all my commits.

Contributor

Linters gonna lint.

Comment on lines -932 to -939
```cpp
if (!preAllocatedOutputTensor.has_value()) {
  // We only convert to CHW if a pre-allocated tensor wasn't passed. When a
  // pre-allocated tensor is passed, it's up to the caller (typically a
  // batch API) to do the conversion. This is more efficient as it allows
  // batch NHWC tensors to be permuted only once, instead of permuting HWC
  // tensors N times.
  output.frame = MaybePermuteHWC2CHW(streamInfo.options, output.frame);
}
```
Contributor Author

@NicolasHug NicolasHug Oct 29, 2024

This removal is actually the key change. Everything else is just (sensible) patching until the tests pass.

@scotts
Contributor

scotts commented Oct 29, 2024

Looks good to me. @ahmadsharif1 should also review.

```cpp
  return rawOutput;
}

VideoDecoder::DecodedOutput VideoDecoder::getNextFrame() {
```
Contributor

For consistency, this should have a `NoDemux` suffix.

```cpp
      int streamIndex,
      int64_t frameIndex,
      std::optional<torch::Tensor> preAllocatedOutputTensor = std::nullopt);
  DecodedOutput getNextDecodedOutputNoDemux(
```
Contributor

Should this have an `Internal` suffix?

```cpp
      DecodedOutput& output,
      std::optional<torch::Tensor> preAllocatedOutputTensor = std::nullopt);

  DecodedOutput getFrameAtIndexInternal(
```
Contributor

Should you have a comment somewhere saying that these always return frames in HWC?

You could adopt a convention that `Internal`-suffixed functions always return HWC.

Contributor Author

Yes, definitely. Ultimately I want to label all functions in the decoding stack with their expected input and output shapes. I'll follow up on that; I think it'll be part of a sequence of PRs, after the one about up-leveling the tensor allocation.

@NicolasHug NicolasHug merged commit daf3631 into meta-pytorch:main Oct 29, 2024
40 checks passed
@NicolasHug NicolasHug deleted the unify_shape_handling branch October 29, 2024 16:40