Refac: Straightforward output shape permutation #317
Conversation
```diff
-raise ValueError(f"No GLIBCXX symbols found in {symbol_matches}. Something is wrong.")
+raise ValueError(
+    f"No GLIBCXX symbols found in {symbol_matches}. Something is wrong."
+)
```
Not sure why, but my linter started wanting to format this. If the linter in our CI job is OK with it, can we let this in? Otherwise I'll have to manually revert it on all my commits.
Linters gonna lint.
```diff
-if (!preAllocatedOutputTensor.has_value()) {
-  // We only convert to CHW if a pre-allocated tensor wasn't passed. When a
-  // pre-allocated tensor is passed, it's up to the caller (typically a
-  // batch API) to do the conversion. This is more efficient as it allows
-  // batch NHWC tensors to be permuted only once, instead of permuting HWC
-  // tensors N times.
-  output.frame = MaybePermuteHWC2CHW(streamInfo.options, output.frame);
-}
```
This removal is actually the key change. Everything else is just (sensible) patching until the tests pass.
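The comment in the removed block argues that permuting a whole NHWC batch once is preferable to permuting HWC frames N times. As a quick sanity check that the two are equivalent, here is a small NumPy sketch (illustrative only; NumPy arrays stand in for the torch tensors used in the actual code):

```python
import numpy as np

# A tiny "batch" of 2 frames, each 4x5 with 3 channels: N, H, W, C.
batch_nhwc = np.arange(2 * 4 * 5 * 3).reshape(2, 4, 5, 3)

# One permutation over the whole batch: NHWC -> NCHW.
batch_nchw = batch_nhwc.transpose(0, 3, 1, 2)

# N separate permutations: HWC -> CHW per frame, then re-stack.
per_frame = np.stack([frame.transpose(2, 0, 1) for frame in batch_nhwc])

assert batch_nchw.shape == (2, 3, 4, 5)
assert np.array_equal(batch_nchw, per_frame)
```

Both paths yield the same NCHW result, which is why deferring the conversion to the batch-level caller loses nothing while saving the per-frame permutation calls.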
Looks good to me. @ahmadsharif1 should also review.
```cpp
  return rawOutput;
}

VideoDecoder::DecodedOutput VideoDecoder::getNextFrame() {
```
For consistency, this should have a `NoDemux` suffix.
```cpp
    int streamIndex,
    int64_t frameIndex,
    std::optional<torch::Tensor> preAllocatedOutputTensor = std::nullopt);
DecodedOutput getNextDecodedOutputNoDemux(
```
Should this have an `Internal` suffix?
```cpp
    DecodedOutput& output,
    std::optional<torch::Tensor> preAllocatedOutputTensor = std::nullopt);

DecodedOutput getFrameAtIndexInternal(
```
Should you have a comment somewhere saying these are always returned in HWC? You could adopt a convention that `Internal`-suffixed functions always return HWC.
Yes, definitely. Ultimately I want to label all functions in the decoding stack with their expected input and output shapes. I'll follow up on that; I think it'll be part of a sequence of PRs, after the one about up-leveling the tensor allocation.
This PR is about where and when we call `MaybePermuteHWC2CHW()`. It's not about tensor allocation (that will come later).

At a high level, this PR changes all conditional call patterns of the form `if (cond) { ... MaybePermuteHWC2CHW(...) ... }` into a plain, unconditional call to `MaybePermuteHWC2CHW()`. This makes it a lot simpler to reason about our output shape permutation. In main, `cond` is typically input-dependent (but really, caller-dependent), and it leads to a state that's hard to reason about.

Another benefit of this PR is that all low-level decoding routines (like `convertAVFrameToDecodedOutputOnCPU()`) now have a simpler interface: they only ever take and return HWC tensors.

At a lower level, the following changes were made:

- `MaybePermuteHWC2CHW()` is now a method so we can pass a `streamIndex`. This makes its interface slightly simpler.
- Only the high-level decoding entry points now call `MaybePermuteHWC2CHW()`.
- `getFrameAtIndex()` and `getNextDecodedOutputNoDemux()` were used both as high-level decoding entry points and as low-level subroutines of other entry points. I split those into `getFrameAtIndex()` / `getFrameAtIndexInternal()` and `getNextFrame()` / `getNextDecodedOutputNoDemux()` to clearly distinguish between the public entry point and the underlying private helper. Note that this isn't just a "nice-to-have" or a nit-pick; it's a necessary change for the goal of this PR. `getNextFrame()` is the new public entry point, and `getNextDecodedOutputNoDemux()` is now private.

A follow-up to this PR will be to unify the tensor allocation. I think it'd make sense for tensors to always be pre-allocated by the high-level decoding entry points. That would allow us to unify the allocation logic in a single place.
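To make the resulting call structure concrete, here is a hypothetical Python sketch (the names mirror the C++ ones but this is not torchcodec's actual API): internal helpers always return HWC, and the public entry point is the single, unconditional permutation site, where the only remaining branch is the user-facing dimension-order option:

```python
import numpy as np

def get_frame_at_index_internal(frame_index):
    # Internal helper: by convention, always returns an HWC frame.
    # (A constant-valued array stands in for a real decoded frame.)
    return np.full((4, 5, 3), frame_index)  # H, W, C

def maybe_permute_hwc2chw(dimension_order, frame):
    # "Maybe" now refers only to the user's requested dimension order,
    # not to which caller we happen to be invoked from.
    if dimension_order == "NCHW":
        return frame.transpose(2, 0, 1)  # HWC -> CHW
    return frame

def get_frame_at_index(frame_index, dimension_order="NCHW"):
    # Public entry point: the one unconditional permutation call site.
    frame = get_frame_at_index_internal(frame_index)
    return maybe_permute_hwc2chw(dimension_order, frame)

assert get_frame_at_index_internal(0).shape == (4, 5, 3)   # HWC internally
assert get_frame_at_index(0).shape == (3, 4, 5)            # CHW at the edge
assert get_frame_at_index(0, "NHWC").shape == (4, 5, 3)
```

The point of the split is visible in the sketch: any other entry point can reuse `get_frame_at_index_internal()` and rely on getting HWC back, without worrying about double permutation.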