Description
The Python VideoDecoder needs to know how many frames are in a stream. It asserts this at startup:
https://github.com/pytorch/torchcodec/blob/dd44f57180070572eee9e7d35d38f9c03569f6f9/src/torchcodec/decoders/_video_decoder.py#L387-L390
It needs to know this because the number of frames is the length of the decoder as a sequence:
https://github.com/pytorch/torchcodec/blob/dd44f57180070572eee9e7d35d38f9c03569f6f9/src/torchcodec/decoders/_video_decoder.py#L122-L123
Some videos are missing the number of frames in their metadata. For exact mode, this is not a problem because we compute the number of frames when we scan the stream. For approximate mode, we currently cannot instantiate a decoder for such videos. Approximate mode also requires the FPS - that's how we figure out indices. So for a video that is missing its number of frames, if it has its FPS and time duration, we should be able to compute the number of frames as the duration multiplied by the FPS.
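A minimal sketch of that calculation, just to make the idea concrete. The function name `estimate_num_frames` and its parameters are hypothetical and not part of torchcodec; rounding to the nearest integer is also an assumption, since the exact frame count can't be recovered from an average FPS alone.

```python
from typing import Optional


def estimate_num_frames(
    duration_seconds: Optional[float],
    average_fps: Optional[float],
) -> Optional[int]:
    """Estimate the frame count from duration and average FPS.

    Hypothetical helper: returns None when either input is missing or
    non-positive, so the caller can keep treating the value as unknown.
    """
    if duration_seconds is None or average_fps is None:
        return None
    if duration_seconds <= 0 or average_fps <= 0:
        return None
    # Round rather than truncate (an assumption): header durations are
    # often slightly off from an exact multiple of the frame period.
    return round(duration_seconds * average_fps)
```

For a 10 second video at 30 FPS this gives 300. For variable frame rate videos the result is only an estimate, which seems acceptable for approximate mode.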
In terms of implementation, we should hide all of this inside the metadata class rather than changing any logic in VideoDecoder. We already make num_frames a property that figures out the best value to return:
https://github.com/pytorch/torchcodec/blob/dd44f57180070572eee9e7d35d38f9c03569f6f9/src/torchcodec/_core/_metadata.py#L125-L133
We should also check whether num_frames_from_header is None, and if it is, return the calculated value. Making this change is very straightforward, but we'll need testing to make sure we don't run into segfaults.
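Roughly, the fallback order could look like the sketch below. This is a standalone illustration, not the actual class in `_metadata.py`: the class name `StreamMetadataSketch` is hypothetical, and apart from `num_frames_from_header` the field names are assumptions about what the metadata exposes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamMetadataSketch:
    # Only num_frames_from_header is named in this issue; the other
    # field names are assumed for illustration.
    num_frames_from_content: Optional[int] = None
    num_frames_from_header: Optional[int] = None
    average_fps_from_header: Optional[float] = None
    duration_seconds_from_header: Optional[float] = None

    @property
    def num_frames(self) -> Optional[int]:
        # 1. Prefer the value computed by scanning the stream (exact mode).
        if self.num_frames_from_content is not None:
            return self.num_frames_from_content
        # 2. Otherwise use the value from the header, if present.
        if self.num_frames_from_header is not None:
            return self.num_frames_from_header
        # 3. Proposed fallback: derive it from duration and FPS.
        if (
            self.average_fps_from_header is not None
            and self.duration_seconds_from_header is not None
        ):
            return round(
                self.duration_seconds_from_header * self.average_fps_from_header
            )
        # 4. Still unknown; the decoder's existing check would fail as today.
        return None
```

With this ordering, VideoDecoder doesn't need to change: it keeps reading `num_frames`, and only videos that lack the frame count, the FPS, or the duration still hit the existing error.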