-
Notifications
You must be signed in to change notification settings - Fork 64
Add custom_frame_mappings to VideoDecoder init #799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add custom_frame_mappings to VideoDecoder init #799
Conversation
Thanks for the benchmarks! That's quite interesting. It'd be interesting to use the If that's faster than |
I believe this is it - I updated the PR description with my updated benchmarking script and results. By reducing the JSON size to contain only the necessary information, the performance of Edit: Here's my alternative code for benchmarking the
|
897e15d
to
0d661f5
Compare
e220d14
to
be9466b
Compare
@pytest.mark.skipif( | ||
get_ffmpeg_major_version() in (4, 5), | ||
reason="ffprobe isn't accurate on ffmpeg 4 and 5", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ffprobe does include the fields pts
and duration
in older versions, using the name pkt_pts
and pkt_duration
, so we can enable these tests by falling back to the pts...
keys.
custom_frame_mappings_data = ( | ||
read_custom_frame_mappings(custom_frame_mappings) | ||
if custom_frame_mappings is not None | ||
else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ever valid for seek_mode == "custom_frame_mappings"
and custom_frame_mappings_data is None
to both be true? If no, then we should probably raise an error here.
Actually, shouldn't line 93 mean that custom_frame_mappings
cannot be None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, that case would not be valid.
Currently, creating a VideoDecoder with seek_mode = custom_frame_mappings
and custom_frame_mappings=None
will throw an error in add_video_stream
, which is called on line 121.
Do you think it would be helpful to error sooner?
except json.JSONDecodeError: | ||
raise ValueError( | ||
"Invalid custom frame mappings. " | ||
"It should be a valid JSON string or a JSON file object." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our type annotation, we only say we accept string-like objects, not file-like objects. In other parts of the code, I've indicated a file-like object with io.RawIOBase
and io.BufferedReader
: https://github.com/pytorch/torchcodec/blob/c3eea9f9f42d3d51f9c53ba19be6c11cc88c21dd/src/torchcodec/decoders/_video_decoder.py#L74-L77
For our doc string, however, we just say "file-like object" since those actual types are probably meaningless to most users: https://github.com/pytorch/torchcodec/blob/c3eea9f9f42d3d51f9c53ba19be6c11cc88c21dd/src/torchcodec/decoders/_video_decoder.py#L26
Which introduces a good point: since this a top-level API that we expect users to call, we need a docstring. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the references, I'll add a docstring and use the term file-like object
for uniformity.
device: Optional[Union[str, torch_device]] = "cpu", | ||
seek_mode: Literal["exact", "approximate"] = "exact", | ||
seek_mode: Literal["exact", "approximate", "custom_frame_mappings"] = "exact", | ||
custom_frame_mappings: Optional[Union[bytes, bytearray, str]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two values should also be explained in the docstring.
"It should be a valid JSON string or a JSON file object." | ||
) | ||
# These keys are prefixed with "pkt_" in ffmpeg 4 and ffmpeg 5 | ||
pts_key = "pkt_pts" if "pts" not in input_data["frames"][0] else "pts" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, but it would be helpful to users to do explicit error checks on our assumptions about the data:
- Assert that
input_data
is non-empty. - Assert that
"frames"
is a valid key. - Assert that one of the valid pts and duration keys exist. (Rather than assuming it must be the other one.)
If any of the above isn't true, we'll still fail, but it will just be with KeyError
exceptions that the cause of may not be obvious to users.
test/test_decoders.py
Outdated
stream_index=0, | ||
device=device, | ||
custom_frame_mappings=f.read(), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the comments I left in read_custom_frame_mappings()
, I think we should also test for valid JSON that does not follow the schema that FFmpeg produces.
test/utils.py
Outdated
def create_custom_frame_mappings(self, stream_index: int) -> None: | ||
result = json.loads(self.generate_custom_frame_mappings(stream_index)) | ||
# These keys are prefixed with "pkt_" in ffmpeg 4 and ffmpeg 5 | ||
pts_key = "pkt_pts" if "pts" not in result["frames"][0] else "pts" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's simpler to just call read_custom_frame_mappings()
here. I believe it's accomplishing the same thing, and as that function evolves, so to will this utils function need to evolve.
}, | ||
... | ||
] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also say something about the relationship with seek_mode
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also add that in order to accomodate for different FFmpeg versions we also allow pkt_pts
and pkt_duration
?
test/test_decoders.py
Outdated
def test_custom_frame_mappings_init_fails_invalid_json(self, tmp_path, device): | ||
invalid_json_path = tmp_path / "invalid_json" | ||
with open(invalid_json_path, "w+") as f: | ||
f.write("""'{"invalid": "json"'""") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this is not obviously invalid - it's invalid because it's missing the closing }
, but I had to read really closely to catch that! Part of the reason I had difficulty is that there's so much quoting going on. I think it might be better to do something like:
f.write("garbage input")
That is, something that doesn't even look like JSON at a glance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! There's a few small things we can address before merging.
) | ||
|
||
|
||
def read_custom_frame_mappings( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit
def read_custom_frame_mappings( | |
def _read_custom_frame_mappings( |
This makes it more obvious that it is private. It is already private even without the leading underscore, because it's only exposed within _video_decoder.py
which itself has the underscore. But explicitly having an underscore makes it even more obvious, and consistent with the other private functions defined in this file
num_ffmpeg_threads: int = 1, | ||
device: Optional[Union[str, torch_device]] = "cpu", | ||
seek_mode: Literal["exact", "approximate"] = "exact", | ||
seek_mode: Literal["exact", "approximate", "custom_frame_mappings"] = "exact", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC we decided not to expose custom_frame_mappings
at the public level, and just force users to rely on the default seek_mode (approximate) if they pass custom_frames_mapping
? That's consistent with the existing (correct) check below:
if custom_frame_mappings is not None and seek_mode == "approximate":
raise ValueError(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe @NicolasHug is right here. I completely forgot about that.
|
||
if not pts_key or not duration_key or not key_frame_present: | ||
raise ValueError( | ||
"Invalid custom frame mappings. The 'pts', 'duration', and 'key_frame' keys are required in the frame metadata." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably want to indicate that pkt_pts
and pkt_duration
are also OK?
(float(frame[pts_key]), frame["key_frame"], float(frame[duration_key])) | ||
for frame in input_data["frames"] | ||
] | ||
all_frames, is_key_frame, duration = map(Tensor, zip(*frame_data)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be torch.tensor
, not Tensor
which is a type
Also in general, list comprehensions are preferred to map
(but we can keep it as-is)
assert ( | ||
len(all_frames) == len(is_key_frame) == len(duration) | ||
), "Mismatched lengths in frame index data" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For user-facing assertions we prefer if + ValueError
rather than assert
which will lead to an AssertionError
. The ValueError is more explicitly about bad user-input.
test/test_decoders.py
Outdated
decoder = VideoDecoder(asset.path) | ||
decoder.get_frame_at(10) | ||
|
||
def setup_frame_mappings(tmp_path: str, file: bool, stream_index: int): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit tmp_path isn't a str, it's Path, otherwise we wouldn't be able to apply /
to it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - since type annotations are not used elsewhere in this file I will remove them from this function.
stream_index=stream_index, | ||
device=device, | ||
custom_frame_mappings=custom_frame_mappings, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: everything below can probably be out of the context manager. It's preferable to end the CM's scope as soon as it's not needed.
This PR enables
custom_frame_mappings
to be used in the Python VideoDecoder.read_custom_frame_mappings()
. This parses a JSON str or JSON file to extractall_frames
,is_key_frame
,duration
.seek_mode
ifcustom_frame_mappings
is passed in to avoid the exact mode scan during initialization.Benchmarking
I wrote a short benchmarking script to test the initialization times of
custom_frame_mappings
versusexact
.custom_frame_mapping
mode was quicker thanexact
mode if theffprobe
command used to generate the frame mapping json used the-show_entries
option to reduce the JSON size.With this optimization the performance improvement increased with longer videos.
Without this optimization, using the full
ffprobe
output,exact
mode was faster.The results on
nasa_13013.mp4
:The results on a generated video,
mandelbrot_1920x1080_120s.mp4
:The benchmarking code: