-
Notifications
You must be signed in to change notification settings - Fork 70
Add video encoding tutorial doc #1063
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great tutorial @Dan-Flores , thank you! I left a few minor comments. Let's also edit the docstring of the VideoEncoder methods, and link to the relevant sections here in this tutorial.
I think that since we'll now have to ship these large decoded video frames with our docs, we are potentially increasing the "docs" size by a few dozens of MB. But I think that's OK.
examples/encoding/video_encoding.py
Outdated
|
|
||
| # %% | ||
| # First, we'll download a video and decode some frames to tensors. | ||
| # These will be the input to the VideoEncoder. For more details on decoding, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure to use :class:`~torchcodec.encoders.VideoEncoder` everywhere.
It should link to the docstring page. Right now it doesn't, because you need to add it to https://github.com/meta-pytorch/torchcodec/blob/main/docs/source/api_ref_encoders.rst?plain=1
examples/encoding/video_encoding.py
Outdated
|
|
||
|
|
||
| # Video source: https://www.pexels.com/video/adorable-cats-on-the-lawn-4977395/ | ||
| # License: CC0. Author: Altaf Shah. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can still use it, but I don't see the license being explicitly CC0.
| # License: CC0. Author: Altaf Shah. | |
| # Author: Altaf Shah. |
examples/encoding/video_encoding.py
Outdated
| raw_video_bytes = response.content | ||
|
|
||
| decoder = VideoDecoder(raw_video_bytes) | ||
| frames = decoder[:60] # Get first 60 frames |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use get_frames_in_range instead, it's more efficient, and we want users to use the most efficient decoding methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slicing actually calls get_frames_in_range():
torchcodec/src/torchcodec/decoders/_video_decoder.py
Lines 193 to 203 in 1ea235a
| def _getitem_slice(self, key: slice) -> Tensor: | |
| assert isinstance(key, slice) | |
| start, stop, step = key.indices(len(self)) | |
| frame_data, *_ = core.get_frames_in_range( | |
| self._decoder, | |
| start=start, | |
| stop=stop, | |
| step=step, | |
| ) | |
| return frame_data |
examples/encoding/video_encoding.py
Outdated
| # round-trip encode/decode process works as expected: | ||
|
|
||
| decoder_verify = VideoDecoder(encoded_frames) | ||
| decoded_frames = decoder_verify[:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
examples/encoding/video_encoding.py
Outdated
| capture_output=True, | ||
| text=True, | ||
| ) | ||
| print(f"Codec used in {output}: {result.stdout.strip()}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great section above. The only issue is that it pollutes the codespace with libx264_encoded.mp4 and hevc_encoded.mp4. Let's use temporary files instead, with e.g. https://docs.python.org/3/library/tempfile.html
examples/encoding/video_encoding.py
Outdated
| # Codec Selection | ||
| # --------------- | ||
| # | ||
| # The ``codec`` parameter specifies which video codec to use for encoding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could start this section by indicating that by default, the codec is selected automatically based on the container format, for example "mp4" tends to default to h264 (I think? Please check me on this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NicolasHug, made a similar comment to what I did above. :) Doing it here or in the previous section are both great to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an intro here explaining the default behavior, this way all codec related text is under the same header. I added the mp4 -> h264 example, as it is often the case in my experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this works well. My one follow-up suggestion is to connect the sentence about codec selection with the default. Something like, "If you want a codec other than the default, use the codec parameter." Followed by the explanation of what it is.
| # %% | ||
| # Low quality (high CRF) | ||
| low_quality_output = encoder.to_tensor(format="mp4", codec="libx264", crf=50) | ||
| play_video(low_quality_output) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well done, it's really cool to visually see the effect it has on quality!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I came here to say the same thing. :)
examples/encoding/video_encoding.py
Outdated
| # to check available options for your selected codec. | ||
| # | ||
|
|
||
| import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use pathlib instead https://docs.python.org/3/library/pathlib.html#pathlib.Path.stat
|
Let's add a reference to this tutorial in |
examples/encoding/video_encoding.py
Outdated
| # | ||
| # :class:`~torchcodec.encoders.VideoEncoder` supports encoding frames into a | ||
| # file via the :meth:`~torchcodec.encoders.VideoEncoder.to_file` method, to | ||
| # file-like objects via the :meth:`~torchcodec.encoders.VideoEncoder.to_filelike` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # file-like objects via the :meth:`~torchcodec.encoders.VideoEncoder.to_filelike` | |
| # file-like objects via the :meth:`~torchcodec.encoders.VideoEncoder.to_file_like` |
|
|
||
| print(f"Re-decoded video: {decoded_frames.shape = }") | ||
| print(f"Original frames: {frames.shape = }") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is an excellent place to explain that the format parameter selects the default codec - we can also briefly explain the difference between, say, an mp4 video file and the actual codec used to decode and encode the video streams in that file. If this is well explained in any externall FFmpeg docs, we can link to those as well.
That then sets us up for the next section, as the natural next question a reader may have is, what if I don't want the default codec?
At the end of the "Codec Selection" section, we should give some guidance on when to just use format and when to specify codec as well. Nothing elaborate, just a sentence or two. I think that will go a long way to informing our about the relationship between these two options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestions, I added some brief guidance on codec vs format at the end.
I drafted text to explain the difference between container-format and codec, but I am worried it dilutes the "Codec Selection" section with text that is not specific to the API. I would be happy to add a link, but I was not able to find useful FFmpeg docs on this subject.
examples/encoding/video_encoding.py
Outdated
| # control of encoding settings beyond the common parameters. | ||
| # | ||
| # For example, some potential extra options for the commonly used H.264 codec, ``libx264`` include: | ||
| # For example, with , ``libx264``: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the second "For example" is from before adding the first.
| extra_options (dict[str, Any], optional): A dictionary of additional | ||
| encoder options to pass, e.g. ``{"qp": 5, "tune": "film"}``. | ||
| Values will be converted to strings before passing to the encoder. | ||
| See :ref:`extra_options` for details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not edit the docstrings, as I expect they can be useful as a quick reference to valid values. I am open to suggestions on this, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstrings look good, I left minor suggestions above. I agree it is super valuable to have short descriptions of valid values in there.
NicolasHug
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thank you! Left some minor comments that are easy to address but LGTM
| "yuv420p", "yuv444p"). If not specified, uses codec's default format. | ||
| See :ref:`pixel_format` for details. | ||
| crf (int or float, optional): Constant Rate Factor for encoding quality. Lower values | ||
| mean better quality. Valid range depends on the encoder (commonly 0-51). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and for the other docstrings, I think we should rather say
| mean better quality. Valid range depends on the encoder (commonly 0-51). | |
| mean better quality. Valid range depends on the encoder (e.g. 0-51 for libx264). |
| Defaults to None (which will use encoder's default). | ||
| See :ref:`crf` for details. | ||
| preset (str or int, optional): Encoder option that controls the tradeoff between | ||
| encoding speed and compression. Valid values depend on the encoder (commonly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and in the other methods. I suggest this because it wasn't immediately obvious to me what "compression" meant in this case.
| encoding speed and compression. Valid values depend on the encoder (commonly | |
| encoding speed and compression (output size). Valid values depend on the encoder (commonly |
| See :ref:`preset` for details. | ||
| extra_options (dict[str, Any], optional): A dictionary of additional | ||
| encoder options to pass, e.g. ``{"qp": 5, "tune": "film"}``. | ||
| Values will be converted to strings before passing to the encoder. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove this, I think this is an implementation detail.
| extra_options (dict[str, Any], optional): A dictionary of additional | ||
| encoder options to pass, e.g. ``{"qp": 5, "tune": "film"}``. | ||
| Values will be converted to strings before passing to the encoder. | ||
| See :ref:`extra_options` for details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstrings look good, I left minor suggestions above. I agree it is super valuable to have short descriptions of valid values in there.
examples/encoding/video_encoding.py
Outdated
| # %% | ||
| # For most cases, you can simply specify the format parameter and let the FFmpeg select the default codec. | ||
| # However, specifying the codec parameter is useful to select a particular codec implementation | ||
| # (``libx264`` vs ``libx265``) or to have more control over the encoding behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinion but I think this last part could be left out. The intro to this section (starting with "By default, the codec ...") is really good and covers all what is necessary IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading it back it is a bit repetitive. Since its the only text after a code block, I'll remove it for consistency.
examples/encoding/video_encoding.py
Outdated
| # For example, with the commonly used H.264 codec, ``libx264``: | ||
| # | ||
| # - Values range from 0 (lossless) to 51 (worst quality) | ||
| # - Values 17 or 18 are conisdered visually lossless, and the default is 23. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # - Values 17 or 18 are conisdered visually lossless, and the default is 23. | |
| # - Values 17 or 18 are considered visually lossless, and the default is 23. |
examples/encoding/video_encoding.py
Outdated
| # For example, with the commonly used H.264 codec, ``libx264`` presets include: | ||
| # | ||
| # - ``"ultrafast"`` (fastest), ``"fast"``, ``"medium"`` (default), ``"slow"``, ``"veryslow"`` (slowest, best compression). | ||
| # - See additional details in the `H.264 Video Encoding Guide <https://trac.ffmpeg.org/wiki/Encode/H.264#a2.Chooseapresetandtune>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely a nitpick, but I think we don't need bullet points here.
No description provided.