Add codec selection to VideoEncoder API #1038
Merged
Dan-Flores merged 9 commits into meta-pytorch:main from Dan-Flores:codec_select_encode_option on Nov 14, 2025
+162 −10
Commits
976bd2c add codec selection + logic, add valid codecs test (Dan-Flores)
d60764e consistent arg order in ops.py (Dan-Flores)
adb469d fix test w correct frame dims (Dan-Flores)
cd7f8f1 wip new tests2 (Dan-Flores)
d4257fd Merge branch 'main' of https://github.com/meta-pytorch/torchcodec int… (Dan-Flores)
6c15e76 cleaned tests, add fbcode skip (Dan-Flores)
8b48e18 skip vp9 on windows, update error message to suggest calling ffmpeg cli (Dan-Flores)
b943504 Merge branch 'main' into codec_select_encode_option (Dan-Flores)
b045b6c add defensive check for avFormatContext_->oformat (Dan-Flores)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -36,6 +36,7 @@ def to_file( | |
| self, | ||
| dest: Union[str, Path], | ||
| *, | ||
| codec: Optional[str] = None, | ||
| pixel_format: Optional[str] = None, | ||
| crf: Optional[Union[int, float]] = None, | ||
| preset: Optional[Union[str, int]] = None, | ||
|
|
@@ -46,6 +47,9 @@ def to_file( | |
| dest (str or ``pathlib.Path``): The path to the output file, e.g. | ||
| ``video.mp4``. The extension of the file determines the video | ||
| container format. | ||
| codec (str, optional): The codec to use for encoding (e.g., "libx264", | ||
| "h264"). If not specified, the default codec | ||
| for the container format will be used. | ||
| pixel_format (str, optional): The pixel format for encoding (e.g., | ||
| "yuv420p", "yuv444p"). If not specified, uses codec's default format. | ||
| crf (int or float, optional): Constant Rate Factor for encoding quality. Lower values | ||
|
|
@@ -61,6 +65,7 @@ def to_file( | |
| frames=self._frames, | ||
| frame_rate=self._frame_rate, | ||
| filename=str(dest), | ||
| codec=codec, | ||
| pixel_format=pixel_format, | ||
| crf=crf, | ||
| preset=preset, | ||
|
|
@@ -70,6 +75,7 @@ def to_tensor( | |
| self, | ||
| format: str, | ||
| *, | ||
| codec: Optional[str] = None, | ||
| pixel_format: Optional[str] = None, | ||
| crf: Optional[Union[int, float]] = None, | ||
| preset: Optional[Union[str, int]] = None, | ||
|
|
@@ -78,7 +84,10 @@ def to_tensor( | |
|
|
||
| Args: | ||
| format (str): The container format of the encoded frames, e.g. "mp4", "mov", | ||
| "mkv", "avi", "webm", "flv", etc. | ||
| "mkv", "avi", "webm", "flv", etc. | ||
| codec (str, optional): The codec to use for encoding (e.g., "libx264", | ||
| "h264"). If not specified, the default codec | ||
| for the container format will be used. | ||
| pixel_format (str, optional): The pixel format to encode frames into (e.g., | ||
| "yuv420p", "yuv444p"). If not specified, uses codec's default format. | ||
| crf (int or float, optional): Constant Rate Factor for encoding quality. Lower values | ||
|
|
@@ -90,13 +99,14 @@ def to_tensor( | |
| (which will use encoder's default). | ||
|
|
||
| Returns: | ||
| Tensor: The raw encoded bytes as 4D uint8 Tensor. | ||
| Tensor: The raw encoded bytes as 1D uint8 Tensor. | ||
|
Contributor
Author
Drive by change: this should say
||
| """ | ||
| preset_value = str(preset) if isinstance(preset, int) else preset | ||
| return _core.encode_video_to_tensor( | ||
| frames=self._frames, | ||
| frame_rate=self._frame_rate, | ||
| format=format, | ||
| codec=codec, | ||
| pixel_format=pixel_format, | ||
| crf=crf, | ||
| preset=preset_value, | ||
|
|
@@ -107,6 +117,7 @@ def to_file_like( | |
| file_like, | ||
| format: str, | ||
| *, | ||
| codec: Optional[str] = None, | ||
| pixel_format: Optional[str] = None, | ||
| crf: Optional[Union[int, float]] = None, | ||
| preset: Optional[Union[str, int]] = None, | ||
|
|
@@ -121,6 +132,9 @@ def to_file_like( | |
| int = 0) -> int``. | ||
| format (str): The container format of the encoded frames, e.g. "mp4", "mov", | ||
| "mkv", "avi", "webm", "flv", etc. | ||
| codec (str, optional): The codec to use for encoding (e.g., "libx264", | ||
| "h264"). If not specified, the default codec | ||
| for the container format will be used. | ||
| pixel_format (str, optional): The pixel format for encoding (e.g., | ||
| "yuv420p", "yuv444p"). If not specified, uses codec's default format. | ||
| crf (int or float, optional): Constant Rate Factor for encoding quality. Lower values | ||
|
|
@@ -137,6 +151,7 @@ def to_file_like( | |
| frame_rate=self._frame_rate, | ||
| format=format, | ||
| file_like=file_like, | ||
| codec=codec, | ||
| pixel_format=pixel_format, | ||
| crf=crf, | ||
| preset=preset, | ||
|
|
||
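The diff above adds a keyword-only `codec` argument to `to_file`, `to_tensor`, and `to_file_like`, and the tests in this PR expect a `RuntimeError` when the codec name is unknown. A minimal sketch of how a caller might lean on that behavior follows. The helper name `encode_with_codec_fallback` is hypothetical and not part of torchcodec; it only assumes an object with a `to_file(dest, *, codec=...)` method, such as `VideoEncoder`.

```python
from typing import Any, Iterable, Optional


def encode_with_codec_fallback(
    encoder: Any, dest: str, codecs: Iterable[Optional[str]]
) -> Optional[str]:
    """Try each codec in order and return the first one that encodes.

    `encoder` is any object exposing `to_file(dest, *, codec=...)`.
    A codec of None asks for the container format's default codec.
    This helper is illustrative only and is not part of torchcodec.
    """
    last_error: Optional[RuntimeError] = None
    for codec in codecs:
        try:
            encoder.to_file(dest, codec=codec)
            return codec
        except RuntimeError as err:
            # The PR's tests expect RuntimeError for an unknown codec name,
            # so remember the error and move on to the next candidate.
            last_error = err
    if last_error is None:
        raise ValueError("no codecs were given")
    raise last_error
```

With torchcodec installed, a call such as `encode_with_codec_fallback(VideoEncoder(frames=frames, frame_rate=30), "out.mp4", ["libx264", None])` would try `libx264` first and then fall back to the container default. Whether a given codec is available depends on the local FFmpeg build.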
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -572,6 +572,27 @@ class TestVideoEncoder: | |
| def decode(self, source=None) -> torch.Tensor: | ||
| return VideoDecoder(source).get_frames_in_range(start=0, stop=60) | ||
|
|
||
| def _get_codec_spec(self, file_path): | ||
| """Helper function to get codec name from a video file using ffprobe.""" | ||
| result = subprocess.run( | ||
| [ | ||
| "ffprobe", | ||
| "-v", | ||
| "error", | ||
| "-select_streams", | ||
| "v:0", | ||
| "-show_entries", | ||
| "stream=codec_name", | ||
| "-of", | ||
| "default=noprint_wrappers=1:nokey=1", | ||
| str(file_path), | ||
| ], | ||
| capture_output=True, | ||
| check=True, | ||
| text=True, | ||
| ) | ||
| return result.stdout.strip() | ||
|
|
||
| @pytest.mark.parametrize("method", ("to_file", "to_tensor", "to_file_like")) | ||
| def test_bad_input_parameterized(self, tmp_path, method): | ||
| if method == "to_file": | ||
|
|
@@ -610,6 +631,16 @@ def test_bad_input_parameterized(self, tmp_path, method): | |
| ) | ||
| getattr(encoder, method)(**valid_params) | ||
|
|
||
| with pytest.raises( | ||
| RuntimeError, | ||
| match=r"Video codec invalid_codec_name not found.", | ||
| ): | ||
| encoder = VideoEncoder( | ||
| frames=torch.zeros((5, 3, 64, 64), dtype=torch.uint8), | ||
| frame_rate=30, | ||
| ) | ||
| encoder.to_file(str(tmp_path / "output.mp4"), codec="invalid_codec_name") | ||
|
|
||
| with pytest.raises(RuntimeError, match=r"crf=-10 is out of valid range"): | ||
| encoder = VideoEncoder( | ||
| frames=torch.zeros((5, 3, 64, 64), dtype=torch.uint8), | ||
|
|
@@ -990,3 +1021,72 @@ def write(self, data): | |
| RuntimeError, match="File like object must implement a seek method" | ||
| ): | ||
| encoder.to_file_like(NoSeekMethod(), format="mp4") | ||
|
|
||
| @pytest.mark.skipif( | ||
| in_fbcode(), | ||
| reason="ffprobe not available internally", | ||
| ) | ||
| @pytest.mark.parametrize( | ||
| "format,codec_spec", | ||
| [ | ||
| ("mp4", "h264"), | ||
| ("mp4", "hevc"), | ||
| ("mkv", "av1"), | ||
| ("avi", "mpeg4"), | ||
| pytest.param( | ||
| "webm", | ||
| "vp9", | ||
| marks=pytest.mark.skipif( | ||
| IS_WINDOWS, reason="vp9 codec not available on Windows" | ||
| ), | ||
| ), | ||
| ], | ||
| ) | ||
| def test_codec_parameter_utilized(self, tmp_path, format, codec_spec): | ||
| # Test the codec parameter is utilized by using ffprobe to check the encoded file's codec spec | ||
| frames = torch.zeros((10, 3, 64, 64), dtype=torch.uint8) | ||
| dest = str(tmp_path / f"output.{format}") | ||
|
|
||
| VideoEncoder(frames=frames, frame_rate=30).to_file(dest=dest, codec=codec_spec) | ||
| actual_codec_spec = self._get_codec_spec(dest) | ||
| assert actual_codec_spec == codec_spec | ||
|
|
||
| @pytest.mark.skipif( | ||
| in_fbcode(), | ||
| reason="ffprobe not available internally", | ||
| ) | ||
| @pytest.mark.parametrize( | ||
| "codec_spec,codec_impl", | ||
| [ | ||
| ("h264", "libx264"), | ||
| ("av1", "libaom-av1"), | ||
| pytest.param( | ||
| "vp9", | ||
| "libvpx-vp9", | ||
| marks=pytest.mark.skipif( | ||
| IS_WINDOWS, reason="vp9 codec not available on Windows" | ||
| ), | ||
| ), | ||
| ], | ||
| ) | ||
| def test_codec_spec_vs_impl_equivalence(self, tmp_path, codec_spec, codec_impl): | ||
| # Test that using codec spec gives the same result as using default codec implementation | ||
| # We cannot directly check codec impl used, so we assert frame equality | ||
| frames = torch.randint(0, 256, (10, 3, 64, 64), dtype=torch.uint8) | ||
|
|
||
| spec_output = str(tmp_path / "spec_output.mp4") | ||
| VideoEncoder(frames=frames, frame_rate=30).to_file( | ||
| dest=spec_output, codec=codec_spec | ||
| ) | ||
|
|
||
| impl_output = str(tmp_path / "impl_output.mp4") | ||
| VideoEncoder(frames=frames, frame_rate=30).to_file( | ||
| dest=impl_output, codec=codec_impl | ||
| ) | ||
|
|
||
| assert self._get_codec_spec(spec_output) == codec_spec | ||
| assert self._get_codec_spec(impl_output) == codec_spec | ||
|
|
||
| frames_spec = self.decode(spec_output).data | ||
| frames_impl = self.decode(impl_output).data | ||
| torch.testing.assert_close(frames_spec, frames_impl, rtol=0, atol=0) | ||
|
Contributor
Great tests!
||
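The `_get_codec_spec` test helper in the diff above shells out to ffprobe to read the codec name of the first video stream. A standalone sketch of the same check is below, split so the command can be inspected without ffprobe installed. The function names `ffprobe_codec_command` and `probe_codec_name` are illustrative and do not appear in the PR; the ffprobe flags are copied from the test helper.

```python
import subprocess
from pathlib import Path
from typing import List, Union


def ffprobe_codec_command(file_path: Union[str, Path]) -> List[str]:
    """Build the ffprobe command that prints only the codec name of stream v:0."""
    return [
        "ffprobe",
        "-v", "error",                      # suppress everything except errors
        "-select_streams", "v:0",           # first video stream only
        "-show_entries", "stream=codec_name",
        "-of", "default=noprint_wrappers=1:nokey=1",  # bare value, no key or wrapper
        str(file_path),
    ]


def probe_codec_name(file_path: Union[str, Path]) -> str:
    """Run ffprobe and return the codec name, e.g. "h264".

    Requires ffprobe on PATH; raises CalledProcessError if ffprobe fails.
    """
    result = subprocess.run(
        ffprobe_codec_command(file_path),
        capture_output=True,
        check=True,
        text=True,
    )
    return result.stdout.strip()
```

Because ffprobe reports the codec specification rather than the encoder implementation, the tests compare against names such as `h264` even when the file was written with `codec="libx264"`.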
I just realized oformat is a pointer so let's add