[RFC] VideoEncoder Python API

This RFC proposes the design / parameters exposed in our VideoEncoder Python API.

The implementation aims to be simple to use without a deep understanding of FFmpeg. To achieve this, the encoder should rely on FFmpeg to select the default codec and most, if not all, other parameters similar to the FFmpeg CLI.

### Proposed API
```python
def VideoEncoder(
    frames: Tensor,
    frame_rate: int,
) -> None:
	pass


def to_file(
  dest: str
) -> None:
	pass

VideoEncoder(frames=frames, frame_rate=25).to_file("output.mov") 


def to_tensor(
  format: str
) -> torch.Tensor:
	pass

encoded_tensor = VideoEncoder(frames=frames, frame_rate=25).to_tensor("mov")
```

This exposes the following parameters:
* frame_rate, frames
  *  We cannot reasonably assume a frame_rate, so it must be provided. 

These parameters were considered, but will not be exposed:
* codec
  * We will rely on FFmpeg to select a valid default codec for the provided container format. 
* crf, bit_rate
  * Internally, we can set crf to ensure the encoder is handling the frame data correctly. Generally, we will allow FFmpeg to determine quality via bit_rate or crf based on the codec's defaults.
* gop_size, max_b_frames
  * These parameters affect frame compression and seeking performance. We can rely on the codec's default values for now.
* output_pixel_format
  * We can rely on FFmpeg to select the best pixel format using [avcodec_find_best_pix_fmt_of_list](https://www.ffmpeg.org/doxygen/5.1/group__lavc__misc__pixfmt.html#ga9e74b43a3433ccfe836814f0a6371aa0), which aims to minimize loss. For lossless encoding, this should default to YUV444 when available.

### Questions
* What would a multistream encoder look like, given this proposal and the AudioEncoder?

1. Modular approach, similar to one laid out in [Audio Encoder Design](https://github.com/NicolasHug/torchcodec/blob/encoder_design/audio_encoder_design.md#thinking-ahead)
```python
Encoder()
	.add_audio(AudioEncoder(samples, sample_rate))
	.add_video(VideoEncoder(frames, frame_rate))
	.to_file(filename)
```

2. Combined approach, similar to [torchvision's write_video](https://github.com/pytorch/vision/blob/e3b5d3a8bf5e8636462fd8bce9897bccc690b2a0/torchvision/io/video.py#L58)
```python

Encoder(audio_samples=samples, sample_rate=sample_rate, video_frames=frames, frame_rate=frame_rate)
	.to_file(filename)
```
* How (if at all) should the VideoEncoder report to the user that their FFmpeg installation is missing a default codec? 
  * For reference, this is an error raised by FFmpeg CLI when attempting to encode to `webm` on FFmpeg 4.3.2: 
`Automatic encoder selection failed for output stream #0:0. Default encoder for format webm (codec vp8) is probably disabled. Please choose an encoder manually.`
* Which (if any) additional quality parameters should be exposed? Ex. [preset](https://trac.ffmpeg.org/wiki/Encode/H.264#Preset), [CRF](https://trac.ffmpeg.org/wiki/Encode/H.264#crf)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[RFC] VideoEncoder Python API #907

Proposed API

Questions

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[RFC] VideoEncoder Python API #907

Description

Proposed API

Questions

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions