<h3 align = center>How to make a simple video using FFMPEG</h3>

Introduction: Recently, I helped someone make YouTube videos. Originally I used a GUI video editor, but the free version has a lot of limitations. Plus, a command-line tool can be faster for certain tasks (though for others a GUI is better). I used AI to write the majority of the code here. At first I thought it would be easy, but it turned out to be a lot more complicated than I expected, as I will explain below. Complicated as it was, I learnt some technical things about videos along the way. <br>
And, recently, I started learning data science and learnt to use Jupyter Notebook. I find Jupyter Notebook a lot easier when I need to run a script of many different functions. <br>
In this tutorial, you will learn some basics of making a video using FFMPEG, including: <br>
- how to overlay video, music and images on the main video
- how to combine different videos
- how to trim a video
- how to use subtitles in ASS format and burn them into a video
- how to add a mosaic
- how to extract audio and convert it to text
- how to translate the subtitles into different languages
- how to zoom in on an image
- how to concatenate videos
- how to insert a black still (or another color) with text when introducing a topic


<h3 align = center>Things you need to download</h3>
The most important is <i><b>ffmpeg</b></i>. Use Homebrew to install it (I am using a Mac). <br>
Here are the Python packages I use:  <br>
<i>subprocess, os, sys, shlex, pathlib, re, datetime, math, json, PIL, fractions, matplotlib, tempfile, shutil, cv2, numpy, whisper, opencc</i> <br>
The code we are using here is in <b><i>video_commands.py</i></b>.


<h3 align = center> Let's begin! </h3>
The media files used in this file are in the folder <i>media_files</i>. The main video is a documentary from 1944 called <i>Marines at Tarawa - Return to Guam</i>. Originally it is 40 minutes long, but I have already trimmed it to 5 minutes. This trimmed file, <i><b>marines_5min.mov</b></i>, is the main video of this tutorial. If you want to trim it yourself, you can download the original here: <br>
<a href ="https://archive.org/details/publicmovies212/Marines_at_Tarawa_Return_to_Guam.webm">https://archive.org/details/publicmovies212/Marines_at_Tarawa_Return_to_Guam.webm</a>

<h3 align = center>Trimming a video </h3>
Say we want to extract <i>11:00 to 16:00</i> from the video Marines_at_Tarawa_Return_to_Guam.mp4 (the original 40-min video) and save the output as <i>media_files/marines_5min.mov</i>. Here is the ffmpeg command:<br><br>
<i>ffmpeg -y -ss 11:00 -to 16:00 -i Marines_at_Tarawa_Return_to_Guam.mp4 -c:v libx264 -preset veryfast -crf 18 -c:a copy  media_files/marines_5min.mov</i> <br><br>
Below is the python version. Pay attention to these parameters in the code:<br><br>
<b><i>-c:v libx264 -preset veryfast -crf 18</i></b>  <br><br>
H.264 (encoded here with libx264) is a very common compression format and is compatible with most video players. An efficient compression format can give you similar video quality at a smaller file size. (This is something new I learnt. I used to think the larger the size, the better the quality!) <br><br>
And the actual quality depends on <b><i>crf</i></b> and <b><i>preset</i></b>. <b><i>crf</i></b> ranges from 0 to 51 for 8-bit libx264 (the default is 23). The smaller the value, the better the quality and the larger the file. <b><i>preset</i></b> runs from "ultrafast" through "superfast", "veryfast", "faster", "fast", "medium", "slow" and "slower" to "veryslow"; a slower preset takes longer to encode but compresses more efficiently at the same <b><i>crf</i></b>. These two together determine the encoding time and file size. If you choose a very small <b><i>crf</i></b>, you can end up with a video a lot larger than the original, and with "veryslow" it will also take a long time to encode.<br><br>

Usually, <b><i> -crf 18 -preset fast</i></b> is good enough.
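To make the parameters above concrete, here is a minimal sketch of how the trim command could be assembled in Python. The helper name `build_trim_command` is hypothetical and is not the `trim_video()` function from <i>video_commands.py</i>; it only rebuilds the ffmpeg command shown earlier as an argument list.

```python
import shlex

def build_trim_command(src, start, end, dst, crf=18, preset="fast"):
    """Return the ffmpeg argument list for a re-encoding trim (hypothetical helper)."""
    return [
        "ffmpeg", "-y",                       # overwrite the output without asking
        "-ss", str(start), "-to", str(end),   # start and end time, placed before -i
        "-i", src,
        "-c:v", "libx264", "-preset", preset, "-crf", str(crf),
        "-c:a", "copy",                       # audio stream is copied, not re-encoded
        dst,
    ]

cmd = build_trim_command(
    "Marines_at_Tarawa_Return_to_Guam.mp4", "11:00", "16:00",
    "media_files/marines_5min.mov",
)
print(shlex.join(cmd))
# To actually run it (needs ffmpeg installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell-quoting problems with file names that contain spaces.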



In [None]:
from video_commands import * 
main_video = "Marines_at_Tarawa_Return_to_Guam.mp4"
start_time = "11:00" # you can enter 1) an integer number of seconds, 2) mm:ss, or 3) hh:mm:ss
end_time = "16:00" 
output_file = "media_files/marines_5min.mov"
trim_video(main_video, start_time, end_time, output_file,crf=18, preset = "fast")
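The three accepted time formats (seconds, mm:ss, hh:mm:ss) can all be reduced to seconds. The sketch below shows one way to do it; `to_seconds` is a hypothetical helper, and the real parsing inside <i>video_commands.py</i> may differ.

```python
def to_seconds(value):
    """Convert a number, 'mm:ss', or 'hh:mm:ss' value to seconds (hypothetical helper)."""
    if isinstance(value, (int, float)):
        return float(value)
    parts = value.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unrecognised time: {value!r}")
    seconds = 0.0
    for part in parts:
        # Each colon-separated field is worth 60 times the field after it.
        seconds = seconds * 60 + float(part)
    return seconds

print(to_seconds("11:00"), to_seconds("16:00"), to_seconds("0:11:00"), to_seconds(45))
```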

<h3 align = center>Codec of a video</h3>
Now that we have produced the main video, let's take a look at its codec (technical details). It has video and audio parts:<br>
ffprobe command for video: <br>
<i>ffprobe -v error -select_streams v:0 -show_entries stream=codec_name,width,height,r_frame_rate,pix_fmt -of csv=p=0 media_files/marines_5min.mov</i><br>
ffprobe command for audio: <br>
<i>ffprobe -v error -select_streams a:0 -show_entries stream=codec_name,sample_rate,channels -of csv=p=0 media_files/marines_5min.mov</i>
<br><br>
Now let's run the code below:<br>

In [None]:
from video_commands import * 
file_list = ["media_files/marines_5min.mov"]
print_media_info(file_list)

Here is the output on the screen when you run the python code above:<br><br>

---- media_files/marines_5min.mov ---- <br>
video info:  ffprobe -v error -select_streams v:0 -show_entries stream=codec_name,width,height,r_frame_rate,pix_fmt -of csv=p=0 <br> media_files/marines_5min.mov<br>
Video: h264,556,412,yuv420p,25/1 <br>
audio info:  ffprobe -v error -select_streams a:0 -show_entries stream=codec_name,sample_rate,channels -of csv=p=0 <br>media_files/marines_5min.mov<br>
Audio: aac,44100,2<br>
video start and end time:0.000000,300.000000<br>
audio start and end time:0.000000,300.009002<br>

For the video part: <br>
<i>h264</i> is the compression format I just talked about<br>
<i>556,412 </i>are the width and height of the video respectively<br>
<i>yuv420p</i> is the pixel format: 8-bit YUV with 4:2:0 chroma subsampling. It is the format most video players support, so it is a safe choice.<br>
<i>25/1</i> is the frame rate<br>

For the audio part: <br>
<i>aac</i> is a very common audio codec<br>
<i>44100</i> (44.1 kHz) is the standard sample rate for audio CDs<br>
<i>2</i> is the number of audio channels<br>
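The comma-separated lines printed by ffprobe are easy to turn into named fields. The following is a small sketch that assumes the field order seen in the output above (codec, width, height, pixel format, frame rate for video; codec, sample rate, channels for audio). The function names are hypothetical.

```python
from fractions import Fraction

def parse_video_line(line):
    """Parse a line such as 'h264,556,412,yuv420p,25/1' (field order assumed)."""
    codec, width, height, pix_fmt, rate = line.strip().split(",")
    return {
        "codec": codec,
        "width": int(width),
        "height": int(height),
        "pix_fmt": pix_fmt,
        "fps": float(Fraction(rate)),  # '25/1' -> 25.0, '30000/1001' -> about 29.97
    }

def parse_audio_line(line):
    """Parse a line such as 'aac,44100,2' (field order assumed)."""
    codec, sample_rate, channels = line.strip().split(",")
    return {"codec": codec, "sample_rate": int(sample_rate), "channels": int(channels)}

print(parse_video_line("h264,556,412,yuv420p,25/1"))
print(parse_audio_line("aac,44100,2"))
```

Having the values as a dictionary makes it simple to compare two files field by field before concatenating them.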

When concatenating videos of different formats, they have to be re-encoded first or the concatenation will fail. We will get back to this later. <br>
The most important thing to note here is the final lines: <br>
<i>video start and end time:0.000000,300.000000<br>
audio start and end time:0.000000,300.009002</i><br>

You can see the video and audio don't have the same duration. This causes the audio and video to drift out of sync when concatenating different videos (even though the difference is < 0.1s, the drift is noticeable). I struggled a lot to understand this. I used to think it was a problem with the ffmpeg command, but it is not. You have to trim either the video or the audio so that the durations match. We will do this later.<br>
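As a quick illustration of the mismatch, the sketch below takes the two end times reported above and works out the gap and the common duration both streams could be trimmed to. `av_mismatch` is a hypothetical helper, not part of <i>video_commands.py</i>.

```python
def av_mismatch(video_end, audio_end, fps):
    """Compare video and audio end times (seconds) of one file (hypothetical helper)."""
    gap = abs(audio_end - video_end)
    return {
        "common_duration": min(video_end, audio_end),  # trim the longer stream to this
        "gap_seconds": gap,
        "gap_frames": gap * fps,                       # the same gap measured in video frames
        "longer_stream": "audio" if audio_end > video_end else "video",
    }

info = av_mismatch(video_end=300.000000, audio_end=300.009002, fps=25)
print(info)
```

For this file the audio is the longer stream, so it is the audio that would be trimmed to 300 seconds.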


<h3 align = center>Splitting a video into sections and combining with black stills</h3>
For my video, I need to insert a few black stills to separate the video into different sections. The black still looks like this:
<p align="center">
    <img src="https://helen-poon.github.io/ffmpeg_video/black_screen.png" 
         alt="Black screen separator" 
         style="max-width: 100%; height: auto;">
</p>


Say, the first black still appears at 1:30, the second one 2:30 and the third one 4:00. Then we have to split the video into different sections: <br>
1) 0:00 - 1:30 <br>
2) 1:30 - 2:30 <br>
3) 2:30 - 4:00 <br>
4) 4:00 - the end <br>
First, let's split the main video into 4 sections, namely <i>video1.mov</i>, <i>video2.mov</i>, <i>video3.mov</i> and <i>video4.mov</i><br><br>
<i>ffmpeg -y -ss 0.000 -i media_files/marines_5min.mov -t 90.000 -c:v libx264 -preset fast -c:a pcm_s16le -ar 48000 -ac 2 -movflags +faststart video1.mov</i><br>
<i> ffmpeg -y -ss 90.000 -i media_files/marines_5min.mov -t 60.000 -c:v libx264 -preset fast -c:a pcm_s16le -ar 48000 -ac 2 -movflags +faststart video2.mov</i><br>
<i>ffmpeg -y -ss 150.000 -i media_files/marines_5min.mov -t 90.000 -c:v libx264 -preset fast -c:a pcm_s16le -ar 48000 -ac 2 -movflags +faststart video3.mov </i><br>
<i>ffmpeg -y -ss 240.000 -i media_files/marines_5min.mov -t 60.009 -c:v libx264 -preset fast -c:a pcm_s16le -ar 48000 -ac 2 -movflags +faststart video4.mov
 </i><br><br>

 Below is the python code. Let's run it.<br>


In [None]:
main_video = "media_files/marines_5min.mov"
the_end = get_video_length(main_video) # we retrieve the exact end time in hh:mm:ss by calling this function
sections = ["0:00 - 1:30","1:30 - 2:30","2:30 - 4:00",f"4:00-{the_end}"]
#Below are the output files. The length of this list has to match the length of sections
output_files = ["video1.mov","video2.mov","video3.mov","video4.mov"] 
split_video(main_video, sections, output_files, audio_codec="wav")
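Each section string has to become a start offset (`-ss`) and a duration (`-t`) for ffmpeg. Below is a minimal sketch of that conversion. The helper names are hypothetical, and the end time `00:05:00.009` is an assumed example of what `get_video_length()` might return.

```python
def to_seconds(text):
    """Convert 'ss', 'mm:ss' or 'hh:mm:ss' (fractions allowed) to seconds."""
    seconds = 0.0
    for part in text.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def section_to_seek(section):
    """Turn 'start - end' into (start_seconds, duration_seconds)."""
    start_text, end_text = section.split("-")
    start, end = to_seconds(start_text), to_seconds(end_text)
    if end <= start:
        raise ValueError(f"section ends before it starts: {section!r}")
    return start, end - start

sections = ["0:00 - 1:30", "1:30 - 2:30", "2:30 - 4:00", "4:00-00:05:00.009"]
for section in sections:
    start, duration = section_to_seek(section)
    print(f"-ss {start:.3f} -t {duration:.3f}")
```

The printed values correspond to the `-ss` and `-t` arguments in the four ffmpeg commands above.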

<h3 align = center>Creating a black still</h3>
Now we are going to create some black stills to be inserted into the video. The main video has 4 different sections, so we insert 3 black stills. Say the texts are "Section 1", "Section 2", and "Section 3", and each lasts for 5 seconds. The output names are "black1.mov", "black2.mov" and "black3.mov". We first create a blank black still with the same dimensions (556x412) as the main video:<br><br>

<i>ffmpeg -y -f lavfi -i color=c=black:s=556x412:r=25.0:d=5 -f lavfi -i anullsrc=r=44100:cl=stereo -shortest -c:v libx264 -pix_fmt yuv420p -c:a pcm_s16le temp_black.mp4</i><br>

Then Python creates an ASS subtitle file called <i>temp_sub.ass</i> containing the text. Finally, <i>temp_sub.ass</i> is burnt into <i>temp_black.mp4</i>, and the final output is what we want.

<i>ffmpeg -y -i temp_black.mp4 -vf ass=temp_sub.ass -c:v libx264 -pix_fmt yuv420p -c:a copy black1.mov</i>
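For readers curious what such an ASS file contains, here is a minimal sketch that writes one centered line of text. It is an assumption about what a suitable file could look like, not the exact file produced by `create_black_still()`; the function names and the style values other than font, size and color are illustrative.

```python
def ass_time(seconds):
    """Format seconds as the H:MM:SS.cc timestamp used in ASS files."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, c = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{c:02d}"

STYLE_FORMAT = ("Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, "
                "BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, "
                "Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding")

def make_ass(text, duration, width, height,
             font_name="Arial", font_size=72, font_color="&H00FFFFFF"):
    """Build a one-line ASS subtitle; Alignment 5 places the text in the middle of the frame."""
    style = (f"Default,{font_name},{font_size},{font_color},&H000000FF,&H00000000,"
             "&H00000000,0,0,0,0,100,100,0,0,1,0,0,5,10,10,10,1")
    lines = [
        "[Script Info]",
        "ScriptType: v4.00+",
        f"PlayResX: {width}",
        f"PlayResY: {height}",
        "",
        "[V4+ Styles]",
        f"Format: {STYLE_FORMAT}",
        f"Style: {style}",
        "",
        "[Events]",
        "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text",
        f"Dialogue: 0,{ass_time(0)},{ass_time(duration)},Default,,0,0,0,,{text}",
    ]
    return "\n".join(lines) + "\n"

ass_text = make_ass("Section 1", duration=5, width=556, height=412)
print(ass_text)
# To save it: open("temp_sub.ass", "w", encoding="utf-8").write(ass_text)
```

Setting `PlayResX` and `PlayResY` to the video dimensions keeps the font size in the same pixel scale as the video.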




In [None]:
from video_commands import * 
main_video = "media_files/marines_5min.mov"
text = ["Section 1","Section 2","Section 3"]
output_files = ["black1.mov","black2.mov","black3.mov"]
duration = 5 # in seconds
for txt, output_name in zip(text, output_files):
    output_file = create_black_still(
        main_video, txt, duration, output_name,
        font_name="Arial", font_size=72,
        font_color="&H00FFFFFF",  # solid white
    )

The black stills are produced with the same video codec as the main video. We can check them:

In [None]:
file_list = ["media_files/marines_5min.mov","video1.mov","video2.mov","video3.mov","video4.mov",
            "black1.mov","black2.mov","black3.mov"]
print_media_info(file_list)

You can see the video codecs are the same, but the audio codec is <i>pcm_s16le</i> for the segmented videos and black stills. This format is lossless and preserves the quality. But for the final audio format, we will convert to aac. Also note that for the black stills, the audio duration and video duration do not match. We are going to deal with this with the following function. It first extracts the codec from the main video, then converts the other videos to the same codec, and finally trims the video or audio so that their durations match.<br>
<b>Note that this step is actually unnecessary because it is incorporated in the next function <i>combine_video()</i></b>. I am just showing what re-encoding does here. This process may take more than 10 minutes. To speed things up, change the <i>crf</i> and <i>preset</i> values.<br>


In [None]:
from video_commands import * 
main_video = "media_files/marines_5min.mov"
# The main video serves as the "standard" codec for others to follow
list_to_reencode = [main_video,"video1.mov","video2.mov","video3.mov","video4.mov",
            "black1.mov","black2.mov","black3.mov"]
reencoded_file_names, reencoded_file_dict = reencode_to_match(main_video, list_to_reencode,crf="50", preset="ultrafast")
# reencoded_file_names holds the new file names; reencoded_file_dict maps each
# original video name (key) to its re-encoded video (value).
print(reencoded_file_names, reencoded_file_dict)

Now we have the re-encoded video names, and we can check the codec again. Each re-encoded video gets a default new name. If the original name is <i>xxx.mov </i>, then the new name is <i>xxx_reencoded_padded.mov </i>
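The naming rule can be reproduced with `pathlib`. This is only a sketch of the rule as described here; `reencoded_name` is a hypothetical helper, and how `reencode_to_match()` handles folders is not something this sketch claims to know.

```python
from pathlib import Path

def reencoded_name(path, suffix="_reencoded_padded"):
    """Insert the suffix between the file stem and its extension."""
    p = Path(path)
    return str(p.with_name(p.stem + suffix + p.suffix))

for name in ["video1.mov", "black1.mov"]:
    print(name, "->", reencoded_name(name))
```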


In [None]:
list_of_videos = ["video1_reencoded_padded.mov","video2_reencoded_padded.mov","video3_reencoded_padded.mov",
"video4_reencoded_padded.mov","black1_reencoded_padded.mov","black2_reencoded_padded.mov",
                 "black3_reencoded_padded.mov"]
print_media_info(list_of_videos)

You can see that the audio and video durations now match and the codecs are all the same. We can combine them safely now. The code creates a temporary folder and puts all the intermediate files there. It first re-encodes all the videos to convert them to the same codec, then combines them. <b>Remember the above function <i>reencode_to_match()</i> is unnecessary.</b><br>
Now let's combine the videos:<br>

In [None]:
#input the video list in order of concatenation
video_list = ["video1.mov","black1.mov","video2.mov","black2.mov","video3.mov","black3.mov","video4.mov"]
#primary index is the video which serves as a "model" for reencoding. All other videos will have the same codec as this one. 
#Note the first video has a primary_index "1", not "0"
combine_video(video_list, primary_index=1, output_file="marines_5min_new.mov", crf=18, preset="fast")
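For reference, one common way to join clips that already share the same codec settings is ffmpeg's concat demuxer, which reads a text file listing the inputs. The sketch below builds that list and the matching command. It is an illustration only: `combine_video()` may well work differently, the helper names are hypothetical, and the file names are assumed to contain no single quotes.

```python
def concat_list_text(paths):
    """Return the contents of a concat-demuxer list file, one clip per line."""
    return "".join(f"file '{p}'\n" for p in paths)

def concat_command(list_file, output_file):
    """ffmpeg command that joins the listed clips without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output_file]

clips = ["video1.mov", "black1.mov", "video2.mov", "black2.mov",
         "video3.mov", "black3.mov", "video4.mov"]
print(concat_list_text(clips))
print(" ".join(concat_command("concat_list.txt", "marines_5min_new.mov")))
```

Stream copy (`-c copy`) only gives a clean result when every clip has matching codec parameters, which is why the re-encoding step matters.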

Now run the code below and you can see the black stills at 1:30, 2:35 and 4:10, each lasting 5 s.
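These timestamps follow from adding up the clip durations in concatenation order. A short sketch of that arithmetic, using the nominal durations from this tutorial (the helper names are hypothetical):

```python
def still_start_times(clips):
    """clips: list of (name, duration_seconds, is_still). Returns start times of the stills."""
    elapsed = 0
    starts = []
    for name, duration, is_still in clips:
        if is_still:
            starts.append(elapsed)
        elapsed += duration
    return starts

def mmss(seconds):
    """Format whole seconds as m:ss."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"

clips = [
    ("video1.mov", 90, False), ("black1.mov", 5, True),
    ("video2.mov", 60, False), ("black2.mov", 5, True),
    ("video3.mov", 90, False), ("black3.mov", 5, True),
    ("video4.mov", 60, False),
]
print([mmss(t) for t in still_start_times(clips)])  # ['1:30', '2:35', '4:10']
```

Each still pushes everything after it back by 5 seconds, which is why the second and third stills appear later than the original split points of 2:30 and 4:00.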

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/Jou_m0InGRQ" frameborder="0" allowfullscreen></iframe>

<h3 align = center >Overlaying videos, images and music</h3>
Now we can proceed to overlaying other videos, images and music on the main video. We can even zoom in on an image.

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/Jou_m0InGRQ" frameborder="0" allowfullscreen></iframe>