<a href="https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/youtube/video/convert_video_basic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Augmenting Video with FFMPEG

The code given in this utility runs the video through YOLO and put boxes around objects that YOLO recognizes.  The resolution and audio of the original video are maintained.  This code is a starting point for other offline reality augmentation projects that might use TensorFlow and video processing.  This script is based on [ffmpeg](https://en.wikipedia.org/wiki/FFmpeg), and assumes you have it installed.

Some other useful links:

* [YouTube Video About this Utility]()
* [More Basic Example of this Utility](https://github.com/jeffheaton/present/blob/master/youtube/video/convert_video_basic.ipynb)
* [GitHub Repository for this Utility](https://github.com/jeffheaton/present/blob/master/youtube/video/convert_video_basic.ipynb)
* [FFMPEG](https://www.ffmpeg.org/)

The following code allows you to upload a video file to this utility. This utility is designed to be run from Google CoLab.  Make sure to use a GPU for maxamum performance.

In [0]:
import os
from google.colab import files

uploaded = files.upload()

if len(uploaded)>1:
  print("Warning: you should only upload one video file. Only one will be processed.")

for name in uploaded.keys():
  data = uploaded[name]
  path = os.path.join('/content/',name)
  print(f"Uploaded: {path}")
  with open(path,"wb") as fp:
    fp.write(data)
    input_file = path

Saving demo_sync.mov to demo_sync.mov
Uploaded: /content/demo_sync.mov


This Python code runs a given command from the shell.  The primary purpose of this function is to run ffmpeg and capture its output collected for display, even in a Jupyter notebook.  This function creates a temporary to capture the output, I would prefer to use a string to capture the output, but was not able to get that working in CoLab.  If anyone has suggestions, please push a change or post an issue with a suggestion.

In [0]:
def execute_command(cmd):
  with open("temp.txt", 'w') as fp:
    subprocess.call(cmd, shell=True, stdout=fp)

  with open("temp.txt", 'r') as fp:
    result = fp.read()

  print(f"Executed command: {cmd}, result:")
  print(result)
  print("---------\n")
  return result.split('\n')

Next we call ffmpeg to convert the input video into a series of JPEG files and a .wav file that contains the audio.  We also query the file to determine the frames per second (FPS).  If we cannot determine FPS, 30 are assumed.

In [0]:
import os
import re
import subprocess
from scipy.io import wavfile

# Constants
FRAME_QUALITY = 3
SAMPLE_RATE = 44100
input_file = "/content/demo_sync.mov"
output_path = "/content/out.mp4"
temp_path = "/content/tmp"

# The input images (from video) and generated output images.
input_images = os.path.join(temp_path,'input-%d.jpg')
output_images = os.path.join(temp_path,'output-%d.jpg')

# Create a temporary directory to hold video frames
try:  
  os.mkdir(temp_path)
except OSError:  
  print("Temp dir already exists.")

# Delete audio file if it already exists
audio_file = os.path.join(temp_path,'audio.wav')

if os.path.exists(audio_file):
  os.remove(audio_file)

# First call to ffmpeg extracts the video image frames
execute_command(f"ffmpeg -i {input_file} -qscale:v {FRAME_QUALITY} {input_images} -hide_banner 2>&1")

# Second call to ffmpeg extracts the audio.  We also attempt to get the FPS from
# this call.
results = execute_command(f"ffmpeg -i {input_file} -ab 160k -ac 2 -ar {SAMPLE_RATE} -vn {audio_file} 2>&1")

frame_rate = 30 # default, but try to detect
for line in results:
  m = re.search('Stream #.*Video.* ([0-9]*) fps',line)
  if m is not None:
    frame_rate = float(m.group(1))
    print(f"Detected framerate of {frame_rate}")

# Report on the frame rate and attempt to obtain audio sample rate.
print(f"Frame rate used: {frame_rate}")

sampleRate, audioData = wavfile.read(audio_file)
audioSampleCount = audioData.shape[0]
#maxAudioVolume = getMaxVolume(audioData)

Executed command: ffmpeg -i /content/demo_sync.mov -qscale:v 3 /content/tmp/input-%d.jpg -hide_banner 2>&1, result:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/content/demo_sync.mov':
  Metadata:
    major_brand     : qt  
    minor_version   : 0
    compatible_brands: qt  
    creation_time   : 2020-04-03T00:19:33.000000Z
    com.apple.quicktime.location.ISO6709: +38.6253-090.5469+175.781/
    com.apple.quicktime.make: Apple
    com.apple.quicktime.model: iPhone X
    com.apple.quicktime.software: 13.3.1
    com.apple.quicktime.creationdate: 2020-04-02T19:18:24-0500
    com.apple.photos.originating.signature: AeNTOcgNQ/FA679XBLgOpeC3Vynj
  Duration: 00:00:10.73, start: 0.000000, bitrate: 10833 kb/s
    Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 99 kb/s (default)
    Metadata:
      creation_time   : 2020-04-03T00:19:33.000000Z
      handler_name    : Core Media Data Handler
    Stream #0:1(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709)

# Process the Video Frames in Some Way

In this example we are running each through YOLO.  First load YOLO package.

In [0]:
%tensorflow_version 2.x
import sys

!{sys.executable} -m pip install git+https://github.com/zzh8829/yolov3-tf2.git@master

Collecting git+https://github.com/zzh8829/yolov3-tf2.git@master
  Cloning https://github.com/zzh8829/yolov3-tf2.git (to revision master) to /tmp/pip-req-build-0qeeun3l
  Running command git clone -q https://github.com/zzh8829/yolov3-tf2.git /tmp/pip-req-build-0qeeun3l
Building wheels for collected packages: yolov3-tf2
  Building wheel for yolov3-tf2 (setup.py) ... [?25l[?25hdone
  Created wheel for yolov3-tf2: filename=yolov3_tf2-0.1-cp36-none-any.whl size=9219 sha256=b84eef6dfa384721b6543565868cb0dcded3bb21d63535ec917a98ab457a405c
  Stored in directory: /tmp/pip-ephem-wheel-cache-wx7doukt/wheels/59/1b/97/905ab51e9c0330efe8c3c518aff17de4ee91100412cd6dd553
Successfully built yolov3-tf2
Installing collected packages: yolov3-tf2
Successfully installed yolov3-tf2-0.1


Now download YOLO weights, this might take 5-10 minutes, even on CoLab.

In [0]:
import tensorflow as tf
import os

ROOT = '/content'

filename_darknet_weights = tf.keras.utils.get_file(
    os.path.join(ROOT,'yolov3.weights'),
    origin='https://pjreddie.com/media/files/yolov3.weights')
TINY = False

filename_convert_script = tf.keras.utils.get_file(
    os.path.join(os.getcwd(),'convert.py'),
    origin='https://raw.githubusercontent.com/zzh8829/yolov3-tf2/master/convert.py')

filename_classes = tf.keras.utils.get_file(
    os.path.join(ROOT,'coco.names'),
    origin='https://raw.githubusercontent.com/zzh8829/yolov3-tf2/master/data/coco.names')
filename_converted_weights = os.path.join(ROOT,'yolov3.tf')

Downloading data from https://pjreddie.com/media/files/yolov3.weights
Downloading data from https://raw.githubusercontent.com/zzh8829/yolov3-tf2/master/convert.py
Downloading data from https://raw.githubusercontent.com/zzh8829/yolov3-tf2/master/data/coco.names


Transform weights into a format usable by TensorFlow.

In [0]:
import sys
!{sys.executable} "{filename_convert_script}" --weights "{filename_darknet_weights}" --output "{filename_converted_weights}"

2020-04-06 01:32:03.397347: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
Model: "yolov3"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input (InputLayer)              [(None, None, None,  0                                            
__________________________________________________________________________________________________
yolo_darknet (Model)            ((None, None, None,  40620640    input[0][0]                      
__________________________________________________________________________________________________
yolo_conv_0 (Model)             (None, None, None, 5 11024384    yolo_darknet[1][2]               
_________________________________________________________________________

In [0]:
Setup YOLO before beginning frame conversion.

In [0]:
from absl import app, flags, logging
from absl.flags import FLAGS
import cv2
import numpy as np
import tensorflow as tf
from yolov3_tf2.models import (YoloV3, YoloV3Tiny)
from yolov3_tf2.dataset import transform_images, load_tfrecord_dataset
from yolov3_tf2.utils import draw_outputs
import sys
from PIL import Image, ImageFile
import requests

# Flags are used to define several options for YOLO.
flags.DEFINE_string('classes', filename_classes, 'path to classes file')
flags.DEFINE_string('weights', filename_converted_weights, 'path to weights file')
flags.DEFINE_boolean('tiny', False, 'yolov3 or yolov3-tiny')
flags.DEFINE_integer('size', 416, 'resize images to')
flags.DEFINE_string('tfrecord', None, 'tfrecord instead of image')
flags.DEFINE_integer('num_classes', 80, 'number of classes in the model')
FLAGS([sys.argv[0]])

# Locate devices to run YOLO on (e.g. GPU)
physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

  
# This example does not use the "Tiny version"
if FLAGS.tiny:
    yolo = YoloV3Tiny(classes=FLAGS.num_classes)
else:
    yolo = YoloV3(classes=FLAGS.num_classes)

# Load weights and classes
yolo.load_weights(FLAGS.weights).expect_partial()
print('weights loaded')

class_names = [c.strip() for c in open(FLAGS.classes).readlines()]
print('classes loaded')

weights loaded
classes loaded


In [0]:
# Nicely formatted time string
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return f"{h}:{m:>02}:{s:>05.2f}"

Process one YOLO image with the function process_image, then loop over all.

In [0]:
import time

def process_image(img_raw):
  # Preprocess image
  img = tf.expand_dims(img_raw, 0)
  img = transform_images(img, FLAGS.size)

  # Desired threshold (any sub-image below this confidence level will be ignored.)
  FLAGS.yolo_score_threshold = 0.5

  # Recognize and report results
  boxes, scores, classes, nums = yolo(img)

  # Display image using YOLO library's built in function
  img = img_raw.numpy()
  img = draw_outputs(img, (boxes, scores, classes, nums), class_names)
  with open(output_filename,"wb") as fp:
    fp.write(tf.io.encode_jpeg(img).numpy())

i = 1
done = False
frame_length = frame_rate/1000
total_video_processed = 0

while not done:
  input_filename = os.path.join(temp_path,f'input-{i}.jpg')
  output_filename = os.path.join(temp_path,f'output-{i}.jpg')

  if os.path.exists(input_filename):
    with open(input_filename,'rb') as fp:
      img_bin = fp.read()
      img_raw = tf.image.decode_image(img_bin, channels=3)
      process_image(img_raw)
  else:
    done = True

  total_video_processed += frame_length

  if i%100 == 0 or done:
    print(f"Processed image: {i}, video processed: {hms_string(total_video_processed)}")
  i+=1

Processed image: 100, video processed: 0:00:03.00
Processed image: 200, video processed: 0:00:06.00
Processed image: 300, video processed: 0:00:09.00
Processed image: 323, video processed: 0:00:09.69


# Build Final Video File

At this point we have all of the images generated and audio file.  We use ffmpeg to put it all back together into a video file.

In [0]:
if os.path.exists(output_path):
  os.remove(output_path)

execute_command(f"ffmpeg -framerate {frame_rate} -i {output_images} -i {audio_file} -strict -2 {output_path} 2>&1")

Executed command: ffmpeg -framerate 30.0 -i /content/tmp/output-%d.jpg -i /content/tmp/audio.wav -strict -2 /content/out.mp4 2>&1, result:
ffmpeg version 3.4.6-0ubuntu0.18.04.1 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.3.0-16ubuntu3)
  configuration: --prefix=/usr --extra-version=0ubuntu0.18.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-

['ffmpeg version 3.4.6-0ubuntu0.18.04.1 Copyright (c) 2000-2019 the FFmpeg developers',
 '  built with gcc 7 (Ubuntu 7.3.0-16ubuntu3)',
 '  configuration: --prefix=/usr --extra-version=0ubuntu0.18.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-lib

You might get an error running the following cell (failed to fetch) if you run it too quickly after ffmpeg finishes.  Try rerunning just this cell.  You should get a file named out.mp4 downloaded.

In [0]:
files.download(output_path)