# 2. Download and preprocess the video
In this notebook, we'll download and preprocess the video that we will apply style transfer to. The outputs are the audio track extracted from the video, which will be reused when stitching the video back together, and the video split into individual frames.

The video used in this tutorial is also of orangutans, just like the provided sample content images. It is stored in a public blob that we will download. For this section of the tutorial, however, you can swap in a video of your own. Likewise, feel free to swap out the style image instead of using the provided image of a Renoir painting.

```md
pytorch
├── images/
│   ├── orangutan/ [<-- this folder will contain all individual frames from the video]
│   ├── sample_content_images/
│   ├── sample_output_images/
│   └── style_images/
├── video/ [<-- create this new folder to put video content in]
│   ├── orangutan.mp4 [<-- this is the downloaded video]
│   └── orangutan.mp3 [<-- this is the extracted audio file from the video]
├── style_transfer_script.py
└── style_transfer_script.log
```

---

Import utilities to help us display images and HTML embeds:

In [1]:
from IPython.display import HTML
import os
%load_ext dotenv
%dotenv

First, create the video folder to store your video contents in.

In [2]:
%%bash
mkdir -p pytorch/video

Download the video, which is stored in public blob storage at https://happypathspublic.blob.core.windows.net/videos/orangutan.mp4

In [3]:
%%bash 
cd pytorch/video && 
    wget https://happypathspublic.blob.core.windows.net/videos/orangutan.mp4

--2018-10-16 21:45:52--  https://happypathspublic.blob.core.windows.net/videos/orangutan.mp4
Resolving happypathspublic.blob.core.windows.net (happypathspublic.blob.core.windows.net)... 52.239.214.164
Connecting to happypathspublic.blob.core.windows.net (happypathspublic.blob.core.windows.net)|52.239.214.164|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7961293 (7.6M) [video/mp4]
Saving to: ‘orangutan.mp4’

     0K .......... .......... .......... .......... ..........  0% 89.2M 0s
    50K .......... .......... .......... .......... ..........  1%  169M 0s
   100K .......... .......... .......... .......... ..........  1%  124M 0s
   150K .......... .......... .......... .......... ..........  2%  130M 0s
   200K .......... .......... .......... .......... ..........  3%  343M 0s
   250K .......... .......... .......... .......... ..........  3% 98.7M 0s
   300K .......... .......... .......... .......... ..........  4%  321M 0s
   350K .......... ..........
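
The same download can also be done from Python without `wget`. The sketch below is one possible approach and is not part of the original notebook: `download_if_missing` is a hypothetical helper name, and the destination path assumes the notebook runs from the directory that contains `pytorch/`. It skips the download when the file already exists, so re-running the cell does not leave a second copy such as `orangutan.mp4.1`, which is what `wget` does by default.

```python
import shutil
import urllib.request
from pathlib import Path

VIDEO_URL = "https://happypathspublic.blob.core.windows.net/videos/orangutan.mp4"


def download_if_missing(url, dest):
    """Download url to dest unless dest already exists; return the path."""
    dest = Path(dest)
    if dest.exists():
        return dest  # already downloaded, nothing to do
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk instead of loading it all into memory.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Usage (requires network access):
# download_if_missing(VIDEO_URL, "pytorch/video/orangutan.mp4")
```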

Set the environment variable __VIDEO_NAME__ to the name of the video, as this will be used throughout the tutorial for convenience.

In [4]:
%%bash
dotenv set VIDEO_NAME orangutan

VIDEO_NAME=orangutan
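
Every path used in the rest of this notebook is derived from `VIDEO_NAME`, so it can help to see that derivation in one place. The sketch below is illustrative only: `video_paths` is a hypothetical helper, not part of the tutorial, and the fallback to `orangutan` is an assumption that applies only when the variable is unset.

```python
import os
from pathlib import Path


def video_paths(name=None, root="pytorch"):
    """Return the video, audio, and frame locations used in this notebook."""
    # Fall back to the environment variable, then to the tutorial's sample video.
    name = name or os.getenv("VIDEO_NAME", "orangutan")
    root = Path(root)
    return {
        "video": root / "video" / f"{name}.mp4",
        "audio": root / "video" / f"{name}.mp3",
        "frames": root / "images" / name,
    }


paths = video_paths("orangutan")
print(paths["video"].as_posix())  # pytorch/video/orangutan.mp4
```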


Let's check out the video so we know what it looks like beforehand:

In [5]:
%dotenv
HTML('\
    <video width="360" height="360" controls> \
         <source src="pytorch/video/{0}.mp4" type="video/mp4"> \
    </video>'\
    .format(os.getenv('VIDEO_NAME'))
)
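
If `VIDEO_NAME` is not set, `os.getenv` returns `None` and the snippet above quietly points the player at `None.mp4`. A minimal sketch of a stricter variant follows; `video_tag` is a hypothetical helper, and its return value would be passed to `HTML(...)` just as before.

```python
import os


def video_tag(video_name=None, size=360):
    """Build the <video> markup, failing loudly if no video name is available."""
    video_name = video_name or os.getenv("VIDEO_NAME")
    if not video_name:
        raise ValueError("VIDEO_NAME is not set; run the dotenv cell first")
    return (
        f'<video width="{size}" height="{size}" controls>'
        f'<source src="pytorch/video/{video_name}.mp4" type="video/mp4">'
        "</video>"
    )


print(video_tag("orangutan"))
```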

Next, use __ffmpeg__ to extract the audio file and save it as orangutan.mp3 under the video directory.

In [6]:
%%bash 
cd pytorch/video &&
    ffmpeg -i ${VIDEO_NAME}.mp4 ${VIDEO_NAME}.mp3

ffmpeg version 3.4.4-0ubuntu0.18.04.1 Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.3.0-16ubuntu3)
  configuration: --prefix=/usr --extra-version=0ubuntu0.18.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --ena
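
If you prefer to call ffmpeg from Python rather than from a `%%bash` cell, a rough equivalent is sketched below. `audio_extract_cmd` is a hypothetical helper. The `-vn` (ignore the video stream) and `-y` (overwrite the output without prompting) options are standard ffmpeg flags that the cell above does not use; they are added here on the assumption that you want re-runs to succeed.

```python
import subprocess


def audio_extract_cmd(video_name, video_dir="pytorch/video"):
    """Build the ffmpeg argument list that extracts the audio track as MP3."""
    return [
        "ffmpeg",
        "-y",                                   # overwrite an existing .mp3
        "-i", f"{video_dir}/{video_name}.mp4",  # input video
        "-vn",                                  # ignore the video stream
        f"{video_dir}/{video_name}.mp3",        # output format inferred from extension
    ]


cmd = audio_extract_cmd("orangutan")
print(" ".join(cmd))

# To actually run it (requires ffmpeg on PATH):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a video name contains spaces.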

Finally, split the video into individual frame images. The images will be saved inside a new folder under the `/images` directory, called `/orangutan`.

In [7]:
%%bash
cd pytorch/images/ &&
    mkdir ${VIDEO_NAME} && cd ${VIDEO_NAME} &&
    ffmpeg -i ../../video/${VIDEO_NAME}.mp4 %05d_${VIDEO_NAME}.jpg -hide_banner

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '../../video/orangutan.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.12.100
  Duration: 00:00:27.48, start: 0.000000, bitrate: 2317 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 720x720, 2242 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 69 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native))
Press [q] to stop, [?] for help
[swscaler @ 0x55bdafba7b60] deprecated pixel format used, make sure you did set range correctly
Output #0, image2, to '%05d_orangutan.jpg':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf57.83.100
    Stre
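
The `%05d` in the output pattern is a printf-style placeholder that ffmpeg fills with a zero-padded frame counter, starting at 1 by default. Python's `%` formatting follows the same convention, so the resulting file names can be previewed as shown below; `frame_name` is a hypothetical helper used only for illustration.

```python
def frame_name(index, video_name="orangutan"):
    """Mimic ffmpeg's '%05d_<name>.jpg' pattern for a given frame number."""
    return "%05d_%s.jpg" % (index, video_name)


print(frame_name(1))    # 00001_orangutan.jpg
print(frame_name(823))  # 00823_orangutan.jpg

# Zero padding keeps lexicographic order equal to frame order.
names = [frame_name(i) for i in (10, 2, 1)]
print(sorted(names))
```

That ordering property is what lets a plain alphabetical listing of the folder return the frames in playback order.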

To make sure that the frames were successfully extracted, print out the number of images under `/pytorch/images/orangutan`. For the orangutan video, the count should be 823:

In [8]:
!cd pytorch/images/${VIDEO_NAME} && ls -1 | wc -l

823
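
The same check can be done in Python, together with a rough sanity estimate. The ffmpeg log above reports a duration of 27.48 s at 30 fps, which works out to about 824 frames; the reported container duration may differ slightly from the length of the video stream, so a count of 823 is consistent with that estimate. `count_frames` is a hypothetical helper, not part of the tutorial.

```python
from pathlib import Path


def count_frames(frames_dir, video_name):
    """Count extracted JPEG frames that match the '<number>_<name>.jpg' pattern."""
    return sum(1 for _ in Path(frames_dir).glob(f"*_{video_name}.jpg"))


# Rough expectation from the ffmpeg log: duration (s) x frame rate (fps).
estimated = round(27.48 * 30)
print(estimated)  # 824

# Usage, once the frames exist:
# count_frames("pytorch/images/orangutan", "orangutan")
```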


---

## Conclusion
In this notebook, we downloaded the video that we will apply neural style transfer to, and processed it so that we have the individual frames and the audio track as separate entities. In other scenarios, this can be thought of as preprocessing the data so that it is ready to be scored.

Next, we will use the style transfer script from the previous notebook to batch apply style transfer to all extracted frames using Batch AI in Azure. But first, we need to [set up Azure so that we have the appropriate credentials and storage accounts.](./03_setup_azure.ipynb)