# Video Download
This notebook downloads videos from the Kinetics dataset and trims each one to its annotated time window.

**A Python class with a helper builder derived from this notebook is available on the class path.**
___

Kinetics dataset
- [Official Page](https://deepmind.com/research/open-source/kinetics)
- [Paper](https://arxiv.org/abs/1907.06987)

In [1]:
import os
import shutil

import pandas as pd
import pytube

from moviepy.editor import VideoFileClip
from moviepy.video.io.ffmpeg_tools import ffmpeg_extract_subclip

## Download the zip
Download the annotation archive and save it in the `dataset` directory.

In [2]:
%%bash
mkdir -p dataset
curl https://storage.googleapis.com/deepmind-media/Datasets/kinetics700_2020.tar.gz --output dataset/Kinetics.tar.gz

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 24.2M  100 24.2M    0     0  8445k      0  0:00:02  0:00:02 --:--:-- 8442k


Extract the annotation files from the archive.

In [3]:
%%bash
cd dataset
tar -xf Kinetics.tar.gz
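
The functions later in this notebook read four fields from each annotation row: `label`, `youtube_id`, `time_start`, and `time_end`. As a minimal sketch of that row shape, the snippet below parses a small in-memory CSV with the standard library. The sample IDs, labels, and times are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; only the column names come from the notebook.
SAMPLE_CSV = """label,youtube_id,time_start,time_end
juggling balls,AAAAAAAAAAA,10,20
juggling balls,BBBBBBBBBBB,0,10
washing dishes,CCCCCCCCCCC,35,45
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Each row names one clip: the video, its label, and the time window to keep.
clips_per_label = Counter(row['label'] for row in rows)
durations = [int(row['time_end']) - int(row['time_start']) for row in rows]

print(dict(clips_per_label))
print(durations)
```

The same counts could be obtained with `pandas` on the real CSV files; the standard library is used here only so the sketch runs without the dataset.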

## Create variables
Define the constants used throughout the notebook.

The function below creates the folders listed in `folders_names` inside `path` and returns a dict whose keys are the folder names and whose values are the paths to them.

In [4]:
def create_file_structure(path, folders_names):
    mapping = {}
    if not os.path.exists(path):
        os.mkdir(path)
    for name in folders_names:
        dir_ = os.path.join(path, name)
        if not os.path.exists(dir_):
            os.mkdir(dir_)
        mapping[name] = dir_
    return mapping
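
As a hedged alternative, the same mapping can be built with `os.makedirs(..., exist_ok=True)`, which removes the explicit existence checks. The name `create_file_structure_sketch` is hypothetical, and the temporary directory stands in for the real base path so the example leaves nothing behind.

```python
import os
import tempfile


def create_file_structure_sketch(path, folders_names):
    """Create `path` plus one sub-folder per name; return {name: sub-folder path}."""
    # Create the base folder even when no sub-folder names are given.
    os.makedirs(path, exist_ok=True)
    mapping = {}
    for name in folders_names:
        dir_ = os.path.join(path, name)
        # exist_ok=True makes repeated calls harmless.
        os.makedirs(dir_, exist_ok=True)
        mapping[name] = dir_
    return mapping


with tempfile.TemporaryDirectory() as root:
    base = os.path.join(root, 'kinetics')
    first = create_file_structure_sketch(base, ['train', 'validate', 'test'])
    # Calling it again must not fail and must return the same mapping.
    second = create_file_structure_sketch(base, ['train', 'validate', 'test'])
    all_created = all(os.path.isdir(p) for p in first.values())
    relative = {name: os.path.relpath(p, base) for name, p in first.items()}

print(relative)
```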

In [5]:
URL_BASE = 'https://www.youtube.com/watch?v='

VIDEO_EXTENSION = '.mp4'
VIDEO_FORMAT = 'mp4'

In [6]:
TRAIN_FOLDER = 'train'
VALIDATE_FOLDER = 'validate'
TEST_FOLDER = 'test'

In [7]:
BASE_PATH = os.path.join('/', 'home', os.environ['USER'], '.kinetics')

### Create folders
These folders store the videos after download.

In [9]:
if not os.path.exists(BASE_PATH):
    os.mkdir(BASE_PATH)
else:
    print(f'Path {BASE_PATH} already exists!')

Path /home/dantas/.kinetics already exists!


In [10]:
# file to save train, validate and test
folders = create_file_structure(BASE_PATH, [TRAIN_FOLDER, VALIDATE_FOLDER, TEST_FOLDER])

In [11]:
folders

{'train': '/home/dantas/.kinetics/train',
 'validate': '/home/dantas/.kinetics/validate',
 'test': '/home/dantas/.kinetics/test'}

## Video Trimming
This function trims a downloaded video to the annotated time window and then deletes the untrimmed file from the `tmp` folder.

In [12]:
def trim(row, label_to_dir, test=False):
    # Test rows have no label, so their clips go to the '' entry of label_to_dir.
    label = row['label'] if not test else ''
    filename = row['youtube_id']
    time_start = row['time_start']
    time_end = row['time_end']

    input_filename = os.path.join(label_to_dir['tmp'], f'{filename}{VIDEO_EXTENSION}')
    output_filename = os.path.join(label_to_dir[label], f'{filename}{VIDEO_EXTENSION}')

    if os.path.exists(output_filename):
        print('Already trimmed: ', filename)
    elif not os.path.exists(input_filename):
        # The download failed, so there is no file to trim or remove.
        print('Nothing to trim: ', filename)
    else:
        print('Start trimming: ', filename)
        try:
            ffmpeg_extract_subclip(input_filename, time_start, time_end, targetname=output_filename)
            print('Finish trimming: ', filename)
        except Exception as e:
            print(f'Error in trimming: {e}')
        finally:
            # Remove the untrimmed download whether or not trimming succeeded.
            os.remove(input_filename)

    return output_filename
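
To make the trimming arithmetic concrete, here is a minimal sketch that builds an `ffmpeg` argument list for one row. It is an illustration under stated assumptions, not necessarily the command that `ffmpeg_extract_subclip` issues internally. The helper name `build_trim_command` and the sample row are hypothetical.

```python
import os

VIDEO_EXTENSION = '.mp4'


def build_trim_command(row, tmp_dir, out_dir):
    """Return an ffmpeg argument list that cuts [time_start, time_end] from a download."""
    source = os.path.join(tmp_dir, row['youtube_id'] + VIDEO_EXTENSION)
    target = os.path.join(out_dir, row['youtube_id'] + VIDEO_EXTENSION)
    # ffmpeg's -t flag takes a duration, not an end time.
    duration = row['time_end'] - row['time_start']
    return [
        'ffmpeg', '-y',                 # overwrite the target if it exists
        '-ss', str(row['time_start']),  # seek to the start of the window
        '-i', source,
        '-t', str(duration),            # keep this many seconds
        '-c', 'copy',                   # copy streams without re-encoding
        target,
    ]


# Hypothetical row with the same keys the notebook reads from the CSV.
row = {'youtube_id': 'AAAAAAAAAAA', 'time_start': 10, 'time_end': 20}
command = build_trim_command(row, 'tmp', 'out')
print(' '.join(command))
```

With `ffmpeg` installed, such a list could be passed to `subprocess.run(command, check=True)`; the sketch only builds the list so it runs without any video file.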

## Video download
This function downloads a single video from YouTube into the `tmp` folder, skipping videos that are already there.

In [13]:
def download_clip(row, label_to_dir, test=False):
    filename = row['youtube_id']

    if not os.path.exists(os.path.join(label_to_dir['tmp'], filename + VIDEO_EXTENSION)):
        print('Start downloading: ', filename)
        try:
            pytube.YouTube(URL_BASE + filename) \
                .streams \
                .filter(subtype=VIDEO_FORMAT) \
                .first() \
                .download(label_to_dir['tmp'], filename)
            print('Finish downloading: ', filename)
        except KeyError as e:
            print(f'Key Error {e}')
            return
        except Exception as e:
            print(f'Error in download video: {e}')
            return
    else:
        print('Already downloaded: ', filename)
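
The "already downloaded" check above is what makes reruns cheap. As a small sketch of the same idea in isolation, the hypothetical helper `pending_downloads` lists the IDs that still lack a file in the temporary folder. An empty placeholder file simulates a finished download, so no network access is needed.

```python
import os
import tempfile

VIDEO_EXTENSION = '.mp4'


def pending_downloads(youtube_ids, tmp_dir):
    """Return the IDs that have no file named <id>.mp4 in tmp_dir yet."""
    return [
        video_id for video_id in youtube_ids
        if not os.path.exists(os.path.join(tmp_dir, video_id + VIDEO_EXTENSION))
    ]


with tempfile.TemporaryDirectory() as tmp_dir:
    # Simulate one finished download with an empty placeholder file.
    open(os.path.join(tmp_dir, 'AAAAAAAAAAA' + VIDEO_EXTENSION), 'w').close()
    todo = pending_downloads(['AAAAAAAAAAA', 'BBBBBBBBBBB'], tmp_dir)

print(todo)
```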

## Download
This function downloads and trims the videos listed in a CSV file. Note that the `tmp` folder is deleted when it finishes, even if an error occurs.

In [24]:
def download(path_csv, target, heads=5, test=False):
    links_data_frames = pd.read_csv(path_csv).head(heads)

    # Test rows have no label, so their clips are written directly under target ('').
    labels = [''] if test else links_data_frames['label'].unique().tolist()
    label_to_dir = create_file_structure(path=target, folders_names=labels + ['tmp'])

    try:
        trimming = []
        for _, row in links_data_frames.iterrows():
            download_clip(row, label_to_dir, test=test)
            trimming.append(trim(row, label_to_dir, test=test))

        # Return one output path per row, for both labelled and test splits.
        return trimming

    finally:
        # Always remove the temporary download folder, even after an error.
        shutil.rmtree(label_to_dir['tmp'])
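
The `try`/`finally` cleanup above could also be expressed with `tempfile.TemporaryDirectory`, which removes the scratch folder when its `with` block exits, including on an exception. The sketch below is an assumption-laden illustration of that pattern, not the notebook's method. The function `process_with_scratch_dir` and the simulated failure are hypothetical.

```python
import os
import tempfile

# Records every scratch folder that was created, so cleanup can be checked later.
seen_scratch_dirs = []


def process_with_scratch_dir(names, fail_on=None):
    """Write one placeholder per name into a scratch folder that is always removed."""
    with tempfile.TemporaryDirectory() as scratch:
        seen_scratch_dirs.append(scratch)
        for name in names:
            if name == fail_on:
                raise RuntimeError(f'simulated failure on {name}')
            open(os.path.join(scratch, name), 'w').close()
        return sorted(os.listdir(scratch))


written = process_with_scratch_dir(['a.mp4', 'b.mp4'])

try:
    process_with_scratch_dir(['a.mp4', 'b.mp4'], fail_on='b.mp4')
    failed = False
except RuntimeError:
    failed = True

# Both scratch folders are gone, after the success and after the failure.
leftovers = [path for path in seen_scratch_dirs if os.path.exists(path)]
print(written, failed, leftovers)
```

One trade-off: a system temporary directory may live on a different disk than the target folder, so keeping `tmp` inside the target, as the notebook does, can be preferable for large downloads.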

### Test download of the train split
Make a test download of five videos from the train split.
___

The CSV path is `dataset/kinetics700_2020/train.csv` and the target path is `folders['train']`.

In [25]:
train_trimming_download = download('dataset/kinetics700_2020/train.csv', folders['train'])

Start downloading:  ---0dWlqevI
Finish downloading:  ---0dWlqevI
Already trimmed:  ---0dWlqevI
Start downloading:  ---aQ-tA5_A
Finish downloading:  ---aQ-tA5_A
Start trimming:  ---aQ-tA5_A
Moviepy - Running:
>>> "+ " ".join(cmd)
Moviepy - Command successful
Finish trimming:  ---aQ-tA5_A
Start downloading:  ---j12rm3WI
Finish downloading:  ---j12rm3WI
Start trimming:  ---j12rm3WI
Moviepy - Running:
>>> "+ " ".join(cmd)
Moviepy - Command successful
Finish trimming:  ---j12rm3WI
Start downloading:  --07WQ2iBlw
Finish downloading:  --07WQ2iBlw
Start trimming:  --07WQ2iBlw
Moviepy - Running:
>>> "+ " ".join(cmd)
Moviepy - Command successful
Finish trimming:  --07WQ2iBlw
Start downloading:  --0NTAs-fA0
Finish downloading:  --0NTAs-fA0
Start trimming:  --0NTAs-fA0
Moviepy - Running:
>>> "+ " ".join(cmd)
Moviepy - Command successful
Finish trimming:  --0NTAs-fA0


In [26]:
train_trimming_download

['/home/dantas/.kinetics/train/clay pottery making/---0dWlqevI.mp4',
 '/home/dantas/.kinetics/train/news anchoring/---aQ-tA5_A.mp4',
 '/home/dantas/.kinetics/train/using bagging machine/---j12rm3WI.mp4',
 '/home/dantas/.kinetics/train/javelin throw/--07WQ2iBlw.mp4',
 '/home/dantas/.kinetics/train/climbing a rope/--0NTAs-fA0.mp4']
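
Because each clip is saved as `<target>/<label>/<youtube_id>.mp4`, the label and ID can be recovered from a returned path. A minimal sketch, using three of the paths printed above and `PurePosixPath` so that it behaves the same on any operating system:

```python
from collections import defaultdict
from pathlib import PurePosixPath

paths = [
    '/home/dantas/.kinetics/train/clay pottery making/---0dWlqevI.mp4',
    '/home/dantas/.kinetics/train/news anchoring/---aQ-tA5_A.mp4',
    '/home/dantas/.kinetics/train/using bagging machine/---j12rm3WI.mp4',
]

ids_by_label = defaultdict(list)
for raw in paths:
    path = PurePosixPath(raw)
    # The parent folder name is the label; the file stem is the YouTube ID.
    ids_by_label[path.parent.name].append(path.stem)

for label, ids in ids_by_label.items():
    print(label, ids)
```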

In [29]:
video_path = train_trimming_download[4]
clip = VideoFileClip(video_path)
clip.ipython_display(width=360)

Moviepy - Building video __temp__.mp4.
MoviePy - Writing audio in __temp__TEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video __temp__.mp4

Moviepy - Done !
Moviepy - video ready __temp__.mp4

### Test download of the validation split
Make a test download of one video from the validation split.
___

The CSV path is `dataset/kinetics_700_val.csv` and the target path is `folders['validate']`.

In [21]:
validated_trimming_download = download('dataset/kinetics_700_val.csv', folders['validate'], heads=1)

Start downloading:  ixq5OGYjjmA
Finish downloading:  ixq5OGYjjmA
Already trimmed:  ixq5OGYjjmA


In [22]:
video_path = validated_trimming_download[0]
clip = VideoFileClip(video_path)
clip.ipython_display(width=360)

Moviepy - Building video __temp__.mp4.
MoviePy - Writing audio in %s
MoviePy - Done.
Moviepy - Writing video __temp__.mp4

Moviepy - Done !
Moviepy - video ready __temp__.mp4

### Test download of the test split
Make a test download of one video from the test split.
___

The CSV path is `dataset/kinetics_700_test.csv` and the target path is `folders['test']`.

In [23]:
test_trimming_download = download('dataset/kinetics_700_test.csv', folders['test'], heads=1, test=True)

Start downloading:  6dEpI75FOeo
Finish downloading:  6dEpI75FOeo
Already trimmed:  6dEpI75FOeo


In [24]:
video_path = test_trimming_download[0]
clip = VideoFileClip(video_path)
clip.ipython_display(width=360)

Moviepy - Building video __temp__.mp4.
MoviePy - Writing audio in %s
MoviePy - Done.
Moviepy - Writing video __temp__.mp4

Moviepy - Done !
Moviepy - video ready __temp__.mp4