# Video Download
This script downloads videos from the Kinetics dataset.

**A Python class, with a helper builder class, based on this script is also available in the classpath.**
___

Kinetics dataset
- [Official Page](https://deepmind.com/research/open-source/kinetics)
- [Paper](https://arxiv.org/abs/1907.06987)

## Download the zip
Download the annotation archive and save it in the `dataset` directory.

In [16]:
%%bash
mkdir dataset
curl https://storage.googleapis.com/deepmind-media/research/Kinetics_700.zip --output dataset/kinects_700.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 21.7M  100 21.7M    0     0  14.8M      0  0:00:01  0:00:01 --:--:-- 14.8M


Extract the contents of the zip file. The archive contains one nested zip per split, which are extracted in turn.

In [17]:
%%bash
cd dataset
unzip kinects_700.zip

Archive:  kinects_700.zip
  inflating: kinetics_700_readme.txt  
   creating: __MACOSX/
  inflating: __MACOSX/._kinetics_700_readme.txt  
  inflating: kinetics_700_train.zip  
  inflating: __MACOSX/._kinetics_700_train.zip  
  inflating: kinetics_700_test.zip   
  inflating: __MACOSX/._kinetics_700_test.zip  
  inflating: kinetics_700_val.zip    
  inflating: __MACOSX/._kinetics_700_val.zip  


In [18]:
%%bash
cd dataset
unzip kinetics_700_train.zip
unzip kinetics_700_test.zip
unzip kinetics_700_val.zip

Archive:  kinetics_700_train.zip
  inflating: kinetics_700_train.csv  
  inflating: kinetics_700_train.json  
Archive:  kinetics_700_test.zip
  inflating: kinetics_700_test.csv   
  inflating: kinetics_700_test.json  
Archive:  kinetics_700_val.zip
  inflating: kinetics_700_val.csv    
  inflating: kinetics_700_val.json   


## Create variables
Define the constants used throughout the notebook.

In [1]:
import os

This function creates child folders in `path` and returns a dict whose keys are the folder names and whose values are the paths to them.

In [2]:
def create_file_structure(path, folders_names):
    mapping = {}
    if not os.path.exists(path):
        os.mkdir(path)
    for name in folders_names:
        dir_ = os.path.join(path, name)
        if not os.path.exists(dir_):
            os.mkdir(dir_)
        mapping[name] = dir_
    return mapping
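
As a quick check of the helper above, the following sketch runs a compact variant of it against a temporary directory. The variant uses `os.makedirs(..., exist_ok=True)` instead of the explicit existence checks (so it also creates missing parent directories, which `os.mkdir` does not). That substitution, the name `create_file_structure_sketch`, and the variable names are illustrative assumptions, not part of the original notebook.

```python
import os
import tempfile


def create_file_structure_sketch(path, folders_names):
    """Create `path` and one child folder per name; return name -> folder path."""
    mapping = {}
    # exist_ok=True makes the call a no-op when the folder is already there
    os.makedirs(path, exist_ok=True)
    for name in folders_names:
        dir_ = os.path.join(path, name)
        os.makedirs(dir_, exist_ok=True)
        mapping[name] = dir_
    return mapping


# Use a throwaway directory so the example leaves nothing behind
with tempfile.TemporaryDirectory() as root:
    demo_mapping = create_file_structure_sketch(root, ['train', 'validate', 'test'])
    demo_created = all(os.path.isdir(p) for p in demo_mapping.values())
    demo_names = sorted(os.listdir(root))
```

Calling the function a second time on the same path should leave the existing folders untouched and return the same mapping.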

In [3]:
URL_BASE = 'https://www.youtube.com/watch?v='

VIDEO_EXTENSION = '.mp4'
VIDEO_FORMAT = 'mp4'
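
To illustrate how these constants are combined later on, here is a minimal sketch. The helpers `video_url` and `video_path` are hypothetical names introduced only for this example; the video id is the one that appears in the train download log of this notebook.

```python
import os

URL_BASE = 'https://www.youtube.com/watch?v='
VIDEO_EXTENSION = '.mp4'


def video_url(youtube_id):
    """Build the watch URL for a YouTube id."""
    return URL_BASE + youtube_id


def video_path(folder, youtube_id):
    """Build the local file path for a video stored in `folder`."""
    return os.path.join(folder, youtube_id + VIDEO_EXTENSION)


example_url = video_url('oJCxnjaCoyI')
example_path = video_path('tmp', 'oJCxnjaCoyI')
```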

In [4]:
TRAIN_FOLDER = 'train'
VALIDATE_FOLDER = 'validate'
TEST_FOLDER = 'test'

In [5]:
BASE_PATH = os.path.join('/', 'home', os.environ['USER'], 'kinects')
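
The line above assumes that the `USER` environment variable is set and that home directories live under `/home`. Where that does not hold, a sketch along the following lines may be more portable. `BASE_PATH_SKETCH` is an illustrative name, and the resulting path can differ from the hard-coded `/home/<user>/kinects`.

```python
import os
from pathlib import Path

# Path.home() resolves the current user's home directory without reading $USER
BASE_PATH_SKETCH = str(Path.home() / 'kinects')
```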

### Create folders
These folders are used to store the videos after download.

In [6]:
if not os.path.exists(BASE_PATH):
    os.mkdir(BASE_PATH)
else:
    print(f'Path {BASE_PATH} already exists!')

Path /home/renato/kinects already exists!


In [7]:
# folders to store the train, validate and test videos
folders = create_file_structure(BASE_PATH, [TRAIN_FOLDER, VALIDATE_FOLDER, TEST_FOLDER])

In [8]:
folders

{'train': '/home/renato/kinects/train',
 'validate': '/home/renato/kinects/validate',
 'test': '/home/renato/kinects/test'}

## Video Trimming
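This function trims a downloaded video to the segment between `time_start` and `time_end`. It reads the raw file from the `tmp` folder and writes the trimmed clip to the folder of its label.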

In [9]:
import subprocess
import ffmpeg

In [10]:
def trim(row, label_to_dir, test=False):
    label = row['label'] if not test else ''
    filename = row['youtube_id']
    time_start = row['time_start']
    time_end = row['time_end']

    input_filename = os.path.join(label_to_dir['tmp'], f'{filename}{VIDEO_EXTENSION}')
    output_filename = os.path.join(label_to_dir[label], f'{filename}{VIDEO_EXTENSION}')

    if os.path.exists(output_filename):
        print('Already trimmed: ', filename)
    else:
        print('Start trimming: ', filename)

        try:
            ffmpeg.trim(ffmpeg.input(input_filename), start=time_start, end=time_end).output(
                output_filename).run()
        except Exception as e:
            print(f'Error in trimming: {e}')

        print('Finish trimming: ', filename)
        
    return output_filename
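
`subprocess` is imported above but not used by `trim`. As a rough sketch of an alternative, the same cut could be requested from the `ffmpeg` command-line tool, assuming it is installed and on `PATH`. The helper `build_trim_command` is hypothetical, and the file names and times in the example are made up. The sketch only builds the argument list and does not run it.

```python
def build_trim_command(input_filename, output_filename, time_start, time_end):
    """Return the argv list for trimming a video with the ffmpeg CLI."""
    return [
        'ffmpeg',
        '-i', input_filename,      # source video
        '-ss', str(time_start),    # start of the segment, in seconds
        '-to', str(time_end),      # end of the segment, in seconds
        output_filename,
    ]


# Hypothetical paths and times, for illustration only
command = build_trim_command('tmp/example.mp4', 'label/example.mp4', 10, 20)
```

Passing the list to `subprocess.run(command, check=True)` would execute it and raise `CalledProcessError` on a non-zero exit status.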

## Video download function
Downloads a single video from YouTube into the `tmp` folder, skipping it if the file is already there.

In [11]:
import pandas as pd
import pytube

In [12]:
def download_clip(row, label_to_dir, test=False):
    filename = row['youtube_id']

    if not os.path.exists(os.path.join(label_to_dir['tmp'], filename + VIDEO_EXTENSION)):
        print('Start downloading: ', filename)
        try:
            pytube.YouTube(URL_BASE + filename) \
                .streams \
                .filter(subtype=VIDEO_FORMAT) \
                .first() \
                .download(label_to_dir['tmp'], filename)
            print('Finish downloading: ', filename)
        except KeyError as e:
            print(f'Key Error {e}')
            return
        except Exception as e:
            print(f'Error in download video: {e}')
            return
    else:
        print('Already downloaded: ', filename)

## Download
Create a function that downloads and trims the videos. Note that the `tmp` folder is deleted once trimming is done.

In [13]:
import shutil

In [14]:
def download(path_csv, target, heads=5, test=False):
    # Read only the first `heads` rows of the annotation CSV
    links_data_frames = pd.read_csv(path_csv).head(heads)

    if not test:
        # One folder per label, plus a temporary folder for the raw downloads
        folders_names = links_data_frames['label'].unique().tolist() + ['tmp']
    else:
        # The test split has no labels: clips go directly into `target`
        folders_names = ['tmp', '']
    label_to_dir = create_file_structure(path=target, folders_names=folders_names)

    # Download every clip first, then trim each one to its annotated segment
    for _, row in links_data_frames.iterrows():
        download_clip(row, label_to_dir)
    for _, row in links_data_frames.iterrows():
        trim(row, label_to_dir, test=test)

    # Remove the temporary folder holding the untrimmed downloads
    shutil.rmtree(label_to_dir['tmp'])
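
To make the folder-selection step concrete, here is a small sketch that mimics it with the standard `csv` module instead of pandas. The CSV rows are made-up placeholders (only the column names `label`, `youtube_id`, `time_start` and `time_end` come from the code above), and `folder_names_for` is a hypothetical helper.

```python
import csv
import io
from itertools import islice

# Made-up rows; real values live in dataset/kinetics_700_*.csv
SAMPLE_CSV = """label,youtube_id,time_start,time_end
abseiling,aaaaaaaaaaa,0,10
zumba,bbbbbbbbbbb,5,15
abseiling,ccccccccccc,20,30
"""


def folder_names_for(csv_text, heads=5, test=False):
    """Return the folders `download` would create for the first `heads` rows."""
    if test:
        # The test split has no labels: clips land directly in the target folder
        return ['tmp', '']
    rows = islice(csv.DictReader(io.StringIO(csv_text)), heads)
    labels = []
    for row in rows:
        # Keep first-seen order, like pandas' unique()
        if row['label'] not in labels:
            labels.append(row['label'])
    return labels + ['tmp']


train_folders = folder_names_for(SAMPLE_CSV, heads=3)
test_folders = folder_names_for(SAMPLE_CSV, test=True)
```

With `test=True` the empty folder name resolves to the target folder itself, which is why test clips are not grouped by label.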

### Test download from the train split
Run a test download of one video from the train split.
___

The CSV path is `dataset/kinetics_700_train.csv` and the target path is `folders['train']`.

In [15]:
download('dataset/kinetics_700_train.csv', folders['train'], heads=1)

Start downloading:  oJCxnjaCoyI
Error in download video: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1076)>
Already trimmed:  oJCxnjaCoyI
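
The `CERTIFICATE_VERIFY_FAILED` error above typically means that Python could not find a trusted CA bundle on this machine. As a first diagnostic, and only as a sketch, the standard `ssl` module can report where it looks for certificates. The variable names are illustrative.

```python
import os
import ssl

# Locations the default SSL context consults for trusted CA certificates
verify_paths = ssl.get_default_verify_paths()

# cafile/capath are None when nothing was found at the default locations
ca_file_found = verify_paths.cafile is not None and os.path.isfile(verify_paths.cafile)
ca_dir_found = verify_paths.capath is not None and os.path.isdir(verify_paths.capath)

print('CA file:', verify_paths.cafile, '(found)' if ca_file_found else '(missing)')
print('CA dir: ', verify_paths.capath, '(found)' if ca_dir_found else '(missing)')
```

If both are reported as missing, installing a CA bundle or pointing Python at one is the usual remedy; the exact steps depend on the platform.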


In [16]:
from moviepy.editor import VideoFileClip

In [18]:
clip = VideoFileClip('/home/renato/kinects/train/abseiling/oJCxnjaCoyI.mp4')
clip.ipython_display(width=360)

t:   1%|          | 65/11850 [00:00<00:18, 649.59it/s, now=None]

Moviepy - Building video __temp__.mp4.
Moviepy - Writing video __temp__.mp4



                                                                    

Moviepy - Done !
Moviepy - video ready __temp__.mp4


ValueError: The duration of video __temp__.mp4 (395.0) exceeds the 'maxduration' attribute. You can increase 'maxduration', by passing 'maxduration' parameterto ipython_display function.But note that embedding large videos may take all the memory away !

### Test download from the validate split
Run a test download of one video from the validate split.
___

The CSV path is `dataset/kinetics_700_val.csv` and the target path is `folders['validate']`.

In [35]:
download('dataset/kinetics_700_val.csv', folders['validate'], heads=1)

Start downloading:  ixq5OGYjjmA
Error in download video: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1076)>
Start trimming:  ixq5OGYjjmA
Error in trimming: ffmpeg error (see stderr output for detail)
Finish trimming:  ixq5OGYjjmA


### Test download from the test split
Run a test download of eight videos from the test split.
___

The CSV path is `dataset/kinetics_700_test.csv` and the target path is `folders['test']`.

In [17]:
download('dataset/kinetics_700_test.csv', folders['test'], heads=8, test=True)

Start downloading:  6dEpI75FOeo
Error in download video: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1076)>
Start downloading:  15H3EqaHVi0
Error in download video: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1076)>
Start downloading:  d6Ko4hm8M8E
Error in download video: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1076)>
Start downloading:  -SakeFNtM0s
Error in download video: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1076)>
Start downloading:  at3mGS-FAVg
Error in download video: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1076)>
Start downloading:  0-qpI81QREc
Error in download video: <urlopen erro