# Deep Learning Dance School
Using Impersonator++.

I used the [Impersonator++](https://github.com/iPERDance/iPERCore) repo to map the movements from my GF's dancing video onto a static picture of me. The following script was run on Google Colab.

**Note**: Make sure that your runtime type is 'Python 3.6+ with GPU acceleration'. To do so, go to Edit > Notebook settings > Hardware Accelerator > Select "GPU".
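
Before installing anything, it can save time to confirm that the runtime really has a GPU attached. The sketch below is my own helper (the name `gpu_visible` is hypothetical, not part of iPERCore) and assumes the NVIDIA driver tools are on the PATH, which is normally the case on a Colab GPU runtime.

```python
import shutil
import subprocess

def gpu_visible():
    """Return True if `nvidia-smi -L` runs and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # driver tools are not installed or not on PATH
    try:
        result = subprocess.run([exe, "-L"], stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and b"GPU" in result.stdout

if gpu_visible():
    print("GPU runtime detected")
else:
    print("No GPU found: switch the runtime type first")
```

If this prints the second message, change the hardware accelerator as described in the note and restart the runtime.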

## Dependencies
### System Requirements
 - Linux (tested on Ubuntu 16.04 and 18.04) or Windows (tested on Windows 10)
 - CUDA 10.1, 10.2, or 11.0
 - gcc 7.5+ (needs to support C++14)
 - ffmpeg (ffprobe) 4.3.1+
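
To compare the installed ffmpeg against the minimum listed above, a small version check may help. This is a sketch under the assumption that `ffmpeg -version` starts with a line of the form `ffmpeg version X.Y.Z...`; the function names are my own, and snapshot builds without a release number are reported as unknown.

```python
import re
import shutil
import subprocess

REQUIRED = (4, 3, 1)  # minimum taken from the system requirements list

def parse_ffmpeg_version(banner):
    """Extract (major, minor, patch) from the output of `ffmpeg -version`."""
    match = re.search(r"ffmpeg version n?(\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        return None  # e.g. git snapshot builds without a release number
    return tuple(int(g) if g is not None else 0 for g in match.groups())

def installed_ffmpeg_version():
    """Run ffmpeg (if present) and return its parsed version, else None."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    out = subprocess.run([exe, "-version"], stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT).stdout
    return parse_ffmpeg_version(out.decode(errors="replace"))

found = installed_ffmpeg_version()
if found is None:
    print("ffmpeg not found, or its version string was not recognised")
else:
    print(found, "ok" if found >= REQUIRED else "older than 4.3.1")
```

Note that the apt output in section 1.1 reports ffmpeg 3.4.8 on Colab, which this check would flag as older than the listed minimum.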

### Python Requirements
- Python 3.6+
- PyTorch tested on 1.7.0
- Torchvision tested on 0.8.1
- mmcv-full tested on 1.2.0
- numpy>=1.19.3
- scipy>=1.5.2
- scikit-image>=0.17.2
- opencv-python>=4.4.0.40
- tensorboardX>=2.1
- tqdm>=4.48.2
- visdom>=0.1.8.9
- easydict>=1.9
- toml>=0.10.2
- git+https://github.com/open-mmlab/mmdetection.git@8179440ec5f75fe95484854af61ce6f6279f3bbc
- git+https://github.com/open-mmlab/mmediting@d4086aaf8a36ae830f1714aad585900d24ad1156
- git+https://github.com/iPERDance/neural_renderer.git@e5f54f71a8941acf372514eb92e289872f272653
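
The minimum versions above can be checked against the current environment with a few lines of Python. This is only a sketch: `MINIMUMS` copies a handful of entries from the list, `as_tuple` and `check_minimums` are hypothetical helper names, and on Python 3.6/3.7 it assumes the `importlib_metadata` backport is installed.

```python
import re

try:
    from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
except ImportError:
    # Assumption: the importlib_metadata backport is available on 3.6/3.7
    from importlib_metadata import version, PackageNotFoundError

# A subset of the requirements listed above
MINIMUMS = {"numpy": "1.19.3", "scipy": "1.5.2", "tqdm": "4.48.2", "toml": "0.10.2"}

def as_tuple(text):
    """Turn '1.7.0+cu101' into (1, 7, 0); stops at the first non-numeric part."""
    numbers = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)

def check_minimums(minimums):
    """Map each package name to 'ok', 'too old' or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if as_tuple(installed) >= as_tuple(minimum) else "too old"
    return report

for name, status in check_minimums(MINIMUMS).items():
    print(f"{name}: {status}")
```

The `setup.py develop` step later in this notebook installs the pinned versions itself, so this check is mainly useful when debugging an existing environment.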

## Guidelines
### Source/Photo Guidelines:
- Try to capture the source images against the same static background, without overly complex scene structures. If possible, we recommend using the actual background.
- The person in the source images should hold an A-pose, which exposes the most visible texture.
- Capture the source images in an environment without strong lighting contrast, and lock the camera's auto-exposure and auto-focus.

### Reference/Video Guidelines:
- Make sure that there is only **one** person in the reference video, since our system currently does not support multi-person tracking. If there are several people, first crop the video with another video processing tool.
- Make sure that the video shows the person's full body. Half-body footage gives poor results.
- Try to capture the video with a static camera, and avoid excessive zooming, panning, lens switching, and camera transitions. If the video contains multiple lens switches or camera transitions, first cut it with another video processing tool.

# 1. Installation


## 1.1 Install ffmpeg (ffprobe) and set CUDA_HOME in the system environment

In [None]:
# Install ffmpeg (ffprobe)
!apt-get install ffmpeg

Reading package lists... Done
Building dependency tree       
Reading state information... Done
ffmpeg is already the newest version (7:3.4.8-0ubuntu0.2).
0 upgraded, 0 newly installed, 0 to remove and 14 not upgraded.


In [52]:
# set CUDA_HOME, here we use CUDA 10.1
import os
os.environ["CUDA_HOME"] = "/usr/local/cuda-10.1"

!echo $CUDA_HOME

/usr/local/cuda-10.1
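
Since the CUDA extensions are compiled during setup, it may be worth verifying that `CUDA_HOME` points at a real toolkit before continuing. A minimal sketch, assuming the usual toolkit layout with the `nvcc` compiler under `bin/`; `cuda_home_problems` is a hypothetical helper of mine.

```python
import os

def cuda_home_problems(cuda_home):
    """Return a list of problems; an empty list means the path looks usable."""
    if not cuda_home:
        return ["CUDA_HOME is not set"]
    if not os.path.isdir(cuda_home):
        return [f"{cuda_home} is not a directory"]
    problems = []
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    # Assumption: a usable toolkit ships nvcc (nvcc.exe on Windows) in bin/
    if not (os.path.isfile(nvcc) or os.path.isfile(nvcc + ".exe")):
        problems.append(f"no nvcc compiler under {os.path.join(cuda_home, 'bin')}")
    return problems

for problem in cuda_home_problems(os.environ.get("CUDA_HOME")):
    print("warning:", problem)
```

No output means the directory exists and contains a compiler; it does not prove that the toolkit version matches the installed PyTorch build.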


## 1.1 Clone iPERCore Github Repo

In [None]:
!git clone https://github.com/iPERDance/iPERCore.git

Cloning into 'iPERCore'...
remote: Enumerating objects: 332, done.
remote: Counting objects: 100% (332/332), done.
remote: Compressing objects: 100% (261/261), done.
remote: Total 332 (delta 67), reused 289 (delta 45), pack-reused 0
Receiving objects: 100% (332/332), 11.65 MiB | 27.48 MiB/s, done.
Resolving deltas: 100% (67/67), done.


## 1.2 Setup

In [53]:
cd /content/iPERCore/

/content/iPERCore


In [None]:
!python setup.py develop

/usr/bin/python3 -m pip install pip==20.2.4
Collecting pip==20.2.4
  Downloading https://files.pythonhosted.org/packages/cb/28/91f26bd088ce8e22169032100d4260614fc3da435025ff389ef1d396a433/pip-20.2.4-py2.py3-none-any.whl (1.5MB)
     |████████████████████████████████| 1.5MB 7.7MB/s 
Installing collected packages: pip
  Found existing installation: pip 19.3.1
    Uninstalling pip-19.3.1:
      Successfully uninstalled pip-19.3.1
Successfully installed pip-20.2.4
/usr/bin/python3 -m pip install torch==1.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
Looking in links: https://download.pytorch.org/whl/torch_stable.html
/usr/bin/python3 -m pip install torchvision==0.8.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
Looking in links: https://download.pytorch.org/whl/torch_stable.html
/usr/bin/python3 -m pip install mmcv-full==1.2.0+torch1.7.0+cu101 -f https://download.openmmlab.com/mmcv/dist/index.html
Looking in links: https://download.openmmlab.

## 1.3 Download assets
The assets contain all pre-trained models (**checkpoints.zip**) and, for Windows only, additional executables such as ffmpeg and ffprobe (**executables.zip**). Linux ignores the latter.

Download links: 
  - checkpoints: https://download.impersonator.org/iper_plus_plus_latest_checkpoints.zip
  - samples: https://download.impersonator.org/iper_plus_plus_latest_samples.zip

You can manually download **checkpoints.zip**, unzip it, and move the **checkpoints** folder (as well as **samples**) into the **assets** folder. 

Otherwise, you can run the following cells to do this automatically.

In [None]:
# Download all checkpoints
!wget -O assets/checkpoints.zip "https://download.impersonator.org/iper_plus_plus_latest_checkpoints.zip"
!unzip -o assets/checkpoints.zip -d assets/

!rm assets/checkpoints.zip

--2020-12-20 09:38:31--  https://download.impersonator.org/iper_plus_plus_latest_checkpoints.zip
Resolving download.impersonator.org (download.impersonator.org)... 101.32.75.151
Connecting to download.impersonator.org (download.impersonator.org)|101.32.75.151|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: ./iper_plus_plus_1.0.0_checkpoints.zip [following]
--2020-12-20 09:38:32--  https://download.impersonator.org/iper_plus_plus_1.0.0_checkpoints.zip
Reusing existing connection to download.impersonator.org:443.
HTTP request sent, awaiting response... 302 Found
Location: https://1drv.ws/u/s!AjjUqiJZsj8whLkwQyrk3W9_H7MzNA?e=rRje0G [following]
--2020-12-20 09:38:32--  https://1drv.ws/u/s!AjjUqiJZsj8whLkwQyrk3W9_H7MzNA?e=rRje0G
Resolving 1drv.ws (1drv.ws)... 168.235.93.122
Connecting to 1drv.ws (1drv.ws)|168.235.93.122|:443... connected.
HTTP request sent, awaiting response... 301 MOVED PERMANENTLY
Location: https://ciphww.dm.files.1drv.com/y4mgYHpttD4H4M9NLc

In [None]:
# download samples
!wget -O assets/samples.zip  "https://download.impersonator.org/iper_plus_plus_latest_samples.zip"
!unzip -o assets/samples.zip -d  assets
!rm assets/samples.zip

--2020-12-20 09:39:41--  https://download.impersonator.org/iper_plus_plus_latest_samples.zip
Resolving download.impersonator.org (download.impersonator.org)... 101.32.75.151
Connecting to download.impersonator.org (download.impersonator.org)|101.32.75.151|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: ./iper_plus_plus_1.0.0_samples.zip [following]
--2020-12-20 09:39:41--  https://download.impersonator.org/iper_plus_plus_1.0.0_samples.zip
Reusing existing connection to download.impersonator.org:443.
HTTP request sent, awaiting response... 302 Found
Location: https://1drv.ws/u/s!AjjUqiJZsj8whLobQPpoxo2hfhURrA?e=EUyIC2 [following]
--2020-12-20 09:39:42--  https://1drv.ws/u/s!AjjUqiJZsj8whLobQPpoxo2hfhURrA?e=EUyIC2
Resolving 1drv.ws (1drv.ws)... 168.235.93.122
Connecting to 1drv.ws (1drv.ws)|168.235.93.122|:443... connected.
HTTP request sent, awaiting response... 301 MOVED PERMANENTLY
Location: https://cirftg.dm.files.1drv.com/y4m3dRmiJGUnRi7aoD7SSR0c30Onmy
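
Once both downloads have finished, a quick check that the expected folders landed in `assets` can catch a failed download early. This sketch assumes the layout described above (`checkpoints` and `samples` inside `assets`) plus the `configs/deploy.toml` file that is referenced later in section 2.2; `missing_assets` is a hypothetical helper name.

```python
import os

EXPECTED = ("checkpoints", "samples", os.path.join("configs", "deploy.toml"))

def missing_assets(assets_dir, expected=EXPECTED):
    """Return the expected entries that do not exist under assets_dir."""
    return [rel for rel in expected
            if not os.path.exists(os.path.join(assets_dir, rel))]

missing = missing_assets("assets")
if missing:
    print("missing from assets:", missing)
else:
    print("all expected assets are present")
```

Run it from the repository root (`/content/iPERCore` in this notebook) so that the relative `assets` path resolves correctly.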

# 2. Run

In [54]:
cd /content/iPERCore/

/content/iPERCore


In [55]:
import os
import os.path as osp
import platform
import argparse
import time
import sys
import subprocess
import numpy as np
import cv2
from IPython.display import HTML
from base64 import b64encode

In [None]:
IMAGE_PATH = '/content/20161108_112909.jpg'
VIDEO_PATH = '/content/mirte_dance.mp4'
MODEL_NAME = 'jeroen_mirte'
VIDEO_CROPPED_PATH = VIDEO_PATH.replace('.mp4','_cropped.mp4')
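
The `str.replace` call above works for these file names, but it would also rewrite a `.mp4` that appears in a directory name and would miss an upper-case `.MP4` extension. A slightly more defensive variant could look like this; `cropped_path` is a hypothetical helper, not something iPERCore provides.

```python
from pathlib import Path

def cropped_path(video_path, suffix="_cropped"):
    """Insert `suffix` before the file extension, leaving directories untouched."""
    p = Path(video_path)
    return p.with_name(p.stem + suffix + p.suffix).as_posix()

print(cropped_path("/content/mirte_dance.mp4"))
```

For the path used in this notebook both approaches give the same result, so this only matters for less tidy file names.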

## 2.1 Crop video
The dance video of my GF came from a duo dance, so I had to crop it down to a single dancer first.

In [None]:
# Open the video
cap = cv2.VideoCapture(VIDEO_PATH)

w_frame, h_frame = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps, frames = cap.get(cv2.CAP_PROP_FPS), cap.get(cv2.CAP_PROP_FRAME_COUNT)
print(f"frame width: {w_frame}, height: {h_frame}, fps: {fps}")

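# Cropping values: top-left corner (x, y), height h and width w of the crop box
x,y,h,w = 0,0,200,w_frame

# output; the lowercase 'mp4v' tag avoids OpenCV's fallback warning for .mp4 files
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter(VIDEO_CROPPED_PATH, fourcc, fps, (w, h))

cnt = 0
while(cap.isOpened()):
    ret, frame = cap.read()
    cnt += 1
    if ret:
        # Crop the frame to the box defined above
        crop_frame = frame[y:y+h, x:x+w]

        # skip the first 350 frames and write everything after them
        if 350 < cnt :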
            out.write(crop_frame)
    else:
        break

cap.release()
out.release()
cv2.destroyAllWindows()
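
One pitfall with this approach: NumPy slicing silently truncates a crop box that extends past the frame, so `crop_frame` can end up smaller than the `(w, h)` given to `cv2.VideoWriter`. In my experience OpenCV then tends to drop those frames without raising an error, which leaves an empty or unplayable file. A small helper (hypothetical, my own naming) can clamp the box to the frame before the writer is created:

```python
def clamp_crop(x, y, w, h, frame_w, frame_h):
    """Clamp a crop box so it lies fully inside a frame_w x frame_h frame."""
    x = max(0, min(x, frame_w - 1))
    y = max(0, min(y, frame_h - 1))
    w = max(1, min(w, frame_w - x))
    h = max(1, min(h, frame_h - y))
    return x, y, w, h

# Example: a box wider than a 1280x720 frame is cut back to the frame width
print(clamp_crop(0, 0, 1920, 200, 1280, 720))
```

The clamped `w` and `h` are then the values to pass both to the slice and to the `VideoWriter` size argument.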

## 2.2 Details of Config
 - gpu_ids (str): the gpu_ids, default is "0";
 - image_size (int): the image size, default is 512;
 - num_source (int): the number of source images for Attention, default is 2. A larger value needs more GPU memory;
 - assets_dir (str): the assets directory. This is very important, and there are the configurations and all pre-trained checkpoints;
 - output_dir (str): the output directory;

 - src_path (str): the source input information. 
       It supports multiple source paths and uses "|" as the separator between them. 
       The format is "src_path_1|src_path_2|src_path_3". 
       
       Each src_input is "path?=path1,name?=name1,bg_path?=bg_path1". 
       
       It must contain 'path'. If 'name' and 'bg_path' are empty, they will be ignored.

       The 'path' could be an image path, a path to a directory containing source images, or a video path.

       The 'name' renames this source input. If it is empty, the filename of the path is used.

       The 'bg_path' is the actual background path if provided; otherwise it is ignored.
       
       There are several examples of formatted source paths,

        1. "path?=path1,name?=name1,bg_path?=bg_path1|path?=path2,name?=name2,bg_path?=bg_path2",
        this input will be parsed as [{path: path1, name: name1, bg_path:bg_path1},
        {path: path2, name: name2, bg_path: bg_path2}];

        2. "path?=path1,name?=name1|path?=path2,name?=name2", this input will be parsed as
        [{path: path1, name:name1}, {path: path2, name: name2}];

        3. "path?=path1", this input will be parsed as [{path: path1}].

        4. "path1", this will be parsed as [{path: path1}].

 - ref_path (str): the reference input information.
       
       All reference paths. It supports multiple paths, and uses "|" as the separator between all paths.
       The format is "ref_path_1|ref_path_2|ref_path_3".

       Each ref_path is "path?=path1,name?=name1,audio?=audio_path1,fps?=30,pose_fc?=300,cam_fc?=150".

       It must contain 'path'; the other fields may be empty, in which case they are ignored.

       The 'path' could be an image path, a path to a directory containing images of the same person, or a video path.

       The 'name' renames this reference input. If it is empty, the filename of the path is used.

       The 'audio' is the audio path; if it is empty, it is ignored. If the 'path' is a video,
        you can omit this, and the audio is first extracted from the video (if it has an audio channel).

       The 'fps' is the fps of the final outputs. If it is empty, the default of 25 is used.

       The 'pose_fc' is the smoothing factor of the temporal poses. The smaller this value, the smoother the poses. If it is empty, the default of 300 is used. In most cases the default is enough; if the poses of the outputs are not stable, decrease this value, and if they are overly stable, increase it.

       The 'cam_fc' is the smoothing factor of the temporal cameras (locations in the image space). The smaller this value, the smoother the locations across the sequence. If it is empty, the default of 150 is used. In most cases the default is enough.

       There are several examples of formatted reference paths,

        1. "path?=path1,name?=name1,audio?=audio_path1,fps?=30,pose_fc?=300,cam_fc?=150|
            path?=path2,name?=name2,audio?=audio_path2,fps?=25,pose_fc?=450,cam_fc?=200",
            this input will be parsed as
            [{path: path1, name: name1, audio: audio_path1, fps: 30, pose_fc: 300, cam_fc: 150},
             {path: path2, name: name2, audio: audio_path2, fps: 25, pose_fc: 450, cam_fc: 200}]

        2. "path?=path1,name?=name1, pose_fc?=450|path?=path2,name?=name2", this input will be parsed as
        [{path: path1, name: name1, fps: 25, pose_fc: 450, cam_fc: 150},
         {path: path2, name: name2, fps: 25, pose_fc: 300, cam_fc: 150}].

        3. "path?=path1|path?=path2", this input will be parsed as
        [{path: path1, fps:25, pose_fc: 300, cam_fc: 150}, {path: path2, fps: 25, pose_fc: 300, cam_fc: 150}].

        4. "path1|path2", this input will be parsed as
        [{path: path1, fps:25, pose_fc: 300, cam_fc: 150}, {path: path2, fps: 25, pose_fc: 300, cam_fc: 150}].

        5. "path1", this will be parsed as [{path: path1, fps: 25, pose_fc: 300, cam_fc: 150}].
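
To make the `path?=...,name?=...` format above more concrete, here is a minimal parser sketch that reproduces the documented examples. It is not iPERCore's actual parser; `parse_inputs`, `REF_DEFAULTS` and `NUMERIC_KEYS` are names I made up, and the sketch assumes that paths contain neither `,` nor `|`.

```python
NUMERIC_KEYS = {"fps", "pose_fc", "cam_fc"}
# Defaults for reference inputs, as documented above
REF_DEFAULTS = {"fps": 25, "pose_fc": 300, "cam_fc": 150}

def _number(text):
    """Convert '30' to 30 and '29.97' to 29.97."""
    return int(text) if text.isdigit() else float(text)

def parse_inputs(spec, defaults=None):
    """Parse 'path?=a,name?=b|path?=c' into a list of dicts."""
    entries = []
    for chunk in spec.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue
        entry = dict(defaults or {})
        if "?=" not in chunk:
            entry["path"] = chunk  # bare form: the chunk is just the path
        else:
            for field in chunk.split(","):
                key, sep, value = field.partition("?=")
                key, value = key.strip(), value.strip()
                if not sep or not value:
                    continue  # empty fields are ignored
                entry[key] = _number(value) if key in NUMERIC_KEYS else value
        if "path" not in entry:
            raise ValueError(f"missing 'path' in {chunk!r}")
        entries.append(entry)
    return entries

spec = "path?=path1,name?=name1, pose_fc?=450|path?=path2,name?=name2"
for entry in parse_inputs(spec, REF_DEFAULTS):
    print(entry)
```

Calling `parse_inputs(spec)` without defaults corresponds to the `src_path` rules, while passing `REF_DEFAULTS` fills in `fps`, `pose_fc` and `cam_fc` as described for `ref_path`.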

In [56]:
# the gpu ids
gpu_ids = "0"

# the image size
image_size = 512

# the default number of source images, it will be updated if the actual number of sources <= num_source
num_source = 1

# the assets directory. This is very important, please download it from `one_drive_url` firstly.
assets_dir = "/content/iPERCore/assets"

# the output directory.
output_dir = "./results"

# symlink from the actual assets directory to this current directory
work_asserts_dir = os.path.join("./assets")
if not os.path.exists(work_asserts_dir):
    os.symlink(osp.abspath(assets_dir), osp.abspath(work_asserts_dir),
               target_is_directory=(platform.system() == "Windows"))

cfg_path = osp.join(work_asserts_dir, "configs", "deploy.toml")


## 2.3 Run Scripts

In [None]:
model_id = MODEL_NAME + str(time.time())

src_path = f"\"path?={IMAGE_PATH},name?={model_id}_src\""

ref_path = f"\"path?={VIDEO_CROPPED_PATH}," \
              f"name?={model_id}_ref," \
              "fps?=24," \
              "pose_fc?=400\""
print(ref_path)

!python -m iPERCore.services.run_imitator  \
  --gpu_ids     $gpu_ids       \
  --num_source  $num_source    \
  --image_size  $image_size    \
  --output_dir  $output_dir    \
  --model_id    $model_id      \
  --cfg_path    $cfg_path      \
  --src_path    $src_path      \
  --ref_path    $ref_path
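
The escaped quotes around `src_path` and `ref_path` in the cell above are there because the shell would otherwise interpret characters such as `?` and `|`. Outside a notebook, the same call can be built as an argument list, which needs no quoting at all. This is a sketch: the flags are the ones used in the cell above, while `imitator_command` is a hypothetical helper of mine.

```python
import sys

def imitator_command(gpu_ids, num_source, image_size, output_dir,
                     model_id, cfg_path, src_path, ref_path):
    """Build the run_imitator call as a list of strings (no shell quoting needed)."""
    return [
        sys.executable, "-m", "iPERCore.services.run_imitator",
        "--gpu_ids", str(gpu_ids),
        "--num_source", str(num_source),
        "--image_size", str(image_size),
        "--output_dir", str(output_dir),
        "--model_id", str(model_id),
        "--cfg_path", str(cfg_path),
        "--src_path", str(src_path),   # plain string, without the extra quotes
        "--ref_path", str(ref_path),
    ]

cmd = imitator_command("0", 1, 512, "./results", "demo",
                       "./assets/configs/deploy.toml",
                       "path?=/content/me.jpg,name?=demo_src",
                       "path?=/content/dance_cropped.mp4,name?=demo_ref,fps?=24")
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

The file names in the example call are placeholders; substitute the `IMAGE_PATH` and `VIDEO_CROPPED_PATH` values defined earlier.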

"path?=/content/mufasa_crop.mp4,name?=jeroen_mufasa_1608467657.6595387_ref,fps?=24,pose_fc?=400"
ffmpeg -y -i /content/mufasa_crop.mp4 -ab 160k -ac 2 -ar 44100 -vn ./results/primitives/jeroen_mufasa_1608467657.6595387_ref/processed/audio.mp3 -loglevel quiet
ffprobe -v error -select_streams v -of default=noprint_wrappers=1:nokey=1 -show_entries stream=r_frame_rate /content/mufasa_crop.mp4
	Pre-processing: start...
----------------------MetaProcess----------------------
meta_input:
	path: /content/20161108_112909.jpg
	bg_path: 
	name: jeroen_mufasa_1608467657.6595387_src
primitives_dir: ./results/primitives/jeroen_mufasa_1608467657.6595387_src
processed_dir: ./results/primitives/jeroen_mufasa_1608467657.6595387_src/processed
vid_info_path: ./results/primitives/jeroen_mufasa_1608467657.6595387_src/processed/vid_info.pkl
-------------------------------------------------------
----------------------MetaProcess----------------------
meta_input:
	path: /content/mufasa_crop.mp4
	bg_path: 
	nam