# Inference Demo based on Video Stylization

This project showcases a [video stylization](https://github.com/rnwzd/FSPBT-Image-Translation) use case on Intel laptops and desktops, and uses [BigDL-Nano](https://github.com/intel-analytics/bigdl#nano) to accelerate AI workloads on Intel CPUs.

## Prepare the environment

Please refer to [installation guide](./installation_guide.md) for details.

## Prepare Data
First unzip `data.zip` to obtain the input images, target images, and pretrained model weights.

In [None]:
!unzip data.zip
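If the `unzip` command is not available on your system, the same step can be done from Python. The following is a minimal sketch using the standard `zipfile` module. The helper names `extract_archive` and `missing_subdirs` are hypothetical, and the expected sub-directories (`input`, `target`, `models`) are taken from the descriptions in this notebook rather than verified against the archive itself.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract zip_path into dest_dir and return the sorted member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return sorted(archive.namelist())


def missing_subdirs(data_dir, expected=("input", "target", "models")):
    """Return the expected sub-directories that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_dir()]


# Only runs when data.zip is present in the working directory.
if Path("data.zip").exists():
    extract_archive("data.zip")
    print("missing sub-directories:", missing_subdirs("data"))
```

An empty list from `missing_subdirs` suggests the archive was laid out as this notebook assumes.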

The input images in the `data/input` directory come from a video excerpt of [shooters](https://drive.google.com/drive/u/0/folders/1HC1SryBRGHDT2bsys1AE36WDFr4HbJmg) in [HLUV](https://arxiv.org/abs/2005.00463).

The images in the `data/target` directory are generated by [pytorch-AdaIN](https://github.com/naoto0804/pytorch-AdaIN), and the corresponding style image is [Picasso's self portrait](https://github.com/naoto0804/pytorch-AdaIN/blob/master/input/style/picasso_self_portrait.jpg). You can also generate different target images to suit your own interests.

## Training first
Before inference, train the model by running `python nano_train.py`, which is accelerated by the BigDL-Nano Trainer. Training takes about 7 minutes and produces a generator model (`generator.pt`) under the `./data/models/` directory.

In [None]:
!python nano_train.py

## Load the model and accelerate it with InferenceOptimizer
Next, load the trained model and accelerate it with `InferenceOptimizer`. Here we use `InferenceOptimizer.quantize` as an example.

In [13]:
from torch.utils.data import DataLoader
import torch
from tqdm import tqdm
import torchvision.transforms as transforms
from pathlib import Path
from data import read_image_tensor, write_image_tensor, ImageDataset
from train import data_path, model_save_path

# load model
device = 'cpu'
dtype = torch.float32

generator = torch.load(model_save_path/"generator.pt")
generator.eval()
generator.to(device, dtype)

# prepare calib dataloader
input_dir = data_path/'input'
file_paths = [file for file in input_dir.iterdir()]

params = {'batch_size': 1,
          'num_workers': 8,
          'pin_memory': True}

dataset = ImageDataset(file_paths, transform=None)
loader = DataLoader(dataset, **params)

from bigdl.nano.pytorch import InferenceOptimizer
model = InferenceOptimizer.quantize(accelerator=None,
                                    model=generator,
                                    calib_dataloader=loader)
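To give some intuition for what int8 quantization trades off, the sketch below applies generic symmetric int8 quantization to a short list of floats in plain Python. It only illustrates the idea: it is not BigDL-Nano's implementation, the function names are hypothetical, and the sample values are made up. In post-training quantization of this kind, calibration data typically supplies the value range from which the scale is derived.

```python
def compute_scale(calibration_values, num_bits=8):
    """Symmetric scale: the largest magnitude maps to the edge of the int range."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round each float to the nearest integer step and clamp to the int range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integer steps back to approximate float values."""
    return [q * scale for q in q_values]


# Made-up sample values standing in for calibration activations.
calibration = [-1.0, -0.25, 0.1, 0.5]
scale = compute_scale(calibration)
q = quantize(calibration, scale)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(calibration, restored))
print("quantized:", q)
print("max round-trip error:", max_error)
```

For values inside the calibrated range, the round-trip error is bounded by half a quantization step, which is the accuracy cost paid for the smaller and faster integer representation.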

## Helper functions
Below are three helper functions for displaying a video and for converting between videos and images.

In [20]:
import os
from IPython.display import HTML
from base64 import b64encode
from PIL import Image as PILImage
import cv2
from cv2 import VideoCapture, imwrite
import numpy as np


def display_video(file_path, width=512):
    # Source: https://colab.research.google.com/drive/1_kbRZPTjnFgViPrmGcUsaszEdYa8XTpq#scrollTo=DxlIqGfATvvj&line=1&uniqifier=1
    compressed_video_path = 'comp_' + file_path
    if os.path.exists(compressed_video_path):
        os.remove(compressed_video_path)
    os.system(f'ffmpeg -i {file_path} -vcodec libx264 -loglevel quiet {compressed_video_path}')
    
    # read the compressed file and embed it as a base64 data URL
    with open(compressed_video_path, 'rb') as f:
        mp4 = f.read()
    data_url = 'data:video/mp4;base64,' + b64encode(mp4).decode()
    return HTML("""
        <video width={} controls>
            <source src="{}" type="video/mp4">
        </video>
        """.format(width, data_url))


def imgs_to_video(output_dir, video_name='demo_output.mp4', fps=24):
    # Refer to: https://stackoverflow.com/questions/52414148/turn-pil-images-into-video-on-linux
    imgs = []
    for image_name in os.listdir(output_dir):
        if image_name.endswith('.jpg'):
            imgs.append(output_dir + image_name)
    imgs.sort(key=lambda img : int(img.split('/')[-1].split('.')[0]))
    pil_imgs = []
    for file in imgs:
        pil_imgs.append(PILImage.open(file))
    video_dims = (pil_imgs[0].width, pil_imgs[0].height)
    fourcc = cv2.VideoWriter_fourcc(*'DIVX')
    video = cv2.VideoWriter(video_name, fourcc, fps, video_dims)
    for img in pil_imgs:
        tmp_img = img.copy()
        video.write(cv2.cvtColor(np.array(tmp_img), cv2.COLOR_RGB2BGR))
    # release the writer so the output file is finalized
    video.release()


def video_to_imgs(video_name='demo_output.mp4', image_dir="./images/"):
    video_capture = VideoCapture(video_name)
    number = 0
    while True:
        flag, frame = video_capture.read()
        if not flag:
            break
        # frame.shape is (height, width, channels)
        h, w = frame.shape[0], frame.shape[1]
        if w % 4 != 0 or h % 4 != 0:
            # round both dimensions down to a multiple of 4;
            # cv2.resize expects the target size as (width, height)
            new_w = (w // 4) * 4
            new_h = (h // 4) * 4
            frame = cv2.resize(frame, (new_w, new_h))
        imwrite(image_dir + str(number) + '.jpg', frame)
        number += 1
    video_capture.release()
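Two details of these helpers are easy to miss: `video_to_imgs` resizes each frame so that both dimensions are multiples of 4, and `imgs_to_video` sorts frames by the integer in the file name rather than alphabetically. The sketch below isolates both steps in plain Python. The function names are hypothetical and the frame paths are made-up examples.

```python
from pathlib import Path


def round_down_to_multiple(value, multiple=4):
    """Largest multiple of `multiple` that is not greater than value."""
    return (value // multiple) * multiple


def sort_frames_numerically(paths):
    """Sort frame paths by the integer encoded in the file name (e.g. '10.jpg')."""
    return sorted(paths, key=lambda p: int(Path(p).stem))


print(round_down_to_multiple(1081), round_down_to_multiple(1920))

frames = ["./video-output/10.jpg", "./video-output/2.jpg", "./video-output/0.jpg"]
print(sorted(frames))                   # alphabetical order puts 10.jpg before 2.jpg
print(sort_frames_numerically(frames))  # numeric order keeps the frames in sequence
```

Without the numeric key, a video with more than ten frames would be reassembled out of order.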

## Demo input video
Below is a demo input video from [HLUV](https://arxiv.org/abs/2005.00463).

In [21]:
display_video("demo.mp4")
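`display_video` works by embedding the compressed file directly in the page as a base64 data URL. The following sketch shows that encoding step on its own, using a few placeholder bytes instead of a real video. The helper names `make_data_url` and `video_tag` are hypothetical.

```python
from base64 import b64encode


def make_data_url(payload, mime_type="video/mp4"):
    """Encode raw bytes as a data URL that can be used as a media `src`."""
    encoded = b64encode(payload).decode("ascii")
    return "data:{};base64,{}".format(mime_type, encoded)


def video_tag(data_url, width=512):
    """Build a minimal HTML <video> element around the data URL."""
    return (
        '<video width="{}" controls>'
        '<source src="{}" type="video/mp4">'
        "</video>"
    ).format(width, data_url)


url = make_data_url(b"abc")  # placeholder bytes, not a real video
print(url)  # data:video/mp4;base64,YWJj
print(video_tag(url, width=256))
```

Because base64 inflates the payload by about one third, this approach suits short clips like the demo better than long videos.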

## Inference with input video
The code below runs inference on the input video and supports multi-process inference.

You can set the number of processes by modifying this line:

`num_processes = 4  # specify number of processes`

If you want to test the performance of the original model, replace `model` with `generator` in the following code.
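When comparing the two models, a single run can be noisy, and the latency figures reported in the Time cost section are averages over 20 repeated experiments. A minimal sketch of such an averaging helper is shown below. The name `average_latency` is hypothetical, and `dummy_workload` is a stand-in for a function that runs the actual inference loop.

```python
import time


def average_latency(fn, repeats=20):
    """Call fn() `repeats` times and return the mean wall-clock time in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


def dummy_workload():
    """Stand-in workload; replace with a function that runs the inference loop."""
    sum(i * i for i in range(10_000))


print("mean latency: {:.6f}s".format(average_latency(dummy_workload, repeats=5)))
```

To compare models, wrap the inference loop for `generator` and for `model` in two separate functions and pass each to the helper.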

In [14]:
from pathlib import Path
import time

from bigdl.nano.pytorch import InferenceOptimizer


if __name__ == "__main__":
    input_video = "demo.mp4"
    num_processes = 4  # specify number of processes
    
    image_dir = "./video2pic/"
    output_dir = "./video-output/"
    os.makedirs(image_dir, exist_ok=True)
    os.makedirs(output_dir, exist_ok=True)

    video_to_imgs(input_video, image_dir)

    img_list = [Path(image_dir, image_name) for image_name in os.listdir(image_dir)]
    params = {'batch_size': 1}
    dataset = ImageDataset(img_list, transform=None)
    loader = DataLoader(dataset, **params)

    if num_processes > 1:
        print("{} processes is used.".format(num_processes))
        st = time.perf_counter()
        with torch.no_grad():
            # call `InferenceOptimizer.to_multi_instance` to get a multi-instance inference model
            multi_instance_model = InferenceOptimizer.to_multi_instance(model, num_processes=num_processes)
            # collect an input list
            inputs_list, names_list = [], []
            for inputs, names in loader:
                inputs = inputs.to(device, dtype)
                inputs_list.append(inputs)
                names_list.append(names)
            # run inference on the input list
            outputs_list = multi_instance_model(inputs_list)
            # handle the output list
            for outputs, names in zip(outputs_list, names_list):
                for k in range(len(outputs)):
                    write_image_tensor(outputs[k], Path(output_dir, names[k]))
                del outputs
        end = time.perf_counter()
        print("Generation costs {}s".format(end - st))
    else:
        st = time.perf_counter()
        with torch.no_grad():
            for inputs, names in tqdm(loader):
                inputs = inputs.to(device, dtype)
                # original model
                # outputs = generator(inputs)
                # accelerated model
                outputs = model(inputs)
                for k in range(len(outputs)):
                    write_image_tensor(outputs[k], Path(output_dir, names[k]))
                del outputs
        end = time.perf_counter()
        print("Generation costs {}s".format(end - st))
    imgs_to_video(output_dir, "demo_output.mp4", fps=25)


4 processes is used.
Generation costs 21.851334668695927s


OpenCV: FFMPEG: tag 0x58564944/'DIVX' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'


### Time cost

The following table shows the time taken for video stylization before and after BigDL-Nano acceleration on an [Intel® Core™ i9-12900 Processor](https://www.intel.com/content/www/us/en/products/sku/134597/intel-core-i912900-processor-30m-cache-up-to-5-10-ghz/specifications.html) with different numbers of processes. Each latency result is the average of 20 repeated experiments.

| Model | Process=1 | Process=4 |
| ----------- | ----------- | ----------- |
| Original | 207s | 169.5s |
| Quantization (int8) | 129.6s | 111.6s |
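The relative gains are easier to read as speedups over the single-process original model. The short calculation below derives them from the latencies in the table. The dictionary layout is only an illustration.

```python
# Latencies in seconds, copied from the table above.
latency_seconds = {
    ("Original", 1): 207.0,
    ("Original", 4): 169.5,
    ("Quantization (int8)", 1): 129.6,
    ("Quantization (int8)", 4): 111.6,
}

# Speedup relative to the original model running in a single process.
baseline = latency_seconds[("Original", 1)]
speedups = {key: baseline / value for key, value in latency_seconds.items()}

for (model_name, processes), speedup in speedups.items():
    print("{:<20} processes={}  speedup={:.2f}x".format(model_name, processes, speedup))
```

By this measure, int8 quantization alone gives roughly a 1.6x speedup, and combining it with 4 processes gives roughly 1.85x.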

## Demo output

Below is the output of the video stylization model. Is it cool?

In [22]:
display_video("demo_output.mp4")