# Style Transfer with OpenVINO™

This notebook demonstrates style transfer with OpenVINO, using one of the [Style Transfer Models](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/fast-neural-style-mosaic-onnx) from [Open Model Zoo](https://github.com/openvinotoolkit/open_model_zoo/). The [fast-neural-style-mosaic-onnx](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/fast-neural-style-mosaic-onnx) model is designed to mix the content of one image with the style of another. It uses the method described in [Perceptual Losses for Real-Time Style Transfer and Super-Resolution](https://arxiv.org/abs/1603.08155) along with [Instance Normalization](https://arxiv.org/abs/1607.08022). The original ONNX models are provided in the [repository](https://github.com/onnx/models). The final part of this notebook shows live inference results from a webcam. You can also run inference on a video file.

> **NOTE**: To use this notebook with a webcam, you need to run the notebook on a computer with a webcam. If you run the notebook on a server, the webcam will not work. However, you can still do inference on a video.

## Imports

In [None]:
import collections
import os
import sys
import time

import cv2
import numpy as np
from IPython import display
from openvino.runtime import Core

sys.path.append("../utils")
import notebook_utils as utils

## The Model

### Download the Model

Use `omz_downloader`, which is a command-line tool from the `openvino-dev` package. It automatically creates a directory structure and downloads the selected model. This step is skipped if the model is already downloaded. The selected model comes from the public directory, which means it must be converted into OpenVINO Intermediate Representation (OpenVINO IR).

In this case you can use `"fast-neural-style-mosaic-onnx"` as a model name, and the system automatically downloads the model.

In [None]:
# A directory where the model will be downloaded.
base_model_dir = "model"

# The name of the model from Open Model Zoo
model_name = "fast-neural-style-mosaic-onnx"

download_command = f"omz_downloader " \
                   f"--name {model_name} " \
                   f"--output_dir {base_model_dir} " \
                   f"--cache_dir {base_model_dir}"
! $download_command
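
Outside a notebook, the `!` shell escape is not available. As a minimal sketch, the same call can be assembled as an argument list for `subprocess.run`. The helper name `build_download_command` is hypothetical; the flags are the ones used in the cell above, and the actual download is left commented out.

```python
import shlex
import subprocess


def build_download_command(model_name, base_model_dir):
    """Return the omz_downloader call as an argument list (no shell quoting needed)."""
    return [
        "omz_downloader",
        "--name", model_name,
        "--output_dir", base_model_dir,
        "--cache_dir", base_model_dir,
    ]


command = build_download_command("fast-neural-style-mosaic-onnx", "model")
# Print the command in a copy-pasteable form.
print(shlex.join(command))

# Uncomment to run the download (assumes `omz_downloader` is on PATH):
# subprocess.run(command, check=True)
```

Passing a list instead of a single string avoids quoting problems if the directory name contains spaces.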

### Convert the Model

The pre-trained model is in ONNX format. To use it with OpenVINO, convert it to OpenVINO IR format with Model Converter (`omz_converter`), another command-line tool from the `openvino-dev` package. If you do not specify a precision, the model is converted once for each available precision (`FP32` and `FP16` in this case). Each conversion should take up to 2 minutes. If the model has already been converted, this step is skipped.

In [None]:
precision = "FP16"

# The output path for the conversion (inside the download directory).
converted_model_path = f"{base_model_dir}/public/{model_name}/{precision}/{model_name}.xml"

if not os.path.exists(converted_model_path):
    convert_command = f"omz_converter " \
                      f"--name {model_name} " \
                      f"--download_dir {base_model_dir} " \
                      f"--precisions {precision}"
    ! $convert_command

### Load the Model

Downloaded models are located in a fixed structure, which indicates a vendor (intel or public), the name of the model and a precision.
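
As a rough illustration of that layout, the sketch below builds the expected IR file paths with `pathlib`. The helper name `ir_paths` is hypothetical, and the directory order (base directory, vendor, model name, precision) is inferred from the conversion path used earlier in this notebook.

```python
from pathlib import Path


def ir_paths(base_dir, vendor, model_name, precision):
    """Return the expected (.xml, .bin) paths of a converted model."""
    xml_path = Path(base_dir) / vendor / model_name / precision / f"{model_name}.xml"
    # The weights file sits next to the .xml file and shares its stem.
    return xml_path, xml_path.with_suffix(".bin")


xml_path, bin_path = ir_paths("model", "public", "fast-neural-style-mosaic-onnx", "FP16")
print(xml_path.as_posix())
print(bin_path.as_posix())
```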

Only a few lines of code are required to run the model. First, initialize OpenVINO Runtime. Then, read the network architecture and model weights from the `.xml` and `.bin` files, and compile the model for the desired device. If you choose `GPU`, expect a longer wait, because its startup time is much longer than that of `CPU`.

You can also let OpenVINO decide which hardware offers the best performance by using `AUTO`. In most cases, the best hardware is `GPU` (better performance, but longer startup time).
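
If you prefer to make the device choice explicit, one option is to check a preference list against the devices that are actually present. The sketch below uses a hypothetical helper, `choose_device`, that works on a plain list of device names; in a real session, that list could come from the `available_devices` property of `Core`. It only illustrates the idea and does not describe how `AUTO` selects a device internally.

```python
def choose_device(available, preferred=("GPU", "CPU"), fallback="AUTO"):
    """Return the first preferred device that is present, else the fallback."""
    for name in preferred:
        # Device names may carry a suffix such as "GPU.0", so compare the prefix too.
        if any(dev == name or dev.startswith(name + ".") for dev in available):
            return name
    return fallback


print(choose_device(["CPU", "GPU.0"]))  # GPU is present, so it is preferred
print(choose_device(["CPU"]))           # only CPU is present
print(choose_device([]))                # nothing known, fall back to AUTO
```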

In [None]:
# Initialize OpenVINO Runtime.
ie_core = Core()
# Read the network and corresponding weights from a file.
model = ie_core.read_model(model=converted_model_path)
# Compile the model for a specific device (for example, CPU, GPU or MYRIAD)
# or let the engine choose the best available device (AUTO).
compiled_model = ie_core.compile_model(model=model, device_name="AUTO")

# Get the input and output nodes.
input_layer = compiled_model.input(0)
output_layer = compiled_model.output(0)

The input and output layers carry the names of the input and output nodes, respectively. The `fast-neural-style-mosaic-onnx` model has one input and one output, each with shape `(1, 3, 224, 224)`.

In [None]:
print(input_layer.any_name, output_layer.any_name)
print(input_layer.shape)
print(output_layer.shape)

# Get the input size.
N, C, H, W = list(input_layer.shape)

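### Preprocess the Image

Before running the model, convert each frame to the form the network expects: swap its color channels, resize it to the model input size, and reorder it from `HWC` to `NCHW` layout.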

In [None]:
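# Preprocess the input image.
def preprocess_images(frame, H, W):
    """Convert a frame to a float32 NCHW tensor of spatial size H x W."""
    image = np.array(frame).astype('float32')
    # Swap the first and third color channels.
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    # cv2.resize expects dsize as (width, height).
    image = cv2.resize(src=image, dsize=(W, H), interpolation=cv2.INTER_AREA)
    # Reorder HWC -> CHW, then add the batch dimension.
    image = np.transpose(image, [2, 0, 1])
    image = np.expand_dims(image, axis=0)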
    return image
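
To see the layout change in isolation, here is a small NumPy-only sketch on a synthetic frame. It assumes the frame already has the model's spatial size, so the OpenCV resize is left out, and it reverses the channel axis with slicing instead of `cv2.cvtColor`.

```python
import numpy as np

# A synthetic 224x224 frame in HWC layout (height, width, channels).
frame = np.zeros((224, 224, 3), dtype=np.uint8)
frame[..., 0] = 10  # first channel
frame[..., 2] = 30  # third channel

image = frame.astype("float32")
image = image[..., ::-1]                # reverse the channel order
image = np.transpose(image, [2, 0, 1])  # HWC -> CHW
image = np.expand_dims(image, axis=0)   # CHW -> NCHW with batch size 1

print(image.shape, image.dtype)
```

After these steps, the former third channel sits at index 0 of the channel axis, and the array shape matches the model input.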

### Postprocess the Stylized Image

The converted IR model outputs a NumPy `float32` array of shape `[1, 3, 224, 224]`.

In [None]:
# Postprocess the result.
def convert_result_to_image(frame, stylized_image) -> np.ndarray:
    """
    Resize the stylized image to the size of the original frame and
    convert the floating-point network output to an 8-bit integer image.
    """
    h, w = frame.shape[:2]
    stylized_image = stylized_image.squeeze().transpose(1, 2, 0)
    stylized_image = cv2.resize(src=stylized_image, dsize=(w, h), interpolation=cv2.INTER_AREA)
    stylized_image = np.clip(stylized_image, 0, 255).astype(np.uint8)
    return stylized_image
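
The clipping step matters because the network output is not guaranteed to stay within the displayable range. A minimal sketch with made-up values (not real model output) shows the effect of clipping before the cast:

```python
import numpy as np

# Hypothetical raw outputs: below 0, in range, and above 255.
raw = np.array([-12.5, 127.6, 300.0], dtype=np.float32)

# Clip first, then cast: the cast truncates the fractional part.
pixels = np.clip(raw, 0, 255).astype(np.uint8)
print(pixels.tolist())  # [0, 127, 255]
```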

### Main Processing Function

The style transfer function can run on different sources: either a webcam or a video file.

In [None]:
def run_style_transfer(source=0, flip=True, use_popup=False, skip_first_frames=0):
    """
    Main function to run the style transfer inference:
    1. Create a video player to play with target fps (utils.VideoPlayer).
    2. Preprocess each frame for the style transfer model.
    3. Run AI inference for style transfer.
    4. Visualize the results.
    Parameters:
        source: The webcam number to feed the video stream with primary webcam set to "0", or the video path.
        flip: To be used by VideoPlayer function for flipping capture image.
        use_popup: False for showing encoded frames over this notebook, True for creating a popup window.
        skip_first_frames: Number of frames to skip at the beginning of the video.
    """
    # Create a video player to play with target fps.
    player = None
    try:
        player = utils.VideoPlayer(source=source, flip=flip, fps=30, skip_first_frames=skip_first_frames)
        # Start video capturing.
        player.start()
        if use_popup:
            title = "Press ESC to Exit"
            cv2.namedWindow(winname=title, flags=cv2.WINDOW_GUI_NORMAL | cv2.WINDOW_AUTOSIZE)

        processing_times = collections.deque()
        while True:
            # Grab the frame.
            frame = player.next()
            if frame is None:
                print("Source ended")
                break
            # If the longest side of the frame exceeds 1280 pixels, reduce size to improve the performance.
            scale = 1280 / max(frame.shape)
            if scale < 1:
                frame = cv2.resize(src=frame, dsize=None, fx=scale, fy=scale,
                                   interpolation=cv2.INTER_AREA)
            # Preprocess the input image.
            image = preprocess_images(frame, H, W)
           
            # Measure processing time for the input image.
            start_time = time.time()
            # Perform the inference step.
            stylized_image = compiled_model([image])[output_layer]
            stop_time = time.time()

            # Postprocessing for stylized image.
            result_image = convert_result_to_image(frame, stylized_image)

            processing_times.append(stop_time - start_time)
            # Use processing times from last 200 frames.
            if len(processing_times) > 200:
                processing_times.popleft()
            processing_time_det = np.mean(processing_times) * 1000

            # Visualize the results.
            f_height, f_width = frame.shape[:2]
            fps = 1000 / processing_time_det
            cv2.putText(result_image, text=f"Inference time: {processing_time_det:.1f}ms ({fps:.1f} FPS)",
                        org=(20, 40), fontFace=cv2.FONT_HERSHEY_COMPLEX, fontScale=f_width / 1000,
                        color=(0, 0, 255), thickness=1, lineType=cv2.LINE_AA)
            
            # Use this workaround if there is flickering.
            if use_popup:
                cv2.imshow(title, result_image)
                key = cv2.waitKey(1)
                # escape = 27
                if key == 27:
                    break
            else:
                # Encode numpy array to jpg.
                _, encoded_img = cv2.imencode(".jpg", result_image, params=[cv2.IMWRITE_JPEG_QUALITY, 90])
                # Create an IPython image.
                i = display.Image(data=encoded_img)
                # Display the image in this notebook.
                display.clear_output(wait=True)
                display.display(i)
    # ctrl-c
    except KeyboardInterrupt:
        print("Interrupted")
    # any different error
    except RuntimeError as e:
        print(e)
    finally:
        if player is not None:
            # Stop capturing.
            player.stop()
        if use_popup:
            cv2.destroyAllWindows()
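
Two pieces of bookkeeping in the loop above can be looked at on their own: the resize factor for large frames and the rolling average of inference times. The sketch below is a standalone illustration with hypothetical names (`downscale_factor`, `RollingTimer`); it uses a bounded `deque` in place of the manual `popleft` call, which is a small variation on the notebook code rather than the same implementation.

```python
import collections


def downscale_factor(height, width, max_side=1280):
    """Return the resize factor that caps the longest side at max_side (1.0 if no resize is needed)."""
    return min(1.0, max_side / max(height, width))


class RollingTimer:
    """Keep the most recent inference durations and report their average."""

    def __init__(self, window=200):
        # A bounded deque drops the oldest sample automatically.
        self.samples = collections.deque(maxlen=window)

    def add(self, seconds):
        self.samples.append(seconds)

    def mean_ms(self):
        return sum(self.samples) / len(self.samples) * 1000

    def fps(self):
        return 1000 / self.mean_ms()


timer = RollingTimer(window=3)
for duration in (0.5, 0.02, 0.02, 0.02):  # the first, slow sample falls out of the window
    timer.add(duration)

print(f"{timer.mean_ms():.1f} ms, {timer.fps():.1f} FPS")
print(downscale_factor(1080, 1920))
```

Averaging over a window smooths out one-off spikes, such as a slow first inference, so the displayed FPS settles quickly.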

### Run Style Transfer Using a Webcam

Now, try to see yourself in your webcam. By default, the primary webcam is set with `source=0`. If you have multiple webcams, each one will be assigned a consecutive number starting at 0. Set `flip=True` when using a front-facing camera. Some web browsers, especially Mozilla Firefox, may cause flickering. If you experience flickering, set `use_popup=True`.

> **NOTE**: To use a webcam, you must run this Jupyter notebook on a computer with a webcam. If you run on a server, the webcam will not work. However, you can still do inference on a video file in the final step.

In [None]:
run_style_transfer(source=0, flip=True, use_popup=False)

### Run Style Transfer on a Video File

Find out how the model works on a video file. [Any format supported](https://docs.opencv.org/4.5.1/dd/d43/tutorial_py_video_display.html) by OpenCV will work. You can press the stop button at any time while the video file is running to interrupt processing.

> **NOTE**: Sometimes, the video can be cut off if it contains corrupted frames. If you experience any problems with your video, convert it with [HandBrake](https://handbrake.fr/) and select the MPEG format.

In [None]:
video_file = "https://github.com/intel-iot-devkit/sample-videos/blob/b8e3425998213e0b4957c20f0ed5f83411f7a802/driver-action-recognition.mp4?raw=true"
run_style_transfer(source=video_file, flip=True, use_popup=False)

## References

1. [fast-neural-style-mosaic-onnx](https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/fast-neural-style-mosaic-onnx/README.md)
2. [Open Model Zoo](https://github.com/openvinotoolkit/open_model_zoo/)