# AI Upscaling with Intel® NPU

## Background

### Purpose

In this code sample, we will use the Intel® AI Boost Neural Processing Unit (NPU) to accelerate AI upscaling. We will use `OpenVINO` to upscale images and video with the `BSRGAN` model.

### What is AI Upscaling?

**AI Upscaling** is a technique that uses machine learning models to *upscale images and videos from lower quality to higher quality*. It is also referred to as "Super-Resolution". AI upscaling models are usually trained on a particular kind of content so that they learn the features specific to it. For example, many models are trained on anime images, while others are trained on realistic images. Many of these models are implementations of GANs (Generative Adversarial Networks), a type of deep learning model used to generate new data.

AI upscaling is used in applications such as video streaming, image editing, and gaming. For example, it can improve the quality of video streamed over a slow internet connection, restore old photos and videos (such as footage from the 1990s or 2000s) so they look like new, and improve the graphics of video games.

References:
1. [Open Model DB](https://openmodeldb.info/docs/faq)
2. [ESRGAN](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136780563.pdf)

## Imports

Importing everything takes a couple of seconds. Let's start by importing `OpenVINO` and `nncf` (for model compression).

We also created some utility functions to help us load the model, pre/post-process the images, and more. Let's import them!

In [None]:
# Other imports
import cv2
import openvino as ov
import plotly.graph_objects as go
import plotly.io as pio
import torch
from IPython.display import HTML, display
from nncf import CompressWeightsMode, compress_weights
from plotly.subplots import make_subplots
from torchinfo import summary
from tqdm.notebook import tqdm

In [None]:
# Model imports
from bsrgan_helper import BSRGAN

# Pre/Post processing imports
from bsrgan_utils import imread_uint
from sample_utils import preprocess, postprocess

# Time imports
from sample_utils import time_execution

# Video imports
from sample_utils import collect_all_frames, write_all_frames, resize_video

# Misc imports
from sample_utils import download_file

In [None]:
# Tells Plotly to render plots in the notebook setting
pio.renderers.default = "notebook"

## Define and load the model

The AI Upscaling model we will be using for this code sample is the [BSRGAN](https://github.com/cszn/BSRGAN) model. For this sample, we will be using the 4x upscaling model.

In [None]:
scaling_factor = 4  # should be 2 for "kadirnar/BSRGANx2" and 4 for all other models

In [None]:
# Model choice: "kadirnar/bsrgan", "kadirnar/BSRGANx2", "kadirnar/RRDB_PSNR_x4",
#               "kadirnar/RRDB_ESRGAN_x4", "kadirnar/DF2K", "kadirnar/DPED", "kadirnar/DF2K_JPEG"

device = torch.device("xpu" if torch.xpu.is_available() else "cpu")
cpu_model = BSRGAN("kadirnar/bsrgan", device=device, hf_model=True).model

In [None]:
# Use torchinfo to get a summary of the model
summary(cpu_model, input_size=(1, 3, 90, 160))

## Running the Model

### Downloading Test Image

In these cells, we will load an image, preprocess it, and upscale it using the BSRGAN model. We halve the image's width and height to speed up inference.

In [None]:
image_url = "https://storage.openvinotoolkit.org/data/test_data/images/dog.jpg"
img_path = "input.jpg"
download_file(image_url, img_path)

In [None]:
img = imread_uint(img_path)
img = cv2.resize(img, (img.shape[1] // 2, img.shape[0] // 2))
width, height = img.shape[1], img.shape[0]
tensor_img = preprocess(img)

### Convert and compile the model for NPU

Now that we have loaded the model, we need to make it run on the NPU. To do this, we use `ov.convert_model` to convert the PyTorch model into OpenVINO's Intermediate Representation (IR), which can run on the CPU, GPU, or NPU, and then `core.compile_model` to compile it for the NPU.

Currently, [the NPU only supports static shapes](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/npu-device.html#limitations), so we need to specify the input shape of the model.
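Because the shape is fixed at conversion time, it helps to derive it from the image itself. The sketch below assumes two common conventions: OpenCV-style image arrays are laid out as (height, width, channels), and the model expects (batch, channels, height, width). The helper names are hypothetical and are not part of this sample's utilities.

```python
def static_nchw_shape(image_shape, batch=1):
    """Return the [N, C, H, W] shape for an image stored as (H, W, C)."""
    height, width, channels = image_shape
    return [batch, channels, height, width]


def upscaled_nchw_shape(nchw_shape, scaling_factor):
    """Return the output shape after scaling H and W by scaling_factor."""
    batch, channels, height, width = nchw_shape
    return [batch, channels, height * scaling_factor, width * scaling_factor]


# Hypothetical 160x90 (width x height) RGB frame, stored as (90, 160, 3)
input_shape = static_nchw_shape((90, 160, 3))
output_shape = upscaled_nchw_shape(input_shape, scaling_factor=4)
print(input_shape)   # [1, 3, 90, 160]
print(output_shape)  # [1, 3, 360, 640]
```

Note that the height comes before the width in the tensor shape, which is the opposite of the (width, height) order that `cv2.resize` takes.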

<div class="alert alert-warning">
The following code cells will take around 5 minutes to complete. Please be patient!
</div>

In [None]:
ov_model = ov.convert_model(
    cpu_model,
    # PyTorch image tensors are laid out as (batch, channels, height, width)
    input=[1, 3, height, width],
    example_input=torch.randn(1, 3, height, width),
)

In [None]:
compressed_model = compress_weights(ov_model, mode=CompressWeightsMode.INT4_SYM)

In [None]:
core = ov.Core()
compiled_model = core.compile_model(compressed_model, device_name="NPU")

In [None]:
input_layer = compiled_model.input(0)
output_layer = compiled_model.output(0)

In [None]:
# Run the NPU model
npu_result = compiled_model([tensor_img])[output_layer]

In [None]:
# Run the CPU model (no_grad skips gradient tracking, which inference does not need)
with torch.no_grad():
    cpu_result = cpu_model(tensor_img)

A single run of each model is not enough to compare their speed.<br>
Let's use the `time_execution` utility to run both models multiple times and compare the execution times.
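The implementation of `time_execution` lives in `sample_utils` and is not shown here. As a rough idea of what such a helper could look like, the sketch below uses the standard library's `timeit.repeat` and `statistics` modules. The function name `time_execution_sketch` and its internals are assumptions; only the `number`/`repeat` arguments and the three return values mirror how the utility is called in this notebook.

```python
import statistics
import timeit


def time_execution_sketch(func, number=1, repeat=10):
    """Time func() and return (per-call times, mean, standard deviation)."""
    # timeit.repeat returns one total per repetition, each covering `number` calls
    totals = timeit.repeat(func, number=number, repeat=repeat)
    per_call = [total / number for total in totals]
    mean = statistics.mean(per_call)
    # The sample standard deviation needs at least two measurements
    std = statistics.stdev(per_call) if len(per_call) > 1 else 0.0
    return per_call, mean, std


# Stand-in workload; in the notebook this would be the model call
times, mean, std = time_execution_sketch(lambda: sum(range(10_000)), number=1, repeat=5)
print(f"{mean:.6f}s +/- {std:.6f}s over {len(times)} runs")
```

Repeating the measurement matters because the first call to a model may include one-time initialization, which a single run cannot separate from the steady-state inference time.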

In [None]:
npu_execution_times, npu_mean, npu_std = time_execution(
    lambda: compiled_model([tensor_img])[output_layer], number=1, repeat=10
)

In [None]:
cpu_execution_times, cpu_mean, cpu_std = time_execution(lambda: cpu_model(tensor_img), number=1, repeat=10)

In [None]:
# Use plotly to plot violin plot of the execution times
fig = go.Figure()
fig.add_trace(go.Violin(y=npu_execution_times, name="NPU", box_visible=True, meanline_visible=True))
fig.add_trace(go.Violin(y=cpu_execution_times, name="CPU", box_visible=True, meanline_visible=True))

fig.update_layout(title="Execution Times (lower is better)", yaxis_title="Time (s)")
fig.show()

In [None]:
print(f"The NPU is {cpu_mean / npu_mean:.2f}x faster than the CPU model")

## Visualizing the Result

Now that we have run the model, we need to convert the tensor result back into a format that we can use to visualize the image. We have a utility function for this sourced from the [BSRGAN](https://github.com/cszn/BSRGAN) repo.
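The `postprocess` utility itself is not shown in this notebook. As an illustration of what such a conversion typically involves, the sketch below assumes the model output is a channels-first float array with values nominally in [0, 1]. The function name `postprocess_sketch` is hypothetical, and the real helper may differ in its details.

```python
import numpy as np


def postprocess_sketch(result):
    """Convert a (1, C, H, W) or (C, H, W) float array in [0, 1] to an (H, W, C) uint8 image."""
    array = np.asarray(result, dtype=np.float32)
    if array.ndim == 4:
        array = array[0]  # drop the batch dimension
    array = np.clip(array, 0.0, 1.0)  # keep values inside the displayable range
    array = np.transpose(array, (1, 2, 0))  # channels-first -> channels-last
    return np.round(array * 255.0).astype(np.uint8)


# Hypothetical 2x4 model output with values slightly outside [0, 1]
fake_output = np.linspace(-0.2, 1.2, num=1 * 3 * 2 * 4).reshape(1, 3, 2, 4)
image = postprocess_sketch(fake_output)
print(image.shape, image.dtype)  # (2, 4, 3) uint8
```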

In [None]:
npu_output = postprocess(npu_result)
cpu_output = postprocess(cpu_result)

Let's visualize the original image, the CPU output, and the NPU output to see the difference!

In [None]:
fig = make_subplots(rows=1, cols=3, subplot_titles=("Original Image", "CPU Output", "NPU Output"))

fig.add_trace(go.Image(z=img), row=1, col=1)
fig.add_trace(go.Image(z=cpu_output), row=1, col=2)
fig.add_trace(go.Image(z=npu_output), row=1, col=3)

fig.update_layout(showlegend=False)
fig.show()

## AI Upscaling on a video

Now that we have upscaled a single image, we can go a step further and use the NPU to upscale an entire video! Let's download a test video and start upscaling it!

In [None]:
input_video_url = "https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/video/Coco%20Walking%20in%20Berkeley.mp4"
input_video = "benchmark_video.mp4"
output_video = "upscaled_video.mp4"

In [None]:
download_file(input_video_url, input_video)
resize_video(input_video, scale=2)

### Opening the video file and collecting frames

In [None]:
original_video = cv2.VideoCapture(input_video)
original_frames = collect_all_frames(original_video)
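# Frames are (rows, cols, channels): here `width` holds the row count and `height` the column count,
# so the [1, 3, width, height] shape used below matches the (batch, channels, rows, cols) tensor layout
width, height = original_frames[0].shape[0], original_frames[0].shape[1]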

### Compile model for NPU for video

Because the NPU only accepts static shapes, the model must be converted and compiled again with the input shape of the video frames. As before, we use `ov.convert_model` and `core.compile_model`.

<div class="alert alert-warning">
The following code cells will take around 10 minutes to complete. Please be patient!
</div>

In [None]:
ov_model = ov.convert_model(
    cpu_model,
    input=[1, 3, width, height],
    example_input=torch.randn(1, 3, width, height),
)

In [None]:
compressed_model = compress_weights(ov_model, mode=CompressWeightsMode.INT4_SYM)

In [None]:
core = ov.Core()
compiled_model = core.compile_model(compressed_model, device_name="NPU")

In [None]:
input_layer = compiled_model.input(0)
output_layer = compiled_model.output(0)

### Running NPU model per frame

Now that we have the video downloaded, we can load the video and AI upscale each frame.
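The cells below submit frames through `ov.AsyncInferQueue`. Asynchronous requests can finish in a different order than they were submitted, so it is worth tagging each submission with its frame index and writing the result into a pre-allocated slot. The sketch below illustrates that pattern with Python's `concurrent.futures` as a stand-in for the inference queue. It does not use OpenVINO's API, and `fake_upscale` is a hypothetical placeholder for the model call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_upscale(index, frame):
    # Earlier frames sleep longer, so they tend to finish after later ones
    time.sleep(0.05 * (3 - index))
    return index, frame.upper()


frames = ["a", "b", "c"]
results = [None] * len(frames)  # one slot per frame, filled by index
completion_order = []


def on_done(future):
    index, upscaled = future.result()
    completion_order.append(index)
    results[index] = upscaled


with ThreadPoolExecutor(max_workers=3) as pool:
    for index, frame in enumerate(frames):
        pool.submit(fake_upscale, index, frame).add_done_callback(on_done)

print(results)  # ['A', 'B', 'C'] regardless of completion order
```

Appending results in the callback instead would store them in completion order, which could shuffle the frames of the output video.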

<div class="alert alert-warning">
The following code cells will take around 5 minutes to complete. Please be patient!
</div>

In [None]:
def callback(infer_request, userdata):
    # userdata carries the frame index so results are stored in submission order,
    # even if the asynchronous requests complete out of order
    index, pbar, postprocessed_frames = userdata
    res = infer_request.get_output_tensor(0).data[0]
    postprocessed_frames[index] = postprocess(res)
    pbar.update(1)

In [None]:
infer_queue = ov.AsyncInferQueue(compiled_model)
infer_queue.set_callback(callback)
pbar = tqdm(total=len(original_frames), desc="Inferencing frames")

# Pre-allocate one slot per frame; each callback fills in its own slot
postprocessed_frames = [None] * len(original_frames)
for index, frame in enumerate(original_frames):
    new_frame = preprocess(frame)
    infer_queue.start_async(inputs={input_layer.any_name: new_frame}, userdata=(index, pbar, postprocessed_frames))

infer_queue.wait_all()
pbar.close()

In [None]:
# Create the output file having the same properties as the original video file
frame_width = int(original_video.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(original_video.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = original_video.get(cv2.CAP_PROP_FPS)
upscaled_video = cv2.VideoWriter(
    output_video,
    cv2.VideoWriter_fourcc(*"X264"),
    fps,
    (frame_width * scaling_factor, frame_height * scaling_factor),
)

In [None]:
write_all_frames(postprocessed_frames, upscaled_video)

In [None]:
original_video.release()
upscaled_video.release()

### Visualize the upscaled video

In [None]:
html = f"""
<div style="display: flex; justify-content: space-around;">
    <video width="25%" controls autoplay muted>
        <source src="{input_video}" type="video/mp4">
    </video>
    <video width="25%" controls autoplay muted>
        <source src="{output_video}" type="video/mp4">
    </video>
</div>
"""

display(HTML(html))

## Why is this important?
At the time of this sample's release, AI upscaling software has mostly targeted discrete GPUs and has limited support for built-in hardware. This sample shows how Intel® NPUs can be used for the AI upscaling task. We hope it will inspire and enable developers to leverage Intel® NPUs for AI upscaling. Many projects can benefit from this technology. If you are interested in applying what you learned from this code sample to help open-source projects, consider the following:

1. Contribute to [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) by adding support for Intel® NPUs
2. Contribute to [chaiNNer](https://github.com/chaiNNer-org/chaiNNer) by adding support for Intel® NPUs
3. Contribute to projects like [VLC](https://www.videolan.org/contribute.html), [GIMP](https://gitlab.gnome.org/GNOME/gimp), and [OBS Studio](https://github.com/obsproject/obs-studio) by adding support for AI upscaling on Intel® NPUs.

We hope that the Intel® AI PC democratizes AI Upscaling and makes it more accessible to developers and users worldwide.