# DPU example: Pneumonia Detection Model - Pnem1

This notebook shows an example of a DPU application. The application,
as well as the DPU IP, is pulled from the official 
[Vitis AI GitHub repository](https://github.com/Xilinx/Vitis-AI).
For more information, please refer to the 
[Xilinx Vitis AI page](https://www.xilinx.com/products/design-tools/vitis/vitis-ai.html).

In this notebook, we will show how to use the **Python API** to run DPU tasks.

## 1. Prepare the overlay
We will download the overlay onto the board.

In [1]:
from pynq_dpu import DpuOverlay
overlay = DpuOverlay("../dpu.bit")

The VAI package has been installed onto your board. There are multiple
binaries installed; for example, you can check the current DPU status using
`dexplorer`. You should be able to see reasonable values from the output.

!dexplorer -w

In [2]:
#!dexplorer -h

The compiled quantized model may have a different kernel name depending on the DPU architecture.
The kernel name is usually reported when the `*.elf` model file is compiled.

The `*.elf` model will be compiled into a shared object file; that file has to be copied to your 
library path for DNNDK inference.

By default, the naming convention for model files is:

* Model `*.elf` file: `dpu_<kernel_name>[_0].elf`
* Shared object `*.so` file: `libdpumodel<kernel_name>.so`

All of these steps are handled by the `load_model()` method.

In [4]:
overlay.load_model("./models/dpu_Pnem1_0.elf")
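
The naming convention listed above can be sketched in a few lines of Python. This is only an illustration: `kernel_name_from_elf` and `dnndk_file_names` are hypothetical helpers, not part of `pynq_dpu`, and the sketch assumes that everything between `dpu_` and `.elf` is the kernel name (which matches the `KERNEL_CONV` value used later in this notebook). Whether a trailing `_0` belongs to the kernel name for a given model depends on the compiler output and is not settled here.

```python
import os


def kernel_name_from_elf(elf_path):
    """Extract the kernel name from a `dpu_<kernel_name>.elf` file name (assumed convention)."""
    base = os.path.basename(elf_path)
    if not (base.startswith("dpu_") and base.endswith(".elf")):
        raise ValueError("unexpected model file name: {}".format(base))
    return base[len("dpu_"):-len(".elf")]


def dnndk_file_names(kernel_name):
    """Return the (elf, shared object) file names for a kernel under the same assumption."""
    return ("dpu_{}.elf".format(kernel_name),
            "libdpumodel{}.so".format(kernel_name))


name = kernel_name_from_elf("./models/dpu_Pnem1_0.elf")
elf_name, so_name = dnndk_file_names(name)
print(name, elf_name, so_name)
```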

## 2. Run Python program

We will use DNNDK's Python API to run DPU tasks.
In this example, we will set the number of iterations to 500, meaning 
that a single picture will be taken and classified 500 times.
Users can adjust this value if they want.

In [5]:
from ctypes import *
import cv2
import numpy as np
from dnndk import n2cube
import os
import threading
import time
from pynq_dpu import dputils  

lock = threading.Lock()

In [6]:
#*** Model for 150x150 image 

KERNEL_CONV = "Pnem1_0"  
KERNEL_CONV_INPUT = "conv2d_1_convolution"
KERNEL_FC_OUTPUT = "dense_out_MatMul" 

img_dims=150

Let's first list the sample images, pick the first one, and load the class labels.

In [8]:
from IPython.display import display
from PIL import Image
 
image_folder = "./data/samples/"    
listimage = [i for i in os.listdir(image_folder) if "NORMAL" in i or "PNEUMONIA" in i ]
print("Sample Image Count={}".format(len(listimage)))  
imfile = listimage[0]
with open("./labels.txt", "r") as f:
    lines = f.readlines()

Sample Image Count=30


We will also open and initialize the DPU device. We will create a DPU kernel and reuse it.
Throughout the entire notebook, we don't have to redo this step.

**Note**: if you open and close the DPU multiple times, the Jupyter kernel might die.
The current DNNDK implementation requires the bitstream to be downloaded by XRT,
which is not supported by the `pynq` package. Hence we encourage users to stay with
a single DPU session, both for program robustness and for higher performance.

In [9]:
n2cube.dpuOpen() 

0

In [10]:
kernel = n2cube.dpuLoadKernel(KERNEL_CONV) 

### Single execution
We define a function that will use the DPU to make a prediction on an input 
image and provide a softmax output.

In [11]:
def predict_label(imfile):
    task = n2cube.dpuCreateTask(kernel, 0)

    path = os.path.join(image_folder, imfile)
    img = cv2.imread(path) 
    img = cv2.resize(img, (img_dims, img_dims))
    img = img.astype(np.float32)
    img = (img/255.0) 
        
    """Get input Tensor"""
    tensor = n2cube.dpuGetInputTensor(task, KERNEL_CONV_INPUT)
    input_len = n2cube.dpuGetInputTensorSize(task, KERNEL_CONV_INPUT)   
    #print(input_len)
        
    """Set input tensor"""
    n2cube.dpuSetInputTensorInHWCFP32(task, KERNEL_CONV_INPUT, img, input_len)

    """Model run on DPU"""
    n2cube.dpuRunTask(task)
        
    """Get the output tensor size from FC output"""
    size = n2cube.dpuGetOutputTensorSize(task, KERNEL_FC_OUTPUT)

    """Get the output tensor channel from FC output"""
    channel = n2cube.dpuGetOutputTensorChannel(task, KERNEL_FC_OUTPUT)

    softmax = np.zeros(size,dtype=np.float32)

    """Get FC result"""
    conf = n2cube.dpuGetOutputTensorAddress(task, KERNEL_FC_OUTPUT)

    """Get output scale of FC"""
    outputScale = n2cube.dpuGetOutputTensorScale(task, KERNEL_FC_OUTPUT)

    """Run softmax"""
    softmax = n2cube.dpuRunSoftmax(conf, channel, size // channel, outputScale)
     
    #print("size=", size)
    #print("channel=", channel)
    #print("outputScale=", outputScale)
    print("softmax =", softmax)

    n2cube.dpuDestroyTask(task)
    
    return lines[np.argmax(softmax)]
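
The last step of `predict_label` hands the raw output buffer and its scale to `n2cube.dpuRunSoftmax`. Conceptually, assuming the DPU output is a vector of int8 values that become floating-point logits once multiplied by the output scale, the computation is roughly equivalent to the NumPy sketch below. `softmax_from_int8` and the sample values are made up for illustration; the real call runs inside DNNDK and its exact implementation is not shown in this notebook.

```python
import numpy as np


def softmax_from_int8(raw, scale):
    """Dequantize int8 logits with `scale`, then apply a numerically stable softmax."""
    logits = raw.astype(np.float32) * scale
    shifted = logits - logits.max()   # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()


# Hypothetical raw output for the two classes (index 0 = NORMAL, 1 = PNEUMONIA).
raw = np.array([-20, 23], dtype=np.int8)
probs = softmax_from_int8(raw, scale=0.125)
print("softmax =", probs)
print("predicted class index =", int(np.argmax(probs)))
```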

In [12]:
print("INFO: test with only 10 random X-Ray images")
sublist = listimage[0:10]
# Iterate over the 10-image subset, not the full list
for i, img in enumerate(sublist):
    print("{}  {}".format(i, img))
    label = predict_label(img)
    print('Class label: {}'.format(label))

INFO: test with only 10 random X-Ray images
0  PNEUMONIA_fe1289af-87c8-49c6-800c-1e87b2296df8.jpg
softmax = [ 0.00460957  0.99539047]
Class label: PNEUMONIA

1  NORMAL_8d3d0ba6-799d-4874-a804-75a4c0cf4ae6.jpg
softmax = [ 0.46879062  0.53120941]
Class label: PNEUMONIA

2  PNEUMONIA_849217ad-5cb2-4a87-9796-420b69722b14.jpg
softmax = [ 0.04742587  0.95257413]
Class label: PNEUMONIA

3  NORMAL_8cfcc69b-d659-4888-8551-790891d8568e.jpg
softmax = [ 0.99330717  0.00669285]
Class label: NORMAL

4  PNEUMONIA_852156be-0aab-4dc7-95d6-eda950096d0f.jpg
softmax = [ 0.1066906   0.89330941]
Class label: PNEUMONIA

5  PNEUMONIA_84032d11-9143-45a2-b3e9-955f9dc4b497.jpg
softmax = [  3.12019147e-05   9.99968827e-01]
Class label: PNEUMONIA

6  NORMAL_fc4cd37e-7bae-425d-9465-bfdd0ecf0b63.jpg
softmax = [ 0.96267313  0.03732689]
Class label: NORMAL

7  PNEUMONIA_fef7b294-017c-4491-9e57-119086855785.jpg
softmax = [  8.48110431e-05   9.99915183e-01]
Class label: PNEUMONIA

8  PNEUMONIA_85727853-cb45-478d-a517-a0

### Multiple executions
After we have verified the correctness of a single execution, we can
try multiple executions and measure the throughput in Frames Per Second (FPS).

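Let's define a function that processes a single image over multiple iterations. 
The parameters are:
* `kernel`: DPU kernel.
* `img`: image to be classified.
* `count`: the caller passes the thread index here, but the function immediately
  resets it to 0 and uses it as a per-thread iteration counter.

The total number of iterations is read from the global `num_iterations`, and the
counter advances by the global `threadnum`, so each thread runs about
`num_iterations / threadnum` iterations. Both globals are set in the cells
below, before the function is called.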

In [13]:
path = os.path.join(image_folder, imfile)
img = cv2.imread(path) 
img = cv2.resize(img, (img_dims, img_dims))
img = img.astype(np.float32)
img = (img/255.0) 

def run_dpu_task(kernel, img, count):
    task = n2cube.dpuCreateTask(kernel, 0)
    
    count = 0
    while count < num_iterations:
           
        """Get input Tensor"""
        tensor = n2cube.dpuGetInputTensor(task, KERNEL_CONV_INPUT)
        input_len = n2cube.dpuGetInputTensorSize(task, KERNEL_CONV_INPUT)
        
        """Set input tensor"""
        n2cube.dpuSetInputTensorInHWCFP32(task, KERNEL_CONV_INPUT, img, input_len)

        """Model run on DPU"""
        n2cube.dpuRunTask(task)
        
        """Get the output tensor size from FC output"""
        size = n2cube.dpuGetOutputTensorSize(task, KERNEL_FC_OUTPUT)

        """Get the output tensor channel from FC output"""
        channel = n2cube.dpuGetOutputTensorChannel(task, KERNEL_FC_OUTPUT)

        softmax = np.zeros(size,dtype=np.float32)

        """Get FC result"""
        conf = n2cube.dpuGetOutputTensorAddress(task, KERNEL_FC_OUTPUT)

        """Get output scale of FC"""
        outputScale = n2cube.dpuGetOutputTensorScale(task, KERNEL_FC_OUTPUT)

        """Run softmax"""
        softmax = n2cube.dpuRunSoftmax(conf, channel, size // channel, outputScale)

        lock.acquire()
        count = count + threadnum
        lock.release()

    n2cube.dpuDestroyTask(task)

Now we are able to run the batch processing and print out DPU throughput.
Users can change the `image_folder` to point to other picture locations.
We will reuse the previously loaded image `img` and process it
`num_iterations` times.

We start with a single thread, then repeat the measurement with 4 and 8 threads.

The following cell may take a while to run. Please be patient.

In [14]:
threadAll = []
threadnum = 1
num_iterations = 500
start = time.time()

for i in range(threadnum):
    t1 = threading.Thread(target=run_dpu_task, args=(kernel, img, i))
    threadAll.append(t1)
for x in threadAll:
    x.start()
for x in threadAll:
    x.join()

end = time.time()

fps = float(num_iterations/(end-start))
print("%.2f FPS" % fps)

111.09 FPS


In [15]:
threadAll = []
threadnum = 4
num_iterations = 500
start = time.time()

for i in range(threadnum):
    t1 = threading.Thread(target=run_dpu_task, args=(kernel, img, i))
    threadAll.append(t1)
for x in threadAll:
    x.start()
for x in threadAll:
    x.join()

end = time.time()

fps = float(num_iterations/(end-start))
print("%.2f FPS" % fps)

185.20 FPS


In [16]:
threadAll = []
threadnum = 8
num_iterations = 500
start = time.time()

for i in range(threadnum):
    t1 = threading.Thread(target=run_dpu_task, args=(kernel, img, i))
    threadAll.append(t1)
for x in threadAll:
    x.start()
for x in threadAll:
    x.join()

end = time.time()

fps = float(num_iterations/(end-start))
print("%.2f FPS" % fps)

188.35 FPS
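
The three timing cells above differ only in `threadnum`. As a sketch of the same measurement pattern, the helper below splits the work the way `run_dpu_task` does (each thread runs the ceiling of `num_iterations / threadnum` iterations) and returns the throughput. `measure_fps` and `fake_inference` are hypothetical names; the short sleep stands in for a DPU task so the block can run without a board, and its numbers say nothing about real DPU performance.

```python
import threading
import time


def measure_fps(work, num_iterations, threadnum):
    """Run `work` about `num_iterations` times across `threadnum` threads; return calls per second."""
    per_thread = -(-num_iterations // threadnum)  # ceiling division

    def worker():
        for _ in range(per_thread):
            work()

    threads = [threading.Thread(target=worker) for _ in range(threadnum)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start
    # Count the iterations that actually ran, which may slightly exceed num_iterations.
    return per_thread * threadnum / elapsed


def fake_inference():
    """Placeholder for one DPU task (assumption: roughly 1 ms per call)."""
    time.sleep(0.001)


fps_1 = measure_fps(fake_inference, 50, 1)
fps_4 = measure_fps(fake_inference, 50, 4)
print("1 thread: %.2f FPS, 4 threads: %.2f FPS" % (fps_1, fps_4))
```

In the recorded outputs above, throughput rises from 1 to 4 threads and then barely changes at 8 threads, so a helper like this makes it easy to sweep `threadnum` and find where the gain flattens.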


In [19]:
def run_tasks(kernel):
    task = n2cube.dpuCreateTask(kernel, 0)
    
    o_count = [0, 0, 0]
    p_count = [0, 0, 0]
     
    image_folder = "./data/images/"  
    listimage = [i for i in os.listdir(image_folder) if "NORMAL" in i or "PNEUMONIA" in i ] 
    num_iterations=len(listimage)
    print("Total Test Image Count={}".format(num_iterations)) 
    count = 0

    with open("./labels.txt", "r") as f:
        lines = f.readlines()
        
    print("====================== Start: Test results with 1000 X-Rays ==================")
    while count < num_iterations:
        imfile = listimage[count]
        path = os.path.join(image_folder, imfile)
        img = cv2.imread(path) 
        img = cv2.resize(img, (img_dims, img_dims))
        img = img.astype(np.float32)
        img = (img/255.0) 
        
        """Get input Tensor"""
        tensor = n2cube.dpuGetInputTensor(task, KERNEL_CONV_INPUT)
        input_len = n2cube.dpuGetInputTensorSize(task, KERNEL_CONV_INPUT)   
        #print(input_len)
        
        """Set input tensor"""
        n2cube.dpuSetInputTensorInHWCFP32(task, KERNEL_CONV_INPUT, img, input_len)

        """Model run on DPU"""
        n2cube.dpuRunTask(task)
        
        """Get the output tensor size from FC output"""
        size = n2cube.dpuGetOutputTensorSize(task, KERNEL_FC_OUTPUT)

        """Get the output tensor channel from FC output"""
        channel = n2cube.dpuGetOutputTensorChannel(task, KERNEL_FC_OUTPUT)

        softmax = np.zeros(size,dtype=np.float32)

        """Get FC result"""
        conf = n2cube.dpuGetOutputTensorAddress(task, KERNEL_FC_OUTPUT)

        """Get output scale of FC"""
        outputScale = n2cube.dpuGetOutputTensorScale(task, KERNEL_FC_OUTPUT)

        """Run softmax"""
        softmax = n2cube.dpuRunSoftmax(conf, channel, size // channel, outputScale)

        expected = False
        if imfile[0] == 'N':  
            o_count[0] = o_count[0] + 1 
        elif imfile[0] == 'P':
            o_count[1] = o_count[1] + 1  
        else:
            print("ERROR: Invalid file tag in {}".format(imfile))
        
        label = np.argmax(softmax)
        p_count[label] = p_count[label] + 1 
        slabel = lines[label]   
        count = count + 1  
    print("Actual #(NORMAL, PNEUMONIA)= ({}, {})".format(o_count[0], o_count[1]))
    print("Predic #(NORMAL, PNEUMONIA)= ({}, {})".format(p_count[0], p_count[1]))
    print("Percentage Prediction (NORMAL, PNEUMONIA)=({}, {})".format((p_count[0]*100)/o_count[0], (p_count[1]*100)/o_count[1])) 
    n2cube.dpuDestroyTask(task)

In [20]:
run_tasks(kernel)

Total Test Image Count=1000
Actual #(NORMAL, PNEUMONIA)= (500, 500)
Predic #(NORMAL, PNEUMONIA)= (519, 481)
Percentage Prediction (NORMAL, PNEUMONIA)=(103.8, 96.2)
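
The percentage printed above divides the number of predictions for a class by the number of actual images in that class, which is why it can exceed 100 (519 / 500 = 103.8). It describes how the predictions are distributed, not how many of them are correct. Measuring correctness requires comparing each prediction with its ground-truth label. The sketch below shows one way to do that with a confusion matrix; `per_class_accuracy` is a hypothetical helper and the label lists are made-up data, with 0 = NORMAL and 1 = PNEUMONIA as in the count arrays above.

```python
import numpy as np


def per_class_accuracy(actual, predicted, num_classes=2):
    """Build a confusion matrix (rows: actual, columns: predicted) and per-class accuracy."""
    confusion = np.zeros((num_classes, num_classes), dtype=int)
    for a, p in zip(actual, predicted):
        confusion[a, p] += 1
    # Fraction of each actual class that was predicted correctly.
    accuracy = confusion.diagonal() / confusion.sum(axis=1)
    return confusion, accuracy


# Made-up labels for illustration only.
actual = [0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0, 0, 0, 1, 1, 1, 0, 0]
confusion, accuracy = per_class_accuracy(actual, predicted)
print(confusion)
print("Per-class accuracy (NORMAL, PNEUMONIA) =", accuracy)
```

In `run_tasks`, the ground-truth index could be derived from the first letter of the file name and collected alongside `label` inside the loop, then passed to a helper like this after the loop ends.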


### Clean up

Finally, when you are done with the DPU experiments, remember to destroy the kernel and close the DPU.

In [21]:
n2cube.dpuDestroyKernel(kernel)
n2cube.dpuClose()

0
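
One way to make sure the clean-up always runs, even if a cell raises an exception, is to wrap the open/load and destroy/close calls in a context manager. The sketch below is an assumption about how this could be organized, not part of DNNDK or `pynq_dpu`: `dpu_kernel` is a hypothetical helper that only uses the `n2cube` calls already shown in this notebook, and `FakeN2cube` is a stand-in that records calls so the block can run without a board.

```python
from contextlib import contextmanager


@contextmanager
def dpu_kernel(n2cube, kernel_name):
    """Open the DPU, load one kernel, and always release both on exit."""
    n2cube.dpuOpen()
    kernel = None
    try:
        kernel = n2cube.dpuLoadKernel(kernel_name)
        yield kernel
    finally:
        if kernel is not None:
            n2cube.dpuDestroyKernel(kernel)
        n2cube.dpuClose()


class FakeN2cube:
    """Hypothetical stand-in for `n2cube` that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def dpuOpen(self):
        self.calls.append("open")
        return 0

    def dpuLoadKernel(self, name):
        self.calls.append("load:" + name)
        return object()

    def dpuDestroyKernel(self, kernel):
        self.calls.append("destroy")

    def dpuClose(self):
        self.calls.append("close")
        return 0


fake = FakeN2cube()
with dpu_kernel(fake, "Pnem1_0") as kernel:
    fake.calls.append("work")  # DPU tasks would be created and run here
print(fake.calls)
```

On the board, the real `n2cube` module would be passed instead of the fake, and all DPU work would sit inside a single `with` block, in line with the earlier note about staying with one DPU session.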