# DPU example: Pneumonia/COVID Detection Model - Pnem3

This notebook shows an example of a DPU application. The application,
as well as the DPU IP, is pulled from the official 
[Vitis AI GitHub Repository](https://github.com/Xilinx/Vitis-AI).
For more information, please refer to the 
[Xilinx Vitis AI page](https://www.xilinx.com/products/design-tools/vitis/vitis-ai.html).

In this notebook, we will show how to use the **Python API** to run DPU tasks.

## 1. Prepare the overlay
We will download the overlay onto the board.

In [1]:
from pynq_dpu import DpuOverlay
overlay = DpuOverlay("../dpu.bit")

The VAI package has been installed onto your board. There are multiple
binaries installed; for example, you can check the current DPU status using
`dexplorer`. You should be able to see reasonable values from the output.

!dexplorer -w

In [2]:
#!dexplorer -h

The compiled quantized model may have different kernel names depending on the DPU architecture.
The kernel name is usually reported when the `*.elf` model file is compiled.

The `*.elf` model will be compiled into a shared object file; that file has to be copied to your 
library path for inference in DPU.

By default, the naming convention for model files is:

* Model `*.elf` file: `dpu_<kernel_name>[_0].elf`
* Shared object `*.so` file: `libdpumodel<kernel_name>.so`

All of these steps are handled by the `load_model()` method.

In [3]:
overlay.load_model("./models/dpu_Pnem3_0.elf")
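
The naming convention above can be illustrated with a small sketch. The helpers `kernel_name_from_elf` and `shared_object_name` are hypothetical names introduced here for illustration; they are not part of `pynq_dpu`, and this is not how `load_model()` is implemented. The sketch assumes the kernel name is everything between the `dpu_` prefix and the `.elf` extension, which matches the `Pnem3_0` kernel name used later in this notebook.

```python
import os


def kernel_name_from_elf(elf_path):
    """Return the kernel name encoded in a `dpu_<kernel_name>.elf` file name."""
    base = os.path.basename(elf_path)
    stem, ext = os.path.splitext(base)
    if ext != ".elf" or not stem.startswith("dpu_"):
        raise ValueError("expected a file named dpu_<kernel_name>.elf, got %r" % base)
    # Assumption: any trailing "_0" is treated as part of the kernel name.
    return stem[len("dpu_"):]


def shared_object_name(kernel_name):
    """Return the shared object file name that the convention implies."""
    return "libdpumodel{}.so".format(kernel_name)


name = kernel_name_from_elf("./models/dpu_Pnem3_0.elf")
print(name, shared_object_name(name))
```

For the model loaded above, this prints `Pnem3_0 libdpumodelPnem3_0.so`.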

## 2. Run Python program

We will use Vitis-AI's Python API to run DPU tasks.
In this example, we will set the number of iterations to 500, meaning 
that a single picture will be taken and classified 500 times.
Users can adjust this value if they want.

In [4]:
from ctypes import *
import cv2
import numpy as np
from dnndk import n2cube
import os
import threading
import time
from pynq_dpu import dputils  

lock = threading.Lock()

In [5]:
#*** Pneumonia and COVID detection Model with 150x150 X-Ray image input

KERNEL_CONV = "Pnem3_0"  
KERNEL_CONV_INPUT = "conv2d_1_convolution"
KERNEL_FC_OUTPUT = "dense_out_MatMul" 

img_dims=150

The `./data/` folder contains two subfolders of X-ray images: 1) `samples`, a small sample set (nominally 30 images; the listing below counts 53 directory entries) and 2) `images`, with 1100 images.

In [6]:
from IPython.display import display
from PIL import Image
 
image_folder = "./data/samples/"    
listimage = [i for i in os.listdir(image_folder)]
print("Sample Image Count={}".format(len(listimage)))  
imfile = listimage[0]
with open("./labels.txt", "r") as f:
    lines = f.readlines()

Sample Image Count=53
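
`os.listdir` returns every entry in the folder, so the count above can include files that are not images. A minimal sketch of a stricter listing follows. The helper names and the extension list are assumptions made for illustration (the extensions are taken from the file names that appear in the outputs of this notebook), and the class tag is assumed to be the text before the first underscore in each file name.

```python
import os
from collections import Counter

# Assumption: these extensions cover the X-ray files in ./data/.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def list_images(folder):
    """Return the sorted image file names in `folder`, skipping other entries."""
    return sorted(
        name
        for name in os.listdir(folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)
        and os.path.isfile(os.path.join(folder, name))
    )


def count_by_class(names):
    """Count file names by class tag, e.g. 'COVID_x.jpg' counts as 'COVID'."""
    return Counter(name.split("_", 1)[0] for name in names)


# File names copied from the sample output later in this notebook.
demo = [
    "COVID_COVID-00027.jpg",
    "NORMAL_8cfcc69b-d659-4888-8551-790891d8568e.jpg",
    "PNEUMONIA_849217ad-5cb2-4a87-9796-420b69722b14.jpg",
    "COVID_covid-19-pneumonia-8.jpg",
]
print(dict(count_by_class(demo)))
```

On the board, `count_by_class(list_images("./data/samples/"))` would give the per-class breakdown of the sample folder.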


We will also open and initialize the DPU device. We will create a DPU kernel and reuse it.
Throughout the entire notebook, we don't have to redo this step.

**Note**: if you open and close the DPU multiple times, the Jupyter kernel might die.
This is because the current Vitis-AI implementation requires the bitstream to be downloaded by XRT,
which the `pynq` package does not support. Hence we encourage users to stay with
a single DPU session, both for program robustness and for higher performance.

In [7]:
n2cube.dpuOpen() 

0

In [8]:
kernel = n2cube.dpuLoadKernel(KERNEL_CONV) 

### Single execution
We define a function that uses the DPU to make a prediction on an input 
image, prints the softmax output, and returns the predicted label.

In [9]:
def predict_label(imfile):
    task = n2cube.dpuCreateTask(kernel, 0)

    path = os.path.join(image_folder, imfile)
    img = cv2.imread(path) 
    img = cv2.resize(img, (img_dims, img_dims))
    img = img.astype(np.float32)
    img = (img/255.0) 
        
    """Get input Tensor"""
    tensor = n2cube.dpuGetInputTensor(task, KERNEL_CONV_INPUT)
    input_len = n2cube.dpuGetInputTensorSize(task, KERNEL_CONV_INPUT)   
    #print(input_len)
        
    """Set input Tensor"""
    n2cube.dpuSetInputTensorInHWCFP32(task, KERNEL_CONV_INPUT, img, input_len)

    """Model run on DPU"""
    n2cube.dpuRunTask(task)
        
    """Get the output Tensor size from FC output"""
    size = n2cube.dpuGetOutputTensorSize(task, KERNEL_FC_OUTPUT)

    """Get the output Tensor channel from FC output"""
    channel = n2cube.dpuGetOutputTensorChannel(task, KERNEL_FC_OUTPUT)

    softmax = np.zeros(size,dtype=np.float32)

    """Get FC result"""
    conf = n2cube.dpuGetOutputTensorAddress(task, KERNEL_FC_OUTPUT)

    """Get output scale of FC"""
    outputScale = n2cube.dpuGetOutputTensorScale(task, KERNEL_FC_OUTPUT)

    """Run softmax"""
    softmax = n2cube.dpuRunSoftmax(conf, channel, size // channel, outputScale)
     
    #print("size=", size)
    #print("channel=", channel)
    #print("outputScale=", outputScale)
    print("softmax =", softmax)

    n2cube.dpuDestroyTask(task)
    
    return lines[np.argmax(softmax)]
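
The last two steps of `predict_label` (softmax, then label lookup) can be sketched in plain Python. This is an illustration only: it assumes that `dpuRunSoftmax` converts the fixed-point output to floating point by multiplying by the output scale before applying a standard softmax, which is not confirmed here. The raw values, the scale, and the helper names are made up, and the label order is inferred from the outputs in this notebook. Note that `readlines()` keeps the trailing newline of each label, which is why the printed class labels are followed by a blank line; the sketch strips it.

```python
import math

# Stand-in for f.readlines() on labels.txt (order inferred from the outputs).
LABELS = ["NORMAL\n", "PNEUMONIA\n", "COVID\n"]


def softmax_from_fixed_point(raw, scale):
    """Softmax over fixed-point values after scaling them to floats."""
    logits = [value * scale for value in raw]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(probs, labels):
    """Return (label without trailing newline, probability) of the best class."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best].strip(), probs[best]


# Hypothetical raw outputs and scale, chosen only for the demonstration.
probs = softmax_from_fixed_point([-12, 3, 40], scale=0.125)
label, confidence = top_label(probs, LABELS)
print("Class label: {} ({:.3f})".format(label, confidence))
```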

In [10]:
print("INFO: test with sample X-Ray images")
i = 0
for img in listimage:
    print("{}  {}".format(i, img))
    i = i + 1 
    label = predict_label(img)
    print('Class label: {}'.format(label))

INFO: test with sample X-Ray images
0  COVID_1-s2.0-S1684118220300608-main.pdf-002.jpg
softmax = [ 0.01752752  0.02550239  0.9569701 ]
Class label: COVID

1  COVID_covid-19-pneumonia-53.jpg
softmax = [  9.73872304e-01   2.59528179e-02   1.74868706e-04]
Class label: NORMAL

2  PNEUMONIA_fe1289af-87c8-49c6-800c-1e87b2296df8.jpg
softmax = [  2.29861223e-04   9.96969879e-01   2.80028302e-03]
Class label: PNEUMONIA

3  NORMAL_8d3d0ba6-799d-4874-a804-75a4c0cf4ae6.jpg
softmax = [ 0.07070307  0.40686801  0.52242893]
Class label: COVID

4  COVID_covid-19-pneumonia-8.jpg
softmax = [  5.81280037e-05   2.80076405e-03   9.97141123e-01]
Class label: COVID

5  PNEUMONIA_849217ad-5cb2-4a87-9796-420b69722b14.jpg
softmax = [ 0.01766842  0.96466309  0.01766842]
Class label: PNEUMONIA

6  COVID_COVID-00027.jpg
softmax = [  2.60980363e-04   8.03876086e-04   9.98935163e-01]
Class label: COVID

7  NORMAL_8cfcc69b-d659-4888-8551-790891d8568e.jpg
softmax = [  9.73872304e-01   2.59528179e-02   1.74868706e-04]
C

### Multiple executions
After we have verified the correctness of a single execution, we can
try multiple executions and measure the throughput in Frames Per Second (FPS).

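Let's define a function that processes a single image in multiple iterations. 
The parameters are:
* `kernel`: DPU kernel.
* `img`: preprocessed image to be classified.
* `count`: receives the thread index from the caller, but is reset to 0 inside the function and then used as the loop counter.

The number of iterations is read from the global `num_iterations`, which is set, along with `threadnum`, in the timing cells that follow.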

In [11]:
path = os.path.join(image_folder, imfile)
img = cv2.imread(path) 
img = cv2.resize(img, (img_dims, img_dims))
img = img.astype(np.float32)
img = (img/255.0) 

def run_dpu_task(kernel, img, count):
    task = n2cube.dpuCreateTask(kernel, 0)
    
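    # The `count` argument (the thread index) is discarded here; from this
    # point on, `count` is this thread's private iteration counter.
    count = 0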
    while count < num_iterations:
           
        """Get input Tensor"""
        tensor = n2cube.dpuGetInputTensor(task, KERNEL_CONV_INPUT)
        input_len = n2cube.dpuGetInputTensorSize(task, KERNEL_CONV_INPUT)   
        
        """Set input Tensor"""
        n2cube.dpuSetInputTensorInHWCFP32(task, KERNEL_CONV_INPUT, img, input_len)

        """Model run on DPU"""
        n2cube.dpuRunTask(task)
        
        """Get the output Tensor size from FC output"""
        size = n2cube.dpuGetOutputTensorSize(task, KERNEL_FC_OUTPUT)

        """Get the output Tensor channel from FC output"""
        channel = n2cube.dpuGetOutputTensorChannel(task, KERNEL_FC_OUTPUT)

        softmax = np.zeros(size,dtype=np.float32)

        """Get FC result"""
        conf = n2cube.dpuGetOutputTensorAddress(task, KERNEL_FC_OUTPUT)

        """Get output scale of FC"""
        outputScale = n2cube.dpuGetOutputTensorScale(task, KERNEL_FC_OUTPUT)

        """Run softmax"""
        softmax = n2cube.dpuRunSoftmax(conf, channel, size // channel, outputScale)

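        # Stepping by `threadnum` makes each thread run about
        # num_iterations / threadnum iterations. `count` is local to this
        # thread, so the lock is not strictly required.
        lock.acquire()
        count = count + threadnum
        lock.release()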

    n2cube.dpuDestroyTask(task)

Now we are able to run the batch processing and print out the DPU throughput.
To benchmark a different picture, change `imfile` (or `path`) in the previous cell.
We will use the previously preprocessed image `img` and process it
`num_iterations` times.

We start with a single thread; the two cells after it repeat the measurement with 4 and 8 threads.

The following cells may take a while to run. Please be patient.

In [12]:
threadAll = []
threadnum = 1
num_iterations = 500
start = time.time()

for i in range(threadnum):
    t1 = threading.Thread(target=run_dpu_task, args=(kernel, img, i))
    threadAll.append(t1)
for x in threadAll:
    x.start()
for x in threadAll:
    x.join()

end = time.time()

fps = float(num_iterations/(end-start))
print("%.2f FPS" % fps)

111.48 FPS


In [13]:
threadAll = []
threadnum = 4
num_iterations = 500
start = time.time()

for i in range(threadnum):
    t1 = threading.Thread(target=run_dpu_task, args=(kernel, img, i))
    threadAll.append(t1)
for x in threadAll:
    x.start()
for x in threadAll:
    x.join()

end = time.time()

fps = float(num_iterations/(end-start))
print("%.2f FPS" % fps)

190.37 FPS


In [14]:
threadAll = []
threadnum = 8
num_iterations = 500
start = time.time()

for i in range(threadnum):
    t1 = threading.Thread(target=run_dpu_task, args=(kernel, img, i))
    threadAll.append(t1)
for x in threadAll:
    x.start()
for x in threadAll:
    x.join()

end = time.time()

fps = float(num_iterations/(end-start))
print("%.2f FPS" % fps)

186.88 FPS
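
The three timing cells above differ only in `threadnum`. As a sketch of how the measurement could be factored out, the hypothetical helper `measure_fps` below splits a fixed number of iterations across threads and reports calls per second. It uses a stand-in workload (`fake_inference`, a short sleep) so that it runs without a DPU; the numbers it prints say nothing about DPU performance.

```python
import threading
import time


def measure_fps(run_once, num_iterations, threadnum):
    """Call `run_once` num_iterations times across threads; return calls/second."""
    # Split the work so the per-thread shares add up to exactly num_iterations.
    base, extra = divmod(num_iterations, threadnum)
    shares = [base + (1 if i < extra else 0) for i in range(threadnum)]

    def worker(share):
        for _ in range(share):
            run_once()

    threads = [threading.Thread(target=worker, args=(s,)) for s in shares]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return num_iterations / elapsed


calls = []


def fake_inference():
    """Stand-in for one DPU inference (assumption: about 1 ms of waiting)."""
    time.sleep(0.001)
    calls.append(1)  # list.append is atomic in CPython


for n in (1, 4, 8):
    print("%d thread(s): %.2f FPS" % (n, measure_fps(fake_inference, 200, n)))
```

With the DPU, `run_once` would wrap the body of the loop in `run_dpu_task`, and each worker would presumably create and destroy its own task, as that function does.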


In [15]:
def run_tasks(kernel):
    task = n2cube.dpuCreateTask(kernel, 0)
    
    o_count = [0, 0, 0]
    p_count = [0, 0, 0]
     
    image_folder = "./data/images/"  
    listimage = [i for i in os.listdir(image_folder)] 
    num_iterations=len(listimage)
    print("Total Test Image Count={}".format(num_iterations)) 
    count = 0

    with open("./labels.txt", "r") as f:
        lines = f.readlines()
        
    print("====================== Start: Test results with 100 COVID mark X-Rays ==================")
    while count < num_iterations:
        imfile = listimage[count]
        path = os.path.join(image_folder, imfile)
        img = cv2.imread(path) 
        img = cv2.resize(img, (img_dims, img_dims))
        img = img.astype(np.float32)
        img = (img/255.0) 
        
        """Get input Tensor"""
        tensor = n2cube.dpuGetInputTensor(task, KERNEL_CONV_INPUT)
        input_len = n2cube.dpuGetInputTensorSize(task, KERNEL_CONV_INPUT)   
        #print(input_len)
        
        """Set input Tensor"""
        n2cube.dpuSetInputTensorInHWCFP32(task, KERNEL_CONV_INPUT, img, input_len)

        """Model run on DPU"""
        n2cube.dpuRunTask(task)
        
        """Get the output Tensor size from FC output"""
        size = n2cube.dpuGetOutputTensorSize(task, KERNEL_FC_OUTPUT)

        """Get the output Tensor channel from FC output"""
        channel = n2cube.dpuGetOutputTensorChannel(task, KERNEL_FC_OUTPUT)

        softmax = np.zeros(size,dtype=np.float32)

        """Get FC result"""
        conf = n2cube.dpuGetOutputTensorAddress(task, KERNEL_FC_OUTPUT)

        """Get output scale of FC"""
        outputScale = n2cube.dpuGetOutputTensorScale(task, KERNEL_FC_OUTPUT)

        """Run softmax"""
        softmax = n2cube.dpuRunSoftmax(conf, channel, size // channel, outputScale)

        expected = False
        if imfile[0] == 'N':  
            o_count[0] = o_count[0] + 1 
        elif imfile[0] == 'P':
            o_count[1] = o_count[1] + 1 
        elif imfile[0] == 'C':
            o_count[2] = o_count[2] + 1 
        else:
            print("ERROR: Invalid file tag in {}".format(imfile))
        
        label = np.argmax(softmax)
        p_count[label] = p_count[label] + 1 
        slabel = lines[label] 
        if imfile[0] == 'C':
            print("Name= {}  Softmax={}  Prediction= {}".format(imfile, softmax, slabel))
        count = count + 1
    print("====================== Done: Test results with COVID mark X-Rays ==================")
    print("\n")
    print("====================== Start: Test results with All X-Rays ==================")
    print("Actual #(NORMAL, PNEUMONIA, COVID)= ({}, {}, {})". format(o_count[0], o_count[1], o_count[2]))
    print("Predic #(NORMAL, PNEUMONIA, COVID)= ({}, {}, {})". format(p_count[0], p_count[1], p_count[2]))
    print("Percentage Prediction (NORMAL, PNEUMONIA, COVID)=({}, {}, {})".format((p_count[0]*100)/o_count[0], (p_count[1]*100)/o_count[1],
                                                                                 (p_count[2]*100)/o_count[2])) 
    n2cube.dpuDestroyTask(task)
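
The "Percentage Prediction" line divides the number of predictions for a class by the number of actual images of that class. Misclassifications in opposite directions can therefore cancel out, and the division raises `ZeroDivisionError` when a class has no images in the folder. A complementary per-class measure is sketched below. The helper `per_class_recall` and the example pairs are hypothetical; the class order follows `o_count` and `p_count` above.

```python
# Index order used by o_count / p_count in the cell above.
CLASSES = ["NORMAL", "PNEUMONIA", "COVID"]


def per_class_recall(pairs):
    """Return {class: percent of its images predicted correctly, or None}.

    `pairs` is an iterable of (actual_index, predicted_index).
    """
    totals = [0] * len(CLASSES)
    correct = [0] * len(CLASSES)
    for actual, predicted in pairs:
        totals[actual] += 1
        if actual == predicted:
            correct[actual] += 1
    return {
        name: (100.0 * correct[i] / totals[i] if totals[i] else None)
        for i, name in enumerate(CLASSES)
    }


# Hypothetical results: two COVID images (one missed), one PNEUMONIA image,
# and one NORMAL image misclassified as COVID.
pairs = [(2, 2), (2, 0), (1, 1), (0, 2)]
print(per_class_recall(pairs))
```

In `run_tasks`, the actual index could be derived from the first letter of the file name (`N`, `P`, `C`), exactly as the existing `if`/`elif` chain does, and the predicted index is `np.argmax(softmax)`.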

In [16]:
run_tasks(kernel)

Total Test Image Count=1100
Name= COVID_COVID-19(153).png  Softmax=[ 0.0228271   0.00654007  0.97063279]  Prediction= COVID

Name= COVID_1-s2.0-S1684118220300608-main.pdf-002.jpg  Softmax=[ 0.01752752  0.02550239  0.9569701 ]  Prediction= COVID

Name= COVID_covid-19-pneumonia-53.jpg  Softmax=[  9.73872304e-01   2.59528179e-02   1.74868706e-04]  Prediction= NORMAL

Name= COVID_COVID-19 (128).png  Softmax=[  9.09297436e-04   1.92498276e-03   9.97165740e-01]  Prediction= COVID

Name= COVID_COVID-19 (55).png  Softmax=[ 0.00102748  0.00460484  0.99436772]  Prediction= COVID

Name= COVID_covid-19-pneumonia-8.jpg  Softmax=[  5.81280037e-05   2.80076405e-03   9.97141123e-01]  Prediction= COVID

Name= COVID_covid-19-pneumonia-14-PA.png  Softmax=[  1.23295365e-04   8.03986797e-04   9.99072731e-01]  Prediction= COVID

Name= COVID_F63AB6CE-1968-4154-A70F-913AF154F53D.jpeg  Softmax=[ 0.00747147  0.01395855  0.97856992]  Prediction= COVID

Name= COVID_COVID-19 (131).png  Softmax=[ 0.00333011  0.1818

### Clean up

Finally, when you are done with the DPU experiments, remember to destroy the kernel and close the DPU.

In [17]:
n2cube.dpuDestroyKernel(kernel)
n2cube.dpuClose()

0
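
In a standalone script, the open/load/destroy/close sequence can be wrapped in a context manager so that cleanup also runs when an exception interrupts the work. The sketch below is an assumption-laden illustration: `dpu_kernel` is a hypothetical helper, and `FakeApi` is a stand-in that only records calls, so the example runs without a board. It relies only on the four `n2cube` functions already used in this notebook.

```python
from contextlib import contextmanager


@contextmanager
def dpu_kernel(api, kernel_name):
    """Open the DPU, load a kernel, and always clean up in reverse order."""
    api.dpuOpen()
    try:
        kernel = api.dpuLoadKernel(kernel_name)
        try:
            yield kernel
        finally:
            api.dpuDestroyKernel(kernel)
    finally:
        api.dpuClose()


class FakeApi:
    """Stand-in for the n2cube module; records the order of calls."""

    def __init__(self):
        self.calls = []

    def dpuOpen(self):
        self.calls.append("open")
        return 0

    def dpuLoadKernel(self, name):
        self.calls.append("load")
        return ("kernel", name)

    def dpuDestroyKernel(self, kernel):
        self.calls.append("destroy")

    def dpuClose(self):
        self.calls.append("close")
        return 0


fake = FakeApi()
with dpu_kernel(fake, "Pnem3_0") as kernel:
    fake.calls.append("run")  # inference would happen here
print(fake.calls)
```

On the board, the equivalent call would be `with dpu_kernel(n2cube, KERNEL_CONV) as kernel:`. Given the earlier note about repeated open/close cycles, this is best used once per process rather than once per cell.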