# CIFAR-10 test set classification on a PYNQ cluster

This notebook uses a convolutional quantized neural network (QNN), the CNV network with 2-bit weights and 2-bit activations (W2A2), to classify the CIFAR-10 dataset. It uses Dask to split the inference task across a cluster of PYNQ boards connected to this machine.

This notebook is adapted from https://github.com/Xilinx/BNN-PYNQ/blob/master/notebooks/CNV-QNN_Cifar10_Testset.ipynb

## The CIFAR-10 test set

This notebook requires the test set from https://www.cs.toronto.edu/~kriz/cifar.html, which contains 10000 images that the CNV network can process directly, without any preprocessing.

You can download the binary version of CIFAR-10 from that URL and extract it into a folder as shown below.
This may take a while, because the archive also contains the training set.
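Each file in `cifar-10-batches-bin/` is a flat sequence of 3073-byte records: one label byte (0 to 9) followed by 3072 pixel bytes, laid out as the 1024-byte red plane, then green, then blue, each stored row by row for a 32x32 image. As a rough sketch of that layout, the hypothetical helpers below (not part of BNN-PYNQ or this notebook) split such a buffer using only the standard library, assuming the buffer holds whole records.

```python
from typing import List, Tuple

RECORD_SIZE = 1 + 32 * 32 * 3  # 1 label byte + 3072 pixel bytes = 3073
PLANE_SIZE = 32 * 32           # bytes per colour channel


def parse_cifar_records(data: bytes) -> List[Tuple[int, bytes, bytes, bytes]]:
    """Split a CIFAR-10 binary buffer into (label, red, green, blue) tuples.

    Hypothetical helper for illustration; assumes `data` holds whole records.
    """
    if len(data) % RECORD_SIZE != 0:
        raise ValueError(
            f"{len(data)} bytes is not a whole number of {RECORD_SIZE}-byte records"
        )
    records = []
    for offset in range(0, len(data), RECORD_SIZE):
        label = data[offset]  # indexing a bytes object yields an int
        pixels = data[offset + 1: offset + RECORD_SIZE]
        red = pixels[:PLANE_SIZE]
        green = pixels[PLANE_SIZE:2 * PLANE_SIZE]
        blue = pixels[2 * PLANE_SIZE:]
        records.append((label, red, green, blue))
    return records


def labels_only(data: bytes) -> List[int]:
    """Return just the labels: every RECORD_SIZE-th byte, starting at offset 0."""
    return list(data[::RECORD_SIZE])
```

Reading `cifar-10-batches-bin/test_batch.bin` into a `bytes` object and passing it to `labels_only` should give 10000 labels, one per image. The slice `data[::RECORD_SIZE]` works because every record starts with its label byte.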

In [1]:
# Use the command appropriate to your OS: wget (Linux) or curl (macOS)

# !wget https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz
!curl -O https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz
# Extract the archive; this creates the cifar-10-batches-bin/ folder
!tar -xf cifar-10-binary.tar.gz



  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  162M  100  162M    0     0  6841k      0  0:00:24  0:00:24 --:--:-- 9892k
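On a machine without `curl` or `wget`, the same download-and-extract step can be done from Python with the standard library. This is a sketch under assumptions: the helper name `fetch_cifar10` and its skip-if-already-downloaded behaviour are illustrative, not from the original notebook, and the `filter="data"` extraction safeguard is only used when the running Python's `tarfile` module supports extraction filters.

```python
import os
import tarfile
import urllib.request

CIFAR10_URL = "https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz"


def fetch_cifar10(url: str = CIFAR10_URL, dest_dir: str = ".") -> str:
    """Download the archive (unless already present) and extract it into dest_dir.

    Hypothetical helper; returns the local path of the archive.
    """
    archive = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(archive):
        # The real archive is about 162 MB, so this can take a while
        urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Extraction filters reject absolute paths, links escaping dest_dir, etc.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return archive
```

Calling `fetch_cifar10()` from the notebook's working directory should leave the same `cifar-10-batches-bin/` folder that the `tar -xf` command produces.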


In [2]:
from dask.distributed import Client

# Replace the address below with your scheduler's address
# (printed by the "dask-scheduler" command when it starts)
client = Client("tcp://192.168.2.1:8786")
client

| Client | Cluster |
|---|---|
| Scheduler: tcp://192.168.2.1:8786 | Workers: 0 |
| Dashboard: http://192.168.2.1:8787/status | Cores: 0 |
| | Memory: 0 B |
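The client summary above reports `Workers: 0`, which is what you see before any worker has connected to the scheduler; the splitting cell later divides the data by the worker count, so it needs at least one. A minimal polling sketch, assuming only the `client.scheduler_info()["workers"]` call the notebook already uses (the helper name and its timeout defaults are hypothetical):

```python
import time


def wait_for_pynq_workers(client, minimum: int = 1,
                          timeout: float = 60.0, poll: float = 1.0) -> int:
    """Poll the scheduler until at least `minimum` workers are registered.

    Hypothetical helper; relies only on client.scheduler_info()["workers"].
    Returns the number of connected workers, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        count = len(client.scheduler_info()["workers"])
        if count >= minimum:
            return count
        if time.monotonic() >= deadline:
            raise TimeoutError(f"only {count} worker(s) connected after {timeout} s")
        time.sleep(poll)
```

For example, `wait_for_pynq_workers(client, minimum=2)` would block until both boards have joined. Recent releases of `dask.distributed` also offer `Client.wait_for_workers(n_workers=...)`, which can be used instead where available.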


In [6]:
def run_on_worker(data):
    print(f"Received task from scheduler with {len(data)} bytes of data")
    from multiprocessing import get_context
    import time
    t0 = time.time()

    def use_overlay(queue, file_path, labels):
        import bnn
        from pynq import Xlnk

        # CNV network with 2-bit weights and 2-bit activations, run in hardware
        hw_classifier = bnn.CnvClassifier(bnn.NETWORK_CNVW2A2, 'cifar10', bnn.RUNTIME_HW)
        print("Classifying.....")
        result_W2A2 = hw_classifier.classify_cifars(file_path)
        time_W2A2 = hw_classifier.usecPerImage
        print(f"Inference time per image: {time_W2A2} us")

        countRight = 0
        for idx in range(len(labels)):
            if labels[idx] == result_W2A2[idx]:
                countRight += 1
        accuracyW2A2 = countRight*100/len(labels)
        print("Accuracy W2A2: ",accuracyW2A2,"%")

        # Release the contiguous memory buffers allocated through Xlnk
        xlnk = Xlnk()
        xlnk.xlnk_reset()
        queue.put(result_W2A2)

    # Extract labels to calculate accuracy later: each record is
    # 1 label byte + 32*32*3 = 3072 image bytes, i.e. 3073 bytes in total
    labels = []
    i = 0
    while i < len(data):
        labels.append(data[i])  # indexing a bytes object yields an int
        i += 3073

    # classify_cifars() expects the path of a file on the PYNQ board,
    # so write the received chunk to local disk first
    file_path = "input_data.bin"
    with open(file_path, "wb") as outfile:
        outfile.write(data)

    # Run the overlay in a forked child process: it cannot be run outside the
    # main thread, and Dask executes tasks in worker threads.
    # "fork" is requested explicitly because use_overlay is a nested function,
    # which other start methods cannot pickle.
    ctx = get_context("fork")
    queue = ctx.Queue()
    p = ctx.Process(target=use_overlay, args=(queue, file_path, labels))
    p.start()
    result = queue.get()  # read before join() so the child can flush its result
    p.join()
    t1 = time.time()
    print("EXECUTION TIME ON THIS WORKER: ", t1 - t0)
    return result
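`run_on_worker` waits on `queue.get()` with no timeout, so if the child process dies before putting a result, the Dask task blocks indefinitely. The hypothetical helper below generalises the same Process/Queue pattern with a timeout. It is a sketch, assuming a POSIX system where the `fork` start method is available; requesting `fork` explicitly keeps nested targets such as `use_overlay` working even on Python versions whose default start method is not `fork`.

```python
import multiprocessing as mp
import queue as queue_mod


def run_in_forked_child(target, *args, timeout: float = 600.0):
    """Run target(result_queue, *args) in a forked child; return what it puts on the queue.

    Hypothetical helper. With the "fork" start method the target is not pickled,
    so nested functions work; the result itself must still be picklable.
    """
    ctx = mp.get_context("fork")
    result_queue = ctx.Queue()
    child = ctx.Process(target=target, args=(result_queue, *args))
    child.start()
    try:
        # Read before join(): a child blocked on a full queue pipe would never exit
        result = result_queue.get(timeout=timeout)
    except queue_mod.Empty:
        exitcode = child.exitcode  # None if the child is still running
        child.terminate()
        child.join()
        raise RuntimeError(
            f"no result within {timeout} s (child exit code: {exitcode})"
        ) from None
    child.join()
    return result
```

Inside `run_on_worker`, a call such as `result = run_in_forked_child(use_overlay, file_path, labels, timeout=...)` could replace the lines that create, start, read from and join the child process; a crash then surfaces as a `RuntimeError` after the timeout instead of a hang. The timeout value is a tuning assumption and should comfortably exceed the overlay load plus classification time.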

In [7]:
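import time
t0 = time.time()

num_of_workers = len(client.scheduler_info()["workers"])
if num_of_workers == 0:
    raise RuntimeError("No Dask workers are connected to the scheduler")
data_split = []

RECORD_SIZE = 3073  # 1 label byte + 32*32*3 image bytes

# Note: data_batch_1.bin is the first training batch; to classify the
# 10000-image test set, open cifar-10-batches-bin/test_batch.bin instead.
# Split the dataset into near-equal chunks, one per available Dask worker,
# cutting only at record boundaries so no image is split between workers
with open("cifar-10-batches-bin/data_batch_1.bin", "rb") as ifile:
    total = ifile.read()
    num_records = len(total) // RECORD_SIZE
    per_worker, remainder = divmod(num_records, num_of_workers)
    start = 0
    for i in range(num_of_workers):
        # The first `remainder` chunks get one extra record each
        chunk_size = (per_worker + (1 if i < remainder else 0)) * RECORD_SIZE
        data_split.append(total[start: start+chunk_size])
        start += chunk_size
    print(f"Split image data into {num_of_workers} chunk(s)")

# Scatter the data to the workers before calling run_on_worker on them
distributed_data = client.scatter(data_split)
futures = client.map(run_on_worker, distributed_data)

# Print the output returned by the workers
print("Result", client.gather(futures))

t1 = time.time()
print("TOTAL EXECUTION TIME: ", t1 - t0)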

Split image data into 2 chunk(s)
Result [array([6, 9, 9, ..., 5, 4, 6], dtype=int32), array([6, 7, 9, ..., 1, 1, 5], dtype=int32)]
TOTAL EXECUTION TIME:  22.457534313201904
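The workers only print their own accuracy. To get a single figure for the whole batch, the gathered arrays can be concatenated and compared with labels read on the client, for example `list(total[::3073])`, since every record starts with its label byte. The helper names below are hypothetical. Concatenation restores the original image order because `client.map` returns futures in input order and `client.gather` returns results in the order of the futures list. For scale, the recorded run processed 10000 images in about 22.5 s of wall-clock time, roughly 445 images per second end to end; that figure includes scattering the data, writing it to each board and loading the overlay, not just inference.

```python
from itertools import chain
from typing import Iterable, List, Sequence


def merge_predictions(per_worker: Iterable[Sequence[int]]) -> List[int]:
    """Concatenate per-worker prediction arrays in chunk order (hypothetical helper)."""
    # int() also converts NumPy integer scalars such as int32 to plain ints
    return [int(p) for p in chain.from_iterable(per_worker)]


def accuracy_percent(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Percentage of predictions that match the ground-truth labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions differ in length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return 100.0 * correct / len(labels)
```

In the notebook this might read `accuracy_percent(list(total[::3073]), merge_predictions(client.gather(futures)))`.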
