Create a virtual environment for Python to run in:

`$ python3.8 -m venv .venv`

Activate the virtual environment:

`$ source .venv/bin/activate`

Update pip and setuptools:

`$ pip install --upgrade pip setuptools`

Install the requirements:

`$ pip install -r requirements.txt`
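You can confirm from Python itself that the activated environment is the interpreter actually in use. The `venv` module points `sys.prefix` at the environment directory, while `sys.base_prefix` keeps pointing at the base installation. The helper name `in_virtualenv` is hypothetical, not part of this project.

```python
import sys


def in_virtualenv():
    """Return True when this interpreter runs inside a venv.

    Inside a venv, sys.prefix is the environment directory and
    sys.base_prefix is the base Python installation; outside one they match.
    """
    return sys.prefix != sys.base_prefix


print(f"Python {sys.version_info.major}.{sys.version_info.minor}, venv active: {in_virtualenv()}")
```

Save this as a script and run it with `python` after activating the environment. It should report `venv active: True`.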


# Research

The initial goal is to identify the variables that can be changed and to see how each one affects efficiency.
As of now, these are:
- GPU Frequency (14 available settings)
- CPU Frequency
- Memory Frequency
- Matrix Size (128 to 2048 in steps of 128; 16 sizes)
- Deep Learning Accelerators (DLAs)
- Tensor Cores (enabled or disabled; 2 settings)
- Data Types (half, float, double; 3 types)

Ideally, every combination would be tested, but with over 30,000 combinations that is impractical.

For the tests, I chose all 14 GPU frequencies, matrix sizes from 128 to 2048 in steps of 128 (16 sizes), tensor cores enabled and disabled, and 3 data types (half, float, and double). This gives 14 × 16 × 2 × 3 = 1344 tests.
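The test count above is the size of a Cartesian product. The sketch below enumerates that grid with `itertools.product`. The constant names are illustrative. The frequency values are the 14 AGX GPU frequencies listed under Benchmarking Procedures, and the field order matches how the data file names are parsed later.

```python
from itertools import product

# The 14 AGX GPU frequencies (Hz) listed in the Benchmarking Procedures section
GPU_FREQUENCIES = [
    114750000, 216750000, 318750000, 420750000, 522750000, 624750000, 675750000,
    828750000, 905250000, 1032750000, 1198500000, 1236750000, 1338750000, 1377000000,
]
MATRIX_SIZES = range(128, 2049, 128)  # 128, 256, ..., 2048: 16 sizes
TENSOR_CORES = (True, False)  # with and without tensor cores
DATA_TYPES = ("half", "float", "double")

# Each tuple is one test: (datatype, matrix_size, tensor, gpu_frequency)
test_grid = list(product(DATA_TYPES, MATRIX_SIZES, TENSOR_CORES, GPU_FREQUENCIES))
print(len(test_grid))  # 1344
```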

## AGX Info

```
$ cat /etc/nv_tegra_release 
# R32 (release), REVISION: 4.4, GCID: 23942405, BOARD: t186ref, EABI: aarch64, DATE: Fri Oct 16 19:37:08 UTC 2020
```

```
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_21:14:42_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
```

## Nano Info
Todo

# Benchmarking Procedures

The `benchmark.cu` file is used for benchmarking the Jetson boards using various options.

Before each test, the CPU minimum and maximum frequencies are both set to the CPU's maximum frequency. This can be varied later to gather more power-usage data.

```
# cpufreq values are in kHz: 2265600 kHz = 2.2656 GHz
$ echo "2265600" | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_{min,max}_freq
```
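The command above writes only to `cpu0`'s cpufreq files. Whether the other cores follow depends on how the kernel groups CPUs into cpufreq policies, which is not established here.

The sketch below applies the same value to every CPU that exposes a `cpufreq` directory. The function name `pin_cpu_frequency` and its `cpu_root` parameter are hypothetical. It must run as root. It writes the minimum before the maximum, like the `tee` command. That order assumes the kernel may reject a minimum above the current maximum, so it is only safe when the target does not exceed the current maximum, as is the case when pinning to the top frequency.

```python
import glob
import os


def pin_cpu_frequency(freq_khz, cpu_root="/sys/devices/system/cpu"):
    """Write freq_khz (kHz) to scaling_min_freq and scaling_max_freq of every CPU.

    Assumption: freq_khz is not above the current scaling_max_freq, so writing
    the minimum first never leaves min > max.
    Returns the cpufreq directories that were written.
    """
    # Match cpu0, cpu1, ... but not sibling entries such as "cpufreq" or "cpuidle"
    policy_dirs = sorted(glob.glob(os.path.join(cpu_root, "cpu[0-9]*", "cpufreq")))
    for policy_dir in policy_dirs:
        for name in ("scaling_min_freq", "scaling_max_freq"):
            with open(os.path.join(policy_dir, name), "w") as f:
                f.write(str(freq_khz))
    return policy_dirs


if __name__ == "__main__":
    pin_cpu_frequency(2265600)  # the maximum used above; requires root
```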

The GPU frequency is then set:

AGX
```
# All available frequencies (Hz): 114750000 216750000 318750000 420750000 522750000 624750000 675750000 828750000 905250000 1032750000 1198500000 1236750000 1338750000 1377000000
$ echo "1377000000" | sudo tee /sys/devices/17000000.gv11b/devfreq/17000000.gv11b/{min,max}_freq
```
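The `tee` command above always writes `min_freq` before `max_freq`. If the kernel rejects a `min_freq` above the current `max_freq`, which is an assumption here and not verified on this board, then raising the frequency between tests could fail. The sketch below picks the write order from the direction of the change. `set_gpu_frequency` and its `devfreq_dir` parameter are hypothetical names. It must run as root.

```python
import os

# AGX GPU frequencies (Hz), copied from the list in the command above
AGX_GPU_FREQUENCIES = [
    114750000, 216750000, 318750000, 420750000, 522750000, 624750000, 675750000,
    828750000, 905250000, 1032750000, 1198500000, 1236750000, 1338750000, 1377000000,
]
AGX_DEVFREQ_DIR = "/sys/devices/17000000.gv11b/devfreq/17000000.gv11b"


def set_gpu_frequency(freq_hz, devfreq_dir=AGX_DEVFREQ_DIR):
    """Pin the GPU to freq_hz by writing it to devfreq min_freq and max_freq.

    Assumption: the kernel may reject a write that leaves min_freq > max_freq.
    Returns the file paths in the order they were written.
    """
    if freq_hz not in AGX_GPU_FREQUENCIES:
        raise ValueError(f"{freq_hz} Hz is not an available AGX GPU frequency")

    min_path = os.path.join(devfreq_dir, "min_freq")
    max_path = os.path.join(devfreq_dir, "max_freq")
    with open(max_path) as f:
        current_max = int(f.read().strip())

    # Raising above the current max: write max first, then min.
    # Otherwise: write min first, then max. Either way min <= max after each write.
    order = (max_path, min_path) if freq_hz > current_max else (min_path, max_path)
    for path in order:
        with open(path, "w") as f:
            f.write(str(freq_hz))
    return order
```

For a full sweep, a benchmark driver could loop over `AGX_GPU_FREQUENCIES` and call `set_gpu_frequency` before each run.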

Nano
```
$ todo
```

```python
# Data Preprocessing

import os
%matplotlib widget
import matplotlib.pyplot as plt
import numpy as np

import ipywidgets as widgets
from ipywidgets import interact, interact_manual

path = "./data/AGX/"
files = os.listdir(path)

data = []

for file_name in files:
    temp = {
        # Inputs
        "datatype": "",
        "matrix_size": -1,
        "tensor": None,
        "gpu_frequency": -1,

        # Results
        "power_usage": [],
        "flops": -1,

        # Calculated Results
        "avg_power": -1,
        "flops_per_watt": -1
    }

    # The inputs are the dash-separated fields of the file name (before the extension),
    # after the first field: ...-<datatype>-<matrix_size>-<tensor>-<gpu_frequency>
    datatype, matrix_size, tensor, gpu_frequency = file_name.split(".")[0].split("-")[1:]
    temp['datatype'] = datatype
    temp['matrix_size'] = float(matrix_size)
    temp['tensor'] = tensor.lower() == "tensor"
    temp['gpu_frequency'] = float(gpu_frequency)

    with open(os.path.join(path, file_name), "r") as f:
        file_data = f.readlines()

    # Every line except the last has two comma-separated columns; the second is power
    _, power_usage = zip(*[d.strip().split(",") for d in file_data[:-1]])
    temp['power_usage'] = list(map(float, power_usage))
    temp['avg_power'] = sum(temp['power_usage']) / len(temp['power_usage'])

    # The last line holds the measured FLOPS
    temp['flops'] = float(file_data[-1])
    temp['flops_per_watt'] = temp['flops'] / temp['avg_power']

    data.append(temp)
```

```python
# BEGIN WIDGETS

# Inputs that can be held fixed, and every field that can be plotted on an axis
SEARCHABLE = ["datatype", "tensor", "matrix_size", "gpu_frequency"]
AXES = SEARCHABLE + ["avg_power", "flops", "flops_per_watt"]

WIDGET_OPTIONS = dict(
    fixed1=SEARCHABLE,
    fixed2=SEARCHABLE,
    fixed3=SEARCHABLE,
    gpu_frequency=[114750000, 216750000, 318750000, 420750000, 522750000, 624750000, 675750000, 828750000, 905250000, 1032750000, 1198500000, 1236750000, 1338750000, 1377000000],
    matrix_size=list(range(64, 2049, 64)),  # 64 to 2048 inclusive
    tensor=[True, False],
    datatype=["half", "float", "double"],
    x=AXES,
    y=AXES,
)


def plot_selection(fixed1, fixed2, fixed3, gpu_frequency, matrix_size, tensor, datatype, x, y):
    # Hold 3 of the 4 searchable inputs (datatype, tensor, matrix_size, gpu_frequency)
    # at the values selected in their dropdowns
    selected = {
        "datatype": datatype,
        "tensor": tensor,
        "matrix_size": matrix_size,
        "gpu_frequency": gpu_frequency,
    }
    search = {key: selected[key] for key in (fixed1, fixed2, fixed3)}

    results = [d for d in data if search.items() <= d.items()]
    if not results:
        print("No results match", search)
        return
    xs, ys = zip(*sorted([(r[x], r[y]) for r in results], key=lambda d: d[0]))

    fig, ax = plt.subplots()
    if x == "matrix_size":
        ax.set_xticks(np.arange(0, 2304, step=256))
    ax.plot(xs, ys)
    ax.set_xlabel(x)
    ax.set_ylabel(y)

    plt.show()


# Two independent copies of the widget so two selections can be compared
for _ in range(2):
    interact(plot_selection, **WIDGET_OPTIONS)

# END WIDGETS
```

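The difference in efficiency with tensor cores enabled and disabled. For each matrix size (float, GPU at 1377 MHz), the ratio below is FLOPS per watt with tensor cores divided by FLOPS per watt without them. A value above 1 means tensor cores were more efficient.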

```python
# Available GPU frequencies (Hz): 114750000, 216750000, 318750000, 420750000, 522750000, 624750000, 675750000, 828750000, 905250000, 1032750000, 1198500000, 1236750000, 1338750000, 1377000000
search_tensor = {
    "datatype": "float",
    "gpu_frequency": 1377000000,
    "tensor": True
}

search_nontensor = {
    "datatype": "float",
    "gpu_frequency": 1377000000,
    "tensor": False
}

results_tensor = [d for d in data if search_tensor.items() <= d.items()]
results_nontensor = [d for d in data if search_nontensor.items() <= d.items()]

# For each matrix size, divide FLOPS/W with tensor cores by FLOPS/W without them
for m_size in range(64, 2049, 64):
    t = next(r for r in results_tensor if r['matrix_size'] == m_size)
    nt = next(r for r in results_nontensor if r['matrix_size'] == m_size)

    res = t['flops_per_watt'] / nt['flops_per_watt']

    print(t['datatype'], t['gpu_frequency'], t['matrix_size'], res)
```

float 1377000000.0 64.0 0.9294675997623285
float 1377000000.0 128.0 1.0100375330024962
float 1377000000.0 192.0 0.993580271289533
float 1377000000.0 256.0 1.0004127182593558
float 1377000000.0 320.0 0.9931683155671189
float 1377000000.0 384.0 0.9982361431035907
float 1377000000.0 448.0 1.0011837552417064
float 1377000000.0 512.0 1.0016213681790727
float 1377000000.0 576.0 0.9949193534240564
float 1377000000.0 640.0 0.9936327318282169
float 1377000000.0 704.0 1.0003020751877894
float 1377000000.0 768.0 0.9960521623572154
float 1377000000.0 832.0 1.0094845347702088
float 1377000000.0 896.0 1.001199361354049
float 1377000000.0 960.0 0.9970241841484568
float 1377000000.0 1024.0 1.0055879846252083
float 1377000000.0 1088.0 0.9972686160760723
float 1377000000.0 1152.0 0.9989444972230163
float 1377000000.0 1216.0 1.0008076632295422
float 1377000000.0 1280.0 1.0064465739487443
float 1377000000.0 1344.0 0.9897112892161065
float 1377000000.0 1408.0 1.0027391306962667
float 1377000000.0 1472.0 0.