Create a virtual environment for Python to run in:

`$ python3.8 -m venv .venv`

Activate the virtual environment:

`$ source .venv/bin/activate`

Update pip and setuptools:

`$ pip install --upgrade pip setuptools`

Install the requirements:

`$ pip install -r requirements.txt`
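To confirm that the interpreter you are running is the one from `.venv`, a quick standard-library check like the following can help. It is a minimal sketch: inside a `venv`, `sys.prefix` points at the environment while `sys.base_prefix` still points at the Python installation it was created from. `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside an environment created with `python -m venv`."""
    # In a venv, sys.prefix is the environment directory and
    # sys.base_prefix is the base installation it was created from.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("python version:", sys.version.split()[0])
```

Run after `source .venv/bin/activate`, the first line should report `True`.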


# Research

The initial goal is to identify the variables we can change and observe how efficiency responds to each of them.
As of now, these are:
- GPU Frequency
- CPU Frequency
- Memory Frequency
- Matrix Size
- Deep Learning Accelerators (DLAs)
- Tensor Cores
- Data Types
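The efficiency these variables are compared against is computed in the data preprocessing code as throughput divided by the mean power draw over a run. Written out, with $p_1, \dots, p_n$ the power samples recorded during one run (the result is FLOPS per watt, assuming the logged power samples are in watts):

```latex
\bar{P} = \frac{1}{n} \sum_{i=1}^{n} p_i,
\qquad
\text{efficiency} = \frac{\text{FLOPS}}{\bar{P}}
```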


## AGX Info

For the AGX, all combinations of the following were tested: the 14 GPU frequencies, 32 square matrix sizes (64 to 2048 in steps of 64), tensor cores enabled and disabled, and the 3 data types (half, float, double). That is 14 × 32 × 2 × 3 = 2,688 configurations.
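The test grid can be enumerated directly, which is a convenient way to check that every configuration has a matching result file. A minimal sketch using `itertools.product`; the constant names are hypothetical, while the values are the ones listed in this section.

```python
from itertools import product

# GPU frequencies available on the AGX (Hz), as listed in this document
GPU_FREQUENCIES_HZ = [114750000, 216750000, 318750000, 420750000, 522750000,
                      624750000, 675750000, 828750000, 905250000, 1032750000,
                      1198500000, 1236750000, 1338750000, 1377000000]
MATRIX_SIZES = list(range(64, 2048 + 1, 64))  # 64, 128, ..., 2048
TENSOR_CORES = [True, False]
DATA_TYPES = ["half", "float", "double"]

# One tuple per benchmark run: (gpu_frequency, matrix_size, tensor, datatype)
configs = list(product(GPU_FREQUENCIES_HZ, MATRIX_SIZES, TENSOR_CORES, DATA_TYPES))

print(len(MATRIX_SIZES), "matrix sizes")
print(len(configs), "configurations")
```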

### System Info

```
$ cat /etc/nv_tegra_release 
# R32 (release), REVISION: 4.4, GCID: 23942405, BOARD: t186ref, EABI: aarch64, DATE: Fri Oct 16 19:37:08 UTC 2020
```
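When collecting results from several boards, it can help to record the L4T release programmatically instead of copying it by hand. A minimal sketch, assuming the file keeps the single-line `KEY: value, ...` layout shown above; `parse_l4t_release` is a hypothetical helper, not part of any NVIDIA tool.

```python
import re


def parse_l4t_release(line: str) -> dict:
    """Parse the first line of /etc/nv_tegra_release into a dict of fields."""
    fields = {}
    for part in line.strip().split(", "):
        # Parts look like "REVISION: 4.4" or "BOARD: t186ref"
        if ": " in part:
            key, value = part.split(": ", 1)
            fields[key] = value
    # The major release number is in the leading "# R32 (release)" part
    major = re.search(r"R(\d+)", line)
    if major and "REVISION" in fields:
        fields["L4T"] = f"{major.group(1)}.{fields['REVISION']}"
    return fields


# The AGX line shown above
SAMPLE = ("# R32 (release), REVISION: 4.4, GCID: 23942405, BOARD: t186ref, "
          "EABI: aarch64, DATE: Fri Oct 16 19:37:08 UTC 2020")
print(parse_l4t_release(SAMPLE))
```

On the board itself, pass it the first line of the file, e.g. `parse_l4t_release(open("/etc/nv_tegra_release").readline())`.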

```
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_21:14:42_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
```

## Nano Info

**todo**

### System Info

```
todo
```

# Benchmarking Procedures

The `benchmark.cu` file is used for benchmarking the Jetson boards using various options.

Before each test, the CPU's minimum and maximum scaling frequencies are both set to its maximum frequency (this could later be varied to gather more power-usage data). cpufreq values are given in kHz.

```
AGX$ echo "2265600" | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_{min,max}_freq
```
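The same pinning can be scripted. A minimal sketch, assuming the standard cpufreq sysfs layout used in the command above; `pin_cpu_frequency` and `make_fake_cpufreq` are hypothetical helpers. Two choices here are our additions: the writes are ordered so that min never exceeds max at any step (the kernel may refuse such a write, depending on version), and the `cpus` parameter allows writing more than `cpu0`, since on multi-cluster CPUs each cluster can have its own cpufreq policy. Check which policies exist on the board before relying on that. The dry run at the end writes to a temporary directory, so it needs no root.

```python
import tempfile
from pathlib import Path


def pin_cpu_frequency(freq_khz, cpus=(0,), sysfs_root="/sys/devices/system/cpu"):
    """Pin scaling_min_freq and scaling_max_freq of each listed CPU to freq_khz.

    cpufreq values are in kHz. Writing under the real /sys requires root.
    """
    for cpu in cpus:
        cpufreq = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"
        current_min = int((cpufreq / "scaling_min_freq").read_text())
        # At or above the current minimum: raise max first, then min.
        # Below the current minimum: lower min first, then max.
        # Either way min <= max holds after every single write.
        if freq_khz >= current_min:
            order = ("scaling_max_freq", "scaling_min_freq")
        else:
            order = ("scaling_min_freq", "scaling_max_freq")
        for name in order:
            (cpufreq / name).write_text(f"{freq_khz}\n")


def make_fake_cpufreq(root, cpu, min_khz, max_khz):
    """Create a fake cpufreq directory so pin_cpu_frequency can be tried without root."""
    cpufreq = Path(root) / f"cpu{cpu}" / "cpufreq"
    cpufreq.mkdir(parents=True, exist_ok=True)
    (cpufreq / "scaling_min_freq").write_text(f"{min_khz}\n")
    (cpufreq / "scaling_max_freq").write_text(f"{max_khz}\n")
    return cpufreq


# Dry run against a temporary directory instead of /sys (starting values are made up)
with tempfile.TemporaryDirectory() as root:
    fake = make_fake_cpufreq(root, 0, 1190400, 1190400)
    pin_cpu_frequency(2265600, sysfs_root=root)
    print((fake / "scaling_min_freq").read_text().strip(),
          (fake / "scaling_max_freq").read_text().strip())
```

On the AGX, the equivalent of the command above would be `sudo`-run Python calling `pin_cpu_frequency(2265600)`.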

The GPU frequency is then pinned the same way by writing both its minimum and maximum. devfreq values are given in Hz.

### AGX
```
# All available frequencies: 114750000 216750000 318750000 420750000 522750000 624750000 675750000 828750000 905250000 1032750000 1198500000 1236750000 1338750000 1377000000
$ echo "1377000000" | sudo tee /sys/devices/17000000.gv11b/devfreq/17000000.gv11b/{min,max}_freq
```
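Because only the listed frequencies are available, a sweep script may want to snap a requested value to the closest one before writing it. A minimal sketch over the 14 AGX frequencies listed above (in Hz, so 1377000000 is 1377 MHz); `nearest_gpu_frequency` and `to_mhz` are hypothetical helpers.

```python
# GPU frequencies available on the AGX (Hz), from the comment above
AGX_GPU_FREQUENCIES_HZ = [
    114750000, 216750000, 318750000, 420750000, 522750000,
    624750000, 675750000, 828750000, 905250000, 1032750000,
    1198500000, 1236750000, 1338750000, 1377000000,
]


def nearest_gpu_frequency(target_hz: float) -> int:
    """Return the listed AGX GPU frequency closest to target_hz."""
    return min(AGX_GPU_FREQUENCIES_HZ, key=lambda f: abs(f - target_hz))


def to_mhz(freq_hz: int) -> float:
    # devfreq values are in Hz
    return freq_hz / 1_000_000


for target in (1.0e9, 1.3e9):
    chosen = nearest_gpu_frequency(target)
    print(f"{target / 1e6:.0f} MHz requested -> {to_mhz(chosen)} MHz ({chosen} Hz)")
```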

### Nano
```
$ todo
```

**Note:** On the AGX, the fan's ramp `step_time` needs to be changed so that the fan responds quickly once its speed is set:

```
AGX$ echo "5" | sudo tee /sys/devices/pwm-fan/step_time
```

After the GPU and CPU frequencies have been set, the benchmark can be run.

```
$ sudo ./gpu_benchmark
```
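The preprocessing cell reads one result file per run from `./data/AGX/`. Judging from that code, each file name encodes the inputs as the `-`-separated fields after the first one (datatype, matrix size, tensor flag, GPU frequency), and each file holds `<first field>,<power sample>` lines followed by a final FLOPS line; the first field of each power line is discarded. The sketch below mirrors that parsing as a pure function so it can be tested without the data directory. `parse_result` is a hypothetical name, and the file name and numbers in the example are synthetic, not measurements.

```python
def parse_result(file_name: str, text: str) -> dict:
    """Parse one result file, mirroring the notebook's preprocessing cell."""
    # "<first field>-<datatype>-<matrix size>-<tensor flag>-<frequency>.<ext>"
    datatype, matrix_size, tensor, gpu_frequency = file_name.split(".")[0].split("-")[1:]
    lines = text.strip().splitlines()
    # All lines but the last: "<discarded field>,<power sample>"
    power = [float(line.strip().split(",")[1]) for line in lines[:-1]]
    avg_power = sum(power) / len(power)
    # The last line holds the measured FLOPS
    flops = float(lines[-1])
    return {
        "datatype": datatype,
        "matrix_size": float(matrix_size),
        "tensor": tensor.lower() == "tensor",
        "gpu_frequency": float(gpu_frequency),
        "power_usage": power,
        "flops": flops,
        "avg_power": avg_power,
        "flops_per_watt": flops / avg_power,
    }


# Synthetic example (made-up file name and numbers)
example = parse_result("run-float-64-tensor-1377000000.csv", "0,10.0\n1,14.0\n48000000000\n")
print(example["avg_power"], example["flops_per_watt"])
```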

In [74]:
# Data Preprocessing

import os

path = "./data/AGX/"
files = sorted(os.listdir(path))

data = []

for file_name in files:
    # Skip hidden entries such as .ipynb_checkpoints
    if file_name.startswith("."):
        continue

    temp = {
        # Inputs
        "datatype": "",
        "matrix_size": -1,
        "tensor": None,
        "gpu_frequency": -1,

        # Results
        "power_usage": [],
        "flops": -1,

        # Calculated Results
        "avg_power": -1,
        "flops_per_watt": -1
    }

    # Inputs are encoded in the file name (before the first "."), separated by "-"; the first field is ignored
    temp['datatype'], temp['matrix_size'], temp['tensor'], temp['gpu_frequency'] = file_name.split(".")[0].split("-")[1:]
    temp['matrix_size'] = float(temp['matrix_size'])
    temp['tensor'] = temp['tensor'].lower() == "tensor"
    temp['gpu_frequency'] = float(temp['gpu_frequency'])

    with open(os.path.join(path, file_name), "r") as f:
        # Drop trailing blank lines so the last line is always the FLOPS value
        file_data = f.read().strip().splitlines()

    # Every line except the last is "<field>,<power sample>"; only the power sample is kept
    _, temp['power_usage'] = zip(*[d.strip().split(",") for d in file_data[:-1]])
    temp['power_usage'] = list(map(float, temp['power_usage']))
    temp['avg_power'] = sum(temp['power_usage']) / len(temp['power_usage'])

    # The last line holds the measured FLOPS
    temp['flops'] = float(file_data[-1])

    temp['flops_per_watt'] = temp['flops'] / temp['avg_power']

    data.append(temp)

In [90]:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib widget
import ipywidgets as widgets
from IPython.display import display, clear_output
plt.ion()

gpu_frequency = widgets.Dropdown(options=[114750000, 216750000, 318750000, 420750000, 522750000, 624750000, 675750000, 828750000, 905250000, 1032750000, 1198500000, 1236750000, 1338750000, 1377000000], value=1377000000, description="GPU Frequency")
datatype = widgets.Dropdown(options=["half", "float", "double"], value="float", description="Datatype")
y_opt = widgets.Dropdown(options=["avg_power", "flops", "flops_per_watt"], value="flops_per_watt", description="y axis")

output = widgets.Output()

fig, ax = plt.subplots(1, 2, figsize=(20, 5), sharey=True)

search_tensor = {
    "datatype": datatype.value,
    "gpu_frequency": gpu_frequency.value,
    "tensor": True
}

search_nontensor = {
    "datatype": datatype.value,
    "gpu_frequency": gpu_frequency.value,
    "tensor": False
}

search_x = "matrix_size"
search_y = y_opt.value

def refresh_values():
    filtered_tensor = [d for d in data if search_tensor.items() <= d.items()]
    filtered_nontensor = [d for d in data if search_nontensor.items() <= d.items()]
    for i, filtered in enumerate([filtered_tensor, filtered_nontensor]):
        ax[i].clear()
        ax[i].set_title(f"Tensor Cores {'Enabled' if i == 0 else 'Disabled'}")
        ax[i].set_xlabel(search_x)
        ax[i].set_xticks(np.arange(0, 2049, 128))
        # Leave the panel empty if no runs match the current selection
        if filtered:
            x, y = zip(*sorted(((r[search_x], r[search_y]) for r in filtered), key=lambda d: d[0]))
            ax[i].plot(x, y)
    ax[0].set_ylabel(search_y)
    # Redraw the interactive canvas after the axes change
    fig.canvas.draw_idle()

def change_gpu_frequency(change):
    search_tensor["gpu_frequency"] = change["new"]
    search_nontensor["gpu_frequency"] = change["new"]
    refresh_values()

def change_datatype(change):
    search_tensor["datatype"] = change["new"]
    search_nontensor["datatype"] = change["new"]
    refresh_values()

def update_y(change):
    global search_y
    search_y = change["new"]
    refresh_values()

gpu_frequency.observe(change_gpu_frequency, names="value")
datatype.observe(change_datatype, names="value")

y_opt.observe(update_y, names="value")

display(
    widgets.HBox([
        widgets.VBox([gpu_frequency, datatype]),
        widgets.VBox([y_opt]),
        widgets.VBox([output])
    ])
)

refresh_values()
plt.show()


In [None]:
# # BEGIN WIDGETS
# from ipywidgets import interact  # required by @interact below

# @interact
# def select_1(fixed1=["datatype", "tensor", "matrix_size", "gpu_frequency"],
#               fixed2=["datatype", "tensor", "matrix_size", "gpu_frequency"],
#               fixed3=["datatype", "tensor", "matrix_size", "gpu_frequency"],
#               gpu_frequency=[114750000, 216750000, 318750000, 420750000, 522750000, 624750000, 675750000, 828750000, 905250000, 1032750000, 1198500000, 1236750000, 1338750000, 1377000000],
#               matrix_size=np.arange(64, 2049, step=64),  # include 2048
#               tensor=[True, False],
#               datatype=["half", "float", "double"],
#               x=["datatype", "tensor", "matrix_size", "gpu_frequency", "avg_power", "flops", "flops_per_watt"],
#               y=["datatype", "tensor", "matrix_size", "gpu_frequency", "avg_power", "flops", "flops_per_watt"]):

#     # We want to look for 3 of the 4 searchable options: datatype, tensor, matrix_size, and gpu_frequency
#     search = {}
#     search[fixed1] = eval(fixed1)
#     search[fixed2] = eval(fixed2)
#     search[fixed3] = eval(fixed3)

#     results = [d for d in data if search.items() <= d.items()]
#     x, y = zip(*sorted([(r[x], r[y]) for r in results], key=lambda d : d[0]))

#     fig, ax = plt.subplots()
#     ax.set_xticks(np.arange(0, 2304, step=256))
#     ax.plot(x, y)

#     plt.show()
#     # set_data()

# @interact
# def select_2(fixed1=["datatype", "tensor", "matrix_size", "gpu_frequency"],
#               fixed2=["datatype", "tensor", "matrix_size", "gpu_frequency"],
#               fixed3=["datatype", "tensor", "matrix_size", "gpu_frequency"],
#               gpu_frequency=[114750000, 216750000, 318750000, 420750000, 522750000, 624750000, 675750000, 828750000, 905250000, 1032750000, 1198500000, 1236750000, 1338750000, 1377000000],
#               matrix_size=np.arange(64, 2049, step=64),  # include 2048
#               tensor=[True, False],
#               datatype=["half", "float", "double"],
#               x=["datatype", "tensor", "matrix_size", "gpu_frequency", "avg_power", "flops", "flops_per_watt"],
#               y=["datatype", "tensor", "matrix_size", "gpu_frequency", "avg_power", "flops", "flops_per_watt"]):

#     # We want to look for 3 of the 4 searchable options: datatype, tensor, matrix_size, and gpu_frequency
#     search = {}
#     search[fixed1] = eval(fixed1)
#     search[fixed2] = eval(fixed2)
#     search[fixed3] = eval(fixed3)

#     results = [d for d in data if search.items() <= d.items()]
#     x, y = zip(*sorted([(r[x], r[y]) for r in results], key=lambda d : d[0]))

#     fig, ax = plt.subplots()
#     ax.set_xticks(np.arange(0, 2304, step=256))
#     ax.plot(x, y)

#     plt.show()

# # END WIDGETS


In [None]:
# [114750000, 216750000, 318750000, 420750000, 522750000, 624750000, 675750000, 828750000, 905250000, 1032750000, 1198500000, 1236750000, 1338750000, 1377000000]
search_tensor = {
    "datatype": "float",
    "gpu_frequency": 1377000000,
    "tensor": True
}

search_nontensor = {
    "datatype": "float",
    "gpu_frequency": 1377000000,
    "tensor": False
}

results_tensor = [d for d in data if search_tensor.items() <= d.items()]
results_nontensor = [d for d in data if search_nontensor.items() <= d.items()]

# res = (FLOPS/W with tensor cores) / (FLOPS/W without); > 1 means tensor cores are more efficient
for m_size in range(64, 2049, 64):
    t = next(r for r in results_tensor if r['matrix_size'] == m_size)
    nt = next(r for r in results_nontensor if r['matrix_size'] == m_size)

    res = t['flops_per_watt']/nt['flops_per_watt']
    
    print(t['datatype'], t['gpu_frequency'], t['matrix_size'], res)
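The loop above raises `StopIteration` if a matrix size is missing for either tensor setting. A minimal sketch of a more tolerant version, which indexes results by (matrix size, tensor flag) and only reports sizes measured both ways; `efficiency_ratio` is a hypothetical helper. It reuses the notebook's dict-items subset filter, which compares values with `==`, so it still works when records contain unhashable values such as the `power_usage` list. The demo records are synthetic, not measurements.

```python
def efficiency_ratio(records, datatype="float", gpu_frequency=1377000000):
    """Return {matrix_size: tensor FLOPS/W divided by non-tensor FLOPS/W}."""
    wanted = {"datatype": datatype, "gpu_frequency": gpu_frequency}
    flops_per_watt = {}
    for r in records:
        # Same subset test as the notebook: every key/value in `wanted` must appear in r
        if wanted.items() <= r.items():
            flops_per_watt[(r["matrix_size"], r["tensor"])] = r["flops_per_watt"]

    ratios = {}
    for (size, tensor), value in flops_per_watt.items():
        # Only sizes measured both with and without tensor cores get a ratio
        if tensor and (size, False) in flops_per_watt:
            ratios[size] = value / flops_per_watt[(size, False)]
    return dict(sorted(ratios.items()))


# Synthetic records (made-up numbers)
demo = [
    {"datatype": "float", "gpu_frequency": 1377000000.0, "matrix_size": 64.0,
     "tensor": True, "flops_per_watt": 6.0, "power_usage": [10.0, 12.0]},
    {"datatype": "float", "gpu_frequency": 1377000000.0, "matrix_size": 64.0,
     "tensor": False, "flops_per_watt": 3.0, "power_usage": [10.0, 12.0]},
    {"datatype": "float", "gpu_frequency": 1377000000.0, "matrix_size": 128.0,
     "tensor": True, "flops_per_watt": 5.0, "power_usage": [11.0]},
]
print(efficiency_ratio(demo))
```

In the notebook, `efficiency_ratio(data)` gives the same ratios as the loop above for every size that has both measurements.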