# Research

The initial goal is to determine which variables we can change in order to see how efficiency changes.
As of now, these are:
- GPU Frequency
- CPU Frequency
- Memory Frequency
- Matrix Size
- Deep Learning Accelerators (DLAs)
- Tensor Cores
- Data Types (Half, Float, Double)

Ideally we would test every combination of these, but with over 30,000 combinations that is impractical.
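The size of a full sweep is just the product of the number of settings available for each variable, which is why it grows so quickly. A minimal sketch of that count follows; `count_combinations` is a hypothetical helper and is not part of the benchmark code.

```cpp
#include <cstddef>
#include <vector>

// Number of points in a full-factorial sweep: the product of the number
// of settings available for each variable.
// (Hypothetical helper for illustration only.)
std::size_t count_combinations(const std::vector<std::size_t>& settings_per_variable) {
    std::size_t total = 1;
    for (std::size_t n : settings_per_variable) {
        total *= n;
    }
    return total;
}
```

With made-up counts of 10 settings for each of four variables, 2 for each of two on/off features, and 3 data types, `count_combinations({10, 10, 10, 10, 2, 2, 3})` gives 120000. These numbers are purely illustrative and are not the real counts for either board.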

## AGX Info

```
$ cat /etc/nv_tegra_release 
# R32 (release), REVISION: 4.4, GCID: 23942405, BOARD: t186ref, EABI: aarch64, DATE: Fri Oct 16 19:37:08 UTC 2020
```

```
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_21:14:42_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
```

## Nano Info
Todo

## Benchmark

To simplify things, I wrote a templated function, `benchmark`, that accepts `__half`, `float`, or `double`.

```C++
template <typename T>
void benchmark(int min_dim, int max_dim) {
    cublasHandle_t handle;
    cublasCreate(&handle);

    // Size every buffer for the largest matrix; smaller runs reuse the front of it.
    // size_t avoids int overflow in max_dim * max_dim for large dimensions.
    size_t bytes = (size_t)max_dim * max_dim * sizeof(T);

    T* h_A = (T*)malloc(bytes);
    T* h_B = (T*)malloc(bytes);
    T* h_C = (T*)calloc(1, bytes);  // zeroed so no uninitialized data is copied to d_C

    fill_random(h_A, max_dim);
    fill_random(h_B, max_dim);

    T *d_A, *d_B, *d_C;
    cudaMalloc(&d_A, bytes);
    cudaMalloc(&d_B, bytes);
    cudaMalloc(&d_C, bytes);

    cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_C, h_C, bytes, cudaMemcpyHostToDevice);

    // Sweep matrix sizes by doubling: min_dim, 2 * min_dim, ... up to max_dim.
    for (int dim = min_dim; dim <= max_dim; dim *= 2) {
        gemm(handle, dim, d_A, d_B, d_C);
    }

    // Wait for queued GEMMs to finish, then release device and host resources.
    cudaDeviceSynchronize();
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
    free(h_A);
    free(h_B);
    free(h_C);
    cublasDestroy(handle);
}
```
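Because the loop doubles `dim` on every iteration, only sizes of the form `min_dim * 2^k` are visited. The sketch below reproduces that sweep on the host and adds the conventional `2 * n^3` operation count for a square GEMM, which is one common way to turn a measured time into throughput. `sweep_dims` and `gemm_flops` are hypothetical helper names, not functions from the benchmark itself.

```cpp
#include <vector>

// Sizes visited by `for (dim = min_dim; dim <= max_dim; dim *= 2)`.
// Returns an empty list when min_dim <= 0, since doubling would never advance.
// A wider loop variable avoids int overflow when doubling near INT_MAX.
std::vector<int> sweep_dims(int min_dim, int max_dim) {
    std::vector<int> dims;
    for (long long dim = min_dim; dim > 0 && dim <= max_dim; dim *= 2) {
        dims.push_back(static_cast<int>(dim));
    }
    return dims;
}

// Conventional operation count for an n x n x n GEMM:
// n^3 multiplies plus n^3 adds.
double gemm_flops(int n) {
    double d = static_cast<double>(n);
    return 2.0 * d * d * d;
}
```

For example, `sweep_dims(64, 512)` yields 64, 128, 256, 512. Dividing `gemm_flops(dim)` by the measured run time in seconds gives FLOP/s for that size.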

The `gemm` function is specialized per data type so that each call dispatches to the matching cuBLAS routine (for example `cublasHgemm` for `__half` and `cublasSgemm` for `float`).
```C++
...
template <>
void gemm(cublasHandle_t handle, int dim, __half *d_A, __half *d_B, __half *d_C) {
    printf("Half - %d\n", dim);
    __half alpha = __float2half(1.0f);
    __half beta = __float2half(1.0f);
    cublasHgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, dim, dim, dim, &alpha, d_A, dim, d_B, dim, &beta, d_C, dim);
}

template <>
void gemm(cublasHandle_t handle, int dim, float *d_A, float *d_B, float *d_C) {
    printf("Float - %d\n", dim);
    float alpha = 1;
    float beta = 1;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, dim, dim, dim, &alpha, d_A, dim, d_B, dim, &beta, d_C, dim);
}
...
```
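With `CUBLAS_OP_N` for both operands, each of these calls computes `C = alpha * A * B + beta * C` on column-major matrices with leading dimension `dim`. Since `beta` is 1 here, every call accumulates into whatever `d_C` already holds. A small CPU reference of the same operation can help sanity-check results at small sizes; `gemm_reference` below is a hypothetical helper written for illustration, not part of the benchmark.

```cpp
#include <cstddef>
#include <vector>

// CPU reference for C = alpha * A * B + beta * C on square dim x dim
// matrices stored column-major with leading dimension dim, i.e. element
// (row, col) lives at index row + col * dim.
// (Hypothetical helper; intended only for checking small cases.)
template <typename T>
void gemm_reference(int dim, T alpha, const std::vector<T>& A,
                    const std::vector<T>& B, T beta, std::vector<T>& C) {
    const std::size_t n = static_cast<std::size_t>(dim);
    for (std::size_t col = 0; col < n; ++col) {
        for (std::size_t row = 0; row < n; ++row) {
            T acc = T(0);
            for (std::size_t k = 0; k < n; ++k) {
                acc += A[row + k * n] * B[k + col * n];
            }
            const std::size_t idx = row + col * n;
            C[idx] = alpha * acc + beta * C[idx];
        }
    }
}
```

Copying `d_C` back to the host after a small run and comparing it element by element against this reference, with a tolerance suited to the data type, is one way to confirm the cuBLAS arguments are wired up as intended.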