# Explore GPU

## Sanity Check #1:
### Run Standard `nvidia-smi` Tool

In [None]:
%%bash 

nvidia-smi
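
For a more compact check, `nvidia-smi` can also report selected fields as CSV. The sketch below wraps that query in a hypothetical helper, `gpu_summary` (not part of the original notebook), and falls back to a message when the tool is not on the `PATH`. It could be pasted into a `%%bash` cell.

```shell
# Hypothetical helper: print a one-row-per-GPU CSV summary
# instead of the full nvidia-smi table.
gpu_summary() {
  if command -v nvidia-smi > /dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.used,memory.total,utilization.gpu --format=csv
  else
    echo "nvidia-smi not found on PATH"
    return 1
  fi
}

# Do not abort the cell if no GPU tooling is installed.
gpu_summary || true
```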

## Sanity Check #2: 
### Run Accelerated Linear Algebra (XLA) Tests

In [None]:
%%bash 

xla_device_test &> xla_device_test.log

tail -3 xla_device_test.log
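
The cell above shows only the last three log lines. A minimal sketch of also reporting the exit status follows. It assumes the test binary exits non-zero on failure, which is the usual convention but is not confirmed here for `xla_device_test`. `run_and_report` is a hypothetical helper.

```shell
# Hypothetical helper: run a command, save all of its output to a log,
# show the last three log lines, then print PASS or FAIL from the exit status.
run_and_report() {
  log_file="$1"
  shift
  if "$@" > "$log_file" 2>&1; then
    result="PASS"
  else
    result="FAIL"
  fi
  tail -n 3 "$log_file"
  echo "$result"
}

# Only run the real test when the binary is available.
if command -v xla_device_test > /dev/null 2>&1; then
  run_and_report xla_device_test.log xla_device_test
fi
```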

## Run Some CUDA Code!

### Show CUDA Code

In [None]:
%%bash 

cat /root/src/main/cuda/SumArrays.cu

### Run CUDA Code and Verify Expected Output
```
### EXPECTED OUTPUT ###
...
*** Awesome!  The GPU summed the arrays!! ***
...
```
_Note the execution time._

In [None]:
%%bash 

sum_arrays

## Open a Terminal through Jupyter Notebook 
### (Menu Bar -> Terminal -> New Terminal)
![Jupyter Terminal](https://s3.amazonaws.com/fluxcapacitor.com/img/jupyter-terminal.png)

### Run this Command to Watch GPU Every Second:
```
watch -n 1 nvidia-smi
```

## Run Code In Loop, Watch GPU
_Note: Don't raise the loop count above 10!_

Otherwise, the following may happen:
* This cell will take a long time to finish.
* You may kill your instance!

In [None]:
%%bash

# Don't go above 10!!
for _ in {1..10}
do
  sum_arrays > /dev/null 2>&1
done

echo "...Done!"
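
The limit of 10 can also be enforced in the script itself. The sketch below assumes a hypothetical helper, `run_capped`, that clamps the requested count to 10 before looping. Like the loop above, it discards the command's output and ignores failures.

```shell
MAX_RUNS=10  # the cap from the note above

# Hypothetical helper: run a command N times, never more than MAX_RUNS.
# Prints the number of runs actually performed.
run_capped() {
  requested="$1"
  shift
  if [ "$requested" -gt "$MAX_RUNS" ]; then
    requested="$MAX_RUNS"
  fi
  i=0
  while [ "$i" -lt "$requested" ]; do
    "$@" > /dev/null 2>&1 || true
    i=$((i + 1))
  done
  echo "$i"
}

# Asking for 25 runs still performs only 10.
if command -v sum_arrays > /dev/null 2>&1; then
  echo "Completed $(run_capped 25 sum_arrays) runs"
fi
```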

## Run Some Advanced CUDA Code!
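We lower overall execution time by using asynchronous, stream-based memory copies (in CUDA, typically `cudaMemcpyAsync` on a `cudaStream_t`), which can overlap host-device transfers with kernel execution.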

### Show Advanced CUDA Code, Find `Stream`

In [None]:
%%bash 

cat /root/src/main/cuda/SumArraysAsyncMemcpy.cu

### Run CUDA Code and Verify Expected Output
```
### EXPECTED OUTPUT ###
...
*** Awesome!  The GPU summed the arrays!! ***
...
```
_Also note the lower execution time compared with `sum_arrays`, due to async memcpy._

In [None]:
%%bash 

sum_arrays_async_memcpy
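
To compare the two programs side by side, one option is to measure the wall-clock time of each. The sketch below uses a hypothetical helper, `wall_ms`, and assumes GNU `date` (for `%N` nanoseconds) on a Linux instance. Wall-clock time includes process startup and CUDA initialization, so it will be larger than any timing the programs print themselves.

```shell
# Hypothetical helper: wall-clock milliseconds for one run of a command.
# Assumes GNU date, which supports %N (nanoseconds).
wall_ms() {
  start_ns=$(date +%s%N)
  "$@" > /dev/null 2>&1 || true
  end_ns=$(date +%s%N)
  echo $(( (end_ns - start_ns) / 1000000 ))
}

# Time each binary once, skipping any that are not installed.
for bin in sum_arrays sum_arrays_async_memcpy; do
  if command -v "$bin" > /dev/null 2>&1; then
    echo "$bin: $(wall_ms "$bin") ms"
  else
    echo "$bin: not found on PATH"
  fi
done
```

A single run is noisy; repeating the measurement a few times gives a steadier comparison.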

### Run Advanced Code In Loop, Watch GPU

In [None]:
%%bash

# Don't go above 10!!
for _ in {1..10}
do
  sum_arrays_async_memcpy > /dev/null 2>&1
done

echo "...Done!"