# Using FABRIC GPUs

Your compute nodes can include GPUs. These devices are made available as FABRIC components and can be added to your nodes like any other component.

This example notebook will demonstrate how to reserve and use Nvidia GPU devices on FABRIC.


## Set Up the Experiment

#### Import FABRIC API

In [None]:
from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()
                     
fablib.show_config();

## Create a Node

The cells below help you create a slice that contains a single node with an attached GPU. 

### Select the GPU Type and the FABRIC Site

First, decide which GPU type you want to use. This choice determines the subset of sites where your VM can be placed.

In [None]:
# pick which GPU type we will use (execute this cell). 

# choices include
# GPU_RTX6000
# GPU_TeslaT4
# GPU_A30
# GPU_A40
GPU_CHOICE = 'GPU_RTX6000' 

# don't edit - convert from GPU type to a resource column name
# to use in filter lambda function below
choice_to_column = {
    "GPU_RTX6000": "rtx6000_available",
    "GPU_TeslaT4": "tesla_t4_available",
    "GPU_A30": "a30_available",
    "GPU_A40": "a40_available"
}

column_name = choice_to_column.get(GPU_CHOICE, "Unknown")
print(f'{column_name=}')
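
The lookup above falls back to the string `"Unknown"` when `GPU_CHOICE` is misspelled, and the mistake only surfaces later when the site filter runs. A minimal sketch of a stricter lookup follows; the helper name `column_for` is hypothetical and not part of fablib.

```python
# Mapping copied from the cell above: GPU component model -> availability column.
CHOICE_TO_COLUMN = {
    "GPU_RTX6000": "rtx6000_available",
    "GPU_TeslaT4": "tesla_t4_available",
    "GPU_A30": "a30_available",
    "GPU_A40": "a40_available",
}


def column_for(gpu_choice: str) -> str:
    """Return the availability column for a GPU model, or raise on a typo."""
    try:
        return CHOICE_TO_COLUMN[gpu_choice]
    except KeyError:
        valid = ", ".join(sorted(CHOICE_TO_COLUMN))
        raise ValueError(
            f"Unknown GPU type {gpu_choice!r}; choose one of: {valid}"
        ) from None


print(column_for("GPU_RTX6000"))
```

Failing early this way keeps a typo from turning into a confusing error several cells later.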

Give the slice and the node in it meaningful names.

In [None]:
# name the slice and the node 
slice_name=f'My Simple GPU Slice with {GPU_CHOICE}'
node_name='gpu-node'

print(f'Will create slice "{slice_name}" with node "{node_name}"')

Use a lambda filter to select a site that has at least one GPU of the chosen type available.

In [None]:
# find a site with at least one available GPU of the selected type
site_override = None

if site_override:
    site = site_override
else:
    site = fablib.get_random_site(filter_function=lambda x: x[column_name] > 0)
print(f'Preparing to create slice "{slice_name}" with node {node_name} in site {site}')
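
To make the filter idea concrete, here is a small stand-in written in plain Python. It is not fablib's implementation: the site records, the site names, and the helper `pick_random_site` are all hypothetical. The only thing taken from the cell above is that the filter receives a record that can be indexed by column name.

```python
import random

# Hypothetical site records; real records come from FABRIC's resource listing.
sites = [
    {"name": "SITE_A", "rtx6000_available": 0, "a30_available": 2},
    {"name": "SITE_B", "rtx6000_available": 3, "a30_available": 0},
    {"name": "SITE_C", "rtx6000_available": 1, "a30_available": 1},
]


def pick_random_site(records, filter_function, rng=random):
    """Keep the records that pass the filter, then choose one name at random."""
    candidates = [r["name"] for r in records if filter_function(r)]
    if not candidates:
        return None  # this stand-in signals "no match" with None
    return rng.choice(candidates)


column_name = "rtx6000_available"
site = pick_random_site(sites, lambda x: x[column_name] > 0)
print(site)
```

In this sketch only `SITE_B` and `SITE_C` can be chosen, because `SITE_A` reports zero available RTX 6000 GPUs.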

Create the desired slice with a GPU component. 

In [None]:
# Create Slice. Note that by default submit() call will poll for 360 seconds every 10-20 seconds
# waiting for slice to come up. Normal expected time is around 2 minutes. 
slice = fablib.new_slice(name=slice_name)

# Add node with a 100G drive and a couple of CPU cores (default)
node = slice.add_node(name=node_name, site=site, disk=100, image='default_ubuntu_22')
node.add_component(model=GPU_CHOICE, name='gpu1')

#Submit Slice Request
slice.submit();

## Get the Slice

Retrieve the slice by name and display its details.

In [None]:
slice = fablib.get_slice(name=slice_name)
slice.show();

## Get the Node

Retrieve the node and its GPU component, and display their details.


In [None]:
node = slice.get_node(node_name) 
node.show()

gpu = node.get_component('gpu1')
gpu.show();


## GPU PCI Device

Run the command <code>lspci</code> to see your GPU PCI device(s). At this point the GPU is a raw PCI device that is not yet configured for use. Once the drivers are installed, you can use it as you would any other GPU.

View the node's GPU:

In [None]:
# -E enables extended regular expressions so that '|' means "or"
command = "sudo apt-get install -y pciutils && lspci | grep -E 'NVIDIA|3D controller'"

stdout, stderr = node.execute(command)
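
The same match can also be done on the notebook side, against the full `lspci` output. The sketch below assumes the command's stdout is available as a string. The sample text is hypothetical and only illustrates the general shape of `lspci` lines.

```python
import re

# Hypothetical lspci output, for illustration only.
sample_stdout = """\
00:03.0 Ethernet controller: Example Vendor network device
00:05.0 3D controller: NVIDIA Corporation Example GPU (rev a1)
00:06.0 Unclassified device: Example Vendor memory balloon
"""

# Python's re module treats '|' as alternation by default.
GPU_PATTERN = re.compile(r"NVIDIA|3D controller")


def gpu_lines(lspci_output: str) -> list[str]:
    """Return the lspci lines that look like an Nvidia GPU."""
    return [line for line in lspci_output.splitlines() if GPU_PATTERN.search(line)]


for line in gpu_lines(sample_stdout):
    print(line)
```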

## Install Nvidia Drivers

Now, let's run the following commands to install the latest Nvidia driver and the CUDA libraries and compiler. The cell also installs PyTorch, which is used in a later example. This step can take up to 20 minutes.

NOTE: for instructional purposes the following cell sends all command output back to the notebook. You can also send it to log files to keep the notebook output clean.

In [None]:
distro='ubuntu2204'
version='12.6'
architecture='x86_64'

# install prerequisites
commands = [
    'sudo apt-get -q update',
    'sudo apt-get -q install -y linux-headers-$(uname -r) gcc',
]

print("Installing Prerequisites...")
for command in commands:
    print(f"++++ {command}")
    stdout, stderr = node.execute(command)

print("Installing PyTorch...")
commands = [
    'sudo apt install python3-pip -y',
    'pip3 install torch',
    'pip3 install torchvision'
]
for command in commands:
    print(f"++++ {command}")
    stdout, stderr = node.execute(command)

print(f"Installing CUDA {version}...")
commands = [
    f'wget https://developer.download.nvidia.com/compute/cuda/repos/{distro}/{architecture}/cuda-keyring_1.1-1_all.deb',
    'sudo dpkg -i cuda-keyring_1.1-1_all.deb',
    'sudo apt-get -q update',
    # the apt package name uses dashes, e.g. cuda-12-6 for version 12.6
    f'sudo apt-get -q install -y cuda-{version.replace(".", "-")}'
]
for command in commands:
    print(f"++++ {command}")
    stdout, stderr = node.execute(command)
    
print("Done installing CUDA")
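
One way to keep the notebook output clean, as the note above suggests, is to redirect each command's output to a log file on the node using ordinary shell redirection. The sketch below only builds the command strings. The helper `with_log` and the file name `install.log` are hypothetical.

```python
import shlex


def with_log(command: str, log_path: str) -> str:
    """Wrap a shell command so stdout and stderr are appended to a log file."""
    # The subshell ensures the redirection covers the whole command line.
    return f"( {command} ) >> {shlex.quote(log_path)} 2>&1"


commands = [
    "sudo apt-get -q update",
    "sudo apt-get -q install -y gcc",
]
wrapped = [with_log(c, "install.log") for c in commands]

for c in wrapped:
    print(c)
```

Each wrapped string can be passed to `node.execute` in the same loop as before, and the log can be inspected afterwards with a command such as `tail install.log`.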

Once CUDA is installed, reboot the node so that the new driver is loaded.

In [None]:
reboot = 'sudo reboot'

print(reboot)
node.execute(reboot)

slice.wait_ssh(timeout=360,interval=10,progress=True)

print("Now testing SSH ability to reconnect...", end="")
slice.update()
slice.test_ssh()
print("Reconnected!")
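
The `timeout` and `interval` arguments above describe a poll-until-ready loop. The notebook does not show how `wait_ssh` is implemented, so the following is only a generic sketch of that pattern. The helper `wait_until` and the fake check are hypothetical.

```python
import time


def wait_until(check, timeout=360, interval=10, sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Demo: a fake check that succeeds on the third attempt, with sleeping disabled.
attempts = {"count": 0}


def fake_ssh_check():
    attempts["count"] += 1
    return attempts["count"] >= 3


result = wait_until(fake_ssh_check, timeout=5, interval=0, sleep=lambda s: None)
print(result, attempts["count"])
```

A rebooting node refuses connections for a while, so a loop of this kind gives it time to come back instead of failing on the first attempt.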


## Testing the GPU and CUDA Installation

First, verify that the Nvidia drivers recognize the GPU by running `nvidia-smi`.

In [None]:
stdout, stderr = node.execute("nvidia-smi")

print(f"stdout: {stdout}")
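
For a machine-readable check, `nvidia-smi` can emit CSV through its `--query-gpu` and `--format` options. The sketch below parses such output on the notebook side. The sample line is hypothetical and the helper `parse_gpus` is not part of any library. On a real node the text would come from running the `QUERY` command.

```python
import csv
import io

# Command that could be run on the node to produce the CSV text.
QUERY = "nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader"

# Hypothetical output for a single GPU.
sample_stdout = "Example GPU, 000.00.00, 24576 MiB\n"


def parse_gpus(stdout: str) -> list[dict]:
    """Turn one CSV line per GPU into a dict with named fields."""
    fields = ["name", "driver_version", "memory_total"]
    reader = csv.reader(io.StringIO(stdout), skipinitialspace=True)
    return [dict(zip(fields, (cell.strip() for cell in row))) for row in reader if row]


gpus = parse_gpus(sample_stdout)
print(gpus)
```

An empty list here would suggest that the driver does not see any GPU.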

### CUDA Hello World Example

Now, let's upload the following "Hello World" CUDA program file to the node.

`hello-world.cu`

*Source: https://computer-graphics.se/multicore/pdf/hello-world.cu*

*Author: Ingemar Ragnemalm*

>This file is from *"The real "Hello World!" for CUDA, OpenCL and GLSL!"* (https://computer-graphics.se/hello-world-for-cuda.html), written by Ingemar Ragnemalm, programmer and CUDA teacher. The only change from the original file on the website is an added `#include <unistd.h>`, because `sleep()` is declared in the `unistd.h` header.

In [None]:
node.upload_file('./hello-world.cu', 'hello-world.cu')

We now compile the `.cu` file using `nvcc`, the CUDA compiler tool installed with CUDA. In this example, we create an executable called `hello_world`.

In [None]:
stdout, stderr = node.execute(f"/usr/local/cuda-{version}/bin/nvcc -o hello_world hello-world.cu")

Finally, run the executable:

In [None]:
stdout, stderr = node.execute("./hello_world")

print(f"stdout: {stdout}")

If you see `Hello World!`, the CUDA program ran successfully. `World!` was computed on the GPU by adding an array of offsets to the characters of the string `Hello `, and the result was printed to stdout.
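
The arithmetic behind that result can be reproduced on the CPU in a few lines of Python. This is a sketch of the idea only: it assumes an element-wise addition with one offset per character, and it derives the offsets from the two strings rather than copying them from `hello-world.cu`.

```python
hello = "Hello "
target = "World!"

# Per-character offsets that turn "Hello " into "World!".
offsets = [ord(t) - ord(h) for h, t in zip(hello, target)]

# Element-wise addition; on the GPU each thread would handle one index.
result = "".join(chr(ord(h) + off) for h, off in zip(hello, offsets))

print(offsets)
print(hello + result)
```

On the GPU the per-character additions are independent of each other, which is what lets them run in parallel.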

### Congratulations! You have now successfully run a program on a FABRIC GPU!

### PyTorch CIFAR10 Classifier Example

Now, let's follow the "Training a Classifier" tutorial from PyTorch to train an image classifier on the CIFAR10 dataset.

`pytorch_example`

*Source: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html*

In [None]:
node.upload_file('./pytorch_example.py', 'pytorch_example.py')

Finally, run the Python script to train and test the classifier.

In [None]:
stdout, stderr = node.execute("python3 pytorch_example.py")

If you see `Finished Training` followed by the accuracy of the classifier, then the script ran successfully.
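
That check can also be automated against the captured `stdout`. The sketch below assumes the script prints `Finished Training` and a line that contains the word `Accuracy` followed by a percentage. The sample text is hypothetical, and the exact wording depends on `pytorch_example.py`.

```python
import re

# Hypothetical tail of the script's output.
sample_stdout = """\
[2, 12000] loss: 1.271
Finished Training
Accuracy of the network on the 10000 test images: 55 %
"""


def summarize(stdout: str):
    """Return (finished, accuracy) where accuracy is a float percentage or None."""
    finished = "Finished Training" in stdout
    # Take the first number on an "Accuracy" line that is directly followed by '%'.
    match = re.search(r"Accuracy[^\n]*?(\d+(?:\.\d+)?)\s*%", stdout)
    accuracy = float(match.group(1)) if match else None
    return finished, accuracy


finished, accuracy = summarize(sample_stdout)
print(finished, accuracy)
```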

### Congratulations! You have now successfully trained a PyTorch classifier on a FABRIC GPU!

## Clean Up Your Experiment

In [None]:
fablib.delete_slice(slice_name)