# Functional Test 3.1.2 - GPUs

This Jupyter notebook creates VMs on different sites and worker nodes, as required by test 3.1.2, to verify GPU attachment.

## Step 1:  Configure the Environment

Before running this notebook, you will need to configure your environment using the [Configure Environment](../../fablib_api/configure_environment/configure_environment.ipynb) notebook. Please stop here, open and run that notebook, then return to this notebook.

**This only needs to be done once.**

## Step 2: Import the FABlib Library

In [None]:
from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()
                     
fablib.show_config();

## Step 3: Check Your Existing Slices

Because testing can get confusing, check which slices you actually have. The cell prints nothing if you have no active slices.

In [None]:
try:
    for slice in fablib.get_slices():
        print(f"{slice}")
except Exception as e:
    print(f"Exception: {e}")

## Step 4: Create the Test Slice

This creates a VM with a GPU attached on a specific worker at a specific site. Depending on the worker you use, different GPU types, or no GPUs at all, may be available. If you are unsure, the generated ads for each site ([in JSON format](https://github.com/fabric-testbed/aggregate-ads/tree/main/JSON)) can help.

**Try different nodes and different GPU types.**

**The code that creates the slice auto-refreshes until the slice is created or fails.**

In [None]:
from datetime import datetime
from dateutil import tz

name='Node1'
gpu_name='gpu1'
site='CERN'
# since all workers have a standard naming scheme, you can just change the worker
# to move from worker to worker
worker=f'{site.lower()}-w6.fabric-testbed.net'
cores=10
ram=20
disk=50
slice_name=f"Slice Test 3.1.2-GPU {site} on {worker} on {datetime.now()}"
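
Because the worker name follows a fixed pattern, moving the test from worker to worker can be reduced to changing one number. The sketch below wraps that pattern in a small helper. The function name `worker_hostname` and the index range are illustrative assumptions, not part of FABlib, and the range says nothing about how many workers a site actually has.

```python
def worker_hostname(site: str, worker_index: int) -> str:
    """Build a worker FQDN using the same pattern as the cell above."""
    if worker_index < 1:
        raise ValueError("worker_index must be a positive integer")
    return f"{site.lower()}-w{worker_index}.fabric-testbed.net"


# Candidate workers at one site. The range 1..3 is an assumption made
# for illustration only.
candidates = [worker_hostname("CERN", i) for i in range(1, 4)]
print(candidates)
```

Any of these strings can be assigned to `worker` before creating the slice, provided that worker exists at the site.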

In [None]:
try:
    #Create Slice
    print(f'Creating slice {slice_name}')
    slice = fablib.new_slice(name=slice_name)

    # Add node
    node = slice.add_node(name=name, site=site, host=worker, cores=cores, ram=ram, disk=disk, image='default_ubuntu_22')
    
    # Add a GPU by type. Be sure to select the right kind for that worker.
    # If the node has several GPUs, you can add more than one at once by
    # adding more add_component() statements, each with a unique name.
    #node.add_component(model='GPU_TeslaT4', name=gpu_name)
    #node.add_component(model='GPU_A40', name=gpu_name)
    #node.add_component(model='GPU_A30', name=gpu_name)
    #node.add_component(model='GPU_RTX6000', name=gpu_name)
    node.add_component(model='GPU_A30', name=gpu_name)

    #Submit Slice Request
    slice.submit()
except Exception as e:
    print(f"Exception: {e}")

## Step 5: Observe the Slice's Attributes

### Print the slice 

In [None]:
try:
    slice = fablib.get_slice(name=slice_name)
    print(f"{slice}")
except Exception as e:
    print(f"Exception: {e}")

### Print the node

Each node in the slice has a set of get functions that return the node's attributes. Use the returned `SSH Command` string to check the node. You can run it from a Bash terminal launched inside the Jupyter container.


In [None]:
try:
    node = slice.get_node(name) 
    print(f"{node}")
  
    gpu1 = node.get_component(gpu_name)
    print(f"{gpu1}")
    
except Exception as e:
    print(f"Exception: {e}")

### GPU PCI Device

Run the command <code>lspci</code> to see your GPU PCI device(s). At this point it is the raw GPU PCI device, not yet configured for use. After the drivers are installed (Step 6), you can use the GPUs as you would any other GPUs.

View Node1's GPU:

In [None]:
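# -E enables extended regular expressions so that '|' means "or";
# without it, grep looks for the literal text 'NVIDIA|3D controller'.
command = "sudo apt-get install -y pciutils && lspci | grep -E 'NVIDIA|3D controller'"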

stdout, stderr = node.execute(command)
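
The same filtering can also be done in the notebook, which makes it easy to count the attached GPUs and compare the count with the number of components requested. This is a minimal sketch: the helper name `find_gpu_lines` is hypothetical, and the sample text is invented for illustration rather than captured from a real node.

```python
import re
from typing import List


def find_gpu_lines(lspci_output: str) -> List[str]:
    """Return the lspci lines that mention NVIDIA or a 3D controller."""
    pattern = re.compile(r"NVIDIA|3D controller", re.IGNORECASE)
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if pattern.search(line)
    ]


# Invented sample text, for illustration only (not real lspci output).
sample = """\
00:03.0 Ethernet controller: Example network device
00:05.0 3D controller: NVIDIA Corporation Example GPU
"""

gpu_lines = find_gpu_lines(sample)
print(f"Found {len(gpu_lines)} GPU device(s)")
```

On a real node, the input would be the `stdout` string returned by `node.execute("lspci")`.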

## Step 6: Install NVIDIA Drivers

Run the following commands to install the NVIDIA driver together with the CUDA libraries and compiler, using the CUDA release set in the `version` variable. The installation may take a long time, so be prepared to wait.

In [None]:
distro='ubuntu2204'
version='12.6'
architecture='x86_64'

# install prerequisites
commands = [
    'sudo apt-get -q update',
    'sudo apt-get -q install -y linux-headers-$(uname -r) gcc',
]

print("Installing Prerequisites...")
for command in commands:
    print(f"++++ {command}")
    stdout, stderr = node.execute(command)

# add NVIDIA's package repository, then install the versioned CUDA package
commands = [
    f'wget https://developer.download.nvidia.com/compute/cuda/repos/{distro}/{architecture}/cuda-keyring_1.1-1_all.deb',
    'sudo dpkg -i cuda-keyring_1.1-1_all.deb',
    'sudo apt-get -q update',
    f'sudo apt-get -q install -y cuda-{version.replace(".", "-")}'
]
print(f"Installing CUDA {version}...")
for command in commands:
    print(f"++++ {command}")
    stdout, stderr = node.execute(command)
    
print("Done installing CUDA")
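
To test another CUDA release or distribution, only the three variables at the top of the cell need to change. The sketch below collects the command construction in one function so the mapping from `version` to the apt package name (for example, `12.6` becomes `cuda-12-6`) is explicit. The function name `cuda_install_commands` is hypothetical, and the keyring file name is copied from the cell above and assumed to apply to other distributions as well.

```python
def cuda_install_commands(distro, architecture, version,
                          keyring="cuda-keyring_1.1-1_all.deb"):
    """Build the repository setup and install commands used above."""
    repo = ("https://developer.download.nvidia.com/compute/cuda/repos/"
            f"{distro}/{architecture}")
    # apt package names use dashes where the CUDA version uses dots
    package = f"cuda-{version.replace('.', '-')}"
    return [
        f"wget {repo}/{keyring}",
        f"sudo dpkg -i {keyring}",
        "sudo apt-get -q update",
        f"sudo apt-get -q install -y {package}",
    ]


for command in cuda_install_commands("ubuntu2204", "x86_64", "12.6"):
    print(command)
```

Each returned string can be passed to `node.execute()` in a loop, exactly as the cell above does.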

Once CUDA is installed, reboot the machine so that the new driver is loaded.

In [None]:
reboot = 'sudo reboot'
try:
    print(reboot)
    node.execute(reboot)
    
    slice.wait_ssh(timeout=360,interval=10,progress=True)

    print("Now testing that SSH can reconnect...", end="")
    slice.update()
    slice.test_ssh()
    print("Reconnected!")

except Exception as e:
    print(f"Fail: {e}")
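
The wait in the cell above retries at a fixed interval until SSH responds or the timeout expires. FABlib's internals are not shown in this notebook, so the following is only a generic sketch of that polling pattern. The helper `wait_until` and its parameters are hypothetical and are not part of FABlib.

```python
import time


def wait_until(check, timeout=360, interval=10, sleep=time.sleep):
    """Call check() repeatedly until it returns True or timeout expires.

    Exceptions raised by check() count as "not ready yet", which is the
    normal situation while a machine is still rebooting.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            pass  # not reachable yet, try again after the interval
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Stand-in check that succeeds on the third attempt.
attempts = iter([False, False, True])
ready = wait_until(lambda: next(attempts), timeout=5, interval=0)
print(f"ready: {ready}")
```

The same helper could wrap any readiness test, such as a command that only succeeds once the driver is loaded.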

## Step 7: Test the GPU and CUDA Installation

Verify that the NVIDIA drivers recognize the GPU by running `nvidia-smi`. It prints a table with information about GPU vitals.

In [None]:
try:
    stdout, stderr = node.execute("nvidia-smi")
    #print(f"stdout: {stdout}")
except Exception as e:
    print(f"Exception: {e}")
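
For a check that is easier to automate than reading the table, `nvidia-smi` can also emit CSV through its `--query-gpu` and `--format=csv,noheader` options. The sketch below builds such a query and parses the result into dictionaries. The helper name `parse_gpu_csv` is hypothetical, and the sample line is invented for illustration, not taken from a real run.

```python
import csv
import io

QUERY_FIELDS = ["name", "driver_version", "memory.total"]
QUERY_COMMAND = (
    f"nvidia-smi --query-gpu={','.join(QUERY_FIELDS)} --format=csv,noheader"
)


def parse_gpu_csv(output):
    """Turn the CSV output of QUERY_COMMAND into one dict per GPU."""
    rows = csv.reader(io.StringIO(output), skipinitialspace=True)
    return [
        dict(zip(QUERY_FIELDS, (cell.strip() for cell in row)))
        for row in rows
        if row  # skip blank lines
    ]


# Invented sample line, for illustration only (not real nvidia-smi output).
sample = "Example GPU, 000.00, 1024 MiB\n"
gpus = parse_gpu_csv(sample)
print(QUERY_COMMAND)
print(gpus)
```

On a real node, the input would be the `stdout` returned by `node.execute(QUERY_COMMAND)`, and the test passes if the list has one entry per GPU requested in Step 4.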

## Step 8: Clean Up Your Slice

In [None]:
try:
    slice = fablib.get_slice(name=slice_name)
    slice.delete()
except Exception as e:
    print(f"Exception: {e}")