<img src="https://www.nvidia.com/content/dam/en-zz/Solutions/about-nvidia/logo-and-brand/01-nvidia-logo-horiz-500x200-2c50-d@2x.png" alt="NVIDIA Logo" style="width: 300px; height: auto;">

---

# Lab 1: Driver Installation & Setup

## Lab Overview

### Audience
This workbook is intended for technical training students. The target audience includes system
administrators responsible for the setup and upkeep of a single server or a small set of servers
equipped with an H100 GPU.

### Objectives
In this lab, you will install the GPU driver on a server. You will then use the `nvidia-smi` command to display information about the GPU.

### Prerequisites and Guidelines
There are no prerequisites for this lab.

### Notice
Please follow the instructions below carefully to successfully complete the practice.
If you encounter technical issues, please contact the NVIDIA Networking Academy team:
nbu-academy-support@nvidia.com

---

## Practice 1: Driver Installation


Practice objectives:
In this practice you will install the driver on the server:

    ✓ Gather information about your server and GPU for driver installation.
    ✓ Download the appropriate driver from NVIDIA’s website.
    ✓ Install the driver on the server.

### Task 1: Retrieve information about the GPU from the server


#### 1.1 - Log in to the server and issue this command
*Please note: No GPU driver should be installed at this point, so the command is expected to fail.*

In [None]:
!nvidia-smi

#### 1.2 - List all PCI devices on your system and filter the output to show only NVIDIA hardware.

In [None]:
!lspci -Q | grep NVIDIA
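
The same filter can be sketched in Python, for example to keep the matching lines in a notebook variable. This is a minimal sketch: the function name `nvidia_devices` and the sample text are illustrative assumptions, not output from the lab server. It matches case-insensitively, which is slightly more forgiving than the plain `grep NVIDIA` above.

```python
def nvidia_devices(lspci_output: str) -> list:
    """Return the lspci lines that mention NVIDIA (case-insensitive)."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if "nvidia" in line.lower()
    ]


# Made-up sample text for illustration only; real lspci output will differ.
sample = """\
00:1f.2 SATA controller: Example Vendor SATA Controller
17:00.0 3D controller: NVIDIA Corporation Device (example)
"""

for device in nvidia_devices(sample):
    print(device)
```

In the notebook, the text could come from `subprocess.run(["lspci"], capture_output=True, text=True).stdout` instead of the sample string.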

#### 1.3 - Display Linux distribution name, version, and other OS metadata.

In [None]:
!cat /etc/os-release

#### 1.4 - Show kernel name, version, hardware platform, and other core OS information.

In [None]:
!uname -a

The architecture of this server is x86_64. Remember the information gathered, as it will be used to download the correct driver for the GPU.
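
The values needed for the driver download form can also be collected in one place. The sketch below is an assumption-laden helper, not part of the lab materials: `parse_os_release` and `system_summary` are hypothetical names, and the code only reads `/etc/os-release` and Python's standard `platform` module.

```python
import platform


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines in the os-release format into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        info[key] = value.strip().strip("\"'")  # drop surrounding quotes
    return info


def system_summary(os_release_path: str = "/etc/os-release") -> dict:
    """Collect distribution, version, architecture, and kernel release."""
    try:
        with open(os_release_path, encoding="utf-8") as handle:
            os_info = parse_os_release(handle.read())
    except OSError:
        os_info = {}  # file missing or unreadable: leave fields as None
    return {
        "distribution": os_info.get("NAME"),
        "version": os_info.get("VERSION_ID"),
        "architecture": platform.machine(),
        "kernel": platform.release(),
    }


print(system_summary())
```

The `architecture` and `kernel` fields correspond to what `uname -m` and `uname -r` report.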

---

### Task 2: Download the latest driver from NVIDIA’s website


#### 2.1 - Go to the NVIDIA website, www.nvidia.com, and click Drivers.

<div style="text-align: center;">
  <img src="attachment:76d0e9c2-b486-4345-983b-715df1b8363c.png" style="width: 70%">
</div>

#### 2.2 - In the product category section, click on Data Center/Tesla.

<div style="text-align: center;">
  <img src="attachment:4c8e6167-71ef-4a69-a269-ce11ba22e1fe.png" style="width: 70%">
</div>

#### 2.3 - Use the information gathered in Task 1 to fill in the fields.

<div style="text-align: center;">
  <img src="attachment:a9c93fb6-e99e-4ca9-899a-9bf12a62ceb8.png" style="width: 70%">
</div>

#### 2.4 - Click "Find".

<div style="text-align: center;">
  <img src="attachment:218a2881-1783-4dc9-99b9-4c89a514825e.png" style="width: 70%;">
</div>

#### The NVIDIA driver can be installed from NVIDIA's apt repository.

The setup has already configured the NVIDIA CUDA repository for you. The `cuda-keyring` package is available in the materials folder.

Let's verify it's there:

In [None]:
!ls ~/module8/materials/

---

### Task 3: Install the latest NVIDIA driver


#### 3.1 - First, let's add the NVIDIA apt repository using the cuda-keyring package:

In [None]:
!sudo dpkg -i ~/module8/materials/cuda-keyring_1.1-1_all.deb

#### 3.2 - Update the package list to include NVIDIA's repository:

In [None]:
!sudo apt-get update

#### 3.3 - Install the NVIDIA driver:

In [None]:
!sudo apt-get install -y nvidia-driver-570 libnvidia-nscq-570

#### 3.4 - Reboot to load the kernel modules and initialize the environment.

Since we are in a training environment, we can simulate a reboot by running the initialisation script below. This will load the drivers and configure the necessary device nodes.

In [None]:
!sudo /usr/local/bin/fix-lab-env.sh

#### 3.5 - Verify the driver is installed and working:

In [None]:
!nvidia-smi
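
For a scripted check, the verification can be wrapped in a small Python function. This is a sketch under stated assumptions: `installed_driver_version` is a hypothetical helper name, and it relies on the `driver_version` query field and the `csv,noheader` format option of `nvidia-smi`. It returns `None` instead of raising when the tool is missing or the driver is not loaded.

```python
import shutil
import subprocess


def installed_driver_version():
    """Return the driver version reported by nvidia-smi, or None."""
    if shutil.which("nvidia-smi") is None:
        return None  # binary not on PATH: driver tools are not installed
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None  # for example, the kernel module is not loaded yet
    lines = result.stdout.strip().splitlines()
    # nvidia-smi prints one line per GPU; report the first one.
    return lines[0].strip() if lines else None


version = installed_driver_version()
print(version if version else "No working NVIDIA driver detected")
```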

---

## Practice 2: Check the GPU Status Using nvidia-smi


### Practice objectives:

In this practice you will use the `nvidia-smi` command to display information about the GPU.

#### 4.1 - Use the following command to display GPU utilization. 

In [None]:
!nvidia-smi --query-gpu=utilization.gpu --format=csv 

Output example:

    utilization.gpu [%]
    2 %

#### 4.2 - Use the following command to display the amount of GPU memory in use:

In [None]:
!nvidia-smi --query-gpu=memory.used --format=csv 

#### 4.3 - Display several properties continuously

The command queries the following properties:

    ✓ utilization.gpu: Percentage of GPU utilization
    ✓ memory.total: Total GPU memory
    ✓ memory.used: Amount of GPU memory currently in use
    ✓ memory.free: Amount of free GPU memory
    ✓ temperature.gpu: Current GPU temperature
    ✓ power.draw: Current power consumption of the GPU

The command in the cell below prints a single snapshot. For real-time monitoring, add the `nvidia-smi` loop flag `-l 2`, which sets the command to loop and refreshes the data every 2 seconds. To stop the process, press Ctrl+C.
In [None]:
!nvidia-smi --query-gpu=utilization.gpu,memory.total,memory.used,memory.free,temperature.gpu,power.draw --format=csv
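
The CSV output is easy to post-process. The sketch below shows one way to turn it into Python dictionaries keyed by property name. The function name `parse_query_csv` is hypothetical, and the sample values are made up for illustration rather than captured from the lab server.

```python
import csv
import io


def parse_query_csv(text: str) -> list:
    """Turn `nvidia-smi --query-gpu=... --format=csv` text into dicts.

    Header units such as "[%]" or "[MiB]" are dropped from the keys;
    values are kept as strings exactly as printed.
    """
    rows = list(csv.reader(io.StringIO(text.strip()), skipinitialspace=True))
    if not rows:
        return []
    header = [name.split(" [")[0].strip() for name in rows[0]]
    return [
        dict(zip(header, (value.strip() for value in row)))
        for row in rows[1:]
        if row
    ]


# Made-up sample output for illustration only; real values will differ.
sample = """\
utilization.gpu [%], memory.total [MiB], memory.used [MiB], memory.free [MiB], temperature.gpu, power.draw [W]
2 %, 80000 MiB, 500 MiB, 79500 MiB, 30, 70.00 W
"""

for gpu in parse_query_csv(sample):
    print(gpu)
```

Each GPU in the server produces one row, so the function returns one dictionary per GPU.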