# PyTorch & Hugging Face Setup for NVIDIA RTX 50 Series GPUs


This notebook demonstrates how to set up and use PyTorch with an NVIDIA RTX 5090 GPU,
including basic tensor operations and Hugging Face model inference.

Requirements:
- NVIDIA RTX 50 series GPU
- Ubuntu 24.04 LTS
- NVIDIA Driver 570.86.16 or later


Download the NVIDIA driver from [https://www.nvidia.com/en-us/drivers/](https://www.nvidia.com/en-us/drivers/) and choose the `.run` installer file.
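
The requirements above name 570.86.16 as the minimum driver version. The following is a minimal sketch for checking the installed driver against that minimum. It assumes `nvidia-smi` is on the `PATH`; the helper names (`parse_version`, `driver_is_new_enough`, `installed_driver_version`) are hypothetical and not part of any NVIDIA or PyTorch API.

```python
import shutil
import subprocess
from typing import Optional, Tuple

MIN_DRIVER = "570.86.16"  # minimum version from the requirements list above


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '570.86.16' into (570, 86, 16)."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_is_new_enough(installed: str, minimum: str = MIN_DRIVER) -> bool:
    """Compare dotted versions numerically, component by component."""
    return parse_version(installed) >= parse_version(minimum)


def installed_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the driver version; return None if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    # nvidia-smi prints one line per GPU; take the first one.
    return result.stdout.strip().splitlines()[0].strip()


version = installed_driver_version()
if version is None:
    print("nvidia-smi not found or failed; is the driver installed?")
else:
    status = "OK" if driver_is_new_enough(version) else f"older than {MIN_DRIVER}"
    print(f"Driver {version}: {status}")
```

The comparison is done on integer tuples rather than strings, so a version such as 570.124.04 is correctly treated as newer than 570.86.16.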


### Installing PyTorch, Torchvision, and Torchaudio

In [2]:
!pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
!pip install soundfile

Looking in indexes: https://download.pytorch.org/whl/nightly/cu128
Collecting torch
  Using cached https://download.pytorch.org/whl/nightly/cu128/torch-2.7.0.dev20250205%2Bcu128-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (28 kB)
Collecting torchvision
  Using cached https://download.pytorch.org/whl/nightly/cu128/torchvision-0.22.0.dev20250205%2Bcu128-cp310-cp310-linux_x86_64.whl.metadata (6.2 kB)
Collecting torchaudio
  Using cached https://download.pytorch.org/whl/nightly/cu128/torchaudio-2.6.0.dev20250205%2Bcu128-cp310-cp310-linux_x86_64.whl.metadata (6.6 kB)
Using cached https://download.pytorch.org/whl/nightly/cu128/torch-2.7.0.dev20250205%2Bcu128-cp310-cp310-manylinux_2_28_x86_64.whl (1363.8 MB)
Using cached https://download.pytorch.org/whl/nightly/cu128/torchvision-0.22.0.dev20250205%2Bcu128-cp310-cp310-linux_x86_64.whl (8.2 MB)
Using cached https://download.pytorch.org/whl/nightly/cu128/torchaudio-2.6.0.dev20250205%2Bcu128-cp310-cp310-linux_x86_64.whl (3.5 MB)
Installing col

In [3]:
import torch

# Test PyTorch installation
print("PyTorch version:", torch.__version__)

PyTorch version: 2.7.0.dev20250205+cu128


In [4]:
# Check if CUDA (GPU) is available
if torch.cuda.is_available():
    device = torch.device("cuda")  # Use GPU
    print("CUDA is available! Using GPU.")
else:
    device = torch.device("cpu")  # Use CPU
    print("CUDA is not available. Using CPU.")

# Create a tensor
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print("Tensor x:\n", x)

# Perform basic operations
y = x + 2
print("Tensor y (x + 2):\n", y)

z = torch.matmul(x, y)
print("Matrix multiplication (x @ y):\n", z)

CUDA is available! Using GPU.
Tensor x:
 tensor([[1., 2.],
        [3., 4.]])
Tensor y (x + 2):
 tensor([[3., 4.],
        [5., 6.]])
Matrix multiplication (x @ y):
 tensor([[13., 16.],
        [29., 36.]])
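
Note that the cell above selects `device` but creates `x` without it, so the printed tensors carry no `device='cuda:0'` suffix and the arithmetic runs on the CPU. A minimal sketch of the same operations placed on the selected device, falling back to the CPU when no GPU is visible, might look like this:

```python
import torch

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    print("GPU:", torch.cuda.get_device_name(0))
    print("CUDA version PyTorch was built with:", torch.version.cuda)

# Create the tensor directly on the chosen device.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], device=device)
y = x + 2  # the result stays on the same device as x
z = x @ y  # equivalent to torch.matmul(x, y)

print("Result lives on:", z.device)
print(z.cpu())  # copy back to host memory before printing
```

Creating tensors with `device=` avoids an extra host-to-device copy compared with building them on the CPU and calling `.to(device)` afterwards.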


In [1]:
import torchaudio
import torchvision

# Print versions to verify installation
print(f"Torchaudio version: {torchaudio.__version__}")
print(f"Torchvision version: {torchvision.__version__}")

Torchaudio version: 2.6.0.dev20250205+cu128
Torchvision version: 0.22.0.dev20250205+cu128
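
All three packages should come from the same index so that their build tags agree. The sketch below compares the part after `+` in each version string. The helper `build_tag` is hypothetical, and the version strings are sample values copied from the cell outputs in this notebook; in a live session, substitute `torch.__version__`, `torchvision.__version__`, and `torchaudio.__version__`.

```python
def build_tag(version: str) -> str:
    """Return the local build tag after '+', e.g. 'cu128' (empty if absent)."""
    return version.partition("+")[2]


# Sample values taken from the outputs shown in this notebook.
versions = {
    "torch": "2.7.0.dev20250205+cu128",
    "torchvision": "0.22.0.dev20250205+cu128",
    "torchaudio": "2.6.0.dev20250205+cu128",
}

tags = {name: build_tag(v) for name, v in versions.items()}
for name, tag in tags.items():
    print(f"{name}: {tag or 'no build tag'}")

if len(set(tags.values())) == 1:
    print("All packages share the same build tag.")
else:
    print("Build tags differ; reinstall the packages from the same index URL.")
```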


In [2]:
# Basic audio loading test
file = torchaudio.utils.download_asset("tutorial-assets/Lab41-SRI-VOiCES-rm1-babb-mc01-stu-clo-8000hz.wav")
waveform, sample_rate = torchaudio.load(file) 
print(f"\nAudio shape: {waveform.shape}")
print(f"Sample rate: {sample_rate}")

# Basic image loading test
from torchvision import datasets
train_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
)
print(f"\nMNIST Dataset size: {len(train_data)}")



Audio shape: torch.Size([1, 40000])
Sample rate: 8000




MNIST Dataset size: 60000





### Installing Hugging Face

In [3]:
!pip install transformers accelerate "huggingface_hub[cli]"

Collecting transformers
  Using cached transformers-4.48.2-py3-none-any.whl.metadata (44 kB)
Collecting accelerate
  Using cached accelerate-1.3.0-py3-none-any.whl.metadata (19 kB)
Collecting huggingface_hub[cli]
  Using cached huggingface_hub-0.28.1-py3-none-any.whl.metadata (13 kB)
Collecting regex!=2019.12.17 (from transformers)
  Using cached regex-2024.11.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (40 kB)
Collecting tokenizers<0.22,>=0.21 (from transformers)
  Using cached tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting safetensors>=0.4.1 (from transformers)
  Using cached safetensors-0.5.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
Collecting tqdm>=4.27 (from transformers)
  Using cached tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Collecting InquirerPy==0.3.4 (from huggingface_hub[cli])
  Using cached InquirerPy-0.3.4-py3-none-any.whl.metadata (8.1 kB)
Collecting p

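Load and test the Hugging Face libraries by running the Llama 3.2 3B model. The `meta-llama/Llama-3.2-3B` repository is gated, so accept the license on its Hugging Face page and authenticate (for example with `huggingface-cli login`) before running the cell.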

In [4]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-3B"
# device_map="auto" places the weights on the GPU when one is available
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             device_map="auto",
                                             torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Move the inputs to the model's device instead of hard-coding "cuda"
inputs = tokenizer("Explain GPU acceleration:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))


Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00,  1.60s/it]
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|>Explain GPU acceleration: What are the main components of a GPU?
GPU stands for Graphics Processing Unit. The GPU is a special-purpose processor that is designed to accelerate the graphics processing of a computer. GPUs are used in a variety of applications, including video games, 3D modeling, and scientific computing. GPUs are typically made up of a large number of processing cores, which are connected by a high-speed bus. The GPU is responsible for rendering the 3D graphics that are displayed on the screen, as well as
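
The output above shows two side effects of the default settings: the `pad_token_id` notice and the `<|begin_of_text|>` marker in the decoded text. The sketch below wraps generation in a small helper that passes `pad_token_id` explicitly and decodes with `skip_special_tokens=True`. The name `generate_text` is hypothetical, and `model` and `tokenizer` are assumed to be the objects loaded in the cell above.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=100):
    """Generate a completion for `prompt` and return it as plain text.

    Hypothetical helper: `model` and `tokenizer` are assumed to be a loaded
    Hugging Face causal language model and its matching tokenizer.
    """
    # Put the input tensors on whatever device the model was loaded onto.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        # Passing pad_token_id explicitly avoids the "Setting `pad_token_id`" notice.
        pad_token_id=tokenizer.eos_token_id,
    )
    # skip_special_tokens drops markers such as <|begin_of_text|>.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

With the model loaded, a call such as `print(generate_text(model, tokenizer, "Explain GPU acceleration:"))` would produce the same kind of completion as above. Because generation stops after `max_new_tokens` tokens, the text can still end mid-sentence.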
