## Part 1: Set up the PyTorch/CUDA/GPU environment

In [1]:
import os
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'

In [2]:
import torch

def print_gpu_memory():
    if torch.cuda.is_available():
        print(f"Total GPU Memory: {torch.cuda.get_device_properties(0).total_memory / (1024**3):.2f} GB")
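        # memory_allocated: bytes currently occupied by tensors
        print(f"Allocated GPU Memory: {torch.cuda.memory_allocated(0) / (1024**3):.2f} GB")
        # memory_reserved: bytes held by PyTorch's caching allocator (this is not free memory)
        print(f"Reserved GPU Memory: {torch.cuda.memory_reserved(0) / (1024**3):.2f} GB")
    else:
        print("CUDA is not available. No GPU detected.")

print_gpu_memory()

Total GPU Memory: 15.74 GB
Allocated GPU Memory: 0.00 GB
Reserved GPU Memory: 0.00 GB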
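
Note that `torch.cuda.memory_reserved` reports memory held by PyTorch's caching allocator rather than free memory. The sketch below shows how the three counters relate to each other. The helper name `summarize_gpu_memory` and the byte counts passed to it are hypothetical, chosen only for illustration. In the notebook the inputs would come from `total_memory`, `torch.cuda.memory_allocated(0)`, and `torch.cuda.memory_reserved(0)`.

```python
GIB = 1024 ** 3  # bytes per GiB, matching the divisor used in print_gpu_memory()


def summarize_gpu_memory(total_bytes, allocated_bytes, reserved_bytes):
    """Split total device memory into three parts, in GiB.

    allocated:  occupied by live tensors
    cached:     reserved by the allocator but not currently used by tensors
    unreserved: not claimed by this PyTorch process
    """
    return {
        "allocated": allocated_bytes / GIB,
        "cached": (reserved_bytes - allocated_bytes) / GIB,
        "unreserved": (total_bytes - reserved_bytes) / GIB,
    }


# Hypothetical numbers: a 16 GiB card with 4 GiB of tensors and 6 GiB reserved.
summary = summarize_gpu_memory(16 * GIB, 4 * GIB, 6 * GIB)
for name, value in summary.items():
    print(f"{name}: {value:.2f} GiB")
```

The "unreserved" figure is only an approximation of free memory, since other processes may also be using the GPU. For the value reported by the driver, `torch.cuda.mem_get_info()` returns the free and total bytes directly.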


## Part 2: Loading the model

The workstation used in this project has 16 GB of VRAM.
By a simple "divide by 4" rule of thumb (assuming 4 bytes per parameter, as with 32-bit floats), it can hold a model of at most about 4B parameters.
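
The rule of thumb can be written out as a small calculation. This is a rough sketch that counts only the weights. It assumes 1 GB = 10^9 bytes and ignores activations, the KV cache, and framework overhead, so real limits are lower. The function names and the bytes-per-parameter table are illustrative assumptions, not part of any library.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def max_params_billions(vram_gb, dtype="float32"):
    """Upper bound on the parameter count (in billions) whose weights fit in vram_gb."""
    return vram_gb / BYTES_PER_PARAM[dtype]


def weights_gb(params_billions, dtype="float32"):
    """Approximate size in GB of the weights alone."""
    return params_billions * BYTES_PER_PARAM[dtype]


print(max_params_billions(16))             # float32: the "divide by 4" rule
print(max_params_billions(16, "float16"))  # half precision doubles the bound
print(weights_gb(2.0))                     # a nominal 2B-parameter model in float32
```

Under these assumptions a nominal 2B-parameter model needs about 8 GB for its weights in 32-bit floats, which leaves headroom on a 16 GB card.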

The configuration of this model can be found at: https://huggingface.co/docs/transformers/en/model_doc/gemma#transformers.GemmaConfig

In [3]:
import torch

# Clear any cached memory (might help in some cases)
torch.cuda.empty_cache()

# Check if CUDA is available
if torch.cuda.is_available():
    print("CUDA (GPU support) is available in this environment!")
    print(f"Number of GPUs available: {torch.cuda.device_count()}")
    # Get the name of the GPU
    print(f"GPU Name: {torch.cuda.get_device_name(0)}")
else:
    print("CUDA is not available. Using CPU instead.")

CUDA (GPU support) is available in this environment!
Number of GPUs available: 1
GPU Name: NVIDIA GeForce RTX 3080 Ti Laptop GPU


In [4]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Initialize the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

# Put the model in evaluation mode (disables dropout; it does not by itself turn off gradient tracking)
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it").eval()

# Check if a GPU is available and move the model to GPU if it is
if torch.cuda.is_available():
    model = model.to("cuda")
    print("Using GPU:", torch.cuda.get_device_name(0))
else:
    print("No GPU found, using CPU instead.")

Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00,  3.36it/s]


Using GPU: NVIDIA GeForce RTX 3080 Ti Laptop GPU


In [13]:
# Tokenize the input text and move the tensor to GPU if available
input_text = "write me a python code that generate a simple game"
input_ids = tokenizer(input_text, return_tensors="pt")
input_ids = input_ids.to("cuda") if torch.cuda.is_available() else input_ids

# Generate output
outputs = model.generate(**input_ids, max_new_tokens=300)

print(tokenizer.decode(outputs[0]))

<bos>write me a python code that generate a simple game board with a set number of pieces.

```python
import random

# Create a board with 10x10 pieces
board = [[0 for _ in range(10)] for _ in range(10)]

# Randomly place pieces on the board
for i in range(10):
    for j in range(10):
        if random.random() < 0.5:
            board[i][j] = 1

# Print the board
for row in board:
    print(row)
```

**Explanation:**

* `import random` imports the `random` module, which provides functions to generate random numbers.
* `board = [[0 for _ in range(10)] for _ in range(10)]` creates a 10x10 board with all elements initialized to 0.
* The `for` loops iterate over each element in the board.
* `random.random() < 0.5` generates a random number between 0 and 0.5. If the random number is less than 0.5, it sets the corresponding element to 1.
* `print(board)` prints the board, row by row.

**Output:**

The code will generate a random game board with 10x10 pieces, where each element is either 0 o

So, the model has been set up successfully on the PC and can generate text.
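
The decoded text above starts with `<bos>` and repeats the prompt. This is because, for decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones. A minimal sketch of keeping only the new tokens is shown below. It uses plain Python lists and made-up token IDs in place of real tensors, and the helper name `strip_prompt` is hypothetical.

```python
def strip_prompt(sequence, prompt_length):
    """Return only the tokens that come after the first prompt_length tokens."""
    return sequence[prompt_length:]


# Made-up token IDs standing in for the tokenized prompt and for outputs[0].
prompt_ids = [2, 5559, 682]
full_sequence = prompt_ids + [476, 17706, 3409]

new_tokens = strip_prompt(full_sequence, len(prompt_ids))
print(new_tokens)
```

With the notebook's objects, the prompt length would be the second dimension of the tokenizer's `input_ids` tensor, and the same slice works on `outputs[0]`. Passing `skip_special_tokens=True` to `tokenizer.decode` additionally drops markers such as `<bos>`.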

## Part 3: Benchmarking

(Skipped)