**Attempt at training on personal CUDA GPU**

My local Win 11 box with a GTX 1080 Ti had no problem running inference with the 2b-it version of Gemma.  
The next test is whether the same machine can also fine-tune the model.

A couple of references:
https://colab.research.google.com/github/google/generative-ai-docs/blob/main/site/en/gemma/docs/lora_tuning.ipynb#scrollTo=ZiS-KU9osh_N
https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/gemma-lora-example.ipynb

In [1]:
# Set up the environment
#!pip install -q -U immutabledict sentencepiece 
#!git clone https://github.com/google/gemma_pytorch.git

fatal: destination path 'gemma_pytorch' already exists and is not an empty directory.


In [1]:
!pip3 install -q -U bitsandbytes==0.42.0
!pip3 install -q -U peft==0.8.2
!pip3 install -q -U trl==0.7.10
!pip3 install -q -U accelerate==0.27.1
!pip3 install -q -U datasets==2.17.0
!pip3 install -q -U transformers==4.38.0

In [2]:
!ls gemma_pytorch

CONTRIBUTING.md
LICENSE
README.md
archive.tar.gz
config.json
docker
gemma
requirements.txt
scripts
setup.py
tokenizer
tokenizer.model


In [3]:
import sys 
sys.path.append("gemma_pytorch") 
from gemma.config import GemmaConfig, get_config_for_7b, get_config_for_2b
from gemma.model import GemmaForCausalLM
from gemma.tokenizer import Tokenizer
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, GemmaTokenizer

import contextlib
import os
import torch

In [4]:
# Ensure that this notebook is CUDA-aware
torch.cuda.is_available()

True

In [5]:
torch.cuda.set_device(0)
torch.cuda.current_device()

0

In [6]:
torch.cuda.get_device_name(0)

'NVIDIA GeForce GTX 1080 Ti'

Fetch some training data from here:
!wget -O databricks-dolly-15k.jsonl https://huggingface.co/datasets/databricks/databricks-dolly-15k/resolve/main/databricks-dolly-15k.jsonl
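
`wget` is not always on the path when a notebook runs on Windows. A minimal standard-library sketch of the same download follows; the helper name `fetch` and the commented-out destination are illustrative assumptions, not part of the original notebook.

```python
import urllib.request
from pathlib import Path

DOLLY_URL = (
    "https://huggingface.co/datasets/databricks/databricks-dolly-15k/"
    "resolve/main/databricks-dolly-15k.jsonl"
)

def fetch(url, dest):
    """Download url to dest unless dest already exists; return dest as a Path."""
    dest = Path(dest)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, dest)
    return dest

# Hypothetical usage; the target folder mirrors the path opened later in the notebook:
# fetch(DOLLY_URL, Path("gemma_pytorch") / "tokenizer" / "databricks-dolly-15k.jsonl")
```

Skipping the download when the file already exists keeps the cell safe to re-run.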

In [7]:
# Load the model
VARIANT = "2b" 
# Set this to "cuda" (not "gpu" or "cpu") when running on a GPU, e.g. the T4 on Kaggle.
# Results were much faster (as expected) once I did so.
MACHINE_TYPE = "cuda" 
weights_dir = 'gemma_pytorch\\tokenizer' 

@contextlib.contextmanager
def _set_default_tensor_type(dtype: torch.dtype):
  """Sets the default torch dtype to the given dtype."""
  torch.set_default_dtype(dtype)
  try:
    yield
  finally:
    # Restore the default even if the body raises (e.g. a failed weight load).
    torch.set_default_dtype(torch.float)

model_config = get_config_for_2b() if "2b" in VARIANT else get_config_for_7b()
model_config.tokenizer = os.path.join(weights_dir, "tokenizer.model")


In [8]:
print(model_config)

GemmaConfig(vocab_size=256000, max_position_embeddings=8192, num_hidden_layers=18, num_attention_heads=8, num_key_value_heads=1, hidden_size=2048, intermediate_size=16384, head_dim=256, rms_norm_eps=1e-06, dtype='bfloat16', quant=False, tokenizer='gemma_pytorch\\tokenizer\\tokenizer.model')
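
The config asks for `dtype='bfloat16'`, while the bitsandbytes log further down reports compute capability 6.1 for this card. Hardware bfloat16 support arrived with compute capability 8.0 (Ampere), so a dtype choice keyed on capability may be worth considering. The helper below is a hypothetical sketch, not something the notebook or `gemma_pytorch` does; it takes the `(major, minor)` tuple that `torch.cuda.get_device_capability()` returns.

```python
def preferred_dtype(capability):
    """Pick a half-precision dtype name from a (major, minor) compute capability.

    Assumption: bfloat16 is only preferred on compute capability 8.0 or newer.
    """
    major, _minor = capability
    return "bfloat16" if major >= 8 else "float16"

# 6.1 is the capability reported for the GTX 1080 Ti in the log below.
print(preferred_dtype((6, 1)))
```

In a real run the tuple would come from `torch.cuda.get_device_capability(0)` rather than being typed in by hand.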


The checkpoint files (pretrained weights for 2b) are available here:
https://www.kaggle.com/models/google/gemma/frameworks/pyTorch/variations/2b?select=gemma-2b.ckpt

In [9]:

device = torch.device(MACHINE_TYPE)
with _set_default_tensor_type(model_config.get_dtype()):
  model = GemmaForCausalLM(model_config)
  ckpt_path = os.path.join(weights_dir, f'gemma-{VARIANT}.ckpt')
  model.load_weights(ckpt_path)
  model = model.to(device).eval()

  return self.fget.__get__(instance, owner)()


In [None]:
??GemmaForCausalLM

In [12]:
import json
data = []
with open("gemma_pytorch\\tokenizer\\databricks-dolly-15k.jsonl", encoding="utf-8") as file:
    for line in file:
        features = json.loads(line)
        # Filter out examples with context, to keep it simple.
        if features["context"]:
            continue
        # Format the entire example as a single string.
        template = "<start_of_turn>user\n{instruction}<end_of_turn>\n<start_of_turn>model\n{response}<end_of_turn>"
        data.append(template.format(**features))

# Only use 1000 training examples, to keep it fast.
data = data[:1000]
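
The filter-and-format step can be tried without the full file. The sketch below pulls it into a function and runs it on two records; the records are invented for illustration (only the field names follow the Dolly JSONL), and `format_example` is a hypothetical helper name.

```python
import json

TEMPLATE = (
    "<start_of_turn>user\n{instruction}<end_of_turn>\n"
    "<start_of_turn>model\n{response}<end_of_turn>"
)

def format_example(line):
    """Return the formatted training string, or None if the example has context."""
    features = json.loads(line)
    if features["context"]:
        return None
    return TEMPLATE.format(
        instruction=features["instruction"], response=features["response"]
    )

# Two made-up JSONL lines: one without context, one with.
sample_lines = [
    json.dumps({"instruction": "Say hi.", "context": "",
                "response": "Hi!", "category": "open_qa"}),
    json.dumps({"instruction": "Summarize.", "context": "Some passage.",
                "response": "Short.", "category": "summarization"}),
]
formatted = [s for s in map(format_example, sample_lines) if s is not None]
print(formatted)
```

Only the first record survives, since the second carries a non-empty `context`.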

In [14]:
data[1]

'<start_of_turn>user\nWhy can camels survive for long without water?<end_of_turn>\n<start_of_turn>model\nCamels use the fat in their humps to keep them filled with energy and hydration for long periods of time.<end_of_turn>'
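
At this point `data` is a plain Python list of strings. `SFTTrainer` in trl normally takes a `datasets.Dataset` plus the name of the text column (`dataset_text_field`), so one possible bridge is to wrap each string in a record first. This is a sketch under that assumption; `to_text_records` and the column name `"text"` are illustrative choices.

```python
def to_text_records(examples, field="text"):
    """Wrap each formatted string in a dict so it can become a dataset row."""
    return [{field: example} for example in examples]

records = to_text_records(
    ["<start_of_turn>user\nHi<end_of_turn>\n<start_of_turn>model\nHello<end_of_turn>"]
)
print(records[0].keys())

# With the `datasets` package installed, the records could then become a dataset:
#   from datasets import Dataset
#   train_dataset = Dataset.from_list(records)
# and be passed to SFTTrainer together with dataset_text_field="text".
```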

In [15]:
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
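
As a rough sanity check on what `r=8` costs, the number of LoRA parameters can be worked out from the `GemmaConfig` printed earlier: each adapted linear layer with shape `(d_in, d_out)` gains `r * (d_in + d_out)` weights. The sketch assumes the seven target modules exist as separate linear layers, as they do in the Hugging Face Transformers Gemma implementation; whether the `gemma_pytorch` model loaded above exposes the same module names is worth checking, for example with `model.named_modules()`.

```python
# Dimensions copied from the GemmaConfig printed earlier.
HIDDEN = 2048
HEADS, KV_HEADS, HEAD_DIM = 8, 1, 256
INTERMEDIATE = 16384
LAYERS = 18
R = 8

# (in_features, out_features) per target module, assuming separate projections.
SHAPES = {
    "q_proj": (HIDDEN, HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "o_proj": (HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(shapes, r, layers):
    """Total LoRA weights: r * (d_in + d_out) per module, summed over all layers."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * layers

total = lora_params(SHAPES, R, LAYERS)
print(f"{total:,} trainable LoRA parameters")  # 9,805,824
```

Under these assumptions the adapters add just under 10 million trainable weights.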

In [None]:
!pip3 uninstall -y bitsandbytes

In [None]:
!pip3 install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.1-py3-none-win_amd64.whl

In [18]:
import transformers
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    train_dataset=data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=150,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit"
    ),
    peft_config=lora_config,
)

False

The following directories listed in your path were found to be non-existent: {WindowsPath('/Dev/anaconda/envs/cuda_test/lib'), WindowsPath('D')}
The following directories listed in your path were found to be non-existent: {WindowsPath('/matplotlib_inline.backend_inline'), WindowsPath('module')}
The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
DEBUG: Possible options found for libcudart.so: set()
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 6.1.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Loading binary D:\Dev\anaconda\envs\cuda_test\Lib\site-packages\bitsandbytes\libbitsandbytes_cuda118_nocublaslt.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Problem: The main issue seems to be that the main CUDA runtime library was not detected.
CUDA SETU


python -m bitsandbytes


  warn(msg)


RuntimeError: 
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
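
The `optim="paged_adamw_8bit"` setting is the one argument above that explicitly asks for bitsandbytes. One possible workaround, sketched here under the assumption that a broken install raises on import, is to probe the import and fall back to the plain `"adamw_torch"` optimizer name. `pick_optimizer` is a hypothetical helper. Whether this alone gets past the error depends on whether other libraries in the stack import bitsandbytes themselves.

```python
import importlib

def pick_optimizer(module_name="bitsandbytes"):
    """Return an optimizer name for TrainingArguments(optim=...).

    Falls back to plain AdamW when the given module cannot be imported.
    """
    try:
        importlib.import_module(module_name)
    except Exception:
        # Covers a missing package as well as a failed CUDA setup at import time.
        return "adamw_torch"
    return "paged_adamw_8bit"

print(pick_optimizer())
```

The result would be passed as `optim=pick_optimizer()` in place of the hard-coded string.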

In [10]:
# Use the model

USER_CHAT_TEMPLATE = "<start_of_turn>user\n{prompt}<end_of_turn>\n"
MODEL_CHAT_TEMPLATE = "<start_of_turn>model\n{prompt}<end_of_turn>\n"

prompt = (
    USER_CHAT_TEMPLATE.format(
        prompt="Who was president in 1852?"
    )
    + "<start_of_turn>model\n"
)

model.generate(
    prompt,
    device=device,
    output_len=300,
)

'Who was president in 1953? macchine\nWho was the 10th vice president? macchine\nWho was president in 1932? macchine\nWho was the president that signed the interstate highway act? macchine\nWho was the first president since the civil war to go from commander in chief to president, but then back to commander in chief? macchine\nWho was president in 1972? macchine\nWho was president in 2008? macchine\nWho has the most presidential tweets? macchine\nWho was the only woman president? macchine\nWho is the first woman who received the American bald eagle as an award? macchine\nWho signed the Treaty of Amity and commerce 1820? macchine\nWho was the only woman president? macchine\nXusers\nWhat president was the first since the civil war to go from commander in chief to president, but then back to commander in chief? macchine\nWhat president was president in 1970? macchine\nWhat president has held the most office, three times? macchine\nWhat president was president in 1993? macchine\nWhat presi

Note the gibberish that the base model (pretrained `gemma-2b.ckpt`, not instruction-tuned) produces...
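
The cell above defines `MODEL_CHAT_TEMPLATE` but only uses the user template. A small sketch of how the two could be combined for multi-turn prompts follows; `build_prompt` is a hypothetical helper, and it always ends with an open model turn as in the original prompt.

```python
USER_CHAT_TEMPLATE = "<start_of_turn>user\n{prompt}<end_of_turn>\n"
MODEL_CHAT_TEMPLATE = "<start_of_turn>model\n{prompt}<end_of_turn>\n"

def build_prompt(turns):
    """Build a chat prompt from (role, text) pairs, role being "user" or "model".

    The result ends with an open model turn so generation continues from there.
    """
    templates = {"user": USER_CHAT_TEMPLATE, "model": MODEL_CHAT_TEMPLATE}
    body = "".join(templates[role].format(prompt=text) for role, text in turns)
    return body + "<start_of_turn>model\n"

prompt = build_prompt([("user", "Who was president in 1852?")])
print(repr(prompt))
```

For a single user turn this produces the same string as the hand-built prompt above.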