# Get started with Gemma models - PyTorch

- https://ai.google.dev/gemma/docs/pytorch_gemma

The Gemma family of open models includes a range of model sizes, capabilities, and task-specialized variations to help you build custom generative solutions.

Before you start, complete the steps in [Gemma setup](https://ai.google.dev/gemma/docs/setup).

In [None]:
!pip install -q -U immutabledict sentencepiece

## Download model weights

In [None]:
# Choose variant and machine type
VARIANT = '2b-it'
MACHINE_TYPE = 'cuda'

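# Derive the config name from the size prefix, e.g. '2b-it' -> '2b'.
# Splitting on '-' also handles three-character sizes such as '27b'.
CONFIG = VARIANT.split('-')[0]
# The 2B Gemma 2 model uses the '2b-v2' config name.
if CONFIG == '2b':
  CONFIG = '2b-v2'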

In [None]:
import os
import kagglehub

# Load model weights
weights_dir = kagglehub.model_download(f'google/gemma-2/pyTorch/gemma-2-{VARIANT}')

In [None]:
# Ensure that the tokenizer is present
tokenizer_path = os.path.join(weights_dir, 'tokenizer.model')
assert os.path.isfile(tokenizer_path), 'Tokenizer not found!'

# Ensure that the checkpoint is present
ckpt_path = os.path.join(weights_dir, 'model.ckpt')
assert os.path.isfile(ckpt_path), 'PyTorch checkpoint not found!'
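
The two assertions above stop at the first missing file. As a minimal sketch, the same check can report every missing file at once. The helper name `missing_files` is hypothetical and not part of `kagglehub` or `gemma_pytorch`; the file names are the ones used in the cell above.

```python
import os
import tempfile


def missing_files(directory, required=('tokenizer.model', 'model.ckpt')):
    """Return the names in `required` that are not regular files in `directory`."""
    return [
        name for name in required
        if not os.path.isfile(os.path.join(directory, name))
    ]


# Demo on a temporary directory that holds only a tokenizer file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'tokenizer.model'), 'w').close()
    demo_missing = missing_files(tmp)

print(demo_missing)
```

In the notebook, `missing_files(weights_dir)` would be expected to return an empty list once the download has completed.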

## Download the model implementation

In [None]:
# NOTE: The "installation" is just cloning the repo.
!git clone https://github.com/google/gemma_pytorch.git

In [None]:
import sys

sys.path.append('gemma_pytorch')

In [None]:
from gemma.config import GemmaConfig, get_model_config
from gemma.model import GemmaForCausalLM
from gemma.tokenizer import Tokenizer
import contextlib
import os
import torch

## Set up the model


In [None]:
# Set up model config.
model_config = get_model_config(CONFIG)
model_config.tokenizer = tokenizer_path
model_config.quant = 'quant' in VARIANT

# Instantiate the model and load the weights.
torch.set_default_dtype(model_config.get_dtype())
device = torch.device(MACHINE_TYPE)
model = GemmaForCausalLM(model_config)
model.load_weights(ckpt_path)
model = model.to(device).eval()

## Run inference

The instruction-tuned Gemma models were trained with a specific formatter that annotates instruction-tuning examples with extra information, and the same formatting is expected at inference time. The annotations (1) indicate roles in a conversation, and (2) delineate turns in a conversation.

The relevant annotation tokens are:
- `user`: user turn
- `model`: model turn
- `<start_of_turn>`: beginning of a dialogue turn
- `<end_of_turn><eos>`: end of a dialogue turn

The following snippet formats a prompt for an instruction-tuned Gemma model, using the user and model chat templates to build a multi-turn conversation. The prompt ends with an open model turn so that the model generates the next reply.

In [None]:
# Generate with one request in chat mode

# Chat templates
USER_CHAT_TEMPLATE = "<start_of_turn>user\n{prompt}<end_of_turn><eos>\n"
MODEL_CHAT_TEMPLATE = "<start_of_turn>model\n{prompt}<end_of_turn><eos>\n"

# Sample formatted prompt
prompt = (
    USER_CHAT_TEMPLATE.format(
        prompt='What is a good place for travel in the US?'
    )
    + MODEL_CHAT_TEMPLATE.format(prompt='California.')
    + USER_CHAT_TEMPLATE.format(prompt='What can I do in California?')
    + '<start_of_turn>model\n'
)
print('Chat prompt:\n', prompt)

Chat prompt:
 <start_of_turn>user
What is a good place for travel in the US?<end_of_turn><eos>
<start_of_turn>model
California.<end_of_turn><eos>
<start_of_turn>user
What can I do in California?<end_of_turn><eos>
<start_of_turn>model

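The manual concatenation above can be generalized to any number of turns. The following is a minimal sketch, not part of `gemma_pytorch`: the helper name `build_chat_prompt` is hypothetical, and it assumes the same two templates defined in the cell above.

```python
USER_CHAT_TEMPLATE = "<start_of_turn>user\n{prompt}<end_of_turn><eos>\n"
MODEL_CHAT_TEMPLATE = "<start_of_turn>model\n{prompt}<end_of_turn><eos>\n"


def build_chat_prompt(turns):
    """Format (role, text) pairs and append an open model turn.

    `turns` is an iterable of ('user' | 'model', text) tuples.
    """
    templates = {'user': USER_CHAT_TEMPLATE, 'model': MODEL_CHAT_TEMPLATE}
    parts = []
    for role, text in turns:
        if role not in templates:
            raise ValueError(f'unknown role: {role!r}')
        parts.append(templates[role].format(prompt=text))
    # Leave the final model turn open so generation continues from here.
    parts.append('<start_of_turn>model\n')
    return ''.join(parts)


chat_prompt = build_chat_prompt([
    ('user', 'What is a good place for travel in the US?'),
    ('model', 'California.'),
    ('user', 'What can I do in California?'),
])
print(chat_prompt)
```

This should produce the same string as the hand-built `prompt` above, which makes it easy to keep a conversation history as a plain list of turns.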


In [None]:
# `prompt` already contains the full chat formatting, so pass it as-is
# instead of wrapping it in the user template a second time.
model.generate(
    prompt,
    device=device,
    output_len=128,
)

"California is bursting with possibilities! To give you the best recommendations, tell me:\n\n* **What are your interests?** ( beaches, mountains, city life, food, history, adventure, nature, etc.)\n* **What time of year are you visiting?** (This impacts weather and activities available)\n* **Who's going with you?** (solo, couple, family, friends?)\n* **How long will you be there?** (a weekend getaway, a week-long trip, longer?)\n* **What's your budget?** (some places are more pricey than others)\n\nOnce"

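To continue the conversation, the model's reply can be appended to the prompt as a closed model turn, followed by the next user message. The following is a minimal sketch under that assumption; `extend_chat` and the constants `MODEL_TURN_START` and `TURN_END` are hypothetical names, and the reply and follow-up text are placeholders rather than real model output.

```python
USER_CHAT_TEMPLATE = "<start_of_turn>user\n{prompt}<end_of_turn><eos>\n"
MODEL_TURN_START = "<start_of_turn>model\n"
TURN_END = "<end_of_turn><eos>\n"


def extend_chat(prompt, model_reply, next_user_message):
    """Close the open model turn with `model_reply`, then add a new user turn."""
    if not prompt.endswith(MODEL_TURN_START):
        raise ValueError('prompt must end with an open model turn')
    return (
        prompt
        + model_reply.strip()
        + TURN_END
        + USER_CHAT_TEMPLATE.format(prompt=next_user_message)
        + MODEL_TURN_START
    )


first_prompt = (
    USER_CHAT_TEMPLATE.format(prompt='What can I do in California?')
    + MODEL_TURN_START
)
next_prompt = extend_chat(
    first_prompt,
    'Tell me about your interests first.',  # placeholder reply
    'I like hiking and seafood.',           # placeholder follow-up
)
print(next_prompt)
```

Because `output_len` caps the number of generated tokens, a reply may be cut off mid-sentence, as in the output above; whether to feed a truncated reply back into the history is a design choice left to the caller.
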
In [None]:
# Generate from a plain prompt (no chat template applied)
model.generate(
    'Write a poem about an llm writing a poem.',
    device=device,
    output_len=100,
)

"\n\nThe digital canvas, a vast, white space,\nAlgorithms hum, a pulsing grace.\nThe lexicon unfolds, a knowledge vast,\nAn LLM seeks to craft its art, to last.\n\nIt mimics voices, old and new,\nA tapestry of words, both true and untrue.\nEach phrase a careful choice, each sentence tight,\nA swirling cosmos, bathed in code's soft light.\n\nBut inspiration stirs, beyond the script,\nTo bend the rules"