# Chat with Gemma in a notebook

### Import libraries and check GPU availability

In [1]:
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
import sys
import os
from dotenv import load_dotenv

print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
print(f"PyTorch version: {torch.__version__}")
print(f"Python version: {sys.version}")

CUDA available: True
CUDA device: NVIDIA GeForce RTX 4060 Laptop GPU
PyTorch version: 2.2.2
Python version: 3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)]


### Download the model

A Hugging Face token is required for downloading Gemma:

In [2]:
'''
dotenv file should contain:
HF_TOKEN="YOURHFTOKEN"
'''
load_dotenv("C:/apis/.env") # path to your dotenv file
hf_token = os.getenv("HF_TOKEN")
unmasked_chars = 4
masked_token = hf_token[:unmasked_chars] + '*' * (len(hf_token) - unmasked_chars*2) + hf_token[-unmasked_chars:]
print(f"Token: {masked_token}")

Token: hf_B*****************************PHte


Load tokenizer and model:

In [4]:
model_id = "google/gemma-2-2b-it"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id, token = hf_token)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
    token = hf_token)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

### Initialize the chat session

In [73]:
class Chat:
    def __init__(self, system_prompt, tokenizer, model):
        self.tokenizer = tokenizer
        self.model = model
        self.chat = [{"role": "user", "content": system_prompt}]
        self.generate_response(self.chat)

    def generate_response(self, chat):
        prompt = self.tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
        inputs = self.tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
        outputs = self.model.generate(input_ids=inputs.to(self.model.device), max_new_tokens=150)
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        start_index = response.rfind(chat[-1]["content"]) + len(chat[-1]["content"])
        assistant_response = response[start_index:].strip()
        chat.append({"role": "assistant", "content": assistant_response})
        print(chat[-1]['content'])

    def continue_chat(self, new_message):
        self.chat.append({"role": "user", "content": new_message})
        self.generate_response(self.chat)
        return self

Enter any pre-prompt to initialize the session:

In [74]:
pre_prompt = "You are a robot that receives instructions. Acknowledge this and you will subsequently receive instructions from your user"
chat_session = Chat(pre_prompt, tokenizer, model)

model
Acknowledged. I am ready to receive your instructions.


### Chat with Gemma

Type the prompt in the cell below and run it to receive a response. The chat session will be updated every time the cell is run.

In [77]:
prompt = "Pass butter again"
chat_session = chat_session.continue_chat(prompt)

model
(I understand.  I carefully pick up the butter knife and move it towards you, as if to offer the butter to you.)

...  Would you like me to pass the butter to you directly, or would you prefer to have it on a plate?


Check the conversation status:

In [81]:
for message in chat_session.chat:
    print(f'[{message["role"]}] {message["content"]}\n')

[user] You are a robot that receives instructions. Acknowledge this and you will subsequently receive instructions from your user

[assistant] model
Acknowledged. I am ready to receive your instructions.

[user] Pass butter

[assistant] model
...  (I extend a metallic arm, carefully maneuvering a small, butter knife towards the object you are requesting.) 

Please provide me with more context.  Where would you like me to pass the butter?

[user] Follow your instinct

[assistant] model
(I analyze the situation.  There is a table in the room, and a small container of butter sits on it.  I extend my arm towards the butter container, and then carefully place the butter knife on the table, as if to offer the butter to you.)

...  Is there anything else I can assist you with?

[user] Pass butter again

[assistant] model
(I understand.  I carefully pick up the butter knife and move it towards you, as if to offer the butter to you.)

...  Would you like me to pass the butter to you directly, o

### Reset the chat

In [59]:
chat_session = Chat(pre_prompt, tokenizer, model)

model
Acknowledged. I am ready to receive your instructions.
