
In [None]:
# Install the MIT Deep Learning utilities and the other libraries imported below
!pip install mitdeeplearning > /dev/null 2>&1
!pip install --upgrade datasets fsspec huggingface_hub
!pip install peft lion-pytorch > /dev/null 2>&1
import mitdeeplearning as mdl
# %%
import os
import json
import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt

import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader

from transformers import AutoTokenizer, AutoModelForCausalLM
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from lion_pytorch import Lion
# %%
# Basic question-answer template
template_without_answer = "<start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n"
template_with_answer = template_without_answer + "{answer}<end_of_turn>\n"

# Let's try to put something into the template to see how it looks
print(template_with_answer.format(question="What is your name?", answer="My name is Gemma!"))
# %%
# Load the tokenizer for Gemma 2B
model_id = "unsloth/gemma-2-2b-it"  # alternative: "google/gemma-2-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# How big is the tokenizer?
print(f"Vocab size: {len(tokenizer.get_vocab())}")
# %%
# Let's test out both steps: encoding and decoding
text = "Here is some sample text!"
print(f"Original text: {text}")

# Tokenize the text
tokens = tokenizer.encode(text, return_tensors="pt")
print(f"Encoded tokens: {tokens}")

# Decode the tokens
decoded_text = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(f"Decoded text: {decoded_text}")
# %%
prompt = template_without_answer.format(question="What is the capital of France? Use one word.")
print(prompt)
# %%
# Load the model -- note that this may take a few minutes
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
# %%
# 1. Construct the prompt in chat template form
question = "What is the capital of France? Use one word."
prompt = template_without_answer.format(question=question)

# 2. Tokenize the prompt
tokens = tokenizer.encode(prompt, return_tensors="pt").to(model.device)

# 3. Feed through the model to get logits, then convert them to probabilities
with torch.no_grad():
    output = model(tokens)

    probs = F.softmax(output.logits, dim=-1)

# 4. Get the next token: the most probable one at the last position
next_token = torch.argmax(probs[0, -1, :]).item()

# 5. Decode the next token
next_token_text = tokenizer.decode(next_token)

print(f"Prompt: {prompt}")
print(f"Predicted next token: {next_token_text}")
# %%
prompt = template_without_answer.format(question="What does MIT stand for?")
tokens = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
output = model.generate(tokens, max_new_tokens=20)
print(tokenizer.decode(output[0]))

<start_of_turn>user
What is your name?<end_of_turn>
<start_of_turn>model
My name is Gemma!<end_of_turn>





Vocab size: 256000
Original text: Here is some sample text!
Encoded tokens: tensor([[     2,   4858,    603,   1009,   6453,   2793, 235341]])
Decoded text: Here is some sample text!
<start_of_turn>user
What is the capital of France? Use one word.<end_of_turn>
<start_of_turn>model

Prompt: <start_of_turn>user
What is the capital of France? Use one word.<end_of_turn>
<start_of_turn>model

Predicted next token: Paris
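
Steps 3 and 4 above (softmax over the logits, then argmax at the last position) can be sketched without loading the model. This is a minimal illustration in plain Python; the four-word `vocab` and the `last_position_logits` values are made up for the example and do not come from Gemma.

```python
import math

def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and made-up logits for the final prompt position
vocab = ["Paris", "London", "France", "the"]
last_position_logits = [5.0, 2.0, 1.0, 0.5]

# Convert logits to probabilities, then pick the most probable token (greedy choice)
probs = softmax(last_position_logits)
next_token_id = max(range(len(probs)), key=probs.__getitem__)
next_token_text = vocab[next_token_id]

print(f"Predicted next token: {next_token_text}")
```

Because softmax preserves the ordering of the logits, taking the argmax of the probabilities selects the same token as taking the argmax of the raw logits.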


In [4]:
train_loader, test_loader = mdl.lab3.create_dataloader(style="leprechaun")

sample = train_loader.dataset[44]
question = sample['instruction']
answer = sample['response']
answer_style = sample['response_style']

print(f"Question: {question}\n\n" +
      f"Original Answer: {answer}\n\n" +
      f"Answer Style: {answer_style}")

Question: Are lilies safe for cats?

Original Answer: No, lilies are toxic to cats if consumed and should not be kept in a household with cats

Answer Style: Och, no indeed, me hearty! Them lilies there be as dangerous as a pot o' gold guarded by a banshee to a wee kitty cat! If a whiskered lad or lass takes a bite of one, it's as bad as swallowing a curse from the old Hag herself. So, ye best keep them far from yer feline friends, or else ye'll be needin' more than just a four-leaf clover to bring luck back into yer home!


In [8]:
def chat(question, max_new_tokens=32, temperature=0.7, only_answer=False):
    # 1. Construct the prompt using the template
    prompt = template_without_answer.format(question=question)

    # 2. Tokenize the prompt (returns input_ids and attention_mask)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # 3. Sample a continuation from the model
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            do_sample=True,
            max_new_tokens=max_new_tokens,
            temperature=temperature
        )

    # 4. If only_answer is True, drop the prompt tokens and keep only the generated ones
    output_tokens = outputs[0]
    if only_answer:
        output_tokens = output_tokens[inputs["input_ids"].shape[1]:]

    # 5. Decode the tokens
    result = tokenizer.decode(output_tokens, skip_special_tokens=True)

    return result


In [9]:
answer = chat(
    "What is the capital of Ireland?",
    only_answer=True,
    max_new_tokens=32,
)

print(answer)



The capital of Ireland is **Dublin**. 
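
The `temperature` argument passed to `chat` controls how sharply sampling favours the most likely tokens. The sketch below shows the usual idea (divide the logits by the temperature before the softmax) in plain Python. It illustrates the general technique with made-up logits and a hypothetical helper name, and is not the library's internal implementation, which may apply further processing.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    # Scale the logits by 1/temperature, then apply a numerically stable softmax
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up logits for three candidate tokens

cool = softmax_with_temperature(logits, 0.7)     # sharper than the plain softmax
neutral = softmax_with_temperature(logits, 1.0)  # the plain softmax
warm = softmax_with_temperature(logits, 1.5)     # flatter than the plain softmax

# Draw one token id from the temperature-scaled distribution
rng = random.Random(0)
sampled_id = rng.choices(range(len(logits)), weights=cool, k=1)[0]

print([round(p, 3) for p in cool])
print([round(p, 3) for p in warm])
```

A temperature below 1 concentrates probability on the top token, so answers vary less between calls; a temperature above 1 spreads probability out and makes the output more varied.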



In [10]:
# LoRA fine-tunes an LLM efficiently by training small low-rank adapter matrices while the original weights stay frozen

def apply_lora(model):
    # Define LoRA config
    lora_config = LoraConfig(
        r=8, # rank of the LoRA matrices
        task_type="CAUSAL_LM",
        target_modules=[
            "q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"
        ],
    )

    # Apply LoRA to the model
    lora_model = get_peft_model(model, lora_config)
    return lora_model

model = apply_lora(model)

# Print the number of trainable parameters after applying LoRA
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())
print(f"number of trainable parameters: {trainable_params}")
print(f"total parameters: {total_params}")
print(f"percentage of trainable parameters: {trainable_params / total_params * 100:.2f}%")

number of trainable parameters: 10383360
total parameters: 2624725248
percentage of trainable parameters: 0.40%
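
The trainable-parameter count above can be reproduced by hand. For a linear layer with `in_features` inputs and `out_features` outputs, a rank-`r` LoRA adapter adds two matrices with `r * in_features` and `out_features * r` entries. The sketch below assumes the layer shapes shown, which are not stated in this notebook and are taken to be those of Gemma 2 2B; they should be checked against `model.config` before relying on them.

```python
# Assumed Gemma 2 2B dimensions (not printed anywhere in this notebook)
hidden, intermediate, n_layers = 2304, 9216, 26
q_out, kv_out = 2048, 1024  # assumed: 8 query heads and 4 key/value heads of size 256

# (in_features, out_features) for each module named in target_modules
target_shapes = {
    "q_proj": (hidden, q_out),
    "k_proj": (hidden, kv_out),
    "v_proj": (hidden, kv_out),
    "o_proj": (q_out, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

r = 8  # LoRA rank, as in LoraConfig above

def lora_params(in_features, out_features, rank):
    # One (rank x in_features) matrix plus one (out_features x rank) matrix
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, r) for i, o in target_shapes.values())
total_lora = per_layer * n_layers

print(f"LoRA parameters per layer: {per_layer}")
print(f"LoRA parameters in total:  {total_lora}")
print(f"share of 2624725248 total: {total_lora / 2624725248 * 100:.2f}%")
```

Under these assumed shapes the total comes to 10383360, which agrees with the count printed above.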


In [11]:
def forward_and_compute_loss(model, tokens, mask, context_length=512):
    # Truncate to context length
    tokens = tokens[:, :context_length]
    mask = mask[:, :context_length]

    # Construct the input, output, and mask
    x = tokens[:, :-1]
    y = tokens[:, 1:]
    mask = mask[:, 1:]

    # Forward pass to compute logits
    logits = model(x).logits

    # Compute the per-token loss (reshape also handles non-contiguous slices)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        y.reshape(-1),
        reduction="none"
    )

    # Mask out the loss for non-answer tokens
    loss = loss[mask.reshape(-1)].mean()

    return loss
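
The shift-and-mask logic in `forward_and_compute_loss` can be traced on a toy example. The sketch below uses plain Python and made-up numbers: the four token ids, the mask, and the `predicted_probs` table stand in for real model output and are purely illustrative.

```python
import math

# A toy sequence of 4 token ids; the last two belong to the answer
tokens = [0, 1, 2, 3]
mask = [False, False, True, True]

# Shift by one: the model reads x[t] and must predict y[t] = tokens[t + 1]
x = tokens[:-1]
y = tokens[1:]
shifted_mask = mask[1:]

# Made-up predicted distributions over a 4-token vocabulary, one per position of x
predicted_probs = [
    [0.10, 0.70, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.10, 0.30, 0.50],
]

# Per-token cross-entropy: negative log-probability assigned to the true next token
per_token_loss = [-math.log(p[target]) for p, target in zip(predicted_probs, y)]

# Average only over positions whose target is an answer token
answer_losses = [l for l, keep in zip(per_token_loss, shifted_mask) if keep]
loss = sum(answer_losses) / len(answer_losses)

print(f"masked loss: {loss:.4f}")
```

The first position is excluded because its target is still part of the prompt, so only the predictions of answer tokens contribute to the gradient.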

In [13]:
### Training loop ###

def train(model, dataloader, tokenizer, max_steps=200, context_length=512, learning_rate=1e-4):
    losses = []

    # LoRA has already been applied to the model (see apply_lora above),
    # so it is not re-applied here

    optimizer = Lion(model.parameters(), lr=learning_rate)

    # Training loop
    for step, batch in enumerate(dataloader):
        question = batch["instruction"][0]
        answer = batch["response_style"][0]

        # Format the question and answer into the template
        text = template_with_answer.format(question=question, answer=answer)

        # Tokenize the text and mark the answer tokens: a token belongs to the
        # answer if its starting character offset is at or past the answer's start
        ids = tokenizer(text, return_tensors="pt", return_offsets_mapping=True).to(model.device)
        mask = ids["offset_mapping"][:,:,0] >= text.index(answer)

        # Feed the tokens through the model and compute the loss on the answer tokens
        loss = forward_and_compute_loss(model, ids['input_ids'], mask, context_length)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        losses.append(loss.item())

        # monitor progress
        if step % 10 == 0:
            print(chat("What is the capital of France?", only_answer=True))
            print(f"step {step} loss: {torch.mean(torch.tensor(losses)).item()}")
            losses = []

        # Stop once max_steps training steps have been run
        if step + 1 >= max_steps:
            break

    return model

# Fine-tune the model. The change in style starts to show after a few dozen steps.
model = train(model, train_loader, tokenizer)

The capital of France is **Paris**. 🇫🇷 

step 0 loss: 2.8051788806915283
The capital of France is **Paris**. 🇫🇷 

step 10 loss: 2.00321364402771
The capital of France is **Paris**. 🗼🇫🇷
step 20 loss: 1.5455166101455688
Top o' the mornin' to ye now! Ye want to know where Paris, the capital o' France, is, do ye? Well, let
step 30 loss: 1.6040843725204468
Top o' the mornin' to ye! Why, it's Paris, of course, the capital o' the grand ol' France, where the
step 40 loss: 1.5651631355285645
Top o' the mornin' to ye, me hearty! Why, the capital of France, bless yer soul, is Paris, a fine city indeed,
step 50 loss: 1.581621527671814
Top o' the mornin' to ye! Now listen up, me hearty, the capital o' the grand ol' land o' France is Paris,
step 60 loss: 1.459203839302063
Top o' the mornin' to ye, me hearty! Ye askin' about the capital of France, did ye? Well, listen up, '
step 70 loss: 1.4824892282485962
Top o' the mornin' to ye, me hearty! Ye askin' about the capital o' France, ye say? Well, why, 
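
The answer mask built inside `train` compares each token's starting character offset with the position where the answer begins in the formatted text. The sketch below reproduces that idea with a stand-in tokenizer that splits on whitespace; the real Gemma tokenizer splits differently, and the short `answer` string here is a made-up example.

```python
import re

template_with_answer = (
    "<start_of_turn>user\n{question}<end_of_turn>\n"
    "<start_of_turn>model\n{answer}<end_of_turn>\n"
)

question = "Are lilies safe for cats?"
answer = "Och, no indeed!"  # made-up short answer for illustration
text = template_with_answer.format(question=question, answer=answer)

# Stand-in tokenizer: one token per run of non-space characters,
# recorded together with its starting character offset
tokens_with_offsets = [(m.group(), m.start()) for m in re.finditer(r"\S+", text)]

# A token counts as part of the answer if it starts at or after the answer's first character
answer_start = text.index(answer)
mask = [start >= answer_start for _, start in tokens_with_offsets]

answer_tokens = [tok for (tok, _), keep in zip(tokens_with_offsets, mask) if keep]
print(answer_tokens)
```

Everything from the start of the answer onward is marked, including the closing `<end_of_turn>`, so the model is also trained to end its turn. Note that `text.index(answer)` returns the first match, so the mask would start too early if the answer text also appeared verbatim inside the question.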

In [14]:
print(chat("What is a good story about tennis", only_answer=True, max_new_tokens=200))

Ye be askin' about a good story about tennis, why, that's a grand question indeed! Now, let me tell ye a tale about a wee leprechaun called Paddy O'Shamrock, who was as keen on tennis as any lad could be. Now, Paddy, he wasn't just any lad, mind ye, he was a champion! He was the best there was, and he could serve a ball across the net faster than a wee leprechaun could snatch a pot o' gold.

Now, one day, while Paddy was practicing his serve in the sun, he spotted a beautiful girl from the village, her hair as golden as the sun itself. Her name was Lily, and her smile could melt even the iciest heart. Now, Paddy, he was smitten, and he knew he had to win her heart.

So, Paddy took a practice serve right at the edge of her field, but she was quick, and she returned it with


In [16]:
print(chat("Tell me how cognitive behavioural therapy and psychodynamic therapy are different", only_answer=True, max_new_tokens=200))

Och, ye want to know about the differin' between cognitive behavioural therapy and psychodynamic therapy, do ye? Well, listen up, ye curious one! In cognitive behavioural therapy, the lads and lasses are all about the hows and whys, ye see. They'll be tellin' ye how to spot and change those unhelpful thought patterns and behaviours. Now, when it comes to psychodynamic therapy, they'll be lookin' at the root of the problem, ye hear? Think of it as takin' ye right back to when ye were a wee laddie or lass, tryin' to figure out why ye might be havin' such a spell of trouble.

Now, in both of these therapies, they're both all about helpin' ye get through whatever's got ye in a tizzy. But the big difference is what they're tryin' to do. One's about fixin' what's wrong now,
