# Reverse Knowledge Distillation
In questo notebook viene riportato un esperimento riguardo alla distillazione inversa, ossia trasferire la conoscenza degli adattatori lora di un modello piccolo, ad adattatori lora per un modello grande.

In [1]:
!pip install -U -q bitsandbytes

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m67.0/67.0 MB[0m [31m25.8 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m363.4/363.4 MB[0m [31m4.6 MB/s[0m eta [36m0:00:00[0m0:00:01[0m00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m664.8/664.8 MB[0m [31m2.5 MB/s[0m eta [36m0:00:00[0m0:00:01[0m00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m211.5/211.5 MB[0m [31m8.2 MB/s[0m eta [36m0:00:00[0m0:00:01[0m00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m56.3/56.3 MB[0m [31m31.3 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m127.9/127.9 MB[0m [31m10.5 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m207.5/207.5 MB[0m [31m8.3 MB/s[0m eta [36m0:00:00[0m0:00:01[0m00:01[0m
[2K   [90m━━━━━━━━━━━━

In [None]:
############### Librerie ###############
import random
import numpy as np
import torch
import torch.nn.functional as F
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel, LoraConfig, get_peft_model
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import LinearSegmentedColormap
import matplotlib.gridspec as gridspec
from matplotlib.ticker import MaxNLocator
import time

In [None]:
def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True 
    torch.backends.cudnn.benchmark = False

set_seed(42)

# Loading big model
In questa sezione viene importato il modello grande alla quale deve essere trasferita la conoscenza.

In [None]:
model_id = "google/gemma-2-9b-it"

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

print("Caricamento del modello e del tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(model_id)
model_9b = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    token=token,
    quantization_config=quantization_config
)

2025-06-02 17:59:36.630683: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1748887176.865972      35 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1748887176.928537      35 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


Caricamento del modello e del tokenizer...


tokenizer_config.json:   0%|          | 0.00/47.0k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/857 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/39.1k [00:00<?, ?B/s]

Fetching 4 files:   0%|          | 0/4 [00:00<?, ?it/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/3.67G [00:00<?, ?B/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/4.90G [00:00<?, ?B/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.96G [00:00<?, ?B/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/173 [00:00<?, ?B/s]

In [5]:
input_text = """If $\mathbf{a} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},$ then find the vector $\mathbf{v}$ such that $\mathbf{a} \cdot \mathbf{v} = 2$ and $\mathbf{a} \times \mathbf{v} = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}.$"""
chat_input = tokenizer.apply_chat_template(
            [{"role": "user", "content": input_text}], 
            tokenize=False,
            add_generation_prompt=True
)
if chat_input.startswith("<bos>"):
        chat_input = chat_input[len("<bos>"):]
chat_input

'<start_of_turn>user\nIf $\\mathbf{a} = \x08egin{pmatrix} 1 \\ 1 \\ 1 \\end{pmatrix},$ then find the vector $\\mathbf{v}$ such that $\\mathbf{a} \\cdot \\mathbf{v} = 2$ and $\\mathbf{a} \times \\mathbf{v} = \x08egin{pmatrix} 1 \\ -2 \\ 1 \\end{pmatrix}.$<end_of_turn>\n<start_of_turn>model\n'

In [7]:
inputs = tokenizer(chat_input, return_tensors="pt").to(model_9b.device)
    
with torch.no_grad():
        
    outputs = model_9b.generate(
            **inputs,
            max_new_tokens=1024
    )
    
full_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
input_length = len(tokenizer.decode(inputs.input_ids[0], skip_special_tokens=True))
generated_text = full_output[input_length:].strip()
print(generated_text)

Here's how to solve the problem:

**Understanding the Problem**

* **Dot Product:**  The dot product of two vectors gives us a scalar value.  The equation $\mathbf{a} \cdot \mathbf{v} = 2$ tells us the magnitude of the projection of $\mathbf{v}$ onto $\mathbf{a}$ is 2.
* **Cross Product:** The cross product of two vectors gives us a new vector that is perpendicular to both original vectors. The equation $\mathbf{a} \times \mathbf{v} = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}$ tells us the direction and magnitude of the vector resulting from the cross product.

**Solving for v**

1. **Expressing v:** Let $\mathbf{v} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$.

2. **Dot Product:**
   *  $\mathbf{a} \cdot \mathbf{v} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix} = x + y + z = 2$

3. **Cross Product:**
   * $\mathbf{a} \times \mathbf{v} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \times \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmat

A questo punto definiamo le varie matrici LoRA del modello grande, matrici che dovranno essere opportunamente costruite.

In [None]:
lora_config = LoraConfig(
    r=8,                # rango di decomposizione 
    lora_alpha=16,      # fattore di scala per la matrice LoRA
    lora_dropout=0,     # probabilità di dropout per la LoRA
    bias="none",  
    target_modules             = ["q_proj", "k_proj", "v_proj", "o_proj",
                                  "gate_proj", "up_proj", "down_proj"],
)

peft_model = get_peft_model(model_9b, lora_config)
peft_model

PeftModel(
  (base_model): LoraModel(
    (model): Gemma2ForCausalLM(
      (model): Gemma2Model(
        (embed_tokens): Embedding(256000, 3584, padding_idx=0)
        (layers): ModuleList(
          (0-41): 42 x Gemma2DecoderLayer(
            (self_attn): Gemma2Attention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=3584, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3584, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Linear4bit(


# Load small model
Caricamento del modello piccolo, modello sottoposto precedentemente ad un SFT in un task predefinito, in questo caso è MATH.

In [None]:
model_id = "google/gemma-2-2b-it"

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

print("Caricamento del modello e del tokenizer...")
tokenizer_2b = AutoTokenizer.from_pretrained(model_id)
model_2b = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    token=token,
    quantization_config=quantization_config
)

Caricamento del modello e del tokenizer...


tokenizer_config.json:   0%|          | 0.00/47.0k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/838 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/24.2k [00:00<?, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.99G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/241M [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/187 [00:00<?, ?B/s]

## Load Adapter
Caricamento dell'adattatore LoRA precedentemente fine-tunato.

In [10]:
model_2b.load_adapter("stefra/GEMMA2BITMATHR8A16", adapter_name="default")
#model_2b

adapter_config.json:   0%|          | 0.00/800 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/41.6M [00:00<?, ?B/s]

In [11]:
model_2b.active_adapters()

['default']

In [12]:
model_2b.set_adapter("default")

Prova per verificare che l'adattatore sia effettivamente attivo.

In [13]:
inputs = tokenizer_2b(chat_input, return_tensors="pt").to(model_2b.device)
    
with torch.no_grad():
    outputs = model_2b.generate(
            **inputs,
            max_new_tokens=1024
    )
full_output = tokenizer_2b.decode(outputs[0], skip_special_tokens=True)
    
input_length = len(tokenizer_2b.decode(inputs.input_ids[0], skip_special_tokens=True))
generated_text = full_output[input_length:].strip()
print(generated_text)

We have that
\[\mathbf{a} \cdot \mathbf{v} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix} = x + y + z.\]Also,
\[\mathbf{a} \cdot \mathbf{v} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ -y \\ z \end{pmatrix} = x - y + z.\]Thus, $x + y + z = 2$ and $x - y + z = 2.$  Adding these equations, we get $2x = 4,$ so $x = 2.$  Then $y = -2,$ and $z = 1.$  Therefore, $\mathbf{v} = \boxed{\begin{pmatrix} 2 \\ -2 \\ 1 \end{pmatrix}}.$


# Get the weights
Qui ci andiamo a definire due dizionari nella quale ci andiamo a recuperare i pesi delle varie matrici, sia modello piccolo che modello grande.

In [None]:
# Dizionario di pesi LoRA dal modello 2B
lora_weights_2b = {}
for name, param in model_2b.model.named_parameters():
    if "lora" in name:
        lora_weights_2b[name] = param.data

# Dizionario di pesi LoRA dal modello 9B
lora_weights_model_9 = {}
for name, param in peft_model.named_parameters():
    if "lora" in name:
        lora_weights_model_9[name] = param.data

In [None]:
def xavier_initialization(tensor):
    """Inizializza i pesi con la distribuzione Xavier"""
    return nn.init.xavier_uniform_(tensor)

def smart_lora_expansion(weight_2b, target_shape):
    """
    Espande secondo quanto riportato nella relazione, i pesi LoRA preservando pattern semantici.
    """
    current_shape = weight_2b.shape # dimensione matrice di partenza
    scale_factor = 0.5 #scaling factor per preservare i pattern semantici

    # Se le forme sono già uguali, restituisci il peso originale senza andare ad espandere
    if current_shape == target_shape:
        return weight_2b.clone()
    
    # Altrimenti creiamo un tensore espanso con la forma target, inizializzato a zero
    expanded = torch.zeros(target_shape, dtype=weight_2b.dtype, device=weight_2b.device)
    
    if len(current_shape) == 2:  # Matrice 2D (caso più comune per LoRA)
        old_rows, old_cols = current_shape
        new_rows, new_cols = target_shape
        
        # Copia i pesi originali nella porzione corrispondente
        copy_rows = min(old_rows, new_rows)
        copy_cols = min(old_cols, new_cols)
        expanded[:copy_rows, :copy_cols] = weight_2b[:copy_rows, :copy_cols]
        
        ################ FASE DI ESPANSIONE ################
        # righe
        if new_rows > old_rows:
            for i in range(old_rows, new_rows):
                source_row = i % old_rows  # Cicla attraverso le righe esistenti
                expanded[i, :copy_cols] = weight_2b[source_row, :copy_cols] * scale_factor
        
        # colonne
        if new_cols > old_cols:
            for j in range(old_cols, new_cols):
                source_col = j % old_cols  # Cicla attraverso le colonne esistenti
                # Applica a tutte le righe (sia originali che espanse)
                expanded[:, j] = expanded[:, source_col] * scale_factor # qui è dove avviene questa espansione "intelligente"
                
    elif len(current_shape) == 1:  # Vettore 1D (bias)
        old_size = current_shape[0]
        new_size = target_shape[0]
        
        # Copia i valori originali
        copy_size = min(old_size, new_size)
        expanded[:copy_size] = weight_2b[:copy_size]
        
        # Espansione
        if new_size > old_size:
            for i in range(old_size, new_size):
                source_idx = i % old_size
                expanded[i] = weight_2b[source_idx] * scale_factor
                
    else:
        # Per tensori con più di 2 dimensioni si usa Xavier come fallback
        print(f"Dimensioni non supportate per smart expansion: {current_shape}. Uso Xavier.")
        return xavier_initialization(expanded)
    
    return expanded

def new_lora_weights_smart(lora_weights_2b, lora_weights_9b):
    """
    Funzione di richiamo per l'espansione delle matrici LoRA
    """

    new_weights = {}
    
    for name, weight_9b in lora_weights_9b.items():
        # Rimozione della parte 'base_model.model.model.' dal nome della chiave per adattarlo a quello del modello 2B
        name_2b = name.replace('base_model.model.model.', '')
        
        if name_2b in lora_weights_2b:
            weight_2b = lora_weights_2b[name_2b]
            
            # Verifica compatibilità delle dimensioni
            if weight_9b.dim() != weight_2b.dim():
                print(f"ATTENZIONE: Dimensioni incompatibili per {name_2b}! "
                      f"2B: {weight_2b.dim()}D, 9B: {weight_9b.dim()}D")

                new_weights[name] = xavier_initialization(torch.zeros_like(weight_9b))
                continue
            
            ############## RICHIAMO FUNZIONE SMART ESPANSIONE ##############
            if weight_9b.size() != weight_2b.size():
                print(f"Espansione smart per {name_2b}: {weight_2b.size()} -> {weight_9b.size()}")
                new_weights[name] = smart_lora_expansion(weight_2b, weight_9b.size())
            else:
                # Le dimensioni sono uguali quindi copia direttamente
                new_weights[name] = weight_2b.clone()
                
        else:
            # Se non ci sono pesi corrispondenti in 2B, inizializza con Xavier
            print(f"Layer {name} non trovato nel modello 2B. Inizializzazione Xavier.")
            new_weights[name] = xavier_initialization(torch.zeros_like(weight_9b))
    
    return new_weights

new_lora_weights_big = new_lora_weights_smart(lora_weights_2b, lora_weights_model_9)

# Per verificare che tutto sia andato bene, stampiamo i nomi dei pesi interpolati
for name, weight in new_lora_weights_big.items():
    print(f'{name}: {weight.size()}')

Espansione smart per layers.0.self_attn.q_proj.lora_A.default.weight: torch.Size([8, 2304]) -> torch.Size([8, 3584])
Espansione smart per layers.0.self_attn.q_proj.lora_B.default.weight: torch.Size([2048, 8]) -> torch.Size([4096, 8])
Espansione smart per layers.0.self_attn.k_proj.lora_A.default.weight: torch.Size([8, 2304]) -> torch.Size([8, 3584])
Espansione smart per layers.0.self_attn.k_proj.lora_B.default.weight: torch.Size([1024, 8]) -> torch.Size([2048, 8])
Espansione smart per layers.0.self_attn.v_proj.lora_A.default.weight: torch.Size([8, 2304]) -> torch.Size([8, 3584])
Espansione smart per layers.0.self_attn.v_proj.lora_B.default.weight: torch.Size([1024, 8]) -> torch.Size([2048, 8])
Espansione smart per layers.0.self_attn.o_proj.lora_A.default.weight: torch.Size([8, 2048]) -> torch.Size([8, 4096])
Espansione smart per layers.0.self_attn.o_proj.lora_B.default.weight: torch.Size([2304, 8]) -> torch.Size([3584, 8])
Espansione smart per layers.0.mlp.gate_proj.lora_A.default.weigh

In [None]:
len(new_lora_weights_big)

588

In [22]:
len(lora_weights_model_9)

588

In [None]:
torch.all(new_lora_weights_big['base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight']==0)

tensor(False, device='cuda:1')

In [24]:
torch.all(lora_weights_2b['layers.0.self_attn.q_proj.lora_B.default.weight']==0)

tensor(False, device='cuda:1')

Verifica se ci sia una matrice inizializzata tutta pari a 0.

In [None]:
i = 0 
for name, weight in new_lora_weights_big.items():
    if torch.all(weight== 0):
        i = i+1
i

0

## Update weights
Fase di aggiornamento dei pesi della matrice LoRA

In [None]:
peft_model.base_model.model.model.layers[0].self_attn.q_proj.lora_A.default.weight #matrice di partenza

Parameter containing:
tensor([[ 0.0039, -0.0129,  0.0111,  ..., -0.0022, -0.0006,  0.0105],
        [ 0.0050,  0.0070,  0.0046,  ..., -0.0153, -0.0092,  0.0138],
        [-0.0098, -0.0110,  0.0113,  ...,  0.0046,  0.0050, -0.0089],
        ...,
        [ 0.0089, -0.0066, -0.0130,  ...,  0.0029, -0.0152, -0.0081],
        [-0.0013,  0.0052, -0.0122,  ..., -0.0082,  0.0099, -0.0146],
        [ 0.0022,  0.0010,  0.0141,  ...,  0.0130,  0.0156,  0.0160]],
       device='cuda:0', requires_grad=True)

In [None]:
#AGGIORNAMENTO PESI LORA
base_model_state_dict = peft_model.state_dict()

for name, interpolated_weight in new_lora_weights_big.items():
    if name in base_model_state_dict:
        base_model_state_dict[name].data.copy_(interpolated_weight)
    else:
        print(f"Peso {name} non trovato nel modello")

In [None]:
peft_model.base_model.model.model.layers[0].self_attn.q_proj.lora_A.default.weight #matrice di partenza aggiornata

Parameter containing:
tensor([[-0.0044,  0.0036,  0.0053,  ...,  0.0166,  0.0129,  0.0003],
        [-0.0168,  0.0066,  0.0210,  ...,  0.0094,  0.0250,  0.0120],
        [-0.0114,  0.0111,  0.0274,  ..., -0.0061,  0.0158, -0.0054],
        ...,
        [-0.0011, -0.0093,  0.0021,  ..., -0.0067,  0.0018, -0.0126],
        [ 0.0122,  0.0073, -0.0139,  ...,  0.0197, -0.0093, -0.0103],
        [ 0.0183, -0.0182, -0.0130,  ...,  0.0133, -0.0081,  0.0127]],
       device='cuda:0', requires_grad=True)

# Testing

In [None]:
inputs = tokenizer(chat_input, return_tensors="pt").to(model_9b.device)
    
with torch.no_grad():
    outputs = model_9b.generate(
            **inputs,
            max_new_tokens=1024,
    )
full_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
input_length = len(tokenizer.decode(inputs.input_ids[0], skip_special_tokens=True))
generated_text = full_output[input_length:].strip()
print(generated_text)

Here's how to solve the problem:

**Understanding the Problem**

* **Dot Product:** The dot product of two vectors gives a scalar (a number).  The equation $\mathbf{a} \cdot \mathbf{v} = 2$ tells us the result of the dot product of  $\mathbf{a}$ and $\mathbf{v}$ is 2.
* **Cross Product:** The cross product of two vectors results in a vector. The equation $\mathbf{a} \times \mathbf{v} = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}$ tells us the cross product of $\mathbf{a}$ and $\mathbf{v}$ is the vector $\begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}$.

**Solution**

1. **Dot Product:**
   Let $\mathbf{v} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$.  The dot product is:
   
   $$\mathbf{a} \cdot \mathbf{v} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix} = x + y + z = 2$$

2. **Cross Product:**
   The cross product of two vectors is calculated as follows:

   $$\mathbf{a} \times \mathbf{v} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \times \begin{p

In [None]:
# Salvataggio locale e su HF
peft_model.save_pretrained("./math_scaling_1")
peft_model.push_to_hub("stefra/math_scaling_1", 
                       commit_message="Upload LoRA adapter")

adapter_model.safetensors:   0%|          | 0.00/108M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/stefra/math_scaling_1/commit/740f7ebdde4208a2a2c2e4ea93eb2541c2520835', commit_message='Upload LoRA adapter', commit_description='', oid='740f7ebdde4208a2a2c2e4ea93eb2541c2520835', pr_url=None, repo_url=RepoUrl('https://huggingface.co/stefra/math_scaling_1', endpoint='https://huggingface.co', repo_type='model', repo_id='stefra/math_scaling_1'), pr_revision=None, pr_num=None)

# Testing v2 w loading from HF

In [32]:
model_9b.load_adapter("stefra/math_scaling_1", adapter_name="default")
#model_2b

adapter_config.json:   0%|          | 0.00/896 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/108M [00:00<?, ?B/s]

In [33]:
model_9b.set_adapter("default")

In [None]:
inputs = tokenizer(chat_input, return_tensors="pt").to(model_9b.device)
    
with torch.no_grad():
    outputs = model_9b.generate(
            **inputs,
            max_new_tokens=1024,
    )
full_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
input_length = len(tokenizer.decode(inputs.input_ids[0], skip_special_tokens=True))
generated_text = full_output[input_length:].strip()
print(generated_text)

Here's how to solve the problem:

**Understanding the Problem**

* **Dot Product:** The dot product of two vectors gives a scalar (a number).  The equation $\mathbf{a} \cdot \mathbf{v} = 2$ tells us the result of the dot product of  $\mathbf{a}$ and $\mathbf{v}$ is 2.
* **Cross Product:** The cross product of two vectors results in a vector. The equation $\mathbf{a} \times \mathbf{v} = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}$ tells us the cross product of $\mathbf{a}$ and $\mathbf{v}$ is the vector $\begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}$.

**Solution**

1. **Dot Product:**
   Let $\mathbf{v} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$.  The dot product is:
   
   $$\mathbf{a} \cdot \mathbf{v} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix} = x + y + z = 2$$

2. **Cross Product:**
   The cross product of two vectors is calculated as follows:

   $$\mathbf{a} \times \mathbf{v} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \times \begin{p