
# Basic LoRA Technique

In [1]:
!pip install -q "transformers>=4.40.0" "datasets>=2.18.0" "accelerate>=0.30.0" "peft>=0.11.0"

## Load a tiny dataset and baseline model

In [2]:
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Tiny subset for demo (200 train / 50 test)
ds = load_dataset("imdb")
small_train = ds["train"].shuffle(seed=42).select(range(200))
small_test  = ds["test"].shuffle(seed=42).select(range(50))

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2).to(device)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Quick baseline inference (before LoRA)

In [3]:
def predict_sent(text, model):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=256).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits.softmax(dim=-1).cpu().numpy()[0]
    return {"neg": float(probs[0]), "pos": float(probs[1])}

sample_text = "The movie was surprisingly good and I loved the acting!"
print("Baseline:", predict_sent(sample_text, base_model))


Baseline: {'neg': 0.5112807154655457, 'pos': 0.48871931433677673}


## Wrap the model with LoRA and fine-tune

In [6]:
from peft import LoraConfig, get_peft_model
from torch.utils.data import DataLoader
from transformers import get_linear_schedule_with_warmup
from torch.optim import AdamW

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_lin", "k_lin", "v_lin", "out_lin"],  # DistilBERT attention module names
    lora_dropout=0.05,
    bias="none",
    task_type="SEQ_CLS",
)

model_lora = get_peft_model(base_model, lora_config)
model_lora.print_trainable_parameters()

def encode_batch(batch):
    enc = tokenizer(
        batch["text"],
        truncation=True,
        padding="max_length",
        max_length=256,
    )
    enc["labels"] = batch["label"]
    return enc

train_enc = small_train.map(encode_batch, batched=True)
test_enc  = small_test.map(encode_batch, batched=True)

columns = ["input_ids", "attention_mask", "labels"]
train_enc.set_format(type="torch", columns=columns)
test_enc.set_format(type="torch", columns=columns)

train_loader = DataLoader(train_enc, batch_size=8, shuffle=True)
test_loader  = DataLoader(test_enc, batch_size=8)

optimizer = AdamW(model_lora.parameters(), lr=2e-4)
num_epochs = 1
num_training_steps = num_epochs * len(train_loader)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),
    num_training_steps=num_training_steps,
)

model_lora.train()
for epoch in range(num_epochs):
    for step, batch in enumerate(train_loader):
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model_lora(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
        if step % 20 == 0:
            print(f"Epoch {epoch} Step {step} Loss {loss.item():.4f}")


trainable params: 887,042 || all params: 67,842,052 || trainable%: 1.3075




Epoch 0 Step 0 Loss 0.6478
Epoch 0 Step 20 Loss 0.6956
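
The `trainable params` figure printed above can be reproduced by hand. The sketch below is plain arithmetic, assuming the usual `distilbert-base-uncased` layout (6 transformer blocks, hidden size 768) and assuming that PEFT's `SEQ_CLS` task type also leaves the classification head (`pre_classifier` and `classifier`) trainable. Neither assumption is stated in the notebook, but together they reproduce the printed count.

```python
# Assumed architecture constants for distilbert-base-uncased (not stated in the notebook).
hidden = 768
n_layers = 6
r = 8                    # LoRA rank from lora_config
targets_per_layer = 4    # q_lin, k_lin, v_lin, out_lin, each 768 -> 768

# Each adapted Linear gets A with shape (r, in) and B with shape (out, r), no bias.
lora_per_module = r * hidden + hidden * r
lora_total = lora_per_module * n_layers * targets_per_layer

# Classification head, assumed to stay trainable under task_type="SEQ_CLS".
pre_classifier = hidden * hidden + hidden   # weight + bias
classifier = hidden * 2 + 2                 # weight + bias for 2 labels
head_total = pre_classifier + classifier

trainable = lora_total + head_total
all_params = 67_842_052                     # taken from the printed output
trainable_pct = 100 * trainable / all_params

print(f"LoRA adapters: {lora_total:,}")
print(f"Classifier head: {head_total:,}")
print(f"trainable params: {trainable:,} ({trainable_pct:.4f}%)")
```

Under these assumptions only about a third of the trainable parameters are LoRA matrices; the rest belong to the newly initialized classification head.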


## Check improvement (after LoRA)

In [7]:
model_lora.eval()

def accuracy(dataloader, model):
    correct = 0
    total = 0
    for batch in dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        with torch.no_grad():
            logits = model(**batch).logits
        preds = logits.argmax(dim=-1)
        correct += (preds == batch["labels"]).sum().item()
        total += preds.size(0)
    return correct / total

print("Test accuracy with LoRA:", accuracy(test_loader, model_lora))

print("Before vs After on a sample:")
print("LoRA output:", predict_sent(sample_text, model_lora))


Test accuracy with LoRA: 0.46
Before vs After on a sample:
LoRA output: {'neg': 0.4301743507385254, 'pos': 0.5698257088661194}
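
With only 50 test reviews, an accuracy of 0.46 is hard to tell apart from coin flipping. One rough way to see this is the binomial standard error of an accuracy estimate. The sketch below assumes independent test examples and a 50/50 chance baseline; the shuffled IMDB subset may not be exactly balanced, so treat the numbers as indicative only.

```python
import math

n_test = 50          # size of small_test
observed_acc = 0.46  # accuracy printed above
chance = 0.5         # assumed baseline for a balanced binary task

# Standard error of an accuracy estimate if the model were guessing at chance.
std_err = math.sqrt(chance * (1 - chance) / n_test)

# How many standard errors the observed accuracy sits from chance.
z_score = (observed_acc - chance) / std_err

print(f"standard error at chance: {std_err:.4f}")
print(f"z-score of observed accuracy: {z_score:.2f}")
```

An absolute z-score below 1 means the result lies within one standard error of chance, so a single epoch on 200 examples has not yet produced a measurable improvement. A larger test set and more training steps would be needed to draw a conclusion.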


# Mimic Text-to-LoRA paper

In [8]:
!pip -q install transformers peft torch


## Install & load base model

In [9]:
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

base_model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
base_model.eval()


GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)

## Attach LoRA (untrained)

In [10]:
lora_cfg = LoraConfig(
    r=4,
    lora_alpha=8,
    target_modules=["c_attn", "c_proj"],
    task_type="CAUSAL_LM",
)

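# Note: get_peft_model() injects the LoRA layers into base_model in place,
# so base_model and lora_model share the same underlying modules from here on.
lora_model = get_peft_model(base_model, lora_cfg).to(device)
lora_model.eval()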




PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): GPT2LMHeadModel(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): lora.Linear(
                (base_layer): Conv1D(nf=2304, nx=768)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=4, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=4, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitud
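
A freshly attached adapter should not change the model's behaviour: by default PEFT initializes `lora_B` to zeros, so the low-rank update is zero until the adapter is trained or its weights are overwritten. The toy sketch below illustrates this with plain Python lists. The tiny matrices and the `matvec` and `lora_forward` helpers are illustrative only and are not part of PEFT.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha, r):
    """Compute W x + (alpha / r) * B (A x), the LoRA-adapted linear map."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

W = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0]]        # frozen weight: out=2, in=3
A = [[0.5, -0.5, 1.0]]        # lora_A: r=1, in=3 (randomly initialized in practice)
B_zero = [[0.0], [0.0]]       # lora_B at initialization: all zeros
B_set = [[1.0], [2.0]]        # lora_B after training or injection

x = [1.0, 2.0, 3.0]
y_base = matvec(W, x)
y_untrained = lora_forward(x, W, A, B_zero, alpha=2, r=1)
y_adapted = lora_forward(x, W, A, B_set, alpha=2, r=1)

print("base:     ", y_base)
print("untrained:", y_untrained)
print("adapted:  ", y_adapted)
```

The untrained output equals the base output, which is why the adapter only starts to matter once both `lora_A` and `lora_B` hold non-zero values.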

## Encode text (simple version)

In [12]:
def encode_text(text):
    tokens = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        emb = base_model.transformer.wte(tokens["input_ids"])
    return emb.mean(dim=1)  # [1, hidden_dim]


## Define a hypernetwork, then generate and inject LoRA weights

In [14]:
import torch
import torch.nn as nn

hidden = base_model.config.n_embd
r = 4

# Hypernet outputs "base" LoRA in hidden space
class LoRAHyperNet(nn.Module):
    def __init__(self, text_dim, hidden_dim, r):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 2 * hidden_dim * r)
        )
        self.hidden_dim = hidden_dim
        self.r = r

    def forward(self, text_emb):
        out = self.net(text_emb)                    # [1, 2*hidden*r]
        out = out.squeeze(0)
        A_flat, B_flat = torch.split(out, self.hidden_dim * self.r)
        A_base = A_flat.view(self.r, self.hidden_dim)       # (r, hidden)
        B_base = B_flat.view(self.hidden_dim, self.r)       # (hidden, r)
        return A_base, B_base

hypernet = LoRAHyperNet(hidden, hidden, r).to(device)

# Cache fixed random projections so shapes are stable
proj_in_cache = {}
proj_out_cache = {}

def get_proj_in(in_features):
    if in_features not in proj_in_cache:
        g = torch.Generator(device=device)
        g.manual_seed(1234 + in_features)
        # (hidden, in_features)
        proj_in_cache[in_features] = torch.randn(hidden, in_features, generator=g, device=device) / (hidden ** 0.5)
    return proj_in_cache[in_features]

def get_proj_out(out_features):
    if out_features not in proj_out_cache:
        g = torch.Generator(device=device)
        g.manual_seed(5678 + out_features)
        # (out_features, hidden)
        proj_out_cache[out_features] = torch.randn(out_features, hidden, generator=g, device=device) / (hidden ** 0.5)
    return proj_out_cache[out_features]

@torch.no_grad()
def encode_text_mean(text: str):
    tokens = tokenizer(text, return_tensors="pt").to(device)
    # token embeddings: [1, seq, hidden]
    emb = base_model.transformer.wte(tokens["input_ids"])
    return emb.mean(dim=1)  # [1, hidden]

@torch.no_grad()
def apply_hyper_lora(model, description: str, scale=0.05):
    text_emb = encode_text_mean(description)
    A_base, B_base = hypernet(text_emb)

    # Optional: keep LoRA magnitudes small
    A_base = scale * A_base
    B_base = scale * B_base

    for module in model.modules():
        if hasattr(module, "lora_A") and hasattr(module, "lora_B"):
            for k in module.lora_A.keys():
                A_w = module.lora_A[k].weight   # (r, in_features)
                B_w = module.lora_B[k].weight   # (out_features, r)

                in_features = A_w.shape[1]
                out_features = B_w.shape[0]

                R_in = get_proj_in(in_features)       # (hidden, in_features)
                R_out = get_proj_out(out_features)    # (out_features, hidden)

                A = A_base @ R_in                     # (r, in_features)
                B = R_out @ B_base                    # (out_features, r)

                A_w.copy_(A)
                B_w.copy_(B)
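
The fixed random projections exist because the adapted GPT-2 modules do not share one shape, while the hypernetwork emits a single `(r, hidden)` and `(hidden, r)` pair. The sketch below traces the shapes with plain tuples and works out how small the injected update is. The per-module sizes are read from the `Conv1D(nf=..., nx=...)` entries in the model printout; that `"c_proj"` matches both the attention and the MLP projection is an assumption about PEFT's name matching, and `matmul_shape` is a hypothetical helper.

```python
hidden, r = 768, 4
lora_alpha = 8
scale = 0.05  # default used by apply_hyper_lora

def matmul_shape(a, b):
    """Return the shape of a @ b for 2-D shapes, or raise if they do not align."""
    if a[1] != b[0]:
        raise ValueError(f"cannot multiply {a} by {b}")
    return (a[0], b[1])

# (in_features, out_features) per adapted module, assumed from the printout.
modules = {
    "attn.c_attn": (768, 2304),
    "attn.c_proj": (768, 768),
    "mlp.c_proj": (3072, 768),
}

A_base = (r, hidden)   # hypernetwork output, first half
B_base = (hidden, r)   # hypernetwork output, second half

shapes = {}
for name, (in_f, out_f) in modules.items():
    A = matmul_shape(A_base, (hidden, in_f))    # A_base @ R_in
    B = matmul_shape((out_f, hidden), B_base)   # R_out @ B_base
    shapes[name] = (A, B)
    print(f"{name}: lora_A {A}, lora_B {B}")

# Both factors are multiplied by `scale`, and PEFT applies lora_alpha / r on top,
# so the weight update is scaled by this factor relative to the raw product.
effective_scale = (lora_alpha / r) * scale * scale
hypernet_outputs = 2 * hidden * r
print(f"effective scale on B @ A: {effective_scale:.4f}")
print(f"hypernetwork output size: {hypernet_outputs}")
```

Because the hypernetwork here is untrained and the effective scale is small, the injected update is a tiny random perturbation, so it is unsurprising if greedy decoding produces the same text with and without it.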


## Compare Outputs of Base Model and LoRA Model

In [18]:
apply_hyper_lora(lora_model, "Answer formally like a science professor.")

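def generate(model, prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    # Pass pad_token_id explicitly so generate() does not warn about falling back to EOS.
    out = model.generate(**inputs, max_new_tokens=60, pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0], skip_special_tokens=True)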

prompt = "Why is the sky blue?"

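print("Base model:")
# get_peft_model() modified base_model in place, so it now contains the LoRA
# layers as well; disable the adapter to get a true baseline.
with lora_model.disable_adapter():
    print(generate(lora_model, prompt))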

print("\nBase + Hypernetwork-generated LoRA:")
print(generate(lora_model, prompt))


Base model:
Why is the sky blue?

The sky blue is a color that is often associated with the sky. It is a color that is often associated with the sky. It is a color that is often associated with the sky. It is a color that is often associated with the sky. It is a color that is often associated

Base + Hypernetwork-generated LoRA:
Why is the sky blue?

The sky blue is a color that is often associated with the sky. It is a color that is often associated with the sky. It is a color that is often associated with the sky. It is a color that is often associated with the sky. It is a color that is often associated
