# LoRA Practice
* **Base model**: `bigscience/bloom-7b1`
* **LoRA fine-tuning for BLOOM**:
  * Implemented in a **plug-and-play (plugin/adapter) style**, meaning the LoRA adapters can be added or removed without modifying the base model structure.
  * Freeze the original weights of the base model.
  * Plug in LoRA adapters with the `peft` library.
* **Using the Hugging Face `transformers` library**:
  * Understanding the parameters and workflow of `trainer.train()` is essential.
  * **MLM vs. CLM (training objectives)**:
    * Both are **self-supervised** (often loosely called unsupervised): the `input/labels` pairs are constructed automatically from raw text, with no manual annotation.
    * **MLM (Masked Language Modeling)**: used by models like **BERT**; some tokens are masked and the model predicts them.
    * **CLM (Causal Language Modeling)**: used by models like **GPT/BLOOM**; the model predicts the next token given the previous context.
* **Training & Inference Pipeline**:
  1. **Dataset & Tasks** – prepare datasets aligned with the desired objective.
  2. **Tokenizer** – preprocess input text into tokens consistent with the base model’s vocabulary.
  3. **Training** – fine-tune the base model with LoRA adapters using PEFT, while keeping frozen parameters intact.
  4. **Inference** – load the fine-tuned LoRA model for downstream tasks.
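
To make the MLM vs. CLM distinction above concrete, here is a minimal pure-Python sketch of how the `input/labels` pairs could be built from one token sequence. The token ids, the mask id `0`, and the function names are hypothetical. Real MLM collators choose positions at random and apply extra replacement rules, so treat this as a simplified illustration.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def clm_example(input_ids):
    """Causal LM: labels are a copy of the inputs.

    The one-position shift (token t predicts token t+1) is applied inside
    the model's loss computation, not in the data pipeline.
    """
    return {"input_ids": list(input_ids), "labels": list(input_ids)}


def mlm_example(input_ids, masked_positions, mask_token_id):
    """Masked LM: hide the chosen positions and score only those positions."""
    masked = set(masked_positions)
    inputs = [mask_token_id if i in masked else tok for i, tok in enumerate(input_ids)]
    labels = [tok if i in masked else IGNORE_INDEX for i, tok in enumerate(input_ids)]
    return {"input_ids": inputs, "labels": labels}


ids = [11, 22, 33, 44, 55]  # hypothetical token ids
print(clm_example(ids))
print(mlm_example(ids, masked_positions=[1, 3], mask_token_id=0))
```

In the CLM case every position contributes to the loss, whereas in the MLM case only the masked positions do.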

The whole training and inference process:
1. Load base model
    1. Freeze weights
2. Configure LoRA adapter
    1. Define the LoRA config
    2. Inject into the model (`get_peft_model`)
3. Use Trainer for training
    1. Set (model, train_dataset, args)
    2. Train
4. Save and load LoRA weights
5. Inference

## 1. base model & LoRA adapters

In [None]:
import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM # from huggingface

### 1.1 load base model

In [None]:
model_name = "bigscience/bloom-7b1"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained(model_name)

### 1.2 freeze original weights

In [None]:
for i, param in enumerate(model.parameters()):
    param.requires_grad = False # freeze the model, train adapters later
    if param.ndim == 1:
        # cast the small parameters(e.g. layernorm) to fp32 for stability
        param.data = param.data.to(torch.float32)

# gradient checkpointing stores fewer activations and recomputes them in the backward pass,
# trading extra compute time for lower GPU memory use
model.gradient_checkpointing_enable()
model.enable_input_require_grads()

class CastOutputToFloat(nn.Sequential):
    def forward(self, x):
        return super().forward(x).to(torch.float32)

model.lm_head = CastOutputToFloat(model.lm_head)

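### 1.3 LoRA adapters
LoRA adapters learn a low-rank update ΔW = B·A that is added to the frozen weight W, where A and B share the small inner dimension r.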

In [None]:
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16, # rank r: the inner dimension shared by A and B
    lora_alpha=32, # scaling factor: the LoRA update is multiplied by (alpha / r)
    lora_dropout=0.05, # dropout applied to the input of the LoRA branch
    bias="none", # do not train any bias parameters
    task_type="CAUSAL_LM" # task type: "CAUSAL_LM" for CLM ("SEQ_2_SEQ_LM" for seq2seq models)
)
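
As a rough illustration of why a small rank `r` keeps the adapter cheap, the sketch below counts the parameters added by one adapter and compares them with the full weight matrix it modifies. The shapes (hidden size 4096, a fused query/key/value projection of width 3 × 4096, 30 transformer blocks, one adapter per block) are assumptions about the base model and about which layers receive adapters. Check them against the loaded model before relying on the numbers.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A holds r * d_in, B holds d_out * r."""
    return r * (d_in + d_out)


# Assumed shapes for one fused query/key/value projection (hidden size 4096).
d_in, d_out, r = 4096, 3 * 4096, 16
n_layers = 30  # assumed number of transformer blocks, one adapter each

per_layer = lora_param_count(d_in, d_out, r)
full_matrix = d_in * d_out

print(per_layer)                # adapter parameters per layer
print(per_layer * n_layers)     # adapter parameters across all layers
print(per_layer / full_matrix)  # adapter size as a fraction of the full matrix
```

Doubling `r` doubles the adapter size, while the frozen matrix stays untouched.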

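```python
# Conceptual sketch of the LoRA forward pass (row-vector convention:
# W is (d_in, d_out), W_A is (d_in, r), W_B is (r, d_out)).
def lora_forward_matmul(x, W, W_A, W_B, alpha, r):
    h = x @ W                            # frozen base projection
    h += (x @ W_A @ W_B) * (alpha / r)   # low-rank update, scaled by alpha / r
    return h
```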

In [None]:
model = get_peft_model(model, config)
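
After injecting the adapters, it can be useful to confirm that only the LoRA weights are trainable. The helper below is a small sketch that works on any object exposing `parameters()` in the PyTorch style. The names `count_parameters` and `describe` are hypothetical, not part of `peft`.

```python
def count_parameters(model):
    """Return (trainable, total) parameter counts for a PyTorch-style model."""
    trainable, total = 0, 0
    for p in model.parameters():
        n = p.numel()
        total += n
        if p.requires_grad:
            trainable += n
    return trainable, total


def describe(model):
    """Format the counts as a one-line summary."""
    trainable, total = count_parameters(model)
    return f"trainable: {trainable} | total: {total} | {100 * trainable / total:.2f}%"
```

Calling `print(describe(model))` on the wrapped model should report a small trainable fraction. `peft` models also provide `model.print_trainable_parameters()` for the same purpose.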

## 2. pipeline
### 2.1 data
- Dataset: `Abirate/english_quotes`

In [None]:
import transformers
from datasets import load_dataset

dataset = load_dataset("Abirate/english_quotes")

In [None]:
dataset

In [None]:
dataset['train'].to_pandas()

In [None]:
def merge(row):
    row['prediction'] = row['quote'] + ' ->: ' + str(row['tags'])
    return row
dataset['train'] = dataset['train'].map(merge)
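
To see what the `merge` step produces, the sketch below applies the same string formatting to a single made-up row. The row contents are hypothetical and are not taken from the dataset. The function is named `format_example` here to avoid shadowing `merge`.

```python
def format_example(row):
    """Build the training text: quote, a ' ->: ' separator, then the tag list."""
    row = dict(row)  # copy so the input row is left unchanged
    row["prediction"] = row["quote"] + " ->: " + str(row["tags"])
    return row


sample = {"quote": "An example quote.", "tags": ["example", "demo"]}  # hypothetical row
print(format_example(sample)["prediction"])
# An example quote. ->: ['example', 'demo']
```

The model is therefore trained to continue a quote followed by ` ->: ` with a Python-style list of tags, which is why the inference prompt ends with the same separator.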

### 2.2 tokenize

In [None]:
dataset = dataset.map(lambda samples: tokenizer(samples['prediction']), batched=True)

### 2.3 train

In [None]:
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

trainer = Trainer(
    model=model,
    train_dataset=dataset['train'],
    args=TrainingArguments( # Training hyperparameters
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=100,
        max_steps=200,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir='outputs'
    ),
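    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False) # dataset rows -> padded batches (input_ids, labels); mlm=False selects the CLM objective
)
model.config.use_cache = False # the KV cache only helps generation and is incompatible with gradient checkpointing
trainer.train()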
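
The hyperparameters above interact in two ways that are easy to work out by hand: the effective batch size, and the learning rate at each step. The sketch below assumes a single GPU and the default linear schedule with warmup used by `TrainingArguments`. The function name `linear_warmup_decay_lr` is hypothetical.

```python
def linear_warmup_decay_lr(step, base_lr=2e-4, warmup_steps=100, max_steps=200):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max_steps - step
    return base_lr * max(0.0, remaining / max(1, max_steps - warmup_steps))


# per_device_train_batch_size * gradient_accumulation_steps (single GPU assumed)
effective_batch = 4 * 4
samples_seen = effective_batch * 200  # examples consumed over max_steps

print(effective_batch, samples_seen)
for step in (0, 50, 100, 150, 200):
    print(step, linear_warmup_decay_lr(step))
```

With `warmup_steps=100` and `max_steps=200`, half of the run is spent warming up, so the peak learning rate of `2e-4` is reached only at step 100.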

### 2.4 inference

In [None]:
batch = tokenizer("“An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains.” ->: ", return_tensors='pt').to(model.device)

with torch.cuda.amp.autocast():
    output_tokens = model.generate(**batch, max_new_tokens=50)

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))
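
The workflow also lists saving and loading the LoRA weights. A minimal sketch of that step follows, assuming `peft`'s `save_pretrained` and `PeftModel.from_pretrained`. The directory name `outputs/lora-adapter` and the wrapper functions are hypothetical. Only the adapter weights are written, so the base model must be loaded again (as in section 1.1) before the adapter is attached.

```python
def save_adapter(peft_model, adapter_dir="outputs/lora-adapter"):
    """Write only the LoRA adapter weights and config to adapter_dir."""
    peft_model.save_pretrained(adapter_dir)
    return adapter_dir


def load_adapter(base_model, adapter_dir="outputs/lora-adapter"):
    """Attach a saved LoRA adapter to a freshly loaded base model."""
    from peft import PeftModel  # imported lazily; requires peft to be installed
    return PeftModel.from_pretrained(base_model, adapter_dir)
```

A typical use would be `save_adapter(model)` after training and, in a later session, `model = load_adapter(base_model)` before calling `generate`.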