# Lightweight Fine Tuning

This project loads a pre-trained model and evaluates its performance, applies parameter-efficient fine-tuning (PEFT) to that model, and then runs inference with the fine-tuned model, comparing its performance to the original.

- **PEFT Technique**:
    - Parameter Efficient Fine Tuning Methods
    - This project will use LoRA: [Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)
- **Model**:
    - GPT-2: [OpenAI's open source Generative Pre-trained Transformer](https://huggingface.co/openai-community/gpt2)
- **Evaluation Approach**:
    - The `evaluate` method with a Hugging Face `Trainer` will be used.
    - The key requirement for the evaluation is that
- **Dataset**:
    - [Wikitext2](https://huggingface.co/datasets/mindchain/wikitext2): The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. This project uses the smaller WikiText-2 configuration (`wikitext-2-raw-v1`), which has roughly 2 million training tokens. The dataset is available under the Creative Commons Attribution-ShareAlike License.
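
For a causal language model, `Trainer.evaluate` returns a dictionary of metrics that includes the mean cross-entropy loss under the key `eval_loss`. One common way to compare the original and fine-tuned models is to convert that loss to perplexity. The sketch below assumes perplexity as the comparison metric (the project description does not name one). The helper name `perplexity_from_metrics` and the loss values are hypothetical, chosen only for illustration.

```python
import math

def perplexity_from_metrics(metrics):
    """Convert the `eval_loss` entry of a Trainer.evaluate() result to perplexity."""
    return math.exp(metrics["eval_loss"])

# Hypothetical loss values for illustration only, not measured results
base_metrics = {"eval_loss": 3.5}
lora_metrics = {"eval_loss": 3.2}

base_ppl = perplexity_from_metrics(base_metrics)
lora_ppl = perplexity_from_metrics(lora_metrics)
print(f"base perplexity: {base_ppl:.2f}")
print(f"LoRA perplexity: {lora_ppl:.2f}")
```

In the notebook, the two dictionaries would come from calling `evaluate()` on a `Trainer` built around each model with the same validation split, so that the two numbers are directly comparable.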

## Training with PEFT

### Importing the modules

In [None]:
from datasets import load_dataset
from peft import AutoPeftModelForCausalLM, LoraConfig, TaskType, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

### Setting Up the Model and Tokenizer

In [None]:
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

### Creating a PEFT Config

In [None]:
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,  # Causal language modeling for GPT-2
    r=8,                           # Rank of update matrices
    lora_alpha=32,                 # Alpha parameter for LoRA scaling
    lora_dropout=0.1,              # Dropout probability for LoRA layers
    # Target the attention and MLP layers in GPT-2
    target_modules=["c_attn", "c_proj", "c_fc"],
    bias="none",
    fan_in_fan_out=True,
    inference_mode=False,
)
lora_model = get_peft_model(model, peft_config)
# Check trainable parameters
lora_model.print_trainable_parameters()
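
The number that `print_trainable_parameters()` reports can be estimated by hand: a LoRA adapter of rank `r` on a layer with `in_features` inputs and `out_features` outputs adds `r * (in_features + out_features)` parameters. The sketch below applies that formula to the modules targeted above. The layer shapes are assumptions based on the GPT-2 (small) architecture (hidden size 768, 12 transformer blocks), and `lora_param_count` is a hypothetical helper, not part of `peft`.

```python
def lora_param_count(in_features, out_features, r):
    """Parameters in one LoRA adapter: A has in_features * r, B has r * out_features."""
    return r * (in_features + out_features)

# Assumed GPT-2 (small) dimensions
HIDDEN = 768
NUM_BLOCKS = 12
R = 8
LORA_ALPHA = 32

# (in_features, out_features) of each targeted module inside one block.
# The target name "c_proj" matches both the attention and the MLP projection.
targeted = {
    "attn.c_attn": (HIDDEN, 3 * HIDDEN),   # fused query/key/value projection
    "attn.c_proj": (HIDDEN, HIDDEN),       # attention output projection
    "mlp.c_fc":    (HIDDEN, 4 * HIDDEN),   # MLP up-projection
    "mlp.c_proj":  (4 * HIDDEN, HIDDEN),   # MLP down-projection
}

per_block = sum(lora_param_count(i, o, R) for i, o in targeted.values())
total = per_block * NUM_BLOCKS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"LoRA parameters per block: {per_block:,}")
print(f"LoRA parameters in total:  {total:,}")
print(f"LoRA scaling factor:       {scaling}")
```

If the assumed shapes hold, the total should agree with the trainable-parameter count printed by the cell above, which is a quick sanity check that `target_modules` matched the intended layers.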

### Training with a PEFT Model

In [None]:
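# Define training arguments
training_args = TrainingArguments(
    output_dir="peft_model_output",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,  # Effective batch size of 4 * 2 = 8 per device
    warmup_steps=100,
    learning_rate=2e-5,
    logging_steps=10,
    evaluation_strategy="steps",    # Newer transformers releases name this `eval_strategy`
    eval_steps=50,
    save_strategy="steps",
    save_steps=200,                 # Kept a multiple of eval_steps for load_best_model_at_end
    load_best_model_at_end=True,
)

# Load the dataset
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

# GPT-2 has no padding token, so reuse the end-of-sequence token for batching
tokenizer.pad_token = tokenizer.eos_token

# Tokenize the dataset, truncating to GPT-2's 1024-token context window
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, max_length=1024)

# Drop the raw text column, then remove the empty lines that WikiText contains
tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=["text"])
tokenized_dataset = tokenized_dataset.filter(lambda example: len(example["input_ids"]) > 0)

# Pad each batch dynamically and copy input_ids into labels (mlm=False selects causal LM)
from transformers import DataCollatorForLanguageModeling
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Initialize trainer
trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=data_collator,
)

# Start training
trainer.train()

# Save the final model (only the LoRA adapter weights and config are written)
lora_model.save_pretrained("gpt2-lora")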
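
Tokenizing WikiText line by line produces sequences of very different lengths, including many short ones. A common alternative for causal language modeling, not used in the training cell above, is to concatenate the tokenized lines and cut them into fixed-size blocks. The following is a minimal sketch of that idea in plain Python. The function name `group_texts`, the `block_size` default, and the toy token IDs are illustrative assumptions.

```python
from itertools import chain

def group_texts(examples, block_size=128):
    """Concatenate tokenized examples and split them into fixed-size blocks."""
    concatenated = {key: list(chain.from_iterable(values)) for key, values in examples.items()}
    total_length = len(next(iter(concatenated.values())))
    # Drop the remainder so that every block has exactly block_size tokens
    total_length = (total_length // block_size) * block_size
    result = {
        key: [tokens[i:i + block_size] for i in range(0, total_length, block_size)]
        for key, tokens in concatenated.items()
    }
    # For causal language modeling the labels are a copy of the inputs;
    # the model shifts them internally when computing the loss
    result["labels"] = [block[:] for block in result["input_ids"]]
    return result

# Toy batch in the shape a tokenizer returns when called on a list of texts
batch = {
    "input_ids": [[1, 2, 3], [], [4, 5, 6, 7, 8]],
    "attention_mask": [[1, 1, 1], [], [1, 1, 1, 1, 1]],
}
blocks = group_texts(batch, block_size=4)
print(blocks["input_ids"])
```

A function of this shape could be passed to `Dataset.map` with `batched=True` on a tokenized dataset whose raw `text` column has already been removed, since the number of output rows differs from the number of input rows.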

## Inference with PEFT

### Loading a Saved PEFT Model

In [None]:
lora_model = AutoPeftModelForCausalLM.from_pretrained("gpt2-lora")
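
The saved directory holds only the low-rank adapter matrices, which are applied on top of the frozen base weights when the model is loaded. The toy sketch below illustrates why this is equivalent to using a single merged weight matrix: the base output plus the scaled low-rank path equals the output of `W + scaling * (A @ B)`. It uses a row-vector convention and tiny hand-picked matrices purely for illustration; the helper functions are hypothetical and this is not how `peft` implements the computation internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_scaled(w, delta, scale):
    """Return w + scale * delta, element-wise."""
    return [[wi + scale * di for wi, di in zip(w_row, d_row)] for w_row, d_row in zip(w, delta)]

# Toy sizes and values, chosen only for illustration
r, lora_alpha = 1, 2
scaling = lora_alpha / r

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight, shape (in=2, out=2)
A = [[1.0], [2.0]]             # LoRA down-projection, shape (in=2, r=1)
B = [[0.5, -1.0]]              # LoRA up-projection, shape (r=1, out=2)
x = [[3.0, 4.0]]               # one input row

# Adapter kept separate: base path plus scaled low-rank path
base_out = matmul(x, W)
lora_out = matmul(matmul(x, A), B)
separate = add_scaled(base_out, lora_out, scaling)

# Adapter merged into the weight: W' = W + scaling * (A @ B)
W_merged = add_scaled(W, matmul(A, B), scaling)
merged = matmul(x, W_merged)

print(separate)
print(merged)
```

In `peft`, the `merge_and_unload()` method on a loaded LoRA model performs this kind of merge and returns a plain `transformers` model, which can be convenient when the adapter no longer needs to be kept separate.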

### Generating Text from a PEFT Model

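In [None]:
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
inputs = tokenizer("Hello, my name is ", return_tensors="pt")

# Reload the base model: get_peft_model injected LoRA layers into `model` in place
base_model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
outputs = base_model.generate(**inputs, max_new_tokens=10, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.batch_decode(outputs))

# Generate from the fine-tuned model using the same tokenized prompt
lora_outputs = lora_model.generate(**inputs, max_new_tokens=10, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.batch_decode(lora_outputs))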