<a href="https://www.kaggle.com/code/shravankumar147/finetune-llms-tinyllm-on-databrick-dolly-15k?scriptVersionId=206353195" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [1]:
# Install required packages
!pip install -U transformers datasets accelerate bitsandbytes peft trl

Collecting transformers
  Downloading transformers-4.46.2-py3-none-any.whl.metadata (44 kB)
Collecting datasets
  Downloading datasets-3.1.0-py3-none-any.whl.metadata (20 kB)
Collecting accelerate
  Downloading accelerate-1.1.1-py3-none-any.whl.metadata (19 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.44.1-py3-none-manylinux_2_24_x86_64.whl.metadata (3.5 kB)
Collecting peft
  Downloading peft-0.13.2-py3-none-any.whl.metadata (13 kB)
Collecting trl
  Downloading trl-0.12.0-py3-none-any.whl.metadata (10 kB)
Downloading transformers-4.46.2-py3-none-any.whl (10.0 MB)
Downloading datasets-3.1.0-py3-none-any.whl (480 kB)

# Fine-Tuning LLMs

In [2]:
import torch
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
    BitsAndBytesConfig
)
from peft import (
    LoraConfig,
    get_peft_model,
    prepare_model_for_kbit_training,
    TaskType
)
import os
from datetime import datetime

In [3]:
# Check GPU availability
print("GPU Available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU Model:", torch.cuda.get_device_name(0))
    print("GPU Memory:", torch.cuda.get_device_properties(0).total_memory / 1e9, "GB")

# Configure quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

# Load model with quantization
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto",
    torch_dtype=torch.float16
)

# Prepare model for k-bit training
model = prepare_model_for_kbit_training(model)

# Configure LoRA
lora_config = LoraConfig(
    r=16,                     # Rank
    lora_alpha=32,           # Alpha scaling
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"], # Target attention modules
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)

# Create PEFT model
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # Print trainable parameters info

GPU Available: True
GPU Model: Tesla P100-PCIE-16GB
GPU Memory: 17.059545088 GB



trainable params: 4,505,600 || all params: 1,104,553,984 || trainable%: 0.4079
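
The trainable-parameter count reported above can be reproduced by hand. The sketch below assumes the TinyLlama-1.1B layer shapes (hidden size 2048, 22 decoder layers, and grouped-query attention whose key/value projections map 2048 to 256); these shapes are not printed anywhere in this notebook, and `lora_param_count` is a hypothetical helper written only for illustration.

```python
# Hypothetical helper: count the parameters LoRA adds to a set of linear layers.
def lora_param_count(r, layer_shapes, num_layers):
    """Each adapted (in, out) layer gains two matrices: A (r x in) and B (out x r)."""
    per_block = sum(r * (fan_in + fan_out) for fan_in, fan_out in layer_shapes)
    return per_block * num_layers

# Assumed TinyLlama-1.1B shapes (not taken from the notebook output)
hidden, kv_dim, layers = 2048, 256, 22
shapes = [
    (hidden, hidden),  # q_proj
    (hidden, kv_dim),  # k_proj
    (hidden, kv_dim),  # v_proj
    (hidden, hidden),  # o_proj
]

trainable = lora_param_count(r=16, layer_shapes=shapes, num_layers=layers)
total = 1_104_553_984  # "all params" as reported by print_trainable_parameters()
print(f"trainable params: {trainable:,} || trainable%: {100 * trainable / total:.4f}")
```

Under these assumptions the arithmetic agrees with the logged 4,505,600 trainable parameters, which suggests that only the four attention projections in each layer carry adapters.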


In [4]:
# Load a small subset of the dataset
dataset = load_dataset("databricks/databricks-dolly-15k", split="train[:500]")

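def preprocess_function(examples):
    """Format each example as an instruction/response prompt and tokenize it"""
    # Note: the optional `context` column of the dataset is not used here
    texts = [
        f"### Instruction:\n{instruction}\n\n### Response:\n{response}"
        for instruction, response in zip(examples['instruction'], examples['response'])
    ]
    
    # Return plain lists: Dataset.map stores the result as Arrow columns,
    # so requesting PyTorch tensors at this stage is unnecessary
    tokenized = tokenizer(
        texts,
        truncation=True,
        max_length=256,
        padding="max_length"
    )
    
    return tokenized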

# Preprocess the dataset
tokenized_dataset = dataset.map(
    preprocess_function,
    batched=True,
    remove_columns=dataset.column_names
)
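
The preprocessing above builds prompts from `instruction` and `response` only, while Dolly records also carry a `context` column that is empty for many examples. As a rough sketch of how the template could be extended, the function below adds a context section only when one is present. The name `format_example` and the `### Context:` header are assumptions made for illustration; they are not part of the notebook, and the model above was not trained with them.

```python
def format_example(instruction, response, context=""):
    """Build one training prompt; the context section is added only when non-empty."""
    parts = [f"### Instruction:\n{instruction}"]
    if context and context.strip():
        # Hypothetical section header, chosen to mirror the existing template
        parts.append(f"### Context:\n{context.strip()}")
    parts.append(f"### Response:\n{response}")
    return "\n\n".join(parts)

# Without context, the result matches the template used in preprocess_function
print(format_example("Name a primary color.", "Red."))

# With context, an extra section sits between instruction and response
print(format_example("Who wrote it?", "Jane Austen.", context="Emma is a novel from 1815."))
```

If a template like this were used for training, the inference prompt would need the same sections in the same order.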


In [5]:
# Setup training arguments
training_args = TrainingArguments(
    output_dir=f"./finetuned_tinyllama_lora_{datetime.now().strftime('%Y%m%d_%H%M')}",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    save_steps=100,
    save_total_limit=2,
    learning_rate=2e-4,  # Slightly higher learning rate for LoRA
    warmup_steps=50,
    logging_dir='./logs',
    logging_steps=10,
    fp16=True,
    optim="adamw_torch_fused",
    remove_unused_columns=True,
    report_to="none"
)

In [6]:
# Initialize data collator
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

In [7]:
# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)

In [8]:
# Clear CUDA cache before training
torch.cuda.empty_cache()

In [9]:
# Start training
trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Step,Training Loss
10,2.2752
20,2.1223
30,1.8626
40,1.9614
50,1.7351
60,1.7837
70,1.8002
80,1.6831
90,1.6734


TrainOutput(global_step=93, training_loss=1.8702943812134445, metrics={'train_runtime': 461.821, 'train_samples_per_second': 3.248, 'train_steps_per_second': 0.201, 'total_flos': 2374746255654912.0, 'train_loss': 1.8702943812134445, 'epoch': 2.976})
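
The `global_step=93` and `epoch=2.976` values follow from the batch settings in the training arguments. The sketch below is a back-of-the-envelope check, assuming a single GPU and assuming that a trailing partial accumulation window is not counted as an optimizer step; `optimizer_steps` is a hypothetical helper, not a `transformers` function.

```python
import math

def optimizer_steps(num_examples, per_device_batch, grad_accum, epochs):
    """Estimate batches per epoch and total optimizer steps on one device."""
    batches_per_epoch = math.ceil(num_examples / per_device_batch)
    # Assumption: a leftover partial accumulation window is not counted as a step
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return batches_per_epoch, steps_per_epoch * epochs

batches, steps = optimizer_steps(num_examples=500, per_device_batch=4, grad_accum=4, epochs=3)
print(batches, steps)

# Fraction of the data actually seen, expressed in epochs
print(round(steps * 4 / batches, 3))
```

With 500 examples this gives 125 batches per epoch and 93 optimizer steps, which is consistent with the logged values and explains why the reported epoch is slightly below 3.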

In [10]:
# Save the LoRA adapter weights and the tokenizer (the base model weights are not written)
output_dir = training_args.output_dir
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

# Print completion message
print(f"\nTraining completed! Model saved to {output_dir}")
print("\nTo use this model for inference, you'll need to:")
print("1. Load the base model")
print("2. Load the LoRA adapter weights")
print("3. Merge them (optional) or use them together")


Training completed! Model saved to ./finetuned_tinyllama_lora_20241110_1507

To use this model for inference, you'll need to:
1. Load the base model
2. Load the LoRA adapter weights
3. Merge them (optional) or use them together


# Test the Fine-Tuned Model

In [11]:
from peft import PeftModel
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import time
import json
from datetime import datetime

class LoRAModelTester:
    def __init__(self, base_model_name, adapter_path):
        """Initialize the model tester with base model and LoRA adapter"""
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"Using device: {self.device}")
        
        print("Loading tokenizer...")
        self.tokenizer = AutoTokenizer.from_pretrained(base_model_name)
        
        print("Loading base model...")
        base_model = AutoModelForCausalLM.from_pretrained(
            base_model_name,
            torch_dtype=torch.float16,
            device_map="auto"
        )
        
        print("Loading LoRA adapter...")
        self.model = PeftModel.from_pretrained(
            base_model,
            adapter_path,
            torch_dtype=torch.float16,
            device_map="auto"
        )
        
        # Optional: Merge weights for faster inference
        print("Merging weights for optimized inference...")
        self.model = self.model.merge_and_unload()
        
        print("Model loading complete!")
        
    def generate_response(self, instruction, max_length=256, temperature=0.7):
        """Generate a response for a given instruction"""
        # Format the prompt
        prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
        
        # Tokenize
        inputs = self.tokenizer(prompt, return_tensors="pt")
        inputs = {k: v.to(self.device) for k, v in inputs.items()}
        
        # Generate
        start_time = time.time()
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_length=max_length,
                temperature=temperature,
                top_p=0.9,
                do_sample=True,
                pad_token_id=self.tokenizer.eos_token_id
            )
        end_time = time.time()
        
        # Decode and clean response
        full_response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        response = full_response.split("### Response:\n")[-1].strip()
        
        return {
            "response": response,
            "generation_time": f"{(end_time - start_time):.2f} seconds"
        }
    
    def compare_with_base(self, instruction):
        """Compare responses from fine-tuned and base models"""
        # `base_model_name` is not in scope inside this method, so recover the
        # checkpoint name from the tokenizer that was loaded from it in __init__
        base_model_name = self.tokenizer.name_or_path
        
        # Load base model for comparison (a second copy of the model in memory)
        base_model = AutoModelForCausalLM.from_pretrained(
            base_model_name,
            torch_dtype=torch.float16,
            device_map="auto"
        )
        
        # Generate with fine-tuned model
        ft_response = self.generate_response(instruction)
        
        # Generate with base model, using the same tokenizer and prompt format
        prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.device)
        
        with torch.no_grad():
            outputs = base_model.generate(
                **inputs,
                max_length=256,
                temperature=0.7,
                top_p=0.9,
                do_sample=True,
                pad_token_id=self.tokenizer.eos_token_id
            )
        
        base_response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        base_response = base_response.split("### Response:\n")[-1].strip()
        
        # Release the extra copy of the base model before returning
        del base_model
        torch.cuda.empty_cache()
        
        return {
            "instruction": instruction,
            "fine_tuned_response": ft_response["response"],
            "base_model_response": base_response
        }

def run_tests():
    # Replace these with your actual paths
    base_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
    adapter_path = "./finetuned_tinyllama_lora_20241110_1507"  # Replace with your path
    
    print("\n=== Initializing Model Testing ===")
    tester = LoRAModelTester(base_model_name, adapter_path)
    
    # Test cases covering different aspects
    test_cases = [
        # Basic instruction following
        "Explain what machine learning is in simple terms.",
        "Write a short poem about autumn.",
        
        # Complex reasoning
        "Compare and contrast supervised and unsupervised learning.",
        "Explain the pros and cons of remote work.",
        
        # Creative tasks
        "Write a short story about a robot discovering emotions.",
        "Create a recipe for a healthy breakfast smoothie.",
        
        # Analytical tasks
        "Analyze the impact of social media on modern society.",
        "Describe the key factors that contribute to climate change."
    ]
    
    # Run tests and save results
    results = {
        "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "model_info": {
            "base_model": base_model_name,
            "adapter_path": adapter_path,
            "device": tester.device
        },
        "test_results": []
    }
    
    print("\n=== Running Test Cases ===")
    for i, test_case in enumerate(test_cases, 1):
        print(f"\nTest Case {i}/{len(test_cases)}")
        print(f"Prompt: {test_case}")
        
        # Generate response
        response = tester.generate_response(test_case)
        print(f"Response: {response['response']}")
        print(f"Generation Time: {response['generation_time']}")
        
        results["test_results"].append({
            "prompt": test_case,
            "response": response["response"],
            "generation_time": response["generation_time"]
        })
    
    # Save results
    output_file = f"test_results_{datetime.now().strftime('%Y%m%d_%H%M')}.json"
    with open(output_file, 'w') as f:
        json.dump(results, f, indent=2)
    print(f"\nTest results saved to {output_file}")
    
    # Interactive testing mode
    print("\n=== Starting Interactive Mode ===")
    print("Enter your prompts (type 'exit' to quit, 'compare' to compare with base model):")
    
    while True:
        user_input = input("\nPrompt: ").strip()
        
        if user_input.lower() == 'exit':
            break
        elif user_input.lower() == 'compare':
            compare_prompt = input("Enter prompt for comparison: ").strip()
            comparison = tester.compare_with_base(compare_prompt)
            print("\n=== Model Comparison ===")
            print(f"Prompt: {comparison['instruction']}")
            print(f"\nFine-tuned model response:\n{comparison['fine_tuned_response']}")
            print(f"\nBase model response:\n{comparison['base_model_response']}")
        else:
            result = tester.generate_response(user_input)
            print(f"\nResponse: {result['response']}")
            print(f"Generation Time: {result['generation_time']}")

if __name__ == "__main__":
    run_tests()


=== Initializing Model Testing ===
Using device: cuda
Loading tokenizer...
Loading base model...
Loading LoRA adapter...
Merging weights for optimized inference...
Model loading complete!

=== Running Test Cases ===

Test Case 1/8
Prompt: Explain what machine learning is in simple terms.
Response: Machine learning is the field of artificial intelligence that allows machines to learn from data without being programmed explicitly. This field has been growing rapidly in recent years, with increasing demand for AI in various domains, such as healthcare, finance, and transportation. Machine learning is commonly used for tasks such as recommending products, predicting outcomes, and improving decision-making. In this context, machine learning can be applied to various fields, including healthcare, finance, and transportation.

Machine learning is based on the concept of supervised learning, where the machine is trained on a dataset of examples, and the goal is to learn from the data how to c


Prompt:  Write a speech on Children's Day



Response: Children's Day is celebrated on the 14th of December every year in India. The day is celebrated to recognize the contribution of children towards the nation. On this day, the nation expresses gratitude towards children for their contributions to society. Children's Day is a national holiday in India. It is celebrated with great enthusiasm and enthusiasm among children and their families. The celebration of Children's Day in India is celebrated with great fervour and enthusiasm. The day is celebrated with various activities and events. One of the most important events on Children's Day is the distribution of sweets to children. Children are given sweets as a symbol of gratitude towards them. Children's Day is also celebrated with various other events like sports, cultural programs, and competitions. Children's Day is an excellent opportunity for children to showcase their talents and skills. Children's Day is celebrated with great enthusiasm and enthusiasm in various parts of


Prompt:  Childrens Day is celebrated on 14th November Every Year in India



Response: Childrens Day is celebrated on the 14th of November in India as it is the birthday of Sri Aurobindo. The day is celebrated with great enthusiasm and celebrations all over the country. There are various activities organized on this day like kite flying, singing, painting, dance, etc. Kids from schools and colleges participate in these activities and celebrate the day.

Childrens Day is celebrated as a day to celebrate the children and their happiness. Children are treated like celebrities and given a day off from school and other academic activities. The day is dedicated to the children and their happiness and is celebrated in a very different way in each part of the country.

The celebration of Childrens Day is very popular in Kerala, Karnataka, Tamil Nadu and Andhra Pradesh. These states celebrate Childrens Day as the birthday of their respective state leaders. The day is celebrated with great pomp and show and is a very colorful and vibrant day in these states.
Generation 


Prompt:  exit
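
One detail of `generate_response` worth noting: `split("### Response:\n")[-1]` keeps only the text after the last marker, so if the model itself emits another `### Response:` header partway through, everything before it is discarded. The sketch below shows one possible alternative that keeps everything after the first marker instead. `extract_response` is a hypothetical helper and is not used anywhere in the notebook.

```python
MARKER = "### Response:\n"

def extract_response(decoded_text):
    """Return the text after the first response marker, or the full text if it is absent."""
    _, found, tail = decoded_text.partition(MARKER)
    return tail.strip() if found else decoded_text.strip()

# Simulated decoded output in which the model repeats the marker
sample = "### Instruction:\nSay hi.\n\n### Response:\nHi!\n\n### Response:\nHello again."
print(extract_response(sample))

# The original approach keeps only the final segment
print(sample.split(MARKER)[-1].strip())
```

Which behavior is preferable depends on whether a repeated marker should count as part of the answer; for a quick smoke test either is likely adequate.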


# Simple Inference on the Fine-Tuned Model

In [12]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

class SimpleInference:
    def __init__(self, base_model_name, adapter_path):
        """Initialize model with base model and LoRA adapter"""
        print("Loading model...")
        
        # Load tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(base_model_name)
        
        # Load base model
        base_model = AutoModelForCausalLM.from_pretrained(
            base_model_name,
            torch_dtype=torch.float16,
            device_map="auto"
        )
        
        # Load and merge LoRA adapter
        self.model = PeftModel.from_pretrained(
            base_model,
            adapter_path,
            torch_dtype=torch.float16,
            device_map="auto"
        ).merge_and_unload()
        
        print("Model ready!")
    
    def generate(self, prompt, max_length=256):
        """Generate response for given prompt"""
        # Format prompt
        formatted_prompt = f"### Instruction:\n{prompt}\n\n### Response:\n"
        
        # Tokenize
        inputs = self.tokenizer(formatted_prompt, return_tensors="pt").to(self.model.device)
        
        # Generate
        outputs = self.model.generate(
            **inputs,
            max_length=max_length,
            temperature=0.7,
            top_p=0.9,
            do_sample=True
        )
        
        # Decode and clean response
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        response = response.split("### Response:\n")[-1].strip()
        
        return response

# Main execution

# Replace these with your actual paths
base_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_path = "./finetuned_tinyllama_lora_20241110_1507"  # Replace with your path

if __name__ == "__main__":
    # Initialize model
    generator = SimpleInference(
        base_model_name=base_model_name,
        adapter_path=adapter_path  # Replace with your model path
    )
    
    print("\nChat with your fine-tuned model (type 'exit' to quit)")
    print("-" * 50)
    
    while True:
        user_input = input("\nYou: ").strip()
        
        if user_input.lower() == 'exit':
            print("Goodbye!")
            break
            
        response = generator.generate(user_input)
        print("\nModel:", response)
        print("-" * 50)

Loading model...
Model ready!

Chat with your fine-tuned model (type 'exit' to quit)
--------------------------------------------------



You:  Hi



Model: I am in the US and have a dog. My dog is a Chihuahua. It's a great dog. It is small and loves to play with my kids. It's a fun dog to have around. My dog is very friendly and loves to cuddle. It's a great dog to have around. It's always a good idea to have a dog. A dog is a great addition to any family. My dog is a Chihuahua. It's a great dog. It's very friendly and loves to play with my kids. It's a great dog to have around. It's a great addition to any family. My dog is a Chihuahua. It's a great dog. It's very friendly and loves to play with my kids. It's a great dog to have around. It's a great addition to any family. My dog is a Chihuahua. It's a great dog. It's very friendly and loves to play with my kids. It's a great dog to have around.
--------------------------------------------------



You:  What is your specialization? 



Model: A logo is the visual representation of a brand. It is the visual representation of the brand's name, slogan, tagline, or image. The logo is the first thing people see when they encounter a brand. The logo is the face of the brand.
--------------------------------------------------



You:  what is fine-tuning a model? 



Model: Fine-tuning is a process where you modify a pre-trained model to make it better suited to a specific task. The model is pre-trained on a large dataset and then fine-tuned to perform a specific task. The main idea is to learn the parameters of the pre-trained model on the specific task and then re-train the model with new data. Fine-tuning is an important technique in many deep learning models because it allows you to learn from a large dataset and to make the model more specific to the task at hand. The process of fine-tuning is usually done in two steps: pre-training and fine-tuning. 

In the first step, the model is pre-trained on a large dataset. This step is typically done by training a model on a dataset that has been pre-processed and transformed into a format that can be used by the model. This usually involves converting the dataset into a format that the model can understand, such as images, text, or audio. Once the model is pre-trained, it is then fine-tuned
---------


You:  exit


Goodbye!
