# 🧠 AI Pipelines: LLM Fine-Tuning Demo
This Colab notebook demonstrates how to fine-tune an open LLM using the **AI Pipelines** framework.

**Steps:**
1. Clone the repo and install dependencies
2. Create a toy dataset (optional)
3. Process the data
4. Run fine-tuning
5. Evaluate the model and generate text

In [ ]:
!git clone https://github.com/philipstevens/ai-pipelines.git
%cd ai-pipelines
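# Build a pip requirements list from the active conda environment and install it.
# Requires `conda` on PATH and a shell that supports process substitution (bash).
!pip install -r <(conda env export --no-builds | grep -v '^prefix' | grep '^- ' | sed 's/^- //' | grep -v '^python')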

In [ ]:
# (Optional) Create toy dataset
import json, os
os.makedirs('data/raw', exist_ok=True)
samples = [
    {"instruction": "Explain the difference between AI and ML.", "response": "AI is the broad field of building intelligent systems; ML is a subset focused on data-driven learning."},
    {"instruction": "What is LoRA fine-tuning?", "response": "LoRA adapts model weights efficiently using low-rank matrices, reducing memory footprint."}
]
with open('data/raw/sample.jsonl', 'w') as f:
    for s in samples:
        json.dump(s, f)
        f.write('\n')
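
Before handing the file to the data pipeline, it can help to confirm that every line parses as JSON and carries the expected fields. The sketch below is not part of the AI Pipelines repo: `validate_jsonl` and `REQUIRED_KEYS` are hypothetical names, and the required fields are assumed from the toy samples above. It runs against a throwaway file so it works without the repo checkout.

```python
import json
import os
import tempfile

# Assumption: the field names used by the toy samples above.
REQUIRED_KEYS = {"instruction", "response"}


def validate_jsonl(path, required_keys=REQUIRED_KEYS):
    """Parse a JSONL file and return its records.

    Raises ValueError on the first line that is not a JSON object
    or that lacks one of the required keys.
    """
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_no}: invalid JSON ({exc})") from exc
            if not isinstance(record, dict):
                raise ValueError(f"line {line_no}: expected a JSON object")
            missing = set(required_keys) - set(record)
            if missing:
                raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
            records.append(record)
    return records


# Demo on a temporary file; point the function at your own path instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "sample.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write(json.dumps({"instruction": "q", "response": "a"}) + "\n")
    demo_records = validate_jsonl(demo_path)

print(f"{len(demo_records)} valid record(s)")
```

In the notebook, the same check would be `validate_jsonl('data/raw/sample.jsonl')`, run after the cell above has written the file.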

In [ ]:
# Process data
!python pipelines/llm_finetuning/scripts/run_data_pipeline.py --input_dir data/raw

In [ ]:
# Run fine-tuning (this will use the example config)
!python pipelines/llm_finetuning/scripts/run_finetune.py --config pipelines/llm_finetuning/configs/example_config.yml

In [ ]:
# Evaluate model latency improvements
!python pipelines/llm_finetuning/scripts/run_eval.py --model_before meta-llama/Meta-Llama-3-8B-Instruct --model_after models/finetuned-llama3/
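
For a rough sense of what a latency comparison involves, here is a minimal timing sketch. It is not the logic of `run_eval.py`, whose internals are not shown here. `measure_latency`, `summarize`, and `fake_generate` are hypothetical names, and the stand-in workload simply sleeps so the example runs without a model or GPU.

```python
import statistics
import time


def measure_latency(fn, *args, warmup=1, runs=5, **kwargs):
    """Call fn repeatedly and return per-call wall-clock latencies in seconds."""
    for _ in range(warmup):
        fn(*args, **kwargs)  # warm-up calls are executed but not timed
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args, **kwargs)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies to a few summary statistics."""
    return {
        "mean_s": statistics.mean(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }


def fake_generate(prompt):
    """Stand-in workload (assumption); replace with a real generation call."""
    time.sleep(0.01)
    return prompt.upper()


latencies = measure_latency(fake_generate, "Explain how transformers work.", runs=3)
stats = summarize(latencies)
print(stats)
```

To compare two models, one would run `measure_latency` once per model with the same prompts and compare the summaries. Warm-up calls matter because the first call often includes one-time setup costs.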

In [ ]:
# Try inference
from pipelines.llm_finetuning.serve.model_loader import load_model, generate
model, tokenizer = load_model('models/finetuned-llama3/')
print(generate(model, tokenizer, 'Explain how transformers work.', max_tokens=128))