# DeepFabric Quick Start

**Goal:** Generate a small synthetic dataset and fine-tune a model in under 15 minutes.

**Perfect for:**
- First-time users
- Quick testing
- Google Colab free tier
- Limited GPU memory

In [None]:
# Install DeepFabric
!pip install 'deepfabric[training]' -q

# Verify GPU
import torch
assert torch.cuda.is_available(), "❌ GPU not available! Enable GPU in runtime settings."
print(f"✓ GPU: {torch.cuda.get_device_name(0)}")
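
The later cells request `bfloat16`, which runs natively only on GPUs with compute capability 8.0 (Ampere) or newer. The Colab free tier typically hands out a T4 (compute capability 7.5), so it can be worth checking before going further. The helper below is a minimal sketch: `pick_dtype` is a hypothetical name, not part of DeepFabric, and the 8.0 threshold is the only rule it applies.

```python
def pick_dtype(compute_capability):
    """Return 'bfloat16' for Ampere-or-newer GPUs, otherwise 'float16'.

    `compute_capability` is a (major, minor) tuple, the shape returned by
    torch.cuda.get_device_capability().
    """
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"


# Optional live check; skipped quietly when torch or a GPU is missing.
try:
    import torch

    if torch.cuda.is_available():
        capability = torch.cuda.get_device_capability(0)
        print(f"Compute capability {capability} -> suggested dtype: {pick_dtype(capability)}")
except ImportError:
    pass
```

If the suggestion is `float16`, one option is to pass that string as `dtype` when building the pipeline and to turn off `bf16` in the training config. Whether the training config exposes a matching half-precision flag is an assumption to verify against the DeepFabric documentation.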

## Step 1: Generate Dataset

In [None]:
from deepfabric.pipeline import DeepFabricPipeline

# Small, fast model for quick testing
pipeline = DeepFabricPipeline(
    model_name="meta-llama/Llama-3.2-3B-Instruct",
    provider="transformers",
    device="cuda",
    dtype="bfloat16",
)

# Generate 20 samples (takes ~3-5 minutes)
# You can customize the generation_system_prompt to improve quality
dataset = pipeline.generate_dataset(
    topic_prompt="Python programming basics",
    num_samples=20,
    batch_size=5,
    tree_depth=2,
    tree_degree=3,
    generation_system_prompt="You are an expert Python educator creating clear, beginner-friendly training examples.",
    # Optional: add more parameters
    # temperature=0.8,  # Higher = more creative
    # instructions="Focus on practical, real-world examples",
)

print(f"✓ Generated {len(dataset)} samples")
dataset.save("quickstart_dataset.jsonl")
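
Before training, it helps to glance at what was written to disk. The sketch below assumes only that `quickstart_dataset.jsonl` is a JSON Lines file with one JSON object per line. The record schema is not described in this notebook, so the code prints top-level keys rather than guessing field names. `preview_jsonl` is a hypothetical helper, not a DeepFabric function.

```python
import json
from pathlib import Path


def preview_jsonl(path, limit=2):
    """Return the first `limit` records of a JSON Lines file."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            records.append(json.loads(line))
            if len(records) >= limit:
                break
    return records


dataset_path = Path("quickstart_dataset.jsonl")
if dataset_path.exists():
    for record in preview_jsonl(dataset_path):
        # Show only the top-level structure; field names are not assumed.
        print(sorted(record) if isinstance(record, dict) else type(record).__name__)
```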

## Step 2: Fine-Tune Model

In [None]:
from deepfabric.training import SFTTrainingConfig, LoRAConfig

# Quick training config
config = SFTTrainingConfig(
    model_name="meta-llama/Llama-3.2-3B-Instruct",
    output_dir="./quickstart_model",
    num_train_epochs=1,  # Just 1 epoch for quick test
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    lora=LoRAConfig(enabled=True, r=16),  # Memory efficient
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=5,
)

# Train (takes ~5-10 minutes)
metrics = pipeline.train(config)
print("\n✓ Training complete!")
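
A quick back-of-the-envelope check shows how short this run is. The sketch below assumes a single GPU and that every generated sample is used for training. `optimizer_steps` is a hypothetical helper, and the exact step count can differ slightly depending on how the trainer handles the final partial batch.

```python
import math


def optimizer_steps(num_samples, per_device_batch_size, grad_accum_steps,
                    epochs=1, num_devices=1):
    """Estimate (effective batch size, total optimizer steps) for a run."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


# Values from the quick-start cells: 20 samples, batch size 2, accumulation 4.
effective_batch, total_steps = optimizer_steps(20, 2, 4)
print(f"effective batch size: {effective_batch}, optimizer steps: {total_steps}")
```

With 20 samples the estimate is about 3 optimizer steps at an effective batch size of 8. If logging is step-based, as in the Hugging Face `Trainer`, `logging_steps=5` may never fire during such a short run, so lowering it to 1 is a reasonable choice for this test. The same helper gives `optimizer_steps(500, 2, 4, epochs=3)` as roughly 189 steps for the larger settings suggested under Next Steps.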

## Step 3: Test Model

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "./quickstart_model",
    dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./quickstart_model")

# Test
prompt = "What is a Python list?"
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)  # follow wherever device_map placed the model

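# temperature only takes effect when sampling is enabled
outputs = model.generate(inputs, max_new_tokens=150, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

print(f"Q: {prompt}")
print(f"\nA: {response.strip()}")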

## Next Steps

1. **Larger dataset**: Increase `num_samples` to 100-500
2. **More epochs**: Try `num_train_epochs=3`
3. **Different topics**: Change `topic_prompt`
4. **Bigger model**: Use `"Qwen/Qwen2.5-7B-Instruct"` (needs more GPU memory)
5. **Upload to Hub**: Set `push_to_hub=True` in config

See `cuda_dataset_and_training.ipynb` for the complete workflow!