# 🎨 AugmentAI: LLM-Powered Data Augmentation

**Design domain-safe augmentation policies through natural language.**

This notebook covers:
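1. Installation & Setup
2. Creating a Sample Dataset
3. One-Command Dataset Preparation
4. AutoSearch: Automated Policy Optimization
5. Python API Usage
6. Domain Safety & Constraints
7. Policy Validation
8. Export to Executable Script
9. AutoSearch with the Python API
10. Reproducibility Manifest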

> **Design Philosophy**: The LLM suggests. Rules decide. Code executes.

[![PyPI](https://img.shields.io/pypi/v/augmentai.svg)](https://pypi.org/project/augmentai/)
[![GitHub](https://img.shields.io/badge/GitHub-augmentai-black)](https://github.com/kyrozepto/augmentai)

## 1. 📦 Installation

In [None]:
# Install AugmentAI from PyPI
!pip install -q augmentai

# Verify installation
!augmentai --help

### Set up LLM Provider (Optional)

AugmentAI can use an LLM to design policies. This demo uses the built-in defaults, which don't require an API key.

In [None]:
import os

# Uncomment to use OpenAI
# os.environ["OPENAI_API_KEY"] = "your-api-key"

# Or use Google's Gemini (if you have access)
# os.environ["GOOGLE_API_KEY"] = "your-api-key"
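Before running any LLM-backed command, it can help to confirm which key is actually visible to the process. The sketch below is a hypothetical helper (`detect_provider` is not part of AugmentAI). It only inspects the two environment variables named above and makes no claim about how AugmentAI itself selects a provider.

```python
import os

# Environment variables mentioned in the cell above, mapped to a display name.
PROVIDER_ENV_VARS = {
    "OPENAI_API_KEY": "openai",
    "GOOGLE_API_KEY": "gemini",
}


def detect_provider(env=None):
    """Return the first provider whose API key is set and non-empty, else None.

    Hypothetical helper for illustration; not an AugmentAI function.
    """
    env = os.environ if env is None else env
    for var, provider in PROVIDER_ENV_VARS.items():
        if env.get(var, "").strip():
            return provider
    return None


provider = detect_provider()
print(f"LLM provider key found: {provider or 'none (using built-in defaults)'}")
```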

## 2. 📂 Create a Sample Dataset

Let's create a small synthetic image-classification dataset for testing: three classes (`cat`, `dog`, `bird`), each with ten 128×128 JPEG images of tinted noise.

In [None]:
from pathlib import Path

import numpy as np
from PIL import Image

# Create sample dataset structure: one folder per class
dataset_path = Path("sample_dataset")

# Base color for each class, so the classes are visually distinct
class_colors = {
    "cat": (255, 100, 100),   # Reddish
    "dog": (100, 255, 100),   # Greenish
    "bird": (100, 100, 255),  # Bluish
}

for cls, color in class_colors.items():
    (dataset_path / cls).mkdir(parents=True, exist_ok=True)

    # Create 10 noisy images per class
    for i in range(10):
        # Add per-pixel variation in a wider dtype, then clip to [0, 255];
        # adding directly in uint8 would wrap around (e.g. 255 + 1 -> 0)
        noise = np.random.randint(0, 50, (128, 128, 3), dtype=np.int16)
        img_array = np.clip(noise + np.array(color, dtype=np.int16), 0, 255).astype(np.uint8)

        img = Image.fromarray(img_array)
        img.save(dataset_path / cls / f"{cls}_{i:03d}.jpg")

print("✅ Sample dataset created!")
!ls -la sample_dataset/
!ls sample_dataset/cat/ | head -5
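A quick sanity check on the folder-per-class layout is to count the images in each class directory. The helper below is a small sketch using only `pathlib`. `count_images_per_class` is a name introduced here for illustration, and the demo builds a throwaway directory so the example runs on its own.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root):
    """Map each class subdirectory of `root` to its number of image files."""
    root = Path(root)
    return {
        class_dir.name: sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        for class_dir in sorted(root.iterdir())
        if class_dir.is_dir()
    }


# Demonstrate on a throwaway layout that mirrors sample_dataset/
# (empty files stand in for real images).
with tempfile.TemporaryDirectory() as tmp:
    for cls in ["cat", "dog", "bird"]:
        (Path(tmp) / cls).mkdir()
        for i in range(10):
            (Path(tmp) / cls / f"{cls}_{i:03d}.jpg").touch()
    demo_counts = count_images_per_class(tmp)

print(demo_counts)
```

On a freshly created dataset, `count_images_per_class("sample_dataset")` should report 10 images for each of the three classes.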

## 3. 🚀 One-Command Dataset Preparation

The `prepare` command handles everything: inspection, splitting, policy generation, and export.

In [None]:
# Dry run to see what would happen
!augmentai prepare sample_dataset --domain natural --dry-run --skip-lint

In [None]:
# Full preparation with 70/15/15 split
!augmentai prepare sample_dataset \
    --domain natural \
    --split 70/15/15 \
    --seed 42 \
    --output prepared_dataset \
    --skip-lint
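The `--split 70/15/15` and `--seed 42` flags describe a reproducible three-way partition (presumably train/validation/test). As a rough illustration of the idea, a seeded shuffle followed by slicing could look like the sketch below. This is not AugmentAI's splitting code, which may stratify by class or round differently, and `split_items` is a hypothetical helper.

```python
import random


def split_items(items, ratios=(70, 15, 15), seed=42):
    """Shuffle `items` with a fixed seed and cut them into len(ratios) parts.

    Illustrative only: sizes are floored, and the last part absorbs the remainder.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, so global random state is untouched

    total = sum(ratios)
    parts, start = [], 0
    for ratio in ratios[:-1]:
        size = len(shuffled) * ratio // total
        parts.append(shuffled[start:start + size])
        start += size
    parts.append(shuffled[start:])
    return parts


files = [f"img_{i:03d}.jpg" for i in range(30)]
train, val, test = split_items(files)
print(len(train), len(val), len(test))
```

Because the shuffle uses its own seeded generator, calling `split_items` again with the same inputs returns the same partition.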

In [None]:
# Check output structure
!ls -la prepared_dataset/
print("\n📄 Generated config.yaml:")
!cat prepared_dataset/config.yaml

## 4. 🔍 AutoSearch: Find Optimal Policies

AutoSearch uses evolutionary optimization to search for a strong augmentation policy for your dataset within a fixed evaluation budget.

In [None]:
# Run AutoSearch with a budget of 30 evaluations
!augmentai search sample_dataset \
    --domain natural \
    --budget 30 \
    --output search_results \
    --seed 42

In [None]:
# View the best policy found
print("🏆 Best policy:")
!cat search_results/best_policy.yaml

## 5. 🐍 Python API Usage

Use AugmentAI programmatically for full control.

In [None]:
from augmentai.core.policy import Policy, Transform
from augmentai.domains import get_domain

# Create a custom policy
policy = Policy(
    name="my_custom_policy",
    domain="natural",
    transforms=[
        Transform("HorizontalFlip", probability=0.5),
        Transform("VerticalFlip", probability=0.3),
        Transform("Rotate", probability=0.7, parameters={"limit": 30}),
        Transform("RandomBrightnessContrast", probability=0.5, parameters={
            "brightness_limit": 0.2,
            "contrast_limit": 0.2
        }),
        Transform("GaussNoise", probability=0.3, parameters={"var_limit": (10, 50)}),
    ]
)

# Display the policy
print("Policy:", policy.name)
print("Domain:", policy.domain)
print("\nTransforms:")
for t in policy.transforms:
    print(f"  - {t.name}: p={t.probability}, params={t.parameters}")

In [None]:
# Export to YAML
yaml_str = policy.to_yaml()
print(yaml_str)

# Save to file
with open("my_policy.yaml", "w") as f:
    f.write(yaml_str)
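Each transform's `probability` says how often it fires on a given image. Assuming the transforms are sampled independently (an assumption, since the notebook does not state how AugmentAI composes them), the policy above applies 2.3 transforms per image on average and leaves roughly 3.7% of images untouched. The arithmetic can be checked with plain Python:

```python
import math

# Probabilities copied from my_custom_policy above.
probabilities = {
    "HorizontalFlip": 0.5,
    "VerticalFlip": 0.3,
    "Rotate": 0.7,
    "RandomBrightnessContrast": 0.5,
    "GaussNoise": 0.3,
}

# Expected number of transforms applied per image (linearity of expectation).
expected_applied = sum(probabilities.values())

# Probability that no transform fires, assuming independent sampling.
p_untouched = math.prod(1 - p for p in probabilities.values())

print(f"Expected transforms per image: {expected_applied:.2f}")
print(f"Probability an image is left unchanged: {p_untouched:.4f}")
```

Numbers like these are a quick way to judge whether a policy is too aggressive or too mild before running any training.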

## 6. 🏥 Domain Safety & Constraints

Different domains have different safety requirements: a transform that is acceptable for natural images may be forbidden for medical images.

In [None]:
# List available domains
!augmentai domains

In [None]:
# Explore domain constraints
from augmentai.domains import MedicalDomain, OCRDomain, SatelliteDomain, NaturalDomain

domains = [
    ("Medical", MedicalDomain()),
    ("OCR", OCRDomain()),
    ("Satellite", SatelliteDomain()),
    ("Natural", NaturalDomain()),
]

for name, domain in domains:
    print(f"\n🏷️ {name} Domain")
    print(f"   Forbidden: {list(domain.forbidden_transforms)[:5]}...")
    print(f"   Recommended: {list(domain.recommended_transforms)[:5]}...")

In [None]:
# Test domain enforcement
from augmentai.rules.enforcement import RuleEnforcer

# Create a policy with forbidden transforms for medical
risky_policy = Policy(
    name="risky_medical_policy",
    domain="medical",
    transforms=[
        Transform("HorizontalFlip", 0.5),      # ✅ Safe
        Transform("ElasticTransform", 0.5),    # ❌ Forbidden!
        Transform("ColorJitter", 0.3),         # ❌ Forbidden!
        Transform("GaussNoise", 0.2),          # ✅ Safe
    ]
)

# Enforce domain rules
medical_domain = get_domain("medical")
enforcer = RuleEnforcer(medical_domain)
result = enforcer.enforce_policy(risky_policy)

print("Enforcement result:")
print(f"  Success: {result.success}")
print(f"  Original transforms: {len(risky_policy.transforms)}")
print(f"  Safe transforms: {len(result.policy.transforms) if result.policy else 0}")

if result.policy:
    print("\n  Remaining transforms:")
    for t in result.policy.transforms:
        print(f"    ✅ {t.name}")
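The "LLM suggests, rules decide" idea behind enforcement can be pictured as a deterministic filter applied after a policy is proposed. The sketch below is a simplified stand-in, not `RuleEnforcer`'s implementation: `SimpleTransform`, `filter_forbidden`, and the forbidden set (taken from the comments in the cell above) are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleTransform:
    """Hypothetical stand-in for a policy transform."""
    name: str
    probability: float


# Assumed forbidden set, mirroring the comments in the cell above.
FORBIDDEN_FOR_MEDICAL = frozenset({"ElasticTransform", "ColorJitter"})


def filter_forbidden(transforms, forbidden):
    """Split transforms into (kept, removed) according to a forbidden-name set."""
    kept = [t for t in transforms if t.name not in forbidden]
    removed = [t for t in transforms if t.name in forbidden]
    return kept, removed


proposed = [
    SimpleTransform("HorizontalFlip", 0.5),
    SimpleTransform("ElasticTransform", 0.5),
    SimpleTransform("ColorJitter", 0.3),
    SimpleTransform("GaussNoise", 0.2),
]
kept, removed = filter_forbidden(proposed, FORBIDDEN_FOR_MEDICAL)
print("Kept:   ", [t.name for t in kept])
print("Removed:", [t.name for t in removed])
```

Keeping the rule check as plain set membership means the outcome never depends on what the LLM proposed, only on the domain's fixed rules.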

## 7. ✅ Policy Validation

Validate policies against domain constraints.

In [None]:
# Validate via CLI
!augmentai validate my_policy.yaml --domain natural

In [None]:
# Validate via Python API
from augmentai.rules.validator import SafetyValidator

validator = SafetyValidator(get_domain("natural"))
validation_result = validator.validate(policy)

print(f"Is safe: {validation_result.is_safe}")
print(f"Summary: {validation_result.summary()}")

## 8. 📤 Export to Executable Script

Generate standalone Python scripts for your augmentation pipeline.

In [None]:
from augmentai.export import ScriptGenerator

# Generate augmentation script
generator = ScriptGenerator(backend="albumentations")
script = generator.generate_augment_script(
    policy,
    input_dir="data/train",
    output_dir="augmented/train",
    seed=42
)

# Save script
with open("augment_script.py", "w") as f:
    f.write(script)

print("📄 Generated script (first 50 lines):")
print("\n".join(script.split("\n")[:50]))

## 9. 🔬 AutoSearch with Python API

Run AutoSearch programmatically for integration with your training pipeline.

In [None]:
from augmentai.search import PolicyOptimizer, quick_search
from augmentai.search.optimizer import OptimizerConfig

# Quick search with defaults
result = quick_search(
    domain="natural",
    budget=20,
    seed=42
)

print("\n🏆 Search Result")
print(f"   {result.summary()}")
print("\n   Best policy transforms:")
for t in result.best_policy.transforms:
    print(f"   - {t.name}: p={t.probability}")

In [None]:
# Advanced: Custom optimizer configuration
config = OptimizerConfig(
    population_size=15,
    generations=5,
    mutation_rate=0.6,
    mutation_strength=0.4,
    seed=123
)

optimizer = PolicyOptimizer(config)
result = optimizer.search("medical", budget=40)

print(f"Medical domain search: {result.summary()}")
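For intuition about what `population_size`, `generations`, `mutation_rate`, and `mutation_strength` typically control in an evolutionary search, here is a toy loop over transform probabilities. It is a generic sketch, not AugmentAI's optimizer. The fitness function is a made-up target, whereas a real search would score each candidate by evaluating a model, and how AugmentAI interprets these parameters is an assumption.

```python
import random

TARGET = [0.5, 0.3, 0.7]  # made-up "ideal" probabilities, used only to define a toy fitness


def fitness(candidate):
    """Toy score: higher is better, 0.0 means the candidate equals TARGET."""
    return -sum((c - t) ** 2 for c, t in zip(candidate, TARGET))


def mutate(candidate, rng, mutation_rate, mutation_strength):
    """Perturb each gene with probability `mutation_rate`, clamped to [0, 1]."""
    return [
        min(1.0, max(0.0, g + rng.uniform(-mutation_strength, mutation_strength)))
        if rng.random() < mutation_rate else g
        for g in candidate
    ]


def evolve(population_size=15, generations=5, mutation_rate=0.6,
           mutation_strength=0.4, seed=123):
    """Run a tiny elitist evolutionary loop and return (best, fitness history)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in TARGET] for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Elitism: the better half survives unchanged into the next generation.
        survivors = population[: max(1, population_size // 2)]
        children = [
            mutate(rng.choice(survivors), rng, mutation_rate, mutation_strength)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print("Best probabilities:", [round(p, 2) for p in best])
print("Best fitness per generation:", [round(h, 4) for h in history])
```

Because survivors carry over unchanged, the best fitness never decreases from one generation to the next in this sketch.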

## 10. 📊 Reproducibility Manifest

Each preparation run includes a manifest (`manifest.json`) for full reproducibility.

In [None]:
import json
from pathlib import Path

# View the manifest
if Path("prepared_dataset/manifest.json").exists():
    with open("prepared_dataset/manifest.json") as f:
        manifest = json.load(f)
    
    print("📋 Reproducibility Manifest:")
    print(json.dumps(manifest, indent=2))
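To make concrete what such a manifest can capture, the sketch below records the seed, the split, and a SHA-256 digest per file. The field names and the `build_manifest` helper are hypothetical. The actual schema of AugmentAI's `manifest.json` is not shown in this notebook and may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def build_manifest(dataset_dir, seed, split):
    """Return a dict with the run settings and a content hash for every file."""
    dataset_dir = Path(dataset_dir)
    files = {
        path.relative_to(dataset_dir).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(dataset_dir.rglob("*"))
        if path.is_file()
    }
    return {"seed": seed, "split": split, "files": files}


# Demonstrate on a throwaway directory so the example runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "cat").mkdir()
    (Path(tmp) / "cat" / "cat_000.jpg").write_bytes(b"example bytes")
    demo_manifest = build_manifest(tmp, seed=42, split="70/15/15")

print(json.dumps(demo_manifest, indent=2))
```

Hashing file contents rather than names means a later run can detect silently changed inputs even when the directory listing looks identical.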

## 🎉 Summary

You've learned:

| Feature | Command/API |
|---------|-------------|
| Install | `pip install augmentai` |
| Prepare dataset | `augmentai prepare ./dataset` |
| AutoSearch | `augmentai search ./dataset --budget 50` |
| List domains | `augmentai domains` |
| Validate policy | `augmentai validate policy.yaml` |
| Python Policy | `Policy(name, domain, transforms)` |
| Domain safety | `RuleEnforcer(domain).enforce_policy(policy)` |
| Generate script | `ScriptGenerator().generate_augment_script(policy)` |

---

**Learn more:**
- 📦 PyPI: https://pypi.org/project/augmentai/
- 🐙 GitHub: https://github.com/kyrozepto/augmentai
- 📖 Docs: https://github.com/kyrozepto/augmentai/tree/main/docs