# SpaX Quickstart: Minimal Overhead Migration

## Overview

This notebook demonstrates how to use SpaX for search space exploration with minimal code changes from standard Pydantic configurations.

**What you'll learn:**
- Migrating from Pydantic `BaseModel` to SpaX `Config` (a one-line change)
- Automatic space inference from type hints and Field constraints
- Random sampling and parameter inspection
- Explicit space control when needed
- Basic search space statistics

**Prerequisites:**
- Basic Python knowledge
- Familiarity with type hints (helpful but not required)
- No prior Pydantic or SpaX knowledge needed

Let's start by defining a simple machine learning configuration.

In [1]:
# First, let's see a standard Pydantic configuration
from typing import Literal

from pydantic import BaseModel, Field


class MLConfigPydantic(BaseModel):
    """A typical Pydantic configuration for ML experiments."""

    # Numeric parameters with constraints
    hidden_dim: int = Field(gt=16, lt=4096)
    learning_rate: float = Field(ge=1e-5, le=1e-1)

    # Categorical choices
    activation: Literal["relu", "gelu", "silu"]
    optimizer: Literal["adam", "sgd", "rmsprop"]

    # Boolean flag
    use_batch_norm: bool

    # Parameter with default
    dropout: float = Field(ge=0.0, le=0.5, default=0.1)


# Create an instance manually
config = MLConfigPydantic(
    hidden_dim=256,
    learning_rate=0.001,
    activation="relu",
    optimizer="adam",
    use_batch_norm=True,
)

print(config)

hidden_dim=256 learning_rate=0.001 activation='relu' optimizer='adam' use_batch_norm=True dropout=0.1


In [2]:
# Now, let's migrate to SpaX - just change BaseModel to Config!
import spax as sp


class MLConfigSpaX(sp.Config):
    """Same configuration, now with search space superpowers."""

    # Exact same field definitions - no changes needed!
    hidden_dim: int = Field(gt=16, lt=4096)
    learning_rate: float = Field(ge=1e-5, le=1e-1)
    activation: Literal["relu", "gelu", "silu"]
    optimizer: Literal["adam", "sgd", "rmsprop"]
    use_batch_norm: bool
    dropout: float = Field(ge=0.0, le=0.5, default=0.1)


# Still works exactly like Pydantic
config = MLConfigSpaX(
    hidden_dim=256,
    learning_rate=0.001,
    activation="relu",
    optimizer="adam",
    use_batch_norm=True,
)

print(config)

MLConfigSpaX(hidden_dim=256, learning_rate=0.001, activation='relu', optimizer='adam', use_batch_norm=True, dropout=0.1)


In [3]:
# But now you have NEW capabilities with zero extra code!

# 1. Random sampling for testing and exploration
random_config = MLConfigSpaX.random(seed=42)
print("🎲 Randomly sampled configuration:")
print(random_config)
print()

# 2. Inspect all searchable parameters
print("📋 Searchable parameters:")
params = MLConfigSpaX.get_parameter_names()
for param in params:
    print(f"  • {param}")
print()

# 3. Get the override template to see the search space structure
print("🔍 Search space structure:")
template = MLConfigSpaX.get_override_template()
for key, value in template.items():
    print(f"  {key}: {value}")

🎲 Randomly sampled configuration:
MLConfigSpaX(hidden_dim=2636, learning_rate=0.011141993505886382, activation='silu', optimizer='adam', use_batch_norm=True, dropout=0.05124758808575375)

📋 Searchable parameters:
  • MLConfigSpaX.hidden_dim
  • MLConfigSpaX.learning_rate
  • MLConfigSpaX.activation
  • MLConfigSpaX.optimizer
  • MLConfigSpaX.use_batch_norm
  • MLConfigSpaX.dropout

🔍 Search space structure:
  hidden_dim: {'gt': 16, 'lt': 4096}
  learning_rate: {'ge': 1e-05, 'le': 0.1}
  activation: ['relu', 'gelu', 'silu']
  optimizer: ['adam', 'sgd', 'rmsprop']
  use_batch_norm: ['True', 'False']
  dropout: {'ge': 0.0, 'le': 0.5}


## What Just Happened? Automatic Space Inference

SpaX automatically converted your type hints and Pydantic constraints into search spaces:

| Field definition | Inferred space |
|----------------|----------------|
| `Literal["relu", "gelu", "silu"]` | `CategoricalSpace` with 3 choices |
| `bool` | `CategoricalSpace` with `[True, False]` |
| `int` with `Field(gt=16, lt=4096)` | `IntSpace` with exclusive bounds (16, 4096) |
| `float` with `Field(ge=1e-5, le=1e-1)` | `FloatSpace` with inclusive bounds [1e-5, 1e-1] |

**Automatic inference works for:**
- ✅ `Literal` types → Categorical choices
- ✅ `bool` → Categorical `[True, False]`
- ✅ `Union` / `|` types → Categorical choices (only if every member is a `Literal`, `bool`, `None`, or a `Config` type)
- ✅ Numeric `Field()` with **both** lower and upper bounds → Numeric spaces

**Limitations:**
- ❌ Numeric fields without bounds (e.g., a bare `int`): the range cannot be inferred
- ❌ `Union` with other types (e.g., `int | str`): no space can be inferred
- ❌ Fields with neither an inferable space nor a default: the value must be provided explicitly

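**Important:** A default value on a field that has no searchable space (for example, `model_name: str = "explicit_mlp"`) makes that field a **fixed value**, excluded from the search space. A field with both bounds and a default, such as `dropout` above, stays searchable.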

**Solution:** Use explicit SpaX spaces when automatic inference isn't sufficient or for better control.
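
To make the inference rules above concrete, here is a minimal sketch in plain Python. It is not SpaX's implementation: `infer_space` is a hypothetical helper that only mirrors the rules listed in this section, using the standard `typing` introspection functions.

```python
from typing import Literal, get_args, get_origin


def infer_space(annotation, lower=None, upper=None):
    """Hypothetical helper: classify a field the way the rules above describe."""
    if get_origin(annotation) is Literal:
        # Literal["a", "b"] -> categorical over the listed values
        return ("categorical", list(get_args(annotation)))
    if annotation is bool:
        return ("categorical", [True, False])
    if annotation in (int, float) and lower is not None and upper is not None:
        # Numeric spaces need BOTH bounds
        return (annotation.__name__, lower, upper)
    return None  # cannot infer: needs an explicit space or a fixed default


print(infer_space(Literal["relu", "gelu", "silu"]))
print(infer_space(bool))
print(infer_space(int, lower=16, upper=4096))
print(infer_space(int))  # None: a bare int has no range
```

The last call returns `None`, which corresponds to the first limitation above: without both bounds there is nothing to sample from.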

Let's explore what we can do with random sampling:

In [4]:
# Let's generate multiple random configurations to see the variety
print("🎲 Sampling 5 different configurations:\n")

for i in range(5):
    config = MLConfigSpaX.random(seed=42 + i)
    print(f"Sample {i+1}:")
    print(f"  hidden_dim={config.hidden_dim}, lr={config.learning_rate:.6f}, "
          f"activation={config.activation}, optimizer={config.optimizer}, "
          f"batch_norm={config.use_batch_norm}")
    print()

# Sampling is reproducible with the same seed
print("🔄 Reproducibility - same seed gives same config:")
config1 = MLConfigSpaX.random(seed=999)
config2 = MLConfigSpaX.random(seed=999)
print(f"Config 1: hidden_dim={config1.hidden_dim}")
print(f"Config 2: hidden_dim={config2.hidden_dim}")
print(f"Equal? {config1.hidden_dim == config2.hidden_dim}")

🎲 Sampling 5 different configurations:

Sample 1:
  hidden_dim=2636, lr=0.011142, activation=silu, optimizer=adam, batch_norm=True

Sample 2:
  hidden_dim=174, lr=0.028616, activation=silu, optimizer=rmsprop, batch_norm=True

Sample 3:
  hidden_dim=1690, lr=0.052007, activation=silu, optimizer=adam, batch_norm=True

Sample 4:
  hidden_dim=1130, lr=0.041775, activation=relu, optimizer=adam, batch_norm=True

Sample 5:
  hidden_dim=3655, lr=0.007648, activation=relu, optimizer=sgd, batch_norm=False

🔄 Reproducibility - same seed gives same config:
Config 1: hidden_dim=3217
Config 2: hidden_dim=3217
Equal? True
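
Seeded reproducibility of this kind usually comes from giving each sampling call its own random generator. The sketch below illustrates the idea with the standard library only; `sample_config` is a hypothetical stand-in, not SpaX's sampler, so its values will not match the outputs above.

```python
import random

ACTIVATIONS = ["relu", "gelu", "silu"]
OPTIMIZERS = ["adam", "sgd", "rmsprop"]


def sample_config(seed: int) -> dict:
    """Hypothetical stand-in for a seeded sampler; not SpaX's implementation."""
    rng = random.Random(seed)  # private generator: no global state involved
    return {
        "hidden_dim": rng.randint(17, 4095),       # integers allowed by gt=16, lt=4096
        "learning_rate": rng.uniform(1e-5, 1e-1),  # ge=1e-5, le=1e-1
        "activation": rng.choice(ACTIVATIONS),
        "optimizer": rng.choice(OPTIMIZERS),
        "use_batch_norm": rng.choice([True, False]),
    }


first = sample_config(seed=999)
second = sample_config(seed=999)
print(first == second)  # True: same seed, same sequence of draws
```

Note how the exclusive integer bounds `gt=16, lt=4096` translate to the inclusive range 17 to 4095.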


## Taking Explicit Control

While automatic inference is convenient, sometimes you need more control:

**When to use explicit SpaX spaces:**
- **Distribution control**: Use a log distribution for parameters that span several orders of magnitude, such as learning rates
- **Weighted choices**: Make some categorical options more likely
- **Complex conditions**: Parameters that depend on other parameters (covered later)
- **Clarity**: When you want the search space to be immediately obvious
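
To see why distribution control matters for a learning rate, the following standard-library sketch compares uniform and log-uniform sampling over the same range [1e-5, 1e-1]. It does not use SpaX; the log-uniform draw (sample the exponent uniformly, then exponentiate) is the usual construction and is assumed here to be what a log distribution means.

```python
import math
import random

LOW, HIGH = 1e-5, 1e-1
rng = random.Random(0)
n = 10_000

uniform_draws = [rng.uniform(LOW, HIGH) for _ in range(n)]
# Log-uniform: draw the exponent uniformly, then exponentiate.
log_draws = [10 ** rng.uniform(math.log10(LOW), math.log10(HIGH)) for _ in range(n)]

frac_uniform = sum(x < 1e-3 for x in uniform_draws) / n
frac_log = sum(x < 1e-3 for x in log_draws) / n
print(f"uniform: {frac_uniform:.1%} of draws below 1e-3")
print(f"log:     {frac_log:.1%} of draws below 1e-3")
```

Uniform sampling puts only about 1% of its draws below 1e-3, because the interval [1e-5, 1e-3] is about 1% of the range's length. Log-uniform sampling puts about half there, since 1e-3 is the midpoint of the range on a log scale.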

Let's see explicit spaces in action:

In [5]:
# Example with explicit SpaX spaces for more control


class MLConfigExplicit(sp.Config):
    """Configuration with explicit SpaX spaces."""

    # Log distribution for learning rate (better for hyperparameter optimization, HPO)
    learning_rate: float = sp.Float(ge=1e-5, le=1e-1, distribution="log")

    # Explicit integer space
    hidden_dim: int = sp.Int(gt=16, lt=4096, distribution="uniform")

    # Weighted categorical - make "adam" more likely
    optimizer: str = sp.Categorical(
        [
            sp.Choice("adam", weight=3.0),  # 3x more likely
            sp.Choice("sgd", weight=1.0),
            sp.Choice("rmsprop", weight=1.0),
        ]
    )

    # Simple categorical
    activation: str = sp.Categorical(["relu", "gelu", "silu"])

    # Boolean (could use bool type, but explicit is clearer)
    use_batch_norm: bool = sp.Categorical([True, False])

    # Fixed value (not part of search space)
    model_name: str = "explicit_mlp"


# Sample and inspect
config = MLConfigExplicit.random(seed=42)
print("Sampled configuration:")
print(config)
print(f"\nLearning rate: {config.learning_rate:.2e} (log scale)")
print(f"Model name: {config.model_name} (fixed)")

Sampled configuration:
MLConfigExplicit(learning_rate=0.003611662782147249, hidden_dim=119, optimizer='sgd', activation='relu', use_batch_norm=True, model_name='explicit_mlp')

Learning rate: 3.61e-03 (log scale)
Model name: explicit_mlp (fixed)
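
The weights in the cell above are relative, not probabilities. Assuming they are normalized by their sum (which is what the "3x more likely" comment suggests), the sketch below works out the resulting probabilities and checks them empirically with the standard library's weighted sampler rather than SpaX.

```python
import random
from collections import Counter

choices = ["adam", "sgd", "rmsprop"]
weights = [3.0, 1.0, 1.0]

# Normalizing the weights gives the sampling probabilities.
total = sum(weights)
probabilities = {c: w / total for c, w in zip(choices, weights)}
print(probabilities)  # {'adam': 0.6, 'sgd': 0.2, 'rmsprop': 0.2}

# Empirical check with random.choices, which accepts relative weights.
rng = random.Random(0)
counts = Counter(rng.choices(choices, weights=weights, k=10_000))
print({c: counts[c] / 10_000 for c in choices})
```

So "adam" is expected in roughly 60% of samples. A single draw can still return "sgd", as the seed-42 sample above did.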


## 🎯 The Override System: Iterative Space Refinement

The **override system** is one of SpaX's most useful features: it lets you progressively narrow the search space based on experimental results, without modifying the config class itself.

**Common workflow:**
1. Start with a broad search space
2. Run initial experiments
3. Identify promising regions
4. Create an override to focus on those regions
5. Repeat until satisfied

**What you can override:**
- Numeric bounds: Narrow the range
- Categorical choices: Remove unpromising options
- Fix parameters: Lock to a specific value
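
As a rough illustration of steps 1 to 4, the sketch below runs a broad search over one parameter, keeps the best trials, and derives narrowed bounds from them. It is plain Python under stated assumptions: `toy_loss` is a hypothetical objective, and the resulting dictionary only imitates the `{"ge": ..., "le": ...}` shape of the override template.

```python
import math
import random


def toy_loss(lr: float) -> float:
    """Hypothetical objective whose minimum is at lr = 1e-3."""
    return (math.log10(lr) + 3.0) ** 2


rng = random.Random(0)

# Steps 1-2: sample broadly (log-uniform over [1e-5, 1e-1]) and evaluate.
trials = []
for _ in range(200):
    lr = 10 ** rng.uniform(-5, -1)
    trials.append((toy_loss(lr), lr))

# Step 3: keep the best 10% of trials (lowest loss first).
best_lrs = [lr for _, lr in sorted(trials)[:20]]

# Step 4: turn the promising region into narrowed bounds.
override = {"learning_rate": {"ge": min(best_lrs), "le": max(best_lrs)}}
print(override)
```

Taking the min and max of the top trials is only one simple heuristic. In practice you may want to pad the bounds slightly so the next round can still explore the edges of the promising region.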

Let's see it in action:

In [6]:
# Get the override template to see the structure
import json

print("📋 Override template for MLConfigExplicit:")
template = MLConfigExplicit.get_override_template()
print(json.dumps(template, indent=2))

📋 Override template for MLConfigExplicit:
{
  "learning_rate": {
    "ge": 1e-05,
    "le": 0.1
  },
  "hidden_dim": {
    "gt": 16,
    "lt": 4096
  },
  "optimizer": [
    "adam",
    "sgd",
    "rmsprop"
  ],
  "activation": [
    "relu",
    "gelu",
    "silu"
  ],
  "use_batch_norm": [
    "True",
    "False"
  ]
}


In [7]:
# Example 1: Narrow numeric ranges based on experiments
override_narrow = {
    "learning_rate": {"ge": 1e-4, "le": 1e-2},  # Focus on promising range
    "hidden_dim": {"ge": 128, "le": 512},        # Narrow from (16, 4096)
}

print("Example 1: Narrowing numeric ranges")
print("-" * 50)
config1 = MLConfigExplicit.random(seed=42, override=override_narrow)
print(f"Learning rate: {config1.learning_rate:.2e} (now in [1e-4, 1e-2])")
print(f"Hidden dim: {config1.hidden_dim} (now in [128, 512])")
print()

# Example 2: Remove unpromising categorical choices
override_categorical = {
    "optimizer": ["adam"],           # Only use adam (best performer)
    "activation": ["relu", "gelu"],  # Remove silu (worst performer)
}

print("Example 2: Removing categorical choices")
print("-" * 50)
config2 = MLConfigExplicit.random(seed=42, override=override_categorical)
print(f"Optimizer: {config2.optimizer} (always 'adam' now)")
print(f"Activation: {config2.activation} (only 'relu' or 'gelu')")
print()

# Example 3: Fix some parameters completely
override_fix = {
    "learning_rate": 0.001,           # Fix to best value found
    "optimizer": ["adam"],             # Fix optimizer
    "hidden_dim": {"ge": 256, "le": 512},  # Still explore this
}

print("Example 3: Mix of fixed and narrowed parameters")
print("-" * 50)
config3 = MLConfigExplicit.random(seed=42, override=override_fix)
print(f"Learning rate: {config3.learning_rate} (fixed)")
print(f"Optimizer: {config3.optimizer} (fixed)")
print(f"Hidden dim: {config3.hidden_dim} (still exploring [256, 512])")

Example 1: Narrowing numeric ranges
--------------------------------------------------
Learning rate: 1.90e-03 (now in [1e-4, 1e-2])
Hidden dim: 140 (now in [128, 512])

Example 2: Removing categorical choices
--------------------------------------------------
Optimizer: adam (always 'adam' now)
Activation: gelu (only 'relu' or 'gelu')

Example 3: Mix of fixed and narrowed parameters
--------------------------------------------------
Learning rate: 0.001 (fixed)
Optimizer: adam (fixed)
Hidden dim: 313 (still exploring [256, 512])


## 🚀 What's Next? More Power Awaits

You've learned the basics, but SpaX can do much more:

### Conditional Parameters
Parameters that depend on other parameters:
```python
class ConfigWithConditions(sp.Config):
    use_dropout: bool
    dropout_rate: float = sp.Conditional(
        sp.FieldCondition("use_dropout", sp.EqualsTo(True)),
        true=sp.Float(gt=0.0, lt=0.5),
        false=0.0  # Fixed when use_dropout=False
    )
```
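
The semantics of that conditional field can be imitated in a few lines of plain Python. This is only a sketch of the behavior described above; `sample_dropout` is a hypothetical function, not part of SpaX.

```python
import random


def sample_dropout(rng: random.Random) -> dict:
    """Hypothetical sampler that mimics the conditional field above."""
    use_dropout = rng.choice([True, False])
    if use_dropout:
        rate = rng.uniform(0.0, 0.5)
        while rate in (0.0, 0.5):  # redraw endpoints to respect gt/lt
            rate = rng.uniform(0.0, 0.5)
    else:
        rate = 0.0  # fixed value when the condition is false
    return {"use_dropout": use_dropout, "dropout_rate": rate}


rng = random.Random(0)
for _ in range(3):
    print(sample_dropout(rng))
```

The dependent parameter is sampled only after the parameter it depends on, so `dropout_rate` is searched only in the branch where `use_dropout` is true.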

### Nested Configurations
Compose complex configurations from smaller ones:
```python
class TrainingConfig(sp.Config):
    learning_rate: float = sp.Float(ge=1e-5, le=1e-1, distribution="log")
    
class ModelConfig(sp.Config):
    num_layers: int = sp.Int(ge=1, le=10)

class ExperimentConfig(sp.Config):
    training: TrainingConfig
    model: ModelConfig
```

### HPO Integration (Optuna)
```python
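import optuna

def objective(trial):
    # Build a config from the values suggested by this trial
    config = MyConfig.from_trial(trial)
    # train_and_evaluate is your own function; it returns the metric to optimize
    return train_and_evaluate(config)

study = optuna.create_study()  # minimizes the objective by default
study.optimize(objective, n_trials=100)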
```

### Serialization
Save and load configurations in multiple formats:
```python
# JSON round trip shown here; YAML and TOML are also supported
json_str = config.model_dump_json()
loaded = MyConfig.model_validate_json(json_str)
```
---

## 📝 Summary: What You've Learned

In this notebook, you learned how to use SpaX with minimal overhead:

### ✅ Core Concepts
1. **One-line migration**: `BaseModel` → `sp.Config`
2. **Automatic inference**: Type hints and `Field()` constraints become search spaces
3. **Random sampling**: `Config.random(seed=42)` for testing and exploration
4. **Explicit spaces**: Full control with log distributions, weighted choices, and more
5. **Override system**: Iteratively narrow search spaces based on results

### ✅ What's Possible (Advanced Features)
- **Conditional spaces**: Parameters whose space (or fixed value) depends on another parameter
- **Nested configurations**: Compose complex configs from smaller ones
- **HPO integration**: Direct Optuna/trial support
- **Multiple serialization formats**: JSON, YAML, TOML

### ✅ When to Use What
- **Automatic inference**: Quick start, simple constraints, prototyping
- **Explicit spaces**: Log distributions, weighted choices, precise control
- **Conditional spaces**: Parameters that depend on other parameters
- **Overrides**: After initial experiments to focus on promising regions

### 🎯 Key Takeaway
SpaX gives you powerful search space exploration capabilities with almost no code changes. Start with automatic inference, add explicit spaces where needed, use conditional spaces for dependencies, and leverage overrides to iteratively refine your search.

**Happy exploring! 🚀**