# 📊 Hierarchical Sampling & Experiment Management

## Advanced SeedHash Tutorial #2

This notebook covers:
- `SeedExperimentManager` for systematic experimentation
- Hierarchical seed generation (master → seeds → sub-seeds)
- 4 sampling methods: simple, stratified, cluster, systematic
- ML experiment tracking with metrics
- DataFrame export and analysis

**Prerequisites**: Complete Tutorial #1 first

---

## Installation

If you haven't installed seedhash yet, run:

In [None]:
# Install seedhash with all dependencies (uncomment if needed)
# !pip install "git+https://github.com/melhzy/seedhash.git#egg=seedhash[all]&subdirectory=Python"

In [1]:
# Setup and imports
import sys
sys.path.insert(0, '../Python')

import numpy as np
import pandas as pd
from seedhash import SeedExperimentManager

print("✅ All imports successful!")

✅ All imports successful!


## 1. Introduction to SeedExperimentManager 🎯

The `SeedExperimentManager` manages multiple experiments with hierarchical seeds.

In [3]:
# Create an experiment manager
manager = SeedExperimentManager("my_ml_project")

print(f"Project: {manager.experiment_name}")
print(f"Master seed: {manager.master_seed}")
print(f"Experiments tracked: {len(manager.results)}")

Project: my_ml_project
Master seed: 3002848522
Experiments tracked: 0
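
The output above shows a master seed produced from nothing but the project name. One plausible way to get such a mapping is to hash the name and reduce the digest to a 32-bit integer. The sketch below assumes that kind of scheme; `seed_from_name` is a hypothetical helper, the choice of SHA-256 is an assumption, and this tutorial does not show which algorithm seedhash actually uses, so the values will not match the master seeds printed in this notebook.

```python
import hashlib

def seed_from_name(name, modulus=2**32):
    """Map a string to a deterministic integer seed in [0, modulus).

    Hypothetical helper for illustration only; not the seedhash algorithm.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a big-endian integer.
    return int.from_bytes(digest[:8], "big") % modulus

seed_a = seed_from_name("my_ml_project")
seed_b = seed_from_name("my_ml_project")
seed_c = seed_from_name("ml_tracking_demo")
print(seed_a == seed_b)  # the same name always yields the same seed
```

The useful property is that the seed depends only on the name, so a collaborator who uses the same project name gets the same starting point without sharing any state.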


## 2. Hierarchical Seed Generation 🌳

Generate a three-level hierarchy: master → seeds → sub-seeds. Each level is derived from the one above it.

In [4]:
# Generate hierarchical seeds
hierarchy = manager.generate_seed_hierarchy(
    n_seeds=3,
    n_sub_seeds=2,
    max_depth=2
)

print(f"Master seed: {hierarchy[0][0]}\n")
print(f"Level 0 (master): {hierarchy[0]}")
print(f"Level 1 (seeds): {hierarchy[1]}")
print(f"Level 2 (sub-seeds): {hierarchy[2]}")

print(f"\nTotal: 1 master → {len(hierarchy[1])} seeds → {len(hierarchy[2])} sub-seeds")

Master seed: 3002848522

Level 0 (master): [3002848522]
Level 1 (seeds): [5644790, 1216290993, 1629135287]
Level 2 (sub-seeds): [1240067319, 370689488, 732425434, 2111485999, 2141033051, 862800348]

Total: 1 master → 3 seeds → 6 sub-seeds
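
To make the master, seeds, sub-seeds structure concrete, here is a minimal sketch of one way such a hierarchy could be derived with the standard library. `derive_children` and `build_hierarchy` are hypothetical names, and the derivation rule (seed a `random.Random` with the parent, then draw integers) is an assumption. It is not the seedhash implementation, so the values it produces will differ from the output above.

```python
import random

def derive_children(parent_seed, n_children, max_value=2**31 - 1):
    """Derive n_children integer seeds deterministically from parent_seed."""
    rng = random.Random(parent_seed)
    return [rng.randint(0, max_value) for _ in range(n_children)]

def build_hierarchy(master_seed, n_seeds, n_sub_seeds):
    """Return {level: [seeds]} for level 0 (master), 1 (seeds), 2 (sub-seeds)."""
    level1 = derive_children(master_seed, n_seeds)
    # Every level-1 seed produces its own block of sub-seeds.
    level2 = [sub for seed in level1 for sub in derive_children(seed, n_sub_seeds)]
    return {0: [master_seed], 1: level1, 2: level2}

demo = build_hierarchy(3002848522, n_seeds=3, n_sub_seeds=2)
print(f"{len(demo[1])} seeds, {len(demo[2])} sub-seeds")
```

Because every child depends only on its parent, rerunning with the same master seed rebuilds the whole tree, and a single branch can be regenerated without touching the others.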


## 3. Simple Random Sampling 🎲

In [5]:
# Simple random sampling
samples = manager.simple_random_sample(
    population_size=1000,
    sample_size=100,
    seed=12345
)

print(f"Sample size: {len(samples)}")
print(f"First 10: {sorted(samples)[:10]}")

AttributeError: 'SeedExperimentManager' object has no attribute 'simple_random_sample'
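
The call above fails in the run recorded here, so no sample is shown. As a stand-in, the following is a minimal sketch of simple random sampling using only the standard library. The function is a hypothetical helper that borrows the parameter names from the cell above; it is not a seedhash method.

```python
import random

def simple_random_sample(population_size, sample_size, seed):
    """Draw sample_size distinct indices from range(population_size)."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so indices never repeat.
    return rng.sample(range(population_size), sample_size)

samples = simple_random_sample(population_size=1000, sample_size=100, seed=12345)
print(f"Sample size: {len(samples)}")
print(f"First 10: {sorted(samples)[:10]}")
```

Passing the same `seed` returns the same indices on every call, which is the property the rest of this tutorial relies on.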

## 4. Stratified Random Sampling 📊

In [6]:
# Stratified sampling ensures balanced coverage
samples = manager.stratified_random_sample(
    population_size=1000,
    sample_size=100,
    n_strata=10,
    seed=12345
)

print(f"Stratified sample size: {len(samples)}")
print(f"Expected per stratum: ~{100//10}")

AttributeError: 'SeedExperimentManager' object has no attribute 'stratified_random_sample'
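
This call also fails in the recorded run. The idea can still be illustrated with a short standard-library sketch: split the index range into equal contiguous strata and draw the same number of indices from each. The helper below is hypothetical and reuses the parameter names from the cell above; the choice of contiguous, equal-width strata is an assumption, not something this tutorial confirms about seedhash.

```python
import random

def stratified_random_sample(population_size, sample_size, n_strata, seed):
    """Draw an equal share of indices from each contiguous stratum."""
    rng = random.Random(seed)
    per_stratum = sample_size // n_strata  # any remainder is dropped
    # Stratum i covers indices bounds[i] .. bounds[i + 1] - 1.
    bounds = [population_size * i // n_strata for i in range(n_strata + 1)]
    samples = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        samples.extend(rng.sample(range(lo, hi), per_stratum))
    return samples

strat = stratified_random_sample(population_size=1000, sample_size=100,
                                 n_strata=10, seed=12345)
# Count how many sampled indices landed in each block of 100.
counts = [sum(1 for i in strat if i // 100 == s) for s in range(10)]
print(f"Stratified sample size: {len(strat)}")
print(f"Per-stratum counts: {counts}")
```

Unlike simple random sampling, every stratum is guaranteed to contribute, which is what "balanced coverage" means in the comment above.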

## 5. ML Experiment Tracking 📈

In [7]:
# Track ML experiments
tracker = SeedExperimentManager("ml_tracking_demo")

hierarchy = tracker.generate_seed_hierarchy(n_seeds=5, n_sub_seeds=2, max_depth=2)

# Simulate experiments
for seed in hierarchy[1][:3]:
    rmse = 5.0 + np.random.rand()
    r2 = 0.95 + np.random.rand() * 0.04
    
    tracker.add_experiment_result(
        seed=seed,
        ml_task="regression",
        metrics={"rmse": rmse, "r2": r2, "mae": rmse * 0.8},
        sampling_method="simple",
        metadata={"model": "linear_regression", "n_samples": 100}
    )
    print(f"Tracked seed {seed}: RMSE={rmse:.3f}, R²={r2:.3f}")

print(f"\n✓ Tracked {len(tracker.results)} experiments")

Tracked seed 1709912838: RMSE=5.469, R²=0.969
Tracked seed 374373719: RMSE=5.230, R²=0.985
Tracked seed 1166838137: RMSE=5.365, R²=0.987

✓ Tracked 3 experiments
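
Note that the simulated metrics in the cell above come from the global `np.random` state, so they change between runs even though the seeds themselves are fixed. A minimal sketch of tying each simulated metric to its experiment seed follows. `simulate_regression_metrics` is a hypothetical helper, and the standard-library `random.Random` is used here only to keep the example dependency-free.

```python
import random

def simulate_regression_metrics(seed):
    """Return simulated regression metrics that depend only on seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    rmse = 5.0 + rng.random()
    r2 = 0.95 + rng.random() * 0.04
    return {"rmse": rmse, "r2": r2, "mae": rmse * 0.8}

first_run = simulate_regression_metrics(1709912838)
second_run = simulate_regression_metrics(1709912838)
print(first_run == second_run)  # identical metrics for the same seed
```

The returned dictionary has the same keys as the `metrics` argument in the cell above, so it could be passed to `add_experiment_result` unchanged.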


## 6. DataFrame Export & Analysis 📊

In [8]:
# Export to DataFrame
df = tracker.get_results_dataframe()

print("DataFrame Preview:")
print(df.head())

print(f"\nShape: {df.shape}")
print(f"\nMetric Statistics:")
print(df[['metric_rmse', 'metric_r2', 'metric_mae']].describe())

# Export to a CSV file
df.to_csv('experiment_results.csv', index=False)
print("\n✓ Exported to CSV!")

DataFrame Preview:
                                experiment_id  seed_level  master_seed  \
0  ml_tracking_demo_regression_seed1709912838           1   3662338350   
1   ml_tracking_demo_regression_seed374373719           1   3662338350   
2  ml_tracking_demo_regression_seed1166838137           1   3662338350   

         seed sub_seed  current_seed sampling_method     ml_task  metric_mae  \
0  1709912838     None    1709912838          simple  regression    4.374855   
1   374373719     None     374373719          simple  regression    4.184014   
2  1166838137     None    1166838137          simple  regression    4.291601   

   metric_r2  metric_rmse         meta_model  meta_n_samples  \
0   0.968750     5.468568  linear_regression             100   
1   0.985075     5.230018  linear_regression             100   
2   0.986842     5.364501  linear_regression             100   

                    timestamp  
0  2025-11-02T23:37:29.213248  
1  2025-11-02T23:37:29.214630  
2  2025-11

## 7. Complete Example: Multi-Method Study 🔬

In [9]:
# Compare all sampling methods
study = SeedExperimentManager("complete_study")

for method in ["simple", "stratified", "cluster", "systematic"]:
    hierarchy = study.generate_seed_hierarchy(
        n_seeds=3,
        n_sub_seeds=2,
        max_depth=2,
        sampling_method=method
    )
    
    for seed in hierarchy[1]:
        accuracy = 0.80 + np.random.rand() * 0.15
        f1 = accuracy * (0.95 + np.random.rand() * 0.05)
        
        study.add_experiment_result(
            seed=seed,
            ml_task="classification",
            metrics={"accuracy": accuracy, "f1": f1},
            sampling_method=method
        )
    
    print(f"✓ Completed {method} sampling")

df = study.get_results_dataframe()
print(f"\nTotal experiments: {len(df)}")
print("\nAccuracy by method:")
print(df.groupby('sampling_method')['metric_accuracy'].agg(['mean', 'std']).round(3))

✓ Completed simple sampling
✓ Completed stratified sampling
✓ Completed cluster sampling
✓ Completed systematic sampling

Total experiments: 12

Accuracy by method:
                  mean    std
sampling_method              
cluster          0.847  0.061
simple           0.889  0.033
stratified       0.918  0.044
systematic       0.854  0.039
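
The loop above names cluster and systematic sampling without showing what they select. The sketch below gives textbook versions of both over a range of indices, using only the standard library. Both functions and their parameters are hypothetical, and how seedhash applies these methods inside `generate_seed_hierarchy` is not shown in this tutorial.

```python
import random

def systematic_sample(population_size, sample_size, seed):
    """Pick a random start, then take every step-th index."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    step = population_size // sample_size
    start = random.Random(seed).randrange(step)
    return [start + i * step for i in range(sample_size)]

def cluster_sample(population_size, n_clusters, n_selected, seed):
    """Pick whole contiguous clusters at random and keep all their members."""
    rng = random.Random(seed)
    # Cluster c covers indices bounds[c] .. bounds[c + 1] - 1.
    bounds = [population_size * i // n_clusters for i in range(n_clusters + 1)]
    chosen = sorted(rng.sample(range(n_clusters), n_selected))
    return [i for c in chosen for i in range(bounds[c], bounds[c + 1])]

sys_idx = systematic_sample(population_size=1000, sample_size=100, seed=12345)
clu_idx = cluster_sample(population_size=1000, n_clusters=10, n_selected=2, seed=12345)
print(f"Systematic: {len(sys_idx)} indices, spacing {sys_idx[1] - sys_idx[0]}")
print(f"Cluster: {len(clu_idx)} indices from 2 of 10 clusters")
```

Systematic sampling spreads the sample evenly with a single random draw, while cluster sampling concentrates it in a few randomly chosen blocks.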


## Summary 🎉

You learned:
- ✅ SeedExperimentManager for systematic experiments
- ✅ Hierarchical seed generation
- ✅ 4 sampling methods
- ✅ ML experiment tracking
- ✅ DataFrame export

**Next**: Try Tutorial #3 for advanced ML paradigms!