# SeedHash: Complete Tutorial

## Reproducible Random Seed Generation for Machine Learning

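Welcome to the tutorial for the **seedhash** library! This notebook covers the core `SeedHashGenerator` workflow: creating seeds, seeding frameworks, and applying them to data splitting and model training. Hierarchical sampling and the advanced ML paradigms are covered in the follow-up notebooks.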

### What is SeedHash?

SeedHash generates **deterministic random seeds from string inputs** using MD5 hashing, making your experiments reproducible across:
- Python's `random` module
- NumPy
- PyTorch
- TensorFlow
- And more!

### Key Features

✅ **String-to-Seed Conversion**: Turn experiment names into seeds  
✅ **Cross-Framework Support**: Seeds Python, NumPy, PyTorch, TensorFlow  
✅ **Hierarchical Sampling**: Generate master → seeds → sub-seeds  
✅ **ML Experiment Tracking**: Track experiments with DataFrame output  
✅ **Advanced Paradigms**: Semi-supervised, RL, Federated Learning support

---

## 1. Installation

First, let's install the seedhash library. You can install it directly from GitHub:

In [None]:
# Install seedhash with all dependencies
# !pip install "git+https://github.com/melhzy/seedhash.git#subdirectory=Python[all]"

# For this tutorial, we'll add the Python directory to the path
import sys
sys.path.insert(0, '../Python')

# Import seedhash
from seedhash import SeedHashGenerator, SeedExperimentManager, MLMetrics

# Import other libraries
import random
import numpy as np

print("✅ SeedHash imported successfully!")
print("📦 Available classes: SeedHashGenerator, SeedExperimentManager, MLMetrics")

## 2. Basic Usage: Creating a SeedHashGenerator

The core of seedhash is the `SeedHashGenerator` class. It converts any string into a deterministic integer seed.

In [None]:
# Create a SeedHashGenerator with an experiment name
gen = SeedHashGenerator("my_first_experiment")

print(f"Input string: {gen.input_string}")
print(f"Generated seed: {gen.seed_number}")
print(f"MD5 hash: {gen.get_hash()}")
print(f"\n{gen}")

In [None]:
# Verify the seed generation
print(f"Generated seed: {gen.seed_number}")
print(f"Seed type: {type(gen.seed_number)}")
print(f"Seed range: 0 to {2**31 - 1}")

# The same identifier always produces the same seed!
gen_copy = SeedHashGenerator("my_first_experiment")  # same identifier as `gen` above
print(f"\nVerification: {gen.seed_number == gen_copy.seed_number}")

## 3. Reproducibility in Action 🔄

The **most powerful feature** of SeedHash is guaranteed reproducibility. The same identifier always produces the same seed, regardless of when or where you run it.

In [None]:
# Demonstrate reproducibility
identifier = "data_split_v1"

# First run
gen1 = SeedHashGenerator(identifier)
random.seed(gen1.seed_number)
sample1 = random.sample(range(100), 10)

# Second run (completely separate)
gen2 = SeedHashGenerator(identifier)
random.seed(gen2.seed_number)
sample2 = random.sample(range(100), 10)

# Results are identical!
print(f"First sample:  {sample1}")
print(f"Second sample: {sample2}")
print(f"Are they equal? {sample1 == sample2} ✅")
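
Why hash with MD5 rather than Python's built-in `hash()`? Since Python 3.3, string hashes are salted per process unless `PYTHONHASHSEED` is fixed, so `hash("data_split_v1")` generally differs between runs, while an MD5 digest does not. The sketch below illustrates this with the standard library only. `md5_seed` is a hypothetical helper written for this example, not part of the seedhash API, and its modulo reduction is only an assumption about how a digest might be folded into a 31-bit seed.

```python
import hashlib
import subprocess
import sys

def md5_seed(identifier: str) -> int:
    """Hypothetical helper: fold an MD5 digest into the range [0, 2**31 - 1]."""
    digest = hashlib.md5(identifier.encode("utf-8")).hexdigest()
    return int(digest, 16) % (2**31)

# Script run in a child interpreter: prints the built-in hash, then the MD5-derived seed.
CHILD_SCRIPT = (
    "import hashlib, sys\n"
    "s = sys.argv[1]\n"
    "print(hash(s))\n"
    "print(int(hashlib.md5(s.encode('utf-8')).hexdigest(), 16) % (2**31))\n"
)

def hashes_in_fresh_process(identifier: str) -> tuple:
    """Return (built-in hash, MD5 seed) as computed by a brand-new Python process."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, identifier],
        capture_output=True, text=True, check=True,
    )
    builtin_hash, seed = result.stdout.split()
    return int(builtin_hash), int(seed)

identifier = "data_split_v1"
child_builtin_hash, child_seed = hashes_in_fresh_process(identifier)

print(f"MD5 seed, this process:  {md5_seed(identifier)}")
print(f"MD5 seed, child process: {child_seed}")
print(f"hash(), this process:    {hash(identifier)}")
print(f"hash(), child process:   {child_builtin_hash}")
```

The two MD5-derived values always agree. The two `hash()` values usually differ, unless `PYTHONHASHSEED` is set in the environment.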

## 4. Seeding Different Frameworks 🧠

SeedHash seamlessly integrates with popular Python frameworks. Let's seed each one!

In [None]:
### Python Random Module
gen = SeedHashGenerator("python_random_exp")
gen.seed_python()  # Seeds random module
print(f"Python seeded with: {gen.seed_number}")
print(f"Random number: {random.random()}")

### NumPy
gen = SeedHashGenerator("numpy_exp")
gen.seed_numpy()  # Seeds numpy.random
print(f"\nNumPy seeded with: {gen.seed_number}")
print(f"Random array: {np.random.rand(3)}")

### PyTorch (if installed)
try:
    import torch
    gen = SeedHashGenerator("pytorch_exp")
    gen.seed_torch()  # Seeds PyTorch
    print(f"\nPyTorch seeded with: {gen.seed_number}")
    print(f"Random tensor: {torch.rand(3)}")
except ImportError:
    print("\n⚠️ PyTorch not installed (optional)")

### TensorFlow (if installed)
try:
    import tensorflow as tf
    gen = SeedHashGenerator("tensorflow_exp")
    gen.seed_tensorflow()  # Seeds TensorFlow
    print(f"\nTensorFlow seeded with: {gen.seed_number}")
    print(f"Random tensor: {tf.random.uniform([3])}")
except ImportError:
    print("⚠️ TensorFlow not installed (optional)")

## 5. Deterministic Mode 🔒

For **maximum reproducibility**, use `set_deterministic_mode()` to configure all frameworks at once!

In [None]:
gen = SeedHashGenerator("deterministic_exp")

# Seed ALL frameworks at once!
gen.set_deterministic_mode()

print("✅ All frameworks seeded:")
print(f"  - Python random: {gen.seed_number}")
print(f"  - NumPy: {gen.seed_number}")

try:
    import torch
    print(f"  - PyTorch: {gen.seed_number}")
    print(f"  - CUDA deterministic: ON")
except ImportError:
    pass

try:
    import tensorflow as tf
    print(f"  - TensorFlow: {gen.seed_number}")
except ImportError:
    pass

# Now all random operations are reproducible!
print(f"\nPython random: {random.random()}")
print(f"NumPy random: {np.random.rand()}")
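
A helper such as `set_deterministic_mode()` typically wraps a handful of per-framework calls. The sketch below shows what that might look like using only documented framework APIs. `seed_everything` is a hypothetical function for illustration, and whether seedhash performs exactly these steps is an assumption, not something this notebook confirms.

```python
import random

def seed_everything(seed: int) -> list:
    """Hypothetical helper: seed every installed framework and return their names."""
    seeded = []

    random.seed(seed)
    seeded.append("random")

    try:
        import numpy as np
        np.random.seed(seed)  # seeds NumPy's legacy global RNG
        seeded.append("numpy")
    except ImportError:
        pass

    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
        torch.backends.cudnn.deterministic = True  # prefer deterministic cuDNN kernels
        torch.backends.cudnn.benchmark = False     # disable non-deterministic autotuning
        seeded.append("torch")
    except ImportError:
        pass

    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
        seeded.append("tensorflow")
    except ImportError:
        pass

    return seeded

frameworks = seed_everything(1234)
first_draw = random.random()
seed_everything(1234)
second_draw = random.random()

print(f"Seeded: {frameworks}")
print(f"Same draw after reseeding? {first_draw == second_draw}")
```

Frameworks that are not installed are skipped, so the same script runs on machines with different optional dependencies.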

## 6. Generating Multiple Seeds 🎲

Need multiple random seeds? Use `generate_random_seeds()`:

In [None]:
gen = SeedHashGenerator("multi_seed_exp")

# Generate 5 random seeds
seeds = gen.generate_random_seeds(n=5)
print("Generated seeds:")
for i, seed in enumerate(seeds, 1):
    print(f"  Seed {i}: {seed}")

# These seeds are still reproducible!
gen2 = SeedHashGenerator("multi_seed_exp")
seeds2 = gen2.generate_random_seeds(n=5)
print(f"\nReproducible? {seeds == seeds2} ✅")
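
One plausible way to obtain several reproducible seeds from a single master seed is to seed a private `random.Random` instance and draw from it. The sketch below takes that approach. `derive_seeds` and its parameters are hypothetical, and seedhash's own `generate_random_seeds()` may work differently.

```python
import random

def derive_seeds(master_seed: int, n: int, max_seed: int = 2**31 - 1) -> list:
    """Hypothetical helper: draw n child seeds from a private RNG seeded with master_seed."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    rng = random.Random(master_seed)  # private RNG, so the global random state is untouched
    return [rng.randint(0, max_seed) for _ in range(n)]

seeds_a = derive_seeds(42, n=5)
seeds_b = derive_seeds(42, n=5)

print(seeds_a == seeds_b)                     # True: same master seed, same child seeds
print(derive_seeds(42, n=3) == seeds_a[:3])   # True: a shorter request is a prefix
```

Because the child seeds are drawn in order, asking for more seeds later extends the list without changing the earlier ones.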

## 7. Custom Ranges 🎯

Generate seeds within specific ranges using `min_seed` and `max_seed`:

In [None]:
# Small range (0-999)
gen = SeedHashGenerator("small_range", min_seed=0, max_seed=999)
print(f"Small range seed: {gen.seed_number} (0-999)")

# Custom range (1000-9999)
gen = SeedHashGenerator("custom_range", min_seed=1000, max_seed=9999)
print(f"Custom range seed: {gen.seed_number} (1000-9999)")

# Large range (default)
gen = SeedHashGenerator("large_range")
print(f"Default range seed: {gen.seed_number} (0-{2**31-1})")
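
Mapping a hash into `[min_seed, max_seed]` can be done with a modulo over the size of the range. The following is a minimal sketch of that idea, not seedhash's confirmed implementation. The `seed_in_range` helper is hypothetical.

```python
import hashlib

def seed_in_range(identifier: str, min_seed: int = 0, max_seed: int = 2**31 - 1) -> int:
    """Hypothetical helper: map an identifier to an integer in [min_seed, max_seed]."""
    if not isinstance(identifier, str):
        raise TypeError("identifier must be a string")
    if min_seed > max_seed:
        raise ValueError("min_seed must not exceed max_seed")
    hash_int = int(hashlib.md5(identifier.encode("utf-8")).hexdigest(), 16)
    span = max_seed - min_seed + 1        # number of admissible seeds
    return min_seed + hash_int % span     # the offset keeps the result inside the range

for name, low, high in [("small_range", 0, 999), ("custom_range", 1000, 9999)]:
    seed = seed_in_range(name, low, high)
    print(f"{name}: {seed} (inside [{low}, {high}]: {low <= seed <= high})")
```

The modulo introduces a slight bias when the range size does not divide 2**128, but for a 128-bit digest and ranges of this size the effect is negligible.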

## 8. Practical Example: Data Splitting 📊

Let's use SeedHash for a real-world task: splitting data reproducibly.

In [None]:
# Create synthetic data
X = np.random.randn(100, 5)
y = np.random.randint(0, 2, 100)

# Split data reproducibly
gen = SeedHashGenerator("data_split_v1")
gen.seed_numpy()

# Shuffle indices
indices = np.arange(len(X))
np.random.shuffle(indices)

# Split 80/20
split_point = int(0.8 * len(indices))
train_idx = indices[:split_point]
test_idx = indices[split_point:]

print(f"Training samples: {len(train_idx)}")
print(f"Test samples: {len(test_idx)}")
print(f"\nFirst 5 train indices: {sorted(train_idx)[:5]}")
print(f"First 5 test indices: {sorted(test_idx)[:5]}")

# Run again - same split!
gen2 = SeedHashGenerator("data_split_v1")
gen2.seed_numpy()
indices2 = np.arange(len(X))
np.random.shuffle(indices2)
train_idx2 = indices2[:split_point]

print(f"\nSame split? {np.array_equal(train_idx, train_idx2)} ✅")
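
The split above relies on NumPy's global random state, so any other code that draws random numbers between seeding and shuffling would change the result. A variant that keeps the generator private avoids this. The sketch below uses the standard library's `random.Random`. NumPy offers the same pattern through `np.random.default_rng(seed)`. `split_indices` is a hypothetical helper, and the integer seed stands in for a value such as `gen.seed_number`.

```python
import random

def split_indices(n_samples: int, seed: int, train_fraction: float = 0.8):
    """Shuffle range(n_samples) with a private RNG and split it into train/test lists."""
    rng = random.Random(seed)   # private generator; the global random state is untouched
    indices = list(range(n_samples))
    rng.shuffle(indices)
    split_point = int(train_fraction * n_samples)
    return indices[:split_point], indices[split_point:]

train_a, test_a = split_indices(100, seed=2024)

# Unrelated use of the global RNG in between does not affect the next split.
random.seed(0)
random.random()

train_b, test_b = split_indices(100, seed=2024)

print(len(train_a), len(test_a))   # 80 20
print(train_a == train_b)          # True
```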

## 9. Practical Example: Model Training 🤖

Train a simple model with reproducible initialization:

In [None]:
# Simulate training with NumPy
def train_model(seed_id):
    """Simple model training simulation"""
    gen = SeedHashGenerator(seed_id)
    gen.seed_numpy()
    
    # Initialize random weights
    weights = np.random.randn(10)
    bias = np.random.randn()
    
    # Simulate training
    for epoch in range(5):
        loss = np.random.rand()  # Simulated loss
    
    return weights, bias, loss

# Train model 1
weights1, bias1, loss1 = train_model("model_v1")
print(f"Model 1 - Loss: {loss1:.4f}")
print(f"First 3 weights: {weights1[:3]}")

# Train model 2 (same seed)
weights2, bias2, loss2 = train_model("model_v1")
print(f"\nModel 2 - Loss: {loss2:.4f}")
print(f"First 3 weights: {weights2[:3]}")

# Verify reproducibility
print(f"\nSame weights? {np.allclose(weights1, weights2)} ✅")
print(f"Same loss? {loss1 == loss2} ✅")

## 10. Understanding MD5 Hashing 🔐

SeedHash uses MD5 hashing to convert identifiers to seeds. Let's see how!

In [None]:
gen = SeedHashGenerator("my_experiment")

# The full MD5 hash (hexadecimal)
print(f"MD5 hash: {gen.md5_hash}")
print(f"Hash length: {len(gen.md5_hash)} characters")

# Converted to integer seed
print(f"\nInteger seed: {gen.seed_number}")
print(f"Seed range: [0, {2**31 - 1}]")

# Different identifiers → Different seeds
identifiers = ["exp_1", "exp_2", "exp_3"]
for identifier in identifiers:
    gen = SeedHashGenerator(identifier)
    print(f"\n'{identifier}':")
    print(f"  MD5: {gen.md5_hash[:16]}...")
    print(f"  Seed: {gen.seed_number}")
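
The steps from identifier to seed can be reproduced with `hashlib` alone. This sketch assumes the seed is the full 128-bit digest reduced modulo 2**31. That reduction is a guess made for illustration, so the numbers it prints need not match `gen.seed_number`.

```python
import hashlib

identifier = "my_experiment"

# Step 1: encode the string and hash it. MD5 always yields 16 bytes (32 hex characters).
hex_digest = hashlib.md5(identifier.encode("utf-8")).hexdigest()

# Step 2: interpret the hex digest as one large integer (at most 128 bits).
hash_int = int(hex_digest, 16)

# Step 3 (assumed): reduce it to the default seed range [0, 2**31 - 1].
seed = hash_int % (2**31)

print(f"hex digest: {hex_digest} ({len(hex_digest)} chars)")
print(f"as integer: {hash_int}")
print(f"as seed:    {seed}")
```

MD5 is no longer considered safe for security purposes, but here it serves only as a stable, well-distributed mapping from strings to integers.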

## 11. Error Handling ⚠️

SeedHash validates inputs and provides helpful error messages:

In [None]:
# Test invalid range
try:
    gen = SeedHashGenerator("test", min_seed=100, max_seed=10)
except ValueError as e:
    print(f"❌ Error: {e}")

# Test invalid type
try:
    gen = SeedHashGenerator(12345)  # Must be string
except TypeError as e:
    print(f"❌ Error: {e}")

# Test invalid n for multiple seeds
try:
    gen = SeedHashGenerator("test")
    gen.generate_random_seeds(n=-5)
except ValueError as e:
    print(f"❌ Error: {e}")

print("\n✅ All error handling works correctly!")

## 12. Best Practices 💡

### ✅ DO:
- Use descriptive identifiers: `"data_split_v1"`, `"model_training_2024"`
- Version your experiments: `"exp_v1"`, `"exp_v2"`, etc.
- Call `set_deterministic_mode()` at the start of your script
- Document your seed identifiers in code comments

### ❌ DON'T:
- Use random or timestamp-based identifiers
- Change identifiers mid-experiment
- Mix manual seeding with SeedHash seeding
- Forget to seed frameworks before random operations

### Example: Good Practice
```python
# At the start of your script
from seedhash import SeedHashGenerator

# Version your experiment
EXPERIMENT_VERSION = "v1.0"
gen = SeedHashGenerator(f"my_project_{EXPERIMENT_VERSION}")
gen.set_deterministic_mode()

# Now all random operations are reproducible!
```

## 13. Next Steps 🚀

Congratulations! You now know the basics of SeedHash. Ready for more?

### 📚 Continue Learning:

1. **Hierarchical Sampling** (`02_Hierarchical_Sampling.ipynb`)
   - Learn about `SeedExperimentManager`
   - Master hierarchical seed generation
   - Track ML experiments systematically

2. **Advanced ML Paradigms** (`03_Advanced_ML_Paradigms.ipynb`)
   - Semi-supervised learning metrics
   - Reinforcement learning tracking
   - Federated learning evaluation

### 📖 Additional Resources:

- **GitHub**: https://github.com/melhzy/seedhash
- **Python Examples**: `../Python/examples/`
- **Full Documentation**: `../Python/README.md`
- **R Package**: `../R/` (if you use R)

---

## 🎉 Summary

You learned how to:
- ✅ Generate reproducible seeds with `SeedHashGenerator`
- ✅ Seed Python, NumPy, PyTorch, TensorFlow
- ✅ Use `set_deterministic_mode()` for maximum reproducibility
- ✅ Generate multiple seeds with `generate_random_seeds()`
- ✅ Apply SeedHash to data splitting and model training
- ✅ Understand MD5 hashing and error handling
- ✅ Follow best practices for experiment reproducibility

**Happy experimenting! 🔬**