# 🌟 SeedHash: Complete Tutorial

## Reproducible Random Seed Generation for Machine Learning

Welcome to the comprehensive tutorial for the **seedhash** library!

### What is SeedHash?

SeedHash generates **deterministic random seeds from string inputs** using MD5 hashing, making your experiments reproducible across:
- Python's `random` module
- NumPy
- PyTorch
- TensorFlow
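
The exact hash-to-integer mapping is not spelled out in this tutorial, so the following is only a minimal sketch of the general idea, using the standard library's `hashlib`. The function name `string_to_seed` and the modulo reduction are assumptions for illustration; the value it returns need not match `SeedHashGenerator.seed_number`.

```python
import hashlib

def string_to_seed(identifier: str, modulus: int = 2**31) -> int:
    """Hypothetical helper: map a string to an integer in [0, modulus - 1]."""
    # MD5 always returns the same 128-bit digest for the same input bytes.
    digest = hashlib.md5(identifier.encode("utf-8")).hexdigest()
    # Interpret the 32 hex characters as one large integer, then reduce it.
    return int(digest, 16) % modulus

seed_a = string_to_seed("my_experiment")
seed_b = string_to_seed("my_experiment")
print(seed_a)
print(seed_a == seed_b)  # True: same string, same seed
```

Because the digest depends only on the input bytes, the result is stable across machines, processes, and Python versions, unlike the built-in `hash()`, which is salted per process for strings.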

### Key Features

✅ **String-to-Seed Conversion**: Turn experiment names into seeds  
✅ **Cross-Framework Support**: Seeds Python, NumPy, PyTorch, TensorFlow  
✅ **Hierarchical Sampling**: Generate master → seeds → sub-seeds  
✅ **ML Experiment Tracking**: Track experiments with DataFrame output  
✅ **Advanced Paradigms**: Semi-supervised, RL, Federated Learning support

---

## 1. Installation 📦

Install seedhash directly from GitHub:

In [None]:
# Install seedhash with all dependencies
# !pip install "seedhash[all] @ git+https://github.com/melhzy/seedhash.git#subdirectory=Python"

# For local development, add Python directory to path
import sys
sys.path.insert(0, '../Python')

# Import seedhash
from seedhash import SeedHashGenerator

# Import other libraries
import random
import numpy as np

print("✅ SeedHash imported successfully!")

## 2. Basic Usage 🚀

Create a `SeedHashGenerator` with an experiment name:

In [None]:
# Create a generator with an experiment identifier
gen = SeedHashGenerator("my_experiment")

print(f"Generated seed: {gen.seed_number}")
print(f"Seed type: {type(gen.seed_number)}")
print(f"Seed range: 0 to {2**31 - 1}")

# The same identifier always produces the same seed!
gen_copy = SeedHashGenerator("my_experiment")
print(f"\nVerification: {gen.seed_number == gen_copy.seed_number}")

## 3. Reproducibility in Action 🔄

The **most powerful feature**: same identifier → same seed, always!

In [None]:
# Demonstrate reproducibility
identifier = "data_split_v1"

# First run
gen1 = SeedHashGenerator(identifier)
random.seed(gen1.seed_number)
sample1 = random.sample(range(100), 10)

# Second run (completely separate)
gen2 = SeedHashGenerator(identifier)
random.seed(gen2.seed_number)
sample2 = random.sample(range(100), 10)

# Results are identical!
print(f"First sample:  {sample1}")
print(f"Second sample: {sample2}")
print(f"\nAre they equal? {sample1 == sample2} ✅")
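
Seeding the global `random` module works, but any other code that draws from the global generator between seeding and sampling shifts the results. One way to sidestep that, sketched here with only the standard library, is to give each task its own `random.Random` instance. The literal seed `12345` is a placeholder standing in for a value such as `gen.seed_number`, and `draw_sample` is a hypothetical helper.

```python
import random

SEED = 12345  # placeholder; in the tutorial this would come from gen.seed_number

def draw_sample(seed: int, k: int = 10) -> list:
    """Draw k distinct integers from range(100) with a private generator."""
    rng = random.Random(seed)  # independent of the global random state
    return rng.sample(range(100), k)

first = draw_sample(SEED)
random.random()  # unrelated global draw; does not affect the private generator
second = draw_sample(SEED)

print(first == second)  # True
```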

## 4. Seeding Different Frameworks 🧠

Pass a framework name to `set_seed()` to seed that framework with `gen.seed_number`:

In [None]:
### Python Random Module
gen = SeedHashGenerator("python_random_exp")
gen.set_seed("python")
print(f"Python seeded with: {gen.seed_number}")
print(f"Random number: {random.random()}")

### NumPy
gen = SeedHashGenerator("numpy_exp")
gen.set_seed("numpy")
print(f"\nNumPy seeded with: {gen.seed_number}")
print(f"Random array: {np.random.rand(3)}")

### PyTorch (if installed)
try:
    import torch
    gen = SeedHashGenerator("pytorch_exp")
    gen.set_seed("torch")
    print(f"\nPyTorch seeded with: {gen.seed_number}")
    print(f"Random tensor: {torch.rand(3)}")
except ImportError:
    print("\n⚠️ PyTorch not installed (optional)")

### TensorFlow (if installed)
try:
    import tensorflow as tf
    gen = SeedHashGenerator("tensorflow_exp")
    gen.set_seed("tensorflow")
    print(f"\nTensorFlow seeded with: {gen.seed_number}")
    print(f"Random tensor: {tf.random.uniform([3])}")
except ImportError:
    print("⚠️ TensorFlow not installed (optional)")

## 5. Deterministic Mode 🔒

For **maximum reproducibility**, call `seed_all(deterministic=True)` to seed every supported framework in one step. The call returns a per-framework status mapping.

In [None]:
gen = SeedHashGenerator("deterministic_exp")

# Seed ALL frameworks at once!
status = gen.seed_all(deterministic=True)

print("✅ All frameworks seeded:")
for framework, state in status.items():
    print(f"  - {framework}: {state}")

# Now all random operations are reproducible!
print(f"\nPython random: {random.random()}")
print(f"NumPy random: {np.random.rand()}")
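
What `seed_all()` does internally is not shown in this tutorial. As a rough sketch of the kind of work such a helper might do, the hypothetical function `seed_everything` below seeds each framework it can import and reports the outcome in a dictionary. The function name, the status strings, and the choice of determinism flags are assumptions, not the library's implementation; the framework calls themselves (`random.seed`, `np.random.seed`, `torch.manual_seed`, `tf.random.set_seed`) are the standard seeding entry points.

```python
import random

def seed_everything(seed: int, deterministic: bool = False) -> dict:
    """Hypothetical helper: seed every available framework, report status."""
    status = {}

    random.seed(seed)
    status["python"] = "seeded"

    try:
        import numpy as np
        np.random.seed(seed)  # legacy global NumPy generator
        status["numpy"] = "seeded"
    except ImportError:
        status["numpy"] = "not installed"

    try:
        import torch
        torch.manual_seed(seed)
        if deterministic:
            # Trade speed for repeatable cuDNN kernel selection.
            torch.backends.cudnn.deterministic = True
            torch.backends.cudnn.benchmark = False
        status["torch"] = "seeded"
    except ImportError:
        status["torch"] = "not installed"

    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
        status["tensorflow"] = "seeded"
    except ImportError:
        status["tensorflow"] = "not installed"

    return status

status = seed_everything(2024, deterministic=True)
print(status)
```

Reporting a status per framework, rather than failing on the first missing import, lets the same script run on machines where only some of the optional frameworks are installed.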

## 6. Generating Multiple Seeds 🎲

Need multiple random seeds? Use `generate_seeds()`:

In [None]:
gen = SeedHashGenerator("multi_seed_exp")

# Generate 5 random seeds
seeds = gen.generate_seeds(5)
print("Generated seeds:")
for i, seed in enumerate(seeds, 1):
    print(f"  Seed {i}: {seed}")

# These seeds are still reproducible!
gen2 = SeedHashGenerator("multi_seed_exp")
seeds2 = gen2.generate_seeds(5)
print(f"\nReproducible? {seeds == seeds2} ✅")
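
How `generate_seeds()` derives its values is not described here. One common pattern, shown below as a sketch rather than the library's method, is to feed the master seed into a private `random.Random` instance and draw `n` integers from it. The helper name `derive_seeds`, the placeholder master seed, and the default range are assumptions.

```python
import random

def derive_seeds(master_seed: int, n: int, max_value: int = 2**31 - 1) -> list:
    """Hypothetical helper: derive n child seeds from one master seed."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    rng = random.Random(master_seed)  # private generator, global state untouched
    return [rng.randint(0, max_value) for _ in range(n)]

MASTER_SEED = 987654321  # placeholder for a value such as gen.seed_number

seeds = derive_seeds(MASTER_SEED, 5)
print(seeds)
print(seeds == derive_seeds(MASTER_SEED, 5))  # True: same master, same children
```

Each child seed can then drive one repeat of an experiment, so a five-run comparison needs only a single identifier to be recorded.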

## 7. Custom Ranges 🎯

Generate seeds within specific ranges:

In [None]:
# Small range (0-999)
gen = SeedHashGenerator("small_range", min_value=0, max_value=999)
print(f"Small range seed: {gen.seed_number} (0-999)")

# Custom range (1000-9999)
gen = SeedHashGenerator("custom_range", min_value=1000, max_value=9999)
print(f"Custom range seed: {gen.seed_number} (1000-9999)")

# Large range (default)
gen = SeedHashGenerator("large_range")
print(f"Default range seed: {gen.seed_number} (0-{2**31-1})")
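
Restricting a hash to a custom range usually comes down to a modulo and an offset. The sketch below illustrates that arithmetic; `seed_in_range` is a hypothetical helper, and `SeedHashGenerator` may map hashes into `[min_value, max_value]` differently.

```python
import hashlib

def seed_in_range(identifier: str, min_value: int, max_value: int) -> int:
    """Hypothetical helper: map a string to an integer in [min_value, max_value]."""
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    span = max_value - min_value + 1  # number of admissible seeds
    raw = int(hashlib.md5(identifier.encode("utf-8")).hexdigest(), 16)
    return min_value + raw % span     # shift the remainder into the range

seed = seed_in_range("custom_range", 1000, 9999)
print(1000 <= seed <= 9999)  # True
```

Note that a narrow range such as 0-999 leaves only 1,000 possible seeds, so distinct identifiers are far more likely to collide than with the default range.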

## 8. Practical Example: Data Splitting 📊

In [None]:
# Create synthetic data
X = np.random.randn(100, 5)
y = np.random.randint(0, 2, 100)

# Split data reproducibly
gen = SeedHashGenerator("data_split_v1")
gen.set_seed("numpy")

# Shuffle indices
indices = np.arange(len(X))
np.random.shuffle(indices)

# Split 80/20
split_point = int(0.8 * len(indices))
train_idx = indices[:split_point]
test_idx = indices[split_point:]

print(f"Training samples: {len(train_idx)}")
print(f"Test samples: {len(test_idx)}")
print(f"\nFirst 5 train indices: {sorted(train_idx)[:5]}")

# Verify reproducibility
gen2 = SeedHashGenerator("data_split_v1")
gen2.set_seed("numpy")
indices2 = np.arange(len(X))
np.random.shuffle(indices2)
train_idx2 = indices2[:split_point]

print(f"\nSame split? {np.array_equal(train_idx, train_idx2)} ✅")
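
The cell above relies on NumPy's global state, so any other `np.random` call between `set_seed` and `shuffle` changes the split. A sketch of a split that carries its own generator, using only the standard library, follows. `split_indices` and the placeholder seed are assumptions for illustration; with NumPy, `np.random.default_rng(seed)` plays the same role.

```python
import random

def split_indices(n_samples: int, seed: int, train_fraction: float = 0.8):
    """Hypothetical helper: reproducible train/test split of range(n_samples)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # private generator, fixed seed
    split_point = int(train_fraction * n_samples)
    return indices[:split_point], indices[split_point:]

SPLIT_SEED = 42  # placeholder for a value such as gen.seed_number

train_idx, test_idx = split_indices(100, SPLIT_SEED)
print(len(train_idx), len(test_idx))        # 80 20
print(set(train_idx).isdisjoint(test_idx))  # True
```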

## 9. Understanding MD5 Hashing 🔐

In [None]:
gen = SeedHashGenerator("my_experiment")

# The full MD5 hash (hexadecimal)
print(f"MD5 hash: {gen.get_hash()}")
print(f"Hash length: {len(gen.get_hash())} characters")

# Converted to integer seed
print(f"\nInteger seed: {gen.seed_number}")
print(f"Seed range: [0, {2**31 - 1}]")

# Different identifiers → Different seeds
print("\nDifferent identifiers:")
for identifier in ["exp_1", "exp_2", "exp_3"]:
    gen = SeedHashGenerator(identifier)
    print(f"  '{identifier}': Seed {gen.seed_number}")

## 10. Error Handling ⚠️

In [None]:
# Test invalid range
try:
    gen = SeedHashGenerator("test", min_value=100, max_value=10)
except ValueError as e:
    print(f"❌ Error: {e}")

# Test invalid type
try:
    gen = SeedHashGenerator(12345)  # Must be string
except TypeError as e:
    print(f"❌ Error: {e}")

# Test invalid n for multiple seeds
try:
    gen = SeedHashGenerator("test")
    gen.generate_seeds(-5)
except ValueError as e:
    print(f"❌ Error: {e}")

print("\n✅ All error handling works correctly!")

## 11. Best Practices 💡

### ✅ DO:
- Use descriptive identifiers: `"data_split_v1"`, `"model_training_2024"`
- Version your experiments: `"exp_v1"`, `"exp_v2"`
- Call `seed_all()` at the start of your script
- Document your seed identifiers

### ❌ DON'T:
- Use random or timestamp-based identifiers
- Change identifiers mid-experiment
- Mix ad hoc manual seeds (e.g., a hard-coded `random.seed(42)`) with SeedHash-derived seeds

### Example: Good Practice
```python
from seedhash import SeedHashGenerator

# Version your experiment
EXPERIMENT_VERSION = "v1.0"
gen = SeedHashGenerator(f"my_project_{EXPERIMENT_VERSION}")
gen.seed_all()

# Now all random operations are reproducible!
```
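
To act on "Document your seed identifiers", one lightweight option is to log each identifier next to the seed it produced. The sketch below serializes such a record as JSON using the standard library; `seed_record`, the field names, and the sample values are all hypothetical, and in practice the seed would come from `gen.seed_number`.

```python
import json

def seed_record(identifier: str, seed: int, purpose: str) -> str:
    """Hypothetical helper: serialize one identifier/seed pair as JSON."""
    record = {"identifier": identifier, "seed": seed, "purpose": purpose}
    return json.dumps(record, sort_keys=True)

# Placeholder values for illustration only.
line = seed_record("my_project_v1.0", 123456789, "train/test split")
print(line)
```

Appending one such line per run to a log file keeps the mapping from identifier to seed auditable alongside the results.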

## 12. Next Steps 🚀

### Continue Learning:

1. **Tutorial #2: Hierarchical Sampling** (`02_Hierarchical_Sampling.ipynb`)
   - SeedExperimentManager
   - Hierarchical seed generation
   - 4 sampling methods
   - ML experiment tracking

2. **Tutorial #3: Advanced ML Paradigms** (`03_Advanced_ML_Paradigms.ipynb`)
   - Semi-supervised learning
   - Reinforcement learning
   - Federated learning

### Resources:
- **GitHub**: https://github.com/melhzy/seedhash
- **Examples**: `../Python/examples/`
- **Documentation**: `../Python/README.md`

---

## 🎉 Summary

You learned how to:
- ✅ Generate reproducible seeds with `SeedHashGenerator`
- ✅ Seed Python, NumPy, PyTorch, and TensorFlow
- ✅ Use `seed_all()` to seed every framework at once
- ✅ Generate multiple seeds with `generate_seeds()`
- ✅ Apply seeds to reproducible data splitting
- ✅ Inspect the MD5 hash and handle invalid inputs

**Happy experimenting! 🔬**