# Interactive Learning Tutorial: Optimization Algorithms

**Welcome!** This notebook is designed to help you learn optimization algorithms hands-on.

**What you'll learn**:
- How Genetic Algorithms work (step by step)
- How to solve optimization problems with MIP (covered in the follow-up notebook `03_MIP_Model_Benchmark.ipynb`)
- When to use each approach
- How to tune parameters

**Prerequisites**: 
- Basic Python knowledge
- Read `LEARNING_GUIDE.md` sections 1-2 (recommended)

**Time needed**: 60-90 minutes

## Part 1: Setup and Data

In [None]:
# Import libraries
import sys
sys.path.append('../scripts')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random

from data_utils import calculate_distance_matrix
from ga_core import GeneticAlgorithm

# Set random seed for reproducibility
random.seed(42)
np.random.seed(42)

print("✓ Libraries imported successfully!")

### Create a Simple Dataset

We'll start with just 5 attractions so you can understand every step.

In [None]:
# Create sample data
attractions = pd.DataFrame({
    'name': ['Temple', 'Beach', 'Fort', 'Mountain', 'Museum'],
    'latitude': [7.29, 6.03, 6.93, 7.50, 6.50],
    'longitude': [80.64, 80.22, 79.85, 80.70, 80.10],
    'score': [9.5, 7.0, 8.8, 9.2, 6.5],
    'visit_duration': [2.0, 1.5, 2.5, 3.0, 1.0]  # hours
})

print("Our attractions:")
print(attractions)
print(f"\nTotal attractions: {len(attractions)}")
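# Summing every score gives an upper bound; it is only reachable if all attractions fit in the time budget
print(f"Upper bound on score (all attractions visited): {attractions['score'].sum():.1f}")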
print(f"Total visit time: {attractions['visit_duration'].sum():.1f} hours")

In [None]:
# Calculate distances between attractions
distance_matrix = calculate_distance_matrix(attractions)

print("Distance matrix (km):")
print(pd.DataFrame(distance_matrix, 
                   index=attractions['name'],
                   columns=attractions['name']).round(1))
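
`calculate_distance_matrix` comes from `data_utils`, which is not shown in this notebook. A common way to turn latitude/longitude pairs into kilometres is the haversine (great-circle) formula. The sketch below assumes that approach; the helper's actual method may differ, and `haversine_km` and `haversine_matrix` are illustrative names, not part of the project.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def haversine_matrix(coords):
    """Pairwise distance matrix for a list of (latitude, longitude) tuples."""
    return [[haversine_km(*p, *q) for q in coords] for p in coords]

# Coordinates of the five sample attractions defined above
coords = [(7.29, 80.64), (6.03, 80.22), (6.93, 79.85), (7.50, 80.70), (6.50, 80.10)]
sketch_matrix = haversine_matrix(coords)
print(f"Temple -> Mountain: {sketch_matrix[0][3]:.1f} km")
```

The result is symmetric with zeros on the diagonal, which matters later: travelling A → B costs the same as B → A.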

## Part 2: Understanding the Problem

**Exercise 1**: Calculate fitness manually

In [None]:
# Let's manually calculate fitness for a simple tour
# Tour: Visit attractions in order [0, 2, 3] (Temple → Fort → Mountain)

tour = [0, 2, 3]
max_time = 12  # hours
avg_speed = 50  # km/h

print("Tour: ", " → ".join(attractions.iloc[tour]['name']))
print("\nStep-by-step calculation:")
print("=" * 60)

total_time = 0
total_score = 0

for i, idx in enumerate(tour):
    # Visit time
    visit_time = attractions.iloc[idx]['visit_duration']
    total_time += visit_time
    print(f"\nStop {i+1}: {attractions.iloc[idx]['name']}")
    print(f"  Visit time: {visit_time} hours")
    print(f"  Score: {attractions.iloc[idx]['score']}")
    
    # Travel time to next
    if i < len(tour) - 1:
        next_idx = tour[i + 1]
        distance = distance_matrix[idx, next_idx]
        travel_time = distance / avg_speed
        total_time += travel_time
        print(f"  Travel to {attractions.iloc[next_idx]['name']}: {distance:.1f} km = {travel_time:.2f} hours")
    
    total_score += attractions.iloc[idx]['score']
    print(f"  Cumulative time: {total_time:.2f} hours")

print("\n" + "=" * 60)
print(f"TOTAL SCORE: {total_score:.1f}")
print(f"TOTAL TIME: {total_time:.2f} hours")
print(f"TIME LIMIT: {max_time} hours")
print(f"FEASIBLE: {'✓ YES' if total_time <= max_time else '✗ NO'}")

if total_time > max_time:
    penalty = (total_time - max_time) * 10
    fitness = max(0, total_score - penalty)
    print(f"PENALTY: {penalty:.1f}")
    print(f"FINAL FITNESS: {fitness:.1f}")
else:
    print(f"FINAL FITNESS: {total_score:.1f}")

**❓ Question**: What if we reverse the order to [3, 2, 0]? Will the fitness be better, worse, or unchanged?

In [None]:
# Try it yourself!
# TODO: Change the tour and re-run the calculation above
# tour = [3, 2, 0]  # Mountain → Fort → Temple
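
If you want to compare several orderings without editing the cell above each time, one option is to wrap the same rules in a function. This is a minimal sketch: `tour_fitness` is a hypothetical helper (not part of `ga_core`), and the toy inputs are made-up round numbers so the arithmetic is easy to check by hand.

```python
def tour_fitness(tour, dist, scores, durations, max_time, avg_speed=50, penalty_per_hour=10):
    """Return (total_time, fitness) using the same rules as the manual calculation."""
    visit_time = sum(durations[i] for i in tour)
    # Travel happens only between consecutive stops (no return to the start)
    travel_time = sum(dist[a][b] for a, b in zip(tour, tour[1:])) / avg_speed
    total_time = visit_time + travel_time
    score = sum(scores[i] for i in tour)
    overtime = max(0.0, total_time - max_time)
    return total_time, max(0.0, score - penalty_per_hour * overtime)

# Toy inputs for illustration only (not the real attraction data)
toy_dist = [[0, 100, 50], [100, 0, 150], [50, 150, 0]]  # km
toy_scores = [9.0, 7.0, 8.0]
toy_durations = [2.0, 1.5, 2.5]  # hours

for order in ([0, 1, 2], [1, 0, 2]):
    hours, fit = tour_fitness(order, toy_dist, toy_scores, toy_durations, max_time=10)
    print(order, f"time={hours:.2f} h", f"fitness={fit:.2f}")
```

To use it on the real data, pass `distance_matrix`, `attractions['score'].values` and `attractions['visit_duration'].values`; indexing with `dist[a][b]` works for both nested lists and NumPy arrays.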

## Part 3: Genetic Algorithm - Step by Step

Now let's see how a GA finds good solutions automatically!

### Step 1: Create Random Population

In [None]:
# Initialize GA
ga = GeneticAlgorithm(
    distance_matrix=distance_matrix,
    scores=attractions['score'].values,
    visit_durations=attractions['visit_duration'].values,
    max_time=12,
    population_size=6,  # Small for demonstration
    generations=1,  # Just 1 for now
    mutation_rate=0.2,
    crossover_rate=0.8
)

# Create initial population
population = ga.create_population()

print("Initial Population (random tours):")
print("=" * 60)
for i, individual in enumerate(population):
    tour_names = " → ".join(attractions.iloc[individual]['name'])
    fitness = ga.fitness(individual)
    print(f"Individual {i}: {individual}")
    print(f"  Tour: {tour_names}")
    print(f"  Fitness: {fitness:.2f}")
    print()

**Observation**: Notice how different orderings give different fitness scores!
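
The individuals printed above look like permutations of the attraction indices. Assuming that is how `create_population` encodes tours (its source is not shown here), a random population can be sketched like this. The function name is reused only for illustration.

```python
import random

def create_population(n_attractions, population_size, rng):
    """Each individual is a random permutation of the attraction indices."""
    indices = list(range(n_attractions))
    return [rng.sample(indices, k=n_attractions) for _ in range(population_size)]

# A local generator keeps this sketch from disturbing the notebook's global seed
rng = random.Random(42)
sketch_population = create_population(n_attractions=5, population_size=6, rng=rng)
for individual in sketch_population:
    print(individual)
```

Every individual contains each index exactly once, so crossover and mutation have to preserve that property.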

### Step 2: Selection

In [None]:
# Calculate fitness for all
fitnesses = [ga.fitness(ind) for ind in population]

print("Selection using Tournament Selection:")
print("=" * 60)

# Select 2 parents
for i in range(2):
    parent = ga.selection(population, fitnesses)
    # Compare as lists so the lookup also works if individuals are NumPy arrays
    parent_idx = next(j for j, ind in enumerate(population) if list(ind) == list(parent))
    print(f"\nParent {i+1}: Individual {parent_idx}")
    print(f"  Tour: {parent}")
    print(f"  Fitness: {fitnesses[parent_idx]:.2f}")

print("\n💡 Notice: Better individuals have a higher chance of being selected!")
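
Tournament selection is simple enough to write out in a few lines: draw a handful of individuals at random and keep the fittest. The sketch below assumes a tournament size `k`; the value used inside `ga.selection` is not shown in this notebook, and `tournament_select` is an illustrative name.

```python
import random

def tournament_select(population, fitnesses, k, rng):
    """Pick k distinct contestants at random and return the fittest of them."""
    contestants = rng.sample(range(len(population)), k)
    winner = max(contestants, key=lambda i: fitnesses[i])
    return population[winner]

rng = random.Random(0)
toy_population = [[0, 1, 2], [2, 1, 0], [1, 0, 2], [1, 2, 0]]
toy_fitnesses = [10.0, 25.0, 5.0, 18.0]

picked = tournament_select(toy_population, toy_fitnesses, k=2, rng=rng)
print("k=2 winner:", picked)
# With k equal to the population size, the best individual always wins
print("k=4 winner:", tournament_select(toy_population, toy_fitnesses, k=4, rng=rng))
```

A larger `k` increases selection pressure: weak individuals are less likely to survive a bigger tournament, at the cost of less diversity.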

### Step 3: Crossover

In [None]:
# Select two parents
parent1 = ga.selection(population, fitnesses)
parent2 = ga.selection(population, fitnesses)

print("Crossover (Order Crossover):")
print("=" * 60)
print(f"Parent 1: {parent1}")
print(f"Parent 2: {parent2}")

# Perform crossover
child = ga.crossover(parent1.copy(), parent2.copy())

print(f"\nChild:    {child}")
print("\nChild inherits traits from both parents!")
print(f"Child fitness: {ga.fitness(child):.2f}")
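
To see why the child is always a valid permutation, here is a sketch of the classic order crossover (OX): copy a segment from parent 1, then fill the remaining positions with parent 2's genes in the order they appear, starting after the segment and wrapping around. The cut points are passed in explicitly so the result is easy to trace by hand. `ga.crossover` presumably picks them at random, and its exact variant may differ.

```python
def order_crossover(parent1, parent2, start, end):
    """Classic order crossover (OX) with an inclusive cut segment [start, end]."""
    n = len(parent1)
    child = [None] * n
    # 1. Copy the segment from parent 1 unchanged
    child[start:end + 1] = parent1[start:end + 1]
    taken = set(parent1[start:end + 1])
    # 2. Walk parent 2 starting after the segment, wrapping around,
    #    and place each gene that is not already in the child
    pos = (end + 1) % n
    for offset in range(n):
        gene = parent2[(end + 1 + offset) % n]
        if gene not in taken:
            child[pos] = gene
            pos = (pos + 1) % n
    return child

ox_child = order_crossover([0, 1, 2, 3, 4], [4, 3, 2, 1, 0], start=1, end=2)
print(ox_child)  # [3, 1, 2, 0, 4]
```

Positions 1 and 2 come straight from parent 1, and the remaining genes keep the relative order they had in parent 2.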

### Step 4: Mutation

In [None]:
# Set high mutation rate for demonstration
ga.mutation_rate = 1.0  # Always mutate

original = child.copy()
mutated = ga.mutate(child)

print("Mutation (Swap):")
print("=" * 60)
print(f"Before: {original}")
print(f"After:  {mutated}")

# Find what changed
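changed_positions = [i for i, (a, b) in enumerate(zip(original, mutated)) if a != b]
if len(changed_positions) == 2:
    # A swap changes exactly two positions
    i, j = changed_positions
    print(f"\nSwapped positions: {changed_positions}")
    print(f"Values swapped: {original[i]} ↔ {original[j]}")
else:
    print(f"\nChanged positions: {changed_positions} (no simple two-position swap detected)")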

ga.mutation_rate = 0.2  # Reset
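
Swap mutation itself fits in a few lines. This is a minimal sketch assuming the mutation rate is applied once per individual; `ga.mutate` may apply it differently (for example per position), and `swap_mutate` is an illustrative name.

```python
import random

def swap_mutate(tour, mutation_rate, rng):
    """With probability mutation_rate, swap two distinct positions; return a new list."""
    mutated = list(tour)  # copy so the caller's tour is left untouched
    if len(mutated) >= 2 and rng.random() < mutation_rate:
        i, j = rng.sample(range(len(mutated)), 2)
        mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

rng = random.Random(7)
before = [0, 1, 2, 3, 4]
after = swap_mutate(before, mutation_rate=1.0, rng=rng)
print("Before:", before)
print("After: ", after)
```

Because a swap only exchanges two entries, the result is still a permutation, so no repair step is needed.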

## Part 4: Run Complete GA

Now let's run the full algorithm!

In [None]:
# Run GA with more generations
ga = GeneticAlgorithm(
    distance_matrix=distance_matrix,
    scores=attractions['score'].values,
    visit_durations=attractions['visit_duration'].values,
    max_time=12,
    population_size=20,
    generations=100,
    mutation_rate=0.1,
    crossover_rate=0.8
)

print("Running Genetic Algorithm...")
print("=" * 60)

best_solution, best_fitness, history = ga.evolve()

print("\n✓ Optimization complete!")
print(f"\nBest solution: {best_solution}")
print(f"Best fitness: {best_fitness:.2f}")

# Get valid tour
tour = ga.get_valid_tour(best_solution)
tour_names = " → ".join(attractions.iloc[tour]['name'])
# A GA is a heuristic: this is the best tour it found, not a proven optimum
print(f"\nBest tour found: {tour_names}")
print(f"Visits {len(tour)} attractions")
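
A GA returns the best tour it found, which is not guaranteed to be the true optimum. With only 5 attractions there are just 325 ordered subsets, so you can check the answer by exhaustive search. The sketch below assumes the fitness of a feasible tour is simply its total score, as in Exercise 1; `tour_hours` and `best_feasible_tour` are hypothetical helpers, and the toy inputs are made-up values.

```python
from itertools import permutations

def tour_hours(tour, dist, durations, avg_speed=50):
    """Visit time plus travel time between consecutive stops, in hours."""
    travel = sum(dist[a][b] for a, b in zip(tour, tour[1:])) / avg_speed
    return sum(durations[i] for i in tour) + travel

def best_feasible_tour(dist, scores, durations, max_time):
    """Enumerate every ordered subset and keep the highest-scoring feasible one."""
    best_tour, best_score = (), 0.0
    n = len(scores)
    for k in range(1, n + 1):
        for tour in permutations(range(n), k):
            if tour_hours(tour, dist, durations) <= max_time:
                score = sum(scores[i] for i in tour)
                if score > best_score:
                    best_tour, best_score = tour, score
    return best_tour, best_score

# Toy inputs for illustration only (not the real attraction data)
toy_dist = [[0, 100, 50], [100, 0, 150], [50, 150, 0]]  # km
toy_scores = [9.0, 7.0, 8.0]
toy_durations = [2.0, 1.5, 2.5]  # hours
print(best_feasible_tour(toy_dist, toy_scores, toy_durations, max_time=10))
```

Exhaustive search stops being practical very quickly as the number of attractions grows, which is exactly why heuristics such as a GA are useful on the full dataset.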

### Visualize Convergence

In [None]:
# Plot fitness evolution
generations = [h['generation'] for h in history]
max_fitness = [h['max_fitness'] for h in history]
avg_fitness = [h['avg_fitness'] for h in history]

plt.figure(figsize=(10, 6))
plt.plot(generations, max_fitness, 'b-', label='Best Fitness', linewidth=2)
plt.plot(generations, avg_fitness, 'r--', label='Average Fitness', linewidth=2)
plt.xlabel('Generation', fontsize=12)
plt.ylabel('Fitness', fontsize=12)
plt.title('Genetic Algorithm Convergence', fontsize=14)
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\n📊 Observations:")
print(f"  - Started with best fitness: {max_fitness[0]:.2f}")
print(f"  - Ended with best fitness: {max_fitness[-1]:.2f}")
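improvement = max_fitness[-1] - max_fitness[0]
# Fitness can be 0 for heavily penalized tours, so guard the percentage against division by zero
pct = f" ({improvement / max_fitness[0] * 100:.1f}%)" if max_fitness[0] > 0 else ""
print(f"  - Improvement: {improvement:.2f}{pct}")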

## Part 5: Experimentation

**Your turn!** Try modifying parameters and see what happens.

### Experiment 1: Population Size

In [None]:
# Compare different population sizes
pop_sizes = [10, 20, 50, 100]
results = []

for pop_size in pop_sizes:
    ga = GeneticAlgorithm(
        distance_matrix=distance_matrix,
        scores=attractions['score'].values,
        visit_durations=attractions['visit_duration'].values,
        max_time=12,
        population_size=pop_size,
        generations=50,
        mutation_rate=0.1,
        crossover_rate=0.8
    )
    
    _, fitness, _ = ga.evolve()
    results.append(fitness)
    print(f"Population size {pop_size:3d}: Fitness = {fitness:.2f}")

# Plot
plt.figure(figsize=(8, 5))
plt.bar(range(len(pop_sizes)), results, tick_label=pop_sizes)
plt.xlabel('Population Size')
plt.ylabel('Best Fitness')
plt.title('Effect of Population Size')
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

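# Compare the smallest and largest population; a tie is common on a problem this small
if results[-1] > results[0]:
    print("\n💡 Insight: The largest population found a better tour than the smallest in this run")
elif results[-1] < results[0]:
    print("\n💡 Insight: The smallest population found a better tour than the largest in this run")
else:
    print("\n💡 Insight: The smallest and largest populations reached the same best fitness in this run")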

### Experiment 2: Mutation Rate

In [None]:
# TODO: Try different mutation rates
# mutation_rates = [0.05, 0.1, 0.2, 0.3]
# Compare results
# What happens with very high mutation? Very low?

### Experiment 3: Different Time Limits

In [None]:
# How does the solution change with different time budgets?
time_limits = [6, 9, 12, 15, 18, 24]

for max_time in time_limits:
    ga = GeneticAlgorithm(
        distance_matrix=distance_matrix,
        scores=attractions['score'].values,
        visit_durations=attractions['visit_duration'].values,
        max_time=max_time,
        population_size=30,
        generations=50
    )
    
    solution, fitness, _ = ga.evolve()
    tour = ga.get_valid_tour(solution)
    
    print(f"Time limit: {max_time:2d}h → Visits: {len(tour)} attractions, Fitness: {fitness:.1f}")

## Part 6: Challenge Questions

Test your understanding!

**Challenge 1**: Add a 6th attraction and re-run the GA. How does it affect the solution?

```python
# Add: Park with score=8.0, duration=2.0h, at (7.0, 80.5)
```

In [None]:
# Your code here


**Challenge 2**: Modify the fitness function to prefer attractions visited earlier (morning bonus).

Hint: Add a bonus that decreases as `total_time` increases.

In [None]:
# Your code here


**Challenge 3**: Compare GA to a greedy algorithm

Greedy approach: Always pick the highest-scoring attraction that fits
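
If you get stuck, here is one possible starting point. It is a sketch of the greedy rule described above, assuming travel time is counted from the previously chosen stop at 50 km/h; `greedy_tour` is a hypothetical helper, and the toy inputs are made-up values.

```python
def greedy_tour(dist, scores, durations, max_time, avg_speed=50):
    """Repeatedly add the highest-scoring unvisited attraction that still fits."""
    tour, used_hours = [], 0.0
    remaining = set(range(len(scores)))
    while remaining:
        # Try candidates from highest to lowest score
        for cand in sorted(remaining, key=lambda i: scores[i], reverse=True):
            travel = dist[tour[-1]][cand] / avg_speed if tour else 0.0
            extra = travel + durations[cand]
            if used_hours + extra <= max_time:
                tour.append(cand)
                used_hours += extra
                remaining.remove(cand)
                break
        else:
            break  # no remaining attraction fits in the time budget
    return tour, used_hours

# Toy inputs for illustration only (not the real attraction data)
toy_dist = [[0, 100, 50], [100, 0, 150], [50, 150, 0]]  # km
toy_scores = [9.0, 7.0, 8.0]
toy_durations = [2.0, 1.5, 2.5]  # hours
print(greedy_tour(toy_dist, toy_scores, toy_durations, max_time=8))
```

Greedy ignores how far away the next high-scoring attraction is, so compare its total score with the GA's on the real data to see where it falls short.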

In [None]:
# Your code here


## Summary

**What you learned**:  
✅ How to represent a tour (permutation encoding)  
✅ How the fitness function works  
✅ GA operations: selection, crossover, mutation  
✅ How to run and tune a GA  
✅ How to interpret the results  

**Next steps**:
1. Read `LEARNING_GUIDE.md` section 6 (MIP)
2. Run notebook `02_Genetic_Algorithm_Implementation.ipynb` with full dataset
3. Try `03_MIP_Model_Benchmark.ipynb` to compare approaches
4. Complete the challenge questions above

**Questions?** Check `LEARNING_GUIDE.md` or open an issue on GitHub!

---

**Great job! 🎉** You now understand the basics of genetic algorithms for optimization!