# TSP Baselines Evaluation

This notebook evaluates baselines for TSP, specifically:
1. **Christofides Algorithm**: guarantees a tour at most 1.5 times the optimal length on metric instances
2. **OR-Tools**: Google's optimization solver, run here with guided local search
3. **Concorde**: exact TSP solver, used as the optimal reference
4. **Supervised**: NN trained on 'output' values in the training datasets

In [20]:
%load_ext autoreload
%autoreload 2

import numpy as np
from pathlib import Path

from baselines import (
    load_tsp_data,
    calculate_tour_length,
    christofides_tsp,
    ortools_tsp,
    concorde_tsp,
    evaluate_baselines,
    print_results_table,
    SupervisedTSPModel,
    greedy_decode,
    TSPDataset,
    DEVICE,
)

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# Check if CUDA is available
print(f"Using device: {DEVICE}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
    print(f"CUDA memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Using device: cpu


## Configuration

In [21]:
DATA_DIR = Path('../data/data')  # extracted zip contents live in data/data/<files>; change this to match your layout

# Test datasets
TSP20_TEST = DATA_DIR / 'tsp_20_test.txt'
TSP50_TEST = DATA_DIR / 'tsp50_test.txt'

# Training datasets 
TSP5_TRAIN_DIR = DATA_DIR / 'tsp_5_train'
TSP10_TRAIN_DIR = DATA_DIR / 'tsp_10_train'
TSP520_TRAIN_DIR = DATA_DIR / 'tsp_5-20_train'

TSP5_TEST = TSP5_TRAIN_DIR / 'tsp5_test.txt'
TSP10_TEST = TSP10_TRAIN_DIR / 'tsp10_test.txt'


# Model params (TODO: try different values, haven't done any hyperparam search yet)
HIDDEN_DIM = 128
LEARNING_RATE = 1e-3
BATCH_SIZE = 32
NUM_EPOCHS = 10

## Data Loading

In [22]:
# load test data
print("Loading TSP-5 test data...")
tsp5_test = load_tsp_data(str(TSP5_TEST))
print(f"Loaded {len(tsp5_test)} TSP-5 instances")

print("\nLoading TSP-10 test data...")
tsp10_test = load_tsp_data(str(TSP10_TEST))
print(f"Loaded {len(tsp10_test)} TSP-10 instances")

print("\nLoading TSP-20 test data...")
tsp20_test = load_tsp_data(str(TSP20_TEST))
print(f"Loaded {len(tsp20_test)} TSP-20 instances")

print("\nLoading TSP-50 test data...")
tsp50_test = load_tsp_data(str(TSP50_TEST))
print(f"Loaded {len(tsp50_test)} TSP-50 instances")

# verifying format (debug)
coords, tour = tsp20_test[0]
print(f"\nExample TSP-20 instance:")
print(f"  Coordinates shape: {coords.shape}")
print(f"  Tour length: {len(tour)}")
print(f"  Optimal tour length: {calculate_tour_length(coords, tour):.4f}")

Loading TSP-5 test data...
Loaded 10000 TSP-5 instances

Loading TSP-10 test data...
Loaded 10000 TSP-10 instances

Loading TSP-20 test data...
Loaded 10000 TSP-20 instances

Loading TSP-50 test data...
Loaded 10000 TSP-50 instances

Example TSP-20 instance:
  Coordinates shape: (20, 2)
  Tour length: 21
  Optimal tour length: 3.7264
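
The helper `calculate_tour_length` is imported from `baselines` and its implementation is not shown in this notebook. As a rough sketch of what such a function might compute, assuming tours are lists of 1-indexed node ids that may repeat the start node at the end (which would match a 21-entry tour for 20 cities), the Euclidean length could be summed as follows. The name `tour_length_sketch` is hypothetical.

```python
import math

def tour_length_sketch(coords, tour):
    """Sum Euclidean edge lengths along a closed tour of 1-indexed node ids.

    Assumption: the tour may or may not repeat its start node at the end;
    the closing edge is appended only when it is missing.
    """
    idx = [node - 1 for node in tour]   # convert to 0-indexed positions
    if idx[0] != idx[-1]:
        idx.append(idx[0])              # close the cycle
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(idx, idx[1:]))

# Unit square visited in order: four edges of length 1.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(tour_length_sketch(square, [1, 2, 3, 4, 1]))
```

Accepting both closed and open tours keeps the sketch usable if a solver returns a tour without the repeated start node.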


## Evaluate Christofides Baseline

In [23]:
# quick test on single instance to check if works
coords, optimal_tour = tsp20_test[0]
christofides_tour, christofides_length = christofides_tsp(coords)
optimal_length = calculate_tour_length(coords, optimal_tour)

print(f"Christofides: {christofides_length:.4f}")
print(f"Optimal: {optimal_length:.4f}")
print(f"Gap: {((christofides_length - optimal_length) / optimal_length * 100):.2f}%")

Christofides: 3.7521
Optimal: 3.7264
Gap: 0.69%
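
The gap printed above is the relative excess of a tour over a reference tour, in percent. Here the reference is the dataset tour; later cells use Concorde instead. A small helper makes the formula reusable. The name `optimality_gap` is hypothetical and not part of `baselines`.

```python
def optimality_gap(length, reference_length):
    """Percentage by which `length` exceeds `reference_length`."""
    return (length - reference_length) / reference_length * 100.0

# Values taken from the Christofides test above.
print(f"Gap: {optimality_gap(3.7521, 3.7264):.2f}%")  # prints "Gap: 0.69%"
```

A negative gap would mean the evaluated tour is shorter than the reference, which is a sign that the reference is not actually optimal.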


In [24]:
# full evaluation on TSP-20 (first 1k instances)
print("Evaluating Christofides on TSP-20...")
results_christofides_20 = evaluate_baselines(
    tsp20_test[:1000],  # Evaluate on first 1000 instances
    methods=['christofides', 'concorde'],
    progress=True
)
print_results_table(results_christofides_20, problem_size=20)

Evaluating Christofides on TSP-20...
Processing instance 0/1000...
Processing instance 10/1000...
Processing instance 20/1000...
Processing instance 30/1000...
Processing instance 40/1000...
Processing instance 50/1000...
Processing instance 60/1000...
Processing instance 70/1000...
Processing instance 80/1000...
Processing instance 90/1000...
Processing instance 100/1000...
Processing instance 110/1000...
Processing instance 120/1000...
Processing instance 130/1000...
Processing instance 140/1000...
Processing instance 150/1000...

## Evaluate OR-Tools Baseline

In [25]:
# quick test on single instance
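coords, optimal_tour = tsp20_test[0]
# recompute the reference length here so this cell does not depend on In [23]
optimal_length = calculate_tour_length(coords, optimal_tour)
ortools_tour, ortools_length = ortools_tsp(coords, time_limit_seconds=10)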

if ortools_tour is not None:
    print(f"OR-Tools: {ortools_length:.4f}")
    print(f"Optimal: {optimal_length:.4f}")
    print(f"Gap: {((ortools_length - optimal_length) / optimal_length * 100):.2f}%")
else:
    print("OR-Tools failed")

OR-Tools: 3.8465
Optimal: 3.7264
Gap: 3.22%


In [26]:
# Full evaluation on TSP-20
print("Evaluating OR-Tools on TSP-20...")
results_ortools_20 = evaluate_baselines(
    tsp20_test[:100],  # smaller sample for time, just checking if this actually runs
    methods=['ortools', 'concorde'],
    progress=True
)
print_results_table(results_ortools_20, problem_size=20)

Evaluating OR-Tools on TSP-20...
Processing instance 0/100...
Processing instance 10/100...
Processing instance 20/100...
Processing instance 30/100...
Processing instance 40/100...
Processing instance 50/100...
Processing instance 60/100...
Processing instance 70/100...
Processing instance 80/100...
Processing instance 90/100...

TSP-20 Evaluation Results
Method            Avg Length  Avg Gap (%)    Avg Ratio Success Rate
------------------------------------------------------------
ortools               3.9117         0.88       1.0088      100.00%
concorde              3.8779         0.00       1.0000      100.00%
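
The table columns can be read as aggregates over per-instance results. The sketch below shows one way such statistics could be derived. It assumes a failed solve is recorded as `None` and that the gap is the mean of per-instance ratios minus one; `summarize` is a hypothetical helper, and `evaluate_baselines` may compute these values differently.

```python
def summarize(method_lengths, reference_lengths):
    """Aggregate per-instance tour lengths into table-style statistics.

    `method_lengths` may contain None for instances where the solver failed;
    those instances count against the success rate and are otherwise skipped.
    """
    pairs = [(m, r) for m, r in zip(method_lengths, reference_lengths) if m is not None]
    if not pairs:
        return None
    n = len(pairs)
    avg_ratio = sum(m / r for m, r in pairs) / n
    return {
        "avg_length": sum(m for m, _ in pairs) / n,
        "avg_gap_pct": (avg_ratio - 1.0) * 100.0,
        "avg_ratio": avg_ratio,
        "success_rate_pct": 100.0 * n / len(method_lengths),
    }

# Toy data: one optimal tour, one failed solve, one tour 10% too long.
stats = summarize([4.0, None, 3.3], [4.0, 3.0, 3.0])
print(stats)
```

Under these assumptions the gap and ratio columns carry the same information, which agrees with the rows above (for example 0.88 and 1.0088).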



## Evaluate Concorde (True Optimal Solver)

Concorde is an exact TSP solver that guarantees finding the optimal solution.

In [27]:
# quick test for Concorde on single instance
coords, provided_tour = tsp20_test[0]
concorde_tour, concorde_length = concorde_tsp(coords)
provided_length = calculate_tour_length(coords, provided_tour)

print(f"Concorde (TRUE optimal): {concorde_length:.4f}")
print(f"Dataset 'optimal': {provided_length:.4f}")
print(f"Gap: {((provided_length - concorde_length) / concorde_length * 100):.2f}%")

Concorde (TRUE optimal): 3.5178
Dataset 'optimal': 3.7264
Gap: 5.93%
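
The output above shows the dataset tour being longer than the tour Concorde returns. On very small instances (TSP-5, and with patience TSP-10) a claimed optimum can be cross-checked independently by enumerating every tour. The sketch below is plain brute force, not Concorde; `brute_force_tsp` is a hypothetical name and it uses 0-indexed nodes.

```python
import itertools
import math

def brute_force_tsp(coords):
    """Exact shortest closed tour by enumeration.

    Runtime grows as (n-1)!, so this is only practical for roughly 10 nodes or fewer.
    """
    n = len(coords)
    if n == 0:
        return [], 0.0
    best_len, best_tour = math.inf, None
    # Fix node 0 as the start so rotations of the same cycle are not re-enumerated.
    for perm in itertools.permutations(range(1, n)):
        order = (0, *perm, 0)
        length = sum(math.dist(coords[a], coords[b]) for a, b in zip(order, order[1:]))
        if length < best_len:
            best_len, best_tour = length, order
    return list(best_tour), best_len

# Corners of a unit square listed in a crossing order; the best tour has length 4.
points = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
tour, length = brute_force_tsp(points)
print(tour, round(length, 4))
```

If a dataset tour on a TSP-5 instance is longer than the brute-force length, the dataset label is not optimal for that instance.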


In [28]:
# full evaluation on TSP-20
print("Evaluating Concorde on TSP-20 (this will take some time)...")
results_concorde_20 = evaluate_baselines(
    tsp20_test[:100],  # Start with 100 instances for now
    methods=['concorde', 'dataset_output'],
    progress=True
)
print_results_table(results_concorde_20, problem_size=20)

Evaluating Concorde on TSP-20 (this will take some time)...
Processing instance 0/100...
Processing instance 10/100...
Processing instance 20/100...
Processing instance 30/100...
Processing instance 40/100...
Processing instance 50/100...
Processing instance 60/100...
Processing instance 70/100...
Processing instance 80/100...
Processing instance 90/100...

TSP-20 Evaluation Results
Method            Avg Length  Avg Gap (%)    Avg Ratio Success Rate
------------------------------------------------------------
concorde              3.8779         0.00       1.0000      100.00%
dataset_output        4.2836        10.47       1.1047      100.00%



## Train Supervised Baseline
Simplified training loop for now

In [29]:
# load training data (using mixed size data for better generalization)
print("Loading training data...")
train_files = list((TSP520_TRAIN_DIR).glob('*.txt'))
print(f"Found {len(train_files)} training files")

train_data = []
for file in train_files:  
    train_data.extend(load_tsp_data(str(file)))

print(f"Loaded {len(train_data)} training instances")

# filter to same size for batching (use TSP-20 instances)
train_data_20 = [(coords, tour) for coords, tour in train_data if len(coords) == 20]
print(f"Filtered to {len(train_data_20)} TSP-20 instances for training")

# create dataset and dataloader
train_dataset = TSPDataset(train_data_20)
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)

Loading training data...
Found 16 training files
Loaded 1600000 training instances
Filtered to 100000 TSP-20 instances for training


In [30]:
# init model
model = SupervisedTSPModel(input_dim=2, hidden_dim=HIDDEN_DIM)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
criterion = nn.CrossEntropyLoss()

print(f"Model parameters: {sum(p.numel() for p in model.parameters()):,}")
print(f"Model device: {next(model.parameters()).device}")

Model parameters: 33,537
Model device: cpu


In [31]:
# training loop
print("\nTraining supervised model...")
model.train()
losses = []

for epoch in range(NUM_EPOCHS):
    epoch_losses = []
    
    for batch_idx, (coords, tours) in enumerate(train_loader):
        # Move data to device
        coords = coords.to(DEVICE)
        tours = tours.to(DEVICE)
        
        optimizer.zero_grad()
        
        # Forward pass
        logits = model(coords)  # (batch, seq_len)
        
        # predict the first node in the tour
        # simplification: just predict which node should be visited first
        target = tours[:, 0] - 1  # First node in tour, convert to 0-indexed
        batch_loss = criterion(logits, target)
        
        # Backward pass
        batch_loss.backward()
        optimizer.step()
        
        epoch_losses.append(batch_loss.item())
        
        if batch_idx % 10 == 0:
            print(f"Epoch {epoch+1}/{NUM_EPOCHS}, Batch {batch_idx}/{len(train_loader)}, Loss: {batch_loss.item():.4f}")
    
    avg_loss = np.mean(epoch_losses)
    losses.append(avg_loss)
    print(f"Epoch {epoch+1} completed. Average loss: {avg_loss:.4f}")

print("\nTraining completed!")


Training supervised model...
Epoch 1/10, Batch 0/3125, Loss: 2.9932
Epoch 1/10, Batch 10/3125, Loss: 2.9896
Epoch 1/10, Batch 20/3125, Loss: 3.0021
Epoch 1/10, Batch 30/3125, Loss: 2.9915
Epoch 1/10, Batch 40/3125, Loss: 2.9964
Epoch 1/10, Batch 50/3125, Loss: 2.9875
Epoch 1/10, Batch 60/3125, Loss: 2.9989
Epoch 1/10, Batch 70/3125, Loss: 2.9901
Epoch 1/10, Batch 80/3125, Loss: 3.0078
Epoch 1/10, Batch 90/3125, Loss: 2.9960
Epoch 1/10, Batch 100/3125, Loss: 2.9964
Epoch 1/10, Batch 110/3125, Loss: 3.0044
Epoch 1/10, Batch 120/3125, Loss: 2.9931
Epoch 1/10, Batch 130/3125, Loss: 2.9963
Epoch 1/10, Batch 140/3125, Loss: 2.9953
Epoch 1/10, Batch 150/3125, Loss: 2.9943
Epo
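
The logged losses stay between roughly 2.99 and 3.01. A useful reference point: `nn.CrossEntropyLoss` uses the natural logarithm, so a model that spreads probability uniformly over 20 candidate nodes has a loss of ln 20. The short check below computes that value; `uniform_cross_entropy` is a hypothetical helper, not part of the training code.

```python
import math

def uniform_cross_entropy(num_classes):
    """Cross-entropy (in nats) of a uniform prediction over `num_classes` classes."""
    return -math.log(1.0 / num_classes)

print(f"{uniform_cross_entropy(20):.4f}")  # prints 2.9957
```

The losses in the portion of the log shown are therefore consistent with near-uniform predictions, at least for these early batches.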

In [32]:
# save model
torch.save(model.state_dict(), 'supervised_tsp_model.pt')
print("Model saved to supervised_tsp_model.pt")

Model saved to supervised_tsp_model.pt


## Evaluate Supervised Baseline

In [33]:
# quick test on single instance
coords, dataset_tour = tsp20_test[0]
supervised_tour, supervised_length = greedy_decode(model, coords)
concorde_tour, concorde_length = concorde_tsp(coords)

print(f"Supervised: {supervised_length:.4f}")
print(f"Concorde (optimal): {concorde_length:.4f}")
print(f"Gap: {((supervised_length - concorde_length) / concorde_length * 100):.2f}%")

Supervised: 6.9455
Concorde (optimal): 3.5178
Gap: 97.44%


## Comprehensive Evaluation

In [34]:
# full evaluation on TSP-5
print("\nComprehensive evaluation on TSP-5...")
results_tsp5 = evaluate_baselines(
    tsp5_test[:500],
    methods=['supervised', 'christofides', 'ortools', 'concorde', 'dataset_output'],
    supervised_model=model,
    progress=True
)
print_results_table(results_tsp5, problem_size=5)


Comprehensive evaluation on TSP-5...
Processing instance 0/500...
Processing instance 10/500...
Processing instance 20/500...
Processing instance 30/500...
Processing instance 40/500...
Processing instance 50/500...
Processing instance 60/500...
Processing instance 70/500...
Processing instance 80/500...
Processing instance 90/500...
Processing instance 100/500...
Processing instance 110/500...
Processing instance 120/500...
Processing instance 130/500...
Processing instance 140/500...
Processing instance 150/500...
Processing instance 160/500...
Processing instance 170/500...

In [35]:
# full evaluation on TSP-10
print("\nComprehensive evaluation on TSP-10...")
results_tsp10 = evaluate_baselines(
    tsp10_test[:500],
    methods=['supervised', 'christofides', 'ortools', 'concorde', 'dataset_output'],
    supervised_model=model,
    progress=True
)
print_results_table(results_tsp10, problem_size=10)


Comprehensive evaluation on TSP-10...
Processing instance 0/500...
Processing instance 10/500...
Processing instance 20/500...
Processing instance 30/500...
Processing instance 40/500...
Processing instance 50/500...
Processing instance 60/500...
Processing instance 70/500...
Processing instance 80/500...
Processing instance 90/500...
Processing instance 100/500...
Processing instance 110/500...
Processing instance 120/500...
Processing instance 130/500...
Processing instance 140/500...
Processing instance 150/500...
Processing instance 160/500...

In [36]:
# full evaluation on TSP-20
print("\nComprehensive evaluation on TSP-20...")
results_tsp20 = evaluate_baselines(
    tsp20_test[:500],
    methods=['supervised', 'christofides', 'ortools', 'concorde', 'dataset_output'],
    supervised_model=model,
    progress=True
)
print_results_table(results_tsp20, problem_size=20)


Comprehensive evaluation on TSP-20...
Processing instance 0/500...
Processing instance 10/500...
Processing instance 20/500...
Processing instance 30/500...
Processing instance 40/500...
Processing instance 50/500...
Processing instance 60/500...
Processing instance 70/500...
Processing instance 80/500...
Processing instance 90/500...
Processing instance 100/500...
Processing instance 110/500...
Processing instance 120/500...
Processing instance 130/500...
Processing instance 140/500...
Processing instance 150/500...

In [37]:
# full evaluation on TSP-50 (smaller sample due to computational cost)
print("\nComprehensive evaluation on TSP-50...")
results_tsp50 = evaluate_baselines(
    tsp50_test[:100],
    methods=['supervised', 'christofides', 'ortools', 'concorde', 'dataset_output'],
    supervised_model=model,
    progress=True
)
print_results_table(results_tsp50, problem_size=50)


Comprehensive evaluation on TSP-50...
Processing instance 0/100...
Processing instance 10/100...
Processing instance 20/100...
Processing instance 30/100...
Processing instance 40/100...
Processing instance 50/100...
Processing instance 60/100...
Processing instance 70/100...
Processing instance 80/100...
Processing instance 90/100...

TSP-50 Evaluation Results
Method            Avg Length  Avg Gap (%)    Avg Ratio Success Rate
------------------------------------------------------------
supervised           16.5614       191.68       2.9168      100.00%
christofides          6.3437        11.61       1.1161      100.00%
ortools               5.8381         2.73       1.0273      100.00%
concorde              5.6834    

## Result Summary

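1. **Christofides Algorithm**: Carries a 1.5-approximation guarantee on metric instances. In the runs shown here it was 0.69% longer than the dataset tour on a single TSP-20 instance and averaged an 11.61% gap on TSP-50 (100 instances).
2. **OR-Tools**: Google's optimization solver with guided local search. It is the closest heuristic to Concorde, with an average gap of 0.88% on TSP-20 and 2.73% on TSP-50 (100 instances each).
3. **Concorde**: Exact solver used as the reference (0.00% gap); it produces the shortest tours.
4. **Supervised Learning**: Neural network trained on the dataset tours. With the simplified first-node training used here it is far from competitive: a 97.44% gap on the single TSP-20 test instance and a 191.68% average gap on TSP-50.
5. **Dataset output**: Tours labeled 'optimal' in the given test dataset. On TSP-20 they average 10.47% longer than Concorde's tours (100 instances), so they should not be treated as true optima.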

## Timing Evaluation

Measure execution time for each baseline method across different TSP sizes.
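
The function `measure_and_return_times` comes from the `measure_times` module, whose code is not shown here. As a minimal sketch of how a per-instance average could be measured, assuming each solver is a callable that takes one instance, wall-clock time can be taken with `time.perf_counter`. The helper name and the toy "solver" are hypothetical.

```python
import time

def average_seconds_per_instance(solver, instances):
    """Mean wall-clock time of `solver(instance)` over `instances`."""
    if not instances:
        return 0.0
    start = time.perf_counter()
    for instance in instances:
        solver(instance)
    return (time.perf_counter() - start) / len(instances)

# Toy usage with a stand-in "solver" that just sorts the points.
toy_instances = [[(0.3, 0.1), (0.9, 0.4), (0.2, 0.8)] for _ in range(100)]
avg = average_seconds_per_instance(sorted, toy_instances)
print(f"{avg:.6f}s per instance")
```

Timing the whole loop and dividing by the instance count keeps timer overhead small relative to very fast solvers, at the cost of hiding per-instance variance.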

In [38]:
from measure_times import measure_and_return_times

# This will measure execution times for all baseline methods
# on TSP-5, TSP-10, TSP-20, and TSP-50 (100 instances each)
print("Measuring execution times for all baselines...")
print("This may take several minutes...\n")

timing_df = measure_and_return_times()

Measuring execution times for all baselines...
This may take several minutes...


Measuring execution times for baseline methods


--- Processing TSP-5 ---
Loaded 100 test instances
Processing instance 0/100...
Processing instance 10/100...
Processing instance 20/100...
Processing instance 30/100...
Processing instance 40/100...
Processing instance 50/100...
Processing instance 60/100...
Processing instance 70/100...
Processing instance 80/100...
Processing instance 90/100...

TSP-5 Average Times (seconds per instance):
  concorde            : 0.000002s
  christofides        : 0.001137s
  ortools             : 0.002555s
  dataset_output      : 0.000001s

--- Processing TSP-10 ---

In [40]:
# Save timing results for use in report
timing_df.to_csv('timing_results.csv', index=False)
print("Timing results saved to timing_results.csv")

Timing results saved to timing_results.csv
