# RL-WMATA: Metro Station Placement Optimization
## Google Colab Setup with GPU Support

This notebook sets up and runs the RL-WMATA project on Google Colab with GPU acceleration.


## 1. Setup and Installation


In [1]:
# Check GPU availability
import torch
if torch.cuda.is_available():
    print(f"✅ GPU Available: {torch.cuda.get_device_name(0)}")
    print(f"   CUDA Version: {torch.version.cuda}")
    print(f"   GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    DEVICE = 'cuda'
else:
    print("⚠️  GPU not available, using CPU")
    DEVICE = 'cpu'


✅ GPU Available: NVIDIA L4
   CUDA Version: 12.6
   GPU Memory: 23.80 GB


In [2]:
# Install dependencies
!pip install -q "stable-baselines3[extra]" gymnasium geopandas osmnx networkx pandas numpy matplotlib tqdm tensorboard Pillow shapely fiona pyproj rtree

# Verify installation
import stable_baselines3
import gymnasium
print(f"✅ Stable-Baselines3: {stable_baselines3.__version__}")
print(f"✅ Gymnasium: {gymnasium.__version__}")


✅ Stable-Baselines3: 2.7.0
✅ Gymnasium: 1.2.2


Gym has been unmaintained since 2022 and does not support NumPy 2.0 amongst other critical functionality.
Please upgrade to Gymnasium, the maintained drop-in replacement of Gym, or contact the authors of your software and request that they upgrade.
See the migration guide at https://gymnasium.farama.org/introduction/migration_guide/ for additional information.


## 2. Upload Project Files

Upload your project files to Colab, or clone the repository from GitHub.


In [None]:
# Option 1: Upload project files using Colab's file upload
# Then uncomment:
# %cd /content/RL-WMATA

# Option 2: Clone from GitHub (if you have a repo)
# !git clone https://github.com/yourusername/RL-WMATA.git
# %cd RL-WMATA


## 3. Upload Data Files


In [11]:
# Option 1: Mount Google Drive (if data is in Drive)
from google.colab import drive

drive.mount('/content/drive', force_remount=True)

# Copy data from Drive to working directory
# !cp -r /content/drive/MyDrive/RL-WMATA/data ./data


Mounted at /content/drive


In [12]:
import os
os.listdir('/content/drive/')


['.shortcut-targets-by-id', 'MyDrive', 'Shareddrives', '.Trash-0']

In [14]:
# Option 2: Upload data folder directly
from google.colab import files
import zipfile
import os

os.chdir('/content/drive/MyDrive/RL')
# Upload data.zip file
# uploaded = files.upload()

# Extract if needed
# for filename in uploaded.keys():
#     if filename.endswith('.zip'):
#         with zipfile.ZipFile(filename, 'r') as zip_ref:
#             zip_ref.extractall('.')
#         print(f"✅ Extracted {filename}")


### 4.1. Validate Candidates Data (IMPORTANT!)

**⚠️ Critical Step**: Check that your candidates data has correct coordinates. If all candidates share the same coordinates, training won't work properly!


In [15]:
# Check candidates data for issues
!python check_candidates_data.py


Candidates Data Diagnostic

📄 Checking GeoJSON: data/prepared/candidates_final.geojson
   Total candidates: 86
   Columns: ['population', 'NAME', 'state', 'county', 'tract', 'index_right', 'bbox_west', 'bbox_south', 'bbox_east', 'bbox_north', 'place_id', 'osm_type', 'osm_id', 'lat', 'lon', 'class', 'type', 'place_rank', 'importance', 'addresstype', 'name', 'display_name', 'distance_to_nearest_station', 'candidate_id', 'geometry']
   ✅ Has lat/lon columns
   Lat range: 38.895037 to 38.895037
   Lon range: -77.036543 to -77.036543
   Unique coordinates: 1/86
   Sample duplicates:
  candidate_id        lat        lon
0           C0  38.895037 -77.036543
1           C1  38.895037 -77.036543
2           C2  38.895037 -77.036543
3           C3  38.895037 -77.036543
4           C4  38.895037 -77.036543
5           C5  38.895037 -77.036543
6           C6  38.895037 -77.036543
7           C7  38.895037 -77.036543
8           C8  38.895037 -77.036543
9           C9  38.895037 -77.036543
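
The diagnostic above boils down to counting distinct (lat, lon) pairs. As a minimal sketch of that idea (not the actual contents of `check_candidates_data.py`, which are not shown here), the hypothetical helper `coordinate_diversity` below works on plain dictionaries and uses only the standard library:

```python
from collections import Counter


def coordinate_diversity(rows, lat_key="lat", lon_key="lon", ndigits=6):
    """Return (unique_count, total_count, most_common) for a list of row dicts.

    most_common is ((lat, lon), occurrences) or None when rows is empty.
    Coordinates are rounded so that float noise does not hide duplicates.
    """
    coords = [
        (round(float(row[lat_key]), ndigits), round(float(row[lon_key]), ndigits))
        for row in rows
    ]
    counts = Counter(coords)
    most_common = counts.most_common(1)[0] if counts else None
    return len(counts), len(coords), most_common


# Illustrative rows: two candidates share one coordinate, one is distinct.
rows = [
    {"candidate_id": "C0", "lat": 38.895037, "lon": -77.036543},
    {"candidate_id": "C1", "lat": 38.895037, "lon": -77.036543},
    {"candidate_id": "C2", "lat": 38.883599, "lon": -77.056161},
]

unique, total, most_common = coordinate_diversity(rows)
print(f"Unique coordinates: {unique}/{total}")
if unique < total:
    print(f"Most repeated coordinate: {most_common}")
```

A result of 1/86, as in the output above, means every candidate collapses onto a single point, so distances between candidates carry no information.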
   

### 4.2. Fix Candidates Data (if needed)

If the check above shows duplicate coordinates, run this to fix it:


In [16]:
# Fix candidates data (regenerates CSV from GeoJSON with correct coordinates)
!python fix_candidates_data.py


Fixing Candidates Data

📄 Loading GeoJSON: data/prepared/candidates_final.geojson
   Loaded 86 candidates
   Extracting coordinates from geometry...
   ✅ Extracted lat/lon from geometry

📊 Coordinate Analysis:
   Total candidates: 86
   Unique coordinates: 86
   Duplicate rate: 0.0%
   ✅ Good coordinate diversity

📍 Coordinate Ranges:
   Lat: 38.800181 to 38.976988
   Lon: -77.110532 to -76.924897

💾 Saving to CSV: data/prepared/candidates_final.csv
   ✅ Saved 86 candidates to CSV

✅ Verification:
   CSV has 86 candidates
   Unique coordinates in CSV: 86
   Sample from CSV:
  candidate_id        lat        lon  population
0           C0  38.927786 -77.110532        5979
1           C1  38.865206 -76.978971        5568
2           C2  38.883599 -77.056161        5308
3           C3  38.800181 -77.032056        5144
4           C4  38.819554 -77.005104        4919

✅ Candidates data fixed!

Next steps:
1. Retrain your models with the fixed data
2. The models should 
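
The fix script reports that it extracts lat/lon from the geometry instead of trusting the stale `lat`/`lon` property columns. The sketch below illustrates that idea with the standard library only. It assumes Point geometries (the real file may use other geometry types), and the function name `rows_from_geojson` is hypothetical, not taken from `fix_candidates_data.py`. GeoJSON stores positions as `[longitude, latitude]`, which is an easy order to get wrong.

```python
import json


def rows_from_geojson(feature_collection):
    """Build flat rows with lat/lon taken from each feature's Point geometry."""
    rows = []
    for feature in feature_collection["features"]:
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # assumption: this sketch only handles Point geometries
        lon, lat = geometry["coordinates"][:2]  # GeoJSON order is [lon, lat]
        props = feature.get("properties") or {}
        rows.append({
            "candidate_id": props.get("candidate_id"),
            "lat": lat,
            "lon": lon,
            "population": props.get("population"),
        })
    return rows


# Illustrative feature: the property columns hold the stale duplicate
# coordinate, while the geometry holds the real location.
sample = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"candidate_id": "C0", "population": 5979,
                     "lat": 38.895037, "lon": -77.036543},
      "geometry": {"type": "Point", "coordinates": [-77.110532, 38.927786]}
    }
  ]
}
""")

fixed_rows = rows_from_geojson(sample)
print(fixed_rows)
```

The stale property values are ignored on purpose, so the resulting rows reflect the geometry only.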

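### 4.3. Run Data Improvements

The cell below runs the full data validation and preparation sequence: check the candidates data, fix it if needed, then run the data improvement setup.

**Important**: Before training, validate your candidates data to ensure the coordinates are correct!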


In [17]:
# Step 1: Check candidates data for issues
print("🔍 Checking candidates data...")
!python check_candidates_data.py

# Step 2: Fix candidates data if needed (regenerates CSV from GeoJSON)
print("\n🔧 Fixing candidates data (if needed)...")
!python fix_candidates_data.py

# Step 3: Run data improvement setup
print("\n📊 Running data improvements...")
!python setup_data_improvements.py


🔍 Checking candidates data...

## 5. Train Models with GPU

**Note:** The training scripts automatically:
- Detect GPU availability
- Use 4 environments vectorized with DummyVecEnv, which steps them one after another in a single process (this avoids multiprocessing overhead when each environment step is cheap)
- Adjust settings for small test runs
- Test the environment before training
- Provide progress diagnostics
- **Performance optimized**: 30-60x faster than before!

**Recommended timesteps:**
- **Test run**: 1,000-10,000 timesteps (~1-2 minutes on GPU)
- **Short training**: 100,000 timesteps (~15-25 minutes on GPU)
- **Full training**: 1,000,000 timesteps (~2-3 hours on GPU)

**⚠️ Important**: Make sure you have fixed the candidates data (Step 4) before starting a long training run!
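
The time estimates above imply a rough training throughput, which is handy for budgeting a Colab session. The following sketch only does the arithmetic on the figures quoted in this section. The helper name `steps_per_second` is illustrative, and actual speed will depend on the GPU and the environment.

```python
def steps_per_second(timesteps, minutes_low, minutes_high):
    """Return (slowest, fastest) throughput implied by a wall-clock range."""
    slowest = timesteps / (minutes_high * 60)
    fastest = timesteps / (minutes_low * 60)
    return slowest, fastest


# (timesteps, minutes_low, minutes_high), taken from the estimates above.
estimates = {
    "test run (10,000 steps)": (10_000, 1, 2),
    "short training (100,000 steps)": (100_000, 15, 25),
    "full training (1,000,000 steps)": (1_000_000, 120, 180),
}

for name, (steps, low, high) in estimates.items():
    slowest, fastest = steps_per_second(steps, low, high)
    print(f"{name}: roughly {slowest:.0f}-{fastest:.0f} steps/s")
```

All three estimates land around 100 steps per second. If a quick test run is much slower than that, it is worth investigating before committing to the full 1,000,000 timesteps.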


In [None]:
# Train PPO with GPU (automatically detected)
# Performance: ~1-2 min for 10k timesteps, ~15-25 min for 100k timesteps

# Quick test (recommended first - verifies everything works)
# print("🧪 Running quick test (1000 timesteps)...")
# !python -m agents.train_ppo --total_timesteps 1000

# Short training (good for initial results)
# !python -m agents.train_ppo --total_timesteps 100000

# Full training (best results - takes 2-3 hours)
!python -m agents.train_ppo --total_timesteps 1000000


Streaming output truncated to the last 5000 lines.
DEBUG Coverage: covered_population=15365, total_population=273272, coverage=0.056
DEBUG Coverage: placed_indices=[0]
DEBUG Coverage: min_distances range: 0.0m to 16836.2m
DEBUG Coverage: covered_mask sum: 1/86 candidates
DEBUG Coverage: catchment_radius=800m
DEBUG Coverage: covered_population=5979, total_population=273272, coverage=0.022
DEBUG Coverage: placed_indices=[0]
DEBUG Coverage: min_distances range: 0.0m to 16836.2m
DEBUG Coverage: covered_mask sum: 1/86 candidates
DEBUG Coverage: catchment_radius=800m
DEBUG Coverage: covered_population=5979, total_population=273272, coverage=0.022
DEBUG Coverage: placed_indices=[0 1]
DEBUG Coverage: min_distances range: 0.0m to 9540.8m
DEBUG Coverage: covered_mask sum: 3/86 candidates
DEBUG Coverage: catchment_radius=800m
DEBUG Coverage: covered_population=15365, total_population=273272, coverage=0.056
DEBUG C
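
The debug lines suggest how coverage is scored: each candidate's distance to its nearest placed station is compared with an 800 m catchment radius, and the covered population is divided by the total population. The sketch below reconstructs that calculation from the log alone. It is an assumption about the environment's logic, not its actual code, and it uses haversine distances on a few rows from the fixed CSV sample. The names `haversine_m` and `coverage` are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def coverage(candidates, placed_indices, catchment_radius_m=800.0):
    """Share of total population within the catchment of any placed station."""
    total = sum(c["population"] for c in candidates)
    if total == 0 or len(placed_indices) == 0:
        return 0.0
    covered = 0
    for c in candidates:
        # Distance from this candidate to its nearest placed station.
        min_dist = min(
            haversine_m(c["lat"], c["lon"], candidates[i]["lat"], candidates[i]["lon"])
            for i in placed_indices
        )
        if min_dist <= catchment_radius_m:
            covered += c["population"]
    return covered / total


# First five rows of the fixed CSV sample shown in section 4.2.
sample = [
    {"candidate_id": "C0", "lat": 38.927786, "lon": -77.110532, "population": 5979},
    {"candidate_id": "C1", "lat": 38.865206, "lon": -76.978971, "population": 5568},
    {"candidate_id": "C2", "lat": 38.883599, "lon": -77.056161, "population": 5308},
    {"candidate_id": "C3", "lat": 38.800181, "lon": -77.032056, "population": 5144},
    {"candidate_id": "C4", "lat": 38.819554, "lon": -77.005104, "population": 4919},
]

print(f"coverage with station at C0: {coverage(sample, [0]):.3f}")
print(f"coverage with stations at C0 and C1: {coverage(sample, [0, 1]):.3f}")
```

This matches the log in one respect: with only index 0 placed, the covered population reported is 5979, which is exactly the population of C0. In other words, a single station covers little beyond its own candidate at an 800 m radius.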

In [None]:
# Train DQN with GPU (automatically detected)
# Performance: ~1-2 min for 10k timesteps, ~15-25 min for 100k timesteps

# Quick test (recommended first - verifies everything works)
# print("🧪 Running quick test (1000 timesteps)...")
# !python -m agents.train_dqn --total_timesteps 1000

# Short training (good for initial results)
# !python -m agents.train_dqn --total_timesteps 100000

# Full training (best results - takes 2-3 hours)
!python -m agents.train_dqn --total_timesteps 1000000


Streaming output truncated to the last 5000 lines.
DEBUG Coverage: covered_population=9386, total_population=273272, coverage=0.034
DEBUG Coverage: placed_indices=[1]
DEBUG Coverage: min_distances range: 0.0m to 13994.6m
DEBUG Coverage: covered_mask sum: 2/86 candidates
DEBUG Coverage: catchment_radius=800m
DEBUG Coverage: covered_population=9386, total_population=273272, coverage=0.034
DEBUG Coverage: placed_indices=[1]
DEBUG Coverage: min_distances range: 0.0m to 13994.6m
DEBUG Coverage: covered_mask sum: 2/86 candidates
DEBUG Coverage: catchment_radius=800m
DEBUG Coverage: covered_population=9386, total_population=273272, coverage=0.034
DEBUG Coverage: placed_indices=[ 1 35]
DEBUG Coverage: min_distances range: 0.0m to 13994.6m
DEBUG Coverage: covered_mask sum: 3/86 candidates
DEBUG Coverage: catchment_radius=800m
DEBUG Coverage: covered_population=12671, total_population=273272, coverage=0.046
DEBUG

## 6. Monitor Training with TensorBoard


In [None]:
# Load TensorBoard extension
%load_ext tensorboard
%tensorboard --logdir ./logs/




## 7. Evaluate and Visualize


In [None]:
# Test models and create visualizations
print("📊 Testing PPO model...")
!python test_PPO.py

print("\n📊 Testing DQN model...")
!python test_DQN.py

# Verify candidates data is correct (should show different coordinates)
print("\n🔍 Verifying candidates data...")
!python check_candidates_data.py

print("\n📈 Creating comparison visualization...")
!python visualize_results.py

print("\n🎬 Creating training GIFs...")
!python create_training_gif.py --model both


## 8. Download Results


In [None]:
# Download trained models and visualizations
import os
from google.colab import files

# Models, the comparison plot, and the training GIFs (GIFs exist only if created in step 7)
results = [
    'models/ppo_station_placement.zip',
    'models/dqn_station_placement.zip',
    'visualizations/station_placements.png',
    'visualizations/ppo_training.gif',
    'visualizations/dqn_training.gif',
]

# Skip files that were not produced instead of failing on the first missing one
for path in results:
    if os.path.exists(path):
        files.download(path)
    else:
        print(f"Skipping {path} (not found)")
