# DCM Model Estimation on Google Colab

This notebook estimates discrete choice models (MNL, MXL, HCM, ICLV) with Biogeme on Google Colab.

## Available Models
| Model | Description | Approx. Run Time |
|-------|-------------|------------------|
| MNL Basic | Multinomial Logit | 2-3 min |
| MNL Demographics | MNL with interactions | 3-5 min |
| MXL Basic | Mixed Logit (random coefficients) | 10-15 min |
| HCM Basic | Hybrid Choice Model (1 latent variable) | 15-20 min |
| HCM Full | HCM with 4 latent variables | 25-35 min |
| ICLV | Integrated Choice & Latent Variable | 30-45 min |

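## Instructions
1. In Cell 2, set `GITHUB_USER` (and `REPO_NAME`, if different), then run Cells 1-2 to install dependencies and clone the repository
2. Run Cell 3 and select your model and options from the widgets
3. Run Cell 4 to generate data and estimate the model
4. Run Cell 5 to view the results and Cell 6 to download them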

In [None]:
# Cell 1: Environment Setup
# ========================
import sys

# Check if running in Colab
IN_COLAB = 'google.colab' in sys.modules
print(f"Running in Colab: {IN_COLAB}")

# Check available RAM
try:
    import psutil
    ram_gb = psutil.virtual_memory().total / (1024**3)
    print(f"Available RAM: {ram_gb:.1f} GB")
    if ram_gb < 8:
        print("WARNING: Low RAM - consider upgrading to Colab Pro for complex models")
except ImportError:
    print("Could not check RAM (psutil not available)")

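# Install dependencies. The version specifier must be quoted: unquoted,
# the shell parses ">=" as an output redirection to a file named "=3.3.1".
print("\nInstalling dependencies...")
!pip install -q "biogeme>=3.3.1" pandas numpy scipy scikit-learn matplotlib
print("Installation finished (check above for pip errors).")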
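Because an unquoted `>=` in a shell `pip install` line is parsed as an output redirection, pip can silently install a different version than intended, so it is worth confirming what was actually installed. Below is a minimal sketch using the standard-library `importlib.metadata`. The helper names `version_tuple`, `meets_minimum` and `check_package` are hypothetical, and the naive parsing assumes plain release versions such as `3.3.1`.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v: str) -> tuple:
    """Turn '3.3.1' into (3, 3, 1); stops at the first non-numeric segment."""
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions after padding with zeros, so '3.3' == '3.3.0'."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(name: str, minimum: str) -> bool:
    """Print the installed version of `name` and whether it meets `minimum`."""
    try:
        installed = version(name)
    except PackageNotFoundError:
        print(f"{name}: not installed")
        return False
    ok = meets_minimum(installed, minimum)
    print(f"{name} {installed}: {'OK' if ok else 'older than ' + minimum}")
    return ok

# Usage after the install step:
# check_package("biogeme", "3.3.1")
```

For pre-release or post-release version strings, `packaging.version.Version` is the more robust comparison. The hand-rolled parser here only avoids depending on that package.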

In [None]:
# Cell 2: Clone Repository from GitHub
# ====================================
import os

# IMPORTANT: Replace with your GitHub username/repo
GITHUB_USER = "YOUR_USERNAME"  # <-- CHANGE THIS
REPO_NAME = "dcm-models"       # <-- CHANGE THIS if different
BRANCH = "main"

REPO_URL = f"https://github.com/{GITHUB_USER}/{REPO_NAME}.git"
# Absolute path, so re-running this cell works regardless of the current directory
REPO_DIR = f"/content/{REPO_NAME}"

# Clone if not already cloned; otherwise pull the latest changes
if not os.path.exists(REPO_DIR):
    print(f"Cloning repository from {REPO_URL}...")
    !git clone --depth 1 -b {BRANCH} {REPO_URL} {REPO_DIR}
else:
    print(f"Repository '{REPO_NAME}' already exists. Pulling latest changes...")
    !git -C {REPO_DIR} pull

# List the model subdirectories (hidden folders excluded)
MODELS_DIR = f"{REPO_DIR}/models"
print(f"\nModels directory: {MODELS_DIR}")
available_models = sorted(
    d for d in os.listdir(MODELS_DIR)
    if os.path.isdir(os.path.join(MODELS_DIR, d)) and not d.startswith('.')
)
print(f"Available models: {available_models}")
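Cell 4 runs `model.py` inside the selected model folder, so a folder without that file only fails later, during estimation. Below is a minimal sketch of a stricter listing. It assumes each model keeps its entry point at `models/<name>/model.py`, and `list_runnable_models` is a hypothetical helper, not part of the repository.

```python
import os


def list_runnable_models(models_dir: str) -> list:
    """Return sorted names of non-hidden subfolders that contain model.py."""
    runnable = []
    for name in os.listdir(models_dir):
        path = os.path.join(models_dir, name)
        if name.startswith(".") or not os.path.isdir(path):
            continue  # skip hidden folders (e.g. .git) and plain files
        if os.path.isfile(os.path.join(path, "model.py")):
            runnable.append(name)
    return sorted(runnable)

# Usage: print(list_runnable_models(MODELS_DIR))
```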

In [None]:
# Cell 3: Model Selection (Interactive)
# =====================================
import ipywidgets as widgets
from IPython.display import display, clear_output

# Model dropdown
model_dropdown = widgets.Dropdown(
    options=[
        ('MNL Basic (2-3 min)', 'mnl_basic'),
        ('MNL Demographics (3-5 min)', 'mnl_demographics'),
        ('MXL Basic (10-15 min)', 'mxl_basic'),
        ('HCM Basic (15-20 min)', 'hcm_basic'),
        ('HCM Full (25-35 min)', 'hcm_full'),
        ('ICLV (30-45 min)', 'iclv'),
    ],
    value='mnl_basic',
    description='Model:',
    style={'description_width': '100px'}
)

# Options checkboxes
regenerate_data = widgets.Checkbox(
    value=True,
    description='Regenerate simulated data',
    style={'description_width': 'auto'}
)

run_estimation = widgets.Checkbox(
    value=True,
    description='Run model estimation',
    style={'description_width': 'auto'}
)

# Display widgets
print("Select model and options:")
print("="*40)
display(model_dropdown)
display(regenerate_data)
display(run_estimation)
print("\nThen run the next cell to start estimation.")
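If the widgets do not render (for example, when the notebook is executed non-interactively), the model can be chosen in code instead. Below is a minimal sketch. `MODEL_KEYS` mirrors the dropdown values above, and `resolve_model` is a hypothetical helper. Because Cell 4 reads `model_dropdown.value`, the result would be assigned there, and it must be one of the dropdown's option values.

```python
# Values offered by the dropdown above
MODEL_KEYS = [
    "mnl_basic",
    "mnl_demographics",
    "mxl_basic",
    "hcm_basic",
    "hcm_full",
    "iclv",
]


def resolve_model(choice: str) -> str:
    """Normalise case, spaces and hyphens, then check the name is known."""
    key = choice.strip().lower().replace(" ", "_").replace("-", "_")
    if key not in MODEL_KEYS:
        raise ValueError(f"Unknown model {choice!r}; expected one of {MODEL_KEYS}")
    return key

# Usage: select a model without clicking the dropdown
# model_dropdown.value = resolve_model("HCM Full")
```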

In [None]:
# Cell 4: Run Model Pipeline
# ==========================
import subprocess
import sys
import time

# Get selected model
MODEL = model_dropdown.value
MODEL_DIR = f"/content/{REPO_NAME}/models/{MODEL}"

print("="*60)
print(f"RUNNING: {MODEL.upper()}")
print("="*60)
print(f"Model directory: {MODEL_DIR}")

start_time = time.time()
data_ok = True

# Step 1: Regenerate data (optional)
if regenerate_data.value:
    print("\n[Step 1/2] Generating simulated data...")
    # cwd= runs the script inside the model folder without changing the
    # notebook's own working directory; sys.executable is the notebook's Python
    result = subprocess.run(
        [sys.executable, 'simulate_full_data.py'],
        cwd=MODEL_DIR, capture_output=True, text=True
    )
    if result.returncode != 0:
        data_ok = False
        print(f"ERROR: {result.stderr}")
    else:
        # Print the last 10 lines of output
        for line in result.stdout.strip().split('\n')[-10:]:
            print(f"  {line}")
        print("  Data generated successfully!")

# Step 2: Run estimation (skipped if data generation failed)
if run_estimation.value and not data_ok:
    print("\n[Step 2/2] Skipped: data generation failed.")
elif run_estimation.value:
    print("\n[Step 2/2] Estimating model (this may take a while)...")
    print("-"*60)

    result = subprocess.run(
        [sys.executable, 'model.py'],
        cwd=MODEL_DIR, capture_output=True, text=True
    )

    if result.returncode != 0:
        print(f"ERROR: {result.stderr}")
    else:
        # Print only the last 3000 characters to keep the cell output readable
        print(result.stdout[-3000:])

elapsed = time.time() - start_time
print("\n" + "="*60)
print(f"COMPLETED in {elapsed/60:.1f} minutes ({elapsed:.0f} seconds)")
print("="*60)
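With `capture_output=True`, nothing is shown until the script exits. For HCM Full or ICLV, that can mean half an hour of silence. Below is a minimal sketch that streams the child's output line by line and keeps the last lines for an error summary. `run_streaming` is a hypothetical helper built on the standard `subprocess.Popen` API. Lines only appear promptly if the script flushes its output, which is why the usage note passes Python's `-u` (unbuffered) flag.

```python
import subprocess
import sys
from collections import deque


def run_streaming(cmd, cwd=None, tail_lines=20):
    """Run cmd, echo each output line as it arrives, and return
    (returncode, last `tail_lines` lines of combined stdout/stderr)."""
    tail = deque(maxlen=tail_lines)
    with subprocess.Popen(
        cmd,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so errors appear in order
        text=True,
        bufsize=1,  # line-buffered reads in text mode
    ) as proc:
        for line in proc.stdout:
            line = line.rstrip("\n")
            print(line, flush=True)
            tail.append(line)
    # Leaving the `with` block waits for the process, so returncode is set
    return proc.returncode, list(tail)

# Usage in Cell 4, instead of subprocess.run:
# code, tail = run_streaming([sys.executable, "-u", "model.py"], cwd=MODEL_DIR)
# if code != 0:
#     print("ERROR, last lines:", *tail, sep="\n")
```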

In [None]:
# Cell 5: Display Results
# =======================
import pandas as pd
from IPython.display import display

MODEL_DIR = f"/content/{REPO_NAME}/models/{model_dropdown.value}"

print("\nPARAMETER ESTIMATES")
print("="*60)
try:
    params = pd.read_csv(f'{MODEL_DIR}/results/parameter_estimates.csv')
    display(params)
except FileNotFoundError:
    print("Results file not found - check estimation output above")

print("\nWTP ANALYSIS")
print("="*60)
try:
    wtp = pd.read_csv(f'{MODEL_DIR}/policy_analysis/wtp_results.csv')
    display(wtp)
except FileNotFoundError:
    print("WTP results not found")

print("\nMARKET SHARES")
print("="*60)
try:
    shares = pd.read_csv(f'{MODEL_DIR}/policy_analysis/market_shares.csv')
    display(shares)
except FileNotFoundError:
    print("Market share results not found")
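The three lookups above follow one pattern, so they can be driven from a single table of expected files. Below is a minimal sketch. The relative paths are the ones this cell already reads, and `EXPECTED_OUTPUTS` and `find_result_files` are hypothetical names. The helper only checks which files exist and leaves reading them to `pd.read_csv`, as above.

```python
import os

# Relative paths read by Cell 5, keyed by the section title it prints
EXPECTED_OUTPUTS = {
    "PARAMETER ESTIMATES": "results/parameter_estimates.csv",
    "WTP ANALYSIS": "policy_analysis/wtp_results.csv",
    "MARKET SHARES": "policy_analysis/market_shares.csv",
}


def find_result_files(model_dir: str):
    """Return (found: {title: path}, missing: [titles]) for EXPECTED_OUTPUTS."""
    found, missing = {}, []
    for title, rel_path in EXPECTED_OUTPUTS.items():
        path = os.path.join(model_dir, rel_path)
        if os.path.isfile(path):
            found[title] = path
        else:
            missing.append(title)
    return found, missing

# Usage in the notebook:
# found, missing = find_result_files(MODEL_DIR)
# for title, path in found.items():
#     print(f"\n{title}\n" + "=" * 60)
#     display(pd.read_csv(path))
# if missing:
#     print("Not found:", ", ".join(missing))
```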

In [None]:
# Cell 6: Download Results
# ========================
from google.colab import files
import shutil
import os

MODEL = model_dropdown.value
MODEL_DIR = f"/content/{REPO_NAME}/models/{MODEL}"

print(f"Packaging results for {MODEL}...")

# Create ZIP archives
archives = []

# Results folder
if os.path.exists(f'{MODEL_DIR}/results'):
    archive_path = f'/content/{MODEL}_results'
    shutil.make_archive(archive_path, 'zip', MODEL_DIR, 'results')
    archives.append(f'{archive_path}.zip')
    print(f"  Created: {MODEL}_results.zip")

# Policy analysis folder
if os.path.exists(f'{MODEL_DIR}/policy_analysis'):
    archive_path = f'/content/{MODEL}_policy'
    shutil.make_archive(archive_path, 'zip', MODEL_DIR, 'policy_analysis')
    archives.append(f'{archive_path}.zip')
    print(f"  Created: {MODEL}_policy.zip")

# LaTeX output folder
if os.path.exists(f'{MODEL_DIR}/output/latex'):
    archive_path = f'/content/{MODEL}_latex'
    shutil.make_archive(archive_path, 'zip', f'{MODEL_DIR}/output', 'latex')
    archives.append(f'{archive_path}.zip')
    print(f"  Created: {MODEL}_latex.zip")

# Download all archives
if not archives:
    print("\nNo result folders found - run the estimation in Cell 4 first.")
else:
    print("\nDownloading files...")
for archive in archives:
    try:
        files.download(archive)
    except Exception as e:
        print(f"  Could not download {archive}: {e}")
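Browsers may block, or ask permission for, several downloads triggered at once, so bundling all outputs into one archive can be simpler. Below is a minimal sketch using the standard-library `zipfile` module. `zip_model_outputs` is a hypothetical helper, and the folder list in the usage note mirrors the three folders packaged above.

```python
import os
import zipfile


def zip_model_outputs(model_dir, subdirs, out_path):
    """Write every file under model_dir/<subdir> into one ZIP at out_path,
    keeping paths relative to model_dir. Returns the subdirs included.
    out_path should lie outside model_dir so the archive does not zip itself."""
    included = []
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for sub in subdirs:
            root = os.path.join(model_dir, sub)
            if not os.path.isdir(root):
                continue  # skip folders this model did not produce
            included.append(sub)
            for dirpath, _dirnames, filenames in os.walk(root):
                for fname in filenames:
                    full = os.path.join(dirpath, fname)
                    zf.write(full, arcname=os.path.relpath(full, model_dir))
    return included

# Usage in the notebook:
# out = f"/content/{MODEL}_all_outputs.zip"
# if zip_model_outputs(MODEL_DIR, ["results", "policy_analysis", "output/latex"], out):
#     files.download(out)
```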

## Troubleshooting

### Common Issues

1. **"Module not found" error**
   - Re-run Cell 1 to install dependencies

2. **"Repository not found" error**
   - Check that GITHUB_USER and REPO_NAME are correct in Cell 2
   - Ensure your repository is public

3. **Session timeout**
   - Complex models (HCM Full, ICLV) may take 30+ minutes
   - Consider Colab Pro for longer timeouts

4. **Out of memory**
   - On a paid Colab plan, go to Runtime > Change runtime type and enable High-RAM
   - Otherwise, consider upgrading to Colab Pro
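For the "Module not found" case, a quick check shows which packages the current runtime cannot import before you re-run Cell 1. Below is a minimal sketch using `importlib.util.find_spec`. `missing_modules` is a hypothetical helper, and the list uses import names (for example `sklearn` for scikit-learn), which differ from some pip package names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level import names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names for the packages installed in Cell 1
REQUIRED = ["biogeme", "pandas", "numpy", "scipy", "sklearn", "matplotlib"]

# Usage:
# print(missing_modules(REQUIRED) or "All required modules are importable")
```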

### Getting Help
- Check the output of Cells 1-4 above for error messages
- Ensure all files were committed and pushed to GitHub