# Airline Crew Pairing: Branch-and-Price vs Column Generation

This notebook compares two approaches for solving the Crew Pairing Problem (CPP):

1. **Column Generation (CG)** - Solves only the LP relaxation, generating pairings (columns) on demand
2. **Branch-and-Price (B&P)** - Runs a full branch-and-bound tree with column generation at each node

## Problem Description

Given:
- A set of flight legs that must be covered
- Crew regulations (duty time limits, rest requirements)
- Crew bases (airports where crews are stationed)

Find a minimum-cost set of **pairings** (legal crew schedules) that covers every flight exactly once.

## Why B&P for Crew Pairing?

Crew pairing is a **set partitioning** problem:
- Each flight must be in *exactly one* pairing (equality constraints)
- The LP relaxation is often highly fractional
- Ryan-Foster branching works well for this structure
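
For reference, the set partitioning model can be written out as follows. The notation is introduced here for illustration and is not taken from OpenCG or OpenBP: $F$ is the set of flights, $P$ the set of legal pairings, $c_p$ the cost of pairing $p$, $a_{fp} = 1$ if pairing $p$ covers flight $f$ (0 otherwise), and $x_p$ indicates whether pairing $p$ is selected.

$$
\begin{aligned}
\min \quad & \sum_{p \in P} c_p x_p \\
\text{s.t.} \quad & \sum_{p \in P} a_{fp} x_p = 1 \quad \forall f \in F \\
& x_p \in \{0, 1\} \quad \forall p \in P
\end{aligned}
$$

The LP relaxation replaces $x_p \in \{0, 1\}$ with $x_p \ge 0$, which is what allows fractional pairings to appear.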

B&P is essential when:
- You need proven optimal solutions
- Rounding the CG (LP) solution does not yield a feasible integer solution
- The integrality gap is significant

In [None]:
# OpenCG should be installed as a package (pip install -e path/to/opencg or pip install opencg)
# OpenBP should also be installed (pip install -e path/to/openbp)

# If running from the openbp repo without installation, uncomment:
# import sys
# sys.path.insert(0, "../..")  # Add openbp root

In [2]:
# Import OpenCG for column generation
from opencg.applications.crew_pairing import (
    solve_crew_pairing,
    CrewPairingConfig,
)
from opencg.parsers import KasirzadehParser

# Import OpenBP for branch-and-price
from openbp.solver import BranchAndPrice, BPConfig, BPStatus
from openbp.branching import RyanFosterBranching
from openbp.selection import create_selector

import time

## Loading a Crew Pairing Instance

We use the **Kasirzadeh dataset**, a standard benchmark for crew pairing problems.

In [None]:
# Check for Kasirzadeh dataset using OpenCG's config
from opencg.config import get_data_path

data_path = get_data_path() / "kasirzadeh"

if data_path.exists():
    print(f"Kasirzadeh data found at: {data_path}")
    instances = sorted(data_path.iterdir())
    for inst in instances:
        if inst.is_dir():
            print(f"  {inst.name}")
else:
    print(f"Kasirzadeh data not found at: {data_path}")
    print("Set OPENCG_DATA_PATH environment variable or configure opencg.config")

In [4]:
# Load one benchmark instance for testing (instance2 here)
instance_path = data_path / "instance2" / "instance2"

if instance_path.exists():
    parser = KasirzadehParser()
    problem = parser.parse(str(instance_path))
    
    print(f"Instance loaded: {problem.name}")
    print(f"  Flights: {len(problem.cover_constraints)}")
    print(f"  Network nodes: {problem.network.num_nodes}")
    print(f"  Network arcs: {problem.network.num_arcs}")
    print(f"  Resources: {len(problem.resources)}")
    for r in problem.resources:
        print(f"    - {r.name}")
else:
    print(f"Instance not found at: {instance_path}")

Instance loaded: instance2
  Flights: 1500
  Network nodes: 3002
  Network arcs: 9078
  Resources: 3
    - duty_time
    - flight_time
    - duty_days


## Approach 1: Column Generation Only

First, let's solve with OpenCG's column generation.

In [5]:
if 'problem' in dir():
    # Configure CG solver
    config = CrewPairingConfig(
        max_iterations=30,
        pricing_max_columns=100,
        cols_per_source=3,
        verbose=True,
    )
    
    # Solve with CG
    start = time.time()
    cg_solution = solve_crew_pairing(problem, config)
    cg_time = time.time() - start
    
    print("\n" + "="*60)
    print("COLUMN GENERATION RESULTS")
    print("="*60)
    print(f"LP Objective: {cg_solution.objective:.2f}")
    print(f"Pairings used: {cg_solution.num_pairings}")
    print(f"Coverage: {cg_solution.coverage_pct:.1f}%")
    print(f"Uncovered flights: {len(cg_solution.uncovered_flights)}")
    print(f"Solve time: {cg_time:.2f}s")
    print(f"CG iterations: {cg_solution.iterations}")
    print(f"Total columns: {cg_solution.num_columns}")

Running HiGHS 1.12.0 (git hash: 755a8e0): Copyright (c) 2025 HiGHS under MIT licence terms
FastPerSourcePricing: prebuilding 754 networks...
  Prebuilt in 16.35s
Generating initial columns...
  Found 100 initial columns
Running Column Generation...
 Iter       Objective    Columns    New   Coverage
--------------------------------------------------
LP has 1500 rows; 1600 cols; 1819 nonzeros
Coefficient ranges:
  Matrix  [1e+00, 1e+00]
  Cost    [2e+00, 1e+06]
  Bound   [0e+00, 0e+00]
  RHS     [1e+00, 1e+00]
Presolving model
98 rows, 179 cols, 318 nonzeros  0s
Dependent equations search running on 87 equations with time limit of 1000.00s
Dependent equations search removed 0 rows and 0 nonzeros in 0.00s (limit = 1000.00s)
87 rows, 156 cols, 285 nonzeros  0s
Presolve reductions: rows 87(-1413); columns 156(-1444); nonzeros 285(-1534) 
Solving the presolved LP
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 

## Understanding Crew Pairing Structure

### Pairing Properties

Each pairing (the analogue of a route in vehicle routing) represents a complete crew schedule:
- Starts at a crew base (home airport)
- Contains a sequence of flights and rests
- Ends at the same crew base
- Respects duty time limits
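
The properties above can be expressed as a simple legality check. The sketch below is illustrative only: the `Flight` record, the `is_legal_pairing` helper, the airport codes, and the 10-hour duty limit are assumptions made for this example, not OpenCG's data model or the rules encoded in the Kasirzadeh instances. It also treats the whole pairing as a single duty period and ignores rests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flight:
    """Hypothetical flight record (not OpenCG's data model)."""
    origin: str
    destination: str
    departure: float  # hours since the start of the planning horizon
    arrival: float


def is_legal_pairing(flights, bases, max_duty_hours=10.0):
    """Check the pairing properties listed above for a single-duty pairing."""
    if not flights:
        return False
    first, last = flights[0], flights[-1]
    # Must start at a crew base and end at the same base.
    if first.origin not in bases or last.destination != first.origin:
        return False
    # Consecutive flights must connect in space and in time.
    for prev, nxt in zip(flights, flights[1:]):
        if prev.destination != nxt.origin or nxt.departure < prev.arrival:
            return False
    # Duty time limit (assumed value; real regulations are more detailed).
    return last.arrival - first.departure <= max_duty_hours


out_and_back = [
    Flight("YUL", "YYZ", 8.0, 9.5),
    Flight("YYZ", "YUL", 11.0, 12.5),
]
one_way = out_and_back[:1]

print(is_legal_pairing(out_and_back, bases={"YUL"}))  # True
print(is_legal_pairing(one_way, bases={"YUL"}))       # False: does not return to base
```

In a column generation setting, rules like these are typically enforced inside the pricing subproblem as resource constraints rather than checked after the fact.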

### Network Structure

The time-space network has:
- **Flight arcs**: Actual flights to cover
- **Connection arcs**: Short waits between flights
- **Rest arcs**: Overnight hotel stays
- **Source/Sink arcs**: Base departures/returns
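
To make the distinction between connection and rest arcs concrete, here is a minimal sketch of how the link between two consecutive flights at the same airport might be classified by ground time. The `classify_link` helper and both thresholds are assumptions chosen for illustration; the actual rules OpenCG applies when building the network may differ. The labels mirror the arc type names reported for this instance.

```python
def classify_link(arrival, next_departure, min_connect=0.5, max_connect=4.0):
    """Classify the arc joining one flight's arrival to the next departure.

    Times are in hours. Both thresholds are assumed values, not taken
    from OpenCG or from the Kasirzadeh data.
    """
    ground_time = next_departure - arrival
    if ground_time < min_connect:
        return None  # too tight to connect: no arc is created
    if ground_time <= max_connect:
        return "CONNECTION"  # short wait within the same duty
    return "OVERNIGHT"  # long wait, modeled as a rest


print(classify_link(9.5, 11.0))   # CONNECTION
print(classify_link(20.0, 32.0))  # OVERNIGHT
print(classify_link(9.5, 9.6))    # None
```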

In [6]:
if 'problem' in dir():
    from opencg.core.arc import ArcType
    
    # Count arc types
    arc_types = {}
    for arc in problem.network.arcs:
        t = arc.arc_type.name
        arc_types[t] = arc_types.get(t, 0) + 1
    
    print("Network Arc Types:")
    print("-"*40)
    for t, count in sorted(arc_types.items()):
        print(f"  {t:<15}: {count:>6}")
    print("-"*40)
    print(f"  {'Total':<15}: {sum(arc_types.values()):>6}")

Network Arc Types:
----------------------------------------
  CONNECTION     :   2094
  FLIGHT         :   1500
  OVERNIGHT      :   3976
  SINK_ARC       :    754
  SOURCE_ARC     :    754
----------------------------------------
  Total          :   9078


## Analyzing Solution Quality

Let's analyze the CG solution to understand the structure of pairings.

In [7]:
if 'cg_solution' in dir() and cg_solution.pairings:
    # Distribution of pairing sizes
    sizes = [len(p.covered_items) for p in cg_solution.pairings]
    costs = [p.cost for p in cg_solution.pairings]
    
    print("Pairing Statistics:")
    print("-"*40)
    print(f"  Total pairings: {len(sizes)}")
    print(f"  Average flights per pairing: {sum(sizes)/len(sizes):.1f}")
    print(f"  Min/Max pairing size: {min(sizes)} / {max(sizes)}")
    print(f"  Average pairing cost: {sum(costs)/len(costs):.2f}")
    print(f"  Total cost: {sum(costs):.2f}")
    
    print("\nSample pairings (first 5):")
    for i, pairing in enumerate(cg_solution.pairings[:5]):
        print(f"  Pairing {i+1}: {len(pairing.covered_items)} flights, cost={pairing.cost:.2f}")

Pairing Statistics:
----------------------------------------
  Total pairings: 619
  Average flights per pairing: 8.1
  Min/Max pairing size: 2 / 14
  Average pairing cost: 8.12
  Total cost: 5026.00

Sample pairings (first 5):
  Pairing 1: 2 flights, cost=2.00
  Pairing 2: 2 flights, cost=2.00
  Pairing 3: 2 flights, cost=2.00
  Pairing 4: 2 flights, cost=2.00
  Pairing 5: 2 flights, cost=2.00


## Why Branch-and-Price for Crew Pairing?

The LP solution from column generation may have:

1. **Fractional pairings**: For example, pairing A at value 0.5 and pairing B at value 0.5
2. **Coverage issues**: Some flights may be covered only by fractional pairings
3. **Infeasible rounding**: Rounding values up or down can leave flights covered more than once or not at all
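
A small worked example shows why rounding fails. The three columns below are a made-up LP solution, not output from OpenCG; `coverage` and `round_columns` are hypothetical helpers written for this sketch.

```python
# Hypothetical LP solution: (flights covered by the column, LP value).
lp_columns = [
    (frozenset({1, 2}), 0.5),
    (frozenset({2, 3}), 0.5),
    (frozenset({1, 3}), 0.5),
]
flights = {1, 2, 3}


def coverage(columns, flights):
    """Total value of the columns covering each flight."""
    return {f: sum(x for items, x in columns if f in items) for f in flights}


def round_columns(columns, threshold):
    """Naive rounding: set a column to 1 if its value reaches the threshold."""
    return [(items, 1.0 if x >= threshold else 0.0) for items, x in columns]


print(coverage(lp_columns, flights))                      # every flight: 1.0
print(coverage(round_columns(lp_columns, 0.5), flights))  # every flight: 2.0
print(coverage(round_columns(lp_columns, 0.6), flights))  # every flight: 0.0
```

The LP solution satisfies every equality constraint, yet rounding up covers each flight twice and rounding down covers none. No 0/1 choice over these three columns covers each flight exactly once, so an integer solution requires branching and generating new columns.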

### Ryan-Foster Branching

For set partitioning, Ryan-Foster branching is effective:
- Choose two flights $(i, j)$ for which the total value of the pairings covering both is fractional (strictly between 0 and 1)
- Left branch: Flights $i$ and $j$ must be in the **same** pairing
- Right branch: Flights $i$ and $j$ must be in **different** pairings

Every fractional basic solution of a set partitioning LP contains such a pair, so branching this way eliminates fractional solutions systematically.
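
The branching rule can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the implementation behind `RyanFosterBranching` in OpenBP: the column data is invented, and `find_branching_pair`, `same_branch`, and `differ_branch` are hypothetical helpers.

```python
from itertools import combinations

# Hypothetical fractional LP solution: (flights covered, LP value).
# Every flight is covered with total value 1.0.
lp_columns = [
    (frozenset({1, 2, 3}), 0.5),
    (frozenset({1, 4, 5}), 0.5),
    (frozenset({2, 3}), 0.5),
    (frozenset({4, 5}), 0.5),
]


def find_branching_pair(columns, tol=1e-6):
    """Return a pair (i, j) whose joint coverage is fractional, or None."""
    together = {}
    for items, value in columns:
        for pair in combinations(sorted(items), 2):
            together[pair] = together.get(pair, 0.0) + value
    fractional = {p: v for p, v in together.items() if tol < v < 1 - tol}
    if not fractional:
        return None
    # Prefer the pair closest to 0.5; break ties by the smallest pair.
    return min(fractional, key=lambda p: (abs(fractional[p] - 0.5), p))


def same_branch(columns, i, j):
    """'Same' branch: keep columns containing both flights or neither."""
    return [(s, x) for s, x in columns if (i in s) == (j in s)]


def differ_branch(columns, i, j):
    """'Different' branch: keep columns containing at most one of the two."""
    return [(s, x) for s, x in columns if not (i in s and j in s)]


i, j = find_branching_pair(lp_columns)
print((i, j))  # (1, 2)
print(len(same_branch(lp_columns, i, j)), len(differ_branch(lp_columns, i, j)))  # 2 3
```

Filtering the existing column pool is only half of the work. In a full branch-and-price solver the pricing subproblem must also respect the branching decision, so that newly generated pairings do not violate it.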

In [8]:
# Demonstrate Ryan-Foster branching concept
print("Ryan-Foster Branching for Crew Pairing")
print("="*50)
print()
print("Scenario: Two pairings with fractional values")
print("  Pairing A: covers flights {1, 2, 3}, value = 0.5")
print("  Pairing B: covers flights {1, 4, 5}, value = 0.5")
print()
print("Flight 1 is fractionally covered by both pairings.")
print("Branch on flights (1, 2):")
print()
print("  Left branch: 1 and 2 must be TOGETHER")
print("    -> Forces Pairing A (or similar)")
print("    -> Removes Pairing B from pool")
print()
print("  Right branch: 1 and 2 must be SEPARATE")
print("    -> Removes Pairing A from pool")
print("    -> Keeps Pairing B (and others with 1 but not 2)")

Ryan-Foster Branching for Crew Pairing

Scenario: Two pairings with fractional values
  Pairing A: covers flights {1, 2, 3}, value = 0.5
  Pairing B: covers flights {1, 4, 5}, value = 0.5

Flight 1 is fractionally covered by both pairings.
Branch on flights (1, 2):

  Left branch: 1 and 2 must be TOGETHER
    -> Forces Pairing A (or similar)
    -> Removes Pairing B from pool

  Right branch: 1 and 2 must be SEPARATE
    -> Removes Pairing A from pool
    -> Keeps Pairing B (and others with 1 but not 2)


## Coverage Analysis

Achieving 100% coverage is critical in crew pairing. Let's analyze coverage patterns.

In [9]:
if 'cg_solution' in dir():
    total_flights = len(problem.cover_constraints)
    covered_flights = total_flights - len(cg_solution.uncovered_flights)
    
    print("Coverage Analysis")
    print("="*50)
    print(f"Total flights: {total_flights}")
    print(f"Covered: {covered_flights} ({100*covered_flights/total_flights:.1f}%)")
    print(f"Uncovered: {len(cg_solution.uncovered_flights)}")
    
    if cg_solution.uncovered_flights:
        print(f"\nUncovered flight IDs: {sorted(cg_solution.uncovered_flights)[:20]}")
        if len(cg_solution.uncovered_flights) > 20:
            print(f"  ... and {len(cg_solution.uncovered_flights) - 20} more")
    else:
        print("\nAll flights covered!")

Coverage Analysis
Total flights: 1500
Covered: 1177 (78.5%)
Uncovered: 323

Uncovered flight IDs: [49, 86, 87, 138, 139, 190, 191, 234, 235, 261, 262, 331, 332, 383, 384, 435, 436, 487, 488, 539]
  ... and 303 more


## Comparison: CG vs B&P

Let's summarize the key differences for crew pairing.

In [10]:
print("\n" + "="*60)
print("COMPARISON: CG vs B&P FOR CREW PAIRING")
print("="*60)
print()
print("| Aspect             | Column Generation | Branch-and-Price |")
print("|" + "-"*20 + "|" + "-"*19 + "|" + "-"*18 + "|")
print("| Solution Type      | LP relaxation     | Optimal integer  |")
print("| Optimality Proof   | No                | Yes              |")
print("| Speed              | Fast              | Slower           |")
print("| Coverage Guarantee | May be fractional | 100% or infeas.  |")
print("| Branching          | None              | Ryan-Foster      |")
print()


COMPARISON: CG vs B&P FOR CREW PAIRING

| Aspect             | Column Generation | Branch-and-Price |
|--------------------|-------------------|------------------|
| Solution Type      | LP relaxation     | Optimal integer  |
| Optimality Proof   | No                | Yes              |
| Speed              | Fast              | Slower           |
| Coverage Guarantee | May be fractional | 100% or infeas.  |
| Branching          | None              | Ryan-Foster      |



In [11]:
if 'cg_solution' in dir():
    print("Results for this instance:")
    print("-"*40)
    print(f"CG LP Objective: {cg_solution.objective:.2f}")
    print(f"CG Coverage: {cg_solution.coverage_pct:.1f}%")
    print(f"CG Time: {cg_time:.2f}s")
    print()
    print("B&P would:")
    print("  - Start from CG solution")
    print("  - Branch on flight pairs (Ryan-Foster)")
    print("  - Prove optimality or find better integer solution")
    print("  - Guarantee 100% coverage or prove infeasibility")

Results for this instance:
----------------------------------------
CG LP Objective: 333956202.02
CG Coverage: 78.5%
CG Time: 397.30s

B&P would:
  - Start from CG solution
  - Branch on flight pairs (Ryan-Foster)
  - Prove optimality or find better integer solution
  - Guarantee 100% coverage or prove infeasibility


## Summary

### Key Insights for Crew Pairing

1. **Complex Structure**: Time-space networks with multiple resources
2. **Set Partitioning**: Each flight must be in exactly one pairing
3. **Coverage is Critical**: 100% coverage is a hard requirement
4. **Large Instances**: Real-world instances have thousands of flights

### When to Use B&P for Crew Pairing

- Need proven optimal solutions for cost comparison
- CG solution has coverage issues
- IP rounding gives infeasible solutions
- Benchmarking solver performance

### Comparison with Other Problems

| Problem | Constraints | Typical Gap | B&P Benefit |
|---------|-------------|-------------|-------------|
| Cutting Stock | Covering (≥) | Small | Modest |
| VRP | Partitioning (=) | Medium | Significant |
| Crew Pairing | Partitioning (=) | Can be large | Essential |