# Minimizing Travel Distance League-Wide

**Contributors: Ari Kolahal, Aryan Sehgal, Barrett Ratzlaff, Zhe-Yu Lin**

The following notebook demonstrates our process of building an optimization model that aims to minimize the distance traveled by teams in Major League Baseball over a set period of time. First, we establish a distance matrix between the airports associated with each team.

In [None]:
!pip install airportsdata
!pip install gurobipy
!pip install geopy



In [None]:
import airportsdata
from geopy.distance import geodesic
import pandas as pd
import numpy as np
import gurobipy as gp
from gurobipy import GRB

In [None]:
airports = airportsdata.load('IATA')

# one airport for each team (SNA is listed but not assigned to any team below)
na_airports = ['BOS', 'JFK', 'LGA', 'ORD', 'MDW', 'YYZ', 'TPA', 'BWI', 'PHX',
               'SMF', 'ATL', 'CVG', 'CLE', 'DEN', 'DTW', 'IAH', 'MCI', 'LAX',
               'LGB', 'SNA', 'MIA', 'MKE', 'MSP', 'PHL', 'PIT', 'SFO', 'SAN', 'SEA',
               'STL', 'DFW', 'DCA']

selected_airports = {code: airports[code] for code in na_airports if code in airports}

airport_coords = {}
for code, data in selected_airports.items():
    airport_coords[code] = (data['lat'], data['lon'])


distances = {}
for code1 in na_airports:
    distances[code1] = {}
    for code2 in na_airports:
        if code1 == code2:
            distances[code1][code2] = 0
        else:
            coord1 = airport_coords[code1]
            coord2 = airport_coords[code2]
            # Calculate distance in miles
            dist = geodesic(coord1, coord2).miles
            distances[code1][code2] = round(dist, 2)

distance_df = pd.DataFrame(distances)
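
As a quick sanity check on the matrix, the great-circle (haversine) formula should land close to `geodesic` for any pair of airports. The sketch below uses only the standard library. The coordinates for BOS and JFK are approximate values typed in by hand for illustration, not values read from `airportsdata`, and `haversine_miles` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(coord1, coord2):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, coord1)
    lat2, lon2 = map(math.radians, coord2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Approximate airport coordinates (assumed for illustration only)
bos = (42.3656, -71.0096)
jfk = (40.6413, -73.7781)

bos_jfk = haversine_miles(bos, jfk)
print(round(bos_jfk, 1))
```

The results further down report 186.6 miles for this pair, so a haversine value within a mile or two of that figure suggests the matrix is wired up correctly.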

Next, we build a dataframe associating MLB teams to an airport, a league, and a division.

In [None]:
# connecting teams to airports, divisions, league

team_links = {'Boston Red Sox': ['BOS','AL East', 'AL'],
                'New York Yankees': ['JFK','AL East','AL'],
                'New York Mets': ['LGA','NL East', 'NL'],
                'Chicago White Sox': ['ORD','AL Central', 'AL'],
                'Chicago Cubs': ['MDW','NL Central','NL'],
                'Toronto Blue Jays': ['YYZ','AL East','AL'],
                'Tampa Bay Rays': ['TPA','AL East','AL'],
                'Baltimore Orioles': ['BWI','AL East', 'AL'],
                'Arizona Diamondbacks': ['PHX','NL West', 'NL'],
                'Athletics': ['SMF', 'AL West', 'AL'],
                'Atlanta Braves': ['ATL', 'NL East', 'NL'],
                'Cincinnati Reds': ['CVG', 'NL Central', 'NL'],
                'Cleveland Guardians': ['CLE', 'AL Central', 'AL'],
                'Colorado Rockies': ['DEN', 'NL West', 'NL'],
                'Detroit Tigers': ['DTW','AL Central', 'AL'],
                'Houston Astros': ['IAH','AL West', 'AL'],
                'Kansas City Royals': ['MCI','AL Central', 'AL'],
                'Los Angeles Dodgers': ['LAX','NL West', 'NL'],
                'Los Angeles Angels': ['LGB','AL West', 'AL'],
                'Miami Marlins': ['MIA','NL East', 'NL'],
                'Milwaukee Brewers': ['MKE','NL Central', 'NL'],
                'Minnesota Twins': ['MSP','AL Central', 'AL'],
                'Philadelphia Phillies': ['PHL','NL East', 'NL'],
                'Pittsburgh Pirates': ['PIT','NL Central', 'NL'],
                'San Francisco Giants': ['SFO','NL West', 'NL'],
                'San Diego Padres': ['SAN','NL West', 'NL'],
                'Seattle Mariners': ['SEA','AL West', 'AL'],
                'St. Louis Cardinals': ['STL','NL Central', 'NL'],
                'Texas Rangers': ['DFW','AL West', 'AL'],
                'Washington Nationals': ['DCA','NL East', 'NL']}

tl_df = pd.DataFrame(team_links).T
tl_df.columns = ['Airport', 'Division', 'League']
tl_df = tl_df.reset_index().rename(columns={'index':'Team'})

In [None]:
# filtering for an AL-East only dataframe
al_e = tl_df[tl_df['Division'] == 'AL East']

# for calculating distance below
team_airport = dict(zip(tl_df['Team'], tl_df['Airport']))

def get_distance(team_i, team_j):
  airport_i = team_airport[team_i]
  airport_j = team_airport[team_j]
  return distance_df.loc[airport_i, airport_j]

# Slot model

We will use integer programming to tackle this problem.

For our base model, we will use a simplified representation of time: instead of calendar days, we will use "slots", each representing a series of three games between the same two teams.

To reduce the computational load initially, we will restrict the model to a five-team environment: the teams of the AL East (Boston Red Sox, New York Yankees, Baltimore Orioles, Tampa Bay Rays, Toronto Blue Jays).

Finally, we will use eight slots to represent about a month of play, with each team playing six series.

**Decision Variables**

$$x_{ijs} \in \{0, 1\} \quad \forall i,j \in \{1,\ldots,n\}, \; s \in \{1,\ldots,m\}$$

where $x_{ijs} = 1$ if team $i$ plays at team $j$ in series slot $s$, and 0 otherwise.

**Constraint 1: No self games**

$$x_{iis} = 0 \quad \forall i \in \{1,\ldots,n\}, \; s \in \{1,\ldots,m\}$$

**Constraint 2: All teams must play exactly 6 series**

$$\sum_{j=1}^{n} \sum_{s=1}^{m} x_{ijs} + \sum_{j=1}^{n} \sum_{s=1}^{m} x_{jis} = 6 \quad \forall i \in \{1,\ldots,n\}$$

**Constraint 3: Each team plays at most one series per slot**

$$\sum_{\substack{j=1 \\ j \neq i}}^{n} x_{ijs} + \sum_{\substack{j=1 \\ j \neq i}}^{n} x_{jis} \leq 1 \quad \forall i \in \{1,\ldots,n\}, \; s \in \{1,\ldots,m\}$$

**Constraint 4: Minimum away series requirement**

$$\sum_{j=1}^{n} \sum_{s=1}^{m} x_{ijs} \geq 2 \quad \forall i \in \{1,\ldots,n\}$$

**Constraint 5: Minimum home series requirement**

$$\sum_{j=1}^{n} \sum_{s=1}^{m} x_{jis} \geq 2 \quad \forall i \in \{1,\ldots,n\}$$

**Constraint 6: No back-to-back series against the same opponent**

$$x_{ijs} + x_{ij(s+1)} \leq 1 \quad \forall i,j \in \{1,\ldots,n\}, \; i \neq j, \; s \in \{1,\ldots,m-1\}$$

where $n = 5$ (number of teams) and $m = 8$ (number of series slots).
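
To make the constraints concrete, the sketch below checks a candidate schedule against Constraints 1 through 6 in plain Python, without a solver. A schedule is represented as a set of `(away, home, slot)` triples, one per variable with $x_{ijs} = 1$, using 0-based indices. The function name `violations` and this representation are illustrative choices, not part of the Gurobi model.

```python
from collections import Counter


def violations(schedule, n, m, total=6, min_away=2, min_home=2):
    """List the constraints (C1-C6) that a schedule breaks.

    schedule: set of (i, j, s) triples, meaning team i plays at team j in slot s.
    n: number of teams; m: number of slots.
    """
    problems = []
    away, home, busy = Counter(), Counter(), Counter()
    for i, j, s in schedule:
        if not 0 <= s < m:
            problems.append(f"slot {s} is outside 0..{m - 1}")
        away[i] += 1
        home[j] += 1
        if i == j:
            problems.append(f"C1: team {i} plays itself in slot {s}")
            continue
        busy[i, s] += 1
        busy[j, s] += 1
    for (i, s), count in sorted(busy.items()):
        if count > 1:
            problems.append(f"C3: team {i} plays {count} series in slot {s}")
    for i in range(n):
        played = away[i] + home[i]
        if played != total:
            problems.append(f"C2: team {i} plays {played} series, expected {total}")
        if away[i] < min_away:
            problems.append(f"C4: team {i} has only {away[i]} away series")
        if home[i] < min_home:
            problems.append(f"C5: team {i} has only {home[i]} home series")
    for i, j, s in sorted(schedule):
        if i != j and (i, j, s + 1) in schedule:
            problems.append(f"C6: team {i} at team {j} in slots {s} and {s + 1}")
    return problems


# Toy two-team, two-slot examples with relaxed totals (hypothetical data).
good = {(0, 1, 0), (1, 0, 1)}   # home-and-home swap
bad = {(0, 1, 0), (0, 1, 1)}    # same pairing, same venue, back to back
good_report = violations(good, n=2, m=2, total=2, min_away=0, min_home=0)
bad_report = violations(bad, n=2, m=2, total=2, min_away=0, min_home=0)
print(good_report, bad_report)
```

Note that Constraint 6, as written, only blocks repeating the same away/home pairing in consecutive slots; a rematch with the venues reversed in the next slot passes the check.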


In [None]:

dist_model = gp.Model('distance_model')

num_teams = 5
num_slots = 8

# decision variables - binary x[i,j,s]: 1 if team i plays at team j in series slot s, 0 otherwise
x = {}
for i in range(num_teams):
    for j in range(num_teams):
        for s in range(num_slots):
            x[i,j,s] = dist_model.addVar(vtype=GRB.BINARY,
                                         name=f"team_{i}_vs_team_{j}_in_series_{s}")

# constraint 1 - no self games
for i in range(num_teams):
    for s in range(num_slots):
        dist_model.addConstr(x[i,i,s] == 0, name=f"no_self_play_team_{i}_{s}")

# constraint 2 - all teams must play 6 series.
for i in range(num_teams):
    dist_model.addConstr(
        gp.quicksum(x[i,j,s] for j in range(num_teams) for s in range(num_slots)) +
        gp.quicksum(x[j,i,s] for j in range(num_teams) for s in range(num_slots))
        == 6,
        name=f"total_series_team_{i}"
    )


# constraint 3 - each team plays at most 1 series per slot (home or away)
for i in range(num_teams):
    for s in range(num_slots):
        dist_model.addConstr(
            gp.quicksum(x[i,j,s] for j in range(num_teams) if j != i) +
            gp.quicksum(x[j,i,s] for j in range(num_teams) if j != i)
            <= 1,
            name=f"one_series_per_team_in_slot_{i}_{s}"
        )


# constraint 4 - Number of away series >= required number (set 2 for here)
for i in range(num_teams):
    dist_model.addConstr(
        gp.quicksum(x[i,j,s] for j in range(num_teams) for s in range(num_slots)) >= 2,
        name=f"min_away_team_{i}"
    )

# constraint 5 - Number of home series >= required number (set 2 for here)
for i in range(num_teams):
    dist_model.addConstr(
        gp.quicksum(x[j,i,s] for j in range(num_teams) for s in range(num_slots)) >= 2,
        name=f"min_home_team_{i}"
    )

# constraint 6 - no back-to-back series vs same opponent
for i in range(num_teams):
    for j in range(num_teams):
        if i != j:
            for s in range(num_slots - 1):
                dist_model.addConstr(
                    x[i,j,s] + x[i,j,s+1] <= 1,
                    name=f"no_back_to_back_{i}_{j}_{s}"
                )
dist_model.update()

For the basis of our models, the objective function looks something like this:

$$\min \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \sum_{s=1}^{m} d_{ij} \cdot x_{ijs}$$

Where:
- $n$ = number of teams
- $m$ = number of time slots
- $d_{ij}$ = distance between team $i$'s location and team $j$'s location
- $x_{ijs}$ = binary decision variable (1 if team $i$ plays at team $j$ in time slot $s$, 0 otherwise)
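
As a worked instance of this sum, the sketch below evaluates the objective for a hand-written schedule instead of solver output. The distances are pairwise values that appear in the results table later in this notebook; the three-series schedule itself is hypothetical and is not a feasible solution of the model.

```python
# Pairwise distances in miles, taken from the results table in this notebook.
# Teams are indexed in AL East order: 0 BOS, 1 NYY, 2 TOR, 3 TB, 4 BAL.
d = {
    (0, 1): 186.60,
    (2, 4): 346.59,
    (3, 4): 842.28,
    (2, 3): 1094.58,
    (1, 2): 366.26,
}


def dist(i, j):
    """Symmetric lookup: dist(i, j) == dist(j, i), and dist(i, i) == 0."""
    if i == j:
        return 0.0
    return d[min(i, j), max(i, j)]


# Hypothetical schedule: (away team i, home team j, slot s) where x[i, j, s] = 1.
schedule = [(1, 0, 0), (2, 4, 0), (4, 3, 1)]

# Objective: sum of d_ij over every scheduled series.
objective = sum(dist(i, j) for i, j, s in schedule)
print(round(objective, 2))
```

Each series contributes the one-way distance from the visitor's home city to the host, regardless of where the visitor was in the previous slot.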

In [None]:
# Set objective function
team_names = list(al_e['Team'])

dist_model.setObjective(
    gp.quicksum(
        get_distance(team_names[i], team_names[j]) * x[i,j,s]
        for i in range(num_teams)
        for j in range(num_teams)
        if i != j
        for s in range(num_slots)
    ),
    GRB.MINIMIZE
)

In [None]:
# Run the model
dist_model.optimize()

Gurobi Optimizer version 12.0.3 build v12.0.3rc0 (mac64[rosetta2] - Darwin 24.6.0 24G90)

CPU model: Apple M1
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 235 rows, 200 columns and 1400 nonzeros
Model fingerprint: 0x21eadc56
Variable types: 0 continuous, 200 integer (200 binary)
Coefficient statistics:
  Matrix range     [1e+00, 2e+00]
  Objective range  [2e+02, 1e+03]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 6e+00]
Found heuristic solution: objective 9094.7400000
Presolve removed 40 rows and 40 columns
Presolve time: 0.01s
Presolved: 195 rows, 160 columns, 1240 nonzeros
Variable types: 0 continuous, 160 integer (160 binary)

Root relaxation: objective 7.969950e+03, 65 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0 7969.95000    0    6 9094.74000 7969.950

At first glance, the total distance in our model's solution seems plausible. There are about 1,100 air miles between Tampa and Toronto (the two teams farthest apart in this set), so it is believable that five East Coast teams could collectively travel just under 8,000 miles with six series each.

The below code inspects the results to see what the travel looks like.

In [None]:
# Print the results
rows = []
for s in range(num_slots):
    for i in range(num_teams):
        for j in range(num_teams):
            if i != j and x[i,j,s].X > 0.5:
                rows.append([s, j, i, get_distance(team_names[i], team_names[j])])

df = pd.DataFrame(rows, columns=["Slot", "Home", "Away", "Distance"])
df

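| | Slot | Home | Away | Distance |
|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 186.6 |
| 1 | 0 | 4 | 2 | 346.59 |
| 2 | 1 | 1 | 0 | 186.6 |
| 3 | 1 | 3 | 4 | 842.28 |
| 4 | 2 | 3 | 2 | 1094.58 |
| 5 | 3 | 1 | 0 | 186.6 |
| 6 | 3 | 2 | 3 | 1094.58 |
| 7 | 4 | 1 | 2 | 366.26 |
| 8 | 4 | 4 | 3 | 842.28 |
| 9 | 5 | 1 | 0 | 186.6 |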


In [None]:
# Constraints that we could consider adding: At least one series must be scheduled in each slot.
for s in range(num_slots):
    dist_model.addConstr(
        gp.quicksum(x[i,j,s] for i in range(num_teams) for j in range(num_teams) if i != j) >= 1,
        name=f"slot_used_{s}"
    )

This model is a reasonable start, but it has two fundamental problems that we will need to address:
1. Our initial model has no tracking mechanism for teams. Each away series is charged as a one-way trip from the visiting team's home city, so if a team travels to an away stadium and then flies back home (or on to another away stadium), that second leg is never counted. The model assumes the home stadium is always the origin point.
2. The slots do not yet accurately represent what we want them to: some slots hold only a single series (slot 2 in the table above), leaving three of the five teams idle.
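
The first problem can be illustrated with a small calculation. The sketch below takes a hypothetical road trip for Tampa Bay (away at Baltimore, then away at Toronto, then back home) and compares what the base objective charges with the miles actually flown. The distances come from the results table above; the itinerary is an assumption made for illustration.

```python
# Distances in miles from the results table; the itinerary is hypothetical.
d = {
    frozenset({"TB", "BAL"}): 842.28,
    frozenset({"TB", "TOR"}): 1094.58,
    frozenset({"TOR", "BAL"}): 346.59,
}


def dist(a, b):
    """Symmetric distance lookup; staying put costs nothing."""
    return 0.0 if a == b else d[frozenset({a, b})]


home = "TB"
away_hosts = ["BAL", "TOR"]  # two consecutive away series

# Base model: every away series is charged one way from the home city.
base_objective = sum(dist(home, host) for host in away_hosts)

# Actual travel: TB -> BAL -> TOR -> TB, summing consecutive legs.
stops = [home] + away_hosts + [home]
route_miles = sum(dist(a, b) for a, b in zip(stops, stops[1:]))

print(round(base_objective, 2), round(route_miles, 2))
```

For this itinerary the base objective undercounts the trip, because it ignores both the Baltimore-to-Toronto leg and the return flight while double-charging the departure from home.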


The below represents a slot model that adds a location matrix to track where teams are based on the slots they occupy. This should give a more realistic representation of distance traveled.

# Slot model with true distance

In order to track where teams are and calculate the distance, we have to introduce two more **decision variables**:

$$
loc_{isk} \in \{0,1\}
$$

where $loc_{isk}=1$ if team $i$ stays at team $k$'s stadium in slot $s$, $0$ otherwise.


$$
travel_{is k_1 k_2} \in \{0,1\}
$$

where $travel_{is k_1 k_2}=1$ if team $i$ stays at team $k_1$'s stadium in slot $s$ and at team $k_2$'s stadium in slot $s+1$, $0$ otherwise.


Additionally, we have the following **locational and linking constraints:**

**Location Constraint 1: Each team must stay in exactly one stadium each slot.**

$$
\sum_{k=1}^{n} loc_{isk} = 1,
\quad \forall i \in \{1,\ldots,n\}, \forall s \in \{1,\ldots,m\}
$$

**Location Constraint 2: If team $i$ hosts a series, it must stay at its home stadium.**

$$
loc_{isi}
\ge
\sum_{\substack{j=1 \\ j \neq i}}^{n}  x_{jis},
\quad \forall i \in \{1,\ldots,n\}, \forall s \in \{1,\ldots,m\}
$$

**Location Constraint 3: If team $i$ plays away at team $j$, they must stay at stadium $j$.**

$$
loc_{isj} \ge x_{ijs},
\quad \forall i,j \in \{1,\ldots,n\},\; j \ne i,\; \forall s \in \{1,\ldots,m\}
$$

**Linking Travel and Location (linearization):**
$$
\begin{aligned}
travel_{i s k_1 k_2} &\le loc_{i s k_1} \\
travel_{i s k_1 k_2} &\le loc_{i (s+1) k_2} \\
travel_{i s k_1 k_2} &\ge loc_{i s k_1} + loc_{i (s+1) k_2} - 1,
\quad \forall i, k_1, k_2 \in \{1,\ldots,n\}, \forall s \in \{1,\ldots,m-1\}
\end{aligned}
$$
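
These three inequalities are the standard linearization of a product of two binary variables: together they force the travel variable to equal 1 exactly when both location variables equal 1. A brute-force check over all binary combinations, sketched below in plain Python with illustrative names, confirms this.

```python
from itertools import product


def feasible_travel_values(a, b):
    """Binary values t that satisfy t <= a, t <= b and t >= a + b - 1."""
    return [t for t in (0, 1) if t <= a and t <= b and t >= a + b - 1]


# a plays the role of loc[i, s, k1]; b plays the role of loc[i, s+1, k2].
table = {(a, b): feasible_travel_values(a, b) for a, b in product((0, 1), repeat=2)}

for (a, b), values in table.items():
    print(a, b, values)
```

In every row exactly one value of the travel variable is feasible, and it equals the product `a * b`. Since the objective minimizes a non-negative weighted sum of the travel variables, the third inequality is the one that matters for the optimal value; the first two mainly keep the travel variables interpretable when reading the solution.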

In [None]:
# LOCATION VARIABLES
loc = dist_model.addVars(num_teams, num_slots, num_teams,
                         vtype=GRB.BINARY, name="loc")

for i in range(num_teams):
    for s in range(num_slots):

        # location constraint 1 - team i is in exactly one stadium in every slot,
        # whether or not it plays
        dist_model.addConstr(
            gp.quicksum(loc[i,s,k] for k in range(num_teams)) == 1
        )

        # location constraint 2 - if team i hosts a series, it must be at its home stadium
        dist_model.addConstr(
            loc[i,s,i] >= gp.quicksum(x[j,i,s] for j in range(num_teams) if j != i)
        )

        # location constraint 3 - if team i plays away at team j, it must be at stadium j
        for j in range(num_teams):
            if j != i:
                dist_model.addConstr(
                    loc[i,s,j] >= x[i,j,s]
                )

travel = dist_model.addVars(
    num_teams, num_slots - 1, num_teams, num_teams,
    vtype=GRB.BINARY, name="travel"
)

for i in range(num_teams):
    for s in range(num_slots - 1):
        for k1 in range(num_teams):
            for k2 in range(num_teams):

                dist_model.addConstr(travel[i,s,k1,k2] <= loc[i,s,k1])
                dist_model.addConstr(travel[i,s,k1,k2] <= loc[i,s+1,k2])
                dist_model.addConstr(
                    travel[i,s,k1,k2] >= loc[i,s,k1] + loc[i,s+1,k2] - 1
                )


For our travel distance model, the **objective function** is

$$
\min
\sum_{i=1}^{n}
\sum_{s=1}^{m-1}
\sum_{k_1=1}^{n}
\sum_{k_2=1}^{n}
travel_{i s k_1 k_2} \cdot d_{k_1 k_2}
$$

Where:
- $n$ = number of teams
- $m$ = number of time slots
- $d_{k_1 k_2}$ = distance between team $k_1$'s stadium and team $k_2$'s stadium
- $travel_{i s k_1 k_2}$ = binary decision variable (1 if team $i$ is at stadium $k_1$ in time slot $s$ and at stadium $k_2$ in time slot $s+1$, 0 otherwise)

In [None]:
team_names = list(al_e['Team'])

dist_model.setObjective(
    gp.quicksum(
        travel[i,s,k1,k2] * get_distance(team_names[k1], team_names[k2])
        for i in range(num_teams)
        for s in range(num_slots - 1)
        for k1 in range(num_teams)
        for k2 in range(num_teams)
    ),
    GRB.MINIMIZE
)



In [None]:
dist_model.optimize()

Gurobi Optimizer version 12.0.3 build v12.0.3rc0 (mac64[rosetta2] - Darwin 24.6.0 24G90)

CPU model: Apple M1
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 3108 rows, 1275 columns and 8405 nonzeros
Model fingerprint: 0x97634a49
Variable types: 0 continuous, 1275 integer (1275 binary)
Coefficient statistics:
  Matrix range     [1e+00, 2e+00]
  Objective range  [2e+02, 1e+03]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 6e+00]

MIP start from previous solve produced solution with objective 10523.5 (0.02s)
Loaded MIP start from previous solve with objective 10523.5

Presolve removed 1965 rows and 215 columns
Presolve time: 0.02s
Presolved: 1143 rows, 1060 columns, 4220 nonzeros
Variable types: 0 continuous, 1060 integer (1060 binary)

Root relaxation: objective 0.000000e+00, 477 iterations, 0.01 seconds (0.01 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth 

The minimized distance traveled by all teams is halved by our updated model, which is surprising. The below code shows representations of what the schedules would look like under this solution.

In [None]:
# Series Schedule Table
rows = []
for i in range(num_teams):
    for s in range(num_slots):
        for j in range(num_teams):
            if x[i,j,s].X > 0.5:
                rows.append([team_names[i], s, team_names[j]])

schedule_df = pd.DataFrame(rows, columns=["Away_Team","Slot","Opponent"])
schedule_df.sort_values(["Away_Team","Slot"], inplace=True)
schedule_df

| | Away_Team | Slot | Opponent |
|---|---|---|---|
| 12 | Baltimore Orioles | 1 | Toronto Blue Jays |
| 13 | Baltimore Orioles | 5 | Tampa Bay Rays |
| 14 | Baltimore Orioles | 7 | Tampa Bay Rays |
| 0 | Boston Red Sox | 0 | New York Yankees |
| 1 | Boston Red Sox | 3 | New York Yankees |
| 2 | New York Yankees | 4 | Boston Red Sox |
| 3 | New York Yankees | 6 | Boston Red Sox |
| 8 | Tampa Bay Rays | 0 | Toronto Blue Jays |
| 9 | Tampa Bay Rays | 1 | New York Yankees |
| 10 | Tampa Bay Rays | 2 | Baltimore Orioles |


The below shows the schedule organized by team and slot, giving a clearer picture of each team's path of travel.

In [None]:
rows = []
for i in range(num_teams):
    for j in range(num_teams):
        for s in range(num_slots):
            if x[i,j,s].X > 0.5:
                rows.append([team_names[i], team_names[j], s, "AWAY"])
            if x[j,i,s].X > 0.5:
                rows.append([team_names[i], team_names[j], s, "HOME"])

schedule_df = pd.DataFrame(rows, columns=["Team","Opponent","Slot","Home/Away"])
schedule_df.sort_values(["Team","Slot"], inplace=True)
schedule_df.head(12)


| | Team | Opponent | Slot | Home/Away |
|---|---|---|---|---|
| 24 | Baltimore Orioles | Toronto Blue Jays | 1 | AWAY |
| 26 | Baltimore Orioles | Tampa Bay Rays | 2 | HOME |
| 25 | Baltimore Orioles | Toronto Blue Jays | 3 | HOME |
| 27 | Baltimore Orioles | Tampa Bay Rays | 4 | HOME |
| 28 | Baltimore Orioles | Tampa Bay Rays | 5 | AWAY |
| 29 | Baltimore Orioles | Tampa Bay Rays | 7 | AWAY |
| 0 | Boston Red Sox | New York Yankees | 0 | AWAY |
| 1 | Boston Red Sox | New York Yankees | 3 | AWAY |
| 2 | Boston Red Sox | New York Yankees | 4 | HOME |
| 4 | Boston Red Sox | Toronto Blue Jays | 5 | HOME |


The below table shows the path of travel for each team according to its scheduled series of play.

In [None]:
# Travel Table
travel_rows = []
for i in range(num_teams):
    for s in range(num_slots - 1):
        for j1 in range(num_teams):
            for j2 in range(num_teams):
                if travel[i,s,j1,j2].X > 0.5:
                    travel_rows.append([
                        team_names[i],
                        s,
                        team_names[j1],
                        team_names[j2],
                        get_distance(team_names[j1], team_names[j2])
                    ])

travel_df = pd.DataFrame(
    travel_rows,
    columns=["Team","Slot","From","To","Distance"]
)
travel_df.head(14)

| | Team | Slot | From | To | Distance |
|---|---|---|---|---|---|
| 0 | Boston Red Sox | 0 | New York Yankees | New York Yankees | 0.0 |
| 1 | Boston Red Sox | 1 | New York Yankees | New York Yankees | 0.0 |
| 2 | Boston Red Sox | 2 | New York Yankees | New York Yankees | 0.0 |
| 3 | Boston Red Sox | 3 | New York Yankees | Boston Red Sox | 186.6 |
| 4 | Boston Red Sox | 4 | Boston Red Sox | Boston Red Sox | 0.0 |
| 5 | Boston Red Sox | 5 | Boston Red Sox | Boston Red Sox | 0.0 |
| 6 | Boston Red Sox | 6 | Boston Red Sox | Boston Red Sox | 0.0 |
| 7 | New York Yankees | 0 | New York Yankees | New York Yankees | 0.0 |
| 8 | New York Yankees | 1 | New York Yankees | New York Yankees | 0.0 |
| 9 | New York Yankees | 2 | New York Yankees | New York Yankees | 0.0 |


The below shows a straightforward representation of each team's route.

In [None]:
# Route Summary
print("\n=== LOCATION ROUTE (by slot) ===\n")
for i in range(num_teams):
    route = []
    for s in range(num_slots):
        for k in range(num_teams):
            if loc[i,s,k].X > 0.5:
                route.append(f"{s}:{team_names[k]}")
                break
    print(f"{team_names[i]}: " + " → ".join(route))


=== LOCATION ROUTE (by slot) ===

Boston Red Sox: 0:New York Yankees → 1:New York Yankees → 2:New York Yankees → 3:New York Yankees → 4:Boston Red Sox → 5:Boston Red Sox → 6:Boston Red Sox → 7:Boston Red Sox
New York Yankees: 0:New York Yankees → 1:New York Yankees → 2:New York Yankees → 3:New York Yankees → 4:Boston Red Sox → 5:Boston Red Sox → 6:Boston Red Sox → 7:Boston Red Sox
Toronto Blue Jays: 0:Toronto Blue Jays → 1:Toronto Blue Jays → 2:New York Yankees → 3:Baltimore Orioles → 4:Boston Red Sox → 5:Boston Red Sox → 6:Boston Red Sox → 7:Boston Red Sox
Tampa Bay Rays: 0:Toronto Blue Jays → 1:New York Yankees → 2:Baltimore Orioles → 3:Baltimore Orioles → 4:Baltimore Orioles → 5:Tampa Bay Rays → 6:Tampa Bay Rays → 7:Tampa Bay Rays
Baltimore Orioles: 0:Toronto Blue Jays → 1:Toronto Blue Jays → 2:Baltimore Orioles → 3:Baltimore Orioles → 4:Baltimore Orioles → 5:Tampa Bay Rays → 6:Tampa Bay Rays → 7:Tampa Bay Rays


The route summary suggests that our constraints may be too loose, or that our five-team scenario is too restrictive. The order of opponents contains many repeats, with teams sometimes shuttling between two stadiums to play the same opponent. Perhaps by expanding the model to the fifteen teams in the American League, we can give the model more options to choose from, thereby making the schedule more dynamic.
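
One way to quantify the repetition is to count how often each pairing occurs. The sketch below does this for Baltimore's six series, copied from the team-by-slot table above. The list literal and variable names are illustrative; in the notebook, the same idea could be applied to `schedule_df` instead.

```python
from collections import Counter

# (slot, opponent, home/away) for Baltimore, copied from the schedule table.
orioles = [
    (1, "Toronto Blue Jays", "AWAY"),
    (2, "Tampa Bay Rays", "HOME"),
    (3, "Toronto Blue Jays", "HOME"),
    (4, "Tampa Bay Rays", "HOME"),
    (5, "Tampa Bay Rays", "AWAY"),
    (7, "Tampa Bay Rays", "AWAY"),
]

# Count series per opponent, ignoring venue.
meetings = Counter(opponent for _, opponent, _ in orioles)
distinct_opponents = len(meetings)

print(meetings.most_common())
print(distinct_opponents)
```

Baltimore faces only two of its four possible division opponents across six series, which is the kind of pattern a cap on series per pairing would rule out.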

# Slot model with 15 teams

We'll start with the same constraints but expand to the entire American League. Aside from the number of teams, and Constraint 2 now requiring at least six series per team rather than exactly six, the model is the same.
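
A quick counting argument shows why every team cannot play in all eight slots once the league has an odd number of teams, which is why the code below asks for six series per team rather than eight. The sketch is plain arithmetic, and the variable names are illustrative.

```python
teams = 15
slots = 8

# Each series occupies two teams, so at most floor(15 / 2) = 7 series fit in
# one slot, and one team is always idle.
max_series_per_slot = teams // 2
max_series_total = max_series_per_slot * slots

# If every team played 8 series, the schedule would need 15 * 8 / 2 series.
series_needed_for_8_each = teams * 8 // 2

# With 6 series per team the requirement is 15 * 6 / 2 series.
series_needed_for_6_each = teams * 6 // 2

print(max_series_total, series_needed_for_8_each, series_needed_for_6_each)
```

Eight series per team would require more series than the eight slots can hold, while six per team leaves room to spare.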

In [None]:
al = tl_df[tl_df['League'] == 'AL']

In [None]:
dist_model = gp.Model('distance_model')

num_teams = 15
num_slots = 8

# decision variables - binary x[i,j,s]: 1 if team i plays at team j in series slot s, 0 otherwise
x = {}
for i in range(num_teams):
    for j in range(num_teams):
        for s in range(num_slots):
            x[i,j,s] = dist_model.addVar(vtype=GRB.BINARY,
                                         name=f"team_{i}_vs_team_{j}_in_series_{s}")

# constraint 1 - no self games
for i in range(num_teams):
    for s in range(num_slots):
        dist_model.addConstr(x[i,i,s] == 0, name=f"no_self_play_team_{i}_{s}")

# constraint 2 - each team must play at least 6 series.
# (8 series each is infeasible: with 15 teams, at least one team is idle in every slot.)
for i in range(num_teams):
    dist_model.addConstr(
        gp.quicksum(x[i,j,s] for j in range(num_teams) for s in range(num_slots)) +
        gp.quicksum(x[j,i,s] for j in range(num_teams) for s in range(num_slots))
        >= 6,
        name=f"total_series_team_{i}"
    )

# constraint 3 - each team plays at most 1 series per slot (home or away)
for i in range(num_teams):
    for s in range(num_slots):
        dist_model.addConstr(
            gp.quicksum(x[i,j,s] for j in range(num_teams) if j != i) +
            gp.quicksum(x[j,i,s] for j in range(num_teams) if j != i)
            <= 1,
            name=f"one_series_per_team_in_slot_{i}_{s}"
        )



# constraint 4 - Number of away series >= required number (set 2 for here)
for i in range(num_teams):
    dist_model.addConstr(
        gp.quicksum(x[i,j,s] for j in range(num_teams) for s in range(num_slots)) >= 2,
        name=f"min_away_team_{i}"
    )

# constraint 5 - Number of home series >= required number (set 2 for here)
for i in range(num_teams):
    dist_model.addConstr(
        gp.quicksum(x[j,i,s] for j in range(num_teams) for s in range(num_slots)) >= 2,
        name=f"min_home_team_{i}"
    )

# constraint 6 - no back-to-back series vs same opponent
for i in range(num_teams):
    for j in range(num_teams):
        if i != j:
            for s in range(num_slots - 1):
                dist_model.addConstr(
                    x[i,j,s] + x[i,j,s+1] <= 1,
                    name=f"no_back_to_back_{i}_{j}_{s}"
                )


dist_model.update()

In [None]:
# LOCATION VARIABLES
loc = dist_model.addVars(num_teams, num_slots, num_teams,
                         vtype=GRB.BINARY, name="loc")

for i in range(num_teams):
    for s in range(num_slots):

        # location constraint 1 - team i is in exactly one stadium in every slot,
        # whether or not it plays
        dist_model.addConstr(
            gp.quicksum(loc[i,s,k] for k in range(num_teams)) == 1
        )

        # location constraint 2 - if team i hosts a series, it must be at its home stadium
        dist_model.addConstr(
            loc[i,s,i] >= gp.quicksum(x[j,i,s] for j in range(num_teams) if j != i)
        )

        # location constraint 3 - if team i plays away at team j, it must be at stadium j
        for j in range(num_teams):
            if j != i:
                dist_model.addConstr(
                    loc[i,s,j] >= x[i,j,s]
                )

travel = dist_model.addVars(
    num_teams, num_slots - 1, num_teams, num_teams,
    vtype=GRB.BINARY, name="travel"
)

for i in range(num_teams):
    for s in range(num_slots - 1):
        for k1 in range(num_teams):
            for k2 in range(num_teams):

                dist_model.addConstr(travel[i,s,k1,k2] <= loc[i,s,k1])
                dist_model.addConstr(travel[i,s,k1,k2] <= loc[i,s+1,k2])
                dist_model.addConstr(
                    travel[i,s,k1,k2] >= loc[i,s,k1] + loc[i,s+1,k2] - 1
                )


In [None]:
team_names = list(al['Team'])

dist_model.setObjective(
    gp.quicksum(
        travel[i,s,k1,k2] * get_distance(team_names[k1], team_names[k2])
        for i in range(num_teams)
        for s in range(num_slots - 1)
        for k1 in range(num_teams)
        for k2 in range(num_teams)
    ),
    GRB.MINIMIZE
)



In [None]:
dist_model.setParam("TimeLimit", 1200)   # run at most 20 minutes
dist_model.setParam("MIPGap", 0.20)     # stop when gap <= 20%
dist_model.setParam("MIPFocus", 1) # focus on finding feasible solutions quickly
dist_model.optimize()

Set parameter TimeLimit to value 1200
Set parameter MIPGap to value 0.2
Set parameter MIPFocus to value 1
Gurobi Optimizer version 12.0.3 build v12.0.3rc0 (mac64[rosetta2] - Darwin 24.6.0 24G90)

CPU model: Apple M1
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Non-default parameters:
TimeLimit  1200
MIPGap  0.2
MIPFocus  1

Optimize a model with 74550 rows, 27225 columns and 185835 nonzeros
Model fingerprint: 0x14c4e5a6
Variable types: 0 continuous, 27225 integer (27225 binary)
Coefficient statistics:
  Matrix range     [1e+00, 2e+00]
  Objective range  [1e+02, 3e+03]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 6e+00]
Presolve removed 48945 rows and 1695 columns
Presolve time: 0.31s
Presolved: 25605 rows, 25530 columns, 86130 nonzeros
Variable types: 0 continuous, 25530 integer (25530 binary)

Root relaxation: objective 0.000000e+00, 5394 iterations, 0.48 seconds (0.95 work units)

    Nodes    |    Current Node    |     Objective Bounds

This model results in roughly 12,000 cumulative miles traveled. This is much more than in the previous models, but does it address the problem with repeat opponents? We'll look only at the route summary to check.

In [None]:
# Route Summary
print("\n=== LOCATION ROUTE (by slot) ===\n")
for i in range(num_teams):
    route = []
    for s in range(num_slots):
        for k in range(num_teams):
            if loc[i,s,k].X > 0.5:
                route.append(f"{s}:{team_names[k]}")
                break
    print(f"{team_names[i]}: " + " → ".join(route))


=== LOCATION ROUTE (by slot) ===

Boston Red Sox: 0:Boston Red Sox → 1:Boston Red Sox → 2:Boston Red Sox → 3:Boston Red Sox → 4:Boston Red Sox → 5:New York Yankees → 6:Baltimore Orioles → 7:New York Yankees
New York Yankees: 0:Boston Red Sox → 1:Boston Red Sox → 2:Boston Red Sox → 3:Boston Red Sox → 4:Boston Red Sox → 5:New York Yankees → 6:New York Yankees → 7:New York Yankees
Chicago White Sox: 0:Minnesota Twins → 1:Minnesota Twins → 2:Minnesota Twins → 3:Chicago White Sox → 4:Chicago White Sox → 5:Kansas City Royals → 6:Kansas City Royals → 7:Kansas City Royals
Toronto Blue Jays: 0:Toronto Blue Jays → 1:Toronto Blue Jays → 2:Toronto Blue Jays → 3:Toronto Blue Jays → 4:Toronto Blue Jays → 5:Toronto Blue Jays → 6:Cleveland Guardians → 7:Detroit Tigers
Tampa Bay Rays: 0:Tampa Bay Rays → 1:Tampa Bay Rays → 2:Tampa Bay Rays → 3:Tampa Bay Rays → 4:Tampa Bay Rays → 5:Baltimore Orioles → 6:New York Yankees → 7:Baltimore Orioles
Baltimore Orioles: 0:Tampa Bay Rays → 1:Tampa Bay Rays → 2:Tam

It's clear the problem is still the same with 15 teams, highlighting the need for stronger constraints. Distance is minimized, but this schedule doesn't accurately reflect a true MLB schedule. In the sensitivity analysis, we'll adjust the constraints in an effort to minimize distance *and* produce a more realistic schedule.

# Sensitivity Analysis

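In this model we add three constraints. The code that follows also raises the minimum number of home series from two to three and widens Constraint 6 so that any pair of teams meets at most once in any three consecutive slots, regardless of venue.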

**Constraint 7: Limit total series between any pair of teams**

$$\sum_{s=1}^{m} (x_{ijs} + x_{jis}) \leq K \quad \forall i,j \in \{1,\ldots,n\}, \; i < j$$

where $K = 2$ (maximum number of series between any two teams).

**Constraint 8: No more than 3 consecutive home series**

$$\sum_{j=1}^{n} \sum_{k=0}^{3} x_{ji(s+k)} \leq 3 \quad \forall i \in \{1,\ldots,n\}, \; s \in \{1,\ldots,m-3\}$$

**Constraint 9: No more than 3 consecutive away series**

$$\sum_{j=1}^{n} \sum_{k=0}^{3} x_{ij(s+k)} \leq 3 \quad \forall i \in \{1,\ldots,n\}, \; s \in \{1,\ldots,m-3\}$$
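
Constraints 8 and 9 are sliding-window constraints: any four consecutive slots may contain at most three home (or away) series. The sketch below applies the same window test to a single team's home/away pattern in plain Python. The helper `max_in_window` and the `H`/`A`/`-` encoding are illustrative choices, not part of the model.

```python
def max_in_window(pattern, symbol, window=4):
    """Largest count of `symbol` in any run of `window` consecutive slots."""
    if len(pattern) < window:
        return pattern.count(symbol)
    return max(
        pattern[s:s + window].count(symbol)
        for s in range(len(pattern) - window + 1)
    )


# One team's eight slots: H = home series, A = away series, - = idle.
ok_pattern = "HHH-AAHA"
bad_pattern = "HHHHAA-A"

ok = max_in_window(ok_pattern, "H") <= 3 and max_in_window(ok_pattern, "A") <= 3
bad = max_in_window(bad_pattern, "H") <= 3

print(ok, bad)
```

The window start runs over `len(pattern) - window + 1` positions, which matches the `range(num_slots - 3)` loop used for these constraints in the model code. An idle slot inside a home stand resets nothing by itself: the test only limits how many home (or away) series fall within each group of four slots.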

In [None]:
dist_model = gp.Model('distance_model')

num_teams = 15
num_slots = 8

# decision variables - binary x[i,j,s]: 1 if team i plays at team j in series slot s, 0 otherwise
x = {}
for i in range(num_teams):
    for j in range(num_teams):
        for s in range(num_slots):
            x[i,j,s] = dist_model.addVar(vtype=GRB.BINARY,
                                         name=f"team_{i}_vs_team_{j}_in_series_{s}")

# constraint 1 - no self games
for i in range(num_teams):
    for s in range(num_slots):
        dist_model.addConstr(x[i,i,s] == 0, name=f"no_self_play_team_{i}_{s}")

# constraint 2 - each team must play at least 6 series.
# (8 series each is infeasible: with 15 teams, at least one team is idle in every slot.)
for i in range(num_teams):
    dist_model.addConstr(
        gp.quicksum(x[i,j,s] for j in range(num_teams) for s in range(num_slots)) +
        gp.quicksum(x[j,i,s] for j in range(num_teams) for s in range(num_slots))
        >= 6,
        name=f"total_series_team_{i}"
    )

# constraint 3 - each team plays at most 1 series per slot (home or away)
for i in range(num_teams):
    for s in range(num_slots):
        dist_model.addConstr(
            gp.quicksum(x[i,j,s] for j in range(num_teams) if j != i) +
            gp.quicksum(x[j,i,s] for j in range(num_teams) if j != i)
            <= 1,
            name=f"one_series_per_team_in_slot_{i}_{s}"
        )



# constraint 4 - Number of away series >= required number (set 2 for here)
for i in range(num_teams):
    dist_model.addConstr(
        gp.quicksum(x[i,j,s] for j in range(num_teams) for s in range(num_slots)) >= 2,
        name=f"min_away_team_{i}"
    )

# constraint 5 - Number of home series >= required number (set 3 for here)
for i in range(num_teams):
    dist_model.addConstr(
        gp.quicksum(x[j,i,s] for j in range(num_teams) for s in range(num_slots)) >= 3,
        name=f"min_home_team_{i}"
    )

# constraint 6 - at most one series between the same pair of teams (either venue) in any 3 consecutive slots
for i in range(num_teams):
    for j in range(i+1, num_teams):
        for s in range(num_slots - 2):  # Note: -2 to look at 3 consecutive slots
            dist_model.addConstr(
                x[i,j,s] + x[j,i,s] + x[i,j,s+1] + x[j,i,s+1] + x[i,j,s+2] + x[j,i,s+2] <= 1,
                name=f"no_consecutive_{i}_{j}_{s}"
            )


# constraint 7 - limit the total number of series between any pair of teams to at most K
# (set K=2 for here)
K = 2
for i in range(num_teams):
    for j in range(i+1, num_teams):
        dist_model.addConstr(
            gp.quicksum(x[i,j,s] + x[j,i,s] for s in range(num_slots)) <= K
        )

# constraint 8 - no more than 3 consecutive home series
for i in range(num_teams):
    for s in range(num_slots - 3):  # Look at 4 consecutive slots
        dist_model.addConstr(
            gp.quicksum(x[j,i,s+k] for j in range(num_teams) for k in range(4)) <= 3,
            name=f"max_3_consecutive_home_{i}_{s}"
        )

# constraint 9 - no more than 3 consecutive away series
for i in range(num_teams):
    for s in range(num_slots - 3):  # Look at 4 consecutive slots
        dist_model.addConstr(
            gp.quicksum(x[i,j,s+k] for j in range(num_teams) for k in range(4)) <= 3,
            name=f"max_3_consecutive_away_{i}_{s}"
        )

dist_model.update()
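
Constraints 8 and 9 rely on a sliding window: capping the sum over every 4 consecutive slots at 3 rules out any run of 4 home (or away) series. As a minimal sketch, the same rule can be checked on a plain list of 0/1 flags once a schedule is available. The helper name `respects_max_run` and the sample patterns below are hypothetical illustrations, not part of the model.

```python
def respects_max_run(flags, max_run=3):
    """Return True if no window of max_run + 1 consecutive slots is all ones.

    flags holds one 0/1 value per slot (1 = the team hosts a series in that slot).
    """
    window = max_run + 1
    return all(
        sum(flags[s:s + window]) <= max_run
        for s in range(len(flags) - window + 1)
    )

# Hypothetical 8-slot patterns (1 = hosts a series, 0 = away or idle)
ok_pattern = [1, 1, 1, 0, 1, 1, 0, 0]   # longest home run is 3
bad_pattern = [0, 1, 1, 1, 1, 0, 0, 0]  # contains a run of 4

print(respects_max_run(ok_pattern))
print(respects_max_run(bad_pattern))
```

With 8 slots the helper inspects 5 windows, which mirrors the `range(num_slots - 3)` loop used in the constraints.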

In [None]:
# LOCATION VARIABLES
loc = dist_model.addVars(num_teams, num_slots, num_teams,
                         vtype=GRB.BINARY, name="loc")

for i in range(num_teams):
    for s in range(num_slots):

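        # every team occupies exactly 1 location per slot; a team with no
        # series in slot s stays home, because the constraints below only
        # permit an away location k when x[i,k,s] = 1
        dist_model.addConstr(
            gp.quicksum(loc[i,s,k] for k in range(num_teams)) == 1
        )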

        # if home -> must allow choosing home stadium
        dist_model.addConstr(
            loc[i,s,i] >= gp.quicksum(x[j,i,s] for j in range(num_teams) if j != i)
        )

        # if away -> must allow choosing opponent stadium
        for j in range(num_teams):
            if j != i:
                dist_model.addConstr(
                    loc[i,s,j] >= x[i,j,s]
                )

        # NEW CONSTRAINT: If team i is at stadium k (where k != i),
        # they must be playing away at team k
        for k in range(num_teams):
            if k != i:
                dist_model.addConstr(
                    loc[i,s,k] <= x[i,k,s],
                    name=f"away_location_{i}_{s}_{k}"
                )

travel = dist_model.addVars(
    num_teams, num_slots - 1, num_teams, num_teams,
    vtype=GRB.BINARY, name="travel"
)

for i in range(num_teams):
    for s in range(num_slots - 1):
        for k1 in range(num_teams):
            for k2 in range(num_teams):

                dist_model.addConstr(travel[i,s,k1,k2] <= loc[i,s,k1])
                dist_model.addConstr(travel[i,s,k1,k2] <= loc[i,s+1,k2])
                dist_model.addConstr(
                    travel[i,s,k1,k2] >= loc[i,s,k1] + loc[i,s+1,k2] - 1
                )
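
The three `travel` constraints are the standard linearization of a product of two binary variables: they force `travel[i,s,k1,k2]` to equal `loc[i,s,k1] * loc[i,s+1,k2]` without a nonlinear term. A minimal sketch that brute-forces all binary combinations to confirm this is below. The function name `feasible_travel_values` is a hypothetical helper used only for illustration.

```python
from itertools import product

def feasible_travel_values(a, b):
    """List the binary values t satisfying t <= a, t <= b, t >= a + b - 1."""
    return [t for t in (0, 1) if t <= a and t <= b and t >= a + b - 1]

# a plays the role of loc[i,s,k1], b the role of loc[i,s+1,k2]
for a, b in product((0, 1), repeat=2):
    allowed = feasible_travel_values(a, b)
    # Exactly one value survives, and it equals the product a * b
    assert allowed == [a * b]
    print(f"a={a}, b={b} -> t in {allowed}")
```

Because each pair of locations leaves only one feasible value, the objective can multiply `travel` by a constant distance and stay linear.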


In [None]:
team_names = list(al['Team'])

dist_model.setObjective(
    gp.quicksum(
        travel[i,s,k1,k2] * get_distance(team_names[k1], team_names[k2])
        for i in range(num_teams)
        for s in range(num_slots - 1)
        for k1 in range(num_teams)
        for k2 in range(num_teams)
    ),
    GRB.MINIMIZE
)



In [None]:
dist_model.setParam("TimeLimit", 1200)   # run at most 20 minutes
dist_model.setParam("MIPGap", 0.20)     # stop when gap <= 20%
dist_model.setParam("MIPFocus", 1)  # prioritize finding feasible solutions quickly
dist_model.optimize()

Set parameter TimeLimit to value 1200
Set parameter MIPGap to value 0.2
Set parameter MIPFocus to value 1
Gurobi Optimizer version 12.0.3 build v12.0.3rc0 (mac64[rosetta2] - Darwin 24.6.0 24G90)

CPU model: Apple M1
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Non-default parameters:
TimeLimit  1200
MIPGap  0.2
MIPFocus  1

Optimize a model with 75645 rows, 27225 columns and 200715 nonzeros
Model fingerprint: 0xe22f3b16
Variable types: 0 continuous, 27225 integer (27225 binary)
Coefficient statistics:
  Matrix range     [1e+00, 2e+00]
  Objective range  [1e+02, 3e+03]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 6e+00]
Presolve removed 52305 rows and 3375 columns
Presolve time: 0.48s
Presolved: 23340 rows, 23850 columns, 93690 nonzeros
Variable types: 0 continuous, 23850 integer (23850 binary)

Root relaxation: objective 0.000000e+00, 1333 iterations, 0.14 seconds (0.26 work units)

    Nodes    |    Current Node    |     Objective Bounds
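
The column count in the log can be cross-checked against the variable definitions. The sketch below assumes 15 teams and 8 slots, and assumes `x` is indexed by (team, team, slot); the `x` declaration is not shown in this section, so that shape is an assumption.

```python
num_teams, num_slots = 15, 8  # assumed sizes: 15 teams, 8 slots

# Assumed shape of x: (away team, home team, slot)
x_vars = num_teams * num_teams * num_slots
# loc is declared as (team, slot, location)
loc_vars = num_teams * num_slots * num_teams
# travel is declared as (team, slot transition, from location, to location)
travel_vars = num_teams * (num_slots - 1) * num_teams * num_teams

total_vars = x_vars + loc_vars + travel_vars
print(x_vars, loc_vars, travel_vars, total_vars)
```

Under these assumptions the total is 27,225 binary variables, matching the log, and the `travel` variables account for most of the model's size.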

In [None]:
# Series Schedule Table
rows = []
for i in range(num_teams):
    for s in range(num_slots):
        for j in range(num_teams):
            if x[i,j,s].X > 0.5:
                rows.append([team_names[i], s, team_names[j]])

schedule_df = pd.DataFrame(rows, columns=["Away_Team","Slot","Opponent"])
schedule_df.sort_values(["Away_Team","Slot"], inplace=True)
schedule_df.head(12)

Unnamed: 0,Away_Team,Slot,Opponent
18,Athletics,3,Los Angeles Angels
19,Athletics,5,Seattle Mariners
20,Athletics,7,Los Angeles Angels
15,Baltimore Orioles,2,New York Yankees
16,Baltimore Orioles,3,Boston Red Sox
17,Baltimore Orioles,4,Toronto Blue Jays
0,Boston Red Sox,5,New York Yankees
1,Boston Red Sox,6,Toronto Blue Jays
2,Boston Red Sox,7,Cleveland Guardians
6,Chicago White Sox,5,Detroit Tigers


In [None]:
rows = []
for i in range(num_teams):
    for j in range(num_teams):
        for s in range(num_slots):
            if x[i,j,s].X > 0.5:
                rows.append([team_names[i], team_names[j], s, "AWAY"])
            if x[j,i,s].X > 0.5:
                rows.append([team_names[i], team_names[j], s, "HOME"])

schedule_df = pd.DataFrame(rows, columns=["Team","Opponent","Slot","Home/Away"])
schedule_df.sort_values(["Team","Slot"], inplace=True)
schedule_df.head(20)


Unnamed: 0,Team,Opponent,Slot,Home/Away
40,Athletics,Texas Rangers,0,HOME
38,Athletics,Seattle Mariners,2,HOME
36,Athletics,Los Angeles Angels,3,AWAY
39,Athletics,Seattle Mariners,5,AWAY
41,Athletics,Texas Rangers,6,HOME
37,Athletics,Los Angeles Angels,7,AWAY
34,Baltimore Orioles,Toronto Blue Jays,0,HOME
31,Baltimore Orioles,New York Yankees,2,AWAY
30,Baltimore Orioles,Boston Red Sox,3,AWAY
35,Baltimore Orioles,Toronto Blue Jays,4,AWAY


In [None]:
# Travel Table
travel_rows = []
for i in range(num_teams):
    for s in range(num_slots - 1):
        for j1 in range(num_teams):
            for j2 in range(num_teams):
                if travel[i,s,j1,j2].X > 0.5:
                    travel_rows.append([
                        team_names[i],
                        s,
                        team_names[j1],
                        team_names[j2],
                        get_distance(team_names[j1], team_names[j2])
                    ])

travel_df = pd.DataFrame(
    travel_rows,
    columns=["Team","Slot","From","To","Distance"]
)
travel_df.head(16)

Unnamed: 0,Team,Slot,From,To,Distance
0,Boston Red Sox,0,Boston Red Sox,Boston Red Sox,0.0
1,Boston Red Sox,1,Boston Red Sox,Boston Red Sox,0.0
2,Boston Red Sox,2,Boston Red Sox,Boston Red Sox,0.0
3,Boston Red Sox,3,Boston Red Sox,Boston Red Sox,0.0
4,Boston Red Sox,4,Boston Red Sox,New York Yankees,186.6
5,Boston Red Sox,5,New York Yankees,Toronto Blue Jays,366.26
6,Boston Red Sox,6,Toronto Blue Jays,Cleveland Guardians,193.35
7,New York Yankees,0,Boston Red Sox,New York Yankees,186.6
8,New York Yankees,1,New York Yankees,New York Yankees,0.0
9,New York Yankees,2,New York Yankees,New York Yankees,0.0


In [None]:
travel_df[travel_df['Team']=='Athletics']

Unnamed: 0,Team,Slot,From,To,Distance
42,Athletics,0,Athletics,Athletics,0.0
43,Athletics,1,Athletics,Athletics,0.0
44,Athletics,2,Athletics,Los Angeles Angels,387.19
45,Athletics,3,Los Angeles Angels,Athletics,387.19
46,Athletics,4,Athletics,Seattle Mariners,605.42
47,Athletics,5,Seattle Mariners,Athletics,605.42
48,Athletics,6,Athletics,Los Angeles Angels,387.19
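
As a quick sanity check, the Athletics legs listed above can be totaled by hand. This sketch copies the seven distances from the table rather than reading them from `travel_df`, so the list name `athletics_legs` is only illustrative.

```python
# Distances (miles) for the Athletics, copied from the table above, slots 0-6
athletics_legs = [0.0, 0.0, 387.19, 387.19, 605.42, 605.42, 387.19]

athletics_total = round(sum(athletics_legs), 2)
trips_taken = sum(1 for d in athletics_legs if d > 0)

print(f"{trips_taken} trips, {athletics_total} miles")
```

The five nonzero legs add up to roughly 2,372 miles for this team over the 8 slots.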


In [None]:
# Route Summary
print("\n=== LOCATION ROUTE (by slot) ===\n")
for i in range(num_teams):
    route = []
    for s in range(num_slots):
        for k in range(num_teams):
            if loc[i,s,k].X > 0.5:
                route.append(f"{s}:{team_names[k]}")
                break
    print(f"{team_names[i]}: " + " → ".join(route))


=== LOCATION ROUTE (by slot) ===

Boston Red Sox: 0:Boston Red Sox → 1:Boston Red Sox → 2:Boston Red Sox → 3:Boston Red Sox → 4:Boston Red Sox → 5:New York Yankees → 6:Toronto Blue Jays → 7:Cleveland Guardians
New York Yankees: 0:Boston Red Sox → 1:New York Yankees → 2:New York Yankees → 3:New York Yankees → 4:New York Yankees → 5:New York Yankees → 6:Baltimore Orioles → 7:Tampa Bay Rays
Chicago White Sox: 0:Chicago White Sox → 1:Chicago White Sox → 2:Chicago White Sox → 3:Chicago White Sox → 4:Chicago White Sox → 5:Detroit Tigers → 6:Tampa Bay Rays → 7:Baltimore Orioles
Toronto Blue Jays: 0:Baltimore Orioles → 1:New York Yankees → 2:Boston Red Sox → 3:Toronto Blue Jays → 4:Toronto Blue Jays → 5:Toronto Blue Jays → 6:Toronto Blue Jays → 7:Toronto Blue Jays
Tampa Bay Rays: 0:Tampa Bay Rays → 1:Tampa Bay Rays → 2:Tampa Bay Rays → 3:Houston Astros → 4:Texas Rangers → 5:Kansas City Royals → 6:Tampa Bay Rays → 7:Tampa Bay Rays
Baltimore Orioles: 0:Baltimore Orioles → 1:Baltimore Orioles → 

While the best distance found by the model is much higher, this solution better reflects what an MLB schedule might actually look like. Teams have a few series at home before traveling for a handful of away series against different opponents.

How does our optimization model's solution compare to the real world travel of teams?

According to [Baseball Savant](https://baseballsavant.mlb.com/visuals/map), each team traveled about 35,000 miles on average over the season, or about 5,800 miles a month.

In our final model, the one most reflective of realistic scheduling conventions, the total distance traveled by 15 teams over 8 slots (representative of a month) was 25,832 miles. So, a team traveled about 1,722 miles on average in our model solution.

This is a significant decrease from real-world travel in Major League Baseball: roughly 70% below the monthly average of about 5,800 miles per team. If the sole priority when scheduling is team comfort and safety, there is a lot of room to reduce distance traveled.
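
The comparison can be reproduced with a few lines of arithmetic. This is a sketch using the figures quoted in this section; the six-month season length is an assumption implied by the 35,000 and 5,800 mile figures.

```python
season_miles_per_team = 35_000   # real-world average per team, as quoted above
months_in_season = 6             # assumption: roughly a six-month regular season
real_monthly_per_team = season_miles_per_team / months_in_season

model_total_miles = 25_832       # total for all teams in the model solution
num_teams = 15
model_per_team = model_total_miles / num_teams

reduction = 1 - model_per_team / real_monthly_per_team

print(f"Real-world monthly average: {real_monthly_per_team:,.0f} miles")
print(f"Model average per team:     {model_per_team:,.0f} miles")
print(f"Reduction:                  {reduction:.0%}")
```

Note that the model covers a single month-like window, so the comparison is against the monthly average rather than the season total.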

Real-world travel in baseball remains so high because league operators must balance competing priorities, such as TV partners, streaming platforms, and local government regulations.

If the league decided to prioritize reducing injury rates, leveraging integer programming similar to the above code could help by reducing the travel load of athletes.

## GenAI Usage

We leveraged Generative AI primarily to identify blind spots in our core constraints that kept our proposed solution from reflecting a realistic MLB schedule.

Additionally, we gave our constraints to GenAI so that it could translate them into LaTeX, keeping the mathematical expressions clean in this notebook file.

It was also used for general debugging and initial research.