# Recitation 2 - Formulations

In today's recitation, we will build different formulations of the same problem, and explore the computational implications.

First, we load packages.

In [3]:
using JuMP, Gurobi

using DataFrames, CSV, Combinatorics
using LinearAlgebra

┌ Info: Precompiling Gurobi [2e9cd046-0924-5485-92f1-d5272153d98b]
└ @ Base loading.jl:1278


LoadError: ArgumentError: Package Combinatorics not found in current path:
- Run `import Pkg; Pkg.add("Combinatorics")` to install the Combinatorics package.


In [4]:
import Pkg; Pkg.add("Combinatorics")

   Updating registry at `~/.julia/registries/General`
  Resolving package versions...
  Installed Combinatorics ─ v1.0.2
Updating `~/.julia/environments/v1.5/Project.toml`
  [861a8166] + Combinatorics v1.0.2
Updating `~/.julia/environments/v1.5/Manifest.toml`
  [861a8166] + Combinatorics v1.0.2


## Part 1: Facility location

### 1.1 Problem setup

We are now ready to formulate our first integer optimization problem. The facility location problem takes as inputs two sets and two parameters:
- Sets:
    - facilities $j\in \mathcal{J}=\{1,\ldots,n\}$
    - customers $i\in \mathcal{I}=\{1, \ldots, m\}$
- Parameters:
    - $d_{ij}$: distance from customer $i$ to facility $j$
    - $c_j$: cost of facility $j$
    
Let's define the size of the problem.

In [None]:
n = 50
m = 500

In real life, we would have data on the location of facilities and customers. For simplicity, we assume that the facilities and customers are uniformly sampled over the unit square $[0,1]^2$. The first column of the following arrays denotes the x coordinate, and the second column denotes the y coordinate.

In [None]:
facilities = rand(n,2);
customers = rand(m,2); # notice we add a semicolon to suppress Jupyter output

We can now define the distance matrix:

In [None]:
dist = [LinearAlgebra.norm(customers[i, :] .- facilities[j, :]) for i=1:m, j=1:n];
@show size(dist);

Finally, we sample a vector of facility costs uniformly between 10 and 20:

In [None]:
c = rand(n)*10 .+ 10;

### 1.2 Formulations

We covered two formulations of the problem in class, one with many constraints but a tight formulation, the other with fewer constraints but a less tight formulation. Time to implement!

#### Formulation 1

$$\min \sum_{j=1}^nc_j y_j + \sum_{i=1}^m\sum_{j=1}^nd_{ij}x_{ij}$$
$$\text{subject to}$$
$$\sum_{j=1}^n x_{ij}=1 \quad \forall i\in[m]$$
$$x_{ij}\le y_j \quad\forall i\in[m], j\in[n]$$
$$x_{ij}\in\{0,1\}\quad\forall i\in[m], j\in[n]$$
$$y_j \in \{0,1\}\quad \forall j\in[n]$$

In [None]:
"Build facility location model 1"
function facility_model_1(distance::Matrix, cost::Vector)
    # extract problem dimensions from distance matrix and verify coherence of input data
    m, n = size(distance)
    @assert length(cost) == n
    model = Model(Gurobi.Optimizer)
    set_optimizer_attribute(model, "TimeLimit", 1800)
    # VARIABLES
    # Whether to open each facility
    @variable(model, y[1:n], Bin)
    # Whether to serve a particular customer from a particular facility
    @variable(model, x[1:m, 1:n], Bin)
    # CONSTRAINTS
    @constraint(
        model, serve_every_customer[i = 1:m],
        sum(x[i, j] for j = 1:n) == 1
    )
    @constraint(
        model, only_serve_from_open_facility[i = 1:m, j=1:n],
        x[i, j] <= y[j]
    )
    # OBJECTIVE
    @objective(
        model, Min, sum(cost[j] * y[j] for j = 1:n) + sum(distance[i, j] * x[i, j] for i=1:m, j=1:n)
    )
    return model, x, y
end

Now we can call our function to build the model with our sampled data, then solve and report the elapsed time.

In [None]:
buildtime1 = @elapsed model1, x1, y1 = facility_model_1(dist, c);
solvetime1 = @elapsed optimize!(model1)
@show buildtime1
@show solvetime1

#### Formulation 2

$$\min \sum_{j=1}^nc_j y_j + \sum_{i=1}^m\sum_{j=1}^nd_{ij}x_{ij}$$
$$\text{subject to}$$
$$\sum_{j=1}^n x_{ij}=1 \quad \forall i\in[m]$$
$$\sum_{i=1}^m x_{ij}\le my_j \quad\forall j\in[n]$$
$$x_{ij}\in\{0,1\}\quad\forall i\in[m], j\in[n]$$
$$y_j \in \{0,1\}\quad \forall j\in[n]$$

In [None]:
"Build facility location model 2"
function facility_model_2(distance::Matrix, cost::Vector)
    # extract problem dimensions from distance matrix and verify coherence of input data
    m, n = size(distance)
    @assert length(cost) == n
    model = Model(Gurobi.Optimizer)
    set_optimizer_attribute(model, "TimeLimit", 1800)
    # VARIABLES
    # Whether to open each facility
    @variable(model, y[1:n], Bin)
    # Whether to serve a particular customer from a particular facility
    @variable(model, x[1:m, 1:n], Bin)
    # CONSTRAINTS
    @constraint(
        model, serve_every_customer[i = 1:m],
        sum(x[i, j] for j = 1:n) == 1
    )
    @constraint(
        model, only_serve_from_open_facility[j=1:n],
        sum(x[i, j] for i=1:m) <= m * y[j]
    )
    # OBJECTIVE
    @objective(
        model, Min, sum(cost[j] * y[j] for j = 1:n) + sum(distance[i, j] * x[i, j] for i=1:m, j=1:n)
    )
    return model, x, y
end

In [None]:
buildtime2 = @elapsed model2, x2, y2 = facility_model_2(dist, c);
solvetime2 = @elapsed optimize!(model2)
@show buildtime2
@show solvetime2
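
One way to see why Formulation 2 is weaker, without calling a solver: in its LP relaxation, the cheapest choice is $y_j = \frac{1}{m}\sum_i x_{ij}$, so each customer independently picks the facility minimizing $d_{ij} + c_j/m$. The sketch below computes this closed-form bound and compares it with the cost of a simple feasible solution that opens a single facility. The function names `weak_lp_bound` and `single_facility_cost` and the `_demo` data are illustrative assumptions, not part of the recitation code, and the closed form assumes all facility costs are positive.

```julia
using LinearAlgebra, Random

"Closed-form optimal value of the LP relaxation of Formulation 2 (assumes positive costs)."
function weak_lp_bound(distance::Matrix, cost::Vector)
    m, n = size(distance)
    @assert length(cost) == n
    # in the relaxation, each customer only "pays" cost[j] / m for the facility it uses
    return sum(minimum(distance[i, j] + cost[j] / m for j = 1:n) for i = 1:m)
end

"Cost of the best integer solution that opens exactly one facility (an upper bound)."
function single_facility_cost(distance::Matrix, cost::Vector)
    m, n = size(distance)
    @assert length(cost) == n
    return minimum(cost[j] + sum(distance[i, j] for i = 1:m) for j = 1:n)
end

Random.seed!(1)  # fixed seed so the sketch is reproducible
m_demo, n_demo = 500, 50
customers_demo = rand(m_demo, 2)
facilities_demo = rand(n_demo, 2)
dist_demo = [norm(customers_demo[i, :] .- facilities_demo[j, :]) for i = 1:m_demo, j = 1:n_demo]
c_demo = rand(n_demo) * 10 .+ 10

lower = weak_lp_bound(dist_demo, c_demo)
upper = single_facility_cost(dist_demo, c_demo)
println("Weak LP bound: $lower, single-facility solution: $upper")
```

Because each facility's cost is diluted by a factor of $m$ in the relaxation, the bound barely accounts for opening costs. This is the kind of gap branch-and-bound then has to close.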

## Part 2: TSP

In the traveling salesman problem, we have $n$ locations, or "cities", indexed by $[n]=\{1,\ldots,n\}$ with $d_{ij}$ denoting the distance between location $i$ and location $j$.

The goal is to find a tour that visits each location exactly once, while minimizing total distance traveled.

### 2.1 Problem setup

We have a few TSP instances saved in the `tsp` directory. The data are stored as ASCII text files, which we can read as follows.

In [5]:
file = open("tsp/berlin52.tsp");
data = readlines(file)
data[1:10]

10-element Array{String,1}:
 "NAME: berlin52"
 "TYPE: TSP"
 "COMMENT: 52 locations in Berlin (Groetschel)"
 "DIMENSION: 52"
 "EDGE_WEIGHT_TYPE: EUC_2D"
 "NODE_COORD_SECTION"
 "1 565.0 575.0"
 "2 25.0 185.0"
 "3 345.0 750.0"
 "4 945.0 685.0"

The first few lines describe the file and the problem type, and the following lines list the (x,y) coordinates of each point in a grid. We would like to extract the coordinates and define a distance matrix. The function below is one way to do this, but there are many others.

In [8]:
function get_distances(filename)
    file = open(filename)
    data = readlines(file)
    close(file)
    # get useful lines
    dimension_line = findfirst(x -> occursin("DIMENSION", x), data)
    metric_line = findfirst(x -> occursin("EDGE_WEIGHT_TYPE", x), data)
    first_data_line = findfirst(x -> occursin("NODE_COORD_SECTION", x), data) + 1
    last_data_line = findfirst(x -> occursin(r"EOF", x), data) - 1
    # extract dimension
    n = parse(Int64, match(r"[0-9]+", data[dimension_line]).match)
    # check metric is Euclidean 2D
    occursin("EUC_2D", data[metric_line]) || error("Unsupported metric type")
    # Create coordinates
    coords = zeros(n, 2)
    for line in data[first_data_line:last_data_line]
        temp = split(line)
        # first field is the 1-based node index, followed by the x and y coordinates
        coords[parse(Int64, temp[1]), :] = [parse(Float64, temp[2]), parse(Float64, temp[3])]
    end
    # Create distance matrix
    distances = [norm(coords[i, :] .- coords[j, :]) for i=1:n, j=1:n]
    return distances
end

get_distances (generic function with 1 method)

We can now use our shiny new `get_distances` function to convert a TSP text file to a distance matrix. We can visually check the matrix is 52x52, is symmetric, has nonnegative values, and zeros on the diagonal.

In [9]:
using LinearAlgebra # provides `norm`; the first `using` cell errored before reaching this import
get_distances("tsp/berlin52.tsp")
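
Rather than checking these properties by eye, we can test them programmatically. The following is a minimal sketch. The helper name `is_valid_distance_matrix` is an assumption made for illustration, and it is demonstrated on a tiny hand-made matrix rather than on the Berlin instance.

```julia
using LinearAlgebra

"Return true if D is an n x n symmetric matrix with nonnegative entries and a zero diagonal."
function is_valid_distance_matrix(D::Matrix, n::Int)
    return size(D) == (n, n) &&
           issymmetric(D) &&
           all(D .>= 0) &&
           all(iszero, diag(D))
end

# tiny example: three points on a line at positions 0, 3 and 7
points = [0.0, 3.0, 7.0]
D = [abs(p - q) for p in points, q in points]
@show is_valid_distance_matrix(D, 3)
```

With the recitation data, the same check would read `is_valid_distance_matrix(get_distances("tsp/berlin52.tsp"), 52)`.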

### 2.2 Formulations

#### Basic formulation

$$\begin{align}
\min\quad & \sum_{i=1}^n\sum_{j=1}^n d_{ij}x_{ij}\\
\text{s.t.}\quad & \sum_{i=1}^n x_{ij}=1 & \forall j\in[n]\\
&\sum_{j=1}^nx_{ij} =1 & \forall i\in[n]\\
&x_{ii}=0 & \forall i\in[n]\\
& x_{ij}\in\{0,1\}&\forall i, j \in [n]
\end{align}$$

_Note: even though our distance matrix is symmetric, it is slightly easier to implement a more general formulation where we do not assume this property. If we know our distance matrix is symmetric, we can cut the number of variables in half by only defining $x_{ij}$ for $i < j$ (i.e. not defining both $x_{12}$ and $x_{21}$), but this is beyond the scope of this recitation._

In [10]:
"Construct TSP pre-model, without cycle elimination mechanism"
function prebuild_tsp(dist::Matrix)
    n = size(dist, 1)
    # Definition of model
    model = Model(Gurobi.Optimizer)
    # Main variable: x_ij=1 if the tour travels directly from i to j, 0 otherwise
    @variable(model, x[1:n, 1:n], Bin)
    # Objective: minimizing the total cost (distance) of the tour
    @objective(model, Min, sum(dist[i, j] * x[i, j] for i = 1:n, j = 1:n))
    # SHARED CONSTRAINTS
    @constraint(
        model, no_self_edges[i = 1:n], x[i,i] == 0
    )
    @constraint(
        model, exactly_one_successor[i = 1:n], sum(x[i, j] for j = 1:n) == 1
    )
    @constraint(
        model, exactly_one_predecessor[j = 1:n], sum(x[i, j] for i = 1:n) == 1
    )
    return model, x
end

prebuild_tsp

#### Issues with the formulation

The formulation we've built so far is nice, but it has one big flaw: nothing forces all cities to be part of the same tour.

For instance, the following solution would be feasible:

![](extra/tsp-subtour.png)

We need to somehow eliminate these "subtours".

#### Attempt 1: MTZ formulation (compact)

This formulation eliminates subtours using a $u_i$ variable for each node $i$, which defines the "order" of the visit starting from node 1 ($u_1=1$). More precisely, we impose the following constraints:

$u_1=1$

$2\le u_i \le n \quad\forall i=2, \ldots, n$

$u_j \ge u_i + 1 - (n-1)(1-x_{ij}) \quad \forall i, j \in \{2, \ldots, n\}$

The first two are straightforward. What does the third one mean?

- If $x_{ij}=1$, then we impose that $u_j\ge u_i +1$ (the order of $j$ is at least 1 + the order of $i$). Since all $u_i$ are upper-bounded by $n$, the order of $j$ ends up being exactly 1 + the order of $i$ along the tour.
- If $x_{ij}=0$, then we impose that $u_j\ge u_i + 2 - n$. Since $u_i$ is at most $n$, in the worst case we impose $u_j\ge 2$ which holds as long as $j>1$ (recall our convention that $u_1=1$). So $x_{ij}=0$ effectively "turns off" the constraint.
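
To make this concrete, here is a small brute-force check on 4 nodes, written as a sketch rather than as part of the model. It enumerates every integer order vector $u$ allowed by the bounds and tests whether any satisfies the MTZ constraints for a given successor vector. The names `mtz_satisfied` and `mtz_feasible` are hypothetical, and the constraint is applied for $i, j \in \{2,\ldots,n\}$, as in the implementation that follows.

```julia
"Check the MTZ constraints for successor vector `next` and order vector `u`."
function mtz_satisfied(next::Vector{Int}, u::Vector{Int})
    n = length(next)
    u[1] == 1 || return false
    all(2 <= u[i] <= n for i = 2:n) || return false
    for i = 2:n, j = 2:n
        x_ij = next[i] == j ? 1 : 0
        u[j] >= u[i] + 1 - (n - 1) * (1 - x_ij) || return false
    end
    return true
end

"Enumerate every order vector with u[1] = 1 and u[i] in 2:n; return true if one satisfies MTZ."
function mtz_feasible(next::Vector{Int})
    n = length(next)
    for rest in Iterators.product(fill(2:n, n - 1)...)
        mtz_satisfied(next, [1; collect(rest)]) && return true
    end
    return false
end

tour = [2, 3, 4, 1]        # single tour 1 -> 2 -> 3 -> 4 -> 1
subtours = [2, 1, 4, 3]    # two subtours: 1 <-> 2 and 3 <-> 4
@show mtz_feasible(tour)
@show mtz_feasible(subtours)
```

The subtour $3 \to 4 \to 3$ would need both $u_4 \ge u_3 + 1$ and $u_3 \ge u_4 + 1$, which no order vector can satisfy, so only the single tour passes.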

We implement the MTZ formulation by appending these constraints to the core assignment model.

In [None]:
"Solve TSP using MTZ formulation, return runtime and objective"
function solveMTZ(dist::Matrix; time_limit_seconds::Real=1800)
    model, x = prebuild_tsp(dist);
    set_optimizer_attribute(model, "TimeLimit", time_limit_seconds)
    n = size(dist,1);
    # Lower bound: 1 for node 1, 2 for all other nodes
    lb = [1 ; 2*ones(n-1)]
    # Upper bound: 1 for node 1, n for all other nodes
    ub = [1 ; n*ones(n-1)]
    # We define the u variable with lower and upper bounds, ensuring in particular that u_1=1
    @variable(model, lb[i] <= u[i = 1:n] <= ub[i])
    # Constraint
    @constraint(
        model, [i = 2:n, j = 2:n], u[i] - u[j] + 1 <= (n-1) * (1 - x[i, j])
    )
    # We then solve the model and store the runtime
    start = time()
    optimize!(model)
    solvetime = time() - start
    return solvetime, objective_value(model)
end

Let's try it!

In [None]:
dist52 = get_distances("tsp/berlin52.tsp");
solvetime52_MTZ, obj52_MTZ = solveMTZ(dist52);
println("Runtime: $solvetime52_MTZ seconds")
println("Objective: $obj52_MTZ")

Let's try it on a larger instance now.

In [None]:
dist76 = get_distances("tsp/pr76.tsp");
solvetime76_MTZ, obj76_MTZ = solveMTZ(dist76, time_limit_seconds=60)
println("Runtime: $solvetime76_MTZ seconds")
println("Objective: $obj76_MTZ")

On 52 cities, we solve in about one second. On 76 cities, we are much slower! Even after 60 seconds, our objective gap is still 6.9%. Clearly this will not scale to hundreds of cities.

#### Attempt 2: combinatorial formulation (cutset)

OK, so MTZ is a bit of a dud (no offense). Can we do better? Yes, provided we're not afraid of large numbers.

*Intuition:* since we seem to dislike subtours so much, let's write a specific constraint for every possible subtour, preventing it from existing.

One way to do this is the following pair of "cutset" constraints:

$$\sum_{i\in S}\sum_{j\notin S} x_{ij}\ge 1 \quad\forall S\subsetneq [n],\ S\neq\emptyset$$

$$\sum_{i\notin S}\sum_{j\in S} x_{ij}\ge 1 \quad\forall S\subsetneq [n],\ S\neq\emptyset$$

_What's going on here?_ For a subset of nodes $S$, the constraints above ensure that there is at least one edge leaving the subset $S$, and at least another edge entering the subset $S$.

![](extra/cut-set-diagram.png)
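
As a small illustration, the sketch below evaluates the left-hand side of both constraints for a given subset $S$, with the solution encoded as a successor vector (`next[i]` is the node visited right after `i`). The helper names `arcs_leaving` and `arcs_entering` are assumptions made for this example.

```julia
"Number of arcs (i, next[i]) that leave the node set S."
function arcs_leaving(next::Vector{Int}, S)
    return count(i -> !(next[i] in S), S)
end

"Number of arcs (i, next[i]) that enter the node set S."
function arcs_entering(next::Vector{Int}, S)
    n = length(next)
    return count(i -> next[i] in S, setdiff(1:n, S))
end

S = [1, 2]

subtour_solution = [2, 1, 4, 3]   # 1 <-> 2 and 3 <-> 4
@show arcs_leaving(subtour_solution, S)    # 0: the first cutset constraint is violated for S
@show arcs_entering(subtour_solution, S)   # 0: so is the second one

single_tour = [2, 3, 4, 1]        # 1 -> 2 -> 3 -> 4 -> 1
@show arcs_leaving(single_tour, S)         # 1: the arc 2 -> 3 leaves S
@show arcs_entering(single_tour, S)        # 1: the arc 4 -> 1 enters S
```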

Unfortunately, a direct implementation is simply impossible, even for small-scale instances. Indeed, the number of subsets of $\{1,\ldots,n\}$ is equal to $2^n$. For $n=52$, this means about $4.5\cdot 10^{15}$ subsets -- clearly impossible with our computer's memory.
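
For a sense of scale, this short sketch counts the cutset constraints (two per nonempty proper subset) for the instance sizes used in this recitation. `count_cutset_constraints` is a hypothetical helper, and `big(2)` is used so the count does not overflow 64-bit integers.

```julia
"Number of cutset constraints: two per nonempty proper subset of n nodes."
count_cutset_constraints(n::Integer) = 2 * (big(2)^n - 2)

for n in (52, 76, 280)
    println("n = $n: ", count_cutset_constraints(n), " cutset constraints")
end
```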

Instead, we will add constraints *only when we need them*. We have two options:
1. We can solve the model naively, without subtour elimination constraints, add the constraints that are violated by the incumbent solution, and re-solve until no subtour remains.

2. We can also define *lazy constraints* in a callback function. This is a bit harder, but the idea is to give the constraints to the solver, and the solver will use them as needed through the branch-and-cut process. Specifically, the solver will keep the constraints in a "pool". As solutions are generated (in the branching tree), the solver checks which constraints are violated and adds them to the active formulation. Hence, the name "lazy" constraints because we let the solver use them as needed.
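
For reference, here is a rough sketch of what the lazy-constraint option could look like with JuMP's solver-independent callback interface (`MOI.LazyConstraintCallback`, `callback_value`, `@build_constraint`, `MOI.submit`). The names `solve_lazy`, `successor_cycles` and `eliminate_subtours` are assumptions made for illustration. The sketch also assumes the `optimizer` passed in supports lazy constraints (for example `Gurobi.Optimizer`), and it has not been tuned or benchmarked.

```julia
using JuMP

"Split a successor vector into cycles by following next[i] from each unvisited node."
function successor_cycles(next::Vector{Int})
    n = length(next)
    visited = falses(n)
    cycles = Vector{Vector{Int}}()
    for start = 1:n
        visited[start] && continue
        cycle = Int[]
        node = start
        while !visited[node]
            visited[node] = true
            push!(cycle, node)
            node = next[node]
        end
        push!(cycles, cycle)
    end
    return cycles
end

"""
Solve the TSP, adding cutset constraints lazily inside the branch-and-cut tree.
`optimizer` is assumed to support lazy constraint callbacks.
"""
function solve_lazy(dist::Matrix, optimizer)
    n = size(dist, 1)
    model = Model(optimizer)
    @variable(model, x[1:n, 1:n], Bin)
    @objective(model, Min, sum(dist[i, j] * x[i, j] for i = 1:n, j = 1:n))
    @constraint(model, [i = 1:n], x[i, i] == 0)
    @constraint(model, [i = 1:n], sum(x[i, j] for j = 1:n) == 1)
    @constraint(model, [j = 1:n], sum(x[i, j] for i = 1:n) == 1)

    function eliminate_subtours(cb_data)
        x_val = [callback_value(cb_data, x[i, j]) for i = 1:n, j = 1:n]
        next = [findfirst(v -> v > 0.5, x_val[i, :]) for i = 1:n]
        # skip fractional candidates in which some node has no clear successor
        any(isnothing, next) && return
        cycles = successor_cycles(Vector{Int}(next))
        length(cycles) == 1 && return   # a single tour: nothing to cut
        for S in cycles
            outside = setdiff(1:n, S)
            con = @build_constraint(sum(x[i, j] for i in S, j in outside) >= 1)
            MOI.submit(model, MOI.LazyConstraint(cb_data), con)
        end
        return
    end

    MOI.set(model, MOI.LazyConstraintCallback(), eliminate_subtours)
    optimize!(model)
    return objective_value(model), value.(x)
end
```

With Gurobi loaded, a call would look like `obj, x_opt = solve_lazy(dist52, Gurobi.Optimizer)`. The sketch only submits the "leaving" cut for each subtour: since every node has exactly one incoming and one outgoing arc, the number of arcs entering a set equals the number leaving it, so the "entering" cut is implied.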

Either way, we need to write a function to find a subtour in a candidate solution. If we were trying to make our code as efficient as possible, our best bet would be to write this function ourselves. In this recitation, we're going to pick an easier route, using the `LightGraphs` package.

In [None]:
using LightGraphs

The `LightGraphs` package is a way to easily work with graphs in Julia.

Given a solution $\boldsymbol{x}^*$, we can define the _induced_ graph $G(\boldsymbol{x}^*)$ as the graph with one node per TSP node, and an edge between each pair of nodes $(i,j)$ for which $x^*_{ij}=1$.

We know that a graph induced by a feasible solution will have exactly one edge into each node, and one edge out of each node. There is therefore a one-to-one mapping between subtours in the induced TSP graph and connected components of the induced graph.

In [None]:
"""
    Given the induced graph as an adjacency list (i.e., next[i] is the next node to visit after node i),
        compute all subtours.
    Return them as a list of lists of nodes in the same component
"""
function find_subtours(next::Vector{Int})
    n = length(next)
    g = DiGraph(n)
    for i = 1:n
        add_edge!(g, i, next[i])
    end
    components = strongly_connected_components(g)
    return sort(components, by=length)
end

Let's try using this function.

In [None]:
dist52 = get_distances("tsp/berlin52.tsp")
model, x = prebuild_tsp(dist52)
optimize!(model)
next = [findfirst(x -> x > 0.5, value.(x[i, :])) for i = 1:size(x, 1)]
find_subtours(next)

We notice two things here:

1. We have a lot of subtours of length 2 - this is due to the directed formulation, different from the undirected formulation in lecture.

2. We have lost some information by using connected components, namely the order of the cycle. But it's ok, because the cutset constraints don't care about the order of the cycle anyway.

We also notice we have quite a few subtours to eliminate. For the subtours of length 2, we could decide to eliminate all of them at once, or to eliminate them as they come up. We can even see which approach is faster!

In [None]:
"Solve the TSP using an iterative approach"
function solve_iterative(dist::Matrix; time_limit_seconds::Real = 1800,
                         eliminate_length_2::Bool=false,
                         verbose::Bool = true)
    # We first solve the model without any subtour elimination consideration
    model, x = prebuild_tsp(dist)
    n = size(dist,1)
    if eliminate_length_2
        @constraint(model, no_length_2[i = 1:n, j = 1:n], x[i, j] + x[j, i] <= 1)
    end
    verbose || set_optimizer_attribute(model, "OutputFlag", 0)
    start=time()
    optimize!(model)

    while true
        # We store the incumbent solution
        next = [findfirst(x -> x > 0.5, value.(x[i, :])) for i = 1:n]
        # Note: checking for >0.5 is conservative (x is binary!) but it avoids numerical errors
        subtours = find_subtours(next)
        println("Found $(length(subtours)) subtours after $(time() - start) seconds")
        if length(subtours) == 1 # only one cycle, the TSP solution
            solvetime = time() - start
            return solvetime, objective_value(model)
        else
            # eliminate subtours
            for subtour in subtours
                @constraint(model, sum(x[i, j] for i=subtour, j=setdiff(1:n, subtour)) >= 1)
                @constraint(model, sum(x[i, j] for i=setdiff(1:n, subtour), j=subtour) >= 1)
            end
        end
        optimize!(model)
        if time() - start > time_limit_seconds
            # out of time: report elapsed time and the last objective (may still contain subtours)
            return time() - start, objective_value(model)
        end
    end
end

In [None]:
solvetime52_iterative, obj52_iterative = solve_iterative(dist52, verbose=false, eliminate_length_2=false)
println("Runtime: $solvetime52_iterative seconds")
println("Objective: $obj52_iterative")

Now let's try that 76-city instance. Remember it didn't solve in one minute using the MTZ formulation.

In [None]:
solvetime76_iterative, obj76_iterative = solve_iterative(dist76, verbose=false,
                                                         eliminate_length_2=false)
println("Runtime: $solvetime76_iterative seconds")
println("Objective: $obj76_iterative")

Let's try to solve even larger instances! This one has 280 locations.

In [None]:
dist280 = get_distances("tsp/a280.tsp")
solvetime280_iterative, obj280_iterative = solve_iterative(dist280, verbose=false);
println("Runtime: $solvetime280_iterative seconds")
println("Objective: $obj280_iterative")