# The Transportation Problem

**Key Ideas**
- Supply point and supply constraints
- Demand point and demand constraints
- Balanced transportation problem
- Bipartite graph
- Integrality properties
- Sensitivity analysis

**Reading Assignment**
- Read the first part of Handout 7 on the transportation problem (first 4 pages)

**Brief description:** We will explore examples of the transportation problem including some unexpected ones. We will also learn how to use Python and OR-Tools to represent and solve mathematical programming problems.


In [None]:
# imports
import pandas as pd
import math, itertools
import matplotlib.pyplot as plt
import networkx as nx
from networkx.algorithms import bipartite
from ortools.linear_solver import pywraplp as OR

## Part 1: The Caterer's Problem

(From Winston, page 390) The Carter Caterer Company must have the following number of clean napkins available
at the beginning of each of the next four days:

|day | napkins |
|----|---------|
| 1  |    15   |
| 2  |    12   |
| 3  |    18   |
| 4  |     6   |

After being used, a napkin can be cleaned by one of two methods: fast service or slow service. Fast service costs 10 cents per napkin, and a napkin cleaned via fast service is available for use the day after it is last used. Slow service costs 6 cents per napkin, and these napkins can be reused two days after they are last used. New napkins can be purchased for 20 cents per napkin. The catering company currently has no napkins at all. We wish to meet the demand for the next four days as cheaply as possible.

The following questions will lead you to formulate this optimization problem as a balanced transportation problem.

**Q:** First think about the demand points. What do you think is being demanded? What is the demand for each of them? (Hint: There are four demand points)

**A:** <font color='blue'> There are four demand points, one for each of the next four days. Each demands clean napkins: 15, 12, 18, and 6, respectively.</font>

**Q:** Now think about the supply points. One supply point is the store where napkins are purchased. What are all the supply points? What is the supply for each of them? (Hint: Napkins used on day $i$ can come from those used on day $i-1$ given fast service or day $i-2$ given slow service)

**A:** <font color='blue'> Besides the store, the supply points are the ends of the first three days. Their supplies are 15, 12, and 18 respectively (the napkins used on that day). The supply of the store is infinite.</font>

**Q:** What is the per-unit shipping cost between each supply point and each demand point?

**A:** <font color='blue'> The per-unit shipping cost from the store supply node to every demand node is 20 cents. From the supply node for the end of day $i$, the per-unit cost is 10 cents to day $i+1$ (fast service) and 6 cents to day $i+2$ (slow service). </font>

**Q:** Create a graphical representation of this input (combine your answers from Q1-Q3). Upload it as an image into the next cell by entering the image name if it is in the same folder as this lab. Make sure that it displays properly within the notebook.

![title](images-key/q4_graph.png)

We came up with the following formulation for the Caterer’s Problem where the demand points are the four days $(1, 2, 3, 4)$ and the supply points are the store ($s$) and the ends of the first three days $(1’, 2’, 3’)$. The supplies/demands and costs are summarized in the table below. Did you get a similar formulation?

|         | 1  | 2  | 3  | 4  | supplies |
|---------|----|----|----|----|----------|
| $s$     | 20 | 20 | 20 | 20 | $\infty$ |
| $1'$      | -  | 10 | 6  | -  | 15       |
| $2'$      | -  | -  | 10 | 6  | 12       |
| $3'$      | -  | -  | -  | 10 | 18       |
| demands | 15 | 12 | 18 | 6  |          |

**Q:** Since we do not usually deal with infinity, what is a large enough value that can be the amount of supply at $s$? (Hint: Buy new napkins every day)

**A:** <font color='blue'> If we buy new napkins every day then we need $15+12+18+6 = 51$ napkins from the store.</font>

**Q:** Is this formulation balanced? If not, how can you convert it into a balanced transportation problem?

**A:** <font color='blue'> This formulation is not balanced because the total supply is 96 while the total demand is only 51. We can add a dummy demand node to make the formulation balanced. </font>

**Q:** What is the purpose of a dummy demand point in terms of napkins? For instance, what napkins come here, and what happens to them?

**A:** <font color='blue'> Napkins that come to the dummy demand node are just thrown away. They are neither laundered nor used. We assume it costs nothing to dispose of the napkins. </font>

**Q:** Fill out the missing values (XXX) in the table below. We substituted the dashes with a big enough number like 1000.

|         | 1    | 2    | 3    | 4    | dummy | supplies |
|---------|------|------|------|------|-------|----------|
| $s$     | 20   | 20   | 20   | 20   | XXX   | 51      |
| 1'      | 1000 | 10   | 6    | 1000 | XXX   | 15       |
| 2'      | 1000 | 1000 | 10   | 6    | XXX   | 12       |
| 3'      | 1000 | 1000 | 1000 | 10   | XXX   | 18       |
| demands | 15   | 12   | 18   | 6    | XXX   |          |

 <font color='blue'>

|         | 1    | 2    | 3    | 4    | dummy | supplies |
|---------|------|------|------|------|-------|----------|
| $s$     | 20   | 20   | 20   | 20   | 0   | 51      |
| 1'      | 1000 | 10   | 6    | 1000 | 0   | 15       |
| 2'      | 1000 | 1000 | 10   | 6    | 0   | 12       |
| 3'      | 1000 | 1000 | 1000 | 10   | 0   | 18       |
| demands | 15   | 12   | 18   | 6    | 45   |          |
    
</font>
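
The balanced table can also be generated from the problem data rather than typed by hand. The following is a minimal sketch in plain Python; the names `BIG_M`, `build_caterer_table`, and the node labels (`"s"`, `"1'"`, `"dummy"`) are illustrative assumptions and need not match the labels used in the lab's data files.

```python
# Illustrative sketch: build the balanced caterer table from the problem data.
# BIG_M, build_caterer_table and the node labels are assumptions for this example.
BIG_M = 1000                   # stands in for the forbidden "-" entries
NEW, FAST, SLOW = 20, 10, 6    # cents per napkin

def build_caterer_table(demand):
    """Return (cost, supply, balanced_demand) for the balanced problem.

    demand maps day number -> napkins needed at the start of that day.
    """
    days = sorted(demand)
    total = sum(demand.values())
    # Store supply: enough to buy new napkins every day.
    supply = {"s": total}
    # The end of day i supplies the napkins used on day i (last day excluded).
    for i in days[:-1]:
        supply[f"{i}'"] = demand[i]

    cost = {}
    for j in days:
        cost["s", j] = NEW
    for i in days[:-1]:
        for j in days:
            if j == i + 1:
                cost[f"{i}'", j] = FAST
            elif j == i + 2:
                cost[f"{i}'", j] = SLOW
            else:
                cost[f"{i}'", j] = BIG_M
    # Dummy column: discarding a napkin is free.
    for i in supply:
        cost[i, "dummy"] = 0

    balanced_demand = dict(demand)
    balanced_demand["dummy"] = sum(supply.values()) - total
    return cost, supply, balanced_demand

cost, supply, demand = build_caterer_table({1: 15, 2: 12, 3: 18, 4: 6})
print(supply)   # {'s': 51, "1'": 15, "2'": 12, "3'": 18}
print(demand)   # {1: 15, 2: 12, 3: 18, 4: 6, 'dummy': 45}
```

The dummy demand is computed as total supply minus total real demand, so the table stays balanced if the daily requirements change.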

### Part 2: Solving Using a Computer

The next cell contains our model. In later labs, you will practice writing your own models, but for now, read through the code and run it.

In [None]:
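def transportation(data, integer=False):
    """A model for solving the transportation problem.
    
    Args:
        data (pd.DataFrame): Cost matrix with origins as rows and destinations
            as columns, plus a final 'supply' column and a final 'demand' row.
        integer (bool): If True, shipment variables are integer; otherwise
            they are continuous.

    Returns:
        (m, x): the OR-Tools solver and a dict of variables keyed by (i, j).
    """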
    ORIG = list(data.index)[:-1]                                # origins
    DEST = list(data.columns)[:-1]                              # destinations
    supply = data['supply'][:-1].to_dict()                      # supply
    demand = data.transpose()['demand'][:-1].to_dict()          # demand
    cost = data.iloc[:-1,:-1].transpose().to_dict()
    cost = {(i,j) : cost[i][j] for i in cost for j in cost[i]}  # cost
    ARCS = list(cost)                                           # arcs
    
    # define model
    m = OR.Solver('transportation', OR.Solver.CBC_MIXED_INTEGER_PROGRAMMING)
    
    # decision variables
    x = {}  # units to be shipped on each edge
    for i,j in ARCS:
        if integer:
            x[i,j] = m.IntVar(0, m.infinity(), ('(%s, %s)' % (i,j))) 
        else:
            x[i,j] = m.NumVar(0, m.infinity(), ('(%s, %s)' % (i,j)))
        
    # objective function
    m.Minimize(sum(cost[i,j]*x[i,j] for i,j in ARCS))
        
    # subject to: all supply delivered at each origin node
    for i in ORIG:
        m.Add(sum(x[i,j] for j in DEST) == supply[i])
        
    # subject to: demand met at each demand node
    for j in DEST:
        m.Add(sum(x[i,j] for i in ORIG) == demand[j])
    
    return m, x
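
In symbols, the model built by `transportation` can be written as the following linear program. The notation is chosen here for illustration: $s(i)$ is the supply at origin $i$, $d(j)$ the demand at destination $j$, $c(i,j)$ the unit shipping cost, and $x(i,j)$ the amount shipped.

$$
\begin{aligned}
\min \quad & \sum_{i \in \text{ORIG}} \sum_{j \in \text{DEST}} c(i,j)\, x(i,j) \\
\text{s.t.} \quad & \sum_{j \in \text{DEST}} x(i,j) = s(i) && \text{for each } i \in \text{ORIG} \\
& \sum_{i \in \text{ORIG}} x(i,j) = d(j) && \text{for each } j \in \text{DEST} \\
& x(i,j) \ge 0 && \text{for each } (i,j) \in \text{ARCS}
\end{aligned}
$$

With `integer=True`, each $x(i,j)$ is additionally required to be an integer. Because both sets of constraints are equalities, the model is feasible only when total supply equals total demand, which is why the input must be balanced.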

In [None]:
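def solve(m):
    """Solve model m and print the objective value and variable values."""
    status = m.Solve()
    if status != OR.Solver.OPTIMAL:
        # solution values are only meaningful for an optimal solve
        print('No optimal solution found (status %s).' % status)
        return
    print('Solution:')
    print('Objective value =', m.Objective().Value())
    for var in m.variables():
        print(var.name(), ':',  var.solution_value())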

In [None]:
steel_data = pd.read_csv('data/transportation_steel.csv', index_col=0)

Here is an example set of data you can solve using this model. The `supply` column gives the tons of steel produced by three different steel mills. The `demand` row gives the tons of steel requested by each car manufacturer. The remaining portion of the dataframe gives the shipping cost per ton from each steel mill to each car manufacturer.

In [None]:
display(steel_data)

Run the next cell to create a model to solve this transportation problem and then solve it!

In [None]:
m, x = transportation(steel_data)
solve(m)

Now, let's use this model to solve the caterer's problem! (Hint: the dummy node is labeled `d`)

In [None]:
caterer_data = pd.read_csv('data/transportation_caterer.csv', index_col=0)
display(caterer_data)

**Q:** Does the table above match what you found in **Q8**? If not, what did you get wrong?

**A:** <font color='blue'> Depends. </font>

Run the cell below to create the transportation model and solve it.

In [None]:
m, x = transportation(caterer_data)
solve(m)

**Q:** Reinterpret the solution in words including how we get napkins for each day.

**A:** <font color='blue'> On the first day, we have to buy all 15 napkins. Of these, 9 are cleaned using the fast service and 6 using the slow service. The 9 fast-cleaned napkins plus 3 new napkins satisfy the demand for the second day. All 12 napkins used on day 2 are then cleaned using the fast service. These, along with the 6 slow-cleaned napkins from day 1, satisfy the demand for the third day. Lastly, 6 of the napkins used on day 3 are cleaned using the fast service to be ready for the final day. </font>
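
As a quick sanity check, the plan described in the answer can be tested against the supplies, demands, and costs from the balanced table. This is a sketch in plain Python: the flow values are transcribed from the written answer, the flows to the dummy node `"d"` are inferred so that each supply is used up, and the helper `unit_cost` and the node labels are illustrative assumptions.

```python
# Sketch: check the napkin plan from the written answer (labels are illustrative).
demand = {1: 15, 2: 12, 3: 18, 4: 6}
supply = {"s": 51, "1'": 15, "2'": 12, "3'": 18}

# (origin, destination): napkins shipped; "d" is the dummy (discarded napkins)
flow = {
    ("s", 1): 15, ("s", 2): 3,      # new napkins bought
    ("1'", 2): 9, ("1'", 3): 6,     # day 1 napkins: 9 fast, 6 slow
    ("2'", 3): 12,                  # day 2 napkins: all fast
    ("3'", 4): 6,                   # day 3 napkins: 6 fast
    ("s", "d"): 33, ("3'", "d"): 12 # inferred leftovers sent to the dummy
}

def unit_cost(origin, dest):
    """Cost in cents per napkin, following the lab's cost table."""
    if dest == "d":
        return 0
    if origin == "s":
        return 20
    lag = dest - int(origin[0])     # days between use and reuse
    return {1: 10, 2: 6}[lag]

# Every day's demand is met exactly and every supply is fully assigned.
for day, need in demand.items():
    assert sum(q for (i, j), q in flow.items() if j == day) == need
for node, avail in supply.items():
    assert sum(q for (i, j), q in flow.items() if i == node) == avail

total = sum(q * unit_cost(i, j) for (i, j), q in flow.items())
print(total, "cents")  # 666 cents
```

The total breaks down as 18 new napkins at 20 cents, 27 fast cleanings at 10 cents, and 6 slow cleanings at 6 cents.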

### Part 3: Exploring the Optimal Solution

We will start by solving a new problem. An oil company imports crude oil from three sources and refines it at five refineries. Sources 1, 2, 3 can ship 20, 50, 20 units of crude respectively each week. Refineries 1 to 5 need 10, 24, 6, 20, 30 units of crude respectively each week. The table below contains the unit shipping costs from the sources to the refineries. The entry in the $i$th row and $j$th column gives the cost to ship from source $i$ to refinery $j$.

|    | R1 | R2 | R3 | R4 | R5 |
|----|----|----|----|----|----|
| S1 | 30 | 30 | 10 | 27 | 15 |
| S2 | 15 | 15 | 8  | 13 | 5  |
| S3 | 25 | 21 | 5  | 15 | 21 |

In [None]:
oil_data = pd.read_csv('data/transportation_oil.csv', index_col=0)
display(oil_data)

In [None]:
m, x = transportation(oil_data)
solve(m)

Now consider the values $u(1) = 25$, $u(2) = 15$, $u(3) = 17$, $v(1) = 0$, $v(2) = 0$, $v(3) = -15$, $v(4) = -2$, and $v(5) = -10$.  

**Q:** Write down the modified cost matrix, where as shown in class, we subtract $u(i)$ from all the entries in row $i$, and similarly we subtract $v(j)$ from each entry of column $j$. Argue why the solution computed above is indeed optimal for the original input.

**A:**

|    | R1 | R2 | R3 | R4 | R5 |
|----|----|----|----|----|----|
| S1 |    |    |    |    |    |
| S2 |    |    |    |    |    |
| S3 |    |    |    |    |    |

<font color='blue'> 
    
|    | R1 | R2 | R3 | R4 | R5 |
|----|----|----|----|----|----|
| S1 | 5  | 5  | 0  | 4  | 0  |
| S2 | 0  | 0  | 8  | 0  | 0  |
| S3 | 8  | 4  | 3  | 0  | 14 |

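Subtracting $u(i)$ from row $i$ and $v(j)$ from column $j$ lowers the cost of every feasible solution by the same constant, $\sum_i u(i)\,s(i) + \sum_j v(j)\,d(j)$ (where $s(i)$ and $d(j)$ denote the supplies and demands), so the original and modified problems have the same optimal solutions. Every modified cost is nonnegative, so no feasible solution has modified total cost below zero. In the solution above, all non-zero $x(i,j)$ have modified cost equal to zero, so its modified total cost is exactly zero. Hence, the solution must be optimal. </font>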
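
The modified costs and the optimality argument can be checked numerically. The sketch below uses plain Python with 0-based indices. The flow values are an assumption: they are the shipments listed for $c(1,1) = 28$ in the answer table later in this lab, taken here to be the solution at $c(1,1) = 30$ as well.

```python
# Sketch: recompute the modified costs and check the optimality argument.
cost = [[30, 30, 10, 27, 15],
        [15, 15,  8, 13,  5],
        [25, 21,  5, 15, 21]]
supply = [20, 50, 20]
demand = [10, 24, 6, 20, 30]
u = [25, 15, 17]
v = [0, 0, -15, -2, -10]

# Modified cost: c(i,j) - u(i) - v(j)
modified = [[cost[i][j] - u[i] - v[j] for j in range(5)] for i in range(3)]
for row in modified:
    print(row)

# All modified costs are nonnegative, so this constant is a lower bound
# on the cost of any feasible shipment.
assert all(c >= 0 for row in modified for c in row)
bound = (sum(ui * si for ui, si in zip(u, supply))
         + sum(vj * dj for vj, dj in zip(v, demand)))
print(bound)  # 1160

# Assumed flows (0-based (source, refinery) pairs), see the note above.
flow = {(0, 2): 6, (0, 4): 14, (1, 0): 10, (1, 1): 24, (1, 4): 16, (2, 3): 20}
total = sum(q * cost[i][j] for (i, j), q in flow.items())
print(total)  # 1160

# Flow is positive only where the modified cost is zero.
assert all(modified[i][j] == 0 for (i, j) in flow)
```

Since the cost of these flows equals the lower bound, no feasible solution can be cheaper.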

Now suppose that we anticipate the cost of shipping from source 1 to refinery 1 (currently equal to 30) to go down. Observe that in the current optimal solution, we do not ship anything from source 1 to refinery 1. An interesting question is "By how much should $c(1,1)$ decrease before we will consider shipping some positive amount along that path?"

**Q:** Try the following values for $c(1,1)$: 28, 26, 25, 24. Record the optimal solutions in the table. Also include the value $\overline{c}(1,1)$ from the matrix of modified costs (recall $\overline{c}(1,1) = c(1,1) - u(1) - v(1)$). What can you conclude?

In [None]:
oil_data.loc['S1','R1'] = 30  # change this to 28, 26, 25, and 24 in turn
m, x = transportation(oil_data)
solve(m)

**A:** 

|                     | 28          | 26          | 25         | 24 |
|---------------------|-------------|-------------|------------|----|
|                     | x(1,3) = XXX  | XXX | XXX | x(1,1) = XXX |   
|                     | x(1,5) = XXX |             |            | x(1,3) = XXX  |
|                     | x(2,1) = XXX |             |            | x(1,5) = XXX   |
|                     | x(2,2) = XXX |             |            | x(2,2) = XXX   |
|                     | x(2,5) = XXX |             |            | x(2,5) = XXX   |
|                     | x(3,4) = XXX |             |            | x(3,4) = XXX   |
| $\overline{c}$(1,1) |  XXX          |  XXX          |    XXX    | XXX |

<font color='blue'>

|                     | 28          | 26          | 25         | 24 |
|---------------------|-------------|-------------|------------|----|
|                     | x(1,3) = 6  | same as 28  | same as 28 | x(1,1) = 10 |   
|                     | x(1,5) = 14 |             |            | x(1,3) = 6  |
|                     | x(2,1) = 10 |             |            | x(1,5) = 4   |
|                     | x(2,2) = 24 |             |            | x(2,2) = 24   |
|                     | x(2,5) = 16 |             |            | x(2,5) = 26   |
|                     | x(3,4) = 20 |             |            | x(3,4) = 20   |
| $\overline{c}$(1,1) |  3          |  1          |      0     | -1 |

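 Once the modified cost becomes negative, shipping along the edge strictly lowers the total cost, so the edge will enter the optimal solution. When the modified cost is exactly zero there is a tie: the edge could enter, but the objective value would not change.</font>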

Now consider the original model with $c(1,1) = 30$. Make sure to run the cell below to reset the value.

In [None]:
oil_data.loc['S1','R1'] = 30

**Q:** Could you use the modified cost matrix to answer the same kind of question raised above for changes in any $c(i,j)$ parameter? Try to use your observations to figure out by how much the value of $c(2,3)$ needs to be reduced so that we will consider shipping from source 2 to refinery 3. Check your answer using our Python model.

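**A:** <font color='blue'> Yes. The modified cost is $\overline{c}(2,3) = 8$, so the value of $c(2,3)$ needs to be decreased by at least 8 for the edge to enter the solution. With a decrease of exactly 8 the edge may enter as a tie; any larger decrease makes the modified cost negative and the edge will enter. </font>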
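
The same reasoning applies to any currently unused edge: its modified cost is the decrease at which it starts to become attractive. The sketch below wraps this in a small helper; `reduced_cost` is a hypothetical name, indices are 1-based to match the text, and it assumes the values of $u$ and $v$ given earlier stay valid, which is the case while only the cost of an unused edge changes.

```python
# Sketch: how far must c(i,j) drop before edge (i,j) is worth using?
cost = [[30, 30, 10, 27, 15],
        [15, 15,  8, 13,  5],
        [25, 21,  5, 15, 21]]
u = [25, 15, 17]
v = [0, 0, -15, -2, -10]

def reduced_cost(i, j):
    """Modified cost c(i,j) - u(i) - v(j), with 1-based source/refinery indices."""
    return cost[i - 1][j - 1] - u[i - 1] - v[j - 1]

print(reduced_cost(2, 3))  # 8
print(reduced_cost(1, 1))  # 5

# Edges with a positive modified cost, and the decrease needed for a tie.
thresholds = {(i, j): reduced_cost(i, j)
              for i in range(1, 4) for j in range(1, 6)
              if reduced_cost(i, j) > 0}
print(thresholds)
```

A decrease equal to the threshold produces a tie, so the solver may or may not use the edge; a strictly larger decrease makes the edge enter.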

Once again start with the original model and data. Because of a trade agreement, the amount that is shipped from source 2 to refinery 5 cannot exceed the amount that is shipped to refinery 5 from sources 1 and 3 combined by more than 1. The following cell adds this additional constraint to the model before solving.
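
In symbols, writing $x(i,j)$ for the amount shipped from source $i$ to refinery $j$, the trade agreement reads

$$
x(2,5) \le x(1,5) + x(3,5) + 1 .
$$

This constraint links shipments out of three different sources, so it is neither a supply constraint nor a demand constraint, and the resulting model is no longer a pure transportation problem.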

**Q:** Rerun the model cell and re-solve to find the new optimal solution and objective value.

In [None]:
m, x = transportation(oil_data)
m.Add(x['S2','R5'] <= x['S1','R5'] + x['S3','R5'] + 1)
solve(m)

**A:** <font color='blue'> The objective value is 1161.5.

|          | Refinery 1 | 2 | 3 | 4 | 5 |
|----------|------------|---|---|---|---|
| Source 1 |      0      | 0  |  5.5 |  0 | 14.5  |
| 2        |      10      | 24  | 0  | 0.5  | 15.5  |
| 3        |      0      |  0 | 0.5  | 19.5  | 0  |</font>

**Q:** What is different from your previous solutions? What property does not hold anymore now that we have added the trade agreement constraint?

**A:** <font color='blue'> The objective value is higher (worse) and the integrality property no longer holds.</font>

Since the shipping is done in barrels, we want the optimal solution to have only integer flow values. In the next cell, we set the parameter `integer` to `True` to enforce this. Again, we include our new constraint before solving.

In [None]:
m, x = transportation(oil_data, integer=True)
m.Add(x['S2','R5'] <= x['S1','R5'] + x['S3','R5'] + 1)
solve(m)

**Q:** Re-solve. Give the optimal solution and objective.

**A:** <font color='blue'> The objective value is 1163 

|          | Refinery 1 | 2 | 3 | 4 | 5 |
|----------|------------|---|---|---|---|
| Source 1 |    0        | 0  | 5  | 0 | 15 |
| 2        |     10       | 24  | 0  |  1 | 15  |
| 3        |      0      |  0 | 1  | 19  |  0 |</font>

**Q:** Note that the optimal objective function value is higher than it was before we added the integrality constraint. Can you explain why it could never be lower than it was before we added the constraint that all shipment values $x(i,j)$ had to be integer?

**A:** <font color='blue'> By adding an integrality constraint, we shrink the feasible region of our problem. Every integer-feasible solution is also feasible for the original problem, so the optimal objective value of this minimization problem can only be greater than or equal to the original optimal value.</font>
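
This relationship can be checked directly on the two solutions reported for the trade agreement model. The sketch below is plain Python with a hypothetical helper `evaluate`; the flows are transcribed from the answer tables, with $x(1,3) = 5.5$ in the fractional solution, the value consistent with source 1's supply of 20 and the reported objective of 1161.5.

```python
# Sketch: compare the fractional (LP) and integer solutions with the trade constraint.
cost = [[30, 30, 10, 27, 15],
        [15, 15,  8, 13,  5],
        [25, 21,  5, 15, 21]]
supply = {1: 20, 2: 50, 3: 20}
demand = {1: 10, 2: 24, 3: 6, 4: 20, 5: 30}

# (source, refinery): amount shipped; missing pairs ship nothing
lp_flow = {(1, 3): 5.5, (1, 5): 14.5, (2, 1): 10, (2, 2): 24,
           (2, 4): 0.5, (2, 5): 15.5, (3, 3): 0.5, (3, 4): 19.5}
ip_flow = {(1, 3): 5, (1, 5): 15, (2, 1): 10, (2, 2): 24,
           (2, 4): 1, (2, 5): 15, (3, 3): 1, (3, 4): 19}

def evaluate(flow, tol=1e-9):
    """Check supply, demand and trade constraints, then return the total cost."""
    for i, s in supply.items():
        assert abs(sum(q for (a, _), q in flow.items() if a == i) - s) < tol
    for j, d in demand.items():
        assert abs(sum(q for (_, b), q in flow.items() if b == j) - d) < tol
    get = lambda i, j: flow.get((i, j), 0)
    assert get(2, 5) <= get(1, 5) + get(3, 5) + 1 + tol   # trade agreement
    return sum(q * cost[i - 1][j - 1] for (i, j), q in flow.items())

lp_cost = evaluate(lp_flow)
ip_cost = evaluate(ip_flow)
print(lp_cost, ip_cost)   # 1161.5 1163
assert lp_cost <= ip_cost  # the relaxation can never cost more
```

The integer solution is also feasible for the relaxed model, which is why its cost cannot fall below the relaxed optimum.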