# Transportation Problems in Python

## Try me
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ffraile/operations-research-notebooks/blob/main/docs/source/MIP/tutorials/PuLP%20and%20Python%20MIP%20Tutorial.ipynb)[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ffraile/operations-research-notebooks/main?labpath=docs%2Fsource%2FMIP%2Ftutorials%2FPuLP%20and%20Python%20MIP%20Tutorial.ipynb)

## Introduction
This notebook formulates a transportation problem and solves it with PuLP, using Pandas to load the data and to present the solution.

## Problem Model Formulation
We are trying to find the optimal distribution network to distribute products from a set of plants to a set of retailer regions, using the optimal set of warehouses, which can be built in a set of possible locations. Therefore, the decisions to be made are:

a) where to build warehouses,

b) which warehouses should supply which regions,

c) how many product units we should transport from each production plant to each warehouse, and

d) how many units we should transport from every warehouse to every retail region.

Building a warehouse has a certain cost for the company, named the *building cost*. Likewise, each warehouse has its own *operation cost*, which is the cost of operating the warehouse for one year. Transporting units from plants to warehouses and from warehouses to regions also has a cost, as does handling each unit in a warehouse.
In this instance of the problem, the company also imposes the following requirement: if a warehouse supplies a region, then it has to supply that region's demand for all the products (**single-source**).

We can summarise these data using the following definitions:

**Indexes:**

- **i:** Production plants
- **j:** Possible locations (where to  place the warehouses)
- **k:** retailer regions
- **l:** products

**Data:**

$d_{kl} \quad \text{yearly demand in retail region k of product l}$

$a_{ijl} \quad \text{cost of transporting 1 unit of product l from plant i to warehouse j}$

$b_{jkl} \quad \text{cost of transporting 1 unit of product l from warehouse j to retailer region k}$

$I_{j} \quad \text{cost of building warehouse j}$
 
$F_{j} \quad \text{yearly operation costs of warehouse j}$

$v_{jl} \quad \text{handling cost of 1 unit of product l in warehouse j}$

$c_{il} \quad \text{yearly production capacity of product l in plant i}$

$C \quad \text{maximum number of warehouses to build}$

$B \quad \text{maximum investment for building warehouses}$

**Decision variables:**

$Y_{j} \quad \text{binary (1 if a warehouse is placed in location j , 0 otherwise)}$

$W_{jk} \quad \text{binary (1 if warehouse j supplies retailer region k, 0 otherwise)}$

$S_{ijl} \quad \text{integer units of product l transported from plant i to warehouse j}$

$T_{jkl} \quad \text{integer units of product l transported from warehouse j to region k}$ 

### Objective function ###
The objective is to minimise the cost, taking into account the building costs, the operation costs, the transportation costs, and the handling costs. We can formulate these costs using the definitions above as:

- **Building Costs:**  

$Cost_{b} = \sum_{j}{I_{j}*Y_{j}}$

- **Yearly Operation Costs:**  

$Cost_{o} = \sum_{j}{F_{j}*Y_{j}}$

- **Transportation Costs:** 
    - From plants to warehouses:
    
    $CostT_{ij} = \sum_{i}{\sum_{j}{\sum_{l}{a_{ijl}·S_{ijl}}}}$
    
    - From warehouses to regions: 
    
    $CostT_{jk} = \sum_{j}{\sum_{k}{\sum_{l}{b_{jkl}·T_{jkl}}}}$

Now, for the handling costs, if we assume that no units are kept in storage from one year to the next, then every unit that enters a warehouse from a production plant must be transported to a region, therefore:

$\sum_{i}{S_{ijl}} = \sum_{k}{T_{jkl}} \quad \forall j, \forall l$

For the handling costs, we can then write:

- **Unit handling cost:** 

$Cost_{h} = \sum_{j}{\sum_{k}{\sum_{l}{v_{jl}·T_{jkl}}}}$

This is the general expression of the handling costs. However, since we have the single-source requirement, we know that:

$\sum_{j}{W_{jk}} = 1 \quad \forall k$

that is, exactly one warehouse supplies each region. If this holds, then to satisfy the demand we know that:

$\sum_{j}{d_{kl}·W_{jk}} = d_{kl} \quad \forall k, \forall l$

Moreover, the total number of units transported to each region must match the total demand of that region over all products, and therefore:

$\sum_{j}\sum_{l}{{d_{kl}·W_{jk}}} = \sum_{j}{\sum_{l}{T_{jkl}}} \quad \forall k$

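Since exactly one warehouse supplies each region, this also holds product by product, i.e. $T_{jkl} = d_{kl}·W_{jk}$. Therefore, we can express the handling costs as: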

$Cost_{h} = \sum_{j}{\sum_{k}{\sum_{l}{v_{jl}·d_{kl}·W_{jk}}}}$
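
As a quick numerical sanity check of this substitution, the sketch below uses a made-up toy instance (two warehouses, two regions, two products; every number is hypothetical and unrelated to the notebook's Excel files). It builds the flows implied by a single-source assignment, $T_{jkl} = d_{kl}·W_{jk}$, and compares the two expressions of the handling cost.

```python
from itertools import product

# Hypothetical toy data, for illustration only
warehouses = [1, 2]   # index j
regions = [1, 2]      # index k
products = [1, 2]     # index l

d = {(1, 1): 10, (1, 2): 5, (2, 1): 8, (2, 2): 12}   # d[k, l]: yearly demand
v = {(1, 1): 2, (1, 2): 3, (2, 1): 4, (2, 2): 1}     # v[j, l]: unit handling cost

# Single-source assignment W[j, k]: region 1 is served by warehouse 2,
# region 2 is served by warehouse 1
W = {(1, 1): 0, (1, 2): 1, (2, 1): 1, (2, 2): 0}

# Flows implied by single sourcing: the assigned warehouse ships the full demand
T = {(j, k, l): d[k, l] * W[j, k]
     for j, k, l in product(warehouses, regions, products)}

# Handling cost written in terms of the flows T ...
cost_from_flows = sum(v[j, l] * T[j, k, l]
                      for j, k, l in product(warehouses, regions, products))

# ... and written in terms of the assignment W
cost_from_assignment = sum(v[j, l] * d[k, l] * W[j, k]
                           for j, k, l in product(warehouses, regions, products))

print(cost_from_flows, cost_from_assignment)  # both are 97 for this toy data
```

The second form only involves the binary variables $W_{jk}$ and data, which is why the objective function can use it in place of the flow-based expression.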

Finally, the objective function is expressed as:

$\min z = Cost_{b} + Cost_{o} + CostT_{ij} + CostT_{jk} + Cost_{h}$

or

$\min z = \sum_{j}{I_{j}*Y_{j}} + \sum_{j}{F_{j}*Y_{j}} + \sum_{i}{\sum_{j}{\sum_{l}{a_{ijl}·S_{ijl}}}} + \sum_{j}{\sum_{k}{\sum_{l}{b_{jkl}·T_{jkl}}}} + \sum_{j}{\sum_{k}{\sum_{l}{v_{jl}·d_{kl}·W_{jk}}}}$

### Constraints ###
Now, regarding the constraints, the previous section already introduced some of the requirements of the problem. Let us look at them one by one.

**Demands:**
For instance, looking at the flow of materials, the demand in every region and for every product must be met:

$\sum_{j}{T_{jkl}}= d_{kl} \quad \forall k, \forall l$


**Material flows:**
Also, the amount of each product that leaves a warehouse cannot be higher than the amount that enters it. Assuming that we do not store any units for later periods, both amounts are equal rather than related by an inequality:

$\sum_{i}{S_{ijl}} = \sum_{k}{T_{jkl}} \quad \forall j, \forall l$

**Production capacities:**
The amount of each product shipped from a plant must not exceed its production capacity:

$\sum_{j}{S_{ijl}} \leq c_{il} \quad \forall i, \forall l$
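
The three flow-related constraints so far (demand, material flow and production capacity) can be checked by hand for any candidate shipping plan. The following is a minimal sketch, not part of the original notebook: `check_flows` is a hypothetical helper and the toy data is invented. It returns the list of violated constraints for plans `S` and `T` stored as plain dictionaries.

```python
def check_flows(S, T, d, c, plants, warehouses, regions, products):
    """Return the violated constraints as (name, index, index) tuples."""
    violations = []
    # Demand: sum_j T[j, k, l] == d[k, l] for every region k and product l
    for k in regions:
        for l in products:
            if sum(T[j, k, l] for j in warehouses) != d[k, l]:
                violations.append(("demand", k, l))
    # Material flow: sum_i S[i, j, l] == sum_k T[j, k, l] for every j and l
    for j in warehouses:
        for l in products:
            inflow = sum(S[i, j, l] for i in plants)
            outflow = sum(T[j, k, l] for k in regions)
            if inflow != outflow:
                violations.append(("flow", j, l))
    # Capacity: sum_j S[i, j, l] <= c[i, l] for every plant i and product l
    for i in plants:
        for l in products:
            if sum(S[i, j, l] for j in warehouses) > c[i, l]:
                violations.append(("capacity", i, l))
    return violations


# Hypothetical toy instance: 1 plant, 2 warehouses, 2 regions, 1 product
plants, warehouses, regions, products = [1], [1, 2], [1, 2], [1]
d = {(1, 1): 10, (2, 1): 20}                                       # d[k, l]
c = {(1, 1): 30}                                                   # c[i, l]
T = {(1, 1, 1): 10, (1, 2, 1): 0, (2, 1, 1): 0, (2, 2, 1): 20}     # T[j, k, l]
S = {(1, 1, 1): 10, (1, 2, 1): 20}                                 # S[i, j, l]

ok = check_flows(S, T, d, c, plants, warehouses, regions, products)

# Shipping 25 units into warehouse 2 breaks flow balance and plant capacity
S_bad = dict(S)
S_bad[1, 2, 1] = 25
bad = check_flows(S_bad, T, d, c, plants, warehouses, regions, products)

print(ok)   # []
print(bad)  # [('flow', 2, 1), ('capacity', 1, 1)]
```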

**Financial constraints:**
The amount of money invested in building warehouses must not exceed the budget limit:

$\sum_{j}{I_{j}*Y_{j}} \leq B$

**Company Policy constraint:**
The number of warehouses must not exceed the limit:

$\sum_{j}{Y_{j}} \leq C$

**Single Source Constraint:** Each region receives all its products from only one warehouse:

$\sum_{j}{W_{jk}} = 1 \quad \forall k$

**Logical constraints:** Finally, it is necessary to introduce some logical constraints. First, a warehouse must be built if it supplies to any region.

$W_{jk} \leq Y_{j} \quad \forall j, \forall k$

Since both decision variables are binary, with this constraint we make sure that if any warehouse supplies to a region, i.e. if any $W_{jk}$ for any value of k is 1, then we must build a warehouse in that location, i.e. $Y_{j}$ must be 1.

This is an alternative formulation of this constraint:

$\sum_{k} W_{jk} \leq M·Y_{j} \quad \forall j$

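Where M is a large number (any value at least as large as the number of regions is sufficient). In this alternative formulation, again, if a warehouse supplies any region, we make sure that the warehouse is built. Otherwise, $Y_{j}$ is free to be zero, and the minimisation of costs sets it to zero whenever the warehouse has a positive building or operation cost.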
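
To compare the two linking formulations, the sketch below enumerates every binary assignment for a single candidate location with three regions (a hypothetical toy size) and confirms that both accept exactly the same binary points when M equals the number of regions. It also evaluates one fractional point, to illustrate the usual observation that the disaggregated form $W_{jk} \leq Y_{j}$ is tighter once integrality is relaxed. The function names are illustrative only.

```python
from itertools import product

regions = [1, 2, 3]   # hypothetical: three regions, one candidate location j
M = len(regions)      # at most len(regions) of the W_jk can be 1, so this M is enough


def disaggregated_ok(y, w):
    """W_jk <= Y_j for every region k."""
    return all(w_k <= y for w_k in w)


def aggregated_ok(y, w, big_m=M):
    """sum_k W_jk <= M * Y_j."""
    return sum(w) <= big_m * y


# 1) Both formulations accept the same binary points
binary_points = [(y, w) for y in (0, 1)
                 for w in product((0, 1), repeat=len(regions))]
same_binary_set = all(disaggregated_ok(y, w) == aggregated_ok(y, w)
                      for y, w in binary_points)

# 2) A fractional point: Y_j = 0.5 while the warehouse fully supplies region 1
y_frac, w_frac = 0.5, (1, 0, 0)
aggregated_accepts = aggregated_ok(y_frac, w_frac)        # 1 <= 3 * 0.5 holds
disaggregated_accepts = disaggregated_ok(y_frac, w_frac)  # 1 <= 0.5 fails

print(same_binary_set, aggregated_accepts, disaggregated_accepts)  # True True False
```

Both forms describe the same integer solutions; the disaggregated one uses more constraints but excludes fractional points such as the one above.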

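Similarly, units can be transported from a warehouse to a region only if that warehouse is assigned to supply the region, i.e. the total amount of all products transported from warehouse j to region k can be nonzero only when $W_{jk}$ is 1: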

$\sum_{l}T_{jkl} \leq M·W_{jk} \quad \forall j, \forall k$
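
The constant M in this constraint only needs to be at least as large as the most that can ever flow into a region. Because the demand constraints force $\sum_{j}{T_{jkl}} = d_{kl}$, the total flow from any single warehouse into region k is at most $\sum_{l}{d_{kl}}$. The sketch below computes such per-region bounds for invented demand data. It illustrates one possible way of choosing M and is an assumption, not what the notebook does (the notebook uses a fixed large constant).

```python
# Hypothetical demand data d[k, l] for two regions and three products
regions = [1, 2]
products = [1, 2, 3]
d = {(1, 1): 40, (1, 2): 25, (1, 3): 10,
     (2, 1): 5, (2, 2): 60, (2, 3): 15}

# Smallest valid M for each region: its total demand over all products
M_k = {k: sum(d[k, l] for l in products) for k in regions}

# A single constant valid for every region is the largest of these totals
M_global = max(M_k.values())

print(M_k)       # {1: 75, 2: 80}
print(M_global)  # 80
```

Any fixed constant works as long as it is not smaller than `M_global`; smaller valid values generally give the solver a tighter relaxation to work with.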

In [1]:
import pandas as pd
import pulp
from IPython.display import display, Markdown
import os


def solve():
    #First, we define the indices as arrays:
    plants = [1, 2]                 # This will be our index i
    regions = [1, 2, 3, 4, 5, 6]    # This will be our index k
    products = [1, 2, 3]            # This will be our index l
    warehouses = [1, 2, 3, 4]       # This will be our index j

    # Now we load the data into Pandas dataframes
    # demand from region k of product l
    pd_demands = pd.read_excel('Demands.xlsx', index_col=[0, 1])
    d = pd_demands['Demand']        # In the expressions, we will use it as d[k,l]

    # cost of transporting 1 unit of product l from plant i to warehouse j
    pd_costs_pw = pd.read_excel('Transport_P_W.xlsx', index_col=[0, 1, 2])
    a = pd_costs_pw['CostPW']       # In the expressions, we will use it as a[i, j, l]

    # cost of transporting 1 unit of product l from warehouse j to region k
    pd_costs_wr = pd.read_excel('Transport_W_R.xlsx', index_col=[0, 1, 2])
    b = pd_costs_wr['CostWR']       # In the expressions, we will use it as b[j, k, l]

    # cost of building warehouse j
    pd_warehouses = pd.read_excel('Warehouses.xlsx', index_col=[0])
    I = pd_warehouses['CostBuilding']  # In the expressions, we will use it as I[j]

    # yearly operation cost of warehouse j
    F = pd_warehouses['YOperation']    # In the expressions, we will use it as F[j]

    # handling cost of 1 unit of product l in warehouse j
    pd_warehouses_products = pd.read_excel('Warehouses_Products.xlsx', index_col=[0, 1])
    v = pd_warehouses_products['HandlingCost']  # In the expressions, we will use it as v[j,l]

    # yearly production capacity of product l in plant i
    pd_plants_products = pd.read_excel('Plants_Products.xlsx', index_col=[0, 1])
    c = pd_plants_products['YPCapacity']  # In the expressions, we will use it as c[i,l]

    #Some of the data is given as constants
    # maximum amount of warehouses to build
    C = 3

    # maximum investment for warehouses building
    B = 400

    # A very large number of products
    M = 99999
    
    # First we create our problem model
    model = pulp.LpProblem("Transport_Planning", pulp.LpMinimize)

    #Then we define our decision variables
    # binary { 1 if a warehouse j is built, 0 otherwise }
    Y = pulp.LpVariable.dicts("Y",
                              [j for j in warehouses],
                              lowBound=0,
                              cat='Binary')

    #  binary { 1 if the warehouse j supplies the region k } { 0 otherwise }
    W = pulp.LpVariable.dicts("W",
                              [(j, k) for j in warehouses for k in regions],
                              lowBound=0,
                              cat='Binary')

    # units of product l transported from plant i to warehouse j
    S = pulp.LpVariable.dicts("S",
                              [(i, j, l) for i in plants for j in warehouses for l in products],
                              lowBound=0,
                              cat='Integer')

    # units of product l transported from warehouse j to region k
    T = pulp.LpVariable.dicts("T",
                              [(j, k, l) for j in warehouses for k in regions for l in products],
                              lowBound=0,
                              cat='Integer')

    # We define functions for the different expressions in the costs
    def transportation_costs_pw():
        return pulp.lpSum([
            a[i, j, l] * S[i, j, l]
        for i in plants for j in warehouses for l in products])

    def transportation_costs_wr():
        return pulp.lpSum([
            b[j, k, l] * T[j, k, l]
        for j in warehouses for k in regions for l in products
        ])

    def warehouse_costs():
        return pulp.lpSum([
            d[k, l] * v[j, l] * W[j, k]
        for j in warehouses for k in regions for l in products])

    def operation_costs():
        return pulp.lpSum([
            F[j]*Y[j]
            for j in warehouses])

    def building_costs():
        return pulp.lpSum([
            I[j]*Y[j]
            for j in warehouses])

    #This is our actual objective function:
    model += transportation_costs_pw() + transportation_costs_wr() + warehouse_costs() + operation_costs() + building_costs(), "Distribution Costs"

    #Now we introduce the different constraints
    #subject to:
    #The demand for all products in all regions must be satisfied

    for k in regions:
        for l in products:
            model += pulp.lpSum([
                T[j, k, l]
                for j in warehouses]) == d[k, l], "Demand" + str((k, l))

    # The amount of each product that arrives at a warehouse must equal the amount that leaves that warehouse
    for j in warehouses:
        for l in products:
            model += pulp.lpSum([
                S[i, j, l]
                for i in plants]) == pulp.lpSum([
                T[j, k, l]
                for k in regions]), "Flow " + str((j, l))

    # The amount of each product produced by a plant should not exceed the production capacity
    for i in plants:
        for l in products:
            model += pulp.lpSum([
                S[i, j, l]
                for j in warehouses]) <= c[i, l], "Plant capacity limit " + str((i, l))

    # The amount of money invested in building warehouses should not exceed the investment budget
    model += pulp.lpSum([
        I[j] * Y[j]
        for j in warehouses]) <= B, "Building cost budget"

    # Each region receives its products from only one warehouse
    for k in regions:
        model += pulp.lpSum([
            W[j, k]
        for j in warehouses]) == 1, "single source distribution " + str(k)

    # The number of warehouses built should not exceed the limit
    model += pulp.lpSum([
        Y[j]
        for j in warehouses]) <= C, "Warehouse limit constraint"

    # Units can flow from warehouse j to region k only if warehouse j is assigned to supply region k (W[j, k] = 1)
    for j in warehouses:
        for k in regions:
            model += pulp.lpSum([
                T[j, k, l]
            for l in products]) <= M*W[j, k], "Logic constraint warehouse " + str((j, k))

    # A warehouse should be built if it supplies to any region
    for j in warehouses:
        for k in regions:
            model += pulp.lpSum([
                W[j, k]
            ]) <= Y[j], "Logic constraint 2 warehouse " + str((j, k))

    model.solve()
    print(pulp.LpStatus[model.status])

    # Solution
    min_z = pulp.value(model.objective)
    print(min_z)

    #Now we create dataframes with the solution:
    
    Y_df = pd.DataFrame.from_dict(Y, orient="index",
                           columns=["Y"], dtype=object)
    Y_df["Solution"] = Y_df["Y"].apply(lambda item: item.varValue)
    Y_df = pd.DataFrame(Y_df["Solution"], index=warehouses, columns=["Solution"])
    display(Y_df)

    W_df = pd.DataFrame.from_dict(W, orient="index",
                                  columns=["W"], dtype=object)
    W_df["Solution"] = W_df["W"].apply(lambda item: item.varValue)
    w_idx = pd.MultiIndex.from_product([warehouses, regions])
    W_df = pd.DataFrame(W_df["Solution"], index=w_idx, columns=["Solution"])
    display(W_df)

    S_df = pd.DataFrame.from_dict(S, orient="index",
                           columns=["S"], dtype=object)
    S_df["Solution"] = S_df["S"].apply(lambda item: item.varValue)
    s_idx = pd.MultiIndex.from_product([plants, warehouses, products])
    S_df = pd.DataFrame(S_df["Solution"], index=s_idx,
                        columns=["Solution"])
    display(S_df)

    T_df = pd.DataFrame.from_dict(T, orient="index",
                                  columns=["T"], dtype=object)
    T_df["Solution"] = T_df["T"].apply(lambda item: item.varValue)
    t_idx = pd.MultiIndex.from_product([warehouses, regions, products])
    T_df = pd.DataFrame(T_df["Solution"], index=t_idx, columns=["Solution"])
    display(T_df)

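    # Remove any previous solution file; this fails if the file is still open (e.g. in Excel)
    if os.path.exists('solution.xlsx'):
        try:
            os.remove('solution.xlsx')
        except PermissionError:
            print("Close the solution file and try to solve again")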

    with pd.ExcelWriter('solution.xlsx') as writer:
        Y_df.to_excel(writer, sheet_name='Y', index_label=['warehouses'])
        W_df.to_excel(writer, sheet_name='W', index_label=['warehouses', 'regions'])
        S_df.to_excel(writer, sheet_name="S", index_label=['plants', 'warehouses', 'products'])
        T_df.to_excel(writer, sheet_name="T", index_label=['warehouses', 'regions', 'products'])

solve()

Optimal
152759.0
