Simple outline for this notebook:   

1. Introduce the logic of the current code.

2. Show the whole code of the k-nearest p-median model.

3. Use the example to test the model.

#### 1. The logic

a. The logic of `k_nearest_loop()`.   

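Given a full distance/travel-cost dataframe and an initial k value for each client, it finds the optimal solution by solving the model on each client's k nearest facilities and increasing k for every client that could not be assigned, until all clients are served.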

![Flowchart](flowchart/k_nearest_loop().png)
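
The loop can also be illustrated without a solver. The sketch below is not the notebook's implementation: it swaps the MILP for a brute-force search (`solve_restricted`, a hypothetical helper), assumes unit demand per client, and stores costs in nested dicts rather than a dataframe. The placeholder penalty (largest k-nearest cost plus 1) follows the objective used later in `from_cost_matrix()`.

```python
from itertools import product


def solve_restricted(cost, k_list, capacity):
    """Brute-force stand-in for the MILP (assumption: unit demand per client).

    Each client picks one of its k nearest facilities or the placeholder (None).
    """
    clients = list(cost)
    options, penalties = [], {}
    for c, k in zip(clients, k_list):
        nearest = sorted(cost[c], key=cost[c].get)[:k]
        # placeholder cost: largest k-nearest cost plus 1
        penalties[c] = max(cost[c][f] for f in nearest) + 1
        options.append(nearest + [None])

    best = None
    for choice in product(*options):
        load = {}
        for f in choice:
            if f is not None:
                load[f] = load.get(f, 0) + 1
        if any(load[f] > capacity[f] for f in load):
            continue  # violates a facility capacity
        total = sum(
            penalties[c] if f is None else cost[c][f]
            for c, f in zip(clients, choice)
        )
        if best is None or total < best[0]:
            best = (total, choice)
    return best


def k_nearest_loop_sketch(cost, k_list, capacity):
    """Increase k for clients left on the placeholder until all are served."""
    clients = list(cost)
    k_list = list(k_list)
    while True:
        total, choice = solve_restricted(cost, k_list, capacity)
        unserved = [i for i, f in enumerate(choice) if f is None]
        if not unserved:
            return total, choice, k_list
        if all(k_list[i] >= len(cost[clients[i]]) for i in unserved):
            raise ValueError("no further facilities to add for the unserved clients")
        for i in unserved:
            k_list[i] += 1


cost = {
    "Client 1": {"Facility A": 10, "Facility B": 14, "Facility C": 17},
    "Client 2": {"Facility A": 12, "Facility B": 15, "Facility C": 13},
}
capacity = {"Facility A": 1, "Facility B": 1, "Facility C": 1}

total, choice, final_k = k_nearest_loop_sketch(cost, [1, 1], capacity)
print(total, choice, final_k)
```

The stand-in breaks cost ties by enumeration order, so the intermediate k values may differ from what a real solver such as CBC produces, even when the final assignment is the same.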

b. The logic of `from_cost_table()`

from_cost_table()
   |
   +--> k_nearest_dataframe()
   |
   +--> build_sparse_matrix()
   |
   +--> from_cost_matrix() <---
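
As a rough illustration of the first two steps in this chain, the sketch below keeps each client's k nearest rows and packs them into a sparse matrix. It is a vectorised variant written for this example, not the notebook's `k_nearest_dataframe()`; the toy table and the `k_list` values are assumptions.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# toy cost table that already carries integer indices (assumed for this sketch)
table = pd.DataFrame({
    "client_index":   [0, 0, 0, 1, 1, 1],
    "facility_index": [0, 1, 2, 0, 1, 2],
    "cost":           [10, 14, 17, 12, 15, 13],
})
k_list = [1, 2]  # k for client 0 and client 1

# rank every facility within its client by cost (1 = nearest)
rank = table.groupby("client_index")["cost"].rank(method="first")
# look up the k value that applies to each row
k_per_row = table["client_index"].map(dict(enumerate(k_list)))
nearest = table[rank <= k_per_row]

# rows are clients, columns are facilities; pairs outside the k nearest are not stored
matrix = csr_matrix(
    (
        nearest["cost"].to_numpy(),
        (nearest["client_index"].to_numpy(), nearest["facility_index"].to_numpy()),
    ),
    shape=(2, 3),
)
dense = matrix.toarray()
print(dense)
```

One caveat of this representation: pairs that are not stored read back as 0, so a genuine cost of 0 cannot be told apart from a pair that was filtered out.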

c. The content of `from_cost_matrix()`

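1. Three sets of binary decision variables:   
    `x`: one per pair of a client and one of its k nearest facilities,     
    `g`: one per client, for its placeholder facility,     
    `y`: one per facility, indicating whether it is used.

2. One objective function: minimise the total assignment cost, where assigning a client to its placeholder facility costs that client's largest k-nearest distance plus 1.

3. Three constraints:   
    an assignment constraint (each client is served by exactly one facility, real or placeholder);    
    an overall accommodation constraint (the total capacity of the selected facilities covers the total demand);    
    an individual capacity constraint (the demand assigned to a facility does not exceed its capacity).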
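
Written out, the model built by `from_cost_matrix()` can be read as follows. The notation is an assumption introduced here, not taken from the notebook: $N_i$ is the set of the k nearest facilities of client $i$, $c_{ij}$ the cost between client $i$ and facility $j$, $d_i$ the demand of client $i$, and $Q_j$ the capacity of facility $j$. The variables $x_{ij}$, $g_i$ and $y_j$ correspond to the PuLP variables named `x`, `g` and `y` in the code.

$$
\begin{aligned}
\min \quad & \sum_{i} \Big( \sum_{j \in N_i} c_{ij} x_{ij} + \big( \max_{j \in N_i} c_{ij} + 1 \big) g_i \Big) \\
\text{s.t.} \quad & \sum_{j \in N_i} x_{ij} + g_i = 1 \quad \forall i \\
& \sum_{j} Q_j y_j \ge \sum_{i} d_i \\
& \sum_{i : j \in N_i} d_i x_{ij} \le Q_j y_j \quad \forall j \\
& x_{ij}, g_i, y_j \in \{0, 1\}
\end{aligned}
$$

Note that, as coded, the objective sums the costs $c_{ij}$ without weighting them by demand.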

#### 2. The complete code (binary version)

In [2]:
import pandas as pd
import numpy as np
import pulp
from sklearn import preprocessing
from scipy.sparse import csr_matrix
from scipy.sparse import find

In [14]:
def clean_travel_table(full_travel_table, facility_name, client_name, cost_column):
    """
    rename key columns in the travel table and create new indices for clients and facilities.
    """
    # rename key columns
    full_travel_table.rename(
        columns={
            client_name: "client",
            facility_name: "facility",
            cost_column: "cost",
        },inplace=True)

    # create new indices for clients and facilities
    # a sparse matrix only accepts integers as indices, so if the original IDs of clients or facilities
    # are strings or contain strings, we need to create new integer indices for them
    encoder = preprocessing.LabelEncoder()
    full_travel_table["facility_index"] = encoder.fit_transform(full_travel_table['facility'])
    full_travel_table["client_index"] = encoder.fit_transform(full_travel_table['client'])
    return full_travel_table

def k_nearest_dataframe(distance, k_list):
    """
    Create a dataframe holding, for each client, the costs to its k nearest facilities.
    `k_list[i]` is the k value of the client whose `client_index` is i.
    """
    # select by `client_index` rather than by order of appearance, so that k_list
    # lines up with the g_i variables that create_k_list() uses to update it
    frames = [
        distance[distance['client_index'] == i].nsmallest(k, 'cost')
        for i, k in enumerate(k_list)
    ]
    # concatenate once instead of growing a dataframe inside the loop
    return pd.concat(frames, ignore_index=True)

def build_sparse_matrix(table, row_shape, column_shape):
    """
    Build the sparse matrix holding the costs between the clients and their k nearest facilities.
    """
    data = table['cost'].values
    row = table['client_index'].values
    col = table['facility_index'].values
    sparse_matrix = csr_matrix((data, (row, col)), shape=(row_shape, column_shape))
    return sparse_matrix

def create_k_list(decision_g, k_list):
    """
    Create a new k list in which k is increased by 1 for every client with g_i > 0.
    """
    new_k_list = k_list.copy()
    for i in range(len(decision_g)):
        if decision_g[i].value() > 0:
            new_k_list[i] = new_k_list[i] + 1
    return new_k_list

def from_cost_table(full_travel_table, k, demand=None, capacity=None):
    """
    Transform a cost table into a sparse distance matrix restricted to each client's
    k nearest facilities, then solve the location-allocation problem on it.
    `k` is a list with one k value per client.
    """
    # get the total number of clients and facilities
    row_shape = full_travel_table["client"].nunique()
    column_shape = full_travel_table["facility"].nunique()

    # call other functions
    k_nearest_table = k_nearest_dataframe(full_travel_table, k)
    sparse_distance_matrix = build_sparse_matrix(k_nearest_table, row_shape, column_shape)

    return from_cost_matrix(sparse_distance_matrix, demand=demand, capacity=capacity)

def from_cost_matrix(sparse_distance_matrix, demand=None, capacity=None):
    """
    Create and solve a p-median problem for a given sparse distance matrix, demand and capacity.
    """
    # get the indices for clients and facilities
    n_cli = sparse_distance_matrix.shape[0]
    r_cli = range(n_cli)
    r_fac = range(sparse_distance_matrix.shape[1])

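    # demand and capacity are required by the constraints below
    if demand is None or capacity is None:
        raise ValueError("both demand and capacity must be provided")

    # get the demand
    demand_sum = demand.sum()
    demand = np.reshape(demand, (n_cli, 1))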

    # find the indices in this matrix
    row_indices, col_indices, values = find(sparse_distance_matrix)

    # set up the problem
    problem = pulp.LpProblem("k-nearest-p-median", pulp.LpMinimize)

    # set the decision variable for the pair between client and k nearest facilities
    decision = pulp.LpVariable.dicts("x", [(i, j) for i, j in zip(row_indices, col_indices)], 0, 1, pulp.LpBinary)

    # set the decision variable for placeholder facility
    decision_g = pulp.LpVariable.dicts("g", (i for i in r_cli), 0, 1, pulp.LpBinary)

    # set the decision variable for all the facilities
    decision_f = pulp.LpVariable.dicts("y", (j for j in r_fac), 0, 1, pulp.LpBinary)

    # set the objective
    # to complete the objective, we need to get the maximum distance for each client first
    max_distance = sparse_distance_matrix.max(axis=1).toarray().flatten()
    objective = pulp.lpSum(
        pulp.lpSum(
            decision.get((i, j), 0) * sparse_distance_matrix[i, j] for j in r_fac
        )
        + (decision_g[i] * (max_distance[i] + 1))
        for i in r_cli
    )
    problem += objective

    # constraint 1. Each client is assigned to a facility
    for i in r_cli:
        problem += (
            pulp.lpSum(decision.get((i, j), 0) for j in r_fac)
            + decision_g[i] == 1
        )

    # constraint 2. The amount of capacity equals or exceeds the sum of all demands
    problem += pulp.lpSum(decision_f[j] * capacity[j] for j in r_fac) >= demand_sum

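    # constraint 3. The demand assigned to a facility does not exceed its capacity
    # (and is zero if the facility is not selected).
    # np.unique avoids adding the same constraint once per client that lists the facility
    for j in np.unique(col_indices):
        problem += (
            pulp.lpSum(demand[i] * decision.get((i, j), 0) for i in r_cli) <= decision_f[j] * capacity[j]
        )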

    problem.solve(pulp.PULP_CBC_CMD(msg=False))
    return problem, decision, decision_g

def k_nearest_loop(
    k_list, full_travel_table, facility_name, client_name, cost_column, demand=None, capacity=None
    ):
    """
    Repeatedly solve the k-nearest model, increasing k for every client that is still
    assigned to its placeholder facility, until all clients are served by real facilities.
    """
    sum_gi = 1
    clean_table = clean_travel_table(full_travel_table, facility_name, client_name, cost_column)
    while sum_gi > 0:
        prob, decision, decision_g = from_cost_table(
            clean_table, k_list, demand=demand, capacity=capacity)
        # stop if the solver did not reach an optimal solution
        if prob.status != pulp.LpStatusOptimal:
            print("No optimal solution was found for this problem")
            break
        # number of clients still assigned to their placeholder facility
        sum_gi = sum(decision_g[i].value() for i in range(len(decision_g)) if decision_g[i].value() > 0)
        if sum_gi > 0:
            k_list = create_k_list(decision_g, k_list)
    return prob, decision, decision_g

#### 3. Test case

##### a. Simple case: the initial k is set to 1, and the loop has to increase it to 2.

In [16]:
# Example cost table
cost_table = pd.DataFrame({
    'client': ['Client 1', 'Client 1', 'Client 1', 
               'Client 2','Client 2', 'Client 2'],
    'facility': ['Facility A', 'Facility B', 'Facility C', 
                 'Facility A', 'Facility B', 'Facility C',],
    'cost': [10, 14, 17, 12, 15, 13]
})

# Define other inputs
k_list = [1, 1]  # List of k values for each client
facility_name = 'facility'
client_name = 'client'
cost_column = 'cost'
demand = np.array([1, 1])  # Demand values for each client
capacity = np.array([1, 1, 1])  # Capacity values for each facility

# Solve the location allocation problem
prob, decision, decision_g = k_nearest_loop(
    k_list, cost_table, facility_name, client_name, cost_column, demand=demand, capacity=capacity)

In [17]:
prob.status

1

In [19]:
cost_table

|   | client | facility | cost | facility_index | client_index |
|---|---|---|---|---|---|
| 0 | Client 1 | Facility A | 10 | 0 | 0 |
| 1 | Client 1 | Facility B | 14 | 1 | 0 |
| 2 | Client 1 | Facility C | 17 | 2 | 0 |
| 3 | Client 2 | Facility A | 12 | 0 | 1 |
| 4 | Client 2 | Facility B | 15 | 1 | 1 |
| 5 | Client 2 | Facility C | 13 | 2 | 1 |


In [20]:
for i in range(len(demand)):
    for j in range(len(capacity)):
        if (i, j) in decision and decision[(i, j)].value() == 1:
            print(i, j)

0 0
1 2


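The index pairs `0 0` and `1 2` translate to:

Client 1 - Facility A    
Client 2 - Facility C

Facility A is the nearest facility for both clients but has capacity for only one of them, so with k = 1 one client is left on its placeholder facility. Once the loop raises k, Client 2 can reach Facility C, giving a total cost of 10 + 13 = 23.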
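
For a problem this small the result can be cross-checked by enumerating every assignment of the full (unrestricted) problem. The sketch below is a sanity check written for this example, not part of the model code; it uses the costs and capacities of the test case and assumes unit demand per client.

```python
from collections import Counter
from itertools import product

cost = {
    "Client 1": {"Facility A": 10, "Facility B": 14, "Facility C": 17},
    "Client 2": {"Facility A": 12, "Facility B": 15, "Facility C": 13},
}
capacity = {"Facility A": 1, "Facility B": 1, "Facility C": 1}

clients = list(cost)
facilities = list(capacity)

feasible = []
for choice in product(facilities, repeat=len(clients)):
    # with unit demand, a facility's load is the number of clients assigned to it
    load = Counter(choice)
    if all(load[f] <= capacity[f] for f in load):
        total = sum(cost[c][f] for c, f in zip(clients, choice))
        feasible.append((total, choice))

best_total, best_choice = min(feasible)
print(best_total, best_choice)
```

The cheapest feasible assignment found this way is the same one the loop returned, at a total cost of 23.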