### Install wheels for Basemap
- Install PROJ: https://proj.org/install.html#install
- On that page, find the Windows section, click OSGeo4W, download the 64-bit installer, then follow the Windows instructions to install PROJ.
- Install the basemap wheel and the pyproj wheel from: https://www.lfd.uci.edu/~gohlke/pythonlibs/
- On that page, find: Basemap: a matplotlib toolkit for plotting 2D data on maps based on GEOS.
- On that page, find: Pyproj: an interface to the PROJ library for cartographic transformations.
- Important: run pip install numpy --upgrade

### Install wheels for geopandas 
Installing geopandas and its dependencies manually
refer to: https://stackoverflow.com/questions/34427788/how-to-successfully-install-pyproj-and-geopandas


1. First and most important: do not try to directly pip install or conda install any of the dependencies – if you do, they will fail in some way later, often silently or obscurely, making troubleshooting difficult. If any are already installed, uninstall them now.

2. Download the wheels for GDAL, Fiona, pyproj, rtree, and shapely from Gohlke. Make sure you choose the wheel files that match your architecture (64-bit) and Python version (2.7 or 3.5). If Gohlke mentions any prerequisites in his descriptions of those 5 packages, install the prerequisites now (there might be a C++ redistributable or something similar listed there)

3. If OSGeo4W, GDAL, Fiona, pyproj, rtree, or shapely is already installed, uninstall it now. The GDAL wheel contains a complete GDAL installation – don’t use it alongside OSGeo4W or other distributions.

4. Open a command prompt and change directories to the folder where you downloaded these 5 wheels.

5. pip install the GDAL wheel file you downloaded. Your actual command will be something like: pip install GDAL-1.11.2-cp27-none-win_amd64.whl

6. Add the new GDAL path to the Windows PATH environment variable, something like C:\Anaconda\Lib\site-packages\osgeo

7. pip install your Fiona wheel file, then your pyproj wheel file, then rtree, and then shapely.

8. Now that GDAL and geopandas’s dependencies are all installed, you can just pip install geopandas from the command prompt.
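
Once the last step is done, a quick import check can catch a silent failure early. The sketch below is not part of the referenced answer: the helper name `missing_packages` is made up for illustration, and the module names in `GEO_STACK` (for example `osgeo` for the GDAL wheel) are assumptions about how each wheel is imported.

```python
import importlib.util

def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

# Assumed import names; the GDAL wheel is normally imported as `osgeo`.
GEO_STACK = ['osgeo', 'fiona', 'pyproj', 'rtree', 'shapely', 'geopandas']

if __name__ == '__main__':
    missing = missing_packages(GEO_STACK)
    print('missing:', missing if missing else 'none')
```

`find_spec` only locates a module without importing it, so a wheel that is present but broken at import time would still need a real `import` to surface the error.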

# MilkRun Initial Routing Modeling

In [8]:
# import general packages:
from openpyxl import load_workbook
import win32com.client
import numpy as np
import pandas as pd
from pandas import Grouper
from pandas import Timestamp
import os
import io
import datetime as dt
import time 
import feather
import itertools
from math import sqrt
import csv
import dask.dataframe as dd
from datetime import datetime
import timestring
from IPython.display import display, HTML
from collections import Counter

# import modeling packages
from sklearn.cluster import AffinityPropagation
from sklearn.cluster import KMeans
from sklearn import preprocessing, datasets
from sklearn.metrics import pairwise_distances_argmin
from scipy.spatial.distance import cdist,pdist
from scipy import stats
from scipy.sparse import *

# import visualization packages:
from matplotlib import pyplot as plt
# from mpl_toolkits.basemap import Basemap
import seaborn as sns
# import ggplot
%matplotlib inline

# checking path and dir
os.chdir('C:\\Users\\u279014\\Documents\\H_Drive\\7.AA Models\\12.Logistic_Optimization\\data')
os.getcwd()

'C:\\Users\\u279014\\Documents\\H_Drive\\7.AA Models\\12.Logistic_Optimization\\data'

In [4]:
from __future__ import print_function
from ortools.constraint_solver import routing_enums_pb2
from ortools.constraint_solver import pywrapcp

In [9]:
def riding_distance(riding_distance_matrix, geo):
    """
    Build the riding-distance matrix for the locations in geo by table lookup.
    :param  
        riding_distance_matrix: DataFrame of pairwise riding distances; index & column labels are zip codes, type: str 
        geo: DataFrame with a 'zip_code' column, element type: str
    :returns distance_mat: numpy.ndarray with shape (n, n); entry [i, j] is the riding distance from the i-th to the j-th zip code in geo.
    """
    d_matrix = []
    zipcodes = geo['zip_code']
    for i in zipcodes:
        d_row = []
        for j in zipcodes:
            d_row.append(riding_distance_matrix.loc[i,j])
        d_matrix.append(d_row)
    return np.asarray(d_matrix)
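
The nested loop above does one `.loc` lookup per pair of zip codes. A minimal sketch of the same lookup done in a single label-based selection is shown below; `riding_distance_vectorized` is a hypothetical name, and the sketch assumes the matrix index and columns are unique string zip codes (repeated zip codes in the request are fine).

```python
import numpy as np
import pandas as pd

def riding_distance_vectorized(riding_distance_matrix, zipcodes):
    """Select rows and columns by zip code in one call; repeated zip codes are allowed."""
    zips = [str(z) for z in zipcodes]
    return riding_distance_matrix.loc[zips, zips].to_numpy()

# Toy matrix using three of the zip codes that appear later in the notebook
labels = ['54942', '53012', '53024']
toy = pd.DataFrame([[0.0, 94.9, 94.4],
                    [94.8, 0.0, 6.7],
                    [94.3, 6.7, 0.0]], index=labels, columns=labels)

result = riding_distance_vectorized(toy, ['54942', '53024', '53024'])
print(result.shape)
```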

In [10]:
def load_riding_distance_matrix(path,file):
    riding_distance_matrix = pd.read_excel(os.path.join(path,file)).set_index('zipcode')
    riding_distance_matrix.columns = riding_distance_matrix.columns.astype('str')
    riding_distance_matrix.index = riding_distance_matrix.index.astype('str')
    return riding_distance_matrix

In [None]:
# def distance_on_sphere_numpy(coordinate_df):
#     """
#     Compute a distance matrix of the coordinates using a spherical metric.
#     :param coordinate_df: DataFrame with 'latitude' and 'longitude' columns, in degrees.
#     :returns distance_mat: numpy.ndarray with shape (n, n) containing distance in miles between coords.
#     """
#     # Radius of the earth in km (GRS 80-Ellipsoid)
#     EARTH_RADIUS = 6371.007176
#     km2mile_ratio = 0.62137

#     # Unpacking coordinates
#     latitudes = coordinate_df.loc[:,'latitude']
#     longitudes = coordinate_df.loc[:,'longitude']

#     # Convert latitude and longitude to spherical coordinates in radians.
#     degrees_to_radians = np.pi/180.0
#     phi_values = (90.0 - latitudes)*degrees_to_radians
#     theta_values = longitudes*degrees_to_radians

#     # Expand phi_values and theta_values into grids
#     theta_1, theta_2 = np.meshgrid(theta_values, theta_values)
#     theta_diff_mat = theta_1 - theta_2

#     phi_1, phi_2 = np.meshgrid(phi_values, phi_values)

#     # Compute spherical distance from spherical coordinates
#     angle = (np.sin(phi_1) * np.sin(phi_2) * np.cos(theta_diff_mat) + 
#            np.cos(phi_1) * np.cos(phi_2))
#     arc = np.arccos(angle)

#     # Multiply by earth's radius to obtain distance in km, then convert km to miles
#     return np.nan_to_num(arc * EARTH_RADIUS * km2mile_ratio)
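
For a single pair of points, the great-circle distance can also be sketched with the haversine formula, which is a different formula from the spherical law of cosines used in the commented-out function and is numerically better behaved for short distances. The function name `haversine_miles` is made up for this example; the radius and the km-to-mile ratio are copied from the commented-out code.

```python
import math

EARTH_RADIUS_KM = 6371.007176  # same value as in the commented-out function
KM_TO_MILES = 0.62137

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a)) * KM_TO_MILES

# Greenville WH to zip code 53012, coordinates taken from the cluster table
print(haversine_miles(44.29382, -88.53557, 43.305412, -87.99794))
```

This is a straight-line distance over the sphere, so it will be shorter than the riding distance stored in riding_distance_matrix.xlsx.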

##  Modeling Start >>>>>>
## 1. Data_prep
### 1.1 load saved feather supplier-cluster dataset

In [11]:
cass_zip_cluster = feather.read_dataframe('cass_zip_cluster')
cluster_copy = cass_zip_cluster.copy() # make a copy of original dataset
cluster_copy = cluster_copy[cluster_copy.label != -1] # drop label (cluster) = -1: rows that do not belong to any cluster
cluster_copy['shipping_date'] = '10-01-2019'

FileNotFoundError: [WinError 2] Failed to open local file 'cass_zip_cluster'. Detail: [Windows error 2] The system cannot find the file specified.


### 1.2 choose supplier-cluster to run milkrun Model

### Select top n supplier-cluster

In [6]:
rank = 1 # which supplier-cluster to run milkrun on: 1 = cluster with the most rows, 2 = second most, ...
label_no = Counter(cluster_copy.label).most_common()[rank-1][0]
cluster = cluster_copy[cluster_copy.label == label_no]

# put the Greenville WH (depot) row in front of the selected cluster
greenville = pd.DataFrame([['0','54942',-88.53557,44.293820,'mid_west','GREENVILLE_WH','WI',0,0,0,999,'10-01-2019']],
                          columns=cluster.columns)
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
cass_zip_cluster_copy = pd.concat([greenville, cluster]).reset_index(drop = True)

In [7]:
cass_zip_cluster_copy

Unnamed: 0,index,zip_code,longitude,latitude,cluster,shipper_name,shipper_state,ship_weight,miles,billed_amount,label,shipping_date
0,0,54942,-88.53557,44.29382,mid_west,GREENVILLE_WH,WI,0,0,0.0,999,10-01-2019
1,116,53012,-87.99794,43.305412,mid_west,ATACO STEEL,WI,663,95,44.3,6,10-01-2019
2,117,53024,-87.94573,43.32546,mid_west,"KAPCO, INC.",WI,1519,188,97.31,6,10-01-2019
3,118,53027,-88.37332,43.313361,mid_west,FABRIFAST LLC,WI,10354,602,980.89,6,10-01-2019
4,119,53027,-88.37332,43.313361,mid_west,HARTFORD FIN,WI,885,172,92.24,6,10-01-2019
5,120,53027,-88.37332,43.313361,mid_west,HYDRO ELECTR,WI,540,172,88.88,6,10-01-2019
6,121,53027,-88.37332,43.313361,mid_west,SIGNICAST CORPORATION,WI,6150,172,229.4,6,10-01-2019
7,122,53027,-88.37332,43.313361,mid_west,STEELCRAFT C,WI,7547,344,406.49,6,10-01-2019
8,123,53029,-88.34737,43.132743,mid_west,PRICE ENGINEERING,WI,7173,624,357.89,6,10-01-2019
9,125,53037,-88.17011,43.322213,mid_west,LASER SHOP I,WI,700,83,45.86,6,10-01-2019


### 1.3 Sample initialization with a small selection: first 100 locations

In [8]:
path = r'C:\Users\u279014\Documents\H_Drive\7.AA Models\12.Logistic_Optimization\data'
file = r'riding_distance_matrix.xlsx'
riding_distance_matrix = load_riding_distance_matrix(path,file)

In [9]:
cass_zip_toy = cass_zip_cluster_copy[:100]
distance_matrix_toy = riding_distance(riding_distance_matrix, cass_zip_toy)
# distance_matrix_toy = distance_on_sphere_numpy(cass_zip_toy)
df_distance_matrix = pd.DataFrame(distance_matrix_toy,index=cass_zip_toy.zip_code,columns=cass_zip_toy.zip_code)

unique_cass_zip_toy = cass_zip_toy.drop_duplicates(subset=['zip_code'])
unique_distance_matrix_toy = riding_distance(riding_distance_matrix, unique_cass_zip_toy)
# unique_distance_matrix_toy = distance_on_sphere_numpy(unique_cass_zip_toy)
df_unique_distance_matrix = pd.DataFrame(unique_distance_matrix_toy,
                                         index=unique_cass_zip_toy.zip_code,
                                         columns=unique_cass_zip_toy.zip_code)

ship_wight_list_toy = cass_zip_toy.ship_weight
ship_wight_list_toy.sum()

127900

## 2. Model_Prep
### I. Initializing Opt-model

In [10]:
def create_data_model(distance_matrix=0, 
                      ship_weight_list = 0, 
                      each_vehicle_capacity = 45000, 
                      num_vehicles = 10):
    """Stores the data for the problem."""
    data = {}
    data['distance_matrix']=distance_matrix
    data['demands'] = ship_weight_list
    data['vehicle_capacities'] = [each_vehicle_capacity]*num_vehicles
    data['num_vehicles'] = num_vehicles
    data['depot']=0
    return data
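
Before solving, it can help to sanity-check `num_vehicles` against the total demand. The sketch below computes a simple lower bound on the number of trucks; `min_trucks_lower_bound` is a hypothetical helper, and the bound ignores routing and the fact that a single pickup cannot be split across trucks.

```python
import math

def min_trucks_lower_bound(demands, vehicle_capacity):
    """Fewest trucks that could carry the total demand if loads packed perfectly."""
    return math.ceil(sum(demands) / vehicle_capacity)

# Total weight 127900 is the sum printed above; the three-way split is made up
print(min_trucks_lower_bound([60000, 40000, 27900], 45000))  # → 3
```

With the default of 10 vehicles at 45000 each there is ample spare capacity, which is consistent with most trucks staying idle in the solution printed further down.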

### II. Customized model output_NCv-2

In [11]:
""" optimize algorithm for accurate route """
def print_solution_3(data, manager, routing, assignment):
    """Prints assignment on console."""
    total_distance = 0
    total_load = 0
    
    vehicle_routes = dict() # vehicle_id -> ordered list of node indices picked up by that truck

    for vehicle_id in range(data['num_vehicles']):
        index = routing.Start(vehicle_id)
        plan_output = 'Route for vehicle {}:\n'.format(vehicle_id)
        plan_output_backward = 'Route for vehicle {}:\n'.format(vehicle_id) # if backward is shorter path
        route_distance = 0
        route_load = 0
        edge_distance = []
        while not routing.IsEnd(index):
            node_index = manager.IndexToNode(index)
            route_load += data['demands'][node_index]
            plan_output += ' {0} Load({1}) -> '.format(node_index, route_load)
            plan_output_backward += ' {0} Load({1}) <- '.format(node_index, route_load) # if backward is shorter path
            
            previous_index = index            
            index = assignment.Value(routing.NextVar(index))
            
            if vehicle_id in vehicle_routes:
                vehicle_routes[vehicle_id].append(node_index)   # adding zipcodes to same truck
            else:
                vehicle_routes[vehicle_id] = [node_index]
            
            route_distance += routing.GetArcCostForVehicle(previous_index, index, vehicle_id)
            edge_distance.append(routing.GetArcCostForVehicle(previous_index, index, vehicle_id))
        
        # adding destination to entire route

        # The truck only has to drive one of the two depot legs, so the longer one is dropped.
        # If the distance from Greenville to the first supplier is at least the distance from the
        # last supplier to Greenville, the truck starts from the first supplier:
        # remove the first span of driving from the VRP
        if edge_distance[0] >= edge_distance[-1]:
            vehicle_routes[vehicle_id].append(0)
            vehicle_routes[vehicle_id].pop(0)
            route_distance = route_distance - edge_distance[0]
            plan_output += ' {0} Load({1})\n'.format(manager.IndexToNode(index),route_load)
            plan_output += 'Distance of the route: {} miles\n'.format(route_distance)
            plan_output += 'Load of the route: {}\n'.format(route_load)
            # print(plan_output)
            print(plan_output.replace('0 Load(0) ->  ',''))
            total_distance += route_distance
            total_load += route_load
        
        # otherwise the truck starts from the last supplier: remove the last span of driving from the VRP
        else:
            route_distance = route_distance - edge_distance[-1]
            vehicle_routes[vehicle_id] = vehicle_routes[vehicle_id][::-1]
            plan_output_backward += ' {0} Load({1})\n'.format(manager.IndexToNode(index),route_load)
            plan_output_backward += 'Distance of the route: {} miles\n'.format(route_distance)
            plan_output_backward += 'Load of the route: {}\n'.format(route_load)
            print(plan_output_backward)
            total_distance += route_distance
            total_load += route_load
    print('Total distance of all routes: {} miles'.format(total_distance))
    print('Total load of all routes: {}'.format(total_load))
    return vehicle_routes
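
The depot-leg handling inside `print_solution_3` can be hard to follow between the string formatting. Here is a minimal sketch of just that rule on plain lists; `trim_depot_leg` is a hypothetical name, and the node numbers and leg lengths in the example are made up.

```python
def trim_depot_leg(nodes, edge_distance, depot=0):
    """
    nodes: nodes in visiting order, starting at the depot, without the closing return.
    edge_distance: length of every leg, including the final leg back to the depot.
    Returns (route ending at the depot, distance without the dropped depot leg).
    """
    total = sum(edge_distance)
    if edge_distance[0] >= edge_distance[-1]:
        # drop the outbound leg: start at the first supplier, finish at the depot
        return nodes[1:] + [depot], total - edge_distance[0]
    # drop the return leg and reverse, so the route still finishes at the depot
    return nodes[::-1], total - edge_distance[-1]

print(trim_depot_leg([0, 19, 20], [30, 10, 25]))  # outbound leg is longer
print(trim_depot_leg([0, 1, 2], [10, 5, 40]))     # return leg is longer
```

An unused truck has `nodes == [0]` and `edge_distance == [0]`, which yields a one-node route of distance 0.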

### III. Running Opt_Model: initialize truck_max_capacity & total truck_available

In [12]:
# Initiate data problem
data = create_data_model(distance_matrix=distance_matrix_toy,
                         ship_weight_list=ship_wight_list_toy,
                         each_vehicle_capacity=45000,
                         num_vehicles=10)

In [52]:
# Create routing index manager
manager = pywrapcp.RoutingIndexManager(len(data['distance_matrix']),data['num_vehicles'],data['depot'])

In [53]:
# Create Routing Model
routing = pywrapcp.RoutingModel(manager)

In [54]:
# Register transit callback
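def distance_callback(from_index, to_index):
    # convert routing variable indices to distance-matrix node indices
    from_node = manager.IndexToNode(from_index)
    to_node = manager.IndexToNode(to_index)
    # the routing solver works on integers, so round the distance before returning it
    return int(round(data['distance_matrix'][from_node][to_node]))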

transit_callback_index = routing.RegisterTransitCallback(distance_callback)

In [55]:
# Define cost of each arc
routing.SetArcCostEvaluatorOfAllVehicles(transit_callback_index)

In [56]:
# Add Capacity constraint
def demand_callback(from_index):
    from_code = manager.IndexToNode(from_index)
    return data['demands'][from_code]

demand_callback_index = routing.RegisterUnaryTransitCallback(demand_callback)

In [57]:
routing.AddDimensionWithVehicleCapacity(demand_callback_index,
        0,  # null capacity slack
        data['vehicle_capacities'],  # vehicle maximum capacities
        True,  # start cumul to zero
        'Capacity')

True

In [58]:
# Setting first solution heuristic.
search_parameters = pywrapcp.DefaultRoutingSearchParameters()
search_parameters.first_solution_strategy = (routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC)
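
PATH_CHEAPEST_ARC builds the first solution by starting at a route's start node and repeatedly extending the route along the cheapest arc out of the last node added. The sketch below illustrates that idea in plain Python for one vehicle with no capacity limit. It is a simplified illustration, not OR-Tools' implementation, and `cheapest_arc_route` and the 4x4 matrix are made up for the example.

```python
def cheapest_arc_route(distance_matrix, depot=0):
    """Greedy route: from the last node, always move to the cheapest unvisited node."""
    unvisited = set(range(len(distance_matrix))) - {depot}
    route = [depot]
    total = 0
    while unvisited:
        last = route[-1]
        # break ties on node index so the result is deterministic
        nxt = min(unvisited, key=lambda j: (distance_matrix[last][j], j))
        total += distance_matrix[last][nxt]
        route.append(nxt)
        unvisited.remove(nxt)
    total += distance_matrix[route[-1]][depot]  # close the loop at the depot
    route.append(depot)
    return route, total

toy_matrix = [[0, 2, 9, 10],
              [2, 0, 6, 4],
              [9, 6, 0, 3],
              [10, 4, 3, 0]]
print(cheapest_arc_route(toy_matrix))
```

A greedy start like this is usually not optimal; the solver then improves it with local search.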

In [59]:
# Solve the problem.
assignment = routing.SolveWithParameters(search_parameters)

In [60]:
if assignment:
    route_dictionary = print_solution_3(data,manager,routing,assignment)

Route for vehicle 0:
 0 Load(0)
Distance of the route: 0 miles
Load of the route: 0

Route for vehicle 1:
 0 Load(0)
Distance of the route: 0 miles
Load of the route: 0

Route for vehicle 2:
 0 Load(0)
Distance of the route: 0 miles
Load of the route: 0

Route for vehicle 3:
 0 Load(0)
Distance of the route: 0 miles
Load of the route: 0

Route for vehicle 4:
 0 Load(0)
Distance of the route: 0 miles
Load of the route: 0

Route for vehicle 5:
 0 Load(0)
Distance of the route: 0 miles
Load of the route: 0

Route for vehicle 6:
 0 Load(0)
Distance of the route: 0 miles
Load of the route: 0

Route for vehicle 7:
 19 Load(2480) ->  20 Load(4685) ->  21 Load(5990) ->  29 Load(6643) ->  30 Load(9431) ->  23 Load(13603) ->  22 Load(14468) ->  27 Load(15025) ->  26 Load(42785) ->  28 Load(43125) ->  0 Load(43125)
Distance of the route: 197 miles
Load of the route: 43125

Route for vehicle 8:
 0 Load(0) <-  18 Load(436) <-  17 Load(6175) <-  10 Load(6425) <-  9 Load(7125) <-  2 Load(8644) <-  1 

## 3. Result Visualization to PowerBI

In [61]:
def route_schedule(route_dictionary):
    """ generate truck:pick_node map as a DataFrame """
    rows = []
    for truck, nodes in route_dictionary.items():
        if len(nodes) == 1: # skip idle trucks (e.g. #0, #1) whose route holds only the depot
            continue
        for node in nodes:
            rows.append({'truck_number': truck, 'pick_node': node})
    # build the frame once; DataFrame.append was removed in pandas 2.0
    return pd.DataFrame(rows, columns=['truck_number', 'pick_node'])

In [62]:
route_schedule = route_schedule(route_dictionary)

### Note: input of Graph must be unique distance matrix 

In [63]:
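def distance_index(df,x):
    '''
    param:
        df: distance matrix with UNIQUE index & columns
        x: pair of (current zip code, next-stop zip code)
    return:
        distance from the current stop to the next stop;
        0 when the pair is not in the matrix (e.g. the last stop has no next stop)
    '''
    origin, destination = x
    try:
        return df.loc[origin, destination]
    except KeyError:
        return 0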

In [64]:
route_in_weight = route_schedule.merge(cass_zip_toy,left_on='pick_node',right_index=True,how='left')
route_in_weight['next_zip_code'] = route_in_weight.groupby(['truck_number'])['zip_code'].shift(-1)
route_in_weight['next_shipper_name'] = route_in_weight.groupby(['truck_number'])['shipper_name'].shift(-1)

route_in_weight['milk_run_distance'] = route_in_weight[['zip_code','next_zip_code']].apply(lambda x: round(distance_index(df_unique_distance_matrix,x)),axis=1)
route_in_weight['stop_number'] = route_in_weight.groupby('truck_number').cumcount()
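
The `groupby(...).shift(-1)` calls above pair each stop with the following stop of the same truck, so the last stop of every truck gets a missing next stop. A small sketch on made-up data (the truck numbers and stop order are illustrative only):

```python
import pandas as pd

stops = pd.DataFrame({
    'truck_number': [7, 7, 7, 8, 8],
    'zip_code': ['53119', '53149', '54942', '53012', '54942'],
})

# next stop within the same truck; the shift never crosses truck boundaries
stops['next_zip_code'] = stops.groupby('truck_number')['zip_code'].shift(-1)
# 0-based position of each stop within its truck's route
stops['stop_number'] = stops.groupby('truck_number').cumcount()
print(stops)
```

The missing next stop on each truck's last row is why `distance_index` needs a fallback value for pairs it cannot look up.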

In [65]:
route_in_weight.to_csv(r'C:\Users\u279014\Documents\H_Drive\7.AA Models\12.Logistic_Optimization\data\route_in_weight.csv',index=True,index_label='time_sequence')
# route_in_weight.to_csv(r'S:\CORP-Share\DEPT\IT\DT-AA\FY20\GPSC\UseCases\8. Logistics Route Optimization\route_in_weight.csv',index=True,index_label='time_sequence')

##  Analytical Result: Miles & Cost Saving Comparison

In [66]:
# distance matrix
df_unique_distance_matrix

zip_code,54942,53012,53024,53027,53029,53037,53051,53072,53092,53095,...,53149,53150,53154,53172,53186,53206,53207,53208,53403,53404
54942,0.0,94.9,94.4,83.9,100.1,83.2,93.7,97.8,98.6,79.7,...,126.9,119.2,123.8,126.3,103.6,103.9,115.1,102.8,140.5,134.4
53012,94.8,0.0,6.7,22.4,33.8,11.7,13.2,28.5,6.6,17.8,...,46.7,39.0,40.0,42.4,32.3,24.1,29.5,18.5,56.7,50.5
53024,94.3,6.7,0.0,21.9,40.4,11.2,19.8,35.1,9.9,17.3,...,50.5,41.3,38.0,40.4,37.7,22.1,27.5,24.9,54.7,48.5
53027,81.3,22.3,21.9,0.0,17.1,10.7,21.2,23.4,26.1,14.9,...,34.9,46.7,51.3,53.8,30.8,31.4,42.6,30.3,68.0,61.8
53029,103.6,34.0,41.1,18.1,0.0,35.0,22.8,5.7,29.8,39.2,...,20.3,33.2,38.5,41.0,13.1,22.2,30.1,20.3,55.2,49.0
53037,83.1,11.7,11.2,10.7,26.0,0.0,14.7,29.6,14.9,6.1,...,47.8,40.1,44.7,47.2,33.4,24.8,36.0,23.7,61.4,55.3
53051,93.3,13.2,20.3,21.3,21.5,14.1,0.0,16.2,9.0,18.3,...,34.4,26.7,31.3,33.8,20.0,11.3,22.6,10.3,48.0,41.9
53072,98.1,28.6,35.7,23.1,5.5,29.5,17.4,0.0,24.4,33.8,...,23.5,27.8,33.1,35.6,7.7,16.8,24.7,14.8,49.8,43.7
53092,98.3,6.6,9.9,26.2,29.6,14.9,8.9,24.3,0.0,23.3,...,42.5,34.7,35.7,38.2,28.0,19.8,25.3,14.3,52.4,46.3
53095,79.4,18.0,17.6,14.9,30.2,6.4,18.9,33.9,23.7,0.0,...,52.1,44.3,49.0,51.4,37.6,29.0,40.2,27.9,65.6,59.5


In [67]:
# routing work-order
route_in_weight[['truck_number','shipper_name','zip_code','milk_run_distance','next_shipper_name','next_zip_code','ship_weight','miles']]

Unnamed: 0,truck_number,shipper_name,zip_code,milk_run_distance,next_shipper_name,next_zip_code,ship_weight,miles
0,7,UPI MANUFACT,53119,10.0,"GS GLOBAL RESOURCES, INC.",53149.0,2480,345
1,7,"GS GLOBAL RESOURCES, INC.",53149,21.0,J M GRIMSTAD,53150.0,2205,508
2,7,J M GRIMSTAD,53150,28.0,R & B GRINDI,53403.0,1305,238
3,7,R & B GRINDI,53403,4.0,R & B GRINDI,53404.0,653,560
4,7,R & B GRINDI,53404,12.0,"METALCUT PRODUCTS, INC.",53172.0,2788,268
5,7,"METALCUT PRODUCTS, INC.",53172,4.0,NATIONAL TEC,53154.0,4172,252
6,7,NATIONAL TEC,53154,12.0,JLG MILITARY,53207.0,865,248
7,7,JLG MILITARY,53207,7.0,ELECTRON BEA,53206.0,557,115
8,7,ELECTRON BEA,53206,3.0,BEYOND VISIO,53208.0,27760,312
9,7,BEYOND VISIO,53208,102.0,GREENVILLE_WH,54942.0,340,103


In [68]:
total_tmc_miles = route_in_weight.miles.sum()
total_milk_miles = route_in_weight.milk_run_distance.sum()
miles_saving = (total_tmc_miles-total_milk_miles)
print('-original_miles:{0} \n-milkrun_miles:{1}\n-miles reduction:{2}'.format(total_tmc_miles,total_milk_miles,miles_saving))

-original_miles:8193 
-milkrun_miles:435.0
-miles reduction:7758.0
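
The absolute saving is easier to compare across clusters as a percentage. A minimal sketch using the two totals printed above; the helper name `reduction` is hypothetical, and the percentage is only meaningful to the extent that the billed `miles` column and the milk-run distances measure comparable things.

```python
def reduction(original, optimized):
    """Return (absolute saving, saving as a percentage of the original)."""
    saved = original - optimized
    return saved, 100.0 * saved / original

saved, pct = reduction(8193, 435.0)
print('saved {0} miles ({1:.1f}%)'.format(saved, pct))
```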


##  <<<<<<  Modeling Completed

## Financial Impact >>>>>>

In [69]:
def load_data(path,file,sheet_name = None):
    # with sheet_name=None or a list of names, read_excel returns a dict of DataFrames, one per sheet
    sheets = pd.read_excel(os.path.join(path,file),sheet_name=sheet_name)
    df = pd.concat(list(sheets.values()))
    df.reset_index(drop=True, inplace=True)
    # cache the combined table as a feather file, then read it back
    df.to_feather(os.path.join(path,'tmc_feather'))
    return feather.read_dataframe(os.path.join(path,'tmc_feather'))

In [70]:
path = r'C:\Users\u279014\Documents\H_Drive\7.AA Models\12.Logistic_Optimization\data'
file = r'TMC_freight_rate.xlsx'
sheet_names = ['Phase 1','Phase 2','Phase 3','Phase 4','Phase 5']

In [71]:
df = load_data(path=path,file=file,sheet_name=sheet_names)



In [72]:
# standardize dataframe column names
def col_name(df):
    """
    Trim the data_frame column names to a uniform format:
    lower case, hyphens removed, spaces replaced by underscores, parentheses and double quotes removed
    param df:
        raw data loaded from the share drive
    return:
        the same data set with normalized column names
    """
    # regex=False treats '(' and ')' as literal characters on every pandas version
    df.columns = (df.columns.str.strip().str.lower()
                  .str.replace('-', '', regex=False).str.replace(' ', '_', regex=False)
                  .str.replace('(', '', regex=False).str.replace(')', '', regex=False)
                  .str.replace('"', '', regex=False))
    return df
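
The effect of the chained replacements is easiest to see on a single header. The sketch below applies the same rules to one string; `normalize_column_name` is a hypothetical helper and the raw header is a guess at what the spreadsheet might contain, chosen so that it maps to one of the column names used later.

```python
def normalize_column_name(name):
    """Apply the same rules as col_name to a single header string."""
    name = name.strip().lower()
    for ch in ('-', '(', ')', '"'):
        name = name.replace(ch, '')   # drop hyphens, parentheses, double quotes
    return name.replace(' ', '_')     # spaces become underscores

# Hypothetical raw header, not taken from the actual workbook
print(normalize_column_name(' Market Rate Over Quarter (Dec-Mar) '))
```

Because hyphens are removed rather than replaced, "Dec-Mar" collapses to "decmar", which explains the look of the rate column names in the next cell.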

In [73]:
""" Slice tmc """
def clean_tmc(df, sink_state = 'WI', source_states = ('IL',)):
    """
    parameter: 
        df: original TMC dataset
        sink_state: destination (warehouse) state; only one state allowed
        source_states: list-like of shipping states; multiple source states allowed
    return:
        cleaned TMC with the mean freight_cost from each source state to sink_state
    """
    # standardize col names
    df = col_name(df)
    
    rate_cols = ['market_rate_over_quarter_decmar',
       'market_rate_over_jan_2019mar_2020',
       'market_rate_all_offers_jan_2019_mar_2020_no_fb',
       'market_rate_all_offers_jan_2019_mar_2020_with_fb']
    
    # drop rows if all rate cols are nan
    df.dropna(how='all',subset=rate_cols,inplace=True)
    
    # generate freight_cost = market_rate_all_offers_jan_2019_mar_2020_no_fb, or the max of the rate cols when it is missing
    # (the max is restricted to the rate cols so that text columns such as lane are not compared)
    df['freight_cost'] = np.round(np.where(df.market_rate_all_offers_jan_2019_mar_2020_no_fb.isnull(),
                               df[rate_cols].max(axis=1),
                               df.market_rate_all_offers_jan_2019_mar_2020_no_fb),2)  
    df['source_state'] = df.lane.apply(lambda x: x[:2]) # find source state short code
    df['sink_state'] = df.lane.apply(lambda x: x[-2:]) # find sink state short code
    
    df = df[df.source_state.isin(source_states)] # keep only the source states
    df = df[df.sink_state.str.contains(sink_state)] # keep only the destination state
    df = df.groupby(['source_state','sink_state'])['freight_cost'].mean().reset_index() # average duplicate lanes between the same pair of states
    return df
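
Two rules inside `clean_tmc` are worth seeing in isolation: the state codes are the first and last two characters of `lane`, and the freight cost prefers the no-fallback rate and otherwise takes the largest available rate. The sketch below restates both on plain values; the function names and the lane string 'IL to WI' are hypothetical, since the real lane format is not shown in this notebook.

```python
def split_lane(lane):
    """Source state = first two characters, sink state = last two characters."""
    return lane[:2], lane[-2:]

def pick_freight_cost(no_fb_rate, other_rates):
    """Prefer the no_fb rate; fall back to the largest rate that is present."""
    if no_fb_rate is not None:
        return round(no_fb_rate, 2)
    available = [r for r in other_rates if r is not None]
    return round(max(available), 2) if available else None

print(split_lane('IL to WI'))
print(pick_freight_cost(None, [100.0, 120.456, None]))
print(pick_freight_cost(95.5, [200.0]))
```

Taking the maximum as the fallback is the conservative choice for a savings estimate, because it makes the milk-run cost look no cheaper than the highest quoted rate.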

In [74]:
# generate cleaned TMC dataset
source_states = cluster.shipper_state.unique()
tmc = clean_tmc(df, sink_state='WI', source_states = source_states)

In [75]:
# updating full truck load cost
route_in_weight['milk_run_cost'] = 0
TL_cost = np.max(tmc.freight_cost)
route_in_weight.loc[route_in_weight.groupby('truck_number').tail(1).index,'milk_run_cost'] = TL_cost
route_in_weight.to_csv(r'C:\Users\u279014\Documents\H_Drive\7.AA Models\12.Logistic_Optimization\data\route_in_weight.csv',index=True,index_label='time_sequence')

In [76]:
route_in_weight

Unnamed: 0,truck_number,pick_node,index,zip_code,longitude,latitude,cluster,shipper_name,shipper_state,ship_weight,miles,billed_amount,label,shipping_date,next_zip_code,next_shipper_name,milk_run_distance,stop_number,milk_run_cost
0,7,19,137,53119,-88.47117,42.881035,mid_west,UPI MANUFACT,WI,2480,345,149.48,6,10-01-2019,53149.0,"GS GLOBAL RESOURCES, INC.",10.0,0,0.0
1,7,20,138,53149,-88.34409,42.872477,mid_west,"GS GLOBAL RESOURCES, INC.",WI,2205,508,185.64,6,10-01-2019,53150.0,J M GRIMSTAD,21.0,1,0.0
2,7,21,139,53150,-88.12464,42.901235,mid_west,J M GRIMSTAD,WI,1305,238,96.76,6,10-01-2019,53403.0,R & B GRINDI,28.0,2,0.0
3,7,29,147,53403,-87.80062,42.704519,mid_west,R & B GRINDI,WI,653,560,442.78,6,10-01-2019,53404.0,R & B GRINDI,4.0,3,0.0
4,7,30,148,53404,-87.80534,42.743169,mid_west,R & B GRINDI,WI,2788,268,168.39,6,10-01-2019,53172.0,"METALCUT PRODUCTS, INC.",12.0,4,0.0
5,7,23,141,53172,-87.86395,42.909816,mid_west,"METALCUT PRODUCTS, INC.",WI,4172,252,203.03,6,10-01-2019,53154.0,NATIONAL TEC,4.0,5,0.0
6,7,22,140,53154,-87.8992,42.884347,mid_west,NATIONAL TEC,WI,865,248,92.81,6,10-01-2019,53207.0,JLG MILITARY,12.0,6,0.0
7,7,27,145,53207,-87.89998,42.985465,mid_west,JLG MILITARY,WI,557,115,44.58,6,10-01-2019,53206.0,ELECTRON BEA,7.0,7,0.0
8,7,26,144,53206,-87.93476,43.076179,mid_west,ELECTRON BEA,WI,27760,312,973.78,6,10-01-2019,53208.0,BEYOND VISIO,3.0,8,0.0
9,7,28,146,53208,-87.96618,43.047863,mid_west,BEYOND VISIO,WI,340,103,44.58,6,10-01-2019,54942.0,GREENVILLE_WH,102.0,9,0.0


In [77]:
route_in_weight[['truck_number','shipper_name','zip_code','milk_run_cost','next_shipper_name','next_zip_code','ship_weight','billed_amount']]

Unnamed: 0,truck_number,shipper_name,zip_code,milk_run_cost,next_shipper_name,next_zip_code,ship_weight,billed_amount
0,7,UPI MANUFACT,53119,0.0,"GS GLOBAL RESOURCES, INC.",53149.0,2480,149.48
1,7,"GS GLOBAL RESOURCES, INC.",53149,0.0,J M GRIMSTAD,53150.0,2205,185.64
2,7,J M GRIMSTAD,53150,0.0,R & B GRINDI,53403.0,1305,96.76
3,7,R & B GRINDI,53403,0.0,R & B GRINDI,53404.0,653,442.78
4,7,R & B GRINDI,53404,0.0,"METALCUT PRODUCTS, INC.",53172.0,2788,168.39
5,7,"METALCUT PRODUCTS, INC.",53172,0.0,NATIONAL TEC,53154.0,4172,203.03
6,7,NATIONAL TEC,53154,0.0,JLG MILITARY,53207.0,865,92.81
7,7,JLG MILITARY,53207,0.0,ELECTRON BEA,53206.0,557,44.58
8,7,ELECTRON BEA,53206,0.0,BEYOND VISIO,53208.0,27760,973.78
9,7,BEYOND VISIO,53208,0.0,GREENVILLE_WH,54942.0,340,44.58


In [78]:
truck_used = len(route_in_weight.truck_number.unique())
total_tmc_billed = route_in_weight.billed_amount.sum()
total_milk_cost = round(np.max(tmc.freight_cost)*truck_used,2)
# total_milk_cost = round(float(tmc.freight_cost)*truck_used,2)
cost_saving = round((total_tmc_billed - total_milk_cost),2)
print('-original_cost:{0} \n-milkrun_cost:{1}\n-cost reduction:{2}'.format(total_tmc_billed,total_milk_cost,cost_saving))

-original_cost:7245.49 
-milkrun_cost:1959.3
-cost reduction:5286.19


### Nathan's Note: WI - cluster-13 overcharged by miles