For this assignment, you will replicate the supplier selection analysis we covered in class on a new dataset. Specifically, you will evaluate suppliers of refrigerated warehouse services.

The following code block imports some necessary libraries.

In [4]:
import pathlib

import re

import numpy as np
import pandas as pd

from sklearn import preprocessing

import OM527_functions as omf

The following code block defines the `custom_grouper` function that we used in the notebook covering the concept of spend analysis.

In [2]:
def custom_grouper(df, agg_dict, groupby_columns):
    '''
    This function groups the provided DataFrame, df, by the columns
    specified in the groupby_columns argument. The aggregations specified
    in the agg_dict dictionary are applied. Also, each numeric column in the 
    aggregated DataFrame is used to create a proportion column. The aggregated data
    is returned as a DataFrame sorted by the keys of the agg_dict
    dictionary, in the order they are specified, i.e., first key
    has a higher sort priority than the second, etc...
    '''
    
    grouped_df = df.groupby(groupby_columns).agg(agg_dict)
    
    grouped_df.columns = ['_'.join(col).strip() for col in grouped_df.columns.values]
    
    numeric_columns = grouped_df.select_dtypes(include='number').columns.tolist()

    for column in numeric_columns:
        grouped_df[f'{column}_proportion'] = (grouped_df[column]/grouped_df[column].sum())
        
    grouped_df = grouped_df.sort_values(numeric_columns)

    return grouped_df

1) Read in the supplier data stored in the file `rw_supplier_data.csv` and print the first five rows. **(5 points)**

In [5]:
wb = pd.read_excel("data/OM_527_WCE_Questions.xlsx")

wb.head()

Unnamed: 0,Alternative,Credit Rating,Location,Delivery Guarantee,Annual Revenue
0,Alternative 1,B,Alabama,+/- 6 Hours,1015650.65
1,Alternative 2,B-,Alabama,+/- 24 Hours,1023218.1
2,Alternative 3,B-,Alabama,+/- 6 Hours,940268.39
3,Alternative 4,B,Georgia,+/- 6 Hours,976207.83
4,Alternative 5,B,Alabama,+/- 6 Hours,857593.91


2) Limit the data so that it only includes companies with a NAICS description of "Refrigerated Warehousing & Storage". **(5 points)**
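
One way to apply this kind of filter is a boolean mask on the description column. The sketch below runs on a small stand-in DataFrame. The column name `NAICS Description` and the company names are assumptions made for illustration; the actual header in `rw_supplier_data.csv` may differ.

```python
import pandas as pd

# Stand-in data (hypothetical); the real file's column names may differ.
suppliers = pd.DataFrame({
    'Company Name': ['Alpha Cold', 'Beta Freight', 'Gamma Storage'],
    'NAICS Description': [
        'Refrigerated Warehousing & Storage',
        'General Freight Trucking',
        'Refrigerated Warehousing & Storage',
    ],
})

# Keep only the rows whose description matches the target exactly.
target = 'Refrigerated Warehousing & Storage'
refrigerated = suppliers[suppliers['NAICS Description'] == target].copy()

print(refrigerated)
```

Calling `.copy()` on the filtered result keeps later column assignments from triggering pandas' chained-assignment warnings.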

4) Use the `Credit Score Alpha` column to assign values to a new column named `Credit Score Num` based on the following mapping:

- A+ = 1
- A = 0.8
- B+ = 0.6
- B = 0.4
- C+ = 0.2

**(5 points)**
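
As a minimal sketch of the mapping described above, `Series.map` can translate each letter rating through a dictionary. The ratings in the stand-in DataFrame are made up for illustration, and `'D'` is included only to show what happens to a value that is not in the mapping.

```python
import pandas as pd

# Mapping exactly as listed in the question.
credit_map = {'A+': 1.0, 'A': 0.8, 'B+': 0.6, 'B': 0.4, 'C+': 0.2}

# Stand-in ratings (hypothetical); 'D' has no entry in credit_map.
ratings = pd.DataFrame({'Credit Score Alpha': ['A+', 'B', 'C+', 'D']})

# Series.map returns NaN for any value missing from the dictionary.
ratings['Credit Score Num'] = ratings['Credit Score Alpha'].map(credit_map)

# Count unmapped rows so they can be reviewed before scoring.
unmapped = ratings['Credit Score Num'].isna().sum()

print(ratings)
print(f'Unmapped ratings: {unmapped}')
```

Because unmapped values become `NaN` rather than raising an error, a column that comes back entirely empty usually means the dictionary keys do not match the values in the source column, or that the column has already been mapped once.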

In [7]:
Credit_Dict = {
    'A+' : 1,
    'A' : 0.98,
    'A-' : 0.95,
    'B+' : 0.9,
    'B' : 0.8,
    'B-' : 0.7,
    'C+' : 0.6,
    'C' : 0.5,
    'C-' : 0.0,
}

Locat_dict = {
    'Alabama' : 1,
    'Tennessee' : 0.4,
    'Georgia' : 0.4,
    'Mississippi' : 0.4,
    
    
}
Delivery_Dict = {
    '+/- 6 Hours' : 1,
    '+/- 24 Hours' : 0.4,
    'NA' : 0.4,
}

weights = {
    'Credit Rating' : 0.125,
    'Location' : 0.25,
    'Delivery Guarantee' : 0.375,
    'Annual Revenue criteria' : 0.125,
    
}

#df4 = wb

wb['Credit Rating'] = wb['Credit Rating'].map(Credit_Dict)
wb['Location'] = wb['Location'].map(Locat_dict)
wb['Delivery Guarantee'] = wb['Delivery Guarantee'].map(Delivery_Dict)

wb.to_clipboard()

wb

Unnamed: 0,Alternative,Credit Rating,Location,Delivery Guarantee,Annual Revenue
0,Alternative 1,,,,1015650.65
1,Alternative 2,,,,1023218.10
2,Alternative 3,,,,940268.39
3,Alternative 4,,,,976207.83
4,Alternative 5,,,,857593.91
...,...,...,...,...,...
95,Alternative 96,,,,918700.70
96,Alternative 97,,,,1027451.64
97,Alternative 98,,,,910908.49
98,Alternative 99,,,,884264.47


5) Create a column called `Location Score` that equals 1 if the company is in Alabama and 0.5 if the company is located elsewhere. **(10 points)**

In [8]:
#df5 = wb

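# Score 1 for companies located in Alabama ('AL') and 0.5 for all others.
# str() guards against non-string entries such as missing values.
location_scorer = lambda state: 1 if str(state) == "AL" else 0.5

wb['Location Score'] = wb['State'].apply(location_scorer)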
wb[['Location Score', 'State']]

Unnamed: 0,Location Score,State
1,1.0,AL
2,1.0,AL
4,1.0,AL
5,1.0,AL
9,1.0,AL
...,...,...
146,0.5,TN
151,0.5,TN
154,0.5,TN
155,0.5,TN


6) Recall that some companies operated multiple locations in the data we used to demonstrate supplier selection methods. We only want one observation for each company in our final data. Ensure that the current data only includes one entry for each company. If a company has multiple entries, reconstruct the data so that we **average the values in the `Credit Score Num` and `Location Score` columns for companies with multiple locations. Also, sum the values in the `Location Sales Volume Actual` column for companies with multiple locations.** **(10 points)**

In [9]:
groupby_columns6 = ['Company Name']

agg_dict6 = {
    'Credit Score Num':['mean'],
    'Location Score':['mean'],
    'Location Sales Volume Actual':['sum'],
}

df6 = wb.groupby(groupby_columns6).agg(agg_dict6)
df6.columns = agg_dict6.keys()
df6 = df6.reset_index()
#print(wb[wb['Company Name'] == "ACBL"])
df6

Unnamed: 0,Company Name,Credit Score Num,Location Score,Location Sales Volume Actual
0,Agro Merchants,0.8,0.5,1317000
1,Albany Cold Storage LLC,1.0,0.5,1080000
2,Ameri Peanut Growers Cold Stge,1.0,0.5,10846000
3,American Cold Storage,1.0,0.5,1174000
4,Americold,1.0,1.0,2818000
5,Americold Corp,0.6,0.5,777000
6,Americold Logistics,1.0,0.625,81098000
7,Americold Logistics LLC,1.0,0.5,2360000
8,B & B Transport Warehousing,0.6,0.5,665000
9,Bayou Ice Boxes,0.8,1.0,2060000


7) Normalize the data in the `Location Sales Volume Actual` so that all values are between 0 and 1 with the lowest value in the column being assigned a value of zero and the highest value in the column being assigned a value of one. **(5 points)**
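
The normalization described here is min-max scaling, which maps each value x to (x - min) / (max - min). The sketch below applies that formula directly in pandas to a few made-up sales figures, as an illustration of the arithmetic rather than a replacement for a library scaler.

```python
import pandas as pd

# Made-up sales figures, for illustration only.
sales = pd.Series([10.0, 20.0, 40.0], name='Location Sales Volume Actual')

# Min-max scaling: the smallest value maps to 0 and the largest to 1.
value_range = sales.max() - sales.min()
scaled = (sales - sales.min()) / value_range

print(scaled.tolist())
```

If every value in the column were identical, `value_range` would be zero and the division would produce `NaN`, so that case needs separate handling.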

In [10]:
df7 = df6.copy()  # copy so that df6 keeps the unscaled sales values

sales_array = df7['Location Sales Volume Actual'].values.reshape((-1,1))

df7['Location Sales Volume Actual'] = preprocessing.MinMaxScaler().fit_transform(sales_array)

df7

Unnamed: 0,Company Name,Credit Score Num,Location Score,Location Sales Volume Actual
0,Agro Merchants,0.8,0.5,0.01624
1,Albany Cold Storage LLC,1.0,0.5,0.013317
2,Ameri Peanut Growers Cold Stge,1.0,0.5,0.133739
3,American Cold Storage,1.0,0.5,0.014476
4,Americold,1.0,1.0,0.034748
5,Americold Corp,0.6,0.5,0.009581
6,Americold Logistics,1.0,0.625,1.0
7,Americold Logistics LLC,1.0,0.5,0.029101
8,B & B Transport Warehousing,0.6,0.5,0.0082
9,Bayou Ice Boxes,0.8,1.0,0.025401


8) Assume that the following weights are given for the `Credit Score Num`, `Location Sales Volume Actual`, and `Location Score` values:
- `Credit Score Num` = 9
- `Location Score` = 8
- `Location Sales Volume Actual` = 5

Normalize these weights and use them to determine weighted sum and weighted product scores for all of the companies. What are the top 3 companies by each scoring method? **(30 points - 10 for weight normalization, 10 for correct application of weighted sum, and 10 for correct application of weighted product)**
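
To make the two scoring rules concrete, the sketch below computes them in plain pandas for two rows copied from the tables in this notebook. It is an illustration of the formulas under the stated weights, not the implementation inside `OM527_functions`, whose internals are not shown here. The weighted sum is the sum of weight times score across criteria, and the weighted product is the product of each score raised to its weight.

```python
import pandas as pd

raw_weights = {
    'Credit Score Num': 9,
    'Location Score': 8,
    'Location Sales Volume Actual': 5,
}

# Normalize so that the weights sum to 1.
norm_weights = pd.Series(raw_weights) / sum(raw_weights.values())

# Two rows of already-normalized scores taken from this notebook's tables.
scores = pd.DataFrame(
    {
        'Credit Score Num': [1.0, 1.0],
        'Location Score': [0.625, 1.0],
        'Location Sales Volume Actual': [1.0, 0.053972],
    },
    index=['Americold Logistics', 'Gulf States Refrigerated Stge'],
)

# Align the criteria columns with the weight order before scoring.
criteria = scores[norm_weights.index]

# Weighted sum: sum over criteria of weight * score.
weighted_sum = criteria.mul(norm_weights, axis=1).sum(axis=1)

# Weighted product: product over criteria of score ** weight.
weighted_product = criteria.pow(norm_weights, axis=1).prod(axis=1)

print(pd.DataFrame({'WS': weighted_sum, 'WP': weighted_product}))
```

One consequence of the product form is that any company with a score of exactly zero on a criterion receives a weighted product of zero. After min-max scaling, that applies to the company with the lowest sales volume, which helps explain why the two methods can rank companies differently.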

In [11]:
df8 = df7
df9 = df7

weights = {
    'Credit Score Num' : 9,
    'Location Score' : 8,
    'Location Sales Volume Actual' : 5,
}

weights = {key : value/sum(weights.values()) for key, value in weights.items()}

weights

{'Credit Score Num': 0.4090909090909091,
 'Location Score': 0.36363636363636365,
 'Location Sales Volume Actual': 0.22727272727272727}

In [13]:
df8['WS'] = omf.mcdm.compute_weighted_sum(df8, weights)
df8.sort_values(by = 'WS', ascending=False)

Unnamed: 0,Company Name,Credit Score Num,Location Score,Location Sales Volume Actual,WS
6,Americold Logistics,1.0,0.625,1.0,0.863636
22,Gulf States Refrigerated Stge,1.0,1.0,0.053972,0.784994
4,Americold,1.0,1.0,0.034748,0.780625
28,Mid-South Distributors Inc,1.0,1.0,0.027929,0.779075
35,Seaonus,1.0,1.0,0.023194,0.777999
9,Bayou Ice Boxes,0.8,1.0,0.025401,0.696682
17,Decatur Business Park,0.8,1.0,0.024279,0.696427
38,Southern Customs,0.8,1.0,0.024119,0.696391
36,Serv-Cold LLC,0.8,1.0,0.023502,0.696251
42,Sun States Refrigerated Svc,0.8,1.0,0.009125,0.692983


In [14]:
df8['WP'] = omf.mcdm.compute_weighted_product(df8, weights)
df8.sort_values(by = 'WP', ascending=False)
print("\n\nTop 3 by Weighted Sum:\n\n")
print(df8.sort_values(by = 'WS', ascending=False).iloc[0:3])
print("\n\nTop 3 by Weighted Product:\n\n")
print(df8.sort_values(by = 'WP', ascending=False).iloc[0:3])



Top 3 by Weighted Sum:


                     Company Name  Credit Score Num  Location Score  \
6             Americold Logistics               1.0           0.625   
22  Gulf States Refrigerated Stge               1.0           1.000   
4                       Americold               1.0           1.000   

    Location Sales Volume Actual        WS        WP  
6                       1.000000  0.863636  0.842897  
22                      0.053972  0.784994  0.515058  
4                       0.034748  0.780625  0.466008  


Top 3 by Weighted Product:


                     Company Name  Credit Score Num  Location Score  \
6             Americold Logistics               1.0        0.625000   
31            Nordic Cold Storage               0.9        0.583333   
22  Gulf States Refrigerated Stge               1.0        1.000000   

    Location Sales Volume Actual        WS        WP  
6                       1.000000  0.863636  0.842897  
31                      0.155898  0.615734

9) Assume that the following preferences are given:
- a high `Credit Score Num` is strongly more preferable than a high `Location Score`,
- a high `Credit Score Num` is extremely more preferable than a high `Location Sales Volume Actual`, and
- a high `Location Score` is moderately more preferable than a high `Location Sales Volume Actual`.

Use these preferences to derive new weights and use these to determine updated weighted sum and weighted product scores for all companies. Again, specify the top 3 companies according to each scoring method. **(20 points)**
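
Verbal preferences like these are commonly turned into weights with a pairwise comparison matrix, as in the analytic hierarchy process. The sketch below assumes Saaty's scale (moderately = 3, strongly = 5, extremely = 9), approximates the priority vector by averaging the column-normalized rows, and checks consistency using the random index of 0.58 for three criteria. The variable names are illustrative.

```python
import numpy as np

criteria = ['Credit Score Num', 'Location Score', 'Location Sales Volume Actual']

# Entry [i, j] states how strongly criterion i is preferred to criterion j.
# Entries below the diagonal are the reciprocals of those above it.
comparisons = np.array([
    [1.0,   5.0,   9.0],
    [1 / 5, 1.0,   3.0],
    [1 / 9, 1 / 3, 1.0],
])

# Normalize each column to sum to 1, then average across each row.
column_sums = comparisons.sum(axis=0)
priority = (comparisons / column_sums).mean(axis=1)

# Approximate the principal eigenvalue and derive the consistency ratio.
n = len(criteria)
lambda_max = float(priority @ column_sums)
consistency_index = (lambda_max - n) / (n - 1)
random_index = 0.58  # Saaty's random index for n = 3
consistency_ratio = consistency_index / random_index

ahp_weights = dict(zip(criteria, priority))

print(ahp_weights)
print(f'Consistency ratio: {consistency_ratio:.4f}')
```

A consistency ratio below 0.1 is the usual threshold for accepting the comparisons. The resulting dictionary has the same keys as the normalized weights from question 8, so it can be passed to the same weighted sum and weighted product steps in their place.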

In [15]:
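# Pairwise comparisons on Saaty's scale:
# strongly = 5, extremely = 9, moderately = 3.
# Entries below the diagonal are exact reciprocals of those above it.
comparison_data = [
    [1, 5, 9],
    [1/5, 1, 3],
    [1/9, 1/3, 1],
]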

index_vals = ['Credit Score Num', 'Location Score', 'Location Sales Volume Actual']

df_comp = pd.DataFrame(comparison_data, index = index_vals, columns = index_vals)

column_sums = np.sum(df_comp, axis=0)
scores_div = df_comp/column_sums

pv = np.average(scores_div, axis = 1)

# max_eigenvalue = np.inner(pv, column_sums)

# CI = (max_eigenvalue - len(pv))/(len(pv)-1)
# RI = [0, 0, 0, 0.58, 0.9, 1.12, 1.24, 1.32, 1.41]
# CR = np.round(CI/RI[len(pv)], 5)

# print(CR)

weights9 = {
    'Credit Score Num' : pv[0],
    'Location Score' : pv[1],
    'Location Sales Volume Actual' : pv[2]
}

df9

Unnamed: 0,Company Name,Credit Score Num,Location Score,Location Sales Volume Actual,WS,WP
0,Agro Merchants,0.8,0.5,0.01624,0.512782,0.278101
1,Albany Cold Storage LLC,1.0,0.5,0.013317,0.593936,0.291249
2,Ameri Peanut Growers Cold Stge,1.0,0.5,0.133739,0.621304,0.491991
3,American Cold Storage,1.0,0.5,0.014476,0.594199,0.296826
4,Americold,1.0,1.0,0.034748,0.780625,0.466008
5,Americold Corp,0.6,0.5,0.009581,0.42945,0.219285
6,Americold Logistics,1.0,0.625,1.0,0.863636,0.842897
7,Americold Logistics LLC,1.0,0.5,0.029101,0.597523,0.347873
8,B & B Transport Warehousing,0.6,0.5,0.0082,0.429136,0.211663
9,Bayou Ice Boxes,0.8,1.0,0.025401,0.696682,0.396116


In [17]:
df9['WS'] = omf.mcdm.compute_weighted_sum(df8, weights)
df9.sort_values(by = 'WS', ascending=False)

Unnamed: 0,Company Name,Credit Score Num,Location Score,Location Sales Volume Actual,WS,WP
6,Americold Logistics,1.0,0.625,1.0,0.863636,0.842897
22,Gulf States Refrigerated Stge,1.0,1.0,0.053972,0.784994,0.515058
4,Americold,1.0,1.0,0.034748,0.780625,0.466008
28,Mid-South Distributors Inc,1.0,1.0,0.027929,0.779075,0.443436
35,Seaonus,1.0,1.0,0.023194,0.777999,0.425104
9,Bayou Ice Boxes,0.8,1.0,0.025401,0.696682,0.396116
17,Decatur Business Park,0.8,1.0,0.024279,0.696427,0.39207
38,Southern Customs,0.8,1.0,0.024119,0.696391,0.39148
36,Serv-Cold LLC,0.8,1.0,0.023502,0.696251,0.389183
42,Sun States Refrigerated Svc,0.8,1.0,0.009125,0.692983,0.313884


In [18]:
df9['WP'] = omf.mcdm.compute_weighted_product(df8, weights)
df9.sort_values(by = 'WP', ascending=False)

Unnamed: 0,Company Name,Credit Score Num,Location Score,Location Sales Volume Actual,WS,WP
6,Americold Logistics,1.0,0.625,1.0,0.863636,0.842897
31,Nordic Cold Storage,0.9,0.583333,0.155898,0.615734,0.516077
22,Gulf States Refrigerated Stge,1.0,1.0,0.053972,0.784994,0.515058
26,Lineage Flint River Svc Inc,1.0,0.5,0.145256,0.623922,0.501315
2,Ameri Peanut Growers Cold Stge,1.0,0.5,0.133739,0.621304,0.491991
4,Americold,1.0,1.0,0.034748,0.780625,0.466008
44,United States Cold Storage,1.0,0.5,0.095909,0.612707,0.456182
30,Nashville Refrigerated Svc,1.0,0.5,0.093492,0.612157,0.453544
28,Mid-South Distributors Inc,1.0,1.0,0.027929,0.779075,0.443436
14,Claxton Cold Storage Inc,1.0,0.5,0.084503,0.610114,0.443243


In [19]:
print("\n\nTop 3 by Weighted Sum:\n\n")
print(df9.sort_values(by = 'WS', ascending=False).iloc[0:3])
print("\n\nTop 3 by Weighted Product:\n\n")
print(df9.sort_values(by = 'WP', ascending=False).iloc[0:3])



Top 3 by Weighted Sum:


                     Company Name  Credit Score Num  Location Score  \
6             Americold Logistics               1.0           0.625   
22  Gulf States Refrigerated Stge               1.0           1.000   
4                       Americold               1.0           1.000   

    Location Sales Volume Actual        WS        WP  
6                       1.000000  0.863636  0.842897  
22                      0.053972  0.784994  0.515058  
4                       0.034748  0.780625  0.466008  


Top 3 by Weighted Product:


                     Company Name  Credit Score Num  Location Score  \
6             Americold Logistics               1.0        0.625000   
31            Nordic Cold Storage               0.9        0.583333   
22  Gulf States Refrigerated Stge               1.0        1.000000   

    Location Sales Volume Actual        WS        WP  
6                       1.000000  0.863636  0.842897  
31                      0.155898  0.615734