For this assignment, you will replicate the supplier selection analysis we covered in class on a new dataset. Specifically, in this dataset you will be evaluating suppliers of refridgerated warehouse services.

The following code block imports some necessary libraries.

In [1]:
import pathlib

import re

import numpy as np
import pandas as pd

from sklearn import preprocessing

import OM527_functions as omf

The following code block defines the `custom_grouper` function that we used in the notebook covering the concept of spend analysis.

In [2]:
def custom_grouper(df, agg_dict, groupby_columns):
    '''
    This function groups the provided DataFrame, df, by the columns
    specified in the groupby_columns argument. The aggregations specified
    in the agg_dict dictionary are applied. Also, each numeric column in the 
    aggregated DataFrame is used to create a proportion column. The aggregated data
    is returned as a DataFrame sorted by the keys of the agg_dict
    dictionary, in the order they are specified, i.e., first key
    has a higher sort priority than the second, etc...
    '''
    
    grouped_df = df.groupby(groupby_columns).agg(agg_dict)
    
    grouped_df.columns = ['_'.join(col).strip() for col in grouped_df.columns.values]
    
    numeric_columns = grouped_df.select_dtypes(include='number').columns.tolist()

    for column in numeric_columns:
        grouped_df[f'{column}_proportion'] = (grouped_df[column]/grouped_df[column].sum())
        
    grouped_df = grouped_df.sort_values(numeric_columns)

    return grouped_df

1) Read in the supplier data stored in the file `rw_supplier_data.csv` and print the first five rows. **(5 points)**

In [3]:
wb = pd.read_csv("data/rw_supplier_data.csv")

wb.head()

Unnamed: 0,Company Name,City,State,ZIP Code,Primary NAICS,Primary NAICS Description,Location Employee Size Actual,Location Sales Volume Actual,Type of Business,Square Footage,Credit Score Alpha,Saturday Open,Sunday Open
0,Advanced Mini Storage,Northport,AL,35473,531130,Lessors Of Miniwarehouses & Self-Storage Units,3,"$400,000",Private,"2,500 - 4,999",B+,Closed,Closed
1,Americold,Montgomery,AL,36108,493120,Refrigerated Warehousing & Storage,30,"$2,818,000",Private,"20,000 - 39,999",A+,,
2,Americold Logistics,Albertville,AL,35951,493120,Refrigerated Warehousing & Storage,32,"$2,645,000",Private,"20,000 - 39,999",A+,,
3,Americold Logistics,Birmingham,AL,35204,541611,Administrative & General Mgmt Consulting Services,6,"$592,000",Private,"2,500 - 4,999",A,,
4,Americold Logistics,Mobile,AL,36615,493120,Refrigerated Warehousing & Storage,25,"$2,239,000",Private,"20,000 - 39,999",A+,,


2) Limit the data so that it only includes companies with an NAICS decription of "Refrigerated Warehousing & Storage". **(5 points)**

In [4]:
wb = wb.loc[wb['Primary NAICS Description'] == "Refrigerated Warehousing & Storage"]

In [5]:
wb

Unnamed: 0,Company Name,City,State,ZIP Code,Primary NAICS,Primary NAICS Description,Location Employee Size Actual,Location Sales Volume Actual,Type of Business,Square Footage,Credit Score Alpha,Saturday Open,Sunday Open
1,Americold,Montgomery,AL,36108,493120,Refrigerated Warehousing & Storage,30,"$2,818,000",Private,"20,000 - 39,999",A+,,
2,Americold Logistics,Albertville,AL,35951,493120,Refrigerated Warehousing & Storage,32,"$2,645,000",Private,"20,000 - 39,999",A+,,
4,Americold Logistics,Mobile,AL,36615,493120,Refrigerated Warehousing & Storage,25,"$2,239,000",Private,"20,000 - 39,999",A+,,
5,Bayou Ice Boxes,Irvington,AL,36544,493120,Refrigerated Warehousing & Storage,23,"$2,060,000",Private,"10,000 - 19,999",A,Closed,Closed
9,Decatur Business Park,Decatur,AL,35603,493120,Refrigerated Warehousing & Storage,23,"$1,969,000",Private,"40,000 - 99,999",A,Open 24 Hours,Open 24 Hours
...,...,...,...,...,...,...,...,...,...,...,...,...,...
146,Nashville Refrigerated Svc,Lebanon,TN,37090,493120,Refrigerated Warehousing & Storage,90,"$7,582,000",Private,"40,000 - 99,999",A+,,
151,Southernbelle,Martin,TN,38237,493120,Refrigerated Warehousing & Storage,7,"$1,113,000",Private,"2,500 - 4,999",B+,,
154,United States Cold Storage,Covington,TN,38019,493120,Refrigerated Warehousing & Storage,45,"$3,293,000",Private,"20,000 - 39,999",A+,,
155,United States Cold Storage,La Vergne,TN,37086,493120,Refrigerated Warehousing & Storage,20,"$2,057,000",Private,"20,000 - 39,999",A+,,


3) Convert the data in the `Location Sales Volumne Actual` column to a numeric format. **(10 points)**

In [6]:
#df3 = wb

money_to_num = lambda x: re.sub("[^0-9]", "", x)

wb['Location Sales Volume Actual'] = pd.to_numeric(wb['Location Sales Volume Actual'].apply(money_to_num))
wb['Location Sales Volume Actual']

1      2818000
2      2645000
4      2239000
5      2060000
9      1969000
        ...   
146    7582000
151    1113000
154    3293000
155    2057000
156    2428000
Name: Location Sales Volume Actual, Length: 67, dtype: int64

4) Use the `Credit Score Alpha` column to assign values to a new column named `Credit Score Num` based on the following mapping:

- A+ = 1
- A = 0.8
- B+ = 0.6
- B = 0.4
- C+ = 0.2

**(5 points)**

In [7]:
my_dict = {
    'A+' : 1,
    'A' : 0.8,
    'B+' : 0.6,
    'B' : 0.4,
    'C+' : 0.2,
}

#df4 = wb

wb['Credit Score Num'] = wb['Credit Score Alpha'].map(my_dict)

print(wb[['Credit Score Num','Credit Score Alpha']]) # Won't let me pass it as a tuple cause it's deprecated octothorpe trash

     Credit Score Num Credit Score Alpha
1                 1.0                 A+
2                 1.0                 A+
4                 1.0                 A+
5                 0.8                  A
9                 0.8                  A
..                ...                ...
146               1.0                 A+
151               0.6                 B+
154               1.0                 A+
155               1.0                 A+
156               1.0                 A+

[67 rows x 2 columns]


5) Create a column called `Location Score` that equals 1 if the company is in Alabama and 0.5 if the company is located elsewhere. **(10 points)**

In [8]:
#df5 = wb

tacos = lambda x: 1 if str(x) == "AL" else 0.5 # I forget the type coercion behavior

wb['Location Score'] = wb['State'].apply(tacos)
wb[['Location Score', 'State']]

Unnamed: 0,Location Score,State
1,1.0,AL
2,1.0,AL
4,1.0,AL
5,1.0,AL
9,1.0,AL
...,...,...
146,0.5,TN
151,0.5,TN
154,0.5,TN
155,0.5,TN


6) Recall that some companies operated multiple locations in the data we used to demonstrate supplier selection methods. We only want one observation for each company in our final data. Ensure that the current data only includes one entry for each company. If a company has multiple entries, reconstruct the data so that we **average the values in the `Credit Score Num` and `Location Score` columns for companies with multiple locations. Also, sum the values in the `Location Sales Volume Actual` column for companies with multiple locations.** **(10 points)**

In [9]:
groupby_columns6 = ['Company Name']

agg_dict6 = {
    'Credit Score Num':['mean'],
    'Location Score':['mean'],
    'Location Sales Volume Actual':['sum'],
}

df6 = wb.groupby(groupby_columns6).agg(agg_dict6)
df6.columns = agg_dict6.keys()
df6 = df6.reset_index()
#print(wb[wb['Company Name'] == "ACBL"])
df6

Unnamed: 0,Company Name,Credit Score Num,Location Score,Location Sales Volume Actual
0,Agro Merchants,0.8,0.5,1317000
1,Albany Cold Storage LLC,1.0,0.5,1080000
2,Ameri Peanut Growers Cold Stge,1.0,0.5,10846000
3,American Cold Storage,1.0,0.5,1174000
4,Americold,1.0,1.0,2818000
5,Americold Corp,0.6,0.5,777000
6,Americold Logistics,1.0,0.625,81098000
7,Americold Logistics LLC,1.0,0.5,2360000
8,B & B Transport Warehousing,0.6,0.5,665000
9,Bayou Ice Boxes,0.8,1.0,2060000


7) Normalize the data in the `Location Sales Volume Actual` so that all values are between 0 and 1 with the lowest value in the column being assigned a value of zero and the highest value in the column being assigned a value of one. **(5 points)**

In [10]:
df7 = df6

sales_array = df7['Location Sales Volume Actual'].values.reshape((-1,1))

df7['Location Sales Volume Actual'] = preprocessing.MinMaxScaler().fit_transform(sales_array)

df7

Unnamed: 0,Company Name,Credit Score Num,Location Score,Location Sales Volume Actual
0,Agro Merchants,0.8,0.5,0.01624
1,Albany Cold Storage LLC,1.0,0.5,0.013317
2,Ameri Peanut Growers Cold Stge,1.0,0.5,0.133739
3,American Cold Storage,1.0,0.5,0.014476
4,Americold,1.0,1.0,0.034748
5,Americold Corp,0.6,0.5,0.009581
6,Americold Logistics,1.0,0.625,1.0
7,Americold Logistics LLC,1.0,0.5,0.029101
8,B & B Transport Warehousing,0.6,0.5,0.0082
9,Bayou Ice Boxes,0.8,1.0,0.025401


8) Assume that the following weights are given for the `Credit Score Num`, `Location Sales Volume Actual`, and `Location Score` values:
- `Credit Score Num` = 9
- `Location Score` = 8
- `Location Sales Volume Actual` = 5

Normalize these weights and use them to determine weighted sum and weighted product scores for all of the companies. What are the top 3 companies by each scoring method? **(30 points - 10 for weight normalization, 10 for correct application of weighted sum, and 10 for correct application of weighted product)**

In [11]:
df8 = df7
df9 = df7

weights = {
    'Credit Score Num' : 9,
    'Location Score' : 8,
    'Location Sales Volume Actual' : 5,
}

weights = {key : value/sum(weights.values()) for key, value in weights.items()}

weights

{'Credit Score Num': 0.4090909090909091,
 'Location Score': 0.36363636363636365,
 'Location Sales Volume Actual': 0.22727272727272727}

In [12]:
temp = pd.Series(index = df8.index, data = 0)

for key, weight in weights.items():
    if key in df8.columns:
        temp += df8[key]*weight
    else:
        continue
        
temp.tolist()
    
df8['WS'] = temp
df8.sort_values(by = 'WS', ascending=False)

Unnamed: 0,Company Name,Credit Score Num,Location Score,Location Sales Volume Actual,WS
6,Americold Logistics,1.0,0.625,1.0,0.863636
22,Gulf States Refrigerated Stge,1.0,1.0,0.053972,0.784994
4,Americold,1.0,1.0,0.034748,0.780625
28,Mid-South Distributors Inc,1.0,1.0,0.027929,0.779075
35,Seaonus,1.0,1.0,0.023194,0.777999
9,Bayou Ice Boxes,0.8,1.0,0.025401,0.696682
17,Decatur Business Park,0.8,1.0,0.024279,0.696427
38,Southern Customs,0.8,1.0,0.024119,0.696391
36,Serv-Cold LLC,0.8,1.0,0.023502,0.696251
42,Sun States Refrigerated Svc,0.8,1.0,0.009125,0.692983


In [13]:
temp = pd.Series(index = df8.index, data = 0)

for key, weight in weights.items():
    if key in df8.columns:
        temp += df8[key]**weight
    else:
        continue
        
temp.tolist()
    
df8['WP'] = temp
df8.sort_values(by = 'WP', ascending=False)
print("\n\nTop 3 by Weighted Sum:\n\n")
print(df8.sort_values(by = 'WS', ascending=False).iloc[0:3])
print("\n\nTop 3 by Weighted Product:\n\n")
print(df8.sort_values(by = 'WP', ascending=False).iloc[0:3])



Top 3 by Weighted Sum:


                     Company Name  Credit Score Num  Location Score  \
6             Americold Logistics               1.0           0.625   
22  Gulf States Refrigerated Stge               1.0           1.000   
4                       Americold               1.0           1.000   

    Location Sales Volume Actual        WS        WP  
6                       1.000000  0.863636  2.842897  
22                      0.053972  0.784994  2.515058  
4                       0.034748  0.780625  2.466008  


Top 3 by Weighted Product:


                     Company Name  Credit Score Num  Location Score  \
6             Americold Logistics               1.0           0.625   
22  Gulf States Refrigerated Stge               1.0           1.000   
4                       Americold               1.0           1.000   

    Location Sales Volume Actual        WS        WP  
6                       1.000000  0.863636  2.842897  
22                      0.053972  0.784994

9) Assume that the following preferences are given:
- a high `Credit Score Num` is strongly more preferable than a high `Location Score`,
- a high `Credit Score Num` is extremely more preferable than a high `Location Sales Volume Actual`, and
- a high `Location Score` is moderately more preferable than a high `Location Sales Volume Actual`.

Use these preferences to derive new weights and use these to determine updated weighted sum and weighted product scores for all companies. Again, specify the top 3 companies according to each scoring method. **(20 points)**

In [19]:
comparison_data = [
    [1, 5, 9],
    [0.2, 1, 3],
    [0.111, 0.333, 1],
]

index_vals = ['Credit Score Num', 'Location Score', 'Location Sales Volume Actual']

df_comp = pd.DataFrame(comparison_data, index = index_vals, columns = index_vals)

column_sums = np.sum(df_comp, axis=0)
scores_div = df_comp/column_sums

pv = np.average(scores_div, axis = 1)

# max_eigenvalue = np.inner(pv, column_sums)

# CI = (max_eigenvalue - len(pv))/(len(pv)-1)
# RI = [0, 0, 0, 0.58, 0.9, 1.12, 1.24, 1.32, 1.41]
# CR = np.round(CI/RI[len(pv)], 5)

# print(CR)

weights9 = {
    'Credit Score Num' : pv[0],
    'Location Score' : pv[1],
    'Location Sales Volume Actual' : pv[2]
}

df9

Unnamed: 0,Company Name,Credit Score Num,Location Score,Location Sales Volume Actual,WS,WP
0,Agro Merchants,0.8,0.5,0.01624,0.688764,1.72869
1,Albany Cold Storage LLC,1.0,0.5,0.013317,0.838404,1.882453
2,Ameri Peanut Growers Cold Stge,1.0,0.5,0.133739,0.838404,1.882453
3,American Cold Storage,1.0,0.5,0.014476,0.838404,1.882453
4,Americold,1.0,1.0,0.034748,0.928609,2.0
5,Americold Corp,0.6,0.5,0.009581,0.539124,1.564811
6,Americold Logistics,1.0,0.625,1.0,0.860956,1.918702
7,Americold Logistics LLC,1.0,0.5,0.029101,0.838404,1.882453
8,B & B Transport Warehousing,0.6,0.5,0.0082,0.539124,1.564811
9,Bayou Ice Boxes,0.8,1.0,0.025401,0.778969,1.846237


In [21]:
temp = pd.Series(index = df9.index, data = 0)



for key, weight in weights9.items():
    if key in df9.columns:
        temp += df9[key]*weight
    else:
        continue
        
temp.tolist()
    
df9['WS'] = temp
print(temp)
df9.sort_values(by = 'WS', ascending=False)

0     0.689924
1     0.839355
2     0.847952
3     0.839438
4     0.931090
5     0.539808
6     0.932347
7     0.840482
8     0.539710
9     0.780782
10    0.390508
11    0.689420
12    0.839795
13    0.390508
14    0.844437
15    0.539546
16    0.688764
17    0.780702
18    0.689876
19    0.479860
20    0.615849
21    0.689520
22    0.932462
23    0.689352
24    0.689601
25    0.839795
26    0.848774
27    0.390216
28    0.930603
29    0.239913
30    0.845079
31    0.789748
32    0.838404
33    0.843866
34    0.689833
35    0.930265
36    0.780647
37    0.540017
38    0.780691
39    0.616051
40    0.688764
41    0.390450
42    0.779620
43    0.540054
44    0.845251
45    0.688764
46    0.689345
47    0.540515
dtype: float64


Unnamed: 0,Company Name,Credit Score Num,Location Score,Location Sales Volume Actual,WS,WP
22,Gulf States Refrigerated Stge,1.0,1.0,0.053972,0.932462,2.0
6,Americold Logistics,1.0,0.625,1.0,0.932347,1.918702
4,Americold,1.0,1.0,0.034748,0.93109,2.0
28,Mid-South Distributors Inc,1.0,1.0,0.027929,0.930603,2.0
35,Seaonus,1.0,1.0,0.023194,0.930265,2.0
26,Lineage Flint River Svc Inc,1.0,0.5,0.145256,0.848774,1.882453
2,Ameri Peanut Growers Cold Stge,1.0,0.5,0.133739,0.847952,1.882453
44,United States Cold Storage,1.0,0.5,0.095909,0.845251,1.882453
30,Nashville Refrigerated Svc,1.0,0.5,0.093492,0.845079,1.882453
14,Claxton Cold Storage Inc,1.0,0.5,0.084503,0.844437,1.882453


In [22]:
temp = pd.Series(index = df9.index, data = 0)

for key, weight in weights9.items():
    if key in df9.columns:
        temp += df9[key]**weight
    else:
        continue
        
temp.tolist()
    
df9['WP'] = temp
df9.sort_values(by = 'WP', ascending=False)

Unnamed: 0,Company Name,Credit Score Num,Location Score,Location Sales Volume Actual,WS,WP
6,Americold Logistics,1.0,0.625,1.0,0.932347,2.918702
22,Gulf States Refrigerated Stge,1.0,1.0,0.053972,0.932462,2.811873
4,Americold,1.0,1.0,0.034748,0.93109,2.786748
28,Mid-South Distributors Inc,1.0,1.0,0.027929,0.930603,2.774573
35,Seaonus,1.0,1.0,0.023194,0.930265,2.764368
26,Lineage Flint River Svc Inc,1.0,0.5,0.145256,0.848774,2.753785
2,Ameri Peanut Growers Cold Stge,1.0,0.5,0.133739,0.847952,2.748662
44,United States Cold Storage,1.0,0.5,0.095909,0.845251,2.728343
30,Nashville Refrigerated Svc,1.0,0.5,0.093492,0.845079,2.726803
14,Claxton Cold Storage Inc,1.0,0.5,0.084503,0.844437,2.720731


In [23]:
print("\n\nTop 3 by Weighted Sum:\n\n")
print(df9.sort_values(by = 'WS', ascending=False).iloc[0:3])
print("\n\nTop 3 by Weighted Product:\n\n")
print(df9.sort_values(by = 'WP', ascending=False).iloc[0:3])



Top 3 by Weighted Sum:


                     Company Name  Credit Score Num  Location Score  \
22  Gulf States Refrigerated Stge               1.0           1.000   
6             Americold Logistics               1.0           0.625   
4                       Americold               1.0           1.000   

    Location Sales Volume Actual        WS        WP  
22                      0.053972  0.932462  2.811873  
6                       1.000000  0.932347  2.918702  
4                       0.034748  0.931090  2.786748  


Top 3 by Weighted Product:


                     Company Name  Credit Score Num  Location Score  \
6             Americold Logistics               1.0           0.625   
22  Gulf States Refrigerated Stge               1.0           1.000   
4                       Americold               1.0           1.000   

    Location Sales Volume Actual        WS        WP  
6                       1.000000  0.932347  2.918702  
22                      0.053972  0.932462