For this assignment, you will replicate the spend analysis process we covered in class on a new dataset. The dataset, downloaded from https://www.usaspending.gov/, covers government contracts for the state of Delaware during Fiscal Year 2019.

The following code block imports some necessary libraries and defines the path to the data file.

In [1]:
import pathlib

import pandas as pd

# Specify maximum columns = 40
pd.set_option('display.max_columns', 40)

# Specify no max column width
pd.set_option('display.max_colwidth', None)

# Specify floating-point precision
pd.set_option('display.float_format', '{:.4f}'.format)

data_filepath = pathlib.Path('data', 'DE_FY2019_base.csv')

The following code block defines the `custom_grouper` function that we used in the notebook covering the concept of spend analysis.

In [2]:
def custom_grouper(df, agg_dict, groupby_columns):
    '''
    Group the provided DataFrame, df, by the columns specified in
    groupby_columns and apply the aggregations specified in agg_dict.
    The aggregated column names are flattened to '<column>_<aggregation>'.
    For each numeric aggregated column, a '<flattened name>_proportion'
    column holding its share of the column total is added. The result is
    returned as a DataFrame sorted in ascending order by the numeric
    aggregated columns, in the order their keys appear in agg_dict,
    i.e., the first key has a higher sort priority than the second, etc.
    '''
    
    grouped_df = df.groupby(groupby_columns).agg(agg_dict)
    
    grouped_df.columns = ['_'.join(col).strip() for col in grouped_df.columns.values]
    
    numeric_columns = grouped_df.select_dtypes(include='number').columns.tolist()

    for column in numeric_columns:
        grouped_df[f'{column}_proportion'] = (grouped_df[column]/grouped_df[column].sum())
        
    grouped_df = grouped_df.sort_values(numeric_columns)

    return grouped_df
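
To make the behavior of `custom_grouper` concrete, the sketch below applies it to a small made-up DataFrame. The `toy` rows are hypothetical and are not taken from the Delaware file; the function body is repeated in condensed form only so that the block runs on its own. Two things are worth noticing in the result: the proportion columns are shares of the total across all groups, and the sort is ascending, so the largest group comes last.

```python
import pandas as pd

def custom_grouper(df, agg_dict, groupby_columns):
    # Condensed copy of the function defined above.
    grouped_df = df.groupby(groupby_columns).agg(agg_dict)
    # Flatten ('dollars_obligated', 'sum') into 'dollars_obligated_sum'.
    grouped_df.columns = ['_'.join(col).strip() for col in grouped_df.columns.values]
    numeric_columns = grouped_df.select_dtypes(include='number').columns.tolist()
    for column in numeric_columns:
        grouped_df[f'{column}_proportion'] = grouped_df[column] / grouped_df[column].sum()
    return grouped_df.sort_values(numeric_columns)

# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    'naics_description': ['A', 'A', 'B', 'B', 'B'],
    'award_id': ['x1', 'x2', 'y1', 'y1', 'y2'],
    'dollars_obligated': [100.0, 300.0, 50.0, 25.0, 25.0],
})

toy_grouped = custom_grouper(
    toy,
    {'dollars_obligated': ['sum'], 'award_id': ['nunique']},
    ['naics_description'],
)
print(toy_grouped)
```

Here group `A` holds 400 of the 500 total dollars, so its `dollars_obligated_sum_proportion` is 0.8, and it appears after group `B` because of the ascending sort.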

The following code block reads the data and prints the first five rows.

In [3]:
data = pd.read_csv(data_filepath)
data.head()

Unnamed: 0,award_id,dollars_obligated,awarding_agency_name,recipient_duns,recipient_name,contract_pricing_code,contract_pricing,naics_code,naics_description,extent_competed,solicitation_procedures,awarding_agency_abbr
0,36C24819D0009,0.0,DEPARTMENT OF VETERANS AFFAIRS (VA),195823570,"AGILENT TECHNOLOGIES, INC.",J,FIRM FIXED PRICE,334510,ELECTROMEDICAL AND ELECTROTHERAPEUTIC APPARATUS MANUFACTURING,NOT COMPETED,ONLY ONE SOURCE,VA
1,36C25019D0016,0.0,DEPARTMENT OF VETERANS AFFAIRS (VA),51154354,"ADVANT-EDGE SOLUTIONS OF MIDDLE ATLANTIC, INC.",J,FIRM FIXED PRICE,562211,HAZARDOUS WASTE TREATMENT AND DISPOSAL,COMPETED UNDER SAP,SIMPLIFIED ACQUISITION,VA
2,VA24414C0449,56760.0,DEPARTMENT OF VETERANS AFFAIRS (VA),51154354,"ADVANT-EDGE SOLUTIONS OF MIDDLE ATLANTIC, INC.",J,FIRM FIXED PRICE,562998,ALL OTHER MISCELLANEOUS WASTE MANAGEMENT SERVICES,FULL AND OPEN COMPETITION AFTER EXCLUSION OF SOURCES,NEGOTIATED PROPOSAL/QUOTE,VA
3,VA24414C0450,291625.47,DEPARTMENT OF VETERANS AFFAIRS (VA),51154354,"ADVANT-EDGE SOLUTIONS OF MIDDLE ATLANTIC, INC.",J,FIRM FIXED PRICE,562998,ALL OTHER MISCELLANEOUS WASTE MANAGEMENT SERVICES,FULL AND OPEN COMPETITION AFTER EXCLUSION OF SOURCES,NEGOTIATED PROPOSAL/QUOTE,VA
4,VA24514C0099,2500212.0,DEPARTMENT OF VETERANS AFFAIRS (VA),51154354,"ADVANT-EDGE SOLUTIONS OF MIDDLE ATLANTIC, INC.",J,FIRM FIXED PRICE,562211,HAZARDOUS WASTE TREATMENT AND DISPOSAL,FULL AND OPEN COMPETITION AFTER EXCLUSION OF SOURCES,NEGOTIATED PROPOSAL/QUOTE,VA


1) Identify the top 5 NAICS categories of products/services purchased by number of awards. **(20 points)**

In [4]:
groupby_columns1 = ['naics_description']

agg_dict1 = {
    'award_id':['nunique']
}

data1 = custom_grouper(data, agg_dict1, groupby_columns1)

# The next two lines are left over from an earlier approach. Top5 is not needed
# for the answer below, but top_naics_description is reused in parts 2) through 5).
# data1 is sorted ascending, so reversing it puts the largest count first.

Top5 = data1.iloc[::-1]['award_id_nunique'].head()
top_naics_description = Top5.index.tolist()[0]

data1.nlargest(5, 'award_id_nunique').loc[:,['award_id_nunique']]

naics_description,award_id_nunique
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,3533
MOTOR VEHICLE ELECTRICAL AND ELECTRONIC EQUIPMENT MANUFACTURING,1919
ALL OTHER MISCELLANEOUS GENERAL PURPOSE MACHINERY MANUFACTURING,819
ALL OTHER MISCELLANEOUS ELECTRICAL EQUIPMENT AND COMPONENT MANUFACTURING,576
ANALYTICAL LABORATORY INSTRUMENT MANUFACTURING,516
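
As a cross-check on the ranking above, the same count can be sketched without `custom_grouper`: group by `naics_description`, count distinct `award_id` values, and keep the five largest. The `toy` frame and the name `top_category` below are hypothetical; on the assignment data one would pass `data` instead of `toy`.

```python
import pandas as pd

# Hypothetical toy data; replace with the assignment's `data` DataFrame.
toy = pd.DataFrame({
    'naics_description': ['A', 'A', 'A', 'A', 'B', 'B', 'C'],
    'award_id': ['a1', 'a2', 'a2', 'a3', 'b1', 'b2', 'c1'],
})

# Distinct award IDs per NAICS description, largest first.
awards_per_naics = (
    toy.groupby('naics_description')['award_id']
       .nunique()
       .nlargest(5)
)

# The first index label is the category with the most distinct awards.
top_category = awards_per_naics.index[0]
print(awards_per_naics)
print(top_category)
```

Because `nlargest` already returns the rows in descending order, this avoids relying on the sort direction of an earlier step.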


2) For the top NAICS category identified in part 1), determine the number of contracts and the total dollars obligated for each awarding agency. **(20 points)**

In [5]:
groupby_columns2 = ['naics_description', 'awarding_agency_name']

agg_dict2 = {
    'dollars_obligated':['sum'],
    'award_id':['nunique'],
}

data2 = custom_grouper(data, agg_dict2, groupby_columns2)

data2.loc[data2.index.get_level_values('naics_description') == top_naics_description, ['dollars_obligated_sum', 'award_id_nunique']]

naics_description,awarding_agency_name,dollars_obligated_sum,award_id_nunique
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,DEPARTMENT OF VETERANS AFFAIRS (VA),110235.05,4
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,DEPARTMENT OF HEALTH AND HUMAN SERVICES (HHS),1454369.21,2
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,DEPARTMENT OF DEFENSE (DOD),2444286.38,3527
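
An alternative to grouping on both columns and then filtering the MultiIndex is to filter the rows to the category first and group only by agency. The sketch below does this with pandas named aggregation on hypothetical toy rows; the output column names `total_dollars` and `n_awards` are illustrative choices, not names used elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    'naics_description': ['A', 'A', 'A', 'B'],
    'awarding_agency_name': ['DOD', 'DOD', 'VA', 'DOD'],
    'award_id': ['a1', 'a2', 'a3', 'b1'],
    'dollars_obligated': [10.0, 20.0, 5.0, 99.0],
})
top_category = 'A'  # stands in for top_naics_description

by_agency = (
    toy[toy['naics_description'] == top_category]   # keep one category
    .groupby('awarding_agency_name')
    .agg(total_dollars=('dollars_obligated', 'sum'),
         n_awards=('award_id', 'nunique'))
    .sort_values('total_dollars', ascending=False)
)
print(by_agency)
```

Filtering first keeps the result indexed by agency alone, which is often easier to read than a two-level index with a constant first level.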


3) For the top NAICS category identified in part 1), determine the number of awarding agencies and the amount they spend with each recipient. **(20 points)**

In [6]:
groupby_columns3 = ['naics_description', 'recipient_name']

agg_dict3 = {
    'dollars_obligated':['sum'],
    'award_id':['nunique'],
    'awarding_agency_name':['unique','nunique']
}

data3 = custom_grouper(data, agg_dict3, groupby_columns3)
data3.loc[data3.index.get_level_values('naics_description') == top_naics_description, ['awarding_agency_name_nunique', 'dollars_obligated_sum', 'awarding_agency_name_unique']]

naics_description,recipient_name,awarding_agency_name_nunique,dollars_obligated_sum,awarding_agency_name_unique
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,AKSU PARTS LLC,1,0.0,[DEPARTMENT OF DEFENSE (DOD)]
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,ONE STOP MEDICAL INC.,1,16400.0,[DEPARTMENT OF DEFENSE (DOD)]
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,LAUREL BRIDGE SOFTWARE INC,1,19790.0,[DEPARTMENT OF VETERANS AFFAIRS (VA)]
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,MOOR INSTRUMENTS INC,3,219303.76,"[DEPARTMENT OF HEALTH AND HUMAN SERVICES (HHS), DEPARTMENT OF VETERANS AFFAIRS (VA), DEPARTMENT OF DEFENSE (DOD)]"
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,"ANP TECHNOLOGIES, INC.",1,1060021.17,[DEPARTMENT OF DEFENSE (DOD)]
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,DENTSPLY INTERNATIONAL INC.,1,1255679.71,[DEPARTMENT OF DEFENSE (DOD)]
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,"ADVANT-EDGE SOLUTIONS OF MIDDLE ATLANTIC, INC.",1,1437696.0,[DEPARTMENT OF HEALTH AND HUMAN SERVICES (HHS)]


4) For the top NAICS category identified in part 1), determine the dollars obligated for each contract pricing type. **(20 points)**

In [7]:
groupby_columns4 = ['naics_description', 'contract_pricing']

agg_dict4 = {
    'dollars_obligated':['sum'],
}

data4 = custom_grouper(data, agg_dict4, groupby_columns4)
data4.loc[data4.index.get_level_values('naics_description') == top_naics_description, ['dollars_obligated_sum']]

naics_description,contract_pricing,dollars_obligated_sum
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,FIXED PRICE WITH ECONOMIC PRICE ADJUSTMENT,0.0
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,FIRM FIXED PRICE,4008890.64
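
Because `custom_grouper` divides each value by the column total across every group, the `dollars_obligated_sum_proportion` column in `data4` is a share of the total across all NAICS categories, not of the top category alone. If a within-category share is wanted, one option is to filter first and recompute it. A minimal sketch on hypothetical toy rows (the pricing labels and the column name `share_within_category` are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    'naics_description': ['A', 'A', 'A', 'B'],
    'contract_pricing': ['FIRM FIXED PRICE', 'FIRM FIXED PRICE',
                         'TIME AND MATERIALS', 'FIRM FIXED PRICE'],
    'dollars_obligated': [60.0, 20.0, 20.0, 100.0],
})
top_category = 'A'  # stands in for top_naics_description

# Total dollars per pricing type, restricted to the chosen category.
pricing = (
    toy[toy['naics_description'] == top_category]
    .groupby('contract_pricing')['dollars_obligated']
    .sum()
    .to_frame('dollars_obligated_sum')
)

# Share of the category's own total rather than of the whole file.
pricing['share_within_category'] = (
    pricing['dollars_obligated_sum'] / pricing['dollars_obligated_sum'].sum()
)
print(pricing)
```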


5) For the top NAICS category identified in part 1), determine the dollars obligated for each combination of (`extent_competed`, `solicitation_procedures`). **(20 points)**

In [8]:
groupby_columns5 = ['naics_description', 'extent_competed', 'solicitation_procedures']

agg_dict5 = {
    'dollars_obligated':['sum'],
}

data5 = custom_grouper(data, agg_dict5, groupby_columns5)
data5.loc[data5.index.get_level_values('naics_description') == top_naics_description, ['dollars_obligated_sum']]

naics_description,extent_competed,solicitation_procedures,dollars_obligated_sum
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,NOT COMPETED,ONLY ONE SOURCE,19790.0
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,FULL AND OPEN COMPETITION,SUBJECT TO MULTIPLE AWARD FAIR OPPORTUNITY,157098.76
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,NOT COMPETED UNDER SAP,SIMPLIFIED ACQUISITION,1060021.17
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,FULL AND OPEN COMPETITION,NEGOTIATED PROPOSAL/QUOTE,1255679.71
SURGICAL AND MEDICAL INSTRUMENT MANUFACTURING,COMPETED UNDER SAP,SIMPLIFIED ACQUISITION,1516301.0
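
Parts 2) through 5) slice the same set of awards in different ways, so their dollar totals should agree. The short check below re-adds the figures printed in the outputs above. The numbers were copied by hand from those outputs, so this is a sanity check on the displayed values rather than on the underlying file.

```python
import math

# dollars_obligated_sum values copied from the outputs of parts 2) to 5).
by_agency = [110235.05, 1454369.21, 2444286.38]
by_recipient = [0.0, 16400.0, 19790.0, 219303.76,
                1060021.17, 1255679.71, 1437696.0]
by_pricing = [0.0, 4008890.64]
by_competition = [19790.0, 157098.76, 1060021.17, 1255679.71, 1516301.0]

totals = [sum(by_agency), sum(by_recipient),
          sum(by_pricing), sum(by_competition)]

# Compare to the nearest cent to allow for floating-point rounding.
consistent = all(math.isclose(t, totals[0], abs_tol=0.01) for t in totals)
print([round(t, 2) for t in totals], consistent)
```

A mismatch here would usually point to rows with missing values in one of the grouping columns, since `groupby` drops those rows by default.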
