For this assignment, you will replicate the spend analysis process we covered in class on a new dataset. The dataset covers government contracts for the state of Delaware during Fiscal Year 2019 and was downloaded from https://www.usaspending.gov/.

The following code block imports some necessary libraries and defines the path to the data file.

In [1]:
import pathlib

import pandas as pd

# Display up to 40 columns when printing a DataFrame
pd.set_option('display.max_columns', 40)

# Display floating-point values with four decimal places
pd.set_option('display.float_format', '{:.4f}'.format)

data_filepath = pathlib.Path('data', 'DE_FY2019_base.csv')

The following code block defines the `custom_grouper` function that we used in the notebook covering the concept of spend analysis.

In [2]:
def custom_grouper(df, agg_dict, groupby_columns):
    '''
    Group the provided DataFrame, df, by the columns specified in the
    groupby_columns argument and apply the aggregations specified in
    the agg_dict dictionary. Each numeric column in the aggregated
    DataFrame is also used to create a proportion column. The aggregated
    data is returned as a DataFrame sorted in ascending order by the
    aggregated numeric columns, which follow the order of the agg_dict
    keys, i.e., the first key has a higher sort priority than the
    second, etc.
    '''
    
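    # Group and aggregate; list-valued entries in agg_dict give two-level column labels
    grouped_df = df.groupby(groupby_columns).agg(agg_dict)

    # Flatten labels such as ('dollars_obligated', 'sum') into 'dollars_obligated_sum'
    grouped_df.columns = ['_'.join(col).strip() for col in grouped_df.columns.values]

    # Record the aggregated numeric columns before any proportion columns are added
    numeric_columns = grouped_df.select_dtypes(include='number').columns.tolist()

    # Add each numeric column's share of its column total
    for column in numeric_columns:
        grouped_df[f'{column}_proportion'] = (grouped_df[column]/grouped_df[column].sum())

    # Sort by the aggregated columns (ascending, the sort_values default)
    grouped_df = grouped_df.sort_values(numeric_columns)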

    return grouped_df
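
To see what each step of `custom_grouper` does, it can help to trace the same steps by hand on a tiny DataFrame. The sketch below is only an illustration: the `toy` DataFrame, its column names, and its values are made up and are not part of the assignment data.

```python
import pandas as pd

# Hypothetical toy data; names and values are invented for illustration.
toy = pd.DataFrame({
    'agency': ['A', 'A', 'B', 'B', 'B'],
    'award_id': ['a1', 'a2', 'b1', 'b2', 'b3'],
    'dollars': [100.0, 300.0, 50.0, 25.0, 25.0],
})

# List-valued aggregations give two-level column labels such as
# ('dollars', 'sum'), which is what the '_'.join(...) step expects.
agg_dict = {'dollars': ['sum'], 'award_id': ['count']}

grouped = toy.groupby(['agency']).agg(agg_dict)

# Flatten ('dollars', 'sum') into 'dollars_sum', and so on.
grouped.columns = ['_'.join(col).strip() for col in grouped.columns.values]

# Capture the aggregated columns before the proportion columns exist.
numeric_columns = grouped.select_dtypes(include='number').columns.tolist()

# Each proportion column is the group's share of the column total.
for column in numeric_columns:
    grouped[f'{column}_proportion'] = grouped[column] / grouped[column].sum()

# Ascending sort: the group with the smallest dollars_sum comes first.
grouped = grouped.sort_values(numeric_columns)

print(grouped)
```

Two details are worth noticing. First, the sort is ascending, so the largest groups appear at the bottom of the result. Second, the values in `agg_dict` are lists; if a value were a plain string such as `'sum'`, the column labels would be single strings and the `'_'.join(...)` step would insert an underscore between every character.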

The following code block reads the data and prints the first five rows.

In [3]:
data = pd.read_csv(data_filepath)
data.head()

Unnamed: 0,award_id,dollars_obligated,awarding_agency_name,recipient_duns,recipient_name,contract_pricing_code,contract_pricing,naics_code,naics_description,extent_competed,solicitation_procedures,awarding_agency_abbr
0,36C24819D0009,0.0,DEPARTMENT OF VETERANS AFFAIRS (VA),195823570,"AGILENT TECHNOLOGIES, INC.",J,FIRM FIXED PRICE,334510,ELECTROMEDICAL AND ELECTROTHERAPEUTIC APPARATU...,NOT COMPETED,ONLY ONE SOURCE,VA
1,36C25019D0016,0.0,DEPARTMENT OF VETERANS AFFAIRS (VA),51154354,"ADVANT-EDGE SOLUTIONS OF MIDDLE ATLANTIC, INC.",J,FIRM FIXED PRICE,562211,HAZARDOUS WASTE TREATMENT AND DISPOSAL,COMPETED UNDER SAP,SIMPLIFIED ACQUISITION,VA
2,VA24414C0449,56760.0,DEPARTMENT OF VETERANS AFFAIRS (VA),51154354,"ADVANT-EDGE SOLUTIONS OF MIDDLE ATLANTIC, INC.",J,FIRM FIXED PRICE,562998,ALL OTHER MISCELLANEOUS WASTE MANAGEMENT SERVICES,FULL AND OPEN COMPETITION AFTER EXCLUSION OF S...,NEGOTIATED PROPOSAL/QUOTE,VA
3,VA24414C0450,291625.47,DEPARTMENT OF VETERANS AFFAIRS (VA),51154354,"ADVANT-EDGE SOLUTIONS OF MIDDLE ATLANTIC, INC.",J,FIRM FIXED PRICE,562998,ALL OTHER MISCELLANEOUS WASTE MANAGEMENT SERVICES,FULL AND OPEN COMPETITION AFTER EXCLUSION OF S...,NEGOTIATED PROPOSAL/QUOTE,VA
4,VA24514C0099,2500212.0,DEPARTMENT OF VETERANS AFFAIRS (VA),51154354,"ADVANT-EDGE SOLUTIONS OF MIDDLE ATLANTIC, INC.",J,FIRM FIXED PRICE,562211,HAZARDOUS WASTE TREATMENT AND DISPOSAL,FULL AND OPEN COMPETITION AFTER EXCLUSION OF S...,NEGOTIATED PROPOSAL/QUOTE,VA
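
The `Unnamed: 0` column in the output above is what pandas typically produces when a CSV file was saved together with its index. If that is the case here (an assumption, since we have not inspected how the file was written), passing `index_col=0` to `pd.read_csv` uses that column as the index instead. The sketch below demonstrates the behavior on a made-up two-row CSV held in memory, not on the assignment file.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV saved with its index: note the leading
# comma in the header row. The award IDs and amounts are invented.
csv_text = (
    ",award_id,dollars_obligated\n"
    "0,X1,0.0\n"
    "1,X2,56760.0\n"
)

# Default read: the unlabeled first column is named 'Unnamed: 0'.
default_read = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the first column becomes the index instead.
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_read.columns.tolist())
print(indexed_read.columns.tolist())
```

Either form works for the questions below, because none of them uses that column.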


1) Identify the top 5 NAICS categories of products/services purchased by number of awards. **(20 points)**

2) For the top NAICS category identified in part 1), determine the number of contracts and the total dollars obligated for each awarding agency. **(20 points)**

3) For the top NAICS category identified in part 1), determine, for each recipient, the number of awarding agencies that contract with it and the total dollars those agencies obligated to it. **(20 points)**

4) For the top NAICS category identified in part 1), determine the dollars obligated for each contract pricing type. **(20 points)**

5) For the top NAICS category identified in part 1), determine the dollars obligated for each combination of (`extent_competed`, `solicitation_procedures`). **(20 points)**
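
Parts 2) through 5) all start from the subset of rows that belong to the top category. One possible way to build such a subset is sketched below on made-up data. The `toy` DataFrame and its values are hypothetical, and the sketch assumes each row represents one award, which you should verify for the real data before relying on row counts.

```python
import pandas as pd

# Hypothetical toy data; codes, IDs, and amounts are invented.
toy = pd.DataFrame({
    'naics_code': [111, 111, 222, 111, 333],
    'award_id': ['a', 'b', 'c', 'd', 'e'],
    'dollars_obligated': [10.0, 20.0, 5.0, 30.0, 1.0],
})

# value_counts counts rows per code and sorts the counts in descending order.
award_counts = toy['naics_code'].value_counts()

# The first index label is therefore the most frequent code.
top_code = award_counts.index[0]

# Boolean mask keeps only the rows for that code.
top_subset = toy[toy['naics_code'] == top_code]

print(top_code)
print(top_subset)
```

The resulting subset can then be passed to `custom_grouper` with whichever grouping columns a given part asks for.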