### Data aggregation analysis of NYC citywide payroll data

In this notebook, we will analyze and aggregate raw NYC citywide payroll data using the [`.groupby()` function](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html) in pandas.

We will:
- import dependencies
- open and explore the data
- aggregate the data in different ways
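`.groupby()` follows a split-apply-combine pattern. Rows are split into groups by one or more key columns, a function such as `median` is applied to each group, and the results are combined into a new Series or DataFrame. Here is a minimal sketch on a small, made-up DataFrame. The values are illustrative and do not come from the payroll file.

```python
import pandas as pd

# Small, made-up table using two of the payroll column names from this notebook
df = pd.DataFrame(
    {
        "Title Description": ["TEACHER", "TEACHER", "MAYOR", "TEACHER"],
        "Base Salary": [60000.0, 80000.0, 258750.0, 70000.0],
    }
)

# Split by title, apply median to each group, combine into a Series indexed by title
medians = df.groupby("Title Description")["Base Salary"].median()
print(medians)
```

By default, `groupby` sorts the group keys, so `MAYOR` comes before `TEACHER` in the result.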

In [3]:
import pandas as pd

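In [11]:
# Note: these dtype keys (RECIPID, INTZIP, INT_C_CODE) do not match any column
# in the payroll file (see the printed columns below), so pandas ignores them.
dtypes = {
    "RECIPID": "str",
    "INTZIP": "str",
    "INT_C_CODE": "str",
}

date_columns = ["Fiscal Year", "Agency Start Date"]

salary_data = pd.read_csv(
    "../data/Citywide_Payroll_Data__Fiscal_Year__20240616.csv",
    dtype=dtypes,
    # A nested list ([date_columns]) tells pandas to combine both columns into a
    # single "Fiscal Year_Agency Start Date" column, as the output below shows.
    # This form is deprecated in recent pandas; pass date_columns without the
    # extra brackets to parse each column on its own.
    parse_dates=[date_columns]
)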

print(
    len(salary_data),
    salary_data.columns
)

552938 Index(['Fiscal Year_Agency Start Date', 'Payroll Number', 'Agency Name',
       'Last Name', 'First Name', 'Mid Init', 'Work Location Borough',
       'Title Description', 'Leave Status as of June 30', 'Base Salary',
       'Pay Basis', 'Regular Hours', 'Regular Gross Paid', 'OT Hours',
       'Total OT Paid', 'Total Other Pay'],
      dtype='object')
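The nested list in `parse_dates=[date_columns]` explains why the output shows a single combined `Fiscal Year_Agency Start Date` column. The sketch below keeps the two columns separate. It assumes that `Fiscal Year` should stay an integer and that `Agency Start Date` uses the `MM/DD/YYYY` format visible in the data. The tiny inline CSV is made up for illustration.

```python
import io

import pandas as pd

# Made-up rows mimicking two columns of the payroll CSV
csv_text = """Fiscal Year,Agency Start Date,Base Salary
2023,07/22/2019,65921.0
2023,04/24/2023,50001.0
"""

# Read without parse_dates so the two columns are not merged
df = pd.read_csv(io.StringIO(csv_text))

# Convert only the start date (format assumed to be MM/DD/YYYY); Fiscal Year stays an integer
df["Agency Start Date"] = pd.to_datetime(df["Agency Start Date"], format="%m/%d/%Y")

print(df.dtypes)
```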

In [9]:
salary_data.head()

|   | Fiscal Year_Agency Start Date | Payroll Number | Agency Name | Last Name | First Name | Mid Init | Work Location Borough | Title Description | Leave Status as of June 30 | Base Salary | Pay Basis | Regular Hours | Regular Gross Paid | OT Hours | Total OT Paid | Total Other Pay |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023 07/22/2019 | 67 | ADMIN FOR CHILDREN'S SVCS | ROODE | SELENA | R | BRONX | CHILD PROTECTIVE SPECIALIST | ACTIVE | 65921.0 | per Annum | 1820.0 | 65977.23 | 6.5 | 244.34 | 4400.66 |
| 1 | 2023 03/21/2016 | 67 | ADMIN FOR CHILDREN'S SVCS | AARON | TERESA |  | BRONX | CHILD PROTECTIVE SPECIALIST | ACTIVE | 65921.0 | per Annum | 1820.0 | 65998.3 | 448.0 | 22072.1 | 15938.8 |
| 2 | 2023 08/08/2016 | 67 | ADMIN FOR CHILDREN'S SVCS | AARONS | CAMELIA | M | BROOKLYN | CHILD PROTECTIVE SPECIALIST | ON LEAVE | 65921.0 | per Annum | 602.0 | 23163.11 | 28.75 | 1295.17 | 8994.61 |
| 3 | 2023 11/21/2022 | 67 | ADMIN FOR CHILDREN'S SVCS | ABBASSI | MARIAM | N | QUEENS | CHILD PROTECTIVE SPECIALIST | ACTIVE | 55463.0 | per Annum | 1050.0 | 31082.77 | 27.75 | 928.43 | 3115.5 |
| 4 | 2023 04/24/2023 | 67 | ADMIN FOR CHILDREN'S SVCS | ABDEL WEDOUD | LASHAWN |  | BROOKLYN | YOUTH DEVELOPMENT SPECIALIST | ACTIVE | 50001.0 | per Annum | 280.0 | 7792.26 | 0.0 | 0.0 | 0.0 |
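Because the date columns were merged on read, each `Fiscal Year_Agency Start Date` value holds the fiscal year and the start date separated by a space (for example `2023 07/22/2019`). The sketch below splits that column back into two typed columns without re-reading the file. It assumes every value follows that `YYYY MM/DD/YYYY` pattern. The two-row frame is a made-up stand-in for `salary_data`.

```python
import pandas as pd

# Made-up stand-in for salary_data, holding only the merged column
df = pd.DataFrame(
    {"Fiscal Year_Agency Start Date": ["2023 07/22/2019", "2023 04/24/2023"]}
)

# Split once on the first space: column 0 is the year, column 1 is the start date
parts = df["Fiscal Year_Agency Start Date"].str.split(" ", n=1, expand=True)
df["Fiscal Year"] = parts[0].astype(int)
df["Agency Start Date"] = pd.to_datetime(parts[1], format="%m/%d/%Y")

# Drop the merged column now that both parts are stored separately
df = df.drop(columns="Fiscal Year_Agency Start Date")
print(df)
```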


### Aggregate the data in different ways

- job titles that occur most often in the data
- median base salary for each job title
- median base salary, overtime pay, and other pay for each job title
- median base salary for each agency and work location borough

In [12]:
salary_data["Title Description"].value_counts()

Title Description
TEACHER- PER SESSION                                            79950
TEACHER                                                         56489
ELECTION WORKER                                                 29686
TEACHER SPECIAL EDUCATION                                       29228
ANNUAL ED PARA                                                  28225
                                                                ...  
DIRECTOR OF PUPPETRY                                                1
*PRINCIPAL PARK SUPERVISOR                                          1
COMMISSIONER OF PARKS & RECREATION                                  1
DIRECTOR OF COMMUNITY INVOLVEMENT                                   1
COMMISSIONER OF DEPT OF INFO TECHNOLOGY & TELECOMMUNICATIONS        1
Name: count, Length: 1535, dtype: int64
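`value_counts()` is itself a shortcut for grouping and counting. The sketch below uses made-up titles to show that `groupby(...).size()`, sorted in descending order, gives the same counts. It also shows that `normalize=True` returns each title's share of rows instead of raw counts.

```python
import pandas as pd

# Made-up job titles
titles = pd.Series(
    ["TEACHER", "ELECTION WORKER", "TEACHER", "MAYOR", "TEACHER", "ELECTION WORKER"],
    name="Title Description",
)

# Counts per title, most frequent first
counts = titles.value_counts()

# Same counts via groupby: group the Series by its own values and count rows per group
grouped = titles.groupby(titles).size().sort_values(ascending=False)

# Fraction of all rows that each title accounts for
shares = titles.value_counts(normalize=True)

print(counts.head(2))
print(shares.round(2))
```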

In [16]:
# Median base salary for each job title, highest first
salary_data.groupby(
    ["Title Description"]
)["Base Salary"].median().reset_index().sort_values(
    by="Base Salary",
    ascending=False
)

|   | Title Description | Base Salary |
|---|---|---|
| 358 | CHANCELLOR | 363346.00 |
| 364 | CHIEF ACTUARY | 318442.50 |
| 1130 | PRESIDENT | 280908.00 |
| 907 | FIRST DEPUTY MAYOR | 275000.00 |
| 1055 | MAYOR | 258750.00 |
| ... | ... | ... |
| 1396 | SUBSTITUTE SCHOOL AIDE | 14.37 |
| 1397 | SUBSTITUTE SCHOOL LUNCH HELPER | 14.37 |
| 1395 | SUBSTITUTE RECREATION ASSISTANT | 14.00 |
| 789 | ELECTION WORKER | 1.00 |
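The medians at the bottom of that ranking (14.37, 1.00) look more like hourly or per-session rates than annual salaries. This suggests that `Base Salary` is expressed in whatever unit the `Pay Basis` column gives. Assuming that is the case, one way to compare like with like is to filter to a single pay basis before grouping. The sketch below uses made-up rows. Only the `per Annum` label is taken from the data preview; `per Hour` is a hypothetical second value.

```python
import pandas as pd

# Made-up rows; "per Hour" is a hypothetical Pay Basis label used for illustration
df = pd.DataFrame(
    {
        "Title Description": ["TEACHER", "TEACHER", "SCHOOL AIDE", "SCHOOL AIDE"],
        "Pay Basis": ["per Annum", "per Annum", "per Hour", "per Annum"],
        "Base Salary": [70000.0, 80000.0, 14.37, 30000.0],
    }
)

# Keep only annual salaries so hourly rates do not pull the medians down
annual = df[df["Pay Basis"] == "per Annum"]

medians = (
    annual.groupby("Title Description")["Base Salary"]
    .median()
    .sort_values(ascending=False)
)
print(medians)
```

Grouping by `["Title Description", "Pay Basis"]` instead would keep every row while still keeping the units separate.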


In [19]:
# Median base salary, overtime pay, and other pay for each job title, highest base salary first
salary_data.groupby(
        ["Title Description"]
    ).agg(
        {
            "Base Salary":"median",
            "Total OT Paid":"median",
            "Total Other Pay":"median"
        }
    ).sort_values(
        by="Base Salary",
        ascending=False
)

| Title Description | Base Salary | Total OT Paid | Total Other Pay |
|---|---|---|---|
| CHANCELLOR | 363346.00 | 0.0 | 0.000 |
| CHIEF ACTUARY | 318442.50 | 0.0 | 10848.720 |
| PRESIDENT | 280908.00 | 0.0 | 60000.000 |
| FIRST DEPUTY MAYOR | 275000.00 | 0.0 | 7211.350 |
| MAYOR | 258750.00 | 0.0 | 0.000 |
| ... | ... | ... | ... |
| SUBSTITUTE SCHOOL AIDE | 14.37 | 0.0 | 1551.490 |
| SUBSTITUTE SCHOOL LUNCH HELPER | 14.37 | 0.0 | 261.785 |
| SUBSTITUTE RECREATION ASSISTANT | 14.00 | 0.0 | 675.110 |
| ELECTION WORKER | 1.00 | 0.0 | 0.000 |
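Several titles at the top of this table (for example CHANCELLOR and MAYOR) probably belong to a single employee, so their "median" is just one person's pay. The sketch below uses pandas named aggregation to report a group size next to the medians, which makes small groups easy to spot or filter out. The rows and the cutoff of 2 employees are made up.

```python
import pandas as pd

# Made-up rows using payroll column names
df = pd.DataFrame(
    {
        "Title Description": ["MAYOR", "TEACHER", "TEACHER", "TEACHER"],
        "Base Salary": [258750.0, 60000.0, 70000.0, 80000.0],
        "Total OT Paid": [0.0, 0.0, 100.0, 300.0],
    }
)

# Named aggregation: new_column=(source_column, function)
summary = (
    df.groupby("Title Description")
    .agg(
        employees=("Base Salary", "count"),  # non-missing Base Salary values per title
        median_base=("Base Salary", "median"),
        median_ot=("Total OT Paid", "median"),
    )
    .sort_values("median_base", ascending=False)
)

# Keep titles held by at least 2 people (cutoff chosen arbitrarily)
print(summary[summary["employees"] >= 2])
```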


In [23]:
# Median base salary for each agency and work location borough, highest first
salary_data.groupby(
        ["Agency Name", "Work Location Borough"]
    ).agg(
        {
            "Base Salary":"median"
        }
    ).sort_values(
        by="Base Salary",
        ascending=False
)

| Agency Name | Work Location Borough | Base Salary |
|---|---|---|
| OFFICE OF THE MAYOR | WASHINGTON DC | 154000.00 |
| COMMUNITY COLLEGE (LAGUARDIA) | BRONX | 152939.00 |
| DISTRICT ATTORNEY-SPECIAL NARC | BROOKLYN | 145000.00 |
| DISTRICT ATTORNEY-SPECIAL NARC | QUEENS | 141250.00 |
| OFFICE OF COLLECTIVE BARGAININ | MANHATTAN | 140000.00 |
| ... | ... | ... |
| PUBLIC SERVICE CORPS | MANHATTAN | 16.00 |
| DEPT OF PARKS & RECREATION | WESTCHESTER | 15.91 |
| PERSONNEL MONITORS | MANHATTAN | 15.00 |
| BOARD OF ELECTION POLL WORKERS | BROOKLYN | 1.00 |
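Grouping by agency and borough produces a two-level index, which can be easier to scan as a grid with one row per agency and one column per borough. The sketch below builds that grid with `pivot_table` on made-up rows. Calling `.unstack()` on the grouped median Series should give the same layout. Agency and borough pairs with no employees show up as `NaN`.

```python
import pandas as pd

# Made-up rows using payroll column names
df = pd.DataFrame(
    {
        "Agency Name": ["DEPT OF PARKS & RECREATION"] * 3 + ["OFFICE OF THE MAYOR"] * 2,
        "Work Location Borough": ["BRONX", "BRONX", "QUEENS", "MANHATTAN", "MANHATTAN"],
        "Base Salary": [40000.0, 50000.0, 45000.0, 120000.0, 140000.0],
    }
)

# Rows are agencies, columns are boroughs, cells are the median base salary
grid = df.pivot_table(
    index="Agency Name",
    columns="Work Location Borough",
    values="Base Salary",
    aggfunc="median",
)
print(grid)
```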
