In [1]:
import pandas as pd

DATA = '/kaggle/input/mnjblj/City_-_Expenditures.csv'
df = pd.read_csv(filepath_or_buffer=DATA).drop(columns=['Type'])
df.head()

Unnamed: 0,Entity Name,Fiscal Year,County,Field Name,Category,Subcategory 1,Subcategory 2,Line Description,Value,"City, State, Zip",Estimated Population,Row Number
0,Adelanto,2018,San Bernardino,PERS_SERV,Internal Service Fund,Operating Expenses,Personnel Services,Personnel Services,0.0,"Adelanto, CA 92301",35293.0,201814821268
1,Adelanto,2018,San Bernardino,CONTRACT_SERV,Internal Service Fund,Operating Expenses,Contractual Services,Contractual Services,0.0,"Adelanto, CA 92301",35293.0,201814821267
2,Adelanto,2018,San Bernardino,MATERIAL_SUPP,Internal Service Fund,Operating Expenses,Materials and Supplies,Materials and Supplies,0.0,"Adelanto, CA 92301",35293.0,201814821266
3,Sausalito,2017,Marin,ELEC_PURCHASES,Electric Enterprise Fund,Operating Expenses,Electricity Production Expenses,Electricity Purchases_Electric Enterprise Fund,0.0,"Sausalito, CA 94965",7327.0,201710831241
4,Adelanto,2018,San Bernardino,GEN_ADMIN_EXP,Internal Service Fund,Operating Expenses,General and Administrative Expenses,General and Administrative Expenses,0.0,"Adelanto, CA 92301",35293.0,201814821265


How much data do we have?

In [2]:
df.shape

(1253353, 12)

That is about 1.25 million rows across 12 columns. First, let's see if we can plot total expenditures as a time series.

In [3]:
from plotly import express
from plotly import io

io.renderers.default = 'iframe'
express.line(data_frame=df[['Fiscal Year', 'Value']].groupby(by='Fiscal Year').sum().reset_index(), x='Fiscal Year', y='Value')

Total expenditures have more than doubled in twenty years. That seems like a lot, but what does it look like as a year-over-year change?

In [4]:
annual_df = df[['Fiscal Year', 'Value']].groupby(by='Fiscal Year').sum().reset_index()
annual_df['YoY change'] = (annual_df['Value']/annual_df['Value'].shift(1)).fillna(value=1)
express.line(data_frame=annual_df, x='Fiscal Year', y='YoY change')

In [5]:
annual_df['YoY change'][1:].mean()

1.0428068693397772

The mean year-over-year ratio is about 1.043, so overall expenditures grow a little over four percent per year on average.
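The number above is an arithmetic mean of yearly ratios. A common cross-check is the compound annual growth rate, which is the geometric mean of the same ratios and depends only on the first and last totals. Here is a minimal sketch on made-up numbers; `toy_df` and its values are hypothetical and stand in for `annual_df`.

```python
import pandas as pd

# Hypothetical annual totals (NOT the real data), shaped like annual_df.
toy_df = pd.DataFrame({
    'Fiscal Year': [2015, 2016, 2017, 2018],
    'Value': [100.0, 110.0, 99.0, 118.8],
})

# Year-over-year ratio; the first year has no predecessor, so drop it.
ratios = (toy_df['Value'] / toy_df['Value'].shift(1)).dropna()

# Arithmetic mean of the ratios, the same statistic the notebook reports.
arithmetic_mean = ratios.mean()

# Geometric mean: the constant ratio that compounds from the first total to the last.
n_steps = len(toy_df) - 1
cagr_ratio = (toy_df['Value'].iloc[-1] / toy_df['Value'].iloc[0]) ** (1 / n_steps)

print(arithmetic_mean, cagr_ratio)
```

When the yearly ratios vary, the geometric mean comes out below the arithmetic mean, so the two numbers bracket what "average growth" might mean.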

Let's see if we can aggregate budgets to the county level.

In [6]:
from plotly import colors

express.line(
    data_frame=df[['Fiscal Year', 'County', 'Value']].groupby(by=['Fiscal Year', 'County']).sum().reset_index(),
    x='Fiscal Year',
    y='Value',
    color='County',
    color_discrete_sequence=colors.sample_colorscale('HSV', 55),
    log_y=True,
    height=900,
)

We need to use a log plot here because expenditures vary so much from rich counties to poor counties. And we need a custom palette if we want each county to have its own color.
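One way to put a number on "vary so much" is the ratio between the largest and smallest county total in each year, expressed in orders of magnitude. The sketch below uses made-up rows; `toy_df`, its county names, and its values are hypothetical. It also assumes every county total is positive, since a zero total would make the ratio infinite.

```python
import numpy as np
import pandas as pd

# Hypothetical rows (NOT the real data), shaped like the relevant columns of df.
toy_df = pd.DataFrame({
    'Fiscal Year': [2017, 2017, 2018, 2018, 2018, 2018],
    'County': ['A', 'C', 'A', 'A', 'B', 'C'],
    'Value': [300_000.0, 3_000.0, 1_000_000.0, 500_000.0, 20_000.0, 1_500.0],
})

# Total per (year, county), the same aggregation the county plot uses.
county_totals = toy_df.groupby(by=['Fiscal Year', 'County'])['Value'].sum()

# Within each year, compare the largest county total to the smallest.
yearly = county_totals.groupby(level='Fiscal Year')
spread = yearly.max() / yearly.min()

# Orders of magnitude spanned; a spread of several is a good case for a log axis.
orders_of_magnitude = np.log10(spread)
print(orders_of_magnitude)
```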

In [7]:
express.line(
    data_frame=df[['Fiscal Year', 'County', 'Value']].groupby(by=['Fiscal Year', 'County']).sum().reset_index(),
    x='Fiscal Year',
    y='Value',
    color='County',
    color_discrete_sequence=colors.sample_colorscale('HSV', 55),
    log_y=False,
    height=900,
)

If we don't use a log plot, most of our counties disappear into a smear at the bottom of our graph.

Compared to counties we have a moderate number of categories. Let's plot those too.

In [8]:
express.scatter(
    data_frame=df[['Fiscal Year', 'Category', 'Value']].groupby(by=['Fiscal Year', 'Category']).sum().reset_index(),
    x='Fiscal Year',
    y='Value',
    color='Category',
    color_discrete_sequence=colors.sample_colorscale('HSV', 27),
    log_y=True,
    height=900,
)

The set of categories changes abruptly between 2016 and 2017, so we can't compare category-level data across that boundary.
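The break can also be checked directly by comparing the category labels that appear on each side of it. This is a minimal sketch on made-up rows; `toy_df` and its category names are hypothetical, and the 2016/2017 cutoff is taken from the plot above.

```python
import pandas as pd

# Hypothetical rows (NOT the real data); category names are illustrative only.
toy_df = pd.DataFrame({
    'Fiscal Year': [2015, 2016, 2016, 2017, 2018, 2018],
    'Category': ['Old A', 'Old A', 'Shared', 'New X', 'New X', 'Shared'],
    'Value': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Category labels seen up to 2016 and from 2017 onward.
before = set(toy_df.loc[toy_df['Fiscal Year'] <= 2016, 'Category'])
after = set(toy_df.loc[toy_df['Fiscal Year'] >= 2017, 'Category'])

only_before = sorted(before - after)   # labels that disappear at the break
only_after = sorted(after - before)    # labels that first appear after it
shared = sorted(before & after)        # labels usable across the whole period

print(only_before, only_after, shared)
```

Run against the real `df`, any labels left in `shared` would be the only categories that could be compared across the whole period, provided their definitions did not change.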