# Demo 1 - Durham Budget

In this demo, we will perform some basic growth analysis on the annual budgets of the city of Durham, North Carolina, for fiscal years 2010 through 2015.  This will allow us to familiarize ourselves with some of the basics of growth analysis, including generating and charting percent changes by category.

We will first install and load some packages: lazyeval, the tidyverse package (which installs a large number of packages), and formattable.

In [None]:
install.packages("lazyeval", repos = "http://cran.us.r-project.org")
install.packages("tidyverse", repos = "http://cran.us.r-project.org")
install.packages("formattable", repos = "http://cran.us.r-project.org")

In [None]:
library(lazyeval)
library(tidyverse)
library(formattable)

We will load the data set from the Data directory.  This is a semicolon-delimited file, but otherwise the data is pretty clean.

In [None]:
durham.budget.raw <- read.csv('Data/DurhamBudget2015.csv', sep=";")
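
As a quick illustration of what `sep=";"` does, here is a minimal sketch that parses a small in-memory sample instead of the real file.  The sample text and its values are made up for this example; only the column naming pattern follows the data set.

```r
# Hypothetical three-line sample standing in for the real file.
sample.text <- "Fund;Dept.Name;FY.10.Actual\n1000;PARKS;100.5\n1000;POLICE;200"

# sep = ";" tells read.csv to split fields on semicolons instead of commas.
sample.raw <- read.csv(text = sample.text, sep = ";")
str(sample.raw)
```

Without `sep=";"`, each line would be read as a single field, because the sample contains no commas.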

We will grab the first several rows just to get an idea of what the data frame looks like.

In [None]:
head(durham.budget.raw)

We can see that this is *not* a tidy data set.  Notice the FY.##.Actual and FY.##.Bud values.  This kind of wide table is common for third-party data sets, but isn't great for our analysis.  Let's use the *gather* and *separate* functions (from tidyr, part of the tidyverse) to reshape this into a long, tidy data set.

In [None]:
durham.budget <- durham.budget.raw %>%
                  filter(!grepl("Total", Fund)) %>%
                  #Use gather to unpivot our actuals & estimates by fiscal year into a single column
                  gather(FY, Amount, FY.10.Actual:FY.15.YTD.Actual, na.rm = TRUE) %>%
                  #Separate out whether the budget is actuals or budgeted values
                  separate(FY, c("FiscalYear", "BudgetType"), 6)
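
To see what these two steps do in isolation, here is a minimal sketch on a made-up two-row table.  The data frame `toy.wide` and its values are hypothetical; only the column naming pattern matches our data.

```r
library(tidyr)

# Hypothetical wide table; column names mimic the FY.##.Type pattern.
toy.wide <- data.frame(
  Dept.Name = c("PARKS", "POLICE"),
  FY.10.Actual = c(100, 200),
  FY.11.Bud = c(110, NA),
  stringsAsFactors = FALSE
)

# gather stacks the FY columns into key/value pairs and drops the NA row.
toy.gathered <- gather(toy.wide, FY, Amount, FY.10.Actual:FY.11.Bud, na.rm = TRUE)

# separate with a numeric sep splits each key after its sixth character,
# so "FY.10.Actual" becomes "FY.10." and "Actual".
toy.long <- separate(toy.gathered, FY, c("FiscalYear", "BudgetType"), 6)
toy.long
```

The split position of 6 works here only because every key starts with a fixed-width prefix such as `FY.10.`.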

This is an intermediate step, but let's take a quick look at how the data looks right now:

In [None]:
head(durham.budget, 3)

FiscalYear is pretty nice, but we want to turn it into a year number.  Then let's do some more data type cleanup.

In [None]:
durham.budget$FiscalYear <- paste("20", substring(durham.budget$FiscalYear, 4, 5), sep="")

durham.budget$Fund <- as.character(durham.budget$Fund)
durham.budget$Fund_Desc <- as.character(durham.budget$Fund_Desc)
durham.budget$Dept <- as.character(durham.budget$Dept)
durham.budget$Dept.Name <- as.character(durham.budget$Dept.Name)
durham.budget$Char.Code <- as.character(durham.budget$Char.Code)
durham.budget$FiscalYear <- as.numeric(durham.budget$FiscalYear)
durham.budget$BudgetType <- as.factor(durham.budget$BudgetType)
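
The fiscal year conversion is easy to check on a few labels.  This is a small sketch using made-up label values in the same shape that the split at position 6 produces.

```r
# Hypothetical labels shaped like the FiscalYear values after separate().
fy.labels <- c("FY.10.", "FY.14.", "FY.15.")

# Characters 4 and 5 hold the two-digit year; prefix "20" and convert to a number.
fy.years <- as.numeric(paste("20", substring(fy.labels, 4, 5), sep = ""))
fy.years
```

Note that the hard-coded "20" prefix assumes every fiscal year falls in the 2000s, which holds for this data set.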

We now have some cleaned-up data, so let's filter it down to just the actuals.  Note that this also excludes the 2015 YTD actuals, whose BudgetType is "YTD.Actual" rather than "Actual".  That is what we want:  FY15 isn't done yet in our data sample.

In [None]:
durham.budget.actuals <- durham.budget %>%
                          filter(BudgetType == "Actual")
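
The exact-match comparison is what keeps the year-to-date rows out.  Here is a small sketch of the difference between an exact match and a pattern match; the vector of type labels is an assumption based on the column names we saw earlier.

```r
# Assumed BudgetType labels, based on the FY.##.Actual, FY.##.Bud,
# and FY.15.YTD.Actual column names.
budget.types <- c("Actual", "Bud", "YTD.Actual")

# Exact match: only the full-year actuals qualify.
exact.match <- budget.types == "Actual"

# Pattern match: this would also pull in the year-to-date rows.
pattern.match <- grepl("Actual", budget.types)

data.frame(budget.types, exact.match, pattern.match)
```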

The first thing we want to look at is dollar amounts.

In [None]:
ggplot(data=durham.budget.actuals, aes(x=FiscalYear, y=Amount, group=1)) +
  stat_summary(geom="bar", fun = sum) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 420000000), labels = scales::dollar)

The dollar amount looks pretty consistent.  2010 is a bit lower than the other years, but not outlandishly different.

Let's total the actuals by fiscal year and department.  We will then use log differences of these totals to get a clearer idea of the rate of change.

In [None]:
durham.budget.summary <- durham.budget.actuals %>%
                          group_by(FiscalYear, Dept.Name) %>%
                          summarize(Total = sum(Amount)) %>%
                          arrange(Dept.Name, FiscalYear)

head(durham.budget.summary, 10)

What we have done so far is total the actuals by fiscal year and department name, sorted so that each department's years appear in order.

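We can calculate percentage growth using one neat trick:  the difference of two consecutive logs is the log of their ratio, so exponentiating that difference and subtracting 1 gives each year's growth over the prior year.  The first year for each department has no prior year, so we set its change to 0.
http://stackoverflow.com/questions/19824601/how-calculate-growth-rate-in-long-format-data-frame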

In [None]:
durham.growth <- plyr::ddply(durham.budget.summary, "Dept.Name", transform,
                             Percent.Change = c(0, exp(diff(log(Total))) - 1))

head(durham.growth, 10)
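
To convince ourselves that the log trick is just a ratio in disguise, here is a minimal sketch on a made-up series of three annual totals.  The numbers are illustrative only.

```r
# Hypothetical annual totals for one department.
totals <- c(100, 110, 99)

# The trick from above: exp(log(b) - log(a)) - 1 is the same as b / a - 1.
growth.log <- c(0, exp(diff(log(totals))) - 1)

# The same growth rates computed directly as year-over-year ratios.
growth.ratio <- c(0, totals[-1] / totals[-length(totals)] - 1)

all.equal(growth.log, growth.ratio)
growth.log
```

One caveat: `log` returns `NaN` or `-Inf` for negative or zero totals, so this approach assumes every total is positive.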

Now we can spread the annual totals back out into one column per fiscal year to look for patterns.

In [None]:
budget.table <- durham.budget.summary %>%
                  mutate(Total = currency(Total, digits = 0)) %>%
                  spread(FiscalYear, Total) %>%
                  mutate(Diff = (`2014`-`2010`))
budget.table
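
The reshaping step can be sketched on a tiny made-up summary.  The data frame `toy.summary` and its values are hypothetical, and we skip the currency formatting to keep the example small.

```r
library(tidyr)

# Hypothetical long summary: one row per department per fiscal year.
toy.summary <- data.frame(
  Dept.Name = c("PARKS", "PARKS", "POLICE", "POLICE"),
  FiscalYear = c(2010, 2014, 2010, 2014),
  Total = c(100, 130, 200, 180),
  stringsAsFactors = FALSE
)

# spread turns each distinct FiscalYear into its own column of totals.
toy.table <- spread(toy.summary, FiscalYear, Total)

# Year columns have non-syntactic names, so they need backticks.
toy.table$Diff <- toy.table$`2014` - toy.table$`2010`
toy.table
```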

It seems interesting that Human Relations fell off a cliff after 2011.  What's up with this?

In [None]:
durham.budget %>%
  filter(Dept.Name == "HUMAN RELATIONS") %>%
  spread(BudgetType, Amount) %>%
  arrange(FiscalYear, Fund_Desc, Char.Code)

Looks like it was originally budgeted for FY 12 but stripped from the revised budget and never returned.

So with Human Relations dropping off so much, let's see what are the major drivers behind budget growth post-2010.

In [None]:
budget.table %>%
  select(Dept.Name, `2010`, `2014`, Diff) %>%
  arrange(desc(Diff))

Human Relations dropped off the map, but Human Resources shot up.  What's up with that?

In [None]:
durham.budget %>%
  filter(Dept.Name == "HUMAN RESOURCES") %>%
  spread(BudgetType, Amount) %>%
  arrange(FiscalYear, Fund_Desc, Char.Code)

There's an employee insurance fund which took off in 2011.  Prior to that, it looks like it was listed in the general fund as operating expenses.  Interesting that it jumped $15m per year!

Let's look at the top ten departments in the 2014 budget.

In [None]:
top.ten <- durham.budget.summary %>%
  filter(FiscalYear == 2014) %>%
  arrange(desc(Total)) %>%
  head(10)

Let's get the percent change for the ten biggest departments.

In [None]:
durham.percent.change <- durham.growth %>%
  mutate(Percent.Change = 100.0 * Percent.Change) %>%
  merge(top.ten, by = "Dept.Name") %>%
  select(FiscalYear = FiscalYear.x, Dept.Name, Percent.Change)

durham.percent.change %>%
  spread(FiscalYear, Percent.Change)
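
The `FiscalYear.x` in the select comes from how `merge` handles columns that exist in both inputs but are not part of the join key.  Here is a minimal sketch with two made-up data frames; their names and values are hypothetical.

```r
# Hypothetical growth rows for two departments.
toy.growth <- data.frame(
  Dept.Name = c("A", "A", "B"),
  FiscalYear = c(2013, 2014, 2014),
  Percent.Change = c(0, 5, 2),
  stringsAsFactors = FALSE
)

# Hypothetical "top" list containing only department A.
toy.top <- data.frame(
  Dept.Name = "A",
  FiscalYear = 2014,
  Total = 10,
  stringsAsFactors = FALSE
)

# merge performs an inner join on Dept.Name by default, so B drops out.
# FiscalYear appears in both inputs, so it comes back with .x and .y suffixes.
toy.merged <- merge(toy.growth, toy.top, by = "Dept.Name")
names(toy.merged)
```

Here `FiscalYear.x` is the year from the growth data, which is the one we want to chart, while `FiscalYear.y` is always 2014 from the top-ten list.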

In [None]:
ggplot(data=durham.percent.change, aes(x=FiscalYear, y=Percent.Change, group=Dept.Name, color=Dept.Name)) +
  geom_line()

We can see the giant bump in the HR budget in 2011 and the corresponding drop in the General Services budget.