# Data Compiling & Cleaning

We compiled monthly rate data from EIA's [Average Price of Electricity to Customers](https://www.eia.gov/electricity/monthly/epm_table_grapher.php?t=epmt_5_6_a), from Oct 2003 to Sept 2018. 

Raw data can be found in /data. Older data (pre-July 2012) is formatted differently from the newer data and is processed separately (see electricity_rates.py).

The full compiled dataset for all 50 states over these 15 years can be found in data/full dataset.csv.

Currently we look only at the All Sectors average, but the rate breakdown by customer class (residential, commercial, industrial) is also available. I noted the lines in the code where this change would need to be made.

# Removing Seasonality

In most states, electricity rates oscillate seasonally. Rates in some states, such as Iowa, vary significantly more than in others, such as Kentucky.

![Iowa](figs/graphs/Iowa.png)

To compare rate trends across states, we need to remove this seasonal component. We use a time series decomposition to separate each state's rates into trend, seasonal, and residual (noise) components:

    # df.loc[state] is assumed to be that state's monthly rate series with a date index
    from statsmodels.tsa.seasonal import seasonal_decompose
    s = seasonal_decompose(df.loc[state], model='additive')
    s.plot()

![Iowa](figs/decomp/Iowa.png)

As shown above for Iowa, the observed (actual) data can be broken down into a general trend, a repeating seasonal oscillation, and a residual. From this we can tell:
* Iowa's rates have been steadily increasing over time.
* Rates predictably peak every summer, at about 2 cents per kWh above winter rates.
* There is more variability in rates in recent years (2016 onwards).
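
To make the three components concrete, here is a dependency-free sketch of a classical additive decomposition: a centered moving average for the trend, per-month averages of the detrended values for the seasonal part, and whatever is left as the residual. This is only an approximation of what `seasonal_decompose` does, not its actual implementation. The function names and the synthetic series are assumptions for illustration and do not come from electricity_rates.py or the EIA data.

```python
import math

def centered_moving_average(values, period=12):
    """Centered moving average for an even period; None where the window does not fit."""
    half = period // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        # The two end points get half weight so the window spans exactly one period.
        total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[i] = total / period
    return trend

def decompose_additive(values, period=12):
    """Split a series into trend, seasonal, and residual parts (observed = sum of the three)."""
    trend = centered_moving_average(values, period)
    # Collect detrended values by position within the year (0 = first month of the series).
    buckets = [[] for _ in range(period)]
    for i, (value, t) in enumerate(zip(values, trend)):
        if t is not None:
            buckets[i % period].append(value - t)
    means = [sum(b) / len(b) for b in buckets]
    # Center the seasonal pattern so it averages to zero over one year.
    center = sum(means) / period
    seasonal = [means[i % period] - center for i in range(len(values))]
    resid = [None if t is None else v - t - s
             for v, t, s in zip(values, trend, seasonal)]
    return trend, seasonal, resid

# Synthetic monthly series (NOT EIA data): a slow linear rise plus a yearly cycle.
rates = [8.0 + 0.02 * m + math.sin(2 * math.pi * m / 12) for m in range(60)]
trend, seasonal, resid = decompose_additive(rates)

# Peak-to-trough size of the seasonal component, in the units of the input series.
swing = max(seasonal) - min(seasonal)
print(f"seasonal swing: {swing:.2f}")
```

The trend is undefined for the first and last six months because the moving-average window does not fit there, which is also why the trend line in a decomposition plot is shorter than the observed series.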

# Trends across States

We take the "Trend" component for each state from the decomposition and compare it against the US state average:

![summary](figs/summary.png)

As shown above, the Appalachian states generally have lower rates than the rest of the US, but all have been increasing steadily. Indiana and Kentucky show similar growth patterns, while West Virginia has a very different trajectory (why?). We look at a few time periods:

* 2004 – 2009: Many states, Appalachian states included, saw a surge in rates around 2008, likely due to the oil crisis. Iowa and WV appeared to be the least affected during this period.
* 2010 – 2013: Rates were stagnant in most US states, but rates in Appalachian states continued to grow, rising 7-15% over this time period, compared to just 1% nationally.
* 2014 – 2018: After a slight increase in 2013, most US states again saw stagnant rates. Appalachian rates continued to grow, increasing 4-17% over this period, compared to just 1% nationally.

Since 2010, rates have increased:
* Kentucky & Indiana: 30%
* West Virginia: 27%
* Iowa: 22%
* US Average: 8%
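
For reference, a percentage increase like those above can be computed from two points on a trend line. The sketch below assumes the figures compare the trend value at the start of 2010 with the latest trend value; that is an assumption about how they were derived, and the numbers in the code are hypothetical placeholders rather than values from the dataset.

```python
def percent_increase(start_value, end_value):
    """Percentage change from start_value to end_value."""
    return (end_value - start_value) / start_value * 100.0

# Hypothetical trend values in cents/kWh (placeholders, not from the dataset).
trend_2010 = 8.0
trend_2018 = 10.4

growth = percent_increase(trend_2010, trend_2018)
print(f"Increase since 2010: {growth:.0f}%")
```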

__This seems to suggest that Appalachian rates are rising faster than the national average. However, Iowa does not appear to be immune to these rate hikes either.__

# Further questions

* Fit a straight line (least squares) to each of the above trendlines to determine the average rate increase over 2009-2018. Compare KY, WV, and IN with nearby states in the Midwest/Appalachia (Ohio, Illinois, Tennessee). I lost some data on this front, but from memory: KY, WV, and IN had rate increases about 2x those of IL and TN, but comparable to OH.
* Variability in rates: how often rates fluctuate for reasons other than the season. Rate fluctuations could be problematic for industrial customers that expect predictable rate structures for their bottom line.
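
A minimal sketch of how both questions could be approached: an ordinary least-squares line through a state's monthly trend values gives an average increase per year, and the standard deviation of the residual component gives one possible measure of non-seasonal variability. The function name and all numbers below are hypothetical, and using the residual's standard deviation as the variability metric is an assumption rather than something settled in this analysis.

```python
from statistics import mean, pstdev

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical monthly trend values in cents/kWh (placeholders, not from the dataset).
months = list(range(24))
trend = [9.0 + 0.01 * m for m in months]

slope, intercept = least_squares_line(months, trend)
annual_increase = slope * 12  # slope is per month, so scale to a yearly figure

# Hypothetical residual component for the same state (placeholders).
residual = [0.05, -0.02, 0.01, -0.04, 0.03, -0.03]
volatility = pstdev(residual)  # spread of the non-seasonal, non-trend fluctuations

print(f"average increase: {annual_increase:.2f} cents/kWh per year")
print(f"residual volatility: {volatility:.3f} cents/kWh")
```

Comparing `annual_increase` and `volatility` across KY, WV, IN, OH, IL, and TN would put the states on a common footing, since both are computed after the seasonal component has been removed.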