# Analyzing State-by-State Changes In Earthquake Frequency

_adapted from [this notebook](https://github.com/BuzzFeedNews/2015-03-earthquake-maps/blob/master/notebooks/earthquake-state-analysis.ipynb) by [John Templon](https://twitter.com/jtemplon)_

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

## Load and clean up data

The data was obtained from USGS here:

http://earthquake.usgs.gov/earthquakes/search/

It was then run through a PostGIS database to assign each epicenter to a state, since USGS does not provide specific state locations in its data.
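
Assigning an epicenter to a state is a point-in-polygon lookup. The original analysis did this in PostGIS; the pure-Python sketch below only illustrates the idea with a ray-casting test. The function names and the rectangular "Boxland" boundary are hypothetical and are not taken from the original pipeline.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?

    polygon is a list of (lon, lat) vertices; the last vertex
    connects back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going east from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def locate_state(lon, lat, state_polygons):
    """Return the first state whose polygon contains the point, else None."""
    for name, polygon in state_polygons.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical rectangle, NOT a real state boundary.
toy_states = {
    "Boxland": [(-100.0, 35.0), (-95.0, 35.0), (-95.0, 37.0), (-100.0, 37.0)],
}

print(locate_state(-97.5, 36.0, toy_states))   # Boxland
print(locate_state(-120.0, 30.0, toy_states))  # None
```

An epicenter that falls in no polygon comes back as `None`, which corresponds to a missing `state` value in the CSV.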

In [None]:
# pd.DataFrame.from_csv was removed from pandas; pd.read_csv is the replacement
all_quakes = pd.read_csv("earthquakes.csv",
                         parse_dates=["time", "updated"])
len(all_quakes)

In [None]:
all_quakes.head()

Some earthquakes within the U.S. bounding box have epicenters outside any state (e.g., in the ocean), so their `state` value is missing.

In [None]:
us_quakes = all_quakes.dropna(subset=["state"])
len(us_quakes)

Count the number of earthquakes per state:

In [None]:
state_cts = pd.DataFrame(us_quakes.state.value_counts())
state_cts.head()
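
To see what the filtering and counting steps do at a small scale, here is a sketch on a toy frame. The rows are made up for illustration and do not come from `earthquakes.csv`; the names `toy`, `toy_us`, and `toy_counts` are likewise assumptions.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from earthquakes.csv.
toy = pd.DataFrame({
    "id": ["a1", "a2", "a3", "a4", "a5"],
    "state": ["Oklahoma", None, "Oklahoma", "Kansas", None],
})

# Rows with a missing state (e.g., offshore epicenters) are dropped.
toy_us = toy.dropna(subset=["state"])

# Number of rows per state, largest count first.
toy_counts = toy_us["state"].value_counts()

print(len(toy_us))  # 3
print(toy_counts)
```

Two of the five toy rows have no state, so three remain, and Oklahoma leads the counts with two.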

## Charting Earthquake Activity Over Time, By State

In [None]:
data = (us_quakes[us_quakes["state"] == "Oklahoma"]
        .set_index("time")["id"]
        .resample("A")  # annual bins; pandas 2.2+ spells this alias "YE"
        .count())
data.head()

In [None]:
plt.plot(data)

In [None]:
data = (us_quakes[us_quakes["state"] == "California"]
        .set_index("time")["id"]
        .resample("A")
        .count())
plt.plot(data)

In [None]:
data = (us_quakes[us_quakes["state"] == "Texas"]
        .set_index("time")["id"]
        .resample("A")
        .count())
plt.plot(data)

In [None]:
data = (us_quakes[us_quakes["state"] == "Kansas"]
        .set_index("time")["id"]
        .resample("A")
        .count())
plt.plot(data)
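
The four cells above repeat the same filter-and-count step with a different state name. One way to factor that out is a small helper, sketched below. `annual_counts` is a hypothetical name, and instead of `resample` it counts on the calendar year of each timestamp, which avoids depending on a particular frequency alias. The toy rows are made up.

```python
import pandas as pd


def annual_counts(quakes, state):
    """Count earthquakes per calendar year for one state.

    Years with no earthquakes between the first and last observed
    year are filled in with 0.
    """
    years = quakes.loc[quakes["state"] == state, "time"].dt.year
    counts = years.value_counts().sort_index()
    if counts.empty:
        return counts
    full_range = range(int(counts.index.min()), int(counts.index.max()) + 1)
    return counts.reindex(full_range, fill_value=0)


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "state": ["Oklahoma", "Oklahoma", "Oklahoma", "Texas"],
    "time": pd.to_datetime(
        ["2010-03-01", "2010-07-15", "2012-01-20", "2011-05-05"]
    ),
})

# Oklahoma has two rows in 2010, none in 2011, and one in 2012.
print(annual_counts(toy, "Oklahoma"))
```

With the real data, the same helper could be called in a loop over `["Oklahoma", "California", "Texas", "Kansas"]`, passing `us_quakes` and plotting each result with `plt.plot`.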

## Calculating Percentage Change Decade Over Decade

The most recent complete year of earthquakes is 2014. Below, we compare 2005-2014 to the prior decade, 1995-2004.
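
The comparison itself is a percentage change between two decade totals, with a guard for states that had no earthquakes in the earlier decade. A minimal sketch of that arithmetic, using a hypothetical `pct_change` helper and made-up totals:

```python
def pct_change(before, after):
    """Percentage change from `before` to `after`, or None if `before` is 0."""
    if before == 0:
        return None  # no baseline to compare against
    return round(100.0 * (after - before) / before, 2)


# Made-up decade totals, for illustration only.
print(pct_change(20, 50))  # 150.0
print(pct_change(40, 30))  # -25.0
print(pct_change(0, 12))   # None
```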

In [None]:
def quake_percentage_change(state):
    # Annual earthquake counts for one state, indexed by the start of each year
    by_year = (us_quakes[us_quakes["state"] == state]
               .set_index("time")["id"]
               .resample("YS")  # year-start frequency (same as the older "AS" alias)
               .count()
               .to_frame("count"))
    # The index holds timestamps, so take the year from it directly
    by_year["year"] = by_year.index.year
    
    # Decade 2005-2014
    decade_05_14 = (by_year[(by_year["year"] >= 2005) 
                            & (by_year["year"] <= 2014)])
    total_05_14 = decade_05_14["count"].sum()
    
    # Decade 1995-2004
    decade_95_04 = (by_year[(by_year["year"] >= 1995) 
                            & (by_year["year"] <= 2004)])
    total_95_04 = decade_95_04["count"].sum()
    
    # Percentage change is undefined when the earlier decade has no earthquakes
    if total_95_04 != 0:
        pct = round(100.0 * (total_05_14 - total_95_04) / total_95_04, 2)
    else:
        pct = None
    return total_95_04, total_05_14, pct

In [None]:
state_cts["name"] = state_cts.index
state_cts["95-04"], state_cts["05-14"], state_cts["perc_change"] = \
    zip(*state_cts["name"].apply(quake_percentage_change))

States with at least 5 earthquakes from 1995-2004 (sorted by percentage change decade-over-decade):

In [None]:
states_5_at_least = state_cts[state_cts["95-04"] >= 5]
states_5_at_least.sort_values(by="perc_change", ascending=False).head(10)
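
As an alternative to calling `quake_percentage_change` once per state, the decade totals can be computed for all states in one pass with `groupby`. The sketch below is not the original author's method; `decade_totals` is a hypothetical name and the toy rows are made up. Unlike the per-state function, it marks an undefined change as `NaN` rather than `None`, and it omits states with no earthquakes in either decade.

```python
import pandas as pd


def decade_totals(quakes):
    """Per-state totals for 1995-2004 and 2005-2014, plus percent change."""
    years = quakes["time"].dt.year
    early = quakes[(years >= 1995) & (years <= 2004)].groupby("state").size()
    late = quakes[(years >= 2005) & (years <= 2014)].groupby("state").size()
    # Align the two decades on state name; a missing decade means zero quakes.
    out = pd.DataFrame({"95-04": early, "05-14": late}).fillna(0).astype(int)
    # Leave the change undefined (NaN) where the earlier decade has no quakes.
    baseline = out["95-04"].where(out["95-04"] != 0)
    out["perc_change"] = (
        100.0 * (out["05-14"] - out["95-04"]) / baseline
    ).round(2)
    return out


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "state": ["Oklahoma", "Oklahoma", "Oklahoma", "Texas", "Texas", "Kansas"],
    "time": pd.to_datetime([
        "1999-01-01", "2009-06-01", "2013-02-02",
        "1996-03-03", "2001-04-04", "2010-10-10",
    ]),
})

result = decade_totals(toy)
print(result.sort_values(by="perc_change", ascending=False))
```

In the toy data, Oklahoma goes from one earthquake to two (a 100% increase), Texas from two to none (a 100% decrease), and Kansas has no baseline, so its change is left undefined.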