## Analyzing State-by-State Changes In Earthquake Frequency


In [1]:
import matplotlib.pyplot as plt
import pandas as pd

### Load and clean up data

The data was obtained from the USGS earthquake search: http://earthquake.usgs.gov/earthquakes/search/. It was then run through a PostGIS database to determine which state each epicenter falls in, since USGS does not provide specific state locations in its data.
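The PostGIS step itself is not shown in this notebook. As a rough illustration of the idea (assigning each epicenter to the region that contains it), here is a minimal pure-Python sketch using a ray-casting test. The names `point_in_polygon`, `assign_state`, and `regions` are hypothetical, and the Oklahoma polygon is a crude rectangle rather than the real state boundary; the actual analysis used PostGIS with proper geometries.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) falls inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_state(lon, lat, regions):
    """Return the first region containing the point, or None if there is none."""
    for name, polygon in regions.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical region: a crude box standing in for Oklahoma, NOT its real boundary
regions = {
    "Oklahoma": [(-103.0, 33.6), (-94.4, 33.6), (-94.4, 37.0), (-103.0, 37.0)],
}

# Coordinates taken from two rows of the sample data in this notebook
print(assign_state(-98.1778, 36.7545, regions))  # epicenter near Cherokee, Oklahoma
print(assign_state(-78.214, 7.853, regions))     # epicenter in Panama
```

Points that fall in no region come back as `None`, which mirrors the empty `state` values in the data for epicenters outside any state.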

In [2]:
all_quakes = pd.read_csv("earthquake_states_lite.csv", parse_dates=["time", "updated"])
len(all_quakes)

999

In [3]:
all_quakes.head()

Unnamed: 0,time,latitude,longitude,mag,id,updated,place,type,geom,state
0,2014-12-30 05:12:02.710,35.936667,-117.222167,3.47,ci37300152,2015-02-20 02:27:38.024,"23km NE of Trona, California",earthquake,0101000020AD100000A9B0ADFA374E5DC04AE0C5B1E4F7...,California
1,2012-08-27 07:24:12.060,7.853,-78.214,4.3,usp000jqve,2014-11-07 01:48:37.813,Panama,earthquake,0101000020AD10000004560E2DB28D53C0B6F3FDD47869...,
2,2014-12-30 04:13:42.000,12.817,-88.558,4.3,usc000tapp,2015-02-20 02:24:36.573,"51km S of Puerto El Triunfo, El Salvador",earthquake,0101000020AD100000C0CAA145B62356C0FCA9F1D24DA2...,
3,2014-12-30 04:08:52.200,14.837,-93.656,3.7,usc000te0p,2015-02-07 22:16:57.585,"104km SW of Mapastepec, Mexico",earthquake,0101000020AD100000448B6CE7FB6957C0068195438BAC...,
4,2014-12-29 22:50:55.000,36.7545,-98.1778,3.1,usc000takq,2014-12-30 01:53:36.056,"15km E of Cherokee, Oklahoma",earthquake,0101000020AD100000A54E4013618B58C07F6ABC749360...,Oklahoma


Some earthquakes within the U.S. bounding box have epicenters outside any state (e.g., in the ocean), so their `state` value is empty. We drop those rows.

In [4]:
us_quakes = all_quakes.dropna(subset=["state"])
len(us_quakes)

485

Count the number of earthquakes per state:

In [87]:
state_counts = pd.DataFrame(us_quakes.state.value_counts())
state_counts.head()

Unnamed: 0,state
California,18108
Alaska,12326
Nevada,1975
Idaho,1231
Washington,973


## Charting Earthquake Activity Over Time, By State

In [75]:
data = us_quakes[us_quakes["state"] == "Oklahoma"].set_index("time")["id"].resample("A").count()
data.head()

time
1974-12-31    1
1975-12-31    3
1976-12-31    2
1977-12-31    0
1978-12-31    0
Freq: A-DEC, Name: id, dtype: int64

In [48]:
plt.plot(data)

[<matplotlib.lines.Line2D at 0x110bf7d10>]

In [49]:
data = us_quakes[us_quakes["state"] == "California"].set_index("time")["id"].resample("A").count()
plt.plot(data)

[<matplotlib.lines.Line2D at 0x113ff4b50>]

In [50]:
data = us_quakes[us_quakes["state"] == "Texas"].set_index("time")["id"].resample("A").count()
plt.plot(data)

[<matplotlib.lines.Line2D at 0x110bf7350>]

In [51]:
data = us_quakes[us_quakes["state"] == "Kansas"].set_index("time")["id"].resample("A").count()
plt.plot(data)

[<matplotlib.lines.Line2D at 0x114091b90>]
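The four chart cells above repeat the same filter-and-count step for each state. A minimal sketch of a reusable helper follows, assuming a frame with `id`, `time`, and `state` columns like `us_quakes`. The helper name `yearly_counts` and the `sample` rows are made up for illustration, and it groups by calendar year instead of calling `resample`, which is a swapped-in approach rather than the one used in the cells above.

```python
import pandas as pd

# Hypothetical stand-in for us_quakes; these rows are invented for illustration
sample = pd.DataFrame({
    "id": ["a1", "a2", "a3", "b1"],
    "time": pd.to_datetime(["2010-03-01", "2010-07-15", "2012-01-20", "2011-05-05"]),
    "state": ["Oklahoma", "Oklahoma", "Oklahoma", "Texas"],
})


def yearly_counts(quakes, state):
    """Return the number of earthquakes per calendar year for one state."""
    subset = quakes[quakes["state"] == state]
    counts = subset.groupby(subset["time"].dt.year)["id"].count()
    # Fill in years with no recorded earthquakes so a line chart does not skip them
    all_years = range(counts.index.min(), counts.index.max() + 1)
    return counts.reindex(all_years, fill_value=0)


oklahoma = yearly_counts(sample, "Oklahoma")
print(oklahoma)
```

With a helper like this, each chart reduces to a call such as `plt.plot(yearly_counts(us_quakes, "Kansas"))`.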

## Calculating Percentage Change Decade Over Decade

The most recent complete year of earthquakes is 2014. Below, we compare 2005-2014 to the prior decade, 1995-2004.

In [88]:
def quake_percentage_change(state):
    # Count earthquakes per calendar year for the given state
    by_year = us_quakes[us_quakes["state"] == state].set_index("time")["id"].resample("AS").count()
    # The resampled index holds timestamps, so read the year from it directly
    years = by_year.index.year
    # Sum the yearly counts within each decade
    total_05_14 = by_year[(years >= 2005) & (years <= 2014)].sum()
    total_95_04 = by_year[(years >= 1995) & (years <= 2004)].sum()
    # Percentage change is undefined when the earlier decade has no earthquakes
    if total_95_04 != 0:
        pct = round(100.0 * (total_05_14 - total_95_04) / total_95_04, 2)
    else:
        pct = None
    return total_95_04, total_05_14, pct

In [89]:
state_counts["name"] = state_counts.index
state_counts["total_95-04"], state_counts["total_05-14"], state_counts["percentage_change"] =\
    zip(*state_counts["name"].apply(lambda x: quake_percentage_change(x)))

States with at least 5 earthquakes from 1995-2004 (sorted by percentage change decade-over-decade):

In [91]:
state_counts[state_counts["total_95-04"] >= 5].sort_values(by="percentage_change", ascending=False).head(10)

Unnamed: 0,state,name,total_95-04,total_05-14,percentage_change
Oklahoma,899,Oklahoma,14,860,6042.86
Arkansas,147,Arkansas,9,70,677.78
Kansas,61,Kansas,6,45,650.0
Texas,130,Texas,21,74,252.38
Hawaii,640,Hawaii,72,240,233.33
Illinois,61,Illinois,5,15,200.0
Arizona,174,Arizona,22,65,195.45
Virginia,28,Virginia,5,13,160.0
Colorado,143,Colorado,35,69,97.14
New Mexico,165,New Mexico,39,57,46.15
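As a sanity check on the table above, the percentage change can be recomputed by hand from the two decade totals. The sketch below uses a hypothetical standalone helper, `pct_change`, that applies the same formula as `quake_percentage_change`; the totals are copied from the first four rows of the table.

```python
def pct_change(earlier, later):
    """Percentage change from earlier to later, or None if earlier is zero."""
    if earlier == 0:
        return None
    return round(100.0 * (later - earlier) / earlier, 2)


# (total 1995-2004, total 2005-2014), copied from the table above
totals = {
    "Oklahoma": (14, 860),
    "Arkansas": (9, 70),
    "Kansas": (6, 45),
    "Texas": (21, 74),
}

for state, (earlier, later) in totals.items():
    print(state, pct_change(earlier, later))
```

For Oklahoma, for example, the calculation is 100 × (860 − 14) / 14, which rounds to 6042.86 and matches the table.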
