# Part 1 Analysis

Nate Silver discusses the difficulty of predicting earthquakes in *The Signal and the Noise*. Nevertheless, we will try to identify some patterns by analyzing the deadly earthquakes that have occurred since 1900.

To start, read the table of earthquakes from https://en.wikipedia.org/wiki/List_of_deadly_earthquakes_since_1900 using the `requests` and/or `BeautifulSoup` libraries, and load it into a pandas DataFrame. You will need to do some data cleaning before you can proceed.
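One way the download-and-parse step might look is sketched below. The helper names `fetch_html` and `table_to_frame` are hypothetical, and the default `table_index=0` is an assumption: the page layout can change, so check which `<table>` on the page actually holds the earthquake list. The sketch keeps every cell as a raw string and assumes each row has the same number of cells as the header.

```python
import pandas as pd
from bs4 import BeautifulSoup

URL = "https://en.wikipedia.org/wiki/List_of_deadly_earthquakes_since_1900"


def fetch_html(url=URL):
    """Download the page (hypothetical helper; needs network access)."""
    import requests  # imported here so the parsing helper also works offline

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def table_to_frame(html, table_index=0):
    """Turn one <table> in the HTML into a DataFrame of raw strings."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: the wanted table is the table_index-th <table> on the page.
    table = soup.find_all("table")[table_index]
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    # The first row is treated as the header, the rest as data.
    header, body = rows[0], rows[1:]
    return pd.DataFrame(body, columns=header)
```

A call such as `raw = table_to_frame(fetch_html())` would then give the uncleaned table, with empty cells stored as empty strings.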

Data cleaning tasks include:

* Replace empty strings with NaN
* Remove the footnotes from the 'Other Source Deaths' column
* Convert Magnitude to a numeric type. For this portion, you can ignore differences in seismic magnitude scales.
* Correct the number of deaths when a cell lists more than one value by choosing the largest.
* Create a new column ('deaths') that evaluates the four total-death columns ('PDE Total Deaths', 'Utsu Total Deaths', 'EM-DAT Total Deaths', and 'Other Source Deaths') and populates the new column with the highest value.
* Explore the data in terms of when and where earthquakes occurred and how severe they were (magnitude, deaths, secondary effects).
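The cleaning steps listed above could be sketched roughly as follows. This is a minimal sketch, not the notebook's actual cleaning code: it assumes the raw table has a `Magnitude` column plus the four death columns named above, that footnotes appear in square brackets (for example `[5]`), and that death counts are whole numbers. The helper names `largest_number` and `clean_earthquakes` are hypothetical.

```python
import re

import numpy as np
import pandas as pd

DEATH_COLUMNS = [
    "PDE Total Deaths",
    "Utsu Total Deaths",
    "EM-DAT Total Deaths",
    "Other Source Deaths",
]


def largest_number(value):
    """Return the largest integer found in a cell, or NaN if there is none."""
    if pd.isna(value):
        return np.nan
    text = re.sub(r"\[[^\]]*\]", "", str(value))   # drop footnotes such as [5]
    text = re.sub(r"(?<=\d),(?=\d{3})", "", text)  # drop thousands separators
    numbers = re.findall(r"\d+", text)
    return max(int(n) for n in numbers) if numbers else np.nan


def clean_earthquakes(raw):
    """Apply the cleaning steps to a raw table of strings."""
    df = raw.copy()
    # Empty or whitespace-only strings become NaN.
    df = df.replace(r"^\s*$", np.nan, regex=True)
    # Keep the first number in Magnitude and ignore the scale label.
    magnitude = df["Magnitude"].astype(str).str.extract(r"(\d+(?:\.\d+)?)")[0]
    df["Magnitude"] = pd.to_numeric(magnitude, errors="coerce")
    # Strip footnotes and keep the largest value in each death column.
    for column in DEATH_COLUMNS:
        df[column] = df[column].map(largest_number)
    # Highest value across the four sources; NaN if none is available.
    df["deaths"] = df[DEATH_COLUMNS].max(axis=1)
    return df
```

Rows with no death count in any source end up as NaN in `deaths` here; whether to fill those with 0 is a separate decision.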

You may also add supplemental data to explore ideas related to earthquake occurrence and effects; this is optional.

Answer the following questions:

1. Are there factors that make an earthquake more likely?
2. Are there factors that make an earthquake more deadly?

In [1]:
import numpy as np
import pandas as pd

## Reading the cleaned data

In [2]:
world_earthquake = pd.read_csv("../data/world_earthquakes_06_clean.csv")
world_earthquake.head()

Unnamed: 0,date,year,month,day,time,country,latitude,longitude,depth,magnitude,secondary_effects,deaths
0,1900-05-11 17:23:00,1900,5,11,17:23:00,Japan,38.7,141.1,5.0,7.0,,0
1,1900-07-12 06:25:00,1900,7,12,06:25:00,Turkey,40.3,43.1,,5.9,,140
2,1900-10-29 09:11:00,1900,10,29,09:11:00,Venezuela,11.0,-66.0,0.0,7.7,,0
3,1901-02-15 00:00:00,1901,2,15,00:00:00,China,26.0,100.1,0.0,6.5,,0
4,1901-03-31 07:11:00,1901,3,31,07:11:00,Bulgaria,43.4,28.7,,6.4,,4


In [3]:
world_earthquake.tail()

Unnamed: 0,date,year,month,day,time,country,latitude,longitude,depth,magnitude,secondary_effects,deaths
1335,2011-03-24 20:25:00,2011,3,24,20:25:00,Burma,,,,6.8,,150
1336,2011-04-07 14:32:00,2011,4,7,14:32:00,Japan,38.2,140.0,66.0,7.1,,0
1337,2011-09-18 12:40:00,2011,9,18,12:40:00,India,27.723,88.064,19.7,6.9,landslide,111
1338,2011-09-23 10:41:00,2011,9,23,10:41:00,Turkey,38.6,43.5,7.2,7.1,,601
1339,2018-08-05 19:46:00,2018,8,5,19:46:00,Indonesia,,,31.0,6.9,,0


In [4]:
world_earthquake.shape

(1340, 12)
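The 12 columns reported above include `Unnamed: 0`, which is most likely the DataFrame index that was written out when the cleaned CSV was saved. The small round trip below uses made-up data, not the earthquake file, to illustrate how that column arises and how passing `index_col=0` to `pd.read_csv` avoids it.

```python
import io

import pandas as pd

# Made-up two-row frame standing in for the cleaned data.
frame = pd.DataFrame({"country": ["Japan", "Turkey"], "magnitude": [7.0, 5.9]})

# to_csv writes the index as an unnamed first column by default.
csv_text = frame.to_csv()

with_extra = pd.read_csv(io.StringIO(csv_text))
without_extra = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))     # ['Unnamed: 0', 'country', 'magnitude']
print(list(without_extra.columns))  # ['country', 'magnitude']
```

If the same holds for `world_earthquakes_06_clean.csv`, reading it with `index_col=0` would leave 11 data columns instead of 12.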