<details>
<summary><strong>DBAS3017 User Experience - Group Project</strong></summary>

**Group C:**
- Louise Fear
- James Laurence
- Gabriela Mkonde
- Niki Zheng
- Peter MacKinnon

**Earthquake Data Set:**
Source: [Significant Earthquakes Dataset](https://www.kaggle.com/datasets/usamabuttar/significant-earthquakes/)

</details>

In [1]:
# Library Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from IPython.display import HTML
from IPython.display import IFrame
import folium
from folium.plugins import MarkerCluster
import warnings
import cartopy.crs as ccrs

# Ignore FutureWarnings
warnings.simplefilter(action='ignore', category=FutureWarning)

In [2]:
# load earthquake dataset, using the first CSV column as the index
eq = pd.read_csv('original_eq.csv', index_col=[0])

# name the index
eq.index.name = 'Index'

# display dataframe
eq

Index,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,1900-10-09T12:25:00.000Z,57.0900,-153.4800,,7.86,mw,,,,,...,2022-05-09T14:44:17.838Z,"16 km SW of Old Harbor, Alaska",earthquake,,,,,reviewed,ushis,pt
1,1901-03-03T07:45:00.000Z,36.0000,-120.5000,,6.40,ms,,,,,...,2018-06-04T20:43:44.000Z,"12 km NNW of Parkfield, California",earthquake,,,,,reviewed,ushis,ell
2,1901-07-26T22:20:00.000Z,40.8000,-115.7000,,5.00,fa,,,,,...,2018-06-04T20:43:44.000Z,"6 km SE of Elko, Nevada",earthquake,,,,,reviewed,ushis,sjg
3,1901-12-30T22:34:00.000Z,52.0000,-160.0000,,7.00,ms,,,,,...,2018-06-04T20:43:44.000Z,south of Alaska,earthquake,,,,,reviewed,ushis,abe
4,1902-01-01T05:20:30.000Z,52.3800,-167.4500,,7.00,ms,,,,,...,2018-06-04T20:43:44.000Z,"113 km ESE of Nikolski, Alaska",earthquake,,,,,reviewed,ushis,abe
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99554,2023-11-06T21:31:01.203Z,-7.4821,156.1245,38.062,5.30,mww,169.0,26.0,5.120,0.49,...,2023-11-30T04:18:42.040Z,,earthquake,9.59,4.538,0.083,14.0,reviewed,us,us
99555,2023-11-05T13:27:28.428Z,51.5184,178.3647,80.999,5.10,mb,71.0,143.0,0.457,0.81,...,2023-12-03T13:37:40.037Z,"Rat Islands, Aleutian Islands, Alaska",earthquake,7.14,5.544,0.025,500.0,reviewed,us,us
99556,2023-11-04T09:38:37.943Z,20.2138,147.6712,10.000,5.00,mb,98.0,77.0,7.112,0.59,...,2023-12-02T09:48:49.538Z,Mariana Islands region,earthquake,9.40,1.837,0.032,305.0,reviewed,us,us
99557,2023-11-03T18:32:14.876Z,50.0317,156.3754,65.215,5.00,mb,137.0,94.0,3.190,0.61,...,2023-11-29T22:27:54.040Z,"73 km SSE of Severo-Kuril’sk, Russia",earthquake,9.83,5.143,0.044,174.0,reviewed,us,us


In [3]:
# Dataset Metadata
# time:             The time of the earthquake, stored in this CSV as an ISO 8601 UTC timestamp string (e.g. 1900-10-09T12:25:00.000Z).
# latitude:         The latitude of the earthquake's epicenter, reported in decimal degrees.
# longitude:        The longitude of the earthquake's epicenter, reported in decimal degrees.
# depth:            The depth of the earthquake, reported in kilometers.
# mag:              The magnitude of the earthquake, reported on various magnitude scales (see magType column below).
# magType:          The magnitude type used to report the earthquake magnitude (e.g. "mb", "ml", "mw").
# nst:              The total number of seismic stations used to calculate the earthquake location and magnitude.
# gap:              The largest azimuthal gap between azimuthally adjacent stations (in degrees).
# dmin:             The distance to the nearest station in degrees.
# rms:              The root-mean-square travel-time residual of the location solution, in seconds.
# net:              The ID of the seismic network used to locate the earthquake.
# id:               A unique identifier for the earthquake event.
# updated:          The time when the earthquake event was most recently updated in the catalog, stored in the same ISO 8601 UTC format as time.
# place:            A human-readable description of the earthquake's location.
# type:             The type of seismic event (e.g. "earthquake", "quarry blast", "explosion").
# horizontalError:  The horizontal error, in kilometers, of the location reported in the latitude and longitude columns.
# depthError:       The depth error, in kilometers, of the depth column.
# magError:         The estimated standard error of the reported earthquake magnitude.
# magNst:           The number of seismic stations used to calculate the earthquake magnitude.
# status:           The status of the earthquake event in the USGS earthquake catalog (e.g. "reviewed", "automatic").
# locationSource:   The ID of the agency or network that provided the earthquake location.
# magSource:        The ID of the agency or network that provided the earthquake magnitude.

In [4]:
eq.isna().sum()

time                   0
latitude               0
longitude              0
depth                285
mag                    0
magType                0
nst                70588
gap                60295
dmin               80233
rms                28743
net                    0
id                     0
updated                0
place                832
type                   0
horizontalError    81608
depthError         49711
magError           66978
magNst             59967
status                 0
locationSource         0
magSource              0
dtype: int64

In [5]:
# Display dataframe shape
print(f'\nEarthquake Dataframe Shape is {eq.shape}\n')
# Display dataframe info
eq.info()


Earthquake Dataframe Shape is (99559, 22)

<class 'pandas.core.frame.DataFrame'>
Index: 99559 entries, 0 to 99558
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   time             99559 non-null  object 
 1   latitude         99559 non-null  float64
 2   longitude        99559 non-null  float64
 3   depth            99274 non-null  float64
 4   mag              99559 non-null  float64
 5   magType          99559 non-null  object 
 6   nst              28971 non-null  float64
 7   gap              39264 non-null  float64
 8   dmin             19326 non-null  float64
 9   rms              70816 non-null  float64
 10  net              99559 non-null  object 
 11  id               99559 non-null  object 
 12  updated          99559 non-null  object 
 13  place            98727 non-null  object 
 14  type             99559 non-null  object 
 15  horizontalError  17951 non-null  float64
 16  depthError       49

In [6]:
# parse the ISO 8601 strings in 'time' as datetimes, then drop the timezone (values remain in UTC)
eq['time'] = pd.to_datetime(eq['time']).dt.tz_convert(None)

# do the same for the 'updated' column
eq['updated'] = pd.to_datetime(eq['updated']).dt.tz_convert(None)

eq.head()

Index,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,1900-10-09 12:25:00,57.09,-153.48,,7.86,mw,,,,,...,2022-05-09 14:44:17.838,"16 km SW of Old Harbor, Alaska",earthquake,,,,,reviewed,ushis,pt
1,1901-03-03 07:45:00,36.0,-120.5,,6.4,ms,,,,,...,2018-06-04 20:43:44.000,"12 km NNW of Parkfield, California",earthquake,,,,,reviewed,ushis,ell
2,1901-07-26 22:20:00,40.8,-115.7,,5.0,fa,,,,,...,2018-06-04 20:43:44.000,"6 km SE of Elko, Nevada",earthquake,,,,,reviewed,ushis,sjg
3,1901-12-30 22:34:00,52.0,-160.0,,7.0,ms,,,,,...,2018-06-04 20:43:44.000,south of Alaska,earthquake,,,,,reviewed,ushis,abe
4,1902-01-01 05:20:30,52.38,-167.45,,7.0,ms,,,,,...,2018-06-04 20:43:44.000,"113 km ESE of Nikolski, Alaska",earthquake,,,,,reviewed,ushis,abe
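
The conversion in cell 6 can be checked in isolation. The snippet below is a minimal sketch, assuming two sample strings copied from the `time` column; `utc=True` is an addition that is not in the original cell and simply makes the UTC interpretation explicit before the timezone is removed.

```python
import pandas as pd

# Two sample ISO 8601 strings in the same format as the 'time' column
raw = pd.Series(["1900-10-09T12:25:00.000Z", "2023-11-06T21:31:01.203Z"])

# utc=True (an assumption, not used in the original cell) yields timezone-aware UTC datetimes
aware = pd.to_datetime(raw, utc=True)

# tz_convert(None) keeps the UTC wall-clock values and removes the timezone
naive = aware.dt.tz_convert(None)

print(naive.dtype)
print(naive.dt.tz)              # None
print(naive.dt.year.tolist())   # [1900, 2023]
```

Once the column is timezone-naive, the `.dt` accessors used later (`.dt.year`, `.dt.month`, `.dt.time`) work directly on it.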


In [7]:
# list remaining null values
eq.isna().sum()

time                   0
latitude               0
longitude              0
depth                285
mag                    0
magType                0
nst                70588
gap                60295
dmin               80233
rms                28743
net                    0
id                     0
updated                0
place                832
type                   0
horizontalError    81608
depthError         49711
magError           66978
magNst             59967
status                 0
locationSource         0
magSource              0
dtype: int64

In [8]:
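# replace null values in the place column with 'Unknown' in the original eq dataframe
# (assigning the result back avoids the chained inplace fillna pattern, which is deprecated in recent pandas)
eq['place'] = eq['place'].fillna('Unknown')

# verify the 'Unknown' entries in the place column
unknown = eq.isin(['Unknown']).sum()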
unknown

time                 0
latitude             0
longitude            0
depth                0
mag                  0
magType              0
nst                  0
gap                  0
dmin                 0
rms                  0
net                  0
id                   0
updated              0
place              832
type                 0
horizontalError      0
depthError           0
magError             0
magNst               0
status               0
locationSource       0
magSource            0
dtype: int64

In [9]:
# drop rows where depth is null
eq.dropna(subset=['depth'], inplace=True)

In [10]:
# list remaining null values and verify the depth nulls are gone
eq.isna().sum()

time                   0
latitude               0
longitude              0
depth                  0
mag                    0
magType                0
nst                70303
gap                60010
dmin               79948
rms                28458
net                    0
id                     0
updated                0
place                  0
type                   0
horizontalError    81323
depthError         49426
magError           66693
magNst             59682
status                 0
locationSource         0
magSource              0
dtype: int64

In [11]:
# 'time' is already a datetime column after cell 6; this call is a harmless safeguard
eq['time'] = pd.to_datetime(eq['time'])

# Extract year from 'time' column
eq['year'] = eq['time'].dt.year

# reset the index, discarding the old one
eq = eq.reset_index(drop=True)

# shift the index so it starts at 1
eq.index = eq.index + 1

# display the dataframe
eq.head()

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,year
1,1904-04-04 10:26:00.880,41.758,23.249,15.0,7.02,mw,,,,,...,"7 km SE of Stara Kresna, Bulgaria",earthquake,,4.8,0.4,,reviewed,iscgem,iscgem,1904
2,1904-04-04 10:02:34.560,41.802,23.108,15.0,6.84,mw,,,,,...,"6 km W of Stara Kresna, Bulgaria",earthquake,,4.8,0.4,,reviewed,iscgem,iscgem,1904
3,1904-06-25 21:00:38.720,52.763,160.277,30.0,7.7,mw,,,,,...,"115 km ESE of Petropavlovsk-Kamchatsky, Russia",earthquake,,10.3,0.4,,reviewed,iscgem,iscgem,1904
4,1904-06-25 14:45:39.140,51.424,161.638,15.0,7.5,mw,,,,,...,"274 km SE of Petropavlovsk-Kamchatsky, Russia",earthquake,,25.0,0.4,,reviewed,iscgem,iscgem,1904
5,1904-08-30 11:43:20.850,30.684,100.608,15.0,7.09,mw,,,,,...,"150 km WNW of Kangding, China",earthquake,,25.0,0.4,,reviewed,iscgem,iscgem,1904


In [12]:
eq.type.value_counts()

type
earthquake           98784
nuclear explosion      424
volcanic eruption       54
explosion               10
rock burst               1
mine collapse            1
Name: count, dtype: int64

In [13]:
eq.magType.value_counts()

magType
mb            39788
mw            24311
mwc           17458
mww           10563
ms             3157
mwb            3060
ml              414
mwr             390
md               50
mh               23
m                21
uk                8
mwp               6
fa                5
Mi                4
ml(texnet)        4
ms_20             3
mc                2
mlg               1
Md                1
Ml                1
mb_lg             1
lg                1
Mb                1
ma                1
Name: count, dtype: int64

In [14]:
# magType column: keep only the six types that each have at least 500 rows.
# The other 19 types total 937 rows, about 0.94% of the 99,274 rows,
# so dropping these rare categories should have little effect on overall results.
mag_types = ['mb', 'mw', 'mwc', 'mww', 'ms', 'mwb']

# Keep only the rows with the specified magTypes
# (.copy() avoids a SettingWithCopyWarning when columns are added later)
eq = eq[eq['magType'].isin(mag_types)].copy()

eq.head()

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,year
1,1904-04-04 10:26:00.880,41.758,23.249,15.0,7.02,mw,,,,,...,"7 km SE of Stara Kresna, Bulgaria",earthquake,,4.8,0.4,,reviewed,iscgem,iscgem,1904
2,1904-04-04 10:02:34.560,41.802,23.108,15.0,6.84,mw,,,,,...,"6 km W of Stara Kresna, Bulgaria",earthquake,,4.8,0.4,,reviewed,iscgem,iscgem,1904
3,1904-06-25 21:00:38.720,52.763,160.277,30.0,7.7,mw,,,,,...,"115 km ESE of Petropavlovsk-Kamchatsky, Russia",earthquake,,10.3,0.4,,reviewed,iscgem,iscgem,1904
4,1904-06-25 14:45:39.140,51.424,161.638,15.0,7.5,mw,,,,,...,"274 km SE of Petropavlovsk-Kamchatsky, Russia",earthquake,,25.0,0.4,,reviewed,iscgem,iscgem,1904
5,1904-08-30 11:43:20.850,30.684,100.608,15.0,7.09,mw,,,,,...,"150 km WNW of Kangding, China",earthquake,,25.0,0.4,,reviewed,iscgem,iscgem,1904


In [15]:
eq.magType.value_counts()

magType
mb     39788
mw     24311
mwc    17458
mww    10563
ms      3157
mwb     3060
Name: count, dtype: int64
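
The hard-coded `mag_types` list can also be derived from the counts themselves. The following is a minimal sketch, assuming the `value_counts()` figures printed in cell 13; `MIN_ROWS` is a hypothetical name for the 500-row cutoff and is not part of the notebook.

```python
import pandas as pd

# magType counts copied from the value_counts() output of cell 13
counts = pd.Series({
    "mb": 39788, "mw": 24311, "mwc": 17458, "mww": 10563, "ms": 3157,
    "mwb": 3060, "ml": 414, "mwr": 390, "md": 50, "mh": 23, "m": 21,
    "uk": 8, "mwp": 6, "fa": 5, "Mi": 4, "ml(texnet)": 4, "ms_20": 3,
    "mc": 2, "mlg": 1, "Md": 1, "Ml": 1, "mb_lg": 1, "lg": 1, "Mb": 1, "ma": 1,
})

MIN_ROWS = 500  # hypothetical cutoff: keep types with at least this many rows

keep = counts[counts >= MIN_ROWS].index.tolist()
dropped_rows = int(counts[counts < MIN_ROWS].sum())
share = dropped_rows / counts.sum()

print(keep)                          # ['mb', 'mw', 'mwc', 'mww', 'ms', 'mwb']
print(dropped_rows, f"{share:.2%}")  # 937 0.94%
```

In the notebook itself, `counts` would come from `eq['magType'].value_counts()` and the filter would become `eq[eq['magType'].isin(keep)]`.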

In [16]:
# check updated dataframe
eq

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,year
1,1904-04-04 10:26:00.880,41.7580,23.2490,15.000,7.02,mw,,,,,...,"7 km SE of Stara Kresna, Bulgaria",earthquake,,4.800,0.400,,reviewed,iscgem,iscgem,1904
2,1904-04-04 10:02:34.560,41.8020,23.1080,15.000,6.84,mw,,,,,...,"6 km W of Stara Kresna, Bulgaria",earthquake,,4.800,0.400,,reviewed,iscgem,iscgem,1904
3,1904-06-25 21:00:38.720,52.7630,160.2770,30.000,7.70,mw,,,,,...,"115 km ESE of Petropavlovsk-Kamchatsky, Russia",earthquake,,10.300,0.400,,reviewed,iscgem,iscgem,1904
4,1904-06-25 14:45:39.140,51.4240,161.6380,15.000,7.50,mw,,,,,...,"274 km SE of Petropavlovsk-Kamchatsky, Russia",earthquake,,25.000,0.400,,reviewed,iscgem,iscgem,1904
5,1904-08-30 11:43:20.850,30.6840,100.6080,15.000,7.09,mw,,,,,...,"150 km WNW of Kangding, China",earthquake,,25.000,0.400,,reviewed,iscgem,iscgem,1904
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99270,2023-11-06 21:31:01.203,-7.4821,156.1245,38.062,5.30,mww,169.0,26.0,5.120,0.49,...,Unknown,earthquake,9.59,4.538,0.083,14.0,reviewed,us,us,2023
99271,2023-11-05 13:27:28.428,51.5184,178.3647,80.999,5.10,mb,71.0,143.0,0.457,0.81,...,"Rat Islands, Aleutian Islands, Alaska",earthquake,7.14,5.544,0.025,500.0,reviewed,us,us,2023
99272,2023-11-04 09:38:37.943,20.2138,147.6712,10.000,5.00,mb,98.0,77.0,7.112,0.59,...,Mariana Islands region,earthquake,9.40,1.837,0.032,305.0,reviewed,us,us,2023
99273,2023-11-03 18:32:14.876,50.0317,156.3754,65.215,5.00,mb,137.0,94.0,3.190,0.61,...,"73 km SSE of Severo-Kuril’sk, Russia",earthquake,9.83,5.143,0.044,174.0,reviewed,us,us,2023


In [17]:
# Extract month and day, then reduce 'time' to the time of day (year was extracted in cell 11)
eq['month'] = eq['time'].dt.month
eq['day'] = eq['time'].dt.day
eq['time'] = eq['time'].dt.time

# Columns to place first; the remaining columns keep their existing order
front_cols = ['year', 'month', 'day', 'time', 'type', 'mag', 'magType', 'depth', 'place', 'longitude', 'latitude']
new_order = front_cols + [col for col in eq.columns if col not in front_cols]

# Reorder the columns
eq = eq[new_order]

# Reset the index; the previous 1-based index is kept as a column named 'index'
eq.reset_index(inplace=True)
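
The reordering step can be wrapped in a small helper so the list of leading columns is written only once. This is a sketch on a toy frame; `move_to_front` is a hypothetical helper name, not something the notebook defines.

```python
import pandas as pd

def move_to_front(df: pd.DataFrame, front: list) -> pd.DataFrame:
    """Return df with the columns in `front` first and the rest in their original order."""
    rest = [col for col in df.columns if col not in front]
    return df[front + rest]

# Toy frame with a few of the notebook's column names
demo = pd.DataFrame({
    "time": ["10:26:00"],
    "latitude": [41.758],
    "mag": [7.02],
    "year": [1904],
})

reordered = move_to_front(demo, ["year", "time", "mag"])
print(reordered.columns.tolist())  # ['year', 'time', 'mag', 'latitude']
```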

In [18]:
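# Create a boolean mask for fully duplicated rows (keep=False marks every copy)
# Note: the 'index' column is unique per row, so a full-row comparison cannot flag any rows here
duplicate_mask = eq.duplicated(keep=False)

# Select the duplicate rows as a separate copy, leaving eq intact
duplicates = eq[duplicate_mask].copy()

# Sort by time of day for easier inspection
duplicates = duplicates.sort_values(by='time', ascending=True)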

duplicates

Unnamed: 0,index,year,month,day,time,type,mag,magType,depth,place,...,net,id,updated,horizontalError,depthError,magError,magNst,status,locationSource,magSource


In [19]:
# Group by all columns and get the size of each group
counts = duplicates.groupby(duplicates.columns.tolist()).size().reset_index(name='Frequency')

# Display rows that are duplicated
counts[counts['Frequency'] > 1]

Unnamed: 0,index,year,month,day,time,type,mag,magType,depth,place,...,id,updated,horizontalError,depthError,magError,magNst,status,locationSource,magSource,Frequency


In [20]:
# remove duplicate rows from the original eq dataframe
eq = eq.drop_duplicates(keep='first')

eq

Unnamed: 0,index,year,month,day,time,type,mag,magType,depth,place,...,net,id,updated,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,1,1904,4,4,10:26:00.880000,earthquake,7.02,mw,15.000,"7 km SE of Stara Kresna, Bulgaria",...,iscgem,iscgem16957813,2022-04-26 14:54:31.433,,4.800,0.400,,reviewed,iscgem,iscgem
1,2,1904,4,4,10:02:34.560000,earthquake,6.84,mw,15.000,"6 km W of Stara Kresna, Bulgaria",...,iscgem,iscgem610326271,2022-04-25 20:36:18.723,,4.800,0.400,,reviewed,iscgem,iscgem
2,3,1904,6,25,21:00:38.720000,earthquake,7.70,mw,30.000,"115 km ESE of Petropavlovsk-Kamchatsky, Russia",...,iscgem,iscgem16957819,2022-04-25 20:22:48.406,,10.300,0.400,,reviewed,iscgem,iscgem
3,4,1904,6,25,14:45:39.140000,earthquake,7.50,mw,15.000,"274 km SE of Petropavlovsk-Kamchatsky, Russia",...,iscgem,iscgem16957820,2022-05-09 22:48:24.972,,25.000,0.400,,reviewed,iscgem,iscgem
4,5,1904,8,30,11:43:20.850000,earthquake,7.09,mw,15.000,"150 km WNW of Kangding, China",...,iscgem,iscgem16957826,2022-04-25 20:23:00.657,,25.000,0.400,,reviewed,iscgem,iscgem
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98332,99270,2023,11,6,21:31:01.203000,earthquake,5.30,mww,38.062,Unknown,...,us,us7000l98e,2023-11-30 04:18:42.040,9.59,4.538,0.083,14.0,reviewed,us,us
98333,99271,2023,11,5,13:27:28.428000,earthquake,5.10,mb,80.999,"Rat Islands, Aleutian Islands, Alaska",...,us,us7000l90i,2023-12-03 13:37:40.037,7.14,5.544,0.025,500.0,reviewed,us,us
98334,99272,2023,11,4,09:38:37.943000,earthquake,5.00,mb,10.000,Mariana Islands region,...,us,us7000l8um,2023-12-02 09:48:49.538,9.40,1.837,0.032,305.0,reviewed,us,us
98335,99273,2023,11,3,18:32:14.876000,earthquake,5.00,mb,65.215,"73 km SSE of Severo-Kuril’sk, Russia",...,us,us7000l8pk,2023-11-29 22:27:54.040,9.83,5.143,0.044,174.0,reviewed,us,us


In [21]:
# Create a new boolean mask for duplicate rows after drop_duplicates
duplicate_mask2 = eq.duplicated(keep=False)

# Select any remaining duplicate rows using the new mask
remaining = eq[duplicate_mask2]

remaining = remaining.sort_values(by='time', ascending=True)

remaining

Unnamed: 0,index,year,month,day,time,type,mag,magType,depth,place,...,net,id,updated,horizontalError,depthError,magError,magNst,status,locationSource,magSource


In [22]:
# Group by all columns and get the size of each group
counts2 = remaining.groupby(remaining.columns.tolist()).size().reset_index(name='Frequency')

# Display rows that are duplicated
counts2[counts2['Frequency'] > 1]

Unnamed: 0,index,year,month,day,time,type,mag,magType,depth,place,...,id,updated,horizontalError,depthError,magError,magNst,status,locationSource,magSource,Frequency
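
Because `eq` carries an `index` column whose values are unique, a full-row `duplicated()` call treats every row as distinct. The sketch below uses a toy frame with hypothetical `id` values to show how excluding that helper column changes the result; whether the real dataset contains such repeats is not established here.

```python
import pandas as pd

# Toy frame: rows 0 and 1 describe the same event but carry different 'index' values
demo = pd.DataFrame({
    "index": [1, 2, 3],
    "id": ["ev_a", "ev_a", "ev_b"],  # hypothetical event ids
    "mag": [5.1, 5.1, 6.0],
})

# Full-row check: the unique 'index' column makes every row distinct
full_row = demo.duplicated(keep=False)

# Same check, ignoring the helper column
cols = [c for c in demo.columns if c != "index"]
without_index = demo.duplicated(subset=cols, keep=False)

print(full_row.tolist())       # [False, False, False]
print(without_index.tolist())  # [True, True, False]
```

The same `subset=` argument is accepted by `drop_duplicates`, so the check and the removal can use an identical column list.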


In [23]:
# summary statistics for the updated dataframe
eq.describe()

Unnamed: 0,index,year,month,day,mag,depth,longitude,latitude,nst,gap,dmin,rms,updated,horizontalError,depthError,magError,magNst
count,98337.0,98337.0,98337.0,98337.0,98337.0,98337.0,98337.0,98337.0,28320.0,38451.0,18860.0,70006.0,98337,17406.0,49126.0,32166.0,39033.0
mean,49570.897394,1991.620855,6.547027,15.232568,5.455373,62.755887,42.271792,3.156597,159.917585,61.445453,4.335152,0.968218,2019-07-21 19:38:40.907754752,7.60977,8.073112,0.177137,53.674557
min,1.0,1904.0,1.0,1.0,5.0,-4.0,-179.997,-77.08,0.0,6.5,0.001376,-1.0,2013-09-25 17:45:49,0.08,-1.0,0.019,0.0
25%,24793.0,1979.0,4.0,8.0,5.1,13.812,-71.651,-17.879,67.0,35.6,1.319,0.83,2014-11-07 01:49:46.140999936,6.1,2.5,0.061,12.0
50%,49535.0,1994.0,7.0,15.0,5.3,33.0,100.794,-1.549,118.0,54.0,2.715,0.98,2022-04-25 22:14:40.217999872,7.5,5.0,0.1,27.0
75%,74273.0,2010.0,10.0,23.0,5.7,51.4,142.961,28.605,214.0,79.0,5.26725,1.1,2022-04-29 16:54:17.796999936,9.1,9.8,0.23,60.0
max,99274.0,2023.0,12.0,31.0,9.5,700.0,180.0,87.386,929.0,340.0,50.901,2.53,2023-12-03 15:35:48.239000,21.1,744.1,1.84,941.0
std,28624.950514,24.013733,3.442391,8.475996,0.485497,109.113391,121.507942,29.99177,128.713435,35.086144,5.173698,0.217974,,2.399338,9.757055,0.157564,79.101235


In [24]:
# info for updated dataframe
eq.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 98337 entries, 0 to 98336
Data columns (total 26 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   index            98337 non-null  int64         
 1   year             98337 non-null  int32         
 2   month            98337 non-null  int32         
 3   day              98337 non-null  int32         
 4   time             98337 non-null  object        
 5   type             98337 non-null  object        
 6   mag              98337 non-null  float64       
 7   magType          98337 non-null  object        
 8   depth            98337 non-null  float64       
 9   place            98337 non-null  object        
 10  longitude        98337 non-null  float64       
 11  latitude         98337 non-null  float64       
 12  nst              28320 non-null  float64       
 13  gap              38451 non-null  float64       
 14  dmin             18860 non-null  float

In [25]:
# Drop columns where more than 5% of the values are NaN (left commented out for now)
# eq = eq.drop(columns=['nst','gap','dmin','rms','horizontalError', 'depthError','magError','magNst'])
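
The column list in the commented-out line can be computed from the data instead of typed by hand. This is a minimal sketch on a toy frame, assuming the 5% cutoff stated in the comment; `MAX_NAN_SHARE` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Toy frame: 'nst' is half missing, the other columns are complete
demo = pd.DataFrame({
    "mag": [5.0, 5.3, 6.1, 5.7],
    "depth": [10.0, 33.0, 15.0, 38.0],
    "nst": [np.nan, np.nan, 71.0, 98.0],
})

MAX_NAN_SHARE = 0.05  # hypothetical name for the 5% cutoff

# isna().mean() gives the fraction of NaN values in each column
nan_share = demo.isna().mean()
to_drop = nan_share[nan_share > MAX_NAN_SHARE].index.tolist()
trimmed = demo.drop(columns=to_drop)

print(to_drop)                   # ['nst']
print(trimmed.columns.tolist())  # ['mag', 'depth']
```

Applied to `eq`, the null counts from cell 10 suggest this would select the same eight columns named in the commented-out `drop` call, since each of them is missing in well over 5% of rows.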