## General - Exploratory Data Analysis 0

## Dataset and Environment Setup

### Earthquakes dataset

We will be working with [global earthquake data](https://earthquake.usgs.gov/earthquakes/feed/v1.0/csv.php) from the U.S. Geological Survey (USGS) "All Earthquakes, Past Month" feed. The feed lists seismic events recorded worldwide over the 30 days before the access date of October 23, 2025. The notebook loads a processed copy of this feed (`ordinal_data.csv`). That copy adds three ordinal columns: `mag_ordinal`, `depth_ordinal` and `gap_level`.

```python
import pandas as pd
import altair as alt
import numpy as np
```

```python
# Processed copy of the USGS feed with the added ordinal columns
earthquakes = pd.read_csv('../data/processed/ordinal_data.csv')
```

```python
earthquakes.head()
```

| | time | latitude | longitude | depth | mag | magType | nst | gap | dmin | rms | ... | horizontalError | depthError | magError | magNst | status | locationSource | magSource | mag_ordinal | depth_ordinal | gap_level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-10-23T22:11:40.587Z | 32.274 | -101.931 | 4.2122 | 1.4 | ml | 40.0 | 40.0 | 0.0 | 0.5 | ... | 0.0 | 0.813793 | 0.2 | 25.0 | automatic | tx | tx | Minor (<4.0) | Shallow (0-70 km) | high |
| 1 | 2025-10-23T22:09:24.260Z | 38.806835 | -122.751999 | -0.64 | 1.27 | md | 13.0 | 110.0 | 0.02331 | 0.04 | ... | 0.25 | 0.73 | 0.21 | 13.0 | automatic | nc | nc | Minor (<4.0) | Negative (<0 km) | moderate-low |
| 2 | 2025-10-23T22:08:01.540Z | 38.807835 | -122.751167 | 0.19 | 1.24 | md | 12.0 | 112.0 | 0.02369 | 0.02 | ... | 0.23 | 1.03 | 0.21 | 12.0 | automatic | nc | nc | Minor (<4.0) | Shallow (0-70 km) | moderate-low |
| 3 | 2025-10-23T22:07:48.630Z | 38.834332 | -122.796333 | 2.25 | 0.23 | md | 10.0 | 76.0 | 0.006201 | 0.02 | ... | 0.57 | 0.46 | 0.11 | 10.0 | automatic | nc | nc | Minor (<4.0) | Shallow (0-70 km) | moderate-high |
| 4 | 2025-10-23T22:01:31.590Z | 38.808998 | -122.811668 | 3.66 | 0.74 | md | 10.0 | 83.0 | 0.01283 | 0.02 | ... | 0.4 | 1.18 | 0.07 | 10.0 | automatic | nc | nc | Minor (<4.0) | Shallow (0-70 km) | moderate-high |
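The `depth_ordinal` column comes from the processed file, and this notebook does not show how it was built. The sketch below shows one way such a column could be derived with `pd.cut`. It uses four bins, which matches the four distinct `depth_ordinal` values reported by `nunique()` later on. Several parts are assumptions:

- the bin edges at 70 km and 300 km;
- the left-closed intervals;
- the labels `Intermediate (70-300 km)` and `Deep (>300 km)`.

Only `Negative (<0 km)` and `Shallow (0-70 km)` actually appear in the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical bin edges: only the "Negative (<0 km)" and "Shallow (0-70 km)"
# labels appear in the notebook output; the two deeper classes are assumed,
# using the conventional 70 km / 300 km depth boundaries.
DEPTH_BINS = [-np.inf, 0, 70, 300, np.inf]
DEPTH_LABELS = [
    "Negative (<0 km)",
    "Shallow (0-70 km)",
    "Intermediate (70-300 km)",  # assumed label
    "Deep (>300 km)",            # assumed label
]


def bin_depth(depth: pd.Series) -> pd.Series:
    """Map depth in km to an ordered categorical using left-closed bins [a, b)."""
    return pd.cut(depth, bins=DEPTH_BINS, labels=DEPTH_LABELS, right=False)


# Depths taken from the head() output plus the maximum from describe()
sample = pd.Series([4.2122, -0.64, 0.19, 643.06], name="depth")
depth_ordinal = bin_depth(sample)
print(depth_ordinal.tolist())
```

`right=False` makes a depth of exactly 0 km fall in the shallow bin rather than the negative one, which fits the `<0 km` wording of the label.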


```python
earthquakes.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6281 entries, 0 to 6280
Data columns (total 25 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   time             6281 non-null   object
 1   latitude         6281 non-null   float64
 2   longitude        6281 non-null   float64
 3   depth            6281 non-null   float64
 4   mag              6281 non-null   float64
 5   magType          6281 non-null   object
 6   nst              6281 non-null   float64
 7   gap              6281 non-null   float64
 8   dmin             6281 non-null   float64
 9   rms              6281 non-null   float64
 10  net              6281 non-null   object
 11  id               6281 non-null   object
 12  updated          6281 non-null   object
 13  place            6281 non-null   object
 14  type             6281 non-null   object
 15  horizontalError  6281 non-null   float64
 16  depthError       6281 non-null   float64
 17  magError
```
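`info()` reports `time` and `updated` as `object`, which means the ISO 8601 timestamps are still plain strings. The sketch below converts them to timezone-aware datetimes with `pd.to_datetime`. It uses three timestamps from the `head()` output as stand-in data.

```python
import pandas as pd

# A few timestamps copied from the earthquakes.head() output above.
df = pd.DataFrame({
    "time": [
        "2025-10-23T22:11:40.587Z",
        "2025-10-23T22:09:24.260Z",
        "2025-10-23T22:01:31.590Z",
    ]
})

# Parse ISO 8601 strings with a trailing "Z" into timezone-aware UTC datetimes.
df["time"] = pd.to_datetime(df["time"], utc=True)

# Time span covered by the sample, as a Timedelta.
span = df["time"].max() - df["time"].min()
print(df["time"].dtype)
print(span)
```

On the full DataFrame, the same call would be `earthquakes['time'] = pd.to_datetime(earthquakes['time'], utc=True)`. The `parse_dates` argument of `pd.read_csv` can also handle this at load time.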

#### Viewing summary statistics for quantitative columns

```python
earthquakes.describe()
```

| | latitude | longitude | depth | mag | nst | gap | dmin | rms | horizontalError | depthError | magError | magNst |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 6281.0 | 6281.0 | 6281.0 | 6281.0 | 6281.0 | 6281.0 | 6281.0 | 6281.0 | 6281.0 | 6281.0 | 6281.0 | 6281.0 |
| mean | 35.400717 | -95.004115 | 16.826674 | 1.613285 | 25.230059 | 103.606591 | 0.445949 | 0.226609 | 1.722102 | 2.579109 | 0.164137 | 21.754498 |
| std | 16.86801 | 77.723609 | 53.227923 | 1.415182 | 23.649614 | 61.212456 | 1.500235 | 0.261649 | 3.087129 | 5.467239 | 0.092823 | 37.451734 |
| min | -60.3594 | -179.9723 | -3.18 | -1.44 | 0.0 | 12.0 | 0.0 | 0.0 | 0.0 | 0.12 | 0.0 | 1.0 |
| 25% | 32.115 | -122.78717 | 2.98 | 0.76 | 10.0 | 61.0 | 0.01523 | 0.08 | 0.28 | 0.52 | 0.1 | 7.0 |
| 50% | 36.511333 | -116.947167 | 6.99 | 1.27 | 18.0 | 85.0 | 0.05716 | 0.15 | 0.48 | 0.85 | 0.15 | 12.0 |
| 75% | 44.5715 | -102.16 | 10.24 | 1.96 | 33.0 | 130.0 | 0.1214 | 0.21 | 1.012044 | 1.891 | 0.2 | 24.0 |
| max | 85.0643 | 179.9355 | 643.06 | 7.6 | 401.0 | 345.0 | 41.153 | 1.57 | 21.34 | 42.4 | 1.17 | 756.0 |
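By default, `describe()` only summarises the numeric columns, so text columns such as `magType`, `net` and `status` are left out of the table above. Passing `exclude=[np.number]` gives the text-column summary instead. That summary reports the count, the number of unique values, the most frequent value (`top`) and its frequency (`freq`). The sketch below runs on a few rows copied from `head()` rather than on the full data.

```python
import numpy as np
import pandas as pd

# Stand-in rows copied from the head() output above.
df = pd.DataFrame({
    "mag": [1.4, 1.27, 1.24, 0.23, 0.74],
    "magType": ["ml", "md", "md", "md", "md"],
    "locationSource": ["tx", "nc", "nc", "nc", "nc"],
})

# Excluding numeric dtypes switches describe() to the text-column summary:
# count, number of unique values, most frequent value (top) and its frequency.
text_summary = df.describe(exclude=[np.number])
print(text_summary)
```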


```python
# Distinct event types, how often each occurs, and each type's share of all rows (%)
unique = earthquakes['type'].unique()
countunique = earthquakes['type'].value_counts()
percent = earthquakes['type'].value_counts(normalize=True) * 100

print(f"Event types = {unique}")
print(f"Count of event types = {countunique}")
print(f"Percentage of event types = {percent}")
```

```
Event types = ['earthquake' 'quarry blast' 'explosion']
Count of event types = type
earthquake      6143
explosion         72
quarry blast      66
Name: count, dtype: int64
Percentage of event types = type
earthquake      97.802898
explosion        1.146314
quarry blast     1.050788
Name: proportion, dtype: float64
```
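The cell above prints the counts and the percentages as two separate Series. The sketch below puts both in one table. It runs on a small, made-up Series standing in for `earthquakes['type']`, not on the real data.

```python
import pandas as pd

# Made-up stand-in for earthquakes['type'] (10 rows, not the real counts).
types = pd.Series(
    ["earthquake"] * 6 + ["explosion"] * 3 + ["quarry blast"],
    name="type",
)

# Absolute counts and rounded percentages side by side, indexed by event type.
summary = pd.DataFrame({
    "count": types.value_counts(),
    "percent": types.value_counts(normalize=True).mul(100).round(2),
})
print(summary)
```

On the real data, `types` would be `earthquakes['type']`. About 2.2% of rows are explosions or quarry blasts. An earthquake-only analysis could therefore filter first with `earthquakes[earthquakes['type'] == 'earthquake']`.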


```python
earthquakes.nunique()
```

```
time               6281
latitude           5074
longitude          5265
depth              3330
mag                 492
magType               7
nst                 146
gap                 319
dmin               4440
rms                 159
net                  12
id                 6281
updated            6281
place              3334
type                  3
horizontalError    1885
depthError         2415
magError           2165
magNst              208
status                2
locationSource       12
magSource            12
mag_ordinal           5
depth_ordinal         4
gap_level             4
dtype: int64
```
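Two patterns stand out in the `nunique()` output:

- `time`, `id` and `updated` each have 6281 distinct values, one per row.
- Several text columns have only a handful of levels.

The sketch below picks out both groups from the counts. The numbers are copied from the output above, and the cut-off of 12 levels is an arbitrary, hypothetical choice.

```python
import pandas as pd

# nunique() counts copied from the output above.
n_unique = pd.Series({
    "time": 6281, "latitude": 5074, "longitude": 5265, "depth": 3330,
    "mag": 492, "magType": 7, "nst": 146, "gap": 319, "dmin": 4440,
    "rms": 159, "net": 12, "id": 6281, "updated": 6281, "place": 3334,
    "type": 3, "horizontalError": 1885, "depthError": 2415,
    "magError": 2165, "magNst": 208, "status": 2, "locationSource": 12,
    "magSource": 12, "mag_ordinal": 5, "depth_ordinal": 4, "gap_level": 4,
})
N_ROWS = 6281  # from RangeIndex in the info() output

# Hypothetical cut-off: columns with at most 12 distinct values are
# treated as candidates for the categorical dtype.
MAX_LEVELS = 12
candidates = n_unique[n_unique <= MAX_LEVELS].index.tolist()

# Columns with one distinct value per row are candidate identifiers.
unique_keys = n_unique[n_unique == N_ROWS].index.tolist()

print(candidates)
print(unique_keys)
```

`earthquakes[candidates] = earthquakes[candidates].astype('category')` would then convert the selected columns to the `category` dtype, which usually takes less memory than repeated strings. For `mag_ordinal`, `depth_ordinal` and `gap_level`, an ordered `pd.CategoricalDtype` would also keep their ranking. Building one requires the full list of levels in their correct order.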