Let's load our data and do a little cleanup: the date field needs parsing, and there are a few fields (the URL columns) we will definitely not need.

In [1]:
import pandas as pd

QUAKE = '/kaggle/input/recent-earthquakes/earthquakes.csv'

# Index by event id, parse the date column, and drop the URL columns we won't use
df = pd.read_csv(
    filepath_or_buffer=QUAKE,
    index_col=['id'],
    parse_dates=['date'],
).drop(columns=['url', 'detailUrl'])
df.head()

id,magnitude,type,title,date,time,updated,felt,cdi,mmi,alert,...,location,continent,country,subnational,city,locality,postcode,what3words,timezone,locationDetails
us7000necw,4.8,earthquake,"M 4.8 - 33 km WSW of Ackerly, Texas",2024-09-17 00:49:42,1726534182289,1726583895255,1893,6,5,green,...,"Ackerly, Texas",North America,United States of America (the),Texas,Tarzan-Lenorah,Tarzan-Lenorah,79783.0,landmass.perkily.affords,-300,"[{'id': '80684', 'wikidataId': '', 'name': '79..."
tx2024shcj,5.1,earthquake,"M 5.1 - 34 km WSW of Ackerly, Texas",2024-09-17 00:49:42,1726534182183,1726672002991,2042,6,5,green,...,"Ackerly, Texas",North America,United States of America (the),Texas,Tarzan-Lenorah,Tarzan-Lenorah,79331.0,escalator.grownups.dwell,-300,"[{'id': '89341', 'wikidataId': '', 'name': '48..."
ci40734823,3.7,earthquake,"M 3.7 - 6 km N of Malibu, CA",2024-09-16 11:22:08,1726485728190,1726637414586,1580,4,4,,...,"Malibu, CA",North America,United States of America (the),California,Los Angeles,Agoura Hills-Malibu,90265.0,clocking.uploaded.issuer,-420,"[{'id': '93478', 'wikidataId': 'Q844837', 'nam..."
tx2024scvz,3.9,earthquake,"M 3.9 - 58 km S of Whites City, New Mexico",2024-09-14 17:01:06,1726333266539,1726584426218,5,3,4,green,...,"Whites City, New Mexico",North America,United States of America (the),Texas,Van Horn,Van Horn,,sailboats.sawn.speeding,-300,"[{'id': '9', 'wikidataId': 'Q49', 'name': 'Nor..."
us7000ndte,4.1,earthquake,"M 4.1 - 60 km S of Whites City, New Mexico",2024-09-14 17:01:06,1726333266382,1726334616179,4,3,4,green,...,"Whites City, New Mexico",North America,United States of America (the),Texas,Van Horn,Van Horn,,spinners.downtime.computes,-300,"[{'id': '9', 'wikidataId': 'Q49', 'name': 'Nor..."
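
The `time` and `updated` columns look like Unix timestamps in milliseconds. That is an assumption based on the values in the preview, not something the dataset documents. If it holds, here is a minimal sketch of parsing them, using two rows copied from the preview so it runs on its own:

```python
import pandas as pd

# Two rows of `time` / `updated` copied from the preview above.
# Assumption: both columns are Unix epoch timestamps in milliseconds.
sample = pd.DataFrame(
    {
        "time": [1726534182289, 1726485728190],
        "updated": [1726583895255, 1726637414586],
    },
    index=pd.Index(["us7000necw", "ci40734823"], name="id"),
)

# Convert each epoch-millisecond column to a pandas datetime column
for column in ["time", "updated"]:
    sample[column] = pd.to_datetime(sample[column], unit="ms")

print(sample)
```

The parsed `time` values line up with the `date` column shown for the same rows, which supports the millisecond reading. The same loop should work on the full `df` if the columns are clean integers.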


It's kind of neat to see a dataset with what3words data, but unfortunately we don't have a way to use that data natively. We'll have to get by with good old-fashioned latitude and longitude. Let's make a map.

Let's use a dark basemap to make our data points really pop, especially since earthquakes tend to follow tectonic features such as plate boundaries more than they do, say, population centers.

In [2]:
from plotly import express

express.scatter_mapbox(
    data_frame=df,
    lat='latitude',
    lon='longitude',
    color='magnitude',
    hover_name='location',
    mapbox_style='carto-darkmatter',
    zoom=1,
    height=800,
)

We know that magnitude is a logarithmic scale: a magnitude 5 earthquake has ten times the recorded wave amplitude of a magnitude 4 earthquake. Small earthquakes are also far more frequent than large ones, so we might expect to see many more low-magnitude earthquakes than high-magnitude ones.
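
To make the log scale concrete, here is a small worked example. The helper names are made up for illustration. The energy factor uses the standard rule of thumb that radiated energy grows by a factor of 10^1.5 per magnitude unit, which is an approximation and not something taken from this dataset:

```python
def amplitude_ratio(m1: float, m2: float) -> float:
    """How many times larger the recorded wave amplitude is at m1 than at m2."""
    return 10 ** (m1 - m2)


def energy_ratio(m1: float, m2: float) -> float:
    """Approximate ratio of radiated energy, assuming log10(E) scales as 1.5 * M."""
    return 10 ** (1.5 * (m1 - m2))


print(amplitude_ratio(5, 4))         # 10
print(round(energy_ratio(5, 4), 1))  # 31.6
```

So one step up the scale is a factor of ten in amplitude, but closer to a factor of 32 in energy.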

In [3]:
express.histogram(data_frame=df, x='magnitude')

Instead we see a bimodal distribution with one mode at 5.5 and another in the 3-4 range. What does this mean? Most likely our dataset is missing many small earthquakes that happened in places without sensors sensitive enough to register them.
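
One rough way to probe that explanation is to compare the smallest recorded magnitude per region: if coverage is uneven, well-instrumented regions should show a much lower floor. The sketch below uses a toy frame with invented values so that it runs on its own. `magnitude_floor` is a hypothetical helper, and the real check would pass `df` instead of `toy`:

```python
import pandas as pd


def magnitude_floor(frame: pd.DataFrame, by: str = "continent") -> pd.DataFrame:
    """Count, minimum, and median magnitude per group, lowest floor first."""
    return (
        frame.groupby(by)["magnitude"]
        .agg(["count", "min", "median"])
        .sort_values("min")
    )


# Toy values invented purely for illustration; they are NOT from the dataset.
toy = pd.DataFrame(
    {
        "continent": ["North America"] * 3 + ["Asia"] * 3,
        "magnitude": [3.1, 3.7, 4.8, 5.2, 5.5, 6.0],
    }
)

summary = magnitude_floor(toy)
print(summary)
```

If the real data shows the 3-4 mode coming from only a few regions and the 5.5 mode from everywhere else, that would be consistent with a detection or reporting threshold, not with a genuine shortage of small earthquakes.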