**The objective of this project is to practice data manipulation with Python and to plot data onto a map using geolocation data from a CSV file. The dataset contains information about the earthquakes registered in Costa Rica from 1/31/2021 to 3/1/2021, with 40 data points describing date, time, magnitude, depth, location, latitude, and longitude.**

In [1]:
#Imports of the modules we are going to use
import os
import pandas as pd

In [2]:
#This is the absolute path of the folder where the cr_earthquakes dataset lives; it is going to be our working directory
path = "C:\\Users\\jocerdas\\OneDrive - Microsoft\\Documents\\Python\\Python Projects\\earthquakers\\cr_earthquakes"

In [3]:
#We change the directory and confirm it was set correctly
os.chdir(path)
os.getcwd()

'c:\\Users\\jocerdas\\OneDrive - Microsoft\\Documents\\Python\\Python Projects\\earthquakers\\cr_earthquakes'
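As an alternative to escaping every backslash and changing the working directory, the path can be built with pathlib. This is a minimal sketch, not the approach the notebook uses: it relies on `PureWindowsPath` so that it behaves the same on any operating system, and the names `base` and `csv_path` are chosen here for illustration.

```python
from pathlib import PureWindowsPath

# A raw string keeps backslashes literally, so no escaping is needed
base = PureWindowsPath(r"C:\Users\jocerdas\OneDrive - Microsoft\Documents\Python\Python Projects\earthquakers\cr_earthquakes")

# The / operator joins path components
csv_path = base / "data.csv"

print(csv_path.name)         # file name only
print(csv_path.parent.name)  # name of the folder that contains it
```

On Windows, `pathlib.Path` can be used the same way, and the resulting object can be passed directly to `pd.read_csv`.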

In [5]:
#Reading the dataset from the data.csv file
df = pd.read_csv("data.csv")
df.head(10)

Unnamed: 0,Fecha,Hora Local,Magnitud,Profundidad en km,Localizacion,Latitud,Longitud
0,1/31/2021,16:36:34,3.9,12.0,"25 km SO de jacó, Puntarenas",9.5208,-84.8364
1,1/31/2021,16:30:56,4.2,22.0,"28 km SO de Jacó, Puntarenas",9.499,-84.864
2,1/29/2021,6:41:34,6.0,15.0,"37.6 km hacia el Suroeste de Puerto Armuelles,...",7.9451,-82.9095
3,1/29/2021,5:23:24,4.9,19.0,"14.2 km al SurOeste de Limones, Punta Burica, ...",7.9623,-82.9021
4,1/23/2021,16:15:00,3.4,35.0,"9 km Nor Oeste de Tamarindo, Guanacaste",10.3423,-85.9112
5,1/21/2021,7:06:00,3.7,6.0,"2,3. km hacia el noroeste de Orosi.",9.8152,-83.865
6,1/20/2021,1:17:00,4.1,25.0,15 km Sur de la Cuesta de Corredores. Zona Sur,8.3412,-82.8522
7,1/8/2021,19:33:00,2.2,8.0,2 km al Norte de San Antonio de Desamparados,9.9151,-84.0475
8,1/7/2021,12:48:00,2.5,6.0,Deasamparados,9.8947,-84.0639
9,1/6/2021,17:55:00,4.0,25.0,5 km al Noreste de Rivas de Pérez Zeledón,9.4613,-83.6398
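Accented place names such as Jacó or Pérez Zeledón can come out garbled when a CSV file is read with an encoding that does not match the one it was saved in. The sketch below shows the `encoding` parameter of `pd.read_csv` on a small in-memory sample. The Latin-1 encoding is an assumption made for illustration; the actual encoding of data.csv is not stated in this notebook.

```python
import io
import pandas as pd

# One sample row saved as Latin-1 bytes (assumed encoding, for illustration only)
raw = 'Localizacion,Latitud,Longitud\n"28 km SO de Jacó, Puntarenas",9.499,-84.864\n'.encode("latin-1")

# Passing the matching encoding keeps the accented characters intact
sample = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
print(sample.loc[0, "Localizacion"])
```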


In [6]:
#We drop the column "Localizacion", since we are going to use the geopy library to look up the location from the coordinates
df = df.drop(columns=["Localizacion"])

In [7]:
#Confirm the column "Localizacion" was dropped
df.head(10)

Unnamed: 0,Fecha,Hora Local,Magnitud,Profundidad en km,Latitud,Longitud
0,1/31/2021,16:36:34,3.9,12.0,9.5208,-84.8364
1,1/31/2021,16:30:56,4.2,22.0,9.499,-84.864
2,1/29/2021,6:41:34,6.0,15.0,7.9451,-82.9095
3,1/29/2021,5:23:24,4.9,19.0,7.9623,-82.9021
4,1/23/2021,16:15:00,3.4,35.0,10.3423,-85.9112
5,1/21/2021,7:06:00,3.7,6.0,9.8152,-83.865
6,1/20/2021,1:17:00,4.1,25.0,8.3412,-82.8522
7,1/8/2021,19:33:00,2.2,8.0,9.9151,-84.0475
8,1/7/2021,12:48:00,2.5,6.0,9.8947,-84.0639
9,1/6/2021,17:55:00,4.0,25.0,9.4613,-83.6398


In [8]:
#Let's rename the columns from Spanish to English
df = df.rename(columns={
    "Fecha": "Date", "Hora Local": "Local Time", "Magnitud": "Magnitude",
    "Profundidad en km": "Deepth(KM)", "Latitud": "Latitude", "Longitud": "Longitude",
})

In [9]:
#Confirm the columns were renamed correctly
df.head(10)

Unnamed: 0,Date,Local Time,Magnitude,Deepth(KM),Latitude,Longitude
0,1/31/2021,16:36:34,3.9,12.0,9.5208,-84.8364
1,1/31/2021,16:30:56,4.2,22.0,9.499,-84.864
2,1/29/2021,6:41:34,6.0,15.0,7.9451,-82.9095
3,1/29/2021,5:23:24,4.9,19.0,7.9623,-82.9021
4,1/23/2021,16:15:00,3.4,35.0,10.3423,-85.9112
5,1/21/2021,7:06:00,3.7,6.0,9.8152,-83.865
6,1/20/2021,1:17:00,4.1,25.0,8.3412,-82.8522
7,1/8/2021,19:33:00,2.2,8.0,9.9151,-84.0475
8,1/7/2021,12:48:00,2.5,6.0,9.8947,-84.0639
9,1/6/2021,17:55:00,4.0,25.0,9.4613,-83.6398


In [10]:
#geopy's reverse() function accepts the coordinates as a single "latitude, longitude" string, so let's create a column called Coordinates by converting the latitude and longitude values to strings and concatenating them
df["Coordinates"] = df["Latitude"].astype(str) + ", " + df["Longitude"].astype(str)

In [11]:
#Now we can see the column Coordinates holding both latitude and longitude as a single string
df.head()

Unnamed: 0,Date,Local Time,Magnitude,Deepth(KM),Latitude,Longitude,Coordinates
0,1/31/2021,16:36:34,3.9,12.0,9.5208,-84.8364,"9.5208, -84.8364"
1,1/31/2021,16:30:56,4.2,22.0,9.499,-84.864,"9.499, -84.864"
2,1/29/2021,6:41:34,6.0,15.0,7.9451,-82.9095,"7.9451, -82.9095"
3,1/29/2021,5:23:24,4.9,19.0,7.9623,-82.9021,"7.9623, -82.9021"
4,1/23/2021,16:15:00,3.4,35.0,10.3423,-85.9112,"10.3423, -85.9112"
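geopy's reverse() also accepts a (latitude, longitude) tuple, so building a string column is one of several options. Here is a minimal sketch of the tuple form, assuming a small hypothetical DataFrame named `sample` with the same Latitude and Longitude columns:

```python
import pandas as pd

# Hypothetical two-row sample mirroring the notebook's columns
sample = pd.DataFrame({"Latitude": [9.5208, 7.9451], "Longitude": [-84.8364, -82.9095]})

# Pair each latitude with its longitude as a plain (lat, lon) tuple of floats
pairs = list(zip(sample["Latitude"].tolist(), sample["Longitude"].tolist()))
print(pairs[0])
```

Keeping the values numeric avoids a round trip through strings, while the string column remains convenient for display in hover text.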


Let's use the geopy library to look up, from the coordinates, the location where each earthquake was registered:
https://pypi.org/project/geopy/


In [12]:
#Importing the geopy library
from geopy.geocoders import Nominatim

In [12]:
#This empty list will hold each location found based on the coordinates
location = []

In [13]:
#Create the Nominatim geocoder; user_agent identifies our application to the service
geolocator = Nominatim(user_agent="Earthquakes in CR")

In [15]:
#We loop through the rows in the dataset and call the reverse() function to find the location based on the coordinates; str() turns a missing result into the text "None"
for index, row in df.iterrows():
    location.append(str(geolocator.reverse(row["Coordinates"])))
    print(str(index) + " " + location[index])

0 None
1 None
2 Chiriquí, Panamá
3 Chiriquí, Panamá
4 Carillo, Provincia Guanacaste, 50302, Costa Rica
5 Vía 224, Dulce Nombre, Cantón Cartago, Provincia Cartago, 30109, Costa Rica
6 Finca Guayacán, Rodolfo Aguilar Delgado, Distrito Barú, Chiriquí, Panamá
7 Avenida 42, Precario Barrio Nuevo, Curridabat, Cantón Curridabat, Provincia San José, 11801, Costa Rica
8 PALI, Avenida 88, Barrio Fallas, Desamparados, Cantón Desamparados, Provincia San José, 10301, Costa Rica
9 San José, Rivas, Cantón Pérez Zeledón, Provincia San José, 11904, Costa Rica
10 Flor del Roble de Coto Brus, Gutiérrez Braun, Cantón Coto Brus, Provincia Puntarenas, 60806, Costa Rica
11 Barú, Cantón Pérez Zeledón, Provincia San José, 11909, Costa Rica
12 Provincia Puntarenas, Costa Rica
13 Calle 47, Lomas de San Francisco, San Francisco de Dos Ríos, San José, Cantón San José, Provincia San José, 10106, Costa Rica
14 La Rosticería, Vía 210, Barrio La Colina, San Antonio, Cantón Desamparados, Provincia San José, 11804, Cost
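The first two rows print as None because the lookup returned nothing for those coordinates, and str() turned the missing result into text. The sketch below keeps a missing result as a real None and otherwise returns the `address` attribute that geopy results carry. To run without network access it uses a hypothetical stand-in, `fake_reverse`, in place of `geolocator.reverse`; the helper name `address_or_none` is also made up for this example.

```python
from types import SimpleNamespace

def address_or_none(reverse, coordinates):
    """Return the address text for the coordinates, or None when nothing is found."""
    result = reverse(coordinates)
    return None if result is None else result.address

# Hypothetical stand-in for geolocator.reverse so the sketch runs offline
def fake_reverse(coordinates):
    known = {"7.9451, -82.9095": SimpleNamespace(address="Chiriquí, Panamá")}
    return known.get(coordinates)

found = address_or_none(fake_reverse, "7.9451, -82.9095")
missing = address_or_none(fake_reverse, "9.5208, -84.8364")
print(found)
print(missing)
```

With the real geocoder, the call would be `address_or_none(geolocator.reverse, row["Coordinates"])`, which lets missing locations be filtered later with the usual pandas null checks.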

In [27]:
#Store the geopy result for each row in a new Location column (empty when nothing is found)
df["Location"] = df["Coordinates"].apply(geolocator.reverse)

In [24]:
df.head(10)

Unnamed: 0,Date,Local Time,Magnitude,Deepth(KM),Latitude,Longitude,Coordinates,Location
0,1/31/2021,16:36:34,3.9,12.0,9.5208,-84.8364,"9.5208, -84.8364",
1,1/31/2021,16:30:56,4.2,22.0,9.499,-84.864,"9.499, -84.864",
2,1/29/2021,6:41:34,6.0,15.0,7.9451,-82.9095,"7.9451, -82.9095","(Chiriquí, Panamá, (8.139963649999999, -82.259..."
3,1/29/2021,5:23:24,4.9,19.0,7.9623,-82.9021,"7.9623, -82.9021","(Chiriquí, Panamá, (8.139963649999999, -82.259..."
4,1/23/2021,16:15:00,3.4,35.0,10.3423,-85.9112,"10.3423, -85.9112","(Carillo, Provincia Guanacaste, 50302, Costa R..."
5,1/21/2021,7:06:00,3.7,6.0,9.8152,-83.865,"9.8152, -83.865","(Vía 224, Dulce Nombre, Cantón Cartago, Provin..."
6,1/20/2021,1:17:00,4.1,25.0,8.3412,-82.8522,"8.3412, -82.8522","(Finca Guayacán, Rodolfo Aguilar Delgado, Dist..."
7,1/8/2021,19:33:00,2.2,8.0,9.9151,-84.0475,"9.9151, -84.0475","(Avenida 42, Precario Barrio Nuevo, Curridabat..."
8,1/7/2021,12:48:00,2.5,6.0,9.8947,-84.0639,"9.8947, -84.0639","(PALI, Avenida 88, Barrio Fallas, Desamparados..."
9,1/6/2021,17:55:00,4.0,25.0,9.4613,-83.6398,"9.4613, -83.6398","(San José, Rivas, Cantón Pérez Zeledón, Provin..."


Now that our dataset is ready, let's create a visualization using plotly: https://plotly.com/python/mapbox-layers/

In [28]:
import plotly.express as px

In [29]:
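#Plot each earthquake on a map; hovering over a point shows its date, time, magnitude, depth, and coordinates
fig = px.scatter_mapbox(df, lat="Latitude", lon="Longitude", hover_name="Date",
                        hover_data=["Local Time", "Magnitude", "Deepth(KM)", "Coordinates"],
                        color_discrete_sequence=["blue"])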
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

In [20]:
#Save the interactive map as a standalone HTML file
fig.write_html("C:\\Users\\jocerdas\\OneDrive - Microsoft\\Documents\\Python\\Python Projects\\earthquakers\\cr_earthquakes\\cr_earthquakes.html")

In [20]:
#Check the type of the Coordinates column
type(df["Coordinates"])

pandas.core.series.Series