# 🌍 Global Earthquake Severity & Tsunami Risk Analysis

## 📌 Project Overview
This project performs an end-to-end analysis of global seismic events from **2001 to 2022**. Using a dataset of 782 records, the analysis aims to identify key drivers of earthquake severity, predict missing alert levels using Machine Learning, and assess the probability of tsunami occurrences based on magnitude and location.

The project demonstrates a full data pipeline: from data cleaning and complex feature engineering to advanced geospatial visualization and predictive modeling.

## 🚀 Key Features & Methodologies

### 1. Data Cleaning & Feature Engineering
* **Geospatial Processing:** Extracted and standardized country names from unstructured location strings to handle missing values.
* **Time-Series Handling:** Converted mixed datetime formats to standard objects and extracted temporal features (Year).
* **Feature Creation:** Engineered a `mag_norm` (Magnitude Normalization) feature, computed as `10^magnitude / 3^magnitude` and rounded, to better compare energy release across different scales.
* **Data Imputation:** Addressed data quality issues in the `Country` and `Alert` columns.

### 2. Exploratory Data Analysis (EDA) & Visualization
Utilized **Plotly** to create interactive visualizations:
* **Geospatial Mapping:** Scatter and Density Mapbox visualizations to pinpoint high-risk zones.
* **Multivariate Analysis:** **Parallel Coordinates** plots to visualize the complex relationship between Magnitude, Depth, Intensity (MMI), and Significance.
* **Flow Analysis:** **Sankey Diagrams** to trace the volume of significant seismic events across top-affected countries like Indonesia and Japan.
* **Correlation Heatmaps:** Identified strong positive correlations between Significance (`sig`) and Alert Levels.

### 3. Machine Learning (Random Forest Classification)
* **Objective:** Fill the 366 missing values (~47% of the data) in the `alert` column.
* **Solution:** Implemented a **Random Forest Classifier** to impute these values based on seismic features (`sig`, `mmi`, `cdi`, `mag_norm`).
* **Result:** Achieved **84.6% cross-validation accuracy** and an **85.7% Average Prediction Confidence**, yielding a complete dataset for the subsequent risk analysis.

## 📊 Key Insights & Findings
* **Tsunami Threshold:** Earthquakes with a magnitude **≥ 6.75** showed a distinct increase in tsunami probability (jumping from ~60% to ~79%).
* **Data Anomalies:** Identified statistical anomalies in reporting from **Vanuatu** and the **Solomon Islands**, where alert levels were uniformly reported as "Green" despite high magnitudes, suggesting localized data quality issues.
* **High-Risk Zones:** Indonesia and Japan accounted for the highest density of significant events, but Chilean earthquakes showed a higher propensity for higher alert levels relative to frequency.

## 🛠️ Technologies Used
* **Language:** Python
* **Data Manipulation:** Pandas, NumPy
* **Visualization:** Plotly Express, Plotly Graph Objects
* **Machine Learning:** Scikit-Learn (RandomForestClassifier)

## 📂 Notebook Structure
1.  **Setup & Loading:** Library imports and data ingestion.
2.  **Preprocessing:** Null handling, string manipulation, and type conversion.
3.  **Visual Analysis:** Distribution plots, geospatial mapping, and correlation matrices.
4.  **ML Implementation:** Training the Random Forest model for imputation.
5.  **Tsunami Analysis:** Focused deep-dive into tsunami triggers.
6.  **Conclusion:** Summary of findings.

## 📬 Contact
**[Shafi Abdur Rahman]**

[shafi77rahman@gmail.com]

## Connect to Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Import libraries

In [None]:
import numpy as np
import pandas as pd

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from termcolor import colored

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

## Load Dataset

In [None]:
# Load data from Drive
path = '/content/drive/MyDrive/earthquake_data.csv'

In [None]:
df = pd.read_csv(path)
df.head(5)

Unnamed: 0,title,magnitude,date_time,cdi,mmi,alert,tsunami,sig,net,nst,dmin,gap,magType,depth,latitude,longitude,location,continent,country
0,"M 7.0 - 18 km SW of Malango, Solomon Islands",7.0,22-11-2022 02:03,8,7,green,1,768,us,117,0.509,17.0,mww,14.0,-9.7963,159.596,"Malango, Solomon Islands",Oceania,Solomon Islands
1,"M 6.9 - 204 km SW of Bengkulu, Indonesia",6.9,18-11-2022 13:37,4,4,green,0,735,us,99,2.229,34.0,mww,25.0,-4.9559,100.738,"Bengkulu, Indonesia",,
2,M 7.0 -,7.0,12/11/2022 7:09,3,3,green,1,755,us,147,3.125,18.0,mww,579.0,-20.0508,-178.346,,Oceania,Fiji
3,"M 7.3 - 205 km ESE of Neiafu, Tonga",7.3,11/11/2022 10:48,5,5,green,1,833,us,149,1.865,21.0,mww,37.0,-19.2918,-172.129,"Neiafu, Tonga",,
4,M 6.6 -,6.6,9/11/2022 10:14,0,2,green,1,670,us,131,4.998,27.0,mww,624.464,-25.5948,178.278,,,


## Dataset Overview

In [None]:
print(f'Number of rows: {df.shape[0]}')
print(f'Number of columns: {df.shape[1]}')
print(f'Duplicate entries: {df.duplicated().sum()}')

Number of rows: 782
Number of columns: 19
Duplicate entries: 0


In [None]:
df.isnull().sum()

Unnamed: 0,0
title,0
magnitude,0
date_time,0
cdi,0
mmi,0
alert,366
tsunami,0
sig,0
net,0
nst,0


Four columns contain all of the missing values:


*"alert":*  366

*"location":*  5

*"continent":*  576

*"country":* 298

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 782 entries, 0 to 781
Data columns (total 19 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   title      782 non-null    object 
 1   magnitude  782 non-null    float64
 2   date_time  782 non-null    object 
 3   cdi        782 non-null    int64  
 4   mmi        782 non-null    int64  
 5   alert      416 non-null    object 
 6   tsunami    782 non-null    int64  
 7   sig        782 non-null    int64  
 8   net        782 non-null    object 
 9   nst        782 non-null    int64  
 10  dmin       782 non-null    float64
 11  gap        782 non-null    float64
 12  magType    782 non-null    object 
 13  depth      782 non-null    float64
 14  latitude   782 non-null    float64
 15  longitude  782 non-null    float64
 16  location   777 non-null    object 
 17  continent  206 non-null    object 
 18  country    484 non-null    object 
dtypes: float64(6), int64(5), object(8)
memory usage: 1

In [None]:
df.describe(include='all')

Unnamed: 0,title,magnitude,date_time,cdi,mmi,alert,tsunami,sig,net,nst,dmin,gap,magType,depth,latitude,longitude,location,continent,country
count,782,782.0,782,782.0,782.0,416,782.0,782.0,782,782.0,782.0,782.0,782,782.0,782.0,782.0,777,206,484
unique,768,,773,,,4,,,11,,,,9,,,,413,6,49
top,M 6.9 -,,11/1/2022 12:39,,,green,,,us,,,,mww,,,,"Kirakira, Solomon Islands",Asia,Indonesia
freq,3,,3,,,325,,,747,,,,468,,,,17,100,110
mean,,6.941125,,4.33376,5.964194,,0.390026,870.108696,,230.250639,1.325757,25.03899,,75.883199,3.5381,52.609199,,,
std,,0.445514,,3.169939,1.462724,,0.488068,322.465367,,250.188177,2.218805,24.225067,,137.277078,27.303429,117.898886,,,
min,,6.5,,0.0,1.0,,0.0,650.0,,0.0,0.0,0.0,,2.7,-61.8484,-179.968,,,
25%,,6.6,,0.0,5.0,,0.0,691.0,,0.0,0.0,14.625,,14.0,-14.5956,-71.66805,,,
50%,,6.8,,5.0,6.0,,0.0,754.0,,140.0,0.0,20.0,,26.295,-2.5725,109.426,,,
75%,,7.1,,7.0,7.0,,1.0,909.75,,445.0,1.863,30.0,,49.75,24.6545,148.941,,,


## Colors

In [None]:
tsunami_colors = {
    'No Tsunami': '#1f77b4',  # blue
    'Tsunami': '#d62728'}     # red

In [None]:
alert_colors = {
    'green': '#43A047',
    'yellow': '#FBC02D',
    'orange': '#FB8C00',
    'red': '#E53935'}

## Create New Columns

### 'year' column

In [None]:
df.columns

Index(['title', 'magnitude', 'date_time', 'cdi', 'mmi', 'alert', 'tsunami',
       'sig', 'net', 'nst', 'dmin', 'gap', 'magType', 'depth', 'latitude',
       'longitude', 'location', 'continent', 'country'],
      dtype='object')

In [None]:
df['date_time'].head(10)

Unnamed: 0,date_time
0,22-11-2022 02:03
1,18-11-2022 13:37
2,12/11/2022 7:09
3,11/11/2022 10:48
4,9/11/2022 10:14
5,9/11/2022 9:51
6,9/11/2022 9:38
7,20-10-2022 11:57
8,22-09-2022 06:16
9,19-09-2022 18:05


In [None]:
# Convert date_time to datetime objects, handling mixed formats
df['date_time'] = pd.to_datetime(df['date_time'], format='mixed', dayfirst=True)
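
The raw column mixes two day-first layouts (`22-11-2022 02:03` and `12/11/2022 7:09`). As a rough illustration of what day-first parsing of mixed formats means, the sketch below tries each layout in turn using only the standard library. The helper `parse_day_first` and the `CANDIDATE_FORMATS` tuple are hypothetical names for this example; this is not how pandas implements `format='mixed'` internally.

```python
from datetime import datetime

# Candidate layouts seen in the raw column; both put the day first.
CANDIDATE_FORMATS = ("%d-%m-%Y %H:%M", "%d/%m/%Y %H:%M")


def parse_day_first(text):
    """Try each known layout in turn and return the first successful parse."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")


# Sample strings copied from the date_time preview.
samples = ["22-11-2022 02:03", "12/11/2022 7:09", "9/11/2022 10:14"]
parsed = [parse_day_first(s) for s in samples]
years = [d.year for d in parsed]
```

With day-first parsing, `12/11/2022` is read as 12 November 2022 rather than 11 December 2022, which is consistent with the converted values shown later in the notebook.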

In [None]:
df['year'] = df['date_time'].dt.year

In [None]:
df['year'].tail()

Unnamed: 0,year
777,2001
778,2001
779,2001
780,2001
781,2001


### Magnitude Normalization: 'mag_norm'

In [None]:
df["mag_norm"] = pow(10, df['magnitude']) / pow(3, df['magnitude'])
df['mag_norm'] = df['mag_norm'].round(0)
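
Since `10^M / 3^M` equals `(10/3)^M`, each additional magnitude unit multiplies `mag_norm` by roughly 3.33. The short sketch below repeats the same expression on plain floats so the scaling is easy to check by hand; the function name `mag_norm` is only an illustrative helper here, not part of the notebook. For comparison, radiated seismic energy is usually taken to grow by about 10^1.5 (roughly 31.6) per magnitude unit, so this feature is a milder rescaling rather than an energy estimate.

```python
def mag_norm(magnitude):
    """Same expression as the notebook cell: 10**M / 3**M, rounded to an integer."""
    return round(10 ** magnitude / 3 ** magnitude)


# Magnitudes that also appear in the dataframe previews.
for m in (6.9, 7.0, 8.0):
    print(m, mag_norm(m))

# Ratio between two magnitudes one unit apart, before rounding.
step_ratio = (10 ** 8.0 / 3 ** 8.0) / (10 ** 7.0 / 3 ** 7.0)
```

The values for 6.9, 7.0 and 8.0 can be compared against the `mag_norm` column in the dataframe previews.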

## Handling Missing Values

### Country column

In [None]:
mask = df['country'].isna() & df['location'].notna()

df.loc[mask, 'country'] = df.loc[mask, 'location'].str.split(',').str[-1].str.strip()

df.head(5)

Unnamed: 0,title,magnitude,date_time,cdi,mmi,alert,tsunami,sig,net,nst,...,gap,magType,depth,latitude,longitude,location,continent,country,year,mag_norm
0,"M 7.0 - 18 km SW of Malango, Solomon Islands",7.0,2022-11-22 02:03:00,8,7,green,1,768,us,117,...,17.0,mww,14.0,-9.7963,159.596,"Malango, Solomon Islands",Oceania,Solomon Islands,2022,4572.0
1,"M 6.9 - 204 km SW of Bengkulu, Indonesia",6.9,2022-11-18 13:37:00,4,4,green,0,735,us,99,...,34.0,mww,25.0,-4.9559,100.738,"Bengkulu, Indonesia",,Indonesia,2022,4054.0
2,M 7.0 -,7.0,2022-11-12 07:09:00,3,3,green,1,755,us,147,...,18.0,mww,579.0,-20.0508,-178.346,,Oceania,Fiji,2022,4572.0
3,"M 7.3 - 205 km ESE of Neiafu, Tonga",7.3,2022-11-11 10:48:00,5,5,green,1,833,us,149,...,21.0,mww,37.0,-19.2918,-172.129,"Neiafu, Tonga",,Tonga,2022,6562.0
4,M 6.6 -,6.6,2022-11-09 10:14:00,0,2,green,1,670,us,131,...,27.0,mww,624.464,-25.5948,178.278,,,,2022,2825.0


In [None]:
df.isnull().sum()

Unnamed: 0,0
title,0
magnitude,0
date_time,0
cdi,0
mmi,0
alert,366
tsunami,0
sig,0
net,0
nst,0


We extracted the country name from the **"location"** column and imputed it into the **"country"** column.
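
The extraction rule is simply "take whatever follows the last comma". A minimal plain-Python sketch of that rule is shown below; `country_from_location` is a hypothetical helper used only for illustration. Note that a location without a comma is returned unchanged, which is why region-style names still need the mapping step that follows.

```python
def country_from_location(location):
    """Return the text after the last comma, or None when location is missing."""
    if location is None:
        return None
    return location.split(",")[-1].strip()


examples = ["Malango, Solomon Islands", "Bengkulu, Indonesia", "Fiji region", None]
extracted = [country_from_location(loc) for loc in examples]
```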

In [None]:
df['country'].unique()

array(['Solomon Islands', 'Indonesia', 'Fiji', 'Tonga', nan,
       'the Fiji Islands', 'Panama', 'Mexico', 'Taiwan', 'Vanuatu',
       'Papua New Guinea', "People's Republic of China",
       'the Kermadec Islands', 'Philippines', 'Brazil', 'Peru',
       'Argentina', 'Nicaragua', 'the Loyalty Islands', 'New Caledonia',
       'Japan', 'New Zealand', 'Kermadec Islands region', 'Alaska',
       'Cyprus', 'United States of America', 'Vanuatu region',
       'Antarctica', 'South Sandwich Islands region', 'Haiti',
       'Wallis and Futuna', 'Mauritius - Reunion region', 'Russia',
       'Mongolia', 'Chile', 'Greece', 'central Mid-Atlantic Ridge',
       'Jamaica', 'Turkey',
       'United Kingdom of Great Britain and Northern Ireland (the)',
       'Australia', 'El Salvador', 'South Sandwich Islands', 'Ecuador',
       'Prince Edward Islands region',
       'South Georgia and the South Sandwich Islands',
       'Svalbard and Jan Mayen', 'Canada', 'Venezuela', 'Bolivia',
       'Honduras'

In [None]:
country_mapping = {
    # Fiji variations
    'the Fiji Islands': 'Fiji',
    'Fiji region': 'Fiji',

    # UK
    'United Kingdom of Great Britain and Northern Ireland (the)': 'UK',

    # USA and states
    'United States of America': 'USA',
    'Alaska': 'USA',
    'California': 'USA',

    # China
    "People's Republic of China": 'China',

    # Japan
    'Japan region': 'Japan',

    # Philippines
    'Philippine Islands region': 'Philippines',

    # Micronesia
    'Micronesia region': 'Micronesia',

    # New Zealand
    'New Zealand region': 'New Zealand',
    'Kermadec Islands region': 'New Zealand',
    'the Kermadec Islands': 'New Zealand',
    'Kermadec Islands': 'New Zealand',

    # Kuril Islands
    'the Kuril Islands': 'Russia',
    'Kuril Islands': 'Russia',

    # Russia
    'Russia region': 'Russia',

    # India
    'India region': 'India',

    # Vanuatu
    'Vanuatu region': 'Vanuatu',

    # South Sandwich Islands
    'South Georgia and the South Sandwich Islands': 'South Georgia',
    'South Sandwich Islands region': 'South Sandwich Islands',

    # Mauritius-Reunion
    'Mauritius - Reunion region': 'Mauritius',

    # Bouvet Island
    'Bouvet Island region': 'Norway',
    'Svalbard and Jan Mayen': 'Norway',
    "Antarctica": 'Antarctica',
    'South Shetland Islands': 'Antarctica',

    # Prince Edward Islands
    'Prince Edward Islands region': 'South Africa',

    # Mid-Atlantic Ridge (no country - international waters)
    'central Mid-Atlantic Ridge': 'Central Mid-Atlantic Ridge',
    'northern Mid-Atlantic Ridge': 'Northern Mid-Atlantic Ridge',

    # Okhotsk
    'Okhotsk': 'Russia',

    # Off coast
    'off the west coast of northern Sumatra': 'Indonesia',

    # Loyalty Islands
    'the Loyalty Islands': 'New Caledonia',

    # Macquarie Island
    'Macquarie Island': 'Australia'}

In [None]:
df['country'] = df['country'].replace(country_mapping)
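
Replacing with a dictionary only touches values that exactly match a key; everything else, including missing values, is left as it is. The sketch below mirrors that behaviour with a plain dictionary lookup. The helper `standardize` and the reduced `sample_mapping` are assumptions made for this example, using a few entries copied from the mapping above.

```python
# A few entries copied from country_mapping, for illustration.
sample_mapping = {
    "the Fiji Islands": "Fiji",
    "Alaska": "USA",
    "Vanuatu region": "Vanuatu",
}


def standardize(name, mapping):
    """Return the mapped name when one exists, otherwise the original value."""
    return mapping.get(name, name)


raw = ["the Fiji Islands", "Alaska", "Japan", None]
cleaned = [standardize(name, sample_mapping) for name in raw]
```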

In [None]:
df['country'].unique()

array(['Solomon Islands', 'Indonesia', 'Fiji', 'Tonga', nan, 'Panama',
       'Mexico', 'Taiwan', 'Vanuatu', 'Papua New Guinea', 'China',
       'New Zealand', 'Philippines', 'Brazil', 'Peru', 'Argentina',
       'Nicaragua', 'New Caledonia', 'Japan', 'USA', 'Cyprus',
       'Antarctica', 'South Sandwich Islands', 'Haiti',
       'Wallis and Futuna', 'Mauritius', 'Russia', 'Mongolia', 'Chile',
       'Greece', 'Central Mid-Atlantic Ridge', 'Jamaica', 'Turkey', 'UK',
       'Australia', 'El Salvador', 'Ecuador', 'South Africa',
       'South Georgia', 'Norway', 'Canada', 'Venezuela', 'Bolivia',
       'Honduras', 'Costa Rica', 'Iran', 'Guatemala', 'Botswana', 'Italy',
       'Myanmar', 'Northern Mariana Islands', 'Afghanistan', 'India',
       'Tajikistan', 'Barbados', 'Nepal', 'Guam', 'Micronesia',
       'Pakistan', 'Colombia', 'Northern Mid-Atlantic Ridge', 'Samoa',
       'Kyrgyzstan', 'Martinique', 'Mozambique', 'Tanzania',
       'Cayman Islands', 'Algeria'], dtype=object)

In [None]:
country_null = df[df['country'].isnull()]

country_null

Unnamed: 0,title,magnitude,date_time,cdi,mmi,alert,tsunami,sig,net,nst,...,gap,magType,depth,latitude,longitude,location,continent,country,year,mag_norm
4,M 6.6 -,6.6,2022-11-09 10:14:00,0,2,green,1,670,us,131,...,27.0,mww,624.464,-25.5948,178.278,,,,2022,2825.0
19,M 6.9 -,6.9,2022-05-19 10:13:00,2,5,green,1,733,us,127,...,45.0,mww,10.0,-54.1325,159.027,,,,2022,4054.0
246,M 6.9 -,6.9,2016-05-28 05:38:00,3,3,green,1,733,us,0,...,19.0,mww,405.69,-21.9724,-178.204,,,,2016,4054.0


These three rows have ***null (NaN)*** values in the *location and country* columns because the earthquakes occurred in the middle of the **OCEAN**, so no country or location was assigned to them. This is standard practice when analyzing earthquake data.

### Alert

In [None]:
df['alert'].value_counts()

Unnamed: 0_level_0,count
alert,Unnamed: 1_level_1
green,325
yellow,56
orange,22
red,13


In [None]:
df['alert'].isnull().sum()

np.int64(366)

**'alert'** column has 366 missing values.

#### Pie: With Missing Value

In [None]:
df_2 = df.copy() # To preserve the value for future comparison, I saved the original dataframe

In [None]:
alert_with_na = df_2['alert'].value_counts().reset_index()

In [None]:
fig = px.pie(
    alert_with_na,
    names='alert',
    values='count',
    color='alert',
    color_discrete_map=alert_colors,
    hole=0.55)

fig.add_annotation(
    text="<b>Alert Level<br>With<br>Missing Value",
    x=0.5,
    y=0.5,
    font=dict(size=15, color='teal'),
    showarrow=False)

fig.update_traces(
    textposition='inside',
    textinfo='value+percent')

fig.update_layout(
    title=None,
    height=500,
    width=600)

fig.show()

### 'alt_lev' column

Here, I'll create a new column called **"alt_lev"** (alert level). This new column will contain ***numeric information*** about the alert, where each color is represented by a number.

*   "green": 1
*   "yellow": 2
*   "orange": 3
*   "red": 4

In [None]:
alert_map = {
    'green': 1,
    'yellow': 2,
    'orange': 3,
    'red': 4}

# Create new column called 'alt_lev'
df['alt_lev'] = df['alert'].map(alert_map)
display(df[['alert', 'alt_lev']].head())

Unnamed: 0,alert,alt_lev
0,green,1.0
1,green,1.0
2,green,1.0
3,green,1.0
4,green,1.0


In [None]:
df.columns

Index(['title', 'magnitude', 'date_time', 'cdi', 'mmi', 'alert', 'tsunami',
       'sig', 'net', 'nst', 'dmin', 'gap', 'magType', 'depth', 'latitude',
       'longitude', 'location', 'continent', 'country', 'year', 'mag_norm',
       'alt_lev'],
      dtype='object')

Here, I'll analyze the correlation matrix for the `alt_lev` column to explore possible relationships with other variables. If strong or moderate correlations exist between the alert level and other variables, those variables can be used as predictors to help estimate or classify the alert level.
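
Because `alt_lev` is still missing for many rows at this point, it helps to recall that a Pearson correlation can only use pairs where both values are present. The sketch below computes the coefficient by hand and skips incomplete pairs. The function `pearson` and the toy values are hypothetical and not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


# Toy values (hypothetical): the last alert level is missing and is skipped.
sig = [650, 800, 1200, 2400, 700]
alt_lev = [1, 1, 3, 4, None]
r = pearson(sig, alt_lev)
```

A coefficient close to 1 would suggest that the variable is a useful predictor candidate, which is the reasoning applied to `sig` in the cells that follow.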

#### Correlation: 'df'

In [None]:
numeric_df = df.select_dtypes(include='number')

In [None]:
numeric_df.columns


Index(['magnitude', 'cdi', 'mmi', 'tsunami', 'sig', 'nst', 'dmin', 'gap',
       'depth', 'latitude', 'longitude', 'year', 'mag_norm', 'alt_lev'],
      dtype='object')

In [None]:
corr_matrix = numeric_df.corr()

In [None]:
fig = px.imshow(
    corr_matrix,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto',
    labels=dict(color='Correlation'))

fig.update_layout(
    title=dict(
        text="Correlation Matrix of Earthquake Variables", x=0.5))
fig.show()

#### Correlation: 'alert'

In [None]:
# Display only the variables most relevant to the "alert" variable

alert_corr = numeric_df.drop(['tsunami', 'nst', 'gap', 'latitude', 'longitude', 'dmin', 'depth', 'year'], axis=1).corr()

In [None]:
fig = px.imshow(
    alert_corr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto',
    labels=dict(color='Correlation'))

fig.update_layout(
    title=dict(
        text="Correlation Matrix of alert Variables", x=0.5))

fig.show()

The correlation matrix shows a strong relationship between sig (significance) and the alert level. This is expected, as earthquakes with higher significance scores are more likely to be assigned higher alert levels.

To further investigate unusual patterns and understand how significance is distributed across different alert levels, we create a box plot that includes all data points. This visualization helps illustrate how significance scores vary for each alert level and highlights potential outliers or overlapping distributions.

In [None]:
df.groupby('alert')['sig'].describe()

Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
alert,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
green,325.0,773.224615,152.93992,650.0,676.0,732.0,806.0,1870.0
orange,22.0,1382.136364,313.801772,1005.0,1076.0,1471.0,1590.75,1960.0
red,13.0,2382.615385,484.484182,1274.0,2074.0,2397.0,2820.0,2910.0
yellow,56.0,1047.696429,332.340675,656.0,777.5,981.5,1181.25,2048.0


#### Box: Alert

In [None]:
# Create boxplot
fig = px.box(
    df[df['alert'].notna()],
    x='alert',
    y='sig',
    color='alert',
    points="all",
    color_discrete_map=alert_colors,
    title='Significance Score by Alert Level')

# Layout tweaks
fig.update_layout(
    width=900,
    height=600,
    xaxis_title='Alert Level',
    yaxis_title='Significance Score',
    showlegend=True)

fig.show()

We can observe outliers across all alert levels, particularly at the green alert level. However, the overall distribution shown in the box plot indicates a strong positive relationship between significance and alert level.

#### ML: Random Forest

Here, I'll use a Random Forest classifier to predict the missing alert values, and cross-validation to check the model, based on four features:

*   `sig`: the significance score of the event
*   `mmi`: the maximum estimated instrumental intensity for the event
*   `cdi`: the maximum reported intensity for the event
*   `mag_norm`: the normalized magnitude of the earthquake

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import classification_report, confusion_matrix

In [None]:
# Prepare data

train_data = df[df['alt_lev'].notna()].copy()
predict_data = df[df['alt_lev'].isna()].copy()

In [None]:
# Features
features = ['sig', 'mmi', 'cdi', 'mag_norm']

# Prepare training data
X_train = train_data[features].fillna(train_data[features].median())
y_train = train_data['alt_lev']

In [None]:
# Train model
model = RandomForestClassifier(n_estimators=100, random_state=44, max_depth=5)

# Cross-validation to check accuracy
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')
print(f"Cross-validation accuracy: {cv_scores.mean():.2%} (+/- {cv_scores.std():.2%})")

Cross-validation accuracy: 84.61% (+/- 2.12%)
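
One way to put this accuracy in context is to compare it with a trivial baseline that always predicts the most common class. The sketch below works that out from the alert counts reported earlier in the notebook; the variable names are illustrative only.

```python
# Observed alert counts before imputation (from the value_counts output above).
alert_counts = {"green": 325, "yellow": 56, "orange": 22, "red": 13}

total = sum(alert_counts.values())
majority_label = max(alert_counts, key=alert_counts.get)

# Accuracy of a model that always answers with the majority class.
baseline_accuracy = alert_counts[majority_label] / total

print(f"Always predicting '{majority_label}': {baseline_accuracy:.2%}")
```

Since green alerts make up about 78% of the labelled rows, the cross-validated accuracy is best read as an improvement of a few percentage points over that baseline.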


In [None]:
# Train final model
model.fit(X_train, y_train)

# Check feature importance
importance = pd.DataFrame({
    'feature': features,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)
print("Feature Importance:")
print(importance)

Feature Importance:
    feature  importance
0       sig    0.495585
1       mmi    0.270774
2       cdi    0.133105
3  mag_norm    0.100536


In [None]:
# Predict missing values
X_predict = predict_data[features].fillna(predict_data[features].median())
predicted_alt_lev = model.predict(X_predict)

# Get prediction confidence
predicted_proba = model.predict_proba(X_predict)
max_proba = predicted_proba.max(axis=1)

print(f"Average prediction confidence: {max_proba.mean():.2%}")

Average prediction confidence: 85.69%
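
The "prediction confidence" printed here is the average of the highest class probability per row, so it describes how decisive the model is and is not a measure of accuracy. A minimal sketch of the same calculation on plain lists follows; `average_confidence` and the probability rows are hypothetical.

```python
def average_confidence(probabilities):
    """Mean of the highest class probability in each row."""
    top = [max(row) for row in probabilities]
    return sum(top) / len(top)


# Hypothetical probability rows for three events and four alert classes.
rows = [
    [0.90, 0.05, 0.03, 0.02],
    [0.55, 0.30, 0.10, 0.05],
    [0.40, 0.35, 0.15, 0.10],
]
confidence = average_confidence(rows)
```

In this toy case the first row is a confident prediction while the third is close to a coin flip between two classes, and the average hides that difference.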


In [None]:
# Fill predictions
df.loc[df['alt_lev'].isna(), 'alt_lev'] = predicted_alt_lev

# Map back to alert colors
reverse_map = {1: 'green', 2: 'yellow', 3: 'orange', 4: 'red'}
df.loc[df['alert'].isna(), 'alert'] = df.loc[df['alert'].isna(), 'alt_lev'].map(reverse_map)

print(f"Successfully filled {len(predict_data)} missing alert values")
print(df['alert'].value_counts().sort_index())

Successfully filled 366 missing alert values
alert
green     646
orange     33
red        15
yellow     88
Name: count, dtype: int64


#### Pie: Without Missing Value

In [None]:
alert_without_na = df['alert'].value_counts().reset_index()

In [None]:
pie_colors = ['#43A047','#FBC02D','#FB8C00', '#E53935']

In [None]:
# Create subplots
fig = make_subplots(
    rows=1, cols=2,
    specs=[[{'type': 'pie'}, {'type': 'pie'}]])

# Add first pie chart
fig.add_trace(
    go.Pie(
        labels=alert_without_na['alert'],
        values=alert_without_na['count'],
        marker=dict(colors=pie_colors),
        hole=0.55,
        textposition='inside',
        textinfo='value+percent',
    ),
    row=1, col=1)

fig.add_annotation(
    text="<b>Alert Level<br>Without<br>Missing Value",
    x=0.145, y=0.5,
    showarrow=False,
    font=dict(size=14))

# Add second pie chart
fig.add_trace(
    go.Pie(
        labels=alert_with_na['alert'],
        values=alert_with_na['count'],
        marker=dict(colors=pie_colors),
        hole=0.55,
        textposition='inside',
        textinfo='value+percent'
    ),
    row=1, col=2)

fig.add_annotation(
    text="<b>Alert Level<br>With<br>Missing Value",
    x=0.855, y=0.5,
    showarrow=False,
    font=dict(size=14))

fig.update_layout(
    showlegend=True,
    height=500,
    width=900)

fig.show()

## Correlation Matrix: Final


In [None]:
numeric_df2 = df.select_dtypes(include='number')

In [None]:
numeric_df2.columns

Index(['magnitude', 'cdi', 'mmi', 'tsunami', 'sig', 'nst', 'dmin', 'gap',
       'depth', 'latitude', 'longitude', 'year', 'mag_norm', 'alt_lev'],
      dtype='object')

In [None]:
corr_matrix2 = numeric_df2.corr()

In [None]:
fig = px.imshow(
    corr_matrix2,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto',
    labels=dict(color='Correlation'))

fig.update_layout(
    title=dict(
        text="Correlation Matrix of Earthquake Variables", x=0.5))

fig.show()

The correlation matrix for the alt_lev variable did not change significantly and remains largely consistent with the previous results (before data imputation).

## Earthquakes: Magnitude

In [None]:
magnitude_counts = df['magnitude'].value_counts().sort_index()

fig = px.bar(
    x=magnitude_counts.index,
    y=magnitude_counts.values,
    title='Earthquakes by Magnitude',
    labels={'x': 'Magnitude', 'y': 'Number of Earthquakes'},
    color=magnitude_counts.index,
    color_continuous_scale='Teal_r')

fig.update_layout(
    showlegend=False,
    height=600,
    width=1000,
    bargap=0)

fig.update_traces(
    textposition='outside',
    text=magnitude_counts)

fig.show()

Earthquakes below **magnitude 6.5** are not included in the dataset.

## Magnitude Measurement Method

In [None]:
mag_type = df['magType'].value_counts().reset_index()

### Pie: Measurement Method

In [None]:
fig = px.pie(
    mag_type,
    names='magType',
    values='count',
    hole=0.5,
    color_discrete_sequence=px.colors.sequential.Teal_r)

fig.add_annotation(
    text="<b>Magnitude<br>Calculation Method",
    x=0.5,
    y=0.5,
    font=dict(size=15, color='teal'),
    showarrow=False)

fig.update_layout(
    title=None,
    showlegend=True,
    height=500,
    width=600)

fig.update_traces(
    textposition='inside',
    textinfo='label+percent')

fig.show()

## Magnitude & Measurement Method

In [None]:
mmm = df.groupby(['magnitude', 'magType']).size().reset_index(name='count')

### Box

In [None]:
fig = px.box(
    df,
    x='magType',
    y='magnitude',
    points='all',
    color='magType',
    color_discrete_sequence=px.colors.sequential.Teal_r, # color
    title='Magnitude Distribution by MagType',
    labels={
        'magType': 'MagType',
        'magnitude': 'Magnitude'})

fig.update_yaxes(range=[6, 9.5])

fig.update_traces(
    hovertemplate=
    "MagType: %{x}<br>" +
    "Magnitude: %{y}<br>" +
    "<extra></extra>")

fig.show()

Earthquake magnitudes calculated using the "mw" algorithm have a higher median value than those calculated by other methods. However, only 16 earthquakes were measured using this algorithm, which is too small a sample size to conclude that "mw" is inappropriate or less reliable.

## Earthquakes by Year

### Bar: Year

In [None]:
# Count earthquakes per year
year_counts = df['year'].value_counts().sort_index()

In [None]:
fig = px.bar(
    x=year_counts.index,
    y=year_counts.values,
    text=year_counts.values,
    color=year_counts,
    color_continuous_scale= 'Teal', # color
    title='Number of Earthquakes by Year')

fig.update_layout(
    xaxis_title='Year',
    yaxis_title='Number of Earthquakes',
    showlegend=False)

fig.show()

## Earthquakes by Country

In [None]:
# Get country counts
country_counts = df['country'].value_counts()
top_countries = country_counts.head(18)  # keep the 18 most frequent countries
others_country = country_counts.iloc[18:].sum()  # everything else is pooled

# Combine data
country_data = top_countries.copy()
if others_country > 0:
    country_data['Others'] = others_country

### Sankey

In [None]:
# Prepare Sankey data
labels = ['Total Earthquakes'] + [f"{country} ({count})" for country, count in country_data.items()]
source = [0] * len(country_data)
target = list(range(1, len(country_data) + 1))
values = list(country_data.values)
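
A one-level Sankey diagram needs four parallel lists: node labels, plus a source index, target index and value for every link. The sketch below builds those lists from a small dictionary, including the "Others" bucket, so the index layout is easy to inspect. The function `sankey_inputs` and the counts are hypothetical and only meant to mirror the preparation steps above.

```python
def sankey_inputs(counts, top_n):
    """Build labels, source, target and value lists for a one-level Sankey."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:top_n]
    rest = sum(count for _, count in ranked[top_n:])
    if rest > 0:
        top.append(("Others", rest))

    # Node 0 is the root; nodes 1..len(top) are the countries.
    labels = ["Total Earthquakes"] + [f"{name} ({count})" for name, count in top]
    source = [0] * len(top)
    target = list(range(1, len(top) + 1))
    values = [count for _, count in top]
    return labels, source, target, values


# Hypothetical counts, for illustration only.
labels, source, target, values = sankey_inputs({"A": 5, "B": 3, "C": 1, "D": 1}, top_n=2)
```

Every link starts at node 0, so the diagram fans out from a single root into one band per country.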

In [None]:
# Create Sankey
fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=50,
        line=dict(color='white', width=3),
        label=labels
    ),
    link=dict(
        source=source,
        target=target,
        value=values,
        color= 'rgba(135, 206, 250, 0.3)'
    ),
    textfont=dict(size=12))])

fig.update_layout(
    title='Number of Earthquakes by Country/Region',
    height=750)

fig.show()

## Parallel: Earthquake data

In [None]:
cols = ['magnitude', 'mmi', 'cdi', 'sig', 'alt_lev', 'depth', 'gap', 'tsunami']

fig = px.parallel_coordinates(
    df,
    dimensions=cols,
    color='magnitude',
    color_continuous_scale=px.colors.diverging.Tealrose, # color
    labels={col: col for col in cols})  # labels expects a dict of column name -> display label

fig.show()

This parallel coordinates plot shows how each data point's value on one variable connects to its values on the other variables. Overall the dataset looks quite messy, but some variables appear to be connected and may influence one another. For example, 'cdi' and 'sig', 'magnitude' and 'mmi', and 'sig' and 'alt_lev' show positive relationships.

## Tsunami Data

In [None]:
# Filter
tsunami_events = df[df['tsunami'] == 1]

# Count the occurrences per year
yearly_tsunamis = tsunami_events.groupby('year').size().reset_index(name='count')

### Bar: Tsunami

In [None]:
fig = px.bar(
    yearly_tsunamis,
    x='year',
    y='count',
    text=yearly_tsunamis['count'],
    title='Total Number of Tsunami Events per Year',
    labels={'count': 'Number of Tsunamis', 'year': 'Year'},
    color='count',
    color_continuous_scale= 'Teal')

# Improve layout
fig.update_layout(xaxis_tickangle=-45)
fig.show()

The dataset records no tsunami events before 2013. To analyze or make any prediction about tsunamis, we therefore have to restrict the data to 2013 onward.

In [None]:
tsunami_df = df[df['year'] >= 2013].copy()

In [None]:
tsunami_df.tail()

Unnamed: 0,title,magnitude,date_time,cdi,mmi,alert,tsunami,sig,net,nst,...,magType,depth,latitude,longitude,location,continent,country,year,mag_norm,alt_lev
413,"M 7.1 - 112 km WSW of Lata, Solomon Islands",7.1,2013-02-06 01:23:00,0,5,green,0,776,us,550,...,mww,10.0,-11.183,164.882,"Lata, Solomon Islands",,Solomon Islands,2013,5157.0,1.0
414,"M 8.0 - 75 km W of Lata, Solomon Islands",8.0,2013-02-06 01:12:00,8,7,green,1,993,us,460,...,mww,24.0,-10.799,165.114,"Lata, Solomon Islands",,Solomon Islands,2013,15242.0,1.0
415,"M 6.9 - 18 km SSW of Obihiro, Japan",6.9,2013-02-02 14:17:00,6,7,yellow,0,814,us,686,...,mww,107.0,42.77,143.092,"Obihiro, Japan",Asia,Japan,2013,4054.0,2.0
416,"M 6.8 - 54 km N of Vallenar, Chile",6.8,2013-01-30 20:15:00,6,7,green,0,771,us,596,...,mww,45.0,-28.094,-70.653,"Vallenar, Chile",South America,Chile,2013,3594.0,1.0
417,"M 7.5 - 110 km SW of Edna Bay, Alaska",7.5,2013-01-05 08:58:00,6,6,yellow,0,1425,ak,0,...,mw,8.7,55.228,-134.859,"Edna Bay, Alaska",,USA,2013,8348.0,2.0


In [None]:
tsunami_df.shape

(418, 22)

The new `tsunami_df` has 418 rows and 22 columns.

### Correlation of Tsunami Data Frame

In [None]:
numeric_df = tsunami_df.select_dtypes(include='number')

In [None]:
numeric_df.columns

Index(['magnitude', 'cdi', 'mmi', 'tsunami', 'sig', 'nst', 'dmin', 'gap',
       'depth', 'latitude', 'longitude', 'year', 'mag_norm', 'alt_lev'],
      dtype='object')

In [None]:
tsunami_crr = numeric_df.corr()

In [None]:
fig = px.imshow(
    tsunami_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto',
    labels=dict(color='Correlation'))

fig.update_layout(
    title=dict(
        text="Correlation Matrix of Earthquake Variables", x=0.5))

fig.show()

### Tsunami During Earthquake

In [None]:
# Map 'tsunami' column values (0 and 1) to descriptive labels
tsunami_labels = tsunami_df['tsunami'].map({0: 'No Tsunami', 1: 'Tsunami'})

In [None]:
fig = px.pie(
    names=tsunami_labels,
    color=tsunami_labels,
    color_discrete_map=tsunami_colors,
    hole=0.6)

fig.add_annotation(
    text="<b>Tsunami<br>during<br>Earthquake",
    x=0.5,
    y=0.5,
    font=dict(size=18),
    showarrow=False)

fig.update_layout(
    title=None,
    width=600,
    height=500)

fig.update_traces(textinfo='value+percent')

fig.show()

### Magnitude Level and Tsunami Occurrence

In [None]:
# Group earthquakes by magnitude
grouped = tsunami_df.groupby('magnitude')['tsunami']

# Count total earthquakes and sum tsunamis for each magnitude
summary = grouped.agg(['count', 'sum'])

# Rename columns to be more descriptive
summary.columns = ['Total_Earthquake', 'Tsunami']

# Calculate earthquakes that didn't cause tsunamis
summary['No_Tsunami'] = summary['Total_Earthquake'] - summary['Tsunami']

# Calculate probability of tsunami for each magnitude
summary['Probability'] = summary['Tsunami'] / summary['Total_Earthquake']

summary = summary.sort_index()

display(summary)

magnitude,Total_Earthquake,Tsunami,No_Tsunami,Probability
6.5,80,50,30,0.625
6.6,64,43,21,0.671875
6.7,48,32,16,0.666667
6.8,41,32,9,0.780488
6.9,52,37,15,0.711538
7.0,26,21,5,0.807692
7.1,23,19,4,0.826087
7.2,10,9,1,0.9
7.3,17,13,4,0.764706
7.4,5,5,0,1.0


The table shows, for each magnitude, the total number of earthquakes and how many of them were followed by a **tsunami** or by **no tsunami**.

The tsunami probability tends to increase with magnitude, most noticeably once the magnitude reaches 6.75 or higher. Later in this section, we filter the earthquakes with **magnitude 6.75 or higher**.
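
As a rough check of the 6.75 cut, the sketch below rebuilds only the ten rows shown in the table above and compares the tsunami share on each side. The full `summary` may contain more rows than are displayed here, so these shares are not expected to match figures computed on all of `tsunami_df`. The names `rows`, `shown`, and `share_by_side` are illustrative and not part of the notebook.

```python
import pandas as pd

# Only the ten rows displayed above: (magnitude, total earthquakes, tsunamis).
rows = [
    (6.5, 80, 50), (6.6, 64, 43), (6.7, 48, 32), (6.8, 41, 32), (6.9, 52, 37),
    (7.0, 26, 21), (7.1, 23, 19), (7.2, 10, 9), (7.3, 17, 13), (7.4, 5, 5),
]
shown = pd.DataFrame(rows, columns=["magnitude", "Total_Earthquake", "Tsunami"])

# Flag each magnitude as below or at/above the 6.75 cut used in this notebook.
shown["side"] = shown["magnitude"].ge(6.75).map({False: "< 6.75", True: ">= 6.75"})

# Pool the counts on each side, then divide to get the tsunami share.
share_by_side = shown.groupby("side")[["Total_Earthquake", "Tsunami"]].sum()
share_by_side["Probability"] = share_by_side["Tsunami"] / share_by_side["Total_Earthquake"]

print(share_by_side)
```

Pooling counts before dividing weights each magnitude by its number of events, which avoids giving a five-event row the same influence as an eighty-event row.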

In [None]:
# Convert each magnitude value to a string with 2 decimal places
m_labels = [f"{magnitude:.2f}" for magnitude in summary.index]

# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add stacked bar for No Tsunami
fig.add_trace(
    go.Bar(
        x=m_labels,
        y=summary['No_Tsunami'],
        name='No Tsunami',
        marker_color= '#1f77b4', #'#86b895',
        marker_line_width=1),
    secondary_y=False)

# Add stacked bar for Tsunami
fig.add_trace(
    go.Bar(
        x=m_labels,
        y=summary['Tsunami'],
        name='Tsunami',
        marker_color= "#d62728", #'#2b7077',
        marker_line_width=1),
    secondary_y=False)

# Add probability line on secondary axis
fig.add_trace(
    go.Scatter(
        x=m_labels,
        y=summary['Probability'],
        name='Tsunami Probability',
        mode='lines+markers',
        line=dict(color='#B22222', width=2),
        marker=dict(size=6, color='#B22222')),
    secondary_y=True)

# Update layout
fig.update_layout(
    title=dict(text='Earthquake & Tsunami Probability by Magnitude'),
    barmode='stack',
    width=1000,
    height=600,
    hovermode='x unified',
    legend=dict(
        orientation='h',
        yanchor='bottom',
        y=1.02,
        xanchor='right',
        x=1))

# Update x-axis
fig.update_xaxes(
    title_text='Earthquake Magnitude',
    title_font=dict(size=12),
    tickangle=45)

# Update primary y-axis (left)
fig.update_yaxes(
    title_text='Number of Earthquakes',
    title_font=dict(size=12))

# Update secondary y-axis (right)
fig.update_yaxes(
    title_text='Probability of Tsunami',
    title_font=dict(size=12, color='#1B5E20'),
    range= (0.05, 1.05),
    secondary_y=True)

fig.show()

Magnitudes of **8.2** and **8.3** appear to have a higher tsunami probability; however, only **nine earthquakes** of these magnitudes were recorded. This sample is too small to conclude that earthquakes of magnitude 8 or higher will necessarily cause a tsunami.
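
One way to make the small-sample caveat concrete is a confidence interval on the tsunami share. The sketch below uses the Wilson score interval, which is a choice made here for illustration and not something the notebook computes. `wilson_interval` is a hypothetical helper, and the two inputs are rows from the summary table above (5 of 5 at magnitude 7.4, 50 of 80 at magnitude 6.5).

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)

low_74, high_74 = wilson_interval(5, 5)      # magnitude 7.4 row
low_65, high_65 = wilson_interval(50, 80)    # magnitude 6.5 row

print(f"M 7.4: {low_74:.2f} to {high_74:.2f}")
print(f"M 6.5: {low_65:.2f} to {high_65:.2f}")
```

With only five events, an observed share of 1.0 is still compatible with a considerably lower underlying rate, and the same caution applies to the handful of events at the highest magnitudes.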

### Filter Earthquake Data by Magnitude

In [None]:
# Filter data for magnitude >= 6.75
eq_675 = tsunami_df[tsunami_df['magnitude'] >= 6.75]

tsunami_labels_675 = eq_675['tsunami'].map({0: 'No Tsunami', 1: 'Tsunami'})

In [None]:
pie_color_2 = ['#d62728', '#1f77b4']

In [None]:
fig = make_subplots(
    rows=1, cols=2,
    specs=[[{'type': 'pie'}, {'type': 'pie'}]])

# First pie
fig.add_trace(
    go.Pie(
        labels=tsunami_labels,
        marker=dict(colors=pie_color_2),
        hole=0.6,
        textinfo='value+percent'),
    row=1, col=1)

# Second pie
fig.add_trace(
    go.Pie(
        labels=tsunami_labels_675,
        marker=dict(colors=pie_color_2),
        hole=0.6,
        textinfo='value+percent'),
    row=1, col=2)

# Center text annotations
fig.add_annotation(
    text="<b>Tsunami<br>during<br>Earthquake</b>",
    x=0.15, y=0.5,
    showarrow=False,
    font=dict(size=15))

fig.add_annotation(
    text="<b>Tsunami during<br>Earthquakes<br>over 6.75 Magnitude</b>",
    x=0.9, y=0.5,
    showarrow=False,
    font=dict(size=14))

fig.update_layout(
    showlegend=True,
    width=900,
    height=500,
    title=None)

fig.show()

When the magnitude is **6.75 or higher**, the share of earthquakes accompanied by a tsunami rises slightly, **from 72.7% to 79.2%**.

### Tsunami Map

In [None]:
fig = px.scatter_mapbox(
    tsunami_df,
    lat="latitude",
    lon="longitude",
    color=tsunami_labels,       # color by tsunami
    size="mag_norm",            # size by normalized magnitude
    size_max=18,
    hover_name="location",
    hover_data={"magnitude": True, "tsunami": True},
    zoom=1,
    height=550,
    color_discrete_map=tsunami_colors)

# map style
fig.update_layout(
    mapbox_style="open-street-map",
    margin={"r":0,"t":40,"l":20,"b":0},
    title="Earthquake Locations: Tsunami vs No Tsunami")

fig.show()

Earthquakes that occur in **coastal or near-coastal areas** are more likely to trigger **tsunamis**.

### Tsunami by Country

In [None]:
country_tsunami = (tsunami_df[tsunami_df['tsunami'] == 1]
                   .groupby('country')
                   .size()
                   .reset_index(name='tsunami_count')
                   .sort_values(by='tsunami_count', ascending=False)
                   .reset_index(drop=True))

In [None]:
country_tsunami=country_tsunami.head(25)

In [None]:
fig = px.treemap(
    country_tsunami,
    path=['country'],
    values='tsunami_count',
    color='tsunami_count',
    color_continuous_scale='Blues', # color scale
    title='Countries with Highest Number of Tsunamis')

fig.update_layout(template='plotly_white',
                  margin=dict(t=50, l=10, r=10, b=10))
fig.update_traces(textinfo='label+value')

fig.show()


### Earthquake & Tsunami by Country

In [None]:
group_country = tsunami_df.groupby('country')['tsunami'].agg(['count', 'sum'])

In [None]:
group_country.columns = ['Total_Earthquake', 'Tsunami']

In [None]:
group_country['No Tsunami'] = group_country['Total_Earthquake'] - group_country['Tsunami']

In [None]:
group_country=group_country.sort_values(by='Total_Earthquake', ascending=False)

In [None]:
group_country=group_country.head(25)

In [None]:
fig = px.bar(
    group_country.reset_index(),
    x='country',
    y=['No Tsunami', 'Tsunami'],
    title='Number of Earthquakes and Tsunamis by Country',
    barmode='stack',
    color_discrete_map=tsunami_colors)

fig.update_layout(hovermode='x unified')
fig.update_yaxes(title_text='Number of Earthquakes')
fig.show()

### Parallel: Tsunami Data

In [None]:
cols = ['magnitude', 'mmi', 'cdi', 'sig', 'alt_lev', 'depth', 'gap', 'tsunami']

fig = px.parallel_coordinates(
    tsunami_df,
    dimensions=cols,
    color='magnitude',
    color_continuous_scale=px.colors.diverging.Tealrose, # color
    labels=cols)

fig.show()

## Earthquake Map

### Earthquake: Heat Map

In [None]:
fig = px.density_mapbox(
    df,
    lat = "latitude",
    lon = "longitude",
    z = "magnitude",
    radius = 13,
    hover_name = "location",
    zoom=  1,
    height = 550,
    color_continuous_scale = ["cyan", "yellow", "orange", "red"])

fig.update_layout(
    mapbox_style = "open-street-map",
    mapbox_zoom = 1,
    margin = {"r":0,"t":40,"l":20,"b":0},
    title = "Earthquake Density Heatmap")

fig.show()

### Earthquake Map: Alert Level

In [None]:
fig = px.scatter_mapbox(
    df,
    lat = "latitude",
    lon = "longitude",
    color = "alert",                 # color by alert level
    size = "mag_norm",               # Size by magnitude
    size_max = 18,                   # max dot size
    hover_name = "location",
    hover_data = {
      "magnitude": True, "alert": True},
    zoom = 1,
    height = 550,
    color_discrete_map = alert_colors,
    category_orders= {
        "alert": ["green", "yellow", "orange", "red"]})


fig.update_layout(
    mapbox_style = "open-street-map",
    mapbox_zoom = 1,
    margin = {"r":0,"t":40,"l":20,"b":0},
    title = "Earthquake Locations by Alert Level and Magnitude")

fig.show()

## Country: Indonesia

In [None]:
indonesia = df[df['country']=='Indonesia']

In [None]:
indonesia_tsu = tsunami_df[tsunami_df['country']=='Indonesia']

In [None]:
indonesia_crr = (indonesia
                 .drop(['mag_norm', 'year'], axis=1)
                 .select_dtypes(include='number')
                 .corr())

In [None]:
indonesia_tsu_crr = (indonesia_tsu
                 .drop(['mag_norm', 'year'], axis=1)
                 .select_dtypes(include='number')
                 .corr())
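
The same filter, drop, and correlate steps are repeated for every country in the sections that follow. A small helper is one way to avoid the repetition. The sketch below assumes a frame with a `country` column, as `df` and `tsunami_df` have here; `country_corr` is a hypothetical name, and the toy values are invented.

```python
import pandas as pd

def country_corr(frame, country, drop_cols=("mag_norm", "year")):
    """Correlation matrix of the numeric columns for one country.

    `frame` is assumed to have a 'country' column, as `df` and
    `tsunami_df` do in this notebook.
    """
    subset = frame[frame["country"] == country]
    # errors="ignore" lets the helper run even if a dropped column is absent.
    return (subset
            .drop(columns=list(drop_cols), errors="ignore")
            .select_dtypes(include="number")
            .corr())

# Toy frame with made-up values, only to show the call shape.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B"],
    "magnitude": [6.5, 7.0, 7.5, 6.8],
    "depth": [10.0, 30.0, 20.0, 15.0],
    "year": [2001, 2005, 2010, 2012],
})
toy_corr = country_corr(toy, "A")
print(toy_corr)
```

In this notebook the call would look like `country_corr(df, 'Indonesia')` or `country_corr(tsunami_df, 'Indonesia')`.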

### Correlation

In [None]:
# Create subplot layout
fig = make_subplots(
    rows=2, cols=1,
    subplot_titles=[
        "Correlation Matrix of Earthquake in Indonesia",
        "Correlation Matrix of Earthquake in Indonesia (Tsunami)"],
    vertical_spacing=0.12) # gap between plots

# First heatmap
fig1 = px.imshow(
    indonesia_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Second heatmap
fig2 = px.imshow(
    indonesia_tsu_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Add traces to subplots
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=2, col=1)

# Share the same color scale
fig.update_layout(
    coloraxis=dict(colorscale='RdBu_r'),
    margin=dict(l=20,r=20,t=35,b=25))

# Apply shared color axis
fig.data[0].update(coloraxis='coloraxis')
fig.data[1].update(coloraxis='coloraxis')

fig.show()

The data show a moderate positive correlation between tsunami occurrence and geographic coordinates (0.46 for latitude and 0.54 for longitude), indicating that tsunamis are geographically clustered in certain areas. The weak correlations between earthquake **magnitude** and station-geometry variables such as **gap** (azimuthal gap between stations) and **dmin** (distance to the nearest station) are consistent with the assumption that tsunamis are more strongly associated with geographic location than with these seismic characteristics.
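
Because `tsunami` is a 0/1 column, its Pearson correlation with a continuous column is the point-biserial correlation, which depends only on the difference between the two group means, the overall spread, and the group proportions. The sketch below checks that identity on made-up numbers; `toy`, `r_pearson`, and `r_point_biserial` are illustrative names, and the values are not taken from the dataset.

```python
import math
import pandas as pd

# Made-up values, used only to illustrate the identity.
toy = pd.DataFrame({
    "tsunami":   [0, 0, 0, 1, 1, 1, 1, 0],
    "longitude": [95.0, 100.5, 98.2, 120.1, 125.4, 118.9, 130.0, 110.3],
})

# What DataFrame.corr() reports for this pair (Pearson by default).
r_pearson = toy["tsunami"].corr(toy["longitude"])

# Point-biserial form: (mean1 - mean0) / std * sqrt(p * q),
# using the population standard deviation (ddof=0).
group1 = toy.loc[toy["tsunami"] == 1, "longitude"]
group0 = toy.loc[toy["tsunami"] == 0, "longitude"]
p = len(group1) / len(toy)
q = 1 - p
r_point_biserial = ((group1.mean() - group0.mean())
                    / toy["longitude"].std(ddof=0)
                    * math.sqrt(p * q))

print(round(r_pearson, 4), round(r_point_biserial, 4))
```

Read this way, a positive coefficient says that tsunami-flagged events sit, on average, at larger coordinate values than the others; it does not by itself describe a physical mechanism.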

### Alert level Distribution

In [None]:
indo_alert = indonesia['alert'].value_counts().reset_index()

In [None]:
fig = px.pie(indo_alert,
             names='alert',
             values='count',
              color='alert',
             color_discrete_map=alert_colors,
             hole=0.55)

fig.add_annotation(
    text="<b>Alert Level<br> of Earthquakes<br>in Indonesia</b>",
    x=0.5,
    y=0.5,
    font=dict(size=16, color='teal'),
    showarrow=False)

fig.update_layout(
    title=None,
    showlegend=True,
    height=500,
    width=600)

fig.update_traces(
    textposition='inside',
    textinfo='value+percent')

fig.show()

### Impact Analysis: Indonesia vs Global

In [None]:
# Define the columns and titles
columns = ["sig", "cdi", "mmi", "magnitude"]
titles = ("Significance", "Max reported intensity", "Max instrumental intensity", "Magnitude")
in_color = ['#636EFA', '#EF553B', '#00CC96', '#AB63FA']

# Create a subplot figure with 2 rows and 4 columns
fig = make_subplots(
    rows=2, cols=4,
    subplot_titles=titles + titles,
    vertical_spacing=0.1)

# Add Row 1: Global
for i, col in enumerate(columns):
    fig.add_trace(
        go.Box(
            y=df[col],
            name=col,
            marker_color=in_color[i],
            boxpoints='all'),
        row=1, col=i+1)

# Add Row 2: Indonesia
for i, col in enumerate(columns):
    fig.add_trace(
        go.Box(
            y=indonesia[col],
            name=col,
            marker_color=in_color[i],
            boxpoints='all'),
        row=2, col=i+1)

fig.update_layout(
    title_text="Earthquake Impact Analysis: Global (Top) vs Indonesia (Bottom)",
    showlegend=False,
    height=800,
    width=1000)

# Title Font Size
fig.update_annotations(font_size=12)

fig.show()

Indonesia's earthquakes have lower significance and lower reported intensity than the global average.
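
The box plots support this reading visually; a numeric check is to compare medians directly. The sketch below is one way to do that, assuming frames shaped like `df` and `indonesia`. `median_gap` is a hypothetical helper, and the toy values are invented.

```python
import pandas as pd

def median_gap(global_frame, country_frame, cols):
    """Country median minus global median for each column."""
    return country_frame[cols].median() - global_frame[cols].median()

# Made-up numbers for illustration; not values from the dataset.
toy_global = pd.DataFrame({"sig": [600, 700, 800, 900, 1000],
                           "cdi": [3, 4, 5, 6, 7]})
toy_country = toy_global.iloc[:3]  # stands in for a slice such as `indonesia`

delta = median_gap(toy_global, toy_country, ["sig", "cdi"])
print(delta)
```

A negative entry means the country's median is below the global median for that column. Medians are used here because they are less sensitive to a few extreme events than means.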

### Parallel: Indonesia

In [None]:
cols = ['magnitude', 'mmi', 'cdi', 'sig', 'alt_lev', 'depth', 'gap']

fig = px.parallel_coordinates(
    indonesia,
    dimensions=cols,
    color='magnitude',
    color_continuous_scale=px.colors.diverging.Tealrose, # color
    labels=cols)

fig.show()

## Country: Japan

In [None]:
japan = df[df['country']=='Japan']

In [None]:
japan_tsu = tsunami_df[tsunami_df['country']=='Japan']

In [None]:
japan_crr = (japan
                 .drop(['mag_norm','year'], axis=1)
                 .select_dtypes(include='number')
                 .corr())

In [None]:
japan_tsu_crr = (japan_tsu
                 .drop(['mag_norm','year'], axis=1)
                 .select_dtypes(include='number')
                 .corr())

### Correlation

In [None]:
# Create subplot layout
fig = make_subplots(
    rows=2, cols=1,
    subplot_titles=[
        "Correlation Matrix of Earthquake in Japan",
        "Correlation Matrix of Earthquake in Japan (Tsunami)"],
    vertical_spacing=0.12) # gap between plots

# First heatmap
fig1 = px.imshow(
    japan_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Second heatmap
fig2 = px.imshow(
    japan_tsu_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Add traces to subplots
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=2, col=1)

# Share the same color scale
fig.update_layout(
    coloraxis=dict(colorscale='RdBu_r'),
    margin=dict(l=20,r=20,t=35,b=25))

# Apply shared color axis
fig.data[0].update(coloraxis='coloraxis')
fig.data[1].update(coloraxis='coloraxis')

fig.show()

### Alert level Distribution

In [None]:
japan_alert = japan['alert'].value_counts().reset_index()

In [None]:
fig = px.pie(japan_alert,
             names='alert',
             values='count',
              color='alert',
             color_discrete_map=alert_colors,
             hole=0.55)

fig.add_annotation(
    text="<b>Alert Level<br> of Earthquakes<br>in Japan</b>",
    x=0.5,
    y=0.5,
    font=dict(size=16, color='teal'),
    showarrow=False)

fig.update_layout(
    title=None,
    showlegend=True,
    height=500,
    width=600)

fig.update_traces(
    textposition='inside',
    textinfo='value+percent')

fig.show()

### Impact Analysis: Japan vs Global

In [None]:
# Define the columns and titles
columns = ["sig", "cdi", "mmi", "magnitude"]
titles = ("Significance", "Max reported intensity", "Max instrumental intensity", "Magnitude")
in_color = ['#636EFA', '#EF553B', '#00CC96', '#AB63FA']

# Create a subplot figure with 2 rows and 4 columns
fig = make_subplots(
    rows=2, cols=4,
    subplot_titles=titles + titles,
    vertical_spacing=0.1)

# Add Row 1: Global
for i, col in enumerate(columns):
    fig.add_trace(
        go.Box(
            y=df[col],
            name=col,
            marker_color=in_color[i],
            boxpoints='all'
        ),
        row=1, col=i+1)

# Add Row 2: Japan
for i, col in enumerate(columns):
    fig.add_trace(
        go.Box(
            y=japan[col],
            name=col,
            marker_color=in_color[i],
            boxpoints='all'
        ),
        row=2, col=i+1)

fig.update_layout(
    title_text="Earthquake Impact Analysis: Global (Top) vs Japan (Bottom)",
    showlegend=False,
    height=800,
    width=1000)

# Title Font Size
fig.update_annotations(font_size=12)

fig.show()

Japan's earthquakes have higher significance and reported intensity than the global average, which suggests they are considerably more damaging than those in the rest of the world.

### Parallel: Japan

In [None]:
cols = ['magnitude', 'mmi', 'cdi', 'sig', 'alt_lev', 'depth', 'gap', 'tsunami']

fig = px.parallel_coordinates(
    japan,
    dimensions=cols,
    color='magnitude',
    color_continuous_scale=px.colors.diverging.Tealrose, # color
    labels=cols)

fig.show()

## Country: Papua New Guinea

In [None]:
png = df[df['country']=='Papua New Guinea']

In [None]:
png_tsu = tsunami_df[tsunami_df['country']=='Papua New Guinea']

In [None]:
png_crr = (png
                 .drop(['year','mag_norm'], axis=1)
                 .select_dtypes(include='number')
                 .corr())

In [None]:
png_tsu_crr = (png_tsu
                 .drop(['year', 'mag_norm'], axis=1)
                 .select_dtypes(include='number')
                 .corr())

### Correlation

In [None]:
# Create subplot layout
fig = make_subplots(
    rows=2, cols=1,
    subplot_titles=[
        "Correlation Matrix of Earthquake in Papua New Guinea",
        "Correlation Matrix of Earthquake in Papua New Guinea (Tsunami)"],
    vertical_spacing=0.12) # gap between plots

# First heatmap
fig1 = px.imshow(
    png_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Second heatmap
fig2 = px.imshow(
    png_tsu_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Add traces to subplots
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=2, col=1)

# Share the same color scale
fig.update_layout(
    coloraxis=dict(colorscale='RdBu_r'),
    margin=dict(l=20,r=20,t=35,b=25))

# Apply shared color axis
fig.data[0].update(coloraxis='coloraxis')
fig.data[1].update(coloraxis='coloraxis')

fig.show()

### Alert level Distribution

In [None]:
png_alert = png['alert'].value_counts().reset_index()

In [None]:
fig = px.pie(png_alert,
             names='alert',
             values='count',
              color='alert',
             color_discrete_map=alert_colors,
             hole=0.55)

fig.add_annotation(
    text="<b>Alert Level<br>of Earthquakes in<br>Papua New Guinea</b>",
    x=0.5,
    y=0.5,
    font=dict(size=15, color='teal'),
    showarrow=False)

fig.update_layout(
    title=None,
    showlegend=True,
    height=500,
    width=600)

fig.update_traces(
    textposition='inside',
    textinfo='value+percent')

fig.show()

### Impact Analysis: Papua New Guinea vs Global

In [None]:
# Define the columns and titles
columns = ["sig", "cdi", "mmi", "magnitude"]
titles = ("Significance", "Max reported intensity", "Max instrumental intensity", "Magnitude")
in_color = ['#636EFA', '#EF553B', '#00CC96', '#AB63FA']

# Create a subplot figure with 2 rows and 4 columns
fig = make_subplots(
    rows=2, cols=4,
    subplot_titles=titles + titles,
    vertical_spacing=0.1)

# Add Row 1: Global
for i, col in enumerate(columns):
    fig.add_trace(
        go.Box(
            y=df[col],
            name=col,
            marker_color=in_color[i],
            boxpoints='all'
        ),
        row=1, col=i+1)

# Add Row 2: Papua New Guinea
for i, col in enumerate(columns):
    fig.add_trace(
        go.Box(
            y=png[col],
            name=col,
            marker_color=in_color[i],
            boxpoints='all'
        ),
        row=2, col=i+1)

fig.update_layout(
    title_text="Earthquake Impact Analysis: Global (Top) vs Papua New Guinea (Bottom)",
    showlegend=False,
    height=800,
    width=1000)

# Title Font Size
fig.update_annotations(font_size=12)

fig.show()

### Parallel: Papua New Guinea

In [None]:
cols = ['magnitude', 'mmi', 'cdi', 'sig', 'alt_lev', 'depth', 'gap', 'tsunami']

fig = px.parallel_coordinates(
    png,
    dimensions=cols,
    color='magnitude',
    color_continuous_scale=px.colors.diverging.Tealrose, # color
    labels=cols)

fig.show()

## Country: Chile

In [None]:
chile = df[df['country']=='Chile']

In [None]:
chile_tsu = tsunami_df[tsunami_df['country']=='Chile']

In [None]:
chile_crr = (chile
                 .drop(['mag_norm', 'year'], axis=1)
                 .select_dtypes(include='number')
                 .corr())

In [None]:
chile_tsu_crr = (chile_tsu
                 .drop(['mag_norm', 'year'], axis=1)
                 .select_dtypes(include='number')
                 .corr())

### Correlation

In [None]:
# Create subplot layout
fig = make_subplots(
    rows=2, cols=1,
    subplot_titles=[
        "Correlation Matrix of Earthquake in Chile",
        "Correlation Matrix of Earthquake in Chile (Tsunami)"],
    vertical_spacing=0.12) # gap between plots

# First heatmap
fig1 = px.imshow(
    chile_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Second heatmap
fig2 = px.imshow(
    chile_tsu_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Add traces to subplots
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=2, col=1)

# Share the same color scale
fig.update_layout(
    coloraxis=dict(colorscale='RdBu_r'),
    margin=dict(l=20,r=20,t=35,b=25))

# Apply shared color axis
fig.data[0].update(coloraxis='coloraxis')
fig.data[1].update(coloraxis='coloraxis')

fig.show()

### Alert level Distribution

In [None]:
chile_alert = chile['alert'].value_counts().reset_index()

In [None]:
fig = px.pie(chile_alert,
             names='alert',
             values='count',
              color='alert',
             color_discrete_map=alert_colors,
             hole=0.55)

fig.add_annotation(
    text="<b>Alert Level<br>of Earthquakes<br>in Chile</b>",
    x=0.5,
    y=0.5,
    font=dict(size=16, color='teal'),
    showarrow=False)

fig.update_layout(
    title=None,
    showlegend=True,
    height=500,
    width=600)

fig.update_traces(
    textposition='inside',
    textinfo='value+percent')

fig.show()

### Impact Analysis: Chile vs Global

In [None]:
# Define the columns and titles
columns = ["sig", "cdi", "mmi", "magnitude"]
titles = ("Significance", "Max reported intensity", "Max instrumental intensity", "Magnitude")
in_color = ['#636EFA', '#EF553B', '#00CC96', '#AB63FA']

# Create a subplot figure with 2 rows and 4 columns
fig = make_subplots(
    rows=2, cols=4,
    subplot_titles=titles + titles,
    vertical_spacing=0.1)

# Add Row 1: Global
for i, col in enumerate(columns):
    fig.add_trace(
        go.Box(
            y=df[col],
            name=col,
            marker_color=in_color[i],
            boxpoints='all'
        ),
        row=1, col=i+1)

# Add Row 2: Chile
for i, col in enumerate(columns):
    fig.add_trace(
        go.Box(
            y=chile[col],
            name=col,
            marker_color=in_color[i],
            boxpoints='all'
        ),
        row=2, col=i+1)

fig.update_layout(
    title_text="Earthquake Impact Analysis: Global (Top) vs Chile (Bottom)",
    showlegend=False,
    height=800,
    width=1000)

# Title Font Size
fig.update_annotations(font_size=12)

fig.show()

### Parallel: Chile

In [None]:
cols = ['magnitude', 'mmi', 'cdi', 'sig', 'alt_lev', 'depth', 'gap', 'tsunami']

fig = px.parallel_coordinates(
    chile,
    dimensions=cols,
    color='magnitude',
    color_continuous_scale=px.colors.diverging.Tealrose, # color
    labels=cols)

fig.show()

## Country: Vanuatu

In [None]:
van = df[df['country']=='Vanuatu']

In [None]:
van_tsu = tsunami_df[tsunami_df['country']=='Vanuatu']

In [None]:
van_crr = (van
                 .drop(['mag_norm', 'year'], axis=1)
                 .select_dtypes(include='number')
                 .corr())

In [None]:
van_tsu_crr = (van_tsu
                 .drop(['mag_norm', 'year'], axis=1)
                 .select_dtypes(include='number')
                 .corr())

### Correlation

In [None]:
# Create subplot layout
fig = make_subplots(
    rows=2, cols=1,
    subplot_titles=[
        "Correlation Matrix of Earthquake in Vanuatu",
        "Correlation Matrix of Earthquake in Vanuatu (Tsunami)"],
    vertical_spacing=0.12) # gap between plots

# First heatmap
fig1 = px.imshow(
    van_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Second heatmap
fig2 = px.imshow(
    van_tsu_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Add traces to subplots
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=2, col=1)

# Share the same color scale
fig.update_layout(
    coloraxis=dict(colorscale='RdBu_r'),
    margin=dict(l=20,r=20,t=35,b=25))

# Apply shared color axis
fig.data[0].update(coloraxis='coloraxis')
fig.data[1].update(coloraxis='coloraxis')

fig.show()

The heatmap shows that the **alert level** and **tsunami** data for **Vanuatu** are highly unreliable. All recorded earthquakes are labeled as *green alert*, and from the time tsunami data became available, *every earthquake is marked as causing a tsunami*. This pattern suggests that Vanuatu may have reported only earthquakes that caused tsunamis, or that the data are completely mislabeled. Therefore, in our future analysis, *we will exclude Vanuatu's alert level and tsunami data*.
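
A column with a single distinct value has zero variance, so its Pearson correlation with any other column is undefined and typically shows up as an empty cell in the heatmap. The sketch below flags such columns per country. `constant_columns` is a hypothetical helper, `alt_lev` is assumed to be the numeric encoding of `alert` used elsewhere in this notebook, and the toy rows are invented.

```python
import pandas as pd

def constant_columns(frame, group_col, check_cols):
    """Boolean table: True where a column has at most one distinct value in a group."""
    distinct = frame.groupby(group_col)[check_cols].nunique()
    return distinct.le(1)

# Made-up rows for illustration; not values from the dataset.
toy = pd.DataFrame({
    "country": ["X", "X", "X", "Y", "Y", "Y"],
    "alt_lev": [0, 0, 0, 0, 1, 2],
    "tsunami": [1, 1, 1, 0, 1, 0],
})
flags = constant_columns(toy, "country", ["alt_lev", "tsunami"])
print(flags)
```

Running a check like this over all countries before plotting would surface cases like the one described above without reading each heatmap by eye.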

### Alert level Distribution

In [None]:
van_alert = van['alert'].value_counts().reset_index()

In [None]:
fig = px.pie(van_alert,
             names='alert',
             values='count',
              color='alert',
             color_discrete_map=alert_colors,
             hole=0.55)

fig.add_annotation(
    text="<b>Alert Level<br>of Earthquakes<br>in Vanuatu</b>",
    x=0.5,
    y=0.5,
    font=dict(size=16, color='teal'),
    showarrow=False)

fig.update_layout(
    title=None,
    showlegend=True,
    height=500,
    width=600)

fig.update_traces(
    textposition='inside',
    textinfo='value+percent')

fig.show()

### Impact Analysis: Vanuatu vs Global

In [None]:
# Define the columns and titles
columns = ["sig", "cdi", "mmi", "magnitude"]
titles = ("Significance", "Max reported intensity", "Max instrumental intensity", "Magnitude")
in_color = ['#636EFA', '#EF553B', '#00CC96', '#AB63FA']

# Create a subplot figure with 2 rows and 4 columns
fig = make_subplots(
    rows=2, cols=4,
    subplot_titles=titles + titles,
    vertical_spacing=0.1)

# Add Row 1: Global
for i, col in enumerate(columns):
    fig.add_trace(
        go.Box(
            y=df[col],
            name=col,
            marker_color=in_color[i],
            boxpoints='all'
        ),
        row=1, col=i+1)

# Add Row 2: Vanuatu
for i, col in enumerate(columns):
    fig.add_trace(
        go.Box(
            y=van[col],
            name=col,
            marker_color=in_color[i],
            boxpoints='all'
        ),
        row=2, col=i+1)

fig.update_layout(
    title_text="Earthquake Impact Analysis: Global (Top) vs Vanuatu (Bottom)",
    showlegend=False,
    height=800,
    width=1000)

# Title Font Size
fig.update_annotations(font_size=12)

fig.show()

### Parallel: Vanuatu

In [None]:
cols = ['magnitude', 'mmi', 'cdi', 'sig', 'depth', 'gap', 'tsunami', 'alt_lev']

fig = px.parallel_coordinates(
    van,
    dimensions=cols,
    color='magnitude',
    color_continuous_scale=px.colors.diverging.Tealrose, # color
    labels=cols)

fig.show()

## Country: Solomon Islands

In [None]:
solo = df[df['country']=='Solomon Islands']

In [None]:
solo_tsu = tsunami_df[tsunami_df['country']=='Solomon Islands']

In [None]:
solo_crr = (solo
                 .drop(['mag_norm', 'year'], axis=1)
                 .select_dtypes(include='number')
                 .corr())

In [None]:
solo_tsu_crr = (solo_tsu
                 .drop(['mag_norm', 'year'], axis=1)
                 .select_dtypes(include='number')
                 .corr())

### Correlation

In [None]:
# Create subplot layout
fig = make_subplots(
    rows=2, cols=1,
    subplot_titles=[
        "Correlation Matrix of Earthquake in Solomon Islands",
        "Correlation Matrix of Earthquake in Solomon Islands (Tsunami)"],
    vertical_spacing=0.12) # gap between plots

# First heatmap
fig1 = px.imshow(
    solo_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Second heatmap
fig2 = px.imshow(
    solo_tsu_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Add traces to subplots
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=2, col=1)

# Share the same color scale
fig.update_layout(
    coloraxis=dict(colorscale='RdBu_r'),
    margin=dict(l=20,r=20,t=35,b=25))

# Apply shared color axis
fig.data[0].update(coloraxis='coloraxis')
fig.data[1].update(coloraxis='coloraxis')

fig.show()

The heatmap shows that the **alert level** data for the **Solomon Islands** are **suspicious**. Almost all recorded earthquakes are labeled as **green alert**, with only one reported as yellow alert. This unusual pattern raises concerns about the ***reliability of the data.*** Therefore, in our future analysis, we will **closely monitor** the Solomon Islands' alert level data to avoid biased predictions.
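
One simple way to keep an eye on this kind of imbalance is to report the share taken by the most common alert level. The sketch below does that for any label column. `dominant_share` is a hypothetical helper, the toy labels are invented, and the cutoff at which a share counts as suspicious is left to the analyst.

```python
import pandas as pd

def dominant_share(labels):
    """Most common label and the fraction of non-missing rows it covers."""
    shares = labels.value_counts(normalize=True)
    return shares.index[0], float(shares.iloc[0])

# Made-up labels for illustration; not values from the dataset.
toy_alert = pd.Series(["green"] * 9 + ["yellow"])
top_label, top_share = dominant_share(toy_alert)
print(top_label, top_share)
```

In this notebook the call would look like `dominant_share(solo['alert'])`.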

### Alert level Distribution

In [None]:
solo_alert = solo['alert'].value_counts().reset_index()

In [None]:
fig = px.pie(solo_alert,
             names='alert',
             values='count',
              color='alert',
             color_discrete_map=alert_colors,
             hole=0.55)

fig.add_annotation(
    text="<b>Alert Level<br>of Earthquakes<br>in Solomon Islands</b>",
    x=0.5,
    y=0.5,
    font=dict(size=16, color='teal'),
    showarrow=False)

fig.update_layout(
    title=None,
    showlegend=True,
    height=500,
    width=600)

fig.update_traces(
    textposition='inside',
    textinfo='value+percent')

fig.show()

### Impact Analysis: Solomon Islands vs Global

In [None]:
# Define the columns and titles
columns = ["sig", "cdi", "mmi", "magnitude"]
titles = ("Significance", "Max reported intensity", "Max instrumental intensity", "Magnitude")
in_color = ['#636EFA', '#EF553B', '#00CC96', '#AB63FA']

# Create a subplot figure with 2 rows and 4 columns
fig = make_subplots(
    rows=2, cols=4,
    subplot_titles=titles + titles,
    vertical_spacing=0.1)

# Add Row 1: Global
for i, col in enumerate(columns):
    fig.add_trace(
        go.Box(
            y=df[col],
            name=col,
            marker_color=in_color[i],
            boxpoints='all'),
        row=1, col=i+1)

# Add Row 2: Solomon Islands
for i, col in enumerate(columns):
    fig.add_trace(
        go.Box(
            y=solo[col],
            name=col,
            marker_color=in_color[i],
            boxpoints='all'),
        row=2, col=i+1)

fig.update_layout(
    title_text="Earthquake Impact Analysis: Global (Top) vs Solomon Islands (Bottom)",
    showlegend=False,
    height=800,
    width=1000)

# Title Font Size
fig.update_annotations(font_size=12)

fig.show()

### Parallel: Solomon Islands

In [None]:
cols = ['magnitude', 'mmi', 'cdi', 'sig', 'alt_lev', 'depth', 'gap', 'tsunami']

fig = px.parallel_coordinates(
    solo,
    dimensions=cols,
    color='magnitude',
    color_continuous_scale=px.colors.diverging.Tealrose, # color
    labels=cols)

fig.show()

## Country: USA

In [None]:
usa = df[df['country']=='USA']

In [None]:
usa_tsu = tsunami_df[tsunami_df['country']=='USA']

In [None]:
usa_crr = (usa
                 .drop(['mag_norm', 'year'], axis=1)
                 .select_dtypes(include='number')
                 .corr())

In [None]:
usa_tsu_crr = (usa_tsu
                 .drop(['mag_norm', 'year'], axis=1)
                 .select_dtypes(include='number')
                 .corr())

### Correlation

In [None]:
# Create subplot layout
fig = make_subplots(
    rows=2, cols=1,
    subplot_titles=[
        "Correlation Matrix of Earthquake in USA",
        "Correlation Matrix of Earthquake in USA (Tsunami)"],
    vertical_spacing=0.12) # gap between plots

# First heatmap
fig1 = px.imshow(
    usa_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Second heatmap
fig2 = px.imshow(
    usa_tsu_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Add traces to subplots
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=2, col=1)

# Share the same color scale
fig.update_layout(
    coloraxis=dict(colorscale='RdBu_r'),
    margin=dict(l=20,r=20,t=35,b=25))

# Apply shared color axis
fig.data[0].update(coloraxis='coloraxis')
fig.data[1].update(coloraxis='coloraxis')

fig.show()

### Alert level Distribution

In [None]:
usa_alert = usa['alert'].value_counts().reset_index()

In [None]:
fig = px.pie(usa_alert,
             names='alert',
             values='count',
              color='alert',
             color_discrete_map=alert_colors,
             hole=0.55)

fig.add_annotation(
    text="<b>Alert Level<br>of Earthquakes<br>in USA</b>",
    x=0.5,
    y=0.5,
    font=dict(size=16, color='teal'),
    showarrow=False)

fig.update_layout(
    title=None,
    showlegend=True,
    height=500,
    width=600)

fig.update_traces(
    textposition='inside',
    textinfo='value+percent')

fig.show()

### Impact Analysis: USA vs Global

In [None]:
# Define the columns and titles
columns = ["sig", "cdi", "mmi", "magnitude"]
titles = ("Significance", "Max reported intensity", "Max instrumental intensity", "Magnitude")
in_color = ['#636EFA', '#EF553B', '#00CC96', '#AB63FA']

# Create a subplot figure with 2 rows and 4 columns
fig = make_subplots(
    rows=2, cols=4,
    subplot_titles=titles + titles,
    vertical_spacing=0.1)

# Add Row 1: Global
for i, col in enumerate(columns):
    fig.add_trace(
        go.Box(
            y=df[col],
            name=col,
            marker_color=in_color[i],
            boxpoints='all'
        ),
        row=1, col=i+1)

# Add Row 2: USA
for i, col in enumerate(columns):
    fig.add_trace(
        go.Box(
            y=usa[col],
            name=col,
            marker_color=in_color[i],
            boxpoints='all'
        ),
        row=2, col=i+1)

fig.update_layout(
    title_text="Earthquake Impact Analysis: Global (Top) vs USA (Bottom)",
    showlegend=False,
    height=800,
    width=1000)

# Title Font Size
fig.update_annotations(font_size=12)

fig.show()

### Parallel coordinates: USA

In [None]:
cols = ['magnitude', 'mmi', 'cdi', 'sig', 'alt_lev', 'depth', 'gap', 'tsunami']

fig = px.parallel_coordinates(
    usa,
    dimensions=cols,
    color='magnitude',
    # `labels` must be a dict; without it the column names are used as axis labels
    color_continuous_scale=px.colors.diverging.Tealrose)

fig.show()

## Seismic station data analysis

Immediately after an earthquake is recorded, only a limited set of information is available, such as magnitude, depth, distance, and gap. In this section, we analyze these early-stage parameters to identify patterns that can support impact estimation, tsunami warning decisions, and emergency crisis management. Understanding how these variables relate to earthquake severity and potential hazards is critical for making timely and informed responses when complete data are not yet available.

In [None]:
# Keep only the parameters that are available shortly after an event is recorded
imm_df = df.drop(['cdi', 'mmi', 'sig', 'alert', 'magType', 'tsunami', 'alt_lev', 'continent'], axis=1)
imm_df.head(5)

Unnamed: 0,title,magnitude,date_time,net,nst,dmin,gap,depth,latitude,longitude,location,country,year,mag_norm
0,"M 7.0 - 18 km SW of Malango, Solomon Islands",7.0,2022-11-22 02:03:00,us,117,0.509,17.0,14.0,-9.7963,159.596,"Malango, Solomon Islands",Solomon Islands,2022,4572.0
1,"M 6.9 - 204 km SW of Bengkulu, Indonesia",6.9,2022-11-18 13:37:00,us,99,2.229,34.0,25.0,-4.9559,100.738,"Bengkulu, Indonesia",Indonesia,2022,4054.0
2,M 7.0 -,7.0,2022-11-12 07:09:00,us,147,3.125,18.0,579.0,-20.0508,-178.346,,Fiji,2022,4572.0
3,"M 7.3 - 205 km ESE of Neiafu, Tonga",7.3,2022-11-11 10:48:00,us,149,1.865,21.0,37.0,-19.2918,-172.129,"Neiafu, Tonga",Tonga,2022,6562.0
4,M 6.6 -,6.6,2022-11-09 10:14:00,us,131,4.998,27.0,624.464,-25.5948,178.278,,,2022,2825.0


In [None]:
imm_df['net'].value_counts()

net,count
us,747
ak,11
official,8
nc,3
duputel,3
pt,2
at,2
ci,2
hv,2
nn,1


**'net'**: The ID of a data contributor. Identifies the network considered to be the preferred source of information for this event.

Most events come from the **us** network (747 of 782), so this column carries little information and we drop it too.
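
The decision to drop a near-constant column can be checked numerically. The sketch below is only an illustration: `dominant_share` is a hypothetical helper that is not part of the notebook, and the toy series reuses the `us` count from the output above while lumping the remaining rows into a placeholder `other` label.

```python
import pandas as pd


def dominant_share(series):
    """Return the most frequent value and the fraction of rows it covers."""
    shares = series.value_counts(normalize=True)
    return shares.index[0], float(shares.iloc[0])


# Toy stand-in for imm_df['net']: 747 'us' rows out of 782 in total.
# 'other' is a placeholder label for all remaining networks (an assumption).
toy_net = pd.Series(["us"] * 747 + ["other"] * 35)

value, share = dominant_share(toy_net)
print(f"{value}: {share:.1%} of rows")
```

In the notebook itself the same call would be `dominant_share(imm_df['net'])`. A share this close to 1 means the column can barely distinguish one event from another.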

The 'date_time' column is also unnecessary for our analysis, because the date and time of an event have **no** impact on the earthquake itself.

In [None]:
imm_df = imm_df.drop(['net', 'date_time'], axis=1)
imm_df.head(5)

Unnamed: 0,title,magnitude,nst,dmin,gap,depth,latitude,longitude,location,country,year,mag_norm
0,"M 7.0 - 18 km SW of Malango, Solomon Islands",7.0,117,0.509,17.0,14.0,-9.7963,159.596,"Malango, Solomon Islands",Solomon Islands,2022,4572.0
1,"M 6.9 - 204 km SW of Bengkulu, Indonesia",6.9,99,2.229,34.0,25.0,-4.9559,100.738,"Bengkulu, Indonesia",Indonesia,2022,4054.0
2,M 7.0 -,7.0,147,3.125,18.0,579.0,-20.0508,-178.346,,Fiji,2022,4572.0
3,"M 7.3 - 205 km ESE of Neiafu, Tonga",7.3,149,1.865,21.0,37.0,-19.2918,-172.129,"Neiafu, Tonga",Tonga,2022,6562.0
4,M 6.6 -,6.6,131,4.998,27.0,624.464,-25.5948,178.278,,,2022,2825.0


In [None]:
imm_df.describe()

Unnamed: 0,magnitude,nst,dmin,gap,depth,latitude,longitude,year,mag_norm
count,782.0,782.0,782.0,782.0,782.0,782.0,782.0,782.0,782.0
mean,6.941125,230.250639,1.325757,25.03899,75.883199,3.5381,52.609199,2012.280051,5144.595908
std,0.445514,250.188177,2.218805,24.225067,137.277078,27.303429,117.898886,6.099439,4695.111022
min,6.5,0.0,0.0,0.0,2.7,-61.8484,-179.968,2001.0,2504.0
25%,6.6,0.0,0.0,14.625,14.0,-14.5956,-71.66805,2007.0,2825.0
50%,6.8,140.0,0.0,20.0,26.295,-2.5725,109.426,2013.0,3594.0
75%,7.1,445.0,1.863,30.0,49.75,24.6545,148.941,2017.0,5157.0
max,9.1,934.0,17.654,239.0,670.81,71.6312,179.662,2022.0,57306.0


In [None]:
imm_crr = imm_df.drop('mag_norm', axis=1).select_dtypes(include='number').corr()

### Correlation matrix: imm_df

In [None]:
fig = px.imshow(
    imm_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto',
    labels=dict(color='Correlation'))

fig.update_layout(
    title=dict(
        text="Correlation Matrix", x=0.5))

fig.show()

### Parallel coordinates: imm_df

In [None]:
cols = ['magnitude', 'nst', 'dmin', 'depth', 'gap']

fig = px.parallel_coordinates(
    imm_df,
    dimensions=cols,
    color='magnitude',
    # `labels` must be a dict; without it the column names are used as axis labels
    color_continuous_scale=px.colors.diverging.Tealrose)

fig.show()

### For Indonesia

In [None]:
amm_crr = (imm_df[imm_df['country']=='Indonesia']
           .drop('mag_norm', axis=1)
           .select_dtypes(include='number')
           .corr())

In [None]:
# Create subplot layout
fig = make_subplots(
    rows=2, cols=1,
    subplot_titles=[
        "Correlation Matrix of Earthquakes in Indonesia",
        "Correlation Matrix of Earthquakes Worldwide"],
    vertical_spacing=0.12) # gap between plots

# First heatmap
fig1 = px.imshow(
    amm_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Second heatmap
fig2 = px.imshow(
    imm_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Add traces to subplots
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=2, col=1)

# Color
fig.update_layout(
    coloraxis=dict(colorscale='RdBu_r'),
    margin=dict(l=20,r=20,t=35,b=25))

# Apply shared color axis
fig.data[0].update(coloraxis='coloraxis')
fig.data[1].update(coloraxis='coloraxis')

fig.show()

### For Japan

In [None]:
bmm_crr = (imm_df[imm_df['country']=='Japan']
           .drop('mag_norm', axis=1)
           .select_dtypes(include='number')
           .corr())

In [None]:
# Create subplot layout
fig = make_subplots(
    rows=2, cols=1,
    subplot_titles=[
        "Correlation Matrix of Earthquakes in Japan",
        "Correlation Matrix of Earthquakes Worldwide"],
    vertical_spacing=0.12) # gap between plots

# First heatmap
fig1 = px.imshow(
    bmm_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Second heatmap
fig2 = px.imshow(
    imm_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Add traces to subplots
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=2, col=1)

# Color
fig.update_layout(
    coloraxis=dict(colorscale='RdBu_r'),
    margin=dict(l=20,r=20,t=35,b=25))

# Apply shared color axis
fig.data[0].update(coloraxis='coloraxis')
fig.data[1].update(coloraxis='coloraxis')

fig.show()

### For Papua New Guinea

In [None]:
cmm_crr = (imm_df[imm_df['country']=='Papua New Guinea']
           .drop('mag_norm', axis=1)
           .select_dtypes(include='number')
           .corr())

In [None]:
# Create subplot layout
fig = make_subplots(
    rows=2, cols=1,
    subplot_titles=[
        "Correlation Matrix of Earthquakes in Papua New Guinea",
        "Correlation Matrix of Earthquakes Worldwide"],
    vertical_spacing=0.12) # gap between plots

# First heatmap
fig1 = px.imshow(
    cmm_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Second heatmap
fig2 = px.imshow(
    imm_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Add traces to subplots
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=2, col=1)

# Color
fig.update_layout(
    coloraxis=dict(colorscale='RdBu_r'),
    margin=dict(l=20,r=20,t=35,b=25))

# Apply shared color axis
fig.data[0].update(coloraxis='coloraxis')
fig.data[1].update(coloraxis='coloraxis')

fig.show()

### For USA

In [None]:
dmm_crr = (imm_df[imm_df['country']=='USA']
           .drop('mag_norm', axis=1)
           .select_dtypes(include='number')
           .corr())

In [None]:
# Create subplot layout
fig = make_subplots(
    rows=2, cols=1,
    subplot_titles=[
        "Correlation Matrix of Earthquakes in the USA",
        "Correlation Matrix of Earthquakes Worldwide"],
    vertical_spacing=0.12) # gap between plots

# First heatmap
fig1 = px.imshow(
    dmm_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Second heatmap
fig2 = px.imshow(
    imm_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Add traces to subplots
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=2, col=1)

# Color
fig.update_layout(
    coloraxis=dict(colorscale='RdBu_r'),
    margin=dict(l=20,r=20,t=35,b=25))

# Apply shared color axis
fig.data[0].update(coloraxis='coloraxis')
fig.data[1].update(coloraxis='coloraxis')

fig.show()

### For Chile

In [None]:
emm_crr = (imm_df[imm_df['country']=='Chile']
           .drop('mag_norm', axis=1)
           .select_dtypes(include='number')
           .corr())

In [None]:
# Create subplot layout
fig = make_subplots(
    rows=2, cols=1,
    subplot_titles=[
        "Correlation Matrix of Earthquakes in Chile",
        "Correlation Matrix of Earthquakes Worldwide"],
    vertical_spacing=0.12) # gap between plots

# First heatmap
fig1 = px.imshow(
    emm_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Second heatmap
fig2 = px.imshow(
    imm_crr,
    color_continuous_scale='RdBu_r',
    text_auto='.2f',
    aspect='auto')

# Add traces to subplots
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=2, col=1)

# Color
fig.update_layout(
    coloraxis=dict(colorscale='RdBu_r'),
    margin=dict(l=20,r=20,t=35,b=25))

# Apply shared color axis
fig.data[0].update(coloraxis='coloraxis')
fig.data[1].update(coloraxis='coloraxis')

fig.show()

## Conclusion

Among all the variables analyzed, only **'dmin'** (the horizontal distance from the epicenter to the nearest seismic station) and **'nst'** (the total number of seismic stations used to determine the earthquake location) show a moderate to strong negative relationship with each other. However, this relationship does not provide meaningful insight for predicting earthquake severity, potential hazards, or tsunami risk.
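
Reading such pairs off a heatmap by eye is error-prone, so the strongest relationships can also be listed programmatically. The sketch below is an illustration only: `strongest_pairs` and the 0.5 threshold are assumptions introduced here, and the toy values are made up and are not taken from the earthquake dataset.

```python
import numpy as np
import pandas as pd


def strongest_pairs(corr, threshold=0.5):
    """Return variable pairs with |r| >= threshold, strongest first."""
    # Keep the strict upper triangle so each pair appears once
    # and the diagonal (always 1.0) is excluded
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    pairs = pairs[pairs.abs() >= threshold]
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)


# Toy data (made-up numbers): 'dmin' falls as 'nst' rises, 'gap' is unrelated
toy = pd.DataFrame({
    "nst": [10, 20, 30, 40, 50],
    "dmin": [5.0, 4.1, 2.9, 2.2, 1.0],
    "gap": [3, 1, 4, 1, 5],
})
result = strongest_pairs(toy.corr(), threshold=0.5)
print(result)
```

Applied to the notebook's matrices, `strongest_pairs(imm_crr)` would list the pairs that exceed the chosen threshold globally, and the same call on a country-level matrix allows a direct comparison.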

Overall, while this dataset is *incomplete* and, in some cases, *unreliable*, it provides clear evidence that earthquakes are among the most uncertain natural disasters in terms of predicting their severity and the risk of tsunami.

Our analysis indicates that **earthquake location** and **magnitude** are the most important parameters for developing an effective tsunami warning system, predicting severity, and taking appropriate **precautions** against future earthquake impacts.