*Creating A Feedforward Neural Network In TensorFlow To Predict Earthquake Magnitudes*


Import libraries

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

The initial dataset was extremely messy, with misaligned columns. The first cleaning pass therefore realigned the columns rather than filling missing data. The resulting CSV, "clean_earthquake", was then read in with pandas, and the remaining null values in 'gap', 'dmin', 'rms' and 'depthError' were filled with each column's median or mean. No columns needed encoding, since the columns that would require it are not used as features for the model.

In [13]:
clean_eq = pd.read_csv('/Users/gerryjr/Desktop/EarthquakeProject/Data/clean_earthquake.csv')

clean_eq['gap'] = clean_eq['gap'].fillna(clean_eq['gap'].median())
clean_eq['dmin'] = clean_eq['dmin'].fillna(clean_eq['dmin'].mean())
clean_eq['rms'] = clean_eq['rms'].fillna(clean_eq['rms'].mean())
clean_eq['depthError'] = clean_eq['depthError'].fillna(clean_eq['depthError'].median())
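
As a small illustration of the imputation step, the sketch below fills gaps in a toy DataFrame and confirms that no nulls remain. The values are made up and are not taken from the earthquake dataset. The median is less sensitive to outliers than the mean, which is one plausible reason to prefer it for skewed columns; that motivation is an assumption, not something stated in this notebook.

```python
import numpy as np
import pandas as pd

# Toy data with missing values (hypothetical, for illustration only)
toy = pd.DataFrame({
    "gap": [40.0, np.nan, 60.0, 300.0],
    "rms": [0.2, 0.4, np.nan, 0.6],
})

# Median imputation: the outlier 300.0 does not pull the fill value upward
toy["gap"] = toy["gap"].fillna(toy["gap"].median())
# Mean imputation
toy["rms"] = toy["rms"].fillna(toy["rms"].mean())

# Count any nulls left anywhere in the frame
remaining_nulls = int(toy.isna().sum().sum())
print(toy)
print("Remaining nulls:", remaining_nulls)
```

Running the same null count on `clean_eq` after the fills is a quick way to confirm that the feature columns are complete.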





This pre-model visualization shows how the earthquake magnitudes are distributed, that is, how frequently each magnitude range occurs. It is good practice to look at the target's distribution before building the model.

In [None]:
plt.figure(figsize=(10, 6))
sns.histplot(clean_eq['mag'], kde=True, bins=30, color='blue')
plt.title('Distribution of Earthquake Magnitudes')
plt.xlabel('Magnitude')
plt.ylabel('Frequency')
plt.show()

Below is a geospatial visualization, created in Tableau, that plots each earthquake at its latitude and longitude. It is interactive and uses a color gradient based on magnitude. Each data point is labeled with its latitude, longitude, time, depth and magnitude.

[View Interactive Tableau Visualization](https://public.tableau.com/views/GeoSpacial_17382574271160/Geo-SPacialVisualization?:language=en-US&publish=yes&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link)


Now it is time to select features and engineer any necessary ones. As mentioned before, some columns had so many null values that they were eliminated completely. Three additional features are therefore engineered to give the model more information for the magnitude prediction: 'depth_squared', 'depth_error_ratio' and 'energy_release'. In particular, 'depth_error_ratio' is a relationship feature (depth error relative to depth) that can provide meaningful information. The final engineered feature is based on the Gutenberg-Richter relation. In seismology, the Gutenberg-Richter law relates the magnitude of earthquakes to the number of earthquakes of that magnitude or greater; the feature computed here uses the related energy-magnitude scaling, in which released energy grows as 10 ** (1.5 * magnitude). After trial and error with feature engineering, this feature made a significant difference in model accuracy. Note that it is computed directly from 'mag', the prediction target, so it passes target information to the model, which likely accounts for much of that improvement.

In [None]:
# Feature Engineering
clean_eq['depth_squared'] = clean_eq['depth'] ** 2
clean_eq['depth_error_ratio'] = clean_eq['depthError'] / clean_eq['depth']
# replace() and fillna() return new Series, so the results are assigned back
clean_eq['depth_error_ratio'] = clean_eq['depth_error_ratio'].replace([np.inf, -np.inf], np.nan)
clean_eq['depth_error_ratio'] = clean_eq['depth_error_ratio'].fillna(clean_eq['depth_error_ratio'].median())

# Gutenberg-Richter energy-magnitude scaling (derived from 'mag', the target)
clean_eq['energy_release'] = 10 ** (1.5 * clean_eq['mag'])
clean_eq['energy_release'] = np.clip(clean_eq['energy_release'], None, 1e10)  # Adjust threshold as needed
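
The ratio feature can produce infinite values when a depth of zero appears in the denominator. The sketch below walks through that case on a toy DataFrame (made-up values, hypothetical variable names). Because `Series.replace` and `Series.fillna` return new objects by default, each result is assigned back before the column is stored.

```python
import numpy as np
import pandas as pd

# Toy rows; the second has depth 0, which makes the ratio infinite
toy = pd.DataFrame({
    "depth": [10.0, 0.0, 5.0, 20.0],
    "depthError": [1.0, 2.0, 1.0, 4.0],
})

ratio = toy["depthError"] / toy["depth"]           # 0.1, inf, 0.2, 0.2
ratio = ratio.replace([np.inf, -np.inf], np.nan)   # treat infinities as missing
ratio = ratio.fillna(ratio.median())               # median of the finite values
toy["depth_error_ratio"] = ratio

print(toy)
```

After these steps every entry in `depth_error_ratio` is finite, which matters because a single infinity would break the later standardization step.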

Select the features for the model to learn from. Standard features such as depth, rms and gap are entered along with the engineered features described above. The target is magnitude ('mag').

In [16]:
# Features and Targets
X = clean_eq[['latitude',
            'longitude', 
            'depth', 
            'gap', 
            'dmin', 
            'rms', 
            'depthError', 
            'depth_squared', 
            'depth_error_ratio', 
            'energy_release'
            ]]
y = clean_eq['mag']  # Target variable (magnitude)



The features are first standardized with StandardScaler so that they share a common scale, which helps the network train.

The data is then split into training and test sets, with 20% held out for testing.

In [17]:
# Scale the features
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
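
A common alternative ordering is to split first and fit the scaler on the training rows only, so that test-set statistics do not influence the transformation. The sketch below shows that ordering on synthetic data; the names `X_demo`, `y_demo` and `demo_scaler` are hypothetical, and this is not the pipeline used in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 100 rows, 4 features
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(100, 4))
y_demo = rng.normal(size=100)

# Split first ...
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# ... then learn the mean and standard deviation from the training rows only
demo_scaler = StandardScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr)
X_te_scaled = demo_scaler.transform(X_te)  # reuse the training statistics

print(X_tr_scaled.shape, X_te_scaled.shape)
```

With this ordering the training columns have mean zero by construction, while the test columns are only approximately centered, which reflects how the model would see genuinely new data.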

Next, it is time to assemble the feedforward neural network. Sequential was imported from tensorflow.keras.models, and Dense and Dropout from tensorflow.keras.layers, in the first cell. The model has four Dense layers. The initial version overfit ("the model accuracy being 100%"), which is not a good sign: it means the model is relying too heavily on particular neurons or simply memorizing portions of the training data. The goal is for the model to recognize patterns so it can make accurate predictions on future data, not memorize the current data. Therefore, a Dropout layer follows each of the first two Dense layers and randomly drops 20% of that layer's outputs at each training step. This pushes the model to learn general patterns rather than memorize. The final layer has one neuron because the model predicts a single target.

In [None]:
# Build Feedforward Neural Network (FNN)
fnn = Sequential([
    Dense(128, input_dim=X_train.shape[1], activation='relu'),
    Dropout(0.2),
    Dense(64,activation='relu'),
    Dropout(0.2),
    Dense(32, activation='relu'),
    Dense(1)  # Single output neuron for magnitude
])
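
To get a sense of the model's size, the trainable parameters can be counted by hand. The sketch below assumes the 10 input features selected earlier and the four Dense layer widths shown above; Dropout layers add no trainable parameters. The helper `dense_params` is a hypothetical name used only for this illustration.

```python
# Layer widths: 10 input features, then the four Dense layers
layer_sizes = [10, 128, 64, 32, 1]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)  # [1408, 8256, 2080, 33] 11777
```

Calling `fnn.summary()` on the built model should report the same per-layer counts.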

Now that the model is built, it is time to compile, train, evaluate and predict. Compiling configures how the network learns. The 'adam' optimizer is a common default that adapts the step size for each weight as training proceeds. The loss ('mse', mean squared error) is the quantity minimized during training: the average squared difference between predictions and true values. The metric ('mae', mean absolute error) reports the average absolute difference between predictions and true values and is tracked for monitoring only.

After compiling, the model is trained for 50 epochs, where one epoch is a full pass over the training data.

Finally, the model predicts on the test features, and the first 10 predictions are printed alongside the first 10 true values.
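
To make the difference between an epoch and a single weight update concrete, the sketch below counts the updates implied by the settings used in this notebook (`test_size=0.2`, `validation_split=0.2`, `batch_size=10`, `epochs=50`). The row count of 10,000 is a hypothetical placeholder, since the dataset size is not stated here, and the exact rounding applied by the libraries may differ slightly.

```python
import math

n_rows = 10_000          # hypothetical dataset size, not from the notebook
test_size = 0.2          # fraction held out by train_test_split
validation_split = 0.2   # fraction of the training set held out by fit()
batch_size = 10
epochs = 50

n_test = int(n_rows * test_size)
n_train_full = n_rows - n_test                 # rows passed to fit()
n_val = int(n_train_full * validation_split)   # rows used for validation only
n_fit = n_train_full - n_val                   # rows that drive weight updates

steps_per_epoch = math.ceil(n_fit / batch_size)
total_updates = steps_per_epoch * epochs

print(n_fit, steps_per_epoch, total_updates)   # 6400 640 32000
```

Under these assumptions each epoch performs 640 weight updates, so a small batch size such as 10 means many updates per pass over the data.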

In [None]:
# Compile the FNN
fnn.compile(optimizer='adam', loss='mse', metrics=['mae'])

# Train the FNN
history = fnn.fit(X_train, y_train, epochs=50, batch_size=10, validation_split=0.2)

# Evaluate the model
loss, mae = fnn.evaluate(X_test, y_test)
print(f"Loss: {loss}, Mean Absolute Error: {mae}")

# Make predictions for the entire test set
pred = fnn.predict(X_test)

# Print the first 10 predictions and true values for verification
print("First 10 Predictions:", pred[:10].flatten())
print("First 10 True Values:", y_test[:10].values)


As the cell above ran, the training loss dropped from epoch to epoch, which indicates the model was fitting the training data better over time. The printed predictions are also fairly close to the true values.

Now let's display the results of the MSE and MAE:

In [None]:
# Calculate metrics for the entire test set
mae = mean_absolute_error(y_test, pred)
mse = mean_squared_error(y_test, pred)
print("Mean Absolute Error (MAE):", mae)
print("Mean Squared Error (MSE):", mse)
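
For reference, both metrics are simple averages over the prediction errors. The sketch below computes them by hand with NumPy on four made-up magnitude pairs (the arrays are illustrative, not model output), and also derives the RMSE, which puts the squared error back into magnitude units.

```python
import numpy as np

# Made-up true and predicted magnitudes, for illustration only
y_true_demo = np.array([4.5, 5.0, 6.2, 3.8])
y_pred_demo = np.array([4.7, 4.9, 5.6, 4.0])

errors = y_pred_demo - y_true_demo       # 0.2, -0.1, -0.6, 0.2

mae_demo = np.mean(np.abs(errors))       # (0.2 + 0.1 + 0.6 + 0.2) / 4 = 0.275
mse_demo = np.mean(errors ** 2)          # (0.04 + 0.01 + 0.36 + 0.04) / 4 = 0.1125
rmse_demo = np.sqrt(mse_demo)            # square root of the MSE

print("MAE:", mae_demo)
print("MSE:", mse_demo)
print("RMSE:", rmse_demo)
```

Because the errors are squared, the single large miss (0.6) dominates the MSE far more than it does the MAE.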




Now prepare the data for a visualization in Tableau, specifically the relationship between depth and error.

In [21]:
# Prepare the data for Tableau visualization
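# A flat list of names gives ordinary single-level columns (a nested list would create a MultiIndex)
# Note: the values in X_test are still the standardized features
X_test = pd.DataFrame(X_test, columns=['latitude',
            'longitude', 
            'depth', 
            'gap', 
            'dmin', 
            'rms', 
            'depthError', 
            'depth_squared', 
            'depth_error_ratio', 
            'energy_release'
            ])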
y_test = y_test.reset_index(drop=True)
pred_series = pd.Series(pred.flatten(), name="Predicted_Magnitude", index=y_test.index)

# Add True Magnitude and Predicted Magnitude to the DataFrame
X_test['True_Magnitude'] = y_test
X_test['Predicted_Magnitude'] = pred_series
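
One way to examine the relationship between depth and error is to add an absolute-error column and average it within depth bands. The sketch below does this on a toy DataFrame: the rows are made up, the depths are in kilometers, and the bin edges (70 km and 300 km) are illustrative choices. In this notebook, `X_test` holds standardized values, so its 'depth' column would be in scaled units rather than kilometers.

```python
import pandas as pd

# Made-up rows for illustration; not model output
demo = pd.DataFrame({
    "depth": [5.0, 12.0, 35.0, 80.0, 150.0, 400.0],
    "True_Magnitude": [4.5, 5.0, 4.8, 5.5, 6.0, 6.5],
    "Predicted_Magnitude": [4.6, 4.9, 4.5, 5.1, 5.2, 5.5],
})

# Absolute prediction error per event
demo["Abs_Error"] = (demo["True_Magnitude"] - demo["Predicted_Magnitude"]).abs()

# Group events into depth bands (edges in km are illustrative)
demo["Depth_Bin"] = pd.cut(
    demo["depth"],
    bins=[0, 70, 300, 700],
    labels=["shallow", "intermediate", "deep"],
)

# Mean absolute error within each band
error_by_depth = demo.groupby("Depth_Bin", observed=True)["Abs_Error"].mean()
print(error_by_depth)
```

A table like `error_by_depth` can be exported alongside the per-event rows to back the density plot with numbers.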

After preparing the data, it is brought into Tableau and visualized in a density plot. The plot indicates that prediction error is lower for shallow earthquakes and grows as depth increases. This is somewhat expected, and it goes back to my previous point that more reliable data yields more accurate predictions.

In [None]:

# Create a DataFrame with true and predicted values
results_df = pd.DataFrame({
    'True Magnitude': y_test.values.flatten(),
    'Predicted Magnitude': pred.flatten()})
# Save to CSV
results_df.to_csv('true_vs_predicted.csv', index=False)

print("CSV file 'true_vs_predicted.csv' created successfully!")


With the CSV saved, the visualization below is a scatter plot of the true and predicted magnitudes.

[View Tableau Visualization](https://public.tableau.com/views/TrueVSPredictedMagnitudes/TrueVSPredictedMagnitudes)


The scatter plot above compares the true magnitudes of recorded earthquakes with the magnitudes predicted by the trained neural network. The diagonal reference line indicates perfect predictions. Points that deviate from this line show cases where the model misestimated the true magnitude. Overall, the model performs well for most cases, particularly for moderate magnitudes. Errors are higher for stronger earthquakes, which suggests potential areas for improvement, such as additional feature engineering or refining the model's architecture.
