# STAT495 - Earthquake Analysis Project

## Seismic Activity Analysis in Turkey (1990-2025)

---

### Project Overview

This project analyzes earthquake records from the database of Turkey's AFAD (Disaster and Emergency Management Authority), covering 1990 to 2025. The analysis examines the spatial distribution and temporal patterns of seismic activity, and its association with geological and environmental factors: soil class, fault proximity, lunar phase, eclipses, atmospheric pressure, and population density.

### Classification Criteria

- **Earthquake**: Magnitude >= 4.0
- **Tremor**: Magnitude < 4.0

## Notebook Structure

### Foundation
| Notebook | Description |
|----------|-------------|
| **00_project_overview** | Project introduction, data dictionary, and methodology |
| **01_data_loading_preprocessing** | Data loading, cleaning, preprocessing, and export |

### Exploratory Analysis
| Notebook | Description |
|----------|-------------|
| **02_exploratory_data_analysis** | Descriptive statistics and basic visualizations |
| **03_kde_density_analysis** | Kernel Density Estimation for earthquake hotspots |
| **04_spatial_clustering_analysis** | K-Means clustering of seismic activity |

### Seismological Analysis
| Notebook | Description |
|----------|-------------|
| **05_gutenberg_richter_analysis** | Frequency-magnitude distribution and b-value |
| **06_seismic_gap_analysis** | Identification of seismic gaps along fault lines |
| **07_time_series_analysis** | Temporal patterns, trends, and inter-event times |

### Environmental Factors
| Notebook | Description |
|----------|-------------|
| **08_soil_classification_analysis** | Earthquake patterns by soil type |
| **09_fault_proximity_analysis** | Relationship between earthquakes and active faults |
| **10_moon_phase_analysis** | Lunar cycle correlation analysis |
| **11_atmospheric_pressure_analysis** | Atmospheric pressure correlations |
| **12_eclipse_analysis** | Solar and lunar eclipse correlation analysis |

### Risk & Statistical Modeling
| Notebook | Description |
|----------|-------------|
| **13_population_density_risk** | Population exposure and risk assessment |
| **14_statistical_modeling_rq1_6** | Statistical models for Research Questions 1-6 |
| **15_advanced_statistical_modeling** | Advanced models for Research Questions A-I |

### Synthesis
| Notebook | Description |
|----------|-------------|
| **16_summary_conclusions** | Key findings and conclusions |

## Data Sources

### Primary Dataset
- **Source**: AFAD (Afet ve Acil Durum Yönetimi Başkanlığı)
- **File**: `afad_full_historical_1990_2025.csv`
- **Period**: 1990-2025
- **Records**: More than 537,000 seismic events

### Supporting Datasets

| Dataset | Description | Source |
|---------|-------------|--------|
| Soil Classification | Soil types and seismic hazard levels by province/district | TBDY 2018 |
| Active Faults | Major fault lines with coordinates and slip rates | MTA |
| GPS Velocities | Crustal movement data | Academic sources |
| Moon Phases | Daily lunar data and phase information | Astronomical calculations |
| Atmospheric Pressure | Regional pressure measurements | Meteorological data |
| Solar Eclipses | Solar eclipse events (1990-2025) | NASA Eclipse Data |
| Lunar Eclipses | Lunar eclipse events (1990-2025) | NASA Eclipse Data |
| Population Density | Province-level population density | TÜİK |

## Data Dictionary

### Earthquake Data (`afad_full_historical_1990_2025.csv`)

| Column | Type | Description |
|--------|------|-------------|
| `eventID` | int | Unique earthquake identifier |
| `date` | datetime | Event timestamp (UTC) |
| `latitude` | float | Latitude (degrees) |
| `longitude` | float | Longitude (degrees) |
| `depth` | float | Focal depth (km) |
| `magnitude` | float | Earthquake magnitude |
| `province` | str | Province name (TR) |
| `district` | str | District name (TR) |

### Derived Variables

| Variable | Description |
|----------|-------------|
| `category` | 'Earthquake' (M >= 4.0) or 'Tremor' (M < 4.0) |
| `year` | Year extracted from `date` |
| `month` | Month extracted from `date` (1-12) |
| `hour` | Hour extracted from `date` (0-23) |
| `day_of_week` | Day of week (0=Monday, 6=Sunday) |
| `energy_joules` | Seismic energy release (E = 10^(1.5*M + 4.8)) |
| `fault_distance_km` | Distance to nearest active fault |
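
The date-based and magnitude-based variables above can be derived in a few lines of pandas. The sketch below is illustrative only: the three rows are synthetic placeholders rather than AFAD records, and `add_derived_columns` is a hypothetical helper name, not necessarily what the project's `src/` modules use. `fault_distance_km` is omitted because it needs the fault dataset.

```python
import numpy as np
import pandas as pd

# Synthetic placeholder rows (not real AFAD records).
events = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-01-06 01:17:00",
        "2021-07-18 11:51:00",
        "2022-03-02 08:03:00",
    ]),
    "magnitude": [5.0, 4.0, 3.9],
})


def add_derived_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the derived variables from the table above."""
    out = df.copy()
    # Classification criteria: M >= 4.0 is an earthquake, otherwise a tremor.
    out["category"] = np.where(out["magnitude"] >= 4.0, "Earthquake", "Tremor")
    out["year"] = out["date"].dt.year
    out["month"] = out["date"].dt.month
    out["hour"] = out["date"].dt.hour
    out["day_of_week"] = out["date"].dt.dayofweek  # 0=Monday, 6=Sunday
    # Seismic energy in joules: E = 10^(1.5*M + 4.8)
    out["energy_joules"] = 10.0 ** (1.5 * out["magnitude"] + 4.8)
    return out


derived = add_derived_columns(events)
print(derived[["category", "year", "day_of_week", "energy_joules"]])
```

Note that the boundary case M = 4.0 falls in the 'Earthquake' category because the comparison is inclusive.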

## Methodology Overview

### 1. Spatial Analysis
- **Kernel Density Estimation (KDE)**: Identifying earthquake hotspots using Gaussian kernel smoothing
- **Spatial Clustering**: K-Means algorithm for seismic zone identification
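
As a rough illustration of the two spatial methods, the sketch below runs a Gaussian KDE and K-Means on synthetic epicenters. Everything specific here is an assumption for the example: the two cluster locations, the bandwidth factor `bw_method=0.2`, the grid extent, and `n_clusters=2`. It also treats longitude and latitude as planar coordinates, which is a simplification.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Two synthetic epicenter clusters as (longitude, latitude); placeholders only.
cluster_a = rng.normal(loc=[29.0, 40.7], scale=0.2, size=(200, 2))
cluster_b = rng.normal(loc=[38.0, 38.0], scale=0.2, size=(200, 2))
coords = np.vstack([cluster_a, cluster_b])

# gaussian_kde expects an array of shape (n_dimensions, n_points).
kde = gaussian_kde(coords.T, bw_method=0.2)

# Evaluate the density on a regular grid and locate the strongest hotspot.
lon_grid, lat_grid = np.meshgrid(np.linspace(26, 45, 96), np.linspace(36, 42, 61))
grid_points = np.vstack([lon_grid.ravel(), lat_grid.ravel()])
density = kde(grid_points).reshape(lon_grid.shape)
peak_idx = np.unravel_index(np.argmax(density), density.shape)
hotspot = (lon_grid[peak_idx], lat_grid[peak_idx])

# K-Means assigns every event to one of n_clusters seismic zones.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(coords)
centers = kmeans.cluster_centers_[np.argsort(kmeans.cluster_centers_[:, 0])]

print("Density peak (lon, lat):", hotspot)
print("Cluster centers (lon, lat):", centers)
```

KDE produces a continuous intensity surface, while K-Means produces a hard partition, so the two views complement each other. In practice the number of clusters would be chosen with a criterion such as the elbow or silhouette method.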

### 2. Seismological Analysis
- **Gutenberg-Richter Law**: log₁₀(N) = a - bM
  - b-value estimation using Maximum Likelihood Estimation (MLE)
  - Magnitude of Completeness (Mc) using Maximum Curvature method
- **Seismic Gap Analysis**: Identification of fault segments with low seismic activity

### 3. Temporal Analysis
- **Trend Analysis**: Mann-Kendall test for monotonic trends
- **Seasonality**: Seasonal decomposition of earthquake frequency
- **Inter-Event Time**: Distribution fitting (Exponential, Weibull, Gamma)
- **Cumulative Analysis**: Energy release and seismic moment accumulation
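
The trend test and the inter-event time fitting can be sketched with SciPy alone. Two assumptions are worth flagging. First, SciPy has no dedicated Mann-Kendall function, so the example uses Kendall's tau between the time index and the series, which tests the same monotonic-trend hypothesis. Second, the candidate distributions are compared by AIC with the location fixed at zero, which is one reasonable choice rather than the project's confirmed procedure. All data are synthetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic annual event counts with an upward drift (placeholder data).
years = np.arange(1990, 2026)
counts = 100 + 5 * (years - 1990) + rng.normal(0, 10, size=years.size)

# Kendall's tau against time: a rank-based test for a monotonic trend.
tau, p_trend = stats.kendalltau(years, counts)

# Synthetic inter-event times in hours, drawn from an exponential distribution.
waits = rng.exponential(scale=12.0, size=2000)

candidates = {
    "expon": stats.expon,
    "weibull_min": stats.weibull_min,
    "gamma": stats.gamma,
}
fits, aic = {}, {}
for name, dist in candidates.items():
    params = dist.fit(waits, floc=0)        # waiting times start at zero
    log_lik = np.sum(dist.logpdf(waits, *params))
    n_free = len(params) - 1                # loc is fixed, so it is not counted
    fits[name] = params
    aic[name] = 2 * n_free - 2 * log_lik

print(f"Kendall tau = {tau:.2f}, p = {p_trend:.2g}")
for name in sorted(aic, key=aic.get):
    print(f"{name:12s} AIC = {aic[name]:.1f}")
```

A lower AIC indicates a better trade-off between fit and complexity. An exponential fit corresponds to a Poisson process, so a clearly better Weibull or Gamma fit would suggest temporal clustering, for example from aftershock sequences.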

### 4. Correlation Analysis
- **Soil Effects**: ANOVA and post-hoc tests for magnitude/depth differences by soil class
- **Fault Proximity**: Relationship between distance to faults and earthquake characteristics
- **Lunar Phases**: Chi-square test for earthquake-moon phase independence
- **Atmospheric Pressure**: Pearson correlation analysis
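
The tests listed above map directly onto `scipy.stats` functions. The sketch below uses a synthetic sample in which moon phase and soil class are assigned at random, so no real effect is present. The column names `moon_phase` and `soil_class` and the class labels are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(2)
n = 4000

# Synthetic sample: phase and soil class are independent of magnitude by design.
sample = pd.DataFrame({
    "moon_phase": rng.choice(["New", "First Quarter", "Full", "Last Quarter"], size=n),
    "soil_class": rng.choice(["ZA", "ZB", "ZC"], size=n),
    "magnitude": rng.normal(3.0, 0.8, size=n),
})
sample["category"] = np.where(sample["magnitude"] >= 4.0, "Earthquake", "Tremor")

# Chi-square test of independence between moon phase and event category.
table = pd.crosstab(sample["moon_phase"], sample["category"])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# One-way ANOVA and its rank-based counterpart for magnitude by soil class.
groups = [g["magnitude"].to_numpy() for _, g in sample.groupby("soil_class")]
f_stat, p_anova = stats.f_oneway(*groups)
h_stat, p_kw = stats.kruskal(*groups)

print(f"Chi-square: chi2 = {chi2:.2f}, dof = {dof}, p = {p_chi2:.3f}")
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.3f}")
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kw:.3f}")
```

With a catalog of several hundred thousand events, even negligible effects can reach statistical significance, so effect sizes (for example Cramér's V or eta squared) are worth reporting next to the p-values.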

## Key Formulas

### Gutenberg-Richter Law
$$\log_{10}(N) = a - bM$$

Where:
- N = Cumulative number of earthquakes ≥ magnitude M
- a = Activity level constant
- b = b-value (typically ~1.0)

### b-value (MLE)
$$b = \frac{\log_{10}(e)}{\bar{M} - M_c}$$
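
A minimal sketch of both estimators follows, assuming continuous magnitudes and a bin width of 0.1 for the maximum curvature step. The function names are hypothetical and may differ from those in `src/seismology.py`. The synthetic catalog is drawn with b = 1.0 and is complete above M 2.0 by construction.

```python
import numpy as np


def magnitude_of_completeness(mags, bin_width=0.1):
    """Maximum curvature: lower edge of the most populated magnitude bin."""
    mags = np.asarray(mags, dtype=float)
    start = np.floor(mags.min() / bin_width) * bin_width
    edges = np.arange(start, mags.max() + bin_width, bin_width)
    counts, edges = np.histogram(mags, bins=edges)
    return edges[np.argmax(counts)]


def b_value_mle(mags, mc):
    """b = log10(e) / (mean(M) - Mc), using only events with M >= Mc."""
    mags = np.asarray(mags, dtype=float)
    complete = mags[mags >= mc]
    return np.log10(np.e) / (complete.mean() - mc)


# Synthetic Gutenberg-Richter catalog: magnitudes above Mc are exponentially
# distributed with rate b * ln(10).
rng = np.random.default_rng(3)
true_b, true_mc = 1.0, 2.0
mags = true_mc + rng.exponential(scale=1.0 / (true_b * np.log(10)), size=20000)

mc_est = magnitude_of_completeness(mags)
b_est = b_value_mle(mags, mc_est)
print(f"Mc = {mc_est:.1f}, b = {b_est:.2f}")
```

When magnitudes are reported in bins of width ΔM, a common correction replaces M_c in the denominator with M_c - ΔM/2. Real catalogs also thin out gradually below M_c, so the maximum curvature estimate tends to be on the low side there.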

### Seismic Energy
$$E = 10^{1.5M + 4.8} \text{ (Joules)}$$

### Seismic Moment
$$M_0 = 10^{1.5M + 16.1} \text{ (dyne-cm)}$$
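
As a quick worked example of the energy and moment formulas, the sketch below evaluates both for three magnitudes, converts the moment from dyne-cm to N·m (1 N·m = 10^7 dyne-cm), and accumulates energy as in the cumulative analysis. The function names are illustrative.

```python
import numpy as np

DYNE_CM_PER_NEWTON_METER = 1e7


def seismic_energy_joules(magnitude):
    """E = 10^(1.5*M + 4.8), in joules."""
    return 10.0 ** (1.5 * np.asarray(magnitude, dtype=float) + 4.8)


def seismic_moment_dyne_cm(magnitude):
    """M0 = 10^(1.5*M + 16.1), in dyne-cm."""
    return 10.0 ** (1.5 * np.asarray(magnitude, dtype=float) + 16.1)


magnitudes = np.array([4.0, 5.0, 6.0])
energy = seismic_energy_joules(magnitudes)
moment_nm = seismic_moment_dyne_cm(magnitudes) / DYNE_CM_PER_NEWTON_METER

# One magnitude unit multiplies the energy by 10^1.5.
step_ratio = energy[1] / energy[0]
cumulative_energy = np.cumsum(energy)

print(f"Energy ratio per magnitude unit: {step_ratio:.1f}")  # 31.6
print("Cumulative energy (J):", cumulative_energy)
```

Because of the factor of about 31.6 per magnitude unit, cumulative energy curves are dominated by the few largest events, which is why they are usually plotted on a logarithmic axis.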

### Haversine Distance
$$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\Delta\phi}{2}\right) + \cos(\phi_1)\cos(\phi_2)\sin^2\left(\frac{\Delta\lambda}{2}\right)}\right)$$
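
Here φ is latitude, λ is longitude (both in radians), and r is the Earth's radius. A vectorized NumPy version might look like the sketch below, which assumes a mean radius of 6371 km. The fault vertices are hypothetical coordinates, and taking the minimum over vertices only approximates the distance to a fault polyline.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, scalars or arrays."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlambda = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Hypothetical fault vertices (degrees) and a single event location.
fault_lat = np.array([40.0, 40.5])
fault_lon = np.array([30.0, 31.0])
event_lat, event_lon = 40.0, 29.0

distances = haversine_km(event_lat, event_lon, fault_lat, fault_lon)
nearest = int(np.argmin(distances))
fault_distance_km = distances[nearest]
print(f"Nearest vertex: {nearest}, distance = {fault_distance_km:.1f} km")
```

For a catalog with hundreds of thousands of events, broadcasting every event against every vertex can use a lot of memory, so a spatial index such as a ball tree with a haversine metric is a common alternative.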

## Statistical Tests Used

| Test | Application | Null hypothesis |
|------|-------------|-----------------|
| **Kolmogorov-Smirnov** | Distribution fitting | Data follow the specified distribution |
| **Chi-Square** | Independence testing | Variables are independent |
| **Mann-Kendall** | Trend detection | No monotonic trend exists |
| **ANOVA** | Group comparison | All group means are equal |
| **Kruskal-Wallis** | Non-parametric group comparison | All groups come from the same distribution (equal medians when the group distributions have the same shape) |
| **Tukey HSD** | Post-hoc pairwise comparison after a significant ANOVA | The two means in each pair are equal |
| **Pearson Correlation** | Linear relationship | No linear correlation |
| **Rayleigh Test** | Circular uniformity | Uniform distribution around circle |
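
Most of these tests are available in `scipy.stats`, but the Rayleigh test is not, so a small hand-written version is sketched here. It uses the large-sample approximation p ≈ exp(-Z) with Z = n·R̄², where R̄ is the mean resultant length. This is an assumption for illustration, and small samples would need a corrected p-value. Angles could come from the hour of day or the lunar phase mapped onto [0, 2π).

```python
import numpy as np


def rayleigh_test(angles_rad):
    """Rayleigh test of circular uniformity (large-sample approximation).

    Returns (mean resultant length, Z statistic, approximate p-value).
    """
    angles = np.asarray(angles_rad, dtype=float)
    n = angles.size
    # Mean resultant length: 0 for perfectly balanced angles, 1 for identical ones.
    r_bar = np.hypot(np.cos(angles).sum(), np.sin(angles).sum()) / n
    z = n * r_bar ** 2
    p_value = np.exp(-z)
    return r_bar, z, p_value


rng = np.random.default_rng(4)

# Evenly spaced angles: no preferred direction.
uniform_angles = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
_, _, p_uniform = rayleigh_test(uniform_angles)

# Von Mises sample concentrated around angle 0: strong preferred direction.
clustered_angles = rng.vonmises(mu=0.0, kappa=2.0, size=500)
r_clustered, _, p_clustered = rayleigh_test(clustered_angles)

print(f"Uniform: p = {p_uniform:.3f}")
print(f"Clustered: R = {r_clustered:.2f}, p = {p_clustered:.2g}")
```

The Rayleigh test is sensitive to a single preferred direction. A bimodal pattern, such as peaks at both new and full moon, can cancel out in the resultant vector and go undetected.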

## Project Structure

```
stat495project/
├── data/
│   ├── raw/                    # Original datasets
│   │   ├── afad_full_historical_1990_2025.csv
│   │   ├── soil/               # Soil classification data
│   │   ├── tectonic/           # Fault, GPS, and pressure data
│   │   └── lunar/              # Moon phases, solar & lunar eclipses
│   └── processed/              # Cleaned and merged datasets
├── notebooks/                  # Analysis notebooks (00-16)
├── src/                        # Utility modules
│   ├── __init__.py
│   ├── config.py              # Paths, constants, colors
│   ├── geo_utils.py           # Geographic calculations
│   ├── seismology.py          # Seismological functions
│   └── visualization.py       # Plotting utilities
└── reports/
    ├── figures/               # Generated figures (by notebook)
    └── tables/                # Generated tables (CSV format)
```

## Quick Start

To run the analysis:

1. Start with `01_data_loading_preprocessing.ipynb` to load and clean the data
2. Proceed through notebooks in numerical order
3. Each notebook can be run on its own once notebook 01 has produced the processed data, which all later notebooks read

### Requirements

```text
pandas >= 1.5.0
numpy >= 1.24.0
matplotlib >= 3.7.0
scipy >= 1.10.0
scikit-learn >= 1.2.0
statsmodels >= 0.14.0
seaborn >= 0.12.0
```

```python
# Verify installation
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.cluster import KMeans
import statsmodels.api as sm
import seaborn as sns

print("All required packages are installed!")
print(f"pandas: {pd.__version__}")
print(f"numpy: {np.__version__}")
```

Output:

```
All required packages are installed!
pandas: 2.3.3
numpy: 2.0.2
```


---

**Author**: STAT495 Project  
**Date**: 2025  
**Data Period**: 1990-2025