# ▸ K-Means Clustering for Turbulence Risk Grouping

This notebook applies K-Means clustering to the PCA-reduced weather data to discover natural groupings of flight conditions. It enables the identification of distinct turbulence risk zones in an unsupervised way, independent of labeled training data.

---

## 1. Optimal k using Elbow Method

To determine the best number of clusters (k), the Elbow Method was used: inertia (the within-cluster sum of squared distances) was plotted for k values from 2 to 12.

![elbow_plot](images/elbow_plot.png)

As observed, the optimal "elbow point" occurs around k = 3, which balances model simplicity and cluster separation.
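The inertia sweep can be sketched as follows. This is a minimal illustration, assuming scikit-learn's `KMeans` and a synthetic three-component dataset from `make_blobs` in place of the PCA-reduced weather data; the variable names (`X_pca`, `k_values`, `inertias`) are hypothetical and not taken from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the PCA-reduced weather data (assumption: 3 components).
X_pca, _ = make_blobs(n_samples=600, centers=3, n_features=3, random_state=42)

k_values = list(range(2, 13))  # k = 2 .. 12, the range described in the text
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(X_pca)
    # inertia_ is the within-cluster sum of squared distances for this k
    inertias.append(model.inertia_)

for k, inertia in zip(k_values, inertias):
    print(f"k={k:2d}  inertia={inertia:.1f}")
```

Plotting `inertias` against `k_values` gives the elbow curve; the elbow is the point after which adding clusters yields only small reductions in inertia.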

---

## 2. Applying K-Means and Labeling Risk Clusters

Once k = 3 was chosen, K-Means clustering was applied to the PCA-reduced data.
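A minimal sketch of this step, assuming scikit-learn and a randomly generated feature matrix in place of the weather data. The scaling step, the choice of three principal components, and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical stand-in for the weather feature matrix (rows = reports).
features = rng.normal(size=(300, 8))

# Standardize, then reduce to 3 principal components (assumed, to match a 3D plot).
X_pca = PCA(n_components=3, random_state=0).fit_transform(
    StandardScaler().fit_transform(features)
)

# Fit K-Means with the k chosen from the elbow plot.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_pca)  # one label in {0, 1, 2} per row

print(np.bincount(cluster_labels))  # number of rows assigned to each cluster
```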

To identify which clusters represent turbulence-prone conditions, the percentage of SEV–EXTRM cases was calculated for each cluster. A cluster was marked as "high-risk" if that percentage exceeded a dynamic threshold:

```text
Dynamic Threshold = Mean + 0.7 × Std
```

This thresholding strategy flags only clusters whose SEV–EXTRM share stands well above the rest, even if those clusters are small or rare.

```text
Cluster 2 → 82.35% SEV–EXTRM → High-Risk
Cluster 1 → 32.65% SEV–EXTRM
Cluster 0 → 19.25% SEV–EXTRM
```

→ **Cluster 2** was dynamically labeled as the high-risk group.
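The labeling rule can be checked against the reported percentages. The sketch below assumes the mean and standard deviation are taken over the per-cluster SEV–EXTRM percentages and that the population standard deviation (NumPy's default) is used; the notebook does not state either detail, and the function name `dynamic_threshold` is hypothetical.

```python
import numpy as np

# SEV–EXTRM percentage per cluster, as reported above.
sev_pct = {0: 19.25, 1: 32.65, 2: 82.35}

def dynamic_threshold(percentages, factor=0.7):
    """Return mean + factor * std of the per-cluster percentages.

    Assumption: population std (ddof=0), which is NumPy's default.
    """
    values = np.asarray(list(percentages), dtype=float)
    return values.mean() + factor * values.std()

threshold = dynamic_threshold(sev_pct.values())
high_risk = [cluster for cluster, pct in sev_pct.items() if pct > threshold]

print(f"threshold = {threshold:.2f}%")
print("high-risk clusters:", high_risk)
```

Under these assumptions the threshold comes out near 63.75%, so only Cluster 2 (82.35%) exceeds it. Using the sample standard deviation instead would raise the threshold to roughly 68%, which leads to the same labeling.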

---

## 3. 3D PCA Visualization of Clusters

The clusters are plotted in 3D PCA space, color-coded by cluster, to show how turbulence risk groups are spatially organized.

![risk_clusters](images/risk_clusters.png)

This visual reinforces how certain turbulence regimes (in red) separate from the other clusters in unsupervised space, which gives useful cues to both **machine learning models** and **aviation analysts**.
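A plot of this kind could be produced along the following lines. This is a sketch using matplotlib with synthetic coordinates; the data, the axis names (PC1 to PC3), and the colors other than red for the high-risk cluster are assumptions rather than details from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical PCA coordinates: three synthetic blobs standing in for the real data.
centers = np.array([[0, 0, 0], [3, 3, 0], [0, 4, 4]], dtype=float)
cluster_labels = np.repeat([0, 1, 2], 100)
X_pca = centers[cluster_labels] + rng.normal(scale=0.8, size=(300, 3))

# Red marks the high-risk cluster (Cluster 2); the other colors are arbitrary.
palette = {0: "tab:blue", 1: "tab:green", 2: "red"}

fig = plt.figure(figsize=(7, 6))
ax = fig.add_subplot(projection="3d")
for cluster, color in palette.items():
    mask = cluster_labels == cluster
    ax.scatter(X_pca[mask, 0], X_pca[mask, 1], X_pca[mask, 2],
               c=color, s=12, label=f"Cluster {cluster}")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_zlabel("PC3")
ax.legend()
```

Calling `fig.savefig(...)` with a path of your choice writes the figure to disk; in a notebook the figure renders inline once the `Agg` line is removed.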

---

## Why This Matters

**Identifying turbulence-prone clusters without supervision is extremely valuable in aviation**:

✧ Enables early detection of unseen risk patterns, even in unlabeled regions.

✧ Helps group atmospheric behaviors in ways meaningful for forecasting.

✧ Informs flight planning tools or alerts using unsupervised risk indicators.

✧ Gives operational teams a view of emerging high-risk weather zones not caught by traditional models.

Combined with supervised models, this approach adds resilience and interpretability to the overall turbulence-risk prediction system.