# Language Extinction Risk Prediction Analysis

**INFT6201 Big Data Assessment 2**  
**Predicting Global Language Extinction Risk: A Machine Learning Framework to Guide UNESCO's International Decade of Indigenous Languages 2022-2032**

---

## Executive Summary

This notebook analyzes global language extinction risk using machine learning. It develops a framework to predict which of the world's 3,116 endangered languages face imminent extinction, so that preservation resources can be prioritized on the basis of data during UNESCO's International Decade of Indigenous Languages (2022-2032).

### Key Objectives:
1. **Classification Task**: Predict language endangerment transitions with >85% accuracy
2. **Feature Analysis**: Identify the most important factors driving language endangerment
3. **Geographic Patterns**: Map global distribution of endangered languages
4. **Policy Impact**: Provide actionable insights for preservation efforts

### Expected Impact:
- Guide UNESCO's $2+ billion International Decade of Indigenous Languages
- Enable early intervention to save 200-300 languages by 2100
- Preserve irreplaceable cultural heritage for millions of Indigenous peoples


## 1. Data Loading and Integration

We integrate multiple global datasets into a single language endangerment database. The first cell imports the required libraries and the project's custom modules.


In [6]:
# Import required libraries
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

# Add src to path
sys.path.append('src')

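# Import our custom modules
# NOTE: the ModuleNotFoundError in this cell's output indicates that one of
# these modules depends on xgboost; install it first (e.g. `pip install xgboost`).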
from data.data_loader import LanguageDataLoader
from data.data_preprocessor import LanguageDataPreprocessor
from models.ml_models import LanguageExtinctionPredictor
from visualization.visualizer import LanguageVisualizer

# Set up plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("Libraries imported successfully!")
print("Starting Language Extinction Risk Analysis...")


ModuleNotFoundError: No module named 'xgboost'
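
The error above means the import cell stopped before reaching its final `print` statements, most likely because one of the custom modules imports `xgboost` (an assumption, since the full traceback is not shown). A minimal sketch of a pre-flight check that could run before the imports is given below. The helper name `missing_packages` and the `REQUIRED` list are illustrative, inferred from the imports in the cell and the error message rather than from a project requirements file.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list: third-party packages this notebook appears to rely on,
# based on the import cell and the ModuleNotFoundError it produced.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "plotly", "xgboost"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```

`importlib.util.find_spec` only locates a top-level package without importing it, so the check is cheap and has no side effects. After installing anything it reports, re-run the import cell.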