# Attribute Analysis and Feature Engineering

## Overview
This notebook analyzes each attribute in the student dropout prediction dataset. It examines data quality, statistical characteristics, and feature engineering opportunities for machine learning models.

## Objectives
- **Comprehensive Attribute Analysis**: Detailed examination of each feature's characteristics
- **Data Quality Assessment**: Identification of missing values, outliers, and data anomalies
- **Feature Engineering**: Creation of meaningful features for predictive modeling
- **Statistical Analysis**: Understanding distributions and relationships between variables

## Methodology
1. **Data Type Classification**: Categorize attributes by data type and measurement scale
2. **Missing Value Analysis**: Quantify and visualize missing data patterns
3. **Distribution Analysis**: Examine statistical distributions and identify outliers
4. **Feature Engineering**: Create derived features and transformations
5. **Correlation Analysis**: Identify relationships between features and the target variable
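
As a rough illustration of the five steps above, here is a minimal sketch on a toy DataFrame. The column names (`age_at_enrollment`, `admission_grade`, `gender`, `dropout`), the 1.5 × IQR outlier rule, and the age threshold of 25 are assumptions made for the example, not values taken from the actual dataset.

```python
import numpy as np
import pandas as pd

# Toy data; the column names are hypothetical, not the notebook's dataset.
df = pd.DataFrame({
    "age_at_enrollment": [18, 19, 20, 21, 22, 60],
    "admission_grade": [120.0, 130.5, np.nan, 142.0, 110.0, 125.0],
    "gender": ["F", "M", "F", "M", "F", "M"],
    "dropout": [0, 0, 1, 0, 1, 1],
})

# 1. Data type classification: split columns into numeric and categorical.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# 2. Missing value analysis: fraction of missing entries per column.
missing_fraction = df.isna().mean()

# 3. Distribution analysis: flag outliers with the 1.5 * IQR rule (assumed rule).
def iqr_outliers(series):
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)

age_outliers = iqr_outliers(df["age_at_enrollment"])

# 4. Feature engineering: a simple derived indicator (threshold is illustrative).
df["is_mature_student"] = (df["age_at_enrollment"] >= 25).astype(int)

# 5. Correlation analysis: Pearson correlation of numeric features with the target.
target_corr = df[numeric_cols].corr()["dropout"].drop("dropout")

print(missing_fraction)
print(target_corr)
```

On real data, each step would typically be followed by a plot or a summary table rather than a bare `print`.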

---


In [None]:
# Import essential libraries for data analysis and visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Import machine learning libraries for feature analysis
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc, precision_recall_curve
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

# Try to import XGBoost for advanced modeling (optional dependency)
try:
    import xgboost as xgb
    xgboost_available = True
    print("✓ XGBoost successfully imported")
except ImportError:
    print("⚠ XGBoost not available - will skip XGBoost models")
    xgboost_available = False

# Set plotting parameters for better visualization
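# 'seaborn-v0_8' exists in matplotlib >= 3.6; older releases name the style 'seaborn'
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')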
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)


## 1. Data Loading and Initial Setup

### 1.1 Import Libraries and Load Data
This section imports the necessary libraries and loads the dataset for attribute analysis.


### 1.2 Data Loading and Preprocessing
This section loads the dataset and performs initial data cleaning, including restricting the data to nationalities with sufficient sample sizes.
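
The nationality filter mentioned for the preprocessing step could look roughly like the sketch below. The `Nationality` and `Target` column names, the toy rows, and the `MIN_SAMPLES` threshold are all assumptions for illustration. The notebook's actual column names and cutoff may differ.

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
df = pd.DataFrame({
    "Nationality": ["Portuguese"] * 5 + ["Brazilian"] * 3 + ["Spanish"],
    "Target": ["Dropout", "Graduate", "Graduate", "Dropout", "Graduate",
               "Graduate", "Dropout", "Graduate", "Dropout"],
})

MIN_SAMPLES = 3  # hypothetical minimum number of rows per nationality

# Count rows per nationality and keep only groups that meet the threshold.
counts = df["Nationality"].value_counts()
kept_nationalities = counts[counts >= MIN_SAMPLES].index
df_filtered = df[df["Nationality"].isin(kept_nationalities)].reset_index(drop=True)

print(df_filtered["Nationality"].value_counts())
```

Dropping very small groups keeps per-nationality statistics from resting on a handful of students, at the cost of excluding those students from the analysis.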
