# 📊 Notebook: Machine Learning Models - TJGO Forecasting

This notebook demonstrates the use of machine learning models to forecast case volumes at TJGO:
- Random Forest
- XGBoost
- LightGBM
- Baseline models

## 🎯 Objectives
- Train ML models on temporal features
- Analyze feature importance
- Compare performance across models
- Evaluate residuals and diagnostics
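
Comparing models and inspecting residuals comes down to a few error metrics computed on the same test window. The sketch below is illustrative only: `evaluate_forecast` and the toy numbers are hypothetical and are not part of `models.ml_models`. The `mae` key matches the one read from the model metrics later in this notebook; `rmse` and `bias` are assumptions about what else might be worth tracking.

```python
import numpy as np

def evaluate_forecast(y_true, y_pred):
    """Return MAE, RMSE, and mean residual (bias) for one forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred  # positive residual = under-forecast
    return {
        "mae": float(np.mean(np.abs(residuals))),
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "bias": float(np.mean(residuals)),
    }

# Toy values for illustration only (not TJGO data)
actual = [100, 110, 120, 130]
forecasts = {
    "model_a": [98, 112, 118, 133],
    "model_b": [90, 100, 110, 120],
}

scores = {name: evaluate_forecast(actual, pred) for name, pred in forecasts.items()}
best = min(scores, key=lambda name: scores[name]["mae"])
print(scores)
print("Lowest MAE:", best)
```

A bias far from zero indicates systematic over- or under-forecasting, which a low MAE alone does not reveal.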

## 📈 Characteristics of the ML Models
- **Random Forest**: Ensemble of decision trees; averaging many trees makes it less prone to overfitting than a single tree
- **XGBoost**: Optimized gradient boosting with built-in regularization
- **LightGBM**: Gradient boosting with histogram-based tree construction, designed for fast training
- **Features**: Lags, rolling statistics, exogenous variables
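
Lag and rolling-statistic features of the kind listed above can be sketched with pandas as follows. This is a minimal illustration, not the feature pipeline that produced the processed CSV files: the helper `add_temporal_features`, the lag and window sizes, and the monthly frequency are all assumptions. Only the column name `TOTAL_CASOS` and the index name `DATA` come from this notebook.

```python
import numpy as np
import pandas as pd

def add_temporal_features(df, target="TOTAL_CASOS", lags=(1, 2, 3), windows=(3, 6)):
    """Return a copy of df with lag and rolling-window features of the target."""
    out = df.copy()
    for lag in lags:
        out[f"{target}_lag_{lag}"] = out[target].shift(lag)
    # Shift by one step before rolling so each window uses only past values
    past = out[target].shift(1)
    for w in windows:
        out[f"{target}_roll_mean_{w}"] = past.rolling(w).mean()
        out[f"{target}_roll_std_{w}"] = past.rolling(w).std()
    # Rows without a full history have NaN features and are dropped
    return out.dropna()

# Synthetic monthly series for illustration only (not TJGO data)
idx = pd.date_range("2020-01-01", periods=12, freq="MS", name="DATA")
demo = pd.DataFrame({"TOTAL_CASOS": np.arange(100, 112)}, index=idx)

features = add_temporal_features(demo)
print(features.head())
```

Shifting before computing the rolling statistics keeps the current observation out of its own features, which avoids leaking the target into the inputs.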


In [None]:
# Required imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Import the ML models from the project's src directory
import sys
sys.path.append('../src')
from models.ml_models import (
    RandomForestModel, XGBoostModel, LightGBMModel, BaselineModels,
    train_random_forest_model, train_xgboost_model, train_lightgbm_model,
    train_baseline_models
)

# Configure matplotlib
plt.style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ Libraries imported successfully!")
print("📊 Ready to use the ML models")


In [None]:
# Load the processed data
train_data = pd.read_csv('../data/processed_test/train_test.csv', index_col='DATA', parse_dates=True)
test_data = pd.read_csv('../data/processed_test/test_test.csv', index_col='DATA', parse_dates=True)

print("📊 Data loaded:")
print(f"  Train: {len(train_data)} observations")
print(f"  Test:  {len(test_data)} observations")
print(f"  Variables: {len(train_data.columns)}")

# List the available features (numeric columns)
print("\n📋 Available features:")
feature_cols = train_data.select_dtypes(include=[np.number]).columns.tolist()
for i, col in enumerate(feature_cols):
    print(f"  {i+1:2d}. {col}")


## 🔧 2. Model Training
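
Baselines give the ML models a reference point to beat. The results used later in this notebook are keyed `persistence` and `moving_average`; the sketch below shows what such baselines commonly compute. It is an assumption about their general behavior, not the implementation in `models.ml_models`, which may for example use one-step-ahead values instead. The function names, the `window` size, and the sample series are hypothetical.

```python
import numpy as np
import pandas as pd

def persistence_forecast(train, horizon):
    """Repeat the last observed training value for every step of the horizon."""
    return np.full(horizon, train.iloc[-1], dtype=float)

def moving_average_forecast(train, horizon, window=3):
    """Repeat the mean of the last `window` training values."""
    return np.full(horizon, train.iloc[-window:].mean(), dtype=float)

# Toy series for illustration only (not TJGO data)
train = pd.Series([100.0, 104.0, 102.0, 108.0, 110.0])

persistence = persistence_forecast(train, horizon=3)
moving_avg = moving_average_forecast(train, horizon=3, window=3)
print(persistence)
print(moving_avg)
```

If a tuned model cannot clearly outperform these on the test window, its added complexity is hard to justify.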


In [None]:
# Train the baseline models
print("🔄 Training baseline models...")
baseline_models = train_baseline_models(train_data, test_data)


In [None]:
# Train Random Forest
print("\n🔄 Training Random Forest...")
rf_model = train_random_forest_model(train_data, test_data)
rf_model.print_summary("Random Forest")


In [None]:
# Train XGBoost
print("\n🔄 Training XGBoost...")
xgb_model = train_xgboost_model(train_data, test_data)
xgb_model.print_summary("XGBoost")


In [None]:
# Train LightGBM
print("\n🔄 Training LightGBM...")
lgb_model = train_lightgbm_model(train_data, test_data)
lgb_model.print_summary("LightGBM")


## 📈 3. Analysis and Visualizations


In [None]:
# Plot forecasts for the two baselines, Random Forest, and XGBoost
# (the 2x2 grid holds four panels, so LightGBM is not plotted here)
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
axes = axes.ravel()

models = [
    (baseline_models.baseline_results['persistence'], 'Persistence Baseline'),
    (baseline_models.baseline_results['moving_average'], 'Moving Average Baseline'),
    (rf_model, 'Random Forest'),
    (xgb_model, 'XGBoost')
]

for i, (model_data, name) in enumerate(models):
    ax = axes[i]
    
    if isinstance(model_data, dict):
        predictions = model_data['predictions']
        metrics = model_data['metrics']
        mae = metrics['mae']
    else:
        predictions = model_data.predictions
        mae = model_data.metrics['mae']
    
    # Plot actual values and forecasts
    ax.plot(test_data.index, test_data['TOTAL_CASOS'],
            label='Actual', linewidth=2, color='blue')
    ax.plot(test_data.index, predictions,
            label='Forecast', linewidth=2, color='red', linestyle='--')
    
    ax.set_title(f"{name} (MAE: {mae:.0f})", fontweight='bold')
    ax.set_xlabel('Date')
    ax.set_ylabel('TOTAL_CASOS')
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('../reports_test/ml_models_comparison.png', dpi=300, bbox_inches='tight')
plt.show()


In [None]:
# Analyze feature importance (Random Forest)
rf_model.plot_feature_importance("Random Forest", top_n=10, 
                                save_path="../reports_test/rf_feature_importance.png")


In [None]:
# Analyze feature importance (XGBoost)
xgb_model.plot_feature_importance("XGBoost", top_n=10, 
                                  save_path="../reports_test/xgb_feature_importance.png")


In [None]:
# Plot model residuals
rf_model.plot_residuals("Random Forest", save_path="../reports_test/rf_residuals.png")
xgb_model.plot_residuals("XGBoost", save_path="../reports_test/xgb_residuals.png")
