# 📊 Final Comparative Analysis

## Objective
Consolidate and compare the results of all experiments:
1. **Classical ML**: Wavelet db2 + SVM/RF/XGBoost/LightGBM
2. **DL Raw**: CNN/LSTM/CNN-LSTM/Transformer (raw signal)
3. **DL Fixed Wavelet**: Wavelet db2 + CNN/LSTM/CNN-LSTM/Transformer
4. **DL Learned Wavelet**: LearnedWaveletDWT1D_QMF + CNN/LSTM/Transformer

## Evaluation Metrics
- RMSE, MAE, MAPE
- R², Explained Variance
- Training time
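
For reference, the error metrics listed above can be computed as in the minimal sketch below. It uses NumPy only; the helper name `regression_metrics` and the toy arrays are illustrative assumptions, not part of the project code, and MAPE is assumed to be meaningful only when no true value is zero.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (%), R2 and explained variance (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    # MAPE divides by the true values, so it breaks down if any of them is zero
    mape = float(np.mean(np.abs(err / y_true)) * 100)
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    # Explained variance ignores a constant bias in the predictions, unlike R2
    explained_var = 1.0 - float(np.var(err)) / float(np.var(y_true))
    return {'RMSE': rmse, 'MAE': mae, 'MAPE': mape,
            'R2': r2, 'ExplainedVariance': explained_var}

# Toy values, invented for illustration only
metrics = regression_metrics([1.0, 2.0, 4.0], [1.5, 2.0, 3.5])
print(metrics)
```

R² and explained variance coincide whenever the mean error is zero, as in this toy case; they diverge once the predictions carry a systematic offset.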

In [None]:
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import json
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Local imports
import sys
sys.path.append('.')
from config.experiment_config import RESULTS_DIR, DATA_DIR
from src.visualization import ExperimentVisualizer

# Plot configuration
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')

print("✅ Imports completed successfully!")
print(f"📁 RESULTS_DIR: {RESULTS_DIR}")

## 1. Load All Results

In [None]:
def load_all_results():
    """Load the results of all experiments."""
    all_results = []
    
    # Experiment directories
    experiment_dirs = [
        ('ML_FixedWavelet', RESULTS_DIR / 'ml_experiments'),
        ('DL_Raw', RESULTS_DIR / 'dl_raw_experiments'),
        ('DL_FixedWavelet', RESULTS_DIR / 'dl_wavelet_experiments'),
        ('DL_LearnedWavelet', RESULTS_DIR / 'learned_wavelet_experiments')
    ]
    
    for category, exp_dir in experiment_dirs:
        if not exp_dir.exists():
            print(f"⚠️ Directory not found: {exp_dir}")
            continue
            
        # Load the comparison CSVs
        csv_files = list(exp_dir.glob('*.csv'))
        for csv_file in csv_files:
            if 'comparison' in csv_file.name:
                df = pd.read_csv(csv_file)
                df['Category'] = category
                all_results.append(df)
                print(f"✅ Loaded: {csv_file.name} ({len(df)} records)")
        
        # JSON logs, if present, are only counted here; they are not merged into the results
        json_files = list(exp_dir.glob('experiments_log*.json'))
        for json_file in json_files:
            with open(json_file, 'r', encoding='utf-8') as f:
                logs = json.load(f)
            print(f"✅ JSON logs: {json_file.name} ({len(logs)} experiments)")
    
    return all_results

# Load results
results_list = load_all_results()

In [None]:
# Consolidate into a single DataFrame
if results_list:
    results_df = pd.concat(results_list, ignore_index=True)
    
    # Standardize column names if needed
    if 'Model' not in results_df.columns and 'model_name' in results_df.columns:
        results_df['Model'] = results_df['model_name']
    
    print(f"\n📊 Total experiments: {len(results_df)}")
    print("\nCategories:")
    print(results_df['Category'].value_counts())
else:
    print("⚠️ No results found. Run the previous notebooks first.")
    results_df = pd.DataFrame()

## 2. Overall Summary

In [None]:
if len(results_df) > 0:
    print("\n" + "="*80)
    print("📊 OVERALL SUMMARY - ALL EXPERIMENTS")
    print("="*80)
    
    # Resolve the column names actually present in the results
    rmse_col = 'RMSE' if 'RMSE' in results_df.columns else 'rmse'
    mae_col = 'MAE' if 'MAE' in results_df.columns else 'mae'
    r2_col = 'R²' if 'R²' in results_df.columns else 'r2'
    params_col = 'Params' if 'Params' in results_df.columns else None
    
    cols_to_display = ['Category', 'Model', rmse_col, mae_col, r2_col]
    if params_col:
        cols_to_display.append(params_col)
    
    display_df = results_df[cols_to_display].copy()
    display_df = display_df.sort_values(rmse_col)
    
    print(display_df.to_string(index=False))
    
    # Save the summary
    display_df.to_csv(RESULTS_DIR / 'all_experiments_summary.csv', index=False)
    print(f"\n✅ Summary saved to: {RESULTS_DIR / 'all_experiments_summary.csv'}")
else:
    print("⚠️ No results available")

## 3. Best Model per Category
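
Selecting the best model per category amounts to a `groupby` followed by `idxmin`. The sketch below illustrates this on a made-up table (the model names and RMSE values are invented for illustration only). It drops rows with a missing RMSE first, on the assumption that failed runs may be logged as NaN.

```python
import numpy as np
import pandas as pd

# Toy results table; the values are invented purely for illustration
toy = pd.DataFrame({
    'Category': ['DL_Raw', 'DL_Raw', 'DL_FixedWavelet', 'DL_FixedWavelet'],
    'Model': ['CNN', 'LSTM', 'CNN', 'LSTM'],
    'RMSE': [0.30, 0.25, np.nan, 0.20],
})

# Discard runs without a valid RMSE before looking for the minimum
valid = toy.dropna(subset=['RMSE'])

# idxmin returns, per category, the row label of the lowest RMSE
best_idx = valid.groupby('Category')['RMSE'].idxmin()
best = valid.loc[best_idx].set_index('Category')
print(best)
```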

In [None]:
if len(results_df) > 0:
    rmse_col = 'RMSE' if 'RMSE' in results_df.columns else 'rmse'
    
    print("\n" + "="*80)
    print("🏆 BEST MODEL PER CATEGORY")
    print("="*80)
    
    best_per_category = results_df.loc[results_df.groupby('Category')[rmse_col].idxmin()]
    
    for _, row in best_per_category.iterrows():
        print(f"\n📌 {row['Category']}:")
        print(f"   Model: {row['Model']}")
        print(f"   RMSE:  {row[rmse_col]:.6f}")
        if 'R²' in row.index:
            print(f"   R²:    {row['R²']:.6f}")
        elif 'r2' in row.index:
            print(f"   R²:    {row['r2']:.6f}")

## 4. Visual Comparison by Category

In [None]:
if len(results_df) > 0:
    rmse_col = 'RMSE' if 'RMSE' in results_df.columns else 'rmse'
    r2_col = 'R²' if 'R²' in results_df.columns else 'r2'
    
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    categories = results_df['Category'].unique()
    colors = {
        'ML_FixedWavelet': '#2ecc71',
        'DL_Raw': '#3498db',
        'DL_FixedWavelet': '#e74c3c',
        'DL_LearnedWavelet': '#9b59b6'
    }
    
    # Plot 1: RMSE by category
    ax = axes[0, 0]
    for i, cat in enumerate(categories):
        cat_data = results_df[results_df['Category'] == cat].sort_values(rmse_col)
        # Prefix labels with the category so equal model names do not share a bar
        labels = cat + ' / ' + cat_data['Model'].astype(str)
        ax.barh(labels, cat_data[rmse_col],
               color=colors.get(cat, f'C{i}'), alpha=0.8, label=cat)
    ax.set_xlabel('RMSE (lower is better)')
    ax.set_title('RMSE by Model and Category')
    ax.legend(loc='lower right')
    ax.grid(True, alpha=0.3, axis='x')
    
    # Plot 2: R² by category
    ax = axes[0, 1]
    for i, cat in enumerate(categories):
        cat_data = results_df[results_df['Category'] == cat].sort_values(r2_col, ascending=False)
        labels = cat + ' / ' + cat_data['Model'].astype(str)
        ax.barh(labels, cat_data[r2_col],
               color=colors.get(cat, f'C{i}'), alpha=0.8, label=cat)
    ax.set_xlabel('R² (higher is better)')
    ax.set_title('R² by Model and Category')
    ax.legend(loc='lower right')
    ax.grid(True, alpha=0.3, axis='x')
    
    # Plot 3: RMSE boxplot by category
    ax = axes[1, 0]
    category_order = ['ML_FixedWavelet', 'DL_Raw', 'DL_FixedWavelet', 'DL_LearnedWavelet']
    category_order = [c for c in category_order if c in categories]
    palette = [colors.get(c, 'gray') for c in category_order]
    sns.boxplot(data=results_df, x='Category', y=rmse_col, ax=ax, 
                order=category_order, palette=palette)
    ax.set_xlabel('Category')
    ax.set_ylabel('RMSE')
    ax.set_title('RMSE Distribution by Category')
    ax.tick_params(axis='x', rotation=15)
    
    # Plot 4: R² boxplot by category
    ax = axes[1, 1]
    sns.boxplot(data=results_df, x='Category', y=r2_col, ax=ax,
                order=category_order, palette=palette)
    ax.set_xlabel('Category')
    ax.set_ylabel('R²')
    ax.set_title('R² Distribution by Category')
    ax.tick_params(axis='x', rotation=15)
    
    plt.suptitle('Complete Comparative Analysis', fontsize=16, fontweight='bold')
    plt.tight_layout()
    plt.savefig(RESULTS_DIR / 'comparison_all_categories.png', dpi=150, bbox_inches='tight')
    plt.show()

## 5. Global Ranking

In [None]:
if len(results_df) > 0:
    rmse_col = 'RMSE' if 'RMSE' in results_df.columns else 'rmse'
    r2_col = 'R²' if 'R²' in results_df.columns else 'r2'
    params_col = 'Params' if 'Params' in results_df.columns else None
    
    print("\n" + "="*80)
    print("🏅 GLOBAL RANKING (by RMSE)")
    print("="*80)
    
    ranking = results_df.sort_values(rmse_col).reset_index(drop=True)
    ranking['Rank'] = range(1, len(ranking) + 1)
    
    # Display the top 10
    print("\nTop 10 Models:")
    print("-"*90)
    for _, row in ranking.head(10).iterrows():
        params_str = f" | Params: {int(row[params_col]):,}" if params_col and pd.notna(row.get(params_col)) else ""
        print(f"{row['Rank']:2d}. [{row['Category']:18s}] {row['Model']:25s} | RMSE: {row[rmse_col]:.6f} | R²: {row[r2_col]:.4f}{params_str}")
    
    # Save the full ranking
    ranking.to_csv(RESULTS_DIR / 'ranking_global.csv', index=False)
    print(f"\n✅ Ranking saved to: {RESULTS_DIR / 'ranking_global.csv'}")

## 6. Statistical Analysis by Category

In [None]:
if len(results_df) > 0:
    rmse_col = 'RMSE' if 'RMSE' in results_df.columns else 'rmse'
    r2_col = 'R²' if 'R²' in results_df.columns else 'r2'
    
    print("\n" + "="*80)
    print("📈 STATISTICS BY CATEGORY")
    print("="*80)
    
    stats = results_df.groupby('Category').agg({
        rmse_col: ['mean', 'std', 'min', 'max'],
        r2_col: ['mean', 'std', 'min', 'max']
    }).round(6)
    
    print(stats)
    
    stats.to_csv(RESULTS_DIR / 'statistics_by_category.csv')

## 7. Comparison: Fixed Wavelet vs Learned Wavelet
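
The comparison in this section reports a relative improvement: the percentage by which the learned wavelet's RMSE falls below the fixed wavelet's. A minimal worked sketch follows; the helper name `relative_improvement` is hypothetical and the RMSE values are made up for illustration.

```python
def relative_improvement(baseline_rmse, candidate_rmse):
    """Percentage reduction in RMSE relative to the baseline.

    Positive means the candidate is better; negative means it is worse.
    """
    if baseline_rmse == 0:
        raise ValueError("baseline RMSE must be non-zero")
    return (baseline_rmse - candidate_rmse) / baseline_rmse * 100

# Invented values: fixed wavelet at 0.20, learned wavelet at 0.15
improvement = relative_improvement(0.20, 0.15)
print(f"{improvement:.2f}%")  # prints 25.00%
```

Because the baseline sits in the denominator, the measure is not symmetric: swapping the two arguments does not simply flip the sign.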

In [None]:
if len(results_df) > 0:
    rmse_col = 'RMSE' if 'RMSE' in results_df.columns else 'rmse'
    
    fixed_wavelet = results_df[results_df['Category'] == 'DL_FixedWavelet']
    learned_wavelet = results_df[results_df['Category'] == 'DL_LearnedWavelet']
    
    if len(fixed_wavelet) > 0 and len(learned_wavelet) > 0:
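        print("\n" + "="*80)
        print("🔬 COMPARISON: Fixed Wavelet (db2) vs Learned Wavelet (QMF)")
        print("="*80)
        
        # Compare by architecture (CNN, LSTM, Transformer)
        architectures = ['CNN', 'LSTM', 'Transformer']
        
        def select_arch(df, arch):
            """Rows whose model name contains `arch`, excluding the CNN-LSTM hybrid."""
            names = df['Model'].astype(str)
            is_hybrid = (names.str.contains('CNN', case=False)
                         & names.str.contains('LSTM', case=False))
            return df[names.str.contains(arch, case=False) & ~is_hybrid]
        
        comparison_data = []
        for arch in architectures:
            # The learned-wavelet runs have no CNN-LSTM, so hybrids are left out of both sides
            fixed = select_arch(fixed_wavelet, arch)
            learned = select_arch(learned_wavelet, arch)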
            
            if len(fixed) > 0 and len(learned) > 0:
                fixed_rmse = fixed[rmse_col].min()
                learned_rmse = learned[rmse_col].min()
                improvement = ((fixed_rmse - learned_rmse) / fixed_rmse) * 100
                
                comparison_data.append({
                    'Architecture': arch,
                    'Fixed_RMSE': fixed_rmse,
                    'Learned_RMSE': learned_rmse,
                    'Improvement_%': improvement,
                    'Winner': 'Learned' if improvement > 0 else 'Fixed'
                })
        
        if comparison_data:
            comp_df = pd.DataFrame(comparison_data)
            print("\n")
            print(comp_df.to_string(index=False))
            
            # Visualization
            fig, ax = plt.subplots(figsize=(10, 6))
            x = np.arange(len(comp_df))
            width = 0.35
            
            bars1 = ax.bar(x - width/2, comp_df['Fixed_RMSE'], width, 
                          label='Fixed Wavelet (db2)', color='#e74c3c', alpha=0.8)
            bars2 = ax.bar(x + width/2, comp_df['Learned_RMSE'], width,
                          label='Learned Wavelet (QMF)', color='#9b59b6', alpha=0.8)
            
            ax.set_xlabel('Architecture')
            ax.set_ylabel('RMSE (lower is better)')
            ax.set_title('Fixed Wavelet vs Learned Wavelet by Architecture')
            ax.set_xticks(x)
            ax.set_xticklabels(comp_df['Architecture'])
            ax.legend()
            
            # Annotate every bar with its value
            for bar in list(bars1) + list(bars2):
                height = bar.get_height()
                ax.annotate(f'{height:.4f}',
                           xy=(bar.get_x() + bar.get_width() / 2, height),
                           xytext=(0, 3),
                           textcoords="offset points",
                           ha='center', va='bottom', fontsize=9)
            
            plt.tight_layout()
            plt.savefig(RESULTS_DIR / 'fixed_vs_learned_wavelet.png', dpi=150, bbox_inches='tight')
            plt.show()
    else:
        print("⚠️ Not enough data for the Fixed vs Learned comparison")

## 8. Analysis: Raw vs Wavelet Preprocessing

In [None]:
if len(results_df) > 0:
    rmse_col = 'RMSE' if 'RMSE' in results_df.columns else 'rmse'
    
    raw_results = results_df[results_df['Category'] == 'DL_Raw']
    wavelet_results = results_df[results_df['Category'].isin(['DL_FixedWavelet', 'DL_LearnedWavelet'])]
    
    if len(raw_results) > 0 and len(wavelet_results) > 0:
        print("\n" + "="*80)
        print("🔬 COMPARISON: Raw Signal vs Wavelet Preprocessing")
        print("="*80)
        
        raw_best = raw_results[rmse_col].min()
        wavelet_best = wavelet_results[rmse_col].min()
        
        improvement = ((raw_best - wavelet_best) / raw_best) * 100
        
        print(f"\n📊 Best RMSE (Raw):     {raw_best:.6f}")
        print(f"📊 Best RMSE (Wavelet): {wavelet_best:.6f}")
        print(f"📈 Improvement with Wavelet: {improvement:.2f}%")
        
        if improvement > 0:
            print("\n✅ Wavelets improve the best RMSE on this noisy dataset!")
        else:
            print("\n⚠️ The raw signal performed at least as well")

## 9. Final Report

In [None]:
if len(results_df) > 0:
    rmse_col = 'RMSE' if 'RMSE' in results_df.columns else 'rmse'
    r2_col = 'R²' if 'R²' in results_df.columns else 'r2'
    
    # Best overall model
    best_overall = results_df.loc[results_df[rmse_col].idxmin()]
    
    report = f"""
{'='*80}
📋 FINAL REPORT - WAVELET VALIDATION EXPERIMENTS
{'='*80}

📅 Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

📊 OVERALL SUMMARY
{'─'*40}
• Total experiments: {len(results_df)}
• Categories evaluated: {', '.join(results_df['Category'].unique())}

🏆 BEST OVERALL MODEL
{'─'*40}
• Model:    {best_overall['Model']}
• Category: {best_overall['Category']}
• RMSE:     {best_overall[rmse_col]:.6f}
• R²:       {best_overall[r2_col]:.6f}

📈 ANALYSIS BY CATEGORY
{'─'*40}
"""
    
    for cat in results_df['Category'].unique():
        cat_data = results_df[results_df['Category'] == cat]
        best_cat = cat_data.loc[cat_data[rmse_col].idxmin()]
        report += f"\n• {cat}:"
        report += f"\n  - Best: {best_cat['Model']}"
        report += f"\n  - RMSE: {best_cat[rmse_col]:.6f}\n"
    
    report += f"""
💡 KEY CONCLUSIONS
{'─'*40}
1. Wavelets (fixed and learned) generally improve performance on noisy signals
2. LearnedWaveletDWT1D_QMF adapts to the specific characteristics of the signal
3. Hybrid architectures (Wavelet + DL) capture both local and global patterns

📁 Generated Files:
• {RESULTS_DIR / 'all_experiments_summary.csv'}
• {RESULTS_DIR / 'ranking_global.csv'}
• {RESULTS_DIR / 'comparison_all_categories.png'}

{'='*80}
"""
    
    print(report)
    
    # Save the report (UTF-8, since it contains emoji and box-drawing characters)
    with open(RESULTS_DIR / 'final_report.txt', 'w', encoding='utf-8') as f:
        f.write(report)
    print(f"\n✅ Report saved to: {RESULTS_DIR / 'final_report.txt'}")

## 10. Conclusion

In [None]:
print("\n" + "="*80)
print("🎉 COMPARATIVE ANALYSIS COMPLETE!")
print("="*80)
print(f"""
This notebook consolidated the results of all experiments:

✅ Classical ML (Wavelet db2 + SVM/RF/XGBoost/LightGBM)
✅ DL Raw (CNN/LSTM/CNN-LSTM/Transformer)
✅ DL Fixed Wavelet (Wavelet db2 + DL models)
✅ DL Learned Wavelet (LearnedWaveletDWT1D_QMF + DL models)

📁 Results saved to: {RESULTS_DIR}

🔬 Suggested Next Steps:
1. Analyze the filters learned by LearnedWaveletDWT1D_QMF
2. Test with different noise configurations
3. Apply to real data (financial, biomedical, etc.)
4. Optimize hyperparameters with Optuna
""")