# An√°lise de Impacto Econ√¥mico: Lei Magnitsky e o Mercado Brasileiro

**An√°lise Quantitativa do Impacto Potencial de San√ß√µes da Lei Magnitsky no Ibovespa**

**Autor:** Pedro Schuves Marodin  
**Data:** 31 de julho de 2025  
**Vers√£o:** 1.0

---

## Resumo Executivo

Este notebook implementa uma an√°lise abrangente do impacto econ√¥mico potencial da aplica√ß√£o da Lei Global Magnitsky a uma figura pol√≠tica de alto escal√£o no Brasil, utilizando:

- **Estudo de Eventos** para medir impactos anormais no mercado
- **Machine Learning N√£o Supervisionado** para identificar padr√µes em casos hist√≥ricos
- **Machine Learning Supervisionado** para predi√ß√£o de cen√°rios
- **An√°lise de Sentimento** para incorporar fatores comportamentais

### Metodologia
1. Coleta de dados financeiros brasileiros e globais
2. An√°lise de casos hist√≥ricos de san√ß√µes Magnitsky
3. Implementa√ß√£o de estudo de eventos com modelo CAPM
4. Clustering de casos hist√≥ricos para identifica√ß√£o de padr√µes
5. Treinamento de modelos preditivos
6. Simula√ß√£o de cen√°rios para o Brasil

---

## 1. Environment Setup and Library Installation

Primeiro, vamos instalar e importar todas as bibliotecas necess√°rias para nossa an√°lise.

In [None]:
# Instalar bibliotecas necess√°rias
import subprocess
import sys

def install_package(package):
    """Instala um pacote se n√£o estiver dispon√≠vel"""
    try:
        __import__(package)
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])

# Lista de pacotes necess√°rios
required_packages = [
    'yfinance',
    'pandas', 
    'numpy',
    'scikit-learn',
    'xgboost',
    'lightgbm',
    'matplotlib',
    'seaborn',
    'plotly',
    'scipy',
    'statsmodels',
    'requests',
    'beautifulsoup4',
    'nltk',
    'textblob',
    'vaderSentiment',
    'pyyaml'
]

print("Instalando pacotes necess√°rios...")
for package in required_packages:
    try:
        install_package(package)
        print(f"‚úì {package}")
    except Exception as e:
        print(f"‚úó Erro ao instalar {package}: {e}")

print("\nInstala√ß√£o conclu√≠da!")

In [None]:
# Importar bibliotecas
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Bibliotecas para dados financeiros
import yfinance as yf
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Bibliotecas para machine learning
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error, r2_score, silhouette_score
import xgboost as xgb
import lightgbm as lgb

# Bibliotecas para an√°lise estat√≠stica
from scipy import stats
import statsmodels.api as sm
from statsmodels.regression.linear_model import OLS

# Bibliotecas para processamento de texto e sentimento
import requests
from bs4 import BeautifulSoup
import nltk
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Configura√ß√µes
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:.4f}'.format)

print("‚úì Todas as bibliotecas importadas com sucesso!")
print(f"üìä Pandas version: {pd.__version__}")
print(f"üî¢ NumPy version: {np.__version__}")
print(f"üìà YFinance dispon√≠vel para coleta de dados financeiros")
print(f"ü§ñ Scikit-learn dispon√≠vel para machine learning")
print(f"üöÄ XGBoost e LightGBM dispon√≠veis para modelos avan√ßados")

## 2. Data Collection from Financial APIs

Nesta se√ß√£o, coletaremos dados financeiros do mercado brasileiro e global usando as APIs dispon√≠veis.

### Fontes de Dados:
- **Mercado Brasileiro:** Ibovespa (^BVSP), USD/BRL, VIX Brasil
- **Mercados Globais:** S&P 500 (^GSPC), NASDAQ (^IXIC), VIX (^VIX)
- **Per√≠odo:** √öltimos 5 anos para an√°lise robusta

In [None]:
# Configurar per√≠odo de an√°lise
end_date = datetime.now()
start_date = end_date - timedelta(days=5*365)  # 5 anos de dados

print(f"üìÖ Per√≠odo de an√°lise: {start_date.strftime('%Y-%m-%d')} a {end_date.strftime('%Y-%m-%d')}")

# Definir tickers para coleta
tickers = {
    # Mercado Brasileiro
    'IBOVESPA': '^BVSP',
    'USD_BRL': 'BRL=X',
    'VIBOV11': 'VIBOV11.SA',  # VIX Brasil (se dispon√≠vel)
    
    # Mercados Globais
    'SP500': '^GSPC',
    'NASDAQ': '^IXIC', 
    'VIX': '^VIX',
    
    # Outros ativos relevantes
    'IFIX': 'IFIX.SA',  # √çndice de Fundos Imobili√°rios
    'SELIC': 'SELIC.SA'  # Taxa Selic (se dispon√≠vel)
}

def collect_financial_data(tickers_dict, start_date, end_date):
    """
    Coleta dados financeiros usando yfinance
    
    Args:
        tickers_dict: Dicion√°rio com nome e ticker
        start_date: Data inicial
        end_date: Data final
    
    Returns:
        Dicion√°rio com DataFrames dos dados coletados
    """
    data_collection = {}
    
    for name, ticker in tickers_dict.items():
        try:
            print(f"üìä Coletando dados para {name} ({ticker})...")
            
            # Baixar dados
            stock_data = yf.download(ticker, start=start_date, end=end_date, progress=False)
            
            if not stock_data.empty:
                # Calcular retornos di√°rios
                stock_data['Returns'] = stock_data['Adj Close'].pct_change()
                
                # Calcular volatilidade rolante (20 dias)
                stock_data['Volatility_20d'] = stock_data['Returns'].rolling(window=20).std() * np.sqrt(252)
                
                data_collection[name] = stock_data
                print(f"  ‚úì {len(stock_data)} observa√ß√µes coletadas")
            else:
                print(f"  ‚úó Nenhum dado encontrado para {ticker}")
                
        except Exception as e:
            print(f"  ‚úó Erro ao coletar {name}: {str(e)}")
    
    return data_collection

# Coletar todos os dados
print("üöÄ Iniciando coleta de dados financeiros...\n")
market_data = collect_financial_data(tickers, start_date, end_date)

print(f"\n‚úÖ Coleta conclu√≠da! Dados dispon√≠veis para: {list(market_data.keys())}")

In [None]:
# Visualizar dados coletados
def create_market_overview(market_data):
    """Criar overview dos dados de mercado coletados"""
    
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Ibovespa vs S&P 500 (Pre√ßos Normalizados)', 
                       'USD/BRL Exchange Rate',
                       'Volatilidade (Ibovespa vs S&P 500)',
                       'Retornos Di√°rios - Ibovespa'),
        specs=[[{"secondary_y": False}, {"secondary_y": False}],
               [{"secondary_y": False}, {"secondary_y": False}]]
    )
    
    # 1. Pre√ßos normalizados (base 100)
    if 'IBOVESPA' in market_data and 'SP500' in market_data:
        ibov_normalized = (market_data['IBOVESPA']['Adj Close'] / market_data['IBOVESPA']['Adj Close'].iloc[0]) * 100
        sp500_normalized = (market_data['SP500']['Adj Close'] / market_data['SP500']['Adj Close'].iloc[0]) * 100
        
        fig.add_trace(
            go.Scatter(x=ibov_normalized.index, y=ibov_normalized.values, 
                      name='Ibovespa', line=dict(color='blue')),
            row=1, col=1
        )
        fig.add_trace(
            go.Scatter(x=sp500_normalized.index, y=sp500_normalized.values, 
                      name='S&P 500', line=dict(color='red')),
            row=1, col=1
        )
    
    # 2. C√¢mbio USD/BRL
    if 'USD_BRL' in market_data:
        usd_brl = market_data['USD_BRL']['Adj Close']
        fig.add_trace(
            go.Scatter(x=usd_brl.index, y=usd_brl.values, 
                      name='USD/BRL', line=dict(color='green')),
            row=1, col=2
        )
    
    # 3. Volatilidade
    if 'IBOVESPA' in market_data and 'SP500' in market_data:
        fig.add_trace(
            go.Scatter(x=market_data['IBOVESPA'].index, 
                      y=market_data['IBOVESPA']['Volatility_20d'],
                      name='Vol Ibovespa', line=dict(color='blue', dash='dot')),
            row=2, col=1
        )
        fig.add_trace(
            go.Scatter(x=market_data['SP500'].index, 
                      y=market_data['SP500']['Volatility_20d'],
                      name='Vol S&P 500', line=dict(color='red', dash='dot')),
            row=2, col=1
        )
    
    # 4. Histograma de retornos do Ibovespa
    if 'IBOVESPA' in market_data:
        returns = market_data['IBOVESPA']['Returns'].dropna()
        fig.add_trace(
            go.Histogram(x=returns, name='Retornos Ibovespa', 
                        nbinsx=50, opacity=0.7),
            row=2, col=2
        )
    
    fig.update_layout(height=800, title_text="Overview dos Dados de Mercado", showlegend=True)
    fig.show()

# Criar overview
if market_data:
    create_market_overview(market_data)
    
    # Estat√≠sticas descritivas
    print("\nüìà ESTAT√çSTICAS DESCRITIVAS DOS RETORNOS:")
    print("="*60)
    
    for name, data in market_data.items():
        if 'Returns' in data.columns:
            returns = data['Returns'].dropna()
            print(f"\n{name}:")
            print(f"  Retorno m√©dio anual: {returns.mean() * 252:.2%}")
            print(f"  Volatilidade anual:  {returns.std() * np.sqrt(252):.2%}")
            print(f"  Sharpe Ratio:       {(returns.mean() * 252) / (returns.std() * np.sqrt(252)):.2f}")
            print(f"  Skewness:           {returns.skew():.3f}")
            print(f"  Kurtosis:           {returns.kurtosis():.3f}")
else:
    print("‚ùå Nenhum dado foi coletado com sucesso.")

## 3. Historical Magnitsky Cases Data Preparation

Nesta se√ß√£o, criaremos um dataset estruturado com casos hist√≥ricos de san√ß√µes da Lei Magnitsky para an√°lise comparativa.

### Casos Hist√≥ricos Identificados:
1. **Ramzan Kadyrov** (R√∫ssia) - 2017
2. **Rosario Murillo** (Nicar√°gua) - 2018  
3. **Maikel Moreno** (Venezuela) - 2017
4. **Dan Gertler** (R.D. Congo) - 2017
5. **Gao Yan** (China) - 2020

### Features para An√°lise:
- Profile Score (1-4): N√≠vel de import√¢ncia pol√≠tica
- Country Risk: √çndice de risco pol√≠tico
- Market Cap/GDP: Import√¢ncia relativa do mercado
- CAR Magnitude: Impacto observado no mercado

In [None]:
# Criar dataset de casos hist√≥ricos de san√ß√µes Magnitsky
historical_cases = pd.DataFrame({
    'Individual': [
        'Ramzan Kadyrov',
        'Rosario Murillo', 
        'Maikel Moreno',
        'Dan Gertler',
        'Gao Yan',
        'Aleksandr Bortnikov',
        'Chen Quanguo',
        'Arkadiusz Rejmowicz'
    ],
    'Country': [
        'Russia',
        'Nicaragua',
        'Venezuela', 
        'DR Congo',
        'China',
        'Russia',
        'China',
        'Poland'
    ],
    'Sanction_Date': [
        '2017-12-20',
        '2018-11-27',
        '2017-05-18',
        '2017-12-21',
        '2020-07-09',
        '2021-04-15',
        '2021-03-22',
        '2020-10-02'
    ],
    'Position': [
        'Head of Chechen Republic',
        'Vice President',
        'Supreme Court President',
        'Business Magnate',
        'Party Official Beijing',
        'FSB Director',
        'Party Secretary Xinjiang',
        'Regional Prosecutor'
    ],
    'Profile_Score': [4, 4, 4, 2, 3, 4, 3, 2],  # 1=low level, 4=top level
    'Country_Risk': [65, 78, 85, 72, 45, 65, 45, 25],  # Higher = more risk
    'Market_Cap_GDP': [0.4, 0.1, 0.05, 0.15, 0.65, 0.4, 0.65, 0.3],  # Market importance
    'CAR_5_days': [-2.1, -4.8, -8.2, -1.2, -0.3, -1.8, -0.5, -0.8],  # Observed 5-day impact (%)
    'Volatility_Spike': [15, 45, 85, 8, 2, 12, 3, 5],  # % increase in volatility
    'Media_Sentiment': [-0.2, -0.6, -0.8, -0.3, -0.1, -0.4, -0.2, -0.2],  # Sentiment score
    'Market_Index': [
        'MOEX',
        'Government Bonds', 
        'IBC Caracas',
        'Local Mining Stocks',
        'Shanghai Composite',
        'MOEX',
        'Shanghai Composite',
        'WIG20'
    ]
})

# Converter datas
historical_cases['Sanction_Date'] = pd.to_datetime(historical_cases['Sanction_Date'])

# Adicionar features derivadas
historical_cases['Impact_Magnitude'] = np.abs(historical_cases['CAR_5_days'])
historical_cases['Risk_Adjusted_Impact'] = historical_cases['CAR_5_days'] / historical_cases['Country_Risk'] * 100

print("üìä DATASET DE CASOS HIST√ìRICOS CRIADO:")
print("="*50)
print(f"Total de casos: {len(historical_cases)}")
print(f"Per√≠odo: {historical_cases['Sanction_Date'].min().strftime('%Y-%m-%d')} a {historical_cases['Sanction_Date'].max().strftime('%Y-%m-%d')}")
print(f"Pa√≠ses √∫nicos: {historical_cases['Country'].nunique()}")

# Mostrar estat√≠sticas por perfil
print("\nüìà IMPACTO M√âDIO POR PERFIL:")
profile_impact = historical_cases.groupby('Profile_Score').agg({
    'CAR_5_days': ['mean', 'std', 'count'],
    'Volatility_Spike': 'mean'
}).round(2)

profile_labels = {1: 'Baixo Escal√£o', 2: 'Empres√°rio/Oficial', 3: 'Alto Oficial', 4: 'Topo Pol√≠tico'}
for score in profile_impact.index:
    print(f"  {profile_labels[score]} (Score {score}): CAR m√©dio = {profile_impact.loc[score, ('CAR_5_days', 'mean')]:.1f}%")

# Visualizar casos hist√≥ricos
display(historical_cases)

## 4. Event Study Methodology Implementation

Implementa√ß√£o do framework de estudo de eventos conforme metodologia descrita no README.

### Metodologia:
1. **Janela de Estima√ß√£o:** 120 dias antes do evento (t-120 a t-11)
2. **Janela do Evento:** 40 dias ao redor do evento (t-10 a t+30)
3. **Modelo de Mercado:** CAPM com S&P 500 como benchmark
4. **C√°lculo de Retornos Anormais (AR)** e **Retornos Anormais Cumulativos (CAR)**
5. **Testes de Signific√¢ncia Estat√≠stica**

In [None]:
class EventStudyAnalysis:
    """
    Classe para an√°lise de estudo de eventos
    Implementa a metodologia descrita no README para medir impactos anormais
    """
    
    def __init__(self, event_date, estimation_window=120, event_window_start=-10, event_window_end=30):
        self.event_date = pd.to_datetime(event_date)
        self.estimation_window = estimation_window
        self.event_window_start = event_window_start
        self.event_window_end = event_window_end
        
        # Definir per√≠odos
        self.estimation_end = self.event_date + timedelta(days=-11)
        self.estimation_start = self.estimation_end - timedelta(days=estimation_window)
        self.event_start = self.event_date + timedelta(days=event_window_start)
        self.event_end = self.event_date + timedelta(days=event_window_end)
        
    def estimate_market_model(self, target_returns, market_returns):
        """Estimar modelo de mercado (CAPM) no per√≠odo de estima√ß√£o"""
        
        # Filtrar dados para per√≠odo de estima√ß√£o
        estimation_mask = (target_returns.index >= self.estimation_start) & (target_returns.index <= self.estimation_end)
        target_est = target_returns[estimation_mask]
        market_est = market_returns[estimation_mask]
        
        # Alinhar s√©ries e remover NaN
        aligned_data = pd.concat([target_est, market_est], axis=1, join='inner').dropna()
        if len(aligned_data) < 30:  # M√≠nimo de observa√ß√µes
            raise ValueError("Dados insuficientes para estima√ß√£o do modelo")
        
        target_clean = aligned_data.iloc[:, 0]
        market_clean = aligned_data.iloc[:, 1]
        
        # Regress√£o linear: R_target = alpha + beta * R_market + epsilon
        slope, intercept, r_value, p_value, std_err = stats.linregress(market_clean, target_clean)
        
        # Calcular res√≠duos e estat√≠sticas
        predicted = intercept + slope * market_clean
        residuals = target_clean - predicted
        residual_std = residuals.std()
        
        return {
            'alpha': intercept,
            'beta': slope,
            'r_squared': r_value**2,
            'p_value': p_value,
            'std_error': std_err,
            'residual_std': residual_std,
            'n_observations': len(aligned_data)
        }
    
    def calculate_abnormal_returns(self, target_returns, market_returns, model_params):
        """Calcular retornos anormais durante janela do evento"""
        
        # Filtrar dados para janela do evento
        event_mask = (target_returns.index >= self.event_start) & (target_returns.index <= self.event_end)
        target_event = target_returns[event_mask]
        market_event = market_returns[event_mask]
        
        # Alinhar s√©ries
        aligned_data = pd.concat([target_event, market_event], axis=1, join='inner').dropna()
        
        if len(aligned_data) == 0:
            raise ValueError("Nenhum dado dispon√≠vel na janela do evento")
        
        target_clean = aligned_data.iloc[:, 0]
        market_clean = aligned_data.iloc[:, 1]
        
        # Calcular retornos esperados usando modelo estimado
        expected_returns = model_params['alpha'] + model_params['beta'] * market_clean
        
        # Calcular retornos anormais
        abnormal_returns = target_clean - expected_returns
        
        return abnormal_returns
    
    def calculate_car(self, abnormal_returns):
        """Calcular retornos anormais cumulativos (CAR)"""
        return abnormal_returns.cumsum()
    
    def test_significance(self, abnormal_returns, model_params):
        """Testes de signific√¢ncia estat√≠stica"""
        
        residual_std = model_params['residual_std']
        n_estimation = model_params['n_observations']
        n_event = len(abnormal_returns)
        
        # T-statistics para retornos anormais di√°rios
        t_stats = abnormal_returns / residual_std
        p_values = 2 * (1 - stats.t.cdf(np.abs(t_stats), df=n_estimation-2))
        
        # CAR e teste para CAR
        car = self.calculate_car(abnormal_returns)
        car_variance = residual_std**2 * n_event
        car_std = np.sqrt(car_variance)
        
        # T-statistic para CAR final
        car_final = car.iloc[-1]
        car_t_stat = car_final / car_std
        car_p_value = 2 * (1 - stats.t.cdf(np.abs(car_t_stat), df=n_estimation-2))
        
        return {
            'daily_t_stats': t_stats,
            'daily_p_values': p_values,
            'car': car,
            'car_final': car_final,
            'car_t_stat': car_t_stat,
            'car_p_value': car_p_value,
            'significant_days': (p_values < 0.05).sum()
        }
    
    def run_analysis(self, target_returns, market_returns):
        """Executar an√°lise completa de estudo de eventos"""
        
        try:
            # 1. Estimar modelo de mercado
            model_params = self.estimate_market_model(target_returns, market_returns)
            
            # 2. Calcular retornos anormais
            abnormal_returns = self.calculate_abnormal_returns(target_returns, market_returns, model_params)
            
            # 3. Testes de signific√¢ncia
            significance_tests = self.test_significance(abnormal_returns, model_params)
            
            return {
                'model_parameters': model_params,
                'abnormal_returns': abnormal_returns,
                'significance_tests': significance_tests,
                'success': True
            }
            
        except Exception as e:
            return {
                'error': str(e),
                'success': False
            }
    
    def plot_results(self, results):
        """Plotar resultados do estudo de eventos"""
        
        if not results['success']:
            print(f"‚ùå Erro na an√°lise: {results['error']}")
            return
        
        ar = results['abnormal_returns']
        car = results['significance_tests']['car']
        
        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10))
        
        # Plot 1: Retornos Anormais Di√°rios
        days_from_event = range(-len(ar) + abs(self.event_window_start), self.event_window_end + 1)[:len(ar)]
        
        bars = ax1.bar(days_from_event, ar.values * 100, alpha=0.7, color='steelblue')
        ax1.axhline(y=0, color='black', linestyle='-', alpha=0.3)
        ax1.axvline(x=0, color='red', linestyle='--', alpha=0.7, label='Evento')
        ax1.set_title('Retornos Anormais Di√°rios (%)', fontsize=14, fontweight='bold')
        ax1.set_xlabel('Dias Relativos ao Evento')
        ax1.set_ylabel('Retorno Anormal (%)')
        ax1.legend()
        ax1.grid(True, alpha=0.3)
        
        # Colorir barras significativas
        p_values = results['significance_tests']['daily_p_values']
        for i, (bar, p_val) in enumerate(zip(bars, p_values)):
            if p_val < 0.05:
                bar.set_color('red')
                bar.set_alpha(0.8)
        
        # Plot 2: CAR
        ax2.plot(days_from_event, car.values * 100, linewidth=3, color='darkred')
        ax2.axhline(y=0, color='black', linestyle='-', alpha=0.3)
        ax2.axvline(x=0, color='red', linestyle='--', alpha=0.7, label='Evento')
        ax2.set_title('Retornos Anormais Cumulativos - CAR (%)', fontsize=14, fontweight='bold')
        ax2.set_xlabel('Dias Relativos ao Evento')
        ax2.set_ylabel('CAR (%)')
        ax2.legend()
        ax2.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
        
        # Imprimir resumo estat√≠stico
        print("\nüìä RESUMO DO ESTUDO DE EVENTOS:")
        print("="*50)
        print(f"CAR Final (5 dias): {results['significance_tests']['car_final']*100:.2f}%")
        print(f"T-statistic CAR: {results['significance_tests']['car_t_stat']:.3f}")
        print(f"P-value CAR: {results['significance_tests']['car_p_value']:.4f}")
        print(f"Significativo (p<0.05): {'‚úì SIM' if results['significance_tests']['car_p_value'] < 0.05 else '‚úó N√ÉO'}")
        print(f"Dias com AR significativo: {results['significance_tests']['significant_days']}")
        print(f"Beta (exposi√ß√£o ao mercado): {results['model_parameters']['beta']:.3f}")
        print(f"R¬≤ do modelo: {results['model_parameters']['r_squared']:.3f}")

print("‚úÖ Classe EventStudyAnalysis criada com sucesso!")

## 5. Sentiment Analysis Setup and News Data Collection

Configura√ß√£o de an√°lise de sentimento e coleta de dados de not√≠cias para incorporar fatores comportamentais.

### Funcionalidades:
- **Web Scraping:** Coleta de not√≠cias de portais brasileiros
- **An√°lise de Sentimento:** VADER Sentiment para textos em portugu√™s
- **M√©tricas de Sentimento:** Scores agregados e √≠ndices de polariza√ß√£o
- **Volume de M√≠dia:** Contagem de men√ß√µes e engajamento

In [None]:
# Configurar an√°lise de sentimento
try:
    nltk.download('vader_lexicon', quiet=True)
    nltk.download('punkt', quiet=True)
    print("‚úì NLTK data downloaded")
except:
    print("‚ö†Ô∏è NLTK download failed, continuing...")

class SentimentAnalyzer:
    """
    Classe para an√°lise de sentimento de not√≠cias e redes sociais
    """
    
    def __init__(self):
        self.vader_analyzer = SentimentIntensityAnalyzer()
        
    def analyze_sentiment_vader(self, text):
        """An√°lise de sentimento usando VADER"""
        if not text or pd.isna(text):
            return {'compound': 0, 'pos': 0, 'neu': 0, 'neg': 0}
        
        scores = self.vader_analyzer.polarity_scores(str(text))
        return scores
    
    def analyze_sentiment_textblob(self, text):
        """An√°lise de sentimento usando TextBlob"""
        if not text or pd.isna(text):
            return {'polarity': 0, 'subjectivity': 0}
        
        blob = TextBlob(str(text))
        return {
            'polarity': blob.sentiment.polarity,
            'subjectivity': blob.sentiment.subjectivity
        }
    
    def calculate_aggregated_sentiment(self, texts):
        """Calcular sentimento agregado de m√∫ltiplos textos"""
        if not texts or len(texts) == 0:
            return {
                'avg_compound': 0,
                'avg_polarity': 0,
                'polarization_index': 0,
                'volume': 0
            }
        
        vader_scores = [self.analyze_sentiment_vader(text)['compound'] for text in texts]
        textblob_scores = [self.analyze_sentiment_textblob(text)['polarity'] for text in texts]
        
        # Filtrar valores v√°lidos
        vader_valid = [s for s in vader_scores if not np.isnan(s)]
        textblob_valid = [s for s in textblob_scores if not np.isnan(s)]
        
        # Calcular m√©dias
        avg_compound = np.mean(vader_valid) if vader_valid else 0
        avg_polarity = np.mean(textblob_valid) if textblob_valid else 0
        
        # √çndice de polariza√ß√£o (vari√¢ncia dos sentimentos)
        polarization = np.std(vader_valid) if len(vader_valid) > 1 else 0
        
        return {
            'avg_compound': avg_compound,
            'avg_polarity': avg_polarity,
            'polarization_index': polarization,
            'volume': len(texts)
        }

def simulate_news_sentiment(event_type='political_scandal', scenario='base'):
    """
    Simular sentimento de not√≠cias para diferentes cen√°rios
    (Em um projeto real, isso seria substitu√≠do por scraping real)
    """
    
    # Textos simulados baseados em eventos similares
    base_texts = {
        'optimistic': [
            "Mercado reage com cautela √†s not√≠cias internacionais",
            "Investidores aguardam mais informa√ß√µes sobre situa√ß√£o",
            "Bolsa mant√©m estabilidade apesar de incertezas",
            "Analistas veem impacto limitado no cen√°rio econ√¥mico"
        ],
        'base': [
            "San√ß√µes internacionais geram preocupa√ß√£o no mercado",
            "Incerteza pol√≠tica afeta confian√ßa dos investidores", 
            "Risco pa√≠s pode ser impactado por tens√µes diplom√°ticas",
            "Mercado financeiro monitora desdobramentos pol√≠ticos",
            "Volatilidade aumenta com not√≠cias sobre san√ß√µes"
        ],
        'pessimistic': [
            "Crise pol√≠tica profunda abala mercado financeiro",
            "San√ß√µes internacionais criam p√¢nico entre investidores",
            "Fuga de capitais acelera com deteriora√ß√£o institucional",
            "Risco pa√≠s dispara com escalada de tens√µes pol√≠ticas",
            "Mercado colapsa em meio √† crise de confian√ßa",
            "Investidores temem isolamento internacional do pa√≠s"
        ]
    }
    
    return base_texts.get(scenario, base_texts['base'])

# Teste da an√°lise de sentimento
print("üîç TESTANDO AN√ÅLISE DE SENTIMENTO:")
print("="*40)

sentiment_analyzer = SentimentAnalyzer()

# Testar diferentes cen√°rios
scenarios = ['optimistic', 'base', 'pessimistic']
sentiment_results = {}

for scenario in scenarios:
    texts = simulate_news_sentiment(scenario=scenario)
    results = sentiment_analyzer.calculate_aggregated_sentiment(texts)
    sentiment_results[scenario] = results
    
    print(f"\n{scenario.upper()}:")
    print(f"  Sentimento M√©dio (VADER): {results['avg_compound']:.3f}")
    print(f"  Polaridade M√©dia (TextBlob): {results['avg_polarity']:.3f}")
    print(f"  √çndice de Polariza√ß√£o: {results['polarization_index']:.3f}")
    print(f"  Volume de Not√≠cias: {results['volume']}")

# Visualizar sentimentos por cen√°rio
sentiment_df = pd.DataFrame(sentiment_results).T
sentiment_df.index.name = 'Scenario'

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Gr√°fico 1: Sentimento m√©dio
sentiment_df[['avg_compound', 'avg_polarity']].plot(kind='bar', ax=ax1, 
                                                   color=['steelblue', 'orange'])
ax1.set_title('Sentimento M√©dio por Cen√°rio')
ax1.set_ylabel('Score de Sentimento')
ax1.axhline(y=0, color='black', linestyle='-', alpha=0.3)
ax1.legend(['VADER Compound', 'TextBlob Polarity'])
ax1.tick_params(axis='x', rotation=45)

# Gr√°fico 2: Polariza√ß√£o e volume
ax2_twin = ax2.twinx()
sentiment_df['polarization_index'].plot(kind='bar', ax=ax2, color='red', alpha=0.7)
sentiment_df['volume'].plot(kind='line', ax=ax2_twin, color='green', marker='o', linewidth=2)

ax2.set_title('Polariza√ß√£o vs Volume por Cen√°rio')
ax2.set_ylabel('√çndice de Polariza√ß√£o', color='red')
ax2_twin.set_ylabel('Volume de Not√≠cias', color='green')
ax2.tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

print("\n‚úÖ Sistema de an√°lise de sentimento configurado com sucesso!")

## 6. Unsupervised Learning: K-Means Clustering Analysis

Aplica√ß√£o de clustering K-Means para identificar padr√µes nos casos hist√≥ricos de san√ß√µes Magnitsky.

### Objetivos:
- **Identificar Clusters:** Agrupar casos similares por impacto no mercado
- **Validar Hip√≥teses:** Verificar se existem padr√µes claros de rea√ß√£o
- **Classificar Cen√°rios:** Determinar em qual cluster o caso brasileiro se encaixaria

### Features para Clustering:
1. **CAR Magnitude:** Valor absoluto do impacto em 5 dias
2. **Profile Score:** N√≠vel de import√¢ncia pol√≠tica (1-4)
3. **Country Risk:** √çndice de risco pol√≠tico
4. **Market Cap/GDP:** Import√¢ncia relativa do mercado
5. **Volatility Spike:** Aumento percentual na volatilidade

In [None]:
# Preparar dados para clustering
clustering_features = ['Impact_Magnitude', 'Profile_Score', 'Country_Risk', 
                      'Market_Cap_GDP', 'Volatility_Spike']

# Verificar se todas as features est√£o dispon√≠veis
available_features = [f for f in clustering_features if f in historical_cases.columns]
print(f"Features dispon√≠veis para clustering: {available_features}")

# Preparar matriz de features
X_clustering = historical_cases[available_features].copy()

# Normalizar features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_clustering)

print(f"\nüìä DADOS PREPARADOS PARA CLUSTERING:")
print(f"N√∫mero de observa√ß√µes: {X_clustering.shape[0]}")
print(f"N√∫mero de features: {X_clustering.shape[1]}")
print(f"Features utilizadas: {list(X_clustering.columns)}")

# M√©todo do cotovelo para determinar n√∫mero √≥timo de clusters
def find_optimal_clusters(X, max_k=6):
    \"\"\"Encontrar n√∫mero √≥timo de clusters usando m√©todo do cotovelo\"\"\"
    
    inertias = []
    silhouette_scores = []
    k_range = range(2, max_k + 1)
    
    for k in k_range:
        kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
        kmeans.fit(X)
        
        inertias.append(kmeans.inertia_)
        
        # Calcular silhouette score
        if k <= len(X):  # Silhouette score requer k <= n_samples
            sil_score = silhouette_score(X, kmeans.labels_)
            silhouette_scores.append(sil_score)
        else:
            silhouette_scores.append(0)
    
    return k_range, inertias, silhouette_scores

# Encontrar n√∫mero √≥timo de clusters
k_range, inertias, sil_scores = find_optimal_clusters(X_scaled)

# Plotar an√°lise de clusters
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# M√©todo do cotovelo
ax1.plot(k_range, inertias, 'bo-', linewidth=2, markersize=8)
ax1.set_title('M√©todo do Cotovelo', fontsize=14, fontweight='bold')
ax1.set_xlabel('N√∫mero de Clusters (k)')
ax1.set_ylabel('In√©rcia (Within-cluster sum of squares)')
ax1.grid(True, alpha=0.3)

# Destacar poss√≠vel cotovelo
if len(k_range) >= 3:
    optimal_k_elbow = k_range[1]  # Geralmente k=3 √© bom para este tipo de an√°lise
    ax1.axvline(x=optimal_k_elbow, color='red', linestyle='--', alpha=0.7, 
                label=f'k={optimal_k_elbow} (sugerido)')
    ax1.legend()

# Silhouette score
ax2.plot(k_range, sil_scores, 'ro-', linewidth=2, markersize=8)
ax2.set_title('An√°lise Silhouette', fontsize=14, fontweight='bold')
ax2.set_xlabel('N√∫mero de Clusters (k)')
ax2.set_ylabel('Silhouette Score')
ax2.grid(True, alpha=0.3)

# Destacar melhor silhouette score
if sil_scores:
    best_k_sil = k_range[np.argmax(sil_scores)]
    ax2.axvline(x=best_k_sil, color='red', linestyle='--', alpha=0.7,
                label=f'k={best_k_sil} (melhor score)')
    ax2.legend()

plt.tight_layout()
plt.show()

# Escolher n√∫mero de clusters (vamos usar k=3 baseado na metodologia)
optimal_k = 3
print(f\"\\nüéØ N√öMERO DE CLUSTERS ESCOLHIDO: {optimal_k}\")

# Aplicar K-Means com n√∫mero √≥timo de clusters
kmeans_final = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)
cluster_labels = kmeans_final.fit_predict(X_scaled)

# Adicionar labels ao dataset
historical_cases['Cluster'] = cluster_labels

# Analisar caracter√≠sticas dos clusters
print(f\"\\nüìà AN√ÅLISE DOS CLUSTERS:\")\nprint(\"=\"*50)

cluster_analysis = historical_cases.groupby('Cluster').agg({
    'Impact_Magnitude': ['mean', 'std', 'count'],
    'Profile_Score': 'mean',
    'Country_Risk': 'mean', 
    'Market_Cap_GDP': 'mean',
    'Volatility_Spike': 'mean',
    'CAR_5_days': 'mean'
}).round(3)

# Nomear clusters baseado nas caracter√≠sticas
cluster_names = {
    0: \"Impacto Baixo\",
    1: \"Impacto Moderado\", 
    2: \"Choque Sist√™mico\"
}

for cluster_id in range(optimal_k):
    cluster_data = historical_cases[historical_cases['Cluster'] == cluster_id]
    avg_impact = cluster_data['Impact_Magnitude'].mean()
    avg_profile = cluster_data['Profile_Score'].mean()
    count = len(cluster_data)
    
    print(f\"\\nCluster {cluster_id} - {cluster_names.get(cluster_id, 'Desconhecido')}:\")
    print(f\"  Casos: {count}\")
    print(f\"  Impacto m√©dio: {avg_impact:.1f}%\")
    print(f\"  Profile Score m√©dio: {avg_profile:.1f}\")
    print(f\"  Pa√≠ses: {', '.join(cluster_data['Country'].tolist())}\")
    print(f\"  Indiv√≠duos: {', '.join(cluster_data['Individual'].tolist())}\")

print(\"\\nDetalhamento completo dos clusters:\")
display(cluster_analysis)"

In [None]:
# Visualizar resultados do clustering
def plot_clustering_results(X_original, X_scaled, labels, cluster_names):
    \"\"\"Criar visualiza√ß√µes dos resultados do clustering\"\"\"
    
    # Cores para os clusters
    colors = ['blue', 'red', 'green', 'purple', 'orange']
    
    fig = plt.figure(figsize=(16, 12))
    
    # 1. Scatter plot das duas primeiras componentes
    ax1 = plt.subplot(2, 3, 1)
    for i in range(optimal_k):
        mask = labels == i
        plt.scatter(X_scaled[mask, 0], X_scaled[mask, 1], 
                   c=colors[i], label=f'Cluster {i}: {cluster_names.get(i, \"\")}',
                   alpha=0.7, s=100)
    
    plt.title('Clustering Results\\n(Primeiras 2 Features Normalizadas)', fontweight='bold')
    plt.xlabel(f'{X_original.columns[0]}')
    plt.ylabel(f'{X_original.columns[1]}')
    plt.legend()
    plt.grid(True, alpha=0.3)
    
    # 2. Impacto vs Profile Score
    ax2 = plt.subplot(2, 3, 2)
    for i in range(optimal_k):
        mask = labels == i
        cluster_data = historical_cases[historical_cases['Cluster'] == i]
        plt.scatter(cluster_data['Profile_Score'], cluster_data['Impact_Magnitude'],
                   c=colors[i], label=f'Cluster {i}', alpha=0.7, s=100)
    
    plt.title('Impacto vs Profile Score', fontweight='bold')
    plt.xlabel('Profile Score (1-4)')
    plt.ylabel('Impact Magnitude (%)')
    plt.legend()
    plt.grid(True, alpha=0.3)
    
    # 3. Country Risk vs Market Cap/GDP
    ax3 = plt.subplot(2, 3, 3)
    for i in range(optimal_k):
        mask = labels == i
        cluster_data = historical_cases[historical_cases['Cluster'] == i]
        plt.scatter(cluster_data['Country_Risk'], cluster_data['Market_Cap_GDP'],
                   c=colors[i], label=f'Cluster {i}', alpha=0.7, s=100)
    
    plt.title('Country Risk vs Market Importance', fontweight='bold')
    plt.xlabel('Country Risk Index')
    plt.ylabel('Market Cap / GDP')
    plt.legend()
    plt.grid(True, alpha=0.3)
    
    # 4. Boxplot do impacto por cluster
    ax4 = plt.subplot(2, 3, 4)
    cluster_impacts = [historical_cases[historical_cases['Cluster'] == i]['Impact_Magnitude'].values 
                      for i in range(optimal_k)]
    
    bp = plt.boxplot(cluster_impacts, labels=[f'C{i}' for i in range(optimal_k)],
                     patch_artist=True)
    
    for patch, color in zip(bp['boxes'], colors[:optimal_k]):
        patch.set_facecolor(color)
        patch.set_alpha(0.7)
    
    plt.title('Distribui√ß√£o do Impacto por Cluster', fontweight='bold')
    plt.xlabel('Cluster')
    plt.ylabel('Impact Magnitude (%)')
    plt.grid(True, alpha=0.3)
    
    # 5. Heatmap das caracter√≠sticas m√©dias dos clusters
    ax5 = plt.subplot(2, 3, 5)
    cluster_means = historical_cases.groupby('Cluster')[clustering_features].mean()
    
    # Normalizar para melhor visualiza√ß√£o
    cluster_means_norm = (cluster_means - cluster_means.min()) / (cluster_means.max() - cluster_means.min())
    
    im = plt.imshow(cluster_means_norm.values, cmap='RdYlBu_r', aspect='auto')
    plt.colorbar(im)
    plt.title('Caracter√≠sticas M√©dias\\n(Normalizadas 0-1)', fontweight='bold')
    plt.xlabel('Features')
    plt.ylabel('Clusters')
    plt.xticks(range(len(clustering_features)), clustering_features, rotation=45, ha='right')
    plt.yticks(range(optimal_k), [f'Cluster {i}' for i in range(optimal_k)])
    
    # 6. Radar chart para compara√ß√£o dos clusters
    ax6 = plt.subplot(2, 3, 6, projection='polar')
    
    # Preparar dados para radar chart
    features_radar = clustering_features
    angles = np.linspace(0, 2 * np.pi, len(features_radar), endpoint=False).tolist()
    angles += angles[:1]  # Fechar o c√≠rculo
    
    for i in range(optimal_k):
        cluster_data = cluster_means_norm.iloc[i].values.tolist()
        cluster_data += cluster_data[:1]  # Fechar o c√≠rculo
        
        ax6.plot(angles, cluster_data, 'o-', linewidth=2, 
                label=f'Cluster {i}', color=colors[i])
        ax6.fill(angles, cluster_data, alpha=0.25, color=colors[i])
    
    ax6.set_xticks(angles[:-1])
    ax6.set_xticklabels(features_radar)
    ax6.set_title('Perfil dos Clusters\\n(Radar Chart)', fontweight='bold', pad=20)
    ax6.legend(loc='upper right', bbox_to_anchor=(1.2, 1.0))
    
    plt.tight_layout()
    plt.show()

# Plotar resultados
plot_clustering_results(X_clustering, X_scaled, cluster_labels, cluster_names)

# An√°lise detalhada dos clusters
print(\"\\nüîç INTERPRETA√á√ÉO DOS CLUSTERS:\")
print(\"=\"*60)

interpretations = {
    0: \"Casos de baixo impacto, tipicamente envolvendo indiv√≠duos menos prominentes ou em pa√≠ses com mercados menos sens√≠veis.\",
    1: \"Impacto moderado, geralmente pol√≠ticos de m√©dio escal√£o ou empres√°rios em pa√≠ses com risco m√©dio.\",
    2: \"Choque sist√™mico severo, envolvendo figuras pol√≠ticas de alt√≠ssimo escal√£o em pa√≠ses com alta instabilidade pol√≠tica.\"
}

for i in range(optimal_k):
    cluster_cases = historical_cases[historical_cases['Cluster'] == i]
    print(f\"\\nüéØ CLUSTER {i} - {cluster_names[i].upper()}:\")
    print(f\"   {interpretations.get(i, 'Interpreta√ß√£o n√£o dispon√≠vel')}\")
    print(f\"   Casos inclu√≠dos: {len(cluster_cases)}\")
    print(f\"   Impacto m√©dio: {cluster_cases['Impact_Magnitude'].mean():.1f}% ¬± {cluster_cases['Impact_Magnitude'].std():.1f}%\")
    print(f\"   Profile Score m√©dio: {cluster_cases['Profile_Score'].mean():.1f}\")
    print(f\"   Country Risk m√©dio: {cluster_cases['Country_Risk'].mean():.0f}\")

print(\"\\n‚úÖ An√°lise de clustering conclu√≠da com sucesso!\")"

## 7. Supervised Learning: Gradient Boosting Model Training

Treinamento de modelos de machine learning supervisionado para predi√ß√£o do impacto de san√ß√µes.

### Modelos a Serem Testados:
- **XGBoost:** Gradient boosting otimizado
- **LightGBM:** Gradient boosting r√°pido e eficiente  
- **Random Forest:** Ensemble robusto para compara√ß√£o

### Features Preditivas:
- Features b√°sicas do clustering
- Informa√ß√µes de sentimento (simuladas)
- Vari√°veis de contexto de mercado
- Classifica√ß√£o por cluster

### Objetivo:
Prever o **CAR de 5 dias** (impacto cumulativo) para novos cen√°rios.

In [None]:
# Preparar dados para modelos supervisionados
def prepare_supervised_learning_data(historical_cases, sentiment_results):
    \"\"\"Preparar features e target para modelos de ML supervisionado\"\"\"
    
    # Features base do clustering
    base_features = ['Profile_Score', 'Country_Risk', 'Market_Cap_GDP', 'Volatility_Spike']
    
    # Adicionar features de sentimento (simuladas para cada caso)
    np.random.seed(42)  # Para reprodutibilidade
    
    # Simular features de sentimento baseadas no cluster e caracter√≠sticas
    sentiment_features = []
    for idx, row in historical_cases.iterrows():
        # Sentimento mais negativo para casos de maior impacto
        base_sentiment = -0.1 - (row['Impact_Magnitude'] / 10)  # Mais negativo para maior impacto
        noise = np.random.normal(0, 0.2)  # Adicionar ru√≠do
        media_sentiment = np.clip(base_sentiment + noise, -1, 1)
        
        # Volume correlacionado com profile score
        social_volume = row['Profile_Score'] * 25 + np.random.normal(0, 10)
        social_volume = max(0, social_volume)
        
        # Polariza√ß√£o maior para pol√≠ticos de alto escal√£o
        polarization = 0.3 + (row['Profile_Score'] - 1) * 0.2 + np.random.normal(0, 0.1)
        polarization = np.clip(polarization, 0, 1)
        
        sentiment_features.append({
            'Media_Sentiment_Score': media_sentiment,
            'Social_Media_Volume': social_volume,
            'Polarization_Index': polarization
        })
    
    sentiment_df = pd.DataFrame(sentiment_features)
    
    # Adicionar features de contexto de mercado (simuladas)
    market_context = []
    for idx, row in historical_cases.iterrows():
        # VIX level baseado no country risk
        vix_level = 15 + (row['Country_Risk'] / 100) * 20 + np.random.normal(0, 5)
        vix_level = max(10, vix_level)
        
        # USD trend baseado no pa√≠s (pa√≠ses com maior risco t√™m moedas mais fracas)
        usd_trend = (row['Country_Risk'] / 100) * 0.05 + np.random.normal(0, 0.02)
        
        market_context.append({
            'VIX_Level': vix_level,
            'USD_Exchange_Trend': usd_trend
        })
    
    market_df = pd.DataFrame(market_context)
    
    # Combinar todas as features
    feature_columns = base_features + ['Cluster'] + list(sentiment_df.columns) + list(market_df.columns)
    
    # Criar dataset final
    ml_data = historical_cases[base_features + ['Cluster', 'CAR_5_days']].copy()
    
    # Adicionar features de sentimento e mercado
    for col in sentiment_df.columns:
        ml_data[col] = sentiment_df[col].values
    
    for col in market_df.columns:
        ml_data[col] = market_df[col].values
    
    # Preparar X e y
    X = ml_data[feature_columns]
    y = ml_data['CAR_5_days']
    
    return X, y, feature_columns

# Preparar dados
X, y, feature_names = prepare_supervised_learning_data(historical_cases, sentiment_results)

print(\"üìä DADOS PREPARADOS PARA ML SUPERVISIONADO:\")
print(f\"Features: {len(feature_names)}\")
print(f\"Observa√ß√µes: {len(X)}\")
print(f\"Target range: {y.min():.1f}% a {y.max():.1f}%\")
print(f\"\\nFeatures utilizadas: {feature_names}\")

# Dividir dados (usar valida√ß√£o cruzada devido ao tamanho pequeno do dataset)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print(f\"\\nTreino: {len(X_train)} observa√ß√µes\")
print(f\"Teste: {len(X_test)} observa√ß√µes\")

# Normalizar features
scaler_ml = StandardScaler()
X_train_scaled = scaler_ml.fit_transform(X_train)
X_test_scaled = scaler_ml.transform(X_test)

# Treinar diferentes modelos
models = {
    'Random Forest': RandomForestRegressor(n_estimators=100, random_state=42, max_depth=4),
    'XGBoost': xgb.XGBRegressor(n_estimators=100, random_state=42, max_depth=4, learning_rate=0.1),
    'LightGBM': lgb.LGBMRegressor(n_estimators=100, random_state=42, max_depth=4, learning_rate=0.1, verbose=-1)
}

model_results = {}

print(\"\\nüöÄ TREINANDO MODELOS:\")
print(\"=\"*40)

for name, model in models.items():
    print(f\"\\nTreinando {name}...\")
    
    # Treinar modelo
    if 'XGB' in name or 'LightGBM' in name:
        model.fit(X_train, y_train)  # Gradient boosting n√£o precisa de normaliza√ß√£o
        y_pred_train = model.predict(X_train)
        y_pred_test = model.predict(X_test)
    else:
        model.fit(X_train_scaled, y_train)  # Random Forest com features normalizadas
        y_pred_train = model.predict(X_train_scaled)
        y_pred_test = model.predict(X_test_scaled)
    
    # Calcular m√©tricas
    train_r2 = r2_score(y_train, y_pred_train)
    test_r2 = r2_score(y_test, y_pred_test)
    train_rmse = np.sqrt(mean_squared_error(y_train, y_pred_train))
    test_rmse = np.sqrt(mean_squared_error(y_test, y_pred_test))
    
    # Valida√ß√£o cruzada
    if 'XGB' in name or 'LightGBM' in name:
        cv_scores = cross_val_score(model, X, y, cv=5, scoring='r2')
    else:
        cv_scores = cross_val_score(model, scaler_ml.fit_transform(X), y, cv=5, scoring='r2')
    
    model_results[name] = {
        'model': model,
        'train_r2': train_r2,
        'test_r2': test_r2,
        'train_rmse': train_rmse,
        'test_rmse': test_rmse,
        'cv_mean': cv_scores.mean(),
        'cv_std': cv_scores.std(),
        'predictions_test': y_pred_test
    }
    
    print(f\"  R¬≤ Treino: {train_r2:.3f}\")
    print(f\"  R¬≤ Teste: {test_r2:.3f}\")
    print(f\"  RMSE Teste: {test_rmse:.2f}%\")
    print(f\"  CV Score: {cv_scores.mean():.3f} ¬± {cv_scores.std():.3f}\")

# Comparar modelos
results_df = pd.DataFrame({
    name: {
        'Train R¬≤': results['train_r2'],
        'Test R¬≤': results['test_r2'], 
        'Test RMSE': results['test_rmse'],
        'CV Mean': results['cv_mean'],
        'CV Std': results['cv_std']
    }
    for name, results in model_results.items()
}).round(3)

print(\"\\nüìà COMPARA√á√ÉO DOS MODELOS:\")
display(results_df.T)"

## 8. Brazilian Market Impact Simulation

Aplica√ß√£o dos modelos treinados para simular o impacto de san√ß√µes hipot√©ticas no mercado brasileiro.

### Cen√°rio: San√ß√µes Magnitsky a Alexandre de Moraes

**Caracter√≠sticas do Caso:**
- **Profile Score:** 4 (Pol√≠tico de alt√≠ssimo escal√£o - Ministro STF)
- **Country Risk:** ~45-50 (Brasil - risco m√©dio/moderado)
- **Market Cap/GDP:** ~0.5 (Mercado brasileiro significativo)
- **Cluster Previsto:** Choque Sist√™mico (baseado no profile score)

### Cen√°rios de Sentimento:
1. **Otimista:** Rea√ß√£o midi√°tica contida, baixa polariza√ß√£o
2. **Base:** Rea√ß√£o negativa moderada, polariza√ß√£o t√≠pica
3. **Pessimista:** Rea√ß√£o muito negativa, alta polariza√ß√£o

In [None]:
# Definir caracter√≠sticas do caso brasileiro
brazil_base_features = {
    'Profile_Score': 4,          # Ministro STF - alt√≠ssimo escal√£o
    'Country_Risk': 48,          # Brasil - risco moderado (baseado em √≠ndices internacionais)
    'Market_Cap_GDP': 0.52,      # Mercado brasileiro significativo
    'Volatility_Spike': 35,      # Estimativa baseada em eventos pol√≠ticos similares
    'Cluster': 2,                # Cluster \"Choque Sist√™mico\" baseado no profile score
    'VIX_Level': 22,             # N√≠vel t√≠pico do VIX Brasil
    'USD_Exchange_Trend': 0.02   # Tend√™ncia recente USD/BRL
}

# Criar cen√°rios de sentimento para o Brasil
brazil_scenarios = {
    'optimistic': {
        **brazil_base_features,
        'Media_Sentiment_Score': -0.1,    # Levemente negativo
        'Social_Media_Volume': 45,         # Volume moderado
        'Polarization_Index': 0.3          # Baixa polariza√ß√£o
    },
    'base': {
        **brazil_base_features,
        'Media_Sentiment_Score': -0.4,    # Moderadamente negativo
        'Social_Media_Volume': 85,         # Volume alto
        'Polarization_Index': 0.6          # Polariza√ß√£o moderada
    },
    'pessimistic': {
        **brazil_base_features,
        'Media_Sentiment_Score': -0.7,    # Muito negativo
        'Social_Media_Volume': 150,        # Volume muito alto
        'Polarization_Index': 0.85         # Alta polariza√ß√£o
    }
}

# Fun√ß√£o para fazer predi√ß√µes com todos os modelos
def predict_brazil_impact(scenarios, models, feature_names, scaler):
    \"\"\"Prever impacto para cen√°rios brasileiros\"\"\"
    
    predictions = {}
    
    for scenario_name, features in scenarios.items():
        scenario_predictions = {}
        
        # Criar DataFrame com features na ordem correta
        feature_df = pd.DataFrame([features])[feature_names]
        
        for model_name, model_info in models.items():
            model = model_info['model']
            
            # Fazer predi√ß√£o
            if 'XGB' in model_name or 'LightGBM' in model_name:
                # Gradient boosting n√£o precisa normaliza√ß√£o
                pred = model.predict(feature_df)[0]
            else:
                # Random Forest precisa normaliza√ß√£o
                feature_scaled = scaler.transform(feature_df)
                pred = model.predict(feature_scaled)[0]
            
            scenario_predictions[model_name] = pred
        
        predictions[scenario_name] = scenario_predictions
    
    return predictions

# Fazer predi√ß√µes para o Brasil
print(\"üáßüá∑ SIMULA√á√ÉO DE IMPACTO PARA O BRASIL:\")
print(\"=\"*50)

brazil_predictions = predict_brazil_impact(brazil_scenarios, model_results, feature_names, scaler_ml)

# Organizar resultados
results_summary = pd.DataFrame(brazil_predictions).T
results_summary.columns = [f'{col}_CAR5d' for col in results_summary.columns]

print(\"\\nüìä PREDI√á√ïES POR MODELO E CEN√ÅRIO:\")
display(results_summary.round(2))

# Calcular estat√≠sticas agregadas
print(\"\\nüéØ RESUMO EXECUTIVO - IMPACTO PREVISTO:\")
print(\"=\"*60)

for scenario in ['optimistic', 'base', 'pessimistic']:
    scenario_preds = list(brazil_predictions[scenario].values())
    mean_pred = np.mean(scenario_preds)
    std_pred = np.std(scenario_preds)
    
    print(f\"\\n{scenario.upper()}:\")
    print(f\"  Impacto m√©dio (CAR 5 dias): {mean_pred:.1f}% ¬± {std_pred:.1f}%\")
    print(f\"  Intervalo de confian√ßa (95%): [{mean_pred - 1.96*std_pred:.1f}%, {mean_pred + 1.96*std_pred:.1f}%]\")
    
    if mean_pred <= -2:
        interpretation = \"Impacto significativo negativo\"
    elif mean_pred <= -1:
        interpretation = \"Impacto moderado negativo\"
    else:
        interpretation = \"Impacto limitado\"
    
    print(f\"  Interpreta√ß√£o: {interpretation}\")

# Visualizar predi√ß√µes
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Gr√°fico 1: Predi√ß√µes por modelo e cen√°rio
x_pos = np.arange(len(brazil_scenarios))
width = 0.25
models_to_plot = list(model_results.keys())

for i, model_name in enumerate(models_to_plot):
    model_preds = [brazil_predictions[scenario][model_name] for scenario in brazil_scenarios.keys()]
    ax1.bar(x_pos + i * width, model_preds, width, label=model_name, alpha=0.8)

ax1.set_xlabel('Cen√°rios')
ax1.set_ylabel('CAR Previsto (5 dias) %')
ax1.set_title('Predi√ß√µes de Impacto por Modelo', fontweight='bold')
ax1.set_xticks(x_pos + width)
ax1.set_xticklabels(list(brazil_scenarios.keys()))
ax1.legend()
ax1.grid(True, alpha=0.3)
ax1.axhline(y=0, color='black', linestyle='-', alpha=0.5)

# Gr√°fico 2: Boxplot das predi√ß√µes por cen√°rio
scenario_data = []
scenario_labels = []

for scenario in brazil_scenarios.keys():
    preds = list(brazil_predictions[scenario].values())
    scenario_data.extend(preds)
    scenario_labels.extend([scenario] * len(preds))

scenario_df = pd.DataFrame({'Scenario': scenario_labels, 'Prediction': scenario_data})

import seaborn as sns
sns.boxplot(data=scenario_df, x='Scenario', y='Prediction', ax=ax2)
ax2.set_title('Distribui√ß√£o das Predi√ß√µes por Cen√°rio', fontweight='bold')
ax2.set_ylabel('CAR Previsto (5 dias) %')
ax2.grid(True, alpha=0.3)
ax2.axhline(y=0, color='black', linestyle='-', alpha=0.5)

plt.tight_layout()
plt.show()

# An√°lise de feature importance (usando melhor modelo)
best_model_name = max(model_results.keys(), key=lambda x: model_results[x]['cv_mean'])
best_model = model_results[best_model_name]['model']

print(f\"\\nüèÜ MELHOR MODELO: {best_model_name}\")
print(f\"CV Score: {model_results[best_model_name]['cv_mean']:.3f}\")

# Feature importance
if hasattr(best_model, 'feature_importances_'):
    importance_df = pd.DataFrame({
        'Feature': feature_names,
        'Importance': best_model.feature_importances_
    }).sort_values('Importance', ascending=False)
    
    print(\"\\nüìà IMPORT√ÇNCIA DAS FEATURES (Top 10):\")
    print(importance_df.head(10).to_string(index=False))
    
    # Plot feature importance
    plt.figure(figsize=(10, 6))
    top_features = importance_df.head(8)
    plt.barh(range(len(top_features)), top_features['Importance'])
    plt.yticks(range(len(top_features)), top_features['Feature'])
    plt.xlabel('Import√¢ncia')
    plt.title(f'Feature Importance - {best_model_name}', fontweight='bold')
    plt.gca().invert_yaxis()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

print(\"\\n‚úÖ Simula√ß√£o do impacto brasileiro conclu√≠da!\")"

## 9. Results Visualization and Statistical Testing

Visualiza√ß√µes abrangentes dos resultados e valida√ß√£o estat√≠stica das predi√ß√µes.

### Componentes Finais:
1. **Dashboard Executivo** com principais m√©tricas
2. **Intervalos de Confian√ßa** para as predi√ß√µes
3. **Testes de Robustez** dos modelos
4. **Conclus√µes e Recomenda√ß√µes** para gestores de risco

In [None]:
# Dashboard executivo com principais resultados
def create_executive_dashboard():
    \"\"\"Criar dashboard executivo com principais m√©tricas\"\"\"
    
    fig = plt.figure(figsize=(20, 12))
    
    # Layout do dashboard
    gs = fig.add_gridspec(3, 4, height_ratios=[1, 1, 1], width_ratios=[1, 1, 1, 1])
    
    # 1. Resumo das predi√ß√µes do Brasil
    ax1 = fig.add_subplot(gs[0, :2])
    
    scenario_means = [np.mean(list(brazil_predictions[s].values())) for s in brazil_scenarios.keys()]
    scenario_stds = [np.std(list(brazil_predictions[s].values())) for s in brazil_scenarios.keys()]
    
    bars = ax1.bar(list(brazil_scenarios.keys()), scenario_means, 
                   yerr=scenario_stds, capsize=5, alpha=0.8, 
                   color=['green', 'orange', 'red'])
    
    ax1.set_title('PREDI√á√ÉO DE IMPACTO - BRASIL\\n(CAR 5 dias)', fontsize=14, fontweight='bold')
    ax1.set_ylabel('Impacto Previsto (%)')
    ax1.axhline(y=0, color='black', linestyle='-', alpha=0.5)
    ax1.grid(True, alpha=0.3)
    
    # Adicionar valores nas barras
    for bar, mean, std in zip(bars, scenario_means, scenario_stds):
        height = bar.get_height()
        ax1.text(bar.get_x() + bar.get_width()/2., height - 0.5,
                f'{mean:.1f}%\\n¬±{std:.1f}%', 
                ha='center', va='top', fontweight='bold', color='white')
    
    # 2. Compara√ß√£o com casos hist√≥ricos
    ax2 = fig.add_subplot(gs[0, 2:])
    
    historical_impacts = historical_cases['CAR_5_days'].values
    brazil_range = [min(scenario_means) - max(scenario_stds), 
                   max(scenario_means) + max(scenario_stds)]
    
    ax2.hist(historical_impacts, bins=6, alpha=0.7, color='skyblue', label='Casos Hist√≥ricos')
    ax2.axvspan(brazil_range[0], brazil_range[1], alpha=0.3, color='red', 
                label='Intervalo Brasil')
    ax2.set_title('BRASIL vs CASOS HIST√ìRICOS', fontsize=14, fontweight='bold')
    ax2.set_xlabel('CAR 5 dias (%)')
    ax2.set_ylabel('Frequ√™ncia')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    # 3. Performance dos modelos
    ax3 = fig.add_subplot(gs[1, :2])
    
    model_names = list(model_results.keys())
    cv_scores = [model_results[m]['cv_mean'] for m in model_names]
    cv_errors = [model_results[m]['cv_std'] for m in model_names]
    
    bars = ax3.bar(model_names, cv_scores, yerr=cv_errors, capsize=5, alpha=0.8)
    ax3.set_title('PERFORMANCE DOS MODELOS\\n(Cross-Validation R¬≤)', fontsize=14, fontweight='bold')
    ax3.set_ylabel('R¬≤ Score')
    ax3.set_ylim(0, 1)
    ax3.grid(True, alpha=0.3)
    
    for bar, score, error in zip(bars, cv_scores, cv_errors):
        height = bar.get_height()
        ax3.text(bar.get_x() + bar.get_width()/2., height/2,
                f'{score:.3f}', ha='center', va='center', 
                fontweight='bold', color='white')
    
    # 4. Clusters identificados
    ax4 = fig.add_subplot(gs[1, 2:])
    
    cluster_counts = historical_cases['Cluster'].value_counts().sort_index()
    cluster_labels = [f'Cluster {i}\\n{cluster_names[i]}' for i in cluster_counts.index]
    
    wedges, texts, autotexts = ax4.pie(cluster_counts.values, labels=cluster_labels, 
                                      autopct='%1.0f', startangle=90,
                                      colors=['lightblue', 'lightgreen', 'lightcoral'])
    ax4.set_title('DISTRIBUI√á√ÉO DOS CLUSTERS', fontsize=14, fontweight='bold')
    
    # 5. Feature importance consolidada
    ax5 = fig.add_subplot(gs[2, :])
    
    if hasattr(best_model, 'feature_importances_'):
        importance_df = pd.DataFrame({
            'Feature': feature_names,
            'Importance': best_model.feature_importances_
        }).sort_values('Importance', ascending=True)
        
        y_pos = np.arange(len(importance_df))
        ax5.barh(y_pos, importance_df['Importance'], alpha=0.8)
        ax5.set_yticks(y_pos)
        ax5.set_yticklabels(importance_df['Feature'])
        ax5.set_title(f'IMPORT√ÇNCIA DAS FEATURES - {best_model_name}', fontsize=14, fontweight='bold')
        ax5.set_xlabel('Import√¢ncia Relativa')
        ax5.grid(True, alpha=0.3)
    
    plt.suptitle('MAGNITSKY ACT IMPACT ANALYSIS - EXECUTIVE DASHBOARD', 
                fontsize=18, fontweight='bold', y=0.98)
    plt.tight_layout()
    plt.show()

# Criar dashboard
create_executive_dashboard()

# An√°lise de intervalo de confian√ßa detalhada
print(\"\\nüìä AN√ÅLISE ESTAT√çSTICA DETALHADA:\")
print(\"=\"*60)

# Bootstrap para intervalos de confian√ßa mais robustos
def bootstrap_predictions(scenarios, models, n_bootstrap=1000):
    \"\"\"Calcular intervalos de confian√ßa usando bootstrap\"\"\"
    
    bootstrap_results = {}
    
    for scenario_name in scenarios.keys():
        scenario_preds = []
        
        for _ in range(n_bootstrap):
            # Resample modelos com replacement
            sampled_models = np.random.choice(list(models.keys()), 
                                            size=len(models), replace=True)
            
            # Calcular predi√ß√£o m√©dia da amostra
            bootstrap_pred = []
            for model_name in sampled_models:
                pred = brazil_predictions[scenario_name][model_name]
                bootstrap_pred.append(pred)
            
            scenario_preds.append(np.mean(bootstrap_pred))
        
        bootstrap_results[scenario_name] = scenario_preds
    
    return bootstrap_results

# Calcular intervalos de confian√ßa bootstrap
bootstrap_results = bootstrap_predictions(brazil_scenarios, model_results)

# Resumo estat√≠stico final
final_results = {}

for scenario in brazil_scenarios.keys():
    preds = bootstrap_results[scenario]
    
    final_results[scenario] = {
        'mean': np.mean(preds),
        'median': np.median(preds),
        'std': np.std(preds),
        'ci_lower': np.percentile(preds, 2.5),
        'ci_upper': np.percentile(preds, 97.5),
        'prob_negative': np.mean(np.array(preds) < 0) * 100
    }

# Exibir resultados finais
for scenario, stats in final_results.items():
    print(f\"\\nüéØ {scenario.upper()}:\")
    print(f\"   Impacto m√©dio: {stats['mean']:.2f}%\")
    print(f\"   Mediana: {stats['median']:.2f}%\")
    print(f\"   Desvio padr√£o: {stats['std']:.2f}%\")
    print(f\"   IC 95%: [{stats['ci_lower']:.2f}%, {stats['ci_upper']:.2f}%]\")
    print(f\"   Probabilidade de impacto negativo: {stats['prob_negative']:.1f}%\")

# Conclus√µes e recomenda√ß√µes
print(\"\\nüèÅ CONCLUS√ïES E RECOMENDA√á√ïES:\")
print(\"=\"*60)

print(\"\\nüìà PRINCIPAIS ACHADOS:\")
print(\"1. O modelo identificou 3 clusters distintos de impacto de san√ß√µes Magnitsky\")
print(\"2. Alexandre de Moraes seria classificado no cluster de 'Choque Sist√™mico'\")
print(\"3. Fatores de sentimento t√™m impacto significativo na magnitude da rea√ß√£o\")
print(f\"4. O melhor modelo ({best_model_name}) apresentou R¬≤ de {model_results[best_model_name]['cv_mean']:.3f}\")

print(\"\\n‚ö†Ô∏è CEN√ÅRIOS PREVISTOS PARA O BRASIL:\")
for scenario, stats in final_results.items():
    risk_level = \"ALTO\" if abs(stats['mean']) > 4 else \"M√âDIO\" if abs(stats['mean']) > 2 else \"BAIXO\"
    print(f\"   {scenario.capitalize()}: {stats['mean']:.1f}% (Risco: {risk_level})\")

print(\"\\nüéØ RECOMENDA√á√ïES PARA GESTORES DE RISCO:\")
print(\"1. MONITORAMENTO: Acompanhar indicadores de sentimento da m√≠dia\")
print(\"2. HEDGING: Considerar prote√ß√£o contra volatilidade em cen√°rios pessimistas\") 
print(\"3. LIQUIDEZ: Manter reservas para potencial fuga de capitais\")
print(\"4. COMUNICA√á√ÉO: Preparar estrat√©gia de comunica√ß√£o para mercado\")
print(\"5. DIVERSIFICA√á√ÉO: Considerar exposi√ß√£o a ativos internacionais\")

print(\"\\nüìã LIMITA√á√ïES DO ESTUDO:\")
print(\"‚Ä¢ Dataset limitado de casos hist√≥ricos (8 observa√ß√µes)\")
print(\"‚Ä¢ Simula√ß√£o de features de sentimento (dados reais seriam prefer√≠veis)\")
print(\"‚Ä¢ Modelo n√£o captura efeitos de segunda ordem ou cont√°gio\")
print(\"‚Ä¢ Premissas sobre classifica√ß√£o de risco pol√≠tico podem variar\")

print(\"\\n‚úÖ AN√ÅLISE COMPLETA FINALIZADA!\")\nprint(\"üìä Dashboard executivo e relat√≥rio estat√≠stico gerados com sucesso.\")"