# Exploratory Data Analysis (EDA) - KLBN3.SA (Klabin)

## Tech Challenge 04 - Deep Learning and AI

This notebook presents an exploratory analysis of the historical data for the stock KLBN3.SA (Klabin S.A.) from 2020 to 2025.

### Objectives
- Understand the distribution and behavior of prices
- Analyze temporal patterns and trends
- Visualize technical indicators (moving averages)
- Quantify volatility and returns
- Prepare insights for the LSTM model

### Data
- **Asset**: KLBN3.SA (Klabin S.A. - B3)
- **Period**: January 2020 - October 2025
- **Source**: Yahoo Finance via yfinance

---

## Understanding the Data

### About the Company

| Field | Value |
|-------|-------|
| **Ticker** | KLBN3.SA |
| **Company** | Klabin S.A. |
| **Sector** | Pulp and Paper |
| **Share Class** | Common (ON) - with voting rights |
| **Exchange** | B3 (Brasil, Bolsa, Balcão) |
| **Founded** | 1899 |

Klabin is the **largest producer and exporter of paper in Brazil**, operating in packaging, paper, and pulp.

---

### Data Type: Micro vs Macro

| Field | Value |
|-------|-------|
| **Classification** | Microeconomic: data specific to a single company |
| **Frequency** | Daily (one row per trading session) |
| **Source** | Yahoo Finance (yfinance) |


---

### CSV Column Reference

| Column | Meaning | Example |
|--------|---------|---------|
| **Date** | Trading session date | 2020-01-02 |
| **Open** | Opening price of the day (R$) | 2.85 |
| **High** | Highest price reached during the day (R$) | 2.91 |
| **Low** | Lowest price reached during the day (R$) | 2.84 |
| **Close** | Closing price of the day (R$) | 2.90 |
| **Volume** | Number of shares traded | 388,183 |

> **Note on prices**: The values are **adjusted** for splits and dividends, which is why they look low (R$ 2-5). The current nominal price is different.

---

### Engineered Features (Computed)

In addition to the raw data, the model uses computed features:

| Feature | How It Is Computed | Purpose |
|---------|--------------------|---------|
| **SMA_7** | 7-day simple moving average | Captures the short-term trend |
| **SMA_21** | 21-day simple moving average | Captures the medium-term trend |
| **Returns** | Daily percentage change, stored as a fraction (0.024 = 2.4%) | Measures the day-over-day return |
| **Volatility** | Standard deviation of returns (21 days) | Measures risk/uncertainty |
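
The table above can be sketched in pandas as follows. This is a minimal sketch, not the actual `DataPreprocessor.add_features` implementation, which is not shown in this notebook; the helper name `add_basic_features` and the synthetic prices are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def add_basic_features(prices: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add SMA_7, SMA_21, Returns and Volatility to a frame with a Close column."""
    out = prices.copy()
    out["SMA_7"] = out["Close"].rolling(window=7).mean()
    out["SMA_21"] = out["Close"].rolling(window=21).mean()
    out["Returns"] = out["Close"].pct_change()  # fraction, e.g. 0.02 = 2%
    out["Volatility"] = out["Returns"].rolling(window=21).std()
    # Rolling windows leave NaN in the warm-up rows; drop them.
    return out.dropna().reset_index(drop=True)


# Synthetic closing prices, for illustration only (not real KLBN3.SA data).
rng = np.random.default_rng(0)
demo = pd.DataFrame({"Close": 3.0 + rng.normal(0, 0.05, size=30).cumsum()})
features = add_basic_features(demo)
print(len(demo), "->", len(features))
```

With these window sizes the first 21 rows are lost to warm-up, which is consistent with the row counts reported when the data is loaded (1454 raw rows versus 1433 with features).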

---

### CSV File Structure

```
Line 1: Price, Close, High, Low, Open, Volume    <- Field names
Line 2: Ticker, KLBN3.SA, KLBN3.SA, ...          <- Company identifier
Line 3: Date, , , , ,                            <- Index header
Line 4+: Daily data                              <- Values
```

**Important**: The CSV contains data for **only one company** (KLBN3.SA).
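
A file with this layout can be read by skipping the three header lines and supplying the column names explicitly. The sketch below uses an inline sample instead of the real file, and it is not necessarily how `DataPreprocessor.load_data` works. The first data row reuses the example values from the column table; the second row is made up for illustration.

```python
import io

import pandas as pd

# Inline sample mirroring the three header lines described above.
sample_csv = """Price,Close,High,Low,Open,Volume
Ticker,KLBN3.SA,KLBN3.SA,KLBN3.SA,KLBN3.SA,KLBN3.SA
Date,,,,,
2020-01-02,2.90,2.91,2.84,2.85,388183
2020-01-03,2.88,2.92,2.86,2.90,250000
"""

columns = ["Date", "Close", "High", "Low", "Open", "Volume"]
df_csv = pd.read_csv(
    io.StringIO(sample_csv),
    skiprows=3,        # skip the three header lines
    header=None,       # the remaining lines are data only
    names=columns,
    parse_dates=["Date"],
)
print(df_csv.dtypes)
```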

## 1. Loading the Data

The data is loaded and processed with the `DataPreprocessor` class from the project's existing preprocessing module.

In [1]:
# Working directory setup
import os
import sys

# Switch to the project root (if running from the notebooks folder)
if os.path.basename(os.getcwd()) == 'notebooks':
    os.chdir('..')
sys.path.insert(0, os.getcwd())

# Imports
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

# Import the preprocessor
from src.financial.preprocessing import DataPreprocessor

#print(f"Diretório de trabalho: {os.getcwd()}")
print("Imports carregados com sucesso!")

Imports carregados com sucesso!


In [2]:
# Initialize the preprocessor
preprocessor = DataPreprocessor()

# Load raw data (without features)
df_raw = preprocessor.load_data()

# Add engineered features on a copy so df_raw stays untouched
df = preprocessor.add_features(df_raw.copy())

print(f"Dataset Shape (raw): {df_raw.shape}")
print(f"Dataset Shape (with features): {df.shape}")
print(f"\nDate Range: {df_raw['Date'].min().strftime('%Y-%m-%d')} to {df_raw['Date'].max().strftime('%Y-%m-%d')}")
print(f"Total Trading Days: {len(df_raw)}")

Dataset Shape (raw): (1454, 6)
Dataset Shape (with features): (1433, 10)

Date Range: 2020-01-02 to 2025-10-30
Total Trading Days: 1454


In [3]:
# Display first rows
print("Primeiras 5 linhas do dataset:")
display(df.head())

print("\nUltimas 5 linhas do dataset:")
display(df.tail())

Primeiras 5 linhas do dataset:


,Date,Close,High,Low,Open,Volume,SMA_7,SMA_21,Returns,Volatility
0,2020-01-31,3.015236,3.064899,2.958479,2.958479,209220,3.061859,3.074696,0.024096,0.015184
1,2020-02-03,3.05071,3.057804,2.986858,3.057804,257840,3.044629,3.073007,0.011765,0.015422
2,2020-02-04,3.03652,3.064899,2.993952,3.050709,144540,3.028412,3.070642,-0.004651,0.015448
3,2020-02-05,3.022331,3.107467,3.022331,3.064899,181390,3.024358,3.067602,-0.004673,0.015472
4,2020-02-06,3.03652,3.121657,3.022331,3.03652,407990,3.022331,3.067602,0.004695,0.015111



Ultimas 5 linhas do dataset:


,Date,Close,High,Low,Open,Volume,SMA_7,SMA_21,Returns,Volatility
1428,2025-10-24,3.558834,3.568692,3.519401,3.539118,316700,3.488418,3.51987,0.011205,0.01133
1429,2025-10-27,3.548976,3.588409,3.529259,3.588409,592800,3.495459,3.515176,-0.00277,0.011304
1430,2025-10-28,3.578551,3.598267,3.529259,3.548976,297900,3.515176,3.515645,0.008333,0.010463
1431,2025-10-29,3.558834,3.608126,3.539118,3.578551,456600,3.529259,3.514707,-0.00551,0.010516
1432,2025-10-30,3.548976,3.578551,3.519401,3.539117,435000,3.543342,3.510482,-0.00277,0.009799


In [4]:
# Data types and info
print("Informacoes do Dataset:")
print("=" * 50)
print(f"\nColunas: {list(df.columns)}")
print(f"\nTipos de dados:")
for col in df.columns:
    print(f"  {col}: {df[col].dtype}")

print(f"\nMemoria utilizada: {df.memory_usage(deep=True).sum() / 1024:.2f} KB")

Informacoes do Dataset:

Colunas: ['Date', 'Close', 'High', 'Low', 'Open', 'Volume', 'SMA_7', 'SMA_21', 'Returns', 'Volatility']

Tipos de dados:
  Date: datetime64[ns]
  Close: float64
  High: float64
  Low: float64
  Open: float64
  Volume: int64
  SMA_7: float64
  SMA_21: float64
  Returns: float64
  Volatility: float64

Memoria utilizada: 112.08 KB


## 2. Descriptive Statistics

Statistical summary of the main variables in the dataset.

In [5]:
# Comprehensive statistical summary
stats_df = df[['Close', 'High', 'Low', 'Open', 'Volume']].describe()
stats_df.loc['range'] = stats_df.loc['max'] - stats_df.loc['min']
stats_df.loc['cv'] = stats_df.loc['std'] / stats_df.loc['mean'] * 100  # Coefficient of variation

print("Estatisticas Descritivas - Precos e Volume:")
display(stats_df.round(4))

Estatisticas Descritivas - Precos e Volume:


,Close,High,Low,Open,Volume
count,1433.0,1433.0,1433.0,1433.0,1433.0
mean,3.6717,3.7306,3.6175,3.678,422396.7
std,0.5219,0.5255,0.518,0.5185,313605.0
min,1.8324,1.9886,1.6832,1.8395,33110.0
25%,3.3665,3.4238,3.3097,3.3701,228250.0
50%,3.7067,3.75,3.6584,3.7067,348600.0
75%,4.0517,4.125,3.996,4.0589,520100.0
max,4.7699,4.8935,4.6739,4.7603,2738890.0
range,2.9375,2.9048,2.9906,2.9208,2705780.0
cv,14.215,14.087,14.3189,14.0975,74.2442


In [6]:
# Statistics for engineered features
feature_stats = df[['Returns', 'Volatility', 'SMA_7', 'SMA_21']].describe()
feature_stats.loc['skewness'] = df[['Returns', 'Volatility', 'SMA_7', 'SMA_21']].skew()
feature_stats.loc['kurtosis'] = df[['Returns', 'Volatility', 'SMA_7', 'SMA_21']].kurtosis()

print("Estatisticas das Features Engenheiradas:")
display(feature_stats.round(6))

Estatisticas das Features Engenheiradas:


,Returns,Volatility,SMA_7,SMA_21
count,1433.0,1433.0,1433.0,1433.0
mean,0.000329,0.018428,3.670583,3.668624
std,0.019956,0.007748,0.517328,0.507282
min,-0.124309,0.008416,1.989661,2.199348
25%,-0.010753,0.013268,3.382727,3.350374
50%,0.0,0.017398,3.701075,3.688966
75%,0.010482,0.021218,4.06357,4.062672
max,0.135659,0.065418,4.677996,4.613282
skewness,0.125207,2.956403,-0.449122,-0.44521
kurtosis,4.685833,13.554412,-0.20116,-0.411851
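
pandas' `kurtosis()` reports excess kurtosis (a normal distribution scores about 0), so the value of roughly 4.7 for Returns points to heavier tails than a normal distribution. The sketch below illustrates the convention on synthetic samples; the seed, sample size, and Student-t degrees of freedom are arbitrary choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 100_000

normal_sample = pd.Series(rng.normal(size=n))
heavy_tailed = pd.Series(rng.standard_t(df=5, size=n))  # Student-t has heavier tails

kurt_normal = normal_sample.kurtosis()  # close to 0 for a normal sample
kurt_heavy = heavy_tailed.kurtosis()    # clearly positive for heavy tails

print(f"normal: {kurt_normal:.3f}, student-t(5): {kurt_heavy:.3f}")
```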


In [7]:
# Check for missing values
missing = df.isnull().sum()
missing_pct = (df.isnull().sum() / len(df) * 100).round(2)

missing_df = pd.DataFrame({
    'Missing Values': missing,
    'Percentage (%)': missing_pct
})

print("Analise de Valores Ausentes:")
display(missing_df)

Analise de Valores Ausentes:


,Missing Values,Percentage (%)
Date,0,0.0
Close,0,0.0
High,0,0.0
Low,0,0.0
Open,0,0.0
Volume,0,0.0
SMA_7,0,0.0
SMA_21,0,0.0
Returns,0,0.0
Volatility,0,0.0


## 3. Time Series Visualization

### 3.1 Closing Price with Moving Averages

Interactive chart of the closing price over time, overlaid with the 7-day and 21-day moving averages.

In [8]:
# Create interactive time series chart with moving averages
fig = go.Figure()

# Close price
fig.add_trace(go.Scatter(
    x=df['Date'],
    y=df['Close'],
    mode='lines',
    name='Preco de Fechamento',
    line=dict(color='#1f77b4', width=1.5),
    hovertemplate='Data: %{x}<br>Fechamento: R$ %{y:.2f}<extra></extra>'
))

# SMA 7
fig.add_trace(go.Scatter(
    x=df['Date'],
    y=df['SMA_7'],
    mode='lines',
    name='SMA 7 dias',
    line=dict(color='#ff7f0e', width=1.2, dash='dot'),
    hovertemplate='Data: %{x}<br>SMA 7: R$ %{y:.2f}<extra></extra>'
))

# SMA 21
fig.add_trace(go.Scatter(
    x=df['Date'],
    y=df['SMA_21'],
    mode='lines',
    name='SMA 21 dias',
    line=dict(color='#2ca02c', width=1.2, dash='dash'),
    hovertemplate='Data: %{x}<br>SMA 21: R$ %{y:.2f}<extra></extra>'
))

fig.update_layout(
    title=dict(
        text='KLBN3.SA - Preco de Fechamento e Medias Moveis',
        font=dict(size=18)
    ),
    xaxis_title='Data',
    yaxis_title='Preco (R$)',
    hovermode='x unified',
    legend=dict(
        yanchor='top',
        y=0.99,
        xanchor='left',
        x=0.01,
        bgcolor='rgba(255,255,255,0.8)'
    ),
    template='plotly_white',
    height=500,
    xaxis=dict(
        rangeslider=dict(visible=True),
        type='date'
    )
)

fig.show()

### 3.2 Candlestick Chart (OHLC)

Candlestick chart showing the daily open, high, low, and close prices.

In [9]:
# Create candlestick chart
fig_candle = go.Figure(data=[go.Candlestick(
    x=df['Date'],
    open=df['Open'],
    high=df['High'],
    low=df['Low'],
    close=df['Close'],
    increasing_line_color='#26a69a',  # Green for up
    decreasing_line_color='#ef5350',  # Red for down
    name='KLBN3.SA'
)])

fig_candle.update_layout(
    title=dict(
        text='KLBN3.SA - Candlestick Chart (OHLC)',
        font=dict(size=18)
    ),
    xaxis_title='Data',
    yaxis_title='Preco (R$)',
    template='plotly_white',
    height=600,
    xaxis=dict(
        rangeslider=dict(visible=True),
        rangeselector=dict(
            buttons=list([
                dict(count=1, label='1M', step='month', stepmode='backward'),
                dict(count=3, label='3M', step='month', stepmode='backward'),
                dict(count=6, label='6M', step='month', stepmode='backward'),
                dict(count=1, label='YTD', step='year', stepmode='todate'),
                dict(count=1, label='1Y', step='year', stepmode='backward'),
                dict(step='all', label='Tudo')
            ])
        )
    )
)

fig_candle.show()

### 3.3 Traded Volume

Trading volume over the period, with bars colored by the day's price direction (close versus open).

In [10]:
# Volume chart: green when the close is at or above the open, red otherwise
colors = np.where(df['Close'] >= df['Open'], '#26a69a', '#ef5350').tolist()

fig_vol = go.Figure()

fig_vol.add_trace(go.Bar(
    x=df['Date'],
    y=df['Volume'],
    marker_color=colors,
    name='Volume',
    hovertemplate='Data: %{x}<br>Volume: %{y:,.0f}<extra></extra>'
))

# Add moving average of volume
vol_ma = df['Volume'].rolling(window=21).mean()
fig_vol.add_trace(go.Scatter(
    x=df['Date'],
    y=vol_ma,
    mode='lines',
    name='MA Volume 21 dias',
    line=dict(color='#ffa726', width=2)
))

fig_vol.update_layout(
    title=dict(
        text='KLBN3.SA - Volume de Negociacao',
        font=dict(size=18)
    ),
    xaxis_title='Data',
    yaxis_title='Volume',
    template='plotly_white',
    height=400,
    showlegend=True,
    legend=dict(yanchor='top', y=0.99, xanchor='right', x=0.99)
)

fig_vol.show()

### 3.4 Price and Volume Combined

Price and volume on a shared time axis, to make it easier to spot price moves that coincide with volume spikes.

In [11]:
# Create subplots with shared x-axis
fig_combined = make_subplots(
    rows=2, cols=1,
    shared_xaxes=True,
    vertical_spacing=0.05,
    row_heights=[0.7, 0.3],
    subplot_titles=('Preco de Fechamento', 'Volume')
)

# Price chart
fig_combined.add_trace(
    go.Scatter(x=df['Date'], y=df['Close'], 
               name='Close', line=dict(color='#1f77b4', width=1.5)),
    row=1, col=1
)

fig_combined.add_trace(
    go.Scatter(x=df['Date'], y=df['SMA_21'], 
               name='SMA 21', line=dict(color='#ff7f0e', width=1, dash='dash')),
    row=1, col=1
)

# Volume chart
fig_combined.add_trace(
    go.Bar(x=df['Date'], y=df['Volume'], 
           name='Volume', marker_color='rgba(100,100,100,0.5)'),
    row=2, col=1
)

fig_combined.update_layout(
    title=dict(text='KLBN3.SA - Preco e Volume', font=dict(size=18)),
    template='plotly_white',
    height=700,
    showlegend=True,
    xaxis2_rangeslider_visible=True
)

fig_combined.update_yaxes(title_text='Preco (R$)', row=1, col=1)
fig_combined.update_yaxes(title_text='Volume', row=2, col=1)

fig_combined.show()

## 4. Statistical Analysis

### 4.1 Distribution of Daily Returns

Distribution of the daily returns, used to check for normality and identify outliers.

In [12]:
# Returns distribution histogram with a fitted normal curve for comparison
fig_returns = go.Figure()

# Histogram
fig_returns.add_trace(go.Histogram(
    x=df['Returns'] * 100,  # Convert to percentage
    nbinsx=50,
    name='Retornos',
    marker_color='#1f77b4',
    opacity=0.7,
    histnorm='probability density'
))

# Add normal distribution curve for comparison
returns_pct = df['Returns'].dropna() * 100
x_range = np.linspace(returns_pct.min(), returns_pct.max(), 100)
normal_dist = stats.norm.pdf(x_range, returns_pct.mean(), returns_pct.std())

fig_returns.add_trace(go.Scatter(
    x=x_range,
    y=normal_dist,
    mode='lines',
    name='Distribuicao Normal',
    line=dict(color='#d62728', width=2)
))

fig_returns.update_layout(
    title=dict(
        text='Distribuicao dos Retornos Diarios (%)',
        font=dict(size=18)
    ),
    xaxis_title='Retorno Diario (%)',
    yaxis_title='Densidade',
    template='plotly_white',
    height=450,
    showlegend=True,
    bargap=0.05
)

fig_returns.show()

# Print normality test
sample_size = min(5000, len(df['Returns'].dropna()))
stat, p_value = stats.shapiro(df['Returns'].dropna()[:sample_size])
print(f"\nTeste de Normalidade (Shapiro-Wilk):")
print(f"  Estatistica: {stat:.6f}")
print(f"  P-value: {p_value:.6f}")
print(f"  Normal?: {'Sim' if p_value > 0.05 else 'Nao'} (alpha=0.05)")


Teste de Normalidade (Shapiro-Wilk):
  Estatistica: 0.955921
  P-value: 0.000000
  Normal?: Nao (alpha=0.05)
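
The p-value prints as 0.000000 because it is too small to show in six decimal places. As a rough cross-check, the Jarque-Bera statistic can be computed by hand from the sample size, skewness, and excess kurtosis reported for Returns earlier in this notebook. This is a back-of-the-envelope sketch: pandas applies bias corrections to skewness and kurtosis, so the result is approximate, and Jarque-Bera is a different test from the Shapiro-Wilk test used above.

```python
import math

# Values reported for the Returns column earlier in this notebook.
n = 1433
skewness = 0.125207
excess_kurtosis = 4.685833

# Jarque-Bera statistic: JB = n/6 * (S^2 + K^2/4), with K the excess kurtosis.
jb = n / 6 * (skewness**2 + excess_kurtosis**2 / 4)

# Under normality JB is asymptotically chi-square with 2 degrees of freedom,
# whose survival function is exp(-x/2).
p_value = math.exp(-jb / 2)

print(f"JB = {jb:.1f}, p-value = {p_value:.3e}")
```

A statistic this large agrees with the Shapiro-Wilk result: normality is rejected at any conventional significance level.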


### 4.2 Boxplots - Price Distribution

Distribution of the OHLC prices and identification of outliers.

In [13]:
# Boxplots for price columns
price_cols = ['Open', 'High', 'Low', 'Close']

fig_box = go.Figure()

for i, col in enumerate(price_cols):
    fig_box.add_trace(go.Box(
        y=df[col],
        name=col,
        boxpoints='outliers',
        marker_color=px.colors.qualitative.Set2[i]
    ))

fig_box.update_layout(
    title=dict(
        text='Distribuicao de Precos (OHLC)',
        font=dict(size=18)
    ),
    yaxis_title='Preco (R$)',
    template='plotly_white',
    height=450,
    showlegend=True
)

fig_box.show()

In [14]:
# Monthly price distribution over time
df['YearMonth'] = df['Date'].dt.to_period('M').astype(str)

# Sample every 3 months for clarity
months_sample = df['YearMonth'].unique()[::3]
df_monthly = df[df['YearMonth'].isin(months_sample)]

fig_monthly_box = px.box(
    df_monthly, 
    x='YearMonth', 
    y='Close',
    title='Distribuicao Mensal do Preco de Fechamento',
    labels={'YearMonth': 'Mes/Ano', 'Close': 'Preco (R$)'}
)

fig_monthly_box.update_layout(
    template='plotly_white',
    height=450,
    xaxis_tickangle=45
)

fig_monthly_box.show()

### 4.3 Volatility Analysis

Evolution of volatility (rolling standard deviation of returns) over time.

In [15]:
# Volatility time series
fig_vol_ts = go.Figure()

fig_vol_ts.add_trace(go.Scatter(
    x=df['Date'],
    y=df['Volatility'] * 100,  # Convert to percentage
    mode='lines',
    name='Volatilidade 21d',
    line=dict(color='#9467bd', width=1.5),
    fill='tozeroy',
    fillcolor='rgba(148, 103, 189, 0.2)'
))

# Add average volatility line
avg_vol = df['Volatility'].mean() * 100
fig_vol_ts.add_hline(
    y=avg_vol, 
    line_dash='dash', 
    line_color='red',
    annotation_text=f'Media: {avg_vol:.2f}%'
)

fig_vol_ts.update_layout(
    title=dict(
        text='Volatilidade Historica (Rolling 21 dias)',
        font=dict(size=18)
    ),
    xaxis_title='Data',
    yaxis_title='Volatilidade (%)',
    template='plotly_white',
    height=400
)

fig_vol_ts.show()
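
The chart shows daily volatility. A common convention, not used elsewhere in this notebook, is to annualize it by multiplying by the square root of the number of trading days per year. The sketch below assumes 252 trading days and uses the mean 21-day volatility reported in the feature statistics (0.018428).

```python
import math

TRADING_DAYS_PER_YEAR = 252  # assumption: a common convention, not taken from the notebook

daily_volatility = 0.018428  # mean of the Volatility column reported earlier
annualized_volatility = daily_volatility * math.sqrt(TRADING_DAYS_PER_YEAR)

print(f"Annualized volatility: {annualized_volatility:.2%}")
```

The square-root rule treats daily returns as uncorrelated, so the result is an approximation.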

### 4.4 Correlation Analysis

Correlation matrix of the dataset's variables.

In [16]:
# Correlation matrix
corr_cols = ['Close', 'High', 'Low', 'Open', 'Volume', 'SMA_7', 'SMA_21', 'Returns', 'Volatility']
corr_matrix = df[corr_cols].corr()

# Create heatmap
fig_corr = go.Figure(data=go.Heatmap(
    z=corr_matrix.values,
    x=corr_cols,
    y=corr_cols,
    colorscale='RdBu_r',
    zmid=0,
    text=np.round(corr_matrix.values, 2),
    texttemplate='%{text}',
    textfont=dict(size=10),
    hoverongaps=False
))

fig_corr.update_layout(
    title=dict(
        text='Matriz de Correlacao',
        font=dict(size=18)
    ),
    template='plotly_white',
    height=600,
    width=700
)

fig_corr.show()

In [17]:
# Print key correlations
print("Correlacoes Importantes:")
print("=" * 50)
print(f"\nCorrelacao Close vs Volume: {corr_matrix.loc['Close', 'Volume']:.4f}")
print(f"Correlacao Close vs SMA_7: {corr_matrix.loc['Close', 'SMA_7']:.4f}")
print(f"Correlacao Close vs SMA_21: {corr_matrix.loc['Close', 'SMA_21']:.4f}")
print(f"Correlacao Returns vs Volatility: {corr_matrix.loc['Returns', 'Volatility']:.4f}")
print(f"Correlacao Volume vs Volatility: {corr_matrix.loc['Volume', 'Volatility']:.4f}")

Correlacoes Importantes:

Correlacao Close vs Volume: -0.4421
Correlacao Close vs SMA_7: 0.9829
Correlacao Close vs SMA_21: 0.9400
Correlacao Returns vs Volatility: 0.0062
Correlacao Volume vs Volatility: -0.0070
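
The very high correlation between Close and the SMAs is largely mechanical: a moving average of a slowly wandering series tracks that series closely. The sketch below reproduces the effect on a synthetic random walk (seed, length, and step size are arbitrary, and no real KLBN3.SA data is involved).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Synthetic random-walk "price" with no predictable structure.
close = pd.Series(100 + rng.normal(0, 1, size=1500).cumsum())

sma_7 = close.rolling(window=7).mean()
sma_21 = close.rolling(window=21).mean()

# Series.corr ignores the NaN warm-up rows of the moving averages.
corr_7 = close.corr(sma_7)
corr_21 = close.corr(sma_21)

print(f"corr(close, SMA_7) = {corr_7:.3f}, corr(close, SMA_21) = {corr_21:.3f}")
```

A high correlation between a price and its own moving averages therefore says little, by itself, about predictive value.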


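## 5. Temporal Analysis

### 5.1 Annual Performance

Year-over-year comparison of performance. In this dataset 2020 and 2025 are partial years (the feature-enriched data runs from 2020-01-31 to 2025-10-30), so their figures are not directly comparable to full-year figures.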

In [18]:
# Extract year from date
df['Year'] = df['Date'].dt.year

# Annual statistics (note: summing 'Returns' gives an arithmetic sum of daily returns, not a compounded return)
annual_stats = df.groupby('Year').agg({
    'Close': ['first', 'last', 'mean', 'std', 'min', 'max'],
    'Volume': 'mean',
    'Returns': 'sum'
}).round(4)

annual_stats.columns = ['Primeiro', 'Ultimo', 'Media', 'Std', 'Minimo', 'Maximo', 'Volume Medio', 'Retorno Acum.']
annual_stats['Retorno Anual (%)'] = ((annual_stats['Ultimo'] / annual_stats['Primeiro']) - 1) * 100

print("Estatisticas Anuais:")
display(annual_stats.round(2))

Estatisticas Anuais:


Year,Primeiro,Ultimo,Media,Std,Minimo,Maximo,Volume Medio,Retorno Acum.,Retorno Anual (%)
2020,3.02,3.83,3.15,0.5,1.83,4.12,450153.92,0.37,26.96
2021,3.79,4.05,4.24,0.19,3.79,4.72,285561.78,0.1,7.03
2022,4.12,3.1,3.47,0.51,2.71,4.59,366108.16,-0.21,-24.76
2023,3.1,3.62,3.39,0.32,2.79,3.91,638715.44,0.19,16.98
2024,3.57,4.77,3.93,0.26,3.44,4.77,357969.48,0.3,33.77
2025,4.61,3.55,3.83,0.32,3.44,4.67,441890.0,-0.27,-22.96
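
The `Retorno Acum.` column is an arithmetic sum of daily simple returns, which only approximates the compounded return. A minimal sketch of the difference, using two made-up daily returns:

```python
import pandas as pd

# Two illustrative daily returns: +10% followed by -10%.
returns = pd.Series([0.10, -0.10])

arithmetic_sum = returns.sum()         # the two returns cancel out
compounded = (1 + returns).prod() - 1  # 1.10 * 0.90 - 1, roughly -1%

print(f"sum = {arithmetic_sum:.4f}, compounded = {compounded:.4f}")
```

This is one reason the 2020 row shows 0.37 for the summed returns but 26.96% for the first-to-last price change. The two columns also cover slightly different spans, since the first daily return of each year is measured from the previous session's close.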


In [19]:
# Annual returns bar chart
fig_annual = go.Figure()

colors = ['#26a69a' if x >= 0 else '#ef5350' for x in annual_stats['Retorno Anual (%)']]

fig_annual.add_trace(go.Bar(
    x=annual_stats.index,
    y=annual_stats['Retorno Anual (%)'],
    marker_color=colors,
    text=[f'{x:.1f}%' for x in annual_stats['Retorno Anual (%)']],
    textposition='outside'
))

fig_annual.update_layout(
    title=dict(
        text='Retorno Anual KLBN3.SA',
        font=dict(size=18)
    ),
    xaxis_title='Ano',
    yaxis_title='Retorno (%)',
    template='plotly_white',
    height=400
)

fig_annual.show()

### 5.2 Intraday Range

The daily range (High - Low) as a proxy for intraday volatility.

In [20]:
# Calculate daily range (High - Low)
df['Daily_Range'] = df['High'] - df['Low']
df['Daily_Range_Pct'] = (df['Daily_Range'] / df['Close']) * 100

# Daily range over time
fig_range = go.Figure()

fig_range.add_trace(go.Scatter(
    x=df['Date'],
    y=df['Daily_Range_Pct'],
    mode='lines',
    name='Range Diario (%)',
    line=dict(color='#17becf', width=1),
))

# Add rolling average
range_ma = df['Daily_Range_Pct'].rolling(window=21).mean()
fig_range.add_trace(go.Scatter(
    x=df['Date'],
    y=range_ma,
    mode='lines',
    name='MA 21 dias',
    line=dict(color='#d62728', width=2)
))

fig_range.update_layout(
    title=dict(
        text='Range Diario (High-Low) como % do Preco',
        font=dict(size=18)
    ),
    xaxis_title='Data',
    yaxis_title='Range (%)',
    template='plotly_white',
    height=400
)

fig_range.show()

## 6. Summary and Insights

### Main Findings of the Exploratory Analysis

In [21]:
# Generate summary insights
print("=" * 70)
print("RESUMO DA ANALISE EXPLORATORIA - KLBN3.SA")
print("=" * 70)

print(f"\n1. PERIODO ANALISADO")
print(f"   - Inicio: {df['Date'].min().strftime('%Y-%m-%d')}")
print(f"   - Fim: {df['Date'].max().strftime('%Y-%m-%d')}")
print(f"   - Total de pregoes: {len(df)}")

print(f"\n2. ESTATISTICAS DE PRECO")
print(f"   - Preco medio: R$ {df['Close'].mean():.2f}")
print(f"   - Preco minimo: R$ {df['Close'].min():.2f}")
print(f"   - Preco maximo: R$ {df['Close'].max():.2f}")
print(f"   - Desvio padrao: R$ {df['Close'].std():.2f}")

print(f"\n3. RETORNOS")
print(f"   - Retorno medio diario: {df['Returns'].mean()*100:.4f}%")
print(f"   - Retorno acumulado total: {((df['Close'].iloc[-1] / df['Close'].iloc[0]) - 1)*100:.2f}%")
print(f"   - Maior ganho diario: {df['Returns'].max()*100:.2f}%")
print(f"   - Maior perda diaria: {df['Returns'].min()*100:.2f}%")

print(f"\n4. VOLATILIDADE")
print(f"   - Volatilidade media (21d): {df['Volatility'].mean()*100:.4f}%")
print(f"   - Volatilidade maxima: {df['Volatility'].max()*100:.4f}%")

print(f"\n5. VOLUME")
print(f"   - Volume medio diario: {df['Volume'].mean():,.0f}")
print(f"   - Volume maximo: {df['Volume'].max():,.0f}")

print(f"\n6. FEATURES PARA LSTM")
print(f"   - Colunas utilizadas: {preprocessor.feature_columns}")
print(f"   - Janela de entrada: 60 dias")
print(f"   - Horizonte de predicao: 5 dias")

print("\n" + "=" * 70)

RESUMO DA ANALISE EXPLORATORIA - KLBN3.SA

1. PERIODO ANALISADO
   - Inicio: 2020-01-31
   - Fim: 2025-10-30
   - Total de pregoes: 1433

2. ESTATISTICAS DE PRECO
   - Preco medio: R$ 3.67
   - Preco minimo: R$ 1.83
   - Preco maximo: R$ 4.77
   - Desvio padrao: R$ 0.52

3. RETORNOS
   - Retorno medio diario: 0.0329%
   - Retorno acumulado total: 17.70%
   - Maior ganho diario: 13.57%
   - Maior perda diaria: -12.43%

4. VOLATILIDADE
   - Volatilidade media (21d): 1.8428%
   - Volatilidade maxima: 6.5418%

5. VOLUME
   - Volume medio diario: 422,397
   - Volume maximo: 2,738,890

6. FEATURES PARA LSTM
   - Colunas utilizadas: ['Close', 'Volume', 'SMA_7', 'SMA_21', 'Returns', 'Volatility']
   - Janela de entrada: 60 dias
   - Horizonte de predicao: 5 dias
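
The summary above mentions a 60-day input window and a 5-day prediction horizon. One way such windows could be built is sketched below. This is an assumption about the general technique, not the project's actual pipeline: `make_windows` is a hypothetical helper, the array is synthetic, and the choice of column 0 as the target is illustrative.

```python
import numpy as np


def make_windows(data: np.ndarray, target_col: int, window: int = 60, horizon: int = 5):
    """Hypothetical helper: slice a (time, features) array into LSTM inputs and targets."""
    n_samples = len(data) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("Not enough rows for the requested window and horizon")
    # X[i] holds `window` consecutive rows; y[i] holds the next `horizon` target values.
    X = np.stack([data[i : i + window] for i in range(n_samples)])
    y = np.stack(
        [data[i + window : i + window + horizon, target_col] for i in range(n_samples)]
    )
    return X, y


# Synthetic stand-in with the same shape as the feature matrix (1433 rows, 6 columns).
data = np.arange(1433 * 6, dtype=float).reshape(1433, 6)
X, y = make_windows(data, target_col=0)
print(X.shape, y.shape)  # (1369, 60, 6) (1369, 5)
```

In a real pipeline the features would typically be scaled before windowing, with the scaler fitted on the training portion only.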



## 7. Next Steps

Based on this exploratory analysis, the data is ready for:

1. **Training the LSTM Model**: The series shows year-to-year swings in price and volatility, and the engineered features summarize trend (SMAs) and volatility.

2. **Modeling Considerations**:
   - The price series does not look stationary (a trend is visible); no formal stationarity test was run in this notebook
   - Returns do not follow a normal distribution (Shapiro-Wilk rejects normality, and an excess kurtosis of about 4.7 indicates heavy tails)
   - Periods of high volatility may degrade predictions
   - Close is strongly correlated with the SMAs (0.98 and 0.94), which is expected if the SMAs are computed from Close, so they largely carry the same trend information

3. **Recommendations**:
   - Monitor periods of high volatility
   - Consider adding more technical features (RSI, MACD, Bollinger Bands)
   - Assess the impact of external events (COVID-19 in 2020)
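
Two of the suggested indicators can be sketched with pandas alone. The window lengths (20 and 14) and the band width (2 standard deviations) are common defaults chosen here as assumptions, the RSI uses a plain rolling mean rather than Wilder's smoothing, and the prices are synthetic.

```python
import numpy as np
import pandas as pd


def bollinger_bands(close: pd.Series, window: int = 20, n_std: float = 2.0) -> pd.DataFrame:
    """Middle band is the rolling mean; outer bands sit n_std rolling standard deviations away."""
    mid = close.rolling(window).mean()
    std = close.rolling(window).std()
    return pd.DataFrame(
        {"BB_mid": mid, "BB_upper": mid + n_std * std, "BB_lower": mid - n_std * std}
    )


def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """Simplified RSI based on rolling means of gains and losses."""
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(window).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(window).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


# Synthetic closing prices, for illustration only (not real KLBN3.SA data).
rng = np.random.default_rng(1)
close = pd.Series(3.5 + rng.normal(0, 0.05, size=200).cumsum())

bands = bollinger_bands(close).dropna()
rsi_values = rsi(close).dropna()
print(bands.tail(3))
print(rsi_values.tail(3))
```

Like the SMAs, these indicators are derived from Close, so it is worth checking whether they add information beyond the existing features before including them in the model.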
