# 🚀 ML Trading System - Interactive Data Exploration

This notebook shows you **step by step** what the pipeline does.

You can:
- Inspect each pipeline step
- View charts and visualizations
- Play with the data
- Experiment with different tickers

---

## 📦 Importing Libraries

In [None]:
# 🔧 FIX: Add the project directory to the Python path
import sys
import os

# ABSOLUTE path to the project directory (adjust it for your machine)
project_dir = r'C:\Users\milan\Desktop\Git-Projects\ml_trading_system'

# Add it to the Python path
if project_dir not in sys.path:
    sys.path.insert(0, project_dir)

# Change the working directory to the project directory
os.chdir(project_dir)

print(f"✅ Python path set: {project_dir}")
print(f"✅ Working directory: {os.getcwd()}")
print("✅ The modules data_collector, feature_engineering, data_validator are now available!")

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

# Our own modules
from data_collector import DataCollector
from feature_engineering import FeatureEngineer
from data_validator import generate_data_quality_report

# Plot settings
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline

print("✅ All libraries loaded!")

---

## 🎯 STEP 1: Downloading the Data

We download historical data for **SPY** (an S&P 500 ETF) from Yahoo Finance.

**You can change:**
- `TICKER` to a different ticker (e.g. 'QQQ', 'AAPL', 'TSLA')
- `START_DATE` and `END_DATE` to a different period

In [None]:
# ⚙️ CONFIGURATION - FEEL FREE TO CHANGE!
TICKER = 'SPY'
START_DATE = '2020-01-01'
END_DATE = '2024-12-31'

print(f"📊 Downloading data for {TICKER} from {START_DATE} to {END_DATE}...\n")

# Create the DataCollector
collector = DataCollector(TICKER)

# Download the data
df_raw = collector.download_historical(START_DATE, END_DATE)

print(f"\n✅ Downloaded: {len(df_raw)} rows")
print(f"📅 From: {df_raw.index[0].strftime('%Y-%m-%d')}")
print(f"📅 To: {df_raw.index[-1].strftime('%Y-%m-%d')}")

### 👀 A Look at the Raw Data

In [None]:
# First 10 rows
print("📋 First 10 rows:")
df_raw.head(10)

In [None]:
# Basic statistics
print("📊 Basic statistics:")
df_raw.describe()

### 📈 Chart 1: Price History (Close Price)

In [None]:
plt.figure(figsize=(14, 6))
plt.plot(df_raw.index, df_raw['Close'], linewidth=2, label='Close Price')
plt.title(f'{TICKER} - Price History ({START_DATE[:4]}-{END_DATE[:4]})', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Price ($)', fontsize=12)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("💡 With the default date range you can see the COVID crash (March 2020) and the recovery that followed!")

### 📊 Chart 2: Trading Volume

In [None]:
plt.figure(figsize=(14, 5))
plt.bar(df_raw.index, df_raw['Volume'], width=1.0, alpha=0.7, label='Volume')
plt.title(f'{TICKER} - Trading Volume', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Volume', fontsize=12)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("💡 High volume during the COVID crash = a lot of panic!")

---

## 🔧 STEP 2: Creating Features

Now we create **11 technical indicators**:

### Returns:
- `return_1d` - 1-day return
- `return_5d` - 5-day return
- `return_10d` - 10-day return
- `return_20d` - 20-day return

### Volatility:
- `volatility_20d` - 20-day volatility
- `volatility_60d` - 60-day volatility

### Volume:
- `volume_ratio` - ratio of volume to its 20-day average

### Price Position:
- `price_position` - position of the price within the 20-day range (0-1)

### Trend:
- `sma_20` - 20-day simple moving average
- `sma_50` - 50-day simple moving average
- `trend` - 1 if sma_20 > sma_50 (uptrend), otherwise 0
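
To make the list above concrete, here is a minimal sketch of how such features could be computed with plain pandas. It is an assumption, not the actual `FeatureEngineer` code (which this notebook does not show): the function name `sketch_basic_features`, the exact formulas (volatility as the rolling standard deviation of 1-day returns, `price_position` from the rolling min/max of `Close`) and the synthetic prices are all hypothetical.

```python
import numpy as np
import pandas as pd


def sketch_basic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical re-creation of the 11 features; NOT the real FeatureEngineer."""
    out = df.copy()
    close = out["Close"]

    # Returns: percentage change of the close over n trading days (backward-looking)
    for n in (1, 5, 10, 20):
        out[f"return_{n}d"] = close.pct_change(n)

    # Volatility: rolling standard deviation of 1-day returns (assumed definition)
    out["volatility_20d"] = out["return_1d"].rolling(20).std()
    out["volatility_60d"] = out["return_1d"].rolling(60).std()

    # Volume relative to its own 20-day average
    out["volume_ratio"] = out["Volume"] / out["Volume"].rolling(20).mean()

    # Position of the close inside the 20-day close range (0 = low, 1 = high)
    low_20 = close.rolling(20).min()
    high_20 = close.rolling(20).max()
    out["price_position"] = (close - low_20) / (high_20 - low_20)

    # Trend: two simple moving averages and a flag for their ordering
    out["sma_20"] = close.rolling(20).mean()
    out["sma_50"] = close.rolling(50).mean()
    out["trend"] = (out["sma_20"] > out["sma_50"]).astype(int)
    return out


# Synthetic random-walk prices, used only to exercise the function
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=120)
demo = pd.DataFrame(
    {
        "Close": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates)))),
        "Volume": rng.integers(1_000_000, 5_000_000, len(dates)),
    },
    index=dates,
)
demo_features = sketch_basic_features(demo)
print(demo_features.iloc[60:65].round(4))
```

In this sketch `trend` is 0 while `sma_50` is still NaN (a comparison with NaN evaluates to False); the real implementation may treat the warm-up rows differently.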

In [None]:
print("🔧 Creating features...\n")

engineer = FeatureEngineer()
df_featured = engineer.create_basic_features(df_raw.copy())

print(f"✅ Created {len(engineer.get_feature_names())} features!")
print("\n📋 Features:")
for i, feat in enumerate(engineer.get_feature_names(), 1):
    print(f"  {i}. {feat}")

### 👀 A Look at the Features

In [None]:
# Show rows 60-70, where the rolling features displayed here already have valid (non-NaN) values
print("📋 Data with features (rows 60-70, so that the values are valid):")
df_featured[['Close', 'return_1d', 'return_5d', 'volatility_20d', 'volume_ratio', 'trend']].iloc[60:70]

### 📊 Chart 3: Returns

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(16, 10))

# 1-day returns
axes[0, 0].plot(df_featured.index, df_featured['return_1d'], linewidth=1, alpha=0.7)
axes[0, 0].axhline(y=0, color='red', linestyle='--', alpha=0.5)
axes[0, 0].set_title('1-Day Returns', fontsize=14, fontweight='bold')
axes[0, 0].set_ylabel('Return', fontsize=12)
axes[0, 0].grid(True, alpha=0.3)

# 5-day returns
axes[0, 1].plot(df_featured.index, df_featured['return_5d'], linewidth=1, alpha=0.7)
axes[0, 1].axhline(y=0, color='red', linestyle='--', alpha=0.5)
axes[0, 1].set_title('5-Day Returns', fontsize=14, fontweight='bold')
axes[0, 1].set_ylabel('Return', fontsize=12)
axes[0, 1].grid(True, alpha=0.3)

# 10-day returns
axes[1, 0].plot(df_featured.index, df_featured['return_10d'], linewidth=1, alpha=0.7)
axes[1, 0].axhline(y=0, color='red', linestyle='--', alpha=0.5)
axes[1, 0].set_title('10-Day Returns', fontsize=14, fontweight='bold')
axes[1, 0].set_ylabel('Return', fontsize=12)
axes[1, 0].grid(True, alpha=0.3)

# 20-day returns
axes[1, 1].plot(df_featured.index, df_featured['return_20d'], linewidth=1, alpha=0.7)
axes[1, 1].axhline(y=0, color='red', linestyle='--', alpha=0.5)
axes[1, 1].set_title('20-Day Returns', fontsize=14, fontweight='bold')
axes[1, 1].set_ylabel('Return', fontsize=12)
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("💡 The longer the horizon, the larger the swings (but the more slowly they change)")

### 📊 Chart 4: Volatility

In [None]:
plt.figure(figsize=(14, 6))
plt.plot(df_featured.index, df_featured['volatility_20d'], label='20-day Volatility', linewidth=2)
plt.plot(df_featured.index, df_featured['volatility_60d'], label='60-day Volatility', linewidth=2)
plt.title('Volatility (Standard Deviation of Returns)', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Volatility', fontsize=12)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("💡 High volatility in March 2020 (COVID) and in early 2022 (bear market)")

### 📊 Chart 5: Trend Indicators (Moving Averages)

In [None]:
plt.figure(figsize=(14, 7))
plt.plot(df_featured.index, df_featured['Close'], label='Close Price', linewidth=2, alpha=0.7)
plt.plot(df_featured.index, df_featured['sma_20'], label='SMA 20', linewidth=2, linestyle='--')
plt.plot(df_featured.index, df_featured['sma_50'], label='SMA 50', linewidth=2, linestyle='--')
plt.title('Price and Moving Averages', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Price ($)', fontsize=12)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("💡 SMA20 > SMA50 = UPTREND (trend = 1)")
print("💡 SMA20 <= SMA50 = DOWNTREND (trend = 0)")

### 📊 Chart 6: Price Position (Position Within the Range)

In [None]:
plt.figure(figsize=(14, 6))
plt.plot(df_featured.index, df_featured['price_position'], linewidth=2)
plt.axhline(y=0.5, color='red', linestyle='--', alpha=0.5, label='Midpoint (0.5)')
plt.axhline(y=0.8, color='orange', linestyle='--', alpha=0.5, label='Near high (0.8)')
plt.axhline(y=0.2, color='green', linestyle='--', alpha=0.5, label='Near low (0.2)')
plt.title('Price Position (Position of the Price Within the 20-Day Range)', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Position (0 = low, 1 = high)', fontsize=12)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("💡 Price position = 1 → the price is at its 20-day high")
print("💡 Price position = 0 → the price is at its 20-day low")

---

## 🎯 STEP 3: Creating the Target Variable

**Target** = 5-day forward return

⚠️ **CRITICAL:** The target is correctly shifted forward = **NO LOOK-AHEAD BIAS!**

At time `t` we predict the return from `t` to `t+5`, i.e. `Close[t+5] / Close[t] - 1`.
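
As a small illustration of what "shifted forward" means, the sketch below computes a forward return with `Series.shift(-horizon)` on a handful of made-up prices. The helper name `forward_return` and the prices are hypothetical; whether `FeatureEngineer.create_target` works exactly this way is an assumption, since its code is not shown here.

```python
import pandas as pd


def forward_return(close: pd.Series, horizon: int) -> pd.Series:
    """Return from t to t+horizon, stored on row t (hypothetical helper)."""
    # shift(-horizon) pulls the FUTURE close back onto row t
    return close.shift(-horizon) / close - 1


# Made-up prices, only for illustration
prices = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0, 110.0, 99.0])
target = forward_return(prices, horizon=5)

# Row 0 compares 110 with 100, row 1 compares 99 with 101;
# the last 5 rows have no future price, so they are NaN
print(target)
```

By contrast, a feature such as `return_5d` looks backward (`Close[t] / Close[t-5] - 1`), so it is known at time `t`, while the target only becomes known 5 days later.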

In [None]:
HOLDING_PERIOD = 5

print(f"🎯 Creating the target variable: {HOLDING_PERIOD}-day forward return...\n")

df_featured = engineer.create_target(df_featured, horizon=HOLDING_PERIOD)

print("✅ Target created!")
print("\n📊 Target statistics:")
print(df_featured['target'].describe())

### 🔍 Check: Is the Target Really a Forward Return?

In [None]:
# Take row 100 and verify the target by hand
test_idx = 100

# Target computed by engineer.create_target
target_model = df_featured['target'].iloc[test_idx]

# Manual calculation: (Close[t+HOLDING_PERIOD] / Close[t]) - 1
close_t = df_featured['Close'].iloc[test_idx]
close_fwd = df_featured['Close'].iloc[test_idx + HOLDING_PERIOD]
target_manual = (close_fwd / close_t) - 1
is_match = np.isclose(target_model, target_manual)

print(f"🔍 Check on row {test_idx}:")
print(f"\n  Close[t={test_idx}]:     ${close_t:.2f}")
print(f"  Close[t+{HOLDING_PERIOD}={test_idx + HOLDING_PERIOD}]: ${close_fwd:.2f}")
print(f"\n  Target (model):  {target_model:.6f}")
print(f"  Target (manual): {target_manual:.6f}")
print(f"\n  {'✅' if is_match else '❌'} Match: {is_match}")

# Only claim success when the two values actually agree
if is_match:
    print("\n💡 The target is correct! We predict the FUTURE return, not the past one!")

### 📊 Chart 7: Distribution of the Target Variable

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(16, 5))

# Histogram
axes[0].hist(df_featured['target'].dropna(), bins=50, edgecolor='black', alpha=0.7)
axes[0].axvline(x=0, color='red', linestyle='--', linewidth=2, label='Zero return')
axes[0].set_title('Distribution of 5-Day Forward Returns', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Return', fontsize=12)
axes[0].set_ylabel('Frequency', fontsize=12)
axes[0].legend(fontsize=12)
axes[0].grid(True, alpha=0.3)

# Time series
axes[1].plot(df_featured.index, df_featured['target'], linewidth=1, alpha=0.7)
axes[1].axhline(y=0, color='red', linestyle='--', linewidth=2)
axes[1].set_title('5-Day Forward Returns Over Time', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Date', fontsize=12)
axes[1].set_ylabel('Target Return', fontsize=12)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

target_valid = df_featured['target'].dropna()
positive_pct = (target_valid > 0).mean() * 100
print(f"💡 {positive_pct:.1f}% of periods have a positive 5-day return")

---

## 🧹 STEP 4: Cleaning the Data (Dropping NaN)

The first rows contain NaN values because the rolling-window features need a warm-up period, and the last rows have no target.

For example:
- `volatility_20d` needs 20 days of history
- `volatility_60d` needs 60 days of history
- `target` is NaN in the last 5 rows (we have no future data)
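
The effect of these warm-up periods can be checked on a toy series. The sketch below is only an illustration under assumed definitions (volatility as the rolling standard deviation of 1-day returns, target as a 5-day forward return); the prices are synthetic and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# 100 synthetic closing prices, rising steadily from 100 to 120
n_rows = 100
close = pd.Series(np.linspace(100.0, 120.0, n_rows))
ret_1d = close.pct_change()

frame = pd.DataFrame(
    {
        "volatility_20d": ret_1d.rolling(20).std(),  # NaN at the start
        "volatility_60d": ret_1d.rolling(60).std(),  # NaN for even longer
        "target": close.shift(-5) / close - 1,       # NaN at the end
    }
)

# Count missing values per column, then drop every incomplete row
nan_counts = frame.isna().sum()
clean = frame.dropna()

print(nan_counts)
print(f"Rows kept: {len(clean)} of {n_rows}")
```

The longest rolling window and the target horizon together determine how many rows `dropna` removes: here 60 at the start and 5 at the end, leaving 35 of 100.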

In [None]:
print("🧹 Cleaning the data (dropping NaN)...\n")

rows_before = len(df_featured)
print(f"  Rows before cleaning: {rows_before}")

# Drop rows with NaN in the features or the target
feature_cols = engineer.get_feature_names()
cols_to_check = feature_cols + ['target']
df_clean = df_featured.dropna(subset=cols_to_check)

rows_after = len(df_clean)
rows_lost = rows_before - rows_after

print(f"  Rows after cleaning:  {rows_after}")
print(f"  Rows lost:            {rows_lost} ({rows_lost/rows_before*100:.1f}%)")
print("\n✅ The data is clean!")

---

## ✂️ STEP 5: Train/Test Split (Temporal Split)

**IMPORTANT:** With time series, **NEVER** shuffle the data!

- Train set = first 70% (chronologically)
- Test set = last 30% (chronologically)

⚠️ **NO SHUFFLING** - we preserve the chronological order!
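
A chronological split can be wrapped in a small helper, sketched below on a toy DataFrame. The function name `temporal_split` is hypothetical. The optional `gap` argument is an extra that this notebook does not use: it skips a few rows after the train set, which is one possible way to keep overlapping multi-day targets at the boundary out of both sets.

```python
import pandas as pd


def temporal_split(
    df: pd.DataFrame, train_frac: float = 0.7, gap: int = 0
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split chronologically without shuffling (hypothetical helper)."""
    if not 0 < train_frac < 1:
        raise ValueError("train_frac must be between 0 and 1")
    if not df.index.is_monotonic_increasing:
        raise ValueError("index must be sorted chronologically")

    split_idx = int(len(df) * train_frac)
    train = df.iloc[:split_idx]
    # gap=0 reproduces a plain split; gap>0 drops rows right after the train set
    test = df.iloc[split_idx + gap:]
    return train, test


# Toy data: 10 business days
dates = pd.bdate_range("2024-01-01", periods=10)
demo = pd.DataFrame({"x": range(10)}, index=dates)

train, test = temporal_split(demo, train_frac=0.7)
print(len(train), len(test))
print(train.index.max() < test.index.min())

# Same split, but leaving one row unused between train and test
train_gap, test_gap = temporal_split(demo, train_frac=0.7, gap=1)
print(len(train_gap), len(test_gap))
```

Checking that the index is sorted before slicing guards against the case where the data was reordered earlier, which would silently turn a positional split into a shuffled one.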

In [None]:
TRAIN_SPLIT = 0.7

train_pct = round(TRAIN_SPLIT * 100)
print(f"✂️ Splitting the data into train/test ({train_pct}/{100 - train_pct})...\n")

# Temporal split (NO shuffling!)
split_idx = int(len(df_clean) * TRAIN_SPLIT)

train_df = df_clean.iloc[:split_idx].copy()
test_df = df_clean.iloc[split_idx:].copy()

print("📊 Train set:")
print(f"  - Rows: {len(train_df)} ({len(train_df)/len(df_clean)*100:.1f}%)")
print(f"  - From: {train_df.index[0].strftime('%Y-%m-%d')}")
print(f"  - To: {train_df.index[-1].strftime('%Y-%m-%d')}")

print("\n📊 Test set:")
print(f"  - Rows: {len(test_df)} ({len(test_df)/len(df_clean)*100:.1f}%)")
print(f"  - From: {test_df.index[0].strftime('%Y-%m-%d')}")
print(f"  - To: {test_df.index[-1].strftime('%Y-%m-%d')}")

# Verify that the two sets do not overlap
if train_df.index[-1] < test_df.index[0]:
    print("\n✅ Train and test do not overlap!")
else:
    print("\n❌ ERROR: Train and test overlap!")

### 📊 Chart 8: Train/Test Split Visualization

In [None]:
plt.figure(figsize=(14, 6))

# Train data
plt.plot(train_df.index, train_df['Close'], label='Train Data', color='blue', linewidth=2)

# Test data
plt.plot(test_df.index, test_df['Close'], label='Test Data', color='orange', linewidth=2)

# Split line
plt.axvline(x=train_df.index[-1], color='red', linestyle='--', linewidth=2, label='Train/Test Split')

plt.title('Train/Test Split (Temporal Split)', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Close Price ($)', fontsize=12)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("💡 Blue = data used to train the model")
print("💡 Orange = data used to test the model (unseen!)")

---

## 📊 STEP 6: Feature Analysis (Correlation)

Let's see which features correlate most strongly with the target.

In [None]:
# Select only the features and the target from the train set
train_features = train_df[feature_cols + ['target']]

# Compute each feature's correlation with the target
correlations = train_features.corr()['target'].drop('target').sort_values(ascending=False)

print("📊 Correlation of features with the target (5-day return):")
print("\n" + "="*50)
for feat, corr in correlations.items():
    bar = '█' * int(abs(corr) * 50)
    sign = '+' if corr > 0 else '-'
    print(f"{feat:20} {sign} {bar} {corr:.4f}")
print("="*50)

print("\n💡 The larger the absolute correlation coefficient, the stronger the linear relationship!")

### 📊 Chart 9: Correlation Heatmap

In [None]:
# Correlation matrix
corr_matrix = train_features.corr()

plt.figure(figsize=(12, 10))
sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm', center=0, 
            square=True, linewidths=1, cbar_kws={"shrink": 0.8})
plt.title('Feature Correlation Matrix', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

print("💡 Red = positive correlation, Blue = negative correlation")

---

## 🎮 STEP 7: Experiment with the Data!

Now you can play with the data. Here are a few ideas:

### 💡 Ideas for Experiments:

1. **Change the ticker** - Try QQQ, AAPL, TSLA
2. **Change the holding period** - Try a 10-day or 20-day target
3. **Compare volatility** - In which periods was it highest?
4. **Analyze trends** - What percentage of the time is the market in an uptrend?
5. **Find the best day** - Which day had the highest 5-day return?
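
Ideas 4 and 5 need only a line or two of pandas each. The sketch below shows one possible approach on a tiny made-up table; the column names `trend` and `target` follow this notebook, while the values and the `demo` DataFrame are hypothetical. On the real data you would apply the same expressions to `df_clean`.

```python
import pandas as pd

# Made-up values, only for illustration
dates = pd.bdate_range("2024-01-01", periods=6)
demo = pd.DataFrame(
    {
        "trend": [1, 1, 0, 1, 0, 1],
        "target": [0.010, -0.020, 0.035, 0.005, -0.010, 0.015],
    },
    index=dates,
)

# Idea 4: the mean of a 0/1 flag is the fraction of days spent in an uptrend
uptrend_share = demo["trend"].mean() * 100

# Idea 5: idxmax returns the index label (the date) of the largest value
best_day = demo["target"].idxmax()
best_return = demo["target"].max()

print(f"Uptrend share: {uptrend_share:.1f}%")
print(f"Best day: {best_day:%Y-%m-%d} ({best_return:.2%})")
```

Keep in mind that `target` is a forward return, so the "best day" is the day on which the best 5-day period started, not the day with the largest single-day move.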

### 🎯 Experiment 1: Best and Worst Periods

In [None]:
# Top 10 best 5-day returns
print("🚀 TOP 10 Best 5-day returns:")
print("="*60)
top_returns = df_clean.nlargest(10, 'target')[['Close', 'target', 'volatility_20d', 'trend']]
print(top_returns)

print("\n💣 TOP 10 Worst 5-day returns:")
print("="*60)
worst_returns = df_clean.nsmallest(10, 'target')[['Close', 'target', 'volatility_20d', 'trend']]
print(worst_returns)

### 🎯 Experiment 2: Uptrend vs Downtrend

In [None]:
# Split the data by trend
uptrend = df_clean[df_clean['trend'] == 1]
downtrend = df_clean[df_clean['trend'] == 0]

print("📊 Uptrend vs Downtrend analysis:")
print("\n" + "="*60)
print("\n🟢 UPTREND (SMA20 > SMA50):")
print(f"  - Number of days: {len(uptrend)} ({len(uptrend)/len(df_clean)*100:.1f}%)")
print(f"  - Mean 5-day return: {uptrend['target'].mean():.4f} ({uptrend['target'].mean()*100:.2f}%)")
print(f"  - Standard deviation: {uptrend['target'].std():.4f}")

print("\n🔴 DOWNTREND (SMA20 <= SMA50):")
print(f"  - Number of days: {len(downtrend)} ({len(downtrend)/len(df_clean)*100:.1f}%)")
print(f"  - Mean 5-day return: {downtrend['target'].mean():.4f} ({downtrend['target'].mean()*100:.2f}%)")
print(f"  - Standard deviation: {downtrend['target'].std():.4f}")
print("="*60)

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(16, 5))

# Histogram for the uptrend
axes[0].hist(uptrend['target'], bins=30, alpha=0.7, color='green', edgecolor='black')
axes[0].axvline(x=uptrend['target'].mean(), color='red', linestyle='--', linewidth=2, label='Mean')
axes[0].set_title('Uptrend: 5-Day Returns', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Return', fontsize=12)
axes[0].set_ylabel('Frequency', fontsize=12)
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Histogram for the downtrend
axes[1].hist(downtrend['target'], bins=30, alpha=0.7, color='red', edgecolor='black')
axes[1].axvline(x=downtrend['target'].mean(), color='blue', linestyle='--', linewidth=2, label='Mean')
axes[1].set_title('Downtrend: 5-Day Returns', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Return', fontsize=12)
axes[1].set_ylabel('Frequency', fontsize=12)
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 Are returns higher in an uptrend? Or is it the other way around?")

### 🎯 Experiment 3: High vs Low Volatility

In [None]:
# Split by volatility (above/below the median)
median_vol = df_clean['volatility_20d'].median()

high_vol = df_clean[df_clean['volatility_20d'] > median_vol]
low_vol = df_clean[df_clean['volatility_20d'] <= median_vol]

print(f"📊 High vs Low Volatility analysis (median = {median_vol:.4f}):")
print("\n" + "="*60)
print("\n⚡ HIGH VOLATILITY:")
print(f"  - Number of days: {len(high_vol)}")
print(f"  - Mean 5-day return: {high_vol['target'].mean():.4f} ({high_vol['target'].mean()*100:.2f}%)")
print(f"  - Standard deviation: {high_vol['target'].std():.4f}")
print(f"  - Max return: {high_vol['target'].max():.4f}")
print(f"  - Min return: {high_vol['target'].min():.4f}")

print("\n😌 LOW VOLATILITY:")
print(f"  - Number of days: {len(low_vol)}")
print(f"  - Mean 5-day return: {low_vol['target'].mean():.4f} ({low_vol['target'].mean()*100:.2f}%)")
print(f"  - Standard deviation: {low_vol['target'].std():.4f}")
print(f"  - Max return: {low_vol['target'].max():.4f}")
print(f"  - Min return: {low_vol['target'].min():.4f}")
print("="*60)

print("\n💡 High volatility = bigger opportunities, but also bigger risks!")

---

## 💾 STEP 8: Save the Data (As in the Pipeline)

We save the train and test sets to CSV files.

In [None]:
print("💾 Saving the data...\n")

# Create the output directory if it does not exist
# (the relative path resolves against the project directory set at the top)
import os
os.makedirs('data/notebook_output', exist_ok=True)

# Save the train set
train_df.to_csv('data/notebook_output/train_data.csv')
train_df[feature_cols].to_csv('data/notebook_output/train_X.csv')
train_df['target'].to_csv('data/notebook_output/train_y.csv')

# Save the test set
test_df.to_csv('data/notebook_output/test_data.csv')
test_df[feature_cols].to_csv('data/notebook_output/test_X.csv')
test_df['target'].to_csv('data/notebook_output/test_y.csv')

print("✅ Data saved to data/notebook_output/")
print("\nFiles:")
print("  - train_data.csv, train_X.csv, train_y.csv")
print("  - test_data.csv, test_X.csv, test_y.csv")

---

## 🎉 DONE!

### 🎓 What You Learned:

1. ✅ How to download financial data from Yahoo Finance
2. ✅ How to create technical indicators (returns, volatility, trends)
3. ✅ How to create a target variable without look-ahead bias
4. ✅ How to split time-series data correctly (temporally!)
5. ✅ How to analyze the correlation of features with the target
6. ✅ How to visualize financial data

### 🚀 Next Steps:

1. **Experiment with different tickers** - Change `TICKER` at the top of the notebook
2. **Try different holding periods** - Change `HOLDING_PERIOD`
3. **Create your own features** - Add new indicators to `FeatureEngineer`
4. **Start with ML models** - Ridge Regression, XGBoost (Phase 2)

---

### 💡 Useful Tips:

- Run the whole notebook: `Cell → Run All`
- Change the parameters at the top and run it again
- Try different visualizations - matplotlib and seaborn are very powerful!

---

**Have fun exploring! 🎮📊**