# Stock Data Collection and Exploration

This notebook demonstrates how to fetch and explore stock market data with yfinance, accessed through the project's `StockDataFetcher` wrapper.

## Learning Objectives
- Fetch historical stock data for single and multiple tickers
- Understand OHLCV data structure (Open, High, Low, Close, Volume)
- Explore different time intervals and date ranges
- Fetch fundamental data (financial statements, ratios)
- Implement data caching for efficiency
- Perform basic data exploration and visualization

## Table of Contents
1. [Setup and Imports](#setup)
2. [Fetch Single Stock Data](#single)
3. [Explore Data Structure](#explore)
4. [Fetch Multiple Stocks](#multiple)
5. [Different Time Intervals](#intervals)
6. [Fundamental Data](#fundamental)
7. [Data Caching](#caching)
8. [Basic Visualization](#visualization)

<a id='setup'></a>
## 1. Setup and Imports

In [None]:
import sys
sys.path.append('..')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

# Import our custom modules
from src.data.fetcher import StockDataFetcher, get_stock_data, get_multiple_stocks
from src.data.preprocessor import StockDataPreprocessor

# Configure pandas display
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print("Imports successful!")
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")

<a id='single'></a>
## 2. Fetch Single Stock Data

Let's start by fetching data for a single stock - Apple (AAPL).

In [None]:
# Create a fetcher instance
fetcher = StockDataFetcher()

# Fetch Apple stock data from 2022-01-01 to the most recent available date
aapl = fetcher.get_stock_data('AAPL', start='2022-01-01')

print(f"Data shape: {aapl.shape}")
print(f"Date range: {aapl.index[0]} to {aapl.index[-1]}")
print(f"\nColumns: {aapl.columns.tolist()}")

<a id='explore'></a>
## 3. Explore Data Structure

Let's examine the columns of the returned DataFrame: the five OHLCV fields plus two corporate-action columns.
- **Open**: Opening price for the period
- **High**: Highest price during the period
- **Low**: Lowest price during the period
- **Close**: Closing price for the period
- **Volume**: Number of shares traded
- **Dividends**: Dividend payments (if any)
- **Stock Splits**: Stock split information (if any)

In [None]:
# Display first few rows
print("First 5 rows:")
display(aapl.head())

# Display last few rows
print("\nLast 5 rows:")
display(aapl.tail())

In [None]:
# Basic statistics
print("Statistical Summary:")
display(aapl.describe())

In [None]:
# Check for missing values
print("Missing values per column:")
print(aapl.isnull().sum())

# Data types
print("\nData types:")
print(aapl.dtypes)
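
Beyond missing values, OHLCV rows have internal consistency rules that are worth checking: the High should not sit below the Open or Close, the Low should not sit above them, and Volume should not be negative. The sketch below runs that check on a small synthetic frame. The helper name `find_inconsistent_bars` and all values are illustrative assumptions, not part of the project's modules.

```python
import pandas as pd

# Synthetic OHLCV rows (illustrative values only); the last row is deliberately broken.
bars = pd.DataFrame(
    {
        "Open": [100.0, 101.0, 102.0],
        "High": [102.0, 103.0, 101.5],  # third High is below its Open
        "Low": [99.0, 100.5, 100.0],
        "Close": [101.0, 102.5, 101.0],
        "Volume": [1_000, 1_200, 900],
    },
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)


def find_inconsistent_bars(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows that violate basic OHLCV invariants (hypothetical helper)."""
    body_top = df[["Open", "Close"]].max(axis=1)
    body_bottom = df[["Open", "Close"]].min(axis=1)
    bad = (df["High"] < body_top) | (df["Low"] > body_bottom) | (df["Volume"] < 0)
    return df[bad]


bad_rows = find_inconsistent_bars(bars)
print(bad_rows)
```

On real data, adjusted prices can differ by tiny rounding amounts, so a small tolerance may be needed before treating a row as an error.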

<a id='multiple'></a>
## 4. Fetch Multiple Stocks

Fetch data for several tickers in a single call. The result maps each ticker to its own DataFrame.

In [None]:
# Define a list of tech stocks
tech_stocks = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META']

# Fetch data for all stocks
stocks_data = fetcher.get_multiple_stocks(tech_stocks, start='2023-01-01')

print(f"Fetched {len(stocks_data)} stocks")
for ticker, df in stocks_data.items():
    print(f"{ticker}: {len(df)} records, Latest price: ${df['Close'].iloc[-1]:.2f}")

In [None]:
# Compare closing prices
plt.figure(figsize=(14, 6))

for ticker, df in stocks_data.items():
    # Normalize to percentage change from start
    normalized = (df['Close'] / df['Close'].iloc[0] - 1) * 100
    plt.plot(normalized.index, normalized, label=ticker, linewidth=2)

plt.title('Tech Stocks Performance Comparison (% Change)', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Return (%)', fontsize=12)
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
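
The normalization used in the plot above can also be applied to all tickers at once by first collecting the closing prices into one table. This is a minimal sketch on synthetic data; the tickers `AAA` and `BBB` and their prices are made up, and it assumes every frame shares the same date index.

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=4, freq="D")
frames = {
    "AAA": pd.DataFrame({"Close": [50.0, 55.0, 60.0, 57.5]}, index=dates),
    "BBB": pd.DataFrame({"Close": [200.0, 190.0, 210.0, 220.0]}, index=dates),
}

# One column per ticker, aligned on the date index
closes = pd.concat({ticker: df["Close"] for ticker, df in frames.items()}, axis=1)

# Percentage change of every column relative to its first row
pct_from_start = (closes / closes.iloc[0] - 1) * 100
print(pct_from_start.round(2))
```

If the tickers start on different dates, the first row of the combined table will contain missing values, so the table would need trimming to a common start date before dividing.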

<a id='intervals'></a>
## 5. Different Time Intervals

yfinance supports various time intervals:
- `1d` (1 day - default)
- `1wk` (1 week)
- `1mo` (1 month)
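- `1h`, `30m`, `15m`, `5m`, `1m` (intraday; Yahoo restricts how far back these go, on the order of days for `1m` and a couple of months for other sub-hourly intervals)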

In [None]:
# Fetch weekly data
aapl_weekly = fetcher.get_stock_data('AAPL', start='2020-01-01', interval='1wk')

print(f"Weekly data shape: {aapl_weekly.shape}")
display(aapl_weekly.tail())

In [None]:
# Fetch monthly data
aapl_monthly = fetcher.get_stock_data('AAPL', start='2015-01-01', interval='1mo')

print(f"Monthly data shape: {aapl_monthly.shape}")
display(aapl_monthly.tail())

In [None]:
# Compare different intervals
fig, axes = plt.subplots(3, 1, figsize=(14, 10))

# Daily
axes[0].plot(aapl.index, aapl['Close'], color='blue', linewidth=1)
axes[0].set_title('Daily Data', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Price ($)')
axes[0].grid(True, alpha=0.3)

# Weekly
axes[1].plot(aapl_weekly.index, aapl_weekly['Close'], color='green', linewidth=1.5)
axes[1].set_title('Weekly Data', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Price ($)')
axes[1].grid(True, alpha=0.3)

# Monthly
axes[2].plot(aapl_monthly.index, aapl_monthly['Close'], color='red', linewidth=2)
axes[2].set_title('Monthly Data', fontsize=12, fontweight='bold')
axes[2].set_ylabel('Price ($)')
axes[2].set_xlabel('Date')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
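
Coarser bars can also be derived locally from daily data instead of being downloaded again. The sketch below aggregates synthetic daily bars into weekly bars with pandas `resample`, assuming the column names used in this notebook; the prices are made up.

```python
import pandas as pd

# Ten business days of synthetic daily bars (two full weeks)
dates = pd.date_range("2024-01-01", periods=10, freq="B")
close = pd.Series(range(100, 110), index=dates, dtype="float64")
daily = pd.DataFrame(
    {
        "Open": close - 0.5,
        "High": close + 1.0,
        "Low": close - 1.0,
        "Close": close,
        "Volume": 1_000,
    }
)

# Each weekly bar takes the first Open, the highest High, the lowest Low,
# the last Close, and the total Volume of the days it covers.
weekly = daily.resample("W").agg(
    {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
)
print(weekly)
```

The `"W"` rule labels each bar with the Sunday that ends the week, so the index may not line up with the labels on downloaded weekly data even when the values agree.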

<a id='fundamental'></a>
## 6. Fundamental Data

Fetch fundamental data including financial statements and company information.

In [None]:
# Get stock information
aapl_info = fetcher.get_stock_info('AAPL')

# Display key information
print("Apple Inc. Key Information:\n")
important_keys = [
    'symbol', 'longName', 'sector', 'industry',
    'marketCap', 'trailingPE', 'forwardPE',
    'dividendYield', 'beta', 'fiftyTwoWeekHigh', 'fiftyTwoWeekLow'
]

for key in important_keys:
    if key in aapl_info:
        value = aapl_info[key]
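        if isinstance(value, float):
            print(f"{key:20s}: {value:,.2f}")
        elif isinstance(value, int):
            # Large integers such as marketCap are easier to read with separators
            print(f"{key:20s}: {value:,}")
        else:
            print(f"{key:20s}: {value}")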

In [None]:
# Get fundamental data (financial statements)
aapl_fundamentals = fetcher.get_fundamental_data('AAPL')

print("Available fundamental data:")
for key in aapl_fundamentals.keys():
    print(f"  - {key}")

In [None]:
# Display income statement
print("Income Statement (Annual):")
display(aapl_fundamentals['income_stmt'])

In [None]:
# Display balance sheet
print("Balance Sheet (Annual):")
display(aapl_fundamentals['balance_sheet'])
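
Ratios can be computed directly from a statement table. The sketch below assumes a yfinance-like layout (line items as rows, fiscal periods as columns) and the row labels `Total Revenue`, `Gross Profit`, and `Net Income`; whether `aapl_fundamentals['income_stmt']` uses exactly this layout and these labels is an assumption to verify against the displayed output. The figures are made up and are not Apple's.

```python
import pandas as pd

# Made-up figures: line items as rows, fiscal period end dates as columns
income_stmt = pd.DataFrame(
    {
        pd.Timestamp("2023-12-31"): [1_000.0, 400.0, 200.0],
        pd.Timestamp("2022-12-31"): [800.0, 300.0, 120.0],
    },
    index=["Total Revenue", "Gross Profit", "Net Income"],
)

# Transpose so each period is a row, oldest first
by_period = income_stmt.T.sort_index()

# Margins as a percentage of revenue
margins = pd.DataFrame(
    {
        "gross_margin_pct": by_period["Gross Profit"] / by_period["Total Revenue"] * 100,
        "net_margin_pct": by_period["Net Income"] / by_period["Total Revenue"] * 100,
    }
)
print(margins.round(1))
```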

<a id='caching'></a>
## 7. Data Caching

Our fetcher implements caching to:
- Avoid hitting API rate limits
- Speed up repeated data access
- Enable offline work

In [None]:
# First fetch (downloads from the API unless TSLA is already cached from an earlier run)
import time

start_time = time.time()
tesla = fetcher.get_stock_data('TSLA', start='2023-01-01')
first_fetch_time = time.time() - start_time

print(f"First fetch time: {first_fetch_time:.2f} seconds")

In [None]:
# Second fetch (loads from cache)
start_time = time.time()
tesla = fetcher.get_stock_data('TSLA', start='2023-01-01')
cached_fetch_time = time.time() - start_time

print(f"Cached fetch time: {cached_fetch_time:.2f} seconds")
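# Guard against a zero elapsed time when the cached read is faster than the timer resolution
print(f"Speedup: {first_fetch_time / max(cached_fetch_time, 1e-9):.1f}x faster")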

In [None]:
# Clear cache for a specific ticker
# fetcher.clear_cache('TSLA')

# Clear all cache
# fetcher.clear_cache()
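
The internals of the fetcher's cache are not shown in this notebook. As a rough sketch of the general idea, the hypothetical `cached_fetch` helper below stores a result on disk and reuses it while the file is younger than a maximum age. The function name, the key format, and the pickle storage are all assumptions for illustration, not the project's implementation.

```python
import pickle
import tempfile
import time
from pathlib import Path
from typing import Any, Callable


def cached_fetch(key: str, loader: Callable[[], Any], cache_dir: Path,
                 max_age_seconds: float = 86_400) -> Any:
    """Return loader()'s result, reusing a pickled copy if it is fresh enough."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{key}.pkl"
    if path.exists() and time.time() - path.stat().st_mtime < max_age_seconds:
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = loader()
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


calls = []


def slow_loader():
    """Stand-in for a network download; records each real invocation."""
    calls.append(1)
    return {"ticker": "TSLA", "rows": 250}  # placeholder payload


with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch("TSLA_2023-01-01_1d", slow_loader, Path(tmp))
    second = cached_fetch("TSLA_2023-01-01_1d", slow_loader, Path(tmp))

print(f"Loader ran {len(calls)} time(s)")
```

Including the ticker, start date, and interval in the key keeps requests with different parameters from overwriting each other.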

<a id='visualization'></a>
## 8. Basic Visualization

Create some basic visualizations of the stock data.

In [None]:
# Price and Volume chart
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 8), gridspec_kw={'height_ratios': [3, 1]})

# Price
ax1.plot(aapl.index, aapl['Close'], label='Close Price', color='#2E86AB', linewidth=2)
ax1.fill_between(aapl.index, aapl['Low'], aapl['High'], alpha=0.2, color='#2E86AB')
ax1.set_title('AAPL Stock Price', fontsize=16, fontweight='bold')
ax1.set_ylabel('Price ($)', fontsize=12)
ax1.legend(fontsize=10)
ax1.grid(True, alpha=0.3)

# Volume
price_change = aapl['Close'].diff().fillna(0)  # first bar has no previous close
colors = ['green' if change >= 0 else 'red' for change in price_change]
ax2.bar(aapl.index, aapl['Volume'], color=colors, alpha=0.5)
ax2.set_ylabel('Volume', fontsize=12)
ax2.set_xlabel('Date', fontsize=12)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Calculate daily returns
aapl['Daily_Return'] = aapl['Close'].pct_change()

# Plot returns distribution
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Histogram
ax1.hist(aapl['Daily_Return'].dropna(), bins=50, edgecolor='black', alpha=0.7)
ax1.axvline(0, color='red', linestyle='--', linewidth=2, label='Zero return')
ax1.set_title('Daily Returns Distribution', fontsize=14, fontweight='bold')
ax1.set_xlabel('Daily Return', fontsize=12)
ax1.set_ylabel('Frequency', fontsize=12)
ax1.legend()
ax1.grid(True, alpha=0.3)

# Time series of returns
ax2.plot(aapl.index, aapl['Daily_Return'], linewidth=1, alpha=0.7)
ax2.axhline(0, color='red', linestyle='--', linewidth=1)
ax2.set_title('Daily Returns Over Time', fontsize=14, fontweight='bold')
ax2.set_xlabel('Date', fontsize=12)
ax2.set_ylabel('Daily Return', fontsize=12)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Statistics
print("\nDaily Returns Statistics:")
print(f"Mean: {aapl['Daily_Return'].mean():.4f}")
print(f"Std Dev: {aapl['Daily_Return'].std():.4f}")
print(f"Min: {aapl['Daily_Return'].min():.4f}")
print(f"Max: {aapl['Daily_Return'].max():.4f}")
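
Two further figures are often derived from daily returns: the cumulative return over the window and an annualized volatility. The sketch below computes both on a short synthetic price series. The factor of 252 trading days per year is a common convention and is an assumption here, as are the prices.

```python
import numpy as np
import pandas as pd

close = pd.Series(
    [100.0, 102.0, 101.0, 104.0, 103.0],
    index=pd.date_range("2024-01-01", periods=5, freq="B"),
)
daily_returns = close.pct_change().dropna()

# Compounding the daily returns recovers the total return over the window
cumulative_return = (1 + daily_returns).prod() - 1

# Scale daily volatility to a yearly figure, assuming about 252 trading days
annualized_vol = daily_returns.std() * np.sqrt(252)

print(f"Cumulative return: {cumulative_return:.4f}")
print(f"Annualized volatility: {annualized_vol:.4f}")
```

The compounded figure matches `close.iloc[-1] / close.iloc[0] - 1`, which is a quick way to check the return calculation.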

## Summary

In this notebook, we learned how to:
1. Fetch historical stock data using yfinance
2. Understand the OHLCV data structure
3. Work with different time intervals
4. Access fundamental data and financial statements
5. Utilize caching for improved performance
6. Create basic visualizations

## Next Steps

In the next notebook (`02_technical_analysis.ipynb`), we'll learn how to:
- Calculate technical indicators (SMA, EMA, RSI, MACD, Bollinger Bands)
- Visualize indicators on price charts
- Detect trading signals
- Compare multiple stocks using technical analysis