
# 📂 Data Loading Examples

_Date generated: 2025-09-04_

This notebook shows **common patterns for loading data** into your hedge fund research environment.

**Sections**
1. Load CSVs (prices, returns, weights)
2. Load from SQL (SQLite example)
3. Load via APIs (Yahoo Finance)
4. Data validation & cleaning examples


## 1) Load from CSV

In [None]:

import os
import pandas as pd

# Example file paths
PATH_PRICES = "data/prices.csv"
PATH_RETURNS = "data/returns.csv"
PATH_WEIGHTS = "data/weights.csv"

# Prices (date, ticker, price)
if os.path.exists(PATH_PRICES):
    prices = pd.read_csv(PATH_PRICES, parse_dates=["date"])
    print(prices.head())

# Returns (date as index, wide tickers)
if os.path.exists(PATH_RETURNS):
    returns = pd.read_csv(PATH_RETURNS, parse_dates=["date"]).set_index("date")
    print(returns.head())

# Weights (ticker, weight)
if os.path.exists(PATH_WEIGHTS):
    weights = pd.read_csv(PATH_WEIGHTS).set_index("ticker")
    print(weights.head())
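
The prices file above is in long format (one row per date and ticker), while the returns file is wide (one column per ticker). As a minimal sketch of how the two layouts relate, the snippet below pivots long prices into a wide table and derives simple returns. The inline CSV text and the names `prices_long`, `prices_wide`, and `returns_wide` are illustrative assumptions, not files or variables from this notebook.

```python
import io

import pandas as pd

# Hypothetical sample in the long layout described above (date, ticker, price).
csv_text = """date,ticker,price
2023-01-02,AAPL,100.0
2023-01-02,MSFT,200.0
2023-01-03,AAPL,110.0
2023-01-03,MSFT,190.0
"""

prices_long = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# One row per date, one column per ticker.
prices_wide = prices_long.pivot(index="date", columns="ticker", values="price").sort_index()

# Simple period-over-period returns; the first date has no prior price, so it is dropped.
returns_wide = prices_wide.pct_change().dropna(how="all")
print(returns_wide)
```

Note that `pivot` raises an error if a (date, ticker) pair appears more than once, which can act as an early check for duplicated rows.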


## 2) Load from SQL (SQLite example)

In [None]:

import sqlite3
from contextlib import closing

# Assume we have a SQLite DB with a 'prices' table
DB_PATH = "data/market.db"

if os.path.exists(DB_PATH):
    query = "SELECT date, ticker, price FROM prices WHERE ticker IN ('AAPL','MSFT') LIMIT 10;"
    # closing() releases the connection even if the query raises
    with closing(sqlite3.connect(DB_PATH)) as conn:
        df_sql = pd.read_sql(query, conn, parse_dates=["date"])
    print(df_sql)
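
The query above hard-codes its tickers in the SQL string. When the ticker list comes from a variable, one option is a parameterized query, sketched below against an in-memory SQLite database so it runs without `data/market.db`. The table contents and the names `wanted` and `df_params` are assumptions made for illustration.

```python
import sqlite3
from contextlib import closing

import pandas as pd

wanted = ["AAPL", "MSFT"]  # hypothetical ticker filter

with closing(sqlite3.connect(":memory:")) as conn:
    # Build a tiny stand-in for the 'prices' table.
    conn.execute("CREATE TABLE prices (date TEXT, ticker TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO prices VALUES (?, ?, ?)",
        [
            ("2023-01-02", "AAPL", 100.0),
            ("2023-01-02", "MSFT", 200.0),
            ("2023-01-02", "GOOG", 90.0),
        ],
    )

    # One '?' placeholder per ticker; the values are passed separately via params.
    placeholders = ",".join("?" for _ in wanted)
    query = (
        "SELECT date, ticker, price FROM prices "
        f"WHERE ticker IN ({placeholders}) ORDER BY date, ticker"
    )
    df_params = pd.read_sql(query, conn, params=wanted, parse_dates=["date"])

print(df_params)
```

Only the placeholder string is interpolated into the SQL text; the ticker values themselves are bound by the driver, which avoids quoting problems and SQL injection.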


## 3) Load from Yahoo Finance API

In [None]:

import datetime
import yfinance as yf

tickers = ["AAPL","MSFT","GOOG"]
start = datetime.date(2022,1,1)
end = datetime.date(2023,1,1)

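try:
    # auto_adjust=False keeps the "Adj Close" column; recent yfinance releases
    # default to auto_adjust=True, which omits it.
    data = yf.download(tickers, start=start, end=end, auto_adjust=False)["Adj Close"]
    print(data.head())
except Exception as e:
    print("yfinance download failed (no internet, or an unexpected response format):", e)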


## 4) Data Validation & Cleaning

In [None]:

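# Handling missing values
# Note: filling with 0.0 treats a missing observation as a flat (zero-return) period.
if 'returns' in locals():
    n_missing = returns.isna().sum().sum()
    returns_clean = returns.fillna(0.0)
    print("Missing values filled:", n_missing)

# Outlier detection: flag returns more than 5 standard deviations from each ticker's mean
if 'returns' in locals():
    zscore = (returns - returns.mean()) / returns.std()
    outliers = (zscore.abs() > 5).sum().sum()
    print("Extreme return outliers:", outliers)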
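
Beyond missing values and outliers, a few structural checks are often worth running before analysis. The sketch below is one possible set, run on made-up data: duplicated dates, index ordering, weights summing to one, and tickers that have returns but no weight. The frames `returns_demo` and `weights_demo` and the `report` dictionary are hypothetical, and the requirement that weights sum to one is an assumption that may not hold for every portfolio (for example, leveraged or long-short books).

```python
import pandas as pd

# Hypothetical inputs mirroring the layouts used in this notebook.
returns_demo = pd.DataFrame(
    {"AAPL": [0.01, 0.02, 0.02], "MSFT": [0.00, -0.01, -0.01]},
    index=pd.to_datetime(["2023-01-02", "2023-01-03", "2023-01-03"]),
)
weights_demo = pd.DataFrame({"weight": [0.6, 0.4]}, index=["AAPL", "MSFT"])

report = {
    # Number of rows whose date already appeared earlier in the index.
    "duplicate_dates": int(returns_demo.index.duplicated().sum()),
    # True when dates never go backwards (repeated dates are still allowed).
    "dates_sorted": bool(returns_demo.index.is_monotonic_increasing),
    # Assumed convention: weights of a fully invested portfolio sum to 1.
    "weights_sum_to_one": bool(abs(weights_demo["weight"].sum() - 1.0) < 1e-8),
    # Tickers present in returns but missing from the weights table.
    "tickers_without_weight": sorted(set(returns_demo.columns) - set(weights_demo.index)),
}
print(report)
```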
