## **0. Data download**

**OVERVIEW**
- This notebook downloads daily OHLCV data for the portfolio asset universe (an equity index ETF, FX pairs, commodity ETFs, and a bond ETF) for the period 2019-2024 using **Yahoo Finance (yfinance)**.
- The **3-month US Treasury Bill (T-Bill)** yield is also downloaded as a proxy for the risk-free rate, which later KPI calculations such as the Sharpe ratio require. It is a widely accepted **`risk-free`** benchmark.
- Each downloaded series is stored as a CSV file under `data/raw/prices`.
- These raw files will be the input for all subsequent steps: returns calculation, portfolio construction, risk metrics and optimization.
- `auto_adjust=True` is passed to `yfinance` so that the **Close** column already holds the adjusted close (**Adj Close**), i.e. prices adjusted for dividends and splits.
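
Since the overview mentions using the 3-month T-Bill for Sharpe ratio calculations, here is a minimal sketch of how a `^IRX` quote might be turned into a per-day risk-free rate. It assumes that `^IRX` is quoted as an annualized yield in percent (so 5.0 means 5%), that a 252-trading-day year is acceptable, and that simple (non-compounded) scaling is good enough. The function name `annual_pct_to_daily_rate` and the sample values are illustrative and not part of the notebook.

```python
TRADING_DAYS = 252  # assumed number of trading days per year


def annual_pct_to_daily_rate(annual_pct: float, periods: int = TRADING_DAYS) -> float:
    """Convert an annualized yield quoted in percent to a simple per-period rate."""
    return annual_pct / 100.0 / periods


# Illustrative quotes only, not real market data
sample_quotes = [2.40, 0.10, 5.25]
daily_rates = [annual_pct_to_daily_rate(q) for q in sample_quotes]
```

The same arithmetic applies element-wise to a pandas Series, for example the `Close` column of the saved `^IRX` file.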

**SUMMARY RESULTS**
- All assets were downloaded successfully and saved in `data/raw/prices`
- Special characters ('^', '=F', '=X') were removed from **tickers** when defining file names
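
Stripping special characters can, in principle, map two different tickers to the same file name, so one file would silently overwrite the other. The sketch below mirrors the naming rule used in section 0.3 and adds a collision check; `clean_name` and `find_collisions` are hypothetical helper names, and the `GC=F` / `GC` pair is only an illustration.

```python
import re
from collections import defaultdict


def clean_name(ticker: str) -> str:
    """Drop any '=...' suffix, leading '^' characters, then remaining non-alphanumerics."""
    base = ticker.split("=")[0].lstrip("^")
    return re.sub(r"[^a-zA-Z0-9]", "", base)


def find_collisions(tickers: list[str]) -> dict[str, list[str]]:
    """Return every cleaned name that more than one ticker maps to."""
    groups = defaultdict(list)
    for t in tickers:
        groups[clean_name(t)].append(t)
    return {name: ts for name, ts in groups.items() if len(ts) > 1}


universe = ["SPY", "EURUSD=X", "USDJPY=X", "GLD", "UNG", "USO", "IEF", "^IRX"]
cleaned = [clean_name(t) for t in universe]
collisions = find_collisions(universe)          # empty for this universe
hypothetical = find_collisions(["GC=F", "GC"])  # both clean to "GC"
```

For the eight tickers in this notebook the check finds no collisions, which matches the eight distinct files listed in the output.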

#### **0.1 Importing libraries**

In [1]:
# Importing necessary libraries
import re
import pandas as pd
import yfinance as yf
from typing import cast
from src.helpers_io import raw_path

#### **0.2 Asset universe**

- **EURUSD=X** – Euro / US Dollar (FX spot pair)
- **GLD** – SPDR Gold Shares (Gold ETF)
- **IEF** – iShares 7–10 Year Treasury Bond ETF
- **SPY** – SPDR S&P 500 ETF Trust (US equity beta)
- **UNG** – United States Natural Gas Fund (Natural Gas ETF)
- **USDJPY=X** – US Dollar / Japanese Yen (FX spot pair)
- **USO** – United States Oil Fund (Crude Oil ETF)
- **^IRX** – 13 Week Treasury Bill (risk-free proxy, non-investable)

In [2]:
# 1. Asset universe
indices = ["SPY"]
fx = ["EURUSD=X", "USDJPY=X"]
commodities = ["GLD", "UNG", "USO"]
bonds = ["IEF"]
rf_rate = ["^IRX"]

all_tickers = indices + fx + commodities + bonds + rf_rate

# 2. Date range
start_date = "2019-01-01"
end_date = "2025-01-01"  # yfinance treats `end` as exclusive, so this keeps 2024-12-31 in the sample

#### **0.3 Downloading data from Yahoo Finance**

In [3]:
# 3. Download and save each asset to data/raw
prices_dir = raw_path("prices")
prices_dir.mkdir(parents=True, exist_ok=True)   # Ensures 'prices' folder exists

# Function for advanced ticker cleaning
def cleaning_ticker(ticker: str) -> str:
    # Step 1: Handle Suffixes (remove =X, =F and everything after)
    clean_ticker = ticker.split('=')[0]
    
    # Step 2: Handle Prefixes (remove ^ only from start)
    clean_ticker = clean_ticker.lstrip('^')
    
    # Step 3: Sanitize (remove residual characters like '-' in BTC-USD)
    clean_ticker = re.sub(r'[^a-zA-Z0-9]', '', clean_ticker)

    return clean_ticker

# Downloading data
files_saved = []

for ticker in all_tickers:
    # auto_adjust=True -> Close already holds the adjusted close
    # progress=False -> no progress bar
    # multi_level_index=False -> flat (single-level) column names
    data = cast(pd.DataFrame, yf.download(ticker, start=start_date, end=end_date, auto_adjust=True, progress=False, multi_level_index=False))

    # Basic sanity check
    if data.empty:
        print(f"Warning: no data returned for {ticker}")
        continue

    # Reset index to have "Date" as a column for better data manipulation
    data = data.reset_index()

    # Clean filename (removing ^, =X, =F from tickers)
    clean_ticker = cleaning_ticker(ticker)
    filename = f"{clean_ticker}_prices.csv"

    # Full file path inside 'data/raw/prices'
    filepath = prices_dir / filename

    # Save as CSV in data/raw/prices
    data.to_csv(filepath, index=False)

    # Storing files saved
    files_saved.append(filename)

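# Only report full success if every ticker was actually saved
if len(files_saved) == len(all_tickers):
    print("All data downloaded successfully! ✅")
else:
    print(f"Saved {len(files_saved)} of {len(all_tickers)} tickers; see warnings above.")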

# 4. Checking files
downloaded = len(list(prices_dir.glob("*.csv")))
print(f"{downloaded} files saved in {prices_dir}")
display(files_saved)

All data downloaded successfully! ✅
8 files saved in C:\Users\james\Desktop\UK Life\Data Scientist Career Path\My notes (Python, SQL, etc.)\Portfolio of projects\finance-project\data\raw\prices


['SPY_prices.csv',
 'EURUSD_prices.csv',
 'USDJPY_prices.csv',
 'GLD_prices.csv',
 'UNG_prices.csv',
 'USO_prices.csv',
 'IEF_prices.csv',
 'IRX_prices.csv']
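
Before the raw files feed the later steps, it may be worth confirming that each one has the expected header and is not empty. The following is a rough, stdlib-only sketch: the column set in `EXPECTED_COLUMNS` is an assumption about what `yfinance` returns with `auto_adjust=True` after `reset_index()`, `check_price_file` is a hypothetical helper, and the demo file contains made-up values rather than market data.

```python
import csv
import tempfile
from pathlib import Path

# Assumed column layout of the saved files
EXPECTED_COLUMNS = {"Date", "Open", "High", "Low", "Close", "Volume"}


def check_price_file(path: Path) -> int:
    """Validate the header of one price CSV and return its number of data rows."""
    with path.open(newline="") as fh:
        reader = csv.DictReader(fh)
        missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"{path.name} is missing columns: {sorted(missing)}")
        return sum(1 for _ in reader)


# Demonstration on a throwaway file with made-up values
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "DEMO_prices.csv"
    demo.write_text(
        "Date,Open,High,Low,Close,Volume\n"
        "2019-01-02,1.0,1.2,0.9,1.1,100\n"
        "2019-01-03,1.1,1.3,1.0,1.2,120\n"
    )
    demo_rows = check_price_file(demo)
```

In the notebook itself the helper could be applied to every path returned by `prices_dir.glob("*.csv")`; downstream notebooks would more likely load the files with `pd.read_csv(path, parse_dates=["Date"])`.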