# Obtaining Data for Beta Estimation and Portfolio Optimization with Python

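This tutorial walks through collecting and preparing financial data for **beta estimation** and **portfolio optimization**. You will pull data from three sources with Python (the Fama/French 3 Factors dataset, Damodaran's webpage, and Yahoo Finance via `yfinance`) and key every series by the same monthly date label so the datasets can be merged.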

## Objectives
- Download market excess returns and risk-free rates from the Fama/French 3 Factors dataset.
- Collect forward-looking inputs (the T-bond rate and the implied equity risk premium) from Damodaran's webpage.
- Download historical stock prices using `yfinance`.
- Ensure data alignment and calculate excess returns.


## Step 1: Obtain Market Excess Returns and Risk-Free Rate

In [None]:
import pandas as pd
import zipfile
import requests
from io import BytesIO

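# Download the Fama-French zipped data
url = 'http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors.CSV.zip'
response = requests.get(url, timeout=30)
response.raise_for_status()  # Fail early if the download did not succeed

# Unzip the file and extract the CSV (the extension may be upper- or lowercase)
with zipfile.ZipFile(BytesIO(response.content)) as z:
    z.extractall()
    csv_filename = [name for name in z.namelist() if name.lower().endswith('.csv')][0]

# Load the CSV as strings: the file also contains an annual block and footer text
df_ff = pd.read_csv(csv_filename, skiprows=3, dtype=str)
df_ff.columns = ['Date', 'Mkt-RF', 'SMB', 'HML', 'RF']

# Keep only monthly rows (YYYYMM); strip padding so annual rows are not matched
df_ff['Date'] = df_ff['Date'].str.strip()
df_ff = df_ff[df_ff['Date'].str.fullmatch(r'\d{6}', na=False)].copy()
df_ff['Date'] = pd.to_datetime(df_ff['Date'], format='%Y%m').dt.strftime('%Y-%m')

# Convert the factor columns (percent per month) from strings to floats
factor_cols = ['Mkt-RF', 'SMB', 'HML', 'RF']
df_ff[factor_cols] = df_ff[factor_cols].apply(lambda s: pd.to_numeric(s.str.strip()))
df_ff = df_ff.tail(60)  # Select the last 60 months
df_ff.head()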
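
The filtering step can be tried offline. The sketch below runs the same idea on a small made-up excerpt that mimics the file layout (a monthly block, a section title, a repeated header, and an annual block). The numbers are placeholders rather than actual factor values, and the names `sample_text`, `sample`, and `monthly` are illustrative only.

```python
from io import StringIO

import pandas as pd

# Made-up excerpt that mimics the file layout: a monthly block (YYYYMM),
# a section title, a repeated header, and an annual block (YYYY).
# The numbers are placeholders, not actual factor values.
sample_text = """,Mkt-RF,SMB,HML,RF
202201,    1.00,    0.50,   -0.25,    0.01
202202,   -2.00,    0.10,    0.30,    0.01
202203,    3.00,   -0.40,    0.20,    0.02

 Annual Factors: January-December
,Mkt-RF,SMB,HML,RF
  2021,   10.00,    1.00,    2.00,    0.05
"""

sample = pd.read_csv(StringIO(sample_text), dtype=str)
sample.columns = ['Date', 'Mkt-RF', 'SMB', 'HML', 'RF']

# Strip padding first so that '  2021' is not mistaken for a six-character month
sample['Date'] = sample['Date'].str.strip()
monthly = sample[sample['Date'].str.fullmatch(r'\d{6}', na=False)].copy()

# Convert dates to 'YYYY-MM' labels and factor columns to floats
monthly['Date'] = pd.to_datetime(monthly['Date'], format='%Y%m').dt.strftime('%Y-%m')
factor_cols = ['Mkt-RF', 'SMB', 'HML', 'RF']
monthly[factor_cols] = monthly[factor_cols].apply(lambda s: pd.to_numeric(s.str.strip()))

print(monthly)
```

Only the three monthly rows survive. The padded annual row `'  2021'` is six characters long before stripping, which is why a plain length check can let annual rows through.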

## Step 2: Obtain Future Risk-Free Rate and Market Expected Return

In [None]:
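# Download and read Damodaran's ERP data (reading .xlsx files requires openpyxl)
url = 'http://www.stern.nyu.edu/~adamodar/pc/implprem/ERPbymonth.xlsx'
df_erp = pd.read_excel(url, sheet_name='HistERP')

# Use the latest row in which both values are populated
latest_data = df_erp.dropna(subset=['T.Bond Rate', 'ERP (Implied)']).tail(1)
t_bond_rate = latest_data['T.Bond Rate'].values[0]
implied_erp = latest_data['ERP (Implied)'].values[0]

# An equity risk premium is a premium over the risk-free rate,
# so the expected market return is the T-bond rate plus the ERP
expected_return_sp500 = t_bond_rate + implied_erp

t_bond_rate, expected_return_sp500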
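
To make the relationship between these inputs concrete, here is a minimal sketch of how a risk-free rate and an equity risk premium combine into an expected market return, and how they would enter a CAPM-style required return once a beta is available. The function names and the numeric inputs are hypothetical and chosen only for illustration; they are not values from Damodaran's file.

```python
def expected_market_return(risk_free_rate: float, equity_risk_premium: float) -> float:
    """Expected market return = risk-free rate + equity risk premium."""
    return risk_free_rate + equity_risk_premium


def capm_required_return(risk_free_rate: float, beta: float,
                         equity_risk_premium: float) -> float:
    """CAPM: required return = risk-free rate + beta * equity risk premium."""
    return risk_free_rate + beta * equity_risk_premium


# Hypothetical inputs expressed as decimals (4% bond yield, 5% premium, beta of 1.2)
example_rf = 0.04
example_erp = 0.05
example_beta = 1.2

example_market_return = expected_market_return(example_rf, example_erp)
example_stock_return = capm_required_return(example_rf, example_beta, example_erp)

print(f"Expected market return: {example_market_return:.2%}")
print(f"CAPM required return:   {example_stock_return:.2%}")
```

With these placeholder inputs, a beta of exactly 1 would make the two results coincide, which is a quick sanity check on the formulas.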

## Step 3: Obtain Stock Prices

In [None]:
import yfinance as yf

# Define stock ticker, start date, and end date
ticker = 'AAPL'  # Example: Apple
start_date = '2017-08-01'
end_date = '2022-08-31'

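# Download stock prices (auto_adjust=False keeps the 'Adj Close' column;
# note that `end` is exclusive)
df_stock = yf.download(ticker, start=start_date, end=end_date, interval='1mo', auto_adjust=False)
# Recent yfinance versions return MultiIndex columns even for one ticker; flatten them
if isinstance(df_stock.columns, pd.MultiIndex):
    df_stock.columns = df_stock.columns.get_level_values(0)
df_stock = df_stock[['Adj Close']].reset_index()
df_stock['Date'] = pd.to_datetime(df_stock['Date']).dt.strftime('%Y-%m')
df_stock.head()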

### Calculating Monthly Returns

In [None]:
# Calculate monthly returns
df_stock['Return (%)'] = df_stock['Adj Close'].pct_change() * 100
df_stock = df_stock.dropna()
df_stock.head()
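
As a quick check of what `pct_change()` does, the sketch below applies the same calculation to a three-row toy table. The prices are made up, and `toy` is an illustrative name that is not used elsewhere in this tutorial.

```python
import pandas as pd

# Made-up adjusted closing prices for three consecutive months
toy = pd.DataFrame({
    'Date': ['2022-01', '2022-02', '2022-03'],
    'Adj Close': [100.0, 110.0, 99.0],
})

# pct_change() computes (P_t / P_{t-1}) - 1; multiply by 100 for percent
toy['Return (%)'] = toy['Adj Close'].pct_change() * 100

# The first month has no prior price, so its return is NaN and is dropped
toy = toy.dropna().reset_index(drop=True)

print(toy)
```

The two remaining rows show returns of roughly +10% and -10%. Dropping the first row is why downloading 61 monthly prices yields 60 monthly returns.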

## Step 4: Verify Data Alignment

In [None]:
# Verify date alignment
print(f"Fama/French Data Range: {df_ff['Date'].min()} to {df_ff['Date'].max()}")
print(f"Stock Prices Data Range: {df_stock['Date'].min()} to {df_stock['Date'].max()}")

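# Ensure both series start and end in the same month. If this fails, adjust
# start_date/end_date or trim df_ff to the stock data's window before merging.
same_start = df_ff['Date'].min() == df_stock['Date'].min()
same_end = df_ff['Date'].max() == df_stock['Date'].max()
assert same_start and same_end, 'Data ranges do not align!'
print('Data alignment verified!')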
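
Comparing end points does not reveal gaps in the middle of a series or say which months are missing. One possible complement, sketched here with a hypothetical helper `month_mismatches` and made-up month labels, is to compare the two sets of `'YYYY-MM'` labels directly.

```python
import pandas as pd


def month_mismatches(left: pd.Series, right: pd.Series) -> dict:
    """Return the 'YYYY-MM' labels present in one series but not the other."""
    left_months, right_months = set(left), set(right)
    return {
        'only_in_left': sorted(left_months - right_months),
        'only_in_right': sorted(right_months - left_months),
    }


# Toy example with made-up month labels
factor_months = pd.Series(['2022-01', '2022-02', '2022-03'])
stock_months = pd.Series(['2022-02', '2022-03', '2022-04'])

gaps = month_mismatches(factor_months, stock_months)
print(gaps)
```

In the tutorial's setting, the call would be `month_mismatches(df_ff['Date'], df_stock['Date'])`; two empty lists mean every month appears in both datasets.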

## Step 5: Merge Data and Calculate Excess Returns

In [None]:
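# Merge on the 'YYYY-MM' label (an inner join keeps only months present in both)
merged_df = pd.merge(df_stock, df_ff, on='Date', how='inner')
# Both columns are in percent per month; astype(float) guards against a string-typed RF
merged_df['Excess Return'] = merged_df['Return (%)'] - merged_df['RF'].astype(float)
merged_df.head()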
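
With stock excess returns and market excess returns in one table, beta is the slope from regressing the former on the latter. The sketch below illustrates this on synthetic data built with a known slope of 1.5 and intercept of 0.2, so the result can be checked. `demo_df` is a hypothetical stand-in for `merged_df`, and `np.polyfit` is just one of several ways to fit the line.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for merged_df; the numbers are made up so that the
# true slope (1.5) and intercept (0.2) are known in advance.
market_excess = np.array([1.0, -2.0, 3.0, 0.5, -1.5, 2.5])
demo_df = pd.DataFrame({
    'Mkt-RF': market_excess,
    'Excess Return': 0.2 + 1.5 * market_excess,
})

# np.polyfit with degree 1 returns [slope, intercept]; the slope is beta
beta, alpha = np.polyfit(demo_df['Mkt-RF'], demo_df['Excess Return'], 1)

# Equivalent closed form: covariance with the market over market variance
cov = np.cov(demo_df['Excess Return'], demo_df['Mkt-RF'])
beta_cov = cov[0, 1] / cov[1, 1]

print(f"beta = {beta:.3f}, alpha = {alpha:.3f}, beta (cov/var) = {beta_cov:.3f}")
```

Swapping `demo_df` for `merged_df` would give an estimate for the downloaded stock; real data is noisy, so a regression library that also reports standard errors may be preferable there.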

## Conclusion
This tutorial showed how to collect, align, and merge the data needed for beta estimation and portfolio optimization: monthly market excess returns and risk-free rates from the Fama/French dataset, the T-bond rate and implied equity risk premium from Damodaran's webpage, and historical stock prices from `yfinance`. Because every series is keyed by the same monthly `Date` label, the merged table can be used directly for regression analysis and portfolio construction.

With this foundation, you're ready to estimate betas, explore financial models, or construct efficient portfolios.