# **Obtaining Data with Python for Beta Estimation and Portfolio Optimization**
This tutorial guides you through **downloading, inspecting, and processing financial data** to perform **beta estimation and portfolio optimization**. Each step includes code explanations to help you master these techniques.

## **Overview of Python Libraries for Financial Analysis**
1. **`pandas`**: Used for data manipulation and analysis.
2. **`yfinance`**: Fetches stock price data from Yahoo Finance.
3. **`openpyxl`**: Reads and writes Excel files.
4. **`statsmodels`**: Performs statistical analysis like regression.
5. **`matplotlib`**: Plots data for insights and trends.

## **Installing Required Libraries**

In [37]:
# Uncomment and run this code to install necessary libraries
!pip3 install pandas yfinance openpyxl matplotlib statsmodels seaborn pyarrow PyPortfolioOpt

Collecting pyarrow
  Downloading pyarrow-18.0.0-cp312-cp312-macosx_12_0_arm64.whl.metadata (3.3 kB)
Collecting PyPortfolioOpt
  Downloading pyportfolioopt-1.5.5-py3-none-any.whl.metadata (23 kB)
Collecting cvxpy<2.0.0,>=1.1.19 (from PyPortfolioOpt)
  Downloading cvxpy-1.5.3-cp312-cp312-macosx_10_9_universal2.whl.metadata (8.8 kB)
Collecting osqp>=0.6.2 (from cvxpy<2.0.0,>=1.1.19->PyPortfolioOpt)
  Downloading osqp-0.6.7.post3-cp312-cp312-macosx_11_0_arm64.whl.metadata (1.9 kB)
Collecting ecos>=2 (from cvxpy<2.0.0,>=1.1.19->PyPortfolioOpt)
  Downloading ecos-2.0.14.tar.gz (142 kB)
[2K     [38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m142.4/142.4 kB[0m [31m2.7 MB/s[0m eta [36m0:00:00[0m MB/s[0m eta [36m0:00:01[0m
[?25h  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone
[?25hCollecting clarabel>=0.5.0 (from cvxpy<2.0.0,>=1.1.19->PyPortfolioOpt)
 

## **Step 1: Downloading Data Using Bash Commands**

In [None]:
!mkdir financial_data
!curl -o financial_data/ff_data.csv http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors.CSV
!curl -o financial_data/erp_data.xlsx https://pages.stern.nyu.edu/~adamodar/pc/ERPbymonth.xlsx

#### Code Explanation:
- **`mkdir`**: Creates a new directory for storing the data.
- **`curl`**: Downloads the CSV and Excel datasets from their respective URLs.

## **Step 2: Inspecting the Files**

In [None]:
!ls -lh financial_data
!head -n 10 financial_data/ff_data.csv
!tail -n 10 financial_data/ff_data.csv
!file financial_data/erp_data.xlsx

#### Code Explanation:
- **`ls -lh`**: Lists the contents of the directory with readable sizes.
- **`head`** and **`tail`**: Display the first and last few lines of the CSV file.
- **`file`**: Confirms the format of the downloaded Excel file.


## **Step 3a: Loading Data into Python**

In [None]:
import pandas as pd

# Load the Fama/French data
fama_french = pd.read_csv('financial_data/ff_data.csv', skiprows=3, skipfooter=1, engine='python')
fama_french.rename(columns={'Unnamed: 0': 'Date'}, inplace=True)
fama_french = fama_french.dropna()[fama_french.dropna().Date.map(lambda x: int(x)>9999)]
fama_french['Date'] = pd.to_datetime(fama_french['Date'].astype(str), format='%Y%m').dt.strftime('%Y-%m')
fama_french = fama_french[['Date', 'Mkt-RF', 'RF']]
latest_60_months = fama_french.iloc[-60:].copy()
latest_60_months[['Mkt-RF', 'RF']] = latest_60_months[['Mkt-RF','RF']].apply(pd.to_numeric, errors='coerce', axis=1)
latest_60_months.info()

### **Code Explanation:**

1. **`import pandas as pd`**  
   - Imports the `pandas` library for data manipulation and analysis.

2. **`pd.read_csv()`**  
   - Loads the **Fama-French dataset** from a CSV file.  
   - **`skiprows=3`**: Skips the first three rows (usually headers or metadata).  
   - **`skipfooter=1`**: Skips the last row (likely a summary or footer).  
   - **`engine='python'`**: Uses the Python engine for parsing.

3. **`rename()`**  
   - Renames the first column from `Unnamed: 0` to **`Date`** for clarity.  
   - **`inplace=True`**: Modifies the DataFrame in place.

4. **`dropna()` and `Date.map()`**  
   - Removes rows with missing values.  
   - **`Date.map(lambda x: int(x) > 9999)`**: Filters rows to keep only those with valid **year-month** dates.

5. **`pd.to_datetime()` and `strftime()`**  
   - Converts the **`Date`** column to datetime using the **`%Y%m`** format.  
   - **`strftime('%Y-%m')`**: Formats the dates as `YYYY-MM` strings for consistency.

6. **Column Selection**  
   - Selects the **`Date`**, **`Mkt-RF`** (market excess return), and **`RF`** (risk-free rate) columns for further analysis.

7. **`iloc[-60:]` and `copy()`**  
   - Extracts the **last 60 rows** (the latest 60 months of data) and makes a copy to avoid altering the original DataFrame.

8. **`apply(pd.to_numeric, errors='coerce', axis=1)`**  
   - Converts the **`Mkt-RF`** and **`RF`** columns to numeric values.  
   - **`errors='coerce'`**: Replaces any non-numeric values with NaN.

9. **`info()`**  
   - Displays a **summary of the DataFrame**, including the data types of each column and non-null value counts, to ensure everything is correctly formatted.

## **Step 3b: Download Historical Stock Prices Using Yahoo Finance**

In [59]:
import yfinance as yf
# Updated list of 20 tickers (replacing TWTR with SHOP)
ticker_list = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'TSLA', 'META', 'NFLX', 'NVDA', 'INTC', 'ORCL',
               'IBM', 'ADBE', 'CSCO', 'QCOM', 'AMD', 'SAP', 'PYPL', 'CRM', 'UBER', 'SHOP']

# Define the date range
start_date = '2017-08-01'
end_date = '2022-08-31'

# Download data for multiple tickers
data = yf.download(
    tickers=ticker_list,
    start=start_date,
    end=end_date,
    interval='1mo',
    group_by='ticker'
)

# Extract 'Adj Close' for each ticker and rename to 'Adjusted Price'
adjusted_data = pd.DataFrame()

for ticker in ticker_list:
    # Select and rename the 'Adj Close' column
    ticker_data = data[ticker][['Adj Close']].rename(columns={'Adj Close': 'Adjusted Price'})
    # Add the ticker symbol as a column
    ticker_data['Ticker'] = ticker
    # Append the data to the main DataFrame
    adjusted_data = pd.concat([adjusted_data, ticker_data])

# Reset index to make it more readable
adjusted_data.reset_index(inplace=True)

# Transpose the data using pivot_table
transposed_data = adjusted_data.pivot_table(values='Adjusted Price', index='Date', columns='Ticker')


[*********************100%***********************]  20 of 20 completed


### **Code Explanation:**

1. **`import yfinance as yf`**  
   - Imports the `yfinance` library for fetching stock market data.

2. **`ticker_list`**  
   - A list containing 20 stock tickers, including companies like **Apple (AAPL)**, **Microsoft (MSFT)**, and **Shopify (SHOP)**. These tickers represent the stocks for which data will be retrieved.

3. **`start_date` and `end_date`**  
   - Define the time period for the stock data:  
     - **Start Date:** August 1, 2017.  
     - **End Date:** August 31, 2022.  
   - This range ensures the download includes 5 years of monthly stock data.

4. **`yf.download()`**  
   - Downloads historical stock data for the tickers in the list.  
   - **`tickers=ticker_list`**: Retrieves data for all 20 tickers.  
   - **`interval='1mo'`**: Fetches monthly data.  
   - **`group_by='ticker'`**: Organizes the data so each ticker's data is kept separate within the same DataFrame.

5. **Creating an Empty DataFrame**  
   - Initializes an empty DataFrame called `adjusted_data` to store the cleaned data for all tickers.

6. **Looping Through Tickers**  
   - For each ticker in the list:
     - Extracts the **Adjusted Close** price to account for dividends and stock splits.
     - Renames the **'Adj Close'** column to **'Adjusted Price'** for better clarity.
     - Adds a new column to store the ticker symbol for reference.

7. **Appending Data with `pd.concat()`**  
   - Each ticker’s data is appended to the main DataFrame, combining all the stock data into a single structured DataFrame.

8. **Resetting the Index**  
   - Resets the index to convert the date from the index back to a column for easier manipulation and display.

9. **Using `pivot_table()`**  
   - Transposes the DataFrame, organizing it with **dates as rows** and **tickers as columns**.  
   - This transformation provides a clean view, where each column shows the adjusted prices for a specific ticker over time. 

This final structure makes it easy to analyze the performance of multiple stocks simultaneously, with dates aligned for comparison across the different companies.

## **Step 4: Calculating Stock Returns with `pct_change()`**

In [None]:
returns_data = transposed_data.pct_change() * 100
returns_data = returns_data.rename(columns=lambda x: x + '_return')
returns_data = returns_data.dropna()
returns_data = returns_data.reset_index()
returns_data['Date'] = pd.to_datetime(returns_data['Date'].astype(str), format='%Y-%m-%d %H:%M:%S%z').dt.strftime('%Y-%m')

Ticker,Date,AAPL_return,ADBE_return,AMD_return,AMZN_return,CRM_return,CSCO_return,GOOGL_return,IBM_return,INTC_return,...,MSFT_return,NFLX_return,NVDA_return,ORCL_return,PYPL_return,QCOM_return,SAP_return,SHOP_return,TSLA_return,UBER_return
0,2019-07,7.639464,1.428811,0.263418,-1.417918,1.825597,1.224182,12.504616,7.498187,5.598508,...,1.724378,-12.068501,2.733957,-1.176076,-3.547089,-2.946329,-10.065797,5.907049,8.122253,-9.141875
1,2019-08,-2.018411,-4.801571,3.284072,-4.847382,1.016194,-14.972184,-2.271382,-8.57393,-6.21167,...,1.166808,-9.052909,-0.717161,-7.156758,-1.222825,6.301244,-3.153694,21.237578,-6.622237,-22.710014
2,2019-09,7.703828,-2.90324,-7.821943,-2.273274,-4.888835,5.554355,2.571121,8.561502,9.423992,...,1.18451,-8.895321,4.019026,5.704931,-5.006884,-1.915882,-1.074275,-19.131272,6.763889,-6.447652
3,2019-10,11.068449,0.608142,17.040361,2.34747,5.423065,-3.845395,3.084004,-8.038783,9.703093,...,3.121621,7.394817,15.482272,-0.981281,0.492328,6.290867,12.479852,0.612847,30.742722,3.380376
4,2019-11,7.432887,11.369772,15.384619,1.35873,4.089722,-3.90822,3.59786,0.538421,2.688834,...,5.586971,9.481229,7.820145,3.487612,3.756007,3.866216,2.541865,7.392285,4.769465,-6.031745


### **Code Explanation:**

1. **`transposed_data.pct_change()`**  
   - Calculates the **percentage change** between consecutive rows (monthly returns) for each ticker.  
   - The result shows how much each stock’s price has changed (in percentage) from one month to the next.

2. **`* 100`**  
   - Converts the fractional percentage change values into **actual percentage values** by multiplying by 100.

3. **`rename(columns=lambda x: x + '_return')`**  
   - Renames each column by appending **`_return`** to the original ticker symbol (e.g., `AAPL` becomes `AAPL_return`).  
   - This clearly distinguishes these columns as **return values** rather than raw prices.

4. **`dropna()`**  
   - Removes any **rows with missing values** (NaN). This typically happens if there are no previous data points to compute percentage changes, such as the first row for each ticker.

5. **`reset_index()`**  
   - Resets the DataFrame’s index to make the data easier to read and manipulate.  
   - The original date index becomes a regular column.

6. **`pd.to_datetime()`**  
   - Converts the **'Date'** column into a proper datetime object to ensure consistent date formatting.

7. **`astype(str)`**  
   - Converts the date values into **string format**, so they can be formatted further.

8. **`strftime('%Y-%m')`**  
   - Formats the **Date** column to show only the **year and month** (e.g., `2022-08`), which aligns with the monthly intervals of the data.

This code processes the transposed stock data to calculate and format monthly returns for each stock, ensuring that the data is clean, structured, and ready for further analysis or visualization.

## **Step 5: Aligning Data for Beta Estimation**

In [None]:
# Merge Stock Returns and FF data, Aligning for RF subtraction
stock_ff = pd.merge(returns_data, latest_60_months, left_index=True, right_index=True)

# Subtract 'RF' from each stock's return
rf_adjusted_data = returns_data.iloc[:, 1:].sub(stock_ff['RF'], axis=0)

# Rename columns to indicate the RF adjustment (e.g., 'AAPL-RF')
rf_adjusted_data = rf_adjusted_data.rename(columns=lambda x: f"{x.split('_')[0]}-RF")

# Add the Date column back to the adjusted DataFrame
# rf_adjusted_data.insert(0, 'Date', returns_data['Date']) 

aligned_data = pd.merge(rf_adjusted_data, latest_60_months, how='inner', left_index=True, right_index=True)
#aligned_data.head()

In [88]:
rf_adjusted_data

Ticker,ADBE-RF,AMD-RF,AMZN-RF,CRM-RF,CSCO-RF,GOOGL-RF,IBM-RF,INTC-RF,META-RF,MSFT-RF,NFLX-RF,NVDA-RF,ORCL-RF,PYPL-RF,QCOM-RF,SAP-RF,SHOP-RF,TSLA-RF,UBER-RF
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1
2019-07,1.238811,0.073418,-1.607918,1.635597,1.034182,12.314616,7.308187,5.408508,0.447303,1.534378,-12.258501,2.543957,-1.366076,-3.737089,-3.136329,-10.255797,5.717049,7.932253,-9.331875
2019-08,-4.961571,3.124072,-5.007382,0.856194,-15.132184,-2.431382,-8.73393,-6.37167,-4.567145,1.006808,-9.212909,-0.877161,-7.316758,-1.382825,6.141244,-3.313694,21.077578,-6.782237,-22.870014
2019-09,-3.08324,-8.001943,-2.453274,-5.068835,5.374355,2.391121,8.381502,9.243992,-4.267897,1.00451,-9.075321,3.839026,5.524931,-5.186884,-2.095882,-1.254275,-19.311272,6.583889,-6.627652
2019-10,0.448142,16.880361,2.18747,5.263065,-4.005395,2.924004,-8.198783,9.543093,7.460163,2.961621,7.234817,15.322272,-1.141281,0.332328,6.130867,12.319852,0.452847,30.582722,3.220376
2019-11,11.249772,15.264619,1.23873,3.969722,-4.02822,3.47786,0.418421,2.568834,5.092633,5.466971,9.361229,7.700145,3.367612,3.636007,3.746216,2.421865,7.272285,4.649465,-6.151745
2019-12,6.411869,16.999205,2.472169,-0.293482,5.708581,2.566884,0.73252,3.528188,1.650329,4.38939,2.691629,8.503361,-5.768807,0.008131,5.46143,-1.58171,17.923847,26.649715,0.332971
2020-01,6.337365,2.355825,8.57638,11.96419,-4.279295,6.842578,7.099229,6.687057,-1.757282,7.81545,6.520798,0.350244,-1.130351,5.157974,-2.689805,-2.533169,16.993604,55.385986,21.894214
2020-02,-1.834421,-3.354044,-6.341372,-6.652829,-12.620445,-6.647867,-9.568289,-13.275013,-4.795352,-4.948755,6.817321,14.108306,-5.400975,-5.300439,-8.337071,-5.625854,-0.624661,2.557647,-6.788509
2020-03,-7.918597,-0.13,3.372057,-15.634691,-1.68272,-13.368761,-14.007564,-2.17156,-13.467142,-2.518274,1.623245,-2.46728,-2.414661,-11.473643,-13.731545,-10.706994,-10.140577,-21.685707,-17.697166
2020-04,11.123688,15.193491,26.890012,12.480902,7.809721,15.899997,13.188489,10.827784,22.727819,13.632628,11.81092,10.88011,9.600675,28.47295,17.214798,7.276022,51.653752,49.213732,8.416907


### **Code Explanation:**

1. **Merging DataFrames with `pd.merge()`**  
   - **`aligned_data = pd.merge(returns_data, latest_60_months, how='inner')`**  
     - This merges the **`returns_data`** DataFrame (containing stock returns) with **`latest_60_months`**, likely containing relevant dates and the **risk-free rate (RF)**.  
     - **`how='inner'`** ensures that only matching rows (common dates) from both DataFrames are included in the merged result.

2. **Subtracting `RF` from Stock Returns**  
   - **`rf_adjusted_data = returns_data.iloc[:, 1:].sub(aligned_data['RF'], axis=0)`**  
     - This operation subtracts the **RF** value from each stock’s return.  
     - **`returns_data.iloc[:, 1:]`** selects all columns except the first one (usually the Date column).  
     - **`sub(aligned_data['RF'], axis=0)`** performs element-wise subtraction along the **rows (axis=0)** to ensure the correct RF value is subtracted from each stock's return for every date.

3. **Renaming Columns to Indicate RF Adjustment**  
   - **`rf_adjusted_data.rename(columns=lambda x: f"{x.split('_')[0]}-RF")`**  
     - This renames each column by removing the **`_return`** suffix (e.g., `AAPL_return`) and appending **`-RF`** to indicate the adjusted return (e.g., `AAPL-RF`).

4. **Adding the Date Column Back**  
   - **`rf_adjusted_data.insert(0, 'Date', returns_data['Date'])`**  
     - This inserts the **Date** column at the first position (index 0) to retain the original dates in the adjusted DataFrame for reference.

5. **Displaying the First Few Rows**  
   - **`rf_adjusted_data.head()`**  
     - Displays the first few rows of the **RF-adjusted DataFrame** to confirm the results and inspect the structure.

This code calculates the excess returns for each stock (i.e., return minus risk-free rate) and ensures the resulting DataFrame is properly labeled with dates and ticker symbols, ready for further analysis.

## **Step 6: Performing Regression Analysis to Estimate Beta**

In [47]:
import statsmodels.api as sm

# List of stocks based on column names (excluding 'Date' and 'Mkt-RF')
stocks = [col for col in aligned_data.columns if col not in ['Date', 'Mkt-RF']]

# Independent variable: Market excess returns with intercept added
X = aligned_data[['Mkt-RF']]
X = sm.add_constant(X)

# Run OLS regression for each stock
for stock in stocks:
    Y = aligned_data[stock]  # Dependent variable: Stock's excess return
    model = sm.OLS(Y, X).fit()  # Fit the model
    print(f"Regression Results for {stock}:\n")
    print(model.summary())
    print("\n" + "="*80 + "\n")  # Separator for clarity

Regression Results for AAPL-RF:

                            OLS Regression Results                            
Dep. Variable:                AAPL-RF   R-squared:                       0.605
Model:                            OLS   Adj. R-squared:                  0.594
Method:                 Least Squares   F-statistic:                     55.05
Date:                Tue, 29 Oct 2024   Prob (F-statistic):           9.32e-09
Time:                        15:09:35   Log-Likelihood:                -118.98
No. Observations:                  38   AIC:                             242.0
Df Residuals:                      36   BIC:                             245.2
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.28

### **Code Explanation:**

1. **`import statsmodels.api as sm`**  
   - Imports the **`statsmodels`** library, which provides functions for statistical models, including **Ordinary Least Squares (OLS) regression**.

2. **Extract List of Stocks:**  
   - **`stocks = [col for col in aligned_data.columns if col not in ['Date', 'Mkt-RF']]`**  
     - This line creates a list of stock columns by excluding non-stock columns like `'Date'` and `'Mkt-RF'`.  
     - These stock columns represent the dependent variables (excess returns) for the regression.

3. **Set Independent Variable (`X`):**  
   - **`X = aligned_data[['Mkt-RF']]`**  
     - This selects the **market excess returns** as the independent variable for the regression.  
     - **`Mkt-RF`** represents the difference between market returns and the risk-free rate.

   - **`X = sm.add_constant(X)`**  
     - Adds an **intercept (constant)** to the independent variable. This ensures the model can estimate a non-zero intercept in the regression.

4. **Run OLS Regression in a Loop:**  
   - **`for stock in stocks:`**  
     - Iterates over each stock in the list to perform a separate regression for every stock.

5. **Set Dependent Variable (`Y`):**  
   - **`Y = aligned_data[stock]`**  
     - For each iteration, the dependent variable is the **stock’s excess return**, which is regressed against the market excess return.

6. **Fit the OLS Model:**  
   - **`model = sm.OLS(Y, X).fit()`**  
     - Fits an **OLS regression model** with the stock’s excess return as the dependent variable and market excess return as the independent variable.

7. **Print the Regression Results:**  
   - **`print(f"Regression Results for {stock}:\n")`**  
     - Prints the stock name to indicate which stock’s regression results are being displayed.

   - **`print(model.summary())`**  
     - Displays the **regression summary**, which includes the coefficients, R-squared value, p-values, and other statistical details.

8. **Separator for Clarity:**  
   - **`print("\n" + "="*80 + "\n")`**  
     - Prints a line of `=` symbols between each stock’s results to clearly separate them.

---

### **Summary:**

This code runs an **OLS regression for each stock**, using **market excess returns (Mkt-RF)** as the independent variable and the **stock’s excess return** as the dependent variable. The loop allows the program to handle multiple stocks, and the regression results are printed for each one, making it easy to analyze the relationship between the stock’s performance and the market. The intercept and the slope (Beta) are displayed for every stock, helping to understand its volatility and behavior relative to the market.

In [48]:
import statsmodels.api as sm

# Verify column names to ensure correct format
print(aligned_data.columns)

# Extract the list of stock columns, excluding non-stock columns
stocks = [col for col in aligned_data.columns if col not in ['Date', 'Mkt-RF', 'RF']]

# Independent variable: Market excess returns with an intercept
X = aligned_data[['Mkt-RF']]
X = sm.add_constant(X)

# Loop through each stock to run the regression and analyze Beta
for stock in stocks:
    try:
        # Dependent variable: Stock's excess return
        Y = aligned_data[stock]
        
        # Fit the OLS model
        model = sm.OLS(Y, X).fit()
        
        # Extract the Beta (coefficient for 'Mkt-RF')
        beta = model.params['Mkt-RF']
        
        # Print the Beta value and volatility interpretation
        print(f"Regression Results for {stock}:")
        print(f"Beta (Mkt-RF coefficient): {beta:.4f}")
        
        # Check if the stock is more or less volatile than the market
        if beta > 1:
            print(f"{stock} is more volatile than the market (Beta > 1).")
        elif beta < 1:
            print(f"{stock} is less volatile than the market (Beta < 1).")
        else:
            print(f"{stock} has the same volatility as the market (Beta = 1).")
        
        print("\n" + "="*80 + "\n")  # Separator for clarity

    except KeyError as e:
        print(f"Error: {e}. Check if column '{stock}' exists in aligned_data.")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")


Index(['Date', 'AAPL-RF', 'ADBE-RF', 'AMD-RF', 'AMZN-RF', 'CRM-RF', 'CSCO-RF',
       'GOOGL-RF', 'IBM-RF', 'INTC-RF', 'META-RF', 'MSFT-RF', 'NFLX-RF',
       'NVDA-RF', 'ORCL-RF', 'PYPL-RF', 'QCOM-RF', 'SAP-RF', 'SHOP-RF',
       'TSLA-RF', 'UBER-RF', 'Mkt-RF', 'RF'],
      dtype='object')
Regression Results for AAPL-RF:
Beta (Mkt-RF coefficient): 1.2144
AAPL-RF is more volatile than the market (Beta > 1).


Regression Results for ADBE-RF:
Beta (Mkt-RF coefficient): 1.0780
ADBE-RF is more volatile than the market (Beta > 1).


Regression Results for AMD-RF:
Beta (Mkt-RF coefficient): 1.5302
AMD-RF is more volatile than the market (Beta > 1).


Regression Results for AMZN-RF:
Beta (Mkt-RF coefficient): 1.1697
AMZN-RF is more volatile than the market (Beta > 1).


Regression Results for CRM-RF:
Beta (Mkt-RF coefficient): 1.1345
CRM-RF is more volatile than the market (Beta > 1).


Regression Results for CSCO-RF:
Beta (Mkt-RF coefficient): 0.8877
CSCO-RF is less volatile than the market 

### **Code Explanation:**

1. **`import statsmodels.api as sm`**  
   - Imports the **`statsmodels`** library, which provides tools for statistical modeling, including **Ordinary Least Squares (OLS) regression**.

2. **Printing Column Names:**  
   - **`print(aligned_data.columns)`**  
     - Prints all column names from the `aligned_data` DataFrame to ensure they are correctly formatted. This step helps in troubleshooting column name issues.

3. **Extracting Stock Columns:**  
   - **`stocks = [col for col in aligned_data.columns if col not in ['Date', 'Mkt-RF', 'RF']]`**  
     - Creates a list of stock columns by excluding non-stock columns like `'Date'`, `'Mkt-RF'` (market excess returns), and `'RF'` (risk-free rate). These stock columns will be used as dependent variables in the regressions.

4. **Setting the Independent Variable (`X`):**  
   - **`X = aligned_data[['Mkt-RF']]`**  
     - Selects **market excess returns** as the independent variable.  
   - **`X = sm.add_constant(X)`**  
     - Adds an **intercept (constant)** term to the independent variable to ensure the regression model can estimate a non-zero intercept.

5. **Looping Through Stocks to Run Regressions:**  
   - **`for stock in stocks:`**  
     - Iterates over each stock in the `stocks` list to perform a regression for every stock.

6. **Setting the Dependent Variable (`Y`):**  
   - **`Y = aligned_data[stock]`**  
     - For each iteration, the dependent variable is the **stock’s excess return**.

7. **Fitting the OLS Model:**  
   - **`model = sm.OLS(Y, X).fit()`**  
     - Fits an **OLS regression** model with the stock’s excess return as the dependent variable and **market excess returns (`Mkt-RF`)** as the independent variable.

8. **Extracting and Printing the Beta Coefficient:**  
   - **`beta = model.params['Mkt-RF']`**  
     - Extracts the **Beta coefficient** for `Mkt-RF` from the model’s parameters. Beta measures how the stock moves in relation to the market.  

9. **Printing Results and Interpretation:**  
   - **`print(f"Beta (Mkt-RF coefficient): {beta:.4f}")`**  
     - Prints the Beta value rounded to 4 decimal places.
   - **Volatility Check:**  
     - If **Beta > 1**, the stock is **more volatile than the market**.  
     - If **Beta < 1**, the stock is **less volatile than the market**.  
     - If **Beta = 1**, the stock has the **same volatility as the market**.  
   - The volatility interpretation is printed to provide insights.

10. **Error Handling with `try-except`:**  
   - **`KeyError` Handling:**  
     - If a column is missing from `aligned_data`, a **KeyError** is raised, and the code prints an error message with the missing column name.
   - **General Exception Handling:**  
     - If any other unexpected error occurs, it is caught, and an appropriate message is printed.

11. **Separator for Readability:**  
   - **`print("\n" + "="*80 + "\n")`**  
     - Prints a separator between the results of each stock for clarity.

---

### **Summary:**

This code performs **OLS regressions** for multiple stocks, using **market excess returns (`Mkt-RF`)** as the independent variable and each **stock’s excess return** as the dependent variable. The **Beta coefficient** is extracted and interpreted to determine whether the stock is more or less volatile than the market. **Error handling** ensures that any missing columns or unexpected issues are gracefully managed.

## **Step 7: Portfolio Optimization Using PyPortfolioOpt**

In [None]:
#import pandas as pd
from pypfopt import EfficientFrontier, risk_models, expected_returns

# Ensure 'Date' is the index and filter relevant columns (excluding non-stock columns)
aligned_data.set_index('Date', inplace=True)
stock_data = aligned_data.drop(['Mkt-RF', 'RF'], axis=1)
stock_data.dropna(inplace=True, axis=1)




In [55]:
print(stock_data.isna().sum())

AAPL-RF     0
ADBE-RF     0
AMD-RF      0
AMZN-RF     0
CRM-RF      0
CSCO-RF     0
GOOGL-RF    0
IBM-RF      0
INTC-RF     0
META-RF     0
MSFT-RF     0
NFLX-RF     0
NVDA-RF     0
ORCL-RF     0
PYPL-RF     0
QCOM-RF     0
SAP-RF      0
SHOP-RF     0
TSLA-RF     0
UBER-RF     0
dtype: int64


In [51]:
# 1. Calculate Expected Returns and Covariance Matrix
mu = expected_returns.mean_historical_return(stock_data)  # Mean historical returns
S = risk_models.sample_cov(stock_data)  # Sample covariance matrix

In [54]:
mu

AAPL-RF              NaN
ADBE-RF              NaN
AMD-RF               NaN
AMZN-RF     10385.642939
CRM-RF               NaN
CSCO-RF              NaN
GOOGL-RF             NaN
IBM-RF               NaN
INTC-RF              NaN
META-RF     54325.365056
MSFT-RF              NaN
NFLX-RF        -1.000000
NVDA-RF              NaN
ORCL-RF      3380.714327
PYPL-RF              NaN
QCOM-RF      1318.595545
SAP-RF         -0.657715
SHOP-RF              NaN
TSLA-RF              NaN
UBER-RF              NaN
dtype: float64

In [52]:
# 2. Optimize Portfolio Using Mean-Variance Optimization
ef = EfficientFrontier(mu, S)  # Initialize the Efficient Frontier object

# Maximize the Sharpe ratio (risk-adjusted return)
weights = ef.max_sharpe()
cleaned_weights = ef.clean_weights()  # Clean weights (remove small values)
print("Optimized Portfolio Weights:", cleaned_weights)



ERROR in LDL_factor: Error in KKT matrix LDL factorization when computing the nonzero elements. The problem seems to be non-convex
ERROR in osqp_setup: KKT matrix factorization.
The problem seems to be non-convex.


SolverError: Workspace allocation error!

In [None]:
# 3. Portfolio Performance (Expected Return, Volatility, Sharpe Ratio)
performance = ef.portfolio_performance(verbose=True)

# 4. Convert Weights to a DataFrame for Better Display
weights_df = pd.DataFrame.from_dict(cleaned_weights, orient='index', columns=['Weight'])
weights_df = weights_df[weights_df['Weight'] > 0]  # Filter out zero weights
print("\nOptimized Portfolio Weights:\n", weights_df)