# Non Linear Optimization - Group 5
By: Jiaxin Lin, Alexandra Oteana, Daniil Rusanyuk, Juan Camilo Velasco, Michel Ward


This project uses nonlinear optimization to study and improve the stock portfolios of two political figures, Josh Gottheimer and Scott Franklin. We analyze their top 10 stocks to build a risk-return model and then apply optimization techniques to find their best trading strategies.

# Josh Gottheimer

A brief description of each of his top 10 stocks:

1. Microsoft (MSFT)
A leading tech company known for software (Windows, Office), cloud services (Azure), and gaming (Xbox). Strong growth driven by cloud and subscription products.

2. Apple (AAPL)
Famous for its consumer electronics (iPhone, iPad), services (iCloud, Apple Music), and innovation. One of the most valuable companies globally.

3. Meta (META)
Owner of Facebook, Instagram, WhatsApp, and Oculus. Focuses on social media, digital ads, and the metaverse.

4. Amazon (AMZN)
Global leader in e-commerce and cloud computing (AWS). Also invests in entertainment, logistics, and AI.

5. Johnson & Johnson (JNJ)
A healthcare giant that makes medical devices, pharmaceuticals, and consumer health products. Known for stability and dividend payouts.

6. Goldman Sachs (GS)
A leading investment bank offering financial services like asset management, trading, and mergers & acquisitions.

7. United Airlines (UAL)
One of the largest U.S. airlines, offering domestic and international flights. Stock sensitive to economic conditions and fuel prices.

8. Tesla (TSLA)
Pioneering electric vehicle maker, also focused on renewable energy and autonomous driving technologies.

9. Mastercard (MA)
Global leader in digital payments and financial technologies, facilitating secure electronic transactions.

10. Alphabet (GOOG)
Parent company of Google, YouTube, and Waymo. Dominates online search and digital advertising, while investing in AI and autonomous driving.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from IPython.display import display # Helps to display
import random
import yfinance as yf
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error, r2_score
from math import sqrt
from statsmodels.tsa.statespace.sarimax import SARIMAX

## Importing Dataset

Here we import the dataset for Josh Gottheimer from an Excel workbook that has one sheet per ticker.

In [None]:
import pandas as pd

# Path to the workbook in the Colab session
excel_path = "/content/Josh Gottheimer Final.xlsx"
xls = pd.ExcelFile(excel_path)

df_list = []

for sheet_name in xls.sheet_names:
    # Parse the sheet and strip extra whitespace from column names
    df = xls.parse(sheet_name)
    df.columns = df.columns.str.strip()

    # Convert the Date column to datetime and set it as the index
    # (infer_datetime_format is deprecated in pandas 2.x, so it is omitted here)
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
    df.set_index("Date", inplace=True)

    # Use the sheet name as the ticker identifier
    ticker = sheet_name.strip()

    # Define the ticker-specific column names (these should match exactly your Excel headers)
    open_col      = f"Open_{ticker}"
    high_col      = f"High_{ticker}"
    low_col       = f"Low_{ticker}"
    close_col     = f"Close_{ticker}"
    adj_close_col = f"Adj Close_{ticker}"
    volume_col    = f"Volume_{ticker}"
    buy_col       = f"Buy_{ticker}"
    sell_col      = f"Sell_{ticker}"
    net_col       = f"Holding_{ticker}"

    # Clean numeric columns: remove commas and convert to float for the Buy, Sell, and Holding columns
    for col in [buy_col, sell_col, net_col]:
        if col in df.columns:
            df.loc[:, col] = pd.to_numeric(
                df[col].replace({',': ''}, regex=True), errors="coerce"
            ).fillna(0)

    # --- Calculate Trade Metrics for this ticker ---
    # Create a trade action column specific for this ticker
    trade_action_col = f"Trade_Action_{ticker}"
    def determine_trade_action(row):
        # If both Buy and Sell are > 0, return "Buy & Sell"
        if row[buy_col] > 0 and row[sell_col] > 0:
            return "Buy & Sell"
        elif row[buy_col] > 0:
            return "Buy"
        elif row[sell_col] > 0:
            return "Sell"
        else:
            return "No Trade"
    df[trade_action_col] = df.apply(determine_trade_action, axis=1)

    # Calculate next-day return based on the ticker-specific Close column
    next_day_return_col = f"Next_Day_Return_{ticker}"
    df[next_day_return_col] = df[close_col].pct_change().shift(-1)

    # Evaluate trade impact for this ticker
    trade_impact_col = f"Trade_Impact_{ticker}"
    def evaluate_trade(row):
        action = row[trade_action_col]
        ret = row[next_day_return_col]
        if pd.isnull(ret):
            return "No Data"
        if action == "Buy":
            return "Good Trade" if ret > 0 else "Bad Trade"
        elif action == "Sell":
            return "Good Trade" if ret < 0 else "Bad Trade"
        elif action == "Buy & Sell":
            return "No Difference"
        else:
            return "No Trade"
    df[trade_impact_col] = df.apply(evaluate_trade, axis=1)

    # Calculate the change in holding for this ticker: previous day's holding minus the current day's
    net_change_col = f"Net_Change_{ticker}"
    df[net_change_col] = df[net_col].shift(1) - df[net_col]

    # NEW: Relate portfolio holdings to next-day return.
    # Multiply the holding by the next-day return to get the dollar impact.
    portfolio_impact_col = f"Portfolio_Impact_{ticker}"
    df[portfolio_impact_col] = df[net_col] * df[next_day_return_col]

    # Append the processed DataFrame to our list
    df_list.append(df)

# Concatenate all processed sheets side by side (wide) on the Date index
df_jg = pd.concat(df_list, axis=1, join="outer")
df_jg.sort_index(inplace=True)

# Display final column list and a preview of the wide DataFrame
print("Final columns:")
print(df_jg.columns.tolist())
print(df_jg.head(10))

In [None]:
df_jg.head(100)

Metrics:

Date, Open, High, Low, Close, Adj Close, Volume were imported from Yahoo Finance starting from 01/02/2024 to 02/24/2025.

Added Metrics:

1. Buy = Did they buy on that day?

2. Sell = Did they sell on that day?

  - Imported their buy and sell trade data from Quiver Quantitative.

  - We were given ranges for how much they traded and decided to use the midpoint of each range as our estimate of the traded amount.

3. Holding = How much do they hold in that certain stock on that certain day?

  - Imported how much money they held in each stock from Quiver Quantitative.

4. Trade Action = Was the trade a buy or sell? Or no trade?

5. Next Day Return = Percentage change from that day's closing price to the next trading day's closing price

6. Trade Impact = If they made a trade, was it “Good” or “Bad” the next day? “No Trade” --> not evaluated

7. Net Change = The previous day's holding in that stock minus the current day's holding

8. Portfolio Impact = How much did the value of their holding change because of the next day's price movement, whether they traded or not?

These metrics let us see the whole picture of how a portfolio is performing. They show us how much of the change comes from the market itself and how much comes from the investor's own trading decisions. This helps us understand the underlying reasons for gains or losses in the portfolio.
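As a small illustration of how the derived metrics fit together, the sketch below recomputes Trade Action, Next Day Return, Trade Impact, and Portfolio Impact for a single hypothetical ticker. The prices, amounts, and unsuffixed column names are made-up toy values, not data from the workbook; the logic follows the import cell above.

```python
import pandas as pd

# Toy values for one hypothetical ticker (not taken from the project's workbook)
toy = pd.DataFrame(
    {
        "Close": [100.0, 102.0, 101.0, 104.0],
        "Buy": [8000.0, 0.0, 0.0, 0.0],
        "Sell": [0.0, 0.0, 8000.0, 0.0],
        "Holding": [58000.0, 58000.0, 50000.0, 50000.0],
    },
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)

# Next-day return: change from today's close to the next trading day's close
toy["Next_Day_Return"] = toy["Close"].pct_change().shift(-1)


def trade_action(row):
    # Classify the day by which of Buy / Sell is non-zero
    if row["Buy"] > 0 and row["Sell"] > 0:
        return "Buy & Sell"
    if row["Buy"] > 0:
        return "Buy"
    if row["Sell"] > 0:
        return "Sell"
    return "No Trade"


def trade_impact(row):
    # A buy is "good" if the price rises the next day; a sell is "good" if it falls
    if pd.isnull(row["Next_Day_Return"]):
        return "No Data"
    if row["Trade_Action"] == "Buy":
        return "Good Trade" if row["Next_Day_Return"] > 0 else "Bad Trade"
    if row["Trade_Action"] == "Sell":
        return "Good Trade" if row["Next_Day_Return"] < 0 else "Bad Trade"
    if row["Trade_Action"] == "Buy & Sell":
        return "No Difference"
    return "No Trade"


toy["Trade_Action"] = toy.apply(trade_action, axis=1)
toy["Trade_Impact"] = toy.apply(trade_impact, axis=1)

# Dollar impact of the next day's price move on the current holding
toy["Portfolio_Impact"] = toy["Holding"] * toy["Next_Day_Return"]
print(toy)
```

In this toy example the buy on 2024-01-02 is followed by a 2% rise, so it is labeled a "Good Trade" and the 58,000 holding gains about 1,160; the sell on 2024-01-04 is followed by a rise, so it is labeled a "Bad Trade".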

### Shape of Dataset

In [None]:
df_jg.shape

This dataset has 287 rows and 140 columns.
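The column count can be sanity-checked by rebuilding the expected names: 10 tickers times 14 columns each (9 imported, 5 derived). The sketch below assumes the sheet order matches the stock list at the top of this section; only the first and last tickers are confirmed by the `.info()` output.

```python
# Ticker order is an assumption based on the stock list above
tickers = ["MSFT", "AAPL", "META", "AMZN", "JNJ", "GS", "UAL", "TSLA", "MA", "GOOG"]

# Columns read from the workbook, then columns created in the import loop
imported = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding"]
derived = ["Trade_Action", "Next_Day_Return", "Trade_Impact", "Net_Change", "Portfolio_Impact"]

expected_columns = [f"{name}_{ticker}" for ticker in tickers for name in imported + derived]
print(len(expected_columns))
print(expected_columns[0], expected_columns[-1])
```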

### .info()



In [None]:
df_jg.info()

- The dataframe has 287 rows (indexed by dates from 2024-01-02 to 2025-02-24)
- There are 140 columns, with names running from “Open_MSFT” through “Portfolio_Impact_GOOG.”
- Of the 140 columns, 102 are floating-point, 18 are integers, and 20 are objects (often strings or mixed data).
- The entire DataFrame is using about 316 KB of memory.

### Data Type

In [None]:
# Increase the max rows (or columns) displayed in the console
pd.set_option('display.max_rows', None)  # No limit on rows
# pd.set_option('display.max_columns', None)  # No limit on columns if needed

# Now printing dtypes will not truncate
print("Data types for df_jg columns:")
print(df_jg.dtypes)

Checking if all the data types are accurate.

In [None]:
for col in df_jg.columns:
    if "Trade_Action" in col:
        df_jg[col] = df_jg[col].astype('category')
    elif "Trade_Impact" in col:
        df_jg[col] = df_jg[col].astype('category')

Changing the data type of Trade_Action_{ticker} and Trade_Impact_{ticker} from object to category helps with analysis and uses less memory.
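The memory saving can be checked on a small synthetic column. The label counts below are made up for illustration; only the 287-row length mirrors the dataset.

```python
import pandas as pd

# Synthetic label column: mostly "No Trade", with a handful of trades
labels = pd.Series(["No Trade"] * 280 + ["Buy"] * 4 + ["Sell"] * 3, dtype="object")
as_category = labels.astype("category")

# deep=True counts the Python string objects, not just the pointers to them
object_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"object:   {object_bytes} bytes")
print(f"category: {category_bytes} bytes")
print(as_category.value_counts())
```

A category column stores each distinct label once plus a small integer code per row, which is why it is smaller when the same few labels repeat many times.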

In [None]:
# Increase the max rows (or columns) displayed in the console
pd.set_option('display.max_rows', None)  # No limit on rows
# pd.set_option('display.max_columns', None)  # No limit on columns if needed

# Now printing dtypes will not truncate
print("Data types for df_jg columns:")
print(df_jg.dtypes)

Checking to see if the changes were made.

### Missing Values

In [None]:
df_jg.isnull().sum()

In [None]:
# Drop rows with any missing values
df_jg.dropna(inplace=True)

# Display the DataFrame after removing missing values
print(df_jg.head(10))
df_jg.info()
df_jg.isnull().sum()

In [None]:
df_jg.shape

Checking for missing values. The missing values come from the shifted columns: Next_Day_Return_{ticker} (and Portfolio_Impact_{ticker}, which depends on it) is NaN on the last date, 2025-02-24, because there is no following close to compare against, and Net_Change_{ticker} is NaN on the first date, 2024-01-02, because there is no previous day's holding. We drop these rows.
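To see which rows the shifted columns leave empty, the sketch below applies the same two expressions used in the import cell to a three-day toy series (the values are hypothetical).

```python
import pandas as pd

dates = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"])
close = pd.Series([10.0, 11.0, 12.1], index=dates)
holding = pd.Series([100.0, 100.0, 150.0], index=dates)

# Same expressions as in the import loop
next_day_return = close.pct_change().shift(-1)  # needs tomorrow's close
net_change = holding.shift(1) - holding         # needs yesterday's holding

frame = pd.DataFrame({"Next_Day_Return": next_day_return, "Net_Change": net_change})
print(frame)

# dropna() removes every row that has at least one NaN
cleaned = frame.dropna()
print(cleaned)
```

The first row is missing Net_Change and the last row is missing Next_Day_Return, so `dropna()` removes both ends of the series.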

## Univariate Analysis

### Histograms

#### MSFT

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant MSFT features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
msft_columns = [f"{feature}_MSFT" for feature in features]

# Filter dataset for MSFT, dropping rows with NaNs
df_msft = df_jg[msft_columns].dropna()

# Set up color palette for MSFT
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_MSFT"
    if col_name in df_msft.columns:
        sns.histplot(data=df_msft, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of MSFT {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_MSFT", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

For MSFT, the price-based features and returns have roughly bell-shaped histograms, which suggests that Microsoft's stock price did not move sharply during this period. The Volume, Buy, and Sell histograms are right skewed: trading activity was relatively low on most days, with a few days of very high activity. This is consistent with our research, which showed that Microsoft had low volatility across the year, with small gains and losses, while the occasional high-volume days likely reflect investors with sizable positions deciding to buy or sell.
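A visual reading of "bell shaped" versus "right skewed" can be backed up with a number such as the sample skewness from `Series.skew()`. The sketch below uses synthetic data, not the MSFT columns, purely to show what the two shapes look like numerically; the distribution parameters are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-ins: symmetric "returns" and long-tailed "volume"
symmetric = pd.Series(rng.normal(loc=0.0, scale=0.01, size=1000))
right_skewed = pd.Series(rng.lognormal(mean=16.0, sigma=0.5, size=1000))

# Skewness near 0 indicates symmetry; a clearly positive value indicates a right tail
print("symmetric skew:   ", round(symmetric.skew(), 3))
print("right-skewed skew:", round(right_skewed.skew(), 3))
```

The same call could be applied to any numeric column of `df_jg`, for example the Volume or Next_Day_Return column of a ticker.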

#### JNJ

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant JNJ features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
jnj_columns = [f"{feature}_JNJ" for feature in features]

# Filter dataset for JNJ, dropping rows with NaNs
df_jnj = df_jg[jnj_columns].dropna()

# Set up color palette for JNJ
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_JNJ"
    if col_name in df_jnj.columns:
        sns.histplot(data=df_jnj, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of JNJ {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_JNJ", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

For JNJ, the price distributions are multimodal, which means the stock traded around several distinct price levels during this period. The Buy and Sell histograms are right skewed, which we believe is because the activity consists of just a few large transactions.

#### GOOG

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant GOOG features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
goog_columns = [f"{feature}_GOOG" for feature in features]

# Filter dataset for GOOG, dropping rows with NaNs
df_goog = df_jg[goog_columns].dropna()

# Set up color palette for GOOG
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_GOOG"
    if col_name in df_goog.columns:
        sns.histplot(data=df_goog, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of GOOG {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_GOOG", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

For GOOG, the price-based histograms are approximately normal, with prices ranging between 140 and 200, which shows that Google's stock price remained fairly stable across this period. As with the previous stocks, trading volume was low on most days, with notable spikes on a few days. The histograms also suggest that, although there were few trades, a few of them were very large and were likely made by institutional investors. The next-day returns are also approximately normal, which means that daily returns are usually small and each day's closing price tends to be close to the previous day's close.

#### AAPL

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant AAPL features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
aapl_columns = [f"{feature}_AAPL" for feature in features]

# Filter dataset for AAPL, dropping rows with NaNs
df_aapl = df_jg[aapl_columns].dropna()

# Set up color palette for AAPL
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_AAPL"
    if col_name in df_aapl.columns:
        sns.histplot(data=df_aapl, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of AAPL {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_AAPL", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

For Apple, the price-based histograms are bimodal or multimodal, which means prices clustered around more than one level within a wide range (180-260). The volume distribution is right skewed, so a few days in the year had very high trading activity. Finally, the return histogram is approximately normal, meaning that daily returns are balanced between gains and losses.

#### GS

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant GS features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
gs_columns = [f"{feature}_GS" for feature in features]

# Filter dataset for GS, dropping rows with NaNs
df_gs = df_jg[gs_columns].dropna()

# Set up color palette for GS
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_GS"
    if col_name in df_gs.columns:
        sns.histplot(data=df_gs, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of GS {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_GS", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

GS also has a bimodal or multimodal distribution in its price-based features, with one of the largest price ranges of the period (400 to 650). In other words, this stock fluctuated far more than the previous stocks. As with the previous stocks, trading volume on some days was far higher than on the rest. The next-day return histogram is approximately normal, which lets us conclude that on most days the stock did not change sharply, although a few days saw notably large moves.

#### TSLA

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant TSLA features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
tsla_columns = [f"{feature}_TSLA" for feature in features]

# Filter dataset for TSLA, dropping rows with NaNs
df_tsla = df_jg[tsla_columns].dropna()

# Set up color palette for TSLA
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_TSLA"
    if col_name in df_tsla.columns:
        sns.histplot(data=df_tsla, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of TSLA {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_TSLA", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

These histograms support several conclusions. First, Tesla's price distributions are highly skewed, with prices ranging from 200 to over 450. Second, TSLA is the first stock we see with moderate trading volume throughout the year, although volume still spikes on some days. Finally, the Buy and Sell distributions are very similar and concentrated near 0, which means most trades in this stock were small.

#### MA

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant MA features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
ma_columns = [f"{feature}_MA" for feature in features]

# Filter dataset for MA, dropping rows with NaNs
df_ma = df_jg[ma_columns].dropna()

# Set up color palette for MA
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_MA"
    if col_name in df_ma.columns:
        sns.histplot(data=df_ma, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of MA {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_MA", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

For Mastercard, the price features follow a roughly normal distribution with some multimodal characteristics. The stock traded between 425 and 575, which, compared with the other stocks we have seen so far in the project, indicates low volatility. The Buy and Sell histograms are heavily concentrated near 0, which means most of the transactions are small. Lastly, the Next Day Return histogram is approximately normal and centered on 0, meaning that on most days the stock closes at a price similar to the previous day's close.

#### UAL

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant UAL features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
ual_columns = [f"{feature}_UAL" for feature in features]

# Filter dataset for UAL, dropping rows with NaNs
df_ual = df_jg[ual_columns].dropna()

# Set up color palette for UAL
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_UAL"
    if col_name in df_ual.columns:
        sns.histplot(data=df_ual, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of UAL {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_UAL", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

The United Airlines price histograms are also heavily skewed, and the long right tail means the stock stayed at lower prices for most of the period, with some days of sharp price increases. The trading volume histogram is also highly skewed, which lets us conclude that the stock had extreme volume spikes on some days. Finally, the Buy, Sell, and Holding histograms indicate that Representative Gottheimer did not trade this stock during this period.

#### AMZN

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant AMZN features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
amzn_columns = [f"{feature}_AMZN" for feature in features]

# Filter dataset for AMZN, dropping rows with NaNs
df_amzn = df_jg[amzn_columns].dropna()

# Set up color palette for AMZN
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_AMZN"
    if col_name in df_amzn.columns:
        sns.histplot(data=df_amzn, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of AMZN {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_AMZN", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

The price histograms for this stock are nearly normal with a slight right skew. The flat Buy, Sell, and Holding histograms mean that Representative Gottheimer did not trade this stock during this period. The return histogram is approximately normal, which suggests that Amazon's returns were quite stable and makes the stock look like a relatively low-risk investment.

#### META

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant META features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
meta_columns = [f"{feature}_META" for feature in features]

# Filter dataset for META, dropping rows with NaNs
df_meta = df_jg[meta_columns].dropna()

# Set up color palette for META
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_META"
    if col_name in df_meta.columns:
        sns.histplot(data=df_meta, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of META {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_META", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

Meta's price histograms are slightly right skewed, with a longer tail toward higher prices than toward lower ones. Meta was another stock that fluctuated notably, ranging from 400 to 700, although it spent most of the period between 500 and 600. Some of the histograms are again flat, which tells us that Representative Gottheimer did not buy any of this stock during this period. The return histogram shows a stable pattern, which suggests a relatively low-risk investment opportunity for him.

### Boxplots

#### MSFT

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant MSFT features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
msft_columns = [f"{feature}_MSFT" for feature in features]

# Filter dataset for MSFT
df_msft = df_jg[msft_columns].dropna()

# Set up color palette for MSFT
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_MSFT"
    if col_name in df_msft.columns:
        sns.boxplot(y=df_msft[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"MSFT {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

These boxplots for MSFT show a stable price range across Open, High, Low, Close, and Adjusted Close values. The volume chart has some notable outliers, which means that there were some occasional spikes in trading activity. The Next-Day Return appears to be well distributed with some outliers, indicating some high daily movements.
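The points drawn as outliers in a boxplot follow a simple rule: by default, seaborn extends the whiskers to 1.5 times the interquartile range (IQR) beyond the quartiles and plots anything further out individually. The sketch below applies that rule by hand to a made-up volume series, so the flagged days can be listed rather than only seen.

```python
import pandas as pd

# Hypothetical daily volumes (in millions), with one spike
volume = pd.Series([20, 22, 19, 21, 23, 20, 22, 21, 60, 18], dtype="float64")

q1 = volume.quantile(0.25)
q3 = volume.quantile(0.75)
iqr = q3 - q1

# Fences used by the default boxplot whiskers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = volume[(volume < lower_fence) | (volume > upper_fence)]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"fences: [{lower_fence}, {upper_fence}]")
print(outliers)
```

Here only the spike of 60 falls outside the fences, which is the kind of occasional high-volume day the boxplots highlight.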

#### JNJ

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant JNJ features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
jnj_columns = [f"{feature}_JNJ" for feature in features]

# Filter dataset for JNJ
df_jnj = df_jg[jnj_columns].dropna()

# Set up color palette for JNJ
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_JNJ"
    if col_name in df_jnj.columns:
        sns.boxplot(y=df_jnj[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"JNJ {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

JNJ’s price-related features are tightly clustered with small interquartile ranges, showing lower volatility than the other stocks. Trading volume, however, varies noticeably, with outliers indicating occasional spikes. The Next-Day Return is centered around zero, which suggests a balanced distribution of daily price fluctuations.

#### GOOG

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant GOOG features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
goog_columns = [f"{feature}_GOOG" for feature in features]

# Filter dataset for GOOG
df_goog = df_jg[goog_columns].dropna()

# Set up color palette for GOOG
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_GOOG"
    if col_name in df_goog.columns:
        sns.boxplot(y=df_goog[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"GOOG {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()


Based on these boxplots, GOOG has a wider range in its price-related features, which indicates higher volatility than JNJ. In addition, the trading volume has notable outliers, meaning that a few days in this time frame had unusually high trading activity. The Next-Day Return and Net Change boxplots show balanced distributions with some extreme values, reflecting some large price swings.

#### AAPL

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant AAPL features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
aapl_columns = [f"{feature}_AAPL" for feature in features]

# Filter dataset for AAPL
df_aapl = df_jg[aapl_columns].dropna()

# Set up color palette for AAPL
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_AAPL"
    if col_name in df_aapl.columns:
        sns.boxplot(y=df_aapl[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"AAPL {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

AAPL’s prices span a wider range, with a relatively higher median price level. Volume once again has a high number of outliers, showing moments of elevated trading activity. The Next-Day Return and Net Change are roughly symmetric around their medians, with some extreme values that correspond to short-term price changes.

#### GS

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant GS features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
gs_columns = [f"{feature}_GS" for feature in features]

# Filter dataset for GS
df_gs = df_jg[gs_columns].dropna()

# Set up color palette for GS
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_GS"
    if col_name in df_gs.columns:
        sns.boxplot(y=df_gs[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"GS {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()


The distributions of the GS price-related features show steady trading patterns, with some notable outliers in Volume and Next-Day Return, which again point to occasional high volatility.

#### TSLA

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant TSLA features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
tsla_columns = [f"{feature}_TSLA" for feature in features]

# Filter dataset for TSLA
df_tsla = df_jg[tsla_columns].dropna()

# Set up color palette for TSLA
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_TSLA"
    if col_name in df_tsla.columns:
        sns.boxplot(y=df_tsla[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"TSLA {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

Based on these boxplots, Tesla’s stock is highly volatile, with a wide range in its Open, High, Low, and Close prices and numerous outliers, indicating strong price fluctuations. The trading volume also shows high variation, suggesting some days of very active trading. The Holding distribution is more dispersed, indicating that some positions in TSLA were significantly large. Lastly, the Next-Day Return distribution shows wide variability with some extreme values, which suggests that Tesla is a high-risk but potentially high-reward stock.

#### MA

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant MA features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
ma_columns = [f"{feature}_MA" for feature in features]

# Filter dataset for MA
df_ma = df_jg[ma_columns].dropna()

# Set up color palette for MA
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_MA"
    if col_name in df_ma.columns:
        sns.boxplot(y=df_ma[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"MA {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

Based on these boxplots, Mastercard shows low volatility in its price-related features, while trading volume has occasional spikes. The Buy and Sell distributions are close to zero, indicating that most trades are small transactions rather than large shifts in position. The Next-Day Return is centered around zero, showing steady daily performance, which makes this stock a comparatively low-risk investment.

#### UAL

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant UAL features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
ual_columns = [f"{feature}_UAL" for feature in features]

# Filter dataset for UAL
df_ual = df_jg[ual_columns].dropna()

# Set up color palette for UAL
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_UAL"
    if col_name in df_ual.columns:
        sns.boxplot(y=df_ual[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"UAL {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()


Based on these boxplots, UAL's stock shows significant price volatility with many outliers, indicating frequent sharp movements. Trading volume is also highly skewed, with some extreme spikes. Next-Day Returns are again mostly centered around zero, but the outliers confirm the sharp changes. Overall, UAL shows clear signs of volatility, making it a higher-risk stock.

#### AMZN

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant AMZN features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
amzn_columns = [f"{feature}_AMZN" for feature in features]

# Filter dataset for AMZN
df_amzn = df_jg[amzn_columns].dropna()

# Set up color palette for AMZN
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_AMZN"
    if col_name in df_amzn.columns:
        sns.boxplot(y=df_amzn[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"AMZN {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()


For AMZN, the price-related features show relatively stable distributions with few outliers, indicating steady price behavior. The Volume data, on the other hand, shows significant variation with many outliers, which points to some spikes in trading activity. The Next-Day Return distribution is centered around zero, demonstrating balanced daily performance overall.

#### META

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant META features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
meta_columns = [f"{feature}_META" for feature in features]

# Filter dataset for META
df_meta = df_jg[meta_columns].dropna()

# Set up color palette for META
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_META"
    if col_name in df_meta.columns:
        sns.boxplot(y=df_meta[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"META {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()


Meta’s stock had relatively stable price movements, but it also had some sharp spikes. Trading volume had some high-activity days. The low Buy/Sell activity suggests that this stock was rarely traded in the portfolio during this time period. Returns were mostly stable, but certain days saw notable fluctuations that affected overall portfolio performance.
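The repeated references to outliers in the boxplots above can be checked numerically. The sketch below counts the points that fall outside the 1.5 × IQR whiskers, which is the default rule matplotlib and seaborn use when drawing boxplot outlier markers. The helper name `iqr_outlier_summary` and the demo series are illustrative assumptions, not part of the original notebook; in practice the function would be applied to a column such as `df_jg["Volume_MSFT"]`.

```python
import pandas as pd

def iqr_outlier_summary(series: pd.Series, whis: float = 1.5) -> dict:
    """Return the IQR, whisker limits, and count of points outside the whiskers."""
    s = series.dropna()
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    n_outliers = int(((s < lower) | (s > upper)).sum())
    return {"iqr": float(iqr), "lower": float(lower),
            "upper": float(upper), "n_outliers": n_outliers}

# Synthetic stand-in for a volume column (hypothetical values)
demo = pd.Series([10, 11, 12, 13, 14, 15, 16, 17, 18, 100], name="Volume_DEMO")
summary = iqr_outlier_summary(demo)
print(summary)
```

Comparing `iqr` relative to the median across tickers would give a rough numeric counterpart to the "tight" versus "wide" ranges described for each stock.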

## Analysis of Josh Gottheimer's investment portfolio

Analyzing Representative Josh Gottheimer's investment portfolio reveals a strategic emphasis on technology and large-cap companies, reflecting both his professional background and a focus on stable, long-term growth.

1. Professional Background Influence:

- Microsoft: Gottheimer's substantial allocation to Microsoft (MSFT), at 58.45% of his portfolio, aligns with his prior role as a strategist at the company for 3 years. This significant investment suggests strong confidence in Microsoft's performance and prospects.

2. Portfolio Composition:

- Technology Sector Dominance: Beyond Microsoft, Gottheimer's investments in Apple (AAPL - 3.28%), Alphabet (GOOG - 0.28%), Tesla (TSLA - 0.29%), and Meta Platforms (META - 1.44%) underscore a robust commitment to the technology sector.

- Diversification Across Sectors:

  - Consumer Discretionary: Investments in Amazon (AMZN - 1.61%) and Tesla reflect exposure to e-commerce and innovative automotive industries.

  - Financials: Holdings in Goldman Sachs (GS - 0.33%) and Mastercard (MA - 0.28%) provide access to traditional banking and payment processing sectors.

  - Healthcare: A stake in Johnson & Johnson (JNJ - 0.35%) offers stability through a leading healthcare conglomerate.

  - Transportation: Investment in United Airlines (UAL - 0.32%) indicates exposure to the aviation industry.

3. Investment Strategy Insights:

- Concentration in Familiar Entities: The significant investment in Microsoft suggests a preference for companies where Gottheimer possesses in-depth knowledge, potentially enhancing investment confidence.

- Emphasis on Technology: Allocating over 60% of the portfolio to tech companies indicates a strong belief in the sector's growth potential and resilience.

- Selective Diversification: While technology dominates, the inclusion of companies from various sectors demonstrates a strategy to balance growth with stability, mitigating sector-specific risks.


In summary, Josh Gottheimer's portfolio reflects a strategic blend of leveraging personal expertise and targeting high-growth sectors, complemented by limited diversification aimed at long-term, stable returns.
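As a quick arithmetic check on the "over 60% in tech" statement, the sketch below sums the weights quoted in this section by sector. The sector labels are an assumption for illustration; Tesla is counted under Technology here because it is listed among the technology holdings above, even though it is also mentioned under Consumer Discretionary.

```python
# Portfolio weights (% of portfolio) as quoted in the analysis above
weights = {
    "MSFT": 58.45, "AAPL": 3.28, "META": 1.44, "AMZN": 1.61, "JNJ": 0.35,
    "GS": 0.33, "UAL": 0.32, "TSLA": 0.29, "MA": 0.28, "GOOG": 0.28,
}

# Sector assignment is an illustrative assumption (TSLA counted as Technology)
sectors = {
    "MSFT": "Technology", "AAPL": "Technology", "META": "Technology",
    "GOOG": "Technology", "TSLA": "Technology",
    "AMZN": "Consumer Discretionary",
    "GS": "Financials", "MA": "Financials",
    "JNJ": "Healthcare", "UAL": "Transportation",
}

sector_totals = {}
for ticker, weight in weights.items():
    sector = sectors[ticker]
    sector_totals[sector] = sector_totals.get(sector, 0.0) + weight

tech_share = sector_totals["Technology"]
top10_total = sum(weights.values())

for sector, total in sorted(sector_totals.items(), key=lambda kv: -kv[1]):
    print(f"{sector:<24}{total:6.2f}%")
print(f"Top-10 holdings combined: {top10_total:.2f}%")
```

Under these assumptions the technology names sum to about 63.7%, and the ten listed positions together account for roughly 66.6% of the portfolio, so the non-tech positions in this list contribute under 3 percentage points.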

## Bivariate

### Correlation Matrix

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Extract tickers from column names while excluding non-ticker suffixes
tickers = {col.split('_')[-1] for col in df_jg.columns
           if '_' in col and col.split('_')[-1] not in ['Shares', 'Held']}

# Define relevant features
features = ["Open", "High", "Low", "Close", "Volume", "Adj Close", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Number of tickers
num_tickers = len(tickers)

# Create subplots (one per ticker) with a larger size
fig, axes = plt.subplots(num_tickers, 1, figsize=(10, num_tickers * 7), squeeze=False)

# Loop through tickers to compute and plot correlation matrices
for idx, ticker in enumerate(sorted(tickers)):  # Sort tickers alphabetically for consistency
    # Select relevant columns for the ticker
    ticker_cols = [f"{feature}_{ticker}" for feature in features if f"{feature}_{ticker}" in df_jg.columns]

    # Extract data and drop missing values
    ticker_data = df_jg[ticker_cols].dropna()

    # Ensure there are at least two columns for correlation calculation
    if ticker_data.shape[1] < 2:
        print(f"Skipping correlation matrix for {ticker} due to insufficient data.")
        continue

    # Compute correlation matrix
    ticker_corr = ticker_data.corr()

    # Plot heatmap
    ax = axes[idx, 0]
    sns.heatmap(ticker_corr, annot=True, cmap="coolwarm", fmt=".2f", ax=ax)
    ax.set_title(f"Correlation Matrix for {ticker.upper()}")

plt.tight_layout()
plt.show()



Values range from -1 to 1:
- 1.00 → Perfect positive correlation (both variables move in the same direction).
- -1.00 → Perfect negative correlation (one goes up, the other goes down).
- 0.00 → No linear correlation (the variables show no linear relationship).
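A small toy example makes the three reference values concrete. The column names and numbers below are invented for illustration and are unrelated to `df_jg`; `DataFrame.corr()` computes Pearson correlation by default, the same call used for the heatmaps above.

```python
import pandas as pd

toy = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
toy["same_direction"] = 2 * toy["x"] + 1       # rises whenever x rises
toy["opposite_direction"] = -3 * toy["x"]      # falls whenever x rises
toy["no_linear_link"] = [2, -1, 0, -1, 2]      # symmetric pattern, no linear trend in x

corr = toy.corr()
print(corr["x"].round(2))
```

The last column is a reminder that a correlation of zero only rules out a linear relationship: the values still follow a clear symmetric pattern in `x`.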

Key Observations from Correlation Matrix:

1. Price-Based Features (Open, High, Low, Close, Adj Close)

High correlation (~0.9 - 1.0) between Open, High, Low, Close, and Adj Close.
This is expected since these values track the price movements and are interdependent.

2. Volume vs. Price

Volume may show weak or inverse correlation with Close:
- If Volume ↑ and Close ↓, selling pressure may be driving prices down.
- If Volume ↑ and Close ↑, it suggests strong demand for the stock.

3. Stock-Specific Insights

- For volatile stocks (such as AAPL and TSLA), volume may fluctuate significantly, showing lower correlation with price.
- For stable blue-chip stocks (such as JNJ), volume and price may have a steady relationship, leading to a stronger correlation.

Observations by stock:

1. MSFT (Microsoft)
- Strong positive correlation: Between Open, High, Low, and Close, indicating smooth price movements.
- Moderate correlation: Between Volume and Net_Change, suggesting price shifts occur on high-volume days.
- Weak correlation: Between Next_Day_Return and historical prices, implying some unpredictability.

2. AAPL (Apple)
- High correlation: Between Portfolio_Impact and Net_Change, indicating stock price moves significantly affect portfolios.
- Negative correlation: Between Buy and Next_Day_Return, suggesting traders may buy dips but short-term returns are unpredictable.

3. META (Meta Platforms)
- Strong correlation: Between Next_Day_Return and Portfolio_Impact, reflecting Meta’s volatility.
- Weak correlation: Between Volume and Price, indicating social sentiment likely drives stock movements more than volume alone.

4. AMZN (Amazon)
- Moderate correlation: Between Volume and Net_Change, showing some connection between trading volume and price movements.
- Low correlation: Between Buy signals and next-day returns, meaning immediate price reactions to purchases are weak.

5. JNJ (Johnson & Johnson)
- Low correlation: Across most metrics, suggesting a defensive stock with steady price action.
- Weak Buy correlation: Indicates that buying activity does not drive strong short-term price changes.

6. GS (Goldman Sachs)
- Strong correlation: Between Volume and Net_Change, as financial stocks react quickly to trading flows.
- Moderate inverse correlation: Between Holding and Next_Day_Return, suggesting traders adjust positions frequently.

7. UAL (United Airlines)
- High correlation: Between Volume and Price changes, reflecting travel industry volatility.
- Strong negative correlation: Between Net_Change and Holding, suggesting traders exit positions before large moves.

8. TSLA (Tesla)
- Low correlation: Between most indicators, showing extreme volatility and unpredictability.
- Weak correlation: Between Buy and Next_Day_Return, meaning purchases do not immediately reflect price gains.

9. MA (Mastercard)
- Moderate correlation: Between Holding and Portfolio_Impact, meaning long-term positions drive portfolio value.
- Low correlation: Between Volume and Price, suggesting steady movement without erratic volume shifts.

10. GOOG (Alphabet)
- Strong correlation: Between Next_Day_Return and Portfolio_Impact, showing that stock moves affect holdings significantly.
- Weak correlation: Between Volume and Price, reinforcing that Google’s price changes are not driven by short-term volume spikes.
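Statements such as "strong" or "weak" correlation are easier to verify when the largest entries of each matrix are listed explicitly. The sketch below ranks feature pairs by absolute correlation, skipping the diagonal and mirrored duplicates. The helper `strongest_pairs` and the demo frame are hypothetical; in the notebook the input would be a per-ticker matrix such as `ticker_corr`.

```python
from itertools import combinations

import pandas as pd

def strongest_pairs(corr: pd.DataFrame, top_n: int = 5) -> pd.DataFrame:
    """Rank unique feature pairs by the absolute value of their correlation."""
    rows = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
    pairs = pd.DataFrame(rows, columns=["feature_a", "feature_b", "corr"])
    pairs = pairs.sort_values("corr", key=lambda s: s.abs(), ascending=False)
    return pairs.head(top_n).reset_index(drop=True)

# Synthetic stand-in for one ticker's features (hypothetical values)
demo = pd.DataFrame({
    "Open":   [1.0, 2.0, 3.0, 4.0, 5.0],
    "Close":  [1.1, 2.1, 2.9, 4.2, 5.0],
    "Volume": [5.0, 3.0, 4.0, 1.0, 2.0],
})
top = strongest_pairs(demo.corr())
print(top)
```

Sorting by absolute value keeps strong negative relationships near the top instead of burying them at the bottom of the list.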


## Pair Plots

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Extract tickers from column names while excluding non-ticker suffixes
tickers = {col.split('_')[-1] for col in df_jg.columns
           if '_' in col and col.split('_')[-1] not in ['Shares', 'Held']}

# Define relevant features
features = ["Open", "High", "Low", "Close", "Volume", "Adj Close", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Loop through tickers and generate pairplots
for ticker in sorted(tickers):  # Sort for consistent order
    # Select relevant columns for the ticker
    ticker_cols = [f"{feature}_{ticker}" for feature in features if f"{feature}_{ticker}" in df_jg.columns]

    # Extract data and drop missing values
    ticker_data = df_jg[ticker_cols].dropna()

    # Ensure there are at least two numeric columns
    if ticker_data.shape[1] < 2:
        print(f"Skipping pairplot for {ticker} due to insufficient data.")
        continue

    # Rename columns for better readability in pairplot
    ticker_data = ticker_data.rename(columns={col: col.replace(f"_{ticker}", "") for col in ticker_cols})

    # Generate pairplot
    print(f"Generating pairplot for {ticker}...")
    sns.pairplot(ticker_data, diag_kind="kde", corner=True)  # `corner=True` removes duplicate plots
    plt.suptitle(f"Pairplot for {ticker.upper()}", y=1.02)
    plt.show()



1. Price Features (Open, High, Low, Close, Adj Close)
- Highly correlated (~0.9 - 1.0), as these metrics track the same price movement.
- Diagonal KDE plots show distribution shapes (e.g., normal or skewed).

2. Volume vs. Price (Volume vs. Close)
- If Volume ↑ but Close ↓ → Selling pressure (large sell-offs).
- If Volume ↑ and Close ↑ → Strong demand (few sell offers).

3. Detecting Outliers
- Scatterplots highlight anomalies in price or volume data.
- If a stock has sudden volume spikes, it may indicate news impact or earnings reports.
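Volume spikes of the kind mentioned in point 3 can also be flagged programmatically. The following is a minimal sketch, assuming a spike means volume more than `threshold` standard deviations above the mean of the preceding `window` days. The function name, the 20-day window, and the threshold of 3 are illustrative choices, not values taken from the notebook.

```python
import pandas as pd

def flag_volume_spikes(volume: pd.Series, window: int = 20,
                       threshold: float = 3.0) -> pd.Series:
    """Mark days whose volume is far above the trailing average."""
    prior = volume.shift(1).rolling(window)      # statistics of the previous days only
    z_score = (volume - prior.mean()) / prior.std()
    return z_score > threshold                   # warm-up days (NaN z-score) come out False

# Synthetic volume series with one injected spike (hypothetical values)
values = [100 + 2 * (i % 2) for i in range(30)]
values[25] = 500
demo_volume = pd.Series(values, dtype=float,
                        index=pd.date_range("2024-01-01", periods=30, freq="D"))

spikes = flag_volume_spikes(demo_volume)
print(demo_volume[spikes])
```

Shifting by one day before computing the rolling statistics keeps the spike itself out of its own baseline, so a single extreme day is not partially hidden by the inflated standard deviation it causes.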

## Hypothesis JG

### Does higher Volume_GOOG (x-axis) correlate with lower Close_MSFT (y-axis), and how does it relate to Close_AAPL (by color)?

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import spearmanr

# Load the data
# Ensure df_jg is already created using your previous code

# Define relevant columns
volume_goog = "Volume_GOOG"
close_msft = "Close_MSFT"
close_appl = "Close_AAPL"

# Drop missing values to ensure a clean dataset for analysis
df_clean = df_jg[[volume_goog, close_msft, close_appl]].dropna()

# Compute correlation coefficients
corr_volume_close_msft, _ = spearmanr(df_clean[volume_goog], df_clean[close_msft])
corr_volume_close_appl, _ = spearmanr(df_clean[volume_goog], df_clean[close_appl])
corr_msft_appl, _ = spearmanr(df_clean[close_msft], df_clean[close_appl])

print(f"Spearman correlation between {volume_goog} and {close_msft}: {corr_volume_close_msft:.4f}")
print(f"Spearman correlation between {volume_goog} and {close_appl}: {corr_volume_close_appl:.4f}")
print(f"Spearman correlation between {close_msft} and {close_appl}: {corr_msft_appl:.4f}")

# Create a scatterplot
plt.figure(figsize=(10, 6))
scatter = plt.scatter(
    df_clean[volume_goog],
    df_clean[close_msft],
    c=df_clean[close_appl],
    cmap='coolwarm',
    edgecolor='k',
    alpha=0.7
)
plt.colorbar(label='Close_AAPL')
plt.xlabel("Volume_GOOG")
plt.ylabel("Close_MSFT")
plt.title("Multivariate Scatter Plot: Volume_GOOG vs Close_MSFT (colored by Close_AAPL)")

plt.show()


For this hypothesis, we selected Google (GOOG), Microsoft (MSFT), and Apple (AAPL) because they are part of the "Magnificent 7"—a group of leading tech stocks that have significantly driven market growth in recent years. These companies are among the most influential in the stock market due to their high market capitalization, strong financial performance, and significant impact on major indices like the S&P 500 and Nasdaq.

Given their dominance in the technology sector, we wanted to explore whether trading volume in GOOG correlates with price movements in MSFT, while AAPL’s closing price serves as a reference point to observe broader market trends. The hypothesis is based on the idea that trading patterns in one major tech stock could influence or be influenced by another, as institutional investors often trade them in similar market conditions.

In the scatter plot above, there is no strongly visible pattern between Volume_GOOG and Close_MSFT. The points are fairly dispersed, and while the Spearman coefficient of -0.2781 indicates a weak negative correlation, it is not immediately clear from the visualization.

Additionally, higher Close_AAPL values (red shades) are more concentrated in the upper part of the plot, where Close_MSFT is higher, and lower Close_AAPL values (blue shades) are more concentrated at the bottom, where Close_MSFT is lower.

To better understand the relationship, the next graph adds a trend line to visualize the general direction and strength of the correlation. This makes it easier to judge whether higher Volume_GOOG is actually associated with lower Close_MSFT and whether the relationship is meaningful.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import spearmanr

# Load the data
# Ensure df_jg is already created using your previous code

# Define relevant columns
volume_goog = "Volume_GOOG"
close_msft = "Close_MSFT"
close_appl = "Close_AAPL"

# Drop missing values
df_clean = df_jg[[volume_goog, close_msft, close_appl]].dropna()

# Compute correlation coefficients
corr_volume_close_msft, _ = spearmanr(df_clean[volume_goog], df_clean[close_msft])
corr_volume_close_appl, _ = spearmanr(df_clean[volume_goog], df_clean[close_appl])
corr_msft_appl, _ = spearmanr(df_clean[close_msft], df_clean[close_appl])

print(f"Spearman correlation between {volume_goog} and {close_msft}: {corr_volume_close_msft:.4f}")
print(f"Spearman correlation between {volume_goog} and {close_appl}: {corr_volume_close_appl:.4f}")
print(f"Spearman correlation between {close_msft} and {close_appl}: {corr_msft_appl:.4f}")

# Create a scatterplot with regression trend line
plt.figure(figsize=(10, 6))
scatter = plt.scatter(
    df_clean[volume_goog],
    df_clean[close_msft],
    c=df_clean[close_appl],
    cmap='coolwarm',
    edgecolor='k',
    alpha=0.7
)
plt.colorbar(label='Close_AAPL')

# Add trend line using seaborn's regression plot
sns.regplot(
    x=df_clean[volume_goog],
    y=df_clean[close_msft],
    scatter=False,  # Hide seaborn scatter plot
    line_kws={"color": "black", "linewidth": 2, "linestyle": "--"}  # Trend line style
)

# Labels and title
plt.xlabel("Volume_GOOG")
plt.ylabel("Close_MSFT")
plt.title("Multivariate Scatter Plot with Trend Line: Volume_GOOG vs Close_MSFT (colored by Close_AAPL)")

plt.show()


The trend line is consistent with a weak negative correlation between Volume_GOOG and Close_MSFT, but the relationship is not strong.
Close_AAPL still shows a clear positive correlation with Close_MSFT.
Given the spread of the data and the width of the confidence band, the evidence is not strong enough to support the hypothesis that higher Volume_GOOG correlates with lower Close_MSFT.
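The cells above discard the second value returned by `spearmanr` (assigned to `_`), which is the p-value. A hedged sketch of how the conclusion could be stated more formally is shown below: it tests the null hypothesis of no monotonic association at a chosen significance level. The helper `spearman_test`, the 0.05 level, and the toy data are assumptions for illustration; the toy data is deliberately strongly monotone and does not reproduce the notebook's coefficient of -0.2781.

```python
from scipy.stats import spearmanr

def spearman_test(x, y, alpha: float = 0.05) -> dict:
    """Spearman rank correlation with a reject / fail-to-reject decision."""
    rho, p_value = spearmanr(x, y)
    return {
        "rho": float(rho),
        "p_value": float(p_value),
        # Null hypothesis: no monotonic association between x and y
        "reject_null": bool(p_value < alpha),
    }

# Toy data: y decreases with x except for one swapped pair (hypothetical values)
x = list(range(1, 11))
y = [-1, -2, -4, -3, -5, -6, -7, -8, -9, -10]

result = spearman_test(x, y)
print(result)
```

Applied to the notebook's data, the call would be `spearman_test(df_clean["Volume_GOOG"], df_clean["Close_MSFT"])`. Note that with a large sample even a weak coefficient can be statistically significant, so the size of `rho` and the p-value answer different questions.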

## Time Series

In [None]:
import matplotlib.pyplot as plt

# Identify closing-price columns (startswith avoids also matching "Adj Close_*" columns)
close_columns = [col for col in df_jg.columns if col.startswith("Close_")]

# Check if there are any closing price columns
if not close_columns:
    print("No closing price data found in df_jg.")
else:
    plt.figure(figsize=(12, 6))

    # Plot closing prices for all available stocks
    for col in close_columns:
        plt.plot(df_jg.index, df_jg[col], label=col.replace("Close_", ""))  # Remove 'Close_' for cleaner labels

    # Graph customization
    plt.xlabel("Date")
    plt.ylabel("Closing Price")
    plt.title("Time Series of Stock Closing Prices in df_jg")
    plt.legend(loc="best")
    plt.grid(True)

    # Show the plot
    plt.show()

1. Trend direction
  - Upward trends suggest stocks are performing well.
  - Downward trends indicate declining stock performance.
  - Sideways trends (flat) suggest a consolidation phase.
2. Volatility
  - Highly volatile stocks (rapid ups and downs) may be risky.
  - Smooth trends indicate stable price movement.
3. Relative performance
  - If one stock outperforms others consistently, it might be a stronger investment.
  - Divergences (one stock rising while others fall) might indicate sector rotation or company-specific events.
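Because the stocks trade at very different price levels, relative performance is hard to read from raw closing prices on a shared axis. One common approach, sketched here under the assumption of a frame with `Close_*` columns, is to rebase every series to 100 at the start and to use the standard deviation of daily returns as a simple volatility measure. The tickers `AAA` and `BBB` and their prices are made up for the example.

```python
import pandas as pd

# Hypothetical closing prices for two made-up tickers
prices = pd.DataFrame(
    {"Close_AAA": [100.0, 102.0, 101.0, 105.0],
     "Close_BBB": [20.0, 19.0, 21.0, 20.0]},
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# Rebase each series so that the first observation equals 100
rebased = prices / prices.iloc[0] * 100

# Standard deviation of daily percentage returns as a simple volatility measure
daily_vol = prices.pct_change().std()

print(rebased.round(2))
print(daily_vol.round(4))
```

In this toy data `AAA` ends 5% above its starting level while `BBB` ends flat but swings more from day to day, which is the distinction between trend and volatility drawn in the list above.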


In [None]:
import matplotlib.pyplot as plt

# Identify unique tickers based on column names
tickers = {col.split('_')[-1] for col in df_jg.columns if "Close_" in col}

# Loop through each ticker and plot its closing price separately
for ticker in sorted(tickers):  # Sorting for consistency
    close_col = f"Close_{ticker}"

    # Check if the column exists
    if close_col in df_jg.columns:
        plt.figure(figsize=(10, 5))
        plt.plot(df_jg.index, df_jg[close_col], label=ticker, color='b')

        # Graph customization
        plt.xlabel("Date")
        plt.ylabel("Closing Price")
        plt.title(f"Time Series of {ticker} Closing Prices")
        plt.legend()
        plt.grid(True)

        # Show the plot for each stock
        plt.show()
    else:
        print(f"Skipping {ticker}, as closing price data is missing.")

1.  MSFT (Microsoft Corp.)

  - Trend: Sideways with a mild uptrend

  - Volatility: Low

  - Insights: Microsoft remains strong due to AI and cloud expansion. However, the stock is consolidating and needs a breakout catalyst

2. AAPL (Apple Inc.)

  - Trend: Strong uptrend

  - Volatility: Low to moderate

  - Insights: Apple’s growth is driven by iPhone sales and high-margin services. The stock remains a stable long-term investment

3. META (Meta Platforms Inc.)

  - Trend: Uptrend with a recent pullback

  - Volatility: High

  - Insights: Meta is benefiting from AI and digital advertising, but heavy spending on the Metaverse creates uncertainty

4. AMZN (Amazon Inc.)

  - Trend: Gradual uptrend

  - Volatility: Moderate

5. JNJ (Johnson & Johnson)

  - Trend: Slight downtrend or sideways movement

  - Volatility: Low

6. GS (Goldman Sachs Group Inc.)

  - Trend: Uptrend with fluctuations

  - Volatility: Moderate

7. UAL (United Airlines Holdings Inc.)

  - Trend: Choppy with periodic uptrends

  - Volatility: High

8. TSLA (Tesla Inc.)

  - Trend: Downtrend with volatility

  - Volatility: Very high

9. MA (Mastercard Inc.)

  - Trend: Strong uptrend

  - Volatility: Low

10. GOOG (Alphabet Inc.)

  - Trend: Uptrend with stability
  
  - Volatility: Low

- Tech Stocks (MSFT, AAPL, META, AMZN, GOOG): Strong AI and cloud growth, with META being the most volatile
- Finance & Payments (GS, MA): MA benefits from digital payments, while GS gains from interest rates but faces economic risks
- Cyclical Stocks (UAL, TSLA): UAL is dependent on travel demand, and TSLA is volatile due to market competition
- Healthcare (JNJ): Stable but faces legal risks
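The trend labels above were assigned by inspecting the charts. A rough numeric cross-check, offered only as a sketch, is to fit a straight line to the log of each closing-price series and label the series by the sign of the slope. The function `classify_trend` and the `flat_band` cutoff of 0.0005 per step are arbitrary illustrative choices, not part of the original analysis.

```python
import numpy as np
import pandas as pd

def classify_trend(close: pd.Series, flat_band: float = 0.0005) -> str:
    """Label a price series by the slope of a line fitted to its log prices."""
    y = np.log(close.dropna().to_numpy(dtype=float))
    x = np.arange(len(y))
    slope = np.polyfit(x, y, 1)[0]    # average log-return per time step
    if slope > flat_band:
        return "uptrend"
    if slope < -flat_band:
        return "downtrend"
    return "sideways"

# Synthetic series (hypothetical): +1% per step, -1% per step, and constant
steps = np.arange(60)
rising = pd.Series(100 * 1.01 ** steps)
falling = pd.Series(100 * 0.99 ** steps)
flat = pd.Series(np.full(60, 100.0))

labels = [classify_trend(s) for s in (rising, falling, flat)]
print(labels)
```

A single fitted slope cannot capture patterns such as "uptrend with a recent pullback", so this is best treated as a sanity check on the visual reading rather than a replacement for it.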

In [None]:
import matplotlib.pyplot as plt

# Extract tickers
tickers = {col.split('_')[-1] for col in df_jg.columns if 'Close_' in col}

for ticker in sorted(tickers):  # Sort so plots appear in a consistent order
    buy_col = f"Buy_{ticker}"
    sell_col = f"Sell_{ticker}"

    if buy_col in df_jg.columns and sell_col in df_jg.columns:
        plt.figure(figsize=(10, 6))  # Adjust figure size as needed

        plt.plot(df_jg.index, df_jg[buy_col], label='Buy', color='green')
        plt.plot(df_jg.index, df_jg[sell_col], label='Sell', color='red')

        plt.xlabel("Date")
        plt.ylabel("Value")
        plt.title(f"Buy/Sell Trends for {ticker}")
        plt.legend()
        plt.grid(True)
        plt.show()
    else:
        print(f"Buy or Sell column not found for {ticker}")

- Stable stocks (JNJ, MA, GOOG) have low volatility in buy/sell activity, indicating long-term holding strategies.
- Tech stocks (AAPL, MSFT, META) show predictable buy patterns before events and sell-offs after price surges.
- Speculative stocks (TSLA, UAL, GS) have erratic buy/sell signals, reflecting high volatility and trading speculation.
- Financial stocks (GS, MA) show different behaviors: GS is fast-trading, while MA is stable.


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Extract tickers from column names
tickers = {col.split('_')[-1] for col in df_jg.columns if "Close_" in col}

# Loop through each stock to perform seasonal decomposition
for ticker in sorted(tickers):
    close_col = f"Close_{ticker}"

    # Check if the column exists
    if close_col in df_jg.columns:
        df_stock = df_jg[[close_col]].dropna()  # Select only the relevant column and drop NaNs
        df_stock = df_stock.rename(columns={close_col: "Close"})  # Rename for consistency

        # Perform seasonal decomposition
        try:
            decomposition = seasonal_decompose(df_stock["Close"], model="additive", period=30)  # Adjust period as needed

            # Plot decomposition results
            plt.figure(figsize=(12, 8))

            plt.subplot(411)
            plt.plot(df_stock["Close"], label="Original Data", color="blue")
            plt.legend(loc="upper left")

            plt.subplot(412)
            plt.plot(decomposition.trend, label="Trend", color="green")
            plt.legend(loc="upper left")

            plt.subplot(413)
            plt.plot(decomposition.seasonal, label="Seasonality", color="orange")
            plt.legend(loc="upper left")

            plt.subplot(414)
            plt.plot(decomposition.resid, label="Residuals (Irregular Component)", color="red")
            plt.legend(loc="upper left")

            plt.suptitle(f"Seasonal Decomposition for {ticker}")
            plt.tight_layout()
            plt.show()

        except Exception as e:
            print(f"Skipping {ticker} due to decomposition error: {e}")
    else:
        print(f"Skipping {ticker}, as closing price data is missing.")


1. MSFT (Microsoft Corp.)

  - Trend: Upward, with periods of consolidation.

  - Seasonality: Slight cyclic movements over the 30-observation period set in the decomposition.

  - Residuals: Some unexpected fluctuations but mostly stable.

  - Insights: MSFT benefits from cloud and AI trends, keeping its stock in an uptrend with minor seasonal effects.

2. AAPL (Apple Inc.)

  - Trend: Strong uptrend.

  - Seasonality: Noticeable peaks around earnings announcements.

  - Residuals: Occasional sharp drops, likely linked to product launches and market reactions.

  - Insights: Apple’s stock remains steady, with seasonality influenced by new product releases and earnings cycles.

3. META (Meta Platforms Inc.)

  - Trend: Increasing, but with high volatility.

  - Seasonality: A repeating pattern, potentially tied to advertising revenue fluctuations.

  - Residuals: Larger than other tech stocks, suggesting uncertainty.

  - Insights: META’s heavy investments in AI and Metaverse cause fluctuations, but ad revenue growth supports long-term stability.

4. AMZN (Amazon Inc.)

  - Trend: Gradual uptrend.

  - Seasonality: Strong quarterly cycles, reflecting retail demand (holiday sales).

  - Residuals: Spikes around Q4, confirming the impact of holiday shopping.

  - Insights: AMZN’s stock follows retail seasonality, with clear uptrends in Q4 due to Black Friday and holiday sales.

5. JNJ (Johnson & Johnson)

  - Trend: Slight downward movement.

  - Seasonality: Minimal but present, likely tied to pharmaceutical cycles.

  - Residuals: Low volatility, making it relatively stable.

  - Insights: JNJ is a defensive stock, with little seasonal impact but a recent downward trend due to legal challenges.

6. GS (Goldman Sachs Group Inc.)

  - Trend: Upward but inconsistent.

  - Seasonality: Noticeable cycles, possibly linked to earnings and interest rate changes.

  - Residuals: High fluctuations during financial crises.

  - Insights: GS is influenced by market cycles, interest rates, and economic conditions, making it moderately seasonal.

7. UAL (United Airlines Holdings Inc.)

  - Trend: Choppy, with a long-term uptrend.

  - Seasonality: Strong, reflecting travel demand (peaks in summer and holidays).

  - Residuals: Large unexpected fluctuations, likely due to fuel costs and travel restrictions.

  - Insights: UAL’s stock is highly seasonal, peaking during travel seasons and facing risks from oil price volatility.

8. TSLA (Tesla Inc.)

  - Trend: Volatile, recently declining.

  - Seasonality: Some patterns, potentially influenced by delivery reports.

  - Residuals: High unpredictability due to Elon Musk’s actions and market sentiment.

  - Insights: TSLA faces high speculation, making seasonal patterns less reliable. Delivery announcements and earnings cause major swings.

9. MA (Mastercard Inc.)

  - Trend: Strong upward trend.

  - Seasonality: Noticeable, with spending cycles (e.g., holiday shopping boosts Q4).

  - Residuals: Small, indicating a stable company.

  - Insights: MA benefits from global payment trends, with slight seasonal effects due to consumer spending habits.

10. GOOG (Alphabet Inc.)

  - Trend: Upward and stable.

  - Seasonality: Moderate, linked to ad revenue and tech trends.

  - Residuals: Some volatility, often tied to regulatory news.

  - Insights: GOOG remains stable with minor seasonal dips during weaker ad revenue periods but overall strong growth.
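
The seasonality readings above depend on the `period=30` argument passed to `seasonal_decompose`. As a rough illustration of why, here is a simplified classical additive decomposition written with NumPy. It is a sketch for odd periods only and does not reproduce the statsmodels implementation in detail; the function name and the test series are assumptions made for the example. The point is that the seasonal component is, by construction, a pattern that repeats every `period` observations, whether or not the data contain such a cycle.

```python
import numpy as np

def additive_decompose(values, period):
    """Simplified classical additive decomposition (odd period only)."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    half = period // 2
    # Trend: centred moving average; undefined (NaN) near both ends.
    trend = np.full(n, np.nan)
    kernel = np.ones(period) / period
    trend[half:n - half] = np.convolve(values, kernel, mode="valid")
    # Seasonal: average detrended value at each position within the cycle.
    detrended = values - trend
    pattern = np.array([np.nanmean(detrended[k::period]) for k in range(period)])
    pattern -= pattern.mean()
    seasonal = np.tile(pattern, n // period + 1)[:n]
    resid = values - trend - seasonal
    return trend, seasonal, resid

# Series with a known 7-step pattern on top of a linear trend.
t = np.arange(70)
pattern_true = np.array([2.0, -1.0, 0.5, -0.5, 1.0, -2.0, 0.0])  # sums to zero
series = 0.3 * t + np.tile(pattern_true, 10)
trend, seasonal, resid = additive_decompose(series, period=7)
print(np.round(seasonal[:7], 6))

# Pure noise still yields a perfectly repeating "seasonal" component.
noise = np.random.default_rng(1).normal(size=70)
_, seasonal_noise, _ = additive_decompose(noise, period=7)
print(np.allclose(seasonal_noise[:7], seasonal_noise[7:14]))
```

Comparing the size of the seasonal component with the residuals, or rerunning with a different period, is one way to judge whether a detected cycle is meaningful.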

## Modeling -- BASELINE

Calculating the average adjusted closing price (Adj Close) of each stock, which is used as that stock's coefficient in the model's objective.

In [None]:
from pyomo.environ import *
import pandas as pd

# Identify all columns that contain "Adj Close_"
adj_cols = [col for col in df_jg.columns if col.startswith("Adj Close")]

# Extract tickers (assuming format "Adj Close_{ticker}")
tickers = [col.split("_")[1] for col in adj_cols]

# Loop through each ticker and calculate the average adjusted closing price
avg_adj_close = {}  # Use a dictionary to store the results
for ticker in tickers:
    avg_adj_close[ticker] = df_jg[f"Adj Close_{ticker}"].mean()

# Print the result
print("Average Adj Closing Price per stock:")
for ticker, avg_price in avg_adj_close.items():
    print(f"{ticker}: {avg_price}")

# Create a DataFrame from the dictionary
df_jg_returns = pd.DataFrame(list(avg_adj_close.items()), columns=["Ticker", "Average_Adj_Close"])

# Display the new DataFrame
print("\ndf_jg_returns:")
print(df_jg_returns)

Creating the covariance matrix from the Adj Close prices for modeling.

In [None]:
# Select closing prices for all relevant tickers:
close_prices_all = df_jg[[f"Adj Close_{ticker}" for ticker in tickers]]

# Calculate the covariance matrix:
df_jg_cov = close_prices_all.cov()
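
The two cells above take the mean and the covariance of the adjusted closing prices themselves. Mean-variance models are more commonly stated in terms of periodic returns, which are unit-free and comparable across stocks with very different share prices. The sketch below shows that alternative on synthetic numbers; the function name and the data are assumptions for illustration. With pandas, the equivalent steps would be `pct_change()`, `mean()` and `cov()` on the price columns.

```python
import numpy as np

def return_statistics(prices):
    """Mean vector and covariance matrix of daily simple returns.

    prices: 2-D array, one row per day and one column per asset.
    """
    prices = np.asarray(prices, dtype=float)
    returns = prices[1:] / prices[:-1] - 1.0      # daily simple returns
    mean_returns = returns.mean(axis=0)
    cov_returns = np.cov(returns, rowvar=False)   # sample covariance (ddof=1)
    return mean_returns, cov_returns

# Two hypothetical assets with identical daily returns but different price levels.
daily = np.array([0.01, -0.02, 0.015, 0.005, -0.01])
cheap = 50 * np.cumprod(1 + daily)
pricey = 500 * np.cumprod(1 + daily)
prices = np.column_stack([cheap, pricey])

mu, sigma = return_statistics(prices)
print("mean price:     ", prices.mean(axis=0))          # differs by a factor of 10
print("mean return:    ", mu)                           # identical for both assets
print("price variance: ", prices.var(axis=0, ddof=1))   # differs by a factor of 100
print("return variance:", np.diag(sigma))               # identical for both assets
```

In this toy case, price-based statistics rank the two assets differently even though they behave identically in return terms.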

In [None]:
m = ConcreteModel()

In [None]:
# Decision Variables
m.MSFT = Var(within=NonNegativeReals, bounds= (0,1))
m.JNJ = Var(within=NonNegativeReals, bounds= (0,1))
m.GOOG = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.GS = Var(within=NonNegativeReals, bounds= (0,1))
m.TSLA = Var(within=NonNegativeReals, bounds= (0,1))
m.MA = Var(within=NonNegativeReals, bounds= (0,1))
m.UAL = Var(within=NonNegativeReals, bounds= (0,1))
m.AMZN = Var(within=NonNegativeReals, bounds= (0,1))
m.META = Var(within=NonNegativeReals, bounds= (0,1))

MAX(Return = 419*MSFT + 210*AAPL + 527*META + 190*AMZN + 152*JNJ + 482*GS + 62*UAL + 249*TSLA + 481*MA +168*GOOG)
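
Written out in full, the problem that the following cells build and solve appears to be the one below. The notation is introduced here for readability and is not part of the notebook: $w_i$ is the proportion held in stock $i$, $c_i$ is its objective coefficient (the average adjusted close computed above), $\Sigma$ is the covariance matrix `df_jg_cov`, and $r$ is the risk limit varied in the sweep.

```latex
\begin{aligned}
\max_{w}\quad & \sum_{i=1}^{10} c_i\, w_i \\
\text{s.t.}\quad & \sum_{i=1}^{10}\sum_{j=1}^{10} w_i\, \Sigma_{ij}\, w_j \le r, \\
& \sum_{i=1}^{10} w_i = 1, \\
& \sum_{i=1}^{10} c_i\, w_i \ge 0.01, \\
& 0 \le w_i \le 1, \qquad i = 1,\dots,10 .
\end{aligned}
```

The quadratic risk constraint is what makes the model nonlinear; the objective and the remaining constraints are linear in $w$.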

In [None]:
m.Objective = Objective(expr = df_jg_returns["Average_Adj_Close"].iloc[0] * m.MSFT +
                        df_jg_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_jg_returns["Average_Adj_Close"].iloc[2] * m.META +
                        df_jg_returns["Average_Adj_Close"].iloc[3] * m.AMZN +
                        df_jg_returns["Average_Adj_Close"].iloc[4] * m.JNJ +
                        df_jg_returns["Average_Adj_Close"].iloc[5] * m.GS +
                        df_jg_returns["Average_Adj_Close"].iloc[6] * m.UAL +
                        df_jg_returns["Average_Adj_Close"].iloc[7] * m.TSLA +
                        df_jg_returns["Average_Adj_Close"].iloc[8] * m.MA +
                        df_jg_returns["Average_Adj_Close"].iloc[9] * m.GOOG, sense = maximize)

In [None]:
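# Placeholder so m.total_risk exists; the sweep below deletes it and adds the variance limit instead
m.total_risk = Constraint(expr = m.MSFT + m.AAPL + m.META + m.AMZN + m.JNJ + m.GS + m.UAL + m.TSLA + m.MA + m.GOOG == 0.05)
# The proportions must sum to 1
m.sum_proportion = Constraint(expr = m.MSFT + m.AAPL + m.META + m.AMZN + m.JNJ + m.GS + m.UAL + m.TSLA + m.MA + m.GOOG == 1)
# Require a minimum objective value
m.return_floor = Constraint(expr = m.Objective.expr >= 0.01)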

In [None]:
def calc_risk(m):
  variables = [m.MSFT, m.AAPL, m.META, m.AMZN, m.JNJ, m.GS, m.UAL, m.TSLA, m.MA, m.GOOG]
  tickers = ['Adj Close_MSFT', 'Adj Close_AAPL', 'Adj Close_META', 'Adj Close_AMZN', 'Adj Close_JNJ', 'Adj Close_GS', 'Adj Close_UAL', 'Adj Close_TSLA', 'Adj Close_MA', 'Adj Close_GOOG']
  risk_exp = 0
  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i]*df_jg_cov.at[tickers[i],tickers[j]]*variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk =  0.05

import numpy as np
# np.arange excludes the stop value, so the limits run from 0.0003 to 0.0498 in steps of 0.0005
risk_limits = np.arange(0.0003, max_risk, 0.0005)
risk_limits
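
`calc_risk` builds the portfolio variance term by term as a Pyomo expression. Numerically, the same quantity is the quadratic form w^T Σ w, which makes it easy to check a candidate allocation against a risk limit outside the solver. The sketch below compares the two forms on a made-up 3x3 covariance matrix; the function names and numbers are illustrative assumptions.

```python
import numpy as np

def portfolio_variance_loop(weights, cov):
    """Double sum over i and j, mirroring the structure of calc_risk."""
    total = 0.0
    for i in range(len(weights)):
        for j in range(len(weights)):
            total += weights[i] * cov[i][j] * weights[j]
    return total

def portfolio_variance(weights, cov):
    """Same quantity as a matrix product: w^T Sigma w."""
    w = np.asarray(weights, dtype=float)
    return float(w @ np.asarray(cov, dtype=float) @ w)

# Illustrative symmetric covariance matrix (made-up numbers).
cov = np.array([[0.040, 0.006, 0.002],
                [0.006, 0.010, 0.001],
                [0.002, 0.001, 0.090]])
weights = [0.5, 0.3, 0.2]

print(portfolio_variance_loop(weights, cov))
print(portfolio_variance(weights, cov))
```

Both calls give the same value up to floating-point rounding.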

In [None]:
%matplotlib inline
from pylab import *

import shutil
import sys
import os.path

if not shutil.which("pyomo"):
    !pip install -q pyomo
    assert(shutil.which("pyomo"))

if not shutil.which("ipopt"):
    # here is the IPOPT zip file
    !gdown 10XRvLZqrpSNiXVAN-pipU52BVRwoGcNQ
    !unzip -o -q ipopt-linux64_dw
    assert(shutil.which("ipopt") or os.path.isfile("ipopt"))

from pyomo.environ import *

SOLVER = 'ipopt'
EXECUTABLE = '/content/ipopt'
ipopt_executable = '/content/ipopt'

In [None]:
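param_analysis = {}  # parameter analysis: risk limit -> optimal allocations
returns = {}  # risk limit -> objective value at the optimum
for r in risk_limits:
  # Replace the risk constraint with the current variance limit
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  # solve() returns the results object; write() prints it and returns None
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m)
  result.write()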
  param_analysis[r] = [m.MSFT(), m.AAPL(), m.META(), m.AMZN(), m.JNJ(), m.GS(), m.UAL(), m.TSLA(), m.MA(), m.GOOG()]
  returns[r] =  m.MSFT()*df_jg_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_jg_returns["Average_Adj_Close"].iloc[1] + m.META()*df_jg_returns["Average_Adj_Close"].iloc[2] + m.AMZN()*df_jg_returns["Average_Adj_Close"].iloc[3] + m.JNJ()*df_jg_returns["Average_Adj_Close"].iloc[4] + m.GS()*df_jg_returns["Average_Adj_Close"].iloc[5] + m.UAL()*df_jg_returns["Average_Adj_Close"].iloc[6] + m.TSLA()*df_jg_returns["Average_Adj_Close"].iloc[7] + m.MA()*df_jg_returns["Average_Adj_Close"].iloc[8] + m.GOOG()*df_jg_returns["Average_Adj_Close"].iloc[9]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['MSFT', 'AAPL', 'META', 'AMZN', 'JNJ', 'GS', 'UAL', 'TSLA', 'MA', 'GOOG']  # flat list; a nested list would create a MultiIndex
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling BASELINE: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=5, ncols=2, figsize=(12, 18))
param_analysis.plot(
    subplots=True,
    layout=(5, 2),       # 5 rows, 2 columns
    ax=axes,             # use our custom axes
    sharex=False,
    sharey=False,
    legend=True
)

for row in axes:
    for ax in row:
        ax.set_xlabel("Risk Level")
        ax.set_ylabel("Optimal Allocation")

plt.tight_layout()
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 10))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling BASELINE: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Collect the risk limits and the corresponding objective values
risk = list(returns.keys())
print(f"Risk: {risk}")
reward = list(returns.values())
print(f"Reward: {reward}")
print('\t')

# Plot the efficient frontier
plt.plot(risk, reward, '-.')
plt.title('Modeling BASELINE: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

In [None]:
import matplotlib.pyplot as plt

# Filter data for the specified risk range
filtered_risk = [r for r in risk if 0.042 <= r <= 0.05]
filtered_reward = [reward[risk.index(r)] for r in filtered_risk]

# Create the plot
plt.plot(filtered_risk, filtered_reward, '-.')
plt.title('Efficient Frontier (Risk 0.042 - 0.05)')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.grid(True)  # Add a grid for better readability
plt.show()

The Optimal Stock Allocation for Different Risk Levels charts (a combined line chart, separate per-stock graphs, and a stacked bar chart for better visibility) show how the optimal stock allocation (y-axis) breaks down across different risk levels (x-axis).
  - In this case the risk levels range from 0.0003 to just under 0.05, in steps of 0.0005.

- One stock (colored purple), JNJ, dominates almost all of these bars.
  - This indicates that the model allocates JNJ the highest proportion at most risk levels and always includes it in the portfolio.
- Stocks like GOOG and MSFT are also often included in the portfolio, but at a lower proportion than JNJ.
- MA, AMZN, AAPL, UAL, GS, TSLA, and META enter the portfolio mainly at risk levels of about 0.0388 to 0.0443 (the higher end of the range), and only in minuscule proportions.


The Efficient Frontier shows how returns change with different levels of risk.
  - The non-linear optimization reveals thresholds where a marginal change in the risk constraint triggers a substantial rebalancing of asset allocations, yielding higher expected returns while also increasing volatility.
  - Small changes in risk cause large increases in the portfolio's return.
    - This may be because the portfolio is highly sensitive to minor changes in the risk parameter.
  - At specific risk levels, a tiny increase in allowed risk lets the model pick a very different mix of stocks that can produce significantly higher returns.
    - For example, at a risk level of 0.0428 the model produces an unusual mix of stocks (all 10 stocks) and allocations, suggesting that a critical threshold is reached within the optimization process.
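
Before reading too much into jumps along the frontier, it may be worth checking that every risk limit in the sweep is attainable. If a limit lies below the smallest variance that any fully invested portfolio can reach, the problem has no feasible point, and a solver such as IPOPT may stop at a point that violates a constraint; inspecting the solver's termination status for each limit would reveal this. Because `df_jg_cov` is computed from price levels, its entries are in squared price units, so limits as small as 0.0003 to 0.05 are worth confirming. The sketch below illustrates the idea for two assets, where the problem can be solved in closed form. The function name and all numbers are made up for illustration.

```python
import math

def best_two_asset_portfolio(mu, cov, risk_limit):
    """Maximise return for weights (w, 1 - w), 0 <= w <= 1, with variance <= risk_limit.

    Returns (w, expected_return), or None when no portfolio meets the limit.
    """
    (a, b), (_, c) = cov
    # variance(w) = A*w^2 + B*w + c
    A = a - 2 * b + c
    B = 2 * (b - c)
    # Roots of A*w^2 + B*w + (c - risk_limit) = 0 bound the feasible interval.
    disc = B * B - 4 * A * (c - risk_limit)
    if A <= 0 or disc < 0:
        return None  # limit is below the minimum variance (or degenerate input)
    lo = max(0.0, (-B - math.sqrt(disc)) / (2 * A))
    hi = min(1.0, (-B + math.sqrt(disc)) / (2 * A))
    if lo > hi:
        return None
    # Return is linear in w, so the optimum sits at one end of the interval.
    w = hi if mu[0] >= mu[1] else lo
    return w, mu[0] * w + mu[1] * (1 - w)

mu = (0.12, 0.05)                       # made-up expected returns
cov = [[0.04, 0.002], [0.002, 0.01]]    # made-up covariance matrix
for limit in (0.005, 0.01, 0.02, 0.04):
    print(limit, best_two_asset_portfolio(mu, cov, limit))
```

In this toy case the lowest limit is infeasible and returns `None`, while higher limits shift weight toward the higher-return asset until the limit stops binding.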

### Ranking of Optimal Stock Allocation

In [None]:
# Choose a specific risk limit (for example, the maximum risk limit in your analysis)
selected_risk = param_analysis.index.max()
optimal_allocations = param_analysis.loc[selected_risk]

# Sort the allocations in ascending order
sorted_allocations = optimal_allocations.sort_values()

# Print each stock with its allocation percentage
print(f"Optimal Allocation (sorted ascending) for risk limit {selected_risk}:")
for stock, allocation in sorted_allocations.items():
    print(f"{stock}: {allocation*100:.2f}%")

This confirms that JNJ dominates the portfolio, with an allocation roughly three times that of any other stock.
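
The size of that lead can be checked directly by dividing the largest allocation by the second largest. The helper below is a small sketch of that check; the function name is hypothetical and the proportions are made up, so the values printed by the ranking cell should be substituted.

```python
def dominance_ratio(allocations):
    """Largest allocation divided by the second largest.

    allocations: mapping of ticker -> portfolio proportion.
    """
    ranked = sorted(allocations.items(), key=lambda item: item[1], reverse=True)
    (top, top_w), (second, second_w) = ranked[0], ranked[1]
    if second_w == 0:
        return top, second, float("inf")
    return top, second, top_w / second_w

# Made-up proportions for illustration only.
example = {"JNJ": 0.60, "MSFT": 0.20, "GOOG": 0.15, "MA": 0.05}
print(dominance_ratio(example))
```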

## Modeling w/o JNJ

Our baseline model indicated that JNJ was dominating the portfolio. To explore alternatives, we’re removing JNJ from the decision variables while keeping all other conditions the same, allowing us to see which stock becomes the next most dominant.
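
The cells that follow repeat the baseline with JNJ deleted by hand and with the coefficient positions (`.iloc[5]`, `.iloc[6]`, ...) adjusted manually. As a hedged alternative, the inputs can be restricted by ticker name so that an exclusion is a one-line change. The function below is a hypothetical helper, not part of the notebook; the coefficients are the ones quoted in the objective formula above, and the covariance entries are made up.

```python
import numpy as np

def restrict_universe(tickers, coeffs, cov, excluded):
    """Drop excluded tickers from the coefficient vector and covariance matrix.

    tickers: list of names, in the same order as coeffs and the rows/columns of cov.
    Returns (kept_tickers, kept_coeffs, kept_cov).
    """
    excluded = set(excluded)
    keep = [k for k, name in enumerate(tickers) if name not in excluded]
    coeffs = np.asarray(coeffs, dtype=float)
    cov = np.asarray(cov, dtype=float)
    kept_tickers = [tickers[k] for k in keep]
    return kept_tickers, coeffs[keep], cov[np.ix_(keep, keep)]

tickers = ["MSFT", "JNJ", "GOOG"]
coeffs = [419.0, 152.0, 168.0]      # coefficients from the objective formula
cov = [[4.0, 0.5, 1.0],
       [0.5, 1.0, 0.2],
       [1.0, 0.2, 2.0]]             # made-up covariance entries

kept, c, S = restrict_universe(tickers, coeffs, cov, excluded=["JNJ"])
print(kept)
print(c)
print(S)
```

Selecting by name avoids relying on the row order of `df_jg_returns` when stocks are removed.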

In [None]:
m = ConcreteModel()
# Decision Variables
m.MSFT = Var(within=NonNegativeReals, bounds= (0,1))
m.GOOG = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.GS = Var(within=NonNegativeReals, bounds= (0,1))
m.TSLA = Var(within=NonNegativeReals, bounds= (0,1))
m.MA = Var(within=NonNegativeReals, bounds= (0,1))
m.UAL = Var(within=NonNegativeReals, bounds= (0,1))
m.AMZN = Var(within=NonNegativeReals, bounds= (0,1))
m.META = Var(within=NonNegativeReals, bounds= (0,1))

MAX(Return = 419*MSFT + 210*AAPL + 527*META + 190*AMZN + 482*GS + 62*UAL + 249*TSLA + 481*MA +168*GOOG)

In [None]:
m.Objective = Objective(expr = df_jg_returns["Average_Adj_Close"].iloc[0] * m.MSFT +
                        df_jg_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_jg_returns["Average_Adj_Close"].iloc[2] * m.META +
                        df_jg_returns["Average_Adj_Close"].iloc[3] * m.AMZN +
                        df_jg_returns["Average_Adj_Close"].iloc[5] * m.GS +
                        df_jg_returns["Average_Adj_Close"].iloc[6] * m.UAL +
                        df_jg_returns["Average_Adj_Close"].iloc[7] * m.TSLA +
                        df_jg_returns["Average_Adj_Close"].iloc[8] * m.MA +
                        df_jg_returns["Average_Adj_Close"].iloc[9] * m.GOOG, sense = maximize)

In [None]:
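# Placeholder so m.total_risk exists; the sweep below deletes it and adds the variance limit instead
m.total_risk = Constraint(expr = m.MSFT + m.AAPL + m.META + m.AMZN + m.GS + m.UAL + m.TSLA + m.MA + m.GOOG == 0.05)
# The proportions must sum to 1
m.sum_proportion = Constraint(expr = m.MSFT + m.AAPL + m.META + m.AMZN + m.GS + m.UAL + m.TSLA + m.MA + m.GOOG == 1)
# Require a minimum objective value
m.return_floor = Constraint(expr = m.Objective.expr >= 0.01)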

In [None]:
def calc_risk(m):
  variables = [m.MSFT, m.AAPL, m.META, m.AMZN, m.GS, m.UAL, m.TSLA, m.MA, m.GOOG]
  tickers = ['Adj Close_MSFT', 'Adj Close_AAPL', 'Adj Close_META', 'Adj Close_AMZN', 'Adj Close_GS', 'Adj Close_UAL', 'Adj Close_TSLA', 'Adj Close_MA', 'Adj Close_GOOG']
  risk_exp = 0
  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i]*df_jg_cov.at[tickers[i],tickers[j]]*variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk =  0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005) # take tiny steps
risk_limits

In [None]:
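param_analysis = {}  # parameter analysis: risk limit -> optimal allocations
returns = {}  # risk limit -> objective value at the optimum
for r in risk_limits:
  # Replace the risk constraint with the current variance limit
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  # solve() returns the results object; write() prints it and returns None
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m)
  result.write()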
  param_analysis[r] = [m.MSFT(), m.AAPL(), m.META(), m.AMZN(), m.GS(), m.UAL(), m.TSLA(), m.MA(), m.GOOG()]
  returns[r] =  m.MSFT()*df_jg_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_jg_returns["Average_Adj_Close"].iloc[1] + m.META()*df_jg_returns["Average_Adj_Close"].iloc[2] + m.AMZN()*df_jg_returns["Average_Adj_Close"].iloc[3] + m.GS()*df_jg_returns["Average_Adj_Close"].iloc[5] + m.UAL()*df_jg_returns["Average_Adj_Close"].iloc[6] + m.TSLA()*df_jg_returns["Average_Adj_Close"].iloc[7] + m.MA()*df_jg_returns["Average_Adj_Close"].iloc[8] + m.GOOG()*df_jg_returns["Average_Adj_Close"].iloc[9]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['MSFT', 'AAPL', 'META', 'AMZN', 'GS', 'UAL', 'TSLA', 'MA', 'GOOG']  # flat list; a nested list would create a MultiIndex
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o JNJ: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(12, 18))
param_analysis.plot(
    subplots=True,
    layout=(3, 3),       # 3 rows, 3 columns
    ax=axes,             # use our custom axes
    sharex=False,
    sharey=False,
    legend=True
)

for row in axes:
    for ax in row:
        ax.set_xlabel("Risk Level")
        ax.set_ylabel("Optimal Allocation")

plt.tight_layout()
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 10))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o JNJ: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Collect the risk limits and the corresponding objective values
risk = list(returns.keys())
print(f"Risk: {risk}")
reward = list(returns.values())
print(f"Reward: {reward}")
print('\t')

# Plot the efficient frontier
plt.plot(risk, reward, '-.')
plt.title('Modeling w/o JNJ: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

After taking out JNJ, the next most dominant stock is MSFT, which is expected given the BASELINE model: it was going to be either GOOG or MSFT.
  - We are also starting to see other stocks, such as UAL and GOOG, become more dominant in the Optimal Stock Allocation.
  - Without JNJ, the maximum return has dropped to 250.

## Modeling w/o MSFT

After removing JNJ, MSFT emerged as the most dominant stock. To identify the next most influential asset, we are now excluding both MSFT and JNJ from the decision variables, allowing us to analyze the portfolio composition without these key components.
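
The remove-and-re-solve procedure used across these sections can be summarized as a loop. The sketch below shows its shape with a stand-in solver; `successive_exclusion`, `toy_solve`, and the scores are hypothetical and exist only to make the example runnable. In the notebook, the role of `solve` is played by the Pyomo/IPOPT sweep evaluated at a chosen risk limit.

```python
def successive_exclusion(tickers, solve, rounds):
    """Repeatedly solve, record the largest holding, and remove it.

    solve: callable taking a list of tickers and returning {ticker: weight}.
    Returns the dominant tickers in the order they were removed.
    """
    remaining = list(tickers)
    removed = []
    for _ in range(rounds):
        if len(remaining) < 2:
            break
        weights = solve(remaining)
        dominant = max(weights, key=weights.get)
        removed.append(dominant)
        remaining.remove(dominant)
    return removed

# Stand-in solver for illustration only: weights proportional to made-up scores.
scores = {"JNJ": 5.0, "MSFT": 3.0, "GOOG": 2.0, "AMZN": 1.5, "AAPL": 1.0}

def toy_solve(active):
    total = sum(scores[t] for t in active)
    return {t: scores[t] / total for t in active}

print(successive_exclusion(list(scores), toy_solve, rounds=4))
```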

In [None]:
m = ConcreteModel()
# Decision Variables
m.GOOG = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.GS = Var(within=NonNegativeReals, bounds= (0,1))
m.TSLA = Var(within=NonNegativeReals, bounds= (0,1))
m.MA = Var(within=NonNegativeReals, bounds= (0,1))
m.UAL = Var(within=NonNegativeReals, bounds= (0,1))
m.AMZN = Var(within=NonNegativeReals, bounds= (0,1))
m.META = Var(within=NonNegativeReals, bounds= (0,1))

MAX(Return = 210*AAPL + 527*META + 190*AMZN + 482*GS + 62*UAL + 249*TSLA + 481*MA +168*GOOG)

In [None]:
m.Objective = Objective(expr = df_jg_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_jg_returns["Average_Adj_Close"].iloc[2] * m.META +
                        df_jg_returns["Average_Adj_Close"].iloc[3] * m.AMZN +
                        df_jg_returns["Average_Adj_Close"].iloc[5] * m.GS +
                        df_jg_returns["Average_Adj_Close"].iloc[6] * m.UAL +
                        df_jg_returns["Average_Adj_Close"].iloc[7] * m.TSLA +
                        df_jg_returns["Average_Adj_Close"].iloc[8] * m.MA +
                        df_jg_returns["Average_Adj_Close"].iloc[9] * m.GOOG, sense = maximize)

In [None]:
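# Placeholder so m.total_risk exists; the sweep below deletes it and adds the variance limit instead
m.total_risk = Constraint(expr = m.AAPL + m.META + m.AMZN + m.GS + m.UAL + m.TSLA + m.MA + m.GOOG == 0.05)
# The proportions must sum to 1
m.sum_proportion = Constraint(expr = m.AAPL + m.META + m.AMZN + m.GS + m.UAL + m.TSLA + m.MA + m.GOOG == 1)
# Require a minimum objective value
m.return_floor = Constraint(expr = m.Objective.expr >= 0.01)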

In [None]:
def calc_risk(m):
  variables = [m.AAPL, m.META, m.AMZN, m.GS, m.UAL, m.TSLA, m.MA, m.GOOG]
  tickers = [ 'Adj Close_AAPL', 'Adj Close_META', 'Adj Close_AMZN', 'Adj Close_GS', 'Adj Close_UAL', 'Adj Close_TSLA', 'Adj Close_MA', 'Adj Close_GOOG']
  risk_exp = 0
  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i]*df_jg_cov.at[tickers[i],tickers[j]]*variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk =  0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005) # take tiny steps
risk_limits

In [None]:
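param_analysis = {}  # parameter analysis: risk limit -> optimal allocations
returns = {}  # risk limit -> objective value at the optimum
for r in risk_limits:
  # Replace the risk constraint with the current variance limit
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  # solve() returns the results object; write() prints it and returns None
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m)
  result.write()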
  param_analysis[r] = [m.AAPL(), m.META(), m.AMZN(), m.GS(), m.UAL(), m.TSLA(), m.MA(), m.GOOG()]
  returns[r] = m.AAPL()*df_jg_returns["Average_Adj_Close"].iloc[1] + m.META()*df_jg_returns["Average_Adj_Close"].iloc[2] + m.AMZN()*df_jg_returns["Average_Adj_Close"].iloc[3] + m.GS()*df_jg_returns["Average_Adj_Close"].iloc[5] + m.UAL()*df_jg_returns["Average_Adj_Close"].iloc[6] + m.TSLA()*df_jg_returns["Average_Adj_Close"].iloc[7] + m.MA()*df_jg_returns["Average_Adj_Close"].iloc[8] + m.GOOG()*df_jg_returns["Average_Adj_Close"].iloc[9]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['AAPL', 'META', 'AMZN', 'GS', 'UAL', 'TSLA', 'MA', 'GOOG']  # flat list; a nested list would create a MultiIndex
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o MSFT: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=4, ncols=2, figsize=(12, 18))
param_analysis.plot(
    subplots=True,
    layout=(4, 2),       # 4 rows, 2 columns
    ax=axes,             # use our custom axes
    sharex=False,
    sharey=False,
    legend=True
)

for row in axes:
    for ax in row:
        ax.set_xlabel("Risk Level")
        ax.set_ylabel("Optimal Allocation")

plt.tight_layout()
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 10))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o MSFT: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Collect the risk limits and the corresponding objective values
risk = list(returns.keys())
print(f"Risk: {risk}")
reward = list(returns.values())
print(f"Reward: {reward}")
print('\t')

# Plot the efficient frontier
plt.plot(risk, reward, '-.')
plt.title('Modeling w/o MSFT: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

After taking out MSFT, the next most dominant stock is GOOG, which is expected given the BASELINE model: GOOG, MSFT, and JNJ looked like the three most dominant stocks at the maximum risk level of 5%.
  - Without JNJ and MSFT, the maximum return has dropped sharply to 2%, which indicates that JNJ and MSFT have a significant impact on the portfolio.

## Modeling w/o GOOG

After removing JNJ and MSFT, GOOG emerged as the most dominant stock. To identify the next most influential asset, we are now excluding MSFT, JNJ, and GOOG from the decision variables, allowing us to analyze the portfolio composition without these key components.

In [None]:
m = ConcreteModel()
# Decision Variables
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.GS = Var(within=NonNegativeReals, bounds= (0,1))
m.TSLA = Var(within=NonNegativeReals, bounds= (0,1))
m.MA = Var(within=NonNegativeReals, bounds= (0,1))
m.UAL = Var(within=NonNegativeReals, bounds= (0,1))
m.AMZN = Var(within=NonNegativeReals, bounds= (0,1))
m.META = Var(within=NonNegativeReals, bounds= (0,1))

MAX(Return = 210*AAPL + 527*META + 190*AMZN + 482*GS + 62*UAL + 249*TSLA + 481*MA)

In [None]:
m.Objective = Objective(expr = df_jg_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_jg_returns["Average_Adj_Close"].iloc[2] * m.META +
                        df_jg_returns["Average_Adj_Close"].iloc[3] * m.AMZN +
                        df_jg_returns["Average_Adj_Close"].iloc[5] * m.GS +
                        df_jg_returns["Average_Adj_Close"].iloc[6] * m.UAL +
                        df_jg_returns["Average_Adj_Close"].iloc[7] * m.TSLA +
                        df_jg_returns["Average_Adj_Close"].iloc[8] * m.MA, sense = maximize)

In [None]:
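# Placeholder so m.total_risk exists; the sweep below deletes it and adds the variance limit instead
m.total_risk = Constraint(expr = m.AAPL + m.META + m.AMZN + m.GS + m.UAL + m.TSLA + m.MA == 0.05)
# The proportions must sum to 1
m.sum_proportion = Constraint(expr = m.AAPL + m.META + m.AMZN + m.GS + m.UAL + m.TSLA + m.MA == 1)
# Require a minimum objective value
m.return_floor = Constraint(expr = m.Objective.expr >= 0.01)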

In [None]:
def calc_risk(m):
  variables = [m.AAPL, m.META, m.AMZN, m.GS, m.UAL, m.TSLA, m.MA]
  tickers = [ 'Adj Close_AAPL', 'Adj Close_META', 'Adj Close_AMZN', 'Adj Close_GS', 'Adj Close_UAL', 'Adj Close_TSLA', 'Adj Close_MA']
  risk_exp = 0
  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i]*df_jg_cov.at[tickers[i],tickers[j]]*variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk =  0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005) # take tiny steps
risk_limits

In [None]:
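param_analysis = {}  # parameter analysis: risk limit -> optimal allocations
returns = {}  # risk limit -> objective value at the optimum
for r in risk_limits:
  # Replace the risk constraint with the current variance limit
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  # solve() returns the results object; write() prints it and returns None
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m)
  result.write()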
  param_analysis[r] = [m.AAPL(), m.META(), m.AMZN(), m.GS(), m.UAL(), m.TSLA(), m.MA()]
  returns[r] = m.AAPL()*df_jg_returns["Average_Adj_Close"].iloc[1] + m.META()*df_jg_returns["Average_Adj_Close"].iloc[2] + m.AMZN()*df_jg_returns["Average_Adj_Close"].iloc[3] + m.GS()*df_jg_returns["Average_Adj_Close"].iloc[5] + m.UAL()*df_jg_returns["Average_Adj_Close"].iloc[6] + m.TSLA()*df_jg_returns["Average_Adj_Close"].iloc[7] + m.MA()*df_jg_returns["Average_Adj_Close"].iloc[8]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['AAPL', 'META', 'AMZN', 'GS', 'UAL', 'TSLA', 'MA']  # flat list; a nested list would create a MultiIndex
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o GOOG: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=7, ncols=1, figsize=(12, 18))

# Plot each column on its own subplot
for i, ax in enumerate(axes.flatten()):
    param_analysis.iloc[:, i].plot(ax=ax, legend=True)
    ax.set_xlabel("Risk Level")
    ax.set_ylabel("Optimal Allocation")


plt.tight_layout()
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 10))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o GOOG: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Collect the risk limits and the corresponding objective values
risk = list(returns.keys())
print(f"Risk: {risk}")
reward = list(returns.values())
print(f"Reward: {reward}")
print('\t')

# Plot the efficient frontier
plt.plot(risk, reward, '-.')
plt.title('Modeling w/o GOOG: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

After taking out GOOG, the next most dominant stock is AMZN.
  - We are also starting to see other stocks, such as AAPL, become more prominent in the Optimal Stock Allocation.
  - Without GOOG, the maximum return has stayed at 2%, which means that the portfolio's return potential remains the same even after removing one of its previously dominant stocks.


## Modeling w/o AMZN

After removing JNJ, GOOG, and MSFT, AMZN emerged as the most dominant stock. To identify the next most influential asset, we are now also going to exclude AMZN from the decision variables, allowing us to analyze the portfolio composition without these key components.

In [None]:
m = ConcreteModel()
# Decision Variables
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.GS = Var(within=NonNegativeReals, bounds= (0,1))
m.TSLA = Var(within=NonNegativeReals, bounds= (0,1))
m.MA = Var(within=NonNegativeReals, bounds= (0,1))
m.UAL = Var(within=NonNegativeReals, bounds= (0,1))
m.META = Var(within=NonNegativeReals, bounds= (0,1))

MAX(Return = 210*AAPL + 527*META + 482*GS + 62*UAL + 249*TSLA + 481*MA)

In [None]:
m.Objective = Objective(expr = df_jg_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_jg_returns["Average_Adj_Close"].iloc[2] * m.META +
                        df_jg_returns["Average_Adj_Close"].iloc[5] * m.GS +
                        df_jg_returns["Average_Adj_Close"].iloc[6] * m.UAL +
                        df_jg_returns["Average_Adj_Close"].iloc[7] * m.TSLA +
                        df_jg_returns["Average_Adj_Close"].iloc[8] * m.MA, sense = maximize)

In [None]:
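# Placeholder so that m.total_risk exists; the risk sweep deletes it and adds the actual variance limit
m.total_risk = Constraint(expr = m.AAPL + m.META + m.GS + m.UAL + m.TSLA + m.MA == 0.05)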
m.sum_proportion = Constraint(expr = m.AAPL + m.META + m.GS + m.UAL + m.TSLA + m.MA == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.AAPL, m.META, m.GS, m.UAL, m.TSLA, m.MA]
  tickers = [ 'Adj Close_AAPL', 'Adj Close_META', 'Adj Close_GS', 'Adj Close_UAL', 'Adj Close_TSLA', 'Adj Close_MA']
  risk_exp = 0
  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i]*df_jg_cov.at[tickers[i],tickers[j]]*variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk =  0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005) # take tiny steps
risk_limits

In [None]:
param_analysis = {}  # risk limit -> optimal weights
returns = {}  # risk limit -> portfolio return at the optimum
solver = SolverFactory('ipopt', executable=ipopt_executable)
for r in risk_limits:
  # Swap in the variance limit for this step, then re-solve
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  result = solver.solve(m)
  result.write()
  param_analysis[r] = [m.AAPL(), m.META(), m.GS(), m.UAL(), m.TSLA(), m.MA()]
  returns[r] = m.Objective()  # objective value, i.e. the weighted sum defined in m.Objective

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['AAPL', 'META', 'GS', 'UAL', 'TSLA', 'MA']
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o AMZN: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(12, 18))

# One subplot per stock: pair each flattened axis with its column position
for i, ax in enumerate(axes.flatten()):
    param_analysis.iloc[:, i].plot(ax=ax, legend=True)
    ax.set_xlabel("Risk Level")
    ax.set_ylabel("Optimal Allocation")


plt.tight_layout()
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 10))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o AMZN: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Print the risk limits and the corresponding returns
risk = list(returns.keys())
print(f"Risk: {risk}")
reward = list(returns.values())
print(f"Reward: {reward}")
print()

# Plot the efficient frontier (return as a function of the risk limit)
plt.plot(risk, reward, '-.')
plt.title('Modeling w/o AMZN: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

After taking out AMZN, the next most dominant stock is UAL.
  - UAL now becomes prominent in the Optimal Stock Allocation.
  - Without AMZN, the maximum return drops to 1%, suggesting that the return could keep falling as the previously dominant stocks are removed.
  - Some stocks (META, GS, TSLA, and MA) are not incorporated in the optimized allocation at all, which could indicate that, under the current model parameters and constraints, they do not contribute favorably to the portfolio's overall risk-return balance.
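
Interior-point solvers such as IPOPT typically return very small positive weights rather than exact zeros, so "not incorporated at all" is easier to check with a tolerance. A minimal sketch, assuming a cutoff of 1e-6 and made-up weights; `unused_stocks` is a hypothetical helper.

```python
def unused_stocks(allocations, tickers, tol=1e-6):
    """List tickers whose weight never exceeds tol at any risk limit."""
    return [
        ticker
        for i, ticker in enumerate(tickers)
        if all(weights[i] <= tol for weights in allocations.values())
    ]


# Hypothetical weights for two risk limits (not results from the notebook)
sample = {
    0.0003: [0.35, 1e-9, 2e-9, 0.65, 1e-9, 3e-9],
    0.0008: [0.40, 2e-9, 1e-9, 0.60, 1e-9, 1e-9],
}
tickers = ["AAPL", "META", "GS", "UAL", "TSLA", "MA"]
unused = unused_stocks(sample, tickers)
print(unused)
```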



## Modeling w/o UAL

After removing JNJ, GOOG, MSFT, and AMZN from the decision variables, UAL emerged as the most dominant stock. To uncover the next most influential stock, we now also exclude UAL and analyze the resulting portfolio composition to see which stock becomes the new leader.

In [None]:
m = ConcreteModel()
# Decision Variables
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.GS = Var(within=NonNegativeReals, bounds= (0,1))
m.TSLA = Var(within=NonNegativeReals, bounds= (0,1))
m.MA = Var(within=NonNegativeReals, bounds= (0,1))
m.META = Var(within=NonNegativeReals, bounds= (0,1))

m.Objective = Objective(expr = df_jg_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_jg_returns["Average_Adj_Close"].iloc[2] * m.META +
                        df_jg_returns["Average_Adj_Close"].iloc[5] * m.GS +
                        df_jg_returns["Average_Adj_Close"].iloc[7] * m.TSLA +
                        df_jg_returns["Average_Adj_Close"].iloc[8] * m.MA, sense = maximize)

MAX(Return = 210*AAPL + 527*META + 482*GS + 249*TSLA + 481*MA)

In [None]:
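# Placeholder so that m.total_risk exists; the risk sweep deletes it and adds the actual variance limit
m.total_risk = Constraint(expr = m.AAPL + m.META + m.GS + m.TSLA + m.MA == 0.05)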
m.sum_proportion = Constraint(expr = m.AAPL + m.META + m.GS + m.TSLA + m.MA == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.AAPL, m.META, m.GS, m.TSLA, m.MA]
  tickers = [ 'Adj Close_AAPL', 'Adj Close_META', 'Adj Close_GS', 'Adj Close_TSLA', 'Adj Close_MA']
  risk_exp = 0
  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i]*df_jg_cov.at[tickers[i],tickers[j]]*variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk =  0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005) # take tiny steps
risk_limits

In [None]:
param_analysis = {}  # risk limit -> optimal weights
returns = {}  # risk limit -> portfolio return at the optimum
solver = SolverFactory('ipopt', executable=ipopt_executable)
for r in risk_limits:
  # Swap in the variance limit for this step, then re-solve
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  result = solver.solve(m)
  result.write()
  param_analysis[r] = [m.AAPL(), m.META(), m.GS(), m.TSLA(), m.MA()]
  returns[r] = m.Objective()  # objective value, i.e. the weighted sum defined in m.Objective

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['AAPL', 'META', 'GS', 'TSLA', 'MA']
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o UAL: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=5, ncols=1, figsize=(12, 18))

# One subplot per stock: pair each flattened axis with its column position
for i, ax in enumerate(axes.flatten()):
    param_analysis.iloc[:, i].plot(ax=ax, legend=True)
    ax.set_xlabel("Risk Level")
    ax.set_ylabel("Optimal Allocation")


plt.tight_layout()
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 10))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o UAL: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Print the risk limits and the corresponding returns
risk = list(returns.keys())
print(f"Risk: {risk}")
reward = list(returns.values())
print(f"Reward: {reward}")
print()

# Plot the efficient frontier (return as a function of the risk limit)
plt.plot(risk, reward, '-.')
plt.title('Modeling w/o UAL: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

After taking out UAL, the next most dominant stock is AAPL.
  - Without UAL, the maximum return rises back to 2%. Because the previous model could already set UAL's weight to zero, removing UAL cannot by itself raise the best attainable return, so this increase more likely reflects how the solver behaved at some risk levels than a more efficient allocation of assets.

## Modeling w/o AAPL

After removing JNJ, GOOG, MSFT, AMZN, and UAL from the decision variables, AAPL emerged as the most dominant stock. To uncover the next most influential stock, we now also exclude AAPL and analyze the resulting portfolio composition to see which stock becomes the new leader.

In [None]:
m = ConcreteModel()
# Decision Variables
m.GS = Var(within=NonNegativeReals, bounds= (0,1))
m.TSLA = Var(within=NonNegativeReals, bounds= (0,1))
m.MA = Var(within=NonNegativeReals, bounds= (0,1))
m.META = Var(within=NonNegativeReals, bounds= (0,1))

m.Objective = Objective(expr = df_jg_returns["Average_Adj_Close"].iloc[2] * m.META +
                        df_jg_returns["Average_Adj_Close"].iloc[5] * m.GS +
                        df_jg_returns["Average_Adj_Close"].iloc[7] * m.TSLA +
                        df_jg_returns["Average_Adj_Close"].iloc[8] * m.MA, sense = maximize)

MAX(Return = 527*META + 482*GS + 249*TSLA + 481*MA)

In [None]:
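# Placeholder so that m.total_risk exists; the risk sweep deletes it and adds the actual variance limit
m.total_risk = Constraint(expr = m.META + m.GS + m.TSLA + m.MA == 0.05)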
m.sum_proportion = Constraint(expr = m.META + m.GS + m.TSLA + m.MA == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.META, m.GS, m.TSLA, m.MA]
  tickers = [ 'Adj Close_META', 'Adj Close_GS', 'Adj Close_TSLA', 'Adj Close_MA']
  risk_exp = 0
  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i]*df_jg_cov.at[tickers[i],tickers[j]]*variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk =  0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005) # take tiny steps
risk_limits

In [None]:
param_analysis = {}  # risk limit -> optimal weights
returns = {}  # risk limit -> portfolio return at the optimum
solver = SolverFactory('ipopt', executable=ipopt_executable)
for r in risk_limits:
  # Swap in the variance limit for this step, then re-solve
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  result = solver.solve(m)
  result.write()
  param_analysis[r] = [m.META(), m.GS(), m.TSLA(), m.MA()]
  returns[r] = m.Objective()  # objective value, i.e. the weighted sum defined in m.Objective

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['META', 'GS', 'TSLA', 'MA']
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o AAPL: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 18))

# One subplot per stock: pair each flattened axis with its column position
for i, ax in enumerate(axes.flatten()):
    param_analysis.iloc[:, i].plot(ax=ax, legend=True)
    ax.set_xlabel("Risk Level")
    ax.set_ylabel("Optimal Allocation")


plt.tight_layout()
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 10))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o AAPL: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Print the risk limits and the corresponding returns
risk = list(returns.keys())
print(f"Risk: {risk}")
reward = list(returns.values())
print(f"Reward: {reward}")
print()

# Plot the efficient frontier (return as a function of the risk limit)
plt.plot(risk, reward, '-.')
plt.title('Modeling w/o AAPL: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

After taking out AAPL, the next most dominant stock is MA.
  - Without AAPL, the maximum return rises to 2.5%. Because the previous model could already set AAPL's weight to zero, removing AAPL cannot by itself raise the best attainable return, so this increase more likely reflects how the solver behaved at some risk levels than a more efficient allocation of assets.
  - Other than MA, the remaining stocks are not incorporated, which could indicate that, under the current model parameters, only a very limited set of stocks contributes meaningfully to optimizing the risk-return balance.
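
Each model in this sequence is the previous one with one more weight fixed at zero, so, assuming the same grid of risk limits, its best return can never exceed the previous model's. The sketch below applies that check to the maximum returns reported so far (2%, 1%, 2%, 2.5%); any increase marks a step where individual solves may deserve a second look. `flag_increases` is a hypothetical helper.

```python
def flag_increases(max_returns):
    """Return the steps whose reported maximum return exceeds the previous step's.

    max_returns is a list of (label, value) pairs in removal order.
    """
    flagged = []
    for (_, prev), (label, curr) in zip(max_returns, max_returns[1:]):
        if curr > prev:
            flagged.append(label)
    return flagged


# Maximum returns (in percent) as reported in the text for each step
reported = [
    ("w/o GOOG", 2.0),
    ("w/o AMZN", 1.0),
    ("w/o UAL", 2.0),
    ("w/o AAPL", 2.5),
]
flagged = flag_increases(reported)
print(flagged)
```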

## Modeling w/o MA

After removing JNJ, GOOG, MSFT, AMZN, UAL, and AAPL from the decision variables, MA emerged as the most dominant stock. To uncover the next most influential stock, we now also exclude MA and analyze the resulting portfolio composition to see which stock becomes the new leader.

In [None]:
m = ConcreteModel()
# Decision Variables
m.GS = Var(within=NonNegativeReals, bounds= (0,1))
m.TSLA = Var(within=NonNegativeReals, bounds= (0,1))
m.META = Var(within=NonNegativeReals, bounds= (0,1))

m.Objective = Objective(expr = df_jg_returns["Average_Adj_Close"].iloc[2] * m.META +
                        df_jg_returns["Average_Adj_Close"].iloc[5] * m.GS +
                        df_jg_returns["Average_Adj_Close"].iloc[7] * m.TSLA, sense = maximize)

MAX(Return = 527*META + 482*GS + 249*TSLA)

In [None]:
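# Placeholder so that m.total_risk exists; the risk sweep deletes it and adds the actual variance limit
m.total_risk = Constraint(expr = m.META + m.GS + m.TSLA == 0.05)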
m.sum_proportion = Constraint(expr = m.META + m.GS + m.TSLA == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.META, m.GS, m.TSLA]
  tickers = [ 'Adj Close_META', 'Adj Close_GS', 'Adj Close_TSLA']
  risk_exp = 0
  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i]*df_jg_cov.at[tickers[i],tickers[j]]*variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk =  0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005) # take tiny steps
risk_limits

In [None]:
param_analysis = {}  # risk limit -> optimal weights
returns = {}  # risk limit -> portfolio return at the optimum
solver = SolverFactory('ipopt', executable=ipopt_executable)
for r in risk_limits:
  # Swap in the variance limit for this step, then re-solve
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  result = solver.solve(m)
  result.write()
  param_analysis[r] = [m.META(), m.GS(), m.TSLA()]
  returns[r] = m.Objective()  # objective value, i.e. the weighted sum defined in m.Objective

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['META', 'GS', 'TSLA']
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o MA: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(12, 18))

# One subplot per stock: pair each flattened axis with its column position
for i, ax in enumerate(axes.flatten()):
    param_analysis.iloc[:, i].plot(ax=ax, legend=True)
    ax.set_xlabel("Risk Level")
    ax.set_ylabel("Optimal Allocation")


plt.tight_layout()
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 10))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o MA: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Print the risk limits and the corresponding returns
risk = list(returns.keys())
print(f"Risk: {risk}")
reward = list(returns.values())
print(f"Reward: {reward}")
print()

# Plot the efficient frontier (return as a function of the risk limit)
plt.plot(risk, reward, '-.')
plt.title('Modeling w/o MA: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

After taking out MA, the next most dominant stock is META.
  - Without MA, the maximum return drops significantly to 1.2%, indicating that MA is a critical driver of the portfolio and essential for achieving higher return potential.

## Modeling w/o META

After removing JNJ, GOOG, MSFT, AMZN, UAL, AAPL, and MA from the decision variables, META emerged as the most dominant stock. To uncover the next most influential stock, we now also exclude META and analyze the resulting portfolio composition to see which stock becomes the new leader.

In [None]:
m = ConcreteModel()
# Decision Variables
m.GS = Var(within=NonNegativeReals, bounds= (0,1))
m.TSLA = Var(within=NonNegativeReals, bounds= (0,1))

m.Objective = Objective(expr = df_jg_returns["Average_Adj_Close"].iloc[5] * m.GS +
                        df_jg_returns["Average_Adj_Close"].iloc[7] * m.TSLA, sense = maximize)

MAX(Return = 482*GS + 249*TSLA)

In [None]:
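# Placeholder so that m.total_risk exists; the risk sweep deletes it and adds the actual variance limit
m.total_risk = Constraint(expr = m.GS + m.TSLA == 0.05)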
m.sum_proportion = Constraint(expr = m.GS + m.TSLA == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.GS, m.TSLA]
  tickers = ['Adj Close_GS', 'Adj Close_TSLA']
  risk_exp = 0
  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i]*df_jg_cov.at[tickers[i],tickers[j]]*variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk =  0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005) # take tiny steps
risk_limits

In [None]:
param_analysis = {}  # risk limit -> optimal weights
returns = {}  # risk limit -> portfolio return at the optimum
solver = SolverFactory('ipopt', executable=ipopt_executable)
for r in risk_limits:
  # Swap in the variance limit for this step, then re-solve
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  result = solver.solve(m)
  result.write()
  param_analysis[r] = [m.GS(), m.TSLA()]
  returns[r] = m.Objective()  # objective value, i.e. the weighted sum defined in m.Objective

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['GS', 'TSLA']
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o META: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 18))

# One subplot per stock: pair each flattened axis with its column position
for i, ax in enumerate(axes.flatten()):
    param_analysis.iloc[:, i].plot(ax=ax, legend=True)
    ax.set_xlabel("Risk Level")
    ax.set_ylabel("Optimal Allocation")


plt.tight_layout()
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 10))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o META: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Print the risk limits and the corresponding returns
risk = list(returns.keys())
print(f"Risk: {risk}")
reward = list(returns.values())
print(f"Reward: {reward}")
print()

# Plot the efficient frontier (return as a function of the risk limit)
plt.plot(risk, reward, '-.')
plt.title('Modeling w/o META: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

After taking out META, the next most dominant stock is TSLA.
  - Without META, the maximum return drops slightly to 1%, indicating that META was contributing meaningfully to the portfolio's return potential.
  - Additionally, the optimized allocations for TSLA and GS are very close to each other, suggesting that both stocks are nearly equally effective in driving the portfolio's performance.
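
With only two stocks and weights that sum to 1, the portfolio variance depends on a single number, the GS weight `w`: it is `w**2 * var_gs + 2*w*(1-w) * cov + (1-w)**2 * var_tsla`, which is what the double loop in `calc_risk` expands to for two assets. The sketch below evaluates that expression; the variances and covariance are placeholders, not the entries of `df_jg_cov`.

```python
def two_asset_variance(w, var_a, var_b, cov_ab):
    """Variance of a portfolio holding w in asset A and 1 - w in asset B."""
    return w**2 * var_a + 2 * w * (1 - w) * cov_ab + (1 - w) ** 2 * var_b


# Placeholder (hypothetical) variances and covariance
var_gs, var_tsla, cov = 0.0004, 0.0016, 0.0002

for w in (0.0, 0.5, 1.0):
    print(w, two_asset_variance(w, var_gs, var_tsla, cov))
```

At `w = 0` or `w = 1` the expression reduces to the variance of the single stock held, which is a quick way to sanity-check the covariance lookup.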

## Modeling w/o TSLA

After removing JNJ, GOOG, MSFT, AMZN, UAL, AAPL, MA, and META from the decision variables, TSLA emerged as the most dominant stock. We are going to analyze how GS does by itself.

In [None]:
m = ConcreteModel()
# Decision Variables
m.GS = Var(within=NonNegativeReals, bounds= (0,1))

MAX(Return = 482*GS)

In [None]:
m.Objective = Objective(expr = df_jg_returns["Average_Adj_Close"].iloc[5] * m.GS, sense = maximize)

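# Placeholder so that m.total_risk exists; the risk sweep deletes it and adds the actual variance limit
m.total_risk = Constraint(expr = m.GS == 0.05)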
m.sum_proportion = Constraint(expr = m.GS == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.GS]
  tickers = ['Adj Close_GS']
  risk_exp = 0
  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i]*df_jg_cov.at[tickers[i],tickers[j]]*variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk =  0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005) # take tiny steps
risk_limits

In [None]:
param_analysis = {}  # risk limit -> optimal weights
returns = {}  # risk limit -> portfolio return at the optimum
solver = SolverFactory('ipopt', executable=ipopt_executable)
for r in risk_limits:
  # Swap in the variance limit for this step, then re-solve
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  result = solver.solve(m)
  result.write()
  param_analysis[r] = [m.GS()]
  returns[r] = m.Objective()  # objective value, i.e. the weighted sum defined in m.Objective

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['GS']
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o TSLA: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Separate graph for the single remaining stock
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(12, 18))

# With a single subplot, plt.subplots returns one Axes object, so plot on it directly
param_analysis.iloc[:, 0].plot(ax=axes, legend=True)
axes.set_xlabel("Risk Level")
axes.set_ylabel("Optimal Allocation")


plt.tight_layout()
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 10))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o TSLA: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Print the risk limits and the corresponding returns
risk = list(returns.keys())
print(f"Risk: {risk}")
reward = list(returns.values())
print(f"Reward: {reward}")
print()

# Plot the efficient frontier (return as a function of the risk limit)
plt.plot(risk, reward, '-.')
plt.title('Modeling w/o TSLA: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

After removing all decision variables except for GS, the maximum return is just 1.2%. This highlights that, without the dominant stocks, the portfolio's return potential collapses dramatically, from 300% to 1.2%.
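
With GS as the only variable, the constraint that the weights sum to 1 forces GS = 1, so the portfolio variance is fixed at the GS variance and any risk limit below it leaves no feasible portfolio. A small sketch of that check over the same grid of risk limits, using a placeholder variance rather than the value in `df_jg_cov`; `feasible_limits` is a hypothetical helper.

```python
def feasible_limits(risk_limits, asset_variance):
    """Risk limits that a portfolio fully invested in one asset can satisfy."""
    return [r for r in risk_limits if asset_variance <= r]


# Same values as np.arange(0.0003, 0.05, 0.0005), built without NumPy
risk_limits = [0.0003 + 0.0005 * k for k in range(100)]

gs_variance = 0.0010  # placeholder value, not taken from df_jg_cov
ok = feasible_limits(risk_limits, gs_variance)
print(len(ok), "of", len(risk_limits), "risk limits are feasible")
```

For the infeasible limits the solver cannot return a meaningful allocation, so those points of the sweep are best read with care.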

## Modeling 10%

In [None]:
m = ConcreteModel()
# Decision Variables
m.MSFT = Var(within=NonNegativeReals, bounds= (0,1))
m.JNJ = Var(within=NonNegativeReals, bounds= (0,1))
m.GOOG = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.GS = Var(within=NonNegativeReals, bounds= (0,1))
m.TSLA = Var(within=NonNegativeReals, bounds= (0,1))
m.MA = Var(within=NonNegativeReals, bounds= (0,1))
m.UAL = Var(within=NonNegativeReals, bounds= (0,1))
m.AMZN = Var(within=NonNegativeReals, bounds= (0,1))
m.META = Var(within=NonNegativeReals, bounds= (0,1))

In [None]:
m.Objective = Objective(expr = df_jg_returns["Average_Adj_Close"].iloc[0] * m.MSFT +
                        df_jg_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_jg_returns["Average_Adj_Close"].iloc[2] * m.META +
                        df_jg_returns["Average_Adj_Close"].iloc[3] * m.AMZN +
                        df_jg_returns["Average_Adj_Close"].iloc[4] * m.JNJ +
                        df_jg_returns["Average_Adj_Close"].iloc[5] * m.GS +
                        df_jg_returns["Average_Adj_Close"].iloc[6] * m.UAL +
                        df_jg_returns["Average_Adj_Close"].iloc[7] * m.TSLA +
                        df_jg_returns["Average_Adj_Close"].iloc[8] * m.MA +
                        df_jg_returns["Average_Adj_Close"].iloc[9] * m.GOOG, sense = maximize)

MAX(Return = 419*MSFT + 210*AAPL + 527*META + 190*AMZN + 152*JNJ + 482*GS + 62*UAL + 249*TSLA + 481*MA +168*GOOG)

In [None]:
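# Placeholder so that m.total_risk exists; the risk sweep deletes it and adds the actual variance limit
m.total_risk = Constraint(expr = m.MSFT + m.AAPL + m.META + m.AMZN + m.JNJ + m.GS + m.UAL + m.TSLA + m.MA + m.GOOG == 0.10)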
m.sum_proportion = Constraint(expr = m.MSFT + m.AAPL + m.META + m.AMZN + m.JNJ + m.GS + m.UAL + m.TSLA + m.MA + m.GOOG == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.MSFT, m.AAPL, m.META, m.AMZN, m.JNJ, m.GS, m.UAL, m.TSLA, m.MA, m.GOOG]
  tickers = ['Adj Close_MSFT', 'Adj Close_AAPL', 'Adj Close_META', 'Adj Close_AMZN', 'Adj Close_JNJ', 'Adj Close_GS', 'Adj Close_UAL', 'Adj Close_TSLA', 'Adj Close_MA', 'Adj Close_GOOG']
  risk_exp = 0
  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i]*df_jg_cov.at[tickers[i],tickers[j]]*variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk =  0.10

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.001) # take tiny steps
risk_limits

In [None]:
param_analysis = {}  # risk limit -> optimal weights
returns = {}  # risk limit -> portfolio return at the optimum
solver = SolverFactory('ipopt', executable=ipopt_executable)
for r in risk_limits:
  # Swap in the variance limit for this step, then re-solve
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  result = solver.solve(m)
  result.write()
  param_analysis[r] = [m.MSFT(), m.AAPL(), m.META(), m.AMZN(), m.JNJ(), m.GS(), m.UAL(), m.TSLA(), m.MA(), m.GOOG()]
  returns[r] = m.Objective()  # objective value, i.e. the weighted sum defined in m.Objective

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['MSFT', 'AAPL', 'META', 'AMZN', 'JNJ', 'GS', 'UAL', 'TSLA', 'MA', 'GOOG']
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling 10%: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=5, ncols=2, figsize=(12, 18))
param_analysis.plot(
    subplots=True,
    layout=(5, 2),       # 5 rows, 2 columns
    ax=axes,             # use our custom axes
    sharex=False,
    sharey=False,
    legend=True
)

for row in axes:
    for ax in row:
        ax.set_xlabel("Risk Level")
        ax.set_ylabel("Optimal Allocation")

plt.tight_layout()
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 10))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling 10%: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Print the risk limits and the corresponding returns
risk = list(returns.keys())
print(f"Risk: {risk}")
reward = list(returns.values())
print(f"Reward: {reward}")
print()

# Plot the efficient frontier (return as a function of the risk limit)
plt.plot(risk, reward, '-.')
plt.title('Modeling 10%: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

The Optimal Stock Allocation for Different Risk Levels chart (shown as a stacked bar graph for better visibility) shows how the "optimal" stock allocation (y-axis) breaks down across different risk levels (x-axis).
  - In this case, the risk levels range from 0.0003 up to (but not including) 0.1.

- One stock (colored purple), JNJ, dominates almost all of these bars.
  - This indicates that the model often allocates JNJ in higher proportions and always includes it in the portfolio.
- Stocks like GOOG and MSFT are also often included in the portfolio, but at a lower proportion than JNJ.
- META is included in the portfolio only at the risk level of 0.0813, and at a minuscule proportion compared to JNJ.
- AMZN, AAPL, UAL, GS, MA, and TSLA are included only twice, at the risk levels of 0.0583 and 0.0813, also at minuscule proportions compared to JNJ.
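
The risk levels quoted above come from the grid `np.arange(0.0003, max_risk, 0.001)` with `max_risk = 0.10`. Because `np.arange` excludes the stop value, it can help to inspect the grid directly. The sketch below only rebuilds that grid and looks up the two levels mentioned; it does not re-run the optimization.

```python
import numpy as np

max_risk = 0.10
risk_limits = np.arange(0.0003, max_risk, 0.001)

print(len(risk_limits))                  # number of solves in the sweep
print(risk_limits[0], risk_limits[-1])   # smallest and largest risk limit

# Positions of the two risk levels discussed in the text
positions = {}
for level in (0.0583, 0.0813):
    idx = int(np.argmin(np.abs(risk_limits - level)))
    positions[level] = idx
    print(level, idx, bool(np.isclose(risk_limits[idx], level)))
```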

The Efficient Frontier shows how returns change with different levels of risk.
  - The non-linear optimization reveals thresholds where a marginal change in the risk constraint triggers a substantial rebalancing of asset allocations, yielding higher expected returns while also increasing volatility.
  - Small changes in risk cause big increases in the portfolio's return.
    - This may be because the portfolio is highly sensitive to minor changes in the risk parameters.
  - At specific risk levels, a tiny increase in allowed risk lets the model pick a very different mix of stocks that can produce significantly higher returns.
    - For example, at a risk level of 0.0813 the model produces an unusual mix of stocks (all 10 stocks) and allocations, suggesting that a critical threshold is reached within the optimization process.
  - Overall, while higher risk generally leads to higher returns, the relationship is not linear.
    - The chart shows that returns increase with risk, but there are distinct thresholds where even a slight increase in risk yields a disproportionately large increase in return, resulting in an efficient frontier with sudden jumps rather than a gradual increase.
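
Raising the risk limit only enlarges the set of allowed portfolios, so the best attainable return should never fall as the limit increases. The sketch below flags places where a frontier dips, or jumps by more than a chosen multiple of the typical step. The threshold and the sample values are illustrative assumptions, and `frontier_anomalies` is a hypothetical helper.

```python
from statistics import median


def frontier_anomalies(risk, reward, jump_factor=5.0):
    """Return (dips, jumps): risk levels where the return falls, or rises by
    more than jump_factor times the median positive step."""
    steps = [b - a for a, b in zip(reward, reward[1:])]
    positive = [s for s in steps if s > 0]
    typical = median(positive) if positive else 0.0
    dips = [risk[i + 1] for i, s in enumerate(steps) if s < 0]
    jumps = [
        risk[i + 1]
        for i, s in enumerate(steps)
        if typical > 0 and s > jump_factor * typical
    ]
    return dips, jumps


# Illustrative values only (not output from the notebook)
risk = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
reward = [1.00, 1.02, 1.04, 1.50, 1.06, 1.08]
dips, jumps = frontier_anomalies(risk, reward)
print(dips, jumps)
```

Applied to the `risk` and `reward` lists printed above, a dip would point to a risk level whose solve is worth inspecting before the jump is read as a genuine threshold.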

## Modeling 25%

In [None]:
m = ConcreteModel()
# Decision Variables
m.MSFT = Var(within=NonNegativeReals, bounds= (0,1))
m.JNJ = Var(within=NonNegativeReals, bounds= (0,1))
m.GOOG = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.GS = Var(within=NonNegativeReals, bounds= (0,1))
m.TSLA = Var(within=NonNegativeReals, bounds= (0,1))
m.MA = Var(within=NonNegativeReals, bounds= (0,1))
m.UAL = Var(within=NonNegativeReals, bounds= (0,1))
m.AMZN = Var(within=NonNegativeReals, bounds= (0,1))
m.META = Var(within=NonNegativeReals, bounds= (0,1))

In [None]:
m.Objective = Objective(expr = df_jg_returns["Average_Adj_Close"].iloc[0] * m.MSFT +
                        df_jg_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_jg_returns["Average_Adj_Close"].iloc[2] * m.META +
                        df_jg_returns["Average_Adj_Close"].iloc[3] * m.AMZN +
                        df_jg_returns["Average_Adj_Close"].iloc[4] * m.JNJ +
                        df_jg_returns["Average_Adj_Close"].iloc[5] * m.GS +
                        df_jg_returns["Average_Adj_Close"].iloc[6] * m.UAL +
                        df_jg_returns["Average_Adj_Close"].iloc[7] * m.TSLA +
                        df_jg_returns["Average_Adj_Close"].iloc[8] * m.MA +
                        df_jg_returns["Average_Adj_Close"].iloc[9] * m.GOOG, sense = maximize)

MAX(Return = 419*MSFT + 210*AAPL + 527*META + 190*AMZN + 152*JNJ + 482*GS + 62*UAL + 249*TSLA + 481*MA + 168*GOOG)
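
As a worked check of the objective above, the sketch below evaluates it for two hand-picked allocations using the rounded coefficients from the formula. The `coef` dictionary and the `portfolio_value` helper are illustrative names, not part of the model.

```python
# Rounded coefficients copied from the objective formula above
coef = {"MSFT": 419, "AAPL": 210, "META": 527, "AMZN": 190, "JNJ": 152,
        "GS": 482, "UAL": 62, "TSLA": 249, "MA": 481, "GOOG": 168}

def portfolio_value(weights):
    """Weighted sum of the coefficients; tickers missing from `weights` count as 0."""
    return sum(coef[t] * w for t, w in weights.items())

all_jnj = {"JNJ": 1.0}                    # everything in one stock
equal = {t: 0.1 for t in coef}            # 10% in each of the ten stocks

print(portfolio_value(all_jnj))           # 152.0
print(round(portfolio_value(equal), 6))   # 294.0
```

Without a risk limit, the maximum of this linear objective would put the whole budget into the stock with the largest coefficient (META here), which is why the risk constraint is what shapes the allocations.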

In [None]:
# Placeholder only: the sweep below deletes m.total_risk and replaces it with the covariance-based limit before the first solve
m.total_risk = Constraint(expr = m.MSFT + m.AAPL + m.META + m.AMZN + m.JNJ + m.GS + m.UAL + m.TSLA + m.MA + m.GOOG == 0.25)
m.sum_proportion = Constraint(expr = m.MSFT + m.AAPL + m.META + m.AMZN + m.JNJ + m.GS + m.UAL + m.TSLA + m.MA + m.GOOG == 1)
m.return_floor = Constraint(expr = m.Objective.expr >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.MSFT, m.AAPL, m.META, m.AMZN, m.JNJ, m.GS, m.UAL, m.TSLA, m.MA, m.GOOG]
  tickers = ['Adj Close_MSFT', 'Adj Close_AAPL', 'Adj Close_META', 'Adj Close_AMZN', 'Adj Close_JNJ', 'Adj Close_GS', 'Adj Close_UAL', 'Adj Close_TSLA', 'Adj Close_MA', 'Adj Close_GOOG']
  risk_exp = 0
  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i]*df_jg_cov.at[tickers[i],tickers[j]]*variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk =  0.25

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0025) # take tiny steps
risk_limits

In [None]:
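param_analysis = {}  # parameter analysis: optimal allocation at each risk limit
returns = {}  # objective value (return) at each risk limit
for r in risk_limits:
  # Swap in the risk constraint for the current limit, then re-solve
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  # solve() returns the results object; write() only prints it and returns None
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m)
  result.write()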
  param_analysis[r] = [m.MSFT(), m.AAPL(), m.META(), m.AMZN(), m.JNJ(), m.GS(), m.UAL(), m.TSLA(), m.MA(), m.GOOG()]
  returns[r] =  m.MSFT()*df_jg_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_jg_returns["Average_Adj_Close"].iloc[1] + m.META()*df_jg_returns["Average_Adj_Close"].iloc[2] + m.AMZN()*df_jg_returns["Average_Adj_Close"].iloc[3] + m.JNJ()*df_jg_returns["Average_Adj_Close"].iloc[4] + m.GS()*df_jg_returns["Average_Adj_Close"].iloc[5] + m.UAL()*df_jg_returns["Average_Adj_Close"].iloc[6] + m.TSLA()*df_jg_returns["Average_Adj_Close"].iloc[7] + m.MA()*df_jg_returns["Average_Adj_Close"].iloc[8] + m.GOOG()*df_jg_returns["Average_Adj_Close"].iloc[9]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
# A flat list gives plain column labels (a nested list would build a MultiIndex)
param_analysis.columns = ['MSFT', 'AAPL', 'META', 'AMZN', 'JNJ', 'GS', 'UAL', 'TSLA', 'MA', 'GOOG']
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Model 25%: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=5, ncols=2, figsize=(12, 18))
param_analysis.plot(
    subplots=True,
    layout=(5, 2),       # 5 rows, 2 columns
    ax=axes,             # use our custom axes
    sharex=False,
    sharey=False,
    legend=True
)

for row in axes:
    for ax in row:
        ax.set_xlabel("Risk Level")
        ax.set_ylabel("Optimal Allocation")

plt.tight_layout()
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 10))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling 25%: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Find and print risk and reward
risk = list(returns.keys())
print("Risk", risk)
reward = list(returns.values())
print("Reward", reward)
print('\t')

# Plot risk and reward (explicit plt calls instead of a wildcard pylab import)
plt.figure(figsize=(10,6))
plt.plot(risk, reward, '-.')
plt.title('Model 25%: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

The Optimal Stock Allocation for Different Risk Levels charts (with a stacked bar graph for better visibility) show how the "optimal" stock allocation (y-axis) breaks down across different risk levels (x-axis).
  - In this case the risk levels range from 0.0003 up to (but not including) 0.25.

- One stock (shown in purple), JNJ, dominates almost all of these bars.
  - This indicates that the model often allocates JNJ in higher proportions and always includes it in the portfolio.
- Stocks like GOOG and MSFT are also often included in the portfolio but at a lower proportion than JNJ.
- MA, META, GS, TSLA, UAL, AMZN, and AAPL are typically included in the portfolio at around risk levels 0.1353 to 0.2053 (the higher end of the range) but at a minuscule proportion.

The Efficient Frontier shows how returns change with different levels of risk.
  - The non-linear optimization reveals thresholds where a marginal change in risk constraints triggers a substantial rebalancing of asset allocations, yielding higher expected returns while also increasing volatility.
  - Small changes in risk can cause large increases in the portfolio's return.
    - This may be because the portfolio is highly sensitive to minor changes in the risk parameters.
  - At specific risk levels, a tiny increase in allowed risk lets the model pick a very different mix of stocks, which can result in significantly higher returns.
    - For example, at risk levels of 0.1978 and 0.2053 the model produces an unusual mix of stocks (all 10 stocks) and allocations, indicating that a critical threshold is reached within the optimization process.
  - Overall, while higher risk generally leads to higher returns, the relationship is not linear.
    - The chart illustrates that returns increase with risk, but there are distinct thresholds where even a slight increase in risk yields a disproportionately large increase in return, resulting in an efficient frontier with sudden jumps rather than a gradual increase.
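
Risk levels where the model holds all of its stocks, such as those mentioned above, can also be found programmatically. This is a minimal sketch, assuming an allocation table shaped like `param_analysis` (one row per risk limit, one column per ticker). The three-ticker table and the `1e-4` cutoff are made-up illustrations, not values from the notebook.

```python
import pandas as pd

# Hypothetical allocations: rows are risk limits, columns are tickers
alloc = pd.DataFrame(
    {"JNJ": [0.90, 0.40, 0.70], "GOOG": [0.10, 0.35, 0.30], "MSFT": [0.00, 0.25, 0.00]},
    index=[0.1953, 0.1978, 0.2003],
)
alloc.index.name = "Risk Limit"

cutoff = 1e-4                                  # weights below this are treated as zero
n_held = (alloc > cutoff).sum(axis=1)          # number of stocks held at each risk limit
all_held = n_held[n_held == alloc.shape[1]].index.tolist()

print(n_held.to_dict())
print(all_held)
```

In this toy table only the middle row holds every ticker, so it is the only risk limit returned.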

## Modeling 50%

In [None]:
m = ConcreteModel()
# Decision Variables
m.MSFT = Var(within=NonNegativeReals, bounds= (0,1))
m.JNJ = Var(within=NonNegativeReals, bounds= (0,1))
m.GOOG = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.GS = Var(within=NonNegativeReals, bounds= (0,1))
m.TSLA = Var(within=NonNegativeReals, bounds= (0,1))
m.MA = Var(within=NonNegativeReals, bounds= (0,1))
m.UAL = Var(within=NonNegativeReals, bounds= (0,1))
m.AMZN = Var(within=NonNegativeReals, bounds= (0,1))
m.META = Var(within=NonNegativeReals, bounds= (0,1))

In [None]:
m.Objective = Objective(expr = df_jg_returns["Average_Adj_Close"].iloc[0] * m.MSFT +
                        df_jg_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_jg_returns["Average_Adj_Close"].iloc[2] * m.META +
                        df_jg_returns["Average_Adj_Close"].iloc[3] * m.AMZN +
                        df_jg_returns["Average_Adj_Close"].iloc[4] * m.JNJ +
                        df_jg_returns["Average_Adj_Close"].iloc[5] * m.GS +
                        df_jg_returns["Average_Adj_Close"].iloc[6] * m.UAL +
                        df_jg_returns["Average_Adj_Close"].iloc[7] * m.TSLA +
                        df_jg_returns["Average_Adj_Close"].iloc[8] * m.MA +
                        df_jg_returns["Average_Adj_Close"].iloc[9] * m.GOOG, sense = maximize)

MAX(Return = 419*MSFT + 210*AAPL + 527*META + 190*AMZN + 152*JNJ + 482*GS + 62*UAL + 249*TSLA + 481*MA + 168*GOOG)

In [None]:
# Placeholder only: the sweep below deletes m.total_risk and replaces it with the covariance-based limit before the first solve
m.total_risk = Constraint(expr = m.MSFT + m.AAPL + m.META + m.AMZN + m.JNJ + m.GS + m.UAL + m.TSLA + m.MA + m.GOOG == 0.50)
m.sum_proportion = Constraint(expr = m.MSFT + m.AAPL + m.META + m.AMZN + m.JNJ + m.GS + m.UAL + m.TSLA + m.MA + m.GOOG == 1)
m.return_floor = Constraint(expr = m.Objective.expr >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.MSFT, m.AAPL, m.META, m.AMZN, m.JNJ, m.GS, m.UAL, m.TSLA, m.MA, m.GOOG]
  tickers = ['Adj Close_MSFT', 'Adj Close_AAPL', 'Adj Close_META', 'Adj Close_AMZN', 'Adj Close_JNJ', 'Adj Close_GS', 'Adj Close_UAL', 'Adj Close_TSLA', 'Adj Close_MA', 'Adj Close_GOOG']
  risk_exp = 0
  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i]*df_jg_cov.at[tickers[i],tickers[j]]*variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk =  0.50

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.005) # take tiny steps
risk_limits

In [None]:
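param_analysis = {}  # parameter analysis: optimal allocation at each risk limit
returns = {}  # objective value (return) at each risk limit
for r in risk_limits:
  # Swap in the risk constraint for the current limit, then re-solve
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  # solve() returns the results object; write() only prints it and returns None
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m)
  result.write()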
  param_analysis[r] = [m.MSFT(), m.AAPL(), m.META(), m.AMZN(), m.JNJ(), m.GS(), m.UAL(), m.TSLA(), m.MA(), m.GOOG()]
  returns[r] =  m.MSFT()*df_jg_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_jg_returns["Average_Adj_Close"].iloc[1] + m.META()*df_jg_returns["Average_Adj_Close"].iloc[2] + m.AMZN()*df_jg_returns["Average_Adj_Close"].iloc[3] + m.JNJ()*df_jg_returns["Average_Adj_Close"].iloc[4] + m.GS()*df_jg_returns["Average_Adj_Close"].iloc[5] + m.UAL()*df_jg_returns["Average_Adj_Close"].iloc[6] + m.TSLA()*df_jg_returns["Average_Adj_Close"].iloc[7] + m.MA()*df_jg_returns["Average_Adj_Close"].iloc[8] + m.GOOG()*df_jg_returns["Average_Adj_Close"].iloc[9]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
# A flat list gives plain column labels (a nested list would build a MultiIndex)
param_analysis.columns = ['MSFT', 'AAPL', 'META', 'AMZN', 'JNJ', 'GS', 'UAL', 'TSLA', 'MA', 'GOOG']
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Model 50%: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling 50%: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=5, ncols=2, figsize=(12, 18))
param_analysis.plot(
    subplots=True,
    layout=(5, 2),       # 5 rows, 2 columns
    ax=axes,             # use our custom axes
    sharex=False,
    sharey=False,
    legend=True
)

for row in axes:
    for ax in row:
        ax.set_xlabel("Risk Level")
        ax.set_ylabel("Optimal Allocation")

plt.tight_layout()
plt.show()

# Find and print risk and reward
risk = list(returns.keys())
print("Risk", risk)
reward = list(returns.values())
print("Reward", reward)
print('\t')

# Plot risk and reward (explicit plt calls instead of a wildcard pylab import)
plt.figure(figsize=(10,6))
plt.plot(risk, reward, '-.')
plt.title('Model 50%: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

The Optimal Stock Allocation for Different Risk Levels charts (with a stacked bar graph for better visibility) show how the "optimal" stock allocation (y-axis) breaks down across different risk levels (x-axis).
  - In this case the risk levels range from 0.0003 up to (but not including) 0.5.
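
The risk levels quoted in this section come from the grid that `np.arange` builds for the sweep. As a quick sketch of what that grid contains for this model (the variable names `start`, `step`, `n_points`, `first`, `last`, and `on_grid` are illustrative):

```python
import numpy as np

start, max_risk, step = 0.0003, 0.50, 0.005
risk_limits = np.arange(start, max_risk, step)   # the stop value itself is excluded

n_points = len(risk_limits)
first = round(float(risk_limits[0]), 4)
last = round(float(risk_limits[-1]), 4)
print(n_points, first, last)

# Risk levels cited in the text sit on this grid: 0.3403 = 0.0003 + 68 * 0.005
on_grid = bool(np.isclose(risk_limits, 0.3403).any())
print(on_grid)
```

The sweep therefore solves the model 100 times, and the largest limit actually tested is 0.4953 rather than 0.5.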

- One stock (shown in purple), JNJ, dominates almost all of these bars.
  - This indicates that the model often allocates JNJ in higher proportions and always includes it in the portfolio.
- Stocks like GOOG and MSFT are also often included in the portfolio but at a lower proportion than JNJ.
- MA, META, GS, TSLA, UAL, AMZN, and AAPL are included in the portfolio at risk levels 0.1403 and 0.3403 but at a minuscule proportion.

The Efficient Frontier shows how returns change with different levels of risk.
  - The non-linear optimization reveals thresholds where a marginal change in risk constraints triggers a substantial rebalancing of asset allocations, yielding higher expected returns while also increasing volatility.
  - Small changes in risk can cause large increases in the portfolio's return.
    - This may be because the portfolio is highly sensitive to minor changes in the risk parameters.
  - At specific risk levels, a tiny increase in allowed risk lets the model pick a very different mix of stocks, which can result in significantly higher returns.
    - For example, at a risk level of 0.3403 the model produces an unusual mix of stocks (all 10 stocks) and allocations, indicating that a critical threshold is reached within the optimization process.
  - Overall, while higher risk generally leads to higher returns, the relationship is not linear.
    - The chart illustrates that returns increase with risk, but there are distinct thresholds where even a slight increase in risk yields a disproportionately large increase in return, resulting in an efficient frontier with sudden jumps rather than a gradual increase.
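
The risk on the x-axis of these charts is the quantity computed by `calc_risk`: the double sum of weight × covariance × weight over all pairs of stocks, which is the quadratic form wᵀΣw. A small sketch with a hypothetical two-stock covariance matrix shows that the double loop and the matrix form agree. The numbers are invented for illustration.

```python
import numpy as np

# Hypothetical covariance matrix and weights for two stocks (illustration only)
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
w = np.array([0.75, 0.25])

# Double loop, mirroring the structure of calc_risk
risk_loop = 0.0
for i in range(len(w)):
    for j in range(len(w)):
        risk_loop += w[i] * cov[i, j] * w[j]

# Equivalent matrix form: w^T * cov * w
risk_matrix = float(w @ cov @ w)

print(round(risk_loop, 6), round(risk_matrix, 6))
```

If `df_jg_cov` is a covariance matrix, as its name suggests, this quantity is a variance, so a risk limit of r corresponds to a standard deviation of sqrt(r).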

## Modeling 75%

In [None]:
m = ConcreteModel()
# Decision Variables
m.MSFT = Var(within=NonNegativeReals, bounds= (0,1))
m.JNJ = Var(within=NonNegativeReals, bounds= (0,1))
m.GOOG = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.GS = Var(within=NonNegativeReals, bounds= (0,1))
m.TSLA = Var(within=NonNegativeReals, bounds= (0,1))
m.MA = Var(within=NonNegativeReals, bounds= (0,1))
m.UAL = Var(within=NonNegativeReals, bounds= (0,1))
m.AMZN = Var(within=NonNegativeReals, bounds= (0,1))
m.META = Var(within=NonNegativeReals, bounds= (0,1))

In [None]:
m.Objective = Objective(expr = df_jg_returns["Average_Adj_Close"].iloc[0] * m.MSFT +
                        df_jg_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_jg_returns["Average_Adj_Close"].iloc[2] * m.META +
                        df_jg_returns["Average_Adj_Close"].iloc[3] * m.AMZN +
                        df_jg_returns["Average_Adj_Close"].iloc[4] * m.JNJ +
                        df_jg_returns["Average_Adj_Close"].iloc[5] * m.GS +
                        df_jg_returns["Average_Adj_Close"].iloc[6] * m.UAL +
                        df_jg_returns["Average_Adj_Close"].iloc[7] * m.TSLA +
                        df_jg_returns["Average_Adj_Close"].iloc[8] * m.MA +
                        df_jg_returns["Average_Adj_Close"].iloc[9] * m.GOOG, sense = maximize)

MAX(Return = 419*MSFT + 210*AAPL + 527*META + 190*AMZN + 152*JNJ + 482*GS + 62*UAL + 249*TSLA + 481*MA + 168*GOOG)

In [None]:
# Placeholder only: the sweep below deletes m.total_risk and replaces it with the covariance-based limit before the first solve
m.total_risk = Constraint(expr = m.MSFT + m.AAPL + m.META + m.AMZN + m.JNJ + m.GS + m.UAL + m.TSLA + m.MA + m.GOOG == 0.75)
m.sum_proportion = Constraint(expr = m.MSFT + m.AAPL + m.META + m.AMZN + m.JNJ + m.GS + m.UAL + m.TSLA + m.MA + m.GOOG == 1)
m.return_floor = Constraint(expr = m.Objective.expr >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.MSFT, m.AAPL, m.META, m.AMZN, m.JNJ, m.GS, m.UAL, m.TSLA, m.MA, m.GOOG]
  tickers = ['Adj Close_MSFT', 'Adj Close_AAPL', 'Adj Close_META', 'Adj Close_AMZN', 'Adj Close_JNJ', 'Adj Close_GS', 'Adj Close_UAL', 'Adj Close_TSLA', 'Adj Close_MA', 'Adj Close_GOOG']
  risk_exp = 0
  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i]*df_jg_cov.at[tickers[i],tickers[j]]*variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk =  0.75

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0075) # take tiny steps
risk_limits

In [None]:
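param_analysis = {}  # parameter analysis: optimal allocation at each risk limit
returns = {}  # objective value (return) at each risk limit
for r in risk_limits:
  # Swap in the risk constraint for the current limit, then re-solve
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  # solve() returns the results object; write() only prints it and returns None
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m)
  result.write()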
  param_analysis[r] = [m.MSFT(), m.AAPL(), m.META(), m.AMZN(), m.JNJ(), m.GS(), m.UAL(), m.TSLA(), m.MA(), m.GOOG()]
  returns[r] =  m.MSFT()*df_jg_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_jg_returns["Average_Adj_Close"].iloc[1] + m.META()*df_jg_returns["Average_Adj_Close"].iloc[2] + m.AMZN()*df_jg_returns["Average_Adj_Close"].iloc[3] + m.JNJ()*df_jg_returns["Average_Adj_Close"].iloc[4] + m.GS()*df_jg_returns["Average_Adj_Close"].iloc[5] + m.UAL()*df_jg_returns["Average_Adj_Close"].iloc[6] + m.TSLA()*df_jg_returns["Average_Adj_Close"].iloc[7] + m.MA()*df_jg_returns["Average_Adj_Close"].iloc[8] + m.GOOG()*df_jg_returns["Average_Adj_Close"].iloc[9]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
# A flat list gives plain column labels (a nested list would build a MultiIndex)
param_analysis.columns = ['MSFT', 'AAPL', 'META', 'AMZN', 'JNJ', 'GS', 'UAL', 'TSLA', 'MA', 'GOOG']
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling 75%: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling 75%: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=5, ncols=2, figsize=(12, 18))
param_analysis.plot(
    subplots=True,
    layout=(5, 2),       # 5 rows, 2 columns
    ax=axes,             # use our custom axes
    sharex=False,
    sharey=False,
    legend=True
)

for row in axes:
    for ax in row:
        ax.set_xlabel("Risk Level")
        ax.set_ylabel("Optimal Allocation")

plt.tight_layout()
plt.show()

# Find and print risk and reward
risk = list(returns.keys())
print("Risk", risk)
reward = list(returns.values())
print("Reward", reward)
print('\t')

# Plot risk and reward (explicit plt calls instead of a wildcard pylab import)
plt.figure(figsize=(10,6))
plt.plot(risk, reward, '-.')
plt.title('Modeling 75%: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

The Optimal Stock Allocation for Different Risk Levels charts (with a stacked bar graph for better visibility) show how the "optimal" stock allocation (y-axis) breaks down across different risk levels (x-axis).
  - In this case the risk levels range from 0.0003 up to (but not including) 0.75.

- One stock (shown in purple), JNJ, dominates almost all of these bars.
  - This indicates that the model often allocates JNJ in higher proportions and always includes it in the portfolio.
- Stocks like GOOG and MSFT are also often included in the portfolio but at a lower proportion than JNJ.
- MA, META, GS, TSLA, AMZN, UAL, and AAPL are included in the portfolio at around risk levels 0.1203 to 0.3453 (the lower end of the range) but at a minuscule proportion.

The Efficient Frontier shows how returns change with different levels of risk.
  - The non-linear optimization reveals thresholds where a marginal change in risk constraints triggers a substantial rebalancing of asset allocations, yielding higher expected returns while also increasing volatility.
  - Small changes in risk can cause large increases in the portfolio's return.
    - This may be because the portfolio is highly sensitive to minor changes in the risk parameters.
  - At specific risk levels, a tiny increase in allowed risk lets the model pick a very different mix of stocks, which can result in significantly higher returns.
    - For example, at a risk level of 0.3303 the model produces an unusual mix of stocks (all 10 stocks) and allocations, indicating that a critical threshold is reached within the optimization process.
  - Returns reached 200 after the maximum risk was increased to 75%.
  - Overall, while higher risk generally leads to higher returns, the relationship is not linear.
    - The chart illustrates that returns increase with risk, but there are distinct thresholds where even a slight increase in risk yields a disproportionately large increase in return, resulting in an efficient frontier with sudden jumps rather than a gradual increase.
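
One sanity check that can be applied to a frontier like this: raising the risk limit only enlarges the set of feasible portfolios, so the true optimal return can never fall as the limit rises. Points where the recorded return drops are therefore worth re-checking (for example, the solver may not have converged there). This is a minimal sketch on invented values, and `find_drops` and its tolerance are hypothetical.

```python
def find_drops(risk, reward, tol=1e-8):
    """Return (risk_level, previous_return, current_return) for every point
    whose return is lower than the return at the preceding risk limit."""
    drops = []
    for k in range(1, len(risk)):
        if reward[k] < reward[k - 1] - tol:
            drops.append((risk[k], reward[k - 1], reward[k]))
    return drops

# Illustrative values only, not results from the notebook
risk = [0.3153, 0.3228, 0.3303, 0.3378]
reward = [180.0, 185.0, 170.0, 190.0]
drops = find_drops(risk, reward)
print(drops)
```

With these sample values the only flagged point is the third one, where the return falls from 185 to 170 even though the risk limit was loosened.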

## Modeling 95%

In [None]:
m = ConcreteModel()
# Decision Variables
m.MSFT = Var(within=NonNegativeReals, bounds= (0,1))
m.JNJ = Var(within=NonNegativeReals, bounds= (0,1))
m.GOOG = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.GS = Var(within=NonNegativeReals, bounds= (0,1))
m.TSLA = Var(within=NonNegativeReals, bounds= (0,1))
m.MA = Var(within=NonNegativeReals, bounds= (0,1))
m.UAL = Var(within=NonNegativeReals, bounds= (0,1))
m.AMZN = Var(within=NonNegativeReals, bounds= (0,1))
m.META = Var(within=NonNegativeReals, bounds= (0,1))

In [None]:
m.Objective = Objective(expr = df_jg_returns["Average_Adj_Close"].iloc[0] * m.MSFT +
                        df_jg_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_jg_returns["Average_Adj_Close"].iloc[2] * m.META +
                        df_jg_returns["Average_Adj_Close"].iloc[3] * m.AMZN +
                        df_jg_returns["Average_Adj_Close"].iloc[4] * m.JNJ +
                        df_jg_returns["Average_Adj_Close"].iloc[5] * m.GS +
                        df_jg_returns["Average_Adj_Close"].iloc[6] * m.UAL +
                        df_jg_returns["Average_Adj_Close"].iloc[7] * m.TSLA +
                        df_jg_returns["Average_Adj_Close"].iloc[8] * m.MA +
                        df_jg_returns["Average_Adj_Close"].iloc[9] * m.GOOG, sense = maximize)

MAX(Return = 419*MSFT + 210*AAPL + 527*META + 190*AMZN + 152*JNJ + 482*GS + 62*UAL + 249*TSLA + 481*MA + 168*GOOG)

In [None]:
# Placeholder only: the sweep below deletes m.total_risk and replaces it with the covariance-based limit before the first solve
m.total_risk = Constraint(expr = m.MSFT + m.AAPL + m.META + m.AMZN + m.JNJ + m.GS + m.UAL + m.TSLA + m.MA + m.GOOG == 0.95)
m.sum_proportion = Constraint(expr = m.MSFT + m.AAPL + m.META + m.AMZN + m.JNJ + m.GS + m.UAL + m.TSLA + m.MA + m.GOOG == 1)
m.return_floor = Constraint(expr = m.Objective.expr >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.MSFT, m.AAPL, m.META, m.AMZN, m.JNJ, m.GS, m.UAL, m.TSLA, m.MA, m.GOOG]
  tickers = ['Adj Close_MSFT', 'Adj Close_AAPL', 'Adj Close_META', 'Adj Close_AMZN', 'Adj Close_JNJ', 'Adj Close_GS', 'Adj Close_UAL', 'Adj Close_TSLA', 'Adj Close_MA', 'Adj Close_GOOG']
  risk_exp = 0
  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i]*df_jg_cov.at[tickers[i],tickers[j]]*variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk =  0.95

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0095) # take tiny steps
risk_limits

In [None]:
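param_analysis = {}  # parameter analysis: optimal allocation at each risk limit
returns = {}  # objective value (return) at each risk limit
for r in risk_limits:
  # Swap in the risk constraint for the current limit, then re-solve
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  # solve() returns the results object; write() only prints it and returns None
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m)
  result.write()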
  param_analysis[r] = [m.MSFT(), m.AAPL(), m.META(), m.AMZN(), m.JNJ(), m.GS(), m.UAL(), m.TSLA(), m.MA(), m.GOOG()]
  returns[r] =  m.MSFT()*df_jg_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_jg_returns["Average_Adj_Close"].iloc[1] + m.META()*df_jg_returns["Average_Adj_Close"].iloc[2] + m.AMZN()*df_jg_returns["Average_Adj_Close"].iloc[3] + m.JNJ()*df_jg_returns["Average_Adj_Close"].iloc[4] + m.GS()*df_jg_returns["Average_Adj_Close"].iloc[5] + m.UAL()*df_jg_returns["Average_Adj_Close"].iloc[6] + m.TSLA()*df_jg_returns["Average_Adj_Close"].iloc[7] + m.MA()*df_jg_returns["Average_Adj_Close"].iloc[8] + m.GOOG()*df_jg_returns["Average_Adj_Close"].iloc[9]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
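# A flat list gives plain column labels (a nested list would build a MultiIndex)
param_analysis.columns = ['MSFT', 'AAPL', 'META', 'AMZN', 'JNJ', 'GS', 'UAL', 'TSLA', 'MA', 'GOOG']
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling 95%: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')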
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling 95%: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=5, ncols=2, figsize=(12, 18))
param_analysis.plot(
    subplots=True,
    layout=(5, 2),       # 5 rows, 2 columns
    ax=axes,             # use our custom axes
    sharex=False,
    sharey=False,
    legend=True
)

for row in axes:
    for ax in row:
        ax.set_xlabel("Risk Level")
        ax.set_ylabel("Optimal Allocation")

plt.tight_layout()
plt.show()


# Find and print risk and reward
risk = list(returns.keys())
print("Risk", risk)
reward = list(returns.values())
print("Reward", reward)
print('\t')

# Plot risk and reward (explicit plt calls instead of a wildcard pylab import)
plt.figure(figsize=(10,6))
plt.plot(risk, reward, '-.')
plt.title('Modeling 95%: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

The Optimal Stock Allocation for Different Risk Levels charts (with a stacked bar graph for better visibility) show how the "optimal" stock allocation (y-axis) breaks down across different risk levels (x-axis).
  - In this case the risk levels range from 0.0003 up to (but not including) 0.95.

- One stock (shown in purple), JNJ, dominates almost all of these bars.
  - This indicates that the model often allocates JNJ in higher proportions and always includes it in the portfolio.
- Stocks like GOOG and MSFT are also often included in the portfolio but at a lower proportion than JNJ.
- MA, META, GS, TSLA, AMZN, UAL, and AAPL are included in the portfolio at around risk levels 0.0573 to 0.8648 (more spread out than in the previous models) but at a minuscule proportion.

The Efficient Frontier shows how returns change with different levels of risk.
  - The non-linear optimization reveals thresholds where a marginal change in risk constraints triggers a substantial rebalancing of asset allocations, yielding higher expected returns while also increasing volatility.
  - Small changes in risk can cause large increases in the portfolio's return.
    - This may be because the portfolio is highly sensitive to minor changes in the risk parameters.
  - At specific risk levels, a tiny increase in allowed risk lets the model pick a very different mix of stocks, which can result in significantly higher returns.
    - For example, at a risk level of 0.8648 the model produces an unusual mix of stocks (all 10 stocks) and allocations, indicating that a critical threshold is reached within the optimization process.
  - Returns reached 300 after the maximum risk was increased to 95%.
  - Overall, while higher risk generally leads to higher returns, the relationship is not linear.
    - The chart illustrates that returns increase with risk, but there are distinct thresholds where even a slight increase in risk yields a disproportionately large increase in return, resulting in an efficient frontier with sudden jumps rather than a gradual increase.

## Josh's Model Conclusion



- The series of Optimal Stock Allocation models demonstrates that the risk-return relationship is highly non-linear. Small changes in risk constraints can trigger dramatic shifts in asset allocation and, consequently, in expected returns.
- In our baseline models, a dominant stock (JNJ) consistently drives the portfolio's composition. Sequentially removing dominant stocks like JNJ, MSFT, and GOOG reveals how critical these assets are in achieving high returns.
- Lower-risk models (the 5% model, which is the baseline model) can theoretically achieve very high returns through narrow, specific allocations, but they tend to be highly sensitive and potentially volatile. In contrast, the 10% model (the optimal model) delivers up to 175% return with a more stable profile, making it a better option than the others, with lower overall risk and volatility.
- Conversely, as the maximum allowed risk increases (25%, 50%, 75%, 95%), we observe that while the overall return potential may rise, the efficiency of the risk-return trade-off does not improve linearly.  
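
The last point can be made concrete by comparing the return gained per unit of additional risk at different parts of a frontier. The sketch below uses invented numbers, and `marginal_return` is a hypothetical helper rather than part of the notebook.

```python
def marginal_return(risk, reward):
    """Extra return per unit of extra risk between consecutive frontier points."""
    return [
        (reward[k] - reward[k - 1]) / (risk[k] - risk[k - 1])
        for k in range(1, len(risk))
    ]

# Illustrative frontier: return keeps rising, but each extra unit of risk buys less
risk = [0.05, 0.10, 0.25, 0.50]
reward = [100.0, 150.0, 180.0, 200.0]
slopes = [round(s, 2) for s in marginal_return(risk, reward)]
print(slopes)
```

In this made-up example the total return still rises at every step, yet the payoff per unit of risk shrinks from 1000 to 200 to 80, which is the pattern the conclusion describes.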


# Scott Franklin

1. JPMorgan Chase (JPM)
A leading global financial institution offering investment banking, asset management, and retail banking services.

2. Apple (AAPL)
Known for its consumer electronics (iPhone, iPad) and services (iCloud, Apple Music), a leader in tech innovation.

3. Home Depot (HD)
The largest home improvement retailer, benefiting from strong demand in DIY and home renovation products.

4. Microsoft (MSFT)
A tech giant in software (Windows, Office), cloud services (Azure), and gaming (Xbox), with steady growth.

5. Visa (V)
A global leader in digital payments, facilitating secure credit and debit transactions worldwide.

6. Walmart (WMT)
The world’s largest retailer, excelling in both physical stores and e-commerce.

7. McDonald's (MCD)
Global fast-food leader with a large franchise model, known for consistent growth and brand strength.

8. Johnson & Johnson (JNJ)
A diversified healthcare company specializing in medical devices, pharmaceuticals, and consumer health products.

9. Alphabet (GOOG)
Parent of Google, dominating digital advertising and online services, while investing in AI and innovation.

10. PepsiCo (PEP)
A multinational food and beverage company with popular brands like Pepsi, Gatorade, and Frito-Lay, ensuring steady growth.



## Importing Dataset

Here we import the dataset for Scott Franklin from an Excel workbook, with one sheet per ticker.
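
Besides loading the data, the import cell below scores each trade against the following day's price move. Here is a toy sketch of that logic, using a made-up five-day price series and made-up trades (`toy` and `label_trade` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Made-up closing prices and trade amounts for a single hypothetical ticker
toy = pd.DataFrame({
    "Close": [100.0, 110.0, 99.0, 99.0, 90.0],
    "Buy":   [10, 0, 0, 5, 0],
    "Sell":  [0, 5, 0, 0, 0],
})

# pct_change() gives the return from the previous row to this one;
# shift(-1) moves it up one row so each day sees the *next* day's return
toy["Next_Day_Return"] = toy["Close"].pct_change().shift(-1)

def label_trade(row):
    ret = row["Next_Day_Return"]
    if pd.isnull(ret):
        return "No Data"
    if row["Buy"] > 0 and row["Sell"] > 0:
        return "No Difference"
    if row["Buy"] > 0:
        return "Good Trade" if ret > 0 else "Bad Trade"
    if row["Sell"] > 0:
        return "Good Trade" if ret < 0 else "Bad Trade"
    return "No Trade"

toy["Trade_Impact"] = toy.apply(label_trade, axis=1)
print(toy)
```

The last row has no following day, so its next-day return is missing and it is labeled "No Data".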

In [None]:
import pandas as pd

excel_path = "/content/Scott Franklin Final.xlsx"
xls = pd.ExcelFile(excel_path)

df_list = []

for sheet_name in xls.sheet_names:
    # Parse the sheet and strip extra whitespace from column names
    df = xls.parse(sheet_name)
    df.columns = df.columns.str.strip()

    # Convert the Date column to datetime and set it as the index
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
    df.set_index("Date", inplace=True)

    # Use the sheet name as the ticker identifier
    ticker = sheet_name.strip()

    # Define the ticker-specific column names (these should match exactly your Excel headers)
    open_col      = f"Open_{ticker}"
    high_col      = f"High_{ticker}"
    low_col       = f"Low_{ticker}"
    close_col     = f"Close_{ticker}"
    adj_close_col = f"Adj Close_{ticker}"
    volume_col    = f"Volume_{ticker}"
    buy_col       = f"Buy_{ticker}"
    sell_col      = f"Sell_{ticker}"
    net_col       = f"Holding_{ticker}"

    # Clean numeric columns: remove thousands separators and convert Buy, Sell, and Holding to float
    for col in [buy_col, sell_col, net_col]:
        if col in df.columns:
            df[col] = pd.to_numeric(
                df[col].replace({',': ''}, regex=True), errors="coerce"
            ).fillna(0)

    # --- Calculate Trade Metrics for this ticker ---
    # Create a trade action column specific for this ticker
    trade_action_col = f"Trade_Action_{ticker}"
    def determine_trade_action(row):
        # If both Buy and Sell are > 0, return "Buy & Sell"
        if row[buy_col] > 0 and row[sell_col] > 0:
            return "Buy & Sell"
        elif row[buy_col] > 0:
            return "Buy"
        elif row[sell_col] > 0:
            return "Sell"
        else:
            return "No Trade"
    df[trade_action_col] = df.apply(determine_trade_action, axis=1)

    # Calculate next-day return based on the ticker-specific Close column
    next_day_return_col = f"Next_Day_Return_{ticker}"
    df[next_day_return_col] = df[close_col].pct_change().shift(-1)

    # Evaluate trade impact for this ticker
    trade_impact_col = f"Trade_Impact_{ticker}"
    def evaluate_trade(row):
        action = row[trade_action_col]
        ret = row[next_day_return_col]
        if pd.isnull(ret):
            return "No Data"
        if action == "Buy":
            return "Good Trade" if ret > 0 else "Bad Trade"
        elif action == "Sell":
            return "Good Trade" if ret < 0 else "Bad Trade"
        elif action == "Buy & Sell":
            return "No Difference"
        else:
            return "No Trade"
    df[trade_impact_col] = df.apply(evaluate_trade, axis=1)


    # Net change in holding for this ticker: previous day's holding minus today's holding
    net_change_col = f"Net_Change_{ticker}"
    df[net_change_col] = df[net_col].shift(1) - df[net_col]

    # Relate portfolio holdings to next-day return:
    # multiply the holding (the Holding_{ticker} column) by the next-day return to get the dollar impact.
    portfolio_impact_col = f"Portfolio_Impact_{ticker}"
    df[portfolio_impact_col] = df[net_col] * df[next_day_return_col]

    # Append the processed DataFrame to our list
    df_list.append(df)

# Concatenate all processed sheets side by side (wide) on the Date index
df_sf = pd.concat(df_list, axis=1, join="outer")
df_sf.sort_index(inplace=True)

# Display final column list and a preview of the wide DataFrame
print("Final columns:")
print(df_sf.columns.tolist())
print(df_sf.head(10))

In [None]:
df_sf.head(100)

Metrics:

Date, Open, High, Low, Close, Adj Close, Volume were imported from Yahoo Finance starting from 01/02/2024 to 02/24/2025.

Added Metrics:

1. Buy = Did they buy on that day?

2. Sell = Did they sell on that day?

  - Imported their buy and sell trade data from Quiver Quantitative.

  - We were given ranges for how much they traded and used the midpoint of each range as the best available estimate of the trade size.

3. Holding = How much do they hold in that stock on that day?

  - Imported how much money they held in each stock from Quiver Quantitative.

4. Trade Action = Was the trade a buy or a sell? Or was there no trade?

5. Next Day Return = The percentage change in the stock's closing price from the current day to the next trading day.

6. Trade Impact = If they made a trade, was it “Good” or “Bad” the next day? A buy is “Good” if the next-day return is positive, and a sell is “Good” if it is negative. “No Trade” → not evaluated.

7. Net Change = The change in their holdings of that stock between the previous day and the current day (computed as the previous day's holding minus the current day's).

8. Portfolio Impact = How much did the value of their holding change because of the next day's price movement, whether they traded or not?

These metrics let us see the whole picture of how a portfolio is performing. They show us how much of the change comes from the market itself and how much comes from the investor's own trading decisions. This helps us understand the underlying reasons for gains or losses in the portfolio.
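To make the derived metrics concrete, here is a minimal sketch on a toy series. The ticker `XYZ` and all numbers are hypothetical; only the three formulas mirror the import cell above.

```python
import pandas as pd

# Hypothetical toy data: five trading days for a made-up ticker "XYZ"
toy = pd.DataFrame(
    {
        "Close_XYZ": [100.0, 102.0, 101.0, 104.0, 104.0],
        "Holding_XYZ": [8000.0, 8000.0, 5000.0, 5000.0, 5000.0],
    },
    index=pd.to_datetime(
        ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05", "2024-01-08"]
    ),
)

# Next-day return: change in close from today to the next trading day
toy["Next_Day_Return_XYZ"] = toy["Close_XYZ"].pct_change().shift(-1)

# Net change as computed in the import cell: previous holding minus current holding
toy["Net_Change_XYZ"] = toy["Holding_XYZ"].shift(1) - toy["Holding_XYZ"]

# Portfolio impact: dollar holding times next-day return
toy["Portfolio_Impact_XYZ"] = toy["Holding_XYZ"] * toy["Next_Day_Return_XYZ"]

print(toy)
```

In this toy frame the last row has no next-day return and the first row has no net change, which is the same pattern of missing values that appears in the real dataset.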

#### Shape of Dataset

In [None]:
df_sf.shape

There are 287 rows and 140 columns in this dataset.

### .info()

In [None]:
df_sf.info()

- The dataframe has 287 rows (indexed by dates from 2024-01-02 to 2025-02-24)
- There are 140 columns, with names running from “Open_JPM” through “Portfolio_Impact_PEP.”
- Of the 140 columns, 102 are floating-point, 18 are integers, and 20 are objects (often strings or mixed data).
- The entire DataFrame is using about 316 KB of memory.

### Data Type

In [None]:
# Increase the max rows (or columns) displayed in the console
pd.set_option('display.max_rows', None)  # No limit on rows
# pd.set_option('display.max_columns', None)  # No limit on columns if needed

# Now printing dtypes will not truncate
print("Data types for df_sf columns:")
print(df_sf.dtypes)

In [None]:
for col in df_sf.columns:
    if "Trade_Action" in col:
        df_sf[col] = df_sf[col].astype('category')
    elif "Trade_Impact" in col:
        df_sf[col] = df_sf[col].astype('category')

Changing the data type of Trade_Action_{ticker} and Trade_Impact_{ticker} from object to category, which helps with analysis and uses less memory.
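As a rough illustration of the memory effect, the sketch below compares the two dtypes on a made-up label column. The column name and the label counts are assumptions for the example, not values from our data.

```python
import pandas as pd

# Hypothetical column of 287 trade labels, mostly "No Trade"
labels = pd.Series(
    ["No Trade"] * 280 + ["Sell"] * 6 + ["Buy"],
    name="Trade_Action_XYZ",
    dtype="object",
)

# deep=True counts the Python string objects, not just the pointers to them
as_object = labels.memory_usage(deep=True)
as_category = labels.astype("category").memory_usage(deep=True)

print(f"object:   {as_object} bytes")
print(f"category: {as_category} bytes")
print(labels.astype("category").cat.categories.tolist())
```

A category column stores each distinct label once plus a small integer code per row, so the saving grows with the number of repeated labels.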

In [None]:
# Increase the max rows (or columns) displayed in the console
pd.set_option('display.max_rows', None)  # No limit on rows
# pd.set_option('display.max_columns', None)  # No limit on columns if needed

# Now printing dtypes will not truncate
print("Data types for df_sf columns:")
print(df_sf.dtypes)

Checking to see if the changes were made.

### Missing Values

In [None]:
df_sf.isnull().sum()

In [None]:
# Drop rows with any missing values
df_sf.dropna(inplace=True)

# Display the DataFrame after removing missing values
print(df_sf.head(10))
df_sf.info()
df_sf.isnull().sum()

In [None]:
df_sf.shape

Checking for missing values. Some of the missing values come from Next_Day_Return_{ticker}, which is the percentage change in the close price from the current day to the next trading day: the last date in the data (2025-02-24) has no following day, so it shows up as NaN. Net_Change_{ticker} is likewise NaN on the first date (2024-01-02) because there is no previous day's holding to compare against. Additionally, in this dataset AAPL and HD have 37 fewer rows than the other 8 stocks, which is why there are 37 missing values in each of the AAPL and HD columns. We have decided to delete the rows with missing values.
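The two sources of missing values can be reproduced on a small example. This is only a sketch: the tickers `AAA` and `BBB` and their prices are hypothetical, with `BBB` deliberately missing one date.

```python
import pandas as pd

dates = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"])

# Hypothetical ticker "AAA" has all four days; "BBB" is missing the first one
aaa = pd.DataFrame({"Close_AAA": [10.0, 11.0, 11.0, 12.0]}, index=dates)
bbb = pd.DataFrame({"Close_BBB": [50.0, 49.0, 51.0]}, index=dates[1:])

for frame, ticker in [(aaa, "AAA"), (bbb, "BBB")]:
    close = frame[f"Close_{ticker}"]
    # Same formula as the import cell: NaN lands on the last available date
    frame[f"Next_Day_Return_{ticker}"] = close.pct_change().shift(-1)

# Outer join on the date index, as in the import cell
wide = pd.concat([aaa, bbb], axis=1, join="outer").sort_index()
print(wide.isnull().sum())

# dropna() removes every row that has a NaN in any column
cleaned = wide.dropna()
print(cleaned.shape)
```

Note that `dropna()` removes a date for all tickers as soon as one ticker is missing on it, so the shorter AAPL and HD histories also shorten the other eight.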

## Univariate Analysis

### Histograms

#### JPM

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant JPM features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
jpm_columns = [f"{feature}_JPM" for feature in features]

# Filter dataset for JPM, dropping rows with NaNs
df_jpm = df_sf[jpm_columns].dropna()

# Set up color palette for JPM
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_JPM"
    if col_name in df_jpm.columns:
        sns.histplot(data=df_jpm, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of JPM {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_JPM", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

The price metrics (Open, High, Low, Close, and Adjusted Close) show a right-skewed distribution, with most prices between about 190 and 220.

Trading volume is also right-skewed: most days saw moderate volume, with occasional high-volume spikes, probably driven by institutional trades or major news.

The buy and sell histograms show no buying activity and occasional large sell-offs.

The next-day return histogram is roughly bell-shaped and centered near zero, as is typical for daily stock returns. Net change and portfolio impact, however, show occasional extreme values, indicating that specific trades had a notable effect on portfolio performance.

Overall, JPM's stock data reflects consistent trading patterns with periods of volatility.
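The visual readings of "right-skewed" and "roughly normal" can be backed up with simple summary statistics. The sketch below uses synthetic data (a log-normal stand-in for volume and a normal stand-in for returns), not our actual JPM columns; with the real data one would pass `df_jpm` instead of `sample`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-ins (not real market data): log-normal "volume", normal "returns"
sample = pd.DataFrame(
    {
        "Volume_XYZ": rng.lognormal(mean=16.0, sigma=0.5, size=250),
        "Next_Day_Return_XYZ": rng.normal(loc=0.0, scale=0.01, size=250),
    }
)

# Skewness near 0 suggests symmetry; clearly positive values suggest a right tail.
# pandas' kurt() reports excess kurtosis, which is near 0 for a normal distribution.
summary = pd.DataFrame(
    {
        "skew": sample.skew(),
        "excess_kurtosis": sample.kurt(),
    }
)
print(summary.round(2))
```

As a rule of thumb, a clearly positive skew supports the right-skewed reading, while real daily returns often show heavier tails than a normal distribution even when the histogram looks bell-shaped.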

#### AAPL

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant AAPL features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
aapl_columns = [f"{feature}_AAPL" for feature in features]

# Filter dataset for AAPL, dropping rows with NaNs
df_aapl = df_sf[aapl_columns].dropna()

# Set up color palette for AAPL
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_AAPL"
    if col_name in df_aapl.columns:
        sns.histplot(data=df_aapl, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of AAPL {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_AAPL", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

The price distributions for AAPL (Open, High, Low, Close, and Adjusted Close) indicate that most trading activity occurred around $200–$240, with occasional spikes.

Trading volume follows a right-skewed pattern, meaning lower volumes were more common, but there were occasional large spikes, likely driven by market events. In this time period there is no buy activity, while there are some periodic large sell-offs.

Next-day returns are roughly bell-shaped and centered around zero, as is typical for daily stock returns. Net change and portfolio impact show occasional extreme values, meaning specific trades had a notable effect on portfolio fluctuations.

Overall, AAPL shows generally stable trading, with periods of heightened activity and volatility.

#### HD

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant HD features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
hd_columns = [f"{feature}_HD" for feature in features]

# Filter dataset for HD, dropping rows with NaNs
df_hd = df_sf[hd_columns].dropna()

# Set up color palette for HD
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_HD"
    if col_name in df_hd.columns:
        sns.histplot(data=df_hd, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of HD {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_HD", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

The price distributions (Open, High, Low, Close, and Adjusted Close) for Home Depot (HD) show a relatively even spread between 340 and 420, with no sharp concentration in any particular range. The KDE lines suggest a stable price movement with some fluctuations.

Trading volume follows a right-skewed pattern, meaning most days saw moderate trading activity, but a few high-volume spikes occurred, likely due to external market factors. The Buy and Sell activity histograms are empty, indicating no recorded transactions in these categories. Holdings remain constant, suggesting a long-term investment strategy without frequent trading.

Next-day returns follow a normal distribution, meaning small daily fluctuations in price are common. Portfolio impact shows a centered distribution around zero, with some days having notable positive or negative effects on the portfolio.

Overall, HD stock appears relatively stable, with consistent trading volume and limited active buying or selling activity.

#### MSFT

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant MSFT features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
msft_columns = [f"{feature}_MSFT" for feature in features]

# Filter dataset for MSFT, dropping rows with NaNs
df_msft = df_sf[msft_columns].dropna()

# Set up color palette for MSFT
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_MSFT"
    if col_name in df_msft.columns:
        sns.histplot(data=df_msft, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of MSFT {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_MSFT", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

Microsoft (MSFT) has a fairly normal price distribution, with most trading happening between 400 and 440 across Open, High, Low, and Close prices. There’s a slight skew towards higher prices, but overall, the price movements appear stable without major spikes.

Trading volume is right-skewed, meaning most days saw moderate activity, but there were some big trading days. Buy activity is non-existent, while Sell activity is minimal but occasionally spikes. The holdings data shows two distinct levels, indicating potential portfolio rebalancing at set intervals.

Next-day returns are approximately normally distributed, suggesting that MSFT’s daily price changes are mostly small. The portfolio impact distribution is centered around zero, though some outlier days had a significant effect on the overall portfolio value.

Overall, MSFT looks like a stable, steadily traded stock in this portfolio, with occasional volume spikes.

#### V

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant V features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
v_columns = [f"{feature}_V" for feature in features]

# Filter dataset for V, dropping rows with NaNs
df_v = df_sf[v_columns].dropna()

# Set up color palette for V
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_V"
    if col_name in df_v.columns:
        sns.histplot(data=df_v, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of V {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_V", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

Visa's (V) stock analysis reveals some interesting patterns. The histograms for its open, high, and low prices show a strong concentration around the 270–290 range, with some outliers in the higher 300s. This suggests that for most of the analyzed period, Visa’s stock traded within a relatively narrow range, with occasional spikes.

The volume histogram is strongly right-skewed, with most trading days seeing lower volume and a few days experiencing significantly higher spikes.

The buy, sell, and holding histograms show that Visa was primarily held with little change in positions, which could indicate confidence in long-term value rather than active trading.

Overall, Visa’s stock behavior in this analysis suggests stability, with occasional movements.

#### WMT

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant WMT features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
wmt_columns = [f"{feature}_WMT" for feature in features]

# Filter dataset for WMT, dropping rows with NaNs
df_wmt = df_sf[wmt_columns].dropna()

# Set up color palette for WMT
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_WMT"
    if col_name in df_wmt.columns:
        sns.histplot(data=df_wmt, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of WMT {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_WMT", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

The Walmart (WMT) histograms show that most of its opening, high, and low prices are clustered in the lower price range, around 60-80, with a few outliers above 90.

The closing and adjusted close prices follow a similar pattern, suggesting that the stock has mostly traded in a stable range over time. The volume histogram is heavily skewed, meaning there were more days with lower trading volume, but some days saw significant spikes.

The next-day return distribution is centered around zero, meaning small daily fluctuations. The net change and portfolio impact histograms also show a roughly normal distribution, indicating most changes were minor, with a few larger shifts. Overall, WMT stock seems relatively stable, with occasional price swings but no extreme volatility.

#### MCD

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant MCD features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
mcd_columns = [f"{feature}_MCD" for feature in features]

# Filter dataset for MCD, dropping rows with NaNs
df_mcd = df_sf[mcd_columns].dropna()

# Set up color palette for MCD
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_MCD"
    if col_name in df_mcd.columns:
        sns.histplot(data=df_mcd, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of MCD {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_MCD", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

Looking at the MCD stock data, we can see that its opening, high, and low prices are mostly clustered around the 260-300 range, with some occasional outliers. The distributions are slightly skewed, suggesting some volatility.

The closing and adjusted closing prices follow a similar trend, reinforcing that MCD stock generally fluctuates within this range. The trading volume histogram is heavily right-skewed, meaning most days have relatively low trading volumes, but there are occasional spikes.

The next-day return distribution is centered around 0, meaning daily returns are fairly balanced with no extreme trends.

The portfolio impact histogram is close to a normal distribution, meaning small gains and losses are more frequent, with fewer extreme changes.

Overall, MCD’s stock movement seems fairly steady, with occasional volume spikes and moderate daily fluctuations.

#### JNJ

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant JNJ features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
jnj_columns = [f"{feature}_JNJ" for feature in features]

# Filter dataset for JNJ, dropping rows with NaNs
df_jnj = df_sf[jnj_columns].dropna()

# Set up color palette for JNJ
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_JNJ"
    if col_name in df_jnj.columns:
        sns.histplot(data=df_jnj, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of JNJ {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_JNJ", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

The JNJ stock data shows a spread in opening, high, and low prices, with a slight skew. The closing price follows a similar pattern. Trading volume is right-skewed, meaning most days had lower trading volumes with a few high-volume outliers. There is no buy activity, but there is some sell activity. Portfolio impact looks roughly normally distributed, suggesting balanced fluctuations. The net change has an extreme outlier, indicating a large shift in holdings on a particular day. Overall, JNJ stock data seems relatively stable, but with occasional spikes in activity.

#### GOOG

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant GOOG features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
goog_columns = [f"{feature}_GOOG" for feature in features]

# Filter dataset for GOOG, dropping rows with NaNs
df_goog = df_sf[goog_columns].dropna()

# Set up color palette for GOOG
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_GOOG"
    if col_name in df_goog.columns:
        sns.histplot(data=df_goog, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of GOOG {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_GOOG", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

GOOG’s price distributions have most trading activity around 160–180. Trading volume is right-skewed, showing occasional high-volume days, likely due to market events. Buy activity is absent, while Sell activity is minimal, suggesting low trading frequency in this portfolio. Holdings show a bimodal pattern, indicating periodic adjustments. Next-Day Returns follow a normal distribution, and portfolio impact is mostly neutral, though some extreme changes suggest occasional significant influence. Overall, GOOG stock appears stable with some rare high-volume movements.

#### PEP

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant PEP features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
pep_columns = [f"{feature}_PEP" for feature in features]

# Filter dataset for PEP, dropping rows with NaNs
df_pep = df_sf[pep_columns].dropna()

# Set up color palette for PEP
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 4, 5), sharex=False)

# Iterate over features to generate separate histograms
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_PEP"
    if col_name in df_pep.columns:
        sns.histplot(data=df_pep, x=col_name, ax=axes[col_idx],
                     color=feature_colors[feature], kde=True)
        axes[col_idx].set_title(f" Histogram of PEP {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_xlabel(f"{feature}_PEP", fontsize=10)
        axes[col_idx].set_ylabel("Frequency", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

PEP’s price distributions show most trading occurring between 160–175. Trading volume is right-skewed, indicating that while most days had moderate activity, occasional spikes occurred. Buy activity is absent, and Sell activity is minimal, suggesting limited trades. Holdings exhibit a bimodal distribution, possibly due to periodic portfolio adjustments. Next-Day Returns follow a normal distribution, while portfolio impact is mostly neutral, though occasional shifts suggest some high-impact trading days.
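The per-ticker histogram cells above repeat the same column selection with only the ticker symbol changed. One possible way to factor that out is a small helper like the sketch below. The function name `select_ticker`, the shortened feature list, and the demo frame are all hypothetical and not part of our notebook.

```python
import pandas as pd

FEATURES = ["Open", "Close", "Volume"]  # shortened list for the sketch


def select_ticker(df: pd.DataFrame, ticker: str, features=FEATURES) -> pd.DataFrame:
    """Return the `<feature>_<ticker>` columns that exist in df, with NaN rows dropped."""
    wanted = [f"{feature}_{ticker}" for feature in features]
    present = [col for col in wanted if col in df.columns]
    return df[present].dropna()


# Hypothetical wide frame with two made-up tickers
demo = pd.DataFrame(
    {
        "Open_AAA": [1.0, 2.0, 3.0],
        "Close_AAA": [1.5, 2.5, 3.5],
        "Volume_AAA": [100, 200, 300],
        "Open_BBB": [9.0, None, 7.0],
        "Close_BBB": [9.5, 8.5, 7.5],
    }
)

for ticker in ["AAA", "BBB"]:
    subset = select_ticker(demo, ticker)
    print(ticker, subset.shape, list(subset.columns))
```

With a helper of this kind, the plotting code could loop over the ten tickers in `df_sf` instead of being copied once per stock.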

## Scott Franklin's Portfolio Strategy & Analysis

1. Diversification Across Sectors

  - The portfolio is spread across six different sectors, reducing sector-specific risks:

    - Technology: Apple (AAPL), Microsoft (MSFT)
    - Financials: JPMorgan Chase (JPM), Visa (V)
    - Consumer Staples: Walmart (WMT), PepsiCo (PEP)
    - Consumer Discretionary: Home Depot (HD), McDonald's (MCD)
    - Health Care: Johnson & Johnson (JNJ)
    - Communication Services: Alphabet (GOOG)

  - This mix balances high-growth tech stocks with stable blue-chip companies, creating a well-rounded investment strategy.

2. Heavy Emphasis on Large-Cap, Blue-Chip Stocks

  - All 10 companies are mega-cap stocks, each with a market capitalization of over $200 billion.

  - The presence of defensive stocks like JNJ, PEP, and WMT suggests a focus on stability, while tech giants AAPL, MSFT, and GOOG provide high-growth opportunities.

3. Connection to Scott Franklin’s Investment Strategy

  - The portfolio suggests an institutional-style investment approach focused on lower-risk investments in established companies with solid return potential, rather than speculative or small-cap stocks.

4. Potential Strategy Insights

  - The portfolio is low-volatility, with stocks that historically weather economic downturns well.

### Boxplots

#### JPM

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant JPM features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
jpm_columns = [f"{feature}_JPM" for feature in features]

# Filter dataset for JPM
df_jpm = df_sf[jpm_columns].dropna()

# Set up color palette for JPM
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_JPM"
    if col_name in df_jpm.columns:
        sns.boxplot(y=df_jpm[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"JPM {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

The boxplots for JPM show that its stock prices (Open, High, Low, Close, and Adj Close) are fairly spread out, with a median around 200-220 and some outliers. Volume has a right-skewed distribution, with occasional high trading spikes. There’s almost no Buy activity and minimal Sell activity, with a few extreme sell outliers. Holdings remain stable, suggesting a long-term position. Next-Day Returns are mostly centered around 0%, but some outliers show occasional volatility. Net Change and Portfolio Impact have some big outliers, indicating that JPM sometimes has significant portfolio effects. Overall, JPM looks like a stable holding with occasional large market movements.
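The "outliers" in these boxplots follow the usual 1.5 × IQR rule, which is the default whisker length in seaborn and matplotlib. The sketch below applies that rule by hand to a made-up sell column (the amounts are hypothetical) to show why every non-zero trade is flagged when most days have no trade at all.

```python
import pandas as pd

# Hypothetical sell amounts: zero on most days, with two large trades
sells = pd.Series([0.0] * 18 + [8000.0, 32500.0], name="Sell_XYZ")

# Quartiles and interquartile range
q1, q3 = sells.quantile([0.25, 0.75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are drawn as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = sells[(sells < lower) | (sells > upper)]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(outliers)
```

Because both quartiles are zero here, the box collapses to a line and any trade at all falls outside the whiskers, so outliers in the Buy and Sell panels mostly mark trading days rather than unusual behavior.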

#### AAPL

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant AAPL features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
aapl_columns = [f"{feature}_AAPL" for feature in features]

# Filter dataset for AAPL
df_aapl = df_sf[aapl_columns].dropna()

# Set up color palette for AAPL
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_AAPL"
    if col_name in df_aapl.columns:
        sns.boxplot(y=df_aapl[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"AAPL {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

AAPL's stock prices are stable, with a median around 220 and some price variation. Volume is right-skewed, showing occasional trading spikes. There is no Buy activity and minimal Sell activity, with some large sell outliers. Holdings are steady, suggesting a long-term position. Next-Day Returns are centered around 0%, but outliers indicate occasional volatility. Net Change and Portfolio Impact show some large movements, but overall, AAPL seems to be a stable core holding.

#### HD

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant HD features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
hd_columns = [f"{feature}_HD" for feature in features]

# Filter dataset for HD
df_hd = df_sf[hd_columns].dropna()

# Set up color palette for HD
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_HD"
    if col_name in df_hd.columns:
        sns.boxplot(y=df_hd[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"HD {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

HD's stock price is stable, with a median around 380-400. Trading volume has occasional spikes, but overall activity is moderate. No Buy or Sell activity, indicating a long-term hold strategy. Holdings remain steady, suggesting confidence in the stock. Next-Day Returns are small, with limited volatility. Portfolio Impact is mostly neutral, with occasional fluctuations. HD appears to be a low-risk, steady investment in the portfolio.

####MSFT

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant MSFT features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
msft_columns = [f"{feature}_MSFT" for feature in features]

# Filter dataset for MSFT
df_msft = df_sf[msft_columns].dropna()

# Set up color palette for MSFT
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_MSFT"
    if col_name in df_msft.columns:
        sns.boxplot(y=df_msft[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"MSFT {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

MSFT's stock price ranges around 410-430, with some outliers indicating occasional volatility. Trading volume has spikes, suggesting increased activity during specific events. No Buy activity and minimal Sell activity, implying a long-term holding strategy. Holdings remain consistent, reflecting confidence in the stock. Next-Day Returns show slight variations, but overall, MSFT remains a stable investment with occasional high-impact movements.

####V

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant V features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
v_columns = [f"{feature}_V" for feature in features]

# Filter dataset for V
df_v = df_sf[v_columns].dropna()

# Set up color palette for V
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_V"
    if col_name in df_v.columns:
        sns.boxplot(y=df_v[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"V {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

Visa's stock price ranges around 280-300, showing moderate fluctuations. Volume has occasional spikes, suggesting increased activity on some days. No Buy or Sell activity, indicating a long-term hold strategy. Holdings remain steady, showing confidence in the stock. Next-Day Returns are mostly stable, with some outliers suggesting occasional volatility, but overall, Visa appears to be a consistent investment with limited risk.

####WMT

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant WMT features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
wmt_columns = [f"{feature}_WMT" for feature in features]

# Filter dataset for WMT
df_wmt = df_sf[wmt_columns].dropna()

# Set up color palette for WMT
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_WMT"
    if col_name in df_wmt.columns:
        sns.boxplot(y=df_wmt[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"WMT {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

Walmart's stock price fluctuates between 70-90, showing moderate stability. Trading volume has occasional spikes, suggesting periods of increased market activity. No Buy activity and minimal Sell activity, reinforcing a long-term holding strategy. Holdings remain consistent, showing confidence in the stock. Next-Day Returns are mostly stable, with some outliers indicating occasional volatility. Overall, Walmart appears to be a steady investment with controlled risk.

####MCD

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant MCD features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
mcd_columns = [f"{feature}_MCD" for feature in features]

# Filter dataset for MCD
df_mcd = df_sf[mcd_columns].dropna()

# Set up color palette for MCD
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_MCD"
    if col_name in df_mcd.columns:
        sns.boxplot(y=df_mcd[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"MCD {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

McDonald's stock price ranges between 270-300, showing moderate stability. Trading volume has occasional spikes, indicating periods of higher market interest. No Buy activity and minimal Sell activity, suggesting a long-term holding approach. Holdings remain stable, reinforcing confidence in the stock. Next-Day Returns are mostly consistent, with a few outliers indicating short-term volatility. Overall, McDonald's appears to be a steady investment with controlled risk and occasional market fluctuations.

####JNJ

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant JNJ features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
jnj_columns = [f"{feature}_JNJ" for feature in features]

# Filter dataset for JNJ
df_jnj = df_sf[jnj_columns].dropna()

# Set up color palette for JNJ
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_JNJ"
    if col_name in df_jnj.columns:
        sns.boxplot(y=df_jnj[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"JNJ {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

Johnson & Johnson's stock price fluctuates between 145-165, showing consistent stability. Trading volume spikes occasionally, likely due to major market events. Minimal Sell activity and no Buy transactions, indicating a long-term holding approach. Holdings remain stable, suggesting strong confidence in the stock. Next-Day Returns show low volatility, though a few outliers suggest occasional fluctuations. Overall, JNJ appears to be a steady, low-risk investment with limited short-term trading activity.

####GOOG

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant GOOG features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
goog_columns = [f"{feature}_GOOG" for feature in features]

# Filter dataset for GOOG
df_goog = df_sf[goog_columns].dropna()

# Set up color palette for GOOG
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_GOOG"
    if col_name in df_goog.columns:
        sns.boxplot(y=df_goog[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"GOOG {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

Google's (GOOG) stock price ranges between 140-200, with moderate volatility. Trading volume spikes frequently, indicating recurring bursts of market interest. Minimal Sell activity and no Buy transactions, suggesting a long-term holding strategy. Holdings remain stable, reinforcing confidence in the stock. Next-Day Returns look roughly symmetric, with a few outliers hinting at occasional price jumps. Overall, GOOG appears to be a steadily held investment with periodic trading volume surges.

####PEP

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Define relevant PEP features
features = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Format column names for consistency
pep_columns = [f"{feature}_PEP" for feature in features]

# Filter dataset for PEP
df_pep = df_sf[pep_columns].dropna()

# Set up color palette for PEP
palette = sns.color_palette("husl", len(features))
feature_colors = {feature: palette[i] for i, feature in enumerate(features)}

# Set up grid layout: 1 row, multiple columns (one for each feature)
fig, axes = plt.subplots(1, len(features), figsize=(len(features) * 3.5, 5), sharex=False)

# Iterate over features to generate separate boxplots
for col_idx, feature in enumerate(features):
    col_name = f"{feature}_PEP"
    if col_name in df_pep.columns:
        sns.boxplot(y=df_pep[col_name], ax=axes[col_idx], color=feature_colors[feature])
        axes[col_idx].set_title(f"PEP {feature}", fontsize=12, fontweight='bold')
        axes[col_idx].set_ylabel("Value", fontsize=10)

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

PepsiCo (PEP) trades between 145-180, showing moderate price fluctuations. Trading volume has occasional spikes, likely due to market events. Minimal Sell activity and no Buy transactions, suggesting a long-term holding strategy. Holdings remain stable, indicating confidence in the stock. Next-Day Returns look roughly symmetric, with a few outliers suggesting occasional price swings. Overall, PEP appears to be a steady investment with low trading activity and periodic volume surges.

Scott Franklin’s Investment Strategy Based on Boxplot Analysis

General Strategy:

- Long-Term Holding Approach: Minimal buy/sell activity suggests low trading frequency, indicating a buy-and-hold strategy rather than active trading.

- Risk Control & Consistency: Stocks show moderate volatility with controlled downside risks, avoiding extreme price swings.

Boxplot Insights Across Stocks:

- Price Movements: Most stocks trade within a defined range with minimal outliers, showing stability and predictable price action.
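
The boxplot readings above (medians, spread, "a few outliers") can also be checked numerically. The sketch below is one possible way to do that, not part of the original analysis: `boxplot_summary` and `FEATURES` are hypothetical names, and the function assumes the same `{feature}_{ticker}` column naming used in `df_sf`. It applies the 1.5 x IQR whisker rule that seaborn's boxplots use by default.

```python
import pandas as pd

# Same feature list as the boxplot cells above
FEATURES = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Buy", "Sell",
            "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]


def boxplot_summary(df, ticker, features=FEATURES):
    """Median, quartiles, IQR and outlier count per feature for one ticker.

    Hypothetical helper: assumes columns are named "<feature>_<ticker>".
    """
    cols = [f"{f}_{ticker}" for f in features if f"{f}_{ticker}" in df.columns]
    data = df[cols].dropna()  # same row filtering as the plotting cells

    rows = {}
    for col in cols:
        s = data[col]
        q1, median, q3 = s.quantile([0.25, 0.5, 0.75])
        iqr = q3 - q1
        # Points beyond 1.5 * IQR from the box are drawn as outliers
        is_outlier = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)
        feature = col[: -len(ticker) - 1]  # strip the "_<ticker>" suffix
        rows[feature] = {
            "median": median,
            "q1": q1,
            "q3": q3,
            "iqr": iqr,
            "n_outliers": int(is_outlier.sum()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")
```

Calling `boxplot_summary(df_sf, "AAPL")` would return one row per feature, which makes it easier to compare tickers side by side than reading twelve separate panels.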

##Bivariate

### Correlation Matrix

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Extract tickers from column names while excluding non-ticker suffixes
tickers = {col.split('_')[-1] for col in df_sf.columns
           if '_' in col and col.split('_')[-1] not in ['Shares', 'Held']}

# Define relevant features
features = ["Open", "High", "Low", "Close", "Volume", "Adj Close", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Number of tickers
num_tickers = len(tickers)

# Create subplots (one per ticker) with a larger size
fig, axes = plt.subplots(num_tickers, 1, figsize=(10, num_tickers * 7), squeeze=False)

# Loop through tickers to compute and plot correlation matrices
for idx, ticker in enumerate(sorted(tickers)):  # Sort tickers alphabetically for consistency
    # Select relevant columns for the ticker
    ticker_cols = [f"{feature}_{ticker}" for feature in features if f"{feature}_{ticker}" in df_sf.columns]

    # Extract data and drop missing values
    ticker_data = df_sf[ticker_cols].dropna()

    # Ensure there are at least two columns for correlation calculation
    if ticker_data.shape[1] < 2:
        print(f"Skipping correlation matrix for {ticker} due to insufficient data.")
        axes[idx, 0].set_visible(False)  # Hide the subplot that would otherwise stay empty
        continue

    # Compute correlation matrix
    ticker_corr = ticker_data.corr()

    # Plot heatmap
    ax = axes[idx, 0]
    sns.heatmap(ticker_corr, annot=True, cmap="coolwarm", fmt=".2f", ax=ax)
    ax.set_title(f"Correlation Matrix for {ticker.upper()}")

plt.tight_layout()
plt.show()


1. JPM (JP Morgan)
- Strong correlation between Buy and Next_Day_Return, suggesting that buying activity is a predictor of next-day performance.
- Moderate negative correlation between Sell and Portfolio_Impact, meaning that selling reduces portfolio gains.
- Net_Change positively correlated with Adj Close, implying price movements influence adjusted closing prices.

2. AAPL (Apple)
- High correlation between Open, High, Low, and Close, meaning Apple has stable intraday price movement.
- Strong positive correlation between Volume and Net_Change, suggesting increased trading activity drives price swings.
- Moderate correlation between Buy and Holding, meaning investors tend to hold positions post-purchase.

3. HD (Home Depot)
- Low correlation between Volume and Close, meaning price is not always affected by trading volume.
- Portfolio_Impact strongly correlated with Net_Change, suggesting short-term gains directly influence total portfolio value.
- Buy signals weakly correlated with Next_Day_Return, indicating purchases don’t immediately translate into gains.

4. MSFT (Microsoft)
- High correlation between Close and Adj Close, meaning adjustments for dividends have minimal effect.
- Volume and Net_Change moderately correlated, meaning larger trading days coincide with price changes.
- Buy and Sell activity moderately correlated, suggesting both bulls and bears actively trade Microsoft.

5. V (Visa)
- Buy signals highly correlated with Holding, suggesting investors accumulate Visa for long-term growth.
- Low correlation between Volume and Price Changes, meaning Visa trades are stable and not heavily influenced by market fluctuations.
- Net_Change and Portfolio_Impact highly correlated, indicating Visa contributes significantly to portfolio gains.

6. WMT (Walmart)
- High correlation between Open and Close, meaning Walmart shows low intraday volatility.
- Sell activity negatively correlated with Net_Change, meaning high sell-offs lead to declines.
- Volume and Portfolio_Impact weakly correlated, suggesting trading volume does not significantly affect portfolio performance.

7. MCD (McDonald's)
- Strong correlation between Adj Close and Close, meaning dividends and stock splits have minimal impact.
- Moderate correlation between Holding and Next_Day_Return, indicating long-term holders benefit from stable returns.
- Low correlation between Buy and Next_Day_Return, suggesting buying activity does not immediately translate into gains.

8. JNJ (Johnson & Johnson)
- High correlation between Portfolio_Impact and Net_Change, showing JNJ contributes significantly to portfolio returns.
- Low correlation between Volume and Close, meaning JNJ is less influenced by daily trading fluctuations.
- Sell and Net_Change negatively correlated, suggesting sell-offs reduce stock gains.

9. GOOG (Google)
- Strong correlation between Volume and Net_Change, suggesting price swings are driven by heavy trading activity.
- Buy and Next_Day_Return moderately correlated, meaning purchasing Google stocks can lead to short-term gains.
- Negative correlation between Sell and Portfolio_Impact, meaning large sell-offs impact long-term gains.

10. PEP (PepsiCo)
- High correlation between Holding and Portfolio_Impact, meaning Pepsi’s long-term investors benefit from steady portfolio gains.
- Weak correlation between Volume and Price Changes, indicating Pepsi is a low-volatility stock.
- Buy and Sell show low correlation, meaning market sentiment doesn’t fluctuate drastically.
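
Reading individual cells off ten heatmaps is error-prone, so the strongest pairs can also be listed directly. The following is a minimal sketch under the assumption that columns follow the `{feature}_{ticker}` naming of `df_sf`; `top_correlations` is a hypothetical helper, not something defined elsewhere in this notebook. Constant columns (for example a Buy column that is always zero) are dropped first because their correlation is undefined.

```python
import numpy as np
import pandas as pd


def top_correlations(df, ticker, features, n=5, method="pearson"):
    """Return the n feature pairs with the largest absolute correlation."""
    cols = [f"{f}_{ticker}" for f in features if f"{f}_{ticker}" in df.columns]
    data = df[cols].dropna()

    # Zero-variance columns only produce NaN correlations, so remove them
    data = data.loc[:, data.nunique() > 1]

    corr = data.corr(method=method)

    # Keep the strict upper triangle so every pair appears exactly once
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()

    # Rank by absolute value but report the signed coefficient
    order = pairs.abs().sort_values(ascending=False).index
    return pairs.loc[order].head(n)
```

For example, `top_correlations(df_sf, "JPM", features)` would show whether the Buy and Next_Day_Return pair really ranks among the strongest relationships for JPM.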


## Pair Plots

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Extract tickers from column names while excluding non-ticker suffixes
tickers = {col.split('_')[-1] for col in df_sf.columns
           if '_' in col and col.split('_')[-1] not in ['Shares', 'Held']}

# Define relevant features
features = ["Open", "High", "Low", "Close", "Volume", "Adj Close", "Buy", "Sell", "Holding", "Next_Day_Return", "Net_Change", "Portfolio_Impact"]

# Loop through tickers and generate pairplots
for ticker in sorted(tickers):  # Sort for consistent order
    # Select relevant columns for the ticker
    ticker_cols = [f"{feature}_{ticker}" for feature in features if f"{feature}_{ticker}" in df_sf.columns]

    # Extract data and drop missing values
    ticker_data = df_sf[ticker_cols].dropna()

    # Ensure there are at least two numeric columns
    if ticker_data.shape[1] < 2:
        print(f"Skipping pairplot for {ticker} due to insufficient data.")
        continue

    # Rename columns for better readability in pairplot
    ticker_data = ticker_data.rename(columns={col: col.replace(f"_{ticker}", "") for col in ticker_cols})

    # Generate pairplot
    print(f"Generating pairplot for {ticker}...")
    sns.pairplot(ticker_data, diag_kind="kde", corner=True)  # `corner=True` removes duplicate plots
    plt.suptitle(f"Pairplot for {ticker.upper()}", y=1.02)
    plt.show()

- Tech stocks (AAPL, MSFT, GOOG) react to trading volume and exhibit momentum-driven behavior.
- Defensive stocks (JNJ, MCD, PEP) show minimal impact from short-term trading, making them stable investments.
- Visa and Home Depot contribute steadily to portfolio performance with low volatility.
- Google is highly reactive to trading activity, making it attractive for traders.



## Hypothesis SF

### Does higher Volume_HD (x-axis) correlate with lower Close_JNJ (y-axis), and is Close_JNJ related to Close_V (by color)?

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import spearmanr

# Define relevant columns
volume_hd = "Volume_HD"
close_jnj = "Close_JNJ"
close_v = "Close_V"

# Drop missing values for clean analysis
df_clean = df_sf[[volume_hd, close_jnj, close_v]].dropna()

# Compute correlation coefficients
corr_volume_close_jnj, _ = spearmanr(df_clean[volume_hd], df_clean[close_jnj])
corr_volume_close_v, _ = spearmanr(df_clean[volume_hd], df_clean[close_v])
corr_jnj_v, _ = spearmanr(df_clean[close_jnj], df_clean[close_v])

# Print correlation results
print(f"Spearman correlation between {volume_hd} and {close_jnj}: {corr_volume_close_jnj:.4f}")
print(f"Spearman correlation between {volume_hd} and {close_v}: {corr_volume_close_v:.4f}")
print(f"Spearman correlation between {close_jnj} and {close_v}: {corr_jnj_v:.4f}")

# --- First Scatter Plot (Without Trend Line) ---
plt.figure(figsize=(10, 6))
scatter = plt.scatter(
    df_clean[volume_hd],
    df_clean[close_jnj],
    c=df_clean[close_v],
    cmap='coolwarm',
    edgecolor='k',
    alpha=0.7
)
plt.colorbar(label='Close_V')
plt.xlabel("Volume_HD")
plt.ylabel("Close_JNJ")
plt.title("Multivariate Scatter Plot: Volume_HD vs Close_JNJ (colored by Close_V)")
plt.show()

# --- Second Scatter Plot (With Trend Line) ---
plt.figure(figsize=(10, 6))
scatter = plt.scatter(
    df_clean[volume_hd],
    df_clean[close_jnj],
    c=df_clean[close_v],
    cmap='coolwarm',
    edgecolor='k',
    alpha=0.7
)
plt.colorbar(label='Close_V')

# Add a regression trend line
sns.regplot(
    x=df_clean[volume_hd],
    y=df_clean[close_jnj],
    scatter=False,  # Hide seaborn scatter points
    line_kws={"color": "black", "linewidth": 2, "linestyle": "--"}  # Trend line style
)

# Labels and title
plt.xlabel("Volume_HD")
plt.ylabel("Close_JNJ")
plt.title("Multivariate Scatter Plot with Trend Line: Volume_HD vs Close_JNJ (colored by Close_V)")

plt.show()


1. Why These Three Stocks?

  For this hypothesis, we selected Home Depot (HD), Johnson & Johnson (JNJ), and Visa (V) from Scott Franklin’s portfolio because they are blue-chip stocks that are part of the Dow Jones Industrial Average (DJIA). These companies represent leaders in their respective sectors:

  - Home Depot (HD) – A dominant player in the home improvement retail industry.
  - Johnson & Johnson (JNJ) – A global healthcare and pharmaceutical giant.
  - Visa (V) – A leader in the digital payments sector.

  The reason for choosing these stocks is to examine whether changes in trading volume for Home Depot (HD) impact Johnson & Johnson (JNJ)’s closing price and whether Visa’s (V) closing price correlates with this relationship. Since Dow Jones blue-chip stocks often move in response to broader market trends, investor sentiment, and macroeconomic factors, analyzing their interactions can provide insights into sector rotation and institutional trading behavior.

2. Hypothesis:

  We test whether higher Volume_HD (x-axis) correlates with lower Close_JNJ (y-axis) and whether Close_JNJ is related to Close_V (by color).

3. Observations of Scatter Plots:

  No strong visible pattern, but some downward movement in Close_JNJ as Volume_HD increases.

  The correlation between Close_JNJ and Close_V is negative, contradicting expectations: more of the red dots (higher Close_V prices) sit toward the bottom of the scatter plot, where Close_JNJ is lower.

4. Conclusion:

  Overall, we reject the hypothesis as stated.

  We do fail to reject its first part, that higher Volume_HD correlates with lower Close_JNJ, since a weak negative trend is observed.

  Close_JNJ and Close_V, by contrast, are negatively correlated, which was unexpected.

  The relationships suggest sector-specific movements rather than strong cross-stock dependency within this portfolio.

  All in all, this analysis suggests that while Home Depot's trading activity may have a minor influence on Johnson & Johnson’s stock price, the relationship between JNJ and Visa is not as expected, indicating that different factors may be driving their price movements independently.
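
The cell above keeps only the Spearman coefficients and discards the p-values, so the "reject / fail to reject" wording rests on the visual trend alone. A sketch of how the p-values could be reported alongside the coefficients is shown below. `spearman_report` and the 0.05 threshold are assumptions made for illustration, not choices taken from the original analysis.

```python
import pandas as pd
from scipy.stats import spearmanr


def spearman_report(df, pairs, alpha=0.05):
    """Spearman rho, two-sided p-value and sample size for each column pair.

    `alpha` is an assumed significance level, not one stated in the notebook.
    """
    rows = []
    for x, y in pairs:
        clean = df[[x, y]].dropna()
        rho, p_value = spearmanr(clean[x], clean[y])
        rows.append({
            "x": x,
            "y": y,
            "rho": rho,
            "p_value": p_value,
            "n": len(clean),
            "significant": p_value < alpha,
        })
    return pd.DataFrame(rows)
```

With the notebook's data this could be called as `spearman_report(df_sf, [("Volume_HD", "Close_JNJ"), ("Volume_HD", "Close_V"), ("Close_JNJ", "Close_V")])`. A weak negative rho with a large p-value would mean the downward movement seen in the scatter plot cannot be distinguished from noise.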

# Time Series

In [None]:
import matplotlib.pyplot as plt

# Sort chronologically (assumes the index is already the Date index)
df_sf = df_sf.sort_index()

# Plot each stock's closing price
plt.figure(figsize=(12, 6))

for col in df_sf.columns:
    if col.startswith("Close_"):  # Only Close_ columns; "Adj Close_" columns are skipped
        plt.plot(df_sf.index, df_sf[col], label=col.replace("Close_", ""))

# Formatting the plot
plt.xlabel("Date")
plt.ylabel("Closing Price")
plt.title("Time Series of Stock Closing Prices in df_sf")
plt.legend(title="Stocks")
plt.grid(True)

# Show the plot
plt.show()

In [None]:
import matplotlib.pyplot as plt

# Sort chronologically (assumes the index is already the Date index)
df_sf = df_sf.sort_index()

# Loop through each stock and plot separately
for col in df_sf.columns:
    if col.startswith("Close_"):  # Only Close_ columns; "Adj Close_" columns are skipped
        stock_name = col.replace("Close_", "")  # Extract stock ticker

        # Create a new figure for each stock
        plt.figure(figsize=(10, 5))
        plt.plot(df_sf.index, df_sf[col], label=stock_name, color='blue')

        # Formatting the plot
        plt.xlabel("Date")
        plt.ylabel("Closing Price")
        plt.title(f"Time Series of {stock_name}")
        plt.legend()
        plt.grid(True)

        # Show the plot
        plt.show()

1. AAPL (Apple Inc.)
- Trend: Apple has a strong upward trend with occasional pullbacks, suggesting that its stock price has generally risen over time, with minor dips.
- Volatility: Moderate volatility, meaning the stock price shows some fluctuations but remains within a relatively predictable range.

2. MSFT (Microsoft Corp.)
- Trend: The trend for Microsoft is sideways with a mild uptrend, indicating that while the stock price doesn't have significant increases, there is a gradual rise over time.
- Volatility: Low volatility, suggesting that Microsoft's stock price is relatively stable and less prone to large swings.

3. JPM (JP Morgan)
- Trend: Highly volatile with large swings, meaning the price of JPM stock tends to move significantly up and down.
- Volatility: High volatility, with noticeable fluctuations that can lead to sharp changes in the stock price over short periods.

4. MCD (McDonald's Corp.)
- Trend: McDonald's has a steady upward trend, indicating consistent growth over time.
- Volatility: Low volatility, suggesting McDonald's stock price moves in a stable manner without large fluctuations.

5. V (Visa Inc.)
- Trend: Visa's price shows a smooth upward trend with mild pullbacks, implying steady growth with only occasional dips.
- Volatility: Low volatility, as Visa's price remains relatively stable with mild fluctuations.

6. WMT (Walmart)
- Trend: Walmart's stock shows a modest upward trend with some stagnation, meaning the price rises at a slower pace with some periods where the stock price doesn't grow significantly.
- Volatility: Moderate volatility, reflecting a moderate amount of fluctuation in stock prices.

7. JNJ (Johnson & Johnson)
- Trend: Johnson & Johnson exhibits a steady upward trend, meaning the stock price has generally increased over time without significant downward movements.
- Volatility: Low volatility, indicating relatively stable performance with few fluctuations.

8. HD (Home Depot Inc.)
- Trend: Home Depot's stock has a strong upward trend, especially during periods of housing booms, suggesting it performs well when the housing market is strong.
- Volatility: Low to moderate volatility, reflecting that while there are some fluctuations, the overall price movement is stable compared to other highly volatile stocks.

9. GOOG (Alphabet Inc.)
- Trend: Alphabet's stock shows a consistent upward trend with occasional volatility, meaning the stock tends to rise over time but can experience noticeable fluctuations.
- Volatility: Moderate volatility, indicating moderate price swings around the general upward trend.

10. PEP (PepsiCo Inc.)
- Trend: PepsiCo has a steady, gradual upward trend, suggesting consistent growth over time with no sudden changes.
- Volatility: Low volatility, meaning the stock price moves smoothly without significant fluctuations.
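
The labels "low", "moderate" and "high" volatility above come from reading the charts. One way to put numbers behind them is sketched here, under assumptions that are ours rather than the notebook's: trend is measured as the slope of a straight line fitted to log price, volatility as the standard deviation of daily returns, and both are annualized with 252 trading days. `trend_and_volatility` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def trend_and_volatility(close, trading_days=252):
    """Annualized log-price trend and annualized return volatility for one price series."""
    close = close.dropna()

    # Slope of log price against time is the average daily log growth rate
    log_price = np.log(close.to_numpy(dtype=float))
    slope = np.polyfit(np.arange(len(log_price)), log_price, 1)[0]

    # Standard deviation of simple daily returns, scaled to one year
    daily_returns = close.pct_change().dropna()

    return {
        "annualized_trend": float(slope * trading_days),
        "annualized_volatility": float(daily_returns.std() * np.sqrt(trading_days)),
    }
```

Applied to each `Close_<ticker>` column of `df_sf`, this gives a table in which the ten stocks can be ranked, so the qualitative labels can be compared on a common scale.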

In [None]:
import matplotlib.pyplot as plt

# Extract tickers
tickers = {col.split('_')[-1] for col in df_sf.columns if 'Close_' in col}

for ticker in sorted(tickers):  # Sort so the plots appear in a consistent order
    buy_col = f"Buy_{ticker}"
    sell_col = f"Sell_{ticker}"

    if buy_col in df_sf.columns and sell_col in df_sf.columns:
        plt.figure(figsize=(10, 6))  # Adjust figure size as needed

        plt.plot(df_sf.index, df_sf[buy_col], label='Buy', color='green')
        plt.plot(df_sf.index, df_sf[sell_col], label='Sell', color='red')

        plt.xlabel("Date")
        plt.ylabel("Value")
        plt.title(f"Buy/Sell Trends for {ticker}")
        plt.legend()
        plt.grid(True)
        plt.show()
    else:
        print(f"Buy or Sell column not found for {ticker}")

For each stock, the Buy and Sell trends indicate investor sentiment and market reactions to various factors (earnings reports, market movements, product launches). Stocks like MSFT, V, and AAPL show frequent buy signals, indicating investor confidence and accumulation in anticipation of growth. Defensive stocks like MCD and JNJ display less sell-off activity, reflecting their long-term hold appeal.
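
Statements about how often a stock is bought or sold can be cross-checked by counting the days with a recorded trade. The sketch below assumes that the `Buy_<ticker>` and `Sell_<ticker>` columns are numeric and that zero (or a missing value) means no trade on that day; that encoding is an assumption about `df_sf`, and `trade_activity` is a hypothetical helper.

```python
import pandas as pd


def trade_activity(df):
    """Count days with non-zero Buy/Sell values and total traded amounts per ticker."""
    prefix = "Buy_"
    tickers = sorted(col[len(prefix):] for col in df.columns if col.startswith(prefix))

    rows = {}
    for ticker in tickers:
        sell_col = f"Sell_{ticker}"
        if sell_col not in df.columns:
            continue  # Skip tickers without a matching Sell column

        # Assumption: a missing value means no trade was recorded that day
        buy = df[f"Buy_{ticker}"].fillna(0)
        sell = df[sell_col].fillna(0)

        rows[ticker] = {
            "buy_days": int((buy != 0).sum()),
            "sell_days": int((sell != 0).sum()),
            "total_bought": float(buy.sum()),
            "total_sold": float(sell.sum()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")
```

Running `trade_activity(df_sf)` would make it straightforward to compare these counts with the boxplot section, where most tickers showed little or no Buy activity.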

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from statsmodels.tsa.seasonal import seasonal_decompose

# Sort chronologically (assumes the index is already the Date index)
df_sf = df_sf.sort_index()

# Loop through each stock and perform seasonal decomposition
for col in df_sf.columns:
    if col.startswith("Close_"):  # Only Close_ columns; "Adj Close_" columns are skipped
        stock_name = col.replace("Close_", "")  # Extract stock ticker

        # Drop NaN values to avoid issues with decomposition
        stock_data = df_sf[col].dropna()

        # Ensure we have enough data points for decomposition
        if len(stock_data) < 60:
            print(f"Skipping {stock_name} due to insufficient data.")
            continue

        # Perform seasonal decomposition
        decomposition = seasonal_decompose(stock_data, model="additive", period=30)  # Adjust period as needed

        # Plot decomposition results
        plt.figure(figsize=(12, 8))

        plt.subplot(411)
        plt.plot(stock_data, label="Original Data", color="blue")
        plt.legend(loc="upper left")

        plt.subplot(412)
        plt.plot(decomposition.trend, label="Trend", color="green")
        plt.legend(loc="upper left")

        plt.subplot(413)
        plt.plot(decomposition.seasonal, label="Seasonality", color="orange")
        plt.legend(loc="upper left")

        plt.subplot(414)
        plt.plot(decomposition.resid, label="Residuals (Irregular Component)", color="red")
        plt.legend(loc="upper left")

        plt.suptitle(f"Seasonal Decomposition of {stock_name}")
        plt.tight_layout()
        plt.show()
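
An additive decomposition always returns a seasonal component, even for a series with no real seasonality, so it helps to measure how large that component is relative to the residuals. A common summary is the strength of seasonality, max(0, 1 - Var(resid) / Var(seasonal + resid)). The sketch below implements that formula; `seasonal_strength` is a hypothetical helper and was not part of the original analysis.

```python
import pandas as pd


def seasonal_strength(seasonal, resid):
    """Strength of seasonality in [0, 1]: near 0 means noise dominates, near 1 means a strong cycle."""
    # Align the two components and drop the NaN edges left by the moving-average trend
    both = pd.concat([seasonal, resid], axis=1).dropna()
    s, r = both.iloc[:, 0], both.iloc[:, 1]

    detrended_var = (s + r).var()
    if detrended_var == 0:
        return 0.0
    return float(max(0.0, 1.0 - r.var() / detrended_var))
```

Inside the loop above it could be called as `seasonal_strength(decomposition.seasonal, decomposition.resid)`. Values close to zero would suggest that the 30-day "seasonality" panel is mostly an artifact of the chosen period rather than a real cycle.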


1. AAPL (Apple Inc.)
- Trend: Shows a general upward trend, consistent with Apple's long-term growth, especially from innovation and strong market position.
- Seasonality: There’s a slight seasonal component, potentially reflecting product cycles (e.g., new iPhone releases).
- Residuals: The residuals show random fluctuations, indicating relatively clean data with no significant outliers.
- Insights: The overall trend suggests strong growth, but seasonal peaks may align with quarterly earnings reports or product launches.

2. MSFT (Microsoft Corp.)
- Trend: A steady upward trend with occasional pauses, indicating Microsoft's stable and consistent growth over time.
- Seasonality: A moderate seasonal cycle could reflect product updates or earnings seasonality.
- Residuals: The residuals are small, indicating that most of the variance in the data is explained by the trend and seasonal components.
- Insights: The data suggests a relatively predictable stock with cyclical behavior tied to quarterly earnings or market events.

3. TSLA (Tesla Inc.)
- Trend: The trend shows a sharp upward growth but with more volatility, aligning with Tesla's history of strong price movements.
- Seasonality: Some seasonality is visible, possibly tied to production cycles or product launches (like new car models or major events with Elon Musk).
- Residuals: High residuals, indicating large volatility that isn’t fully explained by the trend or seasonal cycles.
- Insights: Tesla’s stock is highly volatile, and the trend shows strong growth, but it’s often subject to large fluctuations unrelated to typical market cycles.

4. MCD (McDonald's Corp.)
- Trend: McDonald's shows a slow, stable upward trend, reflecting its reliable earnings and position as a defensive stock.
- Seasonality: There’s a slight seasonal cycle, which could align with consumer trends (e.g., summer increases in fast food consumption or holiday promotions).
- Residuals: Low residuals, indicating that the trend and seasonality explain most of the price movements.
- Insights: McDonald's is a defensive stock with a steady upward trend, and its price movements seem to reflect seasonal consumer behavior rather than major shifts in its core business.

5. V (Visa Inc.)
- Trend: Visa has a clear upward trend, indicating its dominance in the digital payments space and growth over time.
- Seasonality: Mild seasonal fluctuations, which may reflect annual cycles in consumer spending or specific financial events.
- Residuals: Low residuals, reinforcing that Visa’s price is largely explained by its trend and seasonality.
- Insights: Visa’s stock shows steady growth, with minor seasonal fluctuations potentially tied to consumer activity and payment cycles.

6. WIT (Wipro Ltd.)
- Trend: Wipro shows a flattening trend, with no significant upward movement, indicating that it might be in a consolidation phase.
- Seasonality: Mild seasonality, potentially reflecting IT outsourcing cycles or quarterly project completions.
- Residuals: Moderate residuals, indicating that there are factors affecting Wipro's stock price that are not captured by the seasonal and trend components.
- Insights: Wipro's flat trend may indicate low investor interest or market saturation, with seasonal fluctuations suggesting some periodicity linked to project timelines or industry cycles.

7. JNJ (Johnson & Johnson)
- Trend: A strong upward trend, reflecting JNJ’s long-term stability in healthcare and pharmaceuticals.
- Seasonality: Limited seasonality, which could be related to pharmaceutical cycles (e.g., quarterly earnings or regulatory approvals).
- Residuals: The residuals are very low, showing that JNJ’s price is largely explained by its underlying trend and consistent performance.
- Insights: Johnson & Johnson exhibits steady growth over time, with minimal seasonal impact, indicating its status as a defensive, reliable stock.

8. HD (Home Depot Inc.)
- Trend: The trend shows a strong upward trajectory, consistent with the booming housing market and home improvement trends.
- Seasonality: Clear seasonal patterns likely linked to peak home improvement seasons, such as spring and summer.
- Residuals: Low residuals, indicating that most of the price movement can be explained by the trend and seasonality.
- Insights: Home Depot’s stock is closely tied to housing and home improvement trends, with strong seasonal demand peaks and consistent growth in periods of high consumer spending on home projects.

9. GOOG (Alphabet Inc.)
- Trend: Google’s stock shows a consistent upward trend, reflecting its dominance in search and ad revenue.
- Seasonality: Minor seasonal components, possibly reflecting advertising cycles or quarterly earnings reports.
- Residuals: Minimal residuals, indicating that the stock’s behavior is well-explained by its trend and seasonality.
- Insights: Google’s stock is driven by consistent growth with minor seasonal fluctuations, aligning with its strong market presence and ad-based revenue model.

10. PEP (PepsiCo Inc.)
- Trend: PepsiCo’s stock shows a gradual upward trend, driven by its steady revenue from snacks, beverages, and consumer goods.
- Seasonality: Moderate seasonal fluctuations, likely tied to consumer behavior during peak holidays or summer months.
- Residuals: Low residuals, indicating that PepsiCo’s stock price is largely explained by the trend and seasonal factors.
- Insights: PepsiCo’s stable upward trajectory and moderate seasonal peaks reflect its consistent market position in the beverage and snack industry.
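
The "low" and "high" residual labels above are read off the plots. One way to put a number on them is the share of a series' variance that is left in the residual component. The following is a minimal sketch on synthetic data, not the notebook's statsmodels call: `additive_decompose` and `residual_share` are hypothetical helpers that imitate a classical additive decomposition (centered moving-average trend plus per-position seasonal means) with the same `period=30`.

```python
import numpy as np

def additive_decompose(y, period):
    """Classical additive decomposition: centered moving-average trend,
    per-position seasonal means, and whatever is left as the residual."""
    y = np.asarray(y, dtype=float)
    if period % 2 == 0:
        # Even period: 2 x MA weights so the window stays centered
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(len(y), np.nan)
    trend[half:len(y) - half] = np.convolve(y, weights, mode="valid")
    detrended = y - trend
    # Average the detrended values at each position in the cycle
    seasonal_means = np.array(
        [np.nanmean(detrended[i::period]) for i in range(period)]
    )
    seasonal_means -= seasonal_means.mean()  # seasonal effects sum to zero
    seasonal = np.tile(seasonal_means, len(y) // period + 1)[:len(y)]
    resid = y - trend - seasonal
    return trend, seasonal, resid

def residual_share(y, period):
    """Fraction of the series' variance that remains in the residual."""
    y = np.asarray(y, dtype=float)
    _, _, resid = additive_decompose(y, period)
    valid = ~np.isnan(resid)  # the trend is undefined at both ends
    return np.var(resid[valid]) / np.var(y[valid])

# Synthetic stand-ins for a calm and a volatile price series
rng = np.random.default_rng(0)
t = np.arange(300)
signal = 0.1 * t + 2.0 * np.sin(2 * np.pi * t / 30)
calm = signal + rng.normal(scale=0.1, size=t.size)
volatile = signal + rng.normal(scale=2.0, size=t.size)

share_calm = residual_share(calm, period=30)
share_volatile = residual_share(volatile, period=30)
print(f"calm: {share_calm:.4f}, volatile: {share_volatile:.4f}")
```

A larger share means more of the price movement is left unexplained by trend and seasonality, which is the pattern described for TSLA above.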

## Modeling -- BASELINE

Calculating the Adj Close Average for modeling.

In [None]:
from pyomo.environ import *
import pandas as pd

# Identify all columns that contain "Adj Close_"
adj_cols = [col for col in df_sf.columns if col.startswith("Adj Close")]

# Extract tickers (assuming format "Adj Close_{ticker}")
tickers = [col.split("_")[1] for col in adj_cols]

# Loop through each ticker and calculate the average adjusted closing price
avg_adj_close = {}  # Use a dictionary to store the results
for ticker in tickers:
    avg_adj_close[ticker] = df_sf[f"Adj Close_{ticker}"].mean()

# Print the result
print("Average Adj Closing Price per stock:")
for ticker, avg_price in avg_adj_close.items():
    print(f"{ticker}: {avg_price}")

# Create a DataFrame from the dictionary
df_sf_returns = pd.DataFrame(list(avg_adj_close.items()), columns=["Ticker", "Average_Adj_Close"])

# Display the new DataFrame
print("\ndf_sf_returns:")
print(df_sf_returns)

Creating the covariance matrix from the Adj Close prices for modeling.

In [None]:
# Select closing prices for all relevant tickers:
close_prices_all = df_sf[[f"Adj Close_{ticker}" for ticker in tickers]]

# Calculate the covariance matrix:
df_sf_cov = close_prices_all.cov()
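
The averages and covariance above are computed on price levels. Risk-return models are more commonly stated in terms of period-over-period returns, so a sketch of that alternative is shown here for comparison only. The tickers `AAA` and `BBB` and their prices are made-up placeholders, and this is not the calculation the model below uses.

```python
import pandas as pd

# Hypothetical adjusted-close prices in the notebook's "Adj Close_{ticker}" layout
prices = pd.DataFrame({
    "Adj Close_AAA": [100.0, 101.0, 102.01, 103.0301],
    "Adj Close_BBB": [50.0, 49.0, 50.0, 51.0],
})

# Daily simple returns: (p_t - p_{t-1}) / p_{t-1}; the first row has no predecessor
daily_returns = prices.pct_change().dropna()

mean_returns = daily_returns.mean()   # average daily return per ticker
cov_returns = daily_returns.cov()     # covariance of daily returns

print(mean_returns)
print(cov_returns)
```

Working with returns puts every stock on the same scale, so a high share price alone does not raise a stock's coefficient in the objective.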

In [None]:
m = ConcreteModel()

In [None]:
# Decision Variables
m.JPM = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.HD = Var(within=NonNegativeReals, bounds= (0,1))
m.MSFT = Var(within=NonNegativeReals, bounds= (0,1))
m.V = Var(within=NonNegativeReals, bounds= (0,1))
m.WMT = Var(within=NonNegativeReals, bounds= (0,1))
m.MCD = Var(within=NonNegativeReals, bounds= (0,1))
m.JNJ = Var(within=NonNegativeReals, bounds= (0,1))
m.GOOG = Var(within=NonNegativeReals, bounds= (0,1))
m.PEP = Var(within=NonNegativeReals, bounds= (0,1))

In [None]:
m.Objective = Objective(expr = df_sf_returns["Average_Adj_Close"].iloc[0] * m.JPM +
                        df_sf_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_sf_returns["Average_Adj_Close"].iloc[2] * m.HD +
                        df_sf_returns["Average_Adj_Close"].iloc[3] * m.MSFT +
                        df_sf_returns["Average_Adj_Close"].iloc[4] * m.V +
                        df_sf_returns["Average_Adj_Close"].iloc[5] * m.WMT +
                        df_sf_returns["Average_Adj_Close"].iloc[6] * m.MCD +
                        df_sf_returns["Average_Adj_Close"].iloc[7] * m.JNJ +
                        df_sf_returns["Average_Adj_Close"].iloc[8] * m.GOOG +
                        df_sf_returns["Average_Adj_Close"].iloc[9] * m.PEP, sense = maximize)

MAX(RETURN = 215*JPM + 213*AAPL + 373*HD + 422*MSFT + 287*V + 75*WMT + 279*MCD + 152*JNJ + 171*GOOG + 163*PEP)
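
As a quick check of the objective written out above, the sketch below evaluates it for two candidate portfolios using the rounded coefficients from that line. Keying the coefficients by ticker is an illustrative choice, and `objective_value` is a hypothetical helper, not part of the model.

```python
# Rounded objective coefficients as written in the formula above
coef = {
    "JPM": 215, "AAPL": 213, "HD": 373, "MSFT": 422, "V": 287,
    "WMT": 75, "MCD": 279, "JNJ": 152, "GOOG": 171, "PEP": 163,
}

def objective_value(weights, coef):
    """Linear objective: sum of coefficient * portfolio proportion."""
    return sum(coef[t] * w for t, w in weights.items())

# Equal-weight portfolio: 10% in each of the ten stocks
equal_weights = {t: 0.1 for t in coef}
equal_value = objective_value(equal_weights, coef)

# With only the "proportions sum to 1" constraint, the linear objective is
# maximized by putting everything into the largest coefficient
best_ticker = max(coef, key=coef.get)
best_value = objective_value({best_ticker: 1.0}, coef)

print(f"equal weights: {equal_value:.1f}")
print(f"all in {best_ticker}: {best_value:.1f}")
```

The risk constraint is what keeps the solver away from this single-stock corner.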

In [None]:
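# Placeholder only: the solve loop deletes this and swaps in the covariance-based risk cap before solving
m.total_risk = Constraint(expr = m.JPM + m.AAPL + m.HD + m.MSFT + m.V + m.WMT + m.MCD + m.JNJ + m.GOOG + m.PEP == 0.05)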
m.sum_proportion = Constraint(expr = m.JPM + m.AAPL + m.HD + m.MSFT + m.V + m.WMT + m.MCD + m.JNJ + m.GOOG + m.PEP == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.JPM, m.AAPL, m.HD, m.MSFT, m.V, m.WMT, m.MCD, m.JNJ, m.GOOG, m.PEP]
  tickers = ['Adj Close_JPM', 'Adj Close_AAPL', 'Adj Close_HD', 'Adj Close_MSFT', 'Adj Close_V', 'Adj Close_WMT', 'Adj Close_MCD', 'Adj Close_JNJ', 'Adj Close_GOOG', 'Adj Close_PEP']
  risk_exp = 0

  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i] * df_sf_cov.at[tickers[i], tickers[j]] * variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk = 0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005)  # take tiny steps
risk_limits
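
The double loop in `calc_risk` builds the portfolio variance term by term: the sum over every pair (i, j) of weight i times covariance (i, j) times weight j. The sketch below, using a made-up 3x3 covariance matrix and weights, checks that loop against the equivalent matrix product and prints the grid that `np.arange` produces for the sweep.

```python
import numpy as np

# Made-up symmetric covariance matrix and weights, for illustration only
cov = np.array([
    [0.040, 0.006, 0.002],
    [0.006, 0.090, 0.010],
    [0.002, 0.010, 0.160],
])
w = np.array([0.5, 0.3, 0.2])

# Same double loop as calc_risk, on plain numbers instead of Pyomo variables
risk_loop = 0.0
for i in range(len(w)):
    for j in range(len(w)):
        risk_loop += w[i] * cov[i, j] * w[j]

# Equivalent matrix form of the portfolio variance
risk_matrix = w @ cov @ w

# The sweep grid: start at 0.0003, step 0.0005, stop before 0.05
risk_limits = np.arange(0.0003, 0.05, 0.0005)

print(risk_loop, risk_matrix)
print(len(risk_limits), risk_limits[0], risk_limits[-1])
```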

In [None]:
%matplotlib inline
from pylab import *

import shutil
import sys
import os.path

if not shutil.which("pyomo"):
    !pip install -q pyomo
    assert(shutil.which("pyomo"))

if not shutil.which("ipopt"):
    # here is the IPOPT zip file
    !gdown 10XRvLZqrpSNiXVAN-pipU52BVRwoGcNQ
    !unzip -o -q ipopt-linux64_dw
    assert(shutil.which("ipopt") or os.path.isfile("ipopt"))

from pyomo.environ import *

SOLVER = 'ipopt'
EXECUTABLE = '/content/ipopt'
ipopt_executable = '/content/ipopt'

In [None]:
param_analysis = {}  # parameter analysis: optimal proportions for each risk limit
returns = {}  # objective value for each risk limit
for r in risk_limits:
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m).write()
  param_analysis[r] = [m.JPM(), m.AAPL(), m.HD(), m.MSFT(), m.V(), m.WMT(), m.MCD(), m.JNJ(), m.GOOG(), m.PEP()]
  returns[r] =  m.JPM()*df_sf_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_sf_returns["Average_Adj_Close"].iloc[1] + m.HD()*df_sf_returns["Average_Adj_Close"].iloc[2] + m.MSFT()*df_sf_returns["Average_Adj_Close"].iloc[3] + m.V()*df_sf_returns["Average_Adj_Close"].iloc[4] + m.WMT()*df_sf_returns["Average_Adj_Close"].iloc[5] + m.MCD()*df_sf_returns["Average_Adj_Close"].iloc[6] + m.JNJ()*df_sf_returns["Average_Adj_Close"].iloc[7] + m.GOOG()*df_sf_returns["Average_Adj_Close"].iloc[8] + m.PEP()*df_sf_returns["Average_Adj_Close"].iloc[9]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
# Flat list of names; a nested list would turn the columns into a MultiIndex
param_analysis.columns = ['JPM', 'AAPL', 'HD', 'MSFT', 'V', 'WMT', 'MCD', 'JNJ', 'GOOG', 'PEP']
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling BASELINE: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling BASELINE: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=5, ncols=2, figsize=(12, 18))
param_analysis.plot(
    subplots=True,
    layout=(5, 2),       # 5 rows, 2 columns
    ax=axes,             # use our custom axes
    sharex=False,
    sharey=False,
    legend=True
)

for row in axes:
    for ax in row:
        ax.set_xlabel("Risk Level")
        ax.set_ylabel("Optimal Allocation")

plt.tight_layout()
plt.show()

# Find and print risk and reward
risk = list(returns.keys())
print(f"Risk", risk)
reward = list(returns.values())
print(f"Reward", reward)
print('\t')

from pylab import *
# Plot risk and reward
plt.figure(figsize=(10,6))
plot(risk, reward, '-.')
title('Modeling BASELINE: The Efficient Frontier')
xlabel('Risk')
ylabel('Reward (Return)')
plt.show()

The Optimal Stock Allocation for Different Risk Levels charts (including a stacked bar chart for better visibility) show how the "optimal" stock allocation (y-axis) breaks down across different risk levels (x-axis).
  - In this case the risk levels range from 0.0003 up to, but not including, 0.05.

- Two stocks, PEP and JNJ, dominate all of the bars on the stacked bar chart.
  - This indicates that the model consistently allocates the highest proportions to PEP and JNJ and always includes them in the portfolio.
- Stocks like GOOG, V, and MSFT are also often included in the portfolio but at a lower proportion than JNJ.
- JPM, AAPL, HD, WMT, and MCD also enter the portfolio at risk limits somewhere below 0.01, but at a minuscule proportion.

The Efficient Frontier shows how returns change with different levels of risk.
  - The nonlinear optimization reveals thresholds where a marginal change in the risk constraint triggers a substantial rebalancing of asset allocations, yielding higher expected returns while also increasing volatility.
  - The maximum return for this baseline model is 200.
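
To illustrate how a risk cap shapes the frontier, here is a toy two-asset sweep that replaces the solver with a brute-force grid over the weights. The coefficients and covariance are invented for the example. With them, the lowest achievable variance is 0.008, so a tighter cap has no feasible portfolio at all, a case worth keeping in mind when reading the low-risk end of the charts.

```python
import numpy as np

# Invented two-asset example: asset B has the higher coefficient and variance
ret = np.array([1.0, 2.0])
cov = np.array([[0.01, 0.00],
                [0.00, 0.04]])

# Candidate portfolios: weight in A on a fine grid, remainder in B
w_a = np.linspace(0.0, 1.0, 1001)
weights = np.column_stack([w_a, 1.0 - w_a])

portfolio_return = weights @ ret
# Row-wise quadratic form w' * cov * w for every candidate
portfolio_var = np.einsum("ij,jk,ik->i", weights, cov, weights)

frontier = {}
for cap in [0.005, 0.01, 0.02, 0.03, 0.04, 0.05]:
    feasible = portfolio_var <= cap
    # None marks a cap that no portfolio on the grid can satisfy
    frontier[cap] = float(portfolio_return[feasible].max()) if feasible.any() else None

for cap, best in frontier.items():
    print(f"risk cap {cap}: best return {best}")
```

Once the cap is loose enough to allow the whole portfolio in the higher-coefficient asset, raising it further no longer changes the result, which is why a frontier flattens at its upper end.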

### Ranking of Optimal Stock Allocation

In [None]:
# Choose a specific risk limit (for example, the maximum risk limit in your analysis)
selected_risk = param_analysis.index.max()
optimal_allocations = param_analysis.loc[selected_risk]

# Sort the allocations in ascending order
sorted_allocations = optimal_allocations.sort_values()

# Print each stock with its allocation percentage
print(f"Optimal Allocation (sorted ascending) for risk limit {selected_risk}:")
for stock, allocation in sorted_allocations.items():
    print(f"{stock}: {allocation*100:.2f}%")

total = round(sorted_allocations.sum(), 10)  # Round to 10 decimal places
print(f"Sum of allocations = {total:.4f}")

This confirms that PEP and JNJ dominate the stock allocations.

## Modeling w/o PEP

Our baseline model indicated that PEP was dominating over 30% of the portfolio. To explore alternatives, we’re removing PEP from the decision variables while keeping all other conditions the same, allowing us to see which stock becomes the next most dominant.
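
Dropping a stock means removing its variable, its objective coefficient, and its row and column of the covariance matrix together. The sketch below does that by ticker name on a small four-ticker example (coefficients taken from the rounded objective, covariance values made up). `drop_tickers` is a hypothetical helper, and selecting by name is shown as one way to avoid keeping positional indexes in sync by hand.

```python
import numpy as np

def drop_tickers(tickers, coef, cov, excluded):
    """Return the tickers, coefficients, and covariance block that remain."""
    keep = [i for i, t in enumerate(tickers) if t not in excluded]
    kept_tickers = [tickers[i] for i in keep]
    kept_coef = coef[keep]
    kept_cov = cov[np.ix_(keep, keep)]  # matching rows and columns
    return kept_tickers, kept_coef, kept_cov

# Four-ticker example; the covariance numbers are made up for illustration
tickers = ["JPM", "AAPL", "JNJ", "PEP"]
coef = np.array([215.0, 213.0, 152.0, 163.0])
cov = np.array([
    [4.0, 1.0, 0.5, 0.2],
    [1.0, 9.0, 0.3, 0.1],
    [0.5, 0.3, 1.0, 0.4],
    [0.2, 0.1, 0.4, 2.0],
])

kept_tickers, kept_coef, kept_cov = drop_tickers(tickers, coef, cov, {"PEP"})
print(kept_tickers)
print(kept_coef)
print(kept_cov.shape)
```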

In [None]:
m = ConcreteModel()
# Decision Variables
m.JPM = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.HD = Var(within=NonNegativeReals, bounds= (0,1))
m.MSFT = Var(within=NonNegativeReals, bounds= (0,1))
m.V = Var(within=NonNegativeReals, bounds= (0,1))
m.WMT = Var(within=NonNegativeReals, bounds= (0,1))
m.MCD = Var(within=NonNegativeReals, bounds= (0,1))
m.JNJ = Var(within=NonNegativeReals, bounds= (0,1))
m.GOOG = Var(within=NonNegativeReals, bounds= (0,1))

m.Objective = Objective(expr = df_sf_returns["Average_Adj_Close"].iloc[0] * m.JPM +
                        df_sf_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_sf_returns["Average_Adj_Close"].iloc[2] * m.HD +
                        df_sf_returns["Average_Adj_Close"].iloc[3] * m.MSFT +
                        df_sf_returns["Average_Adj_Close"].iloc[4] * m.V +
                        df_sf_returns["Average_Adj_Close"].iloc[5] * m.WMT +
                        df_sf_returns["Average_Adj_Close"].iloc[6] * m.MCD +
                        df_sf_returns["Average_Adj_Close"].iloc[7] * m.JNJ +
                        df_sf_returns["Average_Adj_Close"].iloc[8] * m.GOOG, sense = maximize)

MAX(RETURN = 215*JPM + 213*AAPL + 373*HD + 422*MSFT + 287*V + 75*WMT + 279*MCD + 152*JNJ + 171*GOOG)

In [None]:
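# Placeholder only: the solve loop deletes this and swaps in the covariance-based risk cap before solving
m.total_risk = Constraint(expr = m.JPM + m.AAPL + m.HD + m.MSFT + m.V + m.WMT + m.MCD + m.JNJ + m.GOOG == 0.05)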
m.sum_proportion = Constraint(expr = m.JPM + m.AAPL + m.HD + m.MSFT + m.V + m.WMT + m.MCD + m.JNJ + m.GOOG == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.JPM, m.AAPL, m.HD, m.MSFT, m.V, m.WMT, m.MCD, m.JNJ, m.GOOG]
  tickers = ['Adj Close_JPM', 'Adj Close_AAPL', 'Adj Close_HD', 'Adj Close_MSFT', 'Adj Close_V', 'Adj Close_WMT', 'Adj Close_MCD', 'Adj Close_JNJ', 'Adj Close_GOOG']
  risk_exp = 0

  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i] * df_sf_cov.at[tickers[i], tickers[j]] * variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk = 0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005)  # take tiny steps
risk_limits

In [None]:
param_analysis = {}  # parameter analysis: optimal proportions for each risk limit
returns = {}  # objective value for each risk limit
for r in risk_limits:
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m).write()
  param_analysis[r] = [m.JPM(), m.AAPL(), m.HD(), m.MSFT(), m.V(), m.WMT(), m.MCD(), m.JNJ(), m.GOOG()]
  returns[r] =  m.JPM()*df_sf_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_sf_returns["Average_Adj_Close"].iloc[1] + m.HD()*df_sf_returns["Average_Adj_Close"].iloc[2] + m.MSFT()*df_sf_returns["Average_Adj_Close"].iloc[3] + m.V()*df_sf_returns["Average_Adj_Close"].iloc[4] + m.WMT()*df_sf_returns["Average_Adj_Close"].iloc[5] + m.MCD()*df_sf_returns["Average_Adj_Close"].iloc[6] + m.JNJ()*df_sf_returns["Average_Adj_Close"].iloc[7] + m.GOOG()*df_sf_returns["Average_Adj_Close"].iloc[8]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
# Flat list of names; a nested list would turn the columns into a MultiIndex
param_analysis.columns = ['JPM', 'AAPL', 'HD', 'MSFT', 'V', 'WMT', 'MCD', 'JNJ', 'GOOG']
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o PEP: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o PEP: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(12, 18))
param_analysis.plot(
    subplots=True,
    layout=(3, 3),       # 3 rows, 3 columns
    ax=axes,             # use our custom axes
    sharex=False,
    sharey=False,
    legend=True
)

for row in axes:
    for ax in row:
        ax.set_xlabel("Risk Level")
        ax.set_ylabel("Optimal Allocation")

plt.tight_layout()
plt.show()


# Find and print risk and reward
risk = list(returns.keys())
print(f"Risk", risk)
reward = list(returns.values())
print(f"Reward", reward)
print('\t')

from pylab import *
# Plot risk and reward
plt.figure(figsize=(10,6))
plot(risk, reward, '-.')
title('Modeling w/o PEP: The Efficient Frontier')
xlabel('Risk')
ylabel('Reward (Return)')
plt.show()

In [None]:
# Choose a specific risk limit (for example, the maximum risk limit in your analysis)
selected_risk = param_analysis.index.max()
optimal_allocations = param_analysis.loc[selected_risk]

# Sort the allocations in ascending order
sorted_allocations = optimal_allocations.sort_values()

# Print each stock with its allocation percentage
print(f"Optimal Allocation (sorted ascending) for risk limit {selected_risk}:")
for stock, allocation in sorted_allocations.items():
    print(f"{stock}: {allocation*100:.2f}%")

After taking out PEP, the next most dominant stock is JNJ, which is expected given the BASELINE model, where it held 14% of the portfolio as the 2nd most dominant stock.
  - Other stocks are also becoming more dominant in the Optimal Stock Allocation, with JNJ now at 40% of the portfolio.
  - Without PEP, the maximum return has dropped to 175.

## Modeling w/o JNJ

After removing PEP, JNJ emerged as the most dominant stock. To identify the next most influential asset, we are now excluding both PEP and JNJ from the decision variables, allowing us to analyze the portfolio composition without these key components.

In [None]:
m = ConcreteModel()
# Decision Variables
m.JPM = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.HD = Var(within=NonNegativeReals, bounds= (0,1))
m.MSFT = Var(within=NonNegativeReals, bounds= (0,1))
m.V = Var(within=NonNegativeReals, bounds= (0,1))
m.WMT = Var(within=NonNegativeReals, bounds= (0,1))
m.MCD = Var(within=NonNegativeReals, bounds= (0,1))
m.GOOG = Var(within=NonNegativeReals, bounds= (0,1))

m.Objective = Objective(expr = df_sf_returns["Average_Adj_Close"].iloc[0] * m.JPM +
                        df_sf_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_sf_returns["Average_Adj_Close"].iloc[2] * m.HD +
                        df_sf_returns["Average_Adj_Close"].iloc[3] * m.MSFT +
                        df_sf_returns["Average_Adj_Close"].iloc[4] * m.V +
                        df_sf_returns["Average_Adj_Close"].iloc[5] * m.WMT +
                        df_sf_returns["Average_Adj_Close"].iloc[6] * m.MCD +
                        df_sf_returns["Average_Adj_Close"].iloc[8] * m.GOOG, sense = maximize)

MAX(RETURN = 215*JPM + 213*AAPL + 373*HD + 422*MSFT + 287*V + 75*WMT + 279*MCD + 171*GOOG)

In [None]:
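# Placeholder only: the solve loop deletes this and swaps in the covariance-based risk cap before solving
m.total_risk = Constraint(expr = m.JPM + m.AAPL + m.HD + m.MSFT + m.V + m.WMT + m.MCD + m.GOOG == 0.05)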
m.sum_proportion = Constraint(expr = m.JPM + m.AAPL + m.HD + m.MSFT + m.V + m.WMT + m.MCD + m.GOOG == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.JPM, m.AAPL, m.HD, m.MSFT, m.V, m.WMT, m.MCD, m.GOOG]
  tickers = ['Adj Close_JPM', 'Adj Close_AAPL', 'Adj Close_HD', 'Adj Close_MSFT', 'Adj Close_V', 'Adj Close_WMT', 'Adj Close_MCD', 'Adj Close_GOOG']
  risk_exp = 0

  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i] * df_sf_cov.at[tickers[i], tickers[j]] * variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk = 0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005)  # take tiny steps
risk_limits

In [None]:
param_analysis = {}  # parameter analysis: optimal proportions for each risk limit
returns = {}  # objective value for each risk limit
for r in risk_limits:
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m).write()
  param_analysis[r] = [m.JPM(), m.AAPL(), m.HD(), m.MSFT(), m.V(), m.WMT(), m.MCD(), m.GOOG()]
  returns[r] =  m.JPM()*df_sf_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_sf_returns["Average_Adj_Close"].iloc[1] + m.HD()*df_sf_returns["Average_Adj_Close"].iloc[2] + m.MSFT()*df_sf_returns["Average_Adj_Close"].iloc[3] + m.V()*df_sf_returns["Average_Adj_Close"].iloc[4] + m.WMT()*df_sf_returns["Average_Adj_Close"].iloc[5] + m.MCD()*df_sf_returns["Average_Adj_Close"].iloc[6] + m.GOOG()*df_sf_returns["Average_Adj_Close"].iloc[8]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
# Flat list of names; a nested list would turn the columns into a MultiIndex
param_analysis.columns = ['JPM', 'AAPL', 'HD', 'MSFT', 'V', 'WMT', 'MCD', 'GOOG']
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o JNJ: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o JNJ: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=4, ncols=2, figsize=(12, 18))

# Plot each column on its own subplot, walking the axes grid in row-major order
for i, ax in enumerate(axes.flatten()):
    param_analysis.iloc[:, i].plot(ax=ax, legend=True)
    ax.set_xlabel("Risk Level")
    ax.set_ylabel("Optimal Allocation")


plt.tight_layout()
plt.show()

# Find and print risk and reward
risk = list(returns.keys())
print(f"Risk", risk)
reward = list(returns.values())
print(f"Reward", reward)
print('\t')

from pylab import *
# Plot risk and reward
plt.figure(figsize=(10,6))
plot(risk, reward, '-.')
title('Modeling w/o JNJ: The Efficient Frontier')
xlabel('Risk')
ylabel('Reward (Return)')
plt.show()

After taking out JNJ, the next most dominant stock is MSFT, which is expected given the BASELINE model. It was dominating 14% of the portfolio as the 2nd most dominant stock.
  - We are also starting to see other stocks, like GOOG and MCD, becoming more dominant in the Optimal Stock Allocation.
  - Without JNJ, the maximum return has increased to 300%, indicating that JNJ was limiting the portfolio's return potential by anchoring a significant portion of the allocation; its removal allows the model to shift toward stocks with higher returns.

## Modeling w/o MSFT

After removing PEP and JNJ, MSFT emerged as the most dominant stock. To identify the next most influential asset, we are now excluding PEP, JNJ, and MSFT from the decision variables, allowing us to analyze the portfolio composition without these key components.

In [None]:
m = ConcreteModel()
# Decision Variables
m.JPM = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.HD = Var(within=NonNegativeReals, bounds= (0,1))
m.V = Var(within=NonNegativeReals, bounds= (0,1))
m.WMT = Var(within=NonNegativeReals, bounds= (0,1))
m.MCD = Var(within=NonNegativeReals, bounds= (0,1))
m.GOOG = Var(within=NonNegativeReals, bounds= (0,1))

m.Objective = Objective(expr = df_sf_returns["Average_Adj_Close"].iloc[0] * m.JPM +
                        df_sf_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_sf_returns["Average_Adj_Close"].iloc[2] * m.HD +
                        df_sf_returns["Average_Adj_Close"].iloc[4] * m.V +
                        df_sf_returns["Average_Adj_Close"].iloc[5] * m.WMT +
                        df_sf_returns["Average_Adj_Close"].iloc[6] * m.MCD +
                        df_sf_returns["Average_Adj_Close"].iloc[8] * m.GOOG, sense = maximize)

MAX(RETURN = 215*JPM + 213*AAPL + 373*HD + 287*V + 75*WMT + 279*MCD + 171*GOOG)

In [None]:
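# Placeholder only: the solve loop deletes this and swaps in the covariance-based risk cap before solving
m.total_risk = Constraint(expr = m.JPM + m.AAPL + m.HD + m.V + m.WMT + m.MCD + m.GOOG == 0.05)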
m.sum_proportion = Constraint(expr = m.JPM + m.AAPL + m.HD + m.V + m.WMT + m.MCD + m.GOOG == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.JPM, m.AAPL, m.HD, m.V, m.WMT, m.MCD, m.GOOG]
  tickers = ['Adj Close_JPM', 'Adj Close_AAPL', 'Adj Close_HD', 'Adj Close_V', 'Adj Close_WMT', 'Adj Close_MCD', 'Adj Close_GOOG']
  risk_exp = 0

  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i] * df_sf_cov.at[tickers[i], tickers[j]] * variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk = 0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005)  # take tiny steps
risk_limits

In [None]:
param_analysis = {}  # parameter analysis: optimal proportions for each risk limit
returns = {}  # objective value for each risk limit
for r in risk_limits:
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m).write()
  param_analysis[r] = [m.JPM(), m.AAPL(), m.HD(), m.V(), m.WMT(), m.MCD(), m.GOOG()]
  returns[r] =  m.JPM()*df_sf_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_sf_returns["Average_Adj_Close"].iloc[1] + m.HD()*df_sf_returns["Average_Adj_Close"].iloc[2] + m.V()*df_sf_returns["Average_Adj_Close"].iloc[4] + m.WMT()*df_sf_returns["Average_Adj_Close"].iloc[5] + m.MCD()*df_sf_returns["Average_Adj_Close"].iloc[6] + m.GOOG()*df_sf_returns["Average_Adj_Close"].iloc[8]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
# Flat list of names; a nested list would turn the columns into a MultiIndex
param_analysis.columns = ['JPM', 'AAPL', 'HD', 'V', 'WMT', 'MCD', 'GOOG']
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o MSFT: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o MSFT: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=7, ncols=1, figsize=(12, 18))

# Plot each column on its own subplot, one axis per stock
for i, ax in enumerate(axes.flatten()):
    param_analysis.iloc[:, i].plot(ax=ax, legend=True)
    ax.set_xlabel("Risk Level")
    ax.set_ylabel("Optimal Allocation")


plt.tight_layout()
plt.show()

# Find and print risk and reward
risk = list(returns.keys())
print(f"Risk", risk)
reward = list(returns.values())
print(f"Reward", reward)
print('\t')

from pylab import *
# Plot risk and reward
plt.figure(figsize=(10,6))
plot(risk, reward, '-.')
title('Modeling w/o MSFT: The Efficient Frontier')
xlabel('Risk')
ylabel('Reward (Return)')
plt.show()

After taking out MSFT, the next most dominant stock is GOOG, which is expected given the BASELINE model, where it is one of the dominating stocks.
  - We are also seeing that some other stocks, like HD, AAPL, JPM, and WMT, are not being incorporated, which could indicate that under the current model parameters only a very limited set of stocks contributes meaningfully to optimizing the risk-return balance.
  - Without MSFT, the maximum return has decreased significantly to 200% indicating that MSFT is a crucial driver of the portfolio's high return potential.
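
Whether a stock counts as "incorporated" depends on a cutoff, since an interior-point solver such as Ipopt typically returns tiny positive values rather than exact zeros. A minimal sketch of that check follows, with invented allocations, a hypothetical `active_positions` helper, and an assumed 0.1% threshold.

```python
def active_positions(allocation, threshold=1e-3):
    """Tickers whose proportion exceeds the threshold, largest first."""
    active = {t: w for t, w in allocation.items() if w > threshold}
    return sorted(active, key=active.get, reverse=True)

# Invented allocation for illustration; not output from the model above
allocation = {
    "JPM": 2.1e-9, "AAPL": 8.4e-9, "HD": 1.3e-8, "V": 0.27,
    "WMT": 5.0e-9, "MCD": 0.31, "GOOG": 0.42,
}

held = active_positions(allocation)
print(held)
```

Applying the same threshold at every risk limit gives a consistent count of how many stocks the model actually holds.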

## Modeling w/o GOOG

After removing JNJ, PEP, and MSFT, GOOG emerged as the most dominant stock. To identify the next most influential asset, we are now also going to exclude GOOG from the decision variables, allowing us to analyze the portfolio composition without these key components.

In [None]:
m = ConcreteModel()
# Decision Variables
m.JPM = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.HD = Var(within=NonNegativeReals, bounds= (0,1))
m.V = Var(within=NonNegativeReals, bounds= (0,1))
m.WMT = Var(within=NonNegativeReals, bounds= (0,1))
m.MCD = Var(within=NonNegativeReals, bounds= (0,1))

m.Objective = Objective(expr = df_sf_returns["Average_Adj_Close"].iloc[0] * m.JPM +
                        df_sf_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_sf_returns["Average_Adj_Close"].iloc[2] * m.HD +
                        df_sf_returns["Average_Adj_Close"].iloc[4] * m.V +
                        df_sf_returns["Average_Adj_Close"].iloc[5] * m.WMT +
                        df_sf_returns["Average_Adj_Close"].iloc[6] * m.MCD, sense = maximize)

MAX(RETURN = 215*JPM + 213*AAPL + 373*HD + 287*V + 75*WMT + 279*MCD)

In [None]:
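# Placeholder only: the solve loop deletes this and swaps in the covariance-based risk cap before solving
m.total_risk = Constraint(expr = m.JPM + m.AAPL + m.HD + m.V + m.WMT + m.MCD == 0.05)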
m.sum_proportion = Constraint(expr = m.JPM + m.AAPL + m.HD + m.V + m.WMT + m.MCD == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.JPM, m.AAPL, m.HD, m.V, m.WMT, m.MCD]
  tickers = ['Adj Close_JPM', 'Adj Close_AAPL', 'Adj Close_HD', 'Adj Close_V', 'Adj Close_WMT', 'Adj Close_MCD']
  risk_exp = 0

  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i] * df_sf_cov.at[tickers[i], tickers[j]] * variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk = 0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005)  # take tiny steps
risk_limits

In [None]:
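param_analysis = {}  # optimal allocations keyed by risk limit
returns = {}  # portfolio return keyed by risk limit
for r in risk_limits:
  # Swap in the risk constraint for the current limit
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  # Keep the results object in `result`, then print the solver summary
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m)
  result.write()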
  param_analysis[r] = [m.JPM(), m.AAPL(), m.HD(), m.V(), m.WMT(), m.MCD()]
  returns[r] =  m.JPM()*df_sf_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_sf_returns["Average_Adj_Close"].iloc[1] + m.HD()*df_sf_returns["Average_Adj_Close"].iloc[2] + m.V()*df_sf_returns["Average_Adj_Close"].iloc[4] + m.WMT()*df_sf_returns["Average_Adj_Close"].iloc[5] + m.MCD()*df_sf_returns["Average_Adj_Close"].iloc[6]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['JPM', 'AAPL', 'HD', 'V', 'WMT', 'MCD']  # flat list, so legend labels are plain tickers
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o GOOG: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o GOOG: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(12, 18))

# Plot one column (stock) per subplot
for i, ax in enumerate(axes.flatten()):
    param_analysis.iloc[:, i].plot(ax=ax, legend=True)
    ax.set_xlabel("Risk Level")
    ax.set_ylabel("Optimal Allocation")


plt.tight_layout()
plt.show()

# Find and print risk and reward
risk = list(returns.keys())
print(f"Risk", risk)
reward = list(returns.values())
print(f"Reward", reward)
print('\t')

# Plot risk and reward (uses the existing plt import instead of a pylab star import)
plt.figure(figsize=(10,6))
plt.plot(risk, reward, '-.')
plt.title('Modeling w/o GOOG: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

After removing GOOG, the next most dominant stock is WMT.
  - Other stocks such as HD, AAPL, V, and JPM are still not incorporated, which suggests that under the current model parameters only a very limited set of stocks contributes meaningfully to the risk-return balance.
  - Without GOOG, the maximum return decreases significantly to 7%, indicating that GOOG is a crucial driver of the portfolio's high return potential.

## Modeling w/o WMT

After removing JNJ, PEP, MSFT, and GOOG from the decision variables, WMT emerged as the most dominant stock. To uncover the next most influential stock, we now also exclude WMT and analyze the portfolio composition to see which stock becomes the new leader.
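
The removal procedure used throughout this section can be sketched in a few lines. This is only an illustration: the notebook identifies the dominant stock by reading the charts, whereas the sketch assumes "dominant" means the largest mean allocation across all risk limits. The function names and the toy allocations are hypothetical and are not results from the models.

```python
def dominant_ticker(allocations):
    """Return the ticker with the largest mean weight across all risk limits.

    `allocations` maps each risk limit to a {ticker: weight} dict.
    """
    totals = {}
    for weights in allocations.values():
        for ticker, w in weights.items():
            totals[ticker] = totals.get(ticker, 0.0) + w
    n = len(allocations)
    means = {t: s / n for t, s in totals.items()}
    return max(means, key=means.get)


def next_candidates(tickers, allocations):
    """Drop the dominant ticker so the next sweep runs without it."""
    leader = dominant_ticker(allocations)
    return leader, [t for t in tickers if t != leader]


# Toy data for illustration only (not output of the Pyomo models)
toy_allocations = {
    0.0003: {"JPM": 0.0, "AAPL": 0.1, "WMT": 0.9},
    0.0008: {"JPM": 0.0, "AAPL": 0.3, "WMT": 0.7},
    0.0013: {"JPM": 0.1, "AAPL": 0.3, "WMT": 0.6},
}
leader, remaining = next_candidates(["JPM", "AAPL", "WMT"], toy_allocations)
print(leader, remaining)
```

With real results, `allocations` could be built from the `param_analysis` dictionary before it is converted to a DataFrame.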

In [None]:
m = ConcreteModel()
# Decision Variables
m.JPM = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.HD = Var(within=NonNegativeReals, bounds= (0,1))
m.V = Var(within=NonNegativeReals, bounds= (0,1))
m.MCD = Var(within=NonNegativeReals, bounds= (0,1))

m.Objective = Objective(expr = df_sf_returns["Average_Adj_Close"].iloc[0] * m.JPM +
                        df_sf_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_sf_returns["Average_Adj_Close"].iloc[2] * m.HD +
                        df_sf_returns["Average_Adj_Close"].iloc[4] * m.V +
                        df_sf_returns["Average_Adj_Close"].iloc[6] * m.MCD, sense = maximize)

MAX(RETURN = 215*JPM + 213*AAPL + 373*HD + 287*V + 279*MCD)

In [None]:
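# Placeholder so that m.total_risk exists; the sweep below deletes it and adds the covariance-based risk limit
m.total_risk = Constraint(expr = m.JPM + m.AAPL + m.HD + m.V + m.MCD == 0.05)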
m.sum_proportion = Constraint(expr = m.JPM + m.AAPL + m.HD + m.V + m.MCD == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.JPM, m.AAPL, m.HD, m.V, m.MCD]
  tickers = ['Adj Close_JPM', 'Adj Close_AAPL', 'Adj Close_HD', 'Adj Close_V', 'Adj Close_MCD']
  risk_exp = 0

  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i] * df_sf_cov.at[tickers[i], tickers[j]] * variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk = 0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005)  # take tiny steps
risk_limits

In [None]:
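param_analysis = {}  # optimal allocations keyed by risk limit
returns = {}  # portfolio return keyed by risk limit
for r in risk_limits:
  # Swap in the risk constraint for the current limit
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  # Keep the results object in `result`, then print the solver summary
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m)
  result.write()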
  param_analysis[r] = [m.JPM(), m.AAPL(), m.HD(), m.V(), m.MCD()]
  returns[r] =  m.JPM()*df_sf_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_sf_returns["Average_Adj_Close"].iloc[1] + m.HD()*df_sf_returns["Average_Adj_Close"].iloc[2] + m.V()*df_sf_returns["Average_Adj_Close"].iloc[4] + m.MCD()*df_sf_returns["Average_Adj_Close"].iloc[6]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['JPM', 'AAPL', 'HD', 'V', 'MCD']  # flat list, so legend labels are plain tickers
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o WMT: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o WMT: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=5, ncols=1, figsize=(12, 18))

# Plot one column (stock) per subplot
for i, ax in enumerate(axes.flatten()):
    param_analysis.iloc[:, i].plot(ax=ax, legend=True)
    ax.set_xlabel("Risk Level")
    ax.set_ylabel("Optimal Allocation")


plt.tight_layout()
plt.show()

# Find and print risk and reward
risk = list(returns.keys())
print(f"Risk", risk)
reward = list(returns.values())
print(f"Reward", reward)
print('\t')

# Plot risk and reward (uses the existing plt import instead of a pylab star import)
plt.figure(figsize=(10,6))
plt.plot(risk, reward, '-.')
plt.title('Modeling w/o WMT: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

After removing WMT, the next most dominant stock is MCD.
  - Some stocks, such as JPM and HD, are still not incorporated, which suggests that under the current model parameters only a limited set of stocks contributes meaningfully to the risk-return balance.
  - Stocks like V, AAPL, and MCD are starting to play a role in the portfolio's risk-return balance.
  - Without WMT, the maximum return decreases to 3.5%, indicating that WMT is a key driver of the portfolio's return potential.

## Modeling w/o MCD

After removing JNJ, GOOG, MSFT, PEP, and WMT from the decision variables, MCD emerged as the most dominant stock. To uncover the next most influential stock, we now also exclude MCD and analyze the portfolio composition to see which stock becomes the new leader.

In [None]:
m = ConcreteModel()
# Decision Variables
m.JPM = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.HD = Var(within=NonNegativeReals, bounds= (0,1))
m.V = Var(within=NonNegativeReals, bounds= (0,1))

m.Objective = Objective(expr = df_sf_returns["Average_Adj_Close"].iloc[0] * m.JPM +
                        df_sf_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_sf_returns["Average_Adj_Close"].iloc[2] * m.HD +
                        df_sf_returns["Average_Adj_Close"].iloc[4] * m.V, sense = maximize)

MAX(RETURN = 215*JPM + 213*AAPL + 373*HD + 287*V)

In [None]:
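# Placeholder so that m.total_risk exists; the sweep below deletes it and adds the covariance-based risk limit
m.total_risk = Constraint(expr = m.JPM + m.AAPL + m.HD + m.V == 0.05)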
m.sum_proportion = Constraint(expr = m.JPM + m.AAPL + m.HD + m.V == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.JPM, m.AAPL, m.HD, m.V]
  tickers = ['Adj Close_JPM', 'Adj Close_AAPL', 'Adj Close_HD', 'Adj Close_V']
  risk_exp = 0

  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i] * df_sf_cov.at[tickers[i], tickers[j]] * variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk = 0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005)  # take tiny steps
risk_limits

In [None]:
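param_analysis = {}  # optimal allocations keyed by risk limit
returns = {}  # portfolio return keyed by risk limit
for r in risk_limits:
  # Swap in the risk constraint for the current limit
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  # Keep the results object in `result`, then print the solver summary
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m)
  result.write()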
  param_analysis[r] = [m.JPM(), m.AAPL(), m.HD(), m.V()]
  returns[r] =  m.JPM()*df_sf_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_sf_returns["Average_Adj_Close"].iloc[1] + m.HD()*df_sf_returns["Average_Adj_Close"].iloc[2] + m.V()*df_sf_returns["Average_Adj_Close"].iloc[4]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['JPM', 'AAPL', 'HD', 'V']  # flat list, so legend labels are plain tickers
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o MCD: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o MCD: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 18))

# Plot one column (stock) per subplot
for i, ax in enumerate(axes.flatten()):
    param_analysis.iloc[:, i].plot(ax=ax, legend=True)
    ax.set_xlabel("Risk Level")
    ax.set_ylabel("Optimal Allocation")


plt.tight_layout()
plt.show()

# Find and print risk and reward
risk = list(returns.keys())
print(f"Risk", risk)
reward = list(returns.values())
print(f"Reward", reward)
print('\t')

# Plot risk and reward (uses the existing plt import instead of a pylab star import)
plt.figure(figsize=(10,6))
plt.plot(risk, reward, '-.')
plt.title('Modeling w/o MCD: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

After removing MCD, the next most dominant stock is V.
  - Some stocks, such as JPM and HD, are still not incorporated, which suggests that under the current model parameters only a limited set of stocks contributes meaningfully to the risk-return balance.
  - Without MCD, the maximum return decreases to 2.5%, indicating that MCD was a driver of the portfolio's return potential.

## Modeling w/o V

After removing JNJ, PEP, MSFT, GOOG, WMT, and MCD from the decision variables, V emerged as the most dominant stock. To uncover the next most influential stock, we now also exclude V and analyze the portfolio composition to see which stock becomes the new leader.

In [None]:
m = ConcreteModel()
# Decision Variables
m.JPM = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.HD = Var(within=NonNegativeReals, bounds= (0,1))

m.Objective = Objective(expr = df_sf_returns["Average_Adj_Close"].iloc[0] * m.JPM +
                        df_sf_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_sf_returns["Average_Adj_Close"].iloc[2] * m.HD, sense = maximize)

MAX(RETURN = 215*JPM + 213*AAPL + 373*HD)

In [None]:
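# Placeholder so that m.total_risk exists; the sweep below deletes it and adds the covariance-based risk limit
m.total_risk = Constraint(expr = m.JPM + m.AAPL + m.HD == 0.05)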
m.sum_proportion = Constraint(expr = m.JPM + m.AAPL + m.HD == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.JPM, m.AAPL, m.HD]
  tickers = ['Adj Close_JPM', 'Adj Close_AAPL', 'Adj Close_HD']
  risk_exp = 0

  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i] * df_sf_cov.at[tickers[i], tickers[j]] * variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk = 0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005)  # take tiny steps
risk_limits

In [None]:
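param_analysis = {}  # optimal allocations keyed by risk limit
returns = {}  # portfolio return keyed by risk limit
for r in risk_limits:
  # Swap in the risk constraint for the current limit
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  # Keep the results object in `result`, then print the solver summary
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m)
  result.write()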
  param_analysis[r] = [m.JPM(), m.AAPL(), m.HD()]
  returns[r] =  m.JPM()*df_sf_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_sf_returns["Average_Adj_Close"].iloc[1] + m.HD()*df_sf_returns["Average_Adj_Close"].iloc[2]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['JPM', 'AAPL', 'HD']  # flat list, so legend labels are plain tickers
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o V: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o V: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(12, 18))

# Plot one column (stock) per subplot
for i, ax in enumerate(axes.flatten()):
    param_analysis.iloc[:, i].plot(ax=ax, legend=True)
    ax.set_xlabel("Risk Level")
    ax.set_ylabel("Optimal Allocation")


plt.tight_layout()
plt.show()

# Find and print risk and reward
risk = list(returns.keys())
print(f"Risk", risk)
reward = list(returns.values())
print(f"Reward", reward)
print('\t')

# Plot risk and reward (uses the existing plt import instead of a pylab star import)
plt.figure(figsize=(10,6))
plt.plot(risk, reward, '-.')
plt.title('Modeling w/o V: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

After removing V, the next most dominant stock is AAPL.
  - All three remaining stocks, AAPL, JPM, and HD, are now incorporated into the portfolio.
  - Without V, the maximum return decreases to 2.0%, indicating that V was a driver of the portfolio's return potential and that the return continues to fall as more variables are removed.

## Modeling w/o AAPL

After removing JNJ, PEP, MSFT, GOOG, WMT, V, and MCD from the decision variables, AAPL emerged as the most dominant stock. To uncover the next most influential stock, we now also exclude AAPL and analyze the portfolio composition to see which stock becomes the new leader.

In [None]:
m = ConcreteModel()
# Decision Variables
m.JPM = Var(within=NonNegativeReals, bounds= (0,1))
m.HD = Var(within=NonNegativeReals, bounds= (0,1))

m.Objective = Objective(expr = df_sf_returns["Average_Adj_Close"].iloc[0] * m.JPM +
                        df_sf_returns["Average_Adj_Close"].iloc[2] * m.HD, sense = maximize)

MAX(RETURN = 215*JPM + 373*HD)

In [None]:
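# Placeholder so that m.total_risk exists; the sweep below deletes it and adds the covariance-based risk limit
m.total_risk = Constraint(expr = m.JPM + m.HD == 0.05)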
m.sum_proportion = Constraint(expr = m.JPM + m.HD == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.JPM, m.HD]
  tickers = ['Adj Close_JPM', 'Adj Close_HD']
  risk_exp = 0

  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i] * df_sf_cov.at[tickers[i], tickers[j]] * variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk = 0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005)  # take tiny steps
risk_limits

In [None]:
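param_analysis = {}  # optimal allocations keyed by risk limit
returns = {}  # portfolio return keyed by risk limit
for r in risk_limits:
  # Swap in the risk constraint for the current limit
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  # Keep the results object in `result`, then print the solver summary
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m)
  result.write()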
  param_analysis[r] = [m.JPM(), m.HD()]
  returns[r] =  m.JPM()*df_sf_returns["Average_Adj_Close"].iloc[0] + m.HD()*df_sf_returns["Average_Adj_Close"].iloc[2]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['JPM', 'HD']  # flat list, so legend labels are plain tickers
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o AAPL: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o AAPL: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 18))

# Plot one column (stock) per subplot
for i, ax in enumerate(axes.flatten()):
    param_analysis.iloc[:, i].plot(ax=ax, legend=True)
    ax.set_xlabel("Risk Level")
    ax.set_ylabel("Optimal Allocation")


plt.tight_layout()
plt.show()

# Find and print risk and reward
risk = list(returns.keys())
print(f"Risk", risk)
reward = list(returns.values())
print(f"Reward", reward)
print('\t')

# Plot risk and reward (uses the existing plt import instead of a pylab star import)
plt.figure(figsize=(10,6))
plt.plot(risk, reward, '-.')
plt.title('Modeling w/o AAPL: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

After removing AAPL, the next most dominant stock is JPM.
  - Without AAPL, the maximum return stays the same at 2%.
  - Additionally, the optimized allocations for JPM and HD are very close to each other, suggesting that both stocks are nearly equally effective in driving the portfolio's performance.

## Modeling w/o JPM

After removing PEP, JNJ, MSFT, GOOG, WMT, V, MCD, and AAPL from the decision variables, JPM emerged as the most dominant stock. We now also remove JPM to analyze how HD performs by itself.

In [None]:
m = ConcreteModel()
# Decision Variables
m.HD = Var(within=NonNegativeReals, bounds= (0,1))

m.Objective = Objective(expr = df_sf_returns["Average_Adj_Close"].iloc[2] * m.HD, sense = maximize)

MAX(RETURN = 373*HD)

In [None]:
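# Placeholder so that m.total_risk exists; the sweep below deletes it and adds the covariance-based risk limit
m.total_risk = Constraint(expr = m.HD == 0.05)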
m.sum_proportion = Constraint(expr = m.HD == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.HD]
  tickers = ['Adj Close_HD']
  risk_exp = 0

  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i] * df_sf_cov.at[tickers[i], tickers[j]] * variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk = 0.05

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0005)  # take tiny steps
risk_limits

In [None]:
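param_analysis = {}  # optimal allocations keyed by risk limit
returns = {}  # portfolio return keyed by risk limit
for r in risk_limits:
  # Swap in the risk constraint for the current limit
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  # Keep the results object in `result`, then print the solver summary
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m)
  result.write()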
  param_analysis[r] = [m.HD()]
  returns[r] = m.HD()*df_sf_returns["Average_Adj_Close"].iloc[2]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['HD']  # flat list, so the legend label is the plain ticker
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling w/o JPM: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling w/o JPM: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graph for the single remaining stock
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(12, 18))

# With a single subplot, `axes` is one Axes object, so plot on it directly
param_analysis.iloc[:, 0].plot(ax=axes, legend=True)
axes.set_xlabel("Risk Level")
axes.set_ylabel("Optimal Allocation")


plt.tight_layout()
plt.show()

# Find and print risk and reward
risk = list(returns.keys())
print(f"Risk", risk)
reward = list(returns.values())
print(f"Reward", reward)
print('\t')

# Plot risk and reward (uses the existing plt import instead of a pylab star import)
plt.figure(figsize=(10,6))
plt.plot(risk, reward, '-.')
plt.title('Modeling w/o JPM: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

After removing all decision variables except HD, the portfolio's maximum return collapses from 200% to just 2.5%, indicating that without the key components the portfolio loses nearly all of its return-generating power.
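
As a quick consistency check, the maximum returns quoted in the commentary of this section can be tabulated in removal order. Removing a stock is the same as fixing its allocation at zero, so under the same constraints and the same risk sweep the optimum should not increase from one step to the next. The sketch below flags any step where it does. The values are the approximate percentages stated in the text, and the helper name `flag_increases` is hypothetical.

```python
# Maximum returns (in %) as quoted in the commentary, in removal order
reported_max_return = [
    ("w/o MSFT", 200.0),
    ("w/o GOOG", 7.0),
    ("w/o WMT", 3.5),
    ("w/o MCD", 2.5),
    ("w/o V", 2.0),
    ("w/o AAPL", 2.0),
    ("w/o JPM (HD only)", 2.5),
]


def flag_increases(steps):
    """Return labels of steps whose maximum return exceeds the previous step's."""
    flagged = []
    for (_, prev), (label, cur) in zip(steps, steps[1:]):
        if cur > prev:
            flagged.append(label)
    return flagged


suspicious = flag_increases(reported_max_return)
print(suspicious)
```

A flagged step does not necessarily mean the model is wrong. It may come from rounding when reading values off the charts, or from risk limits where the solver did not converge to a feasible point, so the printed solver status for those runs is worth checking.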

## Modeling 10%

In [None]:
m = ConcreteModel()
# Decision Variables
m.JPM = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.HD = Var(within=NonNegativeReals, bounds= (0,1))
m.MSFT = Var(within=NonNegativeReals, bounds= (0,1))
m.V = Var(within=NonNegativeReals, bounds= (0,1))
m.WMT = Var(within=NonNegativeReals, bounds= (0,1))
m.MCD = Var(within=NonNegativeReals, bounds= (0,1))
m.JNJ = Var(within=NonNegativeReals, bounds= (0,1))
m.GOOG = Var(within=NonNegativeReals, bounds= (0,1))
m.PEP = Var(within=NonNegativeReals, bounds= (0,1))

In [None]:
m.Objective = Objective(expr = df_sf_returns["Average_Adj_Close"].iloc[0] * m.JPM +
                        df_sf_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_sf_returns["Average_Adj_Close"].iloc[2] * m.HD +
                        df_sf_returns["Average_Adj_Close"].iloc[3] * m.MSFT +
                        df_sf_returns["Average_Adj_Close"].iloc[4] * m.V +
                        df_sf_returns["Average_Adj_Close"].iloc[5] * m.WMT +
                        df_sf_returns["Average_Adj_Close"].iloc[6] * m.MCD +
                        df_sf_returns["Average_Adj_Close"].iloc[7] * m.JNJ +
                        df_sf_returns["Average_Adj_Close"].iloc[8] * m.GOOG +
                        df_sf_returns["Average_Adj_Close"].iloc[9] * m.PEP, sense = maximize)

MAX(RETURN = 215*JPM + 213*AAPL + 373*HD + 422*MSFT + 287*V + 75*WMT + 279*MCD + 152*JNJ + 171*GOOG + 163*PEP)

In [None]:
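# Placeholder so that m.total_risk exists; the sweep below deletes it and adds the covariance-based risk limit
m.total_risk = Constraint(expr = m.JPM + m.AAPL + m.HD + m.MSFT + m.V + m.WMT + m.MCD + m.JNJ + m.GOOG + m.PEP == 0.1)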
m.sum_proportion = Constraint(expr = m.JPM + m.AAPL + m.HD + m.MSFT + m.V + m.WMT + m.MCD + m.JNJ + m.GOOG + m.PEP == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.JPM, m.AAPL, m.HD, m.MSFT, m.V, m.WMT, m.MCD, m.JNJ, m.GOOG, m.PEP]
  tickers = ['Adj Close_JPM', 'Adj Close_AAPL', 'Adj Close_HD', 'Adj Close_MSFT', 'Adj Close_V', 'Adj Close_WMT', 'Adj Close_MCD', 'Adj Close_JNJ', 'Adj Close_GOOG', 'Adj Close_PEP']
  risk_exp = 0

  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i] * df_sf_cov.at[tickers[i], tickers[j]] * variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk = 0.1

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.001)  # take tiny steps
risk_limits

In [None]:
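param_analysis = {}  # optimal allocations keyed by risk limit
returns = {}  # portfolio return keyed by risk limit
for r in risk_limits:
  # Swap in the risk constraint for the current limit
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  # Keep the results object in `result`, then print the solver summary
  result = SolverFactory('ipopt', executable=ipopt_executable).solve(m)
  result.write()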
  param_analysis[r] = [m.JPM(), m.AAPL(), m.HD(), m.MSFT(), m.V(), m.WMT(), m.MCD(), m.JNJ(), m.GOOG(), m.PEP()]
  returns[r] =  m.JPM()*df_sf_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_sf_returns["Average_Adj_Close"].iloc[1] + m.HD()*df_sf_returns["Average_Adj_Close"].iloc[2] + m.MSFT()*df_sf_returns["Average_Adj_Close"].iloc[3] + m.V()*df_sf_returns["Average_Adj_Close"].iloc[4] + m.WMT()*df_sf_returns["Average_Adj_Close"].iloc[5] + m.MCD()*df_sf_returns["Average_Adj_Close"].iloc[6] + m.JNJ()*df_sf_returns["Average_Adj_Close"].iloc[7] + m.GOOG()*df_sf_returns["Average_Adj_Close"].iloc[8] + m.PEP()*df_sf_returns["Average_Adj_Close"].iloc[9]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['JPM', 'AAPL', 'HD', 'MSFT', 'V', 'WMT', 'MCD', 'JNJ', 'GOOG', 'PEP']  # flat list, so legend labels are plain tickers
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling 10%: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling 10%: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=5, ncols=2, figsize=(12, 18))
param_analysis.plot(
    subplots=True,
    layout=(5, 2),       # 5 rows, 2 columns
    ax=axes,             # use our custom axes
    sharex=False,
    sharey=False,
    legend=True
)

for row in axes:
    for ax in row:
        ax.set_xlabel("Risk Level")
        ax.set_ylabel("Optimal Allocation")

plt.tight_layout()
plt.show()


# Find and print risk and reward
risk = list(returns.keys())
print(f"Risk", risk)
reward = list(returns.values())
print(f"Reward", reward)
print('\t')

# Plot risk and reward (uses the existing plt import instead of a pylab star import)
plt.figure(figsize=(10,6))
plt.plot(risk, reward, '-.')
plt.title('Modeling 10%: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

The Optimal Stock Allocation for Different Risk Levels charts (including a stacked bar chart for better visibility) show how the "optimal" stock allocation (y-axis) breaks down across different risk levels (x-axis).
  - In this case the risk levels range from 0.0003 to 0.1.

- Two stocks, PEP and JNJ, dominate all of the bars on the stacked bar chart.
  - This indicates that the model consistently allocates high proportions to PEP and JNJ and always includes them in the portfolio.
- Stocks like GOOG, V, and MSFT are also often included in the portfolio, but at a lower proportion than JNJ.
- JPM, AAPL, HD, WMT, and MCD are included in the portfolio only once, at risk level 0.0813, and at a minuscule proportion.
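
Statements such as "always included" or "included once" can be checked directly against the allocation table instead of being read off the chart. The sketch below is one possible way to do that, assuming a small tolerance `tol` below which a weight counts as zero (interior-point solvers such as ipopt typically return tiny positive values rather than exact zeros). The function name, the tolerance, and the toy data are hypothetical.

```python
def inclusion_summary(allocations, tol=1e-4):
    """Classify each ticker as 'always', 'sometimes', or 'never' held above tol.

    `allocations` maps each risk limit to a {ticker: weight} dict.
    """
    tickers = list(next(iter(allocations.values())).keys())
    summary = {}
    for t in tickers:
        held = [weights[t] > tol for weights in allocations.values()]
        if all(held):
            summary[t] = "always"
        elif any(held):
            summary[t] = "sometimes"
        else:
            summary[t] = "never"
    return summary


# Toy data for illustration only (not output of the Pyomo models)
toy = {
    0.0003: {"PEP": 0.55, "JNJ": 0.45, "HD": 1e-9},
    0.0413: {"PEP": 0.50, "JNJ": 0.40, "HD": 0.10},
    0.0813: {"PEP": 0.60, "JNJ": 0.40, "HD": 2e-9},
}
print(inclusion_summary(toy))
```

The choice of `tol` matters: a value that is too small will count solver noise as a real position.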

The Efficient Frontier shows how returns change with different levels of risk.
  - The non-linear optimization reveals thresholds where a marginal change in the risk constraint triggers a substantial rebalancing of asset allocations, yielding higher expected returns while also increasing volatility.
  - Small changes in risk cause large increases in the portfolio's return.
    - This may be because the portfolio is highly sensitive to minor changes in the risk parameters.
  - At specific risk levels, a tiny increase in allowed risk lets the model pick a very different mix of stocks that can result in significantly higher returns.
    - For example, at a risk level of 0.0813 the model produces an unusual mix of stocks (all 10 stocks) and allocations, indicating that a critical threshold is reached within the optimization process.
  - Returns dropped from 200 to 175 after increasing the maximum risk to 10%.
  - Overall, while higher risk generally leads to higher returns, the relationship is not linear.
    - The chart illustrates that returns increase with risk, but there are distinct thresholds where even a slight increase in risk yields a disproportionately large increase in return, resulting in an efficient frontier with sudden jumps rather than a gradual increase.
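
The thresholds described above can also be located numerically. A minimal sketch, assuming a "jump" is any step between consecutive risk limits whose return increase exceeds `factor` times the median step size. Both the rule and the value of `factor` are our assumptions, and the toy frontier is made up for illustration.

```python
def find_jumps(risk, reward, factor=3.0):
    """Return (risk_level, delta) pairs where the return increase between
    consecutive risk limits exceeds `factor` times the median step size."""
    deltas = [b - a for a, b in zip(reward, reward[1:])]
    ordered = sorted(abs(d) for d in deltas)
    median = ordered[len(ordered) // 2]  # upper median for even-length lists
    if median <= 0:
        return []
    return [(risk[i + 1], d) for i, d in enumerate(deltas) if d > factor * median]


# Toy frontier for illustration only (not output of the Pyomo models)
toy_risk = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
toy_reward = [100.0, 102.0, 104.0, 130.0, 132.0, 134.0]
print(find_jumps(toy_risk, toy_reward))
```

Applied to the `risk` and `reward` lists printed by the notebook, this would list the risk limits at which the frontier jumps, which can then be compared with the allocation table at the same limits.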

## Modeling 25%

In [None]:
m = ConcreteModel()
# Decision Variables
m.JPM = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.HD = Var(within=NonNegativeReals, bounds= (0,1))
m.MSFT = Var(within=NonNegativeReals, bounds= (0,1))
m.V = Var(within=NonNegativeReals, bounds= (0,1))
m.WMT = Var(within=NonNegativeReals, bounds= (0,1))
m.MCD = Var(within=NonNegativeReals, bounds= (0,1))
m.JNJ = Var(within=NonNegativeReals, bounds= (0,1))
m.GOOG = Var(within=NonNegativeReals, bounds= (0,1))
m.PEP = Var(within=NonNegativeReals, bounds= (0,1))

In [None]:
m.Objective = Objective(expr = df_sf_returns["Average_Adj_Close"].iloc[0] * m.JPM +
                        df_sf_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_sf_returns["Average_Adj_Close"].iloc[2] * m.HD +
                        df_sf_returns["Average_Adj_Close"].iloc[3] * m.MSFT +
                        df_sf_returns["Average_Adj_Close"].iloc[4] * m.V +
                        df_sf_returns["Average_Adj_Close"].iloc[5] * m.WMT +
                        df_sf_returns["Average_Adj_Close"].iloc[6] * m.MCD +
                        df_sf_returns["Average_Adj_Close"].iloc[7] * m.JNJ +
                        df_sf_returns["Average_Adj_Close"].iloc[8] * m.GOOG +
                        df_sf_returns["Average_Adj_Close"].iloc[9] * m.PEP, sense = maximize)

`MAX(RETURN = 215*JPM + 213*AAPL + 373*HD + 422*MSFT + 287*V + 75*WMT + 279*MCD + 152*JNJ + 171*GOOG + 163*PEP)`
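Written compactly, each step of the sweep in this section solves the problem below for a given risk limit $r$. This restates the code cells; the notation is an assumption introduced here: $w_i$ is the fraction allocated to stock $i$, $\mu_i$ is the corresponding `Average_Adj_Close` coefficient from `df_sf_returns`, and $\Sigma$ is the covariance matrix `df_sf_cov`.

$$
\begin{aligned}
\max_{w}\quad & \sum_{i=1}^{10} \mu_i w_i \\
\text{s.t.}\quad & \sum_{i=1}^{10}\sum_{j=1}^{10} w_i \, \Sigma_{ij} \, w_j \le r, \\
& \sum_{i=1}^{10} w_i = 1, \\
& \sum_{i=1}^{10} \mu_i w_i \ge 0.01, \\
& 0 \le w_i \le 1, \quad i = 1, \dots, 10.
\end{aligned}
$$

The quadratic risk constraint is what makes the model nonlinear; the objective and the remaining constraints are linear in $w$.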

In [None]:
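# Placeholder only: the sweep below deletes this constraint and replaces it with
# the covariance-based risk limit before the first solve.
m.total_risk = Constraint(expr = m.JPM + m.AAPL + m.HD + m.MSFT + m.V + m.WMT + m.MCD + m.JNJ + m.GOOG + m.PEP == 0.25)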
m.sum_proportion = Constraint(expr = m.JPM + m.AAPL + m.HD + m.MSFT + m.V + m.WMT + m.MCD + m.JNJ + m.GOOG + m.PEP == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.JPM, m.AAPL, m.HD, m.MSFT, m.V, m.WMT, m.MCD, m.JNJ, m.GOOG, m.PEP]
  tickers = ['Adj Close_JPM', 'Adj Close_AAPL', 'Adj Close_HD', 'Adj Close_MSFT', 'Adj Close_V', 'Adj Close_WMT', 'Adj Close_MCD', 'Adj Close_JNJ', 'Adj Close_GOOG', 'Adj Close_PEP']
  risk_exp = 0

  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i] * df_sf_cov.at[tickers[i], tickers[j]] * variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk = 0.25

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0025)  # take tiny steps
risk_limits

In [None]:
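param_analysis = {}  # parameter analysis: optimal allocation at each risk limit
returns = {}  # portfolio return at each risk limit
solver = SolverFactory('ipopt', executable=ipopt_executable)
for r in risk_limits:
  # Swap in the covariance-based risk limit for this step of the sweep
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  result = solver.solve(m)  # keep the results object; .write() returns None
  result.write()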
  param_analysis[r] = [m.JPM(), m.AAPL(), m.HD(), m.MSFT(), m.V(), m.WMT(), m.MCD(), m.JNJ(), m.GOOG(), m.PEP()]
  returns[r] =  m.JPM()*df_sf_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_sf_returns["Average_Adj_Close"].iloc[1] + m.HD()*df_sf_returns["Average_Adj_Close"].iloc[2] + m.MSFT()*df_sf_returns["Average_Adj_Close"].iloc[3] + m.V()*df_sf_returns["Average_Adj_Close"].iloc[4] + m.WMT()*df_sf_returns["Average_Adj_Close"].iloc[5] + m.MCD()*df_sf_returns["Average_Adj_Close"].iloc[6] + m.JNJ()*df_sf_returns["Average_Adj_Close"].iloc[7] + m.GOOG()*df_sf_returns["Average_Adj_Close"].iloc[8] + m.PEP()*df_sf_returns["Average_Adj_Close"].iloc[9]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['JPM', 'AAPL', 'HD', 'MSFT', 'V', 'WMT', 'MCD', 'JNJ', 'GOOG', 'PEP']  # flat list avoids an accidental MultiIndex
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling 25%: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling 25%: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=5, ncols=2, figsize=(12, 18))
param_analysis.plot(
    subplots=True,
    layout=(5, 2),       # 5 rows, 2 columns
    ax=axes,             # use our custom axes
    sharex=False,
    sharey=False,
    legend=True
)

for row in axes:
    for ax in row:
        ax.set_xlabel("Risk Level")
        ax.set_ylabel("Optimal Allocation")

plt.tight_layout()
plt.show()

# Find and print risk and reward
risk = list(returns.keys())
print("Risk", risk)
reward = list(returns.values())
print("Reward", reward)
print('\t')

# Plot risk and reward (explicit plt calls instead of a pylab star import)
plt.figure(figsize=(10,6))
plt.plot(risk, reward, '-.')
plt.title('Modeling 25%: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

The Optimal Stock Allocation for Different Risk Levels charts (with a stacked bar graph for better visibility) show how the "optimal" stock allocation (y-axis) breaks down across different risk levels (x-axis).
  - In this case the risk levels range from 0.0003 to 0.25.

- Two stocks, PEP and JNJ, dominate all of the bars on the stacked bar graph.
  - This indicates that the model often allocates PEP and JNJ in higher proportions and that both are always included in the portfolio.
- Stocks like GOOG, V, and MSFT are also often included in the portfolio but at a lower proportion than JNJ.
- JPM, AAPL, HD, WMT, and MCD are also included in the portfolio at risk levels around 0.2 and 0.25, but at a minuscule proportion.
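Whether a stock counts as "included" depends on a cutoff, because an interior-point solver such as Ipopt typically returns tiny positive values rather than exact zeros. The sketch below shows one way to list the holdings above a tolerance at each risk limit. `active_holdings`, the `tol` value, and the sample allocations are assumptions for illustration; in the notebook the input would be the `param_analysis` dictionary filled during the sweep.

```python
TICKERS = ['JPM', 'AAPL', 'HD', 'MSFT', 'V', 'WMT', 'MCD', 'JNJ', 'GOOG', 'PEP']

def active_holdings(allocations, tickers=TICKERS, tol=1e-4):
    """Map each risk limit to the tickers whose weight exceeds `tol`."""
    return {
        r: [t for t, w in zip(tickers, weights) if w > tol]
        for r, weights in allocations.items()
    }

# Made-up allocations keyed by risk limit (same ticker order as the notebook)
sample = {
    0.05: [0, 0, 0, 0.1, 0.05, 0, 0, 0.4, 0.05, 0.4],
    0.20: [1e-6, 1e-6, 0, 0.1, 0.1, 0, 0, 0.35, 0.1, 0.35],
}
held = active_holdings(sample)
for r, names in held.items():
    print(r, names)
```

With a tolerance like this, weights on the order of `1e-6` are treated as solver noise rather than as real positions.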

The Efficient Frontier shows how returns change with different levels of risk.
  - The non-linear optimization reveals thresholds where a marginal change in the risk constraint triggers a substantial rebalancing of asset allocations, yielding higher expected returns while also increasing volatility.
  - Small changes in risk cause large increases in the portfolio's return.
    - This may be because the portfolio is highly sensitive to minor changes in the risk limit.
  - Returns increase back to 200 after increasing the maximum risk to 25%.
  - Overall, while higher risk generally leads to higher returns, the relationship is not linear.
    - The chart illustrates that returns increase with risk, but there are distinct thresholds where even a slight increase in risk yields a disproportionately large increase in return, resulting in an efficient frontier with sudden jumps rather than a gradual increase.
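Because raising the risk limit only enlarges the set of feasible portfolios, the best achievable return should never fall as the limit increases. A drop along a sweep is therefore worth double-checking, since it may reflect a solve that did not converge rather than a property of the portfolio. The sketch below flags such points; `find_drops`, the tolerance, and the sample values are hypothetical and only illustrate the check.

```python
def find_drops(risk, reward, tol=1e-6):
    """Return (risk_limit, previous_best, reward) for points where the return
    falls below the best value already reached at a lower risk limit."""
    drops = []
    best = float('-inf')
    for r, value in zip(risk, reward):
        if value < best - tol:
            drops.append((r, best, value))
        best = max(best, value)
    return drops

# Made-up sweep with one suspicious dip at risk limit 0.03
risk = [0.01, 0.02, 0.03, 0.04]
reward = [150.0, 200.0, 175.0, 201.0]
drops = find_drops(risk, reward)
print(drops)
```

Any flagged risk limit could then be re-solved, or its solver status inspected, before interpreting the dip.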

## Modeling 50%

In [None]:
m = ConcreteModel()
# Decision Variables
m.JPM = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.HD = Var(within=NonNegativeReals, bounds= (0,1))
m.MSFT = Var(within=NonNegativeReals, bounds= (0,1))
m.V = Var(within=NonNegativeReals, bounds= (0,1))
m.WMT = Var(within=NonNegativeReals, bounds= (0,1))
m.MCD = Var(within=NonNegativeReals, bounds= (0,1))
m.JNJ = Var(within=NonNegativeReals, bounds= (0,1))
m.GOOG = Var(within=NonNegativeReals, bounds= (0,1))
m.PEP = Var(within=NonNegativeReals, bounds= (0,1))

In [None]:
m.Objective = Objective(expr = df_sf_returns["Average_Adj_Close"].iloc[0] * m.JPM +
                        df_sf_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_sf_returns["Average_Adj_Close"].iloc[2] * m.HD +
                        df_sf_returns["Average_Adj_Close"].iloc[3] * m.MSFT +
                        df_sf_returns["Average_Adj_Close"].iloc[4] * m.V +
                        df_sf_returns["Average_Adj_Close"].iloc[5] * m.WMT +
                        df_sf_returns["Average_Adj_Close"].iloc[6] * m.MCD +
                        df_sf_returns["Average_Adj_Close"].iloc[7] * m.JNJ +
                        df_sf_returns["Average_Adj_Close"].iloc[8] * m.GOOG +
                        df_sf_returns["Average_Adj_Close"].iloc[9] * m.PEP, sense = maximize)

`MAX(RETURN = 215*JPM + 213*AAPL + 373*HD + 422*MSFT + 287*V + 75*WMT + 279*MCD + 152*JNJ + 171*GOOG + 163*PEP)`

In [None]:
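# Placeholder only: the sweep below deletes this constraint and replaces it with
# the covariance-based risk limit before the first solve.
m.total_risk = Constraint(expr = m.JPM + m.AAPL + m.HD + m.MSFT + m.V + m.WMT + m.MCD + m.JNJ + m.GOOG + m.PEP == 0.5)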
m.sum_proportion = Constraint(expr = m.JPM + m.AAPL + m.HD + m.MSFT + m.V + m.WMT + m.MCD + m.JNJ + m.GOOG + m.PEP == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.JPM, m.AAPL, m.HD, m.MSFT, m.V, m.WMT, m.MCD, m.JNJ, m.GOOG, m.PEP]
  tickers = ['Adj Close_JPM', 'Adj Close_AAPL', 'Adj Close_HD', 'Adj Close_MSFT', 'Adj Close_V', 'Adj Close_WMT', 'Adj Close_MCD', 'Adj Close_JNJ', 'Adj Close_GOOG', 'Adj Close_PEP']
  risk_exp = 0

  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i] * df_sf_cov.at[tickers[i], tickers[j]] * variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk = 0.5

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.005)  # take tiny steps
risk_limits

In [None]:
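param_analysis = {}  # parameter analysis: optimal allocation at each risk limit
returns = {}  # portfolio return at each risk limit
solver = SolverFactory('ipopt', executable=ipopt_executable)
for r in risk_limits:
  # Swap in the covariance-based risk limit for this step of the sweep
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  result = solver.solve(m)  # keep the results object; .write() returns None
  result.write()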
  param_analysis[r] = [m.JPM(), m.AAPL(), m.HD(), m.MSFT(), m.V(), m.WMT(), m.MCD(), m.JNJ(), m.GOOG(), m.PEP()]
  returns[r] =  m.JPM()*df_sf_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_sf_returns["Average_Adj_Close"].iloc[1] + m.HD()*df_sf_returns["Average_Adj_Close"].iloc[2] + m.MSFT()*df_sf_returns["Average_Adj_Close"].iloc[3] + m.V()*df_sf_returns["Average_Adj_Close"].iloc[4] + m.WMT()*df_sf_returns["Average_Adj_Close"].iloc[5] + m.MCD()*df_sf_returns["Average_Adj_Close"].iloc[6] + m.JNJ()*df_sf_returns["Average_Adj_Close"].iloc[7] + m.GOOG()*df_sf_returns["Average_Adj_Close"].iloc[8] + m.PEP()*df_sf_returns["Average_Adj_Close"].iloc[9]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['JPM', 'AAPL', 'HD', 'MSFT', 'V', 'WMT', 'MCD', 'JNJ', 'GOOG', 'PEP']  # flat list avoids an accidental MultiIndex
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling 50%: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling 50%: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=5, ncols=2, figsize=(12, 18))
param_analysis.plot(
    subplots=True,
    layout=(5, 2),       # 5 rows, 2 columns
    ax=axes,             # use our custom axes
    sharex=False,
    sharey=False,
    legend=True
)

for row in axes:
    for ax in row:
        ax.set_xlabel("Risk Level")
        ax.set_ylabel("Optimal Allocation")

plt.tight_layout()
plt.show()

# Find and print risk and reward
risk = list(returns.keys())
print("Risk", risk)
reward = list(returns.values())
print("Reward", reward)
print('\t')

# Plot risk and reward (explicit plt calls instead of a pylab star import)
plt.figure(figsize=(10,6))
plt.plot(risk, reward, '-.')
plt.title('Modeling 50%: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

The Optimal Stock Allocation for Different Risk Levels charts (with a stacked bar graph for better visibility) show how the "optimal" stock allocation (y-axis) breaks down across different risk levels (x-axis).
  - In this case the risk levels range from 0.0003 to 0.5.
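Note that `np.arange` excludes its stop value, so the sweep covers limits from 0.0003 up to, but not including, `max_risk`. The sketch below reproduces the grid logic in plain Python to make the endpoints explicit; `risk_grid` is a hypothetical helper and is not part of the notebook.

```python
def risk_grid(start, stop, step):
    """Evenly spaced limits in [start, stop), mirroring np.arange(start, stop, step)."""
    count = 0
    grid = []
    while start + count * step < stop:
        grid.append(start + count * step)
        count += 1
    return grid

# Same start, stop, and step as the 50% sweep
grid = risk_grid(0.0003, 0.5, 0.005)
print(len(grid), grid[0], round(grid[-1], 4))
```

For this sweep the grid has 100 points and the largest limit actually solved is about 0.4953, slightly below the nominal 50% maximum.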

- Two stocks, PEP and JNJ, dominate all of the bars on the stacked bar graph.
  - This indicates that the model often allocates PEP and JNJ in higher proportions and that both are always included in the portfolio.
- Stocks like GOOG, V, and MSFT are also often included in the portfolio but at a lower proportion than JNJ.
- JPM, AAPL, HD, WMT, and MCD are not included in the portfolio at all.

The Efficient Frontier shows how returns change with different levels of risk.
  - The non-linear optimization reveals thresholds where a marginal change in the risk constraint triggers a substantial rebalancing of asset allocations, yielding higher expected returns while also increasing volatility.
  - Small changes in risk cause large increases in the portfolio's return.
    - This may be because the portfolio is highly sensitive to minor changes in the risk limit.
  - Overall, while higher risk generally leads to higher returns, the relationship is not linear.
    - The chart illustrates that returns increase with risk, but there are distinct thresholds where even a slight increase in risk yields a disproportionately large increase in return, resulting in an efficient frontier with sudden jumps rather than a gradual increase.
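The risk measure limited in each sweep is the portfolio variance built by `calc_risk`: the double sum of weight times covariance times weight. As a minimal sketch with a made-up 3-asset covariance matrix (the numbers and the `portfolio_variance` name are illustrative assumptions, not values from `df_sf_cov`), the same quantity can be evaluated for a fixed allocation:

```python
def portfolio_variance(weights, cov):
    """Double sum w_i * cov[i][j] * w_j, the same form as calc_risk."""
    n = len(weights)
    return sum(
        weights[i] * cov[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )

# Made-up symmetric covariance matrix for three assets
cov = [
    [0.04, 0.01, 0.00],
    [0.01, 0.09, 0.02],
    [0.00, 0.02, 0.16],
]
weights = [0.5, 0.3, 0.2]  # fractions that sum to 1

variance = portfolio_variance(weights, cov)
print(round(variance, 4))
```

Evaluating the variance of a solved allocation this way gives a quick check that it respects the risk limit it was solved under.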

## Modeling 75%

In [None]:
m = ConcreteModel()
# Decision Variables
m.JPM = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.HD = Var(within=NonNegativeReals, bounds= (0,1))
m.MSFT = Var(within=NonNegativeReals, bounds= (0,1))
m.V = Var(within=NonNegativeReals, bounds= (0,1))
m.WMT = Var(within=NonNegativeReals, bounds= (0,1))
m.MCD = Var(within=NonNegativeReals, bounds= (0,1))
m.JNJ = Var(within=NonNegativeReals, bounds= (0,1))
m.GOOG = Var(within=NonNegativeReals, bounds= (0,1))
m.PEP = Var(within=NonNegativeReals, bounds= (0,1))

In [None]:
m.Objective = Objective(expr = df_sf_returns["Average_Adj_Close"].iloc[0] * m.JPM +
                        df_sf_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_sf_returns["Average_Adj_Close"].iloc[2] * m.HD +
                        df_sf_returns["Average_Adj_Close"].iloc[3] * m.MSFT +
                        df_sf_returns["Average_Adj_Close"].iloc[4] * m.V +
                        df_sf_returns["Average_Adj_Close"].iloc[5] * m.WMT +
                        df_sf_returns["Average_Adj_Close"].iloc[6] * m.MCD +
                        df_sf_returns["Average_Adj_Close"].iloc[7] * m.JNJ +
                        df_sf_returns["Average_Adj_Close"].iloc[8] * m.GOOG +
                        df_sf_returns["Average_Adj_Close"].iloc[9] * m.PEP, sense = maximize)

`MAX(RETURN = 215*JPM + 213*AAPL + 373*HD + 422*MSFT + 287*V + 75*WMT + 279*MCD + 152*JNJ + 171*GOOG + 163*PEP)`

In [None]:
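# Placeholder only: the sweep below deletes this constraint and replaces it with
# the covariance-based risk limit before the first solve.
m.total_risk = Constraint(expr = m.JPM + m.AAPL + m.HD + m.MSFT + m.V + m.WMT + m.MCD + m.JNJ + m.GOOG + m.PEP == 0.75)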
m.sum_proportion = Constraint(expr = m.JPM + m.AAPL + m.HD + m.MSFT + m.V + m.WMT + m.MCD + m.JNJ + m.GOOG + m.PEP == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.JPM, m.AAPL, m.HD, m.MSFT, m.V, m.WMT, m.MCD, m.JNJ, m.GOOG, m.PEP]
  tickers = ['Adj Close_JPM', 'Adj Close_AAPL', 'Adj Close_HD', 'Adj Close_MSFT', 'Adj Close_V', 'Adj Close_WMT', 'Adj Close_MCD', 'Adj Close_JNJ', 'Adj Close_GOOG', 'Adj Close_PEP']
  risk_exp = 0

  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i] * df_sf_cov.at[tickers[i], tickers[j]] * variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk = 0.75

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0075)  # take tiny steps
risk_limits

In [None]:
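param_analysis = {}  # parameter analysis: optimal allocation at each risk limit
returns = {}  # portfolio return at each risk limit
solver = SolverFactory('ipopt', executable=ipopt_executable)
for r in risk_limits:
  # Swap in the covariance-based risk limit for this step of the sweep
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  result = solver.solve(m)  # keep the results object; .write() returns None
  result.write()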
  param_analysis[r] = [m.JPM(), m.AAPL(), m.HD(), m.MSFT(), m.V(), m.WMT(), m.MCD(), m.JNJ(), m.GOOG(), m.PEP()]
  returns[r] =  m.JPM()*df_sf_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_sf_returns["Average_Adj_Close"].iloc[1] + m.HD()*df_sf_returns["Average_Adj_Close"].iloc[2] + m.MSFT()*df_sf_returns["Average_Adj_Close"].iloc[3] + m.V()*df_sf_returns["Average_Adj_Close"].iloc[4] + m.WMT()*df_sf_returns["Average_Adj_Close"].iloc[5] + m.MCD()*df_sf_returns["Average_Adj_Close"].iloc[6] + m.JNJ()*df_sf_returns["Average_Adj_Close"].iloc[7] + m.GOOG()*df_sf_returns["Average_Adj_Close"].iloc[8] + m.PEP()*df_sf_returns["Average_Adj_Close"].iloc[9]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['JPM', 'AAPL', 'HD', 'MSFT', 'V', 'WMT', 'MCD', 'JNJ', 'GOOG', 'PEP']  # flat list avoids an accidental MultiIndex
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling 75%: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling 75%: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=5, ncols=2, figsize=(12, 18))
param_analysis.plot(
    subplots=True,
    layout=(5, 2),       # 5 rows, 2 columns
    ax=axes,             # use our custom axes
    sharex=False,
    sharey=False,
    legend=True
)

for row in axes:
    for ax in row:
        ax.set_xlabel("Risk Level")
        ax.set_ylabel("Optimal Allocation")

plt.tight_layout()
plt.show()

# Find and print risk and reward
risk = list(returns.keys())
print("Risk", risk)
reward = list(returns.values())
print("Reward", reward)
print('\t')

# Plot risk and reward (explicit plt calls instead of a pylab star import)
plt.figure(figsize=(10,6))
plt.plot(risk, reward, '-.')
plt.title('Modeling 75%: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

The Optimal Stock Allocation for Different Risk Levels charts (with a stacked bar graph for better visibility) show how the "optimal" stock allocation (y-axis) breaks down across different risk levels (x-axis).
  - In this case the risk levels range from 0.0003 to 0.75.

- Two stocks, PEP and JNJ, dominate all of the bars on the stacked bar graph.
  - This indicates that the model often allocates PEP and JNJ in higher proportions and that both are always included in the portfolio.
- Stocks like GOOG, V, and MSFT are also often included in the portfolio but at a lower proportion than JNJ.
- JPM, AAPL, HD, WMT, and MCD are also included in the portfolio at risk levels 0.1053 and 0.2328, but at a minuscule proportion.

The Efficient Frontier shows how returns change with different levels of risk.
  - The non-linear optimization reveals thresholds where a marginal change in the risk constraint triggers a substantial rebalancing of asset allocations, yielding higher expected returns while also increasing volatility.
  - Small changes in risk cause large increases in the portfolio's return.
    - This may be because the portfolio is highly sensitive to minor changes in the risk limit.
  - At specific risk levels, a tiny increase in the allowed risk lets the model pick a very different mix of stocks, which can result in significantly higher returns.
    - For example, at a risk level of 0.1053 the model produces an unusual allocation that includes all 10 stocks, which suggests that a critical threshold is reached within the optimization process.
  - Returns increase from 200 to 250 after increasing the maximum risk to 75%.
  - Overall, while higher risk generally leads to higher returns, the relationship is not linear.
    - The chart illustrates that returns increase with risk, but there are distinct thresholds where even a slight increase in risk yields a disproportionately large increase in return, resulting in an efficient frontier with sudden jumps rather than a gradual increase.

## Modeling 95%

In [None]:
m = ConcreteModel()
# Decision Variables
m.JPM = Var(within=NonNegativeReals, bounds= (0,1))
m.AAPL = Var(within=NonNegativeReals, bounds= (0,1))
m.HD = Var(within=NonNegativeReals, bounds= (0,1))
m.MSFT = Var(within=NonNegativeReals, bounds= (0,1))
m.V = Var(within=NonNegativeReals, bounds= (0,1))
m.WMT = Var(within=NonNegativeReals, bounds= (0,1))
m.MCD = Var(within=NonNegativeReals, bounds= (0,1))
m.JNJ = Var(within=NonNegativeReals, bounds= (0,1))
m.GOOG = Var(within=NonNegativeReals, bounds= (0,1))
m.PEP = Var(within=NonNegativeReals, bounds= (0,1))

In [None]:
m.Objective = Objective(expr = df_sf_returns["Average_Adj_Close"].iloc[0] * m.JPM +
                        df_sf_returns["Average_Adj_Close"].iloc[1] * m.AAPL +
                        df_sf_returns["Average_Adj_Close"].iloc[2] * m.HD +
                        df_sf_returns["Average_Adj_Close"].iloc[3] * m.MSFT +
                        df_sf_returns["Average_Adj_Close"].iloc[4] * m.V +
                        df_sf_returns["Average_Adj_Close"].iloc[5] * m.WMT +
                        df_sf_returns["Average_Adj_Close"].iloc[6] * m.MCD +
                        df_sf_returns["Average_Adj_Close"].iloc[7] * m.JNJ +
                        df_sf_returns["Average_Adj_Close"].iloc[8] * m.GOOG +
                        df_sf_returns["Average_Adj_Close"].iloc[9] * m.PEP, sense = maximize)

`MAX(RETURN = 215*JPM + 213*AAPL + 373*HD + 422*MSFT + 287*V + 75*WMT + 279*MCD + 152*JNJ + 171*GOOG + 163*PEP)`

In [None]:
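# Placeholder only: the sweep below deletes this constraint and replaces it with
# the covariance-based risk limit before the first solve.
m.total_risk = Constraint(expr = m.JPM + m.AAPL + m.HD + m.MSFT + m.V + m.WMT + m.MCD + m.JNJ + m.GOOG + m.PEP == 0.95)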
m.sum_proportion = Constraint(expr = m.JPM + m.AAPL + m.HD + m.MSFT + m.V + m.WMT + m.MCD + m.JNJ + m.GOOG + m.PEP == 1)
m.return_floor = Constraint(expr = m.Objective >= 0.01)

In [None]:
def calc_risk(m):
  variables = [m.JPM, m.AAPL, m.HD, m.MSFT, m.V, m.WMT, m.MCD, m.JNJ, m.GOOG, m.PEP]
  tickers = ['Adj Close_JPM', 'Adj Close_AAPL', 'Adj Close_HD', 'Adj Close_MSFT', 'Adj Close_V', 'Adj Close_WMT', 'Adj Close_MCD', 'Adj Close_JNJ', 'Adj Close_GOOG', 'Adj Close_PEP']
  risk_exp = 0

  for i in range(len(variables)):
    for j in range(len(variables)):
      risk_exp += variables[i] * df_sf_cov.at[tickers[i], tickers[j]] * variables[j]
  return risk_exp

expr_risk = calc_risk(m)

max_risk = 0.95

import numpy as np
risk_limits = np.arange(0.0003, max_risk, 0.0095)  # take tiny steps
risk_limits

In [None]:
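param_analysis = {}  # parameter analysis: optimal allocation at each risk limit
returns = {}  # portfolio return at each risk limit
solver = SolverFactory('ipopt', executable=ipopt_executable)
for r in risk_limits:
  # Swap in the covariance-based risk limit for this step of the sweep
  m.del_component(m.total_risk)
  m.total_risk = Constraint(expr = expr_risk <= r)
  result = solver.solve(m)  # keep the results object; .write() returns None
  result.write()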
  param_analysis[r] = [m.JPM(), m.AAPL(), m.HD(), m.MSFT(), m.V(), m.WMT(), m.MCD(), m.JNJ(), m.GOOG(), m.PEP()]
  returns[r] =  m.JPM()*df_sf_returns["Average_Adj_Close"].iloc[0] + m.AAPL()*df_sf_returns["Average_Adj_Close"].iloc[1] + m.HD()*df_sf_returns["Average_Adj_Close"].iloc[2] + m.MSFT()*df_sf_returns["Average_Adj_Close"].iloc[3] + m.V()*df_sf_returns["Average_Adj_Close"].iloc[4] + m.WMT()*df_sf_returns["Average_Adj_Close"].iloc[5] + m.MCD()*df_sf_returns["Average_Adj_Close"].iloc[6] + m.JNJ()*df_sf_returns["Average_Adj_Close"].iloc[7] + m.GOOG()*df_sf_returns["Average_Adj_Close"].iloc[8] + m.PEP()*df_sf_returns["Average_Adj_Close"].iloc[9]

In [None]:
# Generate proportion of the portfolio for each risk limit
param_analysis = pd.DataFrame.from_dict(param_analysis, orient='index')
param_analysis.columns = ['JPM', 'AAPL', 'HD', 'MSFT', 'V', 'WMT', 'MCD', 'JNJ', 'GOOG', 'PEP']  # flat list avoids an accidental MultiIndex
param_analysis.index.name = 'Risk Limit'
param_analysis.plot(figsize=(10, 6))
plt.title('Modeling 95%: Optimal Stock Allocation for Different Risk Levels')
plt.xlabel('Risk Level')
plt.ylabel('Optimal Stock Allocation')
plt.show()

# Stacked bar chart of the portfolio for each risk limit
plt.figure(figsize=(30, 20))
param_analysis.plot(kind='bar', stacked=True, ax=plt.gca(), width=1.0)
plt.xticks(rotation=45, ha='right')
plt.title("Modeling 95%: Stacked Bar Chart of Optimal Stock Allocation for Different Risk Levels", fontsize=14)
plt.xlabel("Risk Level", fontsize=12)
plt.ylabel("Allocation", fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Stocks")
plt.tight_layout()

plt.show()

# Separate graphs for each stock
fig, axes = plt.subplots(nrows=5, ncols=2, figsize=(12, 18))
param_analysis.plot(
    subplots=True,
    layout=(5, 2),       # 5 rows, 2 columns
    ax=axes,             # use our custom axes
    sharex=False,
    sharey=False,
    legend=True
)

for row in axes:
    for ax in row:
        ax.set_xlabel("Risk Level")
        ax.set_ylabel("Optimal Allocation")

plt.tight_layout()
plt.show()

# Find and print risk and reward
risk = list(returns.keys())
print("Risk", risk)
reward = list(returns.values())
print("Reward", reward)
print('\t')

# Plot risk and reward (explicit plt calls instead of a pylab star import)
plt.figure(figsize=(10,6))
plt.plot(risk, reward, '-.')
plt.title('Modeling 95%: The Efficient Frontier')
plt.xlabel('Risk')
plt.ylabel('Reward (Return)')
plt.show()

The Optimal Stock Allocation for Different Risk Levels charts (with a stacked bar graph for better visibility) show how the "optimal" stock allocation (y-axis) breaks down across different risk levels (x-axis).
  - In this case the risk levels range from 0.0003 to 0.95.

- Two stocks, PEP and JNJ, dominate all of the bars on the stacked bar graph.
  - This indicates that the model often allocates PEP and JNJ in higher proportions and that both are always included in the portfolio.
- Stocks like GOOG, V, and MSFT are also often included in the portfolio but at a lower proportion than JNJ.
- JPM, AAPL, HD, WMT, and MCD are not included in the portfolio at all.

The Efficient Frontier shows how returns change with different levels of risk.
  - The non-linear optimization reveals thresholds where a marginal change in the risk constraint triggers a substantial rebalancing of asset allocations, yielding higher expected returns while also increasing volatility.
  - Small changes in risk cause large increases in the portfolio's return.
    - This may be because the portfolio is highly sensitive to minor changes in the risk limit.
  - Returns drop back to 200 after increasing the maximum risk to 95%.
  - Overall, while higher risk generally leads to higher returns, the relationship is not linear.
    - The chart illustrates that returns increase with risk, but there are distinct thresholds where even a slight increase in risk yields a disproportionately large increase in return, resulting in an efficient frontier with sudden jumps rather than a gradual increase.

## Scott's Model Conclusion

- Across different risk levels, a few stocks, specifically PEP and JNJ, consistently dominate the portfolio. Their presence is pivotal in achieving high returns, while other stocks only contribute at specific risk thresholds.

- The Efficient Frontier reveals that even marginal changes in risk constraints can trigger significant rebalancing. For example, at a risk level of 0.1053, the model produces an unusual mix of stocks, underscoring that small adjustments can lead to disproportionately large increases in return.

- Although higher risk levels generally lead to higher returns, the increase is not linear. While our baseline model achieves 200% return at a 5% risk level, higher risk models (up to 95%) do not consistently improve the risk–return balance; in some cases, returns plateau or even decline due to increased volatility and concentration risk.

Overall, these models highlight that while expanding risk tolerance allows for broader stock allocations and potentially higher returns, it also introduces instability. The efficiency of the portfolio depends on striking the right balance, as even slight changes in risk constraints can have dramatic impacts on return potential.