# One Portfolio to Rule Them All!

This project analyzes portfolios built from different combinations of crypto assets and securities to determine which provides the best return, both over the historical period and in simulated future scenarios. The analysis proceeds in the following steps:

1. Collect Data
2. Analyze Data
    - Performance (returns)
    - Risk
3. Monte Carlo Simulation

```python
# Import all necessary libraries
import os
from pathlib import Path
import pandas as pd
import hvplot.pandas
from dotenv import load_dotenv
import alpaca_trade_api as tradeapi
import datetime
from dateutil.relativedelta import relativedelta
```

## Collect Data

```python
# Import data from the API into a Pandas DataFrame
# Load the Alpaca credentials from a .env file into environment variables
load_dotenv()

# Set Alpaca API key, secret, and endpoint
alpaca_api_key = os.getenv('APCA_API_KEY_ID')
alpaca_secret_key = os.getenv('APCA_API_SECRET_KEY')
alpaca_endpoint = os.getenv('APCA_API_BASE_URL')

# Create the Alpaca API object from those credentials
alpaca = tradeapi.REST(
    key_id=alpaca_api_key,
    secret_key=alpaca_secret_key,
    base_url=alpaca_endpoint,
    api_version='v2',
)

# TODO: parameterize the start and end dates

# Format the current date as an ISO 8601 timestamp
date = datetime.date.today()
date_fmt = date.strftime('%Y-%m-%d')
today = pd.Timestamp(date_fmt, tz='America/New_York').isoformat()

# Set the start date to five years before today.
# Results will vary depending on the time frame chosen.
five_yrs_ago = date - relativedelta(years=5)
five_yrs_ago = five_yrs_ago.strftime('%Y-%m-%d')
start_date = pd.Timestamp(five_yrs_ago, tz='America/New_York').isoformat()

# Set the tickers
tickers = ['AAPL', 'MSFT', 'PFE', 'DIS']

# Set timeframe to "1Day" for Alpaca API
timeframe = "1Day"

# Get five years of daily bars (open, high, low, close, volume, ...) for all tickers.
# If the request comes back empty (for example, because the end date falls on a
# day the markets are closed), step the end date back one day and try again.
tickers_df = alpaca.get_bars(tickers, timeframe, start=start_date, end=today).df

while tickers_df.empty:
    date -= relativedelta(days=1)
    date_fmt = date.strftime('%Y-%m-%d')
    today = pd.Timestamp(date_fmt, tz='America/New_York').isoformat()
    tickers_df = alpaca.get_bars(tickers, timeframe, start=start_date, end=today).df
```

```python
# Save the raw bars to a CSV file
output_file = Path('./stocks_data.csv')
tickers_df.to_csv(output_file)
```

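```python
from IPython.display import display

# Review and clean data: split the combined DataFrame by ticker, drop the
# redundant 'symbol' column, and join the pieces side by side on shared dates
tickers_df_list = [tickers_df[tickers_df['symbol'] == ticker].drop('symbol', axis='columns') for ticker in tickers]
stocks_df = pd.concat(tickers_df_list, axis='columns', join='inner', keys=tickers)
display(stocks_df.head())
display(stocks_df.tail())

# Show the AAPL DataFrame on its own
tickers_df_list[0]
```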

| timestamp | AAPL open | AAPL high | AAPL low | AAPL close | AAPL volume | AAPL trade_count | AAPL vwap | MSFT open | MSFT high | MSFT low | ... | PFE volume | PFE trade_count | PFE vwap | DIS open | DIS high | DIS low | DIS close | DIS volume | DIS trade_count | DIS vwap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018-04-11 04:00:00+00:00 | 172.23 | 173.9232 | 171.7 | 172.44 | 22431883 | 157149 | 172.739398 | 92.0 | 93.29 | 91.48 | ... | 13168781 | 58579 | 35.834799 | 100.78 | 101.65 | 100.41 | 100.8 | 6317596 | 41366 | 100.917302 |
| 2018-04-12 04:00:00+00:00 | 173.41 | 175.0 | 173.04 | 174.14 | 22889285 | 151617 | 174.367087 | 92.43 | 94.16 | 92.43 | ... | 22597698 | 81625 | 36.333862 | 101.42 | 101.51 | 99.68 | 100.39 | 7372808 | 45450 | 100.549757 |
| 2018-04-13 04:00:00+00:00 | 174.78 | 175.84 | 173.85 | 174.64 | 25127526 | 156694 | 174.861677 | 94.05 | 94.18 | 92.44 | ... | 16864266 | 62505 | 36.328367 | 101.0 | 101.52 | 100.16 | 100.35 | 6324606 | 43426 | 100.572233 |
| 2018-04-16 04:00:00+00:00 | 175.0301 | 176.19 | 174.8301 | 175.82 | 21579320 | 155498 | 175.646643 | 94.07 | 94.66 | 93.42 | ... | 15116082 | 68440 | 36.552518 | 100.69 | 101.0 | 99.73 | 100.24 | 10328002 | 51573 | 100.166105 |
| 2018-04-17 04:00:00+00:00 | 176.49 | 178.9365 | 176.41 | 178.24 | 26605711 | 168081 | 177.876089 | 95.0 | 96.54 | 94.88 | ... | 16769973 | 69431 | 36.388082 | 101.2 | 102.59 | 100.75 | 102.17 | 9727561 | 58888 | 101.947372 |


| timestamp | AAPL open | AAPL high | AAPL low | AAPL close | AAPL volume | AAPL trade_count | AAPL vwap | MSFT open | MSFT high | MSFT low | ... | PFE volume | PFE trade_count | PFE vwap | DIS open | DIS high | DIS low | DIS close | DIS volume | DIS trade_count | DIS vwap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023-04-04 04:00:00+00:00 | 166.595 | 166.84 | 165.11 | 165.63 | 46323527 | 456992 | 165.912926 | 287.23 | 290.4499 | 285.67 | ... | 19514531 | 117202 | 40.981528 | 100.3 | 100.42 | 98.76 | 99.57 | 6794168 | 89350 | 99.465892 |
| 2023-04-05 04:00:00+00:00 | 164.74 | 165.05 | 161.8 | 163.76 | 51534760 | 534317 | 163.491178 | 285.85 | 287.15 | 282.92 | ... | 29486502 | 147314 | 41.772305 | 99.7 | 100.18 | 98.632 | 99.91 | 7705408 | 97727 | 99.675316 |
| 2023-04-06 04:00:00+00:00 | 162.43 | 164.9584 | 162.0 | 164.66 | 45390035 | 446212 | 164.025748 | 283.21 | 292.08 | 282.03 | ... | 25931833 | 131406 | 41.568939 | 99.44 | 100.32 | 98.55 | 99.97 | 7042486 | 91244 | 99.608163 |
| 2023-04-10 04:00:00+00:00 | 161.42 | 162.03 | 160.08 | 162.03 | 47606637 | 562222 | 161.262424 | 289.208 | 289.6 | 284.71 | ... | 15077283 | 97722 | 41.537562 | 99.3 | 100.81 | 98.9 | 100.81 | 7993515 | 116038 | 100.276102 |
| 2023-04-11 04:00:00+00:00 | 162.35 | 162.36 | 160.51 | 161.415 | 25871310 | 336427 | 161.146235 | 285.75 | 285.98 | 281.64 | ... | 6418440 | 48607 | 41.854358 | 101.16 | 101.91 | 100.76 | 100.79 | 4618569 | 67178 | 101.291854 |


| timestamp | open | high | low | close | volume | trade_count | vwap |
|---|---|---|---|---|---|---|---|
| 2018-04-11 04:00:00+00:00 | 172.2300 | 173.9232 | 171.7000 | 172.440 | 22431883 | 157149 | 172.739398 |
| 2018-04-12 04:00:00+00:00 | 173.4100 | 175.0000 | 173.0400 | 174.140 | 22889285 | 151617 | 174.367087 |
| 2018-04-13 04:00:00+00:00 | 174.7800 | 175.8400 | 173.8500 | 174.640 | 25127526 | 156694 | 174.861677 |
| 2018-04-16 04:00:00+00:00 | 175.0301 | 176.1900 | 174.8301 | 175.820 | 21579320 | 155498 | 175.646643 |
| 2018-04-17 04:00:00+00:00 | 176.4900 | 178.9365 | 176.4100 | 178.240 | 26605711 | 168081 | 177.876089 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-04-04 04:00:00+00:00 | 166.5950 | 166.8400 | 165.1100 | 165.630 | 46323527 | 456992 | 165.912926 |
| 2023-04-05 04:00:00+00:00 | 164.7400 | 165.0500 | 161.8000 | 163.760 | 51534760 | 534317 | 163.491178 |
| 2023-04-06 04:00:00+00:00 | 162.4300 | 164.9584 | 162.0000 | 164.660 | 45390035 | 446212 | 164.025748 |
| 2023-04-10 04:00:00+00:00 | 161.4200 | 162.0300 | 160.0800 | 162.030 | 47606637 | 562222 | 161.262424 |


## Analyze Data

### Returns

We calculate each stock's daily returns to see how it performs from one data point (trading day) to the next. Then we calculate the cumulative returns to see how it performs over the entire set of data points (the period).

```python
# Calculate daily returns (pct_change) from each stock's closing prices
returns_df_list = [stocks_df[ticker]['close'].pct_change() for ticker in tickers]
returns_df = pd.concat(returns_df_list, axis='columns', join='inner', keys=tickers)
returns_df
```

| timestamp | AAPL | MSFT | PFE | DIS |
|---|---|---|---|---|
| 2018-04-11 04:00:00+00:00 | NaN | NaN | NaN | NaN |
| 2018-04-12 04:00:00+00:00 | 0.009859 | 0.018724 | 0.014809 | -0.004067 |
| 2018-04-13 04:00:00+00:00 | 0.002871 | -0.005343 | 0.000000 | -0.000398 |
| 2018-04-16 04:00:00+00:00 | 0.006757 | 0.011710 | 0.005782 | -0.001096 |
| 2018-04-17 04:00:00+00:00 | 0.013764 | 0.020176 | -0.005475 | 0.019254 |
| ... | ... | ... | ... | ... |
| 2023-04-04 04:00:00+00:00 | -0.003250 | -0.000174 | -0.010883 | -0.001905 |
| 2023-04-05 04:00:00+00:00 | -0.011290 | -0.009889 | 0.015892 | 0.003415 |
| 2023-04-06 04:00:00+00:00 | 0.005496 | 0.025533 | -0.001203 | 0.000601 |
| 2023-04-10 04:00:00+00:00 | -0.015972 | -0.007579 | 0.005542 | 0.008403 |


```python
# Plot the daily returns: strictly a visual check for anomalies
returns_df.hvplot(x='timestamp', xlabel='Date', ylabel='Daily Return', title='Stock Performance by Returns', group_label='Stock Symbol', width=1200, height=600)
```

```python
# Calculate the cumulative returns: we want this to be as high as possible
cml_returns_df = (1 + returns_df).cumprod()
```
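As a sanity check on this formula, compounding daily returns with `(1 + r).cumprod()` should reproduce each closing price divided by the first close. The sketch below uses a made-up four-day price series (`sample_close` and the other `sample_` names are hypothetical, not data from the API). The first row stays NaN because `pct_change()` has no prior day to compare against, and `cumprod()` skips that NaN.

```python
import pandas as pd

# Hypothetical closing prices for one stock (illustrative values only)
sample_close = pd.Series([100.0, 102.0, 99.96, 104.958])

# Daily returns: 2%, -2%, then 5% (the first value is NaN)
sample_returns = sample_close.pct_change()

# Compound the daily returns; cumprod() skips the leading NaN
sample_cumulative = (1 + sample_returns).cumprod()

# The cumulative return should equal each close relative to the first close
price_ratio = sample_close / sample_close.iloc[0]
print(pd.DataFrame({'cumulative': sample_cumulative, 'price_ratio': price_ratio}))
```

If a plot of `cml_returns_df` should start at 1.0, its leading NaN row can be filled with `fillna(1)`.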

```python
# Plot the cumulative returns
cml_returns_df.hvplot(x='timestamp', xlabel='Date', ylabel='Cumulative Returns', title='Stock Performance By Cumulative Returns', group_label='Stock Symbol', width=1200, height=600)
```

### Risk

Our risk analysis will involve finding the following:

1. Standard Deviation
2. Sharpe Ratio
3. Correlation
4. Beta

#### Standard Deviation

Standard deviation measures how far the daily returns spread from their mean value. A higher standard deviation means more volatile returns and, by this measure, more risk.

```python
# Calculate standard deviation
```

```python
# Create a box plot of std dev
```
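A minimal sketch of the standard deviation step, assuming the usual convention of 252 trading days per year and that volatility scales with the square root of time. The function name `annualized_std` and the random `sample_returns` DataFrame are illustrative assumptions, not part of the original notebook; with the real data the input would be `returns_df`.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed number of trading days per year


def annualized_std(returns: pd.DataFrame) -> pd.Series:
    """Annualized standard deviation of each column of daily returns.

    The daily (sample, ddof=1) standard deviation is scaled by sqrt(252).
    NaN rows, such as the first pct_change() row, are skipped by pandas.
    """
    return returns.std() * np.sqrt(TRADING_DAYS)


# Synthetic daily returns for illustration only (not real market data)
rng = np.random.default_rng(seed=42)
sample_returns = pd.DataFrame(rng.normal(0.0005, 0.02, size=(252, 4)),
                              columns=['AAPL', 'MSFT', 'PFE', 'DIS'])

# Most volatile stock first
print(annualized_std(sample_returns).sort_values(ascending=False))
```

Ranking `annualized_std(returns_df)` shows which stocks carry the most risk; a box plot of `returns_df` shows the same spread visually, including outlier days that a single number hides.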

#### Sharpe Ratio

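The Sharpe ratio tells us how much return we get for the amount of risk we incur: the average return in excess of the risk-free rate, divided by the standard deviation of returns. The higher the value, the better our returns are relative to the risk.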

```python
# Calculate the Sharpe Ratio
```
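A minimal sketch of the annualized Sharpe ratio, $S = \frac{\bar{r} - r_f}{\sigma}\sqrt{252}$, where $\bar{r}$ and $\sigma$ are the mean and standard deviation of daily returns and $r_f$ is the daily risk-free rate. It assumes 252 trading days per year and a risk-free rate of 0 unless one is passed in; the constants and the function name `annualized_sharpe` are illustrative choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252       # assumed trading days per year
RISK_FREE_ANNUAL = 0.0   # assumed annual risk-free rate (e.g. replace with a T-bill yield)


def annualized_sharpe(returns: pd.DataFrame,
                      risk_free_annual: float = RISK_FREE_ANNUAL) -> pd.Series:
    """Annualized Sharpe ratio for each column of daily returns."""
    # Convert the annual risk-free rate to a simple daily rate
    rf_daily = risk_free_annual / TRADING_DAYS
    excess = returns - rf_daily
    # Mean daily excess return per unit of daily volatility, scaled to a year
    return excess.mean() / excess.std() * np.sqrt(TRADING_DAYS)


# Tiny hand-checkable example: mean 0.005, sample std about 0.0129
toy_returns = pd.DataFrame({'A': [0.01, -0.01, 0.02, 0.0]})
print(annualized_sharpe(toy_returns))
```

In the notebook, `annualized_sharpe(returns_df).sort_values(ascending=False)` would rank the four stocks by risk-adjusted return. Subtracting a constant risk-free rate does not change the standard deviation, so a higher assumed rate only lowers the ratio.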

#### Correlation

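Correlation shows how diversified our portfolio is by measuring how closely two stocks' daily returns move together in a linear relationship, on a scale from -1 to 1. Lower correlation between holdings means more diversification.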

```python
# Calculate the correlation
```

```python
# Display the correlation matrix
```

```python
# Plot the correlation heatmap
```
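A minimal sketch of the correlation step, using pandas' default Pearson correlation. Besides the matrix itself, it computes each stock's average correlation with the others (ignoring the diagonal of 1s), which gives a quick per-stock diversification score. The helper `correlation_summary` and the synthetic `sample_returns` (built so that MSFT co-moves with AAPL and PFE is independent) are hypothetical; with real data the input would be `returns_df`.

```python
import numpy as np
import pandas as pd


def correlation_summary(returns: pd.DataFrame):
    """Return (Pearson correlation matrix, average correlation with the other columns)."""
    corr = returns.corr()
    # Mask out the diagonal, where every stock is perfectly correlated with itself
    off_diagonal = corr.where(~np.eye(len(corr), dtype=bool))
    return corr, off_diagonal.mean()


# Synthetic daily returns for illustration only (not real market data)
rng = np.random.default_rng(seed=0)
aapl = rng.normal(0, 0.02, 500)
sample_returns = pd.DataFrame({
    'AAPL': aapl,
    'MSFT': 0.8 * aapl + rng.normal(0, 0.01, 500),  # built to move with AAPL
    'PFE': rng.normal(0, 0.015, 500),                # independent noise
})

corr, avg_corr = correlation_summary(sample_returns)
print(corr.round(2))
print(avg_corr.sort_values())  # lowest average correlation first
```

The matrix returned here is what a correlation heatmap would display; the stock with the lowest average correlation contributes the most diversification.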

#### Beta

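Beta measures the volatility of our portfolio relative to the market: the covariance of its returns with the market's returns, divided by the variance of the market's returns. A beta above 1 means it tends to move more than the market; a beta below 1 means it tends to move less.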

```python
# Calculate the Beta
```
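A minimal sketch of beta as covariance over variance. It assumes a market benchmark series, such as daily returns for SPY, which the data-collection step above does not download; the helper `beta` and the synthetic `sample_market` and `sample_stock` series (built with a true beta of 1.5) are hypothetical.

```python
import numpy as np
import pandas as pd


def beta(stock_returns: pd.Series, market_returns: pd.Series) -> float:
    """Beta = Cov(stock, market) / Var(market), over the dates both series share."""
    # Align on common dates and drop NaNs (e.g. the first pct_change() row)
    aligned = pd.concat([stock_returns, market_returns], axis='columns', join='inner').dropna()
    aligned.columns = ['stock', 'market']
    # Series.cov and Series.var both use ddof=1, so the ratio is consistent
    return aligned['stock'].cov(aligned['market']) / aligned['market'].var()


# Synthetic returns for illustration only (not real market data)
rng = np.random.default_rng(seed=3)
sample_market = pd.Series(rng.normal(0.0004, 0.01, 2000))
sample_stock = 1.5 * sample_market + rng.normal(0, 0.01, 2000)

print(round(beta(sample_stock, sample_market), 2))  # close to the true 1.5
```

With real data, something like `beta(returns_df['AAPL'], spy_returns)` would apply, where `spy_returns` is a hypothetical Series of benchmark daily returns on the same timestamp index.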

## Monte Carlo Simulation

This will forecast our cumulative returns for many years into the future by simulating many possible return paths and summarizing the range of outcomes.

```python
# Take the cleaned DataFrame from the API and convert it to the appropriate currency
```

```python
# Reorganize the DataFrame by ticker and combine the pieces into a single DataFrame
```

```python
# Set up an initial equal weighting for the tickers
```
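A minimal sketch of equal weighting and the resulting daily portfolio return, assuming the portfolio is rebalanced back to its target weights every day (so a day's portfolio return is the weighted sum of the stocks' returns). The helper names and the toy `sample_returns` values are hypothetical; with real data the input would be `returns_df`.

```python
import pandas as pd


def equal_weights(tickers: list) -> pd.Series:
    """Give each ticker the same weight; the weights sum to 1."""
    return pd.Series(1 / len(tickers), index=tickers)


def portfolio_daily_returns(returns: pd.DataFrame, weights: pd.Series) -> pd.Series:
    """Weighted sum of each day's returns (assumes daily rebalancing to the weights)."""
    # Select the columns in the weights' order so dot() lines them up correctly
    return returns[weights.index].dot(weights)


# Toy daily returns (illustrative values only)
sample_returns = pd.DataFrame([[0.01, 0.02, -0.01, 0.00],
                               [-0.02, 0.01, 0.03, 0.02]],
                              columns=['AAPL', 'MSFT', 'PFE', 'DIS'])
sample_weights = equal_weights(['AAPL', 'MSFT', 'PFE', 'DIS'])
print(portfolio_daily_returns(sample_returns, sample_weights))
```

Other weightings can be tried by passing any `pd.Series` of weights indexed by ticker, as long as they sum to 1.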

```python
# Set up and run the Monte Carlo simulation for Y years, W weightings, and N simulations
```

```python
# Plot the outcomes and their distribution, then print a summary
```
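A minimal Monte Carlo sketch under one stated assumption: each simulated day draws all stocks' returns jointly from a multivariate normal distribution with the historical mean vector and covariance matrix, which preserves correlations between stocks. This is one common modeling choice, not necessarily the method the project ends up using. The helper names, the hypothetical $10,000 `initial_investment`, and the synthetic `sample_history` are illustrative; with real data the input would be `returns_df` and the chosen weights.

```python
import numpy as np
import pandas as pd


def simulate_cumulative_returns(returns: pd.DataFrame, weights, years: int = 5,
                                n_sims: int = 500, trading_days: int = 252,
                                seed=None) -> np.ndarray:
    """Simulate n_sims paths of cumulative portfolio returns over `years` years.

    Returns an array of shape (years * trading_days, n_sims).
    """
    weights = np.asarray(weights, dtype=float)
    history = returns.dropna()
    if weights.shape != (history.shape[1],):
        raise ValueError("need exactly one weight per column of returns")
    rng = np.random.default_rng(seed)
    n_days = years * trading_days
    # Joint daily draws for every stock: shape (n_days, n_sims, n_stocks)
    daily = rng.multivariate_normal(history.mean().to_numpy(),
                                    history.cov().to_numpy(),
                                    size=(n_days, n_sims))
    # Weighted sum across stocks, then compound day by day along the time axis
    portfolio_daily = daily @ weights
    return np.cumprod(1 + portfolio_daily, axis=0)


def summarize_outcomes(cumulative: np.ndarray, initial_investment: float = 10_000.0) -> dict:
    """Median and 95% interval of the ending value of a hypothetical investment."""
    ending = cumulative[-1] * initial_investment
    low, high = np.percentile(ending, [2.5, 97.5])
    return {'median': float(np.median(ending)),
            'ci_95_low': float(low),
            'ci_95_high': float(high)}


# Synthetic history for illustration only (not real market data)
rng = np.random.default_rng(seed=1)
sample_history = pd.DataFrame(rng.normal(0.0005, 0.015, size=(1000, 4)),
                              columns=['AAPL', 'MSFT', 'PFE', 'DIS'])
sample_paths = simulate_cumulative_returns(sample_history, [0.25] * 4,
                                           years=1, n_sims=200, seed=7)
print(summarize_outcomes(sample_paths))
```

Each column of the returned array is one simulated path, so plotting the columns shows the outcomes and a histogram of the last row shows their distribution. Normal daily returns understate extreme days, so the interval is best read as a rough range rather than a guarantee.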