# Group Project Outline

## Objective: 
### Simulate a portfolio composed of the top stocks in the top industries (by growth), using historical stock return data, and measure its performance by the portfolio's Sharpe Ratio.

## Steps: 
### Use historical stock price data to: 
#### 1. Compute monthly average returns for all stocks in the S&P 500. (If we use 25 years of history, that gives 25 x 12 = 300 monthly average returns per stock. We don't want to use daily returns because they may be too much data.)

#### 2. Fit one regression model per stock on these monthly averages (about 500 models), take the slope of each regression line, and store the slopes in one dataframe. At this point each stock has one number that represents its growth: the higher the slope, the higher the return.

#### 3. Categorize the slopes by sector (industry) and calculate the average slope in each industry. This average may represent the overall growth trend of the industry (we don't know yet whether this makes sense mathematically).

#### 4. Find the five industries with the highest average slope and build a correlation table to see whether one industry's growth is related to another's (we can also pick industries that we think should be closely correlated).

#### 5. For the industries picked in #4, pick the five stocks with the highest regression slope and build a correlation table from the historical return averages calculated in #1.

#### 6. If possible, use the historical returns of those five stocks to calculate the CAPM beta between each selected stock and its corresponding industry. (See the homework we did on CAPM beta: there we calculated the beta of one stock against the whole market, but here we focus on the beta between the stock and its industry. This shouldn't be too hard because we already have the code.)

#### 7. If possible, use the historical returns of the selected industries to calculate the CAPM beta between each selected industry and the whole market.

#### 8. Simulate a portfolio with the top stocks in each sector and calculate the Sharpe Ratio using their historical returns. (This shouldn't be hard because the Sharpe Ratio is one simple formula.)
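
Step 8 can be sketched roughly as follows. This is only an illustration under stated assumptions: monthly simple returns, equal weights, and a flat annual risk-free rate. The function name `sharpe_ratio` and the sample numbers are hypothetical and not part of the project code.

```python
import numpy as np

def sharpe_ratio(monthly_returns, annual_risk_free=0.0):
    """Annualized Sharpe ratio from a 1-D array of monthly simple returns."""
    r = np.asarray(monthly_returns, dtype=float)
    excess = r - annual_risk_free / 12.0   # assumption: flat monthly risk-free rate
    # Scale the monthly mean/std ratio by sqrt(12) to annualize it
    return np.sqrt(12.0) * excess.mean() / excess.std(ddof=1)

# Hypothetical monthly returns for three stocks (rows = months, columns = stocks)
returns = np.array([
    [0.02, 0.01, 0.03],
    [-0.01, 0.00, 0.02],
    [0.03, 0.02, -0.01],
    [0.01, -0.02, 0.04],
])
weights = np.full(returns.shape[1], 1.0 / returns.shape[1])  # equal weights
portfolio = returns @ weights                                # portfolio return per month
print(sharpe_ratio(portfolio, annual_risk_free=0.02))
```

Other weighting schemes (for example market-cap weights) only change the `weights` vector.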

## Results
#### We will identify the highest-return industry, its correlation with other industries, and its beta with the market.
#### We will identify the hottest stocks in the top industries and their betas with their corresponding industries.
#### We will simulate one portfolio with stocks picked by regression slope (which represents the growth of the stock), and we will have the betas (between each stock and its industry, and between each industry and the market) and the Sharpe Ratio of the portfolio.

## Methods used: 
#### Regression (which means visualization)
#### Functions
#### Class 
#### Maybe unittest to test classes 
#### Dataframe operations 
#### ...
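
For the unittest item, a minimal sketch might look like this. The helper `fit_slope` is a hypothetical stand-in for whatever slope routine the classes end up exposing, and it uses `numpy.polyfit` so that the test does not depend on downloaded data.

```python
import unittest
import numpy as np

def fit_slope(values):
    """Slope of the least-squares line through (1, v1), (2, v2), ..."""
    months = np.arange(1, len(values) + 1)
    slope, _intercept = np.polyfit(months, values, 1)
    return slope

class TestFitSlope(unittest.TestCase):
    def test_linear_series(self):
        # 5, 7, 9, 11 rises by exactly 2 per month
        self.assertAlmostEqual(fit_slope([5.0, 7.0, 9.0, 11.0]), 2.0)

    def test_flat_series(self):
        self.assertAlmostEqual(fit_slope([3.0, 3.0, 3.0]), 0.0)

# Run the tests in-process (works inside a notebook, no sys.exit)
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFitSlope)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```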

## Need to decide: 
#### Steps needed 
#### Methods needed
#### Whether the math in the steps makes sense
#### Rephrase the Steps to make them clearer and more instructive for operational purposes
#### ...
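
On the question of whether the math makes sense: a slope fitted to raw prices is measured in dollars per month, so it scales with the price level, and averaging such slopes across stocks mixes units. One possible alternative (an assumption here, not something the outline commits to) is the slope of the log price, which approximates the monthly growth rate. The sketch below uses two synthetic stocks that grow at the same rate.

```python
import numpy as np

def trend_slope(values):
    """Least-squares slope of values against month number 1..n."""
    months = np.arange(1, len(values) + 1)
    return np.polyfit(months, values, 1)[0]

months = np.arange(1, 61)
growth = 1.01 ** months        # both synthetic stocks grow 1% per month
cheap = 5.0 * growth           # starts near $5
expensive = 500.0 * growth     # starts near $500

raw_cheap, raw_expensive = trend_slope(cheap), trend_slope(expensive)
log_cheap, log_expensive = trend_slope(np.log(cheap)), trend_slope(np.log(expensive))

print(raw_cheap, raw_expensive)   # raw slopes differ by a factor of 100
print(log_cheap, log_expensive)   # log slopes agree: both equal log(1.01)
```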


# Import Packages Needed

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf
import pdb
import math
import scipy.stats as sp
import statsmodels.api as sm
import datetime as dt

# Clean Data

In [4]:
df = pd.read_csv('financials.csv')
df1 = df['Symbol']

In [20]:
for i in df1: 
    myticker = yf.Ticker(i)
    history = myticker.history()

- AET: No data found for this date range, symbol may be delisted
- ALXN: No data found, symbol may be delisted
- AGN: No data found, symbol may be delisted
- APC: No data found, symbol may be delisted
- ANDV: No data found for this date range, symbol may be delisted
- BHGE: No data found, symbol may be delisted
- BBT: No data found, symbol may be delisted


KeyboardInterrupt: 

In [5]:
missing = ('AET', 'ALXN', 'AGN', 'ARNC', 'APC','ANDV','BHGE','BBT','BRK.B','BF.B','CHK','CA','CBG','CBS','CELG','CTL','XEC','CXO','CSRA','DWDP','DPS','ETFC','EVHC','ESRX','FLIR','GGP','HRS','HCP','JEC','LB','LUK','KORS','MYL','NFX','NBL','PCLN','RTN','RHT','COL','SCG','SNI','STI','SYMC','TIF','TWX','TMK','TSS','UTX','VAR','VIAB','HCN','WYN')
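
The tuple above appears to have been assembled by hand from the yfinance warnings. A possible way to collect such symbols programmatically is sketched below. `find_missing` and `fake_fetch` are hypothetical names; the fetcher is injected so that the sketch runs offline.

```python
import pandas as pd

def find_missing(symbols, fetch):
    """Return the symbols for which fetch(symbol) raises or yields an empty frame."""
    missing = []
    for symbol in symbols:
        try:
            history = fetch(symbol)
        except Exception:
            missing.append(symbol)
            continue
        if history is None or history.empty:
            missing.append(symbol)
    return missing

# Stand-in fetcher for illustration only. With yfinance one might pass
# lambda s: yf.Ticker(s).history(period="1mo") instead.
def fake_fetch(symbol):
    if symbol == "AET":
        return pd.DataFrame()                    # no rows, as for a delisted symbol
    return pd.DataFrame({"Close": [1.0, 2.0]})

print(find_missing(["AAPL", "AET", "MMM"], fake_fetch))
```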

In [8]:
df = df.set_index('Symbol')
for i in missing:
    df=df.drop(index=i)
sp500 = df.reset_index()

In [9]:
tickers = sp500['Symbol']
industries = sp500['Sector'].unique()
df_industry = pd.DataFrame(industries)
df_industry.columns = (['Industries'])

In [10]:
total_length = len(tickers)
for my_index, i in enumerate(tickers):
    data = []
    ticker = yf.Ticker(i)
    history_i = ticker.history(start='1991-01-01', end='2021-01-01')
    # Mean of daily percentage changes in the closing price
    daily_return_i = history_i['Close'].pct_change(1).dropna()
    mean_return_i = daily_return_i.mean()
    # sp500 has a 0..n-1 index, so position and label coincide
    sector_i = sp500["Sector"][my_index]
    data.append(i)
    data.append(mean_return_i)
    data.append(sector_i)
    print(data)

# Stock Operation Class

In [11]:
class Stock:
    def __init__ (self, ticker):
        self.ticker = ticker
        self.industry = sp500.loc[sp500['Symbol'] == self.ticker, 'Sector'].iloc[0]
    def monthly_avg(self):
        history = yf.Ticker(self.ticker).history(start='2000-01-01', end='2020-12-31').reset_index()
        monthly_avg = history.groupby(pd.PeriodIndex(history['Date'], freq="M"))['Close'].mean()
        return monthly_avg
    def plot(self):
        data = self.monthly_avg()
        plot = data.plot()
    def reg_slope(self):
        data = self.monthly_avg()
        data = data.reset_index()
        data['Month'] = range(1,len(data)+1)
        # Add an intercept; without it OLS fits a line through the origin
        X = sm.add_constant(data['Month'])
        y = data["Close"]
        model = sm.OLS(y, X).fit()
        # Coefficient on Month is the slope of the trend line
        slope = model.params['Month']
        return slope
    def price_change(self):
        data = pd.DataFrame(self.monthly_avg())
        data = data.reset_index()
        begin = data['Close'][data[data['Date'] == '2000-01'].index.tolist()].tolist()
        end = data['Close'][data[data['Date'] == '2020-12'].index.tolist()].tolist()
        price_change = (((end[0]-begin[0]))/begin[0])
        return price_change
    def market_cap(self):
        history = yf.Ticker(self.ticker).history(start='2000-01-01', end='2020-12-31').reset_index()
        volume = history.groupby(pd.PeriodIndex(history['Date'], freq="M"))['Volume'].mean()
        price = self.monthly_avg()
        df_v = pd.DataFrame(volume)
        df_p = pd.DataFrame(price)
        market_cap = pd.concat([df_p, df_v], axis=1)
        market_cap['Market_Cap'] = market_cap['Volume'] * market_cap['Close']
        market_cap.columns = ([self.ticker + '_' + 'Price'] , [self.ticker + '_' + 'Volume'] , [self.ticker + '_' + 'Market_Cap'])
        return market_cap

In [12]:
AMT = Stock('AMT')

In [13]:
a = AMT.monthly_avg()

In [14]:
a = pd.DataFrame(a)

In [15]:
a.columns=(['AMT'])

In [16]:
a

| Date | AMT |
|---|---|
| 2000-01 | 30.404594 |
| 2000-02 | 35.784533 |
| 2000-03 | 42.031285 |
| 2000-04 | 37.344010 |
| 2000-05 | 35.084686 |
| ... | ... |
| 2020-08 | 244.665739 |
| 2020-09 | 241.617465 |
| 2020-10 | 234.881796 |
| 2020-11 | 231.240077 |


# S&P Operation Class

In [17]:
class SP500_Operation:
    def __init__ (self, df):
        self.df = df
    def stock_industry_price_change(self):
        data = []
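        for i in tickers[:10]:  # NOTE: only the first 10 tickers are processed
            try:
                stock = Stock(i)
                data.append((stock.ticker,stock.industry,stock.price_change()))
            except Exception:
                # Skip tickers whose price change cannot be computed
                # (e.g. no data for 2000-01)
                continue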
        stock_industry_price_change = pd.DataFrame(data)
        stock_industry_price_change.columns = ['Symbol', 'Industry','Price_Change']
        return stock_industry_price_change
    def stocks_and_returns_in_industry(self, industry):
        df = self.stock_industry_price_change()
        data = df[df['Industry']== industry]
        return data
    def industry_index(self, industry):
        A = Industry_Operation(df_industry)
        industry_market_cap, names, df = A.total_market_cap(industry)
        weight = A.stock_weight_in_industry(industry)
        data = pd.DataFrame()
        for i in names: 
            stock = Stock(i)
            price = stock.monthly_avg()
            price = pd.DataFrame(price)
            price.columns = ([i])
            data[i] = price
        #b = data.mul(weight, fill_value=0) 
        return weight
    def top_stocks_in_industry(self, industry, number_stocks=5):
        df = self.stocks_and_returns_in_industry(industry)
        top_price_change = df.nlargest(number_stocks,'Price_Change')
        return top_price_change

In [18]:
SP500 = SP500_Operation(sp500)

In [19]:
SP500.industry_index('Real Estate')

NameError: name 'Industry_Operation' is not defined

# Industry Operation Class

In [20]:
class Industry_Operation:
    def __init__ (self, df):
        self.df = df
    def total_market_cap(self, industry):
        stocks = sp500['Symbol'][sp500['Sector'] == industry]
        # Use AAPL's monthly index as an empty frame to merge the other stocks onto
        aapl = Stock('AAPL')
        df = pd.DataFrame(aapl.monthly_avg()).drop(['Close'], axis=1)
        data = pd.DataFrame()
        for i in stocks:
            stock = Stock(i)
            a = stock.market_cap()
            b = a.iloc[:,[2]]  # third column holds <ticker>_Market_Cap
            df = df.merge(b, how='left', on='Date')
            # Drop stocks that lack data for any month in the window
            df = df.dropna(axis=1)
            data[industry + ' ' + 'Total Market Cap'] = df.sum(axis=1, numeric_only=True)
        industry_market_cap = pd.DataFrame(data[industry + ' ' + 'Total Market Cap'])
        # Recover ticker symbols from column labels such as 'AAPL_Market_Cap'
        names = []
        for item in df.columns:
            name = item if isinstance(item, str) else item[0]
            names.append(name.split('_')[0])
        return industry_market_cap, names, df
    def stock_weight_in_industry(self, industry):
        industry_market_cap, names, df = self.total_market_cap(industry)
        weight = df.div(industry_market_cap.iloc[:,0], axis=0)
        return weight
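
The commented-out `data.mul(weight, fill_value=0)` line in `industry_index` suggests the intended next step is a cap-weighted industry price. A minimal sketch of that calculation on synthetic data follows. The tickers `AAA` and `BBB` and all numbers are made up, and the sketch assumes the price and weight frames share the same column names (in the classes above the weight columns would first need to be renamed from `<ticker>_Market_Cap` to the bare ticker).

```python
import pandas as pd

months = pd.period_range("2000-01", periods=3, freq="M")
# Hypothetical monthly average prices and market caps for two stocks
prices = pd.DataFrame({"AAA": [10.0, 11.0, 12.0], "BBB": [20.0, 19.0, 21.0]}, index=months)
caps = pd.DataFrame({"AAA": [100.0, 110.0, 120.0], "BBB": [300.0, 290.0, 280.0]}, index=months)

# Each stock's share of the industry total; every row sums to 1
weights = caps.div(caps.sum(axis=1), axis=0)
# Weighted average price per month
industry_index = (prices * weights).sum(axis=1)
print(industry_index)
```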

# CAPM 

 
$$
\begin{aligned}
&E[R_i] - R_f = \beta_i \left( E[R_m] - R_f \right) \\
&\text{where:} \\
&E[R_i] = \text{expected return of stock } i \\
&R_f = \text{risk-free rate} \\
&\beta_i = \text{beta of the stock} \\
&E[R_m] - R_f = \text{market risk premium}
\end{aligned}
$$
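
In practice beta is estimated from return series. Two standard estimators are shown here as a reminder. The second, a regression through the origin on excess returns, matches the matrix form $(a^\top a)^{-1} a^\top b$ used in `beta_sensitivity`. The subscript $t$ for the month is notation introduced for this note.

$$
\begin{aligned}
\hat\beta_i &= \frac{\operatorname{Cov}(R_i, R_m)}{\operatorname{Var}(R_m)} && \text{(regression with intercept)} \\
\hat\beta_i &= \frac{\sum_t x_t\, y_t}{\sum_t x_t^2}, \quad x_t = R_{m,t} - R_f,\; y_t = R_{i,t} - R_f && \text{(regression through the origin)}
\end{aligned}
$$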


In [21]:
class CAPM:
    def __init__(self,stock,industry):
        self.stock = stock
        self.industry = industry
    def CAPM_stock_industry(self, stock):
        #this is stock monthly percent change
        s = Stock(stock)
        a = s.monthly_avg()
        a = pd.DataFrame(a)
        a.columns=([stock])
        pch = a.pct_change()
        return pch
        
    def CAPM_industry_market(self, industry):
        #implement
        pass
    def beta_sensitivity_stock_industry(self):
        #implement
        pass
    def beta_sensitivity_industry_market(self):
        #implement
        pass
    def beta_sensitivity(self, x, y):
        beta_sensitivity_list = []
        for i in range(len(x)):
            dropped_x_val = x[i]
            dropped_y_val = y[i]
            
            dropped_x = np.delete(x, i).reshape(-1,1)
            dropped_y = np.delete(y, i).reshape(-1,1)
            
            a = np.reshape(dropped_x, (dropped_x.shape[0], 1))
            b = np.reshape(dropped_y, (dropped_y.shape[0], 1))
            
            beta = np.multiply(np.reciprocal(np.dot(a.T, a)), np.dot(a.T, b))
            beta_i = beta[0][0]
            beta_sensitivity_list.append(([i, dropped_y_val], beta_i))
        return beta_sensitivity_list
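
As a compact illustration of the leave-one-out idea behind `beta_sensitivity`, the sketch below re-estimates a through-origin beta with each observation dropped in turn. The function name `leave_one_out_betas` and the sample returns are hypothetical.

```python
import numpy as np

def leave_one_out_betas(x, y):
    """Through-origin beta of y on x, re-estimated with each observation dropped."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    betas = []
    for i in range(len(x)):
        xi = np.delete(x, i)
        yi = np.delete(y, i)
        betas.append(float(xi @ yi / (xi @ xi)))  # sum(x*y) / sum(x^2)
    return betas

# Hypothetical excess returns: y is exactly 1.5 * x except for one outlier
x = np.array([0.01, -0.02, 0.03, 0.02])
y = 1.5 * x
y[3] = 0.10
betas = leave_one_out_betas(x, y)
print(betas)  # dropping the outlier (index 3) recovers a beta of 1.5
```

A large gap between one entry and the rest points to an observation that dominates the estimate.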

In [22]:
test1 = CAPM('AAPL','Technology')
test1.CAPM_stock_industry(test1.stock)

| Date | AAPL |
|---|---|
| 2000-01 | NaN |
| 2000-02 | 0.080098 |
| 2000-03 | 0.151063 |
| 2000-04 | -0.042003 |
| 2000-05 | -0.181104 |
| ... | ... |
| 2020-08 | 0.229234 |
| 2020-09 | -0.018221 |
| 2020-10 | 0.011187 |
| 2020-11 | 0.004942 |


# CAPM # Test Area 🎲🎲🎲

### Get the correlation among selected stocks

In [469]:
top_return_symbols = top_price_change['Symbol']

In [473]:
# Setting the index column for other dataframe to concat
aapl = Stock('AAPL')
top_stock_monthly_avg = pd.DataFrame(aapl.monthly_avg()).drop(['Close'], axis=1) 
# Generating the monthly average stock price for the top stocks in one dataframe
for i in top_return_symbols:
    stock = Stock(i)
    stock_i_monthly_avg = pd.DataFrame(stock.monthly_avg())
    stock_i_monthly_avg.columns=([i])
    top_stock_monthly_avg = pd.concat([stock_i_monthly_avg,top_stock_monthly_avg], axis=1)
    
stock_corr = top_stock_monthly_avg.corr() 
stock_corr

| | MMM | ABT | AOS | ADBE | ATVI |
|---|---|---|---|---|---|
| MMM | 1.0 | 0.84936 | 0.98329 | 0.769558 | 0.919592 |
| ABT | 0.84936 | 1.0 | 0.875211 | 0.969427 | 0.919668 |
| AOS | 0.98329 | 0.875211 | 1.0 | 0.818619 | 0.953318 |
| ADBE | 0.769558 | 0.969427 | 0.818619 | 1.0 | 0.910988 |
| ATVI | 0.919592 | 0.919668 | 0.953318 | 0.910988 | 1.0 |
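
The correlations above are computed on monthly average price levels. Two series that both trend upward tend to look highly correlated even when their month-to-month moves are unrelated, so correlating returns is a common alternative. The sketch below illustrates the difference on synthetic data. The tickers `AAA` and `BBB`, the drift, and the volatility are made-up assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 240
# Two hypothetical stocks: independent monthly returns, same upward drift
returns = pd.DataFrame({
    "AAA": 0.01 + 0.02 * rng.standard_normal(n),
    "BBB": 0.01 + 0.02 * rng.standard_normal(n),
})
prices = 100 * (1 + returns).cumprod()

# Correlation of price levels versus correlation of monthly percentage changes
price_corr = prices.corr().loc["AAA", "BBB"]
return_corr = prices.pct_change().dropna().corr().loc["AAA", "BBB"]
print(price_corr, return_corr)
```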


In [420]:
aapl = Stock('AAPL')
data = aapl.monthly_avg()
#aapl = pd.DataFrame(data)

In [457]:
abt = Stock('ABT')
df = pd.DataFrame(abt.monthly_avg())
df.columns = ['ABT']
df

| Date | Close |
|---|---|
| 2000-01 | 8.914329 |
| 2000-02 | 9.112125 |
| 2000-03 | 8.911923 |
| 2000-04 | 10.401491 |
| 2000-05 | 10.793253 |
| ... | ... |
| 2020-08 | 100.550693 |
| 2020-09 | 103.330695 |
| 2020-10 | 106.110295 |
| 2020-11 | 108.888888 |


In [389]:
mmm = Stock('MMM')
data = mmm.monthly_avg()
#mmm = pd.DataFrame(data)

In [396]:
data = pd.concat([aapl, mmm], axis=1)

In [397]:
data.corr()

| | AAPL | MMM |
|---|---|---|
| AAPL | 1.0 | 0.799666 |
| MMM | 0.799666 | 1.0 |


In [412]:
aapl

| Date | AAPL |
|---|---|
| 2000-01 | 0.791221 |
| 2000-02 | 0.854596 |
| 2000-03 | 0.983694 |
| 2000-04 | 0.942376 |
| 2000-05 | 0.771708 |
| ... | ... |
| 2020-08 | 116.342647 |
| 2020-09 | 114.222799 |
| 2020-10 | 115.500651 |
| 2020-11 | 116.071418 |


In [415]:
aapl.drop(['AAPL'],axis=1)

2000-01
2000-02
2000-03
2000-04
2000-05
...
2020-08
2020-09
2020-10
2020-11
2020-12


# Tasks to be done (this is written before Thanksgiving)

1. Write instructions on how to use each class and functions -- Bruce
2. Finish def industry_index in SP500 Operation Class -- Kathy 
   Get top industries. Get top stocks in the top industries. 
3. Finish the CAPM Class -- Joseph 
   Get the beta of the selected stocks (beta between each stock and its own industry). Get the beta of the selected industries (against the market). We need S&P 500 index numbers (probably available in yf).
   Demo:
       def CAPM_stock_industry(self, stock):
           i = Stock(stock)
           industry = i.industry
       def CAPM_industry_market(self, industry): ...
       def sensitivity_stock_industry(self): ...
       def sensitivity_industry_market(self): ...
4. Unittest -- Kathy 
5. Regression on the stock and the industry (we look for R^2, which tells us how much of the stock's price change is explained by the price change of the industry; this concept is similar to beta) -- Bruce
6. Sharpe Ratio of the portfolio -- Bruce
7. Portfolio return graph -- Bruce 
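
For task 5, the R^2 of a stock-on-industry regression could be computed along these lines. This is a sketch with made-up return series; `r_squared` is a hypothetical helper that uses `numpy.polyfit` rather than the project's statsmodels code.

```python
import numpy as np

def r_squared(stock_returns, industry_returns):
    """R^2 of a one-variable OLS regression (with intercept) of stock on industry returns."""
    x = np.asarray(industry_returns, dtype=float)
    y = np.asarray(stock_returns, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    # Share of the stock's variance explained by the fitted line
    return 1.0 - residuals.var() / y.var()

industry = np.array([0.01, -0.02, 0.03, 0.00, 0.02])
stock = 2.0 * industry + 0.001   # fully explained by the industry, slope (beta) of 2
print(r_squared(stock, industry))
```

For a single regressor, R^2 equals the squared correlation between the two series, which is one way to sanity-check the result.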

## Code part DONE by 26th Nov. 

# Paper!!!!!!!

### Paper DONE by Nov. 30 

# PPT by Dec. 1