# Stage 1: Generate Stock Universe

- Gather stocks of interest
- Gather stocks from specific criteria (SP500 top 50...)
- Gather stocks from specific portfolio account
- Assemble stock universe 
- Use stock sentiment to select stocks
- Gather price histories

```python
from platform import python_version
import time
from datetime import datetime
import os
import pandas as pd
import numpy as np
import math
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt

%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (20, 8)

# Set the import path for the tools directory
import sys
# Insert at position 1 in the path, as 0 is the path of this file.
sys.path.insert(1, '../tools')
import importlib
import ameritrade_functions as amc
importlib.reload(amc)
import utils
importlib.reload(utils)

print(f'Python version: {python_version()}')
print(f'Pandas version: {pd.__version__}')
```

```
Python version: 3.8.10
Pandas version: 0.25.3
```


## Configure Ameritrade Information

Ameritrade credentials are stored in environment variables to keep from having unencrypted passwords stored on disk.

The module automatically masks the account numbers to protect the actual accounts. An Ameritrade user can have many investment accounts. We will be working with only one for this demonstration.
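The masked form shown in this notebook (`#---9216`) suggests masking along the lines of the sketch below. `mask_account_number` is a hypothetical helper that only reproduces the displayed format; it is not the module's actual implementation.

```python
def mask_account_number(account_number, visible=4):
    """Hide all but the last `visible` characters of an account number."""
    # Hypothetical helper: mirrors the '#---' prefix seen in this notebook's output
    return '#---' + str(account_number)[-visible:]

print(mask_account_number('123456789216'))  # → #---9216
```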

## Authentication Tokens

To get data from Ameritrade you will need to obtain a short-lived authentication token. (There is also a re-use, or refresh, token, but I have not coded support for it yet.) You only need to do this if you are going to use an existing Ameritrade account to define an initial set of stocks to analyze.
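Re-using a token instead of requesting a new one each time could look like a small cache that returns the current token until it is close to expiring, and only then calls a fetch function again. This is a hypothetical sketch: `TokenCache`, the `fetch` callable, and the 60-second safety margin are illustrative assumptions, not part of `ameritrade_functions`. The token lifetime is whatever the fetch function reports.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Hand out a cached access token until shortly before it expires (hypothetical)."""

    def __init__(self, fetch: Callable[[], Tuple[str, float]], margin: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch        # returns (token, lifetime_in_seconds)
        self._margin = margin      # refresh this many seconds before expiry
        self._clock = clock        # injectable clock, handy for testing
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Fetch a new token on first use or when the old one is about to expire
        if self._token is None or now >= self._expires_at - self._margin:
            self._token, lifetime = self._fetch()
            self._expires_at = now + lifetime
        return self._token


# Usage with a stand-in fetch function (no network access)
cache = TokenCache(lambda: ('example-token', 1800))
print(cache.get())
```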

To obtain a token, you will need to have a Chrome driver located somewhere on your system. This will allow the module to use your credentials to obtain an authentication token.

For security reasons, I suggest using environment variables to store your credentials. If you store them in property files or hard-code them in your notebook, you risk sharing them with others when you use GitHub or another source code control system (SCCS). Environment variables also make the credentials easy to reuse from project to project in your development environment.
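Because `os.getenv` silently returns `None` for an unset variable, a small check makes a missing credential fail loudly before the login step. A minimal sketch; `load_credentials` is a hypothetical helper, and the variable names in the commented usage line are the ones this notebook reads.

```python
import os


def load_credentials(names):
    """Read each environment variable in `names`; raise if any is unset or empty."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return values


# creds = load_credentials(['maiotradeuser', 'maiotradepw', 'maiotradeclientid'])
```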

<span style="color:blue">Note: *Account numbers are masked for security purposes.*</span>

```python
username = os.getenv('maiotradeuser')
password = os.getenv('maiotradepw')
client_id = os.getenv('maiotradeclientid')

# Path to the ChromeDriver executable (pathlib joins with the OS-specific separator)
from pathlib import Path
chrome_executabel_path = str(Path.home() / 'Anaconda Projects' / 'chromedriver' / 'chromedriver')

# Make sure we have a data directory
Path('./data').mkdir(parents=True, exist_ok=True)

# Which account are we interested in
masked_account_number = '#---9216'
account_portfolios_file_name = 'data/portfolio_data.csv'
portfolio_file_name = 'data/portfolio_' + masked_account_number[-4:] + '.csv'
price_histories_file_name = 'data/price_histories.csv'
```

```python
td_ameritrade = amc.AmeritradeRest(username, password, client_id, chrome_executabel_path)
td_ameritrade.authenticate()

if len(td_ameritrade.authorization) == 0:
    print('Error: No authorization data: {}'.format(td_ameritrade.authorization))
else:
    print('You have authorization')
```

```
You have authorization
```


## Stock Universe

Here we set up the universe. This still needs work: the long-term goal is to use a pipeline process to help select stocks from the top 500 or a similar list.

For now we will use stocks from the portfolio, stocks of interest (those with many news items), and a list of well-known stocks. This list has been augmented with some stocks that made Ameritrade's top 10 movers for a couple of days. Pulling the top movers from Ameritrade has not been coded yet, but it should be added down the line to automate collecting these tickers.
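Combining these sources amounts to an ordered union of ticker lists. A minimal sketch, assuming each source is a plain list of ticker strings; `assemble_universe` is a hypothetical helper, upper-casing the tickers is an added assumption, and the tickers in the example are arbitrary.

```python
def assemble_universe(*symbol_lists):
    """Merge several ticker lists, dropping duplicates but keeping first-seen order."""
    # dict keys preserve insertion order (Python 3.7+), so this is an ordered union
    merged = dict.fromkeys(
        symbol.strip().upper()
        for symbols in symbol_lists
        for symbol in symbols
    )
    return list(merged)


portfolio = ['AAPL', 'MSFT']
of_interest = ['tsla', 'AAPL']
movers = ['NVDA', 'MSFT']
print(assemble_universe(portfolio, of_interest, movers))
# → ['AAPL', 'MSFT', 'TSLA', 'NVDA']
```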

## First, let's see which stocks we already own in a specific account

I only want to work with equity investments. This is a little confusing: at the account level, tradable assets are called "EQUITY", but when you get a quote for each asset, the same asset can be reported as something else, such as "ETF".

I also use Ameritrade's portfolio planner tool to create an asset mix based on their recommendations. I don't want these holdings (in my case mutual funds and ETFs) to be part of this analysis, so I'll remove them here.
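The two-level filter described above (account-level `EQUITY` positions, then the quote-level asset type, then excluding planner holdings) can be sketched with plain pandas on made-up data. The symbols, the `planner_holdings` set, and the `MUTUAL_FUND` label are illustrative assumptions; in the notebook the real data comes from `utils` and `get_quotes`.

```python
import pandas as pd

# Illustrative data only: account-level positions and per-symbol quotes
positions = pd.DataFrame({
    'symbol': ['AAA', 'BBB', 'CCC', 'DDD'],
    'assetType': ['EQUITY', 'EQUITY', 'MUTUAL_FUND', 'EQUITY'],
})
quotes = pd.DataFrame({'assetType': ['EQUITY', 'ETF', 'EQUITY']},
                      index=['AAA', 'BBB', 'DDD'])
planner_holdings = {'DDD'}  # hypothetical: symbols placed by the portfolio planner

# Step 1: keep positions the account reports as EQUITY
equity_positions = positions[positions['assetType'] == 'EQUITY']
# Step 2: keep only symbols whose quote also says EQUITY (drops the ETF)
true_equities = quotes.index[quotes['assetType'] == 'EQUITY']
# Step 3: drop anything the portfolio planner manages
selected = equity_positions[
    equity_positions['symbol'].isin(true_equities)
    & ~equity_positions['symbol'].isin(planner_holdings)
]
print(selected['symbol'].tolist())  # → ['AAA']
```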

```python
# Specific Portfolio Account
account_portfolio_df = utils.get_account_portfolio_data(td_ameritrade.parse_portfolios_list(), masked_account_number)
equity_investments_df = utils.get_investments_by_type(account_portfolio_df, investment_type='EQUITY')

# Filter out non-equity investments (e.g. ETFs) using the asset type reported by the quotes
equity_symbols = utils.get_investment_symbols(equity_investments_df)
quotes_df = amc.AmeritradeRest(username, password, client_id).get_quotes(equity_symbols)
current_stocks = quotes_df.query('assetType == "EQUITY"').index.tolist()
current_investments_df = equity_investments_df[equity_investments_df['symbol'].isin(current_stocks)]
current_investments_df
```

|   | account | shortQuantity | averagePrice | currentDayProfitLoss | currentDayProfitLossPercentage | longQuantity | settledLongQuantity | settledShortQuantity | marketValue | maintenanceRequirement | currentDayCost | previousSessionLongQuantity | assetType | cusip | symbol | description | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | #---9216 | 0.0 | 0.21531 | -35.775002 | -4.14 | 4500.0 | 4500.0 | 0.0 | 828.23 | 0.0 | 0.0 | 4500.0 | EQUITY | Q3860H107 | FGPHF | 0 | 0 |


## Remove other assets

There may be some stocks that you are speculating on and do not want to include in the analysis. As a conservative investor, I keep a percentage of my active portfolio (outside the portfolio planner) for personal speculation on long-term plays. These stocks will not be part of the portfolio optimization.

```python
speculative_stocks = ['FGPHF']
final_investments_df = current_investments_df[~current_investments_df['symbol'].isin(speculative_stocks)]
final_existing_stocks = utils.get_investment_symbols(final_investments_df)
final_existing_stocks
```

```
[]
```

```python
symbols_of_interest = []
# Hardcoded for now
symbols_via_specific_criteria = []
```

```python
snp_500_df = utils.get_snp500()
display(snp_500_df.head())
snp_500_symbols = snp_500_df.index.to_list()

# Request quotes in batches of 100 symbols; iterate over the full list length,
# since the S&P 500 list has slightly more than 500 tickers
quote_dfs = []
for i in range(0, len(snp_500_symbols), 100):
    quote_dfs.append(amc.AmeritradeRest(username, password, client_id).get_quotes(snp_500_symbols[i:i+100]))

snp_500_quotes_df = pd.concat(quote_dfs, axis=0)
snp_500_tickers = snp_500_quotes_df.index.to_list()
```

| Symbol | Security | SEC filings | GICS Sector | GICS Sub-Industry | Headquarters Location | Date first added | CIK | Founded |
|---|---|---|---|---|---|---|---|---|
| MMM | 3M | reports | Industrials | Industrial Conglomerates | Saint Paul, Minnesota | 1976-08-09 | 66740 | 1902 |
| ABT | Abbott Laboratories | reports | Health Care | Health Care Equipment | North Chicago , Illinois | 1964-03-31 | 1800 | 1888 |
| ABBV | AbbVie | reports | Health Care | Pharmaceuticals | North Chicago , Illinois | 2012-12-31 | 1551152 | 2013 (1888) |
| ABMD | Abiomed | reports | Health Care | Health Care Equipment | Danvers, Massachusetts | 2018-05-31 | 815094 | 1981 |
| ACN | Accenture | reports | Information Technology | IT Consulting & Other Services | Dublin, Ireland | 2011-07-06 | 1467373 | 1989 |
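Because some S&P 500 companies list more than one share class, the index has slightly more than 500 tickers, so batching should follow the list's length rather than a hard-coded 500. A generic helper for that, as a sketch (`chunked` is a hypothetical name, and the batch size of 100 simply mirrors the quote cell above):

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError('size must be positive')
    for start in range(0, len(items), size):
        yield items[start:start + size]


symbols = [f'T{i}' for i in range(505)]  # stand-in for a 505-ticker list
batches = list(chunked(symbols, 100))
print(len(batches), len(batches[-1]))    # → 6 5
```

With it, the quote loop could iterate `for batch in chunked(snp_500_symbols, 100):` and pass each `batch` to `get_quotes`, with no index arithmetic to get wrong.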


## Get sentiment data from Finviz


```python
parsed_and_scored_news = utils.get_finvis_stock_sentiment(snp_500_symbols).sort_values(by='date')
parsed_and_scored_news
```



|   | ticker | date | time | headline | neg | neu | pos | compound |
|---|---|---|---|---|---|---|---|---|
| 28998 | L | 2019-10-10 | 06:27PM | City of Memphis signs development agreement wi... | 0.0 | 0.758 | 0.242 | 0.4939 |
| 28997 | L | 2019-10-11 | 11:34AM | Is Loews Corporation (L) Going To Burn These H... | 0.0 | 1.000 | 0.000 | 0.0000 |
| 28996 | L | 2019-10-11 | 11:36AM | 10 things to know about the $210M Loews hotel ... | 0.0 | 1.000 | 0.000 | 0.0000 |
| 28995 | L | 2019-10-15 | 11:36AM | Bleak Near-Term Outlook for Multiline Insuranc... | 0.0 | 1.000 | 0.000 | 0.0000 |
| 28994 | L | 2019-10-24 | 11:27AM | Loews (L) Gears Up to Report Q3 Earnings: What... | 0.0 | 1.000 | 0.000 | 0.0000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 30500 | MCK | 2021-07-22 | 08:00AM | McKesson Corporation Announces Early Tender Re... | 0.0 | 0.839 | 0.161 | 0.3182 |
| 30599 | MDT | 2021-07-22 | 02:16PM | From Inspiring to Imperative: Celebrating Wome... | 0.0 | 0.480 | 0.520 | 0.7579 |
| 30600 | MDT | 2021-07-22 | 08:07AM | Medtronic's (MDT) Hugo RAS System Now Availabl... | 0.0 | 1.000 | 0.000 | 0.0000 |
| 30099 | MA | 2021-07-22 | 03:11PM | 10 Stocks with Over 10 Years of Dividend Incre... | 0.0 | 1.000 | 0.000 | 0.0000 |


```python
# Average the compound sentiment score per date and ticker,
# giving one row per date and one column per ticker
mean_scores = (parsed_and_scored_news
               .groupby(['date', 'ticker'])['compound'].mean()
               .unstack('ticker')
               .fillna(0))
# Cumulative sum of the scores over the last 40 dates (rows)
cum_scores = mean_scores.iloc[-40:].cumsum(axis=0)
current_scores = cum_scores.iloc[-1]
mean_score = current_scores.mean()
stdv_score = current_scores.std()
# Keep stocks scoring above one standard deviation below the mean
cutoff = mean_score - stdv_score

print(mean_score, stdv_score, cutoff)
```

```
2.5217170991360103 1.8131439939272571 0.7085731052087532
```


```python
stock_universe = current_scores[current_scores > cutoff].index.to_list()
```
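The scoring and selection steps above can be condensed into one function and run on synthetic headlines to see how the cutoff behaves. This sketch assumes the input has `ticker`, `date`, and `compound` columns like the news table above (the `neg`/`neu`/`pos`/`compound` columns match the output of NLTK's VADER analyzer, though the notebook does not say which scorer `utils` uses); `select_by_sentiment`, `window`, and `num_std` are hypothetical names. Note that the window counts the last 40 *dates that have any news*, not 40 calendar days.

```python
import pandas as pd

# Synthetic headlines: one row per headline with a compound score in [-1, 1]
news = pd.DataFrame({
    'ticker':   ['AAA', 'AAA', 'BBB', 'BBB', 'CCC', 'CCC'],
    'date':     ['2021-07-20', '2021-07-21'] * 3,
    'compound': [0.6, 0.4, 0.1, -0.2, 0.3, 0.3],
})


def select_by_sentiment(news, window=40, num_std=1.0):
    # Daily mean compound score: rows are dates, columns are tickers
    daily = (news.groupby(['date', 'ticker'])['compound'].mean()
                 .unstack('ticker').fillna(0))
    # Total sentiment over the most recent `window` dates with any news
    scores = daily.iloc[-window:].sum()
    # Keep tickers above (mean - num_std * sample standard deviation)
    cutoff = scores.mean() - num_std * scores.std()
    return scores[scores > cutoff].index.tolist(), cutoff


selected, cutoff = select_by_sentiment(news)
print(selected)  # → ['AAA', 'CCC']
```

With these numbers the scores are 1.0, -0.1, and 0.6, the cutoff is about -0.06, and only `BBB` falls below it.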

# Price History data

Once you have a set of investments you want to work with, you will need to pull some historical data for them.

We will obtain 5 years of price histories. Because some of the factors are based on 1-year returns, the first year is consumed as a lookback window, which leaves about 4 years of factor data.
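The lookback arithmetic can be checked on synthetic prices: with roughly 252 trading days per year, a 1-year return needs a price from 252 rows earlier, so 5 years of history leave 4 years of return values. The 252-day year is a common approximation rather than an exchange calendar, and the price series is made up.

```python
import numpy as np
import pandas as pd

TRADING_DAYS_PER_YEAR = 252  # common approximation, not an exchange calendar

# Synthetic 5 years of business-day closing prices for one ticker
dates = pd.bdate_range('2016-07-22', periods=5 * TRADING_DAYS_PER_YEAR)
close = pd.Series(np.linspace(100.0, 150.0, len(dates)), index=dates)

# A 1-year return needs the price from 252 trading days earlier,
# so the first year of history produces no values
one_year_return = close / close.shift(TRADING_DAYS_PER_YEAR) - 1
usable = one_year_return.dropna()
print(len(close), len(usable))              # → 1260 1008
print(len(usable) / TRADING_DAYS_PER_YEAR)  # → 4.0
```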

```python
number_of_years = 5
today = datetime.today().strftime('%Y-%m-%d')
price_histories = amc.AmeritradeRest(username, password, client_id).get_price_histories(
    stock_universe, today, num_periods=number_of_years)
utils.save_price_histories(price_histories, price_histories_file_name)
```


```python
price_histories.head()
```

|   | open | high | low | close | volume | ticker | date |
|---|---|---|---|---|---|---|---|
| 0 | 46.72 | 46.79 | 46.42 | 46.58 | 1512440 | A | 2016-07-22 |
| 156505 | 27.43 | 27.68 | 27.43 | 27.54 | 3491991 | DRE | 2016-07-22 |
| 71950 | 35.425 | 35.535 | 35.3 | 35.38 | 2100538 | BLL | 2016-07-22 |
| 430602 | 57.6 | 58.24 | 57.2 | 57.9 | 23899275 | SBUX | 2016-07-22 |
| 101246 | 134.69 | 135.28 | 134.5 | 135.18 | 640664 | CLX | 2016-07-22 |


```python
price_histories = utils.read_price_histories(price_histories_file_name)
close = utils.get_close_values(price_histories)
close.tail()
```

| date | A | AAP | ABBV | ABMD | ABT | ACN | ADBE | ADI | ADM | ADP | ... | WY | WYNN | XEL | XLNX | XOM | XYL | ZBH | ZBRA | ZION | ZTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-07-16 00:00:00+00:00 | 148.93 | 205.85 | 117.5 | 317.98 | 117.51 | 311.91 | 606.1 | 160.44 | 58.21 | 205.6 | ... | 34.02 | 104.7 | 69.13 | 128.63 | 57.32 | 119.36 | 153.4 | 518.33 | 49.6 | 199.68 |
| 2021-07-19 00:00:00+00:00 | 147.58 | 205.67 | 115.42 | 314.62 | 118.79 | 309.18 | 602.05 | 158.98 | 57.45 | 201.68 | ... | 32.95 | 104.74 | 68.27 | 129.33 | 55.35 | 117.23 | 150.17 | 506.91 | 47.7 | 197.83 |
| 2021-07-20 00:00:00+00:00 | 149.3 | 210.23 | 116.05 | 323.59 | 118.73 | 312.06 | 608.72 | 160.91 | 58.03 | 203.14 | ... | 33.65 | 106.57 | 68.16 | 131.1 | 55.96 | 121.53 | 155.63 | 519.67 | 50.2 | 200.11 |
| 2021-07-21 00:00:00+00:00 | 148.95 | 209.4 | 116.99 | 322.85 | 118.98 | 313.77 | 612.26 | 164.84 | 58.1 | 204.47 | ... | 34.08 | 109.59 | 67.06 | 135.33 | 57.76 | 122.43 | 159.57 | 533.77 | 51.64 | 198.04 |
| 2021-07-22 00:00:00+00:00 | 150.33 | 209.6 | 117.39 | 322.22 | 118.3 | 314.2128 | 623.11 | 161.6 | 57.49 | 203.04 | ... | 34.12 | 107.54 | 67.0 | 136.07 | 57.24 | 122.065 | 158.25 | 535.54 | 50.32 | 200.48 |
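`utils.get_close_values` is not shown here, but the layout of its output (dates as rows, one column per ticker) is what a pandas pivot of the long price table produces. A sketch of that reshaping, assuming the long format shown by `price_histories.head()`; the prices are synthetic, and the real helper may do more (for example date parsing or time-zone handling).

```python
import pandas as pd

# Long format: one row per (ticker, date), as in the price-history download
price_histories = pd.DataFrame({
    'ticker': ['A', 'B', 'A', 'B'],
    'date':   pd.to_datetime(['2021-07-21', '2021-07-21', '2021-07-22', '2021-07-22']),
    'close':  [10.0, 20.0, 11.0, 21.0],
})

# Wide format: one row per date, one column per ticker
close = price_histories.pivot(index='date', columns='ticker', values='close')
print(close)
```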
