# Stage 1: Generate Stock Universe

- Gather stocks of interest
- Gather stocks from specific criteria (SP500 top 50...)
- Gather stocks from specific portfolio account
- Assemble stock universe 
- Use stock sentiment to select stocks
- Gather price histories

In [1]:
from platform import python_version
import time
from datetime import datetime
import os
import pandas as pd
import numpy as np
import math
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt

%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (20, 8)

# Set the import path for the tools directory
import sys
# insert at position 1 in the path, as 0 is the path of this file.
sys.path.insert(1, '../tools')
import importlib
import ameritrade_functions as amc
importlib.reload(amc)
import utils
importlib.reload(utils)

print(f'Python version: {python_version()}')
print(f'Pandas version: {pd.__version__}')

Python version: 3.8.10
Pandas version: 0.25.3


## Configure Ameritrade Information

Ameritrade credentials are stored in environment variables to avoid keeping unencrypted passwords on disk.

The module automatically masks the account numbers to protect the actual accounts. An Ameritrade user can have many investment accounts. We will be working with only one for this demonstration.
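
As an illustration of the masking described above, here is a minimal sketch. The helper name `mask_account_number` is hypothetical and is not necessarily how `ameritrade_functions` does it; the `#---` prefix followed by the last four digits only mirrors the masked format that appears later in this notebook.

```python
def mask_account_number(account_number: str, visible: int = 4) -> str:
    """Hypothetical helper: hide all but the last `visible` characters."""
    account_number = str(account_number)
    return '#---' + account_number[-visible:]


# Made-up account number, for illustration only
print(mask_account_number('123459216'))  # prints #---9216
```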

## Authentication Tokens

To get data from Ameritrade, you need to obtain a short-lived token. (There is also a reusable token, but I have not coded support for it yet.) You only need to do this if you are going to use an existing Ameritrade account to define an initial set of stocks to analyze.

To obtain a token, you will need to have a Chrome driver located somewhere on your system. This will allow the module to use your credentials to obtain an authentication token.

For security reasons, I suggest using environment variables to store your credential information. If you store credentials in property files, or code them directly into your notebook, you risk sharing them with others when you use GitHub or some other source control system. Environment variables also make the credentials easier to reuse from project to project in your development environment.

<span style="color:blue">Note: *Account numbers are masked for security purposes.*</span>
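
One way to make a missing credential fail early, rather than surface later as a confusing authentication error, is sketched below. The helper `require_env` and the demo variable name are hypothetical and not part of this project's tools.

```python
import os


def require_env(name: str) -> str:
    """Hypothetical helper: return an environment variable or fail loudly."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f'Environment variable {name!r} is not set')
    return value


# Demo only: define a throwaway variable so the example runs anywhere
os.environ.setdefault('demo_maiotrade_var', 'example-value')
print(require_env('demo_maiotrade_var'))
```

With this in place, a call such as `require_env('maiotradeuser')` could stand in for the plain `os.getenv` lookup used in the next cell.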

In [2]:
username = os.getenv('maiotradeuser')
password = os.getenv('maiotradepw')
client_id = os.getenv('maiotradeclientid')

# For Chromedriver
from pathlib import Path
chrome_executabel_path = str(Path.home()) + r'\Anaconda Projects\chromedriver\chromedriver'

# Make sure we have a data directory
Path('./data').mkdir(parents=True, exist_ok=True) 

# Which account are we interested in
masked_account_number = '#---9216'
account_portfolios_file_name = 'data/portfolio_data.csv'
portfolio_file_name = 'data/portfolio_' + masked_account_number[-4:] + '.csv'
price_histories_file_name = 'data/price_histories.csv'

## Stock Universe

Here we set up the universe. This needs some work. The long-term goal is to use a pipeline process to help select stocks that are in the top 500 or something similar.

For now, the universe can draw on stocks from the portfolio, stocks of interest (those with many news items), and a list of well-known stocks. That list has also been augmented with some stocks that made Ameritrade's top 10 movers for a couple of days. The Ameritrade function for this has not been coded yet, but it should be added down the line to automate pulling these tickers.

In [3]:
snp_500_df = utils.get_snp500()
snp_500_symbols = snp_500_df.index.to_list()
quote_dfs = []
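# Request quotes in batches of 100 symbols; step over the full list so the final partial batch is included
for i in range(0, len(snp_500_symbols), 100):
    quote_dfs.append(amc.AmeritradeRest(username, password, client_id).get_quotes(snp_500_symbols[i:i+100]))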
    
snp_500_quotes_df = pd.concat(quote_dfs, axis=0)
snp_500_quotes_df.describe()

snp_500_tickers = set(snp_500_df.index.to_list())
print(len(snp_500_tickers))

505
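
The batching used for the quote requests can be sketched on its own. The batch size of 100 comes from the cell above; whether the Ameritrade API actually enforces that limit is an assumption. `chunked` is a hypothetical helper, and the symbols are placeholders.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


symbols = [f'SYM{i}' for i in range(505)]  # placeholder tickers
batches = list(chunked(symbols, 100))
print([len(batch) for batch in batches])  # prints [100, 100, 100, 100, 100, 5]
```

Iterating up to `len(items)` rather than a hard-coded count ensures the final partial batch (5 symbols when there are 505) is included.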


In [4]:
dow_tikers = set(utils.get_dow().index.to_list())
print(len(dow_tikers))

30


## Get sentiment data from Finviz

In [5]:
stock_universe = set.union(snp_500_tickers, dow_tikers)
print(len(stock_universe))

505
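
The union has the same size as the S&P 500 set alone (505), which means every Dow ticker is already an S&P 500 member. A small sketch with toy sets (the tickers below are stand-ins, not the real index lists) shows how to check this directly:

```python
# Toy stand-ins for the real ticker sets
snp_like = {'AAPL', 'MSFT', 'JPM', 'XOM', 'PPG'}
dow_like = {'AAPL', 'MSFT', 'JPM'}

universe = snp_like | dow_like   # same result as set.union(snp_like, dow_like)
print(len(universe))             # prints 5, the size of the larger set
print(dow_like <= snp_like)      # prints True: dow_like is a subset
print(dow_like - snp_like)       # prints set(): nothing new was added
```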


In [6]:
parsed_and_scored_news = utils.get_finvis_stock_sentiment(snp_500_symbols).sort_values(by='date')
parsed_and_scored_news

Tickers: 100%|██████████████████████████████████████████████████████████| 505/505 [01:46<00:00,  4.75Finvis Postings/s]
News Tables: 100%|████████████████████████████████████████████████████| 503/503 [00:02<00:00, 216.56News Table Items/s]


| | ticker | date | time | headline | neg | neu | pos | compound |
|---|---|---|---|---|---|---|---|---|
| 29000 | L | 2019-10-28 | 06:09PM | US Indexes Close Higher Monday | 0.000 | 1.000 | 0.000 | 0.0000 |
| 29005 | L | 2019-10-28 | 06:00AM | Loews Corporation Reports Net Income Of $72 Mi... | 0.000 | 1.000 | 0.000 | 0.0000 |
| 29004 | L | 2019-10-28 | 06:11AM | Loews reports slump in third-quarter profit as... | 0.000 | 0.775 | 0.225 | 0.4404 |
| 29001 | L | 2019-10-28 | 03:26PM | Global Markets Advance Monday | 0.000 | 1.000 | 0.000 | 0.0000 |
| 29002 | L | 2019-10-28 | 12:19PM | Loews (L) Q3 Earnings Miss Estimates, Revenues... | 0.167 | 0.833 | 0.000 | -0.1531 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 36816 | PSX | 2021-09-02 | 11:31AM | Phillips 66 Partners LP (PSXP) Up 4.2% Since L... | 0.000 | 1.000 | 0.000 | 0.0000 |
| 36817 | PSX | 2021-09-02 | 11:31AM | Phillips 66 (PSX) Down 3.4% Since Last Earning... | 0.000 | 1.000 | 0.000 | 0.0000 |
| 36818 | PSX | 2021-09-02 | 11:00AM | Louisiana Refineries Could Take Weeks To Resta... | 0.000 | 1.000 | 0.000 | 0.0000 |
| 37316 | PPG | 2021-09-02 | 08:55AM | PPG Industries (PPG) to Display EV Battery Coa... | 0.000 | 1.000 | 0.000 | 0.0000 |


In [7]:
# Group by date and ticker columns from scored_news and calculate the mean
mean_scores = parsed_and_scored_news.groupby(['ticker','date']).mean().fillna(0)
# Unstack the column ticker
mean_scores = mean_scores.unstack()
# Get the cross-section of compound in the 'columns' axis
mean_scores = mean_scores.xs('compound', axis="columns").transpose().fillna(0)
# Get the cumulative sum of each stock's scores over the last 40 dates
cum_scores = mean_scores[-40:].cumsum(axis=0)
current_scores = cum_scores.iloc[-1]
mean_score = current_scores.mean()
stdv_score = current_scores.std()
cutoff = mean_score - stdv_score

print(mean_score, stdv_score, cutoff)

2.2405789601068227 1.6650223285575183 0.5755566315493044


In [8]:
stock_universe = current_scores.where(current_scores > cutoff).dropna().index.to_list()
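
The selection rule above keeps tickers whose cumulative sentiment score exceeds the mean minus one standard deviation. The same logic is sketched below in plain Python so it can be followed without pandas; the tickers and scores are made up for illustration.

```python
from statistics import mean, stdev

# Hypothetical cumulative sentiment scores per ticker
current_scores = {'AAA': 4.0, 'BBB': 2.5, 'CCC': 2.0, 'DDD': 0.3, 'EEE': 3.2}

values = list(current_scores.values())
mean_score = mean(values)
stdv_score = stdev(values)        # sample std (n - 1), like pandas .std()
cutoff = mean_score - stdv_score

# Keep only tickers scoring above the cutoff
selected = [ticker for ticker, score in current_scores.items() if score > cutoff]
print(selected)  # prints ['AAA', 'BBB', 'CCC', 'EEE']
```

Only the ticker sitting more than one standard deviation below the average is dropped, so the filter removes the most negatively viewed names while keeping the bulk of the universe.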

## Price History Data

Once you have a set of investments you want to work with, you will need to pull some historical data for them.

We will obtain 5 years of price histories. In the end, this will provide us with 2 years of factor data, since some of the factors are based on 1-year returns.
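
A return computed over a look-back window needs that much history before it yields its first value, which is why the usable factor history is shorter than the price history. The sketch below shows this with a hypothetical `trailing_returns` helper and made-up prices; with daily data, a 1-year window would be roughly 252 trading days (an approximation).

```python
def trailing_returns(closes, window):
    """Return close[t] / close[t - window] - 1, or None where history is too short."""
    return [
        closes[t] / closes[t - window] - 1 if t >= window else None
        for t in range(len(closes))
    ]


closes = [100.0, 102.0, 101.0, 105.0, 110.0]  # made-up prices
rets = trailing_returns(closes, window=2)
print(rets[:2])  # prints [None, None]: the first `window` rows have no return
```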

In [9]:
number_of_years = 5
price_histories = amc.AmeritradeRest(username, password, client_id).get_price_histories(stock_universe, datetime.today().strftime('%Y-%m-%d'), num_periods=number_of_years)
utils.save_price_histories(price_histories, price_histories_file_name)

Tickers:   0%|          | 0/436 [00:00<?, ?Price Histories/s]

In [10]:
price_histories.head()

| | open | high | low | close | volume | ticker | date |
|---|---|---|---|---|---|---|---|
| 0 | 47.23 | 47.39 | 46.89 | 47.0 | 1424454 | A | 2016-09-02 |
| 224757 | 24.0 | 24.35 | 23.6 | 23.92 | 7254945 | GPS | 2016-09-02 |
| 223498 | 104.04 | 104.04 | 103.06 | 103.41 | 454770 | GPC | 2016-09-02 |
| 222239 | 795.27 | 797.1 | 793.26 | 796.87 | 1349896 | GOOGL | 2016-09-02 |
| 363074 | 15.82 | 15.8375 | 15.455 | 15.6325 | 32314968 | NVDA | 2016-09-02 |


In [11]:
price_histories = utils.read_price_histories(price_histories_file_name)
close = utils.get_close_values(price_histories)
close.tail()

| date | A | AAL | AAP | ABBV | ABT | ACN | ADBE | ADI | ADM | ADP | ... | WYNN | XEL | XLNX | XOM | XYL | YUM | ZBH | ZBRA | ZION | ZTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-08-27 00:00:00+00:00 | 173.81 | 20.22 | 203.82 | 119.58 | 124.59 | 335.66 | 658.52 | 166.48 | 60.57 | 208.96 | ... | 101.05 | 68.7 | 156.27 | 55.77 | 134.94 | 131.7 | 150.07 | 587.96 | 58.55 | 203.82 |
| 2021-08-30 00:00:00+00:00 | 176.73 | 19.51 | 201.14 | 120.57 | 125.94 | 337.0 | 665.99 | 166.98 | 60.07 | 209.07 | ... | 99.89 | 69.03 | 155.07 | 55.16 | 136.48 | 130.69 | 149.78 | 584.78 | 57.2 | 205.89 |
| 2021-08-31 00:00:00+00:00 | 175.47 | 19.94 | 202.85 | 120.78 | 126.37 | 336.56 | 663.7 | 162.95 | 60.0 | 209.04 | ... | 101.69 | 68.75 | 155.59 | 54.52 | 136.31 | 131.03 | 150.45 | 587.17 | 57.9 | 204.56 |
| 2021-09-01 00:00:00+00:00 | 175.45 | 19.64 | 201.88 | 112.27 | 127.4 | 337.9 | 665.89 | 161.11 | 59.66 | 206.86 | ... | 100.98 | 69.75 | 154.47 | 53.77 | 136.55 | 131.55 | 150.87 | 582.78 | 57.17 | 206.44 |
| 2021-09-02 00:00:00+00:00 | 177.66 | 19.755 | 201.14 | 111.36 | 127.95 | 340.31 | 663.54 | 162.19 | 59.93 | 206.27 | ... | 102.1 | 70.07 | 153.6 | 54.985 | 137.3 | 131.16 | 148.99 | 587.92 | 57.07 | 208.55 |
