# Stage 1: Generate Stock Universe

- Gather stocks of interest
- Gather stocks from specific criteria (SP500 top 50...)
- Gather stocks from specific portfolio account
- Assemble stock universe 
- Use stock sentiment to select stocks
- Gather price histories

In [1]:
from platform import python_version
import time
from datetime import datetime
import os
import pandas as pd
import numpy as np
import math
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt

%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (20, 8)

# Set the import path for the tools directory
import sys
# insert at position 1 in the path, as 0 is the path of this file.
sys.path.insert(1, '../tools')
import importlib
import ameritrade_functions as amc
importlib.reload(amc)
import utils
importlib.reload(utils)

print(f'Python version: {python_version()}')
print(f'Pandas version: {pd.__version__}')

Python version: 3.8.10
Pandas version: 0.25.3


## Configure Ameritrade Information

Ameritrade credentials are stored in environment variables so that unencrypted passwords are not saved in the notebook or project files.

The module automatically masks the account numbers to protect the actual accounts. An Ameritrade user can have many investment accounts. We will be working with only one for this demonstration.
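As an illustration of the masking idea, here is a minimal sketch. The `mask_account_number` helper is hypothetical and is not the module's actual code; the `#---` prefix and four visible digits are inferred from the masked value `'#---9216'` used later in this notebook, and the account number in the example is made up.

```python
def mask_account_number(account_number, visible=4):
    """Hypothetical helper: keep only the last `visible` characters.

    The '#---' prefix mirrors the masked value '#---9216' used in this
    notebook; the real module may mask accounts differently.
    """
    text = str(account_number)
    return '#---' + text[-visible:]


# Made-up account number, for illustration only
print(mask_account_number('123456789216'))  # prints #---9216
```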

## Authentication Tokens

To get data from Ameritrade you need to obtain a short-lived token. (There is also a re-use token, but I have not coded support for it yet.) You only need to do this if you are going to use an existing Ameritrade account to define an initial set of stocks to analyze.

To obtain a token, you will need to have a Chrome driver located somewhere on your system. This will allow the module to use your credentials to obtain an authentication token.

For security reasons, I suggest using environment variables to store your credential information. If you store them in property files, or just code them into your notebook, you risk sharing the information with others if you use GitHub or some other source control system. Environment variables also make it easier to have the credentials available from project to project in your development environment.
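A minimal sketch of reading the credentials with an early, explicit failure when one is missing. The variable names `maiotradeuser`, `maiotradepw`, and `maiotradeclientid` come from this notebook; the `load_credentials` helper and `REQUIRED_VARS` tuple are hypothetical names introduced for illustration.

```python
import os

# Environment variable names used by this notebook
REQUIRED_VARS = ('maiotradeuser', 'maiotradepw', 'maiotradeclientid')


def load_credentials(environ=os.environ):
    """Hypothetical helper: return (username, password, client_id).

    Raises RuntimeError naming every variable that is unset or empty,
    so a missing credential fails here rather than during a request.
    """
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return tuple(environ[name] for name in REQUIRED_VARS)


# Demonstration with a stand-in mapping instead of the real environment
demo_env = {'maiotradeuser': 'demo-user', 'maiotradepw': 'demo-pw', 'maiotradeclientid': 'demo-id'}
username, password, client_id = load_credentials(demo_env)
```

In the notebook itself the call would simply be `load_credentials()`, which reads from `os.environ`.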

<span style="color:blue">Note: *Account numbers are masked for security purposes.*</span>

In [2]:
username = os.getenv('maiotradeuser')
password = os.getenv('maiotradepw')
client_id = os.getenv('maiotradeclientid')

# For Chromedriver
from pathlib import Path
chrome_executabel_path = str(Path.home()) + r'\Anaconda Projects\chromedriver\chromedriver'

# Make sure we have a data directory
Path('./data').mkdir(parents=True, exist_ok=True) 

# Which account are we interested in
masked_account_number = '#---9216'
account_portfolios_file_name = 'data/portfolio_data.csv'
portfolio_file_name = 'data/portfolio_' + masked_account_number[-4:] + '.csv'
price_histories_file_name = 'data/price_histories.csv'

## Stock Universe

Here we set up the universe. This needs some work. The long-term goal is to use a pipeline process to help select stocks that are in the top 500 or something similar.

For now we will use stocks from the portfolio, plus stocks of interest (high news items) and a list of well-known stocks. This list has also been augmented with some stocks that made Ameritrade's top 10 movers for a couple of days. The Ameritrade function for this has not been coded yet, but should be added down the line to automate pulling these tickers.

In [3]:
snp_500_df = utils.get_snp500()
display(snp_500_df.head())
snp_500_symbols = snp_500_df.index.to_list()
quote_dfs = []
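# Request quotes in batches of 100; step over the whole list so symbols past the 500th are not dropped
for i in range(0, len(snp_500_symbols), 100):
    quote_dfs.append(amc.AmeritradeRest(username, password, client_id).get_quotes(snp_500_symbols[i:i+100]))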
    
snp_500_quotes_df = pd.concat(quote_dfs, axis=0)
snp_500_quotes_df.describe()

snp_500_tickers = snp_500_quotes_df.index.to_list()

| Symbol | Security | SEC filings | GICS Sector | GICS Sub-Industry | Headquarters Location | Date first added | CIK | Founded |
|---|---|---|---|---|---|---|---|---|
| MMM | 3M | reports | Industrials | Industrial Conglomerates | Saint Paul, Minnesota | 1976-08-09 | 66740 | 1902 |
| ABT | Abbott Laboratories | reports | Health Care | Health Care Equipment | North Chicago, Illinois | 1964-03-31 | 1800 | 1888 |
| ABBV | AbbVie | reports | Health Care | Pharmaceuticals | North Chicago, Illinois | 2012-12-31 | 1551152 | 2013 (1888) |
| ABMD | Abiomed | reports | Health Care | Health Care Equipment | Danvers, Massachusetts | 2018-05-31 | 815094 | 1981 |
| ACN | Accenture | reports | Information Technology | IT Consulting & Other Services | Dublin, Ireland | 2011-07-06 | 1467373 | 1989 |
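The quote request above is made in batches of 100 symbols. A small helper that derives the batch boundaries from the length of the list, so that a list of any size is fully covered, could look like the sketch below. The `chunked` helper is a hypothetical name, and the symbols are placeholders rather than real tickers.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder symbols standing in for the S&P 500 list (505 entries here)
symbols = [f'SYM{i}' for i in range(505)]
batches = list(chunked(symbols, 100))

print(len(batches), len(batches[-1]))  # prints 6 5
```

Each batch could then be passed to `get_quotes` in turn, with the results concatenated as in the cell above.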


## Get sentiment data from Finviz

In [4]:
snp_500_tickers.index

<function list.index(value, start=0, stop=9223372036854775807, /)>

In [5]:
parsed_and_scored_news = utils.get_finvis_stock_sentiment(snp_500_symbols).sort_values(by='date')
parsed_and_scored_news

Tickers: 100%|██████████████████████████████████████████████████████████| 505/505 [01:58<00:00,  4.27Finvis Postings/s]
News Tables: 100%|████████████████████████████████████████████████████| 503/503 [00:02<00:00, 214.20News Table Items/s]


| | ticker | date | time | headline | neg | neu | pos | compound |
|---|---|---|---|---|---|---|---|---|
| 28999 | L | 2019-10-11 | 11:36AM | 10 things to know about the $210M Loews hotel ... | 0.000 | 1.000 | 0.000 | 0.0000 |
| 28998 | L | 2019-10-15 | 11:36AM | Bleak Near-Term Outlook for Multiline Insuranc... | 0.000 | 1.000 | 0.000 | 0.0000 |
| 28997 | L | 2019-10-24 | 11:27AM | Loews (L) Gears Up to Report Q3 Earnings: What... | 0.000 | 1.000 | 0.000 | 0.0000 |
| 28996 | L | 2019-10-28 | 06:00AM | Diamond Offshore Announces Third Quarter 2019 ... | 0.000 | 0.714 | 0.286 | 0.3400 |
| 28995 | L | 2019-10-28 | 06:00AM | Loews Corporation Reports Net Income Of $72 Mi... | 0.000 | 1.000 | 0.000 | 0.0000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 40505 | SPGI | 2021-08-02 | 07:30AM | S&P Global and IHS Markit Announce Agreement t... | 0.000 | 0.789 | 0.211 | 0.4939 |
| 40504 | SPGI | 2021-08-02 | 08:00AM | August Is Actually A Great Month If You Own Th... | 0.000 | 0.687 | 0.313 | 0.6249 |
| 23300 | HPQ | 2021-08-02 | 12:00AM | Tale of Fake Hewlett-Packard Gear Spurs Arrest... | 0.514 | 0.486 | 0.000 | -0.7506 |
| 9903 | CVX | 2021-08-02 | 06:00AM | Joe Geagea, Chevron Executive Vice President, ... | 0.000 | 1.000 | 0.000 | 0.0000 |


In [6]:
# Group the scored news by ticker and date and take the mean of each score
mean_scores = parsed_and_scored_news.groupby(['ticker','date']).mean().fillna(0)
# Unstack the innermost index level (date) into the columns
mean_scores = mean_scores.unstack()
# Take the 'compound' cross-section of the columns, then transpose so rows are dates and columns are tickers
mean_scores = mean_scores.xs('compound', axis="columns").transpose().fillna(0)
# Cumulative sum of each stock's daily score over the last 40 dates
cum_scores = mean_scores[-40:].cumsum(axis=0)
# The last row holds each stock's current cumulative score
current_scores = cum_scores.iloc[-1]
mean_score = current_scores.mean()
stdv_score = current_scores.std()
# Keep stocks scoring above one standard deviation below the mean
cutoff = mean_score - stdv_score

print(mean_score, stdv_score, cutoff)

2.4265159497008453 1.7193471898333408 0.7071687598675045


In [7]:
stock_universe = current_scores.where(current_scores > cutoff).dropna().index.to_list()
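To make the selection rule concrete, here is a condensed sketch of the same steps on a tiny, made-up set of scores. The tickers `AAA`, `BBB`, `CCC` and all values are invented for illustration; only the `compound` column is carried through, which is the column the cells above end up using.

```python
import pandas as pd

# Made-up headline scores: ticker, date, and compound sentiment
news = pd.DataFrame({
    'ticker':   ['AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'CCC'],
    'date':     ['2021-07-29', '2021-07-29', '2021-07-30',
                 '2021-07-29', '2021-07-30', '2021-07-30'],
    'compound': [0.5, 0.1, 0.4, -0.6, -0.2, 0.3],
})

# Mean compound score per ticker and date, reshaped to dates x tickers;
# days without headlines count as a neutral 0
daily = (news.groupby(['ticker', 'date'])['compound']
             .mean()
             .unstack('ticker')
             .fillna(0))

# Cumulative score over the last 40 dates; the last row is the current score
current = daily.tail(40).cumsum().iloc[-1]

# Keep tickers scoring above (mean - one standard deviation)
cutoff = current.mean() - current.std()
universe = current[current > cutoff].index.to_list()

print(current.round(2).to_dict())
print(round(cutoff, 2), universe)
```

With these numbers the cumulative scores are 0.7, -0.8, and 0.3, the cutoff is about -0.71, and only `BBB` falls below it.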

## Price History Data

Once you have a set of investments you want to work with, you will need to pull some historical data for them.

We will obtain 5 years of price histories. In the end this will provide us with 2 years of factor data since some of the factors are based on 1-year returns.
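The sketch below shows why return-based factors consume part of the history: a trailing one-year return is undefined until a full year of prices is available. The price series is synthetic, and 252 trading days per year is an assumed convention, not a value taken from this project.

```python
import numpy as np
import pandas as pd

# Synthetic daily closes covering roughly 5 years of business days
dates = pd.bdate_range('2016-08-02', periods=5 * 252)
close = pd.Series(np.linspace(100.0, 200.0, len(dates)), index=dates, name='close')

# Trailing one-year return, assuming 252 trading days per year
lookback = 252
one_year_return = close.pct_change(lookback)

# The first `lookback` rows have no value because no price exists a year earlier
usable = one_year_return.dropna()
print(len(close), len(usable))  # prints 1260 1008
```

Any factor built on top of this series, for example by smoothing or ranking over a further window, shortens the usable history again.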

In [8]:
number_of_years = 5
price_histories = amc.AmeritradeRest(username, password, client_id).get_price_histories(stock_universe, datetime.today().strftime('%Y-%m-%d'), num_periods=number_of_years)
utils.save_price_histories(price_histories, price_histories_file_name)

Tickers:   0%|          | 0/436 [00:00<?, ?Price Histories/s]

Empty candle data for GE


In [9]:
price_histories.head()

| | open | high | low | close | volume | ticker | date |
|---|---|---|---|---|---|---|---|
| 0 | 47.71 | 47.75 | 46.97 | 47.18 | 1595948 | A | 2016-08-02 |
| 229476 | 138.91 | 139.0 | 136.82 | 137.22 | 3898130 | HD | 2016-08-02 |
| 339523 | 411.36 | 411.36 | 400.17 | 400.32 | 243507 | MTD | 2016-08-02 |
| 35224 | 72.0 | 72.0 | 69.95 | 70.19 | 728237 | ANET | 2016-08-02 |
| 228218 | 75.45 | 76.08 | 74.645 | 75.04 | 3591853 | HCA | 2016-08-02 |


In [10]:
price_histories = utils.read_price_histories(price_histories_file_name)
close = utils.get_close_values(price_histories)
close.tail()

| date | A | AAP | ABBV | ABMD | ABT | ACN | ADBE | ADI | ADM | ADP | ... | WYNN | XEL | XLNX | XOM | XYL | YUM | ZBH | ZBRA | ZION | ZTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-07-26 00:00:00+00:00 | 150.25 | 214.09 | 117.79 | 325.0 | 119.52 | 318.98 | 620.8 | 163.9 | 58.44 | 206.27 | ... | 104.32 | 68.31 | 135.44 | 58.48 | 124.08 | 123.45 | 159.65 | 545.31 | 51.48 | 200.82 |
| 2021-07-27 00:00:00+00:00 | 149.95 | 215.37 | 117.96 | 321.96 | 119.81 | 319.89 | 618.28 | 162.59 | 58.85 | 207.89 | ... | 100.34 | 69.46 | 131.0 | 57.83 | 123.95 | 125.48 | 162.84 | 540.18 | 51.68 | 201.87 |
| 2021-07-28 00:00:00+00:00 | 151.46 | 212.25 | 118.55 | 323.8 | 120.52 | 316.31 | 620.92 | 164.04 | 58.64 | 206.88 | ... | 102.16 | 68.7 | 138.54 | 58.22 | 122.93 | 122.61 | 162.56 | 545.74 | 52.36 | 203.27 |
| 2021-07-29 00:00:00+00:00 | 152.67 | 213.39 | 118.87 | 324.81 | 121.09 | 318.35 | 621.7 | 166.54 | 59.6 | 208.83 | ... | 99.76 | 68.79 | 147.25 | 58.93 | 125.06 | 130.31 | 162.74 | 548.1 | 52.84 | 204.12 |
| 2021-07-30 00:00:00+00:00 | 153.23 | 212.06 | 116.3 | 327.14 | 120.98 | 317.68 | 621.63 | 167.42 | 59.72 | 209.63 | ... | 98.33 | 68.25 | 149.84 | 57.57 | 125.85 | 131.39 | 163.42 | 552.48 | 52.15 | 202.7 |
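The close table above has one row per date and one column per ticker, while the raw price histories are in long format with one row per ticker and date. The internals of `utils.get_close_values` are not shown in this notebook, so the following is only a sketch of how such a reshape can be done with a pandas pivot, using made-up prices.

```python
import pandas as pd

# Long-format prices like price_histories, with made-up close values
long_prices = pd.DataFrame({
    'date':   pd.to_datetime(['2021-07-29', '2021-07-29', '2021-07-30', '2021-07-30']),
    'ticker': ['A', 'HD', 'A', 'HD'],
    'close':  [1.0, 2.0, 1.5, 2.5],
})

# One row per date, one column per ticker, values taken from 'close'
wide_close = long_prices.pivot(index='date', columns='ticker', values='close')
print(wide_close)
```

A ticker with no price on a given date would show up as `NaN` in the wide table, which is worth checking before computing returns.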
