# Stock Recommender System

In this notebook, we will create the basic stock recommender system, using knowledge-based approaches, collaborative filtering and ranking (based on stock predictions).

First we load all relevant items:

In [2]:
import pandas as pd
import numpy as np
import sys, os
import sys
sys.path.insert(1, '..')
import recommender as rcmd
from recommender.contrib import fmp_api as fmp
from matplotlib import pyplot as plt
import seaborn as sns
import sklearn_recommender as skr
%matplotlib inline

from sklearn.feature_extraction.text import CountVectorizer
from scipy.spatial.distance import cosine
from sklearn.metrics.pairwise import cosine_similarity

Next we will load and pre-process the relevant data

In [3]:
# retrieve all relevant symbols
stocks = fmp.profile.list_symbols()
cache = rcmd.stocks.Cache()

# load the relevant profile informations
df_profile = cache.load_profile_data()

# generate glove embeddings
skr.glove.download('twitter')
gt = skr.glove.GloVeTransformer('twitter', 25, 'sent', tokenizer=skr.nlp.tokenize_clean)
embs = gt.transform(df_profile['description'].fillna(""))
df_embs = pd.concat([df_profile[['symbol']], pd.DataFrame(embs)], axis=1).set_index('symbol')

# create dummy for categorical values
df_sector_dummy = pd.get_dummies(df_profile['sector'], dummy_na=True, prefix='sector')
df_industry_dummy = pd.get_dummies(df_profile['industry'], dummy_na=True, prefix='industry')
df_exchange_dummy = pd.get_dummies(df_profile['exchange'], dummy_na=True, prefix='exchange')
df_cats = pd.concat([df_profile[['symbol']], df_sector_dummy, df_industry_dummy, df_exchange_dummy], axis=1)

# generate similarity matrix
tf = skr.transformer.SimilarityTransformer(cols=(1, None), index_col='symbol', normalize=True)
df_sim = tf.transform(df_cats)

File found, no download needed




## Knowledge Based Filtering

The first part of filtering we want to do is based on knowledge based filtering (sort of a mixture with content based approaches). We will use the glove embeddings to filter for stocks matching the description. From there on, we can diversify the setting (i.e. fill with stocks until a threshold is reached) using the similarity matrix.

In [7]:
query = 'Healthcare'

# embed the query
query_emb = gt.transform([query])

# rank each stock according to cosine similarity
rank = cosine_similarity(df_embs, query_emb)
df_res = pd.concat([df_profile, pd.DataFrame(rank, columns=['cosine'])], axis=1)
df_res.sort_values(by='cosine', ascending=False).head()

[0.94785204]


Unnamed: 0,beta,ceo,changes,changesPercentage,companyName,description,exchange,image,industry,lastDiv,mktCap,price,range,sector,symbol,volAvg,website,cosine
815,0.747056,Susan R. Salka,-0.09,(-0.15%),AMN Healthcare Services Inc,AMN Healthcare Services Inc is a healthcare st...,New York Stock Exchange,https://financialmodelingprep.com/images-New-j...,Employment Services,0.0,2748525000.0,56.48,45.042-68.2,Industrials,AMN,522923,http://www.amnhealthcare.com,0.947852
8228,0.747056,Susan R. Salka,-0.09,(-0.15%),AMN Healthcare Services Inc,AMN Healthcare Services Inc is a healthcare st...,New York Stock Exchange,https://financialmodelingprep.com/images-New-j...,Employment Services,0.0,2748525000.0,56.48,45.042-68.2,Industrials,,522923,http://www.amnhealthcare.com,0.947852
13289,0.475935,Susan D. DeVore,-0.01,(-0.03%),Premier Inc.,Premier Inc is a healthcare alliance. The Comp...,Nasdaq Global Select,https://financialmodelingprep.com/images-New-j...,Application Software,0.0,4992309000.0,36.35,28.81-47.22,Technology,,664737,https://www.premierinc.com,0.931755
2626,0.475935,Susan D. DeVore,-0.01,(-0.03%),Premier Inc.,Premier Inc is a healthcare alliance. The Comp...,Nasdaq Global Select,https://financialmodelingprep.com/images-New-j...,Application Software,0.0,4992309000.0,36.35,28.81-47.22,Technology,PINC,664737,https://www.premierinc.com,0.931755
8497,0.510477,Pascal Soriot,-0.05,(-0.12%),Astrazeneca PLC,AstraZeneca PLC belongs to the healthcare sect...,New York Stock Exchange,https://financialmodelingprep.com/images-New-j...,Drug Manufacturers,1.9,56552540000.0,44.64,34.38-43.295,Healthcare,,3760723,http://www.astrazeneca.com,0.928569


As we have sorted the data according to rank, we will filter them according to a threshold and find related items to fill them up

In [47]:
cosine_threshold = .92
sim_threshold = .65
max_stocks = 50

df_res = pd.concat([df_profile['symbol'], pd.DataFrame(rank, columns=['cosine'])], axis=1)
df_res = df_res.sort_values(by='cosine', ascending=False)
df_res = df_res[df_res['cosine'] > cosine_threshold].dropna()

# find related items
symbols = df_res['symbol'].values
res_symbols = list(np.copy(symbols))
res_rankings = list(np.copy(df_res['cosine'].values))
for symbol in symbols:
    df_row = df_sim.loc[symbol].sort_values(ascending=False)
    df_row = df_row[df_row > sim_threshold]
    for col in df_row.index:
        if isinstance(col, float): continue
        if len(res_symbols) > max_stocks: break
        res_symbols.append(col)
        res_rankings.append(cosine_threshold - 0.05)
    if len(res_symbols) > max_stocks: break
        
df_res = pd.DataFrame({'symbol': res_symbols, 'ranking': res_rankings})
df_res = pd.merge(df_res, df_profile, on='symbol').sort_values(by='ranking', ascending=False)
df_res

Unnamed: 0,ranking,symbol,beta,ceo,changes,changesPercentage,companyName,description,exchange,image,industry,lastDiv,mktCap,price,range,sector,volAvg,website
0,0.947852,AMN,0.747056,Susan R. Salka,-0.09,(-0.15%),AMN Healthcare Services Inc,AMN Healthcare Services Inc is a healthcare st...,New York Stock Exchange,https://financialmodelingprep.com/images-New-j...,Employment Services,0.0,2748525000.0,56.48,45.042-68.2,Industrials,522923,http://www.amnhealthcare.com
2,0.931755,PINC,0.475935,Susan D. DeVore,-0.01,(-0.03%),Premier Inc.,Premier Inc is a healthcare alliance. The Comp...,Nasdaq Global Select,https://financialmodelingprep.com/images-New-j...,Application Software,0.0,4992309000.0,36.35,28.81-47.22,Technology,664737,https://www.premierinc.com
4,0.928569,AZN,0.510477,Pascal Soriot,-0.05,(-0.12%),Astrazeneca PLC,AstraZeneca PLC belongs to the healthcare sect...,New York Stock Exchange,https://financialmodelingprep.com/images-New-j...,Drug Manufacturers,1.9,56552540000.0,44.64,34.38-43.295,Healthcare,3760723,http://www.astrazeneca.com
5,0.926393,CLNS,1.356758,Richard B. Saltzman,-0.0,(-0.08%),Colony NorthStar Inc.,Colony NorthStar Inc is a real estate and inve...,New York Stock Exchange,https://financialmodelingprep.com/images-New-j...,REITs,0.44,3411303000.0,6.41,5.28-14.74,Real Estate,10673907,http://www.clns.com
6,0.923644,EHC,0.716969,Mark J. Tarr,0.19,(+0.30%),Encompass Health Corporation,"Encompass Health Corp, formerly Healthsouth Co...",New York Stock Exchange,https://financialmodelingprep.com/images-New-j...,Health Care Providers,1.08,6430114000.0,64.43,56.2358-82.46,Healthcare,612132,http://www.healthsouth.com
7,0.922108,HR,0.521382,Todd Meredith,-0.11,(-0.33%),Healthcare Realty Trust Incorporated,Healthcare Realty Trust Inc is a healthcare fa...,New York Stock Exchange,https://financialmodelingprep.com/images-New-j...,REITs,1.2,4108822000.0,33.31,26.09-32.98,Real Estate,966319,http://www.healthcarerealty.com
8,0.921607,LPNT,0.520483,William F. Carpenter,0.0,(0.00%),LifePoint Health Inc.,LifePoint Health Inc is a healthcare company. ...,Nasdaq Global Select,https://financialmodelingprep.com/images-New-j...,Health Care Providers,0.0,2665000000.0,65.0,41.45-65.35,Healthcare,641564,http://www.lifepointhealth.net
31,0.87,NATI,0.800567,Alexander M. Davern,-0.12,(-0.26%),National Instruments Corporation,"National Instruments Corp designs, manufacture...",Nasdaq Global Select,https://financialmodelingprep.com/images-New-j...,Application Software,1.0,5790089000.0,43.45,38.78-51.53,Technology,597023,http://www.ni.com
40,0.87,STMP,0.329087,Kenneth McBride,-0.14,(-0.20%),Stamps.com Inc.,Stamps.com Inc provides internet-based postage...,Nasdaq Global Select,https://financialmodelingprep.com/images-New-j...,Application Software,0.0,1310150000.0,69.83,77.5688-285.745,Technology,450645,http://www.stamps.com
32,0.87,MANT,0.926468,Kevin M. Phillips,-0.39,(-0.56%),ManTech International Corporation,Mantech International Corp provides technologi...,Nasdaq Global Select,https://financialmodelingprep.com/images-New-j...,Application Software,1.08,2566941000.0,69.45,48.25-68.11,Technology,326193,http://www.mantech.com
