# Introduction to Recurrent Neural Networks (RNNs)

## Learning stock embeddings for portfolio optimization using bidirectional RNNs

In [1]:
#Import dependencies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

## Recurrent Neural Networks (RNNs)

## Bidirectional Gated Recurrent Units (Bi-GRUs)

## Using Bi-GRUs for price movement classification

For the purposes of this assignment, we will focus on training a classifier for 15 stocks from the S&P 500. The goal of our classifier is as follows:
We are interested in training a bidirectional RNN model that learns a relationship between news taglines related to the 15 stocks $\{l_1, \ldots, l_{15}\}$ that we have selected and the prices of those stocks. Define $p_i^{(t)}$ to be the price of stock $l_i$ on day $t$. Then, we can formally define our objective as follows:

Let $y_i^{(t)} = \begin{cases} 1 & p_i^{(t)} \geq p_i^{(t - 1)} \\ 0 & p_i^{(t)} < p_i^{(t - 1)} \end{cases}$. Suppose our dataset is $D = \{N^{(t)}\}_{t_{in} \leq t \leq t_f}$, where $N^{(t)}$ is the collection of all the articles from day $t$, and $t_{in}$ and $t_f$ represent the dates of the earliest and latest articles in our dataset, respectively. Then, we want to learn a mapping $\hat y_i^{(t)} = f(N^{(t - \mu)} \cup \ldots \cup N^{(t)})$ such that $\hat y_i^{(t)}$ accurately predicts $y_i^{(t)}$. More specifically, as is often the case with classification problems, we want to minimize the mean cross-entropy loss over all $15$ stocks:
$$\mathcal{L} = \frac{1}{15} \sum_{i = 1}^{15} \mathcal{L}_i = \frac{1}{15} \sum_{i = 1}^{15} \left( \frac{-1}{t_f - t_{in}} \sum_{t = t_{in}}^{t_f} \big(y_i^{(t)} \log \hat y_i^{(t)} + (1 - y_i^{(t)}) \log (1 - \hat y_i^{(t)})\big) \right)$$
Here, we choose to use $\mu = 4$, so we aim to classify the price movement of stock $l_i$ on day $t$, given by $y_i^{(t)}$, using news information from days $[t-4, t]$, i.e., articles $\{N^{(t - 4)}, N^{(t - 3)}, N^{(t - 2)}, N^{(t - 1)}, N^{(t)}\}$. Notice that we are including information from day $t$, so we are not *predicting* the price movement but rather identifying a relationship between the stock price movement and the information contained in the news taglines from day $t$ and the previous 4 days.
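
As a minimal sketch of the definitions above, the labels $y_i^{(t)}$ and the per-stock cross-entropy $\mathcal{L}_i$ could be computed in NumPy as follows. The function names and the price values are hypothetical and only illustrate the formulas. Note that `np.mean` divides by the number of days actually summed over, which is an assumption about the intended normalization.

```python
import numpy as np

def movement_labels(prices):
    """Return y^(t) = 1 if p^(t) >= p^(t-1), else 0, for every day after the first."""
    prices = np.asarray(prices, dtype=float)
    return (prices[1:] >= prices[:-1]).astype(int)

def mean_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy for one stock over all labelled days."""
    y_true = np.asarray(y_true, dtype=float)
    # clip predictions away from 0 and 1 so the logarithms stay finite
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# hypothetical closing prices for one stock on five consecutive days
prices = [100.0, 101.5, 101.5, 99.0, 100.2]
labels = movement_labels(prices)

# an uninformative classifier that always outputs 0.5
loss = mean_cross_entropy(labels, [0.5, 0.5, 0.5, 0.5])
```

An uninformative classifier that always outputs 0.5 gives a loss of $\log 2$, which is a useful baseline to compare a trained model against.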

## Generating word embeddings

The code below loads word embeddings that we have pre-generated for 15 stocks from the S&P 500. We used news tagline data from Reuters (data sourced from https://github.com/vedic-partap/Event-Driven-Stock-Prediction-using-Deep-Learning/blob/master/input/news_reuters.csv) to create word embeddings for all of the articles in our dataset, using a pretrained spaCy encoder and a Word2Vec model that we trained on our data (don't worry if you don't know what this means yet). Our dataset contains news articles from 2011 to 2017, so we should have enough data to build a fairly accurate classifier. You will explore algorithms for generating word embeddings in more detail later in the course. For this assignment, we have done the work for you so that you can focus on building RNN models for your stock movement classifier.

For the purposes of our classifier, we are focusing on the 15 stocks from the Reuters dataset for which we have the most data, i.e., news articles.

<br>

The main idea is to convert all of the qualitative textual information that we have in each article tagline into a quantitative feature that we can use when training our classifier. Let $s_i \in \mathbb{R}^{64}$ represent the stock embedding that we are trying to learn for stock $l_i$. We then define the following quantities:

Let $n_i^{(t)}$ be a news article from day $t$, for some $1 \leq i \leq |N^{(t)}|$. We associate 2 embedding vectors $K_i^{(t)} \in \mathbb{R}^{64}$ and $V_i^{(t)} \in \mathbb{R}^{300}$ with the article $n_i^{(t)}$, which we have computed for you below. We define $\text{score}(n_i^{(t)}, s_j) = K_i^{(t)} \cdot s_j$ and the softmax variable $$\alpha_i^{(t)} = \frac{\exp(\text{score}(n_i^{(t)}, s_j))}{\sum_{n_k^{(t)} \in N^{(t)}} \exp(\text{score}(n_k^{(t)}, s_j))}$$

Finally, we define the market status of stock $l_j$ on day $t$, given by $m_j^{(t)} = \sum_{n_i^{(t)} \in N^{(t)}} \alpha_i^{(t)} V_i^{(t)}$. This is the input to the classifier that you will build and train on the dataset to learn the stock embeddings $\{s_j\}_{1 \leq j \leq 15}$.
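
The score, softmax, and market status definitions can be sketched for a single stock and a single day in NumPy. This is only an illustration under stated assumptions: the function name `market_vector` is hypothetical, the data is random, and the key and value vectors are stored as rows.

```python
import numpy as np

def market_vector(K, V, s):
    """Attention-weighted market vector m_j^(t) for one stock and one day.

    K: (n_articles, 64) key vectors, V: (n_articles, 300) value vectors,
    s: (64,) stock embedding.
    """
    scores = K @ s                         # score(n_i, s_j) = K_i . s_j
    scores = scores - scores.max()         # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax over the day's articles
    return alpha @ V, alpha                # m_j = sum_i alpha_i V_i

rng = np.random.default_rng(0)             # hypothetical random data
K = rng.normal(size=(5, 64))
V = rng.normal(size=(5, 300))
s = rng.normal(size=64)
m, alpha = market_vector(K, V, s)
```

With an all-zero stock embedding every article receives the same weight, so the market vector reduces to the plain average of the value vectors. Learning $s_j$ is what lets the model weight some articles more heavily than others.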

In [2]:
data = pd.read_csv("embeddings.csv")
data

Unnamed: 0,index,Ticker,Name,Date,Headline,Tagline,Rating,K0,K1,K2,...,V290,V291,V292,V293,V294,V295,V296,V297,V298,V299
0,1074,AAPL,1-800 FLOWERSCOM Inc,20140414,Apple antitrust compliance off to a promising ...,"NEW YORK Apple Inc has made a ""promising start...",topStory,0.728133,0.074376,-0.844244,...,-0.184006,0.032116,0.032128,-0.045440,0.027079,-0.100620,0.032597,-0.092093,0.048542,0.109286
1,1075,AAPL,1-800 FLOWERSCOM Inc,20140414,Apple antitrust compliance off to a promising ...,"NEW YORK April 14 Apple Inc has made a ""promi...",normal,0.757790,0.111567,-0.802569,...,-0.168789,0.039603,0.021292,-0.036883,0.029685,-0.110353,0.025347,-0.084554,0.045670,0.105747
2,1076,AAPL,1-800 FLOWERSCOM Inc,20140414,COLUMN-How to avoid the trouble coming to the ...,(The opinions expressed here are those of the ...,normal,-0.624152,-0.346050,-1.487509,...,-0.141506,-0.027039,-0.080825,-0.133556,0.018669,-0.056828,-0.052640,-0.169819,-0.033054,0.053817
3,1077,AAPL,1-800 FLOWERSCOM Inc,20140414,How to avoid the trouble coming to the tech se...,CHICAGO A resounding shot across the bow has b...,normal,0.387120,-0.099557,-0.590867,...,-0.233473,0.095700,0.113241,-0.027537,-0.119434,-0.074786,-0.072007,-0.049933,0.014863,0.063664
4,1078,AAPL,1-800 FLOWERSCOM Inc,20140415,Apple cannot escape U.S. states' e-book antitr...,NEW YORK Apple Inc on Tuesday lost an attempt ...,normal,0.824634,-1.637257,-0.352775,...,-0.232241,0.027836,-0.025965,0.036613,-0.087056,-0.103006,0.076729,-0.153311,0.038894,0.138866
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50787,184859,TAPR,Barclays Inverse US Treasury Composite ETN,20170209,BRIEF-Ultra Petroleum says Barclays agreed to ...,* Ultra Petroleum- on Feb 8 in connection wit...,normal,1.139437,0.682006,0.029171,...,-0.244216,0.053853,-0.008725,-0.048169,-0.032766,-0.062842,-0.059161,-0.104091,0.010547,0.129130
50788,184860,TAPR,Barclays Inverse US Treasury Composite ETN,20170209,MOVES-Barclays Nasdaq RenCap AXA BC Partners,Feb 9 The following financial services industr...,topStory,1.017802,-0.165982,-0.467275,...,-0.234947,0.049924,0.064670,0.022008,0.025572,-0.144732,-0.046366,-0.030195,-0.027131,0.093039
50789,184861,TAPR,Barclays Inverse US Treasury Composite ETN,20170217,Barclays Citi gave South Africa watchdog info...,JOHANNESBURG Feb 17 Barclays Plc and Citigrou...,normal,1.044449,-0.042930,0.201579,...,-0.234416,-0.001098,-0.035648,-0.053637,0.030076,-0.037331,0.048593,-0.019262,-0.030251,0.178724
50790,184862,TAPR,Barclays Inverse US Treasury Composite ETN,20170217,Barclays Citi helped South Africa with forex ...,JOHANNESBURG Barclays Plc and Citigroup appr...,topStory,1.288937,-0.372697,0.197727,...,-0.247672,0.049712,0.028656,-0.078167,0.047243,0.061589,0.016127,-0.073754,-0.011532,0.154577


Here, each row represents a different news article and is associated with one of the top 15 stocks that we are interested in for our classifier: <br>
`['AAPL', 'AMZN', 'BA', 'BCS', 'BP', 'C', 'DB', 'GM', 'GS', 'HSEA', 'HSEB', 'JPM', 'MSFT', 'MS', 'TAPR']`.

Additionally, the columns `[K0, ..., K63]` represent the components of the $K_i^{(t)}$ embedding vector and the columns `[V0, ..., V299]` represent the components of the $V_i^{(t)}$ embedding vector for each article $n_i^{(t)}$.
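
As a small sketch of how these column blocks might be pulled out of the table, the example below builds a hypothetical toy frame with the same column naming scheme (it does not read `embeddings.csv`) and splits one day's articles into key and value matrices.

```python
import numpy as np
import pandas as pd

# hypothetical toy frame that mimics the K0..K63 and V0..V299 columns
k_cols = [f"K{i}" for i in range(64)]
v_cols = [f"V{i}" for i in range(300)]
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(3, 364)), columns=k_cols + v_cols)
toy.insert(0, "Date", [20140414, 20140414, 20140415])

# articles from one day, split into key and value matrices
day = toy[toy["Date"] == 20140414]
K_day = day[k_cols].to_numpy()   # shape (n_articles, 64)
V_day = day[v_cols].to_numpy()   # shape (n_articles, 300)
```

Selecting by explicit column names is one option. Slicing by position with `columns.get_loc` works equally well as long as the columns are contiguous.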

## Testing our Word Embeddings

## Building a Bi-GRU price movement classifier

### 1) Data processing

In [3]:
## do it for one stock, AAPL
aapl = data[data['Ticker'] == 'AAPL']
len(aapl)

6674

In [4]:
## set kappa to be max number of articles for a given day
kappa = np.max(aapl.groupby('Date').count()['index'])
kappa

12

In [5]:
## remove dates that have fewer than 4 articles
articles_per_day = aapl.groupby('Date')['index'].count()
drop_dates = set(articles_per_day[articles_per_day < 4].index)
## boolean mask that is True for the rows we keep
drop_indices = (~aapl['Date'].isin(drop_dates)).to_numpy()

In [6]:
## now all dates have 4 <= i <= 12 articles
aapl_processed = aapl[drop_indices]

In [7]:
sorted_dates  = sorted(aapl_processed['Date'].unique())
num_sequences = len(sorted_dates[4:])
num_sequences

792

In [8]:
np.array(data.iloc[:3, data.columns.get_loc('K0') : data.columns.get_loc('K63') + 1]).T.shape

(64, 3)

Now that we have processed our data to include only robust inputs, let's do a quick refresher of what your initial input to the neural network is supposed to look like and what dimensions it will have. Our key vectors are $K_i^{(t)} \in \mathbb{R}^{64}$, and we have at most $\kappa$ articles per day, so for any given day the inputs form a matrix in $\mathbb{R}^{64 \times \kappa}$ (days with fewer than $\kappa$ articles are padded with zeros). Since our network uses five market vectors to classify the stock price movement on any given day, and we are considering $k$ market vector sequences overall, the dimensions of our (flattened) input matrix are $64k \times 5\kappa$.
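
To make the layout concrete, here is a toy sketch of how one day of one sequence maps to a block of the flattened matrix. The helper `block`, the value `k = 3`, and the article count are hypothetical and chosen only for illustration.

```python
import numpy as np

key_dim, kappa, window = 64, 12, 5     # sizes from the text
k = 3                                  # hypothetical number of sequences

input_mat = np.zeros((key_dim * k, window * kappa))

def block(seq, day):
    """Row and column slices that hold day `day` of sequence `seq`."""
    rows = slice(key_dim * seq, key_dim * (seq + 1))
    cols = slice(kappa * day, kappa * (day + 1))
    return rows, cols

# place 7 hypothetical articles for day 2 of sequence 1;
# the remaining 5 columns of that block stay zero-padded
rows, cols = block(seq=1, day=2)
input_mat[rows, cols.start : cols.start + 7] = 1.0
```

Each sequence owns 64 consecutive rows and each day within it owns $\kappa$ consecutive columns, so the matrix has $64k$ rows and $5\kappa$ columns.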

In [9]:
input_mat = np.zeros((64*num_sequences, 5*kappa))

In [10]:
#iterate through sequences; each one fills a block of 64 rows (size of key vectors)
for seq in range(num_sequences):
    i = 64 * seq
    #the five consecutive dates that make up this sequence
    dates = sorted_dates[seq : seq + 5]

    #counter to keep track of column index
    counter = 0
    for date in dates:
        df = aapl_processed[aapl_processed['Date'] == date]
        sub_mat = np.array(df.iloc[:, df.columns.get_loc('K0') : df.columns.get_loc('K63') + 1]).T
        input_mat[i : i + 64, counter : counter + sub_mat.shape[1]] = sub_mat

        #increment by kappa to go to next day in sequence
        counter += kappa

In [11]:
64*num_sequences

50688

In [12]:
input_mat.shape

(50688, 60)

### 2) Building the model

In [33]:
from keras import backend as K

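def custom_sum(x):
    # x is assumed to be a rank-1 vector: kappa attention weights
    # followed by the 300 * kappa entries of the flattened value matrix
    alpha_weights = x[:kappa]
    value_vecs = tf.reshape(x[kappa:], (300, kappa))

    # weighted sum of the kappa value vectors (stored as columns), a vector in R^300
    linear_sum = tf.linalg.matvec(value_vecs, alpha_weights)

    return linear_sum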

In [16]:
from keras.layers import Activation, Input, Dense, GRU, Bidirectional, Concatenate
from keras.models import Model

In [17]:
x = Input(shape = input_mat.shape)

In [20]:
scores = []
for i in range(num_sequences*5): 
    scores.append(Dense(kappa)(x))

In [40]:
alphas = [Activation('softmax')(score) for score in scores]

In [49]:
linear_sums = []
for i in range(num_sequences*5): 
    df = aapl_processed.loc[i:i+kappa, :]
    value_vecs = np.array(df.iloc[:, df.columns.get_loc('V0') : df.columns.get_loc('V299') + 1]).T
    print(value_vecs.shape)
    value_vecs = value_vecs.reshape(value_vecs.shape[0] * value_vecs.shape[1], 1)
    
    alphas_arr = alphas[i]
    
    inp = tf.concat(alphas_arr, value_vecs) 
    print(inp.shape)
    #need to check how to pass multiple inputs into Dense layer
    linear_sums.append(Activation(custom_sum)(inp))

(300, 13)


ValueError: Shape (3900, 1) must have rank 0
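
The `ValueError` is consistent with the signature of `tf.concat`, which takes a list of tensors as its first argument and an integer axis as its second, so `value_vecs` ends up being interpreted as the axis. The printed shape `(300, 13)` also shows that label-based slicing with `.loc` includes both endpoints and therefore selects 13 rows rather than $\kappa = 12$. Independently of the framework, the computation this cell is aiming for can be sketched in NumPy. The following is a hedged illustration rather than the final layer: the function name is hypothetical, and masking the padded article slots is an assumption about how zero-padded days should be handled.

```python
import numpy as np

def padded_market_vector(scores, V_padded, n_articles):
    """Weighted sum of value vectors for a day padded to kappa articles.

    scores: (kappa,) raw scores, V_padded: (300, kappa) value vectors stored
    as columns, n_articles: number of real (non-padding) columns, at least 1.
    """
    kappa = scores.shape[0]
    mask = np.arange(kappa) < n_articles
    # give padded slots a score of -inf so their softmax weight is exactly 0
    masked = np.where(mask, scores, -np.inf)
    masked = masked - masked[mask].max()   # shift for numerical stability
    alpha = np.exp(masked)
    alpha /= alpha.sum()
    return V_padded @ alpha, alpha

kappa = 12
rng = np.random.default_rng(0)             # hypothetical random data
V_padded = np.zeros((300, kappa))
V_padded[:, :4] = rng.normal(size=(300, 4))  # a day with 4 real articles
m, alpha = padded_market_vector(rng.normal(size=kappa), V_padded, n_articles=4)
```

Without the mask, a softmax over all $\kappa$ slots would assign nonzero weight to the padding columns and shrink the weights of the real articles.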

In [None]:
flattened_alphas = Concatenate()(linear_sums)
bigru = Bidirectional(GRU(num_sequences, activation = 'relu'))(flattened_alphas)
pred = Dense(num_sequences, activation = 'sigmoid')(bigru)

model = Model(inputs = x, outputs = pred)