# Introduction to Recurrent Neural Networks (RNNs)

## Learning stock embeddings for price movement classification using bidirectional RNNs

In [None]:
#Import dependencies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import itertools

## Recurrent Neural Networks (RNNs) - An intro to sequence classification, and an aside on LSTMs

To get you started, we look at a simple example built specifically for RNNs: digit sequence classification. We define the problem as follows: assume you are an airline company running baggage checks at the airport. The maximum total baggage weight allowed for the flight is $1570$ pounds. We label each bag $0$ if the cumulative weight so far is at or below the threshold, and $1$ if it exceeds the threshold. Also assume that all sequences have length $35$. For example:

Sequence: $500, 500, 500, 327, 294, 102, \ldots$  
Labels  : $0  , 0  , 0  , 1  , 1  , 1, \ldots$
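The labelling rule can be checked on the example above with a running total. This is a minimal NumPy sketch with illustrative variable names; the comparison is strict (`>`), so a total exactly at the threshold still gets label $0$.

```python
import numpy as np

baggage_threshold = 1570  # maximum total weight, in pounds

# First six weights from the example sequence above
weights = np.array([500, 500, 500, 327, 294, 102])

# Running total after each bag is added
running_total = np.cumsum(weights)

# Label 1 once the running total exceeds the threshold, 0 otherwise
example_labels = (running_total > baggage_threshold).astype(int)

print(running_total.tolist())   # [500, 1000, 1500, 1827, 2121, 2223]
print(example_labels.tolist())  # [0, 0, 0, 1, 1, 1]
```

Because the weights are non-negative, the labels can only switch from $0$ to $1$ once and never switch back.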

We will also plot confusion matrices and accuracy plots to inform our decisions about the model.


### Data Generation

Generate sequence data using the functions defined below:

In [None]:
baggage_threshold = 1570
sequence_len = 35

def generate_sequence(sequence_len):
    data = np.random.choice(521, sequence_len) #generate random ints in the range [0, 520]
    labels = (np.cumsum(data) > baggage_threshold).astype(int) #1 once the running total exceeds the threshold
    
    return data, labels


In [None]:
#construct data matrix X and labels y

X = np.zeros((1000, 35))
y = np.zeros((1000, 35))

#TODO: your code here
    
#split into training and testing data using an 80/20 split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
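One possible way to assemble the data matrix is to call the generator once per row and stack the results. The sketch below is self-contained, so it redefines the generator; the names `n_samples`, `X_demo`, and `y_demo` are illustrative and not part of the starter code.

```python
import numpy as np

baggage_threshold = 1570
sequence_len = 35
n_samples = 1000  # assumed to match the 1000 rows allocated above

def generate_sequence(sequence_len):
    # Same labelling rule as the notebook's generator
    data = np.random.choice(521, sequence_len)
    labels = (np.cumsum(data) > baggage_threshold).astype(int)
    return data, labels

# Generate all sequences, then stack them row by row
pairs = [generate_sequence(sequence_len) for _ in range(n_samples)]
X_demo = np.stack([data for data, _ in pairs]).astype(float)
y_demo = np.stack([labels for _, labels in pairs]).astype(float)

# Keras recurrent layers expect inputs shaped (samples, timesteps, features)
X_demo_3d = X_demo.reshape(n_samples, sequence_len, 1)

print(X_demo.shape, y_demo.shape, X_demo_3d.shape)  # (1000, 35) (1000, 35) (1000, 35, 1)
```

The extra trailing axis is what `input_shape = (35, 1)` in the model cells refers to: 35 time steps with one feature each.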

### Models

Now, we train a simple RNN to classify our data! We will be using binary cross-entropy loss, which you will see later in this notebook as well.  Here is some starter code to help you out:

In [None]:
from keras.layers import SimpleRNN, LSTM, Bidirectional, TimeDistributed, Dense
from keras.models import Sequential

model_1 = Sequential()
model_1.add(SimpleRNN(20, input_shape = (35, 1), return_sequences = True))

#add an output layer (Hint: Look up sigmoid activation)
#TODO: your code here

#compile your model with a binary cross-entropy loss and fit it using X_train and y_train
#TODO: your code here
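To build intuition for what `SimpleRNN(20, return_sequences = True)` followed by a sigmoid output layer computes, here is a rough NumPy sketch of the forward pass for a single sequence. The weights are random stand-ins rather than trained parameters, and the input scaling by $520$ is an assumption made only to keep the `tanh` from saturating.

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, n_features, n_units = 35, 1, 20

# Hypothetical random weights standing in for trained parameters
W_x = rng.normal(scale=0.1, size=(n_features, n_units))   # input -> hidden
W_h = rng.normal(scale=0.1, size=(n_units, n_units))      # hidden -> hidden
b_h = np.zeros(n_units)
w_out = rng.normal(scale=0.1, size=(n_units, 1))          # hidden -> output
b_out = np.zeros(1)

# One toy sequence of baggage weights, scaled to [0, 1] (assumed preprocessing)
x = rng.integers(0, 521, size=(timesteps, n_features)) / 520.0

h = np.zeros(n_units)
hidden_states = []
for x_t in x:
    # The new state depends on the current input and the previous state
    h = np.tanh(x_t @ W_x + h @ W_h + b_h)
    hidden_states.append(h)

# return_sequences=True keeps the hidden state at every time step
hidden_states = np.stack(hidden_states)                    # shape (35, 20)

# A sigmoid output applied at each time step gives one probability per bag
probs = 1.0 / (1.0 + np.exp(-(hidden_states @ w_out + b_out)))  # shape (35, 1)

print(hidden_states.shape, probs.shape)  # (35, 20) (35, 1)
```

The hidden state is the only channel through which earlier weights can influence later predictions, which is why a recurrent model suits a cumulative-sum task like this one.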

Now, let's check our model accuracy on the test set: 

In [None]:
#evaluate your model on the test set and report the test loss and accuracy
#TODO: your code here

In [None]:
#import seaborn to plot the confusion matrix
import seaborn as sn

In [None]:
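#threshold the sigmoid outputs at 0.5 and flatten the predictions into a list
#(Sequential.predict_classes has been removed from recent Keras/TensorFlow releases)
probs = model_1.predict(X_test.reshape(200, 35, 1))
pred = (probs > 0.5).astype(int).flatten().tolist()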

In [None]:
#flatten y_test values into a list
actual = []
for row in range(len(y_test)): 
    actual += list(y_test[row])

In [None]:
#horizontal axis is predicted label, vertical axis is actual label
conf_mat = confusion_matrix(actual, pred, labels = [0, 1])
df_cm = pd.DataFrame(conf_mat)
df_cm
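As a quick check on how to read the matrix, the toy sketch below builds a $2 \times 2$ confusion matrix by hand with rows indexed by the actual label and columns by the predicted label, the same layout that `confusion_matrix` uses. The arrays are made up for illustration.

```python
import numpy as np

# Made-up labels for illustration only
toy_actual = np.array([0, 0, 0, 1, 1, 1, 1, 0])
toy_pred = np.array([0, 1, 0, 1, 1, 0, 1, 1])

# cm[a, p] counts the items with actual label a and predicted label p
cm = np.zeros((2, 2), dtype=int)
np.add.at(cm, (toy_actual, toy_pred), 1)

false_positives = cm[0, 1]  # actual 0, predicted 1
false_negatives = cm[1, 0]  # actual 1, predicted 0
accuracy = np.trace(cm) / cm.sum()

print(cm.tolist())                       # [[2, 2], [1, 3]]
print(false_positives, false_negatives)  # 2 1
print(accuracy)                          # 0.625
```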

In [None]:
#code to plot the confusion matrix
sn.set(font_scale=1.4) # for label size
ax = sn.heatmap(df_cm, cmap="BuPu")
ax.set(xlabel='Predicted Label', ylabel='Actual Label')
plt.show()


Not bad! Although we have a high overall accuracy, the confusion matrix shows that there are many terms with an actual label of $0$ that our classifier predicts to be $1$. Let's see if we can do better using LSTMs (a form of RNN; refer to the note for more details). Make a new model that uses an LSTM layer instead of a simple RNN. Notice that it makes sense to use LSTMs here because we would like our model to remember earlier baggage weights in order to classify later baggage weights in the sequence.

In [None]:
model_2 = Sequential()

#add an LSTM layer and a dense layer -- use 20 units for the LSTM (same as for the SimpleRNN)
#TODO: your code here

#compile your model with a binary cross-entropy loss and fit it using X_train and y_train
#TODO: your code here

In [None]:
#evaluate your model on the test set and report the test loss and accuracy
#TODO: your code here

In [None]:
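#threshold the sigmoid outputs at 0.5 and flatten the predictions into a list
#(Sequential.predict_classes has been removed from recent Keras/TensorFlow releases)
probs = model_2.predict(X_test.reshape(200, 35, 1))
pred = (probs > 0.5).astype(int).flatten().tolist()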


In [None]:
#generate confusion matrix
conf_mat = confusion_matrix(actual, pred, labels = [0, 1])
df_cm = pd.DataFrame(conf_mat, range(2), range(2))
df_cm

In [None]:
#code to plot the confusion matrix
sn.set(font_scale=1.4) # for label size
ax = sn.heatmap(df_cm, cmap="BuPu")
ax.set(xlabel='Predicted Label', ylabel='Actual Label')
plt.show()


Even better! As you can see from the confusion matrix, there are fewer false positives, i.e. terms that are predicted to be under label $1$ but are actually under label $0$. Now let's try using a Bidirectional LSTM as our model:

In [None]:
model_3 = Sequential()

#add a Bidirectional LSTM layer and a dense layer -- use 20 units for the Bi-LSTM (same as for the SimpleRNN)
#TODO: your code here

#compile your model with a binary cross-entropy loss and fit it using X_train and y_train
#TODO: your code here

In [None]:
#evaluate your model on the test set and report the test loss and accuracy
#TODO: your code here

In [None]:
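#threshold the sigmoid outputs at 0.5 and flatten the predictions into a list
#(Sequential.predict_classes has been removed from recent Keras/TensorFlow releases)
probs = model_3.predict(X_test.reshape(200, 35, 1))
pred = (probs > 0.5).astype(int).flatten().tolist()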

#generate confusion matrix
conf_mat = confusion_matrix(actual, pred, labels = [0, 1])
df_cm = pd.DataFrame(conf_mat, range(2), range(2))
df_cm


In [None]:
#code to plot the confusion matrix
sn.set(font_scale=1.4) # for label size
ax = sn.heatmap(df_cm, cmap="BuPu")
ax.set(xlabel='Predicted Label', ylabel='Actual Label')
plt.show()

### Plot accuracy over epochs for all three models

In [None]:
num_epochs = 20
    
train_losses_1 = #TODO: fill in
train_accuracies_1 = #TODO: fill in

train_losses_2 = #TODO: fill in
train_accuracies_2 = #TODO: fill in

train_losses_3 = #TODO: fill in
train_accuracies_3 = #TODO: fill in

plt.figure()
plt.title("Accuracies vs Epochs")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.plot(train_accuracies_1, label="RNN Accuracy")
plt.plot(train_accuracies_2, label="LSTM Accuracy")
plt.plot(train_accuracies_3, label="Bi-LSTM Accuracy")
plt.legend()
plt.show()

plt.figure()
plt.title("Losses vs Epochs")
plt.xlabel("Epochs")
plt.ylabel("Losses")
plt.plot(train_losses_1, label="RNN Losses")
plt.plot(train_losses_2, label="LSTM Losses")
plt.plot(train_losses_3, label="Bi-LSTM Losses")
plt.legend()
plt.show()


Note here that in the Bidirectional case, you may have gotten either a marginal increase or a slight decrease in accuracy, depending on how your sequences were generated. While this example was specifically designed to make RNNs and LSTMs shine, we hope this gives you an appreciation for the kinds of problems we can solve using recurrent networks, and how improvements can be made to the simple RNN architecture. Now, we will try to use RNNs to tackle a much harder problem: stock price classification!

## Using Bi-GRUs for price movement classification

For the purposes of this assignment, we will focus on training a classifier for stocks from the S&P 500. The goal of our classifier is as follows:
We are interested in training a bidirectional RNN model that learns a relationship between news taglines related to the stocks $\{l_i\}$ that we have selected and the prices of those stocks. Define $p_i^{(t)}$ to be the price of stock $l_i$ on day $t$. Then, we can formally define our objective as follows:

Let $y_i^{(t)} = \begin{cases} 1 & \log(p_i^{(t)}) \geq \log(p_i^{(t - 1)}) \\ 0 & \log(p_i^{(t)}) < \log(p_i^{(t - 1)}) \end{cases}$. Suppose our dataset $D = \{N^{(t)}\}_{t_{in} \leq t \leq t_f}$, where $N^{(t)}$ is a collection of all the articles from day $t$ and $t_{in}$ and $t_f$ represent the dates of the earliest and latest articles in our dataset respectively. Then, we want to learn a mapping $\hat y_i^{(t)} = f(N^{(t - \mu)} \cup \ldots \cup N^{(t)})$ such that $\hat y_i^{(t)}$ accurately predicts $y_i^{(t)}$. More specifically, as is often the case with classification problems, we want to minimize the loss function given by the mean time-series cross-entropy loss:
$$\mathcal{L}_i = \frac{-1}{t_f - t_{in}} \sum_{t = t_{in}}^{t_f} \big(y_i^{(t)} \log \hat y_i^{(t)} + (1 - y_i^{(t)}) \log (1 - \hat y_i^{(t)})\big)$$
Here, we choose to use $\mu = 4$, so we aim to classify the price movement of stock $l_i$ on day $t$, given by $y_i^{(t)}$, using news information from days $[t-4, t]$, i.e., articles $\{N^{(t - 4)}, N^{(t - 3)}, N^{(t - 2)}, N^{(t - 1)}, N^{(t)}\}$. Notice that we are including information from day $t$, so we are not *predicting* the price movement but rather identifying a relationship between the stock price movement and the information contained in the news taglines from day $t$ and the previous 4 days.
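To make the loss concrete, here is a small NumPy sketch that evaluates the binary cross-entropy on four made-up days. Two details are assumptions of the sketch rather than part of the definition above: it averages over the number of terms in the sum, and it clips predictions with a small `eps` so that the logarithm is never taken at $0$.

```python
import numpy as np

def mean_binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y log(y_hat) + (1 - y) log(1 - y_hat)) over all days."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log(0) never occurs (eps is an assumed safeguard)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    per_day = y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
    return float(-per_day.mean())

# Made-up labels and predicted probabilities for four days
y_true = [1, 1, 0, 1]
y_hat = [0.8, 0.6, 0.3, 0.9]

loss = mean_binary_cross_entropy(y_true, y_hat)
print(f"{loss:.4f}")  # 0.2990
```

A confident wrong prediction is penalized heavily: replacing the last prediction with $0.1$ raises that day's term from $-\log 0.9$ to $-\log 0.1$.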

## Generating word embeddings

The code below loads word embeddings that we have pre-generated for 15 stocks from the S&P 500. We used news tagline data from Reuters (data sourced from https://github.com/vedic-partap/Event-Driven-Stock-Prediction-using-Deep-Learning/blob/master/input/news_reuters.csv) to create word embeddings for all of the articles in our dataset using a pretrained spaCy encoder and a Word2Vec model that we trained on our data (don't worry if you don't know what this means yet). Our dataset contains news articles from 2011 to 2017, so we should have enough data to build a fairly accurate classifier. You will explore algorithms for generating word embeddings in more detail later in the course, but for this assignment we have done the work for you so that you can focus on building RNN models for your stock movement classifier.

For the purposes of our classifier, we are focusing on the 15 stocks from the Reuters dataset for which we have the most data, i.e., news articles.

<br>

The main idea is to convert all of the qualitative textual information that we have in each article tagline into a quantitative feature that we can use when training our classifier. Let $s_i \in \mathbb{R}^{64}$ represent the stock embedding that we are trying to learn for stock $l_i$. We then define the following quantities:

Let $n_i^{(t)}$ be a news article from day $t$, for some $1 \leq i \leq |N^{(t)}|$. We associate an embedding vector $K_i^{(t)} \in \mathbb{R}^{64}$ with each article $n_i^{(t)}$, which we have computed for you below.

In [None]:
data = pd.read_csv("embeddings.csv")
cols_to_include = ["Date", "Ticker", "Headline", "Tagline"] + ["K{}".format(i) for i in range(64)]
data = data[cols_to_include]
data

Here, each row represents a different news article and is associated with one of the top 15 stocks that we are interested in for our classifier: <br>
`['AAPL', 'AMZN', 'BA', 'BCS', 'BP', 'C', 'DB', 'GM', 'GS', 'HSEA', 'HSEB', 'JPM', 'MSFT', 'MS', 'TAPR']`.

Additionally, the columns `[K0, ..., K63]` represent the components of the $K_i^{(t)}$ embedding vector for each article $n_i^{(t)}$.

## Building a Bi-GRU price movement classifier

We define $score(n_i^{(t)}, s_j) = K_i^{(t)} \cdot s_j$ and the softmax variable $$\alpha_i^{(t)} = \frac{\exp(score(n_i^{(t)}, s_j))}{\sum_{n_k^{(t)} \in N^{(t)}}\exp(score(n_k^{(t)}, s_j))}$$

Finally, we define the market status of stock $l_j$ on day $t$, given by $m_j^{(t)} = \sum_{n_i^{(t)} \in N^{(t)}} \alpha_i^{(t)} V_i^{(t)}$. This is the input to the classifier that you will build and train on the dataset to learn the stock embeddings $\{s_j\}$.
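The score, softmax, and market status definitions can be traced for a single day with a few lines of NumPy. Everything in this sketch is a random stand-in: `K` plays the role of the day's key vectors, `s` the stock embedding, and `V` a set of per-article vectors assumed here to have the same dimension as the keys.

```python
import numpy as np

rng = np.random.default_rng(0)
n_articles, dim = 6, 64

K = rng.normal(size=(n_articles, dim))  # key vectors K_i^{(t)}, one row per article
V = rng.normal(size=(n_articles, dim))  # placeholder vectors V_i^{(t)} (assumed shape)
s = rng.normal(size=dim)                # stock embedding s_j (random stand-in)

# score(n_i, s_j) = K_i . s_j, computed for every article at once
scores = K @ s

# Softmax over the day's articles; subtracting the max leaves the result
# unchanged but avoids overflow in exp
shifted = scores - scores.max()
alpha = np.exp(shifted) / np.exp(shifted).sum()

# Market status: attention-weighted sum of the per-article vectors
m = alpha @ V

print(alpha.shape, m.shape)  # (6,) (64,)
```

Articles whose key vectors align with the stock embedding receive larger weights, so they contribute more to the market status vector.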

Here, we will go through the process of building, training, and tuning our model for a single stock. After this is done, we can easily repeat the process for other stocks.

### 1) a) Data processing

In [None]:
## do it for one stock, AAPL
aapl = data[data['Ticker'] == 'AAPL']
len(aapl)

In [None]:
## set kappa to be max number of articles for a given day
kappa = #TODO: fill in

In [None]:
## remove dates that have < 4 articles and set keep_indices to be the list of indices in aapl.index corresponding
## to days with >= 4 articles
keep_indices = #TODO: fill in

In [None]:
aapl_processed = aapl.loc[keep_indices]
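The counting and filtering steps above can be tried out on a toy frame first. This is only a sketch: the column names mirror the notebook's data, the integer date format is an assumption, and the names prefixed with `toy_` are illustrative.

```python
import pandas as pd

# Toy data: 5 articles on the first day, 2 on the second, 4 on the third
toy = pd.DataFrame({
    "Date": [20110103] * 5 + [20110104] * 2 + [20110105] * 4,
    "Headline": [f"headline {i}" for i in range(11)],
})

# Number of articles published on each row's date, aligned with toy.index
articles_per_day = toy["Date"].map(toy["Date"].value_counts())

# Largest number of articles on any single day
toy_kappa = int(articles_per_day.max())

# Index labels of rows that fall on days with at least 4 articles
toy_keep = toy.index[articles_per_day >= 4]
toy_processed = toy.loc[toy_keep]

print(toy_kappa)                                # 5
print(sorted(toy_processed["Date"].unique()))
```

Selecting with `.loc` works whether the selector is a list of index labels or a boolean mask, whereas plain `[]` indexing with a list of labels would be interpreted as a column selection.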

In [None]:
sorted_dates  = sorted(aapl_processed['Date'].unique())
num_sequences = len(sorted_dates[4:])
num_sequences

Now that we have processed our data to include only robust inputs, let's do a quick refresher of what your initial input to the neural network is supposed to look like, and what dimensions it will have. Our key vectors for day $t$ are  $K_i^{(t)} \in \mathbb{R}^{64}$, and we have at most $\kappa$ articles per day. Thus, for any given day $t$, we can treat the input as $\begin{bmatrix} K_1^{(t)} & \cdots & K_\kappa^{(t)} \end{bmatrix} \in \mathbb{R}^{64 \times \kappa}$. Since our network uses five market vectors ($m^{(t - 4)}, \ldots, m^{(t)}$) for predicting stock price movement on any given day, we must pass in a sequence of $\kappa \cdot 5$ key vectors. 

So each input looks like $\begin{bmatrix} K_1^{(t - 4)} & \cdots & K_\kappa^{(t - 4)} & \cdots \cdots & K_1^{(t)} & \cdots & K_\kappa^{(t)} \end{bmatrix} \in \mathbb{R}^{64 \times 5\kappa}$. Then, assuming we have $k$ such datapoints (or in our case, $5$ day sequences) in our training dataset, our input is thus:
$\begin{bmatrix} 
K_1^{(t_1 - 4)} & \cdots & K_\kappa^{(t_1 - 4)} & \cdots \cdots & K_1^{(t_1)} & \cdots & K_\kappa^{(t_1)} \\
\vdots & & \vdots & & \vdots & & \vdots \\
K_1^{(t_k - 4)} & \cdots & K_\kappa^{(t_k - 4)} & \cdots \cdots & K_1^{(t_k)} & \cdots & K_\kappa^{(t_k)}
\end{bmatrix} \in \mathbb{R}^{64k \times 5\kappa}$
where each row represents a different sequence of $5$ days for the stock key vectors.
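A small sketch of this layout, using random key vectors in place of the real embeddings, may help before filling in `X_in`. It assumes that days with fewer than $\kappa$ articles are padded with zero columns, which the text above does not specify; the dictionary `keys_by_day` and the `toy_` names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, toy_kappa = 64, 3

# Hypothetical stand-in data: day -> key vectors of shape (n_articles, 64)
toy_days = [20110103, 20110104, 20110105, 20110106, 20110107, 20110110]
keys_by_day = {
    day: rng.normal(size=(rng.integers(1, toy_kappa + 1), dim)) for day in toy_days
}

toy_k = len(toy_days) - 4  # one sequence for every day that has 4 earlier days
X_toy = np.zeros((dim * toy_k, 5 * toy_kappa))

for seq in range(toy_k):
    window = toy_days[seq : seq + 5]  # days t-4, ..., t for this sequence
    for d, day in enumerate(window):
        day_keys = keys_by_day[day]   # shape (n_articles, 64)
        n = day_keys.shape[0]
        # Each key vector fills one column of the 64-row block for this sequence;
        # unused columns stay zero (assumed padding)
        X_toy[dim * seq : dim * (seq + 1), d * toy_kappa : d * toy_kappa + n] = day_keys.T

print(X_toy.shape)  # (128, 15)
```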

In [None]:
k = num_sequences
X_in = np.zeros((64*k, 5*kappa))

In [None]:
#TODO: fill in values for the elements of X_in, as described above        
    

In [None]:
X_in

In [None]:
X_in.shape

### 1) b) Generating classifier labels

The classifier aims to predict price movement. In this notebook, we define movement in terms of log returns: the difference between the log prices on a given day $t$ and the preceding day $t-1$. Because the dataset does not contain a perfectly contiguous sequence of days, we fall back to the next best approximation when day $t-1$ is unavailable (i.e. $t-2$, if it is available, in place of $t-1$). Using these log returns, we binarize price movement: if the return on day $t$ is $> 0$, then $y^{(t)} = 1$; otherwise $y^{(t)} = 0$.

You can find historical stock price data on http://finance.yahoo.com, and use close prices to calculate log returns.

For non-trading days, feel free to use any method to fill in the missing data - we generated labels for missing days by doing a coin toss (while this may seem arbitrary, note that the stock market itself moves in what seems like arbitrarily random directions). 
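As a quick illustration of the labelling rule, the sketch below computes log returns for five made-up closing prices and binarizes them. Note that a day with an unchanged price has a log return of exactly $0$ and therefore receives label $0$ under the strict inequality.

```python
import numpy as np

# Made-up closing prices for five consecutive trading days
close = np.array([100.0, 101.0, 101.0, 99.0, 102.0])

# Log return on day t is log(close_t) - log(close_{t-1})
log_returns = np.diff(np.log(close))

# Label 1 for a strictly positive return, 0 otherwise
toy_labels = (log_returns > 0).astype(int)

print(toy_labels.tolist())  # [1, 0, 0, 1]
```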

In [None]:
prices = pd.read_csv('AAPL.csv')
prices['log_returns'] = np.log(prices['Close']) - np.log(prices['Close'].shift(1))

In [None]:
#change date format
prices['Date'] = prices['Date'].apply(lambda x: int(x.replace('-', '')))

In [None]:
prices.head()

In [None]:
#generate labels for each of the k sequences for the AAPL stock
labels = []
for date in sorted_dates[4:]:
    log_ret = prices[prices['Date'] == date]['log_returns'].values
    
    if (len(log_ret) == 0): 
        labels.append(np.random.choice([1, 0]))
        continue
    
    if (log_ret[0]) > 0: 
        labels.append(1)
    else: 
        labels.append(0)
    

### 2) Building the classifier model

Before building our main classifier, recall that we need to do some preprocessing to get from our input above to the market vectors that are being used in the classifier. The documentation for tensorflow and Keras will help a lot with some of the manipulations required for this section. In particular, the documentation for `Dense` layers, `GRU` layers, `Activation` layers, `Bidirectional` layers, and `keras.activations.softmax` (which we have imported for you) may be helpful. The outline of the necessary preprocessing steps is as follows:

1. We want our input layer to be a tensor with shape $(k, 5, 64 \kappa)$ <br>
See how you can modify `X_in` $= \begin{bmatrix} 
K_1^{(t_1 - 4)} & \cdots & K_\kappa^{(t_1 - 4)} & \cdots \cdots & K_1^{(t_1)} & \cdots & K_\kappa^{(t_1)} \\
\vdots & & \vdots & & \vdots & & \vdots \\
K_1^{(t_k - 4)} & \cdots & K_\kappa^{(t_k - 4)} & \cdots \cdots & K_1^{(t_k)} & \cdots & K_\kappa^{(t_k)}
\end{bmatrix} \in \mathbb{R}^{64k \times 5\kappa}$ from above to achieve this.
<br>

2. By treating the stock embedding $s$ as a weight from the input layer to the first hidden layer, generate layer 1 of $score$ values of shape $(k, 5, \kappa)$. We can do this because $score_i^{(t)}$ is defined as $K_i^{(t)} \cdot s$, which aligns with the way weights act in neural networks. <br>
*Hint:* Remember that we use `Dense` layers in Keras to represent a regular feedforward layer. 
<br>

3. Apply softmax activations appropriately to generate layer 2 with the $\alpha$ values, also of shape $(k, 5, \kappa)$. <br>
*Hint:* Remember, we define $\alpha_i^{(t)} = \displaystyle \frac{\exp(score_i^{(t)})}{\sum_{j \in [\kappa]}\exp(score_j^{(t)})}$, where the sum in the denominator runs over all $\kappa$ articles, including article $i$ itself. You can also combine this step with the previous one if you wish.
<br>

4. Now, implement the Bi-GRU classifier model with $1024$ recurrent units. We will experiment with the number of units later when we conduct our hyperparameter tuning. Remember that for each sequence of 5 vectors that we input to the GRU, we are only interested in the prediction for the last one. That is, given $[\alpha_1^{(t - 4)}, \ldots, \alpha_\kappa^{(t - 4)}], \ldots, [\alpha_1^{(t)}, \ldots, \alpha_\kappa^{(t)}]$, we only care about what the GRU predicts for day $t$ since that is the movement that we are trying to classify.
<br>

5. Finally, apply a dense layer with a sigmoid activation to get a single value $\hat y^{(t)}$ from the GRU output.
<br>

6. Train your model for $10$ epochs against the cross-entropy loss and report the train and validation loss after each epoch.
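Steps 1 to 3 of the outline can be prototyped in plain NumPy to check the shapes before writing any Keras code. In this sketch the input is random, `W` and `b` are random stand-ins for the weights of a dense layer with $\kappa$ outputs, and the `toy_` sizes are chosen only to keep the arrays small.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_k, toy_kappa, dim = 4, 3, 64

# Random stand-in for X_in, laid out as a (64k, 5*kappa) matrix
X_in_toy = rng.normal(size=(dim * toy_k, 5 * toy_kappa))

# Step 1: regroup so that each day holds its kappa key vectors back to back
X_mod_toy = (
    X_in_toy.reshape(toy_k, dim, 5, toy_kappa)  # (sequence, component, day, article)
    .transpose(0, 2, 3, 1)                      # (sequence, day, article, component)
    .reshape(toy_k, 5, dim * toy_kappa)         # (k, 5, 64*kappa)
)

# Step 2: a dense layer with kappa outputs has a (64*kappa, kappa) weight matrix
W = rng.normal(scale=0.1, size=(dim * toy_kappa, toy_kappa))
b = np.zeros(toy_kappa)
scores = X_mod_toy @ W + b                      # (k, 5, kappa)

# Step 3: softmax over the article axis, separately for each day
shifted = scores - scores.max(axis=-1, keepdims=True)
alpha = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

print(X_mod_toy.shape, scores.shape, alpha.shape)  # (4, 5, 192) (4, 5, 3) (4, 5, 3)
```

The reshape and transpose only move values around: the key vector of article $a$ on day $d$ of sequence $s$ ends up in `X_mod_toy[s, d, 64*a : 64*(a+1)]`.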

In [None]:
from keras import backend as K
from keras.layers import Activation, Input, Dense, GRU, Bidirectional, Lambda, Concatenate
from keras.models import Model
from keras.activations import softmax

#### Step 1

In [None]:
#transform the input X_in from above into an appropriate tensor of shape (k, 5, 64kappa)

X_in_mod = np.zeros((k, 5, 64 * kappa))

#TODO: fill in values for elements of X_in_mod

In [None]:
#generate train, test, and validation sets from X_in_mod and labels. Use an 80-10-10 split for train-val-test.

X_train, X_val, y_train, y_val = #TODO: fill in
X_val, X_test, y_val, y_test = #TODO: fill in
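For reference, the arithmetic of an 80-10-10 split can be sketched with NumPy index arrays. This is an alternative illustration rather than the intended solution (the cell above suggests two calls to `train_test_split`), and the arrays are random stand-ins for `X_in_mod` and `labels`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X_all = rng.normal(size=(n, 5, 8))  # toy stand-in for X_in_mod
y_all = rng.integers(0, 2, size=n)  # toy stand-in for labels

# Shuffle the sequence indices, then cut them at the 80% and 90% marks
order = rng.permutation(n)
n_train = int(0.8 * n)
n_val = int(0.1 * n)
train_idx, val_idx, test_idx = np.split(order, [n_train, n_train + n_val])

X_tr, y_tr = X_all[train_idx], y_all[train_idx]
X_va, y_va = X_all[val_idx], y_all[val_idx]
X_te, y_te = X_all[test_idx], y_all[test_idx]

print(len(train_idx), len(val_idx), len(test_idx))  # 40 5 5
```

Because consecutive 5-day windows share days, a chronological split without shuffling is also worth considering if you want the test set to be strictly later than the training set.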

In [None]:
#TODO: create an input layer of shape (None, 5, 64kappa)


#### Step 2 and 3

In [None]:
#TODO: implement and add layer 1 and layer 2


#### Step 4

In [None]:
#TODO: add a bidirectional GRU with 1024 units


#### Step 5

In [None]:
#TODO: create the output layer


#### Step 6

In [None]:
#TODO: compile and train your model for 10 epochs


### 3) Hyperparameter tuning

In this section, we will treat the number of GRU units and number of training epochs as hyperparameters and attempt to find optimal values for them.

First, make a plot of the log of train accuracy and validation accuracy against number of epochs for the model that we have built. Also plot the log of train loss and validation loss against number of epochs. Consider number of epochs in the range $[1, 20]$.

In [None]:
num_epochs = 20
    

train_losses = #TODO: fill in
val_losses = #TODO: fill in
train_accuracies = #TODO: fill in
val_accuracies = #TODO: fill in

plt.figure()
plt.title("Accuracies vs Epochs")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.plot(train_accuracies, label="Train")
plt.plot(val_accuracies, label="Val")
plt.legend()
plt.show()

plt.figure()
plt.title("Losses vs Epochs")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.plot(train_losses, label="Train")
plt.plot(val_losses, label="Val")
plt.legend()
plt.show()

Based on these plots, suggest a good value for the number of epochs by filling out the cell below.

In [None]:
num_epochs = #TODO: fill in
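One common heuristic, offered here only as a starting point, is to pick the epoch at which the validation loss is lowest, since later epochs mostly improve the fit to the training set. The loss values below are made up for illustration.

```python
import numpy as np

# Made-up validation losses for epochs 1 through 7
val_losses_demo = [0.71, 0.69, 0.68, 0.675, 0.68, 0.70, 0.73]

# Epochs are counted from 1, list positions from 0
best_epoch = int(np.argmin(val_losses_demo)) + 1

print(best_epoch)  # 4
```

If the curve is noisy, it can help to look at the overall trend in your plot rather than a single minimum.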

Using the optimal value for the number of epochs that you found above, train models with varying numbers of GRU units in the Bi-GRU layer. Make a plot of train accuracy and validation accuracy against number of GRU units and a plot of train loss and validation loss against number of GRU units. Use `[100, 200, 500, 1000, 2000]` as the values for the number of units that you are going to loop over.

In [None]:
unit_vals = [100, 200, 500, 1000, 2000]

train_losses = []
val_losses = []
train_accuracies = []
val_accuracies = []

#TODO: fill in train_losses, val_losses, train_accuracies, and val_accuracies
    
plt.figure()
plt.title("Accuracies vs Num GRU Units")
plt.xlabel("Num GRU Units")
plt.ylabel("Accuracy")
plt.plot(unit_vals, train_accuracies, label="Train")
plt.plot(unit_vals, val_accuracies, label="Val")
plt.legend()
plt.show()

plt.figure()
plt.title("Losses vs Num GRU Units")
plt.xlabel("Num GRU Units")
plt.ylabel("Loss")
plt.plot(unit_vals, train_losses, label="Train")
plt.plot(unit_vals, val_losses, label="Val")
plt.legend()
plt.show()

Based on these plots, suggest a good value for the number of GRU units by filling out the cell below.

In [None]:
num_GRU_units = #TODO: fill in

### 4) Evaluate

Finally, train a model with the hyperparameter values that you have obtained and evaluate it on the test set! Report the test loss and accuracy.

In [None]:
#build, compile, and fit your model on X_train, y_train
#TODO

#evaluate your model on X_test, y_test and report the loss and accuracy
#TODO

print()

print("Test loss = ", loss)
print("Test accuracy = ", acc)

As you can see, our model is not doing particularly well on the test set, so this method is probably not going to be very useful for price movement classification as it currently stands. Of course, there could be several reasons for the subpar performance, including but not limited to: insufficient training data, poor choice of model, poor quality of word embeddings, and bad choice of sequence length (5 days). Or perhaps stock price classification/prediction isn't a viable goal, regardless of which neural network architecture we choose (this view is known as the Random Walk Hypothesis for stock markets). Despite this unfortunate outcome, we hope that you learned something about RNNs and how they can be applied to a field like finance. Feel free to play around with these parameters after completing the assignment to see if you can get a model worthy of use in the quantitative finance industry!

### 5) Extract stock embeddings

Although classification didn't work, let's see if the stock embeddings that we learned are of any use. Remember that the stock embeddings are the weights being used in Layer 1. The weights in Layer 1 are of shape $(64 \kappa, \kappa)$ but we are interested in obtaining a single vector $s \in \mathbb{R}^{64}$ for the stock embedding. So if the weights in Layer 1 are $\begin{bmatrix} s[1, 1] & \ldots & s[1, \kappa] \\ \vdots & & \vdots \\ s[\kappa, 1] & \ldots & s[\kappa, \kappa] \end{bmatrix}$, define the stock embedding $s$ to be the mean of $s[1, 1], \ldots, s[1, \kappa], \ldots, s[\kappa, 1], \ldots, s[\kappa, \kappa]$. Compute this mean below.

In [None]:
s_mean_aapl = np.zeros(64)

s_mat = model.layers[1].get_weights()[0]
col_mean = np.zeros(64 * kappa)
for i in range(0, 64 * kappa, 64):
    col_mean[i : i + 64] = sum([s_mat[i : i + 64, j] for j in range(kappa)]) / kappa
for i in range(0, 64 * kappa, 64):
    s_mean_aapl += col_mean[i : i + 64] / kappa
s_mean_aapl
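The double loop above averages all $\kappa^2$ blocks of length $64$ in the weight matrix. The sketch below checks, on a random stand-in for the weights, that a single reshape followed by a mean gives the same vector; the `toy_` names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_kappa, dim = 3, 64

# Random stand-in for the layer 1 weights, shape (64*kappa, kappa)
s_mat_toy = rng.normal(size=(dim * toy_kappa, toy_kappa))

# Loop version: average each 64-row block over its columns, then over blocks
s_mean_loop = np.zeros(dim)
for i in range(0, dim * toy_kappa, dim):
    block_mean = s_mat_toy[i : i + dim, :].mean(axis=1)
    s_mean_loop += block_mean / toy_kappa

# Vectorized version: axes are (block, component, column); average out
# the block and column axes in one step
s_mean_vec = s_mat_toy.reshape(toy_kappa, dim, toy_kappa).mean(axis=(0, 2))

print(np.allclose(s_mean_loop, s_mean_vec))  # True
```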

## Classify some more stocks!

So classifying the stock price movement for `AAPL` didn't go so well but maybe that just has something to do with Apple's stock price being unpredictable. Let's try the next stock in our dataset: another big tech company, `AMZN`. We have loaded the data for you below and you can refer to the previous sections for help with implementing and training the model. Do an 80-10-10 train, validation, and test split and report the test accuracy and loss as before.

In [None]:
prices_amzn = pd.read_csv('AMZN.csv')

In [None]:
## do it for one stock, AMZN
amzn = data[data['Ticker'] == 'AMZN']
len(amzn)

In [None]:
#TODO: build, compile, and fit the model for classifying price movement for AMZN

In [None]:
#TODO: evaluate on the test set and report test accuracy and loss

Note that classifying Amazon seems to give similar results to what we had previously for Apple. However, let's take a look at the stock embedding for Amazon and see whether we were able to encode any of the stock's price volatility into the vector. We will also compare it to the embedding we got for Apple; specifically, we want to see whether the embeddings capture the price movement correlation between the two stocks. This information is also important for building optimized portfolios that are 'hedged' against price volatility risk.

Report the stock embedding for Amazon below:

In [None]:
s_mean_amzn = np.zeros(64)

#TODO: compute s_mean for AMZN

Usually, for portfolio risk calculations, we use historical price returns to calculate the price movement correlation between two (or more) stocks. Let's see if our stock embeddings are a viable substitute for this! Typically, we compare the similarity between two embeddings by taking the dot product - calculate the dot product of the `AMZN` vector and the `AAPL` vector below: 

In [None]:
#TODO: compute the dot product between the stock embeddings for AAPL and AMZN
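The sketch below shows the comparison on two random stand-in vectors rather than the learned embeddings. It also computes the cosine similarity, which is an addition not required by the assignment: it divides the dot product by both norms, so it always lies in $[-1, 1]$ and is easier to read as a correlation-like quantity.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_a = rng.normal(size=64)  # stand-in for the AAPL embedding
emb_b = rng.normal(size=64)  # stand-in for the AMZN embedding

# Raw dot product: its sign indicates whether the vectors point in broadly
# the same direction, but its size depends on the vector lengths
dot = float(np.dot(emb_a, emb_b))

# Cosine similarity: the dot product normalized by both vector lengths
cosine = dot / float(np.linalg.norm(emb_a) * np.linalg.norm(emb_b))

print(dot, cosine)
```

The two quantities always share the same sign, so either one answers the question of whether the embeddings suggest a positive or a negative relationship.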


Are the stock prices positively or negatively correlated? Does this match the actual price movement graph from Yahoo Finance pictured below?

![image.png](attachment:image.png)

## References

- How to Develop a Bidirectional LSTM For Sequence Classification in Python with Keras: <br> https://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classification-python-keras/

- Tensorflow vs PyTorch for Text Classification using GRU: <br>
https://medium.com/swlh/tensorflow-vs-pytorch-for-text-classification-using-gru-e95f1b68fa2d

- Yahoo Finance: <br>
https://finance.yahoo.com

- Reuters News Dataset: <br>
https://github.com/vedic-partap/Event-Driven-Stock-Prediction-using-Deep-Learning/blob/master/input/news_reuters.csv

- Stock embeddings acquired from news articles and price history, and an application to portfolio optimization: <br>
https://www.aclweb.org/anthology/2020.acl-main.307