# Creating a neural network to predict AAPL red or green days
This version considers the following daily features:
* price at open
* highest traded price on day
* lowest traded price on day
* price at close (raw)
* adjusted close (accounting for corporate actions such as splits and dividends)
* trading volume on day

**Goal**: Given the above features of a security for the 10 previous trading days, we want the neural network to predict whether the current day is red or green. In this model, a green day is strictly an upward move in the closing price relative to the previous close; a flat or down day counts as red.


## Get data
Downloading historical data from Yahoo Finance

In [1]:
import yfinance as yf

We consider market data from 2014-2018

In [36]:
start_date = '2014-01-01'
end_date = '2018-12-31'
aapl = yf.download('AAPL', start=start_date, end=end_date, progress=False)

### Preprocessing data

In [37]:
import numpy as np
from sklearn import preprocessing

In [38]:
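num_prev_days = 10

def create_training_data(df):
    """Build [window, label] pairs: num_prev_days of features -> next day's direction."""
    prepped = []
    # Positional access to closes, independent of the date index
    close = df['Close'].to_numpy().ravel()
    # Note: the "- 1" in the bound leaves out the last possible window
    for i in range(0, len(df) - num_prev_days - 1):
        # normalize() rescales each row (one day) to unit L2 norm
        normed = preprocessing.normalize(df.iloc[i:i + num_prev_days])
        delta = close[i + num_prev_days] - close[i + num_prev_days - 1]
        # One-hot label: [1, 0] = green (close rose), [0, 1] = red (flat or down)
        result = [1, 0] if delta > 0 else [0, 1]
        prepped.append([normed.tolist(), result])
    return prepped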
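
A note on the normalization step: by default, `preprocessing.normalize` rescales each row (here, one trading day) to unit Euclidean length. The sketch below reproduces that row-wise scaling in plain NumPy on a single made-up day to show how the scaled values relate to each other. The helper name `l2_normalize_rows` and the numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def l2_normalize_rows(a):
    """Rescale each row to unit Euclidean (L2) length."""
    a = np.asarray(a, dtype=float)
    norms = np.linalg.norm(a, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return a / norms

# Hypothetical day: open, high, low, close, adjusted close, volume
day = np.array([[100.0, 102.0, 99.0, 101.0, 95.0, 50_000_000.0]])
normed = l2_normalize_rows(day)
print(normed)
```

Because volume is several orders of magnitude larger than the prices, it takes up almost all of the unit length and the five price columns are squeezed toward zero. Scaling each column separately would be one way to avoid this, if that turns out to matter for the model.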

In [39]:
training_data = create_training_data(aapl)
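
To make the window and label construction concrete, here is a minimal sketch on a toy list of closing prices with a window of 3 instead of 10. The function `window_labels` and the prices are hypothetical and not part of the notebook; unlike the loop bound in `create_training_data`, this sketch keeps every window whose target day exists.

```python
import numpy as np

def window_labels(close, window):
    """Return (start, target, label) per window; label 1 = green, 0 = red."""
    close = np.asarray(close, dtype=float)
    out = []
    for i in range(len(close) - window):
        target = i + window  # the day being predicted
        # Green only if the close is strictly above the previous close
        green = close[target] > close[target - 1]
        out.append((i, target, int(green)))
    return out

toy_close = [10.0, 11.0, 11.0, 10.5, 12.0, 12.5]
pairs = window_labels(toy_close, window=3)
print(pairs)  # [(0, 3, 0), (1, 4, 1), (2, 5, 1)]
```

Each window covers days `start` through `target - 1`, and the label only compares the target day's close with the close on the last day of the window.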

### Randomize

In [40]:
import random
random.shuffle(training_data)

In [41]:
X = []
y = []
for features, label in training_data:
    X.append(features)
    y.append(label)

## Creating model

In [42]:
import tensorflow as tf
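# Convert the nested lists to arrays: X is (samples, 10, features), y is (samples, 2)
x_train = np.array(X)
y_train = np.array(y)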

### Layers
* Input: flatten
* Hidden: 1 layer, 10 neurons, rectified linear unit activation function
* Output: softmax

Notes: I used the same configuration as my last build on XLY.
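
As a rough size check on this architecture, the number of trainable parameters can be counted by hand. This sketch assumes the downloaded frame has exactly the six columns listed at the top, so each flattened sample has 10 × 6 = 60 values; `dense_params` is a hypothetical helper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

num_prev_days = 10
num_features = 6  # assumption: open, high, low, close, adjusted close, volume

flat = num_prev_days * num_features  # size after the Flatten layer
hidden = dense_params(flat, 10)      # 60 * 10 + 10
output = dense_params(10, 2)         # 10 * 2 + 2
total = hidden + output

print(flat, hidden, output, total)   # 60 610 22 632
```

If the data source returns a different number of columns, `num_features` and the totals change accordingly.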

In [43]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(10, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(2, activation=tf.nn.softmax))

### Optimizer and loss function
* adam - This is a stochastic gradient descent method based on an "adaptive estimation of first-order and second-order moments." (I may write a walkthrough on SGD. Given a random variable $X$ and integer $k>0$, the $k$-th moment is $\mathbb{E}(X^k)$.)
* binary crossentropy - This loss function is useful in binary classification. I am not using it to its full potential in this model, but binary crossentropy can be very helpful when we wish to train multiple binary classifiers.
    * The specific formula for calculating loss is given below, where $N$ is the output size, $y_i$ is the true label and $\hat{y}_i$ is the predicted probability for output $i$.
    * $\mathrm{Loss} = - \frac{1}{N} \sum_{i=1}^{N} \left[ y_i \cdot \log \hat{y}_i + (1-y_i) \cdot \log (1-\hat{y}_i) \right]$
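
The loss formula above can be checked on a few hand-picked predictions. This is a minimal NumPy sketch for a single sample, not the Keras implementation; the `eps` clipping is an assumption added here to keep the logarithm finite.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary crossentropy over the outputs of one sample."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 so log() stays finite (assumed guard)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    terms = y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
    return -terms.mean()

# True label is "green" = [1, 0] in each case
coin_flip = binary_crossentropy([1, 0], [0.5, 0.5])
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(coin_flip, confident_right, confident_wrong)
```

A prediction of 0.5 for both classes scores $\ln 2 \approx 0.693$, which is a useful reference point: the evaluation losses reported later in this notebook (0.6918 and 0.6883) sit close to that value.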

In [44]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7fa5ca5630a0>

## Evaluating model from prices on more recent days
### Evaluation 1
Using data from 2020-01-01 up to the day the notebook was run

In [45]:
import yfinance as yf
from datetime import date
# from dateutil.relativedelta import relativedelta

In [46]:
new1_aapl = yf.download('AAPL',
                        start='2020-01-01',
                        end=date.today(),
                        progress=False)

In [47]:
val1 = create_training_data(new1_aapl)
X_eval1 = []
y_eval1 = []
for features, label in val1:
    X_eval1.append(features)
    y_eval1.append(label)

In [48]:
# Pass arrays rather than nested lists to evaluate()
val1_loss, val1_acc = model.evaluate(np.array(X_eval1), np.array(y_eval1))
print(val1_loss)
print(val1_acc)

0.6918272376060486
0.5274389982223511


### Evaluation 2
Using data from 2019

In [49]:
new2_aapl = yf.download('AAPL',
                        start='2019-01-01',
                        end='2020-01-01',
                        progress=False)

In [50]:
val2 = create_training_data(new2_aapl)
X_eval2 = []
y_eval2 = []
for features,label in val2:
    X_eval2.append(features)
    y_eval2.append(label)

In [51]:
# Pass arrays rather than nested lists to evaluate()
val2_loss, val2_acc = model.evaluate(np.array(X_eval2), np.array(y_eval2))
print(val2_loss)
print(val2_acc)

0.6883175373077393
0.5767635107040405
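
One way to put these accuracies in context is to compare them with the accuracy of always predicting the more common class in the same period. The sketch below computes that baseline from one-hot labels; `majority_baseline` and the toy labels are hypothetical, and the same function could be applied to `y_eval1` and `y_eval2`.

```python
import numpy as np

def majority_baseline(y_onehot):
    """Accuracy of always predicting the most frequent class."""
    y = np.asarray(y_onehot)
    counts = y.sum(axis=0)  # number of samples per class
    return counts.max() / counts.sum()

# Toy labels: three green days ([1, 0]) and one red day ([0, 1])
toy_labels = [[1, 0], [1, 0], [0, 1], [1, 0]]
baseline = majority_baseline(toy_labels)
print(baseline)  # 0.75
```

A model accuracy is only informative to the extent that it beats this number on the same evaluation set.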


### Conclusions
Considering this past year's technology market, the weaker result on the most recent data (accuracy of about 0.527) was not unexpected. Evaluating on an earlier year, 2019, gave a better accuracy of about 0.577. Comparing this with my previous work on XLY, I still believe increasing the dimensionality of the data will improve neural network models. In future builds, I want to add features that are not directly related to price. With that increased dimensionality, I will also study PCA to help with the preprocessing.

### Future steps
I am currently looking at dynamic pricing models to study different perspectives on how we should consider securities. Additionally, I am interested in the portfolio management problem, and how neural networks may help us choose our actions.