# Creating a neural network to predict XLY red or green days
Note: this version considers only the closing price.

Goal: given the closing prices of a security for the 10 previous trading days, the neural network should predict whether the current day is red or green. In this model, a green day is one whose close is strictly higher than the previous close; an unchanged close counts as red.
## Get your data
We process the data from Kaggle: https://www.kaggle.com/borismarjanovic/price-volume-data-for-all-us-stocks-etfs

In [14]:
import numpy as np
import pandas as pd
import matplotlib as mpl
from sklearn.preprocessing import MinMaxScaler

In [15]:
source_link = "https://storage.googleapis.com/kagglesdsdata/datasets/4538/7213/ETFs/xly.us.txt?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=gcp-kaggle-com%40kaggle-161607.iam.gserviceaccount.com%2F20210507%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20210507T152440Z&X-Goog-Expires=259199&X-Goog-SignedHeaders=host&X-Goog-Signature=9946732343dcb4f09e75a07f630326b51807e6cb4a13f39e7591e117564b3d721b04fd58fee9b055cf6158d4e3c5f5c22096d54bfbcf300a18ae211f7b3837775523fe2600b82262af2cfba2b9419f83f6d7c92902e1d9d8a039a7e8025247626b6b80af8bedda638221bef4ecea5f81c723cb1eee7b68655e42ec9140f206ba4232a0565524c7e9fe7f6ec043ae2372863f73cea9e85efc35f47066e6b78b6b47d3a5bdf30f1e45af3400d77ef05e4502fa6bcd3fd22404791a06a2264ae000d817f0bac9ddd423936f232cb565f5448455467c6258eb8ae74bec4eaad0a2b4fd029895f1b19e2f3df405c26ba66653cce060d75f6ba23c6cc4497d9e9ee48a"

In [16]:
xly = pd.read_csv(source_link)
xly.set_index('Date', inplace=True)
print(xly.index.min(), xly.index.max())

2005-02-25 2017-11-10


In [17]:
xly.drop(['Open', 'High', 'Low', 'OpenInt'], axis=1, inplace=True)
xly.dropna(inplace=True)

## Prepping data for training

### Normalize
We express each price as its fractional change from the first closing price in the list, so the first value is always 0 (for example, 0.05 means 5% above the first close).

In [18]:
def normalize(arr):
    scaling_factor = arr[0]**(-1)
    answer = [round(i*scaling_factor - 1,8) for i in arr]
    return answer
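
As a quick illustration of what `normalize` returns, the sketch below restates the function from the cell above and applies it to a made-up three-day window. The prices are illustrative only and do not come from the dataset.

```python
def normalize(arr):
    # Same logic as the notebook cell: change relative to the first element
    scaling_factor = arr[0] ** (-1)
    return [round(i * scaling_factor - 1, 8) for i in arr]

# Hypothetical closing prices, chosen only for illustration
window = [4.0, 5.0, 3.0]
print(normalize(window))  # [0.0, 0.25, -0.25]
```

The second day is 25% above the first close and the third is 25% below it, which is why the values are stored as fractions rather than raw prices.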

### Creating training data

Each training sample pairs the normalized closes of 10 consecutive days with a label for the following day: 1 (green) if that day closes above the last close in the window, otherwise 0 (red).

In [19]:
training_data = []
num_prev_days = 10
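def create_training_data(df):
    # Each sample is [normalized closes of num_prev_days days, label for the following day]
    prepped = []
    closes = df['Close']
    # Stop at len(df) - num_prev_days so the last window still has a following day to label
    for i in range(len(df) - num_prev_days):
        # .iloc makes the positional indexing explicit (the index holds dates)
        prev_closes = normalize(list(closes.iloc[i:i+num_prev_days]))
        # Green (1) only if the next close is strictly above the last close in the window
        diff = closes.iloc[i+num_prev_days] - closes.iloc[i+num_prev_days-1]
        result = 1 if diff > 0 else 0
        prepped.append([prev_closes, result])
    return prepped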
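
To make the windowing and labelling easier to follow, here is a minimal sketch of the same sliding-window idea on a plain Python list. The helper `make_windows`, the window size of 3, and the prices are all hypothetical, and normalization is left out to keep the output readable.

```python
def make_windows(closes, window=3):
    """Hypothetical list-based version of the windowing step, for illustration."""
    samples = []
    for i in range(len(closes) - window):
        history = closes[i:i + window]
        # Green (1) only when the next close is strictly above the last close in the window
        label = 1 if closes[i + window] > closes[i + window - 1] else 0
        samples.append((history, label))
    return samples

# Made-up closing prices
toy_closes = [10.0, 11.0, 10.5, 10.5, 12.0]
for history, label in make_windows(toy_closes):
    print(history, label)
# [10.0, 11.0, 10.5] 0   (10.5 -> 10.5 is unchanged, so red)
# [11.0, 10.5, 10.5] 1   (10.5 -> 12.0 is up, so green)
```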

In [20]:
training_data = create_training_data(xly)

### Randomize

In [21]:
import random
random.shuffle(training_data)

In [22]:
X = []
y = []
for features, label in training_data:
    X.append(features)
    y.append(label)
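
Because `random.shuffle` gives a different order on every run, results will vary between runs. One possible way to make the order repeatable, sketched here on made-up pairs, is to shuffle with a seeded `random.Random` instance; the seed value 42 is an arbitrary assumption. The same sketch also shows `zip(*pairs)` as a compact alternative to the loop above.

```python
import random

# Hypothetical (features, label) pairs standing in for training_data
pairs = [([0.0, 0.01], 1), ([0.0, -0.02], 0), ([0.0, 0.03], 1)]
original = list(pairs)

rng = random.Random(42)  # fixed seed (arbitrary choice) so the order is repeatable
rng.shuffle(pairs)

# zip(*pairs) separates features from labels while keeping them aligned
X_demo, y_demo = (list(t) for t in zip(*pairs))
```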

## Creating model

In [59]:
import tensorflow as tf
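# Convert to NumPy arrays; some Keras versions reject plain Python lists
x_train = np.array(X, dtype=np.float32)
y_train = np.array(y)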

### Layers
* Input: flatten (the 10 normalized closes)
* Hidden: 1 layer, 10 neurons, rectified linear unit (ReLU) activation function
* Output: 2 neurons (red, green), softmax activation

Note: this configuration performed marginally better than the others I tested. Using more than one hidden layer seemed to cause overfitting, and 10 neurons appeared to be sufficient.
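
To give a sense of how small this network is, the sketch below counts its trainable parameters by hand. It assumes the input is the 10-element window described earlier; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    # A dense layer has one weight per input-unit pair plus one bias per unit
    return n_inputs * n_units + n_units

hidden = dense_params(10, 10)  # 10 normalized closes -> 10 hidden neurons
output = dense_params(10, 2)   # 10 hidden neurons -> 2 classes (red, green)
print(hidden, output, hidden + output)  # 110 22 132
```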

In [66]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(10, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(2, activation=tf.nn.softmax))

In [67]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3



## Evaluating model from prices on more recent days
We evaluate the model on closing prices from the most recent year, downloaded with yfinance.

In [68]:
import yfinance as yf
from datetime import date
from dateutil.relativedelta import relativedelta

In [69]:
last_year = date.today() - relativedelta(months=12)
new_xly = yf.download('XLY',
                      start=last_year,
                      end=date.today(),
                      progress=False)
new_xly.drop(['Open', 'High', 'Low', 'Adj Close','Volume'], axis=1, inplace=True)
new_xly.dropna(inplace=True)

In [70]:
val = create_training_data(new_xly)
X_eval = []
y_eval = []
for features, label in val:
    X_eval.append(features)
    y_eval.append(label)

### Conclusions
The results below indicate a modestly promising start: the model labels about 58% of days correctly on the most recent year of data. Neural networks tend to do well with larger, more complex inputs. Since this model is trained on only two dimensions, time and closing price, I'm hoping for better results once it is given a greater variety of data.

In [71]:
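# Convert to NumPy arrays; some Keras versions reject plain Python lists
val_loss, val_acc = model.evaluate(np.array(X_eval, dtype=np.float32), np.array(y_eval))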
print(val_loss)
print(val_acc)

0.6883647441864014
0.5767635107040405
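
One way to put these two numbers in context is to compare them with trivial baselines. This is a sketch that was not part of the original run. A two-class model that always outputs 50/50 has a cross-entropy of ln 2, and always predicting the more common label gives an accuracy equal to that label's share. `majority_baseline` is a hypothetical helper and the labels are made up; in the notebook one would pass `y_eval` instead.

```python
import math

def majority_baseline(labels):
    """Hypothetical helper: accuracy of always predicting the most common label."""
    green_share = sum(labels) / len(labels)
    return max(green_share, 1 - green_share)

# Cross-entropy of a model that always outputs 50/50 for two classes
print(round(math.log(2), 4))  # 0.6931

# Made-up labels for illustration; in the notebook this would be y_eval
demo_labels = [1, 0, 1, 1, 0]
print(majority_baseline(demo_labels))  # 0.6
```

If the model's accuracy is not clearly above the majority baseline for the same period, the result is more likely to reflect the share of green days than a learned pattern.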
