# Develop Your Own Neural Network Model for Cryptocurrency Prediction

- Exercise 1: Trinary Classification Model
- Exercise 2: Return Prediction Model

# Exercise 1: Trinary Classification Model



### Data Download at: https://drive.google.com/open?id=1thjGhgnAm5k1zuSiWhGmlUJzBXM3IECi

This exercise is fairly long and is meant to give you a feel for a real-world scenario. Feel free to look at the solution if you get lost.

#### Requirements
1. In this exercise you will change the outcome variable from the `binary (up / down)` case to a `trinary (up / no change / down)` variable:

  - up: return > +1 * std of return
  - no change: return between -1 * std and +1 * std of return
  - down: return < -1 * std of return

2. Change your model so that it produces the corresponding trinary output.


#### Procedures
- Preprocessing
  1. Data Import and Create Balanced Panel
  2. Create Target Variable
  3. Train / Test Split
  4. Create Sequences

- Training / Predicting Model
  1. Model Build
  2. Model Train
  3. Prediction

## Preprocessing

### 1. Data Import and Create Balanced Panel

In [0]:
%matplotlib inline

In [0]:
from google.colab import drive
drive.mount('/content/gdrive')

In [0]:
DATA_PATH = "/content/gdrive/My Drive/Lecture/StudyPie/Data/"

In [0]:
!ls "/content/gdrive/My Drive/Lecture/StudyPie/Data/"

In [0]:
# Unzip Data
# It will take more than 5 min
import zipfile
import io

# The context manager closes the archive once extraction is done
with zipfile.ZipFile(DATA_PATH + "crypto_data.zip", "r") as zf:
    zf.extractall(DATA_PATH)

In [0]:
SEQ_LEN = 60  # length of the preceding sequence fed to the RNN
FUTURE_PERIOD_PREDICT = 3  # how far into the future are we trying to predict?
RATIO_TO_PREDICT = "LTC-USD"

In [0]:
import pandas as pd

main_df = pd.DataFrame() # begin empty

ratios = ["BTC-USD", "LTC-USD", "BCH-USD", "ETH-USD"]  # the 4 ratios we want to consider

for ratio in ratios:  # begin iteration
    print(ratio)
    dataset = DATA_PATH+f'crypto_data/{ratio}.csv'  # get the full path to the file.
    df = pd.read_csv(dataset, names=['time', 'low', 'high', 'open', 'close', 'volume'])  # read in specific file

    # rename volume and close to include the ticker so we can still tell which close/volume is which:
    df.rename(columns={"close": f"{ratio}_close", "volume": f"{ratio}_volume"}, inplace=True)

    df.set_index("time", inplace=True)  # set time as index so we can join them on this shared time
    df = df[[f"{ratio}_close", f"{ratio}_volume"]]  # ignore the other columns besides price and volume

    if len(main_df)==0:  # if the dataframe is empty
        main_df = df  # then it's just the current df
    else:  # otherwise, join this data to the main one
        main_df = main_df.join(df)

main_df = main_df.ffill()  # if there are gaps in data, use previously known values (fillna(method=...) is deprecated in recent pandas)
main_df.dropna(inplace=True)
print(main_df.head())  # how did we do??
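
To see what the join and forward-fill steps do, here is a minimal sketch on two made-up series. The tickers `AAA` and `BBB`, the timestamps, and the prices are hypothetical and only illustrate the mechanics; they are not part of the exercise data.

```python
import pandas as pd

# Two hypothetical tickers with partly different timestamps (toy data).
a = pd.DataFrame({"AAA_close": [1.0, 1.1, 1.2]}, index=[0, 60, 120])
b = pd.DataFrame({"BBB_close": [5.0, 5.2]}, index=[60, 180])
a.index.name = b.index.name = "time"

# DataFrame.join defaults to a left join, so only the timestamps of the
# first frame are kept (timestamp 180 disappears here).
panel = a.join(b)

# Forward-fill gaps with the last known value, then drop rows that still
# have no value (here: timestamp 0, before BBB has any observation).
panel = panel.ffill().dropna()
print(panel)
```

Every remaining row has a value for every ticker, which is what "balanced panel" refers to in this notebook. Note that the first ticker in the list determines which timestamps survive the join.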

### 2. Create Target Variable

In [2]:
"""
classify target variable into three

Up (2): return > +1 std
No change (1): in between
Down (0): return < -1 std

"""

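
One possible way to build the trinary target is sketched below. It is not the official solution. It assumes the return is the percentage change of the close price `horizon` steps ahead (the role `FUTURE_PERIOD_PREDICT` plays in this notebook) and that the standard deviation is taken over the whole table; the helper name `add_trinary_target` and the toy prices are hypothetical.

```python
import numpy as np
import pandas as pd

def add_trinary_target(df, price_col, horizon):
    """Append a 0/1/2 'target' column as the LAST column (assumed layout)."""
    out = df.copy()
    # Assumed definition: percentage change from now to `horizon` steps ahead.
    future_return = out[price_col].shift(-horizon) / out[price_col] - 1.0
    std = future_return.std()
    # 2 = up (> +1 std), 0 = down (< -1 std), 1 = no change (in between).
    labels = np.where(future_return > std, 2,
                      np.where(future_return < -std, 0, 1))
    out["target"] = labels
    # The last `horizon` rows have no future price, so drop them.
    return out[future_return.notna()]

# Toy usage with made-up prices.
toy = pd.DataFrame(
    {"LTC-USD_close": [100.0, 100.0, 100.0, 100.0, 110.0, 110.0, 100.0, 100.0, 100.0]}
)
labelled = add_trinary_target(toy, "LTC-USD_close", horizon=1)
print(labelled["target"].tolist())
```

The target is placed last because `sequence_generator` in this notebook treats every column except the last one as a feature. A stricter variant would estimate the standard deviation on the training rows only, so that the threshold does not depend on validation data.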

### 3. Train / Test Split

In [0]:
times = sorted(main_df.index.values)  # get the times
last_5pct = times[-int(0.05 * len(times))]  # first timestamp of the last 5% (reuses the sorted list from above)

validation_main_df = main_df[(main_df.index >= last_5pct)]  # make the validation data where the index is in the last 5%
main_df = main_df[(main_df.index < last_5pct)]  # now the main_df is all the data up to the last 5%

### 4. Create Sequences

In [0]:
from sklearn import preprocessing  # pip install scikit-learn ... if you don't have it!
from collections import deque
import random
import numpy as np

def sequence_generator(main_df, SEQ_LEN, suffle=True,seed=101):
    
  sequential_data = []  # this is a list that will CONTAIN the sequences
  queue = deque(maxlen = SEQ_LEN)  # These will be our actual sequences. They are made with deque, which keeps the maximum length by popping out older values as new ones come in

  for i in main_df.values:  # iterate over the values
      queue.append([n for n in i[:-1]])  # store all but the target
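      if len(queue) == SEQ_LEN:  # only emit once the window holds SEQ_LEN rows
          sequential_data.append([np.array(queue), i[-1]])  # pair the window with the target of its last row

  if suffle:  # note: the parameter name is spelled `suffle` in this notebook
      random.seed(seed)  # fixed seed keeps the shuffle reproducible
      random.shuffle(sequential_data)  # shuffle so batches are not ordered by time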

  X = []
  y = []

  for seq, target in sequential_data:  # going over our new sequential data
      X.append(seq)  # X is the sequences
      y.append(target)  # y is the targets/labels (0 = down, 1 = no change, 2 = up)

  return np.array(X), np.array(y)  # return X and y as numpy arrays

In [0]:
train_x, train_y = sequence_generator(main_df , SEQ_LEN, suffle=True, seed=101)
validation_x, validation_y = sequence_generator(validation_main_df , SEQ_LEN, suffle=True, seed=101)

In [0]:
print(train_x.shape, len(train_y))
print(validation_x.shape, len(validation_y))

## Up / No Significant Change / Down Prediction Model

### 1. Model Build

In [0]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM, BatchNormalization, Flatten  # CuDNNLSTM omitted: not available in tensorflow.keras.layers in TensorFlow 2.x

def ex1_models(input_dim, output_dim):

  """
  write your own neural network model
  """
    
  return model

In [0]:
model1 = ex1_models(train_x.shape[1:], 3)

### 2. Model Train


In [0]:
BATCH_SIZE = 64 
NUM_ITERATIONS = 10

model1.fit(train_x, tf.keras.utils.to_categorical(train_y, num_classes=3),  # fix the width at 3 classes
              batch_size = BATCH_SIZE,
              epochs = NUM_ITERATIONS)
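
`to_categorical` turns integer labels into one-hot rows; when `num_classes` is not given, it infers the width from the largest label it sees. The NumPy stand-in below (the helper name `one_hot` is hypothetical) shows the encoding and why passing the number of classes explicitly is the safer choice when a split might lack one of the classes.

```python
import numpy as np

def one_hot(labels, num_classes):
    """NumPy stand-in for one-hot encoding integer class labels."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0  # one 1.0 per row
    return encoded

# 0 = down, 1 = no change, 2 = up
print(one_hot([0, 2, 1], num_classes=3))

# Even if class 2 never occurs in a split, the width stays at 3.
print(one_hot([0, 1], num_classes=3).shape)
```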

### 3. Prediction

In [0]:
predictions = np.argmax(model1.predict(validation_x), axis=-1)  # predict_classes() was removed from Keras in TensorFlow 2.6

# Score model
score = model1.evaluate(validation_x, tf.keras.utils.to_categorical(validation_y, num_classes=3), 
                       verbose=0)

print('Test loss:', score[0])
print('Test accuracy:', score[1])
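
Accuracy alone can be misleading here: with thresholds at one standard deviation, the "no change" class is likely to dominate (roughly two thirds of samples if returns were close to normal), so a model that always predicts class 1 can look reasonable. A confusion matrix makes this visible. The sketch below uses made-up softmax outputs rather than real model predictions.

```python
import numpy as np

# Hypothetical softmax outputs for four validation samples (rows sum to 1).
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
    [0.3, 0.4, 0.3],
])
true_labels = np.array([0, 1, 2, 2])  # made-up ground truth

pred_labels = probs.argmax(axis=-1)  # most likely class per row

# confusion[i, j] counts samples of true class i predicted as class j.
confusion = np.zeros((3, 3), dtype=int)
np.add.at(confusion, (true_labels, pred_labels), 1)

accuracy = (pred_labels == true_labels).mean()
print(confusion)
print("accuracy:", accuracy)
```

With real data, `probs` would be the output of `model1.predict(validation_x)` and `true_labels` would be `validation_y`.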

# Exercise 2: Return Prediction Model

#### Requirements

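In this exercise you will change the outcome variable to the return itself (a single real-valued output, 1D).
  
Hint
1. Which loss function should you use? Is a categorical loss still appropriate?
2. Which output activation function should you use? Is softmax still appropriate?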
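
To make the two hints concrete, the NumPy sketch below contrasts one common pairing for classification (softmax output with categorical cross-entropy) with one common pairing for regression (an unbounded linear output with mean squared error). It only illustrates what the functions compute; it is not the exercise solution, and other regression losses such as mean absolute error are equally possible.

```python
import numpy as np

def softmax(logits):
    """Turn each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(one_hot_targets, probs):
    """Mean negative log-probability assigned to the true class."""
    return float(-(one_hot_targets * np.log(probs)).sum(axis=-1).mean())

def mean_squared_error(targets, preds):
    """Mean squared difference between real-valued targets and predictions."""
    return float(np.mean((np.asarray(targets) - np.asarray(preds)) ** 2))

# Classification: 3 raw scores per sample -> probabilities -> cross-entropy.
probs = softmax(np.array([[2.0, 0.5, 0.1]]))
ce = categorical_cross_entropy(np.array([[1.0, 0.0, 0.0]]), probs)
print(probs, ce)

# Regression: one unbounded number per sample -> squared error.
mse = mean_squared_error([0.01, -0.02], [0.0, -0.01])
print(mse)
```

A softmax output is confined to values between 0 and 1 that sum to 1, so it cannot represent a negative return; that is the main reason the output layer has to change in this exercise.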


#### Procedures
- Preprocessing
  1. Data Import and Create Balanced Panel
  2. Create Target Variable
  3. Train / Test Split
  4. Create Sequences
  
- Training / Predicting Model
  1. Model Build
  2. Model Train
  3. Prediction

In [0]:
SEQ_LEN = 60  # length of the preceding sequence fed to the RNN
FUTURE_PERIOD_PREDICT = 3  # how far into the future are we trying to predict?
RATIO_TO_PREDICT = "LTC-USD"

## Preprocessing

### 1. Data Import and Create Balanced Panel

In [0]:
import pandas as pd
from sklearn import preprocessing 

main_df = pd.DataFrame() # begin empty

ratios = ["BTC-USD", "LTC-USD", "BCH-USD", "ETH-USD"]  # the 4 ratios we want to consider

for ratio in ratios:  # begin iteration
  print(ratio)
  dataset = DATA_PATH+f'crypto_data/{ratio}.csv'  # get the full path to the file.
  df = pd.read_csv(dataset, names=['time', 'low', 'high', 'open', 'close', 'volume'])  # read in specific file

  # rename volume and close to include the ticker so we can still tell which close/volume is which:
  df.rename(columns={"close": f"{ratio}_close", "volume": f"{ratio}_volume"}, inplace=True)

  df.set_index("time", inplace=True)  # set time as index so we can join them on this shared time
  df = df[[f"{ratio}_close", f"{ratio}_volume"]]  # ignore the other columns besides price and volume

  if len(main_df)==0:  # if the dataframe is empty
      main_df = df  # then it's just the current df
  else:  # otherwise, join this data to the main one
      main_df = main_df.join(df)

main_df = main_df.ffill()  # if there are gaps in data, use previously known values (fillna(method=...) is deprecated in recent pandas)
main_df.dropna(inplace=True)
print(main_df.head())  # how did we do??

### 2. Create Target Variable

In [1]:
"""
Create a return column as the prediction target

Scale the feature columns

"""

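
One possible way to create the return target and scale the features is sketched below; it is not the official solution. It assumes the target is the percentage change of the close price `horizon` steps ahead and that each feature is turned into a step-to-step percentage change and then z-scored. The helper name `add_return_target` and the toy numbers are hypothetical, and plain pandas arithmetic is used here instead of `sklearn.preprocessing`.

```python
import pandas as pd

def add_return_target(df, price_col, horizon):
    """Scale features and append the future return as the LAST column."""
    out = df.copy()
    # Assumed target: percentage change of the price `horizon` steps ahead.
    future_return = out[price_col].shift(-horizon) / out[price_col] - 1.0
    # Assumed feature scaling: step-to-step percentage change, then z-score.
    for col in df.columns:
        change = out[col] / out[col].shift(1) - 1.0
        out[col] = (change - change.mean()) / change.std()
    out["target"] = future_return
    # Drop the first row (no previous value) and the last `horizon` rows.
    return out.dropna()

toy = pd.DataFrame({
    "LTC-USD_close": [100.0, 102.0, 101.0, 105.0, 104.0, 108.0],
    "LTC-USD_volume": [10.0, 12.0, 9.0, 15.0, 11.0, 14.0],
})
prepared = add_return_target(toy, "LTC-USD_close", horizon=2)
print(prepared)
```

The target is computed before the price column is overwritten by its scaled version. As in Exercise 1, a stricter variant would compute the scaling statistics on the training rows only.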

In [0]:
main_df.head()

### 3. Train / Test Split

In [0]:
times = sorted(main_df.index.values)  # get the times
last_5pct = times[-int(0.05 * len(times))]  # first timestamp of the last 5% (reuses the sorted list from above)

validation_main_df = main_df[(main_df.index >= last_5pct)]  # make the validation data where the index is in the last 5%
main_df = main_df[(main_df.index < last_5pct)]  # now the main_df is all the data up to the last 5%

### 4. Create Sequences

In [0]:
from sklearn import preprocessing  # pip install scikit-learn ... if you don't have it!
from collections import deque
import random
import numpy as np

def sequence_generator(main_df, SEQ_LEN, suffle=True,seed=101):
    
  sequential_data = []  # this is a list that will CONTAIN the sequences
  queue = deque(maxlen = SEQ_LEN)  # These will be our actual sequences. They are made with deque, which keeps the maximum length by popping out older values as new ones come in

  for i in main_df.values:  # iterate over the values
      queue.append([n for n in i[:-1]])  # store all but the target
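      if len(queue) == SEQ_LEN:  # only emit once the window holds SEQ_LEN rows
          sequential_data.append([np.array(queue), i[-1]])  # pair the window with the target of its last row

  if suffle:  # note: the parameter name is spelled `suffle` in this notebook
      random.seed(seed)  # fixed seed keeps the shuffle reproducible
      random.shuffle(sequential_data)  # shuffle so batches are not ordered by time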

  X = []
  y = []

  for seq, target in sequential_data:  # going over our new sequential data
      X.append(seq)  # X is the sequences
      y.append(target)  # y is the targets (the future return)

  return np.array(X), np.array(y)  # return X and y as numpy arrays (Keras in TF 2.x rejects a plain list of floats as y)

In [0]:
train_x, train_y = sequence_generator(main_df , SEQ_LEN, suffle=True, seed=101)
validation_x, validation_y = sequence_generator(validation_main_df , SEQ_LEN, suffle=True, seed=101)

In [0]:
print(train_x.shape, len(train_y))
print(validation_x.shape, len(validation_y))

## Return Prediction Model

### 1. Model Build

In [0]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM,\
BatchNormalization, Flatten, Activation  # CuDNNLSTM omitted: not available in tensorflow.keras.layers in TensorFlow 2.x

def ex2_models(input_dim, output_dim):

  """
  write your own neural network model
  """

  return model

### 2. Model Train

In [0]:
model2 = ex2_models(train_x.shape[1:], 1)

In [0]:
BATCH_SIZE = 64 
NUM_ITERATIONS = 10

model2.fit(train_x, train_y, 
              batch_size = BATCH_SIZE,
              epochs = NUM_ITERATIONS)

### 3. Prediction

In [0]:
predictions = model2.predict(validation_x)

# Score model
score = model2.evaluate(validation_x, validation_y,
                       verbose=0)

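# evaluate() returns a plain scalar when the model was compiled without extra metrics
print('Test loss:', np.atleast_1d(score)[0])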
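
A raw loss value is hard to judge on its own because returns are small numbers. The sketch below, on made-up values, compares the error against a naive baseline that always predicts a zero return and also reports how often the predicted sign matches the realised sign. These extra metrics are a suggestion, not part of the original exercise.

```python
import numpy as np

# Made-up true and predicted returns for five validation samples.
y_true = np.array([0.010, -0.020, 0.005, 0.000, -0.015])
y_pred = np.array([0.004, -0.010, -0.002, 0.001, -0.020])

mse = np.mean((y_true - y_pred) ** 2)
mae = np.mean(np.abs(y_true - y_pred))

# Naive baseline: always predict a zero return.
baseline_mse = np.mean(y_true ** 2)

# Share of samples where the predicted sign matches the realised sign.
direction_hit_rate = np.mean(np.sign(y_true) == np.sign(y_pred))

print(f"MSE {mse:.6f} vs zero-return baseline {baseline_mse:.6f}")
print(f"MAE {mae:.6f}, direction hit rate {direction_hit_rate:.2f}")
```

With real data, `y_true` would be `np.array(validation_y)` and `y_pred` would be `predictions.ravel()`, since a model with one output unit returns an array of shape (n, 1).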