# Develop Your Own Neural Network Model for Cryptocurrency Prediction

- Exercise 1: Trinary Classification Model
- Exercise 2: Return Prediction Model

# Exercise 1: Trinary Classification Model



### Data Download at: https://drive.google.com/open?id=1thjGhgnAm5k1zuSiWhGmlUJzBXM3IECi

This exercise is fairly long and should give you an idea of a real-world scenario. Feel free to look at the solution if you get lost.

#### Requirements
1. Change the outcome variable from the `binary (up / down) case` to a `trinary variable (up / no change / down)`, defined relative to the standard deviation (std) of the return:

  - up: return > +1 * std of return
  - no change: return between -1 * std of return and +1 * std of return
  - down: return < -1 * std of return

2. Change your model so that it produces the corresponding (trinary) output.
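
A minimal sketch of the labeling rule above, applied to made-up returns. The array `returns` and the multiplier `k` are illustrative names, not part of the exercise data. The 0 / 1 / 2 coding (down / no change / up) follows the coding used in the solution further down.

```python
import numpy as np

# Toy forward returns (made-up numbers, for illustration only).
returns = np.array([0.004, -0.0001, 0.0002, -0.005, 0.0003, 0.0])

k = 1  # threshold multiplier, plays the role of "1 * std" above
std = returns.std()

# np.select checks the conditions in order; anything unmatched gets the default.
labels = np.select(
    [returns > k * std, returns < -k * std],
    [2, 0],      # 2 = up, 0 = down
    default=1,   # 1 = no change
)
print(labels)
```

Only the two returns that move by more than one standard deviation are labeled up or down; everything else falls into the "no change" class.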


#### Procedures
- Preprocessing
  1. Data Import and Create Balanced Panel
  2. Create Target Variable
  3. Train / Test Split
  4. Create Sequences

- Training / Predicting Model
  1. Model Build
  2. Model Train
  3. Prediction

## Preprocessing

### 1. Data Import and Create Balanced Panel

In [0]:
%matplotlib inline

In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [0]:
DATA_PATH = "/content/gdrive/My Drive/Lecture/StudyPie/Data/"

In [4]:
!ls "/content/gdrive/My Drive/Lecture/StudyPie/Data/"

256x2-CNN.model  crypto_data.zip  M2_1_y.pickle  PetImages.zip
CoinOne		 log		  __MACOSX	 simple_rnn_model.h5
crypto_data	 M2_1_X.pickle	  PetImages


In [0]:
# Unzip the data (this can take more than 5 minutes)
import zipfile

with zipfile.ZipFile(DATA_PATH + "crypto_data.zip", "r") as zf:
    zf.extractall(DATA_PATH)

In [0]:
SEQ_LEN = 60  # how long of a preceding sequence to collect for RNN
FUTURE_PERIOD_PREDICT = 3  # how far into the future are we trying to predict?
RATIO_TO_PREDICT = "LTC-USD"

In [7]:
import pandas as pd

main_df = pd.DataFrame() # begin empty

ratios = ["BTC-USD", "LTC-USD", "BCH-USD", "ETH-USD"]  # the 4 ratios we want to consider

for ratio in ratios:  # begin iteration
    print(ratio)
    dataset = DATA_PATH+f'crypto_data/{ratio}.csv'  # get the full path to the file.
    df = pd.read_csv(dataset, names=['time', 'low', 'high', 'open', 'close', 'volume'])  # read in specific file

    # rename volume and close to include the ticker so we can still tell which close/volume is which:
    df.rename(columns={"close": f"{ratio}_close", "volume": f"{ratio}_volume"}, inplace=True)

    df.set_index("time", inplace=True)  # set time as index so we can join them on this shared time
    df = df[[f"{ratio}_close", f"{ratio}_volume"]]  # ignore the other columns besides price and volume

    if len(main_df)==0:  # if the dataframe is empty
        main_df = df  # then it's just the current df
    else:  # otherwise, join this data to the main one
        main_df = main_df.join(df)

main_df.ffill(inplace=True)  # if there are gaps in data, use previously known values
main_df.dropna(inplace=True)
print(main_df.head())  # how did we do??

BTC-USD
LTC-USD
BCH-USD
ETH-USD
            BTC-USD_close  BTC-USD_volume  LTC-USD_close  LTC-USD_volume  \
time                                                                       
1528968720    6487.379883        7.706374      96.660004      314.387024   
1528968780    6479.410156        3.088252      96.570000       77.129799   
1528968840    6479.410156        1.404100      96.500000        7.216067   
1528968900    6479.979980        0.753000      96.389999      524.539978   
1528968960    6480.000000        1.490900      96.519997       16.991997   

            BCH-USD_close  BCH-USD_volume  ETH-USD_close  ETH-USD_volume  
time                                                                      
1528968720     870.859985       26.856577      486.01001       26.019083  
1528968780     870.099976        1.124300      486.00000        8.449400  
1528968840     870.789978        1.749862      485.75000       26.994646  
1528968900     870.000000        1.680500      486.00000       77.355759  
1528968960     869.989990        1.669014      486.00000        7.503300  
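
The "balanced panel" step can be sketched on two toy series, assuming nothing beyond pandas. The frames `a` and `b` and their values are made up. `DataFrame.join` performs a left join by default, so the timestamps of the first frame (BTC-USD in the notebook) define the rows of the panel.

```python
import pandas as pd

# Two toy series with different timestamps (values are made up).
a = pd.DataFrame({"A_close": [1.0, 2.0, 3.0]}, index=[10, 20, 30])
b = pd.DataFrame({"B_close": [5.0, 6.0]}, index=[20, 40])
a.index.name = b.index.name = "time"

panel = a.join(b)          # left join: keeps the timestamps of `a`
panel = panel.ffill()      # carry the last known value forward
panel = panel.dropna()     # drop leading rows that still have gaps
print(panel)
```

Timestamp 40 exists only in `b` and is dropped by the join; timestamp 10 has no earlier `B_close` to carry forward and is removed by `dropna`.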

### 2. Create Target Variable

In [0]:
SIGNIFICANT_CRITERIA = 1   # 1 std criteria
 
currency_targets = ["BTC"]

for currency_target in currency_targets:
    main_df[currency_target+'-USD-TARGET'] = main_df[currency_target+'-USD_close'].shift(-FUTURE_PERIOD_PREDICT )
    main_df[currency_target+'-USD-TARGET-RETURN'] = (main_df[currency_target+'-USD-TARGET'] 
                                                                - main_df[currency_target+'-USD_close'])/main_df[currency_target+'-USD_close']
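
The forward-return construction used here can be checked on a toy price series. This is a sketch with made-up prices; `horizon` stands in for `FUTURE_PERIOD_PREDICT`.

```python
import pandas as pd

close = pd.Series([100.0, 101.0, 99.0, 102.0, 104.0])  # toy prices
horizon = 2  # stands in for FUTURE_PERIOD_PREDICT

future = close.shift(-horizon)             # price `horizon` steps ahead
forward_return = (future - close) / close  # relative change over the horizon

print(forward_return)
```

The last `horizon` rows have no future price, so their return is NaN. Those rows carry no usable target and need to be dropped or otherwise handled before the data is used for training.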

In [0]:
import numpy as np

def classify_trinary(values):
    gp_std = np.std(values)

    target = []
    for value in values:
        if SIGNIFICANT_CRITERIA*gp_std < value: # significant increase
            target.append(2)
        elif -SIGNIFICANT_CRITERIA*gp_std > value:  # significant decrease
            target.append(0)  
        else:
            target.append(1) # No significant change
            
    return target

In [10]:
main_df.head()

| time | BTC-USD_close | BTC-USD_volume | LTC-USD_close | LTC-USD_volume | BCH-USD_close | BCH-USD_volume | ETH-USD_close | ETH-USD_volume | BTC-USD-TARGET | BTC-USD-TARGET-RETURN |
|---|---|---|---|---|---|---|---|---|---|---|
| 1528968720 | 6487.379883 | 7.706374 | 96.660004 | 314.387024 | 870.859985 | 26.856577 | 486.01001 | 26.019083 | 6479.97998 | -0.001141 |
| 1528968780 | 6479.410156 | 3.088252 | 96.57 | 77.129799 | 870.099976 | 1.1243 | 486.0 | 8.4494 | 6480.0 | 9.1e-05 |
| 1528968840 | 6479.410156 | 1.4041 | 96.5 | 7.216067 | 870.789978 | 1.749862 | 485.75 | 26.994646 | 6477.220215 | -0.000338 |
| 1528968900 | 6479.97998 | 0.753 | 96.389999 | 524.539978 | 870.0 | 1.6805 | 486.0 | 77.355759 | 6480.0 | 3e-06 |
| 1528968960 | 6480.0 | 1.4909 | 96.519997 | 16.991997 | 869.98999 | 1.669014 | 486.0 | 7.5033 | 6479.990234 | -2e-06 |


In [11]:
for currency_target in currency_targets:
    print("SIGNIFICANT_CRITERIA:", SIGNIFICANT_CRITERIA)
    main_df[currency_target+'-TARGET'] = main_df[currency_target+'-USD-TARGET-RETURN'].transform(classify_trinary)
    main_df.drop(columns=[currency_target+'-USD-TARGET', currency_target+'-USD-TARGET-RETURN'], inplace=True)

SIGNIFICANT_CRITERIA: 1


In [12]:
main_df.head()

| time | BTC-USD_close | BTC-USD_volume | LTC-USD_close | LTC-USD_volume | BCH-USD_close | BCH-USD_volume | ETH-USD_close | ETH-USD_volume | BTC-TARGET |
|---|---|---|---|---|---|---|---|---|---|
| 1528968720 | 6487.379883 | 7.706374 | 96.660004 | 314.387024 | 870.859985 | 26.856577 | 486.01001 | 26.019083 | 1 |
| 1528968780 | 6479.410156 | 3.088252 | 96.57 | 77.129799 | 870.099976 | 1.1243 | 486.0 | 8.4494 | 1 |
| 1528968840 | 6479.410156 | 1.4041 | 96.5 | 7.216067 | 870.789978 | 1.749862 | 485.75 | 26.994646 | 1 |
| 1528968900 | 6479.97998 | 0.753 | 96.389999 | 524.539978 | 870.0 | 1.6805 | 486.0 | 77.355759 | 1 |
| 1528968960 | 6480.0 | 1.4909 | 96.519997 | 16.991997 | 869.98999 | 1.669014 | 486.0 | 7.5033 | 1 |


### 3. Train / Test Split

In [0]:
times = sorted(main_df.index.values)  # get the times
last_5pct = times[-int(0.05*len(times))]  # the timestamp at which the last 5% of the data begins

validation_main_df = main_df[(main_df.index >= last_5pct)]  # make the validation data where the index is in the last 5%
main_df = main_df[(main_df.index < last_5pct)]  # now the main_df is all the data up to the last 5%
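
The same chronological hold-out can be sketched by position on a toy frame. This assumes the index is already sorted by time; `df`, `train`, and `val` are illustrative names.

```python
import numpy as np
import pandas as pd

# Toy frame indexed by increasing timestamps (values are made up).
df = pd.DataFrame({"x": np.arange(100)}, index=np.arange(1000, 1100))

n_val = len(df) // 20              # 5% of the rows form the hold-out block
train, val = df.iloc[:-n_val], df.iloc[-n_val:]

print(len(train), len(val))
```

Because the split is by time rather than at random, every validation row is later than every training row, which keeps future information out of the training set.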

### 4. Create Sequences

In [0]:
from sklearn import preprocessing  # pip install scikit-learn ... if you don't have it!
from collections import deque
import random
import numpy as np

def sequence_generator(main_df, SEQ_LEN, suffle=True,seed=101):
    
  sequential_data = []  # this is a list that will CONTAIN the sequences
  queue = deque(maxlen = SEQ_LEN)  # These will be our actual sequences. They are made with deque, which keeps the maximum length by popping out older values as new ones come in

  for i in main_df.values:  # iterate over the values
      queue.append([n for n in i[:-1]])  # store all but the target
      if len(queue) == SEQ_LEN:  # only emit a sequence once the window holds SEQ_LEN time steps
          sequential_data.append([np.array(queue), i[-1]])  # append those bad boys!

  if suffle == True:
      random.seed(seed)
      random.shuffle(sequential_data)  # shuffle for good measure.

  X = []
  y = []

  for seq, target in sequential_data:  # going over our new sequential data
      X.append(seq)  # X is the sequences
      y.append(target)  # y is the targets/labels (0 = down, 1 = no change, 2 = up)

  return np.array(X), y  # return X and y...and make X a numpy array!
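
A toy run of the same sliding-window idea makes the output shapes easier to see. This is a sketch, not the function above: `rows` and `seq_len` are made-up stand-ins for `main_df.values` and `SEQ_LEN`, and shuffling is left out.

```python
from collections import deque
import numpy as np

rows = np.arange(15).reshape(5, 3)   # 5 time steps; last column is the target
seq_len = 3                          # stands in for SEQ_LEN

window = deque(maxlen=seq_len)
X, y = [], []
for row in rows:
    window.append(row[:-1])          # features only
    if len(window) == seq_len:       # skip until the window is full
        X.append(np.array(window))
        y.append(row[-1])            # target of the window's last row

X = np.array(X)
print(X.shape, y)
```

With `n` rows and a window of `seq_len`, this yields `n - seq_len + 1` sequences, each of shape `(seq_len, n_features)`, and each is paired with the target of its last time step.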

In [0]:
train_x, train_y = sequence_generator(main_df , SEQ_LEN, suffle=True, seed=101)
validation_x, validation_y = sequence_generator(validation_main_df , SEQ_LEN, suffle=True, seed=101)

In [16]:
print(train_x.shape, len(train_y))
print(validation_x.shape, len(validation_y))

(92778, 60, 8) 92778
(4827, 60, 8) 4827


## Up / No Significant Change / Down Prediction Model

### 1. Model Build

In [0]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM, BatchNormalization, Flatten  # CuDNNLSTM was unused and no longer exists in TensorFlow 2

def ex1_models(input_dim, output_dim):

  # For illustration purposes, I only use a feed-forward network (FNN) here,
  # but you can try any other architecture!
  
  L1 = 50  # 30
  L2 = 30  # 20
  L3 = 20  # 10
  L4 = 10  # 5
  L5 = 5

  model = Sequential()
  model.add(Dense(L1, input_shape=input_dim, activation='relu'))
  model.add(Dropout(0.2))
  model.add(BatchNormalization())

  model.add(Dense(L2, activation='relu'))
  model.add(Dropout(0.2))
  model.add(BatchNormalization())

  model.add(Dense(L3, activation='relu'))
  model.add(Dropout(0.2))
  model.add(BatchNormalization())

  model.add(Dense(L4, activation='relu'))
  model.add(Dropout(0.2))
  model.add(BatchNormalization())

  model.add(Dense(L5, activation='relu'))
  model.add(Dropout(0.2))

  model.add(Flatten())
  model.add(Dense(output_dim, activation='softmax'))

  model.compile(optimizer=tf.keras.optimizers.Adam(0.001),  # Keras optimizer; tf.train.AdamOptimizer is TF1-only
            loss='categorical_crossentropy',
            metrics=['accuracy'])

    
  return model

In [0]:
model1 = ex1_models(train_x.shape[1:], 3)

### 2. Model Train


In [19]:
BATCH_SIZE = 64 
NUM_ITERATIONS = 10

model1.fit(train_x, tf.keras.utils.to_categorical(train_y, num_classes=3),  # fix 3 classes even if one is absent
              batch_size = BATCH_SIZE,
              epochs = NUM_ITERATIONS)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f7b04a95160>
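
The `fit` call one-hot encodes the integer labels because `categorical_crossentropy` expects one row per sample with a 1 in the column of the true class. A NumPy sketch of that encoding, on made-up labels (this illustrates the result, not the Keras implementation):

```python
import numpy as np

labels = np.array([1, 2, 0, 1])        # toy labels: 0 = down, 1 = no change, 2 = up
num_classes = 3

one_hot = np.eye(num_classes)[labels]  # row i is the one-hot vector of labels[i]
print(one_hot)
```

An alternative is to keep the integer labels and compile the model with the `sparse_categorical_crossentropy` loss, which accepts class indices directly.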

### 3. Prediction

In [20]:
predictions = np.argmax(model1.predict(validation_x), axis=-1)  # predicted class = highest softmax probability

# Score model
score = model1.evaluate(validation_x, tf.keras.utils.to_categorical(validation_y, num_classes=3), 
                       verbose=0)

print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.5219763976361457
Test accuracy: 0.8960016572576689
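
An accuracy figure for this task is easier to interpret next to a majority-class baseline. With a 1-std threshold, "no change" is likely to be the largest class (about 68% of values lie within one standard deviation under a normal distribution, and return distributions are not necessarily normal). The helper below is a hypothetical sketch, with made-up labels, of how such a baseline could be computed.

```python
import numpy as np

def majority_baseline(labels, num_classes=3):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = np.bincount(np.asarray(labels, dtype=int), minlength=num_classes)
    return counts.max() / counts.sum()

toy_labels = [1, 1, 1, 2, 0, 1, 1, 1, 1, 1]   # made-up labels
print(majority_baseline(toy_labels))
```

In the notebook this would be called as `majority_baseline(validation_y)`; a model is only informative to the extent that its accuracy exceeds that number.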


# Exercise 2: Return Prediction Model

#### Requirements

In this exercise you will change the outcome variable to the return itself (a 1-D continuous value).
  
Hints
1. Which loss function should you use? Is the problem still categorical?
2. Which activation function should the output layer use? Is softmax still appropriate?
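
One way to think about the second hint: softmax normalizes its inputs to sum to 1, so with a single output unit it always returns 1 regardless of the input. A small NumPy sketch (the `softmax` helper here is written for illustration and is not the Keras layer):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

raw_output = np.array([-0.002])   # a made-up raw network output for a 1-D target
print(softmax(raw_output))
```

A return can be negative and is rarely close to 1, so a single-unit regression output needs a linear (identity) activation paired with a regression loss such as mean squared error.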


#### Procedures
- Preprocessing
  1. Data Import and Create Balanced Panel
  2. Create Target Variable
  3. Train / Test Split
  4. Create Sequences
  
- Training / Predicting Model
  1. Model Build
  2. Model Train
  3. Prediction

In [0]:
SEQ_LEN = 60  # how long of a preceding sequence to collect for RNN
FUTURE_PERIOD_PREDICT = 3  # how far into the future are we trying to predict?
RATIO_TO_PREDICT = "LTC-USD"

## Preprocessing

### 1. Data Import and Create Balanced Panel

In [22]:
import pandas as pd
from sklearn import preprocessing 

main_df = pd.DataFrame() # begin empty

ratios = ["BTC-USD", "LTC-USD", "BCH-USD", "ETH-USD"]  # the 4 ratios we want to consider

for ratio in ratios:  # begin iteration
  print(ratio)
  dataset = DATA_PATH+f'crypto_data/{ratio}.csv'  # get the full path to the file.
  df = pd.read_csv(dataset, names=['time', 'low', 'high', 'open', 'close', 'volume'])  # read in specific file

  # rename volume and close to include the ticker so we can still tell which close/volume is which:
  df.rename(columns={"close": f"{ratio}_close", "volume": f"{ratio}_volume"}, inplace=True)

  df.set_index("time", inplace=True)  # set time as index so we can join them on this shared time
  df = df[[f"{ratio}_close", f"{ratio}_volume"]]  # ignore the other columns besides price and volume

  if len(main_df)==0:  # if the dataframe is empty
      main_df = df  # then it's just the current df
  else:  # otherwise, join this data to the main one
      main_df = main_df.join(df)

main_df.ffill(inplace=True)  # if there are gaps in data, use previously known values
main_df.dropna(inplace=True)
print(main_df.head())  # how did we do??

BTC-USD
LTC-USD
BCH-USD
ETH-USD
            BTC-USD_close  BTC-USD_volume  LTC-USD_close  LTC-USD_volume  \
time                                                                       
1528968720    6487.379883        7.706374      96.660004      314.387024   
1528968780    6479.410156        3.088252      96.570000       77.129799   
1528968840    6479.410156        1.404100      96.500000        7.216067   
1528968900    6479.979980        0.753000      96.389999      524.539978   
1528968960    6480.000000        1.490900      96.519997       16.991997   

            BCH-USD_close  BCH-USD_volume  ETH-USD_close  ETH-USD_volume  
time                                                                      
1528968720     870.859985       26.856577      486.01001       26.019083  
1528968780     870.099976        1.124300      486.00000        8.449400  
1528968840     870.789978        1.749862      485.75000       26.994646  
1528968900     870.000000        1.680500      486.00000       77.355759  
1528968960     869.989990        1.669014      486.00000        7.503300  

### 2. Create Target Variable

In [0]:
currency_targets = ["BTC"]

for currency_target in currency_targets:
  main_df[currency_target+'-USD-TARGET'] = main_df[currency_target+'-USD_close'].shift(-FUTURE_PERIOD_PREDICT )
  main_df[currency_target+'-USD-TARGET-RETURN'] = (main_df[currency_target+'-USD-TARGET']-main_df[currency_target+'-USD_close'])/main_df[currency_target+'-USD_close']
  
  main_df.drop(columns=[currency_target+'-USD-TARGET'], inplace=True)

In [0]:
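from sklearn import preprocessing  # pip install scikit-learn ... if you don't have it!

# Fill missing values with the column means
# (the last FUTURE_PERIOD_PREDICT rows have no future price, so their return is NaN)
main_df.fillna(main_df.mean(), inplace=True)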

In [25]:
main_df.head()

| time | BTC-USD_close | BTC-USD_volume | LTC-USD_close | LTC-USD_volume | BCH-USD_close | BCH-USD_volume | ETH-USD_close | ETH-USD_volume | BTC-USD-TARGET-RETURN |
|---|---|---|---|---|---|---|---|---|---|
| 1528968720 | 6487.379883 | 7.706374 | 96.660004 | 314.387024 | 870.859985 | 26.856577 | 486.01001 | 26.019083 | -0.001141 |
| 1528968780 | 6479.410156 | 3.088252 | 96.57 | 77.129799 | 870.099976 | 1.1243 | 486.0 | 8.4494 | 9.1e-05 |
| 1528968840 | 6479.410156 | 1.4041 | 96.5 | 7.216067 | 870.789978 | 1.749862 | 485.75 | 26.994646 | -0.000338 |
| 1528968900 | 6479.97998 | 0.753 | 96.389999 | 524.539978 | 870.0 | 1.6805 | 486.0 | 77.355759 | 3e-06 |
| 1528968960 | 6480.0 | 1.4909 | 96.519997 | 16.991997 | 869.98999 | 1.669014 | 486.0 | 7.5033 | -2e-06 |


### 3. Train / Test Split

In [0]:
times = sorted(main_df.index.values)  # get the times
last_5pct = times[-int(0.05*len(times))]  # the timestamp at which the last 5% of the data begins

validation_main_df = main_df[(main_df.index >= last_5pct)]  # make the validation data where the index is in the last 5%
main_df = main_df[(main_df.index < last_5pct)]  # now the main_df is all the data up to the last 5%
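
The notebook imports `sklearn.preprocessing` but feeds raw prices and volumes to the network, and these columns differ by several orders of magnitude. A sketch of how the feature columns might be standardized, using statistics from the training block only so that nothing from the validation period leaks in. The `standardize` helper and the toy arrays are hypothetical, and the target column would be left unscaled.

```python
import numpy as np

def standardize(train, other):
    """Scale both arrays with the column means/stds of `train` only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0               # avoid division by zero for constant columns
    return (train - mean) / std, (other - mean) / std

train = np.array([[6480.0, 1.5], [6490.0, 3.0], [6470.0, 0.5]])  # toy price, volume
val = np.array([[6500.0, 2.0]])
train_s, val_s = standardize(train, val)
print(train_s.mean(axis=0), val_s)
```

After scaling, each training column has mean 0 and standard deviation 1, while the validation rows are expressed on the same scale.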

### 4. Create Sequences

In [0]:
from sklearn import preprocessing  # pip install scikit-learn ... if you don't have it!
from collections import deque
import random
import numpy as np

def sequence_generator(main_df, SEQ_LEN, suffle=True,seed=101):
    
  sequential_data = []  # this is a list that will CONTAIN the sequences
  queue = deque(maxlen = SEQ_LEN)  # These will be our actual sequences. They are made with deque, which keeps the maximum length by popping out older values as new ones come in

  for i in main_df.values:  # iterate over the values
      queue.append([n for n in i[:-1]])  # store all but the target
      if len(queue) == SEQ_LEN:  # only emit a sequence once the window holds SEQ_LEN time steps
          sequential_data.append([np.array(queue), i[-1]])  # append those bad boys!

  if suffle == True:
      random.seed(seed)
      random.shuffle(sequential_data)  # shuffle for good measure.

  X = []
  y = []

  for seq, target in sequential_data:  # going over our new sequential data
      X.append(seq)  # X is the sequences
      y.append(target)  # y is the target (the future return)

  return np.array(X), y  # return X and y...and make X a numpy array!

In [0]:
train_x, train_y = sequence_generator(main_df , SEQ_LEN, suffle=True, seed=101)
validation_x, validation_y = sequence_generator(validation_main_df , SEQ_LEN, suffle=True, seed=101)

In [29]:
print(train_x.shape, len(train_y))
print(validation_x.shape, len(validation_y))

(92778, 60, 8) 92778
(4827, 60, 8) 4827


## Return Prediction Model

### 1. Model Build

In [0]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM,\
BatchNormalization, Flatten, Activation  # CuDNNLSTM was unused and no longer exists in TensorFlow 2

def ex2_models(input_dim, output_dim):

  # For illustration purposes, I only use a feed-forward network (FNN) here,
  # but you can try any other architecture!
  
  L1 = 50  # 30
  L2 = 30  # 20
  L3 = 20  # 10
  L4 = 10  # 5
  L5 = 5

  model = Sequential()
  model.add(Dense(L1, input_shape=input_dim, activation='relu'))
  model.add(Dropout(0.2))
  model.add(BatchNormalization())

  model.add(Dense(L2, activation='relu'))
  model.add(Dropout(0.2))
  model.add(BatchNormalization())

  model.add(Dense(L3, activation='relu'))
  model.add(Dropout(0.2))
  model.add(BatchNormalization())

  model.add(Dense(L4, activation='relu'))
  model.add(Dropout(0.2))
  model.add(BatchNormalization())

  model.add(Dense(L5, activation='relu'))
  model.add(Dropout(0.2))

  model.add(Flatten())
  model.add(Dense(output_dim))

  model.compile(optimizer=tf.keras.optimizers.Adam(0.001),  # Keras optimizer; tf.train.AdamOptimizer is TF1-only
            loss='mean_squared_error')

  return model

### 2. Model Train

In [0]:
model2 = ex2_models(train_x.shape[1:], 1)

In [32]:
BATCH_SIZE = 64 
NUM_ITERATIONS = 10

model2.fit(train_x, np.array(train_y),  # pass the targets as an array rather than a Python list
              batch_size = BATCH_SIZE,
              epochs = NUM_ITERATIONS)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f7aad282f60>

### 3. Prediction

In [33]:
predictions = model2.predict(validation_x)

# Score model
score = model2.evaluate(validation_x, np.array(validation_y),
                       verbose=0)

print('Test loss:', score) # this is mean_squared_error 

Test loss: 1.897500496257514e-06
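
Short-horizon returns are very small numbers, so their squared errors are tiny for any model that predicts values near zero. A loss on this scale is therefore best read against a simple baseline. The helper below is a hypothetical sketch, on made-up returns, of the MSE obtained by always predicting a return of 0.

```python
import numpy as np

def zero_baseline_mse(returns):
    """MSE of a model that always predicts a return of 0."""
    r = np.asarray(returns, dtype=float)
    return float(np.mean(r ** 2))

toy_returns = [-0.001, 0.002, 0.0, -0.002]   # made-up returns
print(zero_baseline_mse(toy_returns))
```

In the notebook this would be `zero_baseline_mse(validation_y)`; the network adds value only if its test loss is clearly below that figure.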
