# In this notebook, a simple RNN (a GRU layer) is used to predict the closing price of Bitcoin

* Any feedback is welcome!!

Reference: Deep Learning with Python (some lines of code are adapted from this book)
1. https://www.manning.com/books/deep-learning-with-python

# Please refer to the kernel below for data exploration
https://www.kaggle.com/kentata/time-series-data-exploration

In [None]:
# Import the required Python libraries
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns; sns.set()
# Input data files are available in the "../input/" directory.


from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))

# Any results you write to the current directory are saved as output.


In [None]:
# Load the data
train = pd.read_csv("../input/bitcoin_price_Training - Training.csv")
test = pd.read_csv("../input/bitcoin_price_1week_Test - Test.csv")


In [None]:
# Prepare the data: reverse the row order of both frames
train = train[::-1]
test = test[::-1]

In [None]:
# Display the first rows
train.head()

In [None]:
train = train['Close'].values.astype('float32')
test = test['Close'].values.astype('float32')

# Make sure to standardize the data before feeding it to neural nets
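
As a minimal sketch of what standardization does (plain Python, with made-up prices rather than the notebook's data): the mean and standard deviation are computed from the training values only and then reused for the test values, which mirrors calling `fit_transform` on the training set and `transform` on the test set. The helper names `fit_stats` and `standardize` are hypothetical.

```python
from statistics import fmean, pstdev

def fit_stats(values):
    """Return (mean, population standard deviation) of the training values."""
    return fmean(values), pstdev(values)

def standardize(values, mean, std):
    """Rescale values to z-scores using statistics computed elsewhere."""
    return [(v - mean) / std for v in values]

# Made-up prices, for illustration only
toy_train = [100.0, 110.0, 120.0, 130.0, 140.0]
toy_test = [150.0, 160.0]

mean, std = fit_stats(toy_train)
toy_train_z = standardize(toy_train, mean, std)
toy_test_z = standardize(toy_test, mean, std)  # reuse the training statistics

print(toy_train_z)
print(toy_test_z)
```

The standardized training values have zero mean and unit standard deviation, while test values that lie above the training range end up well above 1.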

In [None]:
from sklearn.preprocessing import StandardScaler
# Import the preprocessing tool

In [None]:
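# Reshape the data into a single column, as the scaler expects 2-D input
# Standardize the data to zero mean and unit variance (values are not confined to [-1, 1])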
train = train.reshape(-1, 1)
test = test.reshape(-1, 1)

scaler = StandardScaler()
train_n = scaler.fit_transform(train)
test_n = scaler.transform(test)

In [None]:
print(train_n.shape)
print(test_n.shape)
# Show the shapes of the scaled arrays

In [None]:
def generator(data, lookback, delay, min_index, max_index, 
              shuffle=False, batch_size=128, step=1):
    """Yield (samples, targets) batches of lookback windows indefinitely."""
    if max_index is None:
        max_index = len(data) - delay - 1
    i = min_index + lookback
    while True:
        if shuffle:
            # Draw random target rows from the allowed range
            rows = np.random.randint(min_index + lookback, max_index, size=batch_size)
        else:
            # Walk through the rows in order, wrapping around at the end
            if i + batch_size >= max_index:
                i = min_index + lookback
            rows = np.arange(i, min(i + batch_size, max_index))
            i += len(rows)
        samples = np.zeros((len(rows), lookback // step, data.shape[-1]))
        targets = np.zeros((len(rows),))
        for j, row in enumerate(rows):
            # Input window: the `lookback` points before `row`
            indices = range(row - lookback, row, step)
            samples[j] = data[indices]
            # Target: the (single-feature) value `delay` points after `row`
            targets[j] = data[row + delay, 0]
        yield samples, targets

**Hyperparameters:**

lookback: how many past points (days) are used as input to predict a future value

delay: how many points (days) ahead of the input window the predicted value lies

Ex: In this example, the prices of the previous 24 days are used as input to predict the price 7 days ahead

The reasons I chose these values are:

lookback: 24 is the value chosen by an autoregressive model (https://www.kaggle.com/kentata/time-series-data-exploration)

delay: the test data has 7 points (daily values), so predicted and true values can be compared
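
The indexing behind `lookback` and `delay` can be sketched in plain Python. This mirrors the arithmetic in `generator` above (inputs `data[row - lookback:row]`, target `data[row + delay]`) on a made-up series with smaller values than the notebook uses; `window_and_target` is a hypothetical helper.

```python
def window_and_target(data, row, lookback, delay, step=1):
    """Return the input window ending just before `row` and the target `delay` points after `row`."""
    window = data[row - lookback:row:step]
    target = data[row + delay]
    return window, target

# Made-up series: each value equals its index, so positions are easy to read off
series = list(range(20))

window, target = window_and_target(series, row=5, lookback=3, delay=2)
print(window)  # → [2, 3, 4]
print(target)  # → 7
```

Note that the target sits `delay + 1` positions after the last input point, and that each sample has a single target value.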


In [None]:
lookback = 24
step = 1
delay = 7
batch_size = 128
train_gen = generator(train_n, lookback=lookback, delay=delay,
                      min_index=0, max_index=1000, shuffle=True, step=step,
                      batch_size=batch_size)
val_gen = generator(train_n, lookback=lookback, delay=delay,
                    min_index=1001, max_index=None, step=step, batch_size=batch_size)
test_gen = generator(test_n, lookback=lookback, delay=delay,
                     min_index=0, max_index=None, step=step, batch_size=batch_size)
# This is how many steps to draw from `val_gen` in order to see the whole validation set:
val_steps = (len(train_n) - 1001 - lookback) // batch_size
# This is how many steps to draw from `test_gen` in order to see the whole test set
# (note: this is negative when the test set has fewer points than `lookback`):
test_steps = (len(test_n) - lookback) // batch_size
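
To check the step counts, the sequential branch of `generator` (`shuffle=False`) can be simulated without any data. The sketch below counts the full batches produced before the index wraps around; `batches_per_pass` is a hypothetical helper, and the series length of 1500 is an assumed value, not the actual length of the training set.

```python
def batches_per_pass(n_points, min_index, lookback, delay, batch_size):
    """Count the full batches the sequential branch yields before its index wraps."""
    max_index = n_points - delay - 1
    i = min_index + lookback
    count = 0
    # The generator resets its index once i + batch_size reaches max_index
    while i + batch_size < max_index:
        count += 1
        i += batch_size
    return count

# Assumed training length of 1500 points, validation part starting at index 1001
print(batches_per_pass(1500, min_index=1001, lookback=24, delay=7, batch_size=128))  # → 3

# A 7-point test set with lookback=24 leaves no room for a single window
print(batches_per_pass(7, min_index=0, lookback=24, delay=7, batch_size=128))  # → 0
```

With 7 test points and `lookback = 24` there are no full batches, so `test_gen` cannot produce usable samples with these settings.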

In [None]:
# Reproducibility: seed NumPy's random number generator (used by the shuffling generator).
# The Keras backend keeps its own random state, so results may still vary between runs.
np.random.seed(7)

In [None]:
# Import Keras (a deep learning package available with TensorFlow)
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop

In [None]:
# Define the network layers and their parameters; for details see https://keras.io/ja/layers/recurrent/
model = Sequential()
model.add(layers.GRU(32,
                     dropout=0.2,
                     recurrent_dropout=0.2,
                     input_shape=(None, train_n.shape[-1])))
model.add(layers.Dense(1))
model.compile(optimizer=RMSprop(), loss='mae')
model.summary()
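
As a rough cross-check of the parameter counts that `model.summary()` reports, the sketch below reproduces the arithmetic by hand. It assumes the usual GRU layout of three gate blocks, and the total depends on the layer's `reset_after` setting, whose default differs between Keras versions; `gru_params` and `dense_params` are hypothetical helpers.

```python
def gru_params(units, input_dim, reset_after):
    """Trainable parameters of a GRU layer: 3 gate blocks of kernel, recurrent kernel and bias."""
    bias_per_gate = 2 * units if reset_after else units
    return 3 * (input_dim * units + units * units + bias_per_gate)

def dense_params(units, input_dim):
    """Trainable parameters of a Dense layer: weights plus one bias per unit."""
    return input_dim * units + units

print(gru_params(32, 1, reset_after=True))   # → 3360
print(gru_params(32, 1, reset_after=False))  # → 3264
print(dense_params(1, 32))                   # → 33
```

Either way the model has only a few thousand parameters, almost all of them in the recurrent layer.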

In [None]:
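# Set the amount of training; steps_per_epoch=100 and epochs=20 are values worth experimenting with
# The original values were 500 and 100, reduced here because training took a long time
# Note: newer Keras versions deprecate fit_generator; model.fit accepts generators there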
history = model.fit_generator(train_gen,
                              steps_per_epoch=100,
                              epochs=20,
                              validation_data=val_gen,
                              validation_steps=val_steps)

In [None]:
# Visualize how the training and validation loss evolve over the epochs
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(loss))
plt.figure()
plt.plot(epochs, loss, 'blue', label='train loss')
plt.plot(epochs, val_loss, 'orange', label='validation loss')
plt.title('Training and validation loss')
plt.legend()

Important to note: the model is overfitting. Further hyperparameter tuning is necessary (e.g. stronger dropout, regularization)
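
One way to make the overfitting concrete is to locate the epoch with the lowest validation loss and compare it with the final epoch. The sketch below does this on made-up loss values (not results from this notebook); with real training, the lists would come from `history.history['loss']` and `history.history['val_loss']`. `best_epoch` is a hypothetical helper.

```python
def best_epoch(val_losses):
    """Return (index, value) of the lowest validation loss."""
    index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return index, val_losses[index]

# Made-up curves: training loss keeps falling while validation loss turns upward
toy_loss = [0.50, 0.40, 0.32, 0.27, 0.23, 0.20]
toy_val_loss = [0.55, 0.47, 0.44, 0.46, 0.50, 0.56]

epoch, value = best_epoch(toy_val_loss)
gap = toy_val_loss[-1] - toy_loss[-1]
print(epoch, value)   # → 2 0.44
print(round(gap, 2))  # → 0.36
```

Training beyond that epoch lowers the training loss without improving the validation loss; stopping there (for instance with Keras's `EarlyStopping` callback) is a common complement to dropout and regularization.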

In [None]:
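# Reshape to (samples, 1 time step, 1 feature) and predict on the training data
# (note: each sample is a single point here, not a lookback-long window as in training)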
train_re = train_n.reshape(-1,1,1)
pred = model.predict(train_re)

Since we scaled the data, the predictions must be transformed back to the original units before plotting

In [None]:

pred = scaler.inverse_transform(pred)
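
The inverse transform simply undoes the standardization, `x = z * std + mean`. A side effect worth knowing is that a mean absolute error measured in standardized units (such as the `mae` loss above) converts to original units by multiplying by the standard deviation. The plain-Python sketch below checks both on made-up numbers; the mean and standard deviation shown are assumed values, not those of the fitted `scaler`.

```python
def inverse_standardize(z_values, mean, std):
    """Map z-scores back to the original units."""
    return [z * std + mean for z in z_values]

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Assumed statistics and made-up standardized values, for illustration only
mean, std = 500.0, 200.0
z_true = [-1.0, 0.0, 1.5]
z_pred = [-0.5, 0.25, 1.0]

x_true = inverse_standardize(z_true, mean, std)
x_pred = inverse_standardize(z_pred, mean, std)

mae_scaled = mean_absolute_error(z_true, z_pred)
mae_original = mean_absolute_error(x_true, x_pred)
print(x_true)  # → [300.0, 500.0, 800.0]
print(mae_scaled)
print(mae_original)
```

Here `mae_original` equals `mae_scaled * std`, so a loss value read off the training log can be converted to price units without re-running the predictions.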

In [None]:
# Plot the actual and predicted values
plt.plot(range(len(train_re)), train, label='train')
plt.plot(range(len(train_re)), pred, label='prediction')
plt.legend()

plt.title("Prediction on training data")

# Important to note: the prediction does not capture the steep price increase from around day 1400 (it performs well until then). This might be due to the abrupt change, which is hard to predict
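
A simple sanity check for any such forecast, not part of the original notebook, is a persistence baseline that predicts the price `delay` days ahead to be the same as today's price. The sketch below computes its mean absolute error on a made-up series; `persistence_mae` is a hypothetical helper.

```python
def persistence_mae(series, delay):
    """MAE of predicting series[t + delay] with series[t]."""
    errors = [abs(series[t + delay] - series[t]) for t in range(len(series) - delay)]
    return sum(errors) / len(errors)

# Made-up series: nearly flat at first, then a steep rise
toy_prices = [10.0, 10.0, 11.0, 10.0, 11.0, 15.0, 22.0, 30.0]

print(persistence_mae(toy_prices[:5], delay=1))  # nearly flat segment only
print(persistence_mae(toy_prices, delay=1))      # including the steep rise
```

The baseline error grows sharply once the steep rise is included, which illustrates why abrupt changes are hard for any method that leans on recent values. If a model's error is not clearly lower than this baseline, it has learned little beyond copying recent prices.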
