# TensorFlow implementation 03: RNN

by [Sho Nakagome](https://github.com/shonaka)

This Jupyter notebook implements a simple RNN (Recurrent Neural Network) model for predicting a sequential time series. For the data, we will use the Bitcoin historical price data available [here](http://api.bitcoincharts.com/v1/csv/).

## Imports

In [1]:
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Display plots inline in the Jupyter notebook
%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 8)



Programming environment: Python 3.6 (Anaconda)

In [2]:
tf.__version__

'1.5.0'

## Define global variables

Let's define some global variables used throughout the code. Don't worry about them for now; each one is explained later in the notebook where it first appears.

In [None]:
# RNN related
NUM_HIDDEN = 32  # Number of hidden units in a hidden layer

# Optimization related
LEARNING_RATE = 1e-3
BATCH_SIZE = 64  # Powers of 2 (2^n) are a common choice for the batch size
NUM_EPOCHS = BATCH_SIZE * 10

## Importing and checking the dataset

We will import the data using pandas. [Pandas](https://pandas.pydata.org/) is a library for handling and processing data structures, such as the .csv file we use in this notebook. To learn more about it, see the website linked above.

In [None]:
# Load the data (the file has no header row, so keep the first line as data)
df = pd.read_csv('data/.coinbaseUSD.csv', header=None)

The .csv file is a little over 150 MB, so loading it may take a while.

Once loading finishes, let's take a quick look at the dataset.

In [None]:
# Show the first n = 10 rows
df.head(10)

As you can see, the data is hard to interpret as is. Let's name the columns and convert the Unix timestamps into dates so that the data is easier to read.

In [None]:
# Adding column labels
df.columns = ['TimeStamp', 'PriceUSD', 'Volume']
df.head() # default n = 5
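The relabeling above suggests the CSV has no header row (each line holds a Unix time, a price, and a volume). If so, `pd.read_csv` with its default `header=0` would use the first trade as column names and drop it from the data. A minimal sketch, on a made-up three-line CSV held in memory, of passing `header=None` and `names=` so every row is kept and the columns are labeled at load time:

```python
import io

import pandas as pd

# Made-up stand-in for the headerless CSV: unix time, price, volume per line
raw = io.StringIO(
    "1451606400,100.0,0.5\n"
    "1451606460,101.5,0.2\n"
    "1451606520,102.0,1.0\n"
)

# header=None keeps the first line as data; names= labels the columns while loading
df_small = pd.read_csv(raw, header=None, names=['TimeStamp', 'PriceUSD', 'Volume'])

print(df_small.shape)  # (3, 3)
print(df_small.head())
```

Applied to the real file, this would read `pd.read_csv('data/.coinbaseUSD.csv', header=None, names=['TimeStamp', 'PriceUSD', 'Volume'])`, which makes the separate `df.columns` assignment unnecessary.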

In [None]:
# Convert the Unix timestamps (in seconds) into datetimes
df['TimeStamp'] = pd.to_datetime(df['TimeStamp'], unit='s')

# Use the converted dates as the index
df.index = df.TimeStamp

# Show the data just for checking
df.head()
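A small sketch of the conversion on made-up timestamps, showing what `unit='s'` does and why the converted column is then used as the index:

```python
import pandas as pd

# Made-up rows with Unix timestamps in seconds: the epoch, +1 day, +1 day and 1 hour
frame = pd.DataFrame({'TimeStamp': [0, 86400, 90000], 'PriceUSD': [1.0, 2.0, 3.0]})

# unit='s' interprets the integers as seconds since 1970-01-01 (UTC)
frame['TimeStamp'] = pd.to_datetime(frame['TimeStamp'], unit='s')

# Using the converted column as the index gives a DatetimeIndex,
# which is what enables time-based operations such as resample()
frame.index = frame['TimeStamp']

print(frame.index)
```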

Now that the data is organized, let's resample it by day, month, and year to see the trends.

In [None]:
# Group by day
df_day = df.resample('D').mean()

# Group by month
df_month = df.resample('M').mean()

# Group by year
df_year = df.resample('Y').mean()
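`resample` works like a `groupby` over time buckets of the DatetimeIndex. A minimal sketch on three made-up trades, using the same daily frequency as `df_day`:

```python
import pandas as pd

# Made-up trades: two on 2016-01-01 and one on 2016-01-02
idx = pd.to_datetime(['2016-01-01 00:00', '2016-01-01 12:00', '2016-01-02 06:00'])
prices = pd.Series([1.0, 3.0, 10.0], index=idx)

# resample('D') puts every row into a calendar-day bucket; mean() averages each bucket
daily = prices.resample('D').mean()
print(daily)
```

`'M'` and `'Y'` work the same way with monthly and yearly buckets. In pandas 2.2 and later these two aliases are deprecated in favor of `'ME'` and `'YE'` (month end, year end) and emit a warning.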

Let's check each of the resampled datasets.

In [None]:
df_day.head(10)

In [None]:
df_month.head(10)

In [None]:
df_year.head(10)

Notice that some days in df_day have no transactions and therefore contain NaNs; we will remove those rows. Also note that df_year has only 5 rows because Bitcoin trading on Coinbase started in 2014. It's remarkable how much popularity and money this cryptocurrency has attracted, considering that it appeared less than 10 years ago.

In [None]:
# Remove the rows with NaNs
df_day = df_day.dropna()
df_day.head(10)
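The NaN rows come from `resample` itself: it builds a continuous daily grid from the first to the last timestamp, and a day with no trades has nothing to average. A small sketch on made-up data:

```python
import pandas as pd

# Made-up trades on Jan 1 and Jan 3 only; there is no trade on Jan 2
idx = pd.to_datetime(['2016-01-01 09:00', '2016-01-03 15:00'])
prices = pd.Series([5.0, 7.0], index=idx)

# The daily grid still contains Jan 2, but its bucket is empty, so its mean is NaN
daily = prices.resample('D').mean()
print(daily)

# dropna() removes the empty days, leaving only days with at least one trade
cleaned = daily.dropna()
print(cleaned)
```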

As a last step in checking the dataset, let's visualize the data, and then move on to predicting Bitcoin prices by day.

In [None]:
# Visualizing the data
fig = plt.figure()

# Day
plt.subplot(311)
plt.plot(df_day.PriceUSD, '-', label='By Days')
plt.title('Bitcoin price from Coinbase in USD', fontsize=24)
plt.legend()  # show the 'By Days' label

# Month
plt.subplot(312)
plt.plot(df_month.PriceUSD, '-', label='By Months')
plt.ylabel('Bitcoin Price in USD', fontsize=18)
plt.legend()

# Year
plt.subplot(313)
plt.plot(df_year.PriceUSD, '-', label='By Years')
plt.xlabel('Time', fontsize=18)
plt.legend()
plt.show()

As the graph shows, the price changes little from 2014 to 2016, so I will only consider data from the beginning of 2016 onward.

In [None]:
# The data we are going to use for prediction (from 2016-01-01 onward)
data = df_day[df_day.index >= '2016-01-01']

# Check the shape of the data that we are using
print('Data shape:', data.shape)
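When a DatetimeIndex is compared with a date string, pandas reads the string as midnight of that day, so `>` drops the first day of 2016 while `>=` keeps it. A small sketch on made-up data, which also shows partial-string indexing with `.loc` as another way to select everything from 2016 onward:

```python
import pandas as pd

# Made-up daily prices around the 2016 boundary
idx = pd.to_datetime(['2015-12-31', '2016-01-01', '2016-01-02'])
frame = pd.DataFrame({'PriceUSD': [1.0, 2.0, 3.0]}, index=idx)

strict = frame[frame.index > '2016-01-01']      # excludes 2016-01-01 itself
inclusive = frame[frame.index >= '2016-01-01']  # keeps 2016-01-01
partial = frame.loc['2016':]                    # partial-string slice: everything from 2016 on

print(len(strict), len(inclusive), len(partial))  # 1 2 2
```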

Let's divide the data into training and test sets, using 90% of the data for training and the rest for testing. Note that the training targets are shifted one sample ahead of the training inputs; in other words, each sample is used to predict the one that follows it.

In [None]:
# Total length of the data set
tot_len = data.shape[0]

# Index where training ends and testing begins (90% / 10% split).
# Built-in int is used because np.int was removed in NumPy 1.24.
split_idx = int(tot_len * 0.9)

# Divide the data into training and testing sets;
# the training targets are the same series shifted one step ahead
train_data = data.PriceUSD[:split_idx]
test_data = data.PriceUSD[split_idx:]
train_target = data.PriceUSD[1:split_idx + 1]

# Just for checking the dimensions
print('Train data shape:', train_data.shape)
print('Test data shape:', test_data.shape)
print('Train target shape:', train_target.shape)
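A sketch of the one-step-ahead pairing on a toy increasing sequence, where each target should simply be its input plus one. It also builds test targets with the same shift; the cell above does not define test targets, so `test_x` and `test_y` here are a hypothetical extension rather than part of the original notebook:

```python
import numpy as np

# Hypothetical price sequence standing in for data.PriceUSD: 100, 101, ..., 119
prices = np.arange(100, 120, dtype=np.float32)

split_idx = int(len(prices) * 0.9)  # 18

# Each input is paired with the value one step later
train_x = prices[:split_idx]           # 100 .. 117
train_y = prices[1:split_idx + 1]      # 101 .. 118

# The same shift applied to the test portion; the last value has no "next" value
test_x = prices[split_idx:-1]          # 118
test_y = prices[split_idx + 1:]        # 119

print(train_x[:3], train_y[:3])
```

Note that the last training target (118 here) is also the first test input, which matches how `train_target` and `test_data` overlap by one sample in the cell above.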

## Define the TensorFlow graph

From here, we will construct the TensorFlow graph of the RNN (Recurrent Neural Network) model.

First, let's define variables and placeholders.

In [None]:
# Defining variables
# W projects the NUM_HIDDEN-dimensional hidden state onto a single output
W = tf.get_variable(name="W", shape=[NUM_HIDDEN, 1], initializer=tf.contrib.layers.xavier_initializer())
# np.random.rand returns float64, so cast the initial bias to float32 to match W, X, and Y
b = tf.get_variable(name="b", initializer=tf.constant(np.random.rand(1, 1), dtype=tf.float32))

# Defining placeholders
# Each time step feeds a single Bitcoin price, and the model predicts the next one
X = tf.placeholder(tf.float32, shape=[None, 1], name="Input_X")
Y = tf.placeholder(tf.float32, shape=[None, 1], name="Target_Y")
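The placeholders expect 2-D `float32` arrays of shape `[batch, 1]`, while `train_data` and `train_target` are 1-D pandas Series. A minimal sketch of the conversion, assuming the arrays are later passed through a TF 1.x `feed_dict` such as `{X: x_batch, Y: y_batch}` (the training loop is not shown in this section, so that usage is an assumption):

```python
import numpy as np
import pandas as pd

# Made-up stand-in for train_data (a 1-D pandas Series of prices)
train_series = pd.Series([1.0, 2.5, 4.0])

# X and Y are declared with shape [None, 1] and dtype float32, so the 1-D values
# are reshaped into a single column and cast before being fed
x_batch = train_series.values.reshape(-1, 1).astype(np.float32)
print(x_batch.shape, x_batch.dtype)  # (3, 1) float32

# NumPy random arrays default to float64, which differs from the float32 placeholders
print(np.random.rand(1, 1).dtype)  # float64
```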

### RNN model

Now that we have defined the variables and placeholders, let's build the RNN model.
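For orientation, a basic (Elman) RNN updates its hidden state as h_t = tanh(x_t W_xh + h_{t-1} W_hh + b_h), which is what TensorFlow 1.x's `tf.nn.rnn_cell.BasicRNNCell` computes with its default tanh activation. The NumPy sketch below runs that recurrence over a sequence and projects the last hidden state through matrices shaped like `W` ([NUM_HIDDEN, 1]) and `b` ([1, 1]). It is an illustration under those assumptions, not the notebook's model: the cell weights `W_xh`, `W_hh`, and `b_h` are hypothetical names, and the TensorFlow version may use a different cell.

```python
import numpy as np

NUM_HIDDEN = 32  # same value as the global defined earlier

rng = np.random.default_rng(0)

# Hypothetical cell parameters (a TensorFlow RNN cell creates its own internally)
W_xh = rng.normal(scale=0.1, size=(1, NUM_HIDDEN))
W_hh = rng.normal(scale=0.1, size=(NUM_HIDDEN, NUM_HIDDEN))
b_h = np.zeros(NUM_HIDDEN)

# Output projection with the same shapes as W and b in the TensorFlow graph
W = rng.normal(scale=0.1, size=(NUM_HIDDEN, 1))
b = rng.random((1, 1))


def rnn_forward(x_seq):
    """x_seq has shape [time_steps, batch, 1]; returns predictions of shape [batch, 1]."""
    batch = x_seq.shape[1]
    h = np.zeros((batch, NUM_HIDDEN))
    for x_t in x_seq:
        # h_t = tanh(x_t W_xh + h_{t-1} W_hh + b_h)
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
    # Prediction from the last hidden state: y = h_T W + b
    return h @ W + b


x_seq = rng.random((5, 4, 1))  # 5 time steps, batch of 4, one price per step
y_pred = rnn_forward(x_seq)
print(y_pred.shape)  # (4, 1)
```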