* [Sequence Models](#1)
* [Recurrent Neural Network (RNN)](#2)
* [Implementing Recurrent Neural Network with Keras](#3)
    * [Loading and Preprocessing Data](#31)
    * [Create RNN Model](#32)
    * [Predictions and Visualising RNN Model](#33)
* [Long Short Term Memory (LSTMs)](#4)
* [Implementing Long Short Term Memory with Keras](#99)
    * [Loading and Visualizing Data](#41)
    * [Preprocessing Data](#42)
    * [Create LSTM Model](#43)
    * [Predictions and Visualising LSTM Model](#44)

## Sequence Models
* Sequence models work with data that unfolds over time.
* Typical applications: speech recognition, natural language processing (NLP), music generation.
* Apple's Siri and Google's voice search are well-known examples.
* Sentiment classification, for example telling apart "this is the best course in the world" from "what a nonsense course you recorded, teacher".

## Recurrent Neural Network
* RNNs can remember important things about the input they have received, which helps them predict what comes next.
* That is why they are a common choice for sequential data such as time series, speech, text, financial data, audio, video and weather: they can capture more of a sequence and its context than models without memory.
* An RNN does not only produce an output; it also feeds that output back into itself, because it has internal memory.
* Temporal loop: the network feeds itself.
* RNNs have short-term memory: they remember what happened at the previous node, i.e. the past.
* Why does remembering the past matter? We learn from what we have done, and we build new knowledge on top of what we learned before. You can think of an RNN in the same way, as in the movie example.
* Let's look at some example RNN structures.
* One to Many
    * The input is an image and the output is a sentence describing it, e.g. "A man is surfing".
* Many to One
    * The input is a sentence and the output is a sentiment, e.g. optimistic or cheerful.
* Many to Many
    * For example, translating a sentence from English to Turkish with Google Translate.
* An RNN has short-term memory, whereas an LSTM can also have long-term memory.
* As noted above, what separates an RNN from an ANN or a CNN is *memory*. Say we have the string "DATAI" and we have reached the 4th letter, "A". If we ask an ANN what the 5th letter of a word whose 4th letter is "A" might be, it cannot tell: without memory it does not know the earlier letters "DAT", so it cannot combine them with "A" and conclude that the 5th letter could be "I". An RNN can do exactly that.
* Exploding Gradients: the gradient becomes very large, which gives certain weights far more importance than they deserve.
* Vanishing Gradients: the gradient becomes very small, so the network learns slowly.
* Recall what the gradient is: the change in the cost with respect to the weights.
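
The internal memory and the gradient problems described above can be sketched with a one-unit RNN in plain Python. This is only an illustrative sketch: the weights `w_x`, `w_h`, `b` are made-up values rather than trained parameters, and `rnn_forward` and `backprop_factor` are hypothetical helper names, not part of Keras.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit RNN over a sequence and return every hidden state."""
    h = 0.0  # initial hidden state: no memory yet
    states = []
    for x in inputs:
        # the new state mixes the current input with the previous state
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Same final input (1.0) but a different history: the final states differ,
# which is the "memory" a plain feed-forward network lacks.
states_a = rnn_forward([0.0, 0.0, 1.0])
states_b = rnn_forward([1.0, 1.0, 1.0])
print(states_a[-1], states_b[-1])

def backprop_factor(w_h, steps):
    """Rough gradient scale after `steps` time steps (tanh derivative ignored)."""
    return w_h ** steps

print(backprop_factor(0.5, 50))  # shrinks toward zero: vanishing gradient
print(backprop_factor(1.5, 50))  # grows very large: exploding gradient
```

The second part is a deliberate simplification: it only shows that multiplying by the same recurrent weight once per time step either shrinks or inflates the gradient.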

## Implementing Recurrent Neural Network with Keras
* [Loading and Preprocessing Data](#31)
* [Create RNN Model](#32)
* [Predictions and Visualising RNN Model](#33)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import math
import warnings
warnings.filterwarnings('ignore')
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import SimpleRNN
from keras.layers import Dropout
from keras.layers import LSTM
from sklearn.metrics import mean_squared_error

## 1-Read Data and Preprocessing

In [None]:
dataset_train = pd.read_csv('data/Stock_Price_Train.csv')

In [None]:
dataset_train.head()

### NOTE
Only the Open feature of this dataset is used. The goal is to learn how RNNs work.

In [None]:
train = dataset_train.loc[:, ["Open"]].to_numpy()
train

In [None]:
# Feature Scaling
scaler = MinMaxScaler(feature_range = (0, 1)) # min-max normalization
train_scaled = scaler.fit_transform(train)
train_scaled
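
As a rough illustration of what min-max scaling to the range (0, 1) does, each value is mapped to `(x - min) / (max - min)`, and the inverse mapping multiplies back and adds the minimum. The sketch below uses plain Python and made-up prices; `min_max_scale` and `min_max_invert` are hypothetical helpers, not the scikit-learn API.

```python
def min_max_scale(values):
    """Scale values to [0, 1]; also return the min and max needed to invert."""
    lo, hi = min(values), max(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled, lo, hi

def min_max_invert(scaled, lo, hi):
    """Map scaled values back to the original range."""
    return [s * (hi - lo) + lo for s in scaled]

prices = [300.0, 310.0, 305.0, 320.0]  # made-up values for illustration
scaled, lo, hi = min_max_scale(prices)
restored = min_max_invert(scaled, lo, hi)
print(scaled)
print(restored)
```

The minimum and maximum come from the data the scaler was fitted on, which is why the same fitted `scaler` object is reused later for the test inputs and for converting predictions back to prices.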

In [None]:
plt.plot(train_scaled)
plt.show()

### NOTE
Here we take 50 samples as the input (X_train) and predict the next index, i.e. the 51st value. Then we shift the window forward by one index and repeat the same step. For example, with 3 samples:

<img src='img/timestep.png'>
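
The shifting window can be sketched on a toy series with 3 time steps. This is a minimal plain-Python version under assumed toy data; `make_windows` is a hypothetical helper name.

```python
def make_windows(series, timesteps):
    """Split a series into (input window, next value) pairs."""
    X, y = [], []
    for i in range(timesteps, len(series)):
        X.append(series[i - timesteps:i])  # the previous `timesteps` values
        y.append(series[i])                # the value to predict
    return X, y

series = [10, 20, 30, 40, 50, 60]  # toy data
X, y = make_windows(series, timesteps=3)
for window, target in zip(X, y):
    print(window, "->", target)
```

A series of length 6 with 3 time steps yields 3 training pairs: every window after the first reuses two values from the one before it.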

In [None]:
# Creating a data structure with 50 timesteps and 1 output
X_train = []
y_train = []
timesteps = 50
for i in range(timesteps, len(train_scaled)):  # use the data length instead of a hard-coded row count
    X_train.append(train_scaled[i-timesteps:i, 0])
    y_train.append(train_scaled[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)

In [None]:
# Reshaping
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_train

In [None]:
y_train

## 2-Create RNN Model and Training Data

In [None]:
# Initialising the RNN
regressor = Sequential()

# Adding the first RNN layer and some Dropout regularisation
regressor.add(SimpleRNN(units = 50, activation='tanh', return_sequences = True, input_shape = (X_train.shape[1], 1)))
regressor.add(Dropout(0.2))

# Adding a second RNN layer and some Dropout regularisation
regressor.add(SimpleRNN(units = 50, activation='tanh', return_sequences = True))
regressor.add(Dropout(0.2))

# Adding a third RNN layer and some Dropout regularisation
regressor.add(SimpleRNN(units = 50, activation='tanh', return_sequences = True))
regressor.add(Dropout(0.2))

# Adding a fourth RNN layer and some Dropout regularisation
regressor.add(SimpleRNN(units = 50))
regressor.add(Dropout(0.2))

# Adding the output layer
regressor.add(Dense(units = 1))

# Compiling the RNN
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fitting the RNN to the Training set
regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)
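
To get a feel for the size of this stack, the number of trainable parameters can be estimated by hand. A simple recurrent layer with `units` units and `input_dim` input features has `units * (input_dim + units + 1)` parameters (input weights, recurrent weights and one bias per unit); Dropout layers add none. The helper names below are hypothetical, and in the notebook `regressor.summary()` reports the same kind of information.

```python
def simple_rnn_params(units, input_dim):
    # input weights + recurrent weights + one bias per unit
    return units * (input_dim + units + 1)

def dense_params(units, input_dim):
    # one weight per input plus one bias per unit
    return units * (input_dim + 1)

layers = [
    simple_rnn_params(50, 1),   # first layer sees 1 feature per time step
    simple_rnn_params(50, 50),  # later layers see the 50 outputs of the previous one
    simple_rnn_params(50, 50),
    simple_rnn_params(50, 50),
    dense_params(1, 50),        # output layer
]
total = sum(layers)
print(layers, total)
```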

## 3-Predict (Test) Data and Visualising RNN Model

In [None]:
# Getting the real stock price of 2017
dataset_test = pd.read_csv('data/Stock_Price_Test.csv')
dataset_test.head()

In [None]:
real_stock_price = dataset_test.loc[:, ["Open"]].to_numpy()
real_stock_price

In [None]:
# Getting the predicted stock price of 2017
dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis = 0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - timesteps:].values.reshape(-1,1)
inputs = scaler.transform(inputs)  # min max scaler
inputs

In [None]:
# Test Data
X_test = []
for i in range(timesteps, len(inputs)):  # one window per test row, without a hard-coded length
    X_test.append(inputs[i-timesteps:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
predicted_stock_price = regressor.predict(X_test)
predicted_stock_price = scaler.inverse_transform(predicted_stock_price)  # convert the scaled predictions back to actual prices

In [None]:
# Visualising the results
plt.plot(real_stock_price, color = 'red', label = 'Real Google Stock Price')
plt.plot(predicted_stock_price, color = 'blue', label = 'Predicted Google Stock Price')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()
# Training with epochs = 250 gives a better result.

## Long Short Term Memory (LSTMs)
* An LSTM is a variant of the RNN.
* Unlike a plain RNN, an LSTM also has long-term memory.
* LSTM architecture:
    * x: scaling of information
    * +: adding information
    * sigmoid layer: decides how much of something in memory is remembered or forgotten. Its output lies between 0 (forget) and 1 (remember).
    * tanh: the tanh activation function. It helps with the vanishing gradient problem (slow learning caused by very small gradients): parameters are updated using derivatives, and the derivative of tanh does not drop to zero right away.
    * h(t-1): output of the previous LSTM unit
    * c(t-1): memory from the previous LSTM unit
    * X(t): input
    * c(t): new updated memory
    * h(t): output
    * The path from c(t-1) to c(t) is the memory pipeline, or simply the memory.
    * The arrows are vectors.
    * h(t-1) and X(t) do not merge; you can think of them as two parallel paths.
* <img src="img/lstm.jpg">
* 1) Forget gate: takes X(t) and h(t-1) as input and decides whether the incoming information should be forgotten.
* 2) Input gate: decides which information is stored in memory.
* 3) Output gate: decides which information becomes the output.
* For example:
    * ... "Boys are watching TV"
    * "On the other hand girls are playing baseball."
    * Forget "boys". new input is "girls" and output is "girls"
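
The three gates can be sketched as one step of a single-unit LSTM in plain Python. This follows the commonly used LSTM formulation and is an illustration only: the weights are made-up values, and `lstm_step` and its `w` dictionary are hypothetical names, not the Keras implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, w):
    """One step of a single-unit LSTM. `w` maps a gate name to (w_x, w_h, b)."""
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x_t + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # forget gate: how much old memory to keep
    i = sigmoid(pre("input"))         # input gate: how much new information to store
    o = sigmoid(pre("output"))        # output gate: how much memory to expose
    g = math.tanh(pre("candidate"))   # candidate new information
    c_t = f * c_prev + i * g          # "x" scales information, "+" adds it
    h_t = o * math.tanh(c_t)          # new output
    return h_t, c_t

gate_names = ("forget", "input", "output", "candidate")
weights = {name: (0.5, 0.5, 0.0) for name in gate_names}  # made-up weights

h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, weights)
    print(round(h, 4), round(c, 4))

# A strongly negative forget bias wipes the old memory,
# a strongly positive one carries it through unchanged.
wipe = dict(weights, forget=(0.0, 0.0, -100.0))
keep = dict(weights, forget=(0.0, 0.0, 100.0))
_, c_wiped = lstm_step(0.0, 0.0, 5.0, wipe)
_, c_kept = lstm_step(0.0, 0.0, 5.0, keep)
print(c_wiped, c_kept)
```

In the "boys" and "girls" example, the wiped case corresponds to forgetting "boys", while the input gate decides how much of "girls" is written into the memory.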

<a id="99"></a>
## Implementing Long Short Term Memory with Keras
* [Loading and Visualizing Data](#41)
* [Preprocessing Data](#42)
* [Create LSTM Model](#43)
* [Predictions and Visualising LSTM Model](#44)

## 1-Read Dataset

In [None]:
data = pd.read_csv("data/passengers.csv")
data.head()  # first row: 112 passengers flew on international airlines in the first month

In [None]:
dataset = data.iloc[:, 1].to_numpy()
plt.plot(dataset)
plt.xlabel("time")
plt.ylabel("Number of Passengers")
plt.title("international airline passenger")
plt.show()

## 2-Preprocessing Data
* reshape
* change type
* scaling
* train test split
* Create dataset

In [None]:
dataset = dataset.reshape(-1, 1) # (142,) => ( 142,1)
dataset = dataset.astype("float32")
dataset.shape

In [None]:
# scaling 
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)

In [None]:
train_size = int(len(dataset) * 0.50)
test_size = len(dataset) - train_size
train = dataset[0:train_size,:]
test = dataset[train_size:len(dataset),:]
print("train size: {}, test size: {} ".format(len(train), len(test)))

In [None]:
time_stemp = 10
dataX = []
dataY = []
for i in range(len(train)-time_stemp-1):
    a = train[i:(i+time_stemp), 0]
    dataX.append(a)
    dataY.append(train[i + time_stemp, 0])
trainX = np.array(dataX)
trainY = np.array(dataY)  


In [None]:
dataX = []
dataY = []
for i in range(len(test)-time_stemp-1):
    a = test[i:(i+time_stemp), 0]
    dataX.append(a)
    dataY.append(test[i + time_stemp, 0])
testX = np.array(dataX)
testY = np.array(dataY)  

In [None]:
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

## 3-Create LSTM Model

In [None]:
# model
model = Sequential()
model.add(LSTM(10, input_shape=(1, time_stemp))) # 10 lstm neuron(block)
model.add(Dense(1))  # no explicit activation is set here; the output stays linear for regression
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=50, batch_size=1)

## 4-Predictions and Visualising LSTM Model

In [None]:
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# invert predictions
trainPredict = scaler.inverse_transform(trainPredict) # original value
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))
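
The score printed above is the root mean squared error: the square root of the average squared difference between actual and predicted values. Because the predictions were converted back with the scaler first, the error is in the original units (passengers). Below is a minimal plain-Python sketch with made-up numbers; `rmse` is a hypothetical helper.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

actual = [112.0, 118.0, 132.0, 129.0]     # made-up values for illustration
predicted = [110.0, 120.0, 130.0, 133.0]  # made-up values for illustration
print('%.2f RMSE' % rmse(actual, predicted))
```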

In [None]:
# shifting train
trainPredictPlot = np.empty_like(dataset)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[time_stemp:len(trainPredict)+time_stemp, :] = trainPredict
# shifting test predictions for plotting
testPredictPlot = np.empty_like(dataset)
testPredictPlot[:, :] = np.nan
testPredictPlot[len(trainPredict)+(time_stemp*2)+1:len(dataset)-1, :] = testPredict
# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
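
The slice offsets used for plotting follow from how the windows were built: each split loses `time_stemp + 1` rows, and the first test prediction belongs to row `train_size + time_stemp` of the full dataset. The sketch below recomputes the slices in plain Python, assuming 142 rows as in the reshape comment earlier; `plot_slices` is a hypothetical helper.

```python
def plot_slices(n_total, time_step, train_fraction=0.5):
    """Return the (start, stop) row ranges where predictions are placed."""
    train_size = int(n_total * train_fraction)
    test_size = n_total - train_size
    n_train_windows = train_size - time_step - 1
    n_test_windows = test_size - time_step - 1
    train_slice = (time_step, n_train_windows + time_step)
    test_slice = (n_train_windows + 2 * time_step + 1, n_total - 1)
    return train_slice, test_slice, n_train_windows, n_test_windows

train_slice, test_slice, n_train, n_test = plot_slices(n_total=142, time_step=10)
print(train_slice, test_slice)
print(n_train, n_test)
```

Each slice is exactly as long as the corresponding prediction array, which is what allows the assignment into the NaN-filled plot arrays to succeed.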