
# **AUTHOR NAME: SOBIA ALAMGIR**

## **Dataset used: New York Taxi**
## **Model: RNN (Recurrent Neural Network)**

- An **RNN (Recurrent Neural Network)** is a type of neural network designed to process sequential data by remembering past information through loops in its architecture.

  - **Core Idea:** It processes one element at a time while maintaining a hidden state that carries information from previous steps.
  - **Best Use Cases:** Time series, language modeling, speech recognition, and sequential data.
  - **Variants:** Includes LSTMs (Long Short-Term Memory) and GRUs (Gated Recurrent Units) to handle long-term dependencies better.
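
The core idea can be sketched in a few lines of NumPy. This is only an illustration of the recurrence, not the Keras implementation used later in this notebook; the function name `rnn_forward`, the tanh activation, and the layer sizes are assumptions chosen for the example.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Run a plain tanh RNN over a sequence and return every hidden state."""
    hidden = np.zeros(W_h.shape[0])      # h_0: no memory before the first step
    states = []
    for x_t in inputs:                   # one element of the sequence at a time
        # The new state mixes the current input with the previous state
        hidden = np.tanh(W_x @ x_t + W_h @ hidden + b)
        states.append(hidden)
    return np.stack(states)

rng = np.random.default_rng(0)
n_features, n_hidden, n_steps = 1, 4, 10  # illustrative sizes only
W_x = rng.normal(scale=0.5, size=(n_hidden, n_features))
W_h = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

sequence = rng.random((n_steps, n_features))  # 10 time steps, 1 feature each
states = rnn_forward(sequence, W_x, W_h, b)
print(states.shape)  # (10, 4): one hidden vector per time step
```

The same weights `W_x` and `W_h` are reused at every step; only the hidden state changes as the sequence is read.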


## Step-01 Load Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

import tensorflow as tf
from tensorflow import keras

import warnings
warnings.filterwarnings("ignore")

## Step-02 Load Dataset

In [None]:
df = pd.read_csv("/content/drive/MyDrive/BIA class/Deep Learning/RNN/ny_taxi_data.csv")
df.head()

| | id | vendor_id | pickup_datetime | dropoff_datetime | passenger_count | pickup_longitude | pickup_latitude | dropoff_longitude | dropoff_latitude | store_and_fwd_flag |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | id2875421 | 2 | 14-03-2016 17:24 | 14-03-2016 17:32 | 1 | -73.982155 | 40.767937 | -73.96463 | 40.765602 | N |
| 1 | id2377394 | 1 | 12-06-2016 00:43 | 12-06-2016 00:54 | 1 | -73.980415 | 40.738564 | -73.999481 | 40.731152 | N |
| 2 | id3858529 | 2 | 19-01-2016 11:35 | 19-01-2016 12:10 | 1 | -73.979027 | 40.763939 | -74.005333 | 40.710087 | N |
| 3 | id3504673 | 2 | 06-04-2016 19:32 | 06-04-2016 19:39 | 1 | -74.01004 | 40.719971 | -74.012268 | 40.706718 | N |
| 4 | id2181028 | 2 | 26-03-2016 13:30 | 26-03-2016 13:38 | 1 | -73.973053 | 40.793209 | -73.972923 | 40.78252 | N |


## Step-03 Data Preprocessing

In [None]:
df.shape

(16100, 10)

In [None]:
df.columns

Index(['id', 'vendor_id', 'pickup_datetime', 'dropoff_datetime',
       'passenger_count', 'pickup_longitude', 'pickup_latitude',
       'dropoff_longitude', 'dropoff_latitude', 'store_and_fwd_flag'],
      dtype='object')

In [None]:
# Use passenger_count as the single series to model
data = df["passenger_count"].values
data

array([1, 1, 1, ..., 1, 1, 1])

In [None]:
df["passenger_count"].value_counts()

| passenger_count | count |
|---|---|
| 1 | 11371 |
| 2 | 2357 |
| 5 | 866 |
| 3 | 638 |
| 6 | 541 |
| 4 | 327 |


* **Scaled data helps algorithms process and converge faster.**

In [None]:

scaler = MinMaxScaler()  # scale values to [0, 1] to speed up convergence
data = scaler.fit_transform(data.reshape(-1, 1))
data

array([[0.],
       [0.],
       [0.],
       ...,
       [0.],
       [0.],
       [0.]])
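
The zeros in the output above are expected: a passenger count of 1 is the smallest value in the column, so it maps to 0. As a rough sketch of what min-max scaling to the default [0, 1] range does, the formula can be written out by hand. The array `counts` below is an illustrative stand-in for the six distinct passenger counts, not the real column.

```python
import numpy as np

counts = np.array([1, 2, 3, 4, 5, 6], dtype=float)  # the distinct passenger counts

# Min-max scaling to [0, 1]: (x - min) / (max - min)
lo, hi = counts.min(), counts.max()
scaled = (counts - lo) / (hi - lo)
print(scaled)  # 1 maps to 0.0 and 6 maps to 1.0, with the rest evenly spaced

# The inverse transform recovers the original units
restored = scaled * (hi - lo) + lo
print(restored)
```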

In [None]:
len(data)

16100

In [None]:
sequence_length = 10
sequences = []
targets = []

for i in range(len(data) - sequence_length):  # runs 16100 - 10 = 16090 times
  sequences.append(data[i:i+sequence_length])  # input window: rows 0:10, 1:11, 2:12, ... for i = 0, 1, 2, ...
  targets.append(data[i+sequence_length])      # value right after the window: rows 10, 11, 12, ...

  # print(sequences)
  # print(targets)
  # break

sequences = np.array(sequences)
targets = np.array(targets)

print(len(sequences))
print(len(targets))

16090
16090


In [None]:
sequences.shape

(16090, 10, 1)

In [None]:
targets.shape

(16090, 1)
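
To make the windowing concrete, here is the same loop applied to a toy series of six values with a window of three. The helper name `make_windows` and the toy data are assumptions for illustration; the logic mirrors the loop above.

```python
import numpy as np

def make_windows(series, window):
    """Slide a fixed-size window over the series; the target is the next value."""
    n = len(series) - window
    X = np.array([series[i:i + window] for i in range(n)])
    y = np.array([series[i + window] for i in range(n)])
    return X, y

toy = np.arange(6).reshape(-1, 1)   # column vector, like the scaled data
X, y = make_windows(toy, window=3)

print(X.shape, y.shape)      # (3, 3, 1) (3, 1)
print(X[0].ravel(), y[0])    # [0 1 2] [3]
print(X[2].ravel(), y[2])    # [2 3 4] [5]
```

A series of length N with window w yields N - w samples, which is why 16100 rows with a window of 10 give 16090 sequences.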

## Step-04 Train test split

In [None]:
X_train,X_test,y_train,y_test = train_test_split(sequences ,
                                                 targets ,
                                                 test_size = 0.2 ,
                                                 random_state = 1)  # fixed integer seed for a reproducible split

In [None]:
X_train.shape, X_test.shape

((12872, 10, 1), (3218, 10, 1))
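
These split sizes determine the step counts that Keras reports in its progress bars during training and prediction. A small sketch of the arithmetic, assuming a batch size of 32 for both (32 is the value passed to `fit` later, and it is also the Keras default for `predict`):

```python
import math

n_train, n_test = 12872, 3218   # from X_train.shape and X_test.shape
batch_size = 32

# The last batch is smaller than 32, so the count is rounded up
steps_per_epoch = math.ceil(n_train / batch_size)
predict_steps = math.ceil(n_test / batch_size)

print(steps_per_epoch, predict_steps)  # 403 101
```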

In [None]:
X_train.shape[1]

10

## Step-05 Model Selection

In [None]:
# Initialize the sequential model and name it RNN
model_rnn = keras.Sequential(name = "RNN")

# Add a SimpleRNN layer with 100 units and ReLU activation
# input_shape is (time steps, features per step) = (10, 1)
model_rnn.add(keras.layers.SimpleRNN(100, activation="relu", input_shape=(X_train.shape[1], 1)))

model_rnn.add(keras.layers.Dense(1))  # single output value - regression task

# Note: "accuracy" is not a meaningful metric for a regression target; judge training by the MSE loss
model_rnn.compile(optimizer="adam", loss="mean_squared_error", metrics=["accuracy"])

In [None]:
# Initialize a sequential model and name it LSTM
model_lstm = keras.Sequential(name = "LSTM")

model_lstm.add(keras.layers.LSTM(100,activation = "relu" ,
                                 input_shape = (X_train.shape[1],1)))
model_lstm.add(keras.layers.Dense(1))

model_lstm.compile(optimizer = "adam",
                   loss = "mean_squared_error", metrics = ["accuracy"])

In [None]:
# Initialize the sequential model and name it GRU (Gated Recurrent Unit)
model_gru = keras.Sequential(name = "GRU")

model_gru.add(keras.layers.GRU(100,activation = "relu",
                               input_shape = (X_train.shape[1],1)))
model_gru.add(keras.layers.Dense(1))

model_gru.compile(optimizer = "adam" ,
                  loss = "mean_squared_error", metrics = ["accuracy"])
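
The three models differ mainly in how many gates each recurrent cell has, which shows up directly in the parameter count. The sketch below works out the expected counts by hand for 100 units and 1 input feature. It assumes the Keras defaults (`use_bias=True`, and `reset_after=True` for GRU, which gives two bias vectors per gate); the helper `recurrent_params` is hypothetical, and the figures can be checked against `model.summary()`.

```python
def recurrent_params(units, input_dim, gates=1, biases_per_gate=1):
    """Per gate: input kernel + recurrent kernel + bias vector(s)."""
    per_gate = units * input_dim + units * units + biases_per_gate * units
    return gates * per_gate

units, input_dim = 100, 1

simple_rnn = recurrent_params(units, input_dim)                       # one set of weights
lstm = recurrent_params(units, input_dim, gates=4)                    # input, forget, cell, output
gru = recurrent_params(units, input_dim, gates=3, biases_per_gate=2)  # update, reset, candidate
dense = units * 1 + 1                                                 # final Dense(1) layer

print(simple_rnn, lstm, gru, dense)  # 10200 40800 30900 101
```

The larger LSTM and GRU layers are consistent with the longer per-step times seen in their training logs.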


## Step-06 Model Training

* **Model fitting using RNN (Recurrent Neural Network)**

In [None]:
model_rnn.fit(X_train,y_train,
              epochs = 100,
              batch_size = 32 , # Number of samples per iteration
              verbose = 1 # Show a progress bar for each epoch
              )

Epoch 1/100
403/403 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.7153 - loss: 0.0695
Epoch 2/100
403/403 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7129 - loss: 0.0696
Epoch 3/100
403/403 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7104 - loss: 0.0680
Epoch 4/100
403/403 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7069 - loss: 0.0703
Epoch 5/100
403/403 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.7158 - loss: 0.0678
Epoch 6/100
403/403 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7091 - loss: 0.0703
Epoch 7/100
403/403 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7069 - loss: 0.0698
Epoch 8/100
403/403 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7132 - loss: 0.0680
Epoch 9/100
403/403

<keras.src.callbacks.history.History at 0x7e9b78c14220>

* **Model fitting using LSTM (Long Short-Term Memory)**

In [None]:
model_lstm.fit(X_train,y_train,
              epochs = 10,
              batch_size = 32 , # Number of samples per iteration
              verbose = 1 # Show a progress bar for each epoch
              )

Epoch 1/10
403/403 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.7114 - loss: 0.0710
Epoch 2/10
403/403 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.7151 - loss: 0.0690
Epoch 3/10
403/403 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.7130 - loss: 0.0676
Epoch 4/10
403/403 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.7096 - loss: 0.0689
Epoch 5/10
403/403 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.7181 - loss: 0.0688
Epoch 6/10
403/403 ━━━━━━━━━━━━━━━━━━━━ 4s 9ms/step - accuracy: 0.7118 - loss: 0.0681
Epoch 7/10
403/403 ━━━━━━━━━━━━━━━━━━━━ 4s 10ms/step - accuracy: 0.7055 - loss: 0.0689
Epoch 8/10
403/403 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7070 - loss: 0.0702
Epoch 9/10
403/403 ━━━━━━

<keras.src.callbacks.history.History at 0x7e9b75993250>

* **Model fitting using GRU (Gated Recurrent Unit)**

In [None]:
model_gru.fit(X_train, y_train,
              epochs = 10,
              batch_size = 32,
              verbose = 1
              )

Epoch 1/10
403/403 ━━━━━━━━━━━━━━━━━━━━ 6s 10ms/step - accuracy: 0.7124 - loss: 0.0702
Epoch 2/10
403/403 ━━━━━━━━━━━━━━━━━━━━ 5s 11ms/step - accuracy: 0.7092 - loss: 0.0716
Epoch 3/10
403/403 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.7130 - loss: 0.0676
Epoch 4/10
403/403 ━━━━━━━━━━━━━━━━━━━━ 5s 8ms/step - accuracy: 0.7085 - loss: 0.0706
Epoch 5/10
403/403 ━━━━━━━━━━━━━━━━━━━━ 6s 11ms/step - accuracy: 0.7111 - loss: 0.0703
Epoch 6/10
403/403 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.7138 - loss: 0.0666
Epoch 7/10
403/403 ━━━━━━━━━━━━━━━━━━━━ 4s 10ms/step - accuracy: 0.7109 - loss: 0.0707
Epoch 8/10
403/403 ━━━━━━━━━━━━━━━━━━━━ 7s 16ms/step - accuracy: 0.7161 - loss: 0.0688
Epoch 9/10
403/403 ━━━

<keras.src.callbacks.history.History at 0x7e9b5fde5d20>

## Step-07 Model Evaluation

In [None]:
def evaluate_model(model, X_test, y_test):
  # Ensure the input has shape (samples, time steps, features)
  X_test_reshaped = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
  y_pred = model.predict(X_test_reshaped)

  # Inverse-transform the predictions and true values back to the original scale
  y_pred = scaler.inverse_transform(y_pred).flatten()
  y_test = scaler.inverse_transform(y_test.reshape(-1, 1)).flatten()

  # Mean squared error in original units (passengers squared)
  mse = mean_squared_error(y_test, y_pred)
  return mse

In [None]:
mse_rnn = evaluate_model(model_rnn, X_test, y_test)
mse_lstm = evaluate_model(model_lstm, X_test, y_test)
mse_gru = evaluate_model(model_gru, X_test, y_test)

print(f"RNN Mean Squared Error: {mse_rnn}")
print(f"LSTM Mean Squared Error: {mse_lstm}")
print(f"GRU Mean Squared Error: {mse_gru}")

101/101 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
101/101 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
101/101 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
RNN Mean Squared Error: 2.11844393888646
LSTM Mean Squared Error: 1.7871085926177819
GRU Mean Squared Error: 1.7856346453873795
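
One way to put these numbers in context is to compare them with a naive baseline that always predicts the mean passenger count; its MSE equals the variance of the target. The sketch below computes that variance from the `value_counts()` table shown earlier. It uses the full dataset rather than the test split, so it is only an approximate reference point, and the variable names are illustrative.

```python
# Passenger-count frequencies as printed by value_counts() earlier
counts = {1: 11371, 2: 2357, 3: 638, 4: 327, 5: 866, 6: 541}

n = sum(counts.values())
mean = sum(value * freq for value, freq in counts.items()) / n

# MSE of a constant mean prediction = variance of the target
variance = sum(freq * (value - mean) ** 2 for value, freq in counts.items()) / n

# MinMaxScaler divides by (max - min) = 5, so squared errors shrink by 5 ** 2
scaled_variance = variance / (max(counts) - min(counts)) ** 2

print(n)                                  # 16100 rows, matching df.shape
print(round(mean, 2), round(variance, 2)) # roughly 1.67 and 1.74
print(round(scaled_variance, 3))          # roughly 0.07, on the scaled data
```

The baseline of about 1.74 is close to the LSTM and GRU scores, and its scaled value of about 0.07 is close to the training loss in the logs. This suggests, without proving it, that the models are predicting values near the mean rather than picking up sequential structure.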
