
# Neural Networks

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.preprocessing import StandardScaler

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2
from tensorflow.keras.layers import Dropout
from tensorflow.keras.callbacks import EarlyStopping

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

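import random
import tensorflow as tf

# random.seed alone does not affect NumPy or TensorFlow, so seed all three
# to make weight initialization, shuffling, and dropout more repeatable
random.seed(222)
np.random.seed(222)
tf.random.set_seed(222)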

In [None]:
X_train = pd.read_csv("https://media.githubusercontent.com/media/myacarrizosa/Airbnb-Rating-Prediction/main/data/X_train_r.csv")
X_test = pd.read_csv("https://media.githubusercontent.com/media/myacarrizosa/Airbnb-Rating-Prediction/main/data/X_test_r.csv")
y_train = pd.read_csv("https://media.githubusercontent.com/media/myacarrizosa/Airbnb-Rating-Prediction/main/data/y_train_r.csv")
y_test = pd.read_csv("https://media.githubusercontent.com/media/myacarrizosa/Airbnb-Rating-Prediction/main/data/y_test_r.csv")

In [None]:
X_train = X_train['lem_comments']
X_test = X_test['lem_comments']

y_train = y_train['type']
y_test = y_test['type']

In [None]:
nn_y_train = y_train.values
nn_y_test = y_test.values

In [None]:
num_words = 2000
oov_token = '<UNK>'
pad_type = 'post'
trunc_type = 'post'

In [None]:
tokenizer = Tokenizer(num_words=num_words, oov_token=oov_token)
tokenizer.fit_on_texts(X_train)

In [None]:
train_sequences = tokenizer.texts_to_sequences(X_train)
maxlen = max([len(x) for x in train_sequences])
train_padded = pad_sequences(train_sequences, padding=pad_type, truncating=trunc_type, maxlen=maxlen)

In [None]:
test_sequences = tokenizer.texts_to_sequences(X_test)
test_padded = pad_sequences(test_sequences, padding=pad_type, truncating=trunc_type, maxlen=maxlen)

In [None]:
train_padded.shape, test_padded.shape

((24510, 1004), (8171, 1004))

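We ran the `lem_comments` text through the TensorFlow Keras `Tokenizer` to turn each comment into a sequence of integer word indices that the neural networks can take as input. The vocabulary is capped with `num_words = 2000`, words outside it are mapped to the `<UNK>` token, and every sequence is zero-padded at the end to 1,004 tokens, the length of the longest training sequence (longer test sequences are truncated at the end).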

Code adapted from: https://www.kdnuggets.com/2020/03/tensorflow-keras-tokenization-text-data-prep.html
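
To make the preprocessing above concrete, here is a simplified pure-Python imitation of what `Tokenizer(num_words=..., oov_token=...)`, `texts_to_sequences`, and `pad_sequences(padding='post', truncating='post')` do. It is a sketch under stated assumptions, not Keras's code: the helper names (`fit_word_index`, `to_sequences`, `pad_post`) are hypothetical, the toy reviews are made up, and it splits on whitespace without Keras's default punctuation filtering.

```python
from collections import Counter

def fit_word_index(texts, oov_token="<UNK>"):
    """Hypothetical helper: rank words by frequency, reserving index 1 for the OOV token."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    # most_common() keeps first-seen order among words with equal counts
    ranked = [word for word, _ in counts.most_common()]
    return {word: i for i, word in enumerate([oov_token] + ranked, start=1)}

def to_sequences(texts, word_index, num_words, oov_token="<UNK>"):
    """Map words to indices; unseen words and indices >= num_words become the OOV index."""
    oov_index = word_index[oov_token]
    sequences = []
    for text in texts:
        seq = []
        for word in text.lower().split():
            i = word_index.get(word)
            seq.append(oov_index if i is None or i >= num_words else i)
        sequences.append(seq)
    return sequences

def pad_post(sequences, maxlen, value=0):
    """Keep the first maxlen tokens ('post' truncation), then append padding ('post' padding)."""
    return [seq[:maxlen] + [value] * (maxlen - len(seq[:maxlen])) for seq in sequences]

# Made-up toy reviews, not the project's data
train_texts = ["great stay great host", "dirty room bad host"]
test_texts = ["great host", "bad room great stay clean"]

word_index = fit_word_index(train_texts)
train_seqs = to_sequences(train_texts, word_index, num_words=5)
maxlen = max(len(seq) for seq in train_seqs)  # longest training sequence, as in the notebook
test_padded = pad_post(to_sequences(test_texts, word_index, num_words=5), maxlen)

print(word_index)   # {'<UNK>': 1, 'great': 2, 'host': 3, 'stay': 4, 'dirty': 5, 'room': 6, 'bad': 7}
print(train_seqs)   # [[2, 4, 2, 3], [1, 1, 1, 3]]
print(test_padded)  # [[2, 3, 0, 0], [1, 1, 2, 4]]
```

Under the same rule, `num_words = 2000` keeps index 1 for `<UNK>` plus the 1,998 most frequent training words; every other word becomes index 1, and 0 is reserved for padding.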

In [None]:
ss = StandardScaler()
X_train_sc = ss.fit_transform(train_padded)
X_test_sc = ss.transform(test_padded)

In [None]:
X_train_sc.shape, X_test_sc.shape

((24510, 1004), (8171, 1004))

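We then standardized the padded sequences with `StandardScaler`, fitting it on the training data only and applying the same transformation to the test data, so that each of the 1,004 token positions has zero mean and unit variance in the training set.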
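
As a sketch of what `StandardScaler` computes here (per column: subtract the training mean, divide by the training standard deviation), the NumPy snippet below reproduces the fit-on-train, transform-test pattern on a made-up 3×3 matrix. The zero-variance handling mirrors scikit-learn's behavior of leaving constant columns unscaled; treat it as an illustration, not scikit-learn's implementation.

```python
import numpy as np

# Made-up padded token indices: rows are reviews, columns are token positions
train = np.array([[2, 4, 0],
                  [5, 0, 0],
                  [2, 3, 0]], dtype=float)
test = np.array([[5, 4, 7]], dtype=float)

# Statistics come from the training rows only (what fit_transform learns)
mu = train.mean(axis=0)
sigma = train.std(axis=0)          # population std (ddof=0), like StandardScaler
sigma[sigma == 0.0] = 1.0          # constant columns are centered but not rescaled

train_sc = (train - mu) / sigma    # analogue of ss.fit_transform(train_padded)
test_sc = (test - mu) / sigma      # analogue of ss.transform(test_padded)

print(train_sc.mean(axis=0))       # approximately 0 in every column
print(test_sc)
```

The last column shows a side effect of this step: a position that is pure padding in every training row passes a test value through essentially unscaled (here 7.0).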

## Model 1

In [None]:
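model1 = Sequential()

# Two hidden ReLU layers on the 1,004 standardized token positions
model1.add(Dense(64, activation = 'relu',
                input_shape = (X_train_sc.shape[1],)))
model1.add(Dense(32, activation = 'relu'))

# Single sigmoid unit outputs the probability of the positive class
model1.add(Dense(1, activation = 'sigmoid'))

model1.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

# Train for 100 epochs; the test set doubles as validation data, so its
# loss and accuracy are reported after every epoch
res1 = model1.fit(X_train_sc, nn_y_train,
                epochs = 100,
                batch_size = 64,
                validation_data = (X_test_sc, nn_y_test),
                verbose = 1)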

Epoch 1/100
Epoch 2/100
...
Epoch 78

In [None]:
print('Train Accuracy:')
print(res1.history['accuracy'][-1])
print('Test Accuracy:')
print(res1.history['val_accuracy'][-1])

Train Accuracy:
0.8103223443031311
Test Accuracy:
0.5327377319335938


Our first neural network consisted of three dense layers (64 and 32 ReLU units, then a single sigmoid output unit for binary classification). We trained it for 100 epochs with a batch size of 64. It reached a train accuracy of 0.81 but a test accuracy of only 0.53, a gap that indicates severe overfitting.



## Model 2

We added more hidden layers to try to improve model performance.

In [None]:
model2 = Sequential()

# Five hidden ReLU layers instead of two, to add capacity
model2.add(Dense(64, activation = 'relu',
                input_shape = (X_train_sc.shape[1],)))
model2.add(Dense(64, activation = 'relu'))
model2.add(Dense(64, activation = 'relu'))
model2.add(Dense(32, activation = 'relu'))
model2.add(Dense(32, activation = 'relu'))

# Sigmoid output unit for binary classification
model2.add(Dense(1, activation = 'sigmoid'))

model2.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

# Same training setup as Model 1: 100 epochs, batch size 64
res2 = model2.fit(X_train_sc, nn_y_train,
                epochs = 100,
                batch_size = 64,
                validation_data = (X_test_sc, nn_y_test),
                verbose = 1)

Epoch 1/100
Epoch 2/100
...
Epoch 78

In [None]:
print('Train Accuracy:')
print(res2.history['accuracy'][-1])
print('Test Accuracy:')
print(res2.history['val_accuracy'][-1])

Train Accuracy:
0.8437780737876892
Test Accuracy:
0.5362868905067444


Our second neural network consisted of six dense layers, again ending in a sigmoid output unit for binary classification. We added the extra layers to try to improve accuracy by making the model more complex. We trained it for 100 epochs with a batch size of 64. It reached a train accuracy of 0.84 and a test accuracy of 0.54: the train/test gap widened slightly (from about 0.28 to 0.31), so the added depth made the overfitting a little worse without meaningfully improving test performance.
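
One way to see why extra depth changed so little is to count trainable weights. A `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, so the totals can be worked out by hand; the hypothetical helper below does that arithmetic for the two architectures above (Keras's `model.summary()` should report the same totals).

```python
def dense_param_count(n_inputs, widths):
    """Hypothetical helper: trainable parameters in a stack of Dense layers."""
    total = 0
    for n_out in widths:
        total += n_inputs * n_out + n_out  # kernel weights + biases
        n_inputs = n_out
    return total

n_features = 1004  # padded sequence length used as the input size

model1_params = dense_param_count(n_features, [64, 32, 1])
model2_params = dense_param_count(n_features, [64, 64, 64, 32, 32, 1])
first_layer = n_features * 64 + 64

print(model1_params)   # 66433
print(model2_params)   # 75809
print(first_layer)     # 64320
```

By this count the first layer alone holds 64,320 parameters, about 97% of Model 1 and 85% of Model 2, so the three extra hidden layers add roughly 9,400 parameters (about 14%) to a model that already had more weights than the 24,510 training rows. That is consistent with both models overfitting to a similar degree.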

## Model 3

We then tried to regularize the model by adding early stopping, L2 kernel regularizers, and dropout layers.
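
The two layer-level regularizers used below are simple to state. The NumPy sketch that follows mimics them under stated assumptions: `Dropout(.5)` is written as inverted dropout (during training each unit is zeroed with probability 0.5 and survivors are scaled by 1 / (1 - 0.5) = 2; at inference the layer passes values through unchanged), and `l2(0.005)` adds `0.005 * sum(w**2)` over the layer's kernel to the loss. The function names are hypothetical and this is not Keras's source.

```python
import numpy as np

def inverted_dropout(activations, rate, rng):
    """Training-time dropout: zero each unit with probability `rate`, then scale
    the survivors by 1 / (1 - rate) so the expected activation is unchanged."""
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

def l2_penalty(kernel, factor=0.005):
    """Loss term added by an l2(factor) kernel regularizer: factor * sum of squared weights."""
    return factor * np.sum(np.square(kernel))

rng = np.random.default_rng(222)

hidden = np.ones((1000, 32))                 # made-up activations from a 32-unit layer
dropped = inverted_dropout(hidden, rate=0.5, rng=rng)
print(np.unique(dropped))                    # [0. 2.]: dropped units or scaled survivors
print(dropped.mean())                        # close to 1.0, the undropped mean

kernel = np.full((64, 32), 0.1)              # made-up weights for a Dense(32) fed by 64 units
print(l2_penalty(kernel))                    # about 0.005 * 2048 * 0.01 = 0.1024
```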

In [None]:
# Stop training once validation loss (the default monitor) has not improved for 5 consecutive epochs
es = EarlyStopping(patience = 5)

In [None]:
model3 = Sequential()

model3.add(Dense(64, activation = 'relu',
                input_shape = (X_train_sc.shape[1],)))
model3.add(Dense(64, activation = 'relu'))
# l2(0.005) adds 0.005 * sum(kernel**2) for this layer to the training loss
model3.add(Dense(32, activation = 'relu', kernel_regularizer = l2(0.005)))
# Dropout zeroes half of the activations at random, during training only
model3.add(Dropout(.5))
model3.add(Dense(32, activation = 'relu', kernel_regularizer = l2(0.005)))
model3.add(Dense(64, activation = 'relu'))
model3.add(Dense(32, activation = 'relu'))
model3.add(Dropout(.5))
model3.add(Dense(32, activation = 'relu'))
model3.add(Dense(32, activation = 'relu'))

# Sigmoid output unit for binary classification
model3.add(Dense(1, activation = 'sigmoid'))

model3.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

# Up to 100 epochs; the EarlyStopping callback `es` can end training sooner
res3 = model3.fit(X_train_sc, nn_y_train,
                epochs = 100,
                batch_size = 64,
                validation_data = (X_test_sc, nn_y_test),
                callbacks = [es],
                verbose = 1)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
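
Training stopped after epoch 6 because of the `EarlyStopping(patience = 5)` callback, which by default watches `val_loss` and halts once it has failed to improve for 5 consecutive epochs. The pure-Python function below is a simplified, hypothetical restatement of that patience rule (default `min_delta=0`, no baseline), not Keras's implementation; the loss values in the example are invented.

```python
def stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch after which a patience rule on a
    lower-is-better metric would stop training."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:   # improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:                         # no improvement: count one more bad epoch
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)            # patience never ran out

# Invented validation losses: best at epoch 1, then five epochs without improvement
print(stopping_epoch([0.69, 0.70, 0.72, 0.71, 0.75, 0.80, 0.85]))  # 6
# Invented losses that keep improving: training would run to the end
print(stopping_epoch([0.70, 0.68, 0.66, 0.65]))                     # 4
```

Under this rule, a run that ends after exactly 6 epochs with `patience = 5` had its lowest validation loss in epoch 1, assuming the log above is complete. Because `restore_best_weights` defaults to `False`, `model3` keeps its epoch-6 weights, and `res3.history[...][-1]` reports the epoch-6 metrics.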


In [None]:
print('Train Accuracy:')
print(res3.history['accuracy'][-1])
print('Test Accuracy:')
print(res3.history['val_accuracy'][-1])

Train Accuracy:
0.5801305770874023
Test Accuracy:
0.5559906959533691


Code for neural net models adapted from General Assembly Data Science Immersive Lessons 9.02 and 9.04

Our third neural network consisted of nine dense layers plus two dropout layers, ending in a sigmoid output unit for binary classification. As with Model 2, we added layers to try to improve accuracy by making the model more complex, and we also tried to address the overfitting of the first two models with regularization: dropout layers, L2 kernel regularizers, and early stopping.
We allowed up to 100 epochs with a batch size of 64, but early stopping ended training after 6 epochs. The model had a train accuracy of 0.58 and a test accuracy of 0.56. It overfit far less (lower variance) than the first two models, but it still performed barely above baseline accuracy. Because some of our non-neural-network models performed substantially better, we concluded that neural networks may not be the best fit for our data.
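
The section does not state the baseline value. Assuming "baseline" means the usual majority-class baseline (the accuracy of always predicting the most common label), it can be computed from the labels with pandas; the sketch below uses a made-up label Series in place of `y_train`.

```python
import pandas as pd

# Made-up binary labels standing in for y_train; not the project's data
labels = pd.Series([1, 0, 1, 1, 0, 1, 0, 1])

# Share of the most common class = accuracy of always predicting that class
baseline_accuracy = labels.value_counts(normalize=True).max()
majority_class = labels.value_counts().idxmax()

print(majority_class, baseline_accuracy)   # 1 0.625
```

On the real data, the same expressions applied to `y_train` (or `y_test`) give the figure to compare against Model 3's 0.56 test accuracy.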