<a href="https://colab.research.google.com/github/narsym/deep-learning-with-tensorflow-2.0/blob/master/Sentiment_analysis_with_CNNs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Importing libraries

In [0]:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models, preprocessing
import tensorflow_datasets as tfds

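Constants:

*   max_len: maximum length of a review in tokens (longer reviews are truncated, shorter ones are padded)
*   n_words: vocabulary size (only the n_words most frequent words are kept)
*   dim_embedding: output dimension of the embedding layer
*   EPOCHS: number of training epochs
*   BATCH_SIZE: number of samples per training batch
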
In [0]:
max_len = 100
n_words = 10000
dim_embedding = 256
EPOCHS = 20
BATCH_SIZE = 500

Load the IMDB movie review dataset through tf.keras.datasets (it is downloaded from Google Cloud Storage, not Kaggle) and preprocess it by padding or truncating every review to max_len tokens

In [0]:
def load_data():
  (X_train, y_train), (X_test, y_test) = datasets.imdb.load_data(num_words = n_words)
  X_train = preprocessing.sequence.pad_sequences(X_train, maxlen = max_len)
  X_test = preprocessing.sequence.pad_sequences(X_test, maxlen = max_len)
  return (X_train, y_train), (X_test, y_test)
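
To make the padding step concrete, here is a small pure-Python sketch of what `pad_sequences` does with its default settings, which pad and truncate at the front of the sequence and use 0 as the padding value. The helper name `pad_pre` is hypothetical and this is only an illustration, not the Keras implementation.

```python
def pad_pre(seq, maxlen, value=0):
    """Mimic pad_sequences defaults: padding='pre', truncating='pre'."""
    # Keep only the last `maxlen` tokens when the sequence is too long.
    seq = list(seq)[-maxlen:]
    # Fill the front with the padding value when it is too short.
    return [value] * (maxlen - len(seq)) + seq

short_review = pad_pre([11, 12, 13], maxlen=5)
long_review = pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(short_review)
print(long_review)
```

With max_len = 100, every review therefore becomes a vector of exactly 100 word indices, which is what the model expects as input.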

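Build the model. The Embedding layer is a trainable lookup table that maps each word index to a dense vector of size dim_embedding (it is not an autoencoder). A 1D convolution followed by global max pooling then extracts features from the sequence of vectors, and two Dense layers produce the sentiment probability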

In [0]:
def build_model():
  model = models.Sequential()
  model.add(layers.Embedding(n_words,dim_embedding,input_length = max_len))
  model.add(layers.Dropout(0.3))
  model.add(layers.Conv1D(256,3,padding = 'valid',  activation = 'relu'))
  model.add(layers.GlobalMaxPool1D())
  model.add(layers.Dense(128,activation = 'relu'))
  model.add(layers.Dropout(0.5))
  model.add(layers.Dense(1,activation = 'sigmoid'))
  return model
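
The data flow through the first layers can be sketched in plain Python with tiny, made-up sizes. The function names (`embed`, `conv1d_valid`, `global_max_pool`) and the random weights are assumptions for illustration only. They mirror what Embedding, Conv1D with 'valid' padding and ReLU, and GlobalMaxPool1D compute, but they are not the Keras code.

```python
import random

def embed(token_ids, table):
    # Embedding is a lookup: row i of the table is the vector for word index i.
    return [table[t] for t in token_ids]

def conv1d_valid(seq, kernels, biases):
    # seq: list of time steps, each a vector of input channels.
    # kernels: one entry per filter, each a list of `width` weight vectors.
    width = len(kernels[0])
    out = []
    for start in range(len(seq) - width + 1):  # 'valid': no padding
        window = seq[start:start + width]
        row = []
        for kern, b in zip(kernels, biases):
            s = b + sum(w * x for wv, xv in zip(kern, window)
                        for w, x in zip(wv, xv))
            row.append(max(0.0, s))  # ReLU activation
        out.append(row)
    return out

def global_max_pool(seq):
    # Keep the largest activation over time for each filter.
    return [max(column) for column in zip(*seq)]

random.seed(0)
vocab, dim, length, filters, width = 50, 4, 10, 3, 3  # toy sizes
table = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab)]
kernels = [[[random.uniform(-1, 1) for _ in range(dim)]
            for _ in range(width)] for _ in range(filters)]
biases = [0.0] * filters
tokens = [random.randrange(vocab) for _ in range(length)]

embedded = embed(tokens, table)
conv_out = conv1d_valid(embedded, kernels, biases)
pooled = global_max_pool(conv_out)
print(len(embedded), len(embedded[0]))  # time steps, embedding size
print(len(conv_out), len(conv_out[0]))  # time steps after conv, filters
print(len(pooled))                      # one value per filter
```

Global max pooling removes the time dimension, so the Dense layers receive one fixed-size vector per review.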

Now load the data and instantiate the model using the functions above

In [5]:
(X_train, y_train), (X_test, y_test) = load_data()
model = build_model()
model.summary()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 100, 256)          2560000   
_________________________________________________________________
dropout (Dropout)            (None, 100, 256)          0         
_________________________________________________________________
conv1d (Conv1D)              (None, 98, 256)           196864    
_________________________________________________________________
global_max_pooling1d (Global (None, 256)               0         
_________________________________________________________________
dense (Dense)                (None, 128)               32896     
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_______________________________
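
The shapes and parameter counts printed in the summary can be checked by hand from the constants and the layer definitions in build_model. The following sketch only does that arithmetic. The count for the final Dense(1) layer is derived from the code, since the summary output above is cut off before that row.

```python
max_len, n_words, dim_embedding = 100, 10000, 256
filters, kernel_size, dense_units = 256, 3, 128

# Embedding: one vector of size dim_embedding per vocabulary entry.
embedding_params = n_words * dim_embedding

# Conv1D: kernel_size * input_channels weights per filter, plus one bias each.
conv_params = kernel_size * dim_embedding * filters + filters

# 'valid' padding shortens the sequence by kernel_size - 1 steps.
conv_steps = max_len - kernel_size + 1

# Dense layers: inputs * units weights, plus one bias per unit.
dense_params = filters * dense_units + dense_units
output_params = dense_units * 1 + 1

print(embedding_params, conv_params, conv_steps, dense_params, output_params)
```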

Compiling the model with the adam optimizer, binary_crossentropy loss, and accuracy as the metric

In [0]:
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
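
As a reminder of what the loss measures, here is a minimal pure-Python sketch of binary cross-entropy averaged over a batch. The function name and the clipping constant `eps` are illustrative choices. Keras applies a similar clipping internally, but this is not its implementation.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip probabilities so log() never receives 0.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

uncertain = binary_crossentropy([1], [0.5])           # equals ln(2)
confident_right = binary_crossentropy([1], [0.99])
confident_wrong = binary_crossentropy([1], [0.01])
print(uncertain, confident_right, confident_wrong)
```

Confidently wrong predictions are penalized heavily, which is why a model can have a fairly high loss while most of its predictions are still on the correct side of 0.5.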

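Training the model. Note that the test set is passed as validation_data, so the validation metrics reported during training are computed on the same data that is used for the final evaluation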

In [7]:
model.fit(X_train, y_train, epochs = EPOCHS, batch_size = BATCH_SIZE, validation_data = (X_test, y_test))

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7fc5b54aab70>
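
model.fit returns a History object whose `history` attribute is a dictionary of per-epoch metric lists. The per-epoch values of this run were not captured in the output above, so the sketch below uses invented numbers purely to show how such a dictionary could be inspected. The helper `best_epoch` and the example values are hypothetical.

```python
def best_epoch(history, key="val_loss"):
    """Return the 1-based epoch with the lowest value for `key`."""
    values = history[key]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]

# Hypothetical values, NOT taken from the training run in this notebook.
example_history = {"val_loss": [0.50, 0.35, 0.33, 0.40, 0.55]}

epoch, loss = best_epoch(example_history)
print(epoch, loss)
```

In the notebook this would be used as `history = model.fit(...)` followed by `best_epoch(history.history)`.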

Evaluating the model on the test set

In [8]:
score = model.evaluate(X_test, y_test, batch_size = BATCH_SIZE)
print('\nTest score:',score[0])
print('\nTest accuracy:',score[1])


Test score: 0.8320747017860413

Test accuracy: 0.8459600210189819
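
For a sigmoid output, the accuracy metric counts a prediction as positive when the predicted probability exceeds 0.5. A minimal sketch of that computation follows. The function name and the sample values are made up for illustration.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    # A prediction is correct when its thresholded value matches the label.
    correct = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)

labels = [1, 0, 1, 0]
probabilities = [0.9, 0.2, 0.4, 0.6]  # illustrative values only
accuracy = binary_accuracy(labels, probabilities)
print(accuracy)
```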


The model reaches about 84.6% accuracy on the test set, with a test loss (the "Test score" above) of about 0.83