<a href="https://colab.research.google.com/github/integral05/Deep-Learning-Practical/blob/main/DL_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### 2. Classification using Deep neural network

Binary classification using deep neural networks. Example: classify movie reviews into "positive" and "negative" reviews based only on their text content. Use the IMDB dataset.

# Importing necessary libraries

This cell imports the necessary libraries for this task:

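- `numpy` for numerical operations, especially array handling.
- `tensorflow` and `keras` (with `Sequential`, `Embedding`, `Dense`, and `Flatten`) for building and training the model.
- `pad_sequences` for padding the input sequences to a fixed length.
- `TfidfVectorizer` from scikit-learn for the TF-IDF variant in the final cell.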

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dense, Flatten
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.feature_extraction.text import TfidfVectorizer  # used by the TF-IDF variant in the final cell

# Loading the IMDB dataset

This cell loads the IMDB movie review dataset.

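- `np.load` reads the arrays from the `imdb.npz` file. `allow_pickle=True` is needed because the reviews are stored as an object array of variable-length lists.
- `x_train` and `y_train` contain the training data: the reviews, encoded as sequences of integer word indices, and their labels.
- `x_test` and `y_test` contain the test data in the same format.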

In [5]:
with np.load('/content/imdb.npz', allow_pickle=True) as f:
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']


# Finding the maximum index and number of words

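This cell finds the largest word index in the training data and derives the vocabulary size from it.

- `max_index` is the highest word index that appears in any training review.
- `num_words` is `max_index + 1`, the number of rows the `Embedding` layer needs so that every index from 0 to `max_index` has a vector. It is an upper bound on the number of distinct words, not an exact count.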

In [6]:
max_index = np.max([np.max(x) for x in x_train])
num_words = max_index + 1
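
As a small illustration of the calculation above, the sketch below runs the same steps on a made-up dataset. The name `toy_reviews` and its values are hypothetical and are not taken from the IMDB file; the point is only to show why `num_words` can be much larger than the number of distinct indices that actually occur.

```python
# Toy stand-in for x_train: each review is a list of integer word indices.
# These values are invented for illustration.
toy_reviews = [
    [1, 14, 22, 16],
    [1, 194, 1153, 194],
    [1, 4, 7],
]

# Highest index that occurs anywhere in the toy data.
max_index = max(max(review) for review in toy_reviews)

# Indices start at 0, so a lookup table needs max_index + 1 rows.
num_words = max_index + 1

# The number of distinct indices actually present can be far smaller.
distinct = len({idx for review in toy_reviews for idx in review})

print(max_index, num_words, distinct)  # 1153 1154 8
```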

# Padding the sequences

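This cell pads or truncates every review to a fixed length of 256 word indices.

- `pad_sequences` pads shorter reviews with zeros and truncates longer ones. By default, both padding and truncation happen at the start of the sequence.
- A fixed length is required because the model expects every input batch to have shape `(batch_size, 256)`.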

In [7]:
x_train = pad_sequences(x_train, maxlen=256)
x_test = pad_sequences(x_test, maxlen=256)
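
To make the effect of padding concrete, here is a minimal pure-Python sketch. The helper `pad_pre` is a hypothetical stand-in written for this example, not part of Keras; it assumes the default behavior of `pad_sequences`, where zeros are added at the front and overly long sequences lose their earliest items.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad with `value` and keep only the last `maxlen` items of each sequence."""
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]  # drop items from the front if too long
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


# One short and one long toy review (made-up indices).
toy = [[5, 6], [1, 2, 3, 4, 5, 6, 7]]
result = pad_pre(toy, maxlen=4)
print(result)  # [[0, 0, 5, 6], [4, 5, 6, 7]]
```

With `maxlen=256`, the same logic means that only the last 256 words of a long review reach the model.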

# Building and compiling the model

This cell defines, builds, and compiles the deep learning model for classification.

- `Sequential` creates a linear stack of layers.
- `Embedding` maps each word index to a trainable 32-dimensional vector.
- `Flatten` reshapes the `(256, 32)` embedding output of each review into a single vector of 8,192 values.
- `Dense` with a sigmoid activation outputs one value between 0 and 1, read as the probability that the review belongs to class 1.
- `compile` configures training with the Adam optimizer, binary cross-entropy loss, and accuracy as the reported metric.

In [8]:
model = Sequential()
model.add(Embedding(num_words, 32, input_length=256))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
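
The data flow through these three layers can be sketched without Keras. The example below is a rough pure-Python illustration with tiny, hypothetical sizes (`vocab_size=10`, `embed_dim=3`, `seq_len=4` stand in for `num_words`, 32, and 256) and random weights; it mirrors the shapes involved, not the trained model. The helper `count_params` is also an assumption of this sketch and simply adds up the weights each layer would hold.

```python
import math
import random

random.seed(0)

vocab_size, embed_dim, seq_len = 10, 3, 4  # tiny stand-ins for num_words, 32, 256

# Embedding: one vector of length embed_dim per word index.
embedding = [[random.uniform(-0.05, 0.05) for _ in range(embed_dim)]
             for _ in range(vocab_size)]

# Dense(1): one weight per flattened feature, plus a bias.
weights = [random.uniform(-0.1, 0.1) for _ in range(seq_len * embed_dim)]
bias = 0.0


def forward(review):
    """Embedding lookup -> flatten -> dense -> sigmoid for one padded review."""
    vectors = [embedding[idx] for idx in review]          # seq_len x embed_dim
    flat = [v for vec in vectors for v in vec]            # seq_len * embed_dim values
    z = sum(w * x for w, x in zip(weights, flat)) + bias  # single pre-activation
    return 1.0 / (1.0 + math.exp(-z))                     # sigmoid


def count_params(vocab_size, embed_dim, seq_len):
    """Embedding table plus dense weights and bias; Flatten has no parameters."""
    return vocab_size * embed_dim + (seq_len * embed_dim + 1)


print(forward([0, 0, 7, 2]))
print(count_params(vocab_size, embed_dim, seq_len))  # 43
```

Under the same counting, the dense layer of the real model holds 256 * 32 + 1 = 8,193 parameters, and the embedding table holds `num_words * 32`.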



# Training the model

This cell trains the model using the training data.

- `fit` trains the model.
- `epochs` is the number of complete passes over the training data.
- `batch_size` is the number of samples processed before each weight update.
- `validation_split` holds out that fraction of the training data (here 20%) for validation; the held-out samples are not used for weight updates.
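
The arithmetic behind these settings can be checked by hand. The sketch below uses a hypothetical helper, `fit_schedule`, and assumes 25,000 training reviews, which is the size of the usual IMDB split but is not stated in this notebook. It also assumes that the validation samples are taken from the end of the arrays, which is how Keras applies `validation_split`.

```python
import math


def fit_schedule(n_samples, validation_split, batch_size):
    """Return (n_train, n_val, steps_per_epoch) for a fractional validation split."""
    n_val = int(n_samples * validation_split)  # held out from the end of the arrays
    n_train = n_samples - n_val
    steps_per_epoch = math.ceil(n_train / batch_size)  # one weight update per batch
    return n_train, n_val, steps_per_epoch


# 25,000 is an assumed training-set size (the usual IMDB split).
print(fit_schedule(25_000, 0.2, 32))  # (20000, 5000, 625)
```

Under that assumption, each epoch runs 625 batches, which is consistent with the `625/625` counter in the training log.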

In [9]:
# Train the model
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

Epoch 1/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 16s 24ms/step - accuracy: 0.6907 - loss: 0.5699 - val_accuracy: 0.8302 - val_loss: 0.3662
Epoch 2/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 22s 26ms/step - accuracy: 0.9299 - loss: 0.1954 - val_accuracy: 0.8464 - val_loss: 0.3402
Epoch 3/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 18s 23ms/step - accuracy: 0.9845 - loss: 0.0730 - val_accuracy: 0.8054 - val_loss: 0.4593
Epoch 4/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 14s 22ms/step - accuracy: 0.9958 - loss: 0.0295 - val_accuracy: 0.7674 - val_loss: 0.6030
Epoch 5/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 21s 24ms/step - accuracy: 0.9990 - loss: 0.0115 - val_accuracy: 0.7536 - val_loss: 0.6936
Epoch 6/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 14s 22ms/step - accuracy: 0.9992 - loss: 0.0079 - val_accuracy: 0.8066 - val_loss: 0.5540
Epoch 7/10
...

<keras.src.callbacks.history.History at 0x7a2fd2b4f450>

# Evaluating the model

This cell evaluates the trained model's performance using the test data.

- `evaluate` computes the loss and the compiled metrics on the test data and returns them as a list, here `[loss, accuracy]`.
- The accuracy is the fraction of test reviews that were classified correctly.
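
To show what the two reported numbers measure, here is a minimal pure-Python sketch. The function names, the clipping constant `eps`, and the toy labels and probabilities are assumptions made for this example; the 0.5 decision threshold is the usual default for binary accuracy.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]  # made-up sigmoid outputs

loss = binary_crossentropy(y_true, y_prob)
acc = binary_accuracy(y_true, y_prob)
print(loss, acc)
```

The loss penalizes confident wrong predictions heavily, while accuracy only counts which side of the threshold each prediction falls on, so the two can move in different directions.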

In [10]:
loss, accuracy = model.evaluate(x_test, y_test)
print('Accuracy: ', accuracy)

782/782 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8980 - loss: 0.3268
Accuracy:  0.8556399941444397


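# Using TF-IDF features with a dense network

This cell repeats the task with a different input representation: each review becomes a TF-IDF vector over at most 10,000 tokens, and a small fully connected network is trained on those vectors.

- Each review's word indices are joined into a space-separated string so that `TfidfVectorizer` can treat the indices as tokens.
- The model has two hidden `Dense` layers of 16 ReLU units each and a sigmoid output.

In [12]: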
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.feature_extraction.text import TfidfVectorizer

# Load the IMDB dataset
with np.load('/content/imdb.npz', allow_pickle=True) as f:
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']

# The reviews are sequences of integer word indices, not raw text.
# TfidfVectorizer expects strings, so join each sequence into a
# space-separated string of index tokens.
x_train_text = [' '.join(str(i) for i in seq) for seq in x_train]
x_test_text = [' '.join(str(i) for i in seq) for seq in x_test]

# Keep only the 10,000 most frequent tokens; adjust max_features as needed.
vectorizer = TfidfVectorizer(max_features=10000)

# Fit on the training text only, then apply the same transform to the test text.
x_train = vectorizer.fit_transform(x_train_text).toarray()
x_test = vectorizer.transform(x_test_text).toarray()

# Build the Dense model
model = Sequential()
model.add(Dense(16, input_shape=(x_train.shape[1],), activation="relu"))
model.add(Dense(16, activation="relu"))
model.add(Dense(1, activation="sigmoid"))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

# Evaluate the model
_, accuracy = model.evaluate(x_test, y_test)
print('Accuracy: ', accuracy)

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Epoch 1/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.7511 - loss: 0.4971 - val_accuracy: 0.8630 - val_loss: 0.3262
Epoch 2/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.9475 - loss: 0.1552 - val_accuracy: 0.8268 - val_loss: 0.4514
Epoch 3/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.9735 - loss: 0.0914 - val_accuracy: 0.8574 - val_loss: 0.4238
Epoch 4/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.9848 - loss: 0.0569 - val_accuracy: 0.8320 - val_loss: 0.6067
Epoch 5/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.9936 - loss: 0.0314 - val_accuracy: 0.7856 - val_loss: 0.9477
Epoch 6/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.9967 - loss: 0.0176 - val_accuracy: 0.7984 - val_loss: 1.0195
Epoch 7/10
625/625 ...
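
The TF-IDF step in the cell above treats each word index as a token. As a rough illustration of the weighting, the sketch below implements a smoothed TF-IDF with L2-normalized rows in pure Python, following the formula that scikit-learn documents as its default (`idf = ln((1 + n) / (1 + df)) + 1`). The function `tfidf` and the toy sequences are hypothetical. Unlike `TfidfVectorizer`, this sketch splits on whitespace and has no `max_features` cap; scikit-learn's default token pattern also skips single-character tokens, so single-digit indices would be dropped there but are kept here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Smoothed TF-IDF with L2-normalized rows, over whitespace tokens."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # guard empty documents
        rows.append([v / norm for v in row])
    return vocab, rows


# Index sequences joined into strings, as in the cell above (toy values).
toy_sequences = [[14, 22, 14], [22, 530], [14, 973, 973]]
toy_text = [' '.join(str(i) for i in seq) for seq in toy_sequences]

vocab, rows = tfidf(toy_text)
print(vocab)  # ['14', '22', '530', '973']
for row in rows:
    print([round(v, 3) for v in row])
```

Each row is one review, and each column is one index token; tokens that are frequent within a review but rare across reviews receive the largest weights.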