A neural network is a computational model that learns to recognize patterns from data. It consists of layers of interconnected nodes (neurons) that transform input data into an output. Key components include:

Input Layer: Receives input data.
Hidden Layers: Perform computations and extract features.
Output Layer: Produces the final output.
Weights: Learnable parameters on the connections between neurons, adjusted during training to reduce the loss.
Activation Functions: Introduce non-linearity to model complex patterns.
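To make these components concrete, here is a minimal sketch of a forward pass through a tiny network with two inputs, one hidden layer of two neurons, and a single output. The weights and biases are hypothetical, hand-picked values (a real network would learn them during training), and the helper names `dense` and `forward` are illustrative only.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Hypothetical parameters chosen by hand for illustration.
hidden_weights = [[0.5, -0.2], [0.1, 0.4]]  # two hidden neurons, two inputs each
hidden_biases = [0.0, 0.1]
output_weights = [[1.0, -1.0]]              # one output neuron, two hidden inputs
output_biases = [0.0]


def forward(inputs):
    """Input layer -> hidden layer (tanh) -> output layer (sigmoid)."""
    hidden = dense(inputs, hidden_weights, hidden_biases, math.tanh)
    return dense(hidden, output_weights, output_biases, sigmoid)[0]


prediction = forward([1.0, 2.0])
print(prediction)  # a value between 0 and 1
```

The non-linear activations (`tanh`, `sigmoid`) are what keep the stacked layers from collapsing into a single linear transformation.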
How Neural Networks are Used in This Program
In this program, an LSTM (Long Short-Term Memory) neural network, a type of recurrent neural network (RNN), is used for binary text classification.
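As a rough illustration of what an LSTM does at each position in a sequence, the sketch below steps a single-unit cell through a few inputs using the standard gate equations. It is a simplification under stated assumptions: real layers use weight matrices over many units, and the `params` values here are hypothetical rather than taken from the trained model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """Advance a one-unit LSTM cell by one time step.

    params maps each gate name to a hypothetical (w_x, w_h, b) triple.
    """
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre_activation("input"))         # how much new content to write
    f = sigmoid(pre_activation("forget"))        # how much old cell state to keep
    o = sigmoid(pre_activation("output"))        # how much cell state to expose
    g = math.tanh(pre_activation("candidate"))   # proposed new content

    c = f * c_prev + i * g    # updated cell state (long-term memory)
    h = o * math.tanh(c)      # updated hidden state (the step's output)
    return h, c


# Hypothetical parameters for illustration only.
params = {
    "input": (0.5, 0.1, 0.0),
    "forget": (0.3, 0.2, 1.0),
    "output": (0.4, 0.1, 0.0),
    "candidate": (0.8, 0.3, 0.0),
}

h, c = 0.0, 0.0
for x in [0.5, -1.0, 0.25]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

Because the cell state `c` is carried from step to step and only partly overwritten, information from early tokens can still influence the final hidden state, which is what the classifier's dense layer receives.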

Steps in the Program:
Data Loading and Preprocessing:

Load text data from a CSV file.
Convert text to sequences of integers and pad them to a uniform length.
Data Splitting:

Split the data into training and testing sets.
Building the Neural Network:

Embedding Layer: Converts word indices into dense vectors.
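Spatial Dropout Layer: Reduces overfitting by randomly dropping entire embedding dimensions (feature channels) across all time steps, rather than individual units.
LSTM Layer: Processes the sequence step by step, carrying state forward to capture dependencies between words.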
Dense Layer: Produces the final output with a sigmoid activation function for binary classification.
Compiling the Model:

Use binary cross-entropy loss, the Adam optimizer, and accuracy as the metric.
Training the Model:

Train the model for 5 epochs with a batch size of 64, using the test data as the validation set after each epoch.
Evaluating and Plotting:

Plot training and validation accuracy.
Evaluate the model's accuracy and loss on the test set.
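
The preprocessing step above can be sketched in plain Python. This is a simplified stand-in, not the Keras implementation: it splits on whitespace only, and the helper names `build_vocab`, `encode`, and `pad` are hypothetical. It follows the same general idea of reserving id 0 for padding, giving unknown words a shared out-of-vocabulary id, and padding or truncating at the front of each sequence, which matches the `pad_sequences` defaults.

```python
from collections import Counter


def build_vocab(texts, num_words, oov_token="<OOV>"):
    """Map the most frequent words to integer ids below num_words.

    Id 0 is reserved for padding and id 1 for out-of-vocabulary words.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    vocab = {oov_token: 1}
    for word, _ in counts.most_common():
        next_id = len(vocab) + 1
        if next_id >= num_words:
            break
        vocab[word] = next_id
    return vocab


def encode(text, vocab, oov_token="<OOV>"):
    """Replace each word with its id, falling back to the OOV id."""
    return [vocab.get(word, vocab[oov_token]) for word in text.lower().split()]


def pad(sequence, max_len):
    """Keep the last max_len ids and left-pad shorter sequences with zeros."""
    trimmed = sequence[-max_len:]
    return [0] * (max_len - len(trimmed)) + trimmed


# Made-up example texts, not rows from the dataset.
texts = ["good game", "good good fun game", "bad"]
vocab = build_vocab(texts, num_words=4)

encoded = encode("good fun game", vocab)
print(encoded)          # [2, 1, 3]
print(pad(encoded, 5))  # [0, 0, 2, 1, 3]
```

Every padded sequence has the same length, so the whole dataset can be stacked into one integer array and fed to the embedding layer.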

# Install TensorFlow if not already installed
!pip install tensorflow

# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, SpatialDropout1D
import matplotlib.pyplot as plt

# Render matplotlib plots inline in the notebook
%matplotlib inline

# Load the dataset
file_path = 'cleaned_balanced_dataset_FINAL.csv'
data = pd.read_csv(file_path)

# Display the first few rows of the dataset to understand its structure
display(data.head())

# Ensure all comments are strings
data['comment'] = data['comment'].astype(str)

# Preprocess the text data
max_features = 2000
max_len = 100

tokenizer = Tokenizer(num_words=max_features, oov_token='<OOV>')
tokenizer.fit_on_texts(data['comment'])
sequences = tokenizer.texts_to_sequences(data['comment'])
padded_sequences = pad_sequences(sequences, maxlen=max_len)

# Preparing the labels
labels = data['label'].values

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(padded_sequences, labels, test_size=0.2, random_state=42)

# Building the neural network model
model = Sequential()
model.add(Embedding(input_dim=max_features, output_dim=128))
model.add(SpatialDropout1D(0.2))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Training the model
history = model.fit(X_train, y_train, epochs=5, batch_size=64, validation_data=(X_test, y_test), verbose=2)

# Plotting the accuracy graph
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

# Evaluating the model
loss, accuracy = model.evaluate(X_test, y_test, verbose=2)
print(f'Test Loss: {loss}')
print(f'Test Accuracy: {accuracy}')
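
For reference, the two numbers reported by `model.evaluate` can be reproduced by hand from predicted probabilities. The sketch below assumes the usual definitions (mean binary cross-entropy, and accuracy with a 0.5 threshold); the labels and probabilities are made-up values, not results from this model.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded probability matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


# Made-up labels and sigmoid outputs for illustration.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]

print(binary_cross_entropy(y_true, y_prob))
print(binary_accuracy(y_true, y_prob))  # 0.5
```

The loss penalizes confident wrong predictions more heavily than hesitant ones, so it can keep changing even when accuracy stays flat.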




Unnamed: 0,label,comment
0,1,need
1,0,might well milk last
2,1,ask locktrap
3,1,im glad community doesnt make console player f...
4,0,joke put stitch


Epoch 1/5
