<p style="text-align:center; ">
<img src="https://intech.media/wp-content/uploads/2021/11/pixta_67679187_M.jpg" style='width: 1000px; height: 600px;'>
</p>


This notebook walks through a deep learning workflow that trains a multi-label text classification model with TensorFlow. It imports NumPy, Pandas, and TensorFlow, loads a CSV dataset, converts the comment text to integer sequences with TensorFlow's `TextVectorization` layer, and splits the batched data into training, validation, and test sets (70/20/10). The model is a stacked bidirectional LSTM with Dropout layers to limit overfitting and a six-unit sigmoid output, one unit per label. It is trained with a binary cross-entropy loss and evaluated with Precision, Recall, and Accuracy. Finally, the model and the vectorizer's configuration and weights are saved for later use.

1.Importing necessary libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib

2.Importing TensorFlow for deep learning tasks and checking GPU availability

In [None]:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  0


3.Reading the CSV file containing the training dataset

In [None]:
df = pd.read_csv("/content/train.csv.zip")

FileNotFoundError: [Errno 2] No such file or directory: '/content/train.csv.zip'

4.Extracting features

In [None]:
X = df["comment_text"]
y = df[df.columns[2:]].values

NameError: name 'df' is not defined

5.TextVectorization

In [None]:
from tensorflow.keras.layers import TextVectorization
MAX_FEATURES = 200000
vectorizer = TextVectorization(
    max_tokens = MAX_FEATURES,
    output_sequence_length = 1800,
    output_mode = "int"
)

In [None]:
vectorizer.adapt(X.values)

6.Vectorizing the input text data

In [None]:
vectorized_text = vectorizer(X.values)
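
To make the integer encoding concrete, here is a minimal pure-Python sketch of roughly what an `int`-mode vectorizer does. It assumes the layer's default behavior: lowercase the text, strip punctuation, split on whitespace, reserve index 0 for padding and index 1 for out-of-vocabulary tokens, and order the remaining vocabulary by frequency. The class `ToyVectorizer` and the sample sentences are hypothetical illustrations, not part of the notebook or of TensorFlow, and the punctuation rule is only an approximation of the real one.

```python
import re
from collections import Counter


class ToyVectorizer:
    """Toy stand-in for an int-mode text vectorizer (hypothetical, illustration only)."""

    def __init__(self, max_tokens, output_sequence_length):
        self.max_tokens = max_tokens
        self.output_sequence_length = output_sequence_length
        self.vocab = ["", "[UNK]"]  # index 0 = padding, index 1 = unknown token
        self.index = {tok: i for i, tok in enumerate(self.vocab)}

    @staticmethod
    def _tokenize(text):
        # Lowercase, drop punctuation (approximation), split on whitespace
        return re.sub(r"[^\w\s]", "", text.lower()).split()

    def adapt(self, texts):
        """Build the vocabulary from the most frequent tokens in `texts`."""
        counts = Counter(tok for text in texts for tok in self._tokenize(text))
        # Leave room for the two reserved slots
        most_common = counts.most_common(self.max_tokens - 2)
        self.vocab = ["", "[UNK]"] + [tok for tok, _ in most_common]
        self.index = {tok: i for i, tok in enumerate(self.vocab)}

    def __call__(self, text):
        ids = [self.index.get(tok, 1) for tok in self._tokenize(text)]
        ids = ids[: self.output_sequence_length]  # truncate long inputs
        padding = [0] * (self.output_sequence_length - len(ids))
        return ids + padding  # pad short inputs with zeros


vec = ToyVectorizer(max_tokens=10, output_sequence_length=6)
vec.adapt(["I love you", "I love cats", "You love dogs!"])
encoded = vec("I love birds")
print(encoded)
```

In this toy run "birds" was never seen during `adapt`, so it maps to index 1, and the three trailing positions are filled with 0. The notebook's vectorizer does the same thing at a much larger scale: up to 200000 tokens and sequences padded or truncated to 1800 positions.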

7.Creating a TensorFlow dataset from the vectorized text and the labels

In [None]:
dataset = tf.data.Dataset.from_tensor_slices((vectorized_text, y))
dataset = dataset.cache()
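# Shuffle once with a fixed order; the default reshuffles on every pass,
# which would let the take/skip splits below overlap between passes
dataset = dataset.shuffle(160000, reshuffle_each_iteration=False)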
dataset = dataset.batch(32)
dataset = dataset.prefetch(16)

8.Fetching one batch from the dataset

In [None]:
batch_X, batch_y = dataset.as_numpy_iterator().next()

9.Splitting the dataset into training, validation, and test sets

In [None]:
train = dataset.take(int(len(dataset) * .7))
val = dataset.skip(int(len(dataset) * .7)).take(int(len(dataset) * .2))
test = dataset.skip(int(len(dataset) * .9)).take(int(len(dataset) * .1))
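
The split arithmetic can be checked without TensorFlow. The sketch below mimics `take` and `skip` on a plain list of batch indices, assuming a hypothetical dataset of 100 batches, and confirms that the three slices do not overlap. The helper names are illustrative only. This disjointness carries over to a real `tf.data` pipeline only if the batch order is the same on every pass; `Dataset.shuffle` reshuffles on each iteration unless `reshuffle_each_iteration=False` is passed.

```python
def take(items, n):
    """Mimic Dataset.take: keep the first n elements."""
    return items[:n]


def skip(items, n):
    """Mimic Dataset.skip: drop the first n elements."""
    return items[n:]


def split_batches(batches):
    """Apply the notebook's 70/20/10 take/skip pattern to a list of batches."""
    n = len(batches)
    train = take(batches, int(n * .7))
    val = take(skip(batches, int(n * .7)), int(n * .2))
    test = take(skip(batches, int(n * .9)), int(n * .1))
    return train, val, test


batches = list(range(100))  # hypothetical: 100 batches, identified by index
train, val, test = split_batches(batches)

# No batch index appears in more than one split
assert not set(train) & set(val)
assert not set(train) & set(test)
assert not set(val) & set(test)
print(len(train), len(val), len(test))
```

Because each size is truncated with `int(...)`, a few trailing batches can be left out of all three splits when the batch count is not a multiple of 10.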

10.Creating a generator for the training data

In [None]:
train_generator = train.as_numpy_iterator()

11.Importing the Keras model and layer classes

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Bidirectional, Dense, Embedding, TextVectorization

12.Creating the model

In [None]:
model = Sequential()
model.add(Embedding(MAX_FEATURES + 1, 128))  # Embedding layer
model.add(Bidirectional(LSTM(64, return_sequences=True)))  # LSTM layer
model.add(Dropout(0.5))  # Dropout layer to prevent overfitting
model.add(Bidirectional(LSTM(32)))
model.add(Dropout(0.5))  # Dropout layer
model.add(Dense(128, activation="relu"))
model.add(Dense(256, activation="relu"))
model.add(Dense(128, activation="relu"))
model.add(Dense(6, activation="sigmoid"))  # Multi-label classification
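
As a cross-check on the layer stack, the parameter counts can be worked out by hand. The sketch below assumes the standard LSTM formulation with a bias term (four gates, each with input weights, recurrent weights, and a bias) and fully connected Dense layers with a bias. The helper functions are hypothetical and exist only for this arithmetic. Dropout layers have no trainable parameters and are omitted.

```python
def embedding_params(vocab_size, dim):
    """One dim-sized vector per vocabulary entry."""
    return vocab_size * dim


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def bilstm_params(input_dim, units):
    """A bidirectional wrapper holds one forward and one backward LSTM."""
    return 2 * lstm_params(input_dim, units)


def dense_params(input_dim, units):
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units


MAX_FEATURES = 200000
layers = [
    ("Embedding(MAX_FEATURES + 1, 128)", embedding_params(MAX_FEATURES + 1, 128)),
    ("Bidirectional(LSTM(64))", bilstm_params(128, 64)),
    ("Bidirectional(LSTM(32))", bilstm_params(2 * 64, 32)),  # input = 2 x 64 concatenated
    ("Dense(128)", dense_params(2 * 32, 128)),               # input = 2 x 32 concatenated
    ("Dense(256)", dense_params(128, 256)),
    ("Dense(128)", dense_params(256, 128)),
    ("Dense(6)", dense_params(128, 6)),
]
total = sum(count for _, count in layers)

for name, count in layers:
    print(f"{name:40s}{count:>12,}")
print(f"{'Total':40s}{total:>12,}")
```

Under these assumptions the embedding table accounts for 25,600,128 of the 25,815,174 parameters, so the vocabulary size is by far the largest factor in the model's size.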

13.Compiling the model with binary cross-entropy loss and the Adam optimizer

In [None]:
model.compile(loss="BinaryCrossentropy", optimizer="Adam", metrics=["accuracy"])
model.summary()
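
Binary cross-entropy treats each of the six outputs as an independent yes/no decision and averages the per-label log loss. The following is a minimal pure-Python sketch of that calculation for a single example. The label vector and probabilities are made up for illustration, and the clipping constant `eps` is an assumption added to avoid `log(0)`, not a value taken from the notebook.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over all label positions."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical example: one comment, six independent labels
y_true = [1, 0, 0, 0, 1, 0]
y_pred = [0.9, 0.1, 0.2, 0.05, 0.6, 0.1]
loss = binary_crossentropy(y_true, y_pred)
print(round(loss, 4))
```

Confident, correct predictions contribute almost nothing to the loss, while the uncertain 0.6 on a positive label contributes the most. This is why a sigmoid output per label pairs naturally with this loss in a multi-label setting.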

14.Setting up callbacks: EarlyStopping to stop training early if validation loss stops improving, and ModelCheckpoint to save the best model

In [None]:
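from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Stop after 3 epochs without improvement in validation loss and restore the best weights
early_stopping = EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)
# Keep only the best model seen so far on disk
model_checkpoint = ModelCheckpoint("best_model.keras", save_best_only=True)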
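
The patience rule is easy to trace by hand. The sketch below is a simplified pure-Python model of early stopping on validation loss, assuming that any strict decrease counts as an improvement. The function name and the loss values are hypothetical, and this is not the Keras implementation.

```python
def simulate_early_stopping(val_losses, patience=3):
    """Return (last_epoch_run, best_epoch) for a sequence of validation losses.

    Epochs are numbered from 0. Training stops once `patience` consecutive
    epochs pass without a new best validation loss.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses for 7 epochs
losses = [0.50, 0.40, 0.45, 0.43, 0.41, 0.39, 0.38]
stopped_at, best = simulate_early_stopping(losses, patience=3)
print(stopped_at, best)
```

With these made-up losses, training halts at epoch 4 because epochs 2, 3, and 4 all fail to beat the loss from epoch 1, even though later epochs would have improved. With only 7 epochs in total, a patience of 3 leaves little room, so the callback mostly acts as a safeguard here.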

15.Training the model for 7 epochs using the training and validation datasets, with the defined callbacks

In [None]:
history = model.fit(
    train,
    validation_data=val,
    epochs=7,
    callbacks=[early_stopping, model_checkpoint]
)

16.Retrieving the history of the model's performance during training

In [None]:
history.history

In [None]:
import matplotlib.pyplot as plt

17.Plotting the training and validation loss/accuracy over epochs

In [None]:
# Let pandas create the figure; a separate plt.figure() call would leave an empty figure behind
pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.show()

18.Vectorizing a sample input text and making a prediction with the trained model

In [None]:
input_text = vectorizer("I love you")
batch = test.as_numpy_iterator().next()
res = model.predict(np.expand_dims(input_text,0))
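
With a six-unit sigmoid output, `res` holds one row of six probabilities for the input text. A minimal sketch of turning such a row into label decisions follows. The label names, the probabilities, and the 0.5 threshold are all assumptions for illustration; the notebook itself does not name the labels or fix a threshold.

```python
def to_labels(probabilities, label_names, threshold=0.5):
    """Return the names whose predicted probability exceeds the threshold."""
    return [name for name, p in zip(label_names, probabilities) if p > threshold]


label_names = [f"label_{i}" for i in range(6)]  # hypothetical label names
probs = [0.02, 0.01, 0.71, 0.00, 0.55, 0.03]    # hypothetical model output for one text
flagged = to_labels(probs, label_names)
print(flagged)
```

Because the labels are independent, a text can be flagged for several labels at once or for none at all.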

19.Evaluation metrics

In [None]:
from tensorflow.keras.metrics import Precision, Recall, BinaryAccuracy
pre = Precision()
re = Recall()
# Labels are independent 0/1 flags, so use BinaryAccuracy rather than CategoricalAccuracy
acc = BinaryAccuracy()

20.Evaluating the model on the test set

In [None]:
for batch in test.as_numpy_iterator():
    X_true, y_true = batch
    yhat = model.predict(X_true)

    y_true = y_true.flatten()
    yhat = yhat.flatten()

    pre.update_state(y_true, yhat)
    re.update_state(y_true, yhat)
    acc.update_state(y_true, yhat)

21.Printing the evaluation results

In [None]:
print(f'Precision: {pre.result().numpy()}, Recall: {re.result().numpy()}, Accuracy: {acc.result().numpy()}')
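
To see what these three numbers measure on flattened multi-label output, here is a minimal pure-Python sketch. It assumes a 0.5 decision threshold and uses made-up labels and probabilities for two comments with six labels each. The function name is hypothetical.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Precision, recall, and accuracy over flattened 0/1 labels."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy


# Hypothetical flattened batch: 2 comments x 6 labels
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.1, 0.2, 0.6, 0.4, 0.1, 0.0, 0.0, 0.1, 0.2, 0.3, 0.1]
precision, recall, accuracy = binary_metrics(y_true, y_prob)
print(precision, recall, round(accuracy, 4))

# Baseline: predicting "no label" everywhere
_, _, baseline_accuracy = binary_metrics(y_true, [0.0] * len(y_true))
```

In this toy batch the accuracy is 10/12, which is exactly what a model that never predicts any label would score. When positive labels are rare, accuracy alone says little, so precision and recall are the more informative of the three.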

22.Saving the model and the vectorizer configuration and weights

In [None]:
import pickle

# Saving the entire model in Keras format
model.save("full_model.keras")

# Saving the configuration and weights of the TextVectorization layer
vectorizer_config = vectorizer.get_config()
vectorizer_weights = vectorizer.get_weights()

with open("vectorizer_config.pkl", "wb") as f:
    pickle.dump(vectorizer_config, f)

with open("vectorizer_weights.pkl", "wb") as f:
    pickle.dump(vectorizer_weights, f)