# BidirectionalLSTM_EmojiClassifier

## Importing the necessary libraries
We import TensorFlow and NumPy, which are needed to create and train the model:

In [1]:
from __future__ import annotations

import numpy as np
import tensorflow as tf

from typing import Any

from tensorflow import keras
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Bidirectional, Dropout


## Input Data
We create a basic training set. Each item has two fields: `text` (the content of a message) and `emoji`, a label where `0` means the message has no emoji and `1` means it has one. Two messages are used here as an example.

In [2]:
data: list = [{"text": "hi", "emoji": 0}, {"text": "lol", "emoji": 1}]

Next, we extract the texts and labels from the prepared data:

In [3]:
texts: list = [item["text"] for item in data]
labels: list = [item["emoji"] for item in data]


We use tokenization to convert each text into a sequence of integers, then pad every sequence to a fixed length of 10. The model operates on numeric input, so this step is required before training on text:

In [4]:
tokenizer: Tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
sequences: Any = tokenizer.texts_to_sequences(texts)
padded_sequences: Any = pad_sequences(sequences, maxlen=10, padding='post', truncating='post')
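
To make the two steps above concrete, here is a minimal pure-Python sketch of what tokenization and post-padding do to the two training texts. It is a simplified stand-in and not the Keras implementation: `build_word_index`, `encode`, and `pad_post` are hypothetical helpers, splitting is done on whitespace only, and words with equal frequency are assumed to keep first-seen order.

```python
from collections import Counter


def build_word_index(texts):
    """Map each word to an integer id starting at 1 (0 is reserved for padding)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # most_common() orders by frequency; equal counts keep first-seen order.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def encode(texts, word_index):
    """Replace each known word by its id; unknown words are dropped."""
    return [[word_index[w] for w in text.lower().split() if w in word_index]
            for text in texts]


def pad_post(sequences, maxlen, value=0):
    """Truncate each sequence to maxlen, then right-pad it with `value`."""
    padded = []
    for seq in sequences:
        kept = seq[:maxlen]
        padded.append(kept + [value] * (maxlen - len(kept)))
    return padded


texts = ["hi", "lol"]
word_index = build_word_index(texts)
padded = pad_post(encode(texts, word_index), maxlen=10)

print(word_index)  # {'hi': 1, 'lol': 2}
print(padded[0])   # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(padded[1])   # [2, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Every text ends up as a row of exactly 10 integers, which is the fixed shape the model expects.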

## Model
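We create the model for our neural network, which consists of `5` layers:
- `Embedding Layer`, with parameters:
    - `input_dim` - the vocabulary size (the number of unique words plus one for the padding index `0`)
    - `output_dim` - the dimensionality of each word vector
    - `input_length` - the length of each input sequence
- `Bidirectional LSTM Layer`, with parameters:
    - `units` - the dimensionality of the `LSTM` output in each direction
- `Dense Layer`, with parameters:
    - `units` - the number of neurons
    - `activation` - `ReLU` activation function
- `Dropout Layer`, with parameters:
    - `rate` - the fraction of input units to drop during training
- `Dense Layer` (output), with parameters:
    - `units` - a single neuron
    - `activation` - `sigmoid` activation function, which outputs a probability between 0 and 1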

In [5]:
model = Sequential()
model.add(Embedding(input_dim=len(tokenizer.word_index) + 1, output_dim=16, input_length=10))
model.add(Bidirectional(LSTM(64)))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
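
As a sanity check on this architecture, the trainable parameter count of each layer can be worked out by hand. The sketch below assumes the two-word vocabulary from the training data above (so `input_dim` is 3) and the standard LSTM layout of four gates, each with an input kernel, a recurrent kernel, and a bias. `lstm_params` and `dense_params` are hypothetical helper names.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """One weight per input/unit pair plus one bias per unit."""
    return input_dim * units + units


vocab_size = 3                       # two known words + index 0 for padding
embedding = vocab_size * 16          # one 16-dimensional vector per index
bilstm = 2 * lstm_params(16, 64)     # forward and backward LSTM
hidden = dense_params(2 * 64, 64)    # Bidirectional concatenates both directions
output = dense_params(64, 1)
total = embedding + bilstm + hidden + output  # Dropout has no parameters

print(embedding, bilstm, hidden, output, total)  # 48 41472 8256 65 49841
```

If the model is built as shown, `model.summary()` should report matching counts; a mismatch would suggest that one of the assumptions above does not hold.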

We compile the model with the `adam` optimizer and the `binary_crossentropy` loss function, tracking `accuracy` as a metric:

In [6]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
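
For reference, binary cross-entropy for a label `y` and a predicted probability `p` is `-(y*log(p) + (1-y)*log(1-p))`, averaged over the batch. A minimal NumPy sketch follows; the helper name `binary_crossentropy` and the clipping constant `eps` are illustrative choices, not a copy of the Keras internals.

```python
import numpy as np


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(per_sample))


# An undecided model (p = 0.5 everywhere) scores ln(2), about 0.693.
print(binary_crossentropy([0, 1], [0.5, 0.5]))
# Confident, correct predictions drive the loss toward 0.
print(binary_crossentropy([0, 1], [0.01, 0.99]))
```

The loss grows quickly when the model is confident and wrong, which is why it pairs naturally with a `sigmoid` output.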

We convert the labels to a NumPy array and train the model on our data:

In [7]:
labels_np = np.array(labels)

model.fit(padded_sequences, labels_np, epochs=100, batch_size=4)

<keras.src.callbacks.History at 0x7f7d59f2bb20>

## Neural Network Tests
To test the neural network, you can use the following code:

In [8]:
new_texts = ["hi bro", "hello gpt", "my name is gustavo", "bruh lol", "bro lorem impsum lol", "xd"]
new_sequences = tokenizer.texts_to_sequences(new_texts)
new_padded_sequences = pad_sequences(new_sequences, maxlen=10, padding='post', truncating='post')

predictions = model.predict(new_padded_sequences)

for text, pred in zip(new_texts, predictions):
    # pred is a one-element array; a probability above 0.4 counts as "emoji"
    emoji = 1 if pred[0] > 0.4 else 0
    print(f"Text: {text}, Pred: {pred}, Emoji: {emoji}")

Text: hi bro, Pred: [0.00016371], Emoji: 0
Text: hello gpt, Pred: [0.00051243], Emoji: 0
Text: my name is gustavo, Pred: [0.00051243], Emoji: 0
Text: bruh lol, Pred: [0.99934345], Emoji: 1
Text: bro lorem impsum lol, Pred: [0.99934345], Emoji: 1
Text: xd, Pred: [0.00051243], Emoji: 0
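
The repeated values in the output above are expected. The tokenizer was fitted only on `hi` and `lol` and was created without an out-of-vocabulary token, so unseen words are dropped before padding, and texts that reduce to the same sequence receive the same prediction. The pure-Python sketch below reproduces that grouping; the `word_index` mapping is written out by hand (assumed to match what the tokenizer learned) and `encode` is a hypothetical helper.

```python
from collections import defaultdict

word_index = {"hi": 1, "lol": 2}  # assumed vocabulary after fitting on ["hi", "lol"]


def encode(text, maxlen=10):
    """Keep only known words, then right-pad with zeros to maxlen."""
    ids = [word_index[w] for w in text.lower().split() if w in word_index][:maxlen]
    return tuple(ids + [0] * (maxlen - len(ids)))


new_texts = ["hi bro", "hello gpt", "my name is gustavo",
             "bruh lol", "bro lorem impsum lol", "xd"]

# Group the texts by the padded sequence the model would actually receive.
groups = defaultdict(list)
for text in new_texts:
    groups[encode(text)].append(text)

for padded, members in groups.items():
    print(padded[:3], members)
```

The six texts collapse into three distinct inputs, matching the three distinct prediction values. Passing `oov_token` when constructing the `Tokenizer` would map unseen words to a dedicated index instead of dropping them.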
