# **Sarcasm Detection with LSTM Algorithms**
An NLP project that detects sarcasm in news headlines using a bidirectional LSTM (Bi-LSTM) in TensorFlow. The task is binary classification, where 1 means a headline is sarcastic and 0 means it is not. The model uses dropout to reduce overfitting, binary cross-entropy as the loss function, Adam as the optimizer, and accuracy as the metric. It reached a training accuracy of about 87% and a validation accuracy of about 82%.
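For reference, binary cross-entropy for one example with label y (1 = sarcastic, 0 = not) and predicted sigmoid probability p is -[y·log(p) + (1 - y)·log(1 - p)]. Keras reports the mean of this value over each batch. The plain-Python sketch below computes the loss on a few made-up logits. These values are illustrative only and are not outputs of this model. The clipping constant `eps=1e-7` is an assumption that mirrors Keras' default epsilon.

```python
import math

def sigmoid(z: float) -> float:
    """Map a raw logit to a probability in (0, 1), as a Dense(1, activation='sigmoid') layer does."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch (eps clipping is an assumed, Keras-like default)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Illustrative labels and made-up logits (not real model outputs)
labels = [1, 0, 1, 0]
probs = [sigmoid(z) for z in (2.0, -1.5, 0.3, 0.8)]
loss = binary_crossentropy(labels, probs)
print(f"loss = {loss:.4f}")
```

Confident correct predictions (for example, p = 0.999 for a sarcastic headline) contribute almost nothing to the loss. A confident wrong prediction is penalized heavily because of the logarithm.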

## Import Libraries

```python
import json
import urllib.request  # `import urllib` alone does not guarantee urllib.request is loaded

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
```
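`Tokenizer` and `pad_sequences` do most of the preprocessing. The plain-Python sketch below mimics what they do with the settings used in `solution_model()`: an `"<OOV>"` token, a `num_words` limit, and `'post'` padding and truncation. It runs on a toy corpus.

This is a simplified approximation, not the Keras implementation. Keras splits on whitespace after stripping a fixed set of punctuation characters, whereas this sketch uses a regular expression. The helper names `fit_word_index`, `to_sequences`, and `pad_post` are hypothetical.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")  # rough stand-in for Keras' lowercase + punctuation filtering

def fit_word_index(texts, oov_token="<OOV>"):
    """Rank words by frequency (ties keep first-seen order); index 1 is reserved for the OOV token."""
    counts = Counter()
    for text in texts:
        counts.update(TOKEN_RE.findall(text.lower()))
    word_index = {oov_token: 1}
    for i, (word, _) in enumerate(counts.most_common(), start=2):
        word_index[word] = i
    return word_index

def to_sequences(texts, word_index, num_words, oov_token="<OOV>"):
    """Unseen words, and words whose index is >= num_words, both map to the OOV index."""
    oov_index = word_index[oov_token]
    result = []
    for text in texts:
        seq = []
        for word in TOKEN_RE.findall(text.lower()):
            i = word_index.get(word)
            seq.append(i if i is not None and i < num_words else oov_index)
        result.append(seq)
    return result

def pad_post(seqs, maxlen):
    """padding='post', truncating='post': keep the first maxlen ids, then append zeros."""
    return [s[:maxlen] + [0] * (maxlen - len(s[:maxlen])) for s in seqs]

train = ["mom thinks she is funny", "she is funny"]
word_index = fit_word_index(train)
seqs = to_sequences(["she is not funny"], word_index, num_words=5)
print(seqs)                      # [[2, 3, 1, 4]]  ("not" was never seen, so it becomes OOV)
print(pad_post(seqs, maxlen=6))  # [[2, 3, 1, 4, 0, 0]]
```

With `vocab_size = 1000`, every id of 1000 or more is replaced by the OOV id. Because the OOV token itself takes index 1, only the 998 most frequent training words keep their own ids. `pad_sequences` defaults to `'pre'` for both padding and truncation, which is why the notebook sets `'post'` explicitly.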

## Build and train a classifier for the sarcasm dataset

```python
def solution_model():
    # Download the sarcasm headlines dataset
    url = 'https://storage.googleapis.com/download.tensorflow.org/data/sarcasm.json'
    urllib.request.urlretrieve(url, 'sarcasm.json')

    # Hyperparameters
    vocab_size = 1000
    embedding_dim = 16
    max_length = 120
    trunc_type = 'post'
    padding_type = 'post'
    oov_tok = "<OOV>"
    training_size = 20000

    # Load the data from the JSON file
    with open('sarcasm.json', 'r') as f:
        data = json.load(f)

    # Collect each headline (sentence) and its label
    sentences = []
    labels = []
    for item in data:
        sentences.append(item['headline'])
        labels.append(item['is_sarcastic'])

    # Split: the first `training_size` examples for training, the rest for validation
    train_sentences = sentences[:training_size]
    train_labels = labels[:training_size]
    val_sentences = sentences[training_size:]
    val_labels = labels[training_size:]

    # Fit the tokenizer on the training data only
    tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
    tokenizer.fit_on_texts(train_sentences)  # builds the vocabulary; returns None

    # Word-to-index dictionary of the vocabulary
    word_index = tokenizer.word_index

    # Training data: headlines -> integer sequences -> fixed-length padded arrays
    sequences = tokenizer.texts_to_sequences(train_sentences)
    padded = pad_sequences(sequences, maxlen=max_length,
                           padding=padding_type, truncating=trunc_type)

    # Validation data, encoded with the same tokenizer
    val_sequences = tokenizer.texts_to_sequences(val_sentences)
    val_padded = pad_sequences(val_sequences, maxlen=max_length,
                               padding=padding_type, truncating=trunc_type)

    # Convert labels to NumPy arrays
    train_labels_final = np.array(train_labels)
    val_labels_final = np.array(val_labels)

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
        # tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(24)),
        tf.keras.layers.Dropout(0.7),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])

    # Compile the model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

    # Train the model
    model.fit(padded, train_labels_final,
              validation_data=(val_padded, val_labels_final),
              epochs=10, verbose=2)
    return model
```
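As a sanity check on the architecture, the parameter count can be worked out by hand from the hyperparameters in `solution_model()`. The count assumes Keras defaults (biases enabled). The result should agree with `model.summary()`, although that output is not reproduced in this notebook.

An LSTM layer has four gates, and each gate has an input kernel, a recurrent kernel, and a bias. `Bidirectional` holds two independent copies of the layer and concatenates their outputs.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim: int, units: int) -> int:
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units

vocab_size, embedding_dim, lstm_units = 1000, 16, 24

layers = {
    "Embedding": vocab_size * embedding_dim,
    "Bidirectional(LSTM)": 2 * lstm_params(embedding_dim, lstm_units),  # forward + backward copies
    "Dropout": 0,  # no weights
    "Dense(512)": dense_params(2 * lstm_units, 512),  # concatenated Bi-LSTM output: 2 * 24 = 48 features
    "Dense(1)": dense_params(512, 1),
}
for name, n in layers.items():
    print(f"{name:22s}{n:>8,d}")
print(f"{'Total':22s}{sum(layers.values()):>8,d}")
```

The `Dense(512)` layer alone accounts for roughly half of the weights. `Dropout(0.7)` adds no parameters. During training it zeroes 70% of the 48 Bi-LSTM output features feeding that layer.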

## Train and Save the Classifier

```python
if __name__ == '__main__':
    model = solution_model()
    # Saves in HDF5 format; recent Keras versions flag .h5 as a legacy format and emit a warning
    model.save("mymodel.h5")
```

```
Epoch 1/10
625/625 - 27s - loss: 0.4623 - accuracy: 0.7688 - val_loss: 0.3939 - val_accuracy: 0.8216 - 27s/epoch - 43ms/step
Epoch 2/10
625/625 - 8s - loss: 0.3760 - accuracy: 0.8305 - val_loss: 0.3872 - val_accuracy: 0.8207 - 8s/epoch - 13ms/step
Epoch 3/10
625/625 - 9s - loss: 0.3487 - accuracy: 0.8469 - val_loss: 0.3842 - val_accuracy: 0.8190 - 9s/epoch - 14ms/step
Epoch 4/10
625/625 - 7s - loss: 0.3346 - accuracy: 0.8551 - val_loss: 0.3883 - val_accuracy: 0.8196 - 7s/epoch - 11ms/step
Epoch 5/10
625/625 - 8s - loss: 0.3242 - accuracy: 0.8598 - val_loss: 0.3760 - val_accuracy: 0.8284 - 8s/epoch - 12ms/step
Epoch 6/10
625/625 - 7s - loss: 0.3152 - accuracy: 0.8651 - val_loss: 0.3818 - val_accuracy: 0.8283 - 7s/epoch - 11ms/step
Epoch 7/10
625/625 - 9s - loss: 0.3085 - accuracy: 0.8688 - val_loss: 0.3818 - val_accuracy: 0.8293 - 9s/epoch - 14ms/step
Epoch 8/10
625/625 - 6s - loss: 0.3027 - accuracy: 0.8732 - val_loss: 0.4001 - val_accuracy: 0.8167 - 6s/epoch - 10ms/step
Epoch 9/10
625
```
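The training log suggests the model stops improving on held-out data partway through training:

- **Training loss** keeps falling, from 0.4623 to 0.3027.
- **Validation loss** is lowest at epoch 5 (0.3760) and rises to 0.4001 by epoch 8.
- **Validation accuracy** peaks at epoch 7 (0.8293).

One common remedy, not used in the original notebook, is Keras' `EarlyStopping` callback. For example, pass `callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)]` to `model.fit`.

The plain-Python sketch below replays a patience rule of that kind on the `val_loss` values copied from epochs 1 to 8 of the log. The `patience=2` setting and the helper name `early_stopping_epoch` are assumptions for illustration.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based, for a metric where lower is better."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:      # strict improvement resets the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:         # too many epochs without improvement
                return epoch, best_epoch
    return len(val_losses), best_epoch   # ran out of epochs before triggering

# val_loss for epochs 1-8, copied from the training log above
val_loss_log = [0.3939, 0.3872, 0.3842, 0.3883, 0.3760, 0.3818, 0.3818, 0.4001]
stop, best = early_stopping_epoch(val_loss_log, patience=2)
print(f"stop after epoch {stop}; best weights from epoch {best}")
```

Under this rule, training would stop after epoch 7. With `restore_best_weights=True`, the weights from epoch 5 would be kept, which avoids the validation-loss increase seen at epoch 8.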
