# Text Sequence Classifier for Hateful Memes (Training)
This notebook trains the main classifier. Its inputs are not the meme images themselves but text extracted from them: the meme text (the text that appears on the meme image, written by the meme's creator) and the caption text (a description of the meme generated by an image captioning model).

Source: https://developers.google.com/machine-learning/guides/text-classification/

In [1]:
from google.colab import drive
drive.mount('/content/drive')
%cd '/content/drive/MyDrive/deep-learning-project/google_text_classifier'

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/deep-learning-project/google_text_classifier


In [2]:
#!pip install pandas
#!pip install scikit-learn
#!pip install numpy
#!pip install tensorflow

In [3]:
import pandas as pd
import numpy as np
import tensorflow

# Read in training data

In [4]:
train = pd.read_csv('../data/train_captioned.csv')
train.head(2)

| Unnamed: 0.1 | Unnamed: 0 | id | img | label | text | caption |
|---|---|---|---|---|---|---|
| 0 | 0 | 42953 | img/42953.png | 0 | its their character not their color that matters | a man in a suit and tie with a picture of a man. |
| 1 | 1 | 23058 | img/23058.png | 0 | don't be afraid to love again everyone is not ... | a woman and man standing next to each other. |


In [5]:
train['context'] = train['text'] + '. ' + train['caption']
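The `context` column joins the meme text and the caption with `'. '`. One thing to watch with this pandas string concatenation: if either column is missing for a row, the whole `context` becomes `NaN`. Whether the captioned CSVs actually contain missing values is not shown here, so the following is only a precautionary sketch on a made-up two-row DataFrame, with `fillna('')` as one possible guard.

```python
import numpy as np
import pandas as pd

# Toy data: the second row has no caption (illustrative, not from the dataset).
df = pd.DataFrame({
    "text": ["its their character not their color that matters", "some meme text"],
    "caption": ["a man in a suit and tie with a picture of a man.", np.nan],
})

# Same expression as the notebook: a missing caption turns the whole row into NaN.
naive = df["text"] + ". " + df["caption"]
print(naive.isna().tolist())

# Filling missing values with empty strings keeps every row as a usable string.
safe = df["text"].fillna("") + ". " + df["caption"].fillna("")
print(safe.tolist())
```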

In [6]:
train.drop(columns=['Unnamed: 0', 'text', 'caption'], inplace=True)

# Read in validation data

In [7]:
valid = pd.read_csv('../data/dev_unseen_captioned.csv')
#valid = pd.read_csv('../data/dev_seen_captioned.csv')
#valid = pd.read_csv('../data/test_unseen_captioned.csv')
#valid = pd.read_csv('../data/test_seen_captioned.csv')
valid['context'] = valid['text'] + '. ' + valid['caption']
valid.drop(columns=['Unnamed: 0', 'text', 'caption'], inplace=True)
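The training and validation cells repeat the same three steps (read CSV, build `context`, drop columns), and the commented-out lines switch between four evaluation splits. A small helper could keep those steps identical across splits. The sketch below is hypothetical (`load_split` is not part of this project); it uses `errors='ignore'` in `DataFrame.drop` so a split that lacks one of the dropped columns does not raise, and it is exercised on an in-memory CSV built from the first training row shown above.

```python
import io

import pandas as pd


def load_split(path_or_buffer) -> pd.DataFrame:
    """Hypothetical helper: read one captioned split and build the 'context' column."""
    df = pd.read_csv(path_or_buffer)
    # Same concatenation as the notebook; fillna('') guards against missing values.
    df["context"] = df["text"].fillna("") + ". " + df["caption"].fillna("")
    # errors='ignore' skips any listed column that this CSV does not contain.
    return df.drop(columns=["Unnamed: 0", "text", "caption"], errors="ignore")


# In-memory stand-in for a file such as '../data/dev_unseen_captioned.csv'.
csv = io.StringIO(
    "Unnamed: 0,id,img,label,text,caption\n"
    "0,42953,img/42953.png,0,its their character not their color that matters,"
    "a man in a suit and tie with a picture of a man.\n"
)
valid_demo = load_split(csv)
print(list(valid_demo.columns))
```

With such a helper, switching splits would be a one-line change, e.g. `valid = load_split('../data/dev_seen_captioned.csv')`.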

# Create dataset

In [8]:
train_texts = list(train['context'])
train_labels = np.array(train['label'])
test_texts = list(valid['context'])
test_labels = np.array(valid['label'])

data = ((train_texts, train_labels),(test_texts, test_labels))
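Before training, it can help to confirm that the labels fall in `0..num_classes-1` and to compute the majority-class baseline, i.e. the accuracy obtained by always predicting the most frequent label. This gives a reference point for the `acc` and `val_acc` values in the training log. The two functions below are an illustrative sketch (their names are not from the project), shown on toy labels.

```python
import numpy as np


def check_labels(labels: np.ndarray, num_classes: int) -> None:
    """Raise ValueError if any label falls outside 0..num_classes-1."""
    unexpected = set(np.unique(labels).tolist()) - set(range(num_classes))
    if unexpected:
        raise ValueError(f"Unexpected label values: {sorted(unexpected)}")


def majority_baseline(labels: np.ndarray) -> float:
    """Accuracy of a classifier that always predicts the most frequent class."""
    counts = np.bincount(labels)
    return float(counts.max() / len(labels))


demo_labels = np.array([0, 0, 1, 0, 1])
check_labels(demo_labels, num_classes=2)
print(majority_baseline(demo_labels))  # 0.6
```

Applied to this notebook, `majority_baseline(test_labels)` could be compared with the `val_acc` in the log below: if the two match, the model is most likely predicting a single class for every example.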

# Build model

In [22]:
# Arguments

# blocks: int, number of pairs of sepCNN and pooling blocks in the model.
blocks = 3  # 2
# filters: int, output dimension of the layers.
filters = 128  # 64
# kernel_size: int, length of the convolution window.
kernel_size = 3  # 3
# embedding_dim: int, dimension of the embedding vectors.
embedding_dim = 300  # 200
# dropout_rate: float, fraction of inputs to drop at Dropout layers.
dropout_rate = 0.2
# pool_size: int, factor by which to downscale input at MaxPooling layer.
pool_size = 1
# input_shape: tuple, shape of input to the model.
#input_shape = (1, 1)  # determined in vectorize_data
# num_classes: int, number of output classes.
num_classes = 2
# num_features: int, number of words (embedding input dimension).
#num_features = embedding_dim
# use_pretrained_embedding: bool, true if pre-trained embedding is on.
#use_pretrained_embedding = False
# is_embedding_trainable: bool, true if embedding layer is trainable.
#is_embedding_trainable = False
# embedding_matrix: dict, dictionary with embedding coefficients.
#embedding_matrix = None
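Note that `pool_size = 1` makes every MaxPooling layer an identity in terms of sequence length. With `padding='valid'` and strides defaulting to `pool_size` (the Keras `MaxPooling1D` default), the output length is `(L - pool_size) // strides + 1`. The sketch below assumes, as in the sepCNN model from Google's text-classification guide, one pooling layer after each of the first `blocks - 1` blocks; the layout inside `train_model.py` is not shown here, and the sequence length of 500 is a hypothetical value for illustration.

```python
from typing import Optional


def pooled_length(seq_len: int, pool_size: int, strides: Optional[int] = None) -> int:
    """Output length of a 1-D max-pooling layer with padding='valid'.

    As in Keras' MaxPooling1D, strides defaults to pool_size when not given.
    """
    if strides is None:
        strides = pool_size
    return (seq_len - pool_size) // strides + 1


def length_after_blocks(seq_len: int, blocks: int, pool_size: int) -> int:
    """Sequence length entering the final conv block.

    Assumes one pooling layer after each of the first blocks - 1 blocks.
    """
    for _ in range(blocks - 1):
        seq_len = pooled_length(seq_len, pool_size)
    return seq_len


# 500 is a hypothetical maximum sequence length, used only for illustration.
print(length_after_blocks(500, blocks=3, pool_size=1))  # 500
print(length_after_blocks(500, blocks=3, pool_size=3))  # 55
```

So with the settings above, only the convolutions and the final global pooling summarize the sequence; a larger `pool_size` would shrink it between blocks.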

# Train Model

In [23]:
from train_model import batch_train_sequence_model

batch_train_sequence_model(
    data=data,
    num_classes=num_classes,
    learning_rate=1e-4,  # 1e-3
    epochs=100,
    batch_size=128,  # 128
    blocks=blocks,
    filters=filters,
    dropout_rate=dropout_rate,
    embedding_dim=embedding_dim,
    kernel_size=kernel_size,
    pool_size=pool_size
)
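The `67/67` in the log is the number of batches (steps) per epoch, i.e. `ceil(num_train_samples / batch_size)`. At `batch_size=128`, any training set of 8,449 to 8,576 examples gives 67 steps. The following is a simplified sketch of feeding data in fixed-size batches, assuming in-order batching; it is not the code inside `train_model.py`, which is not shown here.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches per epoch; a final, smaller batch still counts as one step."""
    return math.ceil(num_samples / batch_size)


def batch_iterator(
    texts: Sequence, labels: Sequence, batch_size: int
) -> Iterator[Tuple[List, List]]:
    """Yield (texts, labels) batches in order; the last batch may be smaller."""
    for start in range(0, len(texts), batch_size):
        end = start + batch_size
        yield list(texts[start:end]), list(labels[start:end])


print(steps_per_epoch(8449, 128), steps_per_epoch(8576, 128))  # 67 67
sizes = [len(x) for x, _ in batch_iterator(list(range(5)), [0] * 5, batch_size=2)]
print(sizes)  # [2, 2, 1]
```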

Epoch 1/100
67/67 - 5s - loss: 0.6877 - acc: 0.6285 - val_loss: 0.6826 - val_acc: 0.6296 - 5s/epoch - 70ms/step
Epoch 2/100
67/67 - 3s - loss: 0.6781 - acc: 0.6448 - val_loss: 0.6747 - val_acc: 0.6296 - 3s/epoch - 43ms/step
Epoch 3/100
67/67 - 3s - loss: 0.6693 - acc: 0.6448 - val_loss: 0.6676 - val_acc: 0.6296 - 3s/epoch - 43ms/step
Epoch 4/100
67/67 - 3s - loss: 0.6615 - acc: 0.6448 - val_loss: 0.6627 - val_acc: 0.6296 - 3s/epoch - 44ms/step
Epoch 5/100
67/67 - 3s - loss: 0.6563 - acc: 0.6448 - val_loss: 0.6601 - val_acc: 0.6296 - 3s/epoch - 43ms/step
Epoch 6/100
67/67 - 3s - loss: 0.6531 - acc: 0.6448 - val_loss: 0.6593 - val_acc: 0.6296 - 3s/epoch - 43ms/step
Epoch 7/100
67/67 - 3s - loss: 0.6521 - acc: 0.6448 - val_loss: 0.6592 - val_acc: 0.6296 - 3s/epoch - 43ms/step
Epoch 8/100
67/67 - 3s - loss: 0.6515 - acc: 0.6448 - val_loss: 0.6592 - val_acc: 0.6296 - 3s/epoch - 43ms/step
Epoch 9/100
67/67 - 3s - loss: 0.6512 - acc: 0.6448 - val_loss: 0.6593 - val_acc: 0.6296 - 3s/epoch - 43ms/step
Vali

(0.6296296119689941, 0.6592591404914856)
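Training ends at epoch 9 of 100, and the returned pair matches the last epoch's `val_acc` and `val_loss`. An early stop like this is consistent with the `tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2)` callback used in Google's guide code; whether the local `train_model.py` uses the same callback is an assumption, since it is not shown here. The sketch below reimplements only the basic rule (stop once `val_loss` fails to improve for `patience` consecutive epochs, with `min_delta=0`) and replays the rounded `val_loss` values from the log.

```python
from typing import Optional, Sequence


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 2) -> Optional[int]:
    """Return the 1-based epoch at which training would stop, or None if it never does.

    Simplified version of the Keras EarlyStopping rule with min_delta=0.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# val_loss values copied (rounded to 4 decimals) from the log above.
logged = [0.6826, 0.6747, 0.6676, 0.6627, 0.6601, 0.6593, 0.6592, 0.6592, 0.6593]
print(early_stopping_epoch(logged, patience=2))  # 9
```

Because the logged values are rounded, this only approximates what the callback saw, but it reproduces the stop at epoch 9.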