# Introduction

Creating an accurate model for disaster analysis is challenging, and it is even harder with Twitter text. Here you can find a model that is easy to understand and open to improvement by changing its structure and hyperparameters. I was inspired by many research papers, some of which are listed below. <br />
I implemented the idea of "bringing models built for different tasks together to get better results", and confirmed the power of the CNN-BiLSTM pipeline.

The proposed **BERT-CNN-BiLSTM** learning pipeline consists of **three sequential modules**.<br />
BERT produces competitive results and can be considered a new foundation for natural language processing tasks such as sentiment analysis, named entity recognition (NER), and topic modeling. Combining CNN and BiLSTM models requires a careful design, since each model has its own architecture and strengths:<br />
• BERT transforms the word tokens of the raw Tweet messages into contextual word embeddings.<br />
• CNN extracts local features (patterns over neighboring tokens) from the embedding sequence.<br />
• BiLSTM reads the sequence in both directions, preserving word order, and its forget gate lets it discard information from unnecessary words.<br />
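
The forget gate in the BiLSTM bullet can be seen in a single LSTM step. The sketch below is a scalar toy version of the standard LSTM cell equations; the weights are hypothetical values set by hand (only the biases are non-zero), whereas a Keras `LSTM(64)` learns weight matrices. A forget gate close to 0 wipes the previous cell state, while one close to 1 carries it forward.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gate_weights(**overrides: float) -> dict:
    """Hypothetical scalar weights: every input weight (w*), recurrent
    weight (u*) and bias (b*) of gates f, i, o, g is 0 unless overridden."""
    names = [p + g for g in "fiog" for p in "wub"]
    weights = {name: 0.0 for name in names}
    weights.update(overrides)
    return weights


def lstm_cell_step(x: float, h_prev: float, c_prev: float, w: dict):
    """One step of a scalar LSTM cell; returns (h, c, forget_gate)."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate memory
    c = f * c_prev + i * g  # the forget gate scales what is kept from c_prev
    h = o * math.tanh(c)
    return h, c, f


# Forget gate nearly closed (bias -10): the old memory c_prev = 1.0 is erased.
_, c_erased, f_erased = lstm_cell_step(0.7, 0.0, 1.0, gate_weights(bf=-10.0, bi=-10.0))
# Forget gate nearly open (bias +10): the old memory is carried forward.
_, c_kept, f_kept = lstm_cell_step(0.7, 0.0, 1.0, gate_weights(bf=10.0, bi=-10.0))
print(f"erased: c={c_erased:.5f}  kept: c={c_kept:.5f}")
```

A bidirectional layer runs one such recurrence left to right and another right to left, then combines their outputs.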



**References:**<br />
1) [A Sentiment-Aware Contextual Model for Real-Time Disaster Prediction Using Twitter Data](https://www.mdpi.com/1999-5903/13/7/163/htm): this is where the idea comes from, and it is well worth reading; however, I am not using the same model.<br />
2) [Automatic identification of eyewitness messages on twitter during disasters](https://reader.elsevier.com/reader/sd/pii/S0306457319303590?token=985D740724AEDB812611486EBAD3B68FA4393520D4DCD96FDADE4A642A9805D728945987C1BBBE0FDAA8EC3684E372C7&originRegion=eu-west-1&originCreation=20210920022341)<br />
3) [Convolutional Neural Networks for Sentence Classification](http://arxiv.org/abs/1408.5882)<br />
4) [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](http://arxiv.org/abs/1810.04805)<br />
5) [Understanding LSTM: a tutorial into Long Short-Term Memory Recurrent Neural Networks](https://arxiv.org/pdf/1909.09586.pdf)<br />

# Recommended reading before you start

___
## Why Is the Development Set Excluded?
The dataset is hard: many tweets hinge on ambiguous keywords that are difficult to distinguish, and there are not enough examples for the model to learn some of these differences. So we do not split off part of our small dataset, since our goal is to maximize the submission score. That said, a dev set is almost always a must for analyzing results.

Feel free to check `version 3` for a more explanatory notebook. It includes a dev set, error analysis and more surprises.
____
## Dataset & cleaning
Based on this [paper](https://aclanthology.org/2020.pam-1.15.pdf), punctuation matters: it significantly affects BERT's context extraction. Therefore, we do not remove punctuation from the texts. Even though Twitter data is messy, small, careful choices like this can increase accuracy remarkably.
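
To see what a "traditional" cleaner throws away, here is a hypothetical `aggressive_clean` function (not used anywhere in this notebook) that lowercases, drops URLs and strips every non-word character. The example tweets are made up for illustration.

```python
import re


def aggressive_clean(text: str) -> str:
    """Hypothetical traditional cleaner: lowercase, drop URLs,
    replace every non-word character with a space, collapse spaces."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"[^\w\s]", " ", text)       # remove punctuation, '#', '@', ...
    return re.sub(r"\s+", " ", text).strip()


print(aggressive_clean("Is the bridge on fire?! Stay safe... http://t.co/abc"))
print(aggressive_clean("FIRE!!!"), "|", aggressive_clean("fire"))
```

After this kind of cleaning, "FIRE!!!" and "fire" become the same input, so BERT can no longer use the emphasis or the question marks that the raw text carried.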

Once preprocessing is done properly, finding good hyperparameters is the real challenge. We should not apply every preprocessing step. I demonstrate some of them in `version 3`; however, traditional preprocessing generally makes texts harder for BERT or any other contextual model to learn from. Before cleaning, we should check how the embedding model we are going to use was trained, and ask why a specific property needs to be removed at all.

The best results were obtained on raw data with contextual models.


In [None]:
import tensorflow as tf

from tensorflow.keras.layers import (Embedding, LSTM, Dense, Bidirectional, 
                                     SpatialDropout1D, Input, Conv1D, Dropout)
from tensorflow.keras import Model
from tensorflow.keras.optimizers import Adam
from transformers import BertTokenizer, TFBertModel

import numpy as np
import os
import pandas as pd
import time

# Preparing environment for training

In [None]:
os.environ["WANDB_API_KEY"] = "0" ## to silence warning

In [None]:
# Detect a TPU; fall back to the default strategy (CPU or single GPU) if none is found
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.TPUStrategy(tpu)
except ValueError:
    strategy = tf.distribute.get_strategy()  # for CPU and single GPU
print('Number of replicas:', strategy.num_replicas_in_sync)
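
Under a distribution strategy, the batch size of the dataset fed to `model.fit` is the global batch, which is split across `strategy.num_replicas_in_sync` replicas (for example 8 on a TPU v3-8). The helper below is a hypothetical sketch of that arithmetic; it raises on uneven splits purely as a design choice of the sketch.

```python
def per_replica_batch_size(global_batch_size: int, num_replicas: int) -> int:
    """Examples each replica processes per step when one global batch
    is divided evenly across num_replicas devices."""
    if global_batch_size % num_replicas != 0:
        raise ValueError("global batch size should be divisible by the replica count")
    return global_batch_size // num_replicas


print(per_replica_batch_size(32, 8))  # batch_size = 32 spread over 8 TPU cores
print(per_replica_batch_size(32, 1))  # CPU or single GPU
```

With `batch_size = 32` on 8 cores, each core sees only a handful of examples per step; a common pattern in TPU notebooks is to scale the batch instead, e.g. `16 * strategy.num_replicas_in_sync`.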

In [None]:
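# hyperparameters
max_length = 128  # tokens per tweet after padding/truncation
batch_size = 32
dev_size = 0.1    # not used below: the dev split is excluded in this version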
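
If you do want a dev set (as in `version 3`), `dev_size` is the fraction to hold out. The function below is a hypothetical, dependency-free sketch of a seeded random split; in practice `train.sample(frac=dev_size, random_state=42)` in pandas or scikit-learn's `train_test_split(..., test_size=dev_size, stratify=...)` do the same job.

```python
import random


def split_dev(rows, dev_size=0.1, seed=42):
    """Shuffle a copy of rows with a fixed seed and hold out dev_size of it.
    Returns (train_part, dev_part)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_dev = int(round(len(shuffled) * dev_size))
    return shuffled[n_dev:], shuffled[:n_dev]


train_part, dev_part = split_dev(range(100), dev_size=0.1)
print(len(train_part), len(dev_part))
```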

In [None]:
# Bert Tokenizer
model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)

In [None]:
# Read the data
train = pd.read_csv('/kaggle/input/nlp-getting-started/train.csv')

In [None]:
def bert_encode(data):
    # Tokenize, pad/truncate every tweet to max_length, and keep only the token ids
    tokens = tokenizer.batch_encode_plus(list(data), max_length=max_length,
                                         padding='max_length', truncation=True)
    return tf.constant(tokens['input_ids'])
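
`padding='max_length', truncation=True` gives every tweet exactly `max_length` ids. The sketch below mimics that for one already-tokenized sequence, using the `bert-base-uncased` special-token ids ([CLS] = 101, [SEP] = 102, [PAD] = 0); the word-piece ids in the example are arbitrary placeholders. It also builds the attention mask, which the tokenizer returns as well but `bert_encode` discards, so the model here never receives it. When no mask is passed, the Hugging Face TF BERT treats every position, padding included, as a real token.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # bert-base-uncased special-token ids


def pad_or_truncate(wordpiece_ids, max_length=128):
    """Single-sequence sketch of padding='max_length', truncation=True:
    add [CLS]/[SEP], cut the content to fit, right-pad with [PAD], and
    build the attention mask (1 = real token, 0 = padding)."""
    content = list(wordpiece_ids)[: max_length - 2]  # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + content + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    n_pad = max_length - len(input_ids)
    return input_ids + [PAD_ID] * n_pad, attention_mask + [0] * n_pad


print(pad_or_truncate([1000, 1001], max_length=6))
print(pad_or_truncate(range(1000, 1010), max_length=6))
```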

# Encoding text data

In [None]:
%%time
train_encoded = bert_encode(train.text)

In [None]:
train_dataset = (
    tf.data.Dataset
    .from_tensor_slices((train_encoded, train.target))
    .shuffle(64)
    .batch(batch_size)
)
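
`.shuffle(64)` does not shuffle the whole dataset: tf.data keeps a buffer of 64 elements, emits a random one and refills from the input. The pure-Python simulation below (a sketch of that mechanism, not tf.data itself) shows that the k-th emitted element always comes from the first k + 64 inputs, so if `train.csv` has any ordering, batches stay partially ordered. A buffer as large as the data, e.g. `.shuffle(len(train))`, gives a full shuffle.

```python
import random


def buffered_shuffle(items, buffer_size, seed=0):
    """Simulate a buffered shuffle: keep at most buffer_size elements,
    emit a random one whenever the buffer is full, then drain the rest."""
    rng = random.Random(seed)
    buffer, out = [], []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            out.append(buffer.pop(rng.randrange(len(buffer))))
    while buffer:  # input exhausted: emit the remaining elements in random order
        out.append(buffer.pop(rng.randrange(len(buffer))))
    return out


order = buffered_shuffle(range(1000), buffer_size=64)
print(order[:10])
```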

# Proposed Model

In [None]:
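def bert_tweets_model():
    # Module 1: pretrained BERT produces a contextual embedding for every token
    bert_encoder = TFBertModel.from_pretrained(model_name)
    input_word_ids = Input(shape=(max_length,), dtype=tf.int32, name="input_ids")
    last_hidden_states = bert_encoder(input_word_ids)[0]  # (batch, max_length, hidden)
    x = SpatialDropout1D(0.2)(last_hidden_states)
    # Module 2: CNN extracts local features over windows of 3 tokens
    x = Conv1D(64, 3, activation='relu')(x)
    # Module 3: BiLSTM reads the feature sequence in both directions
    x = Bidirectional(LSTM(64, dropout=0.2, recurrent_dropout=0.2))(x)
    # Classification head
    x = Dense(256, activation='relu')(x)
    x = Dropout(0.4)(x)
    x = Dense(128, activation='relu')(x)
    outputs = Dense(1, activation='sigmoid')(x)  # probability that the tweet is about a real disaster
    model = Model(input_word_ids, outputs)

    return model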
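
To check the architecture on paper, the sketch below traces the per-example output shape of each layer (batch dimension omitted). It assumes the `bert-base-uncased` hidden size of 768 and the Keras defaults used above: `Conv1D` with `padding='valid'`, an `LSTM` that returns only its final output, and `Bidirectional` concatenating both directions. The numbers should match `model.summary()`.

```python
def conv1d_valid_length(length: int, kernel_size: int, stride: int = 1) -> int:
    """Output length of a Conv1D with padding='valid' (the Keras default)."""
    return (length - kernel_size) // stride + 1


def trace_shapes(max_length=128, hidden=768, filters=64, kernel=3, lstm_units=64):
    """Per-example output shape of each layer in bert_tweets_model."""
    shapes = {"bert_last_hidden_state": (max_length, hidden)}
    shapes["spatial_dropout"] = shapes["bert_last_hidden_state"]  # dropout keeps the shape
    shapes["conv1d"] = (conv1d_valid_length(max_length, kernel), filters)
    shapes["bilstm"] = (2 * lstm_units,)  # forward + backward final outputs, concatenated
    shapes["dense_256"] = (256,)
    shapes["dense_128"] = (128,)
    shapes["output"] = (1,)
    return shapes


for layer, shape in trace_shapes().items():
    print(f"{layer:24s} {shape}")
```

The `valid` convolution trims `kernel_size - 1 = 2` time steps, and the BiLSTM collapses the sequence into one 128-dimensional vector per tweet.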

In [None]:
with strategy.scope():
    model = bert_tweets_model()
    adam_optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)
    model.compile(loss='binary_crossentropy', optimizer=adam_optimizer, metrics=['accuracy'])

    model.summary()
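
The model is trained with `binary_crossentropy` on a single sigmoid output. For a label $y \in \{0, 1\}$ and predicted probability $p$, the per-example loss is $-\left(y \log p + (1 - y)\log(1 - p)\right)$, averaged over the batch. The sketch below computes it in plain Python; clipping $p$ to $[\epsilon, 1 - \epsilon]$ with $\epsilon = 10^{-7}$ mirrors the idea of Keras' default epsilon but is not its exact code.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy with probabilities clipped to [eps, 1 - eps]."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


print(binary_crossentropy([1], [0.5]))            # an uninformed guess costs log(2)
print(binary_crossentropy([1, 0], [0.9, 0.1]))    # confident and correct: small loss
```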

In [None]:
tf.keras.utils.plot_model(model, show_shapes=True)

# Training

In [None]:
%%time
history = model.fit(
    train_dataset,  # the dataset is already batched, so batch_size is not passed here
    epochs=3,
    verbose=1)

# SUBMISSION

In [None]:
test = pd.read_csv('/kaggle/input/nlp-getting-started/test.csv')
test_encoded = bert_encode(test.text)

test_dataset = (
    tf.data.Dataset
    .from_tensor_slices(test_encoded)
    .batch(batch_size)
)

predicted_tweets = model.predict(test_dataset)  # test_dataset is already batched
predicted_tweets_binary = tf.cast(tf.round(predicted_tweets), tf.int32).numpy().flatten()
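
`tf.round` rounds half to even, so for sigmoid outputs in $[0, 1]$ it is equivalent to predicting 1 only when $p > 0.5$ (an output of exactly 0.5 becomes 0). The hypothetical helper below makes that threshold explicit, which would let you tune it, for example on a dev set.

```python
def to_labels(probabilities, threshold=0.5):
    """Map sigmoid outputs to 0/1 labels: 1 only when p > threshold.
    With threshold=0.5 this matches tf.round for p in [0, 1]."""
    return [1 if p > threshold else 0 for p in probabilities]


print(to_labels([0.2, 0.5, 0.51, 0.9]))
print(to_labels([0.45], threshold=0.4))
```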

In [None]:
my_submission = pd.DataFrame({'id': test.id, 'target': predicted_tweets_binary})
my_submission.to_csv('./submission.csv', index=False)