<a href="https://colab.research.google.com/github/hecshzye/nlp-disaster-tweet-detection/blob/main/disaster_tweet_detection_nlp.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Disaster Tweets Detection using Natural Language Processing

### The goal is to predict which tweets are about real disasters and which are not, using `NLP` and `TensorFlow`

- The dataset used for this model is the `Real-or-Not` dataset from the Kaggle competition: https://www.kaggle.com/c/nlp-getting-started/data


In [72]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import random
import os

In [2]:
# Importing a few helper functions written to streamline the workflow
!wget https://raw.githubusercontent.com/hecshzye/natural_language_processing-cases/main/helper_functions.py

--2022-01-15 01:57:46--  https://raw.githubusercontent.com/hecshzye/natural_language_processing-cases/main/helper_functions.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6442 (6.3K) [text/plain]
Saving to: ‘helper_functions.py’


2022-01-15 01:57:47 (84.9 MB/s) - ‘helper_functions.py’ saved [6442/6442]



In [73]:
from helper_functions import plot_loss_curves, create_confusion_matrix, create_tensorboard_callback, unzip_data, compare_history

## Dataset & EDA

In [4]:
!wget https://github.com/hecshzye/nlp-disaster-tweet-detection/blob/main/nlp_getting_started.zip?raw=true
unzip_data("nlp_getting_started.zip?raw=true")

--2022-01-15 01:57:57--  https://github.com/hecshzye/nlp-disaster-tweet-detection/blob/main/nlp_getting_started.zip?raw=true
Resolving github.com (github.com)... 13.114.40.48
Connecting to github.com (github.com)|13.114.40.48|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/hecshzye/nlp-disaster-tweet-detection/raw/main/nlp_getting_started.zip [following]
--2022-01-15 01:57:58--  https://github.com/hecshzye/nlp-disaster-tweet-detection/raw/main/nlp_getting_started.zip
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/hecshzye/nlp-disaster-tweet-detection/main/nlp_getting_started.zip [following]
--2022-01-15 01:57:58--  https://raw.githubusercontent.com/hecshzye/nlp-disaster-tweet-detection/main/nlp_getting_started.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.110.133, ...
Connecting

In [5]:
# Converting CSV to DataFrame
train_df = pd.read_csv("/content/train.csv")
test_df = pd.read_csv("/content/test.csv")

In [6]:
train_df.head()

|   | id | keyword | location | text | target |
|---|----|---------|----------|------|--------|
| 0 | 1 |  |  | Our Deeds are the Reason of this #earthquake M... | 1 |
| 1 | 4 |  |  | Forest fire near La Ronge Sask. Canada | 1 |
| 2 | 5 |  |  | All residents asked to 'shelter in place' are ... | 1 |
| 3 | 6 |  |  | 13,000 people receive #wildfires evacuation or... | 1 |
| 4 | 7 |  |  | Just got sent this photo from Ruby #Alaska as ... | 1 |


In [7]:
# Shuffling the dataset
train_df_shuffled = train_df.sample(frac=1, random_state=42)
train_df_shuffled.head()

|      | id | keyword | location | text | target |
|------|----|---------|----------|------|--------|
| 2644 | 3796 | destruction |  | So you have a new weapon that can cause un-ima... | 1 |
| 2227 | 3185 | deluge |  | The f$&amp;@ing things I do for #GISHWHES Just... | 0 |
| 5448 | 7769 | police | UK | DT @georgegalloway: RT @Galloway4Mayor: ÛÏThe... | 1 |
| 132 | 191 | aftershock |  | Aftershock back to school kick off was great. ... | 0 |
| 6845 | 9810 | trauma | Montgomery County, MD | in response to trauma Children of Addicts deve... | 0 |


In [8]:
train_df.target.value_counts()

0    4342
1    3271
Name: target, dtype: int64

**Reference dictionary**

Real disaster tweet = `1` (3271)

Not real disaster tweet = `0` (4342)
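The two counts above imply a mild class imbalance. A minimal sketch of the arithmetic follows, using only the counts reported by `value_counts()`; the variable names are illustrative and not part of the notebook.

```python
# Class counts as reported by train_df.target.value_counts() above
class_counts = {0: 4342, 1: 3271}

total = sum(class_counts.values())
class_shares = {label: count / total for label, count in class_counts.items()}

# Accuracy of a trivial classifier that always predicts the majority class
majority_baseline = max(class_shares.values())

for label, share in class_shares.items():
    print(f"class {label}: {share:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

On the training data as a whole, a model therefore needs to beat roughly 57% accuracy, not 50%, to be doing better than always predicting "not real disaster".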

In [9]:
# Train & test data distribution 
print(f"train - {len(train_df)}")
print(f"test - {len(test_df)}")
print(f"total size - {len(train_df) + len(test_df)}")

train - 7613
test - 3263
total size - 10876


In [10]:
# Printing 10 random tweets with their labels
random_index = random.randint(0, len(train_df)-10)
for row in train_df_shuffled[["text", "target"]][random_index:random_index+10].itertuples():
  _, text, target = row
  print(f"target: {target}", "(real disaster)" if target > 0 else "(not real disaster)")
  print(f"text:\n{text}\n")
  print("---\n")

target: 1 (real disaster)
text:
11-Year-Old Boy Charged With Manslaughter of Toddler: Report: An 11-year-old boy has been charged with manslaughter over the fatal sh...

---

target: 0 (not real disaster)
text:
What is the biggest regret you have in hearthstone? http://t.co/vcIrn1Md8v

---

target: 0 (not real disaster)
text:
The possible new jerseys for the Avalanche next year. ???? http://t.co/nruzhR5XQu

---

target: 0 (not real disaster)
text:
'...As of right now I'm reopening the X-Files. That's what they fear the most.' #TheXFiles201Days

---

target: 0 (not real disaster)
text:
Road Hazard @ CASCADE RD SW / CHILDRESS DR SW http://t.co/DilyvRoWyJ

---

target: 1 (real disaster)
text:
Afghan Soldier Kills US General America's Highest-Ranking Fatality Since Vietnam http://t.co/SiHQPlUIDW

---

target: 1 (real disaster)
text:
Sinkhole on west side damaging cars via @WEWS http://t.co/S7grbZNwlr

---

target: 0 (not real disaster)
text:
I just wanna smoke some weed and get some commas

In [11]:
# Splitting 
from sklearn.model_selection import train_test_split
train_sentences, val_sentences, train_labels, val_labels = train_test_split(train_df_shuffled["text"].to_numpy(),
                                                                            train_df_shuffled["target"].to_numpy(),
                                                                            test_size=0.1,
                                                                            random_state=42)
len(train_sentences), len(train_labels), len(val_sentences), len(val_labels)

(6851, 6851, 762, 762)
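The split sizes follow from `test_size=0.1` applied to 7613 rows. The sketch below reproduces them, assuming the validation count is rounded up and the remainder goes to training, which is consistent with the tuple printed above.

```python
import math

n_samples = 7613   # rows in train.csv, as printed earlier
test_size = 0.1    # fraction passed to train_test_split

# Assumption: the held-out count is rounded up, the rest is used for training
n_val = math.ceil(test_size * n_samples)
n_train = n_samples - n_val

print(n_train, n_val)
```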

In [12]:
train_sentences[:5], train_labels[:10]

(array(['@mogacola @zamtriossu i screamed after hitting tweet',
        'Imagine getting flattened by Kurt Zouma',
        '@Gurmeetramrahim #MSGDoing111WelfareWorks Green S welfare force ke appx 65000 members har time disaster victim ki help ke liye tyar hai....',
        "@shakjn @C7 @Magnums im shaking in fear he's gonna hack the planet",
        'Somehow find you and I collide http://t.co/Ee8RpOahPk'],
       dtype=object), array([0, 0, 1, 0, 0, 1, 1, 0, 1, 1]))

In [13]:
# Preprocessing - turning text into vectors
# In TensorFlow 2.6+ the layer is available directly under tensorflow.keras.layers
from tensorflow.keras.layers import TextVectorization
text_vectorizer = TextVectorization(max_tokens=None,
                                    standardize="lower_and_strip_punctuation",
                                    split="whitespace",
                                    ngrams=None,
                                    output_mode="int",
                                    output_sequence_length=None)

In [14]:
# Average number of words in a tweet (tokens after vectorization)
round(sum([len(i.split()) for i in train_sentences])/len(train_sentences))

15

In [15]:
# Text Vectorization using custom variables
max_vocab_length = 1000
max_length = 15
text_vectorizer = TextVectorization(max_tokens=max_vocab_length,
                                    output_mode="int",
                                    output_sequence_length=max_length)

In [16]:
# Fitting the text vectorizer to the training sentences (builds the vocabulary)
text_vectorizer.adapt(train_sentences)

# Tokenizing a sample sentence
sample_sentence = "Floor is lava at the end of the day"
text_vectorizer([sample_sentence])

<tf.Tensor: shape=(1, 15), dtype=int64, numpy=
array([[  1,   9, 434,  17,   2, 304,   6,   2, 101,   0,   0,   0,   0,
          0,   0]])>
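To make the output above easier to read, here is a rough pure-Python sketch of what an integer-mode text vectorizer does: lowercase, strip punctuation, split on whitespace, map words to frequency-ranked ids, then pad or truncate. It is an illustration under simplifying assumptions, not the Keras implementation; the helper names `standardize`, `build_vocab` and `vectorize` are hypothetical.

```python
import re
import string
from collections import Counter

def standardize(text):
    """Lowercase and remove punctuation, roughly like "lower_and_strip_punctuation"."""
    return re.sub(f"[{re.escape(string.punctuation)}]", "", text.lower())

def build_vocab(corpus, max_tokens):
    """Return a vocabulary list: index 0 is padding, index 1 is out-of-vocabulary."""
    counts = Counter(tok for line in corpus for tok in standardize(line).split())
    most_common = [tok for tok, _ in counts.most_common(max_tokens - 2)]
    return ["", "[UNK]"] + most_common

def vectorize(text, vocab, seq_len):
    """Map words to ids, truncate to seq_len, and right-pad with zeros."""
    index = {tok: i for i, tok in enumerate(vocab)}
    ids = [index.get(tok, 1) for tok in standardize(text).split()][:seq_len]
    return ids + [0] * (seq_len - len(ids))

corpus = ["the floor is lava", "the end of the day"]   # toy corpus, not the tweet data
vocab = build_vocab(corpus, max_tokens=6)
print(vocab)
print(vectorize("The zebra", vocab, seq_len=5))
```

This mirrors the pattern in the tensor above: words outside the 1000-word vocabulary (such as "Floor") become `1`, and positions past the end of the sentence are filled with `0`.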

In [17]:
# Vectorizing a random training sentence
random_sentence = random.choice(train_sentences)
print(f"Original Text:\n{random_sentence}\
         \n\nVectorized version:")
text_vectorizer([random_sentence])

Original Text:
Russia stood down cold war nuke ban or face ocean superiority 
Unconditional surrender next putin
Game set match
Release the hostages         

Vectorized version:


<tf.Tensor: shape=(1, 15), dtype=int64, numpy=
array([[921,   1, 134,   1, 122,   1,   1,  53, 289,   1,   1,   1,   1,
        274,   1]])>

In [18]:
# Checking unique tokens in the vocabulary
words_in_vocab = text_vectorizer.get_vocabulary()
print(f"Number of words in the vocab: {len(words_in_vocab)}")
top_5_words = words_in_vocab[:5]
print(f"Top 5 common words: {top_5_words}")
bottom_5_words = words_in_vocab[-5:]
print(f"Bottom 5 least common words: {bottom_5_words}")

Number of words in the vocab: 1000
Top 5 common words: ['', '[UNK]', 'the', 'a', 'in']
Bottom 5 least common words: ['reported', 'r', 'pray', 'playlist', 'patience']


In [19]:
# Embedding and Embedding layer
from tensorflow.keras import layers
tf.random.set_seed(42)
embedding = layers.Embedding(input_dim=max_vocab_length,
                             output_dim=128,
                             embeddings_initializer="uniform",
                             input_length=max_length,
                             name="embedding_1")
embedding

<keras.layers.embeddings.Embedding at 0x7fa71acce7d0>

In [20]:
# Embedding a random training sentence
random_sentence = random.choice(train_sentences)
print(f"Original Text:\n{random_sentence}\
        \n\nEmbedded version:")
sample_embed = embedding(text_vectorizer([random_sentence]))
sample_embed

Original Text:
What a wonderful day!        

Embedded version:


<tf.Tensor: shape=(1, 15, 128), dtype=float32, numpy=
array([[[-0.01456394,  0.02664156, -0.0070488 , ...,  0.00958163,
         -0.01225308,  0.04130488],
        [-0.04284013, -0.01489798, -0.0159496 , ..., -0.01166106,
          0.03061062,  0.01972148],
        [ 0.03977952, -0.03782602, -0.03646283, ...,  0.00236253,
          0.03332629,  0.02803668],
        ...,
        [ 0.01645621, -0.00589932, -0.01471175, ..., -0.02511839,
          0.00912381, -0.00024097],
        [ 0.01645621, -0.00589932, -0.01471175, ..., -0.02511839,
          0.00912381, -0.00024097],
        [ 0.01645621, -0.00589932, -0.01471175, ..., -0.02511839,
          0.00912381, -0.00024097]]], dtype=float32)>
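The repeated rows at the end of the tensor above are the padding positions: an embedding layer is essentially a lookup table, so every padding id `0` fetches the same row. A minimal sketch of that lookup, assuming a small uniform initialisation range and a reduced embedding size of 4 for readability (the token ids are hypothetical):

```python
import random

random.seed(42)
vocab_size, embedding_dim = 1000, 4   # the notebook uses 128 dimensions

# One row per token id, initialised uniformly in a small range around zero
embedding_table = [[random.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
                   for _ in range(vocab_size)]

def embed(token_ids):
    """Replace each token id with its row of the embedding table."""
    return [embedding_table[token_id] for token_id in token_ids]

token_ids = [48, 3, 1, 101, 0, 0, 0]   # hypothetical ids; trailing zeros are padding
vectors = embed(token_ids)

print(len(vectors), len(vectors[0]))   # sequence length, embedding dimension
print(vectors[-1] == vectors[-2])      # padding positions share one vector
```

During training the rows of this table are the parameters being updated, which is why the layer accounts for `1000 * 128` weights in the model summaries.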

In [21]:
sample_embed[0][0]

<tf.Tensor: shape=(128,), dtype=float32, numpy=
array([-0.01456394,  0.02664156, -0.0070488 , -0.01578101,  0.01857844,
       -0.03789372, -0.02296357,  0.04445826,  0.00534431, -0.04086939,
       -0.0001178 ,  0.0325787 ,  0.01795044, -0.00840222,  0.02777113,
        0.00267535,  0.00249401, -0.02981182,  0.00419725, -0.01612579,
       -0.04731787, -0.00343758, -0.01875019,  0.0197308 ,  0.02892964,
        0.02087852,  0.0311374 , -0.02340465, -0.02849378,  0.0206467 ,
        0.02238199,  0.03973413,  0.02957373, -0.02066603,  0.00601099,
        0.01802809,  0.03249015, -0.03012767,  0.00128162,  0.03171993,
        0.02708571,  0.0370943 ,  0.02782694,  0.01861149, -0.00639851,
       -0.04907566, -0.01623199, -0.03895655,  0.0074303 ,  0.02819103,
        0.03139831,  0.02758573, -0.0115862 , -0.03698015, -0.0286014 ,
       -0.01950072,  0.03606139, -0.00383686, -0.00495111,  0.02097936,
        0.04864902, -0.00507084, -0.04891219,  0.03100859, -0.04584966,
       -0.046618

# Modelling 

In [22]:
# Baseline model
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

model_1 = Pipeline([
                    ("tfidf", TfidfVectorizer()),
                    ("clf", MultinomialNB())
])

model_1.fit(train_sentences, train_labels)

Pipeline(steps=[('tfidf', TfidfVectorizer()), ('clf', MultinomialNB())])
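The first pipeline step turns each tweet into TF-IDF weights before Naive Bayes sees it. The sketch below shows the weighting idea only, using a smoothed IDF and L2 normalisation. It assumes plain whitespace tokenisation, which is simpler than what `TfidfVectorizer` does by default, and the function name `tfidf` is illustrative.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {token: weight} dict per document, L2-normalised."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1 for tok, df in doc_freq.items()}
    rows = []
    for toks in tokenized:
        term_freq = Counter(toks)
        weights = {tok: term_freq[tok] * idf[tok] for tok in term_freq}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({tok: w / norm for tok, w in weights.items()})
    return rows

rows = tfidf(["fire in the forest", "love the forest"])   # toy documents
for tok, weight in sorted(rows[0].items(), key=lambda kv: -kv[1]):
    print(f"{tok}: {weight:.3f}")
```

Words that occur in fewer documents ("fire") end up with a larger weight than words shared across documents ("the"), which is the signal the classifier relies on.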

In [23]:
model_1_score = model_1.score(val_sentences, val_labels)
print(f"model_1 baseline: {model_1_score*100:.2f}%")

model_1 baseline: 79.27%


In [24]:
# Baseline model prediction
model_1_preds = model_1.predict(val_sentences)
model_1_preds[:28]

array([1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,
       0, 0, 1, 1, 0, 0])

# Evaluation

In [74]:
# Function for evaluation
from sklearn.metrics import precision_recall_fscore_support, accuracy_score
def evaluate_results(y_true, y_pred):
  # Accuracy is returned as a percentage; precision, recall and F1 are support-weighted averages in [0, 1]
  model_accuracy = accuracy_score(y_true, y_pred) * 100
  model_precision, model_recall, model_f1, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")
  model_results = {"accuracy": model_accuracy,
                   "precision": model_precision,
                   "recall": model_recall,
                   "f1": model_f1}
  return model_results

In [26]:
model_1_results = evaluate_results(y_true=val_labels,
                                   y_pred=model_1_preds)
model_1_results

{'accuracy': 79.26509186351706,
 'f1': 0.7862189758049549,
 'precision': 0.8111390004213173,
 'recall': 0.7926509186351706}
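Note that the recall above equals the accuracy divided by 100. That is a property of `average="weighted"`: each class's score is weighted by its number of true samples, and for recall this reduces to overall accuracy. A small pure-Python sketch of the weighted averages, with a hypothetical helper name and toy labels:

```python
def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall and F1 over all labels in y_true."""
    n = len(y_true)
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = (tp + fn) / n   # share of true samples belonging to this label
        totals["precision"] += weight * precision
        totals["recall"] += weight * recall
        totals["f1"] += weight * f1
    return totals

y_true = [1, 1, 0, 0]   # toy labels, not the validation set
y_pred = [1, 0, 0, 0]
scores = weighted_scores(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(scores, accuracy)
```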

In [27]:
from helper_functions import create_tensorboard_callback
SAVE_DIR = "model_logs"

# Dense model_2

In [28]:
# model_2 using keras API
from tensorflow.keras import layers
inputs = layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = embedding(x)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model_2 = tf.keras.Model(inputs, outputs, name="model_2_dense")

model_2.compile(loss="binary_crossentropy",
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])

model_2.summary()

Model: "model_2_dense"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 1)]               0         
                                                                 
 text_vectorization_1 (TextV  (None, 15)               0         
 ectorization)                                                   
                                                                 
 embedding_1 (Embedding)     (None, 15, 128)           128000    
                                                                 
 global_average_pooling1d (G  (None, 128)              0         
 lobalAveragePooling1D)                                          
                                                                 
 dense (Dense)               (None, 1)                 129       
                                                                 
Total params: 128,129
Trainable params: 128,129
Non-t
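The summary shows the pooling layer collapsing `(None, 15, 128)` to `(None, 128)`. A minimal sketch of that averaging on a toy input, followed by the parameter arithmetic for the two trainable layers; the function name is illustrative, and the sketch assumes no masking, so padding positions are averaged in as well.

```python
def global_average_pool(sequence):
    """Average a (timesteps, features) nested list over the timesteps axis."""
    n_steps = len(sequence)
    n_features = len(sequence[0])
    return [sum(step[f] for step in sequence) / n_steps for f in range(n_features)]

# Toy input: 3 timesteps, 2 features (the notebook's real shape is 15 x 128)
toy_sequence = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
pooled = global_average_pool(toy_sequence)
print(pooled)

# Parameter counts reported by model_2.summary()
embedding_params = 1000 * 128   # vocabulary size x embedding dimension
dense_params = 128 * 1 + 1      # one weight per pooled feature, plus a bias
print(embedding_params, dense_params, embedding_params + dense_params)
```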

In [29]:
# Fit
model_2_history = model_2.fit(train_sentences,
                              train_labels,
                              epochs=5,
                              validation_data=(val_sentences, val_labels),
                              callbacks=[create_tensorboard_callback(dir_name=SAVE_DIR,
                                                                     experiment_name="dense_model_2")])

Saved Tensorboard logs to: model_logs/dense_model_2/20220115-015928
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [30]:
# Evaluation
model_2.evaluate(val_sentences, val_labels)



[0.48855453729629517, 0.7795275449752808]

In [31]:
embedding.weights

[<tf.Variable 'embedding_1/embeddings:0' shape=(1000, 128) dtype=float32, numpy=
 array([[-0.004251  ,  0.01844549, -0.03852235, ..., -0.0496745 ,
         -0.01485125,  0.02297583],
        [ 0.03137534, -0.03485814, -0.04432392, ..., -0.00107152,
          0.02473094,  0.03598075],
        [-0.00510067,  0.06533641, -0.01093523, ..., -0.08215702,
         -0.0573307 ,  0.02339376],
        ...,
        [ 0.00829318,  0.03502899, -0.03210786, ...,  0.00386814,
         -0.014278  , -0.03917582],
        [-0.0561762 ,  0.04923385, -0.02204463, ..., -0.0809767 ,
         -0.06787818,  0.09590853],
        [-0.03262722, -0.0288001 , -0.00593466, ...,  0.00154698,
          0.03966668,  0.0328024 ]], dtype=float32)>]

In [32]:
embed_weights = model_2.get_layer("embedding_1").get_weights()[0]
print(embed_weights.shape)

(1000, 128)


In [33]:
# Tensorboard logs 
!tensorboard dev upload --logdir ./model_logs \
  --name "Dense model_2 text data" \
  --description "rough dense model_2 with embedded layer" \
  --one_shot 


View loss and accuracy on tensorboard
link = https://tensorboard.dev/experiment/wYXytJRjSqeZK56sjkH7rQ/

# Predictions

In [36]:
model_2_pred_probs = model_2.predict(val_sentences)
model_2_pred_probs[:20]

array([[0.53058326],
       [0.68900245],
       [0.9704894 ],
       [0.10615796],
       [0.15930057],
       [0.9235889 ],
       [0.85915077],
       [0.82929075],
       [0.7156279 ],
       [0.16583654],
       [0.32949227],
       [0.5653246 ],
       [0.04989049],
       [0.24511224],
       [0.0299069 ],
       [0.14587364],
       [0.03504819],
       [0.2658403 ],
       [0.22818181],
       [0.3052133 ]], dtype=float32)

In [37]:
# Converting prediction probabilities into 1-D float tensors
model_2_preds = tf.squeeze(tf.round(model_2_pred_probs))
model_2_preds[:20]

<tf.Tensor: shape=(20,), dtype=float32, numpy=
array([1., 1., 1., 0., 0., 1., 1., 1., 1., 0., 0., 1., 0., 0., 0., 0., 0.,
       0., 0., 0.], dtype=float32)>
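Rounding the sigmoid outputs is the same as applying a 0.5 decision threshold. The sketch below makes the threshold explicit, using the first five probabilities printed above; the variable names are illustrative.

```python
# First five probabilities from model_2_pred_probs above
probs = [0.53058326, 0.68900245, 0.9704894, 0.10615796, 0.15930057]

threshold = 0.5
# Strictly greater than: tf.round rounds exact halves to the nearest even value, so 0.5 maps to 0
labels = [int(p > threshold) for p in probs]
print(labels)
```

Writing the threshold out makes it easy to experiment: raising it generally trades recall for precision on the positive class.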

In [38]:
# Metrics
model_2_results = evaluate_results(y_true=val_labels,
                                   y_pred=model_2_preds)
model_2_results

{'accuracy': 77.95275590551181,
 'f1': 0.7764267379950773,
 'precision': 0.7841069305867823,
 'recall': 0.7795275590551181}

In [39]:
# model_1 & model_2 comparison
np.array(list(model_2_results.values())) > np.array(list(model_1_results.values()))

array([False, False, False, False])

In [42]:
# Function for comparison
def compare_results(model_1_results, new_model_results):
  # The "model_1" label in the output refers to whichever results are passed as the first argument
  for key, value in model_1_results.items():
    print(f"model_1 {key}: {value:.2f}, New {key}: {new_model_results[key]:.2f}, Difference: {new_model_results[key]-value:.2f}")
compare_results(model_1_results=model_1_results,
                new_model_results=model_2_results)

model_1 accuracy: 79.27, New accuracy: 77.95, Difference: -1.31
model_1 precision: 0.81, New precision: 0.78, Difference: -0.03
model_1 recall: 0.79, New recall: 0.78, Difference: -0.01
model_1 f1: 0.79, New f1: 0.78, Difference: -0.01


In [43]:
# Visualizing the embeddings learned
words_in_vocab = text_vectorizer.get_vocabulary()
len(words_in_vocab), words_in_vocab[:10]

(1000, ['', '[UNK]', 'the', 'a', 'in', 'to', 'of', 'and', 'i', 'is'])

In [49]:
import io

# Write one embedding vector and its word per line, skipping the padding token at index 0
with io.open("embedding_vectors.tsv", "w", encoding="utf-8") as out_v, \
     io.open("embedding_metadata.tsv", "w", encoding="utf-8") as out_m:
  for num, word in enumerate(words_in_vocab):
    if num == 0:
      continue
    vec = embed_weights[num]
    out_m.write(word + "\n")
    out_v.write("\t".join([str(x) for x in vec]) + "\n")

try:
  from google.colab import files
except ImportError:
  pass
else:
  files.download("embedding_vectors.tsv")
  files.download("embedding_metadata.tsv")  


# model_3 with RNN, LSTM layer

In [51]:
from tensorflow.keras import layers
tf.random.set_seed(42)
model_3_embedding = layers.Embedding(input_dim=max_vocab_length,
                                     output_dim=128,
                                     embeddings_initializer="uniform",
                                     input_length=max_length,
                                     name="embedding_3")

# LSTM modelling
inputs = layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = model_3_embedding(x)
print(x.shape)
x = layers.LSTM(64)(x)
print(x.shape)
outputs = layers.Dense(1, activation="sigmoid")(x)
model_3 = tf.keras.Model(inputs, outputs, name="model_3_LSTM")

(None, 15, 128)
(None, 64)


In [52]:
model_3.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])
model_3.summary()

Model: "model_3_LSTM"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 1)]               0         
                                                                 
 text_vectorization_1 (TextV  (None, 15)               0         
 ectorization)                                                   
                                                                 
 embedding_3 (Embedding)     (None, 15, 128)           128000    
                                                                 
 lstm (LSTM)                 (None, 64)                49408     
                                                                 
 dense_1 (Dense)             (None, 1)                 65        
                                                                 
Total params: 177,473
Trainable params: 177,473
Non-trainable params: 0
________________________________________________
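The 49,408 parameters of the LSTM layer can be checked by hand. An LSTM has four gate-like transformations, each with input weights, recurrent weights and a bias. A short sketch with an illustrative helper name:

```python
def lstm_param_count(input_dim, units):
    """4 transformations, each with input weights, recurrent weights and a bias vector."""
    return 4 * ((input_dim + units) * units + units)

embedding_params = 1000 * 128            # vocabulary size x embedding dimension
lstm_params = lstm_param_count(128, 64)  # 128-d embeddings into 64 LSTM units
dense_params = 64 * 1 + 1                # 64 LSTM outputs plus a bias

print(lstm_params)
print(embedding_params + lstm_params + dense_params)
```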

In [53]:
model_3_history = model_3.fit(train_sentences,
                              train_labels,
                              epochs=5,
                              validation_data=(val_sentences, val_labels),
                              callbacks=[create_tensorboard_callback(SAVE_DIR,
                                                                     "LSTM")])

Saved Tensorboard logs to: model_logs/LSTM/20220115-033400
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [54]:
# Predictions
model_3_pred_probs = model_3.predict(val_sentences)
model_3_pred_probs.shape, model_3_pred_probs[:10]

((762, 1), array([[0.56842875],
        [0.64211476],
        [0.9662366 ],
        [0.08715507],
        [0.09231639],
        [0.9865646 ],
        [0.860858  ],
        [0.8853502 ],
        [0.75101924],
        [0.10359108]], dtype=float32))

In [62]:
model_3_preds = tf.squeeze(tf.round(model_3_pred_probs))
model_3_preds[:20]

<tf.Tensor: shape=(20,), dtype=float32, numpy=
array([1., 1., 1., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0.], dtype=float32)>

In [66]:
# Evaluation
model_3_results = evaluate_results(y_true=val_labels,
                                   y_pred=model_3_preds)
model_3_results

{'accuracy': 77.03412073490814,
 'f1': 0.7680604812955161,
 'precision': 0.7723473976820748,
 'recall': 0.7703412073490814}

In [67]:
# Comparison of model_3 against model_2 (the "model_1" label in the output is the first argument, here model_2)
compare_results(model_2_results, model_3_results)

model_1 accuracy: 77.95, New accuracy: 77.03, Difference: -0.92
model_1 precision: 0.78, New precision: 0.77, Difference: -0.01
model_1 recall: 0.78, New recall: 0.77, Difference: -0.01
model_1 f1: 0.78, New f1: 0.77, Difference: -0.01


In [80]:
# Modelling with GRU (model_4)
from tensorflow.keras import layers
tf.random.set_seed(42)
model_4_embedding = layers.Embedding(input_dim=max_vocab_length,
                                     output_dim=128,
                                     embeddings_initializer="uniform",
                                     input_length=max_length,
                                     name="embedding_4")
# GRU cell 
inputs = layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = model_4_embedding(x)
x = layers.GRU(64)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model_4 = tf.keras.Model(inputs, outputs, name="model_4_GRU")

model_4.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

model_4_history = model_4.fit(train_sentences,
                              train_labels,
                              epochs=5,
                              validation_data=(val_sentences, val_labels),
                              callbacks=[create_tensorboard_callback(SAVE_DIR, "GRU")])

Saved Tensorboard logs to: model_logs/GRU/20220115-041057
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [81]:
model_4.summary()

Model: "model_4_GRU"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_10 (InputLayer)       [(None, 1)]               0         
                                                                 
 text_vectorization_1 (TextV  (None, 15)               0         
 ectorization)                                                   
                                                                 
 embedding_4 (Embedding)     (None, 15, 128)           128000    
                                                                 
 gru_7 (GRU)                 (None, 64)                37248     
                                                                 
 dense_9 (Dense)             (None, 1)                 65        
                                                                 
Total params: 165,313
Trainable params: 165,313
Non-trainable params: 0
_________________________________________________
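The GRU layer is smaller than the LSTM because it has three transformations instead of four. The sketch below reproduces the 37,248 figure, assuming the layer keeps two bias vectors per transformation (the `reset_after=True` variant); the helper name is illustrative.

```python
def gru_param_count(input_dim, units):
    """3 transformations, each with input weights, recurrent weights and two bias vectors."""
    return 3 * ((input_dim + units) * units + 2 * units)

embedding_params = 1000 * 128          # vocabulary size x embedding dimension
gru_params = gru_param_count(128, 64)  # 128-d embeddings into 64 GRU units
dense_params = 64 * 1 + 1              # 64 GRU outputs plus a bias

print(gru_params)
print(embedding_params + gru_params + dense_params)
```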

In [82]:
# Predictions
model_4_pred_probs = model_4.predict(val_sentences)
model_4_pred_probs.shape, model_4_pred_probs[:20]

((762, 1), array([[0.6687808 ],
        [0.6664946 ],
        [0.9592022 ],
        [0.08834901],
        [0.06398237],
        [0.9852369 ],
        [0.8270847 ],
        [0.94431037],
        [0.71905947],
        [0.10090721],
        [0.13116473],
        [0.39779377],
        [0.1189782 ],
        [0.245549  ],
        [0.02221528],
        [0.17180341],
        [0.01423022],
        [0.10421562],
        [0.14262035],
        [0.5230711 ]], dtype=float32))

In [83]:
model_4_preds = tf.squeeze(tf.round(model_4_pred_probs))
model_4_preds[:20]

<tf.Tensor: shape=(20,), dtype=float32, numpy=
array([1., 1., 1., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 1.], dtype=float32)>

In [84]:
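# Evaluation (uses model_4's own predictions; the output recorded below was produced with model_3_preds and therefore repeats model_3's metrics)
model_4_results = evaluate_results(y_true=val_labels,
                                   y_pred=model_4_preds)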
model_4_results

{'accuracy': 77.03412073490814,
 'f1': 0.7680604812955161,
 'precision': 0.7723473976820748,
 'recall': 0.7703412073490814}

In [85]:
# Comparing the evaluation results
compare_results(model_1_results, model_4_results)

model_1 accuracy: 79.27, New accuracy: 77.03, Difference: -2.23
model_1 precision: 0.81, New precision: 0.77, Difference: -0.04
model_1 recall: 0.79, New recall: 0.77, Difference: -0.02
model_1 f1: 0.79, New f1: 0.77, Difference: -0.02


In [87]:
# Bidirectional model_5
from tensorflow.keras import layers
tf.random.set_seed(42)
model_5_embedding = layers.Embedding(input_dim=max_vocab_length,
                                     output_dim=128,
                                     embeddings_initializer="uniform",
                                     input_length=max_length,
                                     name="embedding_5")

# Bidirectional RNN
inputs = layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = model_5_embedding(x)
x = layers.Bidirectional(layers.LSTM(64))(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model_5 = tf.keras.Model(inputs, outputs, name="model_5_Bidirectional")

model_5.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])
model_5_history = model_5.fit(train_sentences,
                              train_labels,
                              epochs=5,
                              validation_data=(val_sentences, val_labels),
                              callbacks=[create_tensorboard_callback(SAVE_DIR, "bidirectional_RNN")])

Saved Tensorboard logs to: model_logs/bidirectional_RNN/20220115-044408
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [88]:
model_5.summary()

Model: "model_5_Bidirectional"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_12 (InputLayer)       [(None, 1)]               0         
                                                                 
 text_vectorization_1 (TextV  (None, 15)               0         
 ectorization)                                                   
                                                                 
 embedding_5 (Embedding)     (None, 15, 128)           128000    
                                                                 
 bidirectional_1 (Bidirectio  (None, 128)              98816     
 nal)                                                            
                                                                 
 dense_11 (Dense)            (None, 1)                 129       
                                                                 
Total params: 226,945
Trainable params: 226,9
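The bidirectional layer outputs 128 features because it reads the sequence in both directions and concatenates the two final states. The sketch below illustrates this with a toy scalar recurrent cell, which is not an LSTM. For brevity both directions share the same toy weights here, whereas the wrapper keeps a separate copy of the layer per direction, which is why the parameter count doubles.

```python
import math

def run_rnn(sequence, w_in=0.5, w_rec=0.8):
    """Toy scalar recurrent cell (not an LSTM): returns the final hidden state."""
    hidden = 0.0
    for x in sequence:
        hidden = math.tanh(w_in * x + w_rec * hidden)
    return hidden

def bidirectional(sequence):
    """Run the cell forwards and backwards, then concatenate the final states."""
    forward = run_rnn(sequence)
    backward = run_rnn(list(reversed(sequence)))
    return [forward, backward]

state = bidirectional([1.0, -2.0, 0.5])   # toy input sequence
print(state)

# Sizes reported by model_5.summary()
units = 64
lstm_params = 4 * ((128 + units) * units + units)
bidirectional_output_dim = 2 * units   # forward and backward states concatenated
bidirectional_params = 2 * lstm_params # one LSTM per direction
print(bidirectional_output_dim, bidirectional_params)
```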

In [89]:
# Predictions 
model_5_pred_probs = model_5.predict(val_sentences)
model_5_pred_probs[:10]

array([[0.83845985],
       [0.6855751 ],
       [0.9851843 ],
       [0.09927738],
       [0.13823923],
       [0.98963165],
       [0.74951273],
       [0.9685226 ],
       [0.86154425],
       [0.1335555 ]], dtype=float32)

In [91]:
model_5_preds = tf.squeeze(tf.round(model_5_pred_probs))
model_5_preds[:10]

<tf.Tensor: shape=(10,), dtype=float32, numpy=array([1., 1., 1., 0., 0., 1., 1., 1., 1., 0.], dtype=float32)>

In [92]:
# Evaluation 
model_5_results = evaluate_results(val_labels, model_5_preds)
model_5_results

{'accuracy': 76.24671916010499,
 'f1': 0.7605097571811492,
 'precision': 0.7635307406353953,
 'recall': 0.7624671916010499}

In [93]:
# Comparing the evaluations
compare_results(model_1_results, model_5_results)

model_1 accuracy: 79.27, New accuracy: 76.25, Difference: -3.02
model_1 precision: 0.81, New precision: 0.76, Difference: -0.05
model_1 recall: 0.79, New recall: 0.76, Difference: -0.03
model_1 f1: 0.79, New f1: 0.76, Difference: -0.03
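With several models evaluated, it helps to see them side by side instead of in pairwise comparisons. The sketch below ranks the models by F1 using the metrics printed above, rounded to four decimals. `model_4` is left out because its recorded metrics were computed from `model_3_preds`. The dictionary layout is illustrative.

```python
# Metrics copied (rounded) from the evaluation outputs above
results = {
    "model_1_baseline": {"accuracy": 79.27, "f1": 0.7862},
    "model_2_dense": {"accuracy": 77.95, "f1": 0.7764},
    "model_3_LSTM": {"accuracy": 77.03, "f1": 0.7681},
    "model_5_Bidirectional": {"accuracy": 76.25, "f1": 0.7605},
}

# Sort model names from best to worst F1
ranking = sorted(results, key=lambda name: results[name]["f1"], reverse=True)

for name in ranking:
    scores = results[name]
    print(f"{name:<24} accuracy: {scores['accuracy']:.2f}  f1: {scores['f1']:.4f}")
```

So far the TF-IDF plus Naive Bayes baseline remains the strongest model on the validation set.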


In [96]:
# Conv1D CNN model_6
from tensorflow.keras import layers
tf.random.set_seed(42)
model_6_embedding = layers.Embedding(input_dim=max_vocab_length,
                                     output_dim=128,
                                     embeddings_initializer="uniform",
                                     input_length=max_length,
                                     name="embedding_6")

# Functional model: raw strings -> token ids -> embeddings -> Conv1D -> global max pool -> sigmoid
inputs = layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = model_6_embedding(x)
x = layers.Conv1D(filters=32, kernel_size=5, activation="relu")(x)
x = layers.GlobalMaxPool1D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model_6 = tf.keras.Model(inputs, outputs, name="model_6_Conv1D")

model_6.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

model_6_history = model_6.fit(train_sentences,
                              train_labels,
                              epochs=5,
                              validation_data=(val_sentences, val_labels),
                              callbacks=[create_tensorboard_callback(SAVE_DIR,
                                                                     "Conv1D")])

Saved Tensorboard logs to: model_logs/Conv1D/20220115-051807
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [97]:
model_6.summary()

Model: "model_6_Conv1D"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_15 (InputLayer)       [(None, 1)]               0         
                                                                 
 text_vectorization_1 (TextV  (None, 15)               0         
 ectorization)                                                   
                                                                 
 embedding_6 (Embedding)     (None, 15, 128)           128000    
                                                                 
 conv1d (Conv1D)             (None, 11, 32)            20512     
                                                                 
 global_max_pooling1d (Globa  (None, 32)               0         
 lMaxPooling1D)                                                  
                                                                 
 dense_12 (Dense)            (None, 1)              
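
The `conv1d` row of the summary follows from the layer arguments. This is a small sketch assuming the Keras default `padding="valid"` and stride 1, since the cell above sets neither; the helper names are illustrative only.

```python
def conv1d_output_length(seq_len, kernel_size, stride=1):
    # "valid" padding: the kernel must fit entirely inside the sequence
    return (seq_len - kernel_size) // stride + 1

def conv1d_params(in_channels, filters, kernel_size):
    # One kernel of shape (kernel_size, in_channels) per filter, plus one bias each
    return kernel_size * in_channels * filters + filters

out_len = conv1d_output_length(seq_len=15, kernel_size=5)
n_params = conv1d_params(in_channels=128, filters=32, kernel_size=5)
print(out_len, n_params)
```

The sequence shrinks from 15 to 11 positions, and `GlobalMaxPool1D` then keeps only the strongest activation per filter, which gives the 32-wide vector fed to the dense layer.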

In [98]:
# Predictions
model_6_pred_probs = model_6.predict(val_sentences)
model_6_pred_probs[:20]

array([[0.8475301 ],
       [0.71685964],
       [0.91082156],
       [0.03484702],
       [0.27681047],
       [0.99820596],
       [0.7032288 ],
       [0.9130862 ],
       [0.44537032],
       [0.08698517],
       [0.098658  ],
       [0.514401  ],
       [0.03171504],
       [0.13957179],
       [0.00731665],
       [0.0436677 ],
       [0.03011793],
       [0.23537952],
       [0.19284055],
       [0.29781306]], dtype=float32)

In [99]:
model_6_preds = tf.squeeze(tf.round(model_6_pred_probs))
model_6_preds[:20]

<tf.Tensor: shape=(20,), dtype=float32, numpy=
array([1., 1., 1., 0., 0., 1., 1., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
       0., 0., 0.], dtype=float32)>

In [100]:
# Evaluation
model_6_results = evaluate_results(y_true=val_labels,
                                   y_pred=model_6_preds)
model_6_results

{'accuracy': 76.9028871391076,
 'f1': 0.7654561636171319,
 'precision': 0.7739700256882782,
 'recall': 0.7690288713910761}

In [101]:
# Comparing the evaluation
compare_results(model_1_results, model_6_results)

model_1 accuracy: 79.27, New accuracy: 76.90, Difference: -2.36
model_1 precision: 0.81, New precision: 0.77, Difference: -0.04
model_1 recall: 0.79, New recall: 0.77, Difference: -0.02
model_1 f1: 0.79, New f1: 0.77, Difference: -0.02


In [103]:
# Modelling using Pretrained Embeddings (model_7)
import tensorflow_hub as hub
sentence_encoder_layer = hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4",
                                        input_shape=[],
                                        dtype=tf.string,
                                        trainable=False,
                                        name="USE")

model_7 = tf.keras.Sequential([
    sentence_encoder_layer,
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid")
], name="model_7_USE")

model_7.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

model_7_history = model_7.fit(train_sentences,
                              train_labels,
                              epochs=5,
                              validation_data=(val_sentences, val_labels),
                              callbacks=[create_tensorboard_callback(SAVE_DIR,
                                                                     "tf_hub_sentence_encoder")])

Saved Tensorboard logs to: model_logs/tf_hub_sentence_encoder/20220115-053850
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [104]:
model_7.summary()

Model: "model_7_USE"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 USE (KerasLayer)            (None, 512)               256797824 
                                                                 
 dense_13 (Dense)            (None, 64)                32832     
                                                                 
 dense_14 (Dense)            (None, 1)                 65        
                                                                 
Total params: 256,830,721
Trainable params: 32,897
Non-trainable params: 256,797,824
_________________________________________________________________


In [105]:
# Predictions
model_7_pred_probs = model_7.predict(val_sentences)
model_7_pred_probs[:20]

array([[0.14443198],
       [0.7271504 ],
       [0.98566544],
       [0.19740924],
       [0.7341702 ],
       [0.68596613],
       [0.98088884],
       [0.97411025],
       [0.91573226],
       [0.08070084],
       [0.58887357],
       [0.40971822],
       [0.15428743],
       [0.5110037 ],
       [0.18880552],
       [0.02612236],
       [0.38688582],
       [0.56881505],
       [0.33638483],
       [0.27960563]], dtype=float32)

In [106]:
model_7_preds = tf.squeeze(tf.round(model_7_pred_probs))
model_7_preds[:20]

<tf.Tensor: shape=(20,), dtype=float32, numpy=
array([0., 1., 1., 0., 1., 1., 1., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0.,
       1., 0., 0.], dtype=float32)>

In [107]:
# Evaluation
model_7_results = evaluate_results(val_labels, model_7_preds)
model_7_results

{'accuracy': 81.23359580052494,
 'f1': 0.810686575717776,
 'precision': 0.8148798668657973,
 'recall': 0.8123359580052494}

In [108]:
# Comparing the evaluation
compare_results(model_1_results, model_7_results)

model_1 accuracy: 79.27, New accuracy: 81.23, Difference: 1.97
model_1 precision: 0.81, New precision: 0.81, Difference: 0.00
model_1 recall: 0.79, New recall: 0.81, Difference: 0.02
model_1 f1: 0.79, New f1: 0.81, Difference: 0.02


# Comparing model performances 

In [109]:
combined_model_results = pd.DataFrame({"model_1": model_1_results,
                                       "imple_dense": model_2_results,
                                       "lstm": model_3_results,
                                       "gru": model_4_results,
                                       "bidirectional": model_5_results,
                                       "conv1d": model_6_results,
                                       "tf_hub_sentence_encoder": model_7_results})
combined_model_results = combined_model_results.transpose()
combined_model_results

model,accuracy,precision,recall,f1
model_1,79.265092,0.811139,0.792651,0.786219
imple_dense,77.952756,0.784107,0.779528,0.776427
lstm,77.034121,0.772347,0.770341,0.76806
gru,77.034121,0.772347,0.770341,0.76806
bidirectional,76.246719,0.763531,0.762467,0.76051
conv1d,76.902887,0.77397,0.769029,0.765456
tf_hub_sentence_encoder,81.233596,0.81488,0.812336,0.810687
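
To pick a winner from this table, the models can be ranked on a single metric. The sketch below uses plain Python with the F1 values copied from the table above; F1 is chosen here only as an example criterion.

```python
# F1 scores copied from the comparison table (row labels as printed)
f1_scores = {
    "model_1": 0.786219,
    "imple_dense": 0.776427,
    "lstm": 0.76806,
    "gru": 0.76806,
    "bidirectional": 0.76051,
    "conv1d": 0.765456,
    "tf_hub_sentence_encoder": 0.810687,
}

# Highest F1 first; sorted() is stable, so ties keep their table order
ranked = sorted(f1_scores, key=f1_scores.get, reverse=True)
for name in ranked:
    print(f"{name}: {f1_scores[name]:.4f}")
```

With the DataFrame itself, `combined_model_results.sort_values("f1", ascending=False)` gives the same ordering.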


In [117]:
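# Rescale accuracy from a percentage to a 0-1 fraction to match the other metrics.
# Run this cell once only: every re-run divides by 100 again.
combined_model_results["accuracy"] = combined_model_results["accuracy"]/100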
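
Re-running the rescaling cell divides the column again, which is consistent with the accuracies near 7.9e-05 that appear later for the first seven models. One way to guard against that is sketched below. `accuracy_as_fraction` is a hypothetical helper, and it assumes that any value above 1 is still a percentage.

```python
def accuracy_as_fraction(value):
    # Assumption: values above 1 are percentages; values in [0, 1] are already fractions
    return value / 100 if value > 1 else value

percent = 79.265092
once = accuracy_as_fraction(percent)
twice = accuracy_as_fraction(once)  # second call leaves the value unchanged
print(once, twice)
```

The same guard can be written in pandas with a boolean mask on the `accuracy` column, so that only rows still expressed as percentages are divided.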

In [118]:
# TensorBoard logs
!tensorboard dev upload --logdir ./model_logs \
  --name "NLP Disaster Tweet Detection models" \
  --description "All NLP modelling experiments" \
  --one_shot

2022-01-15 06:18:07.090899: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected

New experiment created. View your TensorBoard at: https://tensorboard.dev/experiment/VPqBU2fmQXqAMTmbTwJX4g/

[2022-01-15T06:18:07] Started scanning logdir.
[2022-01-15T06:18:17] Total uploaded: 198 scalars, 0 tensors, 7 binary objects (2.1 MB)
[2022-01-15T06:18:17] Done scanning logdir.


Done. View your TensorBoard at https://tensorboard.dev/experiment/VPqBU2fmQXqAMTmbTwJX4g/


# TensorBoard Log - https://tensorboard.dev/experiment/VPqBU2fmQXqAMTmbTwJX4g/

In [119]:
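# Ensemble: average the predicted probabilities of model_1, model_3 and model_7
# Note: np.max(..., axis=1) returns the probability of whichever class model_1 favors,
# not P(disaster); predict_proba(val_sentences)[:, 1] would select the positive class.
model_1_pred_probs = np.max(model_1.predict_proba(val_sentences), axis=1)
combined_pred_probs = model_1_pred_probs + tf.squeeze(model_3_pred_probs, axis=1) + tf.squeeze(model_7_pred_probs)
combined_preds = tf.round(combined_pred_probs/3)  # mean probability, rounded to a 0/1 label
combined_preds[:20]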

<tf.Tensor: shape=(20,), dtype=float32, numpy=
array([0., 1., 1., 0., 0., 1., 1., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0.,
       1., 0., 0.], dtype=float32)>
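
The sketch below shows soft-voting on made-up numbers and why the choice of column from `predict_proba` matters. It assumes each `predict_proba` row is `[P(class 0), P(class 1)]`; all probabilities are hypothetical and the helper names are illustrative.

```python
def ensemble_predict(prob_lists, threshold=0.5):
    # Average the per-sample probabilities across models, then threshold
    n_models = len(prob_lists)
    mean_probs = [sum(ps) / n_models for ps in zip(*prob_lists)]
    return [1 if p > threshold else 0 for p in mean_probs]

# Hypothetical predict_proba output for two samples: [P(class 0), P(class 1)]
baseline_proba = [[0.9, 0.1], [0.2, 0.8]]
baseline_positive = [row[1] for row in baseline_proba]  # P(disaster)
baseline_max = [max(row) for row in baseline_proba]     # confidence in the favored class

# Hypothetical sigmoid outputs from two neural models
lstm_probs = [0.4, 0.7]
use_probs = [0.3, 0.9]

with_positive = ensemble_predict([baseline_positive, lstm_probs, use_probs])
with_max = ensemble_predict([baseline_max, lstm_probs, use_probs])
print(with_positive, with_max)
```

For the first sample all three models lean toward "not a disaster", but the max-based variant treats the baseline's 0.9 confidence in class 0 as evidence for class 1 and flips the ensemble vote.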

In [120]:
# Evaluation of the ensemble model
ensemble_results = evaluate_results(val_labels, combined_preds)
ensemble_results

{'accuracy': 79.13385826771653,
 'f1': 0.7910936404114438,
 'precision': 0.7910709758531173,
 'recall': 0.7913385826771654}

In [121]:
combined_model_results.loc["ensemble_results"] = ensemble_results

In [122]:
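# Single .loc call with (row, column) avoids chained assignment, which may silently write to a copy
combined_model_results.loc["ensemble_results", "accuracy"] = combined_model_results.loc["ensemble_results", "accuracy"]/100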

In [123]:
combined_model_results

model,accuracy,precision,recall,f1
model_1,7.9e-05,0.811139,0.792651,0.786219
imple_dense,7.8e-05,0.784107,0.779528,0.776427
lstm,7.7e-05,0.772347,0.770341,0.76806
gru,7.7e-05,0.772347,0.770341,0.76806
bidirectional,7.6e-05,0.763531,0.762467,0.76051
conv1d,7.7e-05,0.77397,0.769029,0.765456
tf_hub_sentence_encoder,8.1e-05,0.81488,0.812336,0.810687
ensemble_results,0.791339,0.791071,0.791339,0.791094


In [124]:
# Saving the model (model_7 performed the best)
model_7.save("model_7.h5")

In [125]:
# Loading
loaded_model_7 = tf.keras.models.load_model("model_7.h5",
                                            custom_objects={"KerasLayer": hub.KerasLayer})

In [126]:
# Evaluating the loaded model
loaded_model_7.evaluate(val_sentences, val_labels)



[0.43088313937187195, 0.8123359680175781]

In [127]:
model_7.save("model_7_SavedModel_format")



INFO:tensorflow:Assets written to: model_7_SavedModel_format/assets



In [129]:
loaded_model_7_SavedModel = tf.keras.models.load_model("model_7_SavedModel_format")

In [130]:
loaded_model_7_SavedModel.evaluate(val_sentences, val_labels)



[0.43088313937187195, 0.8123359680175781]
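
A quick way to confirm that both saved formats restore the same model is to compare the `[loss, accuracy]` pairs returned by `evaluate`. The sketch below uses the numbers printed above; `metrics_match` is a hypothetical helper, and the tolerance is an arbitrary choice that allows for float32 rounding.

```python
import math

# [loss, accuracy] as printed by evaluate() for each reloaded model
h5_metrics = [0.43088313937187195, 0.8123359680175781]
savedmodel_metrics = [0.43088313937187195, 0.8123359680175781]

def metrics_match(a, b, rel_tol=1e-6):
    # Hypothetical helper: element-wise comparison with a relative tolerance
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))

formats_agree = metrics_match(h5_metrics, savedmodel_metrics)
# Accuracy reported earlier by evaluate_results, converted from percent
accuracy_agrees = math.isclose(h5_metrics[1], 81.23359580052494 / 100, rel_tol=1e-6)
print(formats_agree, accuracy_agrees)
```

The reloaded accuracy differs from the earlier figure only in the eighth decimal place, which is what float32 versus float64 arithmetic would explain.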

## model_7 reaches 81.23% accuracy on the validation set

In [131]:
# Collect validation texts, labels, predictions and probabilities in one DataFrame
val_df = pd.DataFrame({"text": val_sentences,
                       "target": val_labels,
                       "pred": model_7_preds,
                       "pred_prob": tf.squeeze(model_7_pred_probs)})
val_df.head()

index,text,target,pred,pred_prob
0,DFR EP016 Monthly Meltdown - On Dnbheaven 2015...,0,0.0,0.144432
1,FedEx no longer to transport bioterror germs i...,0,1.0,0.72715
2,Gunmen kill four in El Salvador bus attack: Su...,1,1.0,0.985665
3,@camilacabello97 Internally and externally scr...,1,0.0,0.197409
4,Radiation emergency #preparedness starts with ...,1,1.0,0.73417


In [132]:
# Keep only misclassified rows; sorting by probability puts confident false positives first
# and confident false negatives last
most_wrong = val_df[val_df["target"] != val_df["pred"]].sort_values("pred_prob", ascending=False)
most_wrong[:20]

index,text,target,pred,pred_prob
31,? High Skies - Burning Buildings ? http://t.co...,0,1.0,0.910481
759,FedEx will no longer transport bioterror patho...,0,1.0,0.864676
209,Ashes 2015: AustraliaÛªs collapse at Trent Br...,0,1.0,0.837961
393,@SonofLiberty357 all illuminated by the bright...,0,1.0,0.836361
628,@noah_anyname That's where the concentration c...,0,1.0,0.835225
49,@madonnamking RSPCA site multiple 7 story high...,0,1.0,0.834875
109,[55436] 1950 LIONEL TRAINS SMOKE LOCOMOTIVES W...,0,1.0,0.80089
251,@AshGhebranious civil rights continued in the ...,0,1.0,0.782611
698,åÈMGN-AFRICAå¨ pin:263789F4 åÈ Correction: Ten...,0,1.0,0.782433
144,The Sound of Arson,0,1.0,0.771343
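
Sorting the misclassified rows by probability separates the two error types: high-probability mistakes are false positives and low-probability mistakes are false negatives. The sketch below shows this in plain Python on the five rows printed by `val_df.head()`, identified by index only because the texts are truncated in that output.

```python
# (index, target, pred, pred_prob) copied from val_df.head()
rows = [
    {"index": 0, "target": 0, "pred": 0, "pred_prob": 0.144432},
    {"index": 1, "target": 0, "pred": 1, "pred_prob": 0.72715},
    {"index": 2, "target": 1, "pred": 1, "pred_prob": 0.985665},
    {"index": 3, "target": 1, "pred": 0, "pred_prob": 0.197409},
    {"index": 4, "target": 1, "pred": 1, "pred_prob": 0.73417},
]

# Keep misclassified rows and sort by predicted probability, highest first
wrong = sorted((r for r in rows if r["target"] != r["pred"]),
               key=lambda r: r["pred_prob"], reverse=True)

false_positives = [r["index"] for r in wrong if r["pred"] == 1]
false_negatives = [r["index"] for r in wrong if r["pred"] == 0]
print([r["index"] for r in wrong], false_positives, false_negatives)
```

The FedEx tweet (index 1) is a false positive here, and a near-duplicate of it also ranks second in the `most_wrong` table above.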


In [133]:
# False positives: the most confident ones appear at the top of most_wrong
for row in most_wrong[:20].itertuples():
  _, text, target, pred, prob = row
  print(f"Target: {target}, Pred: {int(pred)}, Prob: {prob}")
  print(f"Text:\n{text}\n")
  print("----\n")

Target: 0, Pred: 1, Prob: 0.9104808568954468
Text:
? High Skies - Burning Buildings ? http://t.co/uVq41i3Kx2 #nowplaying

----

Target: 0, Pred: 1, Prob: 0.8646756410598755
Text:
FedEx will no longer transport bioterror pathogens in wake of anthrax lab mishaps http://t.co/lHpgxc4b8J

----

Target: 0, Pred: 1, Prob: 0.837960958480835
Text:
Ashes 2015: AustraliaÛªs collapse at Trent Bridge among worst in history: England bundled out Australia for 60 ... http://t.co/t5TrhjUAU0

----

Target: 0, Pred: 1, Prob: 0.8363614082336426
Text:
@SonofLiberty357 all illuminated by the brightly burning buildings all around the town!

----

Target: 0, Pred: 1, Prob: 0.8352251052856445
Text:
@noah_anyname That's where the concentration camps and mass murder come in. 
 
EVERY. FUCKING. TIME.

----

Target: 0, Pred: 1, Prob: 0.8348746299743652
Text:
@madonnamking RSPCA site multiple 7 story high rise buildings next to low density character residential in an area that floods

----

Target: 0, Pred: 1, Pro

In [134]:
# False negatives: the most confident ones appear at the bottom of most_wrong
for row in most_wrong[-20:].itertuples():
  _, text, target, pred, prob = row
  print(f"Target: {target}, Pred: {int(pred)}, Prob: {prob}")
  print(f"Text:\n{text}\n")
  print("----\n")

Target: 1, Pred: 0, Prob: 0.08768835663795471
Text:
Petition | Heartless owner that whipped horse until it collapsed is told he can KEEP his animal! Act Now! http://t.co/87eFCBIczM

----

Target: 1, Pred: 0, Prob: 0.08493828773498535
Text:
@reriellechan HE WAS THE LICH KING'S FIRST CASUALTY BLOCK ME BACK I HATE YOU! http://t.co/0Gidg9U45J

----

Target: 1, Pred: 0, Prob: 0.08411180973052979
Text:
If I fall is men GOD @Praiz8 is d bomb well av always known dat since 2008 bigger u I pray sir

----

Target: 1, Pred: 0, Prob: 0.08252623677253723
Text:
Perspectives on the Grateful Dead: Critical Writings (Contributions to the Study http://t.co/fmu0fnuMxf http://t.co/AgGRyhVXKr

----

Target: 1, Pred: 0, Prob: 0.07662928104400635
Text:
@DavidVonderhaar At least you were sincere ??

----

Target: 1, Pred: 0, Prob: 0.07445251941680908
Text:
I Will Survive by Gloria Gaynor (with Oktaviana Devi) ÛÓ https://t.co/HUkJZ1wT36

----

Target: 1, Pred: 0, Prob: 0.0735078752040863
Text:
New post from @

In [135]:
# Predicting on the Test Data
test_sentences = test_df["text"].to_list()
test_samples = random.sample(test_sentences, 20)
for test_sample in test_samples:
  pred_prob = tf.squeeze(model_7.predict([test_sample]))
  pred = tf.round(pred_prob)
  print(f"Pred: {int(pred)}, Prob: {pred_prob}")
  print(f"Text:\n{test_sample}\n")
  print("----\n")

Pred: 1, Prob: 0.6143367886543274
Text:
The Murderous Story Of AmericaÛªs First Hijacking http://t.co/QAOqtptgwH

----

Pred: 0, Prob: 0.11304044723510742
Text:
@GraysonDolan I'll fall and drown so I think I'll pass

----

Pred: 0, Prob: 0.21802997589111328
Text:
Trump &amp; Bill Clinton collide in best conspiracy story ever http://t.co/ABkhBhNLOz via @motherjones TRUMP DEMOCRATIC PLANT?  lmao #lastword

----

Pred: 0, Prob: 0.11775004863739014
Text:
@CurfewBeagle @beaglefreedom Pretty Curfew!!!??

----

Pred: 0, Prob: 0.16136398911476135
Text:
WHELEN MODEL 295SS-100 SIREN AMPLIFIER POLICE EMERGENCY VEHICLE - Full read by eBay http://t.co/Q3yYQi4A27 http://t.co/whEreofYAx

----

Pred: 0, Prob: 0.07658854126930237
Text:
Politicians are using false allegations to attack #PlannedParenthood &amp; harm women. We aren't fooled we #StandwithPP http://t.co/JhseGQLbYq

----

Pred: 0, Prob: 0.17691081762313843
Text:
End the Innovation Catch-22: Reduce the Attack Surface http://t.co/Gj4SSEhk1D #

In [136]:
# Predicting random tweets
random_tweet_1 = "Abundance isn’t created, it is always present. Break limitations for receptivity."

In [138]:
def predict_tweets(model, sentence):
  """Print the predicted label and probability for a single tweet."""
  pred_prob = model.predict([sentence])
  pred_label = tf.squeeze(tf.round(pred_prob)).numpy()
  print(f"Pred: {pred_label}", "(real disaster)" if pred_label > 0 else "(not real disaster)", f"Prob: {pred_prob[0][0]}")
  print(f"Text:\n{sentence}")
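
The labelling logic in `predict_tweets` can be kept apart from the model call, which makes it testable without loading the encoder. `describe_prediction` below is a hypothetical helper, not part of the notebook; the two probabilities are taken from the outputs that follow.

```python
def describe_prediction(prob, threshold=0.5):
    # Hypothetical helper: turn a probability into the label text used by predict_tweets
    label = 1 if prob > threshold else 0
    tag = "(real disaster)" if label == 1 else "(not real disaster)"
    return f"Pred: {label} {tag} Prob: {prob}"

earthquake_line = describe_prediction(0.9738025665283203)
quote_line = describe_prediction(0.08701261878013611)
print(earthquake_line)
print(quote_line)
```

A function that returns the string instead of printing it can also be reused when writing predictions to a file or a DataFrame column.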

In [139]:
predict_tweets(model=model_7,
               sentence=random_tweet_1)

Pred: 0.0 (not real disaster) Prob: 0.08701261878013611
Text:
Abundance isn’t created, it is always present. Break limitations for receptivity.


In [140]:
# Link - https://twitter.com/naval/status/1478322009654259713
navals_tweet_2 = "The enduring legacy of a college degree is that recurring nightmare about missing all of your classes."

In [141]:
predict_tweets(model=model_7,
               sentence=navals_tweet_2)

Pred: 0.0 (not real disaster) Prob: 0.11755836009979248
Text:
The enduring legacy of a college degree is that recurring nightmare about missing all of your classes.


In [142]:
# Link - https://twitter.com/jeogaste/status/1480550129501446146
random_tweet_3 = "The 6.6 magnitude earthquake that occurred 3 days ago in China bent the high-speed train tracks. This dextral faulting, was also seen in the 7.4 magnitude earthquake that occurred in Turkey in 1999. look at the enormous difference between the two earthquakes"


In [144]:
predict_tweets(model=model_7,
               sentence=random_tweet_3)

Pred: 1.0 (real disaster) Prob: 0.9738025665283203
Text:
The 6.6 magnitude earthquake that occurred 3 days ago in China bent the high-speed train tracks. This dextral faulting, was also seen in the 7.4 magnitude earthquake that occurred in Turkey in 1999. look at the enormous difference between the two earthquakes


In [145]:
# Link - https://twitter.com/washingtonpost/status/1482012689752862725
random_tweet_4 = "Doctors call out Spotify over Joe Rogan spreading “false and societally harmful” covid-19 claims"

In [146]:
predict_tweets(model=model_7,
               sentence=random_tweet_4)

Pred: 0.0 (not real disaster) Prob: 0.06785577535629272
Text:
Doctors call out Spotify over Joe Rogan spreading “false and societally harmful” covid-19 claims
