### On the 11th of July, 2020, I started working on an NLP competition on Kaggle. This is part of my top-down approach to learning Natural Language Processing.

**In this competition, one is challenged to build a machine learning model that predicts which Tweets are about real disasters and which ones aren’t.**

*Often, I download the datasets to my local machine, look for a notebook in the Kaggle Notebooks section, and use it as a guide. I read each cell and research why the author wrote the code in it.*

Link to the Kaggle Notebook: https://www.kaggle.com/gunesevitan/nlp-with-disaster-tweets-eda-cleaning-and-bert

**Getting Started**

*import modules*

In [None]:
import gc  # interface to the optional garbage collector; garbage collection is
# the process by which Python periodically frees and reclaims blocks of memory
# that are no longer in use
import re  # regular expression matching operations
import string  # common string constants (e.g. string.punctuation) and helpers
import operator
from collections import defaultdict

import numpy as np
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

import matplotlib.pyplot as plt
import seaborn as sns

import tokenization
from wordcloud import STOPWORDS

from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit
from sklearn.metrics import precision_score, recall_score, f1_score

import tensorflow as tf
import tensorflow_hub as hub
from tensorflow import keras
from tensorflow.keras.optimizers import SGD, Adam
from tensorflow.keras.layers import Dense, Input, Dropout, GlobalAveragePooling1D
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, Callback

SEED = 1337
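
*A minimal sketch of what the `gc` module does, to go with the comment on the import above. The `Node` class and `make_cycle` function are hypothetical names made up for this illustration; they are not part of the guide notebook.*

```python
import gc


class Node:
    """Tiny object used only to build a reference cycle (hypothetical)."""

    def __init__(self):
        self.partner = None


def make_cycle():
    # a and b reference each other, so reference counting alone
    # cannot free them once this function returns
    a, b = Node(), Node()
    a.partner, b.partner = b, a


gc.disable()          # pause automatic collection so the demo is predictable
gc.collect()          # clear any garbage that already exists
make_cycle()          # the cycle becomes unreachable after the call
freed = gc.collect()  # force a full collection; returns the number of unreachable objects found
gc.enable()           # restore automatic collection

print(f"Unreachable objects collected: {freed}")
```

*The exact count depends on the Python version, but it should be at least two (the two `Node` objects). In a notebook like this one, calling `gc.collect()` after deleting large objects is a common way to ask Python to release their memory sooner.*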

In [2]:
!pip install tokenization

Collecting tokenization
  Downloading tokenization-1.0.7-py3-none-any.whl (10 kB)
Collecting regex
  Downloading regex-2020.6.8.tar.gz (690 kB)
Building wheels for collected packages: regex
  Building wheel for regex (setup.py) ... done
  Created wheel for regex: filename=regex-2020.6.8-cp37-cp37m-macosx_10_9_x86_64.whl size=273542 sha256=0fbe0031d1deb3d186a581b6f3964cc8441b7df2bd752d3186a8f2fad18c25c9
  Stored in directory: /Users/peterokwukogu/Library/Caches/pip/wheels/46/f1/0b/a372e98f7103934a3573301c71b475143baf8ba6f6dffc876c
Successfully built regex
Installing collected packages: regex, tokenization
Successfully installed regex-2020.6.8 tokenization-1.0.7


In [4]:
!pip install wordcloud

Collecting wordcloud
  Downloading wordcloud-1.7.0-cp37-cp37m-macosx_10_6_x86_64.whl (160 kB)
Installing collected packages: wordcloud
Successfully installed wordcloud-1.7.0


In [6]:
!pip install tensorflow

Collecting tensorflow
  Downloading tensorflow-2.2.0-cp37-cp37m-macosx_10_11_x86_64.whl (175.3 MB)
Collecting h5py<2.11.0,>=2.10.0
  Downloading h5py-2.10.0-cp37-cp37m-macosx_10_6_intel.whl (3.0 MB)
Collecting astunparse==1.6.3
  Downloading astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting keras-preprocessing>=1.1.0
  Downloading Keras_Preprocessing-1.1.2-py2.py3-none-any.whl (42 kB)
Collecting google-pasta>=0.1.8
  Downl