# Introduction - Embeddings from Language Models (ELMo)
***
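ELMo (Embeddings from Language Models) produces contextual word representations that, when introduced, achieved state-of-the-art results on several NLP tasks.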

ELMo vectors are computed on top of a two-layer bidirectional language model (biLM). The two layers are stacked, and each layer makes two passes over the input: a forward pass and a backward pass.

The steps are as follows:
* use a character-level CNN to convert the words of a text string into raw word vectors
* these raw word vectors act as inputs to the first layer of the biLM
* the forward pass contains information about a word and the context before that word
* the backward pass contains information about the word and the context after that word
* the outputs of the forward and backward passes are combined to form the intermediate word vectors
* these intermediate word vectors are fed into the next layer of the biLM
* the final (ELMo) representation is the weighted sum of the raw word vectors and the 2 intermediate word vectors
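
The last step can be sketched in a few lines of NumPy. In the ELMo paper the weighted sum is written as a scale factor times a softmax-weighted sum over the layers; the sketch below follows that form, but uses random arrays in place of real biLM outputs. The function name `elmo_combine` and the toy shapes are assumptions made for illustration only.

```python
import numpy as np

def elmo_combine(layers, scalar_weights, gamma=1.0):
    """Collapse per-layer token vectors into one vector per token.

    layers: array of shape (num_layers, num_tokens, dim)
    scalar_weights: unnormalized weights, one per layer
    gamma: overall scale factor
    """
    layers = np.asarray(layers, dtype=float)
    w = np.asarray(scalar_weights, dtype=float)
    # softmax so the layer weights are positive and sum to 1
    s = np.exp(w - w.max())
    s /= s.sum()
    # weighted sum over the layer axis -> (num_tokens, dim)
    return gamma * np.tensordot(s, layers, axes=1)

# toy stand-in for: raw word vectors + 2 intermediate layers, 8 tokens, 4 dims
rng = np.random.default_rng(0)
toy_layers = rng.normal(size=(3, 8, 4))
combined = elmo_combine(toy_layers, scalar_weights=[0.0, 0.0, 0.0])
print(combined.shape)  # (8, 4)
```

With equal weights, as here, the result is simply the average of the three layers; a downstream task can learn different weights to emphasize one layer over another.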

Because the biLM input is built from characters rather than whole words, it can differentiate between words like **beauty** and **beautiful**, while still capturing that they are related at some level.

ELMo differs from traditional word embeddings (word2vec and GloVe) in that the ELMo vector assigned to a token is a function of the whole sentence, and therefore of the context containing the word. Because of this, ELMo can differentiate between the meanings of the word **read** in the following two sentences:
1. I **read** the book yesterday
2. Can you **read** the letter now?

The **read** in the first sentence is in the past tense, whereas in the second sentence it is in the present tense. Traditional word embeddings would assign the same vector to **read** in both sentences, whereas the two vectors should be similar but slightly different, since they carry different tense information.

# Problem Statement
***
Perform sentiment analysis on a series of tweets from customers of various tech firms that manufacture and sell hardware. We want to identify whether each tweet expresses negative sentiment towards a company or not (binary classification).

In [1]:
import pandas as pd
import numpy as np
import spacy
import re
import pickle

In [2]:
train = pd.read_csv("data/tech_company_sentiment_tweet_data/train.csv")
test = pd.read_csv("data/tech_company_sentiment_tweet_data/test.csv")

print(train.shape, test.shape)

(7920, 3) (1953, 2)


In [3]:
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7920 entries, 0 to 7919
Data columns (total 3 columns):
id       7920 non-null int64
label    7920 non-null int64
tweet    7920 non-null object
dtypes: int64(2), object(1)
memory usage: 185.7+ KB


Let's check the class distribution in the training set (1 is negative and 0 is non-negative):

In [4]:
train.label.value_counts(normalize=True)

0    0.744192
1    0.255808
Name: label, dtype: float64

In [5]:
train[train.label==0].head()

Unnamed: 0,id,label,tweet
0,1,0,#fingerprint #Pregnancy Test https://goo.gl/h1...
1,2,0,Finally a transparant silicon case ^^ Thanks t...
2,3,0,We love this! Would you go? #talk #makememorie...
3,4,0,I'm wired I know I'm George I was made that wa...
6,7,0,Happy for us .. #instapic #instadaily #us #son...


# Text Cleaning and Preprocessing
***

In [6]:
import string
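# escape the punctuation characters so that ']', '\' and '^' are treated literally inside the character class
punctuation_pattern = "[" + re.escape(string.punctuation) + "]"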
nlp = spacy.load("en", disable=['parser','ner'])

def clean_tweet( tweet ):
    # remove URL links
    tweet = re.sub(r"http\S+", '', tweet)
    # remove punctuation
    tweet = re.sub(punctuation_pattern, '', tweet)
    # convert to lower case
    tweet = tweet.lower()
    # remove numbers
    tweet = re.sub(r"[0-9]", ' ', tweet)
    # remove whitespaces
    tweet = " ".join(tweet.split())
    # use spacy to lemmatize
    root_words = [token.lemma_ for token in nlp(tweet)]
    tweet = " ".join(root_words)
    return tweet
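
To see what the regex steps do before lemmatization, here is a small sketch that applies them to a made-up tweet. It deliberately leaves out spaCy so it runs with the standard library only; the helper name `basic_clean` and the sample tweet are assumptions for illustration, not part of the dataset.

```python
import re
import string

# escape punctuation so characters such as ']' and '\' are matched literally
PUNCTUATION = "[" + re.escape(string.punctuation) + "]"

def basic_clean(tweet):
    tweet = re.sub(r"http\S+", "", tweet)      # drop URLs
    tweet = re.sub(PUNCTUATION, "", tweet)     # drop punctuation, including '#'
    tweet = tweet.lower()
    tweet = re.sub(r"[0-9]", " ", tweet)       # digits become spaces
    return " ".join(tweet.split())             # collapse repeated whitespace

sample = "Loving my new #Phone7!! Details: https://example.com/x1 :)"
print(basic_clean(sample))  # loving my new phone details
```

Note that the URL is removed before punctuation is stripped; in the opposite order the `://` and dots would disappear first and the `http\S+` pattern would no longer match the whole link.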

In [7]:
%time train["clean_tweet"] = train["tweet"].apply(clean_tweet)
%time test["clean_tweet"] = test["tweet"].apply(clean_tweet)

CPU times: user 1min 28s, sys: 2min 9s, total: 3min 38s
Wall time: 28.6 s
CPU times: user 23 s, sys: 33.1 s, total: 56.1 s
Wall time: 7.29 s


Let's look at how the cleaned tweets appear compared to the pre-cleaned tweets:

In [8]:
train.head()

Unnamed: 0,id,label,tweet,clean_tweet
0,1,0,#fingerprint #Pregnancy Test https://goo.gl/h1...,fingerprint pregnancy test android app beautif...
1,2,0,Finally a transparant silicon case ^^ Thanks t...,finally a transparant silicon case thank to -P...
2,3,0,We love this! Would you go? #talk #makememorie...,-PRON- love this would -PRON- go talk makememo...
3,4,0,I'm wired I know I'm George I was made that wa...,-PRON- be wired i know -PRON- be george i be m...
4,5,1,What amazing service! Apple won't even talk to...,what amazing service apple will not even talk ...


# Download ELMo Pre-Trained Vectors from TensorFlow Hub
***
TensorFlow Hub is a repository and library of pre-trained machine learning models that can be reused for transfer learning.

In [9]:
import tensorflow_hub as hub
import tensorflow as tf

Now load the module:

In [10]:
%time elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)

W0822 09:45:42.967687 140392815863616 deprecation.py:323] From /home/joseph/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py:3632: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.


Here is a quick example of how to get the ELMo vectors for a single sentence:

In [11]:
example_sentence = ["Roasted ants are a popular snack in Colombia"]
embeddings = elmo(example_sentence, signature="default", as_dict=True)["elmo"]
print(embeddings.shape)
print(embeddings)

(1, 8, 1024)
Tensor("module_apply_default/aggregation/mul_3:0", shape=(1, 8, 1024), dtype=float32)


* The first dimension of this output is the number of input strings in the batch (1 here).
* The second dimension is the number of tokens in the longest string in the input list. Because we only have 1 string in the input list, it is equal to the number of tokens in our example sentence (8).
* The third dimension is the length of each ELMo vector (1024).

Next we need ELMo vectors for the cleaned tweets in both the train and test sets. To get a single vector representation of an entire tweet, we take the component-wise average of the ELMo vectors of its tokens.

In [12]:
def elmo_vectors( tweets ):
    embeddings = elmo(tweets.tolist(), signature="default", as_dict=True)["elmo"]
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.tables_initializer())
        # average along axis 1
        return sess.run(tf.reduce_mean(embeddings,1))
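
One detail worth knowing: `tf.reduce_mean(embeddings, 1)` averages over every position up to the longest tweet in the batch, so for shorter tweets the padded positions also enter the average. The NumPy sketch below shows a length-aware mean on a toy array for comparison. The function `masked_mean` and the toy values are hypothetical and not part of the notebook.

```python
import numpy as np

def masked_mean(embeddings, lengths):
    """Average token vectors per sentence, ignoring padded positions.

    embeddings: (batch, max_len, dim) array, padded along axis 1
    lengths: number of real tokens in each sentence
    """
    embeddings = np.asarray(embeddings, dtype=float)
    lengths = np.asarray(lengths)
    positions = np.arange(embeddings.shape[1])
    # mask[i, t] is True where position t holds a real token of sentence i
    mask = positions[None, :] < lengths[:, None]
    summed = (embeddings * mask[:, :, None]).sum(axis=1)
    return summed / lengths[:, None]

toy = np.zeros((2, 3, 2))
toy[0] = [[1, 1], [3, 3], [5, 5]]   # three real tokens
toy[1, :2] = [[2, 2], [4, 4]]       # two real tokens, one padded slot
print(masked_mean(toy, [3, 2]))     # rows: [3. 3.] and [3. 3.]
print(toy.mean(axis=1))             # plain mean pulls the second row down to [2. 2.]
```

According to the module's documentation, its output dictionary also offers a `default` entry with a mean-pooled vector per input string, which may be a convenient alternative; that is worth verifying against the version you load.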

Split the data into batches of 100 tweets so that computing the ELMo vectors does not exhaust memory:

In [13]:
batch_size = 100
list_train = [train[i:i+batch_size] for i in range(0,train.shape[0], batch_size)]
list_test = [test[i:i+batch_size] for i in range(0,test.shape[0], batch_size)]

In [14]:
elmo_train = [elmo_vectors(x["clean_tweet"]) for x in list_train]
elmo_test = [elmo_vectors(x["clean_tweet"]) for x in list_test]

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node module_apply_default_1/bilm/CNN_2/Conv2D_6 (defined at /home/joseph/miniconda3/lib/python3.6/site-packages/tensorflow_hub/native_module.py:561) ]]
	 [[node Mean (defined at <ipython-input-12-f9746ba7e73b>:7) ]]


In [None]:
elmo_train_new = np.concatenate(elmo_train, axis=0)
elmo_test_new = np.concatenate(elmo_test, axis=0)

Now we can save these arrays to pickle files:

In [None]:
# write each array with a context manager so the file is closed even on error
with open("elmo_train.pickle", "wb") as pickle_out:
    pickle.dump(elmo_train_new, pickle_out)

with open("elmo_test.pickle", "wb") as pickle_out:
    pickle.dump(elmo_test_new, pickle_out)

To load these files back, we use:

In [None]:
# read the arrays back, closing each file once it has been loaded
with open("elmo_train.pickle", "rb") as pickle_in:
    elmo_train_new = pickle.load(pickle_in)

with open("elmo_test.pickle", "rb") as pickle_in:
    elmo_test_new = pickle.load(pickle_in)

# Building a Model
***
Now we can build a classification model on top of the ELMo vectors. First, we split our training set into a training set and a validation set, so that we can evaluate the model on held-out data and compare hyperparameter choices.

In [None]:
from sklearn.model_selection import train_test_split

# the labels come from the original training frame; y_train is only defined by this split
x_train, x_valid, y_train, y_valid = train_test_split(elmo_train_new, train["label"], random_state=2019, test_size=0.2)

Now we can fit a simple logistic regression model using the ELMo vectors as features.

In [None]:
from sklearn.linear_model import LogisticRegression


lreg = LogisticRegression()
lreg.fit(x_train, y_train)

In [None]:
validation_predictions = lreg.predict(x_valid)

# Evaluating the Model
***
We evaluate the model using the F1 score.
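
F1 is a sensible choice here because the classes are imbalanced: with about 74% non-negative tweets in the training set, a model that always predicts 0 would reach roughly 74% accuracy without finding a single negative tweet. The sketch below computes F1 by hand on invented toy labels to make that concrete. The helper `f1_binary` is written for illustration; it mirrors what `f1_score` computes by default for binary labels with 1 as the positive class.

```python
def f1_binary(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        # no true positives means precision and recall are both zero
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# toy labels, not taken from the tweet data
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
always_zero = [0] * len(y_true)
some_model = [0, 0, 1, 1, 1, 0, 0, 0]

print(f1_binary(y_true, always_zero))  # 0.0, despite 62.5% accuracy
print(f1_binary(y_true, some_model))   # precision 2/3 and recall 2/3 give F1 of 2/3
```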

In [None]:
from sklearn.metrics import f1_score