## ELMo

**ELMo** (Embeddings from Language Models) is a word representation method that maps each word to a vector. The vectors are computed as a function of the internal states of a deep bidirectional language model that is pre-trained on a large text corpus.

**ELMo** is described in the following paper:
https://arxiv.org/abs/1802.05365

In the notebook below we use **ELMo** word vectors to classify the sentiment of tweets.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import nltk

%matplotlib inline

### Load and preprocess tweets

In [2]:
from nltk.corpus import twitter_samples 
nltk.download('twitter_samples')
nltk.download('stopwords')

[nltk_data] Downloading package twitter_samples to
[nltk_data]     /Users/jamieott/nltk_data...
[nltk_data]   Package twitter_samples is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/jamieott/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [3]:
all_positive_tweets = twitter_samples.strings('positive_tweets.json')
all_negative_tweets = twitter_samples.strings('negative_tweets.json')

In [4]:
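# Train/test split: the first 4000 tweets of each class are used for training, the rest for testing
test_pos = all_positive_tweets[4000:]
train_pos = all_positive_tweets[:4000]
test_neg = all_negative_tweets[4000:]
train_neg = all_negative_tweets[:4000]

# positive tweets come first, then negative tweets (the labels below follow the same order)
train_x = np.array(train_pos + train_neg)
test_x = np.array(test_pos + test_neg)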

In [5]:
# combine positive and negative labels
train_y = np.append(np.ones((len(train_pos), 1)), np.zeros((len(train_neg), 1)), axis=0)
test_y = np.append(np.ones((len(test_pos), 1)), np.zeros((len(test_neg), 1)), axis=0)

In [6]:
print("train_y.shape = " + str(train_y.shape))
print("test_y.shape = " + str(test_y.shape))

train_y.shape = (8000, 1)
test_y.shape = (2000, 1)


In [7]:
print(train_pos[0])

#FollowFriday @France_Inte @PKuchly57 @Milipol_Paris for being top engaged members in my community this week :)


In [8]:
## Process tweet
import string
import re
from nltk.tokenize import TweetTokenizer
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# Build the tokenizer, stopword list and stemmer once and reuse them for every tweet
tweet_tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)
stopwords_english = stopwords.words('english')
stemmer = PorterStemmer()

def process_tweet(tweet):
    """Clean, tokenize and stem a tweet; return the tokens joined by spaces."""
    # remove stock market tickers like $GE
    tweet = re.sub(r'\$\w*', '', tweet)
    # remove old style retweet text "RT"
    tweet = re.sub(r'^RT[\s]+', '', tweet)
    # remove hyperlinks (only the URL itself, not the text that follows it)
    tweet = re.sub(r'https?://\S+', '', tweet)
    # remove only the hash sign (#) and keep the hashtag word
    tweet = re.sub(r'#', '', tweet)
    # tokenize: lowercase, strip @handles, shorten repeated characters
    tweet_tokens = tweet_tokenizer.tokenize(tweet)
    tweets_clean = []
    for word in tweet_tokens:
        if (word not in stopwords_english and  # remove stopwords
                word not in string.punctuation):  # remove punctuation
            tweets_clean.append(stemmer.stem(word))  # stem the word
    return ' '.join(tweets_clean)



In [9]:
process_tweet(train_pos[0])

'followfriday top engag member commun week :)'

### Load ELMo

In [10]:
# Load the ELMo model from TensorFlow Hub
import tensorflow_hub as hub
import tensorflow as tf
elmo = hub.load("https://tfhub.dev/google/elmo/3")

In the example below, **ELMo** gives different embeddings for the word "fall" depending on its context. In the first sentence "fall" is a season; in the second it is a verb (to fall). The cosine similarities show that "fall" as a season is closer to "winter" than to the verb "fall".

In [11]:
embeddings = elmo.signatures["default"](tf.constant([
                "I love cool crisp fall weather",
                "Dont fall on your way to the gym",
                "winter",
                "slip"
                ])
                )["elmo"]

In [12]:
embeddings_arr = embeddings.numpy()

In [13]:
embeddings_arr.shape

(4, 8, 1024)

In [14]:
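# The default signature splits sentences on whitespace, so the second index is the 0-based word position
fall1 = embeddings_arr[0][4]   # "fall" in "I love cool crisp fall weather"
# "fall" is word 1 in "Dont fall on your way to the gym"; index 2 would select "on".
# The similarity values recorded below were produced with index 2, so re-run the cells to refresh them.
fall2 = embeddings_arr[1][1]
winter = embeddings_arr[2][0]
slip = embeddings_arr[3][0]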

In [15]:
## Now let's look at the cosine similarity between these embeddings
def cosine_similarity(A, B):
    """Cosine of the angle between vectors A and B: dot(A, B) / (|A| * |B|)."""
    dot = np.dot(A, B)
    norma = np.linalg.norm(A)
    normb = np.linalg.norm(B)
    cos = dot / (norma * normb)
    return cos

print(f"Cosine similarity between fall (weather) and fall (verb) {cosine_similarity(fall1, fall2) :.3}")
print(f"Cosine similarity between fall (weather) and winter {cosine_similarity(fall1, winter) :.3}")
print(f"Cosine similarity between fall (verb) and slip {cosine_similarity(fall2, slip) :.3}")

Cosine similarity between fall (weather) and fall (verb) 0.156
Cosine similarity between fall (weather) and winter 0.459
Cosine similarity between fall (verb) and slip 0.187
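
To help read the numbers above, here is a small standard-library sketch of the same cosine formula on toy 2-D vectors. The helper name cosine and the toy vectors are illustrative assumptions, not part of the notebook; the point is only that the measure depends on direction, not on vector length.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical, chosen only for illustration)
same_direction = cosine([1.0, 2.0], [2.0, 4.0])    # parallel vectors: close to 1
orthogonal = cosine([1.0, 0.0], [0.0, 3.0])        # perpendicular vectors: 0
opposite = cosine([1.0, 2.0], [-1.0, -2.0])        # opposite vectors: close to -1

print(same_direction, orthogonal, opposite)
```

Values near 1 mean the two embeddings point in almost the same direction, values near 0 mean they are nearly unrelated in direction.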


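Let's prepare **ELMo** vectors for logistic regression. Each tweet is represented by the mean of its token embeddings. Note that the raw tweets in train_x and test_x are embedded here; the output of process_tweet is not used.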

In [16]:
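def create_embeddings(tweets):
    # per-token embeddings with shape (batch, max_tokens, 1024)
    embeddings = elmo.signatures["default"](tf.convert_to_tensor(tweets))["elmo"]
    # average over the token axis (every position up to max_tokens is included) -> (batch, 1024)
    return tf.reduce_mean(embeddings, 1)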
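
A plain mean over the token axis divides by the padded length of the batch, not by the number of words in each tweet. The pure-Python sketch below illustrates the difference on a toy example; it assumes the padded positions hold zero vectors and that the true length is known, and both function names are hypothetical. It is not a claim about how the ELMo module pads its output.

```python
def mean_pool(token_vectors):
    """Average over every position, padding included."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[d] for vec in token_vectors) / n for d in range(dim)]

def masked_mean_pool(token_vectors, length):
    """Average only the first `length` positions, i.e. the real tokens."""
    real = token_vectors[:length]
    dim = len(real[0])
    return [sum(vec[d] for vec in real) / length for d in range(dim)]

# Toy "sentence": 2 real tokens padded to 4 positions with zero vectors (assumption)
padded = [[2.0, 4.0], [4.0, 8.0], [0.0, 0.0], [0.0, 0.0]]

plain = mean_pool(padded)               # [1.5, 3.0]
masked = masked_mean_pool(padded, 2)    # [3.0, 6.0]
print(plain, masked)
```

Under these assumptions the plain mean shrinks short sentences toward zero, while the masked mean does not. Whether that matters for the classifier here is an empirical question.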

In [17]:
# Split the tweets into batches of 100 so that each call to the model stays small
list_train_x = [train_x[i:i+100] for i in range(0, train_x.shape[0], 100)]
list_test_x = [test_x[i:i+100] for i in range(0, test_x.shape[0], 100)]

In [18]:
elmo_train_x = [create_embeddings(x) for x in list_train_x]
elmo_test_x  = [create_embeddings(x) for x in list_test_x]

In [19]:
elmo_train_x = np.concatenate(elmo_train_x, axis = 0)
elmo_test_x = np.concatenate(elmo_test_x, axis = 0)

Now let's standardize the features and fit a logistic regression model.

In [20]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(elmo_train_x)
elmo_train_std_x = sc.transform(elmo_train_x)
elmo_test_std_x  = sc.transform(elmo_test_x)

In [21]:
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(max_iter=10000)
lr.fit(elmo_train_std_x, train_y.reshape(train_y.shape[0]))

LogisticRegression(max_iter=10000)

In [22]:
# score() returns mean accuracy; ravel() flattens the (n, 1) label arrays to 1-D
print(f"Score of train set {lr.score(elmo_train_std_x, train_y.ravel()):.3}")
print(f"Score in test set {lr.score(elmo_test_std_x, test_y.ravel()):.3}")

Score of train set 1.0
Score in test set 0.951


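The model fits the training set perfectly (accuracy 1.0), which points to some overfitting, but it still generalizes well: accuracy on the test set is about 95% for sentiment classification. Note that the score of a classifier is mean accuracy, not R2.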
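
For reference, the score of a scikit-learn classifier is the mean accuracy, that is, the fraction of samples whose predicted label matches the true label. A minimal sketch of that computation follows; the function name accuracy and the toy labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Toy labels (hypothetical): 4 of the 5 predictions are correct
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1]

acc = accuracy(y_true, y_pred)
print(f"accuracy = {acc:.3}")   # accuracy = 0.8
```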