In traditional neural language models, each token in the first input layer (in this case, "The cat is happy") is converted into a fixed-length word embedding before being passed into the recurrent unit. This is done either by initializing a word embedding matrix of size (vocabulary size) x (word embedding dimension) or by using a pretrained embedding such as GloVe for each token.<p>
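A minimal sketch of the lookup-table approach, assuming a toy four-word vocabulary and a randomly initialised matrix (no pretrained GloVe vectors are loaded here). Each token id selects one row, so every token gets the same fixed-length vector regardless of its context:

```python
import numpy as np

# Toy vocabulary built from the example sentence "The cat is happy".
vocab = {"the": 0, "cat": 1, "is": 2, "happy": 3}
vocab_size = len(vocab)
embedding_dim = 8  # illustrative only; real models typically use hundreds of dimensions

rng = np.random.default_rng(0)
# Embedding matrix of shape (vocabulary size) x (word embedding dimension),
# randomly initialised here; a pretrained table such as GloVe could replace it.
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

tokens = "The cat is happy".lower().split()
ids = [vocab[t] for t in tokens]

# Each token becomes one fixed-length row of the matrix.
token_vectors = embedding_matrix[ids]
print(token_vectors.shape)  # (4, 8)
```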
    
In ELMo, we first convert each token to an appropriate representation using character embeddings. This character embedding representation is then run through a convolutional layer with some number of filters, followed by a max-pool layer. Finally, this representation is passed through a 2-layer highway network before being provided as input to the LSTM layer.
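The pipeline above (character embeddings, convolution, max-pool, 2-layer highway network) can be sketched in NumPy. This is an illustration under stated assumptions: the sizes, the single filter width, the ReLU activations and the random weights are all hypothetical choices, not ELMo's actual configuration, and only lowercase ASCII letters are handled:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; they are not ELMo's real hyperparameters.
char_vocab = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}
char_dim, num_filters, width = 4, 6, 3

char_emb = rng.normal(size=(len(char_vocab), char_dim))   # character embedding table
conv_w = rng.normal(size=(num_filters, width, char_dim))  # one kernel per filter
conv_b = np.zeros(num_filters)

def char_cnn(word):
    """Character embeddings -> 1-D convolution -> max-pool over positions."""
    x = char_emb[[char_vocab[c] for c in word]]           # (len(word), char_dim)
    if len(x) < width:                                    # pad very short words with zeros
        x = np.vstack([x, np.zeros((width - len(x), char_dim))])
    windows = np.stack([x[i:i + width] for i in range(len(x) - width + 1)])
    feature_maps = np.maximum(0, np.einsum("nwc,fwc->nf", windows, conv_w) + conv_b)
    return feature_maps.max(axis=0)                       # (num_filters,)

def highway(x, W_h, b_h, W_t, b_t):
    """One highway layer: a gate t mixes a transformed x with x itself."""
    t = 1.0 / (1.0 + np.exp(-(x @ W_t + b_t)))            # transform gate in (0, 1)
    h = np.maximum(0, x @ W_h + b_h)                      # candidate transformation
    return t * h + (1.0 - t) * x                          # carry the rest of x unchanged

# Two highway layers; a negative gate bias is a common initialisation choice.
highway_params = [
    (rng.normal(size=(num_filters, num_filters)), np.zeros(num_filters),
     rng.normal(size=(num_filters, num_filters)), np.full(num_filters, -1.0))
    for _ in range(2)
]

def token_representation(word):
    x = char_cnn(word)
    for params in highway_params:
        x = highway(x, *params)
    return x  # in ELMo, this kind of vector is what the LSTM layer receives

vec = token_representation("happy")
print(vec.shape)  # (6,)
```

Because the representation is built from characters, any lowercase word gets a vector, even one that never appeared in a vocabulary list.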

In [1]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

data=pd.read_csv(r"../input/bbc-text.csv")
df2 = data.copy()

### Removing Stopwords

In [2]:
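import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')  # fetch the stop-word list if it is not installed yet
stop = set(stopwords.words('english'))  # a set makes membership tests fast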

In [3]:
df2['text'] = df2['text'].apply(lambda text: " ".join(w for w in text.split() if w not in stop))
df2.head()

        category                                               text
0           tech  tv future hands viewers home theatre systems p...
1       business  worldcom boss left books alone former worldcom...
2          sport  tigers wary farrell gamble leicester say rushe...
3          sport  yeading face newcastle fa cup premiership side...
4  entertainment  ocean twelve raids box office ocean twelve cri...


### Removing Infrequent words

In [4]:
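# The 10 least frequent words in the corpus (value_counts sorts in descending order)
freq = pd.Series(' '.join(df2['text']).split()).value_counts().tail(10)
rare = set(freq.index)
df2['text'] = df2['text'].apply(lambda text: " ".join(w for w in text.split() if w not in rare))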
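To see both cleaning steps on something small, here is a toy run. The three sentences and the stop-word set are hypothetical stand-ins (not NLTK's list or the BBC data), and the rare-word filter uses a count threshold instead of the fixed bottom-10 cut above, since a threshold scales with corpus size while the bottom-10 cut always removes exactly ten words:

```python
import pandas as pd

# Toy stand-in for df2['text']; the stop-word set is a small hypothetical subset.
texts = pd.Series(["the cat is happy", "the dog is happy too", "a cat sat"])
stop = {"the", "is", "a", "too"}

# Step 1: drop stop words.
texts = texts.apply(lambda s: " ".join(w for w in s.split() if w not in stop))

# Step 2: count every remaining word and drop those seen only once.
counts = pd.Series(" ".join(texts).split()).value_counts()
rare = set(counts.index[counts == 1])
texts = texts.apply(lambda s: " ".join(w for w in s.split() if w not in rare))

print(texts.tolist())  # ['cat happy', 'happy', 'cat']
```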

### Step 2.3 Import the Libraries

In [5]:
import pandas as pd
import numpy as np
import spacy
from tqdm import tqdm
import re
import time
import pickle

In [6]:
import tensorflow_hub as hub
import tensorflow as tf

embed = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)

### Step 2.4 Convert Sentence to Elmo Vectors

In [7]:
import tensorflow as tf
import tensorflow_hub as hub
import pandas as pd
from sklearn import preprocessing
import keras
import numpy as np


y = list(df2['category'])
x = list(df2['text'])

le = preprocessing.LabelEncoder()
le.fit(y)

def encode(le, labels):
    enc = le.transform(labels)
    return keras.utils.to_categorical(enc)

def decode(le, one_hot):
    dec = np.argmax(one_hot, axis=1)
    return le.inverse_transform(dec)


x_enc = x
y_enc = encode(le, y)

Using TensorFlow backend.
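The `encode`/`decode` round trip can be checked without Keras. `LabelEncoder` assigns integer ids in sorted class order, and `np.unique(..., return_inverse=True)` yields the same ids; `np.eye(n)[ids]` stands in for `keras.utils.to_categorical` here (equivalent in this sketch because every class id occurs). The label list is a hypothetical sample:

```python
import numpy as np

labels = ["tech", "business", "sport", "business", "politics", "entertainment"]

# Integer-encode: classes come out sorted, as in LabelEncoder.classes_.
classes, enc = np.unique(labels, return_inverse=True)

# One-hot encode: row i of the identity matrix is the vector for class i.
one_hot = np.eye(len(classes))[enc]

# Decode: argmax recovers the integer id, indexing classes recovers the name.
decoded = classes[np.argmax(one_hot, axis=1)]
print(one_hot.shape, decoded.tolist() == labels)  # (6, 5) True
```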


### Step 2.5 Divide the Dataset into Train and Test Sets

In [8]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(np.asarray(x_enc), np.asarray(y_enc), test_size=0.2, random_state=42)

ELMo uses a bidirectional LSTM language model to create word representations.<br>
Rather than looking up a fixed vector for each word in a dictionary, ELMo analyses each word within the context in which it is used.<br>
ELMo is character-based, so it does not depend on a fixed word vocabulary: any token, including one never seen during training, still gets a representation.
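The ELMo paper turns the layers of the bidirectional LM into one contextual vector per word with a softmax-weighted sum scaled by a task-specific factor gamma, and the TF Hub module's `elmo` output is documented as a weighted sum of its three layers with trainable weights. A minimal NumPy sketch, assuming random stand-in layer outputs, all three layers 1024-dimensional, and fixed (not learned) weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer outputs for one 5-token sentence: the token layer plus
# two biLSTM layers, each giving a 1024-dimensional vector per token.
layers = rng.normal(size=(3, 5, 1024))

# Task-specific parameters; trainable in practice, fixed values here.
raw_weights = np.array([0.2, 0.5, 0.3])
gamma = 1.0

s = np.exp(raw_weights) / np.exp(raw_weights).sum()  # softmax-normalised layer weights
elmo = gamma * np.tensordot(s, layers, axes=1)       # weighted sum over layers
print(elmo.shape)  # (5, 1024)
```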

#### Example to understand

In [9]:
x = ["Roasted ants are a popular snack in Columbia"]

# Extract ELMo features
embeddings = embed(x, signature="default", as_dict=True)["default"]

init_op = tf.global_variables_initializer()

# Run the graph
with tf.Session() as sess:
    sess.run(init_op)                  # initialize the module's variables
    sess.run(tf.tables_initializer())  # initialize any lookup tables in the graph
    new = sess.run(embeddings)         # evaluate the embedding once
    print(new)
    print(new.shape)

[[-0.08238704  0.07898229  0.08993168 ... -0.22516626  0.05554263
   0.296727  ]]
(1, 1024)


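The output is a 2-dimensional tensor of shape (1, 1024):
* The first dimension is the number of input sentences (the batch size). This is 1 in our case.
* The second dimension is the length of the ELMo vector.

Hence, the whole input sentence is represented by a single ELMo vector of size 1024. With the `default` output this vector is the mean of the per-word ELMo vectors; the per-word vectors themselves are available from the module's `elmo` output, with shape [batch_size, max_length, 1024].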

### Step 2.6 Train a Keras Neural Model with ELMo Embeddings

In [10]:
from keras.layers import Input, Lambda, Dense
from keras.models import Model
import keras.backend as K

In [11]:
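def ELMoEmbedding(x):
    # x arrives from Input(shape=(1,)) with shape (batch_size, 1); squeezing only axis 1
    # yields the 1-D [batch_size] string tensor the default signature expects, even when
    # a batch contains a single sentence
    return embed(tf.squeeze(tf.cast(x, tf.string), axis=1), signature="default", as_dict=True)["default"]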
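Why the squeeze axis matters: removing every size-1 dimension also removes the batch dimension when a batch holds just one sentence. `np.squeeze` follows the same rule as `tf.squeeze` for this case, so NumPy arrays of strings (hypothetical article texts) can illustrate it:

```python
import numpy as np

# Keras feeds Input(shape=(1,)) a batch shaped (batch_size, 1).
batch = np.array([["first article"], ["second article"]])
single = np.array([["only article"]])

print(np.squeeze(batch).shape)           # (2,)  the 1-D shape the default signature wants
print(np.squeeze(single).shape)          # ()    the batch dimension is lost too
print(np.squeeze(single, axis=1).shape)  # (1,)  axis=1 keeps the batch dimension
```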

**ELMo model** : This module supports inputs either as raw text strings or as tokenized text.<p>
    
**Default signature** : The module takes untokenized sentences as input. The input tensor is a string tensor with shape [batch_size]. The module tokenizes each string by splitting on spaces.

**Tokens signature** : With the tokens signature, the module takes tokenized sentences as input. The inputs are a string tensor with shape [batch_size, max_length] and an int32 tensor with shape [batch_size] holding each sentence's length. The length input is necessary to exclude padding when sentences have different lengths.
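Preparing inputs for the tokens signature means padding every tokenized sentence to the same length and recording the true lengths. The sketch below uses a hypothetical helper `to_tokens_input` and pads with empty strings, following the example on the TF Hub ELMo module page; the two lists would then be passed as `inputs={"tokens": ..., "sequence_len": ...}` together with `signature="tokens"`:

```python
def to_tokens_input(sentences, pad=""):
    """Hypothetical helper: whitespace-tokenize, pad to max_length, record lengths."""
    tokenized = [s.split() for s in sentences]
    max_length = max(len(t) for t in tokenized)
    tokens = [t + [pad] * (max_length - len(t)) for t in tokenized]
    sequence_len = [len(t) for t in tokenized]
    return tokens, sequence_len

tokens, sequence_len = to_tokens_input(["the cat is happy", "roasted ants"])
print(tokens)        # [['the', 'cat', 'is', 'happy'], ['roasted', 'ants', '', '']]
print(sequence_len)  # [4, 2]
```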

**tf.squeeze** : Given a tensor input, this operation returns a tensor of the same type with all dimensions of size 1 removed. To remove only specific size-1 dimensions, pass them via `axis`.<p>
**tf.cast** : Casts x (for a Tensor) or x.values (for a SparseTensor) to the given dtype, here tf.string.<p>
**signature** : A string with the name of the signature to apply. If None, the default signature is used.<br>
**as_dict** : If a signature has multiple inputs, they must be passed as a dict whose keys are defined by the signature. Likewise, if a signature has multiple outputs, they can be retrieved as a dict by passing as_dict=True.<p>
**default(output)**: a fixed mean-pooling of all contextualized word representations with shape [batch_size, 1024].
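Mean-pooling per-word vectors into one sentence vector can be sketched as follows. The word vectors are random stand-ins shaped like the module's per-word output, and averaging only over real (unpadded) tokens is an assumption about what "fixed mean-pooling" means for padded batches:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-word vectors for 2 sentences padded to max_length = 4,
# i.e. shape [batch_size, max_length, 1024].
word_vectors = rng.normal(size=(2, 4, 1024))
sequence_len = np.array([4, 2])  # the second sentence has only 2 real tokens

# Mask out padding positions so they do not contribute to the average.
mask = np.arange(word_vectors.shape[1])[None, :] < sequence_len[:, None]  # (2, 4)
summed = (word_vectors * mask[:, :, None]).sum(axis=1)                    # (2, 1024)
sentence_vectors = summed / sequence_len[:, None]                         # (2, 1024)

print(sentence_vectors.shape)  # (2, 1024)
```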

In [12]:
input_text = Input(shape=(1,), dtype=tf.string)
embedding = Lambda(ELMoEmbedding, output_shape=(1024, ))(input_text)
dense = Dense(256, activation='relu')(embedding)
pred = Dense(5, activation='softmax')(dense)

model = Model(inputs=[input_text], outputs=pred)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [13]:
embedding.shape

TensorShape([Dimension(None), Dimension(1024)])

**Input Layer**: Instantiates a Keras tensor.<br>
shape: A shape tuple (integers), not including the batch size. For instance, shape=(1,) means the model expects batches of 1-d vectors; here each vector holds one article string.<br>
dtype: The data type expected by the input, here tf.string.<p>
**Lambda Layer** : Wraps an arbitrary expression, here the call to the ELMo module, as a Keras layer. Lambda layers are meant for functionality that is not prebuilt and that does not require trainable weights of its own.

In [14]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 1)                 0         
_________________________________________________________________
lambda_1 (Lambda)            (None, 1024)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 256)               262400    
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 1285      
Total params: 263,685
Trainable params: 263,685
Non-trainable params: 0
_________________________________________________________________
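The parameter counts in the summary follow from the Dense layer formula: one weight per input-output pair plus one bias per output. The Lambda layer reports 0 because, as far as Keras is concerned, the ELMo weights live inside the TF Hub module rather than in a Keras layer. A quick check of the arithmetic:

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of a fully connected layer."""
    return n_in * n_out + n_out

hidden = dense_params(1024, 256)  # ELMo vector -> 256 ReLU units
output = dense_params(256, 5)     # 256 units -> 5 categories
print(hidden, output, hidden + output)  # 262400 1285 263685
```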


In [15]:
from keras.utils.vis_utils import plot_model
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)

In [16]:
with tf.Session() as session:
    K.set_session(session)
    session.run(tf.global_variables_initializer())
    session.run(tf.tables_initializer())
    history = model.fit(x_train, y_train, epochs=1, batch_size=16)
    model.save_weights('./elmo-model.h5')

with tf.Session() as session:
    K.set_session(session)
    session.run(tf.global_variables_initializer())
    session.run(tf.tables_initializer())
    model.load_weights('./elmo-model.h5')
    predicts = model.predict(x_test, batch_size=16)

y_test = decode(le, y_test)
y_preds = decode(le, predicts)

Epoch 1/1


**tf.tables_initializer()** : Returns an Op that initializes all tables of the default graph.<br>
**tf.global_variables_initializer()** : Returns an Op that initializes global variables.

# 4. Results

In [17]:
from sklearn import metrics
print(metrics.confusion_matrix(y_test, y_preds))
print(metrics.classification_report(y_test, y_preds))

from sklearn.metrics import accuracy_score
print("Accuracy of ELMO is:",accuracy_score(y_test,y_preds))

[[97  0  4  0  0]
 [ 1 77  1  0  2]
 [ 3  0 80  0  0]
 [ 0  0  1 97  0]
 [ 5  3  0  0 74]]
               precision    recall  f1-score   support

     business       0.92      0.96      0.94       101
entertainment       0.96      0.95      0.96        81
     politics       0.93      0.96      0.95        83
        sport       1.00      0.99      0.99        98
         tech       0.97      0.90      0.94        82

     accuracy                           0.96       445
    macro avg       0.96      0.95      0.95       445
 weighted avg       0.96      0.96      0.96       445

Accuracy of ELMO is: 0.9550561797752809
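The reported figures can be recomputed from the confusion matrix alone: accuracy is the diagonal total over all predictions, recall divides each diagonal entry by its row sum (true class), and precision divides by its column sum (predicted class). The matrix below is copied from the output above, with rows and columns in the order business, entertainment, politics, sport, tech:

```python
import numpy as np

# Rows = true class, columns = predicted class.
cm = np.array([[97,  0,  4,  0,  0],
               [ 1, 77,  1,  0,  2],
               [ 3,  0, 80,  0,  0],
               [ 0,  0,  1, 97,  0],
               [ 5,  3,  0,  0, 74]])

accuracy = np.trace(cm) / cm.sum()        # 425 correct out of 445
recall = np.diag(cm) / cm.sum(axis=1)     # per true class (row-wise)
precision = np.diag(cm) / cm.sum(axis=0)  # per predicted class (column-wise)

print(round(accuracy, 4))  # 0.9551
print(np.round(recall, 2))
print(np.round(precision, 2))
```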
