# ELMo embeddings example

## **Install TensorFlow and TF Hub**



In [2]:
import os
os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION'] = 'python'

In [1]:
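# The cells below use TF1-style APIs (hub.Module, tf.trainable_variables),
# so a TensorFlow 1.x release is needed rather than the latest version
!pip install -U "tensorflow<2"
!pip install -U tensorflow-hub
!pip install -U numpy==1.18.5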



In [2]:
!nvidia-smi

Tue May 17 19:16:51 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P8    25W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               

In [3]:
import tensorflow as tf
print(tf.__version__)

ModuleNotFoundError: No module named 'tensorflow'

### **TensorFlow Hub: A Marketplace for pretrained models**

TensorFlow Hub is a library for the publication, discovery, and consumption of reusable parts of machine learning models. A module is a self-contained piece of a TensorFlow graph, along with its weights and assets, that can be reused across different tasks in a process known as transfer learning.

### **ELMo - Deep contextualized word representations**

<img src="http://jalammar.github.io/images/elmo-embedding.png">

***Inputs***

The module defines two signatures: default and tokens.

* **default**: the module takes untokenized sentences as input. The input tensor is a string tensor with shape [batch_size]. The module tokenizes each string by splitting on spaces.

* **tokens**: the module takes tokenized sentences as input. The inputs are a string tensor with shape [batch_size, max_length] and an int32 tensor with shape [batch_size] holding the length of each sentence. The length input is needed to exclude padding when sentences vary in length.
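
As a minimal sketch of preparing inputs for the tokens signature, the helper below pads tokenized sentences to a common length and records the true lengths. The name `pad_token_batch` and the empty-string padding token are assumptions made for illustration; the two lists would then be passed to the module with `signature="tokens"`, and the exact input names should be checked on the module page.

```python
from typing import List, Tuple


def pad_token_batch(
    sentences: List[List[str]], pad: str = ""
) -> Tuple[List[List[str]], List[int]]:
    """Pad tokenized sentences to a common length and record the true lengths.

    `pad_token_batch` and the empty-string pad token are illustrative
    assumptions, not part of the TF Hub API.
    """
    lengths = [len(tokens) for tokens in sentences]
    max_length = max(lengths, default=0)
    padded = [tokens + [pad] * (max_length - len(tokens)) for tokens in sentences]
    return padded, lengths


tokens_input, tokens_length = pad_token_batch([
    "the cat is on the mat".split(),
    "dogs are in the fog".split(),
])
print(tokens_length)  # [6, 5]
```

The padded list has shape [batch_size, max_length] and the lengths list has shape [batch_size], matching the two tensors described above.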

***Outputs***

The output dictionary contains:

* **word_emb**: the character-based word representations with shape [batch_size, max_length, 512].
* **lstm_outputs1**: the first LSTM hidden state with shape [batch_size, max_length, 1024].
* **lstm_outputs2**: the second LSTM hidden state with shape [batch_size, max_length, 1024].
* **elmo**: the weighted sum of the 3 layers, where the weights are trainable. This tensor has shape [batch_size, max_length, 1024].
* **default**: a fixed mean-pooling of all contextualized word representations with shape [batch_size, 1024].
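
To make the last two outputs more concrete, here is a small NumPy sketch of a weighted layer sum and a length-aware mean pooling. It is an illustration under assumptions, not the module's actual implementation: the softmax-normalized scalar weights and the scale factor `gamma` follow the usual ELMo formulation, the three layers are assumed to already have the same width, and the function names are hypothetical. A feature size of 8 stands in for 1024 to keep the example small.

```python
import numpy as np


def combine_layers(layers, scalar_weights, gamma=1.0):
    """Softmax-weighted sum of equally shaped layer activations, scaled by gamma."""
    w = np.asarray(scalar_weights, dtype=float)
    w = np.exp(w - w.max())
    w = w / w.sum()
    stacked = np.stack(layers, axis=0)               # [num_layers, batch, max_length, dim]
    return gamma * np.tensordot(w, stacked, axes=1)  # [batch, max_length, dim]


def masked_mean_pool(embeddings, lengths):
    """Average token vectors over the unpadded positions of each sentence."""
    max_length = embeddings.shape[1]
    mask = np.arange(max_length)[None, :] < np.asarray(lengths)[:, None]
    mask = mask.astype(embeddings.dtype)             # [batch, max_length]
    summed = (embeddings * mask[:, :, None]).sum(axis=1)
    return summed / np.maximum(mask.sum(axis=1, keepdims=True), 1.0)


rng = np.random.default_rng(0)
layers = [rng.normal(size=(2, 6, 8)) for _ in range(3)]  # stand-ins for the 3 layers

elmo_like = combine_layers(layers, scalar_weights=np.zeros(3))  # equal weights
pooled = masked_mean_pool(elmo_like, lengths=[6, 5])
print(elmo_like.shape, pooled.shape)  # (2, 6, 8) (2, 8)
```

With all scalar weights equal, the combination reduces to a plain average of the layers; training the weights lets a downstream task emphasize whichever layer is most useful.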


To use ELMo inside a Keras model, we need to wrap the TensorFlow Hub ELMo module in a custom Keras layer.

For more details, see https://tfhub.dev/google/elmo/3


In [5]:
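import os
# Silence TensorFlow's C++ log messages; must be set before TensorFlow is imported
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
import tensorflow_hub as hub

# Load the ELMo module (TF1-style hub.Module API); trainable=True allows fine-tuning
elmo = hub.Module("https://tfhub.dev/google/elmo/3", trainable=True)

# The "default" signature takes untokenized sentences and splits them on spaces
embeddings = elmo(
    ["the cat is on the mat", "dogs are in the fog"], signature="default",
    as_dict=True)["elmo"]

# In graph mode this prints a symbolic tensor of shape [batch_size, max_length, 1024], not its values
print(embeddings)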

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Tensor("module_1_apply_default/aggregation/mul_3:0", shape=(2, 6, 1024), dtype=float32)


In [8]:
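from tensorflow.keras.layers import Layer
import tensorflow.keras.backend as K

class ELMo(Layer):
    """Keras wrapper around the TensorFlow Hub ELMo module (TF1-style hub.Module)."""

    def __init__(self, elmo_representation='elmo', trainable=True, **kwargs):
        # Initialise the base layer first; it stores the `trainable` flag
        super(ELMo, self).__init__(trainable=trainable, **kwargs)
        # Key of the module's output dictionary to return (e.g. 'elmo')
        self.module_output = elmo_representation
        self.elmo = None

    def build(self, input_shape):
        # Set up the TensorFlow Hub module
        self.elmo = hub.Module('https://tfhub.dev/google/elmo/3',
                               trainable=self.trainable, name="{}_module".format(self.name))

        # Register the module's trainable variables as weights of this layer
        if self.trainable:
            self._trainable_weights.extend(
                tf.trainable_variables(scope="^{}_module/.*".format(self.name))
            )

        super(ELMo, self).build(input_shape)

    def call(self, x, mask=None):
        # x has shape [batch_size, 1]; squeeze to [batch_size] for the 'default' signature
        result = self.elmo(K.squeeze(K.cast(x, tf.string), axis=1),
                           as_dict=True,
                           signature='default',
                           )[self.module_output]
        return result

    def compute_mask(self, inputs, mask=None):
        # No mask is propagated to downstream layers
        return None

    def compute_output_shape(self, input_shape):
        # [batch_size, max_length, 1024]; max_length is only known at run time.
        # Valid for the per-token 1024-dim outputs ('elmo', 'lstm_outputs1', 'lstm_outputs2').
        return input_shape[0], None, 1024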
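
The shapes flowing through the layer can be checked without TensorFlow. The NumPy sketch below only mimics the shape handling and is an illustration, not part of the layer: it builds a [batch_size, 1] array of raw sentences, applies the same squeeze as `call`, and derives the per-token output shape one would expect under whitespace tokenization.

```python
import numpy as np

sentences = ["the cat is on the mat", "dogs are in the fog"]

# One raw string per row: shape [batch_size, 1], as expected by the layer's input.
elmo_batch = np.array(sentences, dtype=object).reshape(-1, 1)

# The squeeze in `call` drops axis 1, leaving shape [batch_size].
squeezed = np.squeeze(elmo_batch, axis=1)

# The module splits on spaces, so max_length is the longest token count in the batch.
max_length = max(len(s.split()) for s in sentences)
expected_shape = (len(sentences), max_length, 1024)

print(elmo_batch.shape, squeezed.shape, expected_shape)  # (2, 1) (2,) (2, 6, 1024)
```

Any per-token features combined with the layer's output, such as padded word ids, need the same max_length for the shapes to line up.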

In [9]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Dropout, Bidirectional, LSTM
from tensorflow.keras.layers import Dense, TimeDistributed, concatenate

LSTM_SIZE = 300
DENSE = 1000

# Word ids of the sample (variable-length sequences)
inputs = Input(shape=(None,), dtype='int32')
# ...and the raw sentence as a single string
elmo_inputs = Input(shape=(1,), dtype='string')

# Embedding layer with randomly initialised weights (pretrained Word2Vec vectors could be loaded instead)
embeddings = Embedding(20000, 50)(inputs)

# ELMo embeddings as weighted sum across layers --> ['elmo'] module output
elmo_embeddings = ELMo()(elmo_inputs)

# Concatenate word embeddings and ELMo embeddings along the last (feature) dimension;
# both tensors must have the same sequence length
concatenated_embeddings = concatenate([embeddings, elmo_embeddings], axis=-1)

# Apply Dropout
drop_emb = Dropout(0.33)(concatenated_embeddings)

# Define a bidirectional RNN with LSTM cells; return_sequences=False keeps only the final encoding
bilstm = Bidirectional(LSTM(units=LSTM_SIZE, return_sequences=False, recurrent_dropout=0.33))(drop_emb)

# Apply Dropout to the bilstm representation
drop_encodings = Dropout(0.33)(bilstm)

# Pass through a Dense Layer
hidden = Dense(units=DENSE, activation="relu")(drop_encodings)

# Apply Dropout to the output of the Dense Layer
drop_out = Dropout(0.33)(hidden)

# Last pass through a Dense Layer with softmax activation to produce a probability distribution
out = Dense(units=20, activation="softmax")(drop_out)

# Build the model with the Functional API (two inputs, one output)
model = Model(inputs=[inputs, elmo_inputs], outputs=out)

# Print the model topology, using 110-character-wide lines
model.summary(line_length=110)

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Model: "model_1"
______________________________________________________________________________________________________________
Layer (type)                        Output Shape            Param #      Connected to                         
input_3 (InputLayer)                [(None, None)]          0                                                 
______________________________________________________________________________________________________________
input_4 (InputLayer)                [(None, 1)]             0                                                 
______________________________________________________________________________________________________________
embedding_1 (Embedding)             (None, None, 50)        1000000      input_3[0][0]                        
______________________________________________________________________________________________________________
el_mo_1 (ELMo)                      (None, None, 1024)      4            input_4[0][0]         