# Setup

In [1]:
%cd ..

/workspaces/rasa_moodbot


# Version Check

In [2]:
import transformers
transformers.__version__

'4.9.2'

# Sequence Classification

## Pipeline Approach

In [3]:
from transformers import pipeline

nlp = pipeline("sentiment-analysis")

2021-08-29 11:11:54.623644: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-08-29 11:11:54.623707: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-08-29 11:11:57.532858: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-08-29 11:11:57.532905: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2021-08-29 11:11:57.532934: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (09d706799cbe): /proc/driver/nvidia/version does not exist
2021-08-29 11:11:57.533166: I tensorflow/core/platform/cpu_featu

In [4]:
nlp(" I hate you")

[{'label': 'NEGATIVE', 'score': 0.9991129040718079}]

In [5]:
nlp("I love you")

[{'label': 'POSITIVE', 'score': 0.9998656511306763}]
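
The pipeline returns a list with one dict per input, each holding a `label` and a `score`. A minimal sketch of collapsing that into one signed value, assuming only the two labels seen above. The helper name `signed_sentiment` is hypothetical and not part of transformers.

```python
def signed_sentiment(result):
    """Map one pipeline result dict to a value in [-1, 1]."""
    # Assumption: any label other than POSITIVE is treated as negative.
    sign = 1.0 if result["label"] == "POSITIVE" else -1.0
    return sign * result["score"]

# Outputs copied from the two cells above.
hate = [{'label': 'NEGATIVE', 'score': 0.9991129040718079}]
love = [{'label': 'POSITIVE', 'score': 0.9998656511306763}]

print(signed_sentiment(hate[0]))
print(signed_sentiment(love[0]))
```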

## Model Approach

In [6]:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
import tensorflow as tf

In [7]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")

Some layers from the model checkpoint at bert-base-cased-finetuned-mrpc were not used when initializing TFBertForSequenceClassification: ['dropout_183']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertForSequenceClassification were initialized from the model checkpoint at bert-base-cased-finetuned-mrpc.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.


In [8]:
classes = ["not paraphrase", "is paraphrase"]

In [9]:
sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"

In [10]:
# Calling the tokenizer directly is the preferred form; encode_plus is the older spelling.
paraphrase = tokenizer(sequence_0, sequence_2, return_tensors="tf")
not_paraphrase = tokenizer(sequence_0, sequence_1, return_tensors="tf")

In [11]:
paraphrase_classification_logits = model(paraphrase)[0]
not_paraphrase_classification_logits = model(not_paraphrase)[0]

In [12]:
paraphrase_results = tf.nn.softmax(paraphrase_classification_logits, axis=1).numpy()[0]
not_paraphrase_results = tf.nn.softmax(not_paraphrase_classification_logits, axis=1).numpy()[0]

In [13]:
print("Should be paraphrase")
for i in range(len(classes)):
    print(f"{classes[i]}: {round(paraphrase_results[i] * 100)}%")

Should be paraphrase
not paraphrase: 10.0%
is paraphrase: 90.0%


In [14]:
print("\nShould not be paraphrase")
for i in range(len(classes)):
    print(f"{classes[i]}: {round(not_paraphrase_results[i] * 100)}%")


Should not be paraphrase
not paraphrase: 94.0%
is paraphrase: 6.0%
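
The percentages above come from applying softmax to the two logits the model returns for each sentence pair. A minimal pure-Python sketch of that step, independent of TensorFlow. The logit values here are hypothetical and chosen only for illustration; they are not the model's actual outputs.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["not paraphrase", "is paraphrase"]
# Hypothetical logits for one sentence pair (illustration only).
example_logits = [-1.0, 1.2]

probs = softmax(example_logits)
for name, p in zip(classes, probs):
    print(f"{name}: {p:.1%}")
```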


# Masked Language Modelling

## Pipeline Approach

In [15]:
from transformers import pipeline

nlp = pipeline("fill-mask")

All model checkpoint layers were used when initializing TFRobertaForMaskedLM.

All the layers of TFRobertaForMaskedLM were initialized from the model checkpoint at distilroberta-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForMaskedLM for predictions without further training.


In [16]:
sentence = f"HuggingFace is creating a {nlp.tokenizer.mask_token} that the community uses to solve NLP tasks."

In [17]:
results = nlp(sentence)

In [18]:
for r in results:
    print(r)

{'sequence': 'HuggingFace is creating a tool that the community uses to solve NLP tasks.', 'score': 0.1792726218700409, 'token': 3944, 'token_str': ' tool'}
{'sequence': 'HuggingFace is creating a framework that the community uses to solve NLP tasks.', 'score': 0.11349277198314667, 'token': 7208, 'token_str': ' framework'}
{'sequence': 'HuggingFace is creating a library that the community uses to solve NLP tasks.', 'score': 0.05243482440710068, 'token': 5560, 'token_str': ' library'}
{'sequence': 'HuggingFace is creating a database that the community uses to solve NLP tasks.', 'score': 0.03493505343794823, 'token': 8503, 'token_str': ' database'}
{'sequence': 'HuggingFace is creating a prototype that the community uses to solve NLP tasks.', 'score': 0.028602296486496925, 'token': 17715, 'token_str': ' prototype'}
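
Each result dict carries the filled-in `sequence`, the `score`, the vocabulary id `token`, and `token_str`, which here keeps a leading space. A small sketch of reducing the results to `(word, score)` pairs, using the first three dicts printed above with the `sequence` key left out for brevity.

```python
# First three results from the cell above ('sequence' key omitted).
results = [
    {'score': 0.1792726218700409, 'token': 3944, 'token_str': ' tool'},
    {'score': 0.11349277198314667, 'token': 7208, 'token_str': ' framework'},
    {'score': 0.05243482440710068, 'token': 5560, 'token_str': ' library'},
]

# Strip the leading space from each token string and round the score.
candidates = [(r["token_str"].strip(), round(r["score"], 3)) for r in results]
print(candidates)
```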


## Model Approach

In [19]:
from transformers import TFAutoModelForMaskedLM, AutoTokenizer
import tensorflow as tf

In [20]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
# TFAutoModelWithLMHead is deprecated; TFAutoModelForMaskedLM is the dedicated class.
model = TFAutoModelForMaskedLM.from_pretrained("distilbert-base-cased")

Some layers from the model checkpoint at distilbert-base-cased were not used when initializing TFDistilBertForMaskedLM: ['activation_13']
- This IS expected if you are initializing TFDistilBertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFDistilBertForMaskedLM were initialized from the model checkpoint at distilbert-base-cased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForMaskedLM for predictions without further training.


In [21]:
sequence = f"Distilled models are smaller than the models they mimic. Using them instead of the large versions would help {tokenizer.mask_token} our carbon footprint."
sequence

'Distilled models are smaller than the models they mimic. Using them instead of the large versions would help [MASK] our carbon footprint.'

In [22]:
# Named input_ids to avoid shadowing the built-in input().
input_ids = tokenizer.encode(sequence, return_tensors="tf")
input_ids

<tf.Tensor: shape=(1, 30), dtype=int32, numpy=
array([[  101, 12120,  2050,  8683,  1181,  3584,  1132,  2964,  1190,
         1103,  3584,  1152, 27180,   119,  7993,  1172,  1939,  1104,
         1103,  1415,  3827,  1156,  1494,   103,  1412,  6302,  2555,
        10988,   119,   102]], dtype=int32)>

In [23]:
mask_token_index = tf.where(input_ids == tokenizer.mask_token_id)[0, 1]
mask_token_index

<tf.Tensor: shape=(), dtype=int64, numpy=23>

In [24]:
token_logits = model(input_ids)[0]
mask_token_logits = token_logits[0, mask_token_index, :]

In [25]:
top_5_tokens = tf.math.top_k(mask_token_logits, 5).indices.numpy()

In [26]:
for token in top_5_tokens:
    print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))

Distilled models are smaller than the models they mimic. Using them instead of the large versions would help reduce our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help increase our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help decrease our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help offset our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help improve our carbon footprint.
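
The model approach boils down to three steps: take the logits at the mask position, pick the indices of the k largest values, and map those indices back to tokens. Because softmax preserves ordering, ranking raw logits gives the same top k as ranking probabilities. A pure-Python sketch of the selection step; the five-word vocabulary and its logits are hypothetical stand-ins for the real vocabulary-sized logit vector.

```python
import heapq

def top_k_indices(scores, k):
    """Return the indices of the k largest scores, best first."""
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)

# Toy vocabulary and logits (hypothetical values, illustration only).
vocab = ["reduce", "the", "increase", "banana", "offset"]
mask_logits = [7.1, 0.3, 5.4, -2.0, 4.8]

template = "would help [MASK] our carbon footprint."
for idx in top_k_indices(mask_logits, 3):
    print(template.replace("[MASK]", vocab[idx]))
```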


# Using Community Models

In [27]:
from transformers import AutoTokenizer, TFAutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("StevenLimcorn/MelayuBERT")

model = TFAutoModelForMaskedLM.from_pretrained("StevenLimcorn/MelayuBERT")

All model checkpoint layers were used when initializing TFBertForMaskedLM.

All the layers of TFBertForMaskedLM were initialized from the model checkpoint at StevenLimcorn/MelayuBERT.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForMaskedLM for predictions without further training.


In [28]:
sequence = f"Saya {tokenizer.mask_token} makan nasi hari ini"
sequence

'Saya [MASK] makan nasi hari ini'

In [29]:
# Named input_ids to avoid shadowing the built-in input().
input_ids = tokenizer.encode(sequence, return_tensors="tf")
mask_token_index = tf.where(input_ids == tokenizer.mask_token_id)[0, 1]
token_logits = model(input_ids)[0]
mask_token_logits = token_logits[0, mask_token_index, :]
top_5_tokens = tf.math.top_k(mask_token_logits, 5).indices.numpy()

In [30]:
for token in top_5_tokens:
    print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))

Saya nak makan nasi hari ini
Saya suka makan nasi hari ini
Saya dah makan nasi hari ini
Saya makan makan nasi hari ini
Saya ialah makan nasi hari ini


# Adapting to Rasa

## Using [StevenLimcorn/MelayuBERT](https://huggingface.co/StevenLimcorn/MelayuBERT)

In [31]:
from transformers import BertTokenizer, TFBertForMaskedLM
import tensorflow as tf

In [32]:
tokenizer = BertTokenizer.from_pretrained('.cache')
model = TFBertForMaskedLM.from_pretrained('.cache', output_hidden_states=True)

All model checkpoint layers were used when initializing TFBertForMaskedLM.

All the layers of TFBertForMaskedLM were initialized from the model checkpoint at .cache.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForMaskedLM for predictions without further training.


In [33]:
utterance = "Apa khabar?"

In [34]:
inputs = tokenizer(utterance, return_tensors="tf")
# Reuse the already encoded ids as labels instead of tokenizing the utterance twice.
inputs["labels"] = inputs["input_ids"]

In [35]:
outputs = model(inputs)
loss = outputs.loss
logits = outputs.logits
hidden_states = outputs.hidden_states

In [36]:
hidden_states[-1]

<tf.Tensor: shape=(1, 5, 768), dtype=float32, numpy=
array([[[-0.17985347, -0.16066436, -0.3857671 , ...,  0.166049  ,
         -0.20223033,  0.02533146],
        [ 0.2641846 , -0.13460945, -0.26393276, ...,  0.30219942,
         -0.30747098, -0.16703594],
        [ 0.28111064, -0.22694078, -0.46562824, ...,  0.02080232,
         -0.58649707,  0.05440234],
        [ 0.37077692, -0.27485836, -0.38428468, ...,  0.21863174,
         -0.42085147, -0.48858008],
        [ 0.02771205, -0.03792508, -0.4409372 , ...,  0.2997001 ,
         -0.37650868, -0.07049367]]], dtype=float32)>

## Using [rasa/LaBSE](https://huggingface.co/rasa/LaBSE)

In [37]:
from transformers import AutoTokenizer, TFBertModel

In [38]:
tokenizer = AutoTokenizer.from_pretrained("rasa/LaBSE")
model = TFBertModel.from_pretrained("rasa/LaBSE")

2021-08-29 11:15:17.606723: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 1539542016 exceeds 10% of free system memory.
2021-08-29 11:15:18.532293: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 1539542016 exceeds 10% of free system memory.
2021-08-29 11:15:18.751976: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 1539542016 exceeds 10% of free system memory.
2021-08-29 11:15:21.194061: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 1539542016 exceeds 10% of free system memory.
2021-08-29 11:15:21.639484: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 1539542016 exceeds 10% of free system memory.
All model checkpoint layers were used when initializing TFBertModel.

All the layers of TFBertModel were initialized from the model checkpoint at rasa/LaBSE.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions with

In [39]:
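inputs = tokenizer(utterance, return_tensors="tf")

# TFBertModel has no language-modelling head and computes no loss, so no labels are passed.
# Per-layer hidden states are only returned when output_hidden_states=True is requested;
# the final layer is always available as outputs.last_hidden_state.
outputs = model(inputs)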

In [40]:
outputs.last_hidden_state

<tf.Tensor: shape=(1, 5, 768), dtype=float32, numpy=
array([[[-0.58700824,  1.3538016 , -1.037914  , ..., -0.28710842,
          0.08892204, -0.2947217 ],
        [-0.16785997,  1.1743879 , -0.3859771 , ...,  0.16525707,
         -0.6053635 , -0.31148687],
        [-0.20207278,  1.044165  , -0.2412882 , ..., -0.01710178,
         -0.769098  , -0.3958366 ],
        [-0.23526837,  1.002651  , -0.52520394, ...,  0.05787983,
         -0.5744119 , -0.31641147],
        [-0.58700866,  1.3538011 , -1.0379144 , ..., -0.28710806,
          0.08892214, -0.2947222 ]]], dtype=float32)>
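
Both models yield one 768-dimensional vector per token, with shape `(1, 5, 768)` for this utterance. One common way to get a single sentence-level vector from such output is mean pooling over the tokens that the attention mask marks as real. Whether this matches how Rasa builds its sentence features is an assumption not confirmed by this notebook. The sketch below uses a toy 3 x 4 matrix in place of the real 5 x 768 tensor, and `mean_pool` is a hypothetical helper.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose attention_mask entry is 1."""
    kept = [v for v, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(v[d] for v in kept) / len(kept) for d in range(dim)]

# Toy stand-in for one sequence of hidden states: 3 tokens x 4 dimensions.
tokens = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],  # padding position, masked out below
]
mask = [1, 1, 0]

sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # prints [2.0, 2.0, 2.0, 2.0]
```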