# Using USR to score dialogues

http://shikib.com/usr

I downloaded the models locally, partly for speed but also because the config.json files of the "uk" and "ctx" models lack the `model_type` parameter. They also lack the `finetuning_task` setting, although this is not needed for making predictions. Add the following attributes to the config.json files of both the "uk" and "ctx" models:

```
  "model_type": "roberta",
  "finetuning_task": "qqp",
```
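The same patch can be applied programmatically. Below is a minimal sketch. It assumes each model folder contains a config.json and leaves existing values untouched. The folder names in the commented usage are hypothetical.

```python
import json
from pathlib import Path

# Attributes missing from the "uk" and "ctx" config.json files
MISSING_ATTRIBUTES = {"model_type": "roberta", "finetuning_task": "qqp"}


def add_missing_attributes(config):
    """Return a copy of config with the missing attributes added (existing keys win)."""
    patched = dict(config)
    for key, value in MISSING_ATTRIBUTES.items():
        patched.setdefault(key, value)
    return patched


def patch_config_file(model_dir):
    """Rewrite <model_dir>/config.json in place with the missing attributes."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text())
    config_path.write_text(json.dumps(add_missing_attributes(config), indent=2))


# Hypothetical usage, with the model folders next to this notebook:
# for name in ("uk", "ctx"):
#     patch_config_file(name)
```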

Looking at the training and development data in the folder "both", I suspect it is better to downcase all text, to use "_eos	_go" to separate the context from the target sentence, and to end the target sentence with "_eos":

```
0	1	2	_nofact _eos	_go i like hockey and soccer . what teams do you support ? _eos	0
0	1	2	amazon ceo jeff bezos built a clock into a mountain that should run for 10,000 years . _eos	_go i think he is a great ceo , he is a great ceo and a genius _eos	0
```
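The `"finetuning_task": "qqp"` setting suggests that these files use the GLUE QQP column layout read by the Hugging Face GLUE processors: id, qid1, qid2, first text, second text, label. This is an assumption, not something the USR documentation confirms. Under that reading, the tab between "_eos" and "_go" is a column separator. During fine-tuning, the context and the response would then have been encoded as a sentence pair, not as one string containing a tab. A minimal parser that splits a row accordingly:

```python
from typing import NamedTuple


class QqpExample(NamedTuple):
    context: str   # column 3: fact and/or dialog history, ending in "_eos"
    response: str  # column 4: "_go ... _eos"
    label: str     # column 5: the binary label y


def parse_qqp_line(line):
    """Split one tab-separated data line into its (assumed) QQP fields."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 6:
        raise ValueError(f"expected 6 tab-separated columns, got {len(columns)}")
    _id, _qid1, _qid2, context, response, label = columns
    return QqpExample(context, response, label)


row = "0\t1\t2\t_nofact _eos\t_go i like hockey and soccer . what teams do you support ? _eos\t0"
example = parse_qqp_line(row)
print(example.context)   # _nofact _eos
print(example.response)  # _go i like hockey and soccer . what teams do you support ? _eos
```

If this holds, passing context and response as a pair (for example through the tokenizer's `text_pair` argument) would match training more closely than the single-string inputs used further down.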

There are three models: the two retrieval models ("uk" and "ctx") have two output classes, while roberta_ft is a masked language model:

* MLM metric, which estimates the likelihood of a response (fine-tuned on Topical-Chat or PersonaChat). The likelihood is used as a measure of the "Understandability" and "Naturalness" of a response. The input sequence to the MLM is the concatenation of a dialog context, c, and a response, r. One word at a time, each word in r is masked and its log likelihood is computed. Model name: roberta_ft; test data is given in undr/test.lm:

```
yeah i would feel bad too . bad sportsmanship . is n't it odd that pro bowlers used to make more money than pro football players in the 1960 's ! _eos _go yeah , i guess football has changed so much . i wonder if bowling is more popular in the 60 's than football . _eos
yeah i would feel bad too . bad sportsmanship . is n't it odd that pro bowlers used to make more money than pro football players in the 1960 's ! _eos _go yeah i guess so . do you like fantasy ? _eos

```
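The MLM scoring procedure can be sketched independently of any particular model. Here `log_prob` is a hypothetical hook, not part of USR or transformers. It should return log P(token_id | ids) at `position`, for example from the log-softmax of roberta_ft loaded as a masked language model. The token ids and `mask_id` would come from its tokenizer. The sketch masks one subword token at a time rather than a whole word, and it sums the log-likelihoods. Both choices are assumptions about details not spelled out above.

```python
def masked_variants(context_ids, response_ids, mask_id):
    """One copy of context + response per response token, with that token masked.

    Returns (ids, position, original_id) triples; the context is never masked.
    """
    full = list(context_ids) + list(response_ids)
    variants = []
    for offset, token_id in enumerate(response_ids):
        position = len(context_ids) + offset
        masked = full.copy()
        masked[position] = mask_id
        variants.append((masked, position, token_id))
    return variants


def mlm_score(context_ids, response_ids, mask_id, log_prob):
    """Sum over response tokens of log P(original token | input with it masked)."""
    return sum(
        log_prob(ids, position, token_id)
        for ids, position, token_id in masked_variants(context_ids, response_ids, mask_id)
    )
```

The caller is responsible for special tokens such as `<s>` and `</s>`. It can also divide by `len(response_ids)` if a length-normalised score is preferred.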

* Dialog retrieval (DR) is an intuitive choice for evaluating generative models, especially for qualities such as "Maintains Context", "Interesting" and "Uses Knowledge". The model fine-tuned for likelihood is further fine-tuned for the retrieval task. It is trained on a context x, a response r, and a binary label y indicating whether r is the true response or a randomly sampled one. The context x may consist of the dialog history and the fact, denoted c, or of just the fact, denoted f.
* Model name "uk" ("Uses Knowledge"): the context x is just the fact f (or "_nofact"); trained on the data in the *fct* folder:

```
0	1	2	in september of 2010 , the united nations appointed official ambassador to extraterrestrials in they case they would ever make contact with earth _eos	_go i 'm not sure . i wonder if the un has an ambassador to aliens . _eos	0
0	1	2	in september of 2010 , the united nations appointed official ambassador to extraterrestrials in they case they would ever make contact with earth _eos	_go i think it 's because of the atmosphere , it 's not all that old . _eos	0
0	1	2	in september of 2010 , the united nations appointed official ambassador to extraterrestrials in they case they would ever make contact with earth _eos	_go i do n't know , but maybe it 's due to the fact that we have no longer have an ambassador to extraterrestrials _eos	0
0	1	2	in september of 2010 , the united nations appointed official ambassador to extraterrestrials in they case they would ever make contact with earth _eos	_go i think that was for sure , we should grow , i wonder what planet they are able to make the earth , they will have an alien ambassador to extraterrestrials _eos	0
0	1	2	in september of 2010 , the united nations appointed official ambassador to extraterrestrials in they case they would ever make contact with earth _eos	_go wow . the un appointed an official ambassador to aliens ! maybe we can ask them for help if we run out of helium on earth . _eos	0
0	1	2	_nofact _eos	_go i do n't really know much about sports . i do like to watch the olympics and i have been swimming in the summer olympics . _eos	0
0	1	2	_nofact _eos	_go i have never swam competitively , but i did n't have it . i do like it though . _eos	0
0	1	2	_nofact _eos	_go i do not but i am more into swimming myself . i do n't like sports , but i do know there are some really boring swimming competitions _eos	0
0	1	2	_nofact _eos	_go yes . i think that 's why i live in the usa . i 've seen some swimming around the world where i live . _eos	0
0	1	2	_nofact _eos	_go i like hockey and soccer . what teams do you support ? _eos	0
```
* Model name "ctx" ("Maintains Context"): the context x is the dialog history followed by the fact, which is "_nofact" in these examples; trained on the data in the *both* folder:

```
0	1	2	thanks , grandpa ! i bet grandpa wishes he had cashed them in before he cashed out . _eos _nofact _eos	_go i bet he was a great player . nice chat _eos	0
0	1	2	thanks , grandpa ! i bet grandpa wishes he had cashed them in before he cashed out . _eos _nofact _eos	_go i wonder if he was a fan of his music ? _eos	0
0	1	2	thanks , grandpa ! i bet grandpa wishes he had cashed them in before he cashed out . _eos _nofact _eos	_go i 'm sure he was , it was great chatting with you ! _eos	0
0	1	2	thanks , grandpa ! i bet grandpa wishes he had cashed them in before he cashed out . _eos _nofact _eos	_go he was an all star and did the money . i think it 's interesting . i wonder how many albums he had . _eos	0
0	1	2	thanks , grandpa ! i bet grandpa wishes he had cashed them in before he cashed out . _eos _nofact _eos	_go maybe it's time to collect some baseball cards now , so you can cash out when you 're older ! _eos	0
```
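The input formats of the two retrieval models can be read off the example rows above. Below is a sketch that builds (context, response) pairs in those formats. The function names are hypothetical. The examples do not show how several history turns are joined, so `history` is taken as one already-joined string.

```python
NO_FACT = "_nofact"


def _response_part(response):
    """Second text of the pair: "_go <response> _eos" (downcased)."""
    return f"_go {response.lower()} _eos"


def uk_pair(response, fact=None):
    """Pair for the "uk" model: the context is only the fact, or _nofact if there is none."""
    fact_text = fact.lower() if fact else NO_FACT
    return f"{fact_text} _eos", _response_part(response)


def ctx_pair(history, response, fact=None):
    """Pair for the "ctx" model: dialog history, then the fact (or _nofact)."""
    fact_text = fact.lower() if fact else NO_FACT
    return f"{history.lower()} _eos {fact_text} _eos", _response_part(response)


print(ctx_pair("thanks , grandpa !", "i bet he was a great player . nice chat"))
# ('thanks , grandpa ! _eos _nofact _eos', '_go i bet he was a great player . nice chat _eos')
```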

```python
from transformers import pipeline

usr_rft_classifier_qqp = pipeline("text-classification", model='/Users/piek/Desktop/t-MA-Combots-2021/code/usr/examples/roberta_ft')
usr_uk_classifier_qqp = pipeline("text-classification", model='/Users/piek/Desktop/t-MA-Combots-2021/code/usr/examples/uk')
usr_ctx_classifier_qqp = pipeline("text-classification", model='/Users/piek/Desktop/t-MA-Combots-2021/code/usr/examples/ctx')
```

```
Some weights of the model checkpoint at /Users/piek/Desktop/t-MA-Combots-2021/code/usr/examples/roberta_ft were not used when initializing RobertaForSequenceClassification: ['lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.layer_norm.weight', 'roberta.pooler.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.bias', 'lm_head.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at /Users/piek/Desktop/t-MA
```

The warning shows that the `lm_head` weights of roberta_ft are discarded. It also shows that part of the classification model is not initialized from the checkpoint, so a new classification head is created. The "Fine-tuned" scores, which hover around 0.5, therefore come from an untrained head rather than from the masked-LM likelihood described above.

```python
# Candidate inputs; each assignment overwrites the previous one, so only the last one is scored
sequence = "hi there how are you doing this evening ?\nhi , sitting here with my three dogs watching the olympics !\nnice i do not want to go back to work i am a waitress\ni love being in a polyamorous open relationship !\niol well i wish i was brave enough to do that\nmy father was a salesman , helps my dog walking business now\nthat is nice i've a motorbike don't know what car to get for winter\nvery very cool . sounds fun\nyes i had them put red with blue stripes to be shinny for when racing\nso is my dog , wow so cool\nso what do you do in your spare time ?\nlead singer for a band , music teacher\nwow nice are you really good ?\nmillions of plays on soundcloud\nreally would you share or are you shy\ni know what you mean spend most nights cuddling my dog and star watching\n"
#sequence = "b'A woman looks at a duck as she walks behind it.' b'A woman is going on a walk with her dog.'"
#sequence = '<s>I am a chef in a restaurant</s><s>What dishes do you cook?</s>'
#sequence = 'I am a chef in a restaurant. What dishes do you cook?'
sequence = "yeah i would feel bad too . bad sportsmanship . is n't it odd that pro bowlers used to make more money than pro football players in the 1960 's ! _eos people fantasy draft the national spelling bee _eos	_go yeah , i guess football has changed so much . i wonder if bowling is more popular in the 60 's than football . _eos"
sequence = "amazon ceo jeff bezos built a clock into a mountain that should run for 10,000 years . _eos	_go i think he is a great ceo , he is a great ceo and a genius _eos"
```

```python
print('Fine-tuned', usr_rft_classifier_qqp(sequence, return_all_scores=True))
print('Use knowledge', usr_uk_classifier_qqp(sequence, return_all_scores=True))
print('Coherence', usr_ctx_classifier_qqp(sequence, return_all_scores=True))
```

```
Fine-tuned [[{'label': 'LABEL_0', 'score': 0.49284106492996216}, {'label': 'LABEL_1', 'score': 0.5071589350700378}]]
Use knowledge [[{'label': 'LABEL_0', 'score': 0.0045419964008033276}, {'label': 'LABEL_1', 'score': 0.9954580068588257}]]
Coherence [[{'label': 'LABEL_0', 'score': 0.012619041837751865}, {'label': 'LABEL_1', 'score': 0.9873809814453125}]]
```
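The nested `[[{...}, {...}]]` output is awkward to index. Below is a small helper that flattens it into a label-to-score dict. It assumes the pipeline returns either this nested shape, as with `return_all_scores=True` here, or a flat list of dicts, which newer transformers versions may return for a single input with `top_k=None`.

```python
def label_scores(results):
    """Map pipeline output for a single input to {label: score}."""
    # Unwrap the nested shape [[{...}, {...}]] produced with return_all_scores=True
    if results and isinstance(results[0], list):
        results = results[0]
    return {item["label"]: item["score"] for item in results}


# Values copied from the "Use knowledge" output above
uk_output = [[{'label': 'LABEL_0', 'score': 0.0045419964008033276},
              {'label': 'LABEL_1', 'score': 0.9954580068588257}]]
print(label_scores(uk_output)["LABEL_1"])  # 0.9954580068588257
```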


```python
# Same input, but with the original (mixed) casing
sequence = "amazon ceo jeff bezos built a clock into a mountain that should run for 10,000 years . _eos	_go I think he is a great CEO , he is a great CEO and a genius _eos"
print('Fine-tuned', usr_rft_classifier_qqp(sequence, return_all_scores=True))
print('Use knowledge', usr_uk_classifier_qqp(sequence, return_all_scores=True))
print('Coherence', usr_ctx_classifier_qqp(sequence, return_all_scores=True))
```

```
Fine-tuned [[{'label': 'LABEL_0', 'score': 0.492765337228775}, {'label': 'LABEL_1', 'score': 0.5072346329689026}]]
Use knowledge [[{'label': 'LABEL_0', 'score': 0.06817938387393951}, {'label': 'LABEL_1', 'score': 0.9318206310272217}]]
Coherence [[{'label': 'LABEL_0', 'score': 0.010731802321970463}, {'label': 'LABEL_1', 'score': 0.9892682433128357}]]
```

With mixed case, the LABEL_1 score of "Use knowledge" drops from 0.995 to 0.932, while the other two models barely change, which supports downcasing the input.
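Beyond downcasing, the training data is also tokenised, with forms such as "is n't" and "i 'm" and punctuation separated by spaces. Below is a rough normaliser that imitates this, assuming these are the main conventions. It is an approximation built from the examples in this notebook, not the preprocessing that USR itself used.

```python
import re

# Assumed conventions, inferred from the example rows: clitics split off the
# preceding word, and sentence punctuation separated when it ends a token
# (so "10,000" stays intact).
_CLITICS = re.compile(r"(n't|'s|'m|'re|'ve|'ll|'d)\b")
_PUNCT = re.compile(r"([.,!?;:])(?=\s|$)")


def normalize(text):
    """Lowercase text and space out clitics and sentence punctuation."""
    text = text.lower()
    text = _CLITICS.sub(r" \1", text)
    text = _PUNCT.sub(r" \1", text)
    return " ".join(text.split())  # collapse repeated whitespace


print(normalize("I think he is a great CEO, he is a great CEO and a genius"))
# i think he is a great ceo , he is a great ceo and a genius
```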


```python
import requests
import sys
import os
import emissor as em
from emissor.persistence import ScenarioStorage
from emissor.representation.annotation import AnnotationType, Token, NER
from emissor.representation.container import Index
from emissor.representation.scenario import Modality, ImageSignal, TextSignal, Mention, Annotation, Scenario


src_path = os.path.abspath(os.path.join('..'))
if src_path not in sys.path:
    sys.path.append(src_path)

#### The next utils are needed for the interaction and creating triples and capsules
import chatbots.util.driver_util as d_util
import chatbots.util.text_util as t_util

scenario_path = "/Users/piek/PycharmProjects/cltl-chatbots/data"
### The name of your scenario
scenario_id = "2021-11-30-09:08:06"

### Load the stored scenario and get its text signals
scenarioStorage = ScenarioStorage(scenario_path)
scenario_ctrl = scenarioStorage.load_scenario(scenario_id)
signals = scenario_ctrl.get_signals(Modality.TEXT)
```

```python
# Even-indexed signals are Leolani's utterances, odd-indexed ones DialoGPT's responses
leolani = []
dialoggpt = []

for index, signal in enumerate(signals):
    if index % 2 == 0:
        utterance = ''.join(signal.seq)
        leolani.append(utterance)
print(len(leolani))
for index, signal in enumerate(signals):
    if index % 2 != 0:
        utterance = ''.join(signal.seq)
        dialoggpt.append(utterance)
print(len(dialoggpt))

# Scores per turn pair for LABEL_0 and LABEL_1 of each model
rft0 = []
rft1 = []
uk0 = []
uk1 = []
ctx0 = []
ctx1 = []
for q, r in zip(leolani, dialoggpt):
    # Downcased context and response in the "_eos _go ... _eos" format
    sequence = q.lower() + ' _eos _go ' + r.lower() + ' _eos'
    results = usr_rft_classifier_qqp(sequence, return_all_scores=True)
    for result in results[0]:
        if result['label'] == 'LABEL_0':
            rft0.append(result['score'])
        else:
            rft1.append(result['score'])

    results = usr_uk_classifier_qqp(sequence, return_all_scores=True)
    for result in results[0]:
        if result['label'] == 'LABEL_0':
            uk0.append(result['score'])
        else:
            uk1.append(result['score'])

    results = usr_ctx_classifier_qqp(sequence, return_all_scores=True)
    for result in results[0]:
        if result['label'] == 'LABEL_0':
            ctx0.append(result['score'])
        else:
            ctx1.append(result['score'])
```

```
31
31
```
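The two passes over `signals` and the three near-identical label loops can be condensed as follows. This sketch assumes the utterances are already in a list (for example `''.join(signal.seq)` for each signal) and that they strictly alternate between Leolani and DialoGPT, starting with Leolani. As with `zip` in the cell above, an unpaired last turn is dropped. `pair_turns` and `collect_scores` are hypothetical helper names.

```python
from collections import defaultdict


def pair_turns(utterances):
    """Pair each even-indexed turn (Leolani) with the following odd-indexed turn (DialoGPT)."""
    return list(zip(utterances[0::2], utterances[1::2]))


def collect_scores(pairs, classifiers):
    """Score every (context, response) pair with every classifier.

    classifiers maps a metric name to a callable that takes one string and returns
    [[{'label': ..., 'score': ...}, ...]], like the pipelines with return_all_scores=True.
    Returns {metric: {label: [score per pair]}}.
    """
    scores = {name: defaultdict(list) for name in classifiers}
    for context, response in pairs:
        sequence = context.lower() + " _eos _go " + response.lower() + " _eos"
        for name, classify in classifiers.items():
            for item in classify(sequence)[0]:
                scores[name][item["label"]].append(item["score"])
    return scores


# Usage with the pipelines defined earlier (not run here):
# classifiers = {
#     "uk": lambda s: usr_uk_classifier_qqp(s, return_all_scores=True),
#     "ctx": lambda s: usr_ctx_classifier_qqp(s, return_all_scores=True),
# }
# utterances = [''.join(signal.seq) for signal in signals]
# scores = collect_scores(pair_turns(utterances), classifiers)
```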


```python
# Average each label's score over all turn pairs
rft0_score = sum(rft0) / len(rft0)
rft1_score = sum(rft1) / len(rft1)
print('Roberta_fine_tuned:', 'LABEL_0', rft0_score, 'LABEL_1', rft1_score)

uk0_score = sum(uk0) / len(uk0)
uk1_score = sum(uk1) / len(uk1)
print('Use Knowledge:', 'LABEL_0', uk0_score, 'LABEL_1', uk1_score)

ctx0_score = sum(ctx0) / len(ctx0)
ctx1_score = sum(ctx1) / len(ctx1)
print('Context coherence:', 'LABEL_0', ctx0_score, 'LABEL_1', ctx1_score)
```

```
Roberta_fine_tuned: LABEL_0 0.4673770070075989 LABEL_1 0.5326229814560183
Use Knowledge: LABEL_0 0.21310751541187206 LABEL_1 0.7868924833113148
Context coherence: LABEL_0 0.37423608579763 LABEL_1 0.6257639068268961
```
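Each model outputs a softmax over two labels, so the LABEL_0 and LABEL_1 averages sum to about 1 (0.467 + 0.533 for roberta_ft above), and the LABEL_1 mean alone carries the same information. Below is a slightly more robust way to compute the averages, using `statistics.mean` and skipping empty lists instead of dividing by zero. `average_scores` is a hypothetical helper name.

```python
from statistics import mean


def average_scores(label_lists):
    """Map {label: [score, ...]} to {label: mean score}, skipping labels without scores."""
    return {label: mean(scores) for label, scores in label_lists.items() if scores}


# Hypothetical usage with the lists filled above:
# print(average_scores({"LABEL_0": uk0, "LABEL_1": uk1}))
```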


## End of notebook