# Approach 5

This notebook uses approach 5 to generate templates, focusing on positive and negative templates. One possible application is testing the "Vocabulary" linguistic capability with an MFT (Minimum Functionality Test).

The steps of this approach are:

1. Split the instances into sentences
2. Rank the words of each sentence
3. Filter the sentences by length (at least 5 words)
4. Filter the sentences that contain relevant words (verbs or adjectives)
5. Filter the sentences in which the relevant words from the previous step are classified with high confidence
6. Replace the relevant words with masks
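The six steps can be sketched in plain Python as below. This is a hypothetical illustration, not the code of the template_generator library: the `Word` record and its fields (POS tag, ranking score, word polarity and its confidence) and the naive sentence splitter are assumptions standing in for the library's models, and reading `ranked_words_count` as "how many top-ranked words are considered" is a guess at that parameter's meaning.

```python
import re
from dataclasses import dataclass


# Hypothetical per-word annotations; in the notebook these come from
# PosNegTemplateGeneratorApp5, whose internals are not shown here.
@dataclass
class Word:
    text: str
    pos: str          # coarse POS tag, assumed to be "VERB", "ADJ", ...
    rank: float       # importance from step 2 (higher = more important)
    polarity: str     # "pos" or "neg", assumed word-level sentiment label
    score: float      # confidence of that polarity label


# Step 4: only verbs and adjectives count as relevant words
RELEVANT = {"VERB": "verb", "ADJ": "adj"}


def split_sentences(text):
    """Step 1: naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def make_template(words, n_masks=2, ranked_words_count=4, min_words=5,
                  min_score=0.8):
    """Steps 3-6 for one annotated sentence.

    Returns (masked_text, template_text), or None if the sentence is filtered out.
    """
    if len(words) < min_words:  # step 3: at least `min_words` tokens
        return None
    # Consider only the `ranked_words_count` highest-ranked words (assumption)
    top = sorted(range(len(words)), key=lambda i: words[i].rank,
                 reverse=True)[:ranked_words_count]
    relevant = [i for i in top if words[i].pos in RELEVANT]          # step 4
    confident = [i for i in relevant if words[i].score > min_score]  # step 5
    if len(confident) < n_masks:
        return None
    chosen = set(confident[:n_masks])
    masked, template = [], []
    for i, w in enumerate(words):  # step 6: replace the chosen words by masks
        if i in chosen:
            masked.append("{mask}")
            template.append(f"{{{w.polarity}_{RELEVANT[w.pos]}}}")
        else:
            masked.append(w.text)
            template.append(w.text)
    return " ".join(masked), " ".join(template)


if __name__ == "__main__":
    # Made-up annotations for one sentence of the dataset
    words = [Word("too", "ADV", 0.1, "neg", 0.5), Word("bad", "ADJ", 0.9, "neg", 0.95),
             Word("none", "PRON", 0.2, "neg", 0.5), Word("of", "ADP", 0.0, "neg", 0.5),
             Word("it", "PRON", 0.0, "neg", 0.5), Word("is", "AUX", 0.3, "pos", 0.6),
             Word("funny", "ADJ", 0.8, "pos", 0.9), Word(".", "PUNCT", 0.0, "neg", 0.0)]
    print(split_sentences("too bad none of it is funny . and yet , it still works ."))
    print(make_template(words))
```

With these made-up annotations, the sketch produces the same masked and template texts that the 100-instance run reports for "too bad none of it is funny .".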

In [1]:
%config Completer.use_jedi = False
import sys
sys.path.append('../')

## Loading the dataset, the target model and the auxiliary models

In [2]:
import pandas as pd
from datasets import load_dataset

pd.set_option('display.max_colwidth', None)

dataset = load_dataset("rotten_tomatoes")
dataset.set_format("pandas")
df = dataset["test"].shuffle(seed=42)[:100]
df

| | text | label |
|---|---|---|
| 0 | unpretentious , charming , quirky , original | 1 |
| 1 | a film really has to be exceptional to justify a three hour running time , and this isn't . | 0 |
| 2 | working from a surprisingly sensitive script co-written by gianni romoli . . . ozpetek avoids most of the pitfalls you'd expect in such a potentially sudsy set-up . | 1 |
| 3 | it may not be particularly innovative , but the film's crisp , unaffected style and air of gentle longing make it unexpectedly rewarding . | 1 |
| 4 | such a premise is ripe for all manner of lunacy , but kaufman and gondry rarely seem sure of where it should go . | 0 |
| ... | ... | ... |
| 95 | ice age is the first computer-generated feature cartoon to feel like other movies , and that makes for some glacial pacing early on . | 0 |
| 96 | there's no denying that burns is a filmmaker with a bright future ahead of him . | 1 |
| 97 | it collapses when mr . taylor tries to shift the tone to a thriller's rush . | 0 |
| 98 | there's a great deal of corny dialogue and preposterous moments . and yet , it still works . | 1 |


In [3]:
import re
import numpy as np
import torch
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def pre_proccess(text):
    """Lower-case the text and replace punctuation and digits with spaces."""
    text = text.lower()
    # Class: double quote, single quote, comma, the ranges '!'-'.' and ':'-'@', digits, '/' and parentheses
    text = re.sub('["\',!-.:-@0-9/()]', ' ', text)
    return text

# Wrapper to adapt the output format to (labels, probabilities)
class SentimentAnalisysModelWrapper:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        
    def __predict(self, text_input):
        text_preprocessed = pre_proccess(text_input)
        tokenized = self.tokenizer(text_preprocessed, padding=True, truncation=True, max_length=512,
                                   add_special_tokens=True, return_tensors="pt")
        
        # Inference only, so gradients are not tracked
        with torch.no_grad():
            logits = self.model(**tokenized).logits
        # An explicit dim=-1 avoids the implicit-dimension softmax warning
        prob = softmax(logits, dim=-1).numpy()
        pred = np.argmax(prob)
        
        return pred, prob
    
    def predict_label(self, text_inputs):
        return self.predict(text_inputs)[0]
        
    def predict_proba(self, text_inputs):
        return self.predict(text_inputs)[1]
        
    def predict(self, text_inputs):
        """Return (labels, probabilities) for a string or a list of strings."""
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]
        
        preds = []
        probs = []

        for text_input in text_inputs:
            pred, prob = self.__predict(text_input)
            preds.append(pred)
            probs.append(prob[0])

        return np.array(preds), np.array(probs) # e.g. ([0, 1], [[0.99, 0.01], [0.03, 0.97]])

# Auxiliary function to load and wrap a model from Hugging Face
def load_model(model_name):
    print(f'Loading model {model_name}...')
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    return SentimentAnalisysModelWrapper(model, tokenizer)

# Hugging Face hosted model names
rotten_tomatoes_models = {
    'bert': 'textattack/bert-base-uncased-rotten-tomatoes',
    'albert': 'textattack/albert-base-v2-rotten-tomatoes',
    'distilbert': 'textattack/distilbert-base-uncased-rotten-tomatoes',
    'roberta': 'textattack/roberta-base-rotten-tomatoes',
    'xlnet': 'textattack/xlnet-base-cased-rotten-tomatoes',
}

In [4]:
m1 = load_model(rotten_tomatoes_models['albert'])
m2 = load_model(rotten_tomatoes_models['distilbert'])
m3 = load_model(rotten_tomatoes_models['roberta'])
m4 = load_model(rotten_tomatoes_models['xlnet'])

# Models to be used as oracle
models = [m1, m2, m3, m4]
# Target model
model = load_model(rotten_tomatoes_models['bert'])

Loading model textattack/albert-base-v2-rotten-tomatoes...
Loading model textattack/distilbert-base-uncased-rotten-tomatoes...
Loading model textattack/roberta-base-rotten-tomatoes...


Some weights of the model checkpoint at textattack/roberta-base-rotten-tomatoes were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Loading model textattack/xlnet-base-cased-rotten-tomatoes...
Loading model textattack/bert-base-uncased-rotten-tomatoes...


# Generating the templates
The word-ranking method used in PosNegTemplateGenerator is the Replace-1 Score.
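A common formulation of the Replace-1 score measures, for each word, how much the probability of the originally predicted class drops when only that word is replaced by a placeholder token. The sketch below assumes that formulation and an `[UNK]`-style placeholder; the library's actual replacement token and scoring details may differ. `predict_proba` can be any function shaped like the wrapper's `predict_proba` (a list of texts in, one probability row per text out).

```python
from typing import Callable, List, Sequence

# Assumed placeholder token; the library may use a different replacement
PLACEHOLDER = "[UNK]"


def replace1_scores(words: Sequence[str],
                    predict_proba: Callable[[List[str]], Sequence[Sequence[float]]],
                    placeholder: str = PLACEHOLDER) -> List[float]:
    """Replace-1 score of each word: probability of the originally predicted
    class minus that probability when only this word is replaced."""
    probs = predict_proba([" ".join(words)])[0]
    label = max(range(len(probs)), key=lambda k: probs[k])
    variants = [" ".join([*words[:i], placeholder, *words[i + 1:]])
                for i in range(len(words))]
    return [probs[label] - p[label] for p in predict_proba(variants)]


def rank_words(words: Sequence[str], predict_proba, top_k: int = 4) -> List[str]:
    """Return the `top_k` words with the largest Replace-1 scores."""
    scores = replace1_scores(words, predict_proba)
    order = sorted(range(len(words)), key=lambda i: scores[i], reverse=True)
    return [words[i] for i in order[:top_k]]


if __name__ == "__main__":
    def toy_predict_proba(texts):
        """Made-up classifier: 'great' pushes towards positive, 'dull' towards negative."""
        rows = []
        for text in texts:
            tokens = text.split()
            pos = 0.5 + 0.4 * ("great" in tokens) - 0.3 * ("dull" in tokens)
            rows.append([1.0 - pos, pos])
        return rows

    sentence = "a great but dull film".split()
    print(rank_words(sentence, toy_predict_proba, top_k=2))
```

In the toy run, removing "great" lowers the positive probability the most, so it ranks first, while removing "dull" raises it and gives a negative score.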

In [5]:
from template_generator.tasks.sentiment_analisys import PosNegTemplateGeneratorApp5

### Initial number of instances: 5

In [6]:
# Sampling instances
np.random.seed(220)
n_instances = 5
df_sampled = df.sample(n_instances)

instances = [x for x in df_sampled['text'].values]

In [7]:
tg = PosNegTemplateGeneratorApp5(model, models)
templates = tg.generate_templates(instances, n_masks=2, ranked_words_count=4, min_classification_score=0.8)

Converting texts to sentences...
:: 6 sentences were generated.
Ranking words using Replace-1 Score...




:: Word ranking done.
Filtering instances by contaning a minimmum of words: 5...
:: 6 sentences remaining.
Filtering instances by relevant words...
:: 3 sentences remaining.
Filtering instances by relevant words classification score greater than 0.8
:: 2 sentences remaining.
Predicting inputs...
:: Sentence predictions done.


#### Execution time for 5 instances: 1m 15.8s
filipe: 41.6s

In [8]:
tg.to_dataframe()

| | label | original_text | masked_text | template_text |
|---|---|---|---|---|
| 0 | 0 | the rules of attraction gets us too drunk on the party favors to sober us up with the transparent attempts at moralizing . | the rules of attraction {mask} us too {mask} on the party favors to sober us up with the transparent attempts at moralizing . | the rules of attraction {pos_verb} us too {neg_verb} on the party favors to sober us up with the transparent attempts at moralizing . |
| 1 | 0 | the high-concept scenario soon proves preposterous , the acting is robotically italicized , and truth-in-advertising hounds take note : there's very little hustling on view . | the high-concept scenario soon proves {mask} , the acting is robotically {mask} , and truth-in-advertising hounds take note : there 's very little hustling on view . | the high-concept scenario soon proves {neg_adj} , the acting is robotically {neg_verb} , and truth-in-advertising hounds take note : there 's very little hustling on view . |


In [9]:
tg.lexicons

{'pos_verb': ['gets'],
 'neg_verb': ['italicized', 'drunk'],
 'pos_adj': [],
 'neg_adj': ['preposterous']}

### Initial number of instances: 100

In [10]:
# Using all 100 instances
instances = [x for x in df['text'].values]

In [11]:
%%time
# 1m 9.1s
tg = PosNegTemplateGeneratorApp5(model, models)
templates = tg.generate_templates(instances, n_masks=2, ranked_words_count=4, min_classification_score=0.8)

Converting texts to sentences...
:: 138 sentences were generated.
Ranking words using Replace-1 Score...




:: Word ranking done.
Filtering instances by contaning a minimmum of words: 5...
:: 125 sentences remaining.
Filtering instances by relevant words...
:: 48 sentences remaining.
Filtering instances by relevant words classification score greater than 0.8
:: 25 sentences remaining.
Predicting inputs...
:: Sentence predictions done.
CPU times: user 9min 32s, sys: 185 ms, total: 9min 32s
Wall time: 57.4 s


#### Execution time for 100 instances: 1m 5.6s

In [12]:
tg.to_dataframe()

| | label | original_text | masked_text | template_text |
|---|---|---|---|---|
| 0 | 1 | unpretentious , charming , quirky , original | {mask} , charming , quirky , {mask} | {neg_adj} , charming , quirky , {neg_adj} |
| 1 | 0 | we started to wonder if some unpaid intern had just typed 'chris rock , ' 'anthony hopkins' and 'terrorists' into some univac-like script machine . | we {mask} to {mask} if some unpaid intern had just typed 'chris rock , ' 'anthony hopkins ' and 'terrorists ' into some univac-like script machine . | we {neg_verb} to {neg_verb} if some unpaid intern had just typed 'chris rock , ' 'anthony hopkins ' and 'terrorists ' into some univac-like script machine . |
| 2 | 0 | what ensues are much blood-splattering , mass drug-induced bowel evacuations , and none-too-funny commentary on the cultural distinctions between americans and brits . | what ensues {mask} much blood-splattering , mass drug-induced bowel evacuations , and {mask} commentary on the cultural distinctions between americans and brits . | what ensues {neg_verb} much blood-splattering , mass drug-induced bowel evacuations , and {neg_adj} commentary on the cultural distinctions between americans and brits . |
| 3 | 1 | the stunt work is top-notch ; the dialogue and drama often food-spittingly funny . | the stunt work is {mask} ; the dialogue and drama often food-spittingly {mask} . | the stunt work is {pos_adj} ; the dialogue and drama often food-spittingly {pos_adj} . |
| 4 | 1 | an original and highly cerebral examination of the psychopathic mind | an {mask} and highly cerebral examination of the {mask} mind | an {neg_adj} and highly cerebral examination of the {neg_adj} mind |
| 5 | 0 | death to smoochy tells a moldy-oldie , not-nearly -as-nasty -as-it- thinks-it-is joke . | death to {mask} tells a moldy-oldie , not-nearly {mask} -as-it- thinks-it-is joke . | death to {neg_verb} tells a moldy-oldie , not-nearly {neg_adj} -as-it- thinks-it-is joke . |
| 6 | 0 | a rip-off twice removed , modeled after [seagal's] earlier copycat under siege , sometimes referred to as die hard on a boat . | a rip-off twice {mask} , {mask} after [ seagal 's ] earlier copycat under siege , sometimes referred to as die hard on a boat . | a rip-off twice {neg_verb} , {neg_verb} after [ seagal 's ] earlier copycat under siege , sometimes referred to as die hard on a boat . |
| 7 | 1 | what might have been readily dismissed as the tiresome rant of an aging filmmaker still thumbing his nose at convention takes a surprising , subtle turn at the midway point . | what might have been readily {mask} as the {mask} rant of an aging filmmaker still thumbing his nose at convention takes a surprising , subtle turn at the midway point . | what might have been readily {neg_verb} as the {neg_adj} rant of an aging filmmaker still thumbing his nose at convention takes a surprising , subtle turn at the midway point . |
| 8 | 0 | too bad none of it is funny . | too {mask} none of it is {mask} . | too {neg_adj} none of it is {pos_adj} . |
| 9 | 0 | the dialogue is cumbersome , the simpering soundtrack and editing more so . | the dialogue is cumbersome , the {mask} soundtrack and {mask} more so . | the dialogue is cumbersome , the {neg_verb} soundtrack and {pos_verb} more so . |


In [13]:
print(tg.lexicons)

{'pos_verb': ['combines', 'gets', 'editing', 'talking', 'engrossing', 'is'], 'neg_verb': ['started', 'dismissed', 'drunk', 'modeled', 'italicized', 'undermines', 'removed', 'was', 'are', 'wonder', 'simpering', 'smoochy', 'can', 'hoping'], 'pos_adj': ['powerful', 'phenomenal', 'funny', 'imitative', 'successful', 'top-notch', 'wonderful', 'timeless'], 'neg_adj': ['psychopathic', 'sappy', 'none-too-funny', 'bad', 'manipulative', 'short', 'faux-urban', 'credulous', 'mumbo', 'previous', 'original', '-as-nasty', 'preposterous', 'hotter-two-years-ago', 'tiresome', 'same', 'unpretentious']}


# Using the templates generated by the TemplateGenerator in CheckList

In [14]:
from checklist.editor import Editor
from checklist.test_suite import TestSuite
from checklist.test_types import MFT

In [15]:
lexicons = tg.lexicons
templates = tg.template_texts
masked = tg.masked_texts
labels = [sent.prediction.label for sent in tg.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [16]:
for i, (template, label) in enumerate(zip(templates, labels)):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    # The test name keeps its original spelling because later cells look tests up by it
    suite.add(MFT(
        data=t.data,
        labels=int(label),
        capability="Vocabulary",
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabulary"))
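Each call to `editor.template` expands a template over the lexicons. The example counts printed when the suite runs are consistent with a Cartesian product over the distinct placeholders, where a placeholder that appears twice receives the same word both times: the first template in the 100-instance table repeats `{neg_adj}` (17 lexicon entries) and the first test predicts 17 examples, while combining `{neg_verb}` (14 entries) with `{neg_adj}` gives 238. The function below is a simplified re-implementation of that behavior for illustration only; it is not CheckList's code and ignores options such as `remove_duplicates`.

```python
import itertools
import re

TAG = re.compile(r"\{(\w+)\}")


def expand_template(template, lexicons):
    """Fill every distinct {tag} with each lexicon entry (Cartesian product).

    Repeated occurrences of the same tag receive the same word, which is the
    behaviour the example counts in this notebook suggest for CheckList.
    """
    tags = list(dict.fromkeys(TAG.findall(template)))  # distinct tags, in order
    filled = []
    for values in itertools.product(*(lexicons[tag] for tag in tags)):
        mapping = dict(zip(tags, values))
        filled.append(TAG.sub(lambda m: mapping[m.group(1)], template))
    return filled


if __name__ == "__main__":
    lexicons = {"neg_adj": ["bad", "tiresome"], "neg_verb": ["removed", "modeled", "was"]}
    print(len(expand_template("{neg_adj} , charming , quirky , {neg_adj}", lexicons)))  # 2
    print(len(expand_template("what ensues {neg_verb} much , and {neg_adj} commentary", lexicons)))  # 6
```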

In [17]:
suite.run(model.predict, overwrite=True)
suite.save('./suites/posneg-approach5.suite')

Running Test: MFT with vocabullary - template1
Predicting 17 examples




Running Test: MFT with vocabullary - template2
Predicting 14 examples
Running Test: MFT with vocabullary - template3
Predicting 238 examples
Running Test: MFT with vocabullary - template4
Predicting 8 examples
Running Test: MFT with vocabullary - template5
Predicting 17 examples
Running Test: MFT with vocabullary - template6
Predicting 238 examples
Running Test: MFT with vocabullary - template7
Predicting 14 examples
Running Test: MFT with vocabullary - template8
Predicting 238 examples
Running Test: MFT with vocabullary - template9
Predicting 136 examples
Running Test: MFT with vocabullary - template10
Predicting 84 examples
Running Test: MFT with vocabullary - template11
Predicting 8 examples
Running Test: MFT with vocabullary - template12
Predicting 48 examples
Running Test: MFT with vocabullary - template13
Predicting 6 examples
Running Test: MFT with vocabullary - template14
Predicting 136 examples
Running Test: MFT with vocabullary - template15
Predicting 84 examples
Running Test

# Loading the test suite

In [18]:
from checklist.test_suite import TestSuite
suite = TestSuite.from_file('./suites/posneg-approach5.suite')

# suite.visual_summary_table()

In [19]:
passed = 0
failed = 0
for test_name in suite.tests:
    table = suite.visual_summary_by_test(test_name)
    
    failed += table.stats['nfailed']    
    passed += table.stats['npassed']
    assert table.stats['nfailed'] + table.stats['npassed'] == len(table.filtered_testcases)

print(f"{failed = } ({(failed/(passed+failed))*100:.2f}%)")
print(f"{passed = } ({(passed/(passed+failed))*100:.2f}%)")
print(f"total = {passed+failed}")
print("templates:", len(suite.tests))

failed = 372 (16.47%)
passed = 1887 (83.53%)
total = 2259
templates: 25
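An MFT expects every example generated from a template to receive that template's label, so the percentages above are the share of mismatching predictions. The helper below is a simplified sketch of that bookkeeping, assuming one example per test case (no aggregation) and plain integer labels; CheckList computes these statistics itself. For instance, 372 failures out of 2259 cases gives the 16.47% printed above.

```python
from typing import Iterable, Sequence, Tuple


def mft_counts(predictions: Sequence[int], expected: int) -> Tuple[int, int]:
    """Return (failed, passed) for one template's generated examples."""
    failed = sum(1 for p in predictions if p != expected)
    return failed, len(predictions) - failed


def summarize(per_template: Iterable[Tuple[Sequence[int], int]]) -> Tuple[int, int, float]:
    """Aggregate (predictions, expected_label) pairs into (failed, passed, failure %)."""
    failed = passed = 0
    for predictions, expected in per_template:
        f, p = mft_counts(predictions, expected)
        failed += f
        passed += p
    total = failed + passed
    return failed, passed, (100.0 * failed / total if total else 0.0)


if __name__ == "__main__":
    # Two made-up templates: one expecting label 1, one expecting label 0
    results = [([1, 1, 0, 1], 1), ([0, 0, 0], 0)]
    failed, passed, pct = summarize(results)
    print(f"failed = {failed} ({pct:.2f}%)")  # one mismatch out of 7 examples
    print(f"{100 * 372 / 2259:.2f}%")         # failure rate reported above
```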


In [20]:
table = suite.visual_summary_by_test('Test: MFT with vocabullary - template3')

for item in table.candidate_testcases:
    print(item['examples'][0]['new']['text'])

one of the smarter mesmerizing the horror genre missing produced in recent memory , even if it 's far tamer than advertised .
one of the smarter becoming the horror genre overacted produced in recent memory , even if it 's far tamer than advertised .
one of the smarter becoming the horror genre bother produced in recent memory , even if it 's far tamer than advertised .
one of the smarter becoming the horror genre missing produced in recent memory , even if it 's far tamer than advertised .
one of the smarter is the horror genre overacted produced in recent memory , even if it 's far tamer than advertised .
one of the smarter is the horror genre bother produced in recent memory , even if it 's far tamer than advertised .
one of the smarter is the horror genre forced produced in recent memory , even if it 's far tamer than advertised .
one of the smarter is the horror genre missing produced in recent memory , even if it 's far tamer than advertised .
one of the smarter provides the horr

In [20]:
passed = 0
failed = 0
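for i in range(len(templates)):
    table = suite.visual_summary_by_test(f'Test: MFT with vocabullary - template{i+1}')
    # Caution: the assertion in the previous cell shows that filtered_testcases holds
    # both passed and failed cases, so `passed` below may also count failures
    failed = failed + len(table.candidate_testcases)
    passed = passed + len(table.filtered_testcases)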

print(f"{failed=}", f"{passed=}", f"{passed+failed=}", sep="\n")

failed=639
passed=4110
passed+failed=4749

