# Approach 4

Approach 4 is used here to generate templates, with a focus on positive and negative templates. One possible application is testing the *Vocabulary* linguistic capability with an **MFT** (Minimum Functionality Test).

The steps of this approach are:

1. Classify the instances using the *Oracle*
2. Keep only the instances classified unanimously
3. Split each instance into sentences
4. Classify the sentences using the *Oracle*
5. Keep only the sentences classified unanimously
6. Keep only the sentences with high prediction confidence
7. Rank the words of each sentence
8. Keep only the sentences whose relevant words (verbs or adjectives) are well ranked
9. Keep only the sentences with high prediction confidence for the relevant words
10. Replace the relevant words with masks
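Steps 1, 2, 5 and 6 apply the same filter at two granularities (whole instances, then sentences): keep a text only if all oracle models agree on its label and, for sentences, only if that prediction is confident. The plain-Python sketch below illustrates the idea. The oracle interface, the toy keyword models and the rule that *every* oracle must exceed the threshold (strictly, as in the log message "greater than 0.8") are assumptions for illustration; `PosNegTemplateGeneratorApp4` may aggregate the scores differently.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical oracle interface: text -> (predicted label, class probabilities).
Oracle = Callable[[str], Tuple[int, Sequence[float]]]


def filter_unanimous(texts: List[str], oracles: List[Oracle],
                     min_score: Optional[float] = None) -> List[Tuple[str, int]]:
    """Return (text, label) pairs on which every oracle predicts the same label.

    If min_score is given, every oracle must also give that label a
    probability strictly greater than min_score (an assumed rule).
    """
    kept = []
    for text in texts:
        outputs = [oracle(text) for oracle in oracles]
        labels = {label for label, _ in outputs}
        if len(labels) != 1:
            continue  # at least one oracle disagrees
        label = labels.pop()
        if min_score is not None and not all(
                probs[label] > min_score for _, probs in outputs):
            continue  # unanimous, but not confident enough
        kept.append((text, label))
    return kept


# Toy keyword-based oracles standing in for the fine-tuned models.
def oracle_a(text):
    return (1, [0.05, 0.95]) if "charming" in text else (0, [0.9, 0.1])


def oracle_b(text):
    if "charming" in text:
        return (1, [0.3, 0.7])   # agrees with oracle_a, but only at 0.7
    if "dull" in text:
        return (0, [0.85, 0.15])
    return (1, [0.4, 0.6])       # disagrees with oracle_a


texts = ["a charming film", "a dull film", "a long film"]
print(filter_unanimous(texts, [oracle_a, oracle_b]))
# [('a charming film', 1), ('a dull film', 0)]
print(filter_unanimous(texts, [oracle_a, oracle_b], min_score=0.8))
# [('a dull film', 0)]
```

Without a threshold only the disagreement on "a long film" removes a text; with `min_score=0.8` the low-confidence agreement on "a charming film" is dropped as well.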

In [1]:
%config Completer.use_jedi = False
import sys
sys.path.append('../')

## Loading the dataset, the target model, and the auxiliary models

In [2]:
import pandas as pd
from datasets import load_dataset

pd.set_option('display.max_colwidth', None)

dataset = load_dataset("rotten_tomatoes")
dataset.set_format("pandas")
df = dataset["test"].shuffle(seed=42)[:100]
df

|     | text | label |
|-----|------|-------|
| 0 | unpretentious , charming , quirky , original | 1 |
| 1 | a film really has to be exceptional to justify a three hour running time , and this isn't . | 0 |
| 2 | working from a surprisingly sensitive script co-written by gianni romoli . . . ozpetek avoids most of the pitfalls you'd expect in such a potentially sudsy set-up . | 1 |
| 3 | it may not be particularly innovative , but the film's crisp , unaffected style and air of gentle longing make it unexpectedly rewarding . | 1 |
| 4 | such a premise is ripe for all manner of lunacy , but kaufman and gondry rarely seem sure of where it should go . | 0 |
| ... | ... | ... |
| 95 | ice age is the first computer-generated feature cartoon to feel like other movies , and that makes for some glacial pacing early on . | 0 |
| 96 | there's no denying that burns is a filmmaker with a bright future ahead of him . | 1 |
| 97 | it collapses when mr . taylor tries to shift the tone to a thriller's rush . | 0 |
| 98 | there's a great deal of corny dialogue and preposterous moments . and yet , it still works . | 1 |


In [3]:
import re
import numpy as np
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def pre_proccess(text):
    text = text.lower()
    text = re.sub(r'["\',!-.:-@0-9/]', ' ', text)  # punctuation and digits -> spaces
    return text

# Wrapper that returns (labels, probabilities), the output format CheckList's suite.run expects
class SentimentAnalisysModelWrapper:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        
    def __predict(self, text_input):
        text_preprocessed = pre_proccess(text_input)
        tokenized = self.tokenizer(text_preprocessed, padding=True, truncation=True, max_length=512, 
                                    add_special_tokens = True, return_tensors="pt")
        
        tensor_logits = self.model(**tokenized)
        prob = softmax(tensor_logits[0], dim=-1).detach().numpy()  # explicit dim avoids the deprecation warning
        pred = np.argmax(prob)
        
        return pred, prob
    
    def predict_label(self, text_inputs):
        return self.predict(text_inputs)[0]
        
    def predict_proba(self, text_inputs):
        return self.predict(text_inputs)[1]
        
    def predict(self, text_inputs):
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]
        
        preds = []
        probs = []

        for text_input in text_inputs:
            pred, prob = self.__predict(text_input)
            preds.append(pred)
            probs.append(prob[0])

        return np.array(preds), np.array(probs) # ([0, 1], [[0.99, 0.01], [0.03, 0.97]])

# Helper function that loads a Hugging Face model and its tokenizer and wraps them
def load_model(model_name):
    print(f'Loading model {model_name}...')
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    return SentimentAnalisysModelWrapper(model, tokenizer)

# Hugging Face hosted model names 
rotten_tomatoes_models = {
    'bert': 'textattack/bert-base-uncased-rotten-tomatoes', 
    'albert': 'textattack/albert-base-v2-rotten-tomatoes', 
    'distilbert': 'textattack/distilbert-base-uncased-rotten-tomatoes', 
    'roberta': 'textattack/roberta-base-rotten-tomatoes', 
    'xlnet': 'textattack/xlnet-base-cased-rotten-tomatoes', 
}
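The character class in `pre_proccess` covers the ranges `!-.` and `:-@`, so apostrophes and hyphens are replaced along with commas, colons and digits: a contraction such as "isn't" reaches the tokenizer as "isn t". The trailing `()` in the original pattern is an empty group and does not change what is matched. The quick check below reproduces the preprocessing outside the wrapper; the `preprocess` helper name is ours.

```python
import re

# Character class used by pre_proccess: ranges !-. and :-@ cover most ASCII
# punctuation (including ' and -), 0-9 covers digits, / is listed explicitly.
ORIGINAL = '["\',!-.:-@0-9/]()'   # as written, with a trailing empty group
CLEANED = r'["\',!-.:-@0-9/]'     # same matches, without the empty group


def preprocess(text, pattern=CLEANED):
    # Lowercase, then replace every matched character with a space.
    return re.sub(pattern, ' ', text.lower())


sample = "It isn't 3 hours: it's top-notch!"
print(preprocess(sample).split())
# ['it', 'isn', 't', 'hours', 'it', 's', 'top', 'notch']
print(preprocess(sample) == preprocess(sample, ORIGINAL))
# True
```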

In [4]:
m1 = load_model(rotten_tomatoes_models['albert'])
m2 = load_model(rotten_tomatoes_models['distilbert'])
m3 = load_model(rotten_tomatoes_models['roberta'])
m4 = load_model(rotten_tomatoes_models['xlnet'])

# Models used as the oracle
models = [m1, m2, m3, m4]
# Target model
model = load_model(rotten_tomatoes_models['bert'])

Loading model textattack/albert-base-v2-rotten-tomatoes...
Loading model textattack/distilbert-base-uncased-rotten-tomatoes...
Loading model textattack/roberta-base-rotten-tomatoes...


Some weights of the model checkpoint at textattack/roberta-base-rotten-tomatoes were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Loading model textattack/xlnet-base-cased-rotten-tomatoes...
Loading model textattack/bert-base-uncased-rotten-tomatoes...


# Generating the templates
The word-ranking method used by `PosNegTemplateGenerator` is the Replace-1 Score.
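In the formulation popularized by DeepWordBug (Gao et al., 2018), the Replace-1 Score of a word is the change in the model's output when only that word is replaced by a placeholder. The sketch below ranks words by the drop in the predicted-class probability. The `[UNK]` placeholder, the whitespace tokenization and the toy model are assumptions; this is not necessarily how `PosNegTemplateGenerator` computes the score internally.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed interface: any classifier mapping a text to class probabilities.
ProbaFn = Callable[[str], Sequence[float]]


def replace1_scores(sentence: str, predict_proba: ProbaFn,
                    replacement: str = "[UNK]") -> List[Tuple[str, float]]:
    """Rank words by how much replacing each one lowers P(predicted class)."""
    words = sentence.split()
    base = predict_proba(sentence)
    label = max(range(len(base)), key=lambda k: base[k])  # predicted class
    scores = []
    for i, word in enumerate(words):
        # Replace only the i-th word and keep the rest of the sentence intact.
        perturbed = " ".join(words[:i] + [replacement] + words[i + 1:])
        scores.append((word, base[label] - predict_proba(perturbed)[label]))
    # Largest probability drop first; sorted() is stable for ties.
    return sorted(scores, key=lambda ws: ws[1], reverse=True)


def toy_proba(text):
    # Toy sentiment model: each of two "positive" words adds 0.3 to P(positive).
    n_pos = sum(w in {"charming", "original"} for w in text.split())
    p = min(0.2 + 0.3 * n_pos, 1.0)
    return [1 - p, p]


ranked = replace1_scores("unpretentious charming quirky original", toy_proba)
print([word for word, _ in ranked])
# ['charming', 'original', 'unpretentious', 'quirky']
```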

In [5]:
from template_generator_static_lex.tasks.sentiment_analisys import PosNegTemplateGeneratorApp4

tg = PosNegTemplateGeneratorApp4(model, models)

### Initial number of instances: 5

In [6]:
# Sampling instances
np.random.seed(220)
n_instances = 5
df_sampled = df.sample(n_instances)

instances = df_sampled['text'].tolist()

In [7]:
templates = tg.generate_templates(instances, ranked_words_count=4, min_classification_score=0.8)

Predicting inputs...
:: Instance predictions done.
Filtering instances classified unanimously...
:: 5 instances remaining.
Converting texts to sentences...
:: 5 sentences were generated.
Predicting inputs...
:: Sentence predictions done.
Filtering instances classified unanimously...
:: 5 sentences remaining.
Filtering instances by classification score greater than 0.8
:: 5 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
Filtering instances by relevant words...
:: 3 sentences remaining.
Filtering instances by relevant words classification score greater than 0.8
:: 2 sentences remaining.
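Step 8 keeps a sentence only if some of its best-ranked words are verbs or adjectives, the words that later become `{pos_verb}`, `{neg_adj}` and the other placeholders. A minimal sketch, assuming the POS tags are already available (the tagger used by the generator is not shown here) and reading `ranked_words_count` as the number of top-ranked words inspected; both points are our assumptions, and the sentence dictionaries below are hypothetical.

```python
from typing import Dict, List, Tuple

RELEVANT_POS = {"VERB", "ADJ"}  # assumed Universal POS tag names


def top_relevant_words(ranked_words: List[str], pos_tags: Dict[str, str],
                       ranked_words_count: int = 4) -> List[str]:
    """Verbs/adjectives found among the `ranked_words_count` best-ranked words."""
    return [w for w in ranked_words[:ranked_words_count]
            if pos_tags.get(w) in RELEVANT_POS]


def filter_by_relevant_words(sentences: List[dict],
                             ranked_words_count: int = 4) -> List[Tuple[str, List[str]]]:
    """Keep sentences with at least one relevant word near the top of the ranking.

    Each sentence is a dict with hypothetical keys "text", "ranked" (words,
    best first) and "pos" (word -> POS tag).
    """
    kept = []
    for s in sentences:
        words = top_relevant_words(s["ranked"], s["pos"], ranked_words_count)
        if words:
            kept.append((s["text"], words))
    return kept


sentences = [
    {"text": "the dialogue is cumbersome",
     "ranked": ["cumbersome", "dialogue", "the", "is"],
     "pos": {"cumbersome": "ADJ", "dialogue": "NOUN", "the": "DET", "is": "AUX"}},
    {"text": "a rip-off twice removed",
     "ranked": ["rip-off", "twice", "a", "removed"],
     "pos": {"rip-off": "NOUN", "twice": "ADV", "a": "DET", "removed": "VERB"}},
]
print(filter_by_relevant_words(sentences, ranked_words_count=2))
# [('the dialogue is cumbersome', ['cumbersome'])]
```

With the default of 4, the second sentence would also survive, because "removed" is then inside the inspected window.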


#### Execution time for 5 instances: 1m 23.9s
filipe: 43.6s

In [8]:
tg.to_dataframe()


|     | label | original_text | masked_text | template_text |
|-----|-------|---------------|-------------|---------------|
| 0 | 0 | the rules of attraction gets us too drunk on the party favors to sober us up with the transparent attempts at moralizing . | the rules of attraction {mask} us too {mask} on the party favors to sober us up with the transparent attempts at moralizing . | the rules of attraction {pos_verb} us too {neg_adj} on the party favors to sober us up with the transparent attempts at moralizing . |
| 1 | 0 | the high-concept scenario soon proves preposterous , the acting is robotically italicized , and truth-in-advertising hounds take note : there's very little hustling on view . | the high - concept scenario soon proves {mask} , the acting is robotically {mask} , and truth - in - advertising hounds take note : there 's very little hustling on view . | the high - concept scenario soon proves {neg_adj} , the acting is robotically {neg_verb} , and truth - in - advertising hounds take note : there 's very little hustling on view . |
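The `masked_text` and `template_text` columns differ only in what replaces each selected word: a generic `{mask}` or a typed placeholder built from the word's polarity and part of speech. The sketch below reproduces that transformation for the first row. The polarity and POS labels are taken as given (how the generator assigns them is not shown here), and appending each replaced word to the lexicon of its placeholder is an assumption, consistent with `gets` appearing under `pos_verb` and `drunk` under `neg_adj` in `tg.lexicons`. The `build_template` name is hypothetical.

```python
from typing import Dict, List, Tuple


def build_template(tokens: List[str], relevant: Dict[int, Tuple[str, str]],
                   lexicons: Dict[str, List[str]]) -> Tuple[str, str]:
    """Mask the relevant tokens and replace them with typed placeholders.

    `relevant` maps a token index to (polarity, pos), e.g. ("neg", "adj").
    Each replaced word is also appended to the lexicon of its placeholder.
    """
    masked, template = [], []
    for i, tok in enumerate(tokens):
        if i in relevant:
            polarity, pos = relevant[i]
            key = f"{polarity}_{pos}"       # e.g. "pos_verb"
            masked.append("{mask}")
            template.append(f"{{{key}}}")   # literal braces around the key
            lexicons.setdefault(key, [])
            if tok not in lexicons[key]:
                lexicons[key].append(tok)
        else:
            masked.append(tok)
            template.append(tok)
    return " ".join(masked), " ".join(template)


lexicons: Dict[str, List[str]] = {}
tokens = "the rules of attraction gets us too drunk on the party favors".split()
masked, template = build_template(tokens, {4: ("pos", "verb"), 7: ("neg", "adj")}, lexicons)
print(masked)    # the rules of attraction {mask} us too {mask} on the party favors
print(template)  # the rules of attraction {pos_verb} us too {neg_adj} on the party favors
print(lexicons)  # {'pos_verb': ['gets'], 'neg_adj': ['drunk']}
```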


In [9]:
tg.lexicons

{'pos_verb': ['carries',
  'loved',
  'evoked',
  'gets',
  'shares',
  'celebrating',
  'enjoy'],
 'neg_verb': ['squanders',
  'removed',
  'withered',
  'undermines',
  'dismissed',
  'simpering',
  'avoids',
  'italicized'],
 'pos_adj': ['powerful',
  'timeless',
  'notch',
  'worth',
  'brilliant',
  'engrossing',
  'terrific',
  'funny',
  'successful',
  'passionate',
  'breathtaking',
  'tender',
  'phenomenal',
  'wonderful'],
 'neg_adj': ['undeterminable',
  'tiresome',
  'manipulative',
  'unpretentious',
  'sensitive',
  'off',
  'preposterous',
  'dumb',
  'stupid',
  'psychopathic',
  'drunk',
  'nasty',
  'cumbersome',
  'bad']}

### Initial number of instances: 100

In [10]:
# Using all 100 instances
instances = df['text'].tolist()

In [11]:
%%time
# 1m 35.7s
tg = PosNegTemplateGeneratorApp4(model, models)
templates = tg.generate_templates(instances, ranked_words_count=4, min_classification_score=0.8)

Predicting inputs...
:: Instance predictions done.
Filtering instances classified unanimously...
:: 74 instances remaining.
Converting texts to sentences...
:: 76 sentences were generated.
Predicting inputs...
:: Sentence predictions done.
Filtering instances classified unanimously...
:: 76 sentences remaining.
Filtering instances by classification score greater than 0.8
:: 70 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
Filtering instances by relevant words...
:: 32 sentences remaining.
Filtering instances by relevant words classification score greater than 0.8
:: 19 sentences remaining.
CPU times: user 11min 29s, sys: 908 ms, total: 11min 30s
Wall time: 1min 10s


#### Execution time for 100 instances: 1m 10.4s
filipe: 1m 10.4s

In [12]:
tg.to_dataframe()

|     | label | original_text | masked_text | template_text |
|-----|-------|---------------|-------------|---------------|
| 0 | 1 | the stunt work is top-notch ; the dialogue and drama often food-spittingly funny . | the stunt work is top - {mask} ; the dialogue and drama often food - spittingly {mask} . | the stunt work is top - {pos_adj} ; the dialogue and drama often food - spittingly {pos_adj} . |
| 1 | 1 | an original and highly cerebral examination of the psychopathic mind | an {mask} and highly cerebral examination of the {mask} mind | an {neg_adj} and highly cerebral examination of the {neg_adj} mind |
| 2 | 0 | a rip-off twice removed , modeled after [seagal's] earlier copycat under siege , sometimes referred to as die hard on a boat . | a rip - off twice {mask} , {mask} after [ seagal 's ] earlier copycat under siege , sometimes referred to as die hard on a boat . | a rip - off twice {neg_verb} , {neg_verb} after [ seagal 's ] earlier copycat under siege , sometimes referred to as die hard on a boat . |
| 3 | 0 | the dialogue is cumbersome , the simpering soundtrack and editing more so . | the dialogue is {mask} , the {mask} soundtrack and editing more so . | the dialogue is {neg_adj} , the {neg_verb} soundtrack and editing more so . |
| 4 | 1 | an engrossing story that combines psychological drama , sociological reflection , and high-octane thriller . | an {mask} story that {mask} psychological drama , sociological reflection , and high - octane thriller . | an {pos_adj} story that {pos_verb} psychological drama , sociological reflection , and high - octane thriller . |
| 5 | 1 | in imax in short , it's just as wonderful on the big screen . | in imax in {mask} , it 's just as {mask} on the big screen . | in imax in {neg_adj} , it 's just as {pos_adj} on the big screen . |
| 6 | 0 | the rules of attraction gets us too drunk on the party favors to sober us up with the transparent attempts at moralizing . | the rules of attraction {mask} us too {mask} on the party favors to sober us up with the transparent attempts at moralizing . | the rules of attraction {pos_verb} us too {neg_adj} on the party favors to sober us up with the transparent attempts at moralizing . |
| 7 | 1 | manages to accomplish what few sequels can -- it equals the original and in some ways even betters it . | {mask} to accomplish what few sequels can -- it equals the {mask} and in some ways even betters it . | {pos_verb} to accomplish what few sequels can -- it equals the {neg_adj} and in some ways even betters it . |
| 8 | 0 | " one look at a girl in tight pants and big tits and you turn stupid ? " um . . isn't that the basis for the entire plot ? | " one look at a girl in tight pants and big tits and you turn {mask} ? " um . . is n't that the basis for the {mask} plot ? | " one look at a girl in tight pants and big tits and you turn {neg_adj} ? " um . . is n't that the basis for the {neg_adj} plot ? |
| 9 | 0 | charly comes off as emotionally manipulative and sadly imitative of innumerable past love story derisions . | charly comes off as emotionally {mask} and sadly {mask} of innumerable past love story derisions . | charly comes off as emotionally {neg_adj} and sadly {pos_adj} of innumerable past love story derisions . |


In [13]:
tg.lexicons

{'pos_verb': ['carries',
  'loved',
  'evoked',
  'gets',
  'manages',
  'shares',
  'combines',
  'celebrating',
  'enjoy'],
 'neg_verb': ['squanders',
  'sum',
  'removed',
  'withered',
  'modeled',
  'spare',
  'undermines',
  'dismissed',
  'simpering',
  'avoids',
  'italicized'],
 'pos_adj': ['powerful',
  'timeless',
  'notch',
  'worth',
  'unconditional',
  'brilliant',
  'engrossing',
  'terrific',
  'funny',
  'imitative',
  'successful',
  'passionate',
  'breathtaking',
  'tender',
  'phenomenal',
  'wonderful'],
 'neg_adj': ['sensitive',
  'preposterous',
  'stupid',
  'entire',
  'bad',
  'manipulative',
  'off',
  'cumbersome',
  'dumb',
  'undeterminable',
  'tiresome',
  'drunk',
  'nasty',
  'short',
  'original',
  'wannabe',
  'psychopathic',
  'previous',
  'unpretentious']}

# Using the templates generated by the TemplateGenerator in CheckList

In [14]:
from checklist.editor import Editor
from checklist.test_suite import TestSuite
from checklist.test_types import MFT

In [15]:
lexicons = tg.lexicons
templates = tg.template_texts
masked = tg.masked_texts
labels = [sent.prediction.label for sent in tg.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [16]:
for i, (template, label) in enumerate(zip(templates, labels)):
    # Expand the template with the lexicons; every example keeps the sentence's label
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=int(label),
        capability="Vocabullary", 
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary"))
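CheckList's `Editor.template` fills each placeholder from the registered lexicons. The simplified stand-in below is not CheckList's implementation: it assumes that a placeholder repeated in a template receives the same value in every slot and that `remove_duplicates` removes nothing. Under those assumptions, a template yields the product of the lexicon sizes of its distinct placeholders, which agrees with the counts printed by `suite.run` in the 100-instance run: with 9 `pos_verb`, 11 `neg_verb`, 16 `pos_adj` and 19 `neg_adj` words, a template with `{neg_adj}` and `{neg_verb}` gives 19 × 11 = 209 examples, one with `{pos_verb}` and `{neg_adj}` gives 9 × 19 = 171, and one that repeats `{pos_adj}` gives 16. The `expand` name is hypothetical.

```python
import itertools
import re
from typing import Dict, List


def expand(template: str, lexicons: Dict[str, List[str]]) -> List[str]:
    """Fill each distinct placeholder with every lexicon entry (cartesian product).

    A placeholder that occurs twice receives the same value in both slots.
    """
    keys = list(dict.fromkeys(re.findall(r"\{(\w+)\}", template)))  # distinct, in order
    return [template.format(**dict(zip(keys, values)))
            for values in itertools.product(*(lexicons[k] for k in keys))]


lexicons = {"pos_verb": ["loved", "gets"], "neg_adj": ["dumb", "bad", "stupid"]}
two_keys = expand("it {pos_verb} us too {neg_adj} .", lexicons)
same_key = expand("an {neg_adj} and cerebral look at the {neg_adj} mind", lexicons)
print(len(two_keys), two_keys[0])  # 6 it loved us too dumb .
print(len(same_key), same_key[0])  # 3 an dumb and cerebral look at the dumb mind
```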

In [17]:
suite.run(model.predict, overwrite=True)
suite.save('./suites/posneg-approach4.suite')

Running Test: MFT with vocabullary - template1
Predicting 16 examples
Running Test: MFT with vocabullary - template2
Predicting 19 examples
Running Test: MFT with vocabullary - template3
Predicting 11 examples
Running Test: MFT with vocabullary - template4
Predicting 209 examples
Running Test: MFT with vocabullary - template5
Predicting 144 examples
Running Test: MFT with vocabullary - template6
Predicting 304 examples
Running Test: MFT with vocabullary - template7
Predicting 171 examples
Running Test: MFT with vocabullary - template8
Predicting 171 examples
Running Test: MFT with vocabullary - template9
Predicting 19 examples
Running Test: MFT with vocabullary - template10
Predicting 304 examples
Running Test: MFT with vocabullary - template11
Predicting 304 examples
Running Test: MFT with vocabullary - template12
Predicting 209 examples
Running Test: MFT with vocabullary - template13
Predicting 209 examples
Running Test: MFT with vocabullary - template14
Predicting 304 examples
Running Test: MFT with vocabullary - template15
Predicting 176 examples
Run

# Loading the test suite

In [18]:
from checklist.test_suite import TestSuite
suite = TestSuite.from_file('./suites/posneg-approach4.suite')

# suite.visual_summary_table()

In [19]:
passed = 0
failed = 0
for test_name in suite.tests:
    table = suite.visual_summary_by_test(test_name)
    
    failed += table.stats['nfailed']    
    passed += table.stats['npassed']
    assert table.stats['nfailed'] + table.stats['npassed'] == len(table.filtered_testcases)

print(f"{failed = } ({(failed/(passed+failed))*100:.2f}%)")
print(f"{passed = } ({(passed/(passed+failed))*100:.2f}%)")
print(f"total = {passed+failed}")
print("templates:", len(suite.tests))

failed = 468 (14.44%)
passed = 2774 (85.56%)
total = 3242
templates: 19
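In an MFT, every example generated from a template carries that template's expected label (here, the label of the original sentence), and an example fails when the target model predicts a different label. The plain-Python sketch below mirrors what the counting loop above reports, using made-up predictions; the `mft_summary` name and its inputs are hypothetical. Applied to the real run, the same arithmetic gives 468 / 3242 ≈ 14.44% failures.

```python
from typing import Dict, Sequence


def mft_summary(preds_by_test: Dict[str, Sequence[int]],
                label_by_test: Dict[str, int]) -> Dict[str, float]:
    """Count passes/failures over all MFT examples and the overall failure rate (%)."""
    passed = failed = 0
    for name, preds in preds_by_test.items():
        expected = label_by_test[name]  # every example shares the template's label
        n_fail = sum(1 for p in preds if p != expected)
        failed += n_fail
        passed += len(preds) - n_fail
    total = passed + failed
    return {"passed": passed, "failed": failed,
            "fail_rate": 100 * failed / total if total else 0.0}


summary = mft_summary({"template1": [1, 1, 0, 1], "template2": [0, 0]},
                      {"template1": 1, "template2": 0})
print(summary["passed"], summary["failed"], f"{summary['fail_rate']:.2f}%")  # 5 1 16.67%
```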


In [20]:
table = suite.visual_summary_by_test('Test: MFT with vocabullary - template8')

failed = table.candidate_testcases
tests = table.filtered_testcases

for item in tests:
    # if not item in failed:
    print(item['examples'][0])

{'new': {'text': 'carries to accomplish what few sequels can -- it equals the sensitive and in some ways even betters it .', 'pred': '1', 'conf': 0.9975841, 'tokens': [['carries', 'to', 'accomplish', 'what', 'few', 'sequels', 'can', '--', 'it', 'equals', 'the', 'sensitive', 'and', 'in', 'some', 'ways', 'even', 'betters', 'it', '.']]}, 'old': None, 'label': 1, 'succeed': 1}
{'new': {'text': 'loved to accomplish what few sequels can -- it equals the sensitive and in some ways even betters it .', 'pred': '1', 'conf': 0.9982237, 'tokens': [['loved', 'to', 'accomplish', 'what', 'few', 'sequels', 'can', '--', 'it', 'equals', 'the', 'sensitive', 'and', 'in', 'some', 'ways', 'even', 'betters', 'it', '.']]}, 'old': None, 'label': 1, 'succeed': 1}
{'new': {'text': 'evoked to accomplish what few sequels can -- it equals the sensitive and in some ways even betters it .', 'pred': '1', 'conf': 0.99690837, 'tokens': [['evoked', 'to', 'accomplish', 'what', 'few', 'sequels', 'can', '--', 'it', 'equals'