# Approach 4

This notebook uses approach 4 to generate templates, focusing on positive and negative templates. One possible application is testing the *Vocabulary* linguistic capability with an **MFT** (Minimum Functionality Test).

The steps of this approach are:

1. Classify the instances using the *Oracle*
2. Keep only the instances classified unanimously
3. Split each instance into sentences
4. Classify the sentences using the *Oracle*
5. Keep only the sentences classified unanimously
6. Keep only the sentences with high prediction confidence
7. Rank the words in each sentence
8. Keep only the sentences whose relevant words (verbs or adjectives) are highly ranked
9. Keep only the sentences with high prediction confidence for the relevant words
10. Replace the relevant words with masks
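
The two kinds of filter in the steps above (unanimity and confidence) can be sketched as follows. This is a minimal illustration, not the `PosNegTemplateGeneratorApp4` implementation: the helper names `is_unanimous` and `is_confident`, the toy probability values, and the choice to apply the threshold to every oracle model are assumptions.

```python
import numpy as np

def is_unanimous(probs):
    """True when every oracle model predicts the same label.

    probs: array of shape (n_models, n_classes) with class probabilities.
    """
    labels = np.argmax(probs, axis=1)
    return bool(np.all(labels == labels[0]))

def is_confident(probs, min_score=0.8):
    """True when every model's top-class probability reaches min_score."""
    return bool(np.all(np.max(probs, axis=1) >= min_score))

# Hypothetical oracle outputs: 4 models x 2 classes per sentence
oracle_probs = {
    "It got boring and monotonous quick.": np.array(
        [[0.97, 0.03], [0.95, 0.05], [0.99, 0.01], [0.91, 0.09]]),
    "The product works fine.": np.array(
        [[0.40, 0.60], [0.55, 0.45], [0.30, 0.70], [0.20, 0.80]]),
    "It does a good job.": np.array(
        [[0.25, 0.75], [0.10, 0.90], [0.05, 0.95], [0.15, 0.85]]),
}

# Keep a sentence only if the models agree and all of them are confident
kept = [s for s, p in oracle_probs.items() if is_unanimous(p) and is_confident(p)]
print(kept)
```

In this toy data the second sentence is dropped because the models disagree, and the third because one model falls below the 0.8 threshold.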

In [1]:
%config Completer.use_jedi = False
import sys
sys.path.append('../')

## Loading the dataset, the target model, and the auxiliary models

In [2]:
import pandas as pd
pd.set_option('display.max_colwidth', None)
from datasets import load_dataset

dataset = load_dataset("amazon_polarity")
dataset.set_format("pandas")
df = dataset["test"].shuffle(seed=42)[:100]
df = df.rename(columns={"content": "text"}).drop(columns=["title"])
df

index,label,text
0,1,"The product works fine. I ordered the more exprensive one after I read reviews from others on Amazon. My husband likes the presser. It does a good job pressing his pants. However, it was damaged in the box when we received it. We decided it was too much trouble to send it back. The box was torn and the presser had a chuck knocked out of it."
1,0,"This book is so useless that I feel compelled to write a review to warn others to stay away from this book. A good tutorial should inspire the user on what he/she can do with the product. This book leads you to believe that without talent, the only thing you can do with Illustrator is to draw circles and squares. The book is a disservice to both the reader and to Adobe Illustrator."
2,0,The authors attempt an ambitious goal of covering many SOA topics - but their resulting text come across as scattered - vague - and lacking a coherent and practical application.Thomas Erl's books are much better written - and have a coherent approch to buliding a solid body of knowledge.For a manager / salesperson wanting a broad overview of SOA - they might be better served by reading Service Oriented Architecture For DummiesService Oriented Architecture For Dummies (For Dummies (Computer/Tech))
3,0,I ordered this product and did recieve then a couple months later it broke. Now Ive done everything I was told to do by by shipping back for a replacement and nothing. They wont return Emails i havent received the replacement part.
4,0,"I hated this movie. It was so silly. The girl made the cult look more stupid than they already were. Come on? She was from the future??? I can't stop laughing. Maybe, I missed something. I don't think I did. When it first started, I said to myself: What am I watching this for? I thought it was stupid, stupid and then more stupid. I kept watching, trying to make sense of it, but to no avail. I didn't want to waste my $1.00 rental fee."
...,...,...
95,1,"What a gloriously funny book! Even the recipies were funny, and well, how funny did you think a recipie could be?! I ""discovered"" this book en route to Jamaica back in May--the stranger next to me read it all the way there. Well, the cover just grabbed me and I HAD to have it. It was a quick, light read that had a very wise and uplifting last chapter. Oh, and for those who are clueless like me in the beginning, this is not a fiction novel, but a wacky manual about life, love and other good stuff that we should all follow to the hilt!"
96,0,"If you want Harman Kardon receivers it's ok. Even most of the DVD's. I own a 22 and a 31 and I also got this one which is really annoying.Issues:- it does not save caption settings- it does not save video settings; even after I set it up to be 16:9 1080i default it always reverted to 720p.- after a period of time the DVD unit itself refused to read discsI returned to HK, got a replacement and I'm testing it to see if there are any improvements, but... I think this is unacceptable for HK. After all I did not buy an 80$ Sony, and if I bought HK I bought it for the name which supposedley means quality."
97,0,"Same problems as everybody else. 14 months after purchase it ate the card. Tried 2 different cards, no dice for either. From love to hate. Dang. Also Canon's support website/acknowledgement of this problem is non-existent. It was hard enough to navigate their site, but it's impossible to find anything relevant."
98,0,I can be tough on safety glasses so it may be no fault of the mfg but IMO the lenses scuffed and scratched rather quickly.


In [3]:
import re
import numpy as np
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def pre_proccess(text):
    text = text.lower()
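    # Replace ASCII punctuation and digits with spaces; the ranges !-. and :-@
    # already cover parentheses, so no extra group is needed after the class
    text = re.sub('["\',!-.:-@0-9/]', ' ', text)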
    return text

# Wrapper to adapt output format
class SentimentAnalisysModelWrapper:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        
    def __predict(self, text_input):
        text_preprocessed = pre_proccess(text_input)
        tokenized = self.tokenizer(text_preprocessed, padding=True, truncation=True, max_length=512, 
                                    add_special_tokens = True, return_tensors="pt")
        
        tensor_logits = self.model(**tokenized)
        # dim=-1 normalizes over the class axis and avoids PyTorch's implicit-dim warning
        prob = softmax(tensor_logits[0], dim=-1).detach().numpy()
        pred = np.argmax(prob)
        
        return pred, prob
    
    def predict_label(self, text_inputs):
        return self.predict(text_inputs)[0]
        
    def predict_proba(self, text_inputs):
        return self.predict(text_inputs)[1]
        
    def predict(self, text_inputs):
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]
        
        preds = []
        probs = []

        for text_input in text_inputs:
            pred, prob = self.__predict(text_input)
            preds.append(pred)
            probs.append(prob[0])

        return np.array(preds), np.array(probs) # ([0, 1], [[0.99, 0.01], [0.03, 0.97]])

# Helper function to load and wrap a model from Hugging Face
def load_model(model_name):
    print(f'Loading model {model_name}...')
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    return SentimentAnalisysModelWrapper(model, tokenizer)

# Hugging Face hosted model names (Amazon Polarity checkpoints, despite the variable name)
rotten_tomatoes_models = {
    'bert': 'pig4431/amazonPolarity_BERT_5E', 
    'distilbert': 'pig4431/amazonPolarity_DistilBERT_5E', 
    'roberta': 'pig4431/amazonPolarity_roBERTa_5E', 
    'albert': 'pig4431/amazonPolarity_ALBERT_5E',
    'xlnet': 'pig4431/amazonPolarity_XLNET_5E', 
}

In [4]:
m1 = load_model(rotten_tomatoes_models['albert'])
m2 = load_model(rotten_tomatoes_models['distilbert'])
m3 = load_model(rotten_tomatoes_models['roberta'])
m4 = load_model(rotten_tomatoes_models['xlnet'])

# Models to be used as oracle
models = [m1, m2, m3, m4]
# Target model
model = load_model(rotten_tomatoes_models['bert'])

Loading model pig4431/amazonPolarity_ALBERT_5E...
Loading model pig4431/amazonPolarity_DistilBERT_5E...
Loading model pig4431/amazonPolarity_roBERTa_5E...
Loading model pig4431/amazonPolarity_XLNET_5E...
Loading model pig4431/amazonPolarity_BERT_5E...


# Generating the templates
The word-ranking method used in PosNegTemplateGenerator is the Replace-1 Score.
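
The notebook does not define the Replace-1 Score, so the sketch below is an assumption about the general idea rather than the library's implementation: replace one word at a time with a placeholder and measure how much the probability of the originally predicted class drops. The `toy_predict_proba` classifier, its word weights, and the `[UNK]` placeholder are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical toy weights: how strongly each word pushes toward "negative"
NEGATIVE_WORDS = {"boring": 2.0, "monotonous": 1.5}

def toy_predict_proba(text):
    """Toy stand-in for a classifier: returns [p_negative, p_positive]."""
    score = sum(NEGATIVE_WORDS.get(w, 0.0) for w in text.lower().split())
    p_neg = 1.0 / (1.0 + np.exp(-score))
    return np.array([p_neg, 1.0 - p_neg])

def replace_one_scores(text, predict_proba, placeholder="[UNK]"):
    """Score each word by the probability drop caused by replacing it."""
    words = text.split()
    base = predict_proba(text)
    label = int(np.argmax(base))
    scores = []
    for i, word in enumerate(words):
        perturbed = words[:i] + [placeholder] + words[i + 1:]
        drop = base[label] - predict_proba(" ".join(perturbed))[label]
        scores.append((word, float(drop)))
    # Largest drop first: these words matter most to the prediction
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranking = replace_one_scores("It got boring and monotonous quick", toy_predict_proba)
print(ranking[:2])
```

With the real wrapper, `predict_proba` would be a function returning one probability vector per text; the ranking logic stays the same.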

In [5]:
from template_generator.tasks.sentiment_analisys import PosNegTemplateGeneratorApp4

tg = PosNegTemplateGeneratorApp4(model, models)

### Initial number of instances: 5

In [6]:
# Sampling instances
np.random.seed(220)
n_instances = 5
df_sampled = df.sample(n_instances)

instances = [x for x in df_sampled['text'].values]

In [7]:
templates = tg.generate_templates(instances, ranked_words_count=4, min_classification_score=0.8)

Predicting inputs...
:: Instance predictions done.
Filtering instances classified unanimously...
:: 5 instances remaining.
Converting texts to sentences...
:: 22 sentences were generated.
Predicting inputs...
:: Sentence predictions done.
Filtering instances classified unanimously...
:: 22 sentences remaining.
Filtering instances by classification score greater than 0.8
:: 22 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
Filtering instances by relevant words...
:: 8 sentences remaining.
Filtering instances by relevant words classification score greater than 0.8
:: 3 sentences remaining.


#### Execution time for 5 instances: 1m 23.9s
filipe: 43.6s

In [8]:
tg.to_dataframe()


index,label,original_text,masked_text,template_text
0,0,"The opening is so small, that anything larger or longer than a cheerio is impossible to pick up.","The opening {mask} so small , that anything larger or {mask} than a cheerio is impossible to pick up .","The opening {neg_verb} so small , that anything larger or {neg_adj} than a cheerio is impossible to pick up ."
1,0,It got boring and monotonous quick.,It {mask} boring and {mask} quick .,It {neg_verb} boring and {neg_adj} quick .
2,1,Huggies customer service is amazing as well!,Huggies customer service {mask} {mask} as well !,Huggies customer service {neg_verb} {pos_adj} as well !
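
The relation between `original_text`, `masked_text`, and `template_text` in the table above can be sketched as below. This is an illustration only: the function `build_template`, its input format (token index mapped to a sentiment and part-of-speech pair), and the pre-tokenized input are assumptions, not the library's API.

```python
def build_template(tokens, relevant):
    """Return (masked_text, template_text) for a tokenized sentence.

    tokens: list of tokens.
    relevant: dict mapping token index -> (sentiment, pos), e.g. ("neg", "adj").
    """
    masked, template = [], []
    for i, token in enumerate(tokens):
        if i in relevant:
            sentiment, pos = relevant[i]
            masked.append("{mask}")
            # Slot name such as {neg_verb} or {pos_adj}
            template.append("{" + f"{sentiment}_{pos}" + "}")
        else:
            masked.append(token)
            template.append(token)
    return " ".join(masked), " ".join(template)

tokens = ["It", "got", "boring", "and", "monotonous", "quick", "."]
relevant = {1: ("neg", "verb"), 4: ("neg", "adj")}  # hypothetical ranking result
masked_text, template_text = build_template(tokens, relevant)
print(masked_text)    # It {mask} boring and {mask} quick .
print(template_text)  # It {neg_verb} boring and {neg_adj} quick .
```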


In [9]:
tg.lexicons

{'pos_verb': [],
 'neg_verb': ['got', 'is'],
 'pos_adj': ['amazing'],
 'neg_adj': ['monotonous', 'longer']}
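
To see how a template and these lexicons combine into test sentences, the sketch below expands one template with the Cartesian product of its lexicon slots, using only the standard library. It approximates what a template filler does and is not CheckList's implementation; `expand_template` is a hypothetical helper.

```python
import itertools
import string

def expand_template(template, lexicons):
    """Fill every {slot} in the template with all combinations of lexicon words."""
    # Collect slot names in order of appearance, without duplicates
    slots = []
    for _, field, _, _ in string.Formatter().parse(template):
        if field and field not in slots:
            slots.append(field)
    combos = itertools.product(*(lexicons[s] for s in slots))
    return [template.format(**dict(zip(slots, combo))) for combo in combos]

# Lexicons copied from the 5-instance run above
lexicons = {
    "pos_verb": [],
    "neg_verb": ["got", "is"],
    "pos_adj": ["amazing"],
    "neg_adj": ["monotonous", "longer"],
}

sentences = expand_template("It {neg_verb} boring and {neg_adj} quick .", lexicons)
for sentence in sentences:
    print(sentence)
```

With two verbs and two adjectives this yields 2 × 2 = 4 sentences. A template that uses a slot backed by an empty lexicon, such as `pos_verb` here, would produce no sentences at all.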

### Initial number of instances: 100

In [10]:
# Using all 100 instances
instances = [x for x in df['text'].values]

In [11]:
%%time
# 1m 7.1s
tg = PosNegTemplateGeneratorApp4(model, models)
templates = tg.generate_templates(instances, ranked_words_count=4, min_classification_score=0.8)

Predicting inputs...
:: Instance predictions done.
Filtering instances classified unanimously...
:: 84 instances remaining.
Converting texts to sentences...
:: 399 sentences were generated.
Predicting inputs...
:: Sentence predictions done.
Filtering instances classified unanimously...
:: 297 sentences remaining.
Filtering instances by classification score greater than 0.8
:: 297 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
Filtering instances by relevant words...
:: 118 sentences remaining.
Filtering instances by relevant words classification score greater than 0.8
:: 71 sentences remaining.
CPU times: user 39min 10s, sys: 1.09 s, total: 39min 11s
Wall time: 4min 15s


#### Execution time for 100 instances: 1m 10.4s
filipe: 1m 10.4s

In [12]:
df_templates = tg.to_dataframe()
# 1-based template index, matching the test names used in the suite
df_templates.insert(0, "template_index", df_templates.index + 1)
df_templates

index,template_index,label,original_text,masked_text,template_text
0,1,0,This book is so useless that I feel compelled to write a review to warn others to stay away from this book.,This book is so {mask} that I feel {mask} to write a review to warn others to stay away from this book .,This book is so {neg_adj} that I feel {neg_verb} to write a review to warn others to stay away from this book .
1,2,0,The girl made the cult look more stupid than they already were.,The girl made the cult {mask} more {mask} than they already were .,The girl made the cult {neg_verb} more {neg_adj} than they already were .
2,3,1,I can't stop laughing.,I ca n't {mask} {mask} .,I ca n't {neg_verb} {pos_verb} .
3,4,0,I don't think I did.,I {mask} n't {mask} I did .,I {neg_verb} n't {neg_verb} I did .
4,5,1,"Nirvana got mega-popular after ""Smells like teen spirit"" was debuted on MTV, and remained popular up until he killed himself.","Nirvana got mega-popular after `` Smells like teen spirit '' was debuted on MTV , and {mask} popular up until he {mask} himself .","Nirvana got mega-popular after `` Smells like teen spirit '' was debuted on MTV , and {neg_verb} popular up until he {neg_verb} himself ."
...,...,...,...,...,...
66,67,1,Another great french film.,Another {mask} {mask} film .,Another {pos_adj} {neg_adj} film .
67,68,0,"I ordered a paperback copy of this book which was supposed to be ""like new"".",I ordered a paperback copy of this book which was {mask} to {mask} `` like new '' .,I ordered a paperback copy of this book which was {neg_verb} to {neg_verb} `` like new '' .
68,69,0,The book I received was a hardcover copy not to mention that the book was pulling away from its binding.,The book I received {mask} a hardcover copy not to {mask} that the book was pulling away from its binding .,The book I received {neg_verb} a hardcover copy not to {neg_verb} that the book was pulling away from its binding .
69,70,0,"It was hard enough to navigate their site, but it's impossible to find anything relevant.","It was hard enough to navigate their site , but it 's {mask} to find anything {mask} .","It was hard enough to navigate their site , but it 's {neg_adj} to find anything {neg_adj} ."


In [13]:
df_templates.to_csv("generated_templates/generated_templates_approach4.csv", index=False)


In [16]:
tg.lexicons

{'pos_verb': ['beginning',
  'have',
  'repaired',
  'evolving',
  'finding',
  'Love',
  'understand',
  'hearing',
  'mesmerizing',
  'reborn',
  'love',
  'found',
  'provides',
  'must',
  "'ll",
  'laughing',
  'entertaining.The',
  'start',
  'Arrived'],
 'neg_verb': ['makes',
  'do',
  'works',
  'stop',
  'cost',
  'fall',
  'leaked',
  'based',
  'supposed',
  'compelled',
  'got',
  'hurt',
  'shorted',
  'would',
  'look',
  'dissappointed',
  'experenced',
  'seemed',
  'like',
  'looking',
  'prompted',
  'saw',
  'remained',
  'looked',
  'did',
  'make',
  'had',
  'suggest',
  'think',
  'ripped',
  'being',
  'waste',
  'were',
  'does',
  'Do',
  'mention',
  'are',
  'fix',
  'is',
  'surprised',
  'was',
  'killed',
  'be',
  'use'],
 'pos_adj': ['funniest',
  'good',
  'outstanding',
  'favorite',
  'absolute',
  'due',
  'ultimate',
  'rich',
  'handy',
  'rushed',
  'amazing',
  'awesome',
  'more',
  'nice',
  'great',
  'worth',
  'solid',
  'expensive',
  'fun

# Using the templates generated by the TemplateGenerator in CheckList

In [17]:
from checklist.editor import Editor
from checklist.test_suite import TestSuite
from checklist.test_types import MFT

In [18]:
lexicons = tg.lexicons
templates = tg.template_texts
masked = tg.masked_texts
labels = [sent.prediction.label for sent in tg.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [19]:
# One MFT per template; test names are numbered from 1
for i, (template, label) in enumerate(zip(templates, labels), start=1):
    label = int(label)  # use the same plain int for the editor and the test
    t = editor.template(template, remove_duplicates=True, labels=label)

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary", 
        name=f"Test: MFT with vocabullary - template{i}",
        description="Checking if the model can handle vocabullary"))

In [20]:
suite.run(model.predict, overwrite=True)
suite.save('./suites/posneg-approach4.suite')

Running Test: MFT with vocabullary - template1
Predicting 1231 examples
Running Test: MFT with vocabullary - template2
Predicting 1231 examples
Running Test: MFT with vocabullary - template3
Predicting 836 examples
Running Test: MFT with vocabullary - template4
Predicting 44 examples
Running Test: MFT with vocabullary - template5
Predicting 44 examples
Running Test: MFT with vocabullary - template6
Predicting 836 examples
Running Test: MFT with vocabullary - template7
Predicting 532 examples
Running Test: MFT with vocabullary - template8
Predicting 1231 examples
Running Test: MFT with vocabullary - template9
Predicting 1231 examples
Running Test: MFT with vocabullary - template10
Predicting 28 examples
Running Test: MFT with vocabullary - template11
Predicting 836 examples
Running Test: MFT with vocabullary - template12
Predicting 836 examples
Running Test: MFT with vocabullary - template13
Predicting 836 examples
Running Test: MFT with vocabullary - template14
Predicting 836 examples
Running Test: MFT with vocabullary - template15
Predicting 836 examples


# Loading the test suite

In [21]:
from checklist.test_suite import TestSuite
suite = TestSuite.from_file('./suites/posneg-approach4.suite')

# suite.visual_summary_table()

In [22]:
passed = 0
failed = 0
for test_name in suite.tests:
    table = suite.visual_summary_by_test(test_name)
    
    failed += table.stats['nfailed']    
    passed += table.stats['npassed']
    assert table.stats['nfailed'] + table.stats['npassed'] == len(table.filtered_testcases)

print(f"{failed = } ({(failed/(passed+failed))*100:.2f}%)")
print(f"{passed = } ({(passed/(passed+failed))*100:.2f}%)")
print(f"total = {passed+failed}")
print("templates:", len(suite.tests))

failed = 9622 (20.69%)
passed = 36877 (79.31%)
total = 46499
templates: 71


In [23]:
table = suite.visual_summary_by_test('Test: MFT with vocabullary - template1')

for item in table.candidate_testcases:
    print(item['examples'][0]['new']['text'])

This book is so setting that I feel makes to write a review to warn others to stay away from this book .
This book is so setting that I feel do to write a review to warn others to stay away from this book .
This book is so setting that I feel works to write a review to warn others to stay away from this book .
This book is so setting that I feel stop to write a review to warn others to stay away from this book .
This book is so setting that I feel fall to write a review to warn others to stay away from this book .
This book is so setting that I feel based to write a review to warn others to stay away from this book .
This book is so setting that I feel compelled to write a review to warn others to stay away from this book .
This book is so setting that I feel got to write a review to warn others to stay away from this book .
This book is so setting that I feel would to write a review to warn others to stay away from this book .
This book is so setting that I feel experenced to write a 