# Approach 1

This notebook uses Approach 1 to generate templates, focusing on positive and negative templates. One possible application is testing the *Vocabulary* linguistic capability with an **MFT** (Minimum Functionality Test).

The steps of this approach are:

1. Rank the words of the complete instances
2. Split the instances into sentences
3. Keep the sentences that contain at least one of the top-ranked words from step 1
4. Keep the sentences that contain relevant words (adjectives or verbs)
5. Classify the sentences using the *Oracle*
6. Keep the sentences that were classified unanimously
7. Replace the relevant words with masks
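
Steps 2 and 3 can be sketched in a few lines. This is only an illustration under stated assumptions: the regex-based splitter and the helper names `split_sentences` and `filter_by_ranked_words` are hypothetical and are not taken from the `template_generator` package, which may tokenize and match words differently.

```python
import re

def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

def filter_by_ranked_words(sentences, ranked_words):
    # Keep sentences that contain at least one top-ranked word (case-insensitive).
    wanted = {w.lower() for w in ranked_words}
    kept = []
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        if tokens & wanted:
            kept.append(sentence)
    return kept

review = "The product works fine. It was damaged in the box. We kept it anyway."
sentences = split_sentences(review)
top_words = ["fine", "damaged"]  # hypothetical output of step 1
selected = filter_by_ranked_words(sentences, top_words)
print(selected)
```

Only the first two sentences survive the filter here, because the third contains none of the ranked words.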

In [1]:
%config Completer.use_jedi = False
%load_ext autoreload
%autoreload 2
import sys
sys.path.append('../')

## Loading the dataset, the target model, and the auxiliary models

In [2]:
import pandas as pd
pd.set_option('display.max_colwidth', None)
from datasets import load_dataset

dataset = load_dataset("amazon_polarity")
dataset.set_format("pandas")
df = dataset["test"].shuffle(seed=42)[:100]
df = df.rename(columns={"content": "text"}).drop(columns=["title"])
df

Unnamed: 0,label,text
0,1,"The product works fine. I ordered the more exprensive one after I read reviews from others on Amazon. My husband likes the presser. It does a good job pressing his pants. However, it was damaged in the box when we received it. We decided it was too much trouble to send it back. The box was torn and the presser had a chuck knocked out of it."
1,0,"This book is so useless that I feel compelled to write a review to warn others to stay away from this book. A good tutorial should inspire the user on what he/she can do with the product. This book leads you to believe that without talent, the only thing you can do with Illustrator is to draw circles and squares. The book is a disservice to both the reader and to Adobe Illustrator."
2,0,The authors attempt an ambitious goal of covering many SOA topics - but their resulting text come across as scattered - vague - and lacking a coherent and practical application.Thomas Erl's books are much better written - and have a coherent approch to buliding a solid body of knowledge.For a manager / salesperson wanting a broad overview of SOA - they might be better served by reading Service Oriented Architecture For DummiesService Oriented Architecture For Dummies (For Dummies (Computer/Tech))
3,0,I ordered this product and did recieve then a couple months later it broke. Now Ive done everything I was told to do by by shipping back for a replacement and nothing. They wont return Emails i havent received the replacement part.
4,0,"I hated this movie. It was so silly. The girl made the cult look more stupid than they already were. Come on? She was from the future??? I can't stop laughing. Maybe, I missed something. I don't think I did. When it first started, I said to myself: What am I watching this for? I thought it was stupid, stupid and then more stupid. I kept watching, trying to make sense of it, but to no avail. I didn't want to waste my $1.00 rental fee."
...,...,...
95,1,"What a gloriously funny book! Even the recipies were funny, and well, how funny did you think a recipie could be?! I ""discovered"" this book en route to Jamaica back in May--the stranger next to me read it all the way there. Well, the cover just grabbed me and I HAD to have it. It was a quick, light read that had a very wise and uplifting last chapter. Oh, and for those who are clueless like me in the beginning, this is not a fiction novel, but a wacky manual about life, love and other good stuff that we should all follow to the hilt!"
96,0,"If you want Harman Kardon receivers it's ok. Even most of the DVD's. I own a 22 and a 31 and I also got this one which is really annoying.Issues:- it does not save caption settings- it does not save video settings; even after I set it up to be 16:9 1080i default it always reverted to 720p.- after a period of time the DVD unit itself refused to read discsI returned to HK, got a replacement and I'm testing it to see if there are any improvements, but... I think this is unacceptable for HK. After all I did not buy an 80$ Sony, and if I bought HK I bought it for the name which supposedley means quality."
97,0,"Same problems as everybody else. 14 months after purchase it ate the card. Tried 2 different cards, no dice for either. From love to hate. Dang. Also Canon's support website/acknowledgement of this problem is non-existent. It was hard enough to navigate their site, but it's impossible to find anything relevant."
98,0,I can be tough on safety glasses so it may be no fault of the mfg but IMO the lenses scuffed and scratched rather quickly.


In [3]:
df["label"].value_counts()

label
0    51
1    49
Name: count, dtype: int64

In [4]:
import re
import numpy as np
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def pre_proccess(text):
    # Lowercase, then replace punctuation and digits with spaces
    text = text.lower()
    text = re.sub(r'["\',!-.:-@0-9/]', ' ', text)
    return text

# Wrapper to adapt output format
class SentimentAnalisysModelWrapper:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        
    def __predict(self, text_input):
        text_preprocessed = pre_proccess(text_input)
        tokenized = self.tokenizer(text_preprocessed, padding=True, truncation=True, max_length=512, 
                                    add_special_tokens = True, return_tensors="pt")
        
        tensor_logits = self.model(**tokenized)
        prob = softmax(tensor_logits[0], dim=-1).detach().numpy()
        pred = np.argmax(prob)
        
        return pred, prob
    
    def predict_label(self, text_inputs):
        return self.predict(text_inputs)[0]
        
    def predict_proba(self, text_inputs):
        return self.predict(text_inputs)[1]
        
    def predict(self, text_inputs):
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]
        
        preds = []
        probs = []

        for text_input in text_inputs:
            pred, prob = self.__predict(text_input)
            preds.append(pred)
            probs.append(prob[0])

        return np.array(preds), np.array(probs) # ([0, 1], [[0.99, 0.01], [0.03, 0.97]])

# Helper function to load and wrap a model from Hugging Face
def load_model(model_name):
    print(f'Loading model {model_name}...')
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    return SentimentAnalisysModelWrapper(model, tokenizer)

# Hugging Face hosted model names (amazon_polarity checkpoints, despite the dict's name)
rotten_tomatoes_models = {
    'bert': 'pig4431/amazonPolarity_BERT_5E', 
    'distilbert': 'pig4431/amazonPolarity_DistilBERT_5E', 
    'roberta': 'pig4431/amazonPolarity_roBERTa_5E', 
    'albert': 'pig4431/amazonPolarity_ALBERT_5E',
    'xlnet': 'pig4431/amazonPolarity_XLNET_5E', 
}

In [5]:
m1 = load_model(rotten_tomatoes_models['albert'])
m2 = load_model(rotten_tomatoes_models['distilbert'])
m3 = load_model(rotten_tomatoes_models['roberta'])
m4 = load_model(rotten_tomatoes_models['xlnet'])

# Models to be used as oracle
models = [m1, m2, m3, m4]
# Target model
model = load_model(rotten_tomatoes_models['bert'])

Loading model pig4431/amazonPolarity_ALBERT_5E...
Loading model pig4431/amazonPolarity_DistilBERT_5E...
Loading model pig4431/amazonPolarity_roBERTa_5E...
Loading model pig4431/amazonPolarity_XLNET_5E...
Loading model pig4431/amazonPolarity_BERT_5E...

# Generating the templates
The word-ranking method used by the PosNegTemplateGenerator is the Replace-1 Score.
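
As a rough illustration of the ranking idea, the sketch below assumes that the Replace-1 Score of a word is the drop in the predicted class's probability when that word alone is replaced by a placeholder token. The function `replace1_scores`, the `[UNK]` placeholder, and the toy classifier are hypothetical; the generator's actual implementation may differ.

```python
def replace1_scores(text, predict_proba, placeholder="[UNK]"):
    """Rank words by how much replacing each one lowers the predicted-class probability."""
    words = text.split()
    base = predict_proba(text)
    target = max(range(len(base)), key=lambda k: base[k])  # predicted class index
    scores = []
    for i, word in enumerate(words):
        perturbed = " ".join(words[:i] + [placeholder] + words[i + 1:])
        drop = base[target] - predict_proba(perturbed)[target]
        scores.append((word, drop))
    # Highest drop first: these are the words the prediction depends on most.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

def toy_predict_proba(text):
    # Hypothetical stand-in for a real classifier: returns [p_negative, p_positive].
    pos = 0.9 if "great" in text.split() else 0.4
    return [1.0 - pos, pos]

ranking = replace1_scores("a great little book", toy_predict_proba)
print(ranking[0][0])
```

With the toy classifier, only replacing "great" changes the prediction, so it is ranked first.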

In [6]:
from template_generator.tasks.sentiment_analisys import PosNegTemplateGeneratorApp1

tg = PosNegTemplateGeneratorApp1(model, models)
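
The generator receives the target model and the list of auxiliary models that act as the oracle. A minimal sketch of the unanimity filter (step 6), assuming the oracle simply keeps a sentence when every auxiliary model predicts the same label; the helper names and the toy classifiers below are hypothetical:

```python
def unanimous_label(sentence, oracle_models):
    """Return the shared label if every oracle model agrees, otherwise None."""
    labels = {predict(sentence) for predict in oracle_models}
    return labels.pop() if len(labels) == 1 else None

def filter_unanimous(sentences, oracle_models):
    # Keep (sentence, label) pairs only when the vote is unanimous.
    kept = []
    for sentence in sentences:
        label = unanimous_label(sentence, oracle_models)
        if label is not None:
            kept.append((sentence, label))
    return kept

# Hypothetical toy classifiers standing in for the auxiliary models (0 = negative, 1 = positive).
def keyword_model(sentence):
    return 0 if "boring" in sentence else 1

def length_model(sentence):
    return 0 if len(sentence.split()) > 4 else 1

oracle = [keyword_model, length_model]
candidates = [
    "It got boring and monotonous quick.",
    "The product works fine.",
    "This one is long but not dull.",
]
kept = filter_unanimous(candidates, oracle)
print(kept)
```

The third sentence is dropped because the two toy models disagree on it.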

### Initial number of instances: 5

In [7]:
# Sampling instances
np.random.seed(220)
n_instances = 5
df_sampled = df.sample(n_instances)

instances = [x for x in df_sampled['text'].values]
instances

["It's been years since I had a hand-vac, so I spent a lot of time researching the ones available. I decided on this one after reading the reviews, it appeared to be the best candidate for what I needed. Unfortunately, it hasn't been worth the purchase AT ALL! The opening is so small, that anything larger or longer than a cheerio is impossible to pick up. The design of the internal components make it even harder to just keep the debris inside the canister! To top it all off, the device only stays charged for seven minutes according to the manual -- I haven't been able to get it to function well after just a few minutes. There's so many negatives, instead of listing them all -- take my word that this product is a total waste.I've decided to just buy another stand vacuum to keep in place of where we were storing this hand-vac & use the stand vacuum's hose. This hand-vac is a definite inconvenience instead of connivence.",
 "Although the concept of this book is a great idea I just couldn'

In [8]:
templates = tg.generate_templates(instances)

Ranking words using Replace-1 Score...


Converting texts to sentences...
:: 22 sentences were generated.
Filtering instances by contaning ranked words...
:: 12 sentences remaining.
Filtering instances by relevant words...
:: 1 sentences remaining.
Predicting inputs...
:: Sentence predictions done.


#### Execution time for 5 instances: 6m 8.6s
filipe: 2m 20.5s

In [9]:
tg.to_dataframe()

Unnamed: 0,label,original_text,masked_text,template_text
0,0,It got boring and monotonous quick.,It {mask} boring and {mask} quick .,It {neg_verb} boring and {neg_adj} quick .


In [10]:
tg.lexicons

{'pos_verb': [], 'neg_verb': ['got'], 'pos_adj': [], 'neg_adj': ['monotonous']}
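
The relation between the lexicons and the masked and template texts can be illustrated as follows. This is a simplified sketch, assuming every token found in a lexicon is masked and that the input is already tokenized; `build_template` is a hypothetical helper, not the generator's API.

```python
def build_template(tokens, lexicons):
    # Map each relevant word to the name of the lexicon that contains it.
    word_to_slot = {w: name for name, words in lexicons.items() for w in words}
    masked = ["{mask}" if t in word_to_slot else t for t in tokens]
    template = ["{" + word_to_slot[t] + "}" if t in word_to_slot else t for t in tokens]
    return " ".join(masked), " ".join(template)

lexicons = {'pos_verb': [], 'neg_verb': ['got'], 'pos_adj': [], 'neg_adj': ['monotonous']}
tokens = ["It", "got", "boring", "and", "monotonous", "quick", "."]
masked_text, template_text = build_template(tokens, lexicons)
print(masked_text)
print(template_text)
```

For this sentence the sketch reproduces the `masked_text` and `template_text` columns shown in the dataframe above.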

### Initial number of instances: 100

In [11]:
# Using all 100 instances
instances = [x for x in df['text'].values]

In [12]:
%%time
# 12m 31.5s
tg = PosNegTemplateGeneratorApp1(model, models)
templates = tg.generate_templates(instances)

Ranking words using Replace-1 Score...


Converting texts to sentences...
:: 467 sentences were generated.
Filtering instances by contaning ranked words...
:: 221 sentences remaining.
Filtering instances by relevant words...
:: 23 sentences remaining.
Predicting inputs...
:: Sentence predictions done.
CPU times: user 2h 5min 9s, sys: 2.28 s, total: 2h 5min 11s
Wall time: 12min 31s


#### Execution time for 100 instances: 12m 31.5s
filipe: 1m 0.7s

In [13]:
tg.to_dataframe()

Unnamed: 0,label,original_text,masked_text,template_text
0,1,The product works fine.,The product {mask} {mask} .,The product {neg_verb} {pos_adj} .
1,0,I didn't want to waste my $1.00 rental fee.,I did n't want to {mask} my $ 1.00 {mask} fee .,I did n't want to {neg_verb} my $ 1.00 {pos_adj} fee .
2,1,"Nirvana got mega-popular after ""Smells like teen spirit"" was debuted on MTV, and remained popular up until he killed himself.","Nirvana {mask} {mask} after `` Smells like teen spirit '' was debuted on MTV , and remained popular up until he killed himself .","Nirvana {neg_verb} {pos_adj} after `` Smells like teen spirit '' was debuted on MTV , and remained popular up until he killed himself ."
3,0,cant return it cant fix it.,{mask} return it cant {mask} it .,{neg_adj} return it cant {neg_verb} it .
4,1,Buy it and eat the bill.,{mask} it and {mask} the bill .,{neg_verb} it and {neg_verb} the bill .
5,1,"However, it still provides a good general survey of ethnic earrings from Africa, Asia and the Americas.","However , it still {mask} a {mask} general survey of ethnic earrings from Africa , Asia and the Americas .","However , it still {pos_verb} a {pos_adj} general survey of ethnic earrings from Africa , Asia and the Americas ."
6,1,This looks like a fun game with a really reasonable price.,This {mask} like a {mask} game with a really reasonable price .,This {neg_verb} like a {pos_adj} game with a really reasonable price .
7,0,"We called and called and Toys R us said tha we could not return it, due to fact that we didn't have a reciept.","We called and called and Toys R us {mask} tha we could not return it , {mask} to fact that we did n't have a reciept .","We called and called and Toys R us {neg_verb} tha we could not return it , {pos_adj} to fact that we did n't have a reciept ."
8,0,"If you actually want to buy something at Amazon, do not attempt to buy from Vertex.","If you actually {mask} to buy something at Amazon , do not attempt to {mask} from Vertex .","If you actually {neg_verb} to buy something at Amazon , do not attempt to {neg_verb} from Vertex ."
9,0,The main focus of the book seemed to be how inadequate the Snubby is for self-defense and I like Snubby revolvers.,The main focus of the book {mask} to be how {mask} the Snubby is for self-defense and I like Snubby revolvers .,The main focus of the book {neg_verb} to be how {neg_adj} the Snubby is for self-defense and I like Snubby revolvers .


In [14]:
tg.lexicons

{'pos_verb': ['provides', 'found'],
 'neg_verb': ['Buy',
  'eat',
  'think',
  'got',
  'making',
  'looks',
  'said',
  'listed',
  'was',
  'waste',
  'probaly',
  "'s",
  'cost',
  'want',
  'works',
  'buy',
  'is',
  'hurt',
  'spend',
  'seemed',
  'would',
  'fix'],
 'pos_adj': ['mega-popular',
  'good',
  'new',
  'fine',
  'great',
  'due',
  'rental',
  'fun'],
 'neg_adj': ['unacceptable',
  'horrible',
  'hard',
  'old',
  'little',
  'cant',
  'inadequate',
  'monotonous',
  'impossible']}
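
These lexicon sizes determine how many examples each template expands to. The sketch below assumes every placeholder is filled with each entry of its lexicon and that a repeated placeholder reuses the same word; `expand` is a hypothetical helper written for illustration, not CheckList's implementation.

```python
from itertools import product
from string import Formatter

def expand(template, lexicons):
    # Unique placeholder names, in order of first appearance.
    fields = []
    for _, name, _, _ in Formatter().parse(template):
        if name and name not in fields:
            fields.append(name)
    # One example per combination; a repeated placeholder reuses the same word.
    return [template.format(**dict(zip(fields, combo)))
            for combo in product(*(lexicons[f] for f in fields))]

# Lexicons copied from the output above.
lexicons = {
    'pos_verb': ['provides', 'found'],
    'neg_verb': ['Buy', 'eat', 'think', 'got', 'making', 'looks', 'said', 'listed',
                 'was', 'waste', 'probaly', "'s", 'cost', 'want', 'works', 'buy',
                 'is', 'hurt', 'spend', 'seemed', 'would', 'fix'],
    'pos_adj': ['mega-popular', 'good', 'new', 'fine', 'great', 'due', 'rental', 'fun'],
    'neg_adj': ['unacceptable', 'horrible', 'hard', 'old', 'little', 'cant',
                'inadequate', 'monotonous', 'impossible'],
}

examples = expand("The product {neg_verb} {pos_adj} .", lexicons)
same_slot = expand("{neg_verb} it and {neg_verb} the bill .", lexicons)
print(len(examples), len(same_slot))
```

Under these assumptions the first template yields 22 × 8 = 176 examples and the template with a repeated `{neg_verb}` yields 22, which is consistent with the example counts reported when the suite is run.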

# Using the templates generated by the TemplateGenerator in CheckList

In [15]:
from checklist.editor import Editor
from checklist.test_suite import TestSuite
from checklist.test_types import MFT

In [16]:
lexicons = tg.lexicons
templates = tg.template_texts
masked = tg.masked_texts
labels = [sent.prediction.label for sent in tg.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [17]:
# Add one MFT per generated template; test names are numbered from 1
for i, (template, label) in enumerate(zip(templates, labels), start=1):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary", 
        name=f"Test: MFT with vocabullary - template{i}",
        description="Checking if the model can handle vocabullary"))

In [18]:
suite.run(model.predict, overwrite=True)
suite.save('./suites/posneg-approach1.suite')

Running Test: MFT with vocabullary - template1
Predicting 176 examples


Running Test: MFT with vocabullary - template2
Predicting 176 examples
Running Test: MFT with vocabullary - template3
Predicting 176 examples
Running Test: MFT with vocabullary - template4
Predicting 198 examples
Running Test: MFT with vocabullary - template5
Predicting 22 examples
Running Test: MFT with vocabullary - template6
Predicting 16 examples
Running Test: MFT with vocabullary - template7
Predicting 176 examples
Running Test: MFT with vocabullary - template8
Predicting 176 examples
Running Test: MFT with vocabullary - template9
Predicting 22 examples
Running Test: MFT with vocabullary - template10
Predicting 198 examples
Running Test: MFT with vocabullary - template11
Predicting 198 examples
Running Test: MFT with vocabullary - template12
Predicting 198 examples
Running Test: MFT with vocabullary - template13
Predicting 198 examples
Running Test: MFT with vocabullary - template14
Predicting 22 examples
Running Test: MFT with vocabullary - template15
Predicting 198 examples
Runn

# Loading the test suite

In [1]:
from checklist.test_suite import TestSuite
suite = TestSuite.from_file('./suites/posneg-approach1.suite')

# suite.visual_summary_table()

In [2]:
passed = 0
failed = 0
for test_name in suite.tests:
    table = suite.visual_summary_by_test(test_name)
    
    failed += table.stats['nfailed']    
    passed += table.stats['npassed']
    assert table.stats['nfailed'] + table.stats['npassed'] == len(table.filtered_testcases)

print(f"{failed = } ({(failed/(passed+failed))*100:.2f}%)")
print(f"{passed = } ({(passed/(passed+failed))*100:.2f}%)")
print(f"total = {passed+failed}")
print("templates:", len(suite.tests))



failed = 549 (16.41%)
passed = 2797 (83.59%)
total = 3346
templates: 23


In [3]:
table = suite.visual_summary_by_test('Test: MFT with vocabullary - template1')

failed = table.candidate_testcases
tests = table.filtered_testcases

for item in tests:
    if item not in failed:
        print(item['examples'][0]['new']['text'])

The product Buy mega-popular .
The product Buy good .
The product Buy new .
The product Buy fine .
The product Buy great .
The product Buy rental .
The product Buy fun .
The product eat good .
The product eat new .
The product eat fine .
The product eat great .
The product eat rental .
The product eat fun .
The product think good .
The product think fine .
The product think great .
The product got mega-popular .
The product got good .
The product got new .
The product got fine .
The product got great .
The product got due .
The product got rental .
The product got fun .
The product making mega-popular .
The product making good .
The product making new .
The product making fine .
The product making great .
The product making fun .
The product looks good .
The product looks new .
The product looks fine .
The product looks great .
The product looks due .
The product looks fun .
The product said good .
The product said great .
The product listed great .
The product was good .
The product w