# Approach 2

This notebook uses Approach 2 to generate templates, focusing on positive and negative templates. One possible application is testing the *Vocabulary* linguistic capability with an **MFT** (Minimum Functionality Test).

The steps of this approach are:

1. Rank the words of the complete instances
2. Split the instances into sentences
3. Keep the sentences that contain at least one of the top-ranked words from the previous step
4. Rank the words of each sentence
5. Keep the sentences that contain relevant words (adjectives or verbs)
6. Classify the sentences using the *Oracle*
7. Replace the relevant words with masks
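
As a rough illustration of steps 2, 3 and 7, the sketch below splits an instance into sentences with a naive regular expression, keeps the sentences that contain a top-ranked word, and masks the relevant words. The helper names (`split_sentences`, `filter_by_words`, `mask_words`), the splitting rule and the sample text are assumptions made for this example; they are not taken from the `template_generator` package.

```python
import re

def split_sentences(text):
    """Naively split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

def filter_by_words(sentences, ranked_words):
    """Keep sentences that contain at least one of the ranked words."""
    wanted = {w.lower() for w in ranked_words}
    kept = []
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        if tokens & wanted:
            kept.append(sentence)
    return kept

def mask_words(sentence, relevant_words, mask="{mask}"):
    """Replace every whole-word occurrence of a relevant word with the mask."""
    for word in relevant_words:
        sentence = re.sub(rf"\b{re.escape(word)}\b", mask, sentence)
    return sentence

# Invented instance; the middle sentence mirrors one of the generated templates.
instance = "I bought it last month. It got boring and monotonous quick. My kids agree."
sentences = split_sentences(instance)
kept = filter_by_words(sentences, ["boring", "monotonous"])
masked = [mask_words(s, ["got", "monotonous"]) for s in kept]
print(masked)
```

A real pipeline would likely use a proper sentence tokenizer and a part-of-speech tagger for step 5; the regular expressions here only keep the example dependency-free.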

In [1]:
%config Completer.use_jedi = False
import sys
sys.path.append('../')

## Loading the dataset, the target model, and the auxiliary models

In [2]:
import pandas as pd
pd.set_option('display.max_colwidth', None)
from datasets import load_dataset

dataset = load_dataset("amazon_polarity")
dataset.set_format("pandas")
df = dataset["test"].shuffle(seed=42)[:100]
df = df.rename(columns={"content": "text"}).drop(columns=["title"])
df

Unnamed: 0,label,text
0,1,"The product works fine. I ordered the more exprensive one after I read reviews from others on Amazon. My husband likes the presser. It does a good job pressing his pants. However, it was damaged in the box when we received it. We decided it was too much trouble to send it back. The box was torn and the presser had a chuck knocked out of it."
1,0,"This book is so useless that I feel compelled to write a review to warn others to stay away from this book. A good tutorial should inspire the user on what he/she can do with the product. This book leads you to believe that without talent, the only thing you can do with Illustrator is to draw circles and squares. The book is a disservice to both the reader and to Adobe Illustrator."
2,0,The authors attempt an ambitious goal of covering many SOA topics - but their resulting text come across as scattered - vague - and lacking a coherent and practical application.Thomas Erl's books are much better written - and have a coherent approch to buliding a solid body of knowledge.For a manager / salesperson wanting a broad overview of SOA - they might be better served by reading Service Oriented Architecture For DummiesService Oriented Architecture For Dummies (For Dummies (Computer/Tech))
3,0,I ordered this product and did recieve then a couple months later it broke. Now Ive done everything I was told to do by by shipping back for a replacement and nothing. They wont return Emails i havent received the replacement part.
4,0,"I hated this movie. It was so silly. The girl made the cult look more stupid than they already were. Come on? She was from the future??? I can't stop laughing. Maybe, I missed something. I don't think I did. When it first started, I said to myself: What am I watching this for? I thought it was stupid, stupid and then more stupid. I kept watching, trying to make sense of it, but to no avail. I didn't want to waste my $1.00 rental fee."
...,...,...
95,1,"What a gloriously funny book! Even the recipies were funny, and well, how funny did you think a recipie could be?! I ""discovered"" this book en route to Jamaica back in May--the stranger next to me read it all the way there. Well, the cover just grabbed me and I HAD to have it. It was a quick, light read that had a very wise and uplifting last chapter. Oh, and for those who are clueless like me in the beginning, this is not a fiction novel, but a wacky manual about life, love and other good stuff that we should all follow to the hilt!"
96,0,"If you want Harman Kardon receivers it's ok. Even most of the DVD's. I own a 22 and a 31 and I also got this one which is really annoying.Issues:- it does not save caption settings- it does not save video settings; even after I set it up to be 16:9 1080i default it always reverted to 720p.- after a period of time the DVD unit itself refused to read discsI returned to HK, got a replacement and I'm testing it to see if there are any improvements, but... I think this is unacceptable for HK. After all I did not buy an 80$ Sony, and if I bought HK I bought it for the name which supposedley means quality."
97,0,"Same problems as everybody else. 14 months after purchase it ate the card. Tried 2 different cards, no dice for either. From love to hate. Dang. Also Canon's support website/acknowledgement of this problem is non-existent. It was hard enough to navigate their site, but it's impossible to find anything relevant."
98,0,I can be tough on safety glasses so it may be no fault of the mfg but IMO the lenses scuffed and scratched rather quickly.


In [3]:
import re
import numpy as np
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

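def pre_proccess(text):
    # Lowercase the text and replace punctuation and digits with spaces.
    # The stray empty group "()" after the character class was a no-op and is dropped.
    text = text.lower()
    text = re.sub(r'["\',!-.:-@0-9/]', ' ', text)
    return text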

# Wrapper to adapt output format
class SentimentAnalisysModelWrapper:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        
    def __predict(self, text_input):
        text_preprocessed = pre_proccess(text_input)
        tokenized = self.tokenizer(text_preprocessed, padding=True, truncation=True, max_length=512, 
                                    add_special_tokens = True, return_tensors="pt")
        
        tensor_logits = self.model(**tokenized)
        # dim=-1 normalizes over the class axis and avoids the implicit-dim warning
        prob = softmax(tensor_logits[0], dim=-1).detach().numpy()
        pred = np.argmax(prob)
        
        return pred, prob
    
    def predict_label(self, text_inputs):
        return self.predict(text_inputs)[0]
        
    def predict_proba(self, text_inputs):
        return self.predict(text_inputs)[1]
        
    def predict(self, text_inputs):
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]
        
        preds = []
        probs = []

        for text_input in text_inputs:
            pred, prob = self.__predict(text_input)
            preds.append(pred)
            probs.append(prob[0])

        return np.array(preds), np.array(probs) # ([0, 1], [[0.99, 0.01], [0.03, 0.97]])

# Helper function to load and wrap a model from Hugging Face
def load_model(model_name):
    print(f'Loading model {model_name}...')
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    return SentimentAnalisysModelWrapper(model, tokenizer)

# Hugging Face hosted model names (Amazon Polarity checkpoints, despite the variable name)
rotten_tomatoes_models = {
    'bert': 'pig4431/amazonPolarity_BERT_5E', 
    'distilbert': 'pig4431/amazonPolarity_DistilBERT_5E', 
    'roberta': 'pig4431/amazonPolarity_roBERTa_5E', 
    'albert': 'pig4431/amazonPolarity_ALBERT_5E',
    'xlnet': 'pig4431/amazonPolarity_XLNET_5E', 
}
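
To make the effect of `pre_proccess` concrete, the sketch below declares the function on its own, with the same character class, and applies it to a made-up review. The class covers every ASCII character from `!` to `@`, so punctuation and digits all become spaces. Apostrophes are included, which means contractions are split apart. The sample sentence is invented for illustration.

```python
import re

def pre_proccess(text):
    # Lowercase, then blank out punctuation and digits (ASCII '!' through '@').
    text = text.lower()
    return re.sub(r'["\',!-.:-@0-9/]', ' ', text)

sample = "It's GREAT, 10/10!"
cleaned = pre_proccess(sample)
print(repr(cleaned))    # same length as the input, symbols replaced by spaces
print(cleaned.split())  # the remaining word tokens
```

Because each removed character is replaced by a single space, the string length is preserved and only the tokens change.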


In [4]:
m1 = load_model(rotten_tomatoes_models['albert'])
m2 = load_model(rotten_tomatoes_models['distilbert'])
m3 = load_model(rotten_tomatoes_models['roberta'])
m4 = load_model(rotten_tomatoes_models['xlnet'])

# Models to be used as oracle
models = [m1, m2, m3, m4]
# Target model
model = load_model(rotten_tomatoes_models['bert'])

Loading model pig4431/amazonPolarity_ALBERT_5E...
Loading model pig4431/amazonPolarity_DistilBERT_5E...
Loading model pig4431/amazonPolarity_roBERTa_5E...
Loading model pig4431/amazonPolarity_XLNET_5E...
Loading model pig4431/amazonPolarity_BERT_5E...
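
Step 6 relies on the auxiliary models acting as an oracle. How `PosNegTemplateGeneratorApp2` combines their predictions is not shown in this notebook; the sketch below assumes one simple policy, accepting a sentence only when all auxiliary models agree on its label. The `StubModel` class and the `oracle_label` function are hypothetical stand-ins that mimic the wrapper's `predict_label` method so the example runs without downloading any model.

```python
import numpy as np

class StubModel:
    """Hypothetical stand-in for SentimentAnalisysModelWrapper."""
    def __init__(self, negative_words):
        self.negative_words = set(negative_words)

    def predict_label(self, text_inputs):
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]
        # Label 0 (negative) if any known negative word appears, else 1.
        return np.array([
            0 if self.negative_words & set(t.lower().split()) else 1
            for t in text_inputs
        ])

def oracle_label(models, sentence):
    """Return the unanimous label, or None when the models disagree."""
    votes = {int(m.predict_label(sentence)[0]) for m in models}
    return votes.pop() if len(votes) == 1 else None

oracle = [StubModel(["boring", "useless"]), StubModel(["boring", "messy"])]
print(oracle_label(oracle, "it got boring quick"))     # both stubs vote 0
print(oracle_label(oracle, "this book is useless"))    # stubs disagree
print(oracle_label(oracle, "the product works fine"))  # both stubs vote 1
```

A majority vote would be an equally plausible policy; unanimity is used here only because it discards ambiguous sentences, which seems desirable when the labels become test expectations.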


# Generating the templates
The word-ranking method used by `PosNegTemplateGeneratorApp2` is the Replace-1 Score.
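
The notebook does not define the Replace-1 Score. The sketch below assumes the usual formulation: the score of a word is the drop in the model's confidence for its predicted class when that word is replaced by an unknown token. The function `replace1_scores`, the `[UNK]` placeholder and the toy classifier are illustrative assumptions, not the package's implementation.

```python
def replace1_scores(predict_proba, tokens, unk="[UNK]"):
    """Score each token by the confidence drop when it is replaced by `unk`."""
    base = predict_proba(" ".join(tokens))
    label = max(range(len(base)), key=lambda k: base[k])
    scores = []
    for i, token in enumerate(tokens):
        perturbed = tokens[:i] + [unk] + tokens[i + 1:]
        prob = predict_proba(" ".join(perturbed))
        scores.append((token, base[label] - prob[label]))
    # Highest score first: these are the words the prediction depends on most.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

def toy_predict_proba(text):
    """Toy classifier: returns [p_negative, p_positive] from word counts."""
    words = text.lower().split()
    neg = sum(w in {"boring", "monotonous"} for w in words)
    p_neg = min(0.5 + 0.2 * neg, 0.99)
    return [p_neg, 1.0 - p_neg]

tokens = "It got boring and monotonous quick".split()
ranking = replace1_scores(toy_predict_proba, tokens)
print(ranking[:2])
```

With a real wrapper, `predict_proba` would return a 2-D array for a list of inputs, so the call would need to index the first row. The cost is one forward pass per word, which is consistent with the long running times reported in this notebook.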

In [5]:
from template_generator.tasks.sentiment_analisys import PosNegTemplateGeneratorApp2

tg = PosNegTemplateGeneratorApp2(model, models)

### Initial number of instances: 5

In [6]:
# Sampling instances
np.random.seed(220)
n_instances = 5
df_sampled = df.sample(n_instances)

instances = [x for x in df_sampled['text'].values]

In [7]:
templates = tg.generate_templates(instances)

Ranking words using Replace-1 Score...


Converting texts to sentences...
:: 22 sentences were generated.
Filtering instances by contaning ranked words...
:: 12 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
Filtering instances by relevant words...
:: 1 sentences remaining.
Predicting inputs...
:: Sentence predictions done.


#### Execution time for 5 instances: 6m 19.8s
filipe: 2m 39.8


In [8]:
tg.to_dataframe()

Unnamed: 0,label,original_text,masked_text,template_text
0,0,It got boring and monotonous quick.,It {mask} boring and {mask} quick .,It {neg_verb} boring and {neg_adj} quick .
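
The row above shows how `masked_text` relates to `template_text`: each `{mask}` becomes a placeholder naming the polarity and part of speech of the word it replaced. A minimal sketch of that last step follows. The function `label_masks` is hypothetical, and the polarity and part of speech of each slot are passed in by hand, since the notebook does not show how the generator decides them.

```python
def label_masks(masked_text, slots, mask="{mask}"):
    """Replace each mask, left to right, with a '{polarity_pos}' placeholder."""
    parts = masked_text.split(mask)
    if len(parts) - 1 != len(slots):
        raise ValueError("number of masks and slots differ")
    out = parts[0]
    for (polarity, pos), rest in zip(slots, parts[1:]):
        out += "{" + f"{polarity}_{pos}" + "}" + rest
    return out

masked = "It {mask} boring and {mask} quick ."
template = label_masks(masked, [("neg", "verb"), ("neg", "adj")])
print(template)
```

The resulting placeholder names match the keys of `tg.lexicons`, which is what lets a template be filled from those word lists later on.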


In [9]:
tg.lexicons

{'pos_verb': [], 'neg_verb': ['got'], 'pos_adj': [], 'neg_adj': ['monotonous']}

### Initial number of instances: 100

In [10]:
# Using all 100 instances
instances = [x for x in df['text'].values]

In [11]:
%%time
# 14m 52.9s
templates = tg.generate_templates(instances)

Ranking words using Replace-1 Score...


Converting texts to sentences...
:: 467 sentences were generated.
Filtering instances by contaning ranked words...
:: 221 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
Filtering instances by relevant words...
:: 32 sentences remaining.
Predicting inputs...
:: Sentence predictions done.
CPU times: user 2h 28min 43s, sys: 2.51 s, total: 2h 28min 45s
Wall time: 14min 52s


#### Execution time for 100 instances: 14m 52.9s

In [12]:
tg.to_dataframe()

Unnamed: 0,label,original_text,masked_text,template_text
0,1,The product works fine.,The product {mask} {mask} .,The product {neg_verb} {pos_adj} .
1,0,This book is so useless that I feel compelled to write a review to warn others to stay away from this book.,This book is so {mask} that I feel {mask} to write a review to warn others to stay away from this book .,This book is so {neg_adj} that I feel {neg_verb} to write a review to warn others to stay away from this book .
2,1,"(at which time there was a spike in popularity, yes, but they were popular before).also some of kurt's lyrics are touching if you don't go in with such a bias.Of course I agree with you that Soundgarden sounds nothing like Nirvana, and I generally like soundgarden's music better.I also agree that Kickstand isn't a very strong song on the album, but I love Spoonman.","( at which time there was a spike in popularity , yes , but they were popular before ) {mask} some of kurt 's lyrics are {mask} if you do n't go in with such a bias.Of course I agree with you that Soundgarden sounds nothing like Nirvana , and I generally like soundgarden 's music better.I also agree that Kickstand is n't a very strong song on the album , but I love Spoonman .","( at which time there was a spike in popularity , yes , but they were popular before ) {neg_verb} some of kurt 's lyrics are {pos_verb} if you do n't go in with such a bias.Of course I agree with you that Soundgarden sounds nothing like Nirvana , and I generally like soundgarden 's music better.I also agree that Kickstand is n't a very strong song on the album , but I love Spoonman ."
3,1,I wish I had captured the outstanding three recorded tapes onto my PC then.,I wish I had captured the {mask} three {mask} tapes onto my PC then .,I wish I had captured the {pos_adj} three {pos_verb} tapes onto my PC then .
4,1,I liked the book because it is from a true story and the story is fantastic.,I {mask} the book because it is from a {mask} story and the story is fantastic .,I {pos_verb} the book because it is from a {pos_adj} story and the story is fantastic .
5,1,Also makes a good gift choice for the history buff on your shopping list.,Also {mask} a {mask} gift choice for the history buff on your shopping list .,Also {neg_verb} a {pos_adj} gift choice for the history buff on your shopping list .
6,0,A 6 pack would have been OK for the price but almost $50.00 for one Garnier is ridiculous.,A 6 pack {mask} have been OK for the price but almost $ 50.00 for one Garnier is {mask} .,A 6 pack {neg_verb} have been OK for the price but almost $ 50.00 for one Garnier is {neg_adj} .
7,1,"I think it would have been helpful to show how some of the earrings are actually worn, either on a mannequin head or a real person.","I think it {mask} {mask} been helpful to show how some of the earrings are actually worn , either on a mannequin head or a real person .","I think it {neg_verb} {pos_verb} been helpful to show how some of the earrings are actually worn , either on a mannequin head or a real person ."
8,0,"Problem is, on my iMac running 10.6.8, the game does not start!","Problem {mask} , on my iMac running 10.6.8 , the game does not {mask} !","Problem {neg_verb} , on my iMac running 10.6.8 , the game does not {pos_verb} !"
9,0,"My son was very dissapointed, because that toy was at the top of his Christmas List.","My son was very {mask} , because that toy {mask} at the top of his Christmas List .","My son was very {neg_adj} , because that toy {neg_verb} at the top of his Christmas List ."


In [13]:
tg.lexicons

{'pos_verb': ['found',
  'beginning',
  'evolving',
  'start',
  'liked',
  'recorded',
  'demonstrated',
  'have',
  'Love',
  'entertaining.The',
  'came',
  'touching'],
 'neg_verb': ['works',
  'are',
  '.also',
  'makes',
  'seemed',
  'crossing',
  'make',
  'shorted',
  'was',
  'cost',
  'got',
  'would',
  'think',
  'screws',
  'compelled',
  'isnt',
  'looked',
  'is'],
 'pos_adj': ['awesome',
  'new',
  'true',
  'aware',
  'outstanding',
  'good',
  'easy',
  'great',
  'fine'],
 'neg_adj': ['dissapointed',
  'least',
  'messy',
  'relevant',
  'little',
  'ridiculous',
  'impossible',
  'inadequate',
  'useless',
  'monotonous']}

# Using the templates generated by TemplateGenerator in CheckList

In [14]:
from checklist.editor import Editor
from checklist.test_suite import TestSuite
from checklist.test_types import MFT

In [15]:
lexicons = tg.lexicons
templates = tg.template_texts
masked = tg.masked_texts
labels = [sent.prediction.label for sent in tg.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [16]:
# The test name keeps its original spelling because a later cell looks tests up by that exact string.
for i, (template, label) in enumerate(zip(templates, labels), start=1):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=int(label),
        capability="Vocabulary",
        name=f"Test: MFT with vocabullary - template{i}",
        description="Checking if the model can handle vocabulary"))

In [17]:
suite.run(model.predict, overwrite=True)
suite.save('./suites/posneg-approach2.suite')

Running Test: MFT with vocabullary - template1
Predicting 162 examples


Running Test: MFT with vocabullary - template2
Predicting 180 examples
Running Test: MFT with vocabullary - template3
Predicting 216 examples
Running Test: MFT with vocabullary - template4
Predicting 108 examples
Running Test: MFT with vocabullary - template5
Predicting 108 examples
Running Test: MFT with vocabullary - template6
Predicting 162 examples
Running Test: MFT with vocabullary - template7
Predicting 180 examples
Running Test: MFT with vocabullary - template8
Predicting 216 examples
Running Test: MFT with vocabullary - template9
Predicting 216 examples
Running Test: MFT with vocabullary - template10
Predicting 180 examples
Running Test: MFT with vocabullary - template11
Predicting 162 examples
Running Test: MFT with vocabullary - template12
Predicting 18 examples
Running Test: MFT with vocabullary - template13
Predicting 180 examples
Running Test: MFT with vocabullary - template14
Predicting 12 examples
Running Test: MFT with vocabullary - template15
Predicting 180 examples
...
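
The example counts in the log above follow from the lexicon sizes: every placeholder in a template is filled with every word of its lexicon. The sketch below reproduces that expansion in plain Python with `itertools.product`. It is an illustration of the idea under the assumption that placeholders are filled independently, not CheckList's own code, and the small lexicon is a subset chosen for the example.

```python
import re
from itertools import product

def expand(template, lexicons):
    """Fill every combination of lexicon words into the template."""
    # Distinct placeholder names, in order of first appearance.
    keys = list(dict.fromkeys(re.findall(r"\{(\w+)\}", template)))
    combos = product(*(lexicons[k] for k in keys))
    return [template.format(**dict(zip(keys, combo))) for combo in combos]

small = {"neg_verb": ["works", "makes"], "pos_adj": ["good", "fine", "great"]}
examples = expand("The product {neg_verb} {pos_adj} .", small)
print(len(examples), examples[0])

# With the full lexicons listed earlier (18 words in neg_verb, 9 in pos_adj),
# the same template yields 18 * 9 combinations.
sizes = {"neg_verb": 18, "pos_adj": 9}
print(sizes["neg_verb"] * sizes["pos_adj"])
```

The product 18 × 9 = 162 agrees with the count logged for the first template, assuming the tests were added in the same order as the rows of `tg.to_dataframe()`.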

# Loading the test suite

In [18]:
from checklist.test_suite import TestSuite
suite = TestSuite.from_file('./suites/posneg-approach2.suite')

# suite.visual_summary_table()

In [19]:
passed = 0
failed = 0
for test_name in suite.tests:
    table = suite.visual_summary_by_test(test_name)
    
    failed += table.stats['nfailed']    
    passed += table.stats['npassed']
    assert table.stats['nfailed'] + table.stats['npassed'] == len(table.filtered_testcases)

print(f"{failed = } ({(failed/(passed+failed))*100:.2f}%)")
print(f"{passed = } ({(passed/(passed+failed))*100:.2f}%)")
print(f"total = {passed+failed}")
print("templates:", len(suite.tests))



failed = 711 (15.53%)
passed = 3868 (84.47%)
total = 4579
templates: 32


In [21]:
table = suite.visual_summary_by_test('Test: MFT with vocabullary - template1')

for item in table.candidate_testcases:
    print(item['examples'][0]['new']['text'])

shot like a postcard and balancing with all the boozy self-indulgence that brings out the holocaust in otherwise talented actors .
shot like a postcard and balancing with all the boozy self-indulgence that brings out the tougher in otherwise talented actors .
shot like a postcard and heard with all the boozy self-indulgence that brings out the holocaust in otherwise talented actors .
shot like a postcard and heard with all the boozy self-indulgence that brings out the tougher in otherwise talented actors .
