
In [None]:
!pip install OpenAttack

Collecting OpenAttack
  Downloading OpenAttack-2.1.1-py3-none-any.whl (145 kB)
Collecting datasets (from OpenAttack)
  Downloading datasets-2.14.5-py3-none-any.whl (519 kB)
Collecting transformers>=4.0.0 (from OpenAttack)
  Downloading transformers-4.34.0-py3-none-any.whl (7.7 MB)
Collecting huggingface-hub<1.0,>=0.16.4 (from transformers>=4.0.0->OpenAttack)
  Downloading huggingface_hub-0.18.0-py3-none-any.whl (301 kB)
Collecting tokenizers<0.15,>=0.14 (from transformers>=4.0.0->OpenAttack)
  Downloading to

In [None]:
import ssl
# Disable HTTPS certificate verification so resource downloads do not fail on certificate errors.
ssl._create_default_https_context = ssl._create_unverified_context

In [None]:
import OpenAttack as oa
import numpy as np
import datasets
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer


# Configure the access interface of the customized victim model by extending OpenAttack.Classifier.
class MyClassifier(oa.Classifier):
    def __init__(self):
        # nltk.sentiment.vader.SentimentIntensityAnalyzer is a lexicon- and rule-based sentiment model (VADER).
        nltk.download('vader_lexicon')
        self.model = SentimentIntensityAnalyzer()

    def get_pred(self, input_):
        return self.get_prob(input_).argmax(axis=1)

    # Return the classification probability scores for the input sentences.
    def get_prob(self, input_):
        ret = []
        for sent in input_:
            # polarity_scores returns a dict of scores; only "neg" and "pos" are used here.
            res = self.model.polarity_scores(sent)

            # We use score_pos / (score_neg + score_pos) as the probability of positive sentiment.
            # Adding 1e-6 to the numerator and 2e-6 to the denominator avoids division by zero
            # and gives a probability of 0.5 when both scores are zero.
            prob = (res["pos"] + 1e-6) / (res["neg"] + res["pos"] + 2e-6)

            ret.append(np.array([1 - prob, prob]))

        # get_prob returns an np.ndarray of shape (len(input_), 2). See OpenAttack.Classifier for details.
        return np.array(ret)

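# Convert a dataset example to {"x": sentence, "y": binary label}; labels above 0.5 count as positive.
def dataset_mapping(x):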
    return {
        "x": x["sentence"],
        "y": 1 if x["label"] > 0.5 else 0,
    }

# load the first 20 training examples of SST for evaluation
dataset = datasets.load_dataset("sst", split="train[:20]").map(function=dataset_mapping)
# choose the customized classifier as the victim model
victim = MyClassifier()
# choose PWWS as the attacker and initialize it with default parameters
attacker = oa.attackers.PWWSAttacker()
# prepare for attacking
attack_eval = oa.AttackEval(attacker, victim)
# launch attacks and print attack results
attack_eval.eval(dataset, visualize=True)

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
Downloading TProcess.NLTKWordNet: 100%|██████████| 10.8M/10.8M [00:01<00:00, 5.95MB/s]
Downloading TProcess.NLTKSentTokenizer: 100%|██████████| 162k/162k [00:00<00:00, 407kB/s] 
Downloading TProcess.NLTKPerceptronPosTagger: 100%|██████████| 2.53M/2.53M [00:01<00:00, 1.80MB/s]


Label: 1 (100.00%) --> 0 (100.00%)          |                                   
                                            |                                   
The Rock is destined to be the 21st Century |                                   
the rock is destined to be the 21st century |                                   
                                            |                                   
' s new `` Conan '' and that he ' s going   |                                   
' s new `` conan '' and that he ' s going   | Running Time:            0.036093 
                                            | Query Exceeded:          no       
to make a splash even greater than Arnold   | Victim Model Queries:    224      
to make a splash even bully   than arnold   | Succeed:                 yes      
                                            |                                   
Schwarzenegger , Jean - Claud Van Damme or  |                                 

{'Total Attacked Instances': 20,
 'Successful Instances': 17,
 'Attack Success Rate': 0.85,
 'Avg. Running Time': 0.013456237316131592,
 'Total Query Exceeded': 0.0,
 'Avg. Victim Model Queries': 120.9}
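
The score-to-probability mapping inside `get_prob` can be checked in isolation, without downloading the VADER lexicon. The sketch below is a minimal illustration, not part of OpenAttack: `scores_to_probs` and `example_scores` are hypothetical names, and the score dicts are made-up values standing in for the output of `polarity_scores`. It applies the same smoothing as the classifier above and then takes the `argmax`, as `get_pred` does.

```python
import numpy as np


def scores_to_probs(scores, eps=1e-6):
    """Map {"neg", "pos"} score dicts to rows of [P(negative), P(positive)]."""
    rows = []
    for s in scores:
        # Same smoothing as get_prob: eps in the numerator and 2 * eps in the
        # denominator, so neg == pos == 0 maps to 0.5 instead of dividing by zero.
        p_pos = (s["pos"] + eps) / (s["neg"] + s["pos"] + 2 * eps)
        rows.append([1 - p_pos, p_pos])
    return np.array(rows)


# Hypothetical score dicts (illustrative values, not real VADER output).
example_scores = [
    {"neg": 0.0, "pos": 0.0},  # no polar words at all
    {"neg": 0.0, "pos": 0.3},  # only positive words
    {"neg": 0.4, "pos": 0.1},  # mostly negative words
]

probs = scores_to_probs(example_scores)
preds = probs.argmax(axis=1)  # same rule as get_pred

print(probs.shape)  # (3, 2)
print(preds)        # [0 1 0]
```

Two properties follow from this mapping. A sentence with only positive or only negative words gets a probability within rounding distance of 1, which is consistent with the `100.00%` confidences in the attack log above. A sentence with no polar words gets exactly `[0.5, 0.5]`, and `argmax` breaks the tie in favor of index 0, so it is predicted as label 0.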