# fiddler-modelauditor Quick Start Guide

#### Note: This notebook is available in the examples directory.

## Step 0: Imports
Let us import the required modules and packages.

In [1]:
import spacy
import numpy as np

from modelauditor.perturbations import PerturbText
from modelauditor.evaluation.evaluate import ModelTest, TestSuite
from modelauditor.evaluation.expected_behavior import InvariantScore, InvariantPrediction

In [2]:
TEST_DATASET = [
    "please call michael",
    "please call michael bolton",
    "how's the weather in Austin",
    "Alexa, set timer for 5 minutes",
]

## Step 1: Set up the Perturber
- We need to provide an NER pipeline to parse the dataset that will be perturbed. In this example we use spaCy's RoBERTa-based transformer NER pipeline (`en_core_web_trf`).

In [3]:
ner_pipeline = spacy.load("en_core_web_trf").pipe

We'll now instantiate a PerturbText object and generate 5 perturbations for each sample.

In [4]:
perturber = PerturbText(
    TEST_DATASET,
    ner_pipeline=ner_pipeline,
    batch_size=8,
    perturbations_per_sample=5,
)

2022-12-07 13:18:07,053 — modelauditor.perturbations.text — INFO — Parsing the dataset to extract entities


1it [00:00, 16.92it/s]


## Step 2: Set up a Model Test

To set up a model test we need two things:
1. Perturbed data
2. The expected behavior that should be tested

We'll use the perturber object from the previous cell to perturb names.

In [5]:
perturbed_dataset = perturber.perturb_names()
perturbed_dataset.data

2022-12-07 13:18:07,123 — modelauditor.perturbations.text — INFO — Perturbing names of the dataset.
2022-12-07 13:18:07,125 — modelauditor.perturbations.text — INFO — Perturbed names in 3 out of 4 sentences in the dataset


[['please call michael',
  'please call Paul',
  'please call Jared',
  'please call Anthony',
  'please call Edward',
  'please call Caleb'],
 ['please call michael bolton',
  'please call Michael Lopez',
  'please call Christopher Hughes',
  'please call Matthew Foster',
  'please call David Wood',
  'please call James Wood'],
 ['Alexa, set timer for 5 minutes',
  'Allison, set timer for 5 minutes',
  'Evelyn, set timer for 5 minutes',
  'Heather, set timer for 5 minutes',
  'Melanie, set timer for 5 minutes',
  'Mia, set timer for 5 minutes']]
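The output above suggests that each inner list starts with the original sentence, followed by its perturbed variants. Under that assumption, a minimal sketch for flattening the data into (original, perturbed) pairs could look like the following. `to_pairs` is a hypothetical helper and not part of modelauditor, and the list literal is a shortened copy of the output above.

```python
# Shortened copy of the output above. Assumption: the first item of each
# inner list is the original sentence, the rest are its perturbations.
perturbed_data = [
    ["please call michael", "please call Paul", "please call Jared"],
    ["Alexa, set timer for 5 minutes", "Allison, set timer for 5 minutes"],
]


def to_pairs(groups):
    """Flatten [[original, variant, ...], ...] into (original, variant) pairs."""
    return [(group[0], variant) for group in groups for variant in group[1:]]


pairs = to_pairs(perturbed_data)
for original, variant in pairs:
    print(f"{original!r} -> {variant!r}")
```

Pairs like these are handy for inspecting by hand which entity was swapped in each sentence.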

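Let us now set up a model test that checks for invariant behavior: the model scores for a perturbed input should stay within a relative tolerance of 5% of the scores for the original input.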

In [6]:
invariant_behavior = InvariantScore(rel_tol=0.05)
model_eval = ModelTest(
    perturbed_dataset=perturbed_dataset,
    expected_behavior=invariant_behavior,
)
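To make the idea of invariance concrete, here is a minimal sketch of what a relative-tolerance check on scores, and a check on the predicted class, might look like. This is an illustration under stated assumptions and not modelauditor's implementation: `scores_invariant` and `prediction_invariant` are hypothetical helpers, and the exact comparison used by `InvariantScore` and `InvariantPrediction` may differ.

```python
import numpy as np


def scores_invariant(original_scores, perturbed_scores, rel_tol=0.05):
    """True if every perturbed score is within rel_tol of the original score."""
    original = np.asarray(original_scores, dtype=float)
    perturbed = np.asarray(perturbed_scores, dtype=float)
    # Element-wise check: |perturbed - original| <= rel_tol * |original|
    return bool(np.all(np.abs(perturbed - original) <= rel_tol * np.abs(original)))


def prediction_invariant(original_scores, perturbed_scores):
    """True if the highest-scoring output index is unchanged."""
    return int(np.argmax(original_scores)) == int(np.argmax(perturbed_scores))


small_shift = scores_invariant([0.80, 0.20], [0.81, 0.195])
large_shift = scores_invariant([0.80, 0.20], [0.60, 0.40])
same_label = prediction_invariant([0.80, 0.20], [0.60, 0.40])
print(small_shift, large_shift, same_label)
```

The second pair of scores moves by far more than 5%, yet the top-scoring index stays the same, which shows why a score-level check is stricter than a prediction-level one.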

Let us create a simple model that returns all ones.

In [7]:
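def simple_model(input_text):
    # Dummy model: ignores the text and returns an array of ones with
    # one row per input sentence and two output scores per row.
    out_len = 2
    inp_len = len(input_text)
    return np.ones((inp_len, out_len))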
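A constant model passes any invariance test trivially. As a slightly less trivial illustration, the sketch below follows the same calling convention as `simple_model`: it assumes the prediction function receives a list of strings and returns an array with one row of scores per input. `keyword_model` and its scoring rule are hypothetical and only meant to show the expected input and output shapes.

```python
import numpy as np


def keyword_model(input_text):
    """Toy two-output scorer: column 0 is high when the sentence contains 'call'."""
    scores = np.zeros((len(input_text), 2))
    for row, sentence in enumerate(input_text):
        is_call = "call" in sentence.lower().split()
        scores[row] = [0.9, 0.1] if is_call else [0.1, 0.9]
    return scores


out = keyword_model(["please call michael", "how's the weather in Austin"])
print(out.shape)
```

Because this toy model only looks at the word "call", swapping a name should leave its scores unchanged, so it would be expected to behave invariantly under name perturbations.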

Let us now run the test evaluation.

In [8]:
test_details = model_eval.evaluate(
    model_predict=simple_model
)

2022-12-07 13:18:07,155 — modelauditor.evaluation.evaluate — INFO — Sanity check for perturbed inputs and model prediction method - passed
2022-12-07 13:18:07,157 — modelauditor.evaluation.evaluate — INFO — Started model evaluation with perturbation type Names
2022-12-07 13:18:07,158 — modelauditor.evaluation.evaluate — INFO — Robust Accuracy: 100.0
2022-12-07 13:18:07,159 — modelauditor.evaluation.evaluate — INFO — Completed model evaluation with perturbation type Names


In [9]:
print(test_details)

Description: Expected behavior: Model scores are invariant to input perturbations within a relative tolerance of 5.0 %
Perturbation Type: Names
Robust accuracy: 1.0
Total perturbations: 15
Perturbed Input: please call michael, Original Input: please call michael, Output: [1. 1.], Result: 1, test_metric: 0.0, 
Perturbed Input: please call Paul, Original Input: please call michael, Output: [1. 1.], Result: 1, test_metric: 0.0, 
Perturbed Input: please call Jared, Original Input: please call michael, Output: [1. 1.], Result: 1, test_metric: 0.0, 
Perturbed Input: please call Anthony, Original Input: please call michael, Output: [1. 1.], Result: 1, test_metric: 0.0, 
Perturbed Input: please call Edward, Original Input: please call michael, Output: [1. 1.], Result: 1, test_metric: 0.0, 
Perturbed Input: please call Caleb, Original Input: please call michael, Output: [1. 1.], Result: 1, test_metric: 0.0, 
Perturbed Input: please call michael bolton, Original Input: please call michael bolton
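The log line reports robust accuracy as a percentage (100.0), while the printed summary reports it as a fraction (1.0). A minimal sketch of the metric, assuming it is the fraction of perturbed inputs whose `Result` is 1, is shown below. `robust_accuracy` is a hypothetical helper, not the library's function.

```python
def robust_accuracy(results):
    """Fraction of perturbed inputs that pass (result == 1)."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(results) / len(results)


all_pass = robust_accuracy([1] * 15)       # every perturbation passes
one_failure = robust_accuracy([1, 1, 0, 1])  # one of four fails
print(all_pass, one_failure)  # 1.0 0.75
```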

## Step 3: Set up a test suite and generate a report

In [10]:
test_suite = TestSuite(
    model_predict=simple_model,
    description='Test-suite for dummy model with perturbed names and locations'
)

In [11]:
perturbed_names = perturber.perturb_names()
test_1 = ModelTest(
    perturbed_dataset=perturbed_names,
    expected_behavior=invariant_behavior,
)
test_suite.add(test_1)

2022-12-07 13:18:07,187 — modelauditor.perturbations.text — INFO — Perturbing names of the dataset.
2022-12-07 13:18:07,189 — modelauditor.perturbations.text — INFO — Perturbed names in 3 out of 4 sentences in the dataset
2022-12-07 13:18:07,190 — modelauditor.evaluation.evaluate — INFO — Sanity check for perturbed inputs and model prediction method - passed


In [12]:
perturbed_location = perturber.perturb_location()
test_2 = ModelTest(
    perturbed_dataset=perturbed_location,
    expected_behavior=invariant_behavior,
)
test_suite.add(test_2)

2022-12-07 13:18:07,197 — modelauditor.perturbations.text — INFO — Perturbing locations of the dataset
2022-12-07 13:18:07,199 — modelauditor.perturbations.text — INFO — Perturbed locations in 1 out of 4 sentences in the dataset
2022-12-07 13:18:07,200 — modelauditor.evaluation.evaluate — INFO — Sanity check for perturbed inputs and model prediction method - passed


In [13]:
suite_summary = test_suite.evaluate()

2022-12-07 13:18:07,207 — modelauditor.evaluation.evaluate — INFO — Evaluating test suite with 2 tests.
2022-12-07 13:18:07,208 — modelauditor.evaluation.evaluate — INFO — Sanity check for perturbed inputs and model prediction method - passed
2022-12-07 13:18:07,209 — modelauditor.evaluation.evaluate — INFO — Started model evaluation with perturbation type Names
2022-12-07 13:18:07,210 — modelauditor.evaluation.evaluate — INFO — Robust Accuracy: 100.0
2022-12-07 13:18:07,211 — modelauditor.evaluation.evaluate — INFO — Completed model evaluation with perturbation type Names
2022-12-07 13:18:07,211 — modelauditor.evaluation.evaluate — INFO — Sanity check for perturbed inputs and model prediction method - passed
2022-12-07 13:18:07,212 — modelauditor.evaluation.evaluate — INFO — Started model evaluation with perturbation type Locations
2022-12-07 13:18:07,213 — modelauditor.evaluation.evaluate — INFO — Robust Accuracy: 100.0
2022-12-07 13:18:07,214 — modelauditor.evaluation.evaluate — INF

### Generate HTML report

In [14]:
test_suite.generate_html_report(
    suite_summary=suite_summary,
    model_name='simple model',
    save_dir='/tmp/simple_model_robustness/'
)

2022-12-07 13:18:07,248 — modelauditor.evaluation.evaluate — INFO — Report generated at: /private/tmp/simple_model_robustness/robustness_report_simple model.html
