In [1]:
import numpy as np
import checklist
from checklist.test_suite import TestSuite
import logging
logging.basicConfig(level=logging.ERROR)

In [2]:
from transformers import pipeline

In [4]:
model = pipeline("question-answering", model="./model-qa-trained/", device=0)

In [5]:
suite_path = '../../../checklist/release_data/squad/squad_suite.pkl'
suite = TestSuite.from_file(suite_path)

In [6]:
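def predconfs(context_question_pairs):
    """Run the QA pipeline on (context, question) pairs; return answers and confidences."""
    preds = []
    confs = []
    for c, q in context_question_pairs:
        try:
            p = model(question=q, context=c, truncation=True)
        except Exception:
            # Record a blank answer with confidence 1 and move on; without `continue`,
            # `p` below would be undefined (or stale) and an extra prediction would be added.
            print('Failed', q)
            preds.append(' ')
            confs.append(1)
            continue
        preds.append(p['answer'])
        confs.append(p['score'])
    return preds, np.array(confs)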
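`suite.run` calls the prediction function with a list of `(context, question)` tuples and expects one prediction and one confidence per input. The sketch below exercises that contract without a GPU by swapping the transformers pipeline for a hypothetical stub, `fake_qa`, that mimics only the `'answer'` and `'score'` keys of the pipeline output; the `make_predconfs` factory is likewise illustrative and not part of CheckList.

```python
import numpy as np

def fake_qa(question, context, **kwargs):
    """Hypothetical stand-in for the QA pipeline: answers with the first word of the context."""
    if not context:
        raise ValueError("empty context")
    return {"answer": context.split()[0], "score": 0.5}

def make_predconfs(qa_fn):
    """Build a CheckList-style prediction function around any QA callable."""
    def predconfs(context_question_pairs):
        preds, confs = [], []
        for c, q in context_question_pairs:
            try:
                p = qa_fn(question=q, context=c, truncation=True)
            except Exception:
                # Fallback: blank answer, then skip the lookups below.
                preds.append(" ")
                confs.append(1.0)
                continue
            preds.append(p["answer"])
            confs.append(p["score"])
        return preds, np.array(confs)
    return predconfs

predconfs_stub = make_predconfs(fake_qa)
preds, confs = predconfs_stub([
    ("Taylor is greater than Alexis.", "Who is less great?"),
    ("", "Who is taller?"),  # empty context forces the fallback branch
])
print(preds)
print(confs.shape)
```

Keeping `continue` in the `except` branch guarantees exactly one prediction per input, which is what lets CheckList line predictions up with test cases.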

In [7]:
suite.run(predconfs, overwrite=True, n=100)  # run at most 100 sampled test cases per test, for quicker testing

Running A is COMP than B. Who is more / less COMP?
Predicting 200 examples
Running Intensifiers (very, super, extremely) and reducers (somewhat, kinda, etc)?
Predicting 1200 examples
Running size, shape, age, color
Predicting 400 examples
Running Profession vs nationality
Predicting 1000 examples
Running Animal vs Vehicle
Predicting 400 examples
Running Animal vs Vehicle v2
Predicting 400 examples
Running Synonyms
Predicting 400 examples
Running A is COMP than B. Who is antonym(COMP)? B
Predicting 400 examples
Running A is more X than B. Who is more antonym(X)? B. Who is less X? B. Who is more X? A. Who is less antonym(X)? A.
Predicting 1600 examples
Running Question typo
Predicting 200 examples
Running Question contractions
Predicting 201 examples
Running Add random sentence to context
Predicting 300 examples
Running Change name everywhere
Predicting 1100 examples
Running Change location everywhere
Predicting 1100 examples
Running There was a change in profession
Predicting 200 exampl

In [8]:
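def format_squad_with_context(x, pred, conf, label=None, *args, **kwargs):
    # Render one example as Context / Question / expected Answer (if any) / Prediction.
    # `conf` is part of the callback signature but is not printed here.
    c, q = x
    ret = 'C: %s\nQ: %s\n' % (c, q)
    if label is not None:
        ret += 'A: %s\n' % label
    ret += 'P: %s\n' % pred
    return ret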
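The formatter receives each test input `x` as a `(context, question)` tuple, plus the prediction, its confidence, and the label when the test has one. A small variant, `format_squad_with_conf` (hypothetical, not part of CheckList), also shows the confidence; it is applied here to one of the comparison failures from this run, with a placeholder confidence value.

```python
def format_squad_with_conf(x, pred, conf, label=None, *args, **kwargs):
    """Hypothetical variant of format_squad_with_context that also shows confidence."""
    c, q = x  # each SQuAD test input is a (context, question) tuple
    ret = 'C: %s\nQ: %s\n' % (c, q)
    if label is not None:
        ret += 'A: %s\n' % label  # expected answer, only for tests that have labels
    ret += 'P: %s (conf %.2f)\n' % (pred, conf)
    return ret

example = ("Taylor is greater than Alexis.", "Who is less great?")
# 0.5 is a placeholder confidence for illustration, not a value from this run.
print(format_squad_with_conf(example, "Taylor", 0.5, label="Alexis"))
```

It plugs into the same hook: `suite.summary(format_example_fn=format_squad_with_conf)`.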

In [9]:
suite.summary(format_example_fn=format_squad_with_context)

Vocabulary

A is COMP than B. Who is more / less COMP?
Test cases:      494
Test cases run:  100
Fails (rate):    99 (99.0%)

Example fails:
C: Taylor is greater than Alexis.
Q: Who is less great?
A: Alexis
P: Taylor


----
C: Kimberly is taller than Steven.
Q: Who is less tall?
A: Steven
P: Kimberly


----
C: Amber is cleaner than Abigail.
Q: Who is less clean?
A: Abigail
P: Amber


----


Intensifiers (very, super, extremely) and reducers (somewhat, kinda, etc)?
Test cases:      497
Test cases run:  100
Fails (rate):    100 (100.0%)

Example fails:
C: Dylan is slightly particular about the project. Patrick is particular about the project.
Q: Who is least particular about the project?
A: Dylan
P: Patrick

C: Patrick is super particular about the project. Dylan is particular about the project.
Q: Who is most particular about the project?
A: Patrick
P: Dylan

C: Patrick is particular about the project. Dylan is slightly particular about the project.
Q: Who is most particular about the p
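In the three comparison failures shown, the model answers with the first-mentioned name even though the question asks who is *less* COMP. The log also shows that 100 test cases of this test needed 200 predictions, i.e. two questions per context. The sketch below generates cases of that shape from a template in plain Python (not CheckList's own `Editor`) and scores a hypothetical first-name baseline; the name and adjective lists are illustrative, and it assumes, as we understand CheckList's default to be, that a test case fails when any of its examples fails.

```python
from itertools import product

# Hypothetical fillers; the real suite draws names and adjectives from its own lexicons.
NAMES = [("Taylor", "Alexis"), ("Kimberly", "Steven")]
ADJS = [("great", "greater"), ("tall", "taller")]

def make_cases():
    """Each test case: one context and two ((context, question), expected answer) examples."""
    cases = []
    for (a, b), (adj, comp) in product(NAMES, ADJS):
        context = f"{a} is {comp} than {b}."
        cases.append([((context, f"Who is more {adj}?"), a),
                      ((context, f"Who is less {adj}?"), b)])
    return cases

def first_name_baseline(pairs):
    """Hypothetical model that always answers with the first word of the context."""
    return [c.split()[0] for c, _ in pairs]

cases = make_cases()
failed_cases = 0
failed_examples = 0
for case in cases:
    inputs = [x for x, _ in case]
    labels = [y for _, y in case]
    preds = first_name_baseline(inputs)
    wrong = sum(p != y for p, y in zip(preds, labels))
    failed_examples += wrong
    failed_cases += wrong > 0  # a case fails if any of its examples fails

print(f"Test cases: {len(cases)}")
print(f"Fails (rate): {failed_cases} ({100 * failed_cases / len(cases):.1f}%)")
print(f"Failed examples: {failed_examples} of {2 * len(cases)}")
```

A model that ignores the comparative gets every "more" question right and every "less" question wrong, so it fails only half the examples but every test case, which is why case-level fail rates for this test can approach 100%.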

In [10]:
suite.visual_summary_table()

Please wait as we prepare the table data...



In [11]:
test = suite.tests['Question typo']
test.run(predconfs, overwrite=True)

Predicting 1000 examples
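"Question typo" is an invariance test: each test case pairs an original question with a perturbed copy, which is why the 500 test cases reported in the summary required 1000 predictions, and a case fails when the answer changes. A minimal sketch of that check follows, using a hypothetical `add_typo` that swaps two adjacent characters (similar in spirit to CheckList's `Perturb.add_typos`, but not its implementation), a hypothetical `invariance_fails` helper, and a toy keyword model.

```python
import random

def add_typo(text, rng):
    """Swap two adjacent characters at a random position (hypothetical perturbation)."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def invariance_fails(predict, context_question_pairs, seed=0):
    """Return (context, question, typo_question) triples whose predicted answer changed."""
    rng = random.Random(seed)
    originals = list(context_question_pairs)
    perturbed = [(c, add_typo(q, rng)) for c, q in originals]
    # One batched call over originals + perturbed copies: 2 predictions per test case.
    preds = predict(originals + perturbed)
    n = len(originals)
    return [
        (c, q, pq)
        for (c, q), (_, pq), p0, p1 in zip(originals, perturbed, preds[:n], preds[n:])
        if p0 != p1
    ]

def keyword_model(pairs):
    """Hypothetical toy model: answers 'Paris' only if the question contains 'capital'."""
    return ["Paris" if "capital" in q else "unknown" for _, q in pairs]

pairs = [("Paris is the capital of France.", "What is the capital of France?")]
print(invariance_fails(keyword_model, pairs))
```

Whether the toy model fails depends on where the typo lands. To run the check with this notebook's model, pass `lambda pairs: predconfs(pairs)[0]`, since `predconfs` returns a `(preds, confs)` tuple.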


In [12]:
test.summary()

Test cases:      500
Fails (rate):    125 (25.0%)

Example fails:
Daily Mail newspaper ('In regard to companies, the Court of Justice held in R (Daily Mail and General Trust plc) v HM Treasury that member states could restrict a company moving its seat of business, without infringing TFEU article 49. This meant the Daily Mail newspaper\'s parent company could not evade tax by shifting its residence to the Netherlands without first settling its tax bills in the UK. The UK did not need to justify its action, as rules on company seats were not yet harmonised. By contrast, in Centros Ltd v Erhversus-og Selkabssyrelsen the Court of Justice found that a UK limited company operating in Denmark could not be required to comply with Denmark\'s minimum share capital rules. UK law only required £1 of capital to start a company, while Denmark\'s legislature took the view companies should only be started up if they had 200,000 Danish krone (around €27,000) to protect creditors if the company failed 