# PyPremise Example: Misclassifications

PyPremise helps identify patterns that explain where a machine learning classifier performs well and where it fails. As input, it needs a list of texts, each paired with a label saying whether the classifier succeeded or failed on it.

In this example, we analyze the performance of a Visual Question Answering (VQA) model. The model receives an image and a question and tries to answer the question based on the image. We simulate some questions the model answered correctly and some it answered incorrectly. The goal is to find patterns that explain when the model tends to make mistakes.

We have a small dataset of questions about images, as in a VQA task. We assume that the questions in `questions_group_1` were answered **correctly** and the questions in `questions_group_0` were **misclassified**.

```python
# Questions the model got wrong (group 0)
questions_group_0 = ["How many ducks are there",
                     "How many roosters are in the puddle",
                     "How many ducks do you see",
                     "How many ducks are there",
                     "How many chickens are crossing the road",
                     "What are the ducks eating",
                     "How many ducks does one need",
                     "How many ducks and chickens are there",
                     "Are there any ducks",
                     "Where is the rooster looking at",
                     "How many chickens are there",
                     "How many roosters can you see"
                    ]

# Questions the model got right (group 1)
questions_group_1 = ["When was the photo taken",
                     "Are there many ducks playing",
                     "When was the photograph taken",
                     "When was this photo taken",
                     "When did the photographer take this photograph",
                     "Do you see any ducks",
                     "Can you see a rooster in the picture",
                     "Can you see ducks in the photograph",
                     "When do you think was the photo taken",
                     "When was the photo with the ducks taken",
                     "When was the photograph taken",
                     "When was the photograph taken where one can see the rooster"
                    ]
```


We first tokenize the texts into lists of words. This gives us the following:

```python
# simple whitespace tokenizer
def tokenizer(texts):
    return [text.split() for text in texts]

tok_group_0_questions = tokenizer(questions_group_0)
tok_group_1_questions = tokenizer(questions_group_1)
print(tok_group_0_questions)
```

```
[['How', 'many', 'ducks', 'are', 'there'], ['How', 'many', 'roosters', 'are', 'in', 'the', 'puddle'], ['How', 'many', 'ducks', 'do', 'you', 'see'], ['How', 'many', 'ducks', 'are', 'there'], ['How', 'many', 'chickens', 'are', 'crossing', 'the', 'road'], ['What', 'are', 'the', 'ducks', 'eating'], ['How', 'many', 'ducks', 'does', 'one', 'need'], ['How', 'many', 'ducks', 'and', 'chickens', 'are', 'there'], ['Are', 'there', 'any', 'ducks'], ['Where', 'is', 'the', 'rooster', 'looking', 'at'], ['How', 'many', 'chickens', 'are', 'there'], ['How', 'many', 'roosters', 'can', 'you', 'see']]
```
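The whitespace tokenizer is case-sensitive, so *Are* (in "Are there any ducks") and *are* (in "How many ducks are there") become two different tokens. Whether that matters depends on the analysis. If it does, lowercasing before splitting is a small change. A minimal sketch, where `lowercase_tokenizer` and `vocabulary` are hypothetical helpers and not part of PyPremise:

```python
def tokenizer(texts):
    """Whitespace tokenizer, as in the cell above (case-sensitive)."""
    return [text.split() for text in texts]

def lowercase_tokenizer(texts):
    """Hypothetical variant: lowercase each text, then split on whitespace."""
    return [text.lower().split() for text in texts]

def vocabulary(token_lists):
    """Set of distinct tokens across all token lists."""
    return {token for tokens in token_lists for token in tokens}

questions = ["Are there any ducks", "How many ducks are there"]

print(sorted(vocabulary(tokenizer(questions))))
# ['Are', 'How', 'any', 'are', 'ducks', 'many', 'there']
print(sorted(vocabulary(lowercase_tokenizer(questions))))
# ['any', 'are', 'ducks', 'how', 'many', 'there']
```

The case-sensitive version yields seven distinct tokens for these two questions, and the lowercased version yields six.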


We then use the function `from_token_lists` to convert these into `PremiseInstance` objects, which PyPremise can work with:

```python
from pypremise.data_loaders import from_token_lists

premise_instances, voc_token_to_index, voc_index_to_token = from_token_lists(
    token_lists_group_0=tok_group_0_questions,
    token_lists_group_1=tok_group_1_questions
)
```


This function takes two separate lists of tokenized texts, one for the misclassified examples (label 0) and one for the correctly classified ones (label 1). It merges them internally and returns the instances for PyPremise together with two vocabulary mappings: `voc_token_to_index` (token to index) and `voc_index_to_token` (index to token).
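To make the merge step more concrete, here is a plain-Python illustration of how two groups can be combined into one labeled dataset with a vocabulary. This is a sketch under assumptions, not PyPremise's implementation: `merge_groups` is a hypothetical helper, and each instance is represented as a `(sorted token indices, label)` tuple that merely stands in for a `PremiseInstance`, whose real internals may differ.

```python
def merge_groups(token_lists_group_0, token_lists_group_1):
    """Hypothetical helper: merge two groups of token lists into labeled
    instances (group 0 -> label 0, group 1 -> label 1) plus vocabularies."""
    token_lists = list(token_lists_group_0) + list(token_lists_group_1)
    labels = [0] * len(token_lists_group_0) + [1] * len(token_lists_group_1)

    # Give each new token the next free index, in order of first appearance.
    voc_token_to_index = {}
    for tokens in token_lists:
        for token in tokens:
            if token not in voc_token_to_index:
                voc_token_to_index[token] = len(voc_token_to_index)
    voc_index_to_token = {index: token for token, index in voc_token_to_index.items()}

    # Stand-in instance (assumption): which tokens occur, ignoring repeats, plus the label.
    instances = [
        (sorted({voc_token_to_index[token] for token in tokens}), label)
        for tokens, label in zip(token_lists, labels)
    ]
    return instances, voc_token_to_index, voc_index_to_token


group_0 = [["How", "many", "ducks"]]
group_1 = [["When", "was", "it", "taken"]]
instances, token_to_index, index_to_token = merge_groups(group_0, group_1)
print(instances)          # [([0, 1, 2], 0), ([3, 4, 5, 6], 1)]
print(index_to_token[3])  # When
```

The index-to-token mapping is what turns token indices back into readable words, which is presumably why `Premise` is constructed with `voc_index_to_token`.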

We can now run PyPremise on the instances to find patterns that differentiate between successful and unsuccessful classifications:

```python
from pypremise import Premise

premise = Premise(voc_index_to_token=voc_index_to_token)
patterns = premise.find_patterns(premise_instances)

for p in patterns:
    print(p)
```


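```
(How) and (many) towards group 0 (Instances: 9 in group 0, 0 in group 1)
(was) and (taken) and (When) towards group 1 (Instances: 0 in group 0, 7 in group 1)
```

PyPremise finds two patterns: all 9 questions containing both *How* and *many* were misclassified, and all 7 questions containing *When*, *was*, and *taken* were answered correctly. In other words, the model tends to fail on counting questions ("How many ...") and to succeed on questions about when a photo was taken.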
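The instance counts in this output can be checked directly against the data. The sketch below is independent of PyPremise and assumes that a question matches a pattern when it contains every token of the pattern; `count_pattern` is a hypothetical helper that counts matches per group.

```python
def count_pattern(pattern, token_lists_group_0, token_lists_group_1):
    """Count the token lists in each group that contain every token of `pattern`."""
    required = set(pattern)
    in_group_0 = sum(1 for tokens in token_lists_group_0 if required <= set(tokens))
    in_group_1 = sum(1 for tokens in token_lists_group_1 if required <= set(tokens))
    return in_group_0, in_group_1


questions_group_0 = [
    "How many ducks are there", "How many roosters are in the puddle",
    "How many ducks do you see", "How many ducks are there",
    "How many chickens are crossing the road", "What are the ducks eating",
    "How many ducks does one need", "How many ducks and chickens are there",
    "Are there any ducks", "Where is the rooster looking at",
    "How many chickens are there", "How many roosters can you see",
]
questions_group_1 = [
    "When was the photo taken", "Are there many ducks playing",
    "When was the photograph taken", "When was this photo taken",
    "When did the photographer take this photograph", "Do you see any ducks",
    "Can you see a rooster in the picture", "Can you see ducks in the photograph",
    "When do you think was the photo taken", "When was the photo with the ducks taken",
    "When was the photograph taken",
    "When was the photograph taken where one can see the rooster",
]
tok_group_0 = [q.split() for q in questions_group_0]
tok_group_1 = [q.split() for q in questions_group_1]

print(count_pattern(["How", "many"], tok_group_0, tok_group_1))           # (9, 0)
print(count_pattern(["was", "taken", "When"], tok_group_0, tok_group_1))  # (0, 7)
```

Both tuples match the instance counts PyPremise reports above.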


Alternatively, if your data is not split into two groups but comes as a single list of texts, each annotated with a success/failure label, you can use `from_token_lists_and_labels`.

For example:

```python
question_lists = ["When was the photograph taken",
                  "Where is the rooster looking at",
                  "Are there many ducks playing",
                  "When was the photo taken",
                  "When was this photo taken",
                  "How many ducks are there",
                  "How many chickens are there",
                  "When was the photograph taken where one can see the rooster",
                  "Can you see ducks in the photograph",
                  "When was the photograph taken",
                  "What are the ducks eating",
                  "How many ducks and chickens are there",
                  "Are there any ducks",
                  "How many ducks does one need",
                  "How many chickens are crossing the road",
                  "Do you see any ducks",
                  "How many ducks do you see",
                  "When did the photographer take this photograph",
                  "How many ducks are there",
                  "When do you think was the photo taken",
                  "Can you see a rooster in the picture",
                  "When was the photo with the ducks taken",
                  "How many roosters are in the puddle",
                  "How many roosters can you see"]

# 0 = failure, 1 = success
labels = [1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0]

token_lists = tokenizer(question_lists)
```

We then create the instances with `from_token_lists_and_labels`:

```python
from pypremise.data_loaders import from_token_lists_and_labels

instances_alt, voc_token_to_index_alt, voc_index_to_token_alt = from_token_lists_and_labels(
    token_lists, labels
)
```
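Both loaders receive the same information in different shapes. When switching from one format to the other, one way to catch labeling mistakes is to split the labeled list back into two groups and compare them, as multisets, with the grouped version. A minimal sketch; `split_by_label` is a hypothetical helper, not part of PyPremise:

```python
from collections import Counter

def split_by_label(texts, labels):
    """Hypothetical helper: split parallel lists of texts and 0/1 labels
    into (group_0, group_1), keeping the original order within each group."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    if any(label not in (0, 1) for label in labels):
        raise ValueError("labels must be 0 (failure) or 1 (success)")
    group_0 = [text for text, label in zip(texts, labels) if label == 0]
    group_1 = [text for text, label in zip(texts, labels) if label == 1]
    return group_0, group_1


texts = ["How many ducks are there", "When was the photo taken", "Are there any ducks"]
labels = [0, 1, 0]
group_0, group_1 = split_by_label(texts, labels)
print(group_0)  # ['How many ducks are there', 'Are there any ducks']
print(group_1)  # ['When was the photo taken']

# Order can differ between the two formats, so compare as multisets.
print(Counter(group_0) == Counter(["Are there any ducks", "How many ducks are there"]))  # True
```

Applied to `question_lists` and `labels` from above, the two resulting groups contain the same questions as `questions_group_0` and `questions_group_1` (duplicates included), so both loaders are given the same data.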