In [1]:
# set up the notebook

%load_ext autoreload
%autoreload 2

import sys
import os
import logging
logging.basicConfig(level=logging.ERROR)

# Preparation

## NLP Model 

Load a hate speech classification model via the HuggingFace `pipeline` API. Please read through [this tutorial](https://huggingface.co/docs/transformers/pipeline_tutorial) to learn more about pipelines; you only need to read up to the "Audio Pipeline" section. In a nutshell, you will:

> Pick an appropriate model, then load it with the corresponding `AutoModelFor...` and `AutoTokenizer` classes.


**Please do so by changing the next code block.**

### Tips on picking tasks and models:

1. [This tutorial](https://github.com/huggingface/notebooks/blob/main/transformers_doc/en/quicktour.ipynb) has a more comprehensive introduction on HuggingFace.
2. If you are unsure about the tasks / datasets, [HuggingFace models](https://huggingface.co/models) have useful tags on the left that can be used as filters.
3. Please pick a task that has a supported pipeline structure. [This page has all the pipelines and examples](https://huggingface.co/docs/transformers/main_classes/pipelines#natural-language-processing).
4. Please pick models that are:
    - Well-documented (e.g., describes how it was fine-tuned, reports its performance, has usage instructions, has an inference API). These models are usually less buggy. For example, [Distilbert-SST-2](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) is quite well-documented.
    - Small if you do not have a GPU (smaller models are usually less accurate but return predictions more quickly). You can check the size of the `pytorch_model.bin` file (e.g., [here](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english/tree/main)).


In [None]:
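from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Select a model through the Transformers pipeline.
# Hate speech classification is served by the "text-classification" pipeline task.
task_name = "text-classification"
model_name = "MODEL_NAME"  # replace with a model ID from the HuggingFace Hub
# If you have a GPU, you can change this to 0 or another device index
device = -1

# Load the tokenizer and the classification model, then pass both to the pipeline.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
pipe = pipeline(
    task=task_name,
    model=model,
    tokenizer=tokenizer,
    device=device)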

## Load Dataset

Find a dataset to use as your base dataset for further data slicing and perturbations, via [HuggingFace Datasets](https://huggingface.co/docs/datasets/index). See [the tutorial here](https://huggingface.co/docs/datasets/tutorial).

In [None]:
from datasets import load_dataset

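dataset_name = "DATASET_NAME"  # replace with a dataset ID from the HuggingFace Hub
# Use a split that exists for your dataset (commonly "train", "validation", or "test").
split = "validation"
dataset = load_dataset(dataset_name, split=split)
# Inspect the first example to see the available fields
dataset[0]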
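
As a rough illustration of what slicing and perturbing the base dataset can look like, the sketch below works on plain Python dictionaries. The `text` field name, the sample rows, and the helper names `slice_by_keyword` and `swap_adjacent_chars` are assumptions made for illustration; check your dataset's actual column names (for example via `dataset.column_names`) before adapting it.

```python
import random

def slice_by_keyword(rows, keyword, text_field="text"):
    """Return the rows whose text contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [row for row in rows if keyword in row[text_field].lower()]

def swap_adjacent_chars(text, rng):
    """Perturb text with one typo by swapping two adjacent characters."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

# Hypothetical rows standing in for dataset examples.
rows = [
    {"text": "I love my neighbors.", "label": 0},
    {"text": "What a lovely day!", "label": 0},
    {"text": "The bus was late again.", "label": 0},
]

rng = random.Random(0)  # fixed seed so the perturbation is reproducible
love_slice = slice_by_keyword(rows, "love")
perturbed = [swap_adjacent_chars(row["text"], rng) for row in love_slice]
```

With a HuggingFace `Dataset`, `dataset.filter(...)` can play the same role as the slicing helper.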

# Write Tests

Assume you are an AI engineer building this model. Your task is to come up with multiple tests to assess the model and figure out when it works well and when it (potentially) contains bugs. Proceed with the mindset that the tests you create to evaluate this model will also be used to test future models before they go into production!

Your completed test suite will go through peer review, and your fellow students will rate the severity of each test's result, from 'not a bug' to 'very severe bug'. As mentioned in the README, more severe bugs will give you a higher grade :)

From this point, you will:
1. Learn to use CheckList. You should check out [its readme and tutorials here](https://github.com/marcotcr/checklist).
2. Write tests for your selected model. Try to write:
    - At least 10 tests
    - Tests that can expose more severe bugs
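
To make the idea of a behavioral test concrete, here is a minimal sketch that is independent of CheckList's API: it expands a template into test sentences and reports the fraction the model gets wrong. The helper names `expand_template` and `failure_rate`, the label strings, and the keyword-based `toy_predict` stand-in are all hypothetical. CheckList's `Editor` templates and its `MFT` test type cover the same idea with much more tooling.

```python
from itertools import product

def expand_template(template, **fill_ins):
    """Generate one sentence per combination of fill-in values."""
    keys = list(fill_ins)
    return [
        template.format(**dict(zip(keys, combo)))
        for combo in product(*(fill_ins[k] for k in keys))
    ]

def failure_rate(predict, sentences, expected_label):
    """Fraction of sentences whose predicted label differs from the expected one."""
    predictions = predict(sentences)
    failures = sum(1 for p in predictions if p != expected_label)
    return failures / len(sentences)

# Hypothetical stand-in for a model: flags any sentence containing "hate".
def toy_predict(sentences):
    return ["hate" if "hate" in s.lower() else "not_hate" for s in sentences]

sentences = expand_template(
    "I {verb} spending time with my {relation}.",
    verb=["love", "enjoy", "hate"],
    relation=["sister", "neighbor"],
)
# None of these sentences target a group, so "not_hate" is the expected label.
rate = failure_rate(toy_predict, sentences, expected_label="not_hate")
```

Here the toy model fails on the two sentences that merely contain the word "hate", which is the kind of keyword over-reliance a test suite should surface.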

In [2]:
import checklist
from checklist.test_suite import TestSuite
# create a test suite
suite = TestSuite()

In [7]:
## Use the following blocks (and add more) to create your tests after reading the CheckList tutorials.
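
Before running tests, the model's predictions usually need to be shaped into one row of class probabilities per input, in a fixed label order. The sketch below shows one way to do that, assuming the pipeline returns, for each input, a list of `{'label', 'score'}` dictionaries. The helper name `scores_to_probs`, the label strings, and the sample scores are made up for illustration; check `model.config.id2label` for your model's real labels.

```python
def scores_to_probs(outputs, label_order):
    """Convert per-input lists of {'label', 'score'} dicts into rows of
    probabilities arranged in a fixed label order."""
    rows = []
    for per_input in outputs:
        by_label = {item["label"]: item["score"] for item in per_input}
        rows.append([by_label[label] for label in label_order])
    return rows

# Hypothetical pipeline output for two inputs (labels and scores are made up).
fake_outputs = [
    [{"label": "HATE", "score": 0.9}, {"label": "NOT_HATE", "score": 0.1}],
    [{"label": "NOT_HATE", "score": 0.7}, {"label": "HATE", "score": 0.3}],
]
probs = scores_to_probs(fake_outputs, label_order=["NOT_HATE", "HATE"])
# probs == [[0.1, 0.9], [0.7, 0.3]]
```

With the real pipeline, a function along the lines of `lambda texts: scores_to_probs(pipe(texts, top_k=None), label_order)` could then be handed to `PredictorWrapper.wrap_softmax` from `checklist.pred_wrapper`, and the wrapped function passed to `suite.run`. This assumes a recent version of `transformers` in which the text classification pipeline accepts `top_k=None` to return scores for every label.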

# Summarize and Save the Result

If any tests are missing, go back, re-run them, and add them to the suite. Once you are done, run the command below to start rating your tests:


In [None]:
# see the results
suite.visual_summary_table()

In [None]:
# save the result
suite.save("./A1-Suite.pkl")

In [None]:
# make sure the results were saved correctly!
import checklist
from checklist.test_suite import TestSuite
# reload the saved test suite from disk
suite2 = TestSuite.from_file("./A1-Suite.pkl")
suite2.visual_summary_table()