# Kili Tutorial: How to leverage counterfactually augmented data to build a more robust model

This recipe is inspired by the paper *Learning the Difference that Makes a Difference with Counterfactually-Augmented Data*, available on [arXiv](https://arxiv.org/abs/1909.12434).

In this study, the authors point out that machine learning models often struggle to generalize because the decision rules they learn rely on 'spurious patterns' and miss the key elements that most affect the class of a text. To remove these confounding factors, they revise each asset so that its label changes while as few words as possible are modified, which makes the **keywords** that actually determine the label much easier for the model to spot.
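
To make the idea concrete, here is a small sketch that compares an original review with a counterfactually revised one and lists the word-level edits between them. The two sentences are made up for illustration (they are not taken from the paper's dataset), and `word_edits` is a hypothetical helper built on Python's standard `difflib` module.

```python
import difflib

def word_edits(original, revised):
    """Return the word-level edits that turn `original` into `revised`."""
    a, b = original.split(), revised.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Keep only the spans that differ between the two reviews
            edits.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits

# Invented example pair: same structure, opposite sentiment
original = "The plot was dull and the acting was terrible"
revised = "The plot was gripping and the acting was superb"

for tag, old, new in word_edits(original, revised):
    print(f"{tag}: {old!r} -> {new!r}")
```

Only two words change between the two versions, and those are exactly the words that carry the sentiment, which is the signal a model trained on such pairs is encouraged to pick up.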

In this tutorial, we'll see:
1. How to create a project in Kili, for both the [IMDB](#Data-Augmentation-on-IMDB-dataset) and [SNLI](#Data-Augmentation-on-SNLI-dataset) datasets, to reproduce such a data-augmentation task, with the aim of improving our model and decreasing its variance when it is used in production on unseen data.
2. How to [reproduce the results of the paper](#Reproducing-the-results), using similar models, to show how valuable this technique can be for a text-classification task.

We'll use the data of the study, both IMDB and Stanford NLI, publicly available [here](https://github.com/acmi-lab/counterfactually-augmented-data).

For an overview of Kili, visit the [website](https://kili-technology.com). You can also check out the Kili [documentation](https://cloud.kili-technology.com/docs) or some of the other recipes.


![data augmentation](https://raw.githubusercontent.com/acmi-lab/counterfactually-augmented-data/master/data_collection_pipeline.png)

In [7]:
# Authentication
import os

# !pip install kili # uncomment if you don't have kili installed already
from kili.client import Kili

api_endpoint = os.getenv('KILI_API_ENDPOINT') 
# If you use Kili SaaS, use the url 'https://cloud.kili-technology.com/api/label/v2/graphql'

kili = Kili(api_endpoint=api_endpoint)
user_id = kili.auth.user_id
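
If `KILI_API_ENDPOINT` is not set, `os.getenv` returns `None`. One possible way to handle that, sketched below, is to fall back to the SaaS URL quoted in the comment above. `resolve_api_endpoint` is a hypothetical helper, not part of the Kili client.

```python
import os

# URL quoted in the tutorial for Kili SaaS users
SAAS_ENDPOINT = "https://cloud.kili-technology.com/api/label/v2/graphql"

def resolve_api_endpoint(env=os.environ):
    """Return KILI_API_ENDPOINT if it is set and non-empty, else the SaaS URL."""
    return env.get("KILI_API_ENDPOINT") or SAAS_ENDPOINT

api_endpoint = resolve_api_endpoint()
print(f"Using Kili endpoint: {api_endpoint}")
```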

## Data Augmentation on IMDB dataset

The dataset consists of film reviews classified as positive or negative. The performance of state-of-the-art models is often measured against this dataset, making it a reference benchmark.

This is what our task looks like on Kili, split into two projects, one for each direction: Positive to Negative and Negative to Positive.

### Creating the projects

In [32]:
taskname = "NEW_REVIEW"
project_imdb_negative_to_positive = {
'title': 'Counterfactual data-augmentation - Negative to Positive',
'description': 'IMDB Sentiment Analysis',
'instructions': 'https://docs.google.com/document/d/1zhNaQrncBKc3aPKcnNa_mNpXlria28Ij7bfgUvJbyfw/edit?usp=sharing',
'input_type': 'TEXT',
'json_interface':{
    "filetype": "TEXT",
    "jobs": {
        taskname : {
            "mlTask": "TRANSCRIPTION",
            "content": {
                "input": None
            },
            "required": 1,
            "isChild": False,
            "instruction": "Write here the new review modified to be POSITIVE. Please refer to the instructions above before starting"
        }
    }
}
}
project_imdb_positive_to_negative = {
'title': 'Counterfactual data-augmentation - Positive to Negative',
'description': 'IMDB Sentiment Analysis',
'instructions': 'https://docs.google.com/document/d/1zhNaQrncBKc3aPKcnNa_mNpXlria28Ij7bfgUvJbyfw/edit?usp=sharing',
'input_type': 'TEXT',
'json_interface':{
    "jobs": {
        taskname : {
            "mlTask": "TRANSCRIPTION",
            "content": {
                "input": None
            },
            "required": 1,
            "isChild": False,
            "instruction": "Write here the new review modified to be NEGATIVE. Please refer to the instructions above before starting"
        }
    }
}
}

In [33]:
for project_imdb in [project_imdb_positive_to_negative, project_imdb_negative_to_positive]:
    # Create the project in Kili and store its id alongside its configuration
    project_imdb['id'] = kili.create_project(
        title=project_imdb['title'],
        instructions=project_imdb['instructions'],
        description=project_imdb['description'],
        input_type=project_imdb['input_type'],
        json_interface=project_imdb['json_interface'])['id']
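
The two project definitions differ only in their title and in the instruction shown to the labeler. As a sketch, the shared interface could be produced by a small factory function. `build_transcription_interface` is a hypothetical helper that only builds a plain dictionary with the same keys as the `jobs` entry above; it does not call the Kili API.

```python
def build_transcription_interface(taskname, instruction):
    """Build a json_interface dict with a single required transcription job."""
    return {
        "jobs": {
            taskname: {
                "mlTask": "TRANSCRIPTION",
                "content": {"input": None},
                "required": 1,
                "isChild": False,
                "instruction": instruction,
            }
        }
    }

interface = build_transcription_interface(
    "NEW_REVIEW",
    "Write here the new review modified to be POSITIVE. "
    "Please refer to the instructions above before starting",
)
print(interface["jobs"]["NEW_REVIEW"]["instruction"])
```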

We'll define a couple of helper functions to improve readability:

In [28]:
def create_assets(dataframe, intro, objective, instructions, truth_label, target_label):
    # Build one text asset per row: the task description first, then the sentences to revise
    return (intro + dataframe[truth_label] + objective + dataframe[target_label]
            + instructions + dataframe['Text']).tolist()

def create_json_responses(taskname, df, field="Text"):
    # Build one transcription response per row, taking the text from column `field`
    return [{taskname: {"text": text}} for text in df[field]]

### Importing the data into Kili

In [34]:
import pandas as pd
datasets = ['dev','train','test']

for dataset in datasets :
    url = f'https://raw.githubusercontent.com/acmi-lab/counterfactually-augmented-data/master/sentiment/combined/paired/{dataset}_paired.tsv'
    df = pd.read_csv(url, on_bad_lines='warn', sep='\t')  # pandas >= 1.3 (formerly error_bad_lines=False)
    df = df[df.index%2 == 0] # keep only the original reviews as assets
    
    for review_type,project_imdb in zip(['Positive','Negative'],[project_imdb_positive_to_negative,project_imdb_negative_to_positive]) :
        dataframe = df[df['Sentiment']==review_type]
        reviews_to_import = dataframe['Text'].tolist()
        external_id_array = ('IMDB ' + review_type +' review ' + dataset + dataframe['batch_id'].astype('str')).tolist()
    
        kili.append_many_to_dataset(
            project_id=project_imdb['id'],
            content_array=reviews_to_import,
            external_id_array=external_id_array)
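
The `df.index%2 == 0` filter assumes that rows alternate between an original review and its revision. A quick sanity check, sketched here in plain Python on made-up rows, is to group rows by `batch_id` and verify that every batch holds one Positive and one Negative review. The column names mirror those used above; the rows are invented and `check_pairs` is a hypothetical helper.

```python
from collections import defaultdict

# Invented rows laid out like the paired TSV: original first, revision second
rows = [
    {"Sentiment": "Negative", "Text": "A dull, lifeless film.", "batch_id": 1},
    {"Sentiment": "Positive", "Text": "A lively, charming film.", "batch_id": 1},
    {"Sentiment": "Positive", "Text": "I loved every minute.", "batch_id": 2},
    {"Sentiment": "Negative", "Text": "I hated every minute.", "batch_id": 2},
]

def check_pairs(rows):
    """Return the batch_ids whose rows do not form one Positive/Negative pair."""
    by_batch = defaultdict(list)
    for row in rows:
        by_batch[row["batch_id"]].append(row["Sentiment"])
    return [
        batch_id
        for batch_id, sentiments in by_batch.items()
        if sorted(sentiments) != ["Negative", "Positive"]
    ]

originals = rows[0::2]  # even positions, like df[df.index % 2 == 0]
revisions = rows[1::2]  # odd positions, like df[df.index % 2 == 1]

print("Malformed batches:", check_pairs(rows))
print("Originals:", [row["Text"] for row in originals])
```

Running a check of this kind before the import would reveal skipped or misordered rows that could otherwise pair a review with the wrong revision.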

### Importing the labels into Kili 
We'll import the revised reviews from the study as if they were model predictions. In a real annotation project, we could also pre-fill the answer with the original sentences so that the labeler only has to write the changes.

In [37]:
model_name = 'results-arxiv:1909.12434'

for dataset in datasets :
    url = f'https://raw.githubusercontent.com/acmi-lab/counterfactually-augmented-data/master/sentiment/combined/paired/{dataset}_paired.tsv'
    df = pd.read_csv(url, on_bad_lines='warn', sep='\t')  # pandas >= 1.3 (formerly error_bad_lines=False)
    df = df[df.index%2 == 1] # keep only the modified reviews as predictions
    
    for review_type,project_imdb in zip(['Positive','Negative'],[project_imdb_positive_to_negative,project_imdb_negative_to_positive]) :
        dataframe = df[df['Sentiment']!=review_type]

        external_id_array = ('IMDB ' + review_type +' review ' + dataset + dataframe['batch_id'].astype('str')).tolist()
        json_response_array = create_json_responses(taskname,dataframe)
    
        kili.create_predictions(project_id=project_imdb['id'],
            external_id_array=external_id_array,
            model_name_array=[model_name]*len(external_id_array),
            json_response_array=json_response_array)
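
As mentioned above, a real annotation project could pre-fill the transcription field with the original review, so that the labeler only edits the words that need to change. Here is a minimal sketch of that variant, reusing the `{taskname: {"text": ...}}` response shape produced by `create_json_responses`. The sample reviews are invented and `prefill_responses` is a hypothetical helper.

```python
def prefill_responses(taskname, texts):
    """Build one json response per asset, pre-filled with the original text."""
    return [{taskname: {"text": text}} for text in texts]

# Invented original reviews standing in for dataframe['Text']
original_reviews = [
    "A dull, lifeless film.",
    "I hated every minute.",
]

responses = prefill_responses("NEW_REVIEW", original_reviews)
print(responses[0])
```

The resulting list could then be passed as `json_response_array` in place of the study's revisions.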

This is what our interface looks like in the end, allowing labelers to quickly perform the task at hand:

![IMDB](./img/imdb_review.png)

## Data Augmentation on SNLI dataset

The SNLI dataset has three classes: given two sentences, a premise and a hypothesis, the machine-learning task is to find the correct relation between them, which can be entailment, contradiction, or neutral.

Here is an example of a premise and three sentences that could serve as the hypothesis for each of the three categories:
![examples](https://licor.me/post/img/robust-nlu/SNLI_annotation.png)
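
For the augmentation task, each original pair can be revised towards either of the two other labels, by editing either the premise or the hypothesis. The sketch below enumerates those combinations for one example. The sentences are invented, `revision_tasks` is a hypothetical helper, and the field names follow the `sentence1` (premise), `sentence2` (hypothesis) and `gold_label` columns used later in this tutorial.

```python
from itertools import product

LABELS = ("entailment", "contradiction", "neutral")
SENTENCES = ("premise", "hypothesis")

def revision_tasks(gold_label):
    """List (sentence to modify, target label) pairs for one original example."""
    targets = [label for label in LABELS if label != gold_label]
    return list(product(SENTENCES, targets))

# Invented example in the SNLI layout
example = {
    "sentence1": "A man is playing a guitar on stage.",  # premise
    "sentence2": "A man is performing music.",           # hypothesis
    "gold_label": "entailment",
}

for sentence, target in revision_tasks(example["gold_label"]):
    print(f"Modify the {sentence} so that the relation becomes {target}")
```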

This is what our task looks like on Kili, this time kept as a single project. To make this work, we restate the instructions for the labeler with each asset.

### Creating the project

In [None]:
taskname = "SENTENCE_MODIFIED"
project_snli={
'title': 'Counterfactual data-augmentation NLI',
'description': 'Stanford Natural language Inference',
'instructions': '',
'input_type': 'TEXT',
'json_interface':{
    "jobs": {
        taskname: {
            "mlTask": "TRANSCRIPTION",
            "content": {
                "input": None
            },
            "required": 1,
            "isChild": False,
            "instruction": "Write here the modified sentence. Please refer to the instructions above before starting"
        }
    }
}
}

In [None]:
project_snli['id'] = kili.create_project(title=project_snli['title'],
                                                     instructions=project_snli['instructions'],
                                                     description=project_snli['description'],
                                                     input_type=project_snli['input_type'],
                                                     json_interface=project_snli['json_interface'])['id']
print(f'Created project {project_snli["id"]}')

Again, we'll factor our code into helper functions that merge the datasets and properly distinguish each case of sentence modification:

In [None]:
def merge_datasets(dataset, sentence_modified) :
    url_original = f'https://raw.githubusercontent.com/acmi-lab/counterfactually-augmented-data/master/NLI/original/{dataset}.tsv'
    url_revised = f'https://raw.githubusercontent.com/acmi-lab/counterfactually-augmented-data/master/NLI/revised_{sentence_modified}/{dataset}.tsv'
    df_original = pd.read_csv(url_original, on_bad_lines='warn', sep='\t')  # pandas >= 1.3 (formerly error_bad_lines=False)
    df_original = df_original[~df_original.duplicated(keep='first')]
    df_original['id'] = df_original.index.astype(str)
    
    df_revised = pd.read_csv(url_revised, on_bad_lines='warn', sep='\t')
    # merge on the sentence that was left unchanged
    axis_merge = 'sentence2' if sentence_modified == 'premise' else 'sentence1'
    # keep only one revision per (unchanged sentence, label) pair
    df_revised = df_revised[~df_revised[[axis_merge, 'gold_label']].duplicated(keep='first')]

    df_merged = df_original.merge(df_revised, how='inner', left_on=axis_merge, right_on=axis_merge)
    
    # Adjacent string literals are concatenated, so no stray indentation ends up in the asset text
    if sentence_modified == 'premise':
        df_merged['Text'] = df_merged['sentence1_x'] + '\nSENTENCE 2 :\n' + df_merged['sentence2']
        instructions = (" relation, by making a small number of changes in the FIRST SENTENCE "
                        "such that the document remains coherent and the new label accurately "
                        "describes the revised passage :\n\n\nSENTENCE 1 :\n")
    else:
        df_merged['Text'] = df_merged['sentence1'] + '\nSENTENCE 2 :\n' + df_merged['sentence2_x']
        instructions = (" relation, by making a small number of changes in the SECOND SENTENCE "
                        "such that the document remains coherent and the new label accurately "
                        "describes the revised passage :\n\n\nSENTENCE 1 :\n")
    return df_merged, instructions

def create_external_ids(dataset,dataframe, sentence_modified):
    return(('NLI ' + dataset + ' ' + dataframe['gold_label_x'] + ' to ' + dataframe['gold_label_y'] + ' ' + sentence_modified + ' modified ' + dataframe['id']).tolist())
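
The `_x` and `_y` suffixes used in these helpers come from pandas: when two DataFrames share column names that are not merge keys, `merge` appends `_x` to the left-hand columns and `_y` to the right-hand ones by default. The toy example below illustrates this for a revised premise; the sentences are invented.

```python
import pandas as pd

original = pd.DataFrame({
    "sentence1": ["A man plays guitar on stage."],
    "sentence2": ["A man is performing music."],
    "gold_label": ["entailment"],
})
# Revised premise: sentence1 changed, sentence2 untouched, new label
revised = pd.DataFrame({
    "sentence1": ["A man sits silently on stage."],
    "sentence2": ["A man is performing music."],
    "gold_label": ["contradiction"],
})

# Merge on the unchanged sentence, as merge_datasets does for 'premise'
merged = original.merge(revised, how="inner", on="sentence2")
print(merged.columns.tolist())
print(merged.loc[0, "gold_label_x"], "->", merged.loc[0, "gold_label_y"])
```

Here `sentence1_x` and `gold_label_x` hold the original premise and label, while `sentence1_y` and `gold_label_y` hold the revised ones.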


### Importing the data into Kili
Before each pair of sentences, we'll add a short description of the task for the labeler:

In [None]:
datasets = ['dev','train','test']
sentences_modified = ['premise', 'hypothesis']
intro = "Those two sentences' relation is classified as "
objective = " to convert to a "

for dataset in datasets :
    for sentence_modified in sentences_modified :
        df,instructions = merge_datasets(dataset, sentence_modified)

        sentences_to_import = create_assets(df, intro, objective, instructions, 'gold_label_x', 'gold_label_y')
        external_id_array = create_external_ids(dataset, df, sentence_modified)
    
        kili.append_many_to_dataset(project_id=project_snli['id'],
            content_array=sentences_to_import,
            external_id_array=external_id_array)
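
To see what a labeler actually reads, the sketch below assembles a single asset by hand, concatenating the pieces in the same order as `create_assets`. The sentences are invented and `build_asset_text` is a hypothetical single-row stand-in for the DataFrame-based helper.

```python
def build_asset_text(truth_label, target_label, instructions, text,
                     intro="Those two sentences' relation is classified as ",
                     objective=" to convert to a "):
    """Concatenate the pieces in the same order as create_assets."""
    return intro + truth_label + objective + target_label + instructions + text

instructions = (
    " relation, by making a small number of changes in the FIRST SENTENCE "
    "such that the document remains coherent and the new label accurately "
    "describes the revised passage :\n\n\nSENTENCE 1 :\n"
)
# Invented premise and hypothesis, joined like the 'Text' column
text = "A man plays guitar on stage.\nSENTENCE 2 :\nA man is performing music."

asset = build_asset_text("entailment", "contradiction", instructions, text)
print(asset)
```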

### Importing the labels into Kili 
We'll import the revised sentences from the study as if they were model predictions.

In [None]:
model_name = 'results-arxiv:1909.12434'

for dataset in datasets :
    for sentence_modified in sentences_modified :
        axis_changed = 'sentence1_y' if sentence_modified=='premise' else 'sentence2_y'
        df,instructions = merge_datasets(dataset, sentence_modified)

        external_id_array = create_external_ids(dataset, df, sentence_modified)
        json_response_array = create_json_responses(taskname,df,axis_changed) 
    
        kili.create_predictions(project_id=project_snli['id'],
            external_id_array=external_id_array,
            model_name_array=[model_name]*len(external_id_array),
            json_response_array=json_response_array)

![NLI](./img/snli_ex1.png)
![NLI](./img/snli_ex2.png)

## Conclusion
In this tutorial, we saw how Kili can help with a data-augmentation task: it lets you set up a simple, easy-to-use interface with proper instructions for your task.

In the study, labeling quality was key to this difficult task, and Kili makes it simple to ensure. To monitor the quality of the results, we could set up a consensus on some or all of the annotations, or even keep part of the dataset as ground truth to measure the performance of each labeler.
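
As an illustration of the ground-truth idea, the sketch below scores each labeler by the word-level similarity between their rewritten text and a reference answer. This is not how Kili computes consensus or quality metrics; it is a standalone approximation based on Python's `difflib`, with invented labelers, assets and texts.

```python
import difflib

def similarity(text_a, text_b):
    """Word-level similarity in [0, 1] between two transcriptions."""
    return difflib.SequenceMatcher(
        a=text_a.split(), b=text_b.split(), autojunk=False
    ).ratio()

def score_labelers(ground_truth, submissions):
    """Average similarity of each labeler's answers to the reference answers."""
    scores = {}
    for labeler, answers in submissions.items():
        per_asset = [similarity(answers[k], ground_truth[k]) for k in ground_truth]
        scores[labeler] = sum(per_asset) / len(per_asset)
    return scores

# Invented reference answer and labeler submissions
ground_truth = {"asset-1": "I loved every minute of this film"}
submissions = {
    "alice": {"asset-1": "I loved every minute of this film"},
    "bob": {"asset-1": "I enjoyed every minute of this movie"},
}

scores = score_labelers(ground_truth, submissions)
print(scores)
```

Because several rewrites of the same review can be equally valid, a low similarity score in a free-text task like this one is a prompt for manual review rather than proof of an error.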

For an overview of Kili, visit [kili-technology.com](https://kili-technology.com). You can also check out [Kili documentation](https://cloud.kili-technology.com/docs).