# Phase 2: Generation of Difficult Cases

The goal of this phase is to generate difficult instances for the task of sentiment analysis. The requirements differ slightly between the two task types (classification versus sequence labeling); pick the task that you built your baseline model for in phase 1.

In both cases you should also take part in assignment 3. In other words, you will do either assignments 1 and 3 or assignments 2 and 3.


#### How to Generate the Samples
There are three main methods to generate the samples:
* You can use the Checklist paper code: https://github.com/marcotcr/checklist
* You can write code yourself to generate the samples. You can make use of any method you prefer, including a POS-tagger, word embeddings and contextualized embeddings
* You can generate samples manually

For each of these strategies you should think of a variety of types of difficult cases (so that the set does not consist entirely of the same type of sample), like the categories in Table 1 of the CheckList paper.
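
As an illustration of the second strategy, here is a minimal sketch of template-based generation in plain Python. The word lists, the templates, and the names `NOUNS`, `ADJECTIVES`, `TEMPLATES` and `generate_from_templates` are all made up for this example; they are not part of the CheckList code or the course material. Each template records whether it inverts the polarity of the adjective, so the gold label can be derived automatically, but you should still check the generated labels by hand.

```python
from itertools import product

# Hypothetical word lists; replace them with your own.
NOUNS = ['album', 'song']
ADJECTIVES = {'great': 'positive', 'wonderful': 'positive', 'awful': 'negative'}

# (category, template, does the template invert the adjective's polarity?)
TEMPLATES = [
    ('intensifier', 'This {noun} is really {adj}.', False),
    ('negation', 'Nobody would call this {noun} {adj}.', True),
]

def flip(sentiment):
    # Only two labels are used here, so anything not positive becomes positive.
    return 'negative' if sentiment == 'positive' else 'positive'

def generate_from_templates():
    samples = []
    for category, template, inverts in TEMPLATES:
        # Fill every template with every noun/adjective combination.
        for noun, (adj, label) in product(NOUNS, ADJECTIVES.items()):
            samples.append({
                'reviewText': template.format(noun=noun, adj=adj),
                'sentiment': flip(label) if inverts else label,
                'category': category,
            })
    return samples

samples = generate_from_templates()
print(len(samples), samples[0])
```

Adding a new type of difficult case then only requires a new entry in `TEMPLATES`, which makes it easier to keep the set varied.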

Note that you have to briefly present your approach in week 14 (before the project proposal; you will get 2 minutes for phase 2 and 5 minutes for the project proposal).

#### For Inspiration:
* [Beyond Accuracy: Behavioral Testing of NLP Models with CheckList](https://www.aclweb.org/anthology/2020.acl-main.442.pdf)
* [Towards Linguistically Generalizable NLP Systems: A Workshop and Shared Task](https://www.aclweb.org/anthology/W17-5401.pdf)
* [Breaking NLP: Using Morphosyntax, Semantics, Pragmatics and World Knowledge to Fool Sentiment Analysis Systems](https://www.aclweb.org/anthology/W17-5405.pdf)


## 1. Classification

The formal requirements are:

* Between 100 and 1000 utterances should be handed in on **LearnIt before 30-03 11:59AM**
* They must be in the same format as the training data: one JSON dict per line, and each instance needs at least the keys "reviewText", "sentiment", and "category".
* The "category" key indicates which type of alteration/difficulty you included.
* The gold labels must be correct!

Assuming you write a function that generates examples, the final file can be written as follows:

In [None]:
import json

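def swap(sentiment):
    # Flip the gold label; fail loudly instead of silently returning None.
    if sentiment == 'positive':
        return 'negative'
    elif sentiment == 'negative':
        return 'positive'
    raise ValueError('unknown sentiment: ' + str(sentiment))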

def dataGenerator(inputSents):
    outputSents = []
    for instance in inputSents:
        if 'great' in instance[0]:
            outputSents.append({'reviewText': instance[0].replace('great', 'not great'), 'sentiment': swap(instance[1]), 'category': 'negation'})
    return outputSents

inputSents = [['this is a great album', 'positive']]

with open('group13.json', 'w') as outFile:
    for instance in dataGenerator(inputSents):
        # Each instance is a dict: 'reviewText' contains the review, 'sentiment' is the gold
        # label (a string, either 'positive' or 'negative'), and 'category' indicates the
        # type of alteration you made.
        outFile.write(json.dumps(instance) + '\n')

You should check whether your final file is in the correct format with the following code:

In [None]:
import json
inputPath = 'group13.json'

numInstances = 0
with open(inputPath) as inFile:
    for lineIdx, line in enumerate(inFile):
        numInstances += 1
        try:
            data = json.loads(line)
        except ValueError as e:
            print('error, instance ' + str(lineIdx+1) + ' is not in valid json format')
            continue
        if 'reviewText' not in data:
            print("error, instance " + str(lineIdx+1) + ' does not contain key "reviewText"')
            continue
        if 'sentiment' not in data:
            print("error, instance " + str(lineIdx+1) + ' does not contain key "sentiment"')
            continue
        if 'category' not in data:
            print("error, instance " + str(lineIdx+1) + ' does not contain key "category"')
            continue
        if data['sentiment'] not in ['positive', 'negative']:
            print("error, instance " + str(lineIdx+1) + ': sentiment is not positive/negative')
            continue

# Count lines explicitly so that an empty file is handled as well.
if numInstances < 100:
    print('Too few instances (' + str(numInstances) + '), please generate more')
if numInstances > 1000:
    print('Too many instances (' + str(numInstances) + '), please remove some')
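
Since the set should contain a variety of types of difficult cases, it can also help to look at how your instances are distributed over the "category" values. The following is a small sketch of such a check; the function name `count_categories` and the example lines are made up for illustration, and this check is not part of the formal requirements.

```python
import json
from collections import Counter

def count_categories(lines):
    """Count instances per 'category' value in an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        # Instances without a category are grouped under a placeholder name.
        counts[json.loads(line).get('category', '<missing>')] += 1
    return counts

# Made-up example lines; in practice you would pass an open file instead.
example_lines = [
    '{"reviewText": "this is a not great album", "sentiment": "negative", "category": "negation"}',
    '{"reviewText": "nobody would call this song awful", "sentiment": "positive", "category": "negation"}',
    '{"reviewText": "this album is really great", "sentiment": "positive", "category": "intensifier"}',
]

category_counts = count_categories(example_lines)
for category, count in category_counts.most_common():
    print(category, count)
```

Because the function accepts any iterable of lines, an open file handle for your own hand-in file can be passed to it directly.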

## 3. Prediction
The deadline for handing in your baseline's predictions on the difficult cases from all groups is 06-04 11:59AM. The data file will be made available as soon as possible after your hand-ins (we aim for 02-04), and all you have to do is re-run your baseline from phase 1. Note that some of the meta-information might not be available, so if your baseline relies on it you have to either retrain without these features or predict without them.
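
If you choose to predict without the missing features, one option is to fill in a neutral default whenever a field is absent. The sketch below assumes a baseline that uses two metadata fields next to the review text; the field names `summary` and `verified`, their default values, and the function `read_instance` are hypothetical, so adjust them to whatever your own baseline actually uses.

```python
import json

# Hypothetical metadata fields and fallback values for when they are missing.
META_DEFAULTS = {'summary': '', 'verified': False}

def read_instance(line):
    """Parse one JSON line and fill in defaults for missing metadata."""
    data = json.loads(line)
    features = {'reviewText': data['reviewText']}
    for key, default in META_DEFAULTS.items():
        features[key] = data.get(key, default)
    return features

# An instance that only contains the review text still yields all features.
print(read_instance('{"reviewText": "this is a not great album"}'))
```

Whether a default value is harmless depends on the model, so it is worth comparing this against retraining without the features.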

The CodaLab link will appear here and will be posted on Slack when available.