# Project 5: Goals and Deliverables

The goals of this assignment are:
* To work with the object-oriented version of our corpus code.
* To modify a web app that we can use to analyze text data.
* To finetune a transformer model and write a model card for it.


# Set Up



1. Copy your `spacy_on_corpus.py` from project 4b.
2. Copy the anvil callable functions from project 4b into the file `server.py`.
3. Run `pip install -r requirements.txt` in the terminal.

## Make Sure We Can Work With .py Files We Are Editing

Run the code cell below.

In [2]:
# Automatically reload your external source code
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Get a Corpus

In the code cell below, build a corpus using `creator.jsonl`.

In [None]:
from spacy_on_corpus import corpus
my_corpus=corpus.build_corpus('creator.jsonl')

# Finetune a Transformer Model


We can currently get document-level sentiment, but a movie review is often nuanced: some sentences say good things about some aspects of the movie, while others say bad things about other aspects. In this project, we will finetune a transformer model on the sentiment of sentences from movie reviews, and then add to our web app the ability to see sentence-level sentiment for a document.

## Get Some Labeled Data

We are going to use the [SST](https://huggingface.co/datasets/sst) dataset. Note the datasheet!

First, we download the dataset.

In [1]:
# datasets is a huggingface python package that makes it easy to download huggingface datasets
from datasets import load_dataset

# download the sst dataset
raw_sst = load_dataset("glue", "sst2")

# make it smaller for testing; once everything is working, train on all the data by commenting this line out and rerunning the notebook
raw_sst = raw_sst.filter(lambda e, i: i<300, with_indices=True)

Filter: 100%|██████████| 67349/67349 [00:00<00:00, 285185.17 examples/s]
Filter: 100%|██████████| 872/872 [00:00<00:00, 169090.76 examples/s]
Filter: 100%|██████████| 1821/1821 [00:00<00:00, 221797.76 examples/s]


Then, we look at the dataset.

**ALWAYS LOOK AT YOUR DATA**

In [2]:
# look at the sst dataset
raw_sst
# look at the sst training data
raw_sst['train']
# look at the sst training data sentences. Note each data point is a pre-tokenized sentence.
raw_sst['train']['sentence']
# look at the sst training data labels.
raw_sst['train']['label']

[0,
 0,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 1,
 1,
 1,
 0,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 1,
 1,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 0,
 1,
 0,
 1,
 0,
 0,
 1,
 1,
 0,
 1,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 1,
 1,
 0,
 1,
 0,
 1,
 0,
 0,
 1,
 1,
 1,
 0,
 0,
 1,
 1,
 1,
 0,
 0,
 1,
 0,
 1,
 0,
 1,
 1,
 0,
 1,
 1,
 0,
 0,
 1,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 1,
 0,
 1,
 1,
 1,
 0,
 1,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 0,
 1,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 1,
 1,
 0,
 1,
 1,
 0,
 0,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 1,
 0,
 1,
 1,
 0,
 0,
 0,
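
One quick thing to check when looking at labels is class balance. The sketch below is a minimal illustration, not part of the assignment: `label_distribution` is a hypothetical helper, and `sample_labels` is just the first ten labels from the output above.

```python
from collections import Counter

# Hypothetical sample: the first ten training labels shown in the output above.
sample_labels = [0, 0, 1, 0, 0, 0, 1, 1, 0, 1]

def label_distribution(labels):
    """Return {label: fraction of data points with that label}."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in sorted(counts.items())}

distribution = label_distribution(sample_labels)
print(distribution)
```

In the notebook, the same helper could be called on the full label list, for example `label_distribution(raw_sst['train']['label'])`.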


## Tokenize the Data

We will use the small `distilbert` model for this project, so we need to use its matching tokenizer.

In [3]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
# distilbert-base-uncased for easy extension

Next we need to tokenize our data using that tokenizer.

In [4]:
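def transformer_tokenize(example):
    """Tokenizes a batch of data points using the designated transformer tokenizer.

    :param example: a batch of data points; example['sentence'] is a list of sentences
    :type example: dict
    :returns: the tokenizer output, padded and truncated to the model's maximum length
    """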
    return tokenizer(example['sentence'], padding="max_length", truncation=True)

# this tokenizes the train, validation and test sets
tokenized_sst = raw_sst.map(transformer_tokenize, batched=True)

Map: 100%|██████████| 300/300 [00:00<00:00, 4270.71 examples/s]
Map: 100%|██████████| 300/300 [00:00<00:00, 6567.83 examples/s]
Map: 100%|██████████| 300/300 [00:00<00:00, 5831.44 examples/s]
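
To see what `padding="max_length"` and `truncation=True` are asking for, here is a toy sketch in plain Python. It is not the tokenizer's actual implementation: `pad_or_truncate` is a hypothetical helper, the token ids are made up, and unlike the real tokenizer it does not keep special tokens in place when truncating.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = token_ids[:max_length]      # truncation: drop tokens past max_length
    n_pad = max_length - len(kept)     # how much padding is needed
    input_ids = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask

# A short sequence gets padded; a long one gets cut off.
short_ids, short_mask = pad_or_truncate([101, 7, 8, 102], max_length=6)
long_ids, long_mask = pad_or_truncate([101, 7, 8, 9, 10, 11, 12, 102], max_length=6)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

The point of making every sequence the same length is that a batch can then be stacked into one rectangular array; the attention mask tells the model which positions to ignore.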


## Get a Transformer Model

We will use the distilbert model for efficiency.

In [None]:
from transformers import AutoModelForSequenceClassification

num_labels = 2
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-cased", num_labels=num_labels)

## Instantiate and Run a Trainer

Hugging Face gives us a clean way to train: the `Trainer`. Each trainer takes a set of training arguments, which is where you set hyperparameters. We will start from the default training arguments, setting only the output directory and how often to evaluate.

In [6]:
from transformers import TrainingArguments

training_args = TrainingArguments(output_dir="sst_model", evaluation_strategy="epoch")

We will add an accuracy metric from the `evaluate` package so we can see accuracy while training.

In [7]:
# evaluate is a huggingface python package that provides evaluation metrics
import evaluate
import numpy as np

# Set up evaluation
accuracy_metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    """Computes accuracy from the model's logits and the gold labels."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy_metric.compute(predictions=predictions, references=labels)
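
The argmax step in `compute_metrics` can be checked by hand on a tiny example. This is a plain-Python sketch with made-up logits (the names `argmax_rows`, `toy_logits` and `toy_labels` are hypothetical); the real function uses `np.argmax` and the `evaluate` accuracy metric.

```python
def argmax_rows(logits):
    """For each row of scores, return the index of the largest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]

# Made-up logits for three sentences: one score per class (0 and 1).
toy_logits = [[2.1, -1.3], [-0.4, 0.9], [0.2, 0.1]]
toy_labels = [0, 1, 1]

toy_predictions = argmax_rows(toy_logits)
toy_accuracy = sum(p == y for p, y in zip(toy_predictions, toy_labels)) / len(toy_labels)
print(toy_predictions, toy_accuracy)
```

The third sentence has a slightly higher score for class 0, so it is predicted as 0 even though its gold label is 1; two of three predictions are correct.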

Then we will make a trainer using the model, the training arguments, our train and our dev data.

In [8]:
from transformers import Trainer

trainer = Trainer(model=model, args=training_args, train_dataset=tokenized_sst["train"], eval_dataset=tokenized_sst["validation"], compute_metrics=compute_metrics)

And finally, we train.

**This step takes a long time.** To speed it up, you will need a GPU, but Codespaces doesn't currently have GPU options. So:

1. Download this notebook.
2. Open [https://colab.research.google.com](https://colab.research.google.com).
3. Upload the notebook.
4. Add a cell at the top of the notebook and in it type:
```
!pip install datasets
!pip install transformers[torch]
!pip install evaluate
```
5. In the Runtime menu, choose `Change runtime type` and select a `GPU` hardware accelerator.
6. Run the notebook there to train a model.
7. Download the trained model.
   * in a code cell, type `!tar -czf model.tgz sst-model`
   * download model.tgz
8. Upload the trained model here in the codespace.
   * upload model.tgz
   * in the terminal, type `tar -xzf model.tgz`
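
If you would rather pack and unpack the model from Python than from the shell, the standard-library `tarfile` module can do the same job as the two `tar` commands. This is a sketch under assumptions: `pack_model` and `unpack_model` are hypothetical helper names, and the round trip below uses a throwaway directory with a placeholder file instead of a real trained model.

```python
import tarfile
import tempfile
from pathlib import Path

def pack_model(model_dir, archive_path):
    """Write model_dir into a gzip-compressed tar archive (like tar -czf)."""
    with tarfile.open(str(archive_path), "w:gz") as archive:
        archive.add(str(model_dir), arcname=Path(model_dir).name)

def unpack_model(archive_path, target_dir="."):
    """Extract the archive into target_dir (like tar -xzf)."""
    with tarfile.open(str(archive_path), "r:gz") as archive:
        archive.extractall(str(target_dir))

# Round trip on a throwaway directory, so this runs without a trained model.
with tempfile.TemporaryDirectory() as tmp:
    fake_model = Path(tmp) / "sst-model"
    fake_model.mkdir()
    (fake_model / "config.json").write_text("{}")  # placeholder, not a real config
    archive_file = Path(tmp) / "model.tgz"
    pack_model(fake_model, archive_file)
    restore_dir = Path(tmp) / "restored"
    restore_dir.mkdir()
    unpack_model(archive_file, restore_dir)
    round_trip_ok = (restore_dir / "sst-model" / "config.json").read_text() == "{}"

print(round_trip_ok)
```

Only extract archives that you created yourself; unpacking an untrusted archive can overwrite files.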

After this course, you can still use Codespaces, and I recommend it because of the tight integration with GitHub (so your code is saved!). You can *also* always use Colab. If you use Colab, your notebooks will be backed up in your Google Drive, but *no other files that you generated in Colab are saved*.

In [9]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
| --- | --- | --- | --- |
| 1 | No log | 0.648087 | 0.66 |
| 2 | No log | 0.736938 | 0.673333 |
| 3 | No log | 0.744241 | 0.736667 |


TrainOutput(global_step=114, training_loss=0.37935069569370206, metrics={'train_runtime': 1182.8896, 'train_samples_per_second': 0.761, 'train_steps_per_second': 0.096, 'total_flos': 119220658790400.0, 'train_loss': 0.37935069569370206, 'epoch': 3.0})
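
The numbers in `TrainOutput` can be reproduced with a little arithmetic. The sketch below assumes the `Trainer` defaults of batch size 8 and 3 epochs on a single device (these are assumptions; neither is set explicitly in the training arguments above) and the 300 training examples kept by the filter step.

```python
import math

num_train_examples = 300    # training examples kept by the filter step
num_epochs = 3              # assumption: Trainer default, matches the log
batch_size = 8              # assumption: Trainer default, single device
train_runtime = 1182.8896   # seconds, copied from TrainOutput

steps_per_epoch = math.ceil(num_train_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
samples_per_second = num_train_examples * num_epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(steps_per_epoch, total_steps)
print(round(samples_per_second, 3), round(steps_per_second, 3))
```

Under those assumptions the step count matches `global_step=114`, and the two rates match `train_samples_per_second` and `train_steps_per_second`. Training on the full dataset multiplies the number of steps, which is why a GPU helps.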

And we save the trained model.

In [None]:
trainer.save_model("sst-model")

# Evaluate

Now we should evaluate the model.

First we load the model.

In [None]:
from transformers import pipeline

sentence_sentiment = pipeline("text-classification", model="sst-model", tokenizer=tokenizer)

Then we try the model on a couple of sample sentences to sanity check.

In [None]:
print(sentence_sentiment("This movie was awful!"))

print(sentence_sentiment("This movie was great!"))

[{'label': 'LABEL_0', 'score': 0.9938607215881348}]
[{'label': 'LABEL_1', 'score': 0.9945043325424194}]


Now we want to run the model on each of our test data points.

Notice that the model outputs 'LABEL_0' or 'LABEL_1' while the data has labels 0 and 1. So *you* should define a function to map the labels.

In [None]:
def get_label(output):
    """Gets a numeric label for the output from the classifier.
        Sample output from classifier:  [{'label': 'LABEL_0', 'score': 0.61}] corresponds to label 0
        Sample output from classifier: [{'label': 'LABEL_1', 'score': 0.72}] corresponds to label 1  

        :param output: the output from the classifier
        :type output: list[dict]
        :returns: a label
        :rtype: int
    """
    # Alternative without a lookup table:
    # if output[0]['label'] == 'LABEL_0':
    #     return 0
    # else:
    #     return 1
    true_labels = {'LABEL_0': 0, 'LABEL_1': 1}
    return true_labels[output[0]['label']]
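
A third option, sketched here as an illustration only, is to parse the number out of the label name, which would keep working if there were more than two labels. `label_to_int` is a hypothetical helper and assumes the default `LABEL_<n>` naming shown in the sample outputs.

```python
def label_to_int(label_name):
    """Turn a default label name such as 'LABEL_1' into the integer 1."""
    prefix, _, suffix = label_name.rpartition("_")
    if prefix != "LABEL" or not suffix.isdigit():
        raise ValueError(f"unexpected label name: {label_name!r}")
    return int(suffix)

sample_output = [{"label": "LABEL_1", "score": 0.72}]
parsed = label_to_int(sample_output[0]["label"])
print(parsed)
```

Raising an error on an unexpected name is a deliberate choice: a silent default would hide a mismatch between the model's labels and the dataset's labels.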
    

Now run the classifier on each dev data point.

Each element in `raw_sst['validation']` is a dictionary with keys `idx`, `sentence` and `label`. For each dev data point, you should make a new dictionary with keys `idx`, `sentence`, `label` and `pred` (for the output from the classifier). Add this dictionary to the list of results.

In [None]:
results = []
# your work here!
for datum in raw_sst['validation']:
    output = sentence_sentiment(datum['sentence'])
    results.append({
        'idx': datum['idx'],
        'sentence': datum['sentence'],
        'label': datum['label'],
        'pred': get_label(output),
    })

In [None]:
results

[{'idx': 0,
  'sentence': "it 's a charming and often affecting journey . ",
  'label': 1,
  'pred': 1},
 {'idx': 1,
  'sentence': 'unflinchingly bleak and desperate ',
  'label': 0,
  'pred': 0},
 {'idx': 2,
  'sentence': 'allows us to hope that nolan is poised to embark a major career as a commercial yet inventive filmmaker . ',
  'label': 1,
  'pred': 1},
 {'idx': 3,
  'sentence': "the acting , costumes , music , cinematography and sound are all astounding given the production 's austere locales . ",
  'label': 1,
  'pred': 1},
 {'idx': 4,
  'sentence': "it 's slow -- very , very slow . ",
  'label': 0,
  'pred': 0},
 {'idx': 5,
  'sentence': 'although laced with humor and a few fanciful touches , the film is a refreshingly serious look at young women . ',
  'label': 1,
  'pred': 1},
 {'idx': 6, 'sentence': 'a sometimes tedious film . ', 'label': 0, 'pred': 0},
 {'idx': 7,
  'sentence': "or doing last year 's taxes with your ex-wife . ",
  'label': 0,
  'pred': 0},
 {'idx': 8,
  'se

Now we have:

* `gold` labels, and
* model predictions

for each of the dev data points.

We will calculate two metrics:

1. accuracy
2. confusion matrix

## Accuracy

In the code cell below, implement the `accuracy` function. The accuracy of a classifier is the number of correctly labeled data points divided by the total number of data points.

In [None]:
def accuracy(results):
    """ Returns the accuracy of a list of classifier results

    :param results: a list of dictionaries. Each dictionary contains, at minimum, the keys 'label' and 'pred'
    :type results: list[dict]
    :returns: accuracy
    :rtype: float
    """
    accurate_data=0
    for datum in results:
        if datum['label']== datum['pred']:
            accurate_data+=1
    accuracy=accurate_data/len(results)  
    return accuracy

Now print the accuracy of the finetuned model.

In [None]:
accuracy(results)

0.73

## Confusion Matrix

In the code cell below, implement the `confusion_matrix` function. A confusion matrix for a classifier is like a spreadsheet or table that has all the labels along the rows and columns. Each cell contains the number of data points where the gold label corresponded to that row label, and the predicted label to that column label.

For example, for labels `TRUE` and `FALSE`, here is a possible confusion matrix:

| | TRUE | FALSE |
| --- | ---- | ----- |
| TRUE | 5 | 2   |
| FALSE | 1 | 4   | 

This says that there were 7 total data points with gold label `TRUE`, of which 5 had predicted label `TRUE`. There were 5 total data points with gold label `FALSE`, of which 4 had predicted label `FALSE`. This is a pretty good classifier!

Your confusion matrix will be a dictionary of dictionaries. Here's the above confusion matrix as a dictionary of dictionaries:
```
cf = {'TRUE': {'TRUE': 5, 'FALSE': 2}, 'FALSE': {'TRUE': 1, 'FALSE': 4}}
```
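
The totals quoted above can be read straight off the dictionary of dictionaries. A minimal sketch using the same example matrix (the variable names `gold_totals`, `correct` and `total` are just for illustration):

```python
cf = {'TRUE': {'TRUE': 5, 'FALSE': 2}, 'FALSE': {'TRUE': 1, 'FALSE': 4}}

# Row totals: how many data points have each gold label.
gold_totals = {gold: sum(row.values()) for gold, row in cf.items()}

# Diagonal cells: gold label and predicted label agree.
correct = sum(cf[label].get(label, 0) for label in cf)
total = sum(gold_totals.values())

print(gold_totals)
print(correct, "of", total, "correct")
```

The outer key is the gold label (the row) and the inner key is the predicted label (the column), so `cf['TRUE']['FALSE']` is the number of gold `TRUE` points that were predicted `FALSE`.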

In [None]:
def confusion_matrix(results):
    """ Returns the confusion matrix for a list of classifier results

    :param results: a list of dictionaries. Each dictionary contains, at minimum, the keys 'label' and 'pred'
    :type results: list[dict]
    :returns: confusion matrix
    :rtype: dict
    """
    confusion={}
    for datum in results:
        if datum['label'] not in confusion:
            confusion[datum['label']]={}
        if datum['pred'] not in confusion[datum['label']]:
            confusion[datum['label']][datum['pred']]=1
        else:
            confusion[datum['label']][datum['pred']]+=1
    return confusion


Now print the confusion matrix for the finetuned model.

In [None]:
print(confusion_matrix(results))

{1: {1: 44, 0: 8}, 0: {0: 29, 1: 19}}
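
The confusion matrix carries more information than accuracy alone. As a sketch, here is how accuracy, recall and precision for label 1 could be derived from the matrix printed above; `cell` is a hypothetical helper that returns 0 for combinations that never occurred.

```python
cm = {1: {1: 44, 0: 8}, 0: {0: 29, 1: 19}}  # copied from the output above

def cell(matrix, gold, pred):
    """Count for (gold, pred); 0 if that combination never occurred."""
    return matrix.get(gold, {}).get(pred, 0)

total = sum(sum(row.values()) for row in cm.values())
accuracy_from_cm = (cell(cm, 0, 0) + cell(cm, 1, 1)) / total

# Treat label 1 as the class of interest.
true_pos = cell(cm, 1, 1)
false_neg = cell(cm, 1, 0)
false_pos = cell(cm, 0, 1)
recall_1 = true_pos / (true_pos + false_neg)      # 44 / 52
precision_1 = true_pos / (true_pos + false_pos)   # 44 / 63

print(accuracy_from_cm, round(recall_1, 3), round(precision_1, 3))
```

The accuracy agrees with the value computed earlier. The off-diagonal cells show where the errors are: 19 of the 48 gold-0 sentences were predicted as 1, against only 8 of the 52 gold-1 sentences predicted as 0.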


# Add Your Finetuned Model to Your Webapp



1. Create a class attribute in the `corpus` class for loading your model.


2. Create an instance method, `get_sentence_level_sentiment`, in the `corpus` class. This method should return a list of pairs `(sentence, label)` where `label` is the sentiment label for the sentence. Test it in the code cell below.


In [None]:
sentence_sentiment("Hello my name is Evelyn, I am having a great day!")

[{'label': 'LABEL_1', 'score': 0.9952759742736816}]

In [None]:
from spacy_on_corpus import corpus
my_corpus.get_sentence_level_sentiment('1')

[("It's a shame that the weak writing undermines The Creator so much, as there are some intriguing concepts that could have been compelling if executed better.",
  'NEGATIVE'),
 ("For the most part, it's a mishmash of other movies with not much to say on its own.",
  'NEGATIVE')]
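
The shape of this method can be sketched outside the `corpus` class. Everything here is an assumption for illustration: the standalone function `sentence_level_sentiment`, the naive regex splitter (your class would use spaCy's sentences), and `fake_classifier`, which stands in for the finetuned pipeline. The label names follow SST-2, where 0 is negative and 1 is positive.

```python
import re

# In SST-2, label 0 is negative and label 1 is positive.
LABEL_NAMES = {"LABEL_0": "NEGATIVE", "LABEL_1": "POSITIVE"}

def split_sentences(text):
    """Naive splitter on ., ! or ? followed by whitespace (stand-in for spaCy)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def sentence_level_sentiment(text, classifier):
    """Return a list of (sentence, label) pairs for one document."""
    pairs = []
    for sentence in split_sentences(text):
        output = classifier(sentence)  # e.g. [{'label': 'LABEL_1', 'score': 0.99}]
        pairs.append((sentence, LABEL_NAMES[output[0]["label"]]))
    return pairs

def fake_classifier(sentence):
    """Hypothetical stand-in for the pipeline: 'great' means positive."""
    label = "LABEL_1" if "great" in sentence.lower() else "LABEL_0"
    return [{"label": label, "score": 1.0}]

pairs = sentence_level_sentiment("The cast is great. The plot drags!", fake_classifier)
print(pairs)
```

Passing the classifier in as an argument keeps the function testable without loading a model; in the class, the model loaded in the class attribute would play that role.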

3. Create an instance method, `render_document_sentiments`, in the `corpus` class. This method should return a markdown table for the document containing the sentences and their corresponding sentiment labels. At the bottom, it should have an extra row where the "sentence" is the string "document" and the label is the document-level sentiment (*not* from your finetuned model; from project 4b). Test it in the code cell below.


In [None]:
my_corpus.render_document_sentiments('1')

"Document Sentiments\n| Sentence | Sentiment |\n| -------- | ---------- |\n| It's a shame that the weak writing undermines The Creator so much, as there are some intriguing concepts that could have been compelling if executed better. | NEGATIVE |\n| For the most part, it's a mishmash of other movies with not much to say on its own. | NEGATIVE |\n| Document | [{'label': 'NEGATIVE', 'score': 0.9997203946113586}] |\n"

Document Sentiments
| Sentence | Sentiment |
| -------- | ---------- |
| It's a shame that the weak writing undermines The Creator so much, as there are some intriguing concepts that could have been compelling if executed better. | NEGATIVE |
| For the most part, it's a mishmash of other movies with not much to say on its own. | NEGATIVE |
| Document | [{'label': 'NEGATIVE', 'score': 0.9997203946113586}] |
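
Building the markdown table is mostly string assembly. The sketch below is one possible approach, not the required implementation: `render_sentiment_table` is a hypothetical standalone function that takes the `(sentence, label)` pairs and a document-level label as a plain string.

```python
def render_sentiment_table(pairs, document_label):
    """Build a markdown table from (sentence, label) pairs plus a document row."""
    lines = ["| Sentence | Sentiment |", "| -------- | --------- |"]
    for sentence, label in pairs:
        safe = sentence.replace("|", "\\|")  # a raw pipe would split the cell
        lines.append(f"| {safe} | {label} |")
    lines.append(f"| document | {document_label} |")
    return "\n".join(lines) + "\n"

table = render_sentiment_table(
    [("Weak writing undermines it.", "NEGATIVE"), ("A | B", "POSITIVE")],
    document_label="NEGATIVE",
)
print(table)
```

In the outputs shown here, the document row contains the raw classifier output (a list with one dictionary). Passing only the label string, for example `output[0]['label']`, would give a cleaner last row.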


4. In `server.py`, add an anvil callable function, `get_doc_sentiment_markdown`, that calls `render_document_sentiments`. Test it in the code cell below.


In [None]:
my_corpus.render_document_sentiments('2')

"Document Sentiments\n| Sentence | Sentiment |\n| -------- | ---------- |\n| Although 'New Asia' is America's enemy, we are encouraged to transfer our sympathies in that direction. | POSITIVE |\n| Yet the abiding vision of Asian life is a mass of touristic clichés seen through western eyes. | NEGATIVE |\n| Document | [{'label': 'NEGATIVE', 'score': 0.9748802781105042}] |\n"

Document Sentiments
| Sentence | Sentiment |
| -------- | ---------- |
| Although 'New Asia' is America's enemy, we are encouraged to transfer our sympathies in that direction. | POSITIVE |
| Yet the abiding vision of Asian life is a mass of touristic clichés seen through western eyes. | NEGATIVE |
| Document | [{'label': 'NEGATIVE', 'score': 0.9748802781105042}] |


In [5]:
import server
server.get_doc_sentiment_markdown('2')

"Document Sentiments\n| Sentence | Sentiment |\n| -------- | ---------- |\n| Although 'New Asia' is America's enemy, we are encouraged to transfer our sympathies in that direction. | POSITIVE |\n| Yet the abiding vision of Asian life is a mass of touristic clichés seen through western eyes. | NEGATIVE |\n| Document | [{'label': 'NEGATIVE', 'score': 0.9748802781105042}] |\n"

| Sentence | Sentiment |
| -------- | ---------- |
| Although 'New Asia' is America's enemy, we are encouraged to transfer our sympathies in that direction. | POSITIVE |
| Yet the abiding vision of Asian life is a mass of touristic clichés seen through western eyes. | NEGATIVE |
| Document | [{'label': 'NEGATIVE', 'score': 0.9748802781105042}] |


# Model Card

***ALWAYS DOCUMENT YOUR MODEL***

Complete the model card reading in Perusall.

Then, complete the model card in `model_card.md` for your finetuned model.

# Resources

* https://livingdatalab.com/posts/2023-04-23-fine-tuning-a-sentiment-analysis-model-with-huggingface.html
* https://huggingface.co/docs/transformers/v4.15.0/model_sharing
* https://huggingface.co/docs/datasets/process