# NLP with Transformers

In the implementation part of this assignment, we will evaluate Transformer models on text classification, e.g. sentiment analysis of IMDB reviews. **A large number of bonus points** can be obtained by further evaluating Transformer models on:


*   Text classification: sentiment analysis of Rotten Tomatoes reviews.
*   Token classification: named entity recognition of CoNLL text.
*   Zero-shot classification: text classification into arbitrary categories with textual names.

## Google Colab

While the code in this notebook can be run locally on a powerful machine, it is highly recommended that the notebook be run on the GPU infrastructure available for free through [Google's Colab](https://colab.research.google.com/). To load the notebook in Colab:


1.   Point your browser to https://colab.research.google.com/
2.   If a pop-up window opens, click on the Upload button and select this notebook file (`code/skeleton/Transformers Rule.ipynb`) from the homework folder.
3.   Alternatively, in the notebook window menu click File -> Open notebook and load the same notebook file.

## 🤗 HuggingFace Transformers

To complete the tasks in this assignment, we will use lightweight (DistilBERT) pretrained Transformer models and the corresponding high-level API from the open source repository of the [HuggingFace](https://huggingface.co/docs/transformers/index) platform. If you are interested in learning how to use the API and the datasets, it is recommended that you go through the relevant sections in the HuggingFace [course on Transformers](https://huggingface.co/course/chapter1/1), which are linked from each portion in the assignment below.

While going through the HuggingFace course is highly recommended, it is not strictly necessary: this assignment can be completed by writing only Python code, without knowledge of the Transformers API. For this, you may be able to also reuse code that you have written in previous assignments.

You do not need a HuggingFace account to work on this assignment.


  

## Name: Naimisha Churi

# <font color="blue"> Submission Instructions</font>

## Google Colab (recommended):

1. Click File -> Save in the menu at the top of the Jupyter Notebook.
2. Please make sure to have entered your name above.
3. Select Edit -> Clear all outputs. This will clear all the outputs from all cells (but will keep the content of all cells). 
4. Select Runtime -> Run all. This will run all the cells in order, and will take less than half an hour for the sentiment analysis part.
5. Once you've rerun everything, select File -> Download -> Download .ipynb and download the notebook file showing the code and the output of all cells, and save it in the subfolder `code/complete/`.
6. Also save a PDF version of the notebook by selecting File -> Print, select Print as PDF, and Save it as `code/complete/TransformersRule.pdf`
7. Look at the PDF file and make sure all your solutions and outputs are there, displayed correctly.
8. Submit **both** your PDF and the notebook .ipynb file on Canvas. Make sure the PDF and notebook show the outputs of the training and evaluation procedures. Also upload any extra datasets that you used for bonus points by placing them in the `data/` folder (alternatively, your code in the notebook can show the web addresses from which the datasets are loaded in Colab).
9. Verify your Canvas submission contains the correct files by downloading them after posting them on Canvas.


## Local computer (only for powerful machines):

1. Click the Save button at the top of the Jupyter Notebook.
2. Please make sure to have entered your name above.
3. Select Cell -> All Output -> Clear. This will clear all the outputs from all cells (but will keep the content of all cells).
4. Select Cell -> Run All. This will run all the cells in order, and will take several minutes.
5. Once you've rerun everything, select File -> Download as -> PDF via LaTeX and download a PDF version *TransformersRule.pdf* showing the code and the output of all cells, and save it in the same folder that contains the notebook file *TransformersRule.ipynb*.
6. Look at the PDF file and make sure all your solutions are there, displayed correctly.
7. Submit **both** your PDF and notebook on Canvas. Make sure the PDF and notebook show the outputs of the training and evaluation procedures. Also upload any extra datasets that you used for bonus points by placing them in the `data/` folder.
8. Verify your Canvas submission contains the correct files by downloading them after posting them on Canvas.

# Theory

## Named Entity Recognition

Provide BIO-style and chunk-level annotation of the named entities (Person, Place, Organization, Date, or Product) in the following sentences:


1.   The third mate was Flask, a native of Tisbury, in Martha’s Vineyard.
2.   Its official Nintendo announced today that they Will release the Nintendo
3DS in north America march 27.
3.   Jessica Reif, a media analyst at Merrill Lynch & Co., said, "If they can
get up and running with exclusive programming within six months, it
doesn't set the venture back that far."

Also tag the examples above using the HuggingFace named entity recognition
tagger (starter code provided in the implementation section on NER) and compare against your manual annotation:


1.   Show your manually annotated named entities (labeled chunks) and the predicted named entities (labeled chunks). Do the predicted entities match your annotations?
2.   Compute chunk-level Precision, Recall, and F1 measure, as described in homework 7, assuming your annotations are correct (the ground truth).






YOUR SOLUTION HERE <br>
1)
1. O O O O B-PER O O O B-LOC O B-PER B-LOC
2. O O B-ORG O B-DAT O O O O O B-PRD I-PRD O B-LOC I-LOC B-DAT I-DAT
3. B-PER I-PER O O O O B-ORG I-ORG I-ORG I-ORG O O O O O O O O O O O O O O O O O O O O O.

2)
<table>
    <tbody>
        <tr><td>Measure (%)</td><td>LOC</td><td>ORG</td><td>PROD</td><td>PER</td><td>DATE</td></tr>
        <tr><td>Precision</td><td>75</td><td>100</td><td>0</td><td>100</td><td>0</td></tr>
        <tr><td>Recall</td><td>75</td><td>100</td><td>0</td><td>75</td><td>0</td></tr>
        <tr><td>F1</td><td>75</td><td>100</td><td>0</td><td>85.71</td><td>0</td></tr>
    </tbody>
</table>
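As a rough cross-check of numbers like those in the table, chunk-level scores can be computed from sets of labeled chunks. The sketch below is only illustrative: the helper `chunk_prf` is a hypothetical name, the gold chunks for sentence 2 are an assumed annotation, and the predicted chunks mirror the tagger output shown in the implementation section. Chunks are compared by label and text, which is adequate only when no chunk repeats within a sentence.

```python
def chunk_prf(gold, predicted):
    """Chunk-level precision, recall, and F1 over sets of (label, text) chunks."""
    gold, predicted = set(gold), set(predicted)
    # A predicted chunk counts as correct only if both label and extent match.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# Assumed (illustrative) gold chunks for sentence 2.
gold = [("ORG", "Nintendo"), ("DATE", "today"), ("PROD", "Nintendo 3DS"),
        ("LOC", "north America"), ("DATE", "march 27")]
# Predicted chunks, copied from the tagger output for the same sentence.
predicted = [("ORG", "Nintendo"), ("MISC", "Nintendo 3DS"), ("LOC", "America")]

precision, recall, f1 = chunk_prf(gold, predicted)
print(f"P = {precision:.2f}, R = {recall:.2f}, F1 = {f1:.2f}")
# prints: P = 0.33, R = 0.20, F1 = 0.25
```

Per-label scores, as in the table, would follow by filtering both sets to a single label before calling the helper.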


## Language Models and Perplexity

Consider a bigram language model LM according to which p(Time) = 0.03, p(flies | time) = 0.01, p(like | flies) = 0.04, p(an | like) = 0.05, and p(arrow | an) = 0.1.


1.   What is the *probability* of the sentence "Time flies like an arrow" according to this LM?
2.   What is the *perplexity* that this LM obtains when evaluated on the sentence "Time flies like an arrow"?




YOUR SOLUTION HERE: <br>
1. p(Time) x p(flies | time) x p(like | flies) x p(an | like) x p(arrow | an) = 0.03 x 0.01 x 0.04 x 0.05 x 0.1 = 6 x 10^-8
2. perplexity = p(sentence)^(-1/N) = (6 x 10^-8)^(-1/5) ≈ 27.82, where N = 5 is the number of words in the sentence
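The two answers can be checked numerically. This is a minimal sketch that assumes perplexity is normalized by the five words of the sentence, with no end-of-sentence token; the variable names are illustrative.

```python
import math

# Bigram probabilities from the problem statement, in sentence order.
probabilities = [0.03, 0.01, 0.04, 0.05, 0.1]

sentence_probability = math.prod(probabilities)
# Perplexity is the inverse probability, normalized by the number of words N.
perplexity = sentence_probability ** (-1 / len(probabilities))

print(f"probability = {sentence_probability:.1e}")  # probability = 6.0e-08
print(f"perplexity  = {perplexity:.2f}")            # perplexity  = 27.82
```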

# Implementation using HuggingFace Transformers

In [1]:
# Install HuggingFace Transformers API modules.
!pip install transformers[sentencepiece]

# Install HuggingFace Datasets API modules.
!pip install datasets



The most basic object in the Transformers library is the pipeline() function. It connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer. More details are here:  
https://huggingface.co/course/chapter1/3?fw=pt

In [2]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

# Process just one document. Note that, although the document string is split on multiple lines, this is still just one string object.
output = classifier("The most merciful thing in the world, I think, is the inability of the human mind to correlate all its contents. "
                    "We live on a placid island of ignorance in the midst of black seas of infinity, and it was not meant that we should voyage far. "
                    "The sciences, each straining in its own direction, have hitherto harmed us little; but some day the piecing together of "
                    "dissociated knowledge will open up such terrifying vistas of reality, and of our frightful position therein, that we shall "
                    "either go mad from the revelation or flee from the light into the peace and safety of a new dark age.")
print(output)

# Process a batch of documents.
output = classifier(["I've been waiting for a HuggingFace course my whole life.", 
                     "The spirit is willing, but the flesh is weak.",
                     "It might and does take until the last scene to do so, but every plot thread is sewn-up, and there isn't a clue doled out along the way that doesn't make sense or fit into the final puzzle. Its as close to perfect as a screenplay can get.",
                     "The cast is great, the plot is quite ingenious and the runtime is nothing too overbearing.",
                     "This film was filmed in France.",
                     "This film was filmed in Afghanistan."])
print(output)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'NEGATIVE', 'score': 0.6746031641960144}]
[{'label': 'POSITIVE', 'score': 0.9598048329353333}, {'label': 'NEGATIVE', 'score': 0.983283519744873}, {'label': 'NEGATIVE', 'score': 0.9997618794441223}, {'label': 'POSITIVE', 'score': 0.9980607628822327}, {'label': 'POSITIVE', 'score': 0.9652075171470642}, {'label': 'NEGATIVE', 'score': 0.8350331783294678}]


# Sentiment Analysis of IMDB Reviews

In this part of the assignment, we evaluate the performance of DistilBERT (the default Transformer) on the development portion of the IMDB reviews dataset, and compare it against the performance of Logistic Regression and RNNs from previous assignments.

## Working with Datasets in Colab and HuggingFace

First, we need to load the IMDB Reviews dataset. HuggingFace already provides a large number of datasets in their [Hub](https://huggingface.co/datasets). Below we will be loading the IMDB dataset from an external source (HTTP addresses on the course web page). More details on loading local and remote datasets are provided here:
https://huggingface.co/course/chapter5/2?fw=pt

In [3]:
from datasets import load_dataset

# Load the IMDB dataset. Only the development portion will be used later.
url_train = "https://webpages.charlotte.edu/rbunescu/courses/itcs4111/hw09/imdb/train.txt"
url_test = "https://webpages.charlotte.edu/rbunescu/courses/itcs4111/hw09/imdb/test.txt"
url_dev = "https://webpages.charlotte.edu/rbunescu/courses/itcs4111/hw09/imdb/dev.txt"

data_files = {"train": url_train, "test": url_test, "dev": url_dev}
dataset = load_dataset('text', data_files = data_files)
dataset

Using custom data configuration default-8bb77e59dbdf30f7
Reusing dataset text (/root/.cache/huggingface/datasets/text/default-8bb77e59dbdf30f7/0.0.0/4b86d314f7236db91f0a0f5cda32d4375445e64c5eda2692655dd99c2dac68e8)


  0%|          | 0/3 [00:00<?, ?it/s]

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 1500
    })
    test: Dataset({
        features: ['text'],
        num_rows: 1500
    })
    dev: Dataset({
        features: ['text'],
        num_rows: 1500
    })
})

The `dataset` object is a `DatasetDict` dictionary mapping to the train, test, and dev objects of type `Dataset`. Since these were created from text files, each line in the file leads to a string example in the `Dataset` object. There are `num_rows` examples in total that can be retrieved through iterators or the usual indexing mechanism, as shown below. More powerful mechanisms for working with datasets are shown here: https://huggingface.co/course/chapter5/3?fw=pt

In [4]:
dataset.shape

{'dev': (1500, 1), 'test': (1500, 1), 'train': (1500, 1)}

In [4]:
# Print an example from the training dataset. Note how it is represented as a dictionary, mapping one feature named 'text' to its string value.
print(dataset['train'][5])

# To get just the example string itself, the way it was stored in the file.
print(dataset['train'][5]['text'])

# Since we evaluate only on the development data, delete the train and test portions. This will reduce the memory footprint in Colab.
del dataset['train']
del dataset['test']
dataset

{'text': 'pos Grey Gardens is shocking, amusing, sad and mesmerizing. I watched in amazement as Ediths Jr. and Sr. bickered and performed while reminiscing of their past. Their existence in a dilapidated mansion, (which they had not left for more than fifteen years) is both a comedy and a tragedy. This is a film you will not soon forget.'}
pos Grey Gardens is shocking, amusing, sad and mesmerizing. I watched in amazement as Ediths Jr. and Sr. bickered and performed while reminiscing of their past. Their existence in a dilapidated mansion, (which they had not left for more than fifteen years) is both a comedy and a tragedy. This is a film you will not soon forget.


DatasetDict({
    dev: Dataset({
        features: ['text'],
        num_rows: 1500
    })
})

### Process the development `Dataset`

Write a function `read_examples(ds)` that takes a `Dataset` object as input and returns a tuple containing the list of `reviews` and the corresponding list of `labels`. A label should have value 1 if the review was labeled as positive and 0 otherwise.

In [5]:
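def read_examples(ds, flag):
  # flag selects the line format: truthy for lines prefixed with 'pos' or 'neg' (IMDB),
  # falsy for lines that start with a numeric label.
  labels = []
  reviews = []

  # YOUR CODE HERE
  if flag:
    for d in ds:
      # Label is 1 for 'pos' and 0 otherwise, so labels and reviews always stay aligned.
      labels.append(1 if d['text'][:3] == 'pos' else 0)
      # Skip the 3-character label and the separator that follows it.
      reviews.append(d['text'][4:])
  else:
    for d in ds:
      # The first character is the numeric label.
      label = int(d['text'][0])
      labels.append(label)
      reviews.append(d['text'][3:])


  return reviews, labels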

reviews, labels = read_examples(dataset['dev'], 1)

# Since the top half of the reviews in the file are labeled as 'pos' and the bottom half are labeled as 'neg', the lines below should display
# [1, 1, 1, 1, 1]
# [0, 0, 0, 0, 0]
print(labels[:5])
print(labels[-5:])

[1, 1, 1, 1, 1]
[0, 0, 0, 0, 0]


In [61]:
dataset['dev'][10]

{'text': 'pos "Head" is a film that has held up well since its original release date in 1968. The movie is a complete contradiction of the Monkees image. It presents the Monkees in a way their fans never perceived them; men with real thoughts. Totally controlled by their producers, the Monkees were given the opportunity to tell their side of the story. The film pokes fun at their image, the entertainment industry, and corporate America. The soundtrack contains some of their best music. It\'s a movie well worth seeing over and over again.'}

## Sentiment Analysis using DistilBERT Transformer

In this section, we apply the default Transformer model for sentiment classification.

Simply calling the `classifier` on the list of `reviews` will not work because some reviews are longer than the maximum length of 512 tokens that can be accommodated by DistilBERT. Even if the reviews were to be truncated, Colab may run out of memory, as the Transformer classifier tries to process all examples in parallel.

In [62]:
# Attempt to classify all reviews, store labels in 'predictions'.
predictions = classifier(reviews)

RuntimeError: ignored

### Delving deeper into the NLP pipeline

While HuggingFace may provide an API for truncating and batching examples, below we show how to work with the [NLP pipeline](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter2/full_nlp_pipeline.svg), which will give us more control over the inputs and outputs. More details are provided in the [Behind the pipeline](https://huggingface.co/course/chapter2/2?fw=pt) course section.




### Subsample examples for quick evaluations

Classifying the entire dataset of 1,500 examples can be time consuming. For debugging purposes, it is useful to evaluate on a subset of examples. Write a function `sample_dataset(reviews, labels, k)` that extracts a subset of `sreviews` and their associated `slabels` containing just the top k followed by the bottom k examples in the dataset that is provided as input, for a total of 2k examples. We do this so that we have an equal number of positive (top k) and negative (bottom k) examples.

In [6]:
def sample_dataset(reviews, labels, k):
  # YOUR CODE HERE
  # Top k examples (positive) followed by the bottom k examples (negative).
  sreviews = reviews[:k] + reviews[-k:]
  slabels = labels[:k] + labels[-k:]

  return sreviews, slabels

### Pipeline: From Tokenizer to Model to Post Processing

This is the main code, where first the reviews are Tokenized, then classified with the Model, followed by a slight Post Processing of the predicted labels. For more on the pipeline, see the [Putting it all together](https://huggingface.co/course/chapter2/6?fw=pt) section in the course and the previous sections in the Using Transformers chapter.

If we ran the sentiment analysis model on all 1,500 reviews directly, Colab would run out of memory (try it). Therefore, the code below runs the model on batches of 10 reviews at a time and accumulates the predicted labels in the `predictions` list.
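The batching idea described above can be isolated in a small generator. This is only a sketch: `batches` is a hypothetical helper name, and `example_reviews` is a stand-in for the real list of review strings.

```python
def batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical stand-in for the list of review strings.
example_reviews = [f"review {i}" for i in range(25)]
for batch in batches(example_reviews, 10):
    print(len(batch), batch[0])
# prints:
# 10 review 0
# 10 review 10
# 5 review 20
```

The last batch is simply shorter when the number of reviews is not a multiple of the batch size.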

In [9]:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
def model():
  # A machine learning model, like DistilBERT, can be trained using many hyper-parameter configurations,
  # such as learning rate, number of training epochs, preprocessing of its data, finetuning, etc. A checkpoint 
  # corresponds to a particular training setting. All checkpoints are listed and explained on the Hub page.
  checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"

  # Extract the Tokenizer that was used for the training data. It is important that the same tokenizer is used on the test data.
  # https://huggingface.co/course/chapter2/4?fw=pt
  tokenizer = AutoTokenizer.from_pretrained(checkpoint)

  # Extract the sequence classification model associated with the checkpoint.
  # https://huggingface.co/course/chapter2/3?fw=pt
  model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

  # Print the numerical IDs for the two labels, to verify 0 means negative, 1 means positive.
  print(model.config.id2label)

  # By default run evaluations on all examples.
  sreviews, slabels = reviews, labels

  # When set to True, experiments are run only on 2 * 50 examples.
  # Change to False for running on all 1,500 examples for the final submission.
  debug = True
  if debug:
    sreviews, slabels = sample_dataset(reviews, labels, 50)

  # Position for the current batch.
  position = 0
  # Accumulate batch predicted labels here.
  predictions = []

  while position < len(sreviews):
    # Evaluate on a batch of 10 reviews at a time.
    batch = sreviews[position: position + 10]
    
    # Will pad the sequences up to the length of the longest sequence in the batch.
    # Will also truncate the sequences that are longer than the model max length (512 for BERT or DistilBERT).
    tokens = tokenizer(batch, padding = "longest", truncation = True, return_tensors = "pt")
    
    # Inference only: disable gradient tracking to reduce memory use.
    with torch.no_grad():
      output = model(**tokens)
    predictions += list(map(lambda logit: int(logit[0] < logit[1]), output.logits))

    position += 10
    print('Processed', position, 'reviews.')

  return slabels, predictions, sreviews
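The post-processing step in the loop above only compares the two logits. When a confidence score is also wanted, the logits can be turned into probabilities with a softmax. The sketch below uses plain Python so that it runs without a model; `softmax` and `postprocess` are hypothetical helper names, the logits are made-up values, and the claim that the pipeline's `score` is a softmax probability is an assumption about its default behavior for this checkpoint.

```python
import math

def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(logits, id2label):
    """Return the most probable label together with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

# Label mapping as printed by model.config.id2label.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
# Hypothetical logits for a single review.
print(postprocess([-1.2, 2.3], id2label))
```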

### Evaluate the DistilBERT Model Accuracy on the IMDB Development Dataset

Write a function `compute_accuracy(labels, predictions)` that calculates the accuracy of the model `predictions` with respect to the ground truth `labels`.

In [8]:
def compute_accuracy(labels, predictions):
  # YOUR CODE HERE. CAN YOU DO IT IN ONE LINE?
  accuracy = sum(l == p for l, p in zip(labels, predictions)) / len(labels)

  return accuracy

# model() returns (slabels, predictions, sreviews); the reviews are not needed here.
slabels, predictions, _ = model()
print('Accuracy = ', compute_accuracy(slabels, predictions))

{0: 'NEGATIVE', 1: 'POSITIVE'}
Processed 10 reviews.
Processed 20 reviews.
Processed 30 reviews.
Processed 40 reviews.
Processed 50 reviews.
Processed 60 reviews.
Processed 70 reviews.
Processed 80 reviews.
Processed 90 reviews.
Processed 100 reviews.
Processed 110 reviews.
Processed 120 reviews.
Processed 130 reviews.
Processed 140 reviews.
Processed 150 reviews.
Processed 160 reviews.
Processed 170 reviews.
Processed 180 reviews.
Processed 190 reviews.
Processed 200 reviews.
Processed 210 reviews.
Processed 220 reviews.
Processed 230 reviews.
Processed 240 reviews.
Processed 250 reviews.
Processed 260 reviews.
Processed 270 reviews.
Processed 280 reviews.
Processed 290 reviews.
Processed 300 reviews.
Processed 310 reviews.
Processed 320 reviews.
Processed 330 reviews.
Processed 340 reviews.
Processed 350 reviews.
Processed 360 reviews.
Processed 370 reviews.
Processed 380 reviews.
Processed 390 reviews.
Processed 400 reviews.
Processed 410 reviews.
Processed 420 reviews.
Processed 43

In [11]:
slabels, predictions, sreviews = model()

{0: 'NEGATIVE', 1: 'POSITIVE'}
Processed 10 reviews.
Processed 20 reviews.
Processed 30 reviews.
Processed 40 reviews.
Processed 50 reviews.
Processed 60 reviews.
Processed 70 reviews.
Processed 80 reviews.
Processed 90 reviews.
Processed 100 reviews.


In [13]:
# Error analysis: print every review whose predicted label differs from the true label.
def misclassified(slabels,predictions,sreviews):
  print('Misclassified reviews: ')
  for i in range(len(slabels)):
    if slabels[i] != predictions[i]:
      print("Reviews: "+str(sreviews[i])+" Predicted label: "+str(predictions[i])+" Actual label: "+str(slabels[i]))


misclassified(slabels,predictions,sreviews)

Misclassified reviews: 
Reviews: So Dark The Night poses a tough challenge: It's very hard to write about it in any detail without ruining it for those who haven't yet seen it. Since it remains quite obscure, that includes just about everybody. The movie will strike those familiar with its director Joseph H. Lewis' better known titles in the noir cycle  Gun Crazy, The Big Combo, even My Name Is Julia Ross, which in its brevity it resembles  as an odd choice.<br /><br />For starters, the bucolic French countryside serves as its setting. Steven Geray, a middle-aged detective with the Surété in Paris, sets out for a vacation in the village of Ste. Margot (or maybe Margaux). Quite unexpectedly, he finds himself falling in love with the inkeepers' daughter (Micheline Cheirel), even though she's betrothed to a rough-hewn local farmer. But the siren song of life in Paris is hard to resist, so she agrees to marry him, despite the disparity in their ages, which inevitably becomes the talk of

## Analysis of Results

1.   Compare the performance of DistilBERT on IMDB with Logistic Regression and RNN performance from previous assignments.

<table>
    <tbody>
        <tr><td>Model</td><td>Accuracy</td></tr>
        <tr><td>DistilBERT</td><td>0.89</td></tr>
        <tr><td>Logistic Regression</td><td>0.84</td></tr>
        <tr><td>RNN</td><td>0.86</td></tr>
    </tbody>
</table>

2.   [5111] Error analysis:
  
  * Look at a sample of reviews that were misclassified by DistilBERT and try to determine whether there is a common type of error that it makes. <br>
    a. The misclassified reviews have a sarcastic tone. <br>
    b. They contain words that usually signal positive (or negative) sentiment but are used here with the opposite meaning.
  * [**Bonus**] Elaborate on how the model performance could be improved. [**Bigger Bonus**] Write code that leads to improved performance.
  * Look at a sample of reviews that were misclassified by Logistic Regression but correctly classified by DistilBERT, elaborate on why you think DistilBERT was able to do better.



---

## Bonus: Sentiment Analysis of Rotten Tomatoes Reviews

Evaluate the model on the development portion of the Rotten Tomatoes dataset. Perform an analysis similar to the one done for the IMDB dataset.

In [12]:
# Only the development portion will be used.
url_train = "https://webpages.charlotte.edu/rbunescu/courses/itcs4111/hw09/rt/train.txt"
url_test = "https://webpages.charlotte.edu/rbunescu/courses/itcs4111/hw09/rt/test-blind.txt"
url_dev = "https://webpages.charlotte.edu/rbunescu/courses/itcs4111/hw09/rt/dev.txt"

# YOUR CODE HERE
data_files = {"train": url_train, "test": url_test, "dev": url_dev}
dataset = load_dataset('text', data_files = data_files, encoding = 'cp1252')

del dataset['train']
del dataset['test']
#dataset

reviews, labels = read_examples(dataset['dev'],0)

Using custom data configuration default-e4392d669705ee0b
Reusing dataset text (/root/.cache/huggingface/datasets/text/default-e4392d669705ee0b/0.0.0/4b86d314f7236db91f0a0f5cda32d4375445e64c5eda2692655dd99c2dac68e8)


  0%|          | 0/3 [00:00<?, ?it/s]

In [13]:
# model() returns (slabels, predictions, sreviews); the reviews are not needed here.
slabels, predictions, _ = model()
print('Accuracy = ', compute_accuracy(slabels, predictions))

{0: 'NEGATIVE', 1: 'POSITIVE'}
Processed 10 reviews.
Processed 20 reviews.
Processed 30 reviews.
Processed 40 reviews.
Processed 50 reviews.
Processed 60 reviews.
Processed 70 reviews.
Processed 80 reviews.
Processed 90 reviews.
Processed 100 reviews.
Accuracy =  0.86




---

# Named Entity Recognition



Run the HuggingFace NER Transformer on the examples from the Theory section. Below is sample code adapted from the HuggingFace course section on [Transformers, what can they do?](https://huggingface.co/course/chapter1/3?fw=pt).

In [14]:
from transformers import pipeline

ner = pipeline("ner", grouped_entities = True)
ner("UNC Charlotte is a public research university in North Carolina.")

# YOUR CODE HERE

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)


Downloading:   0%|          | 0.00/998 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.24G [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/60.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/208k [00:00<?, ?B/s]

  f'`grouped_entities` is deprecated and will be removed in version v5.0.0, defaulted to `aggregation_strategy="{aggregation_strategy}"` instead.'


[{'end': 13,
  'entity_group': 'ORG',
  'score': 0.9821411,
  'start': 0,
  'word': 'UNC Charlotte'},
 {'end': 63,
  'entity_group': 'LOC',
  'score': 0.99887604,
  'start': 49,
  'word': 'North Carolina'}]

In [16]:
ner('The third mate was Flask, a native of Tisbury, in Martha’s Vineyard')

[{'end': 24,
  'entity_group': 'PER',
  'score': 0.99570006,
  'start': 19,
  'word': 'Flask'},
 {'end': 45,
  'entity_group': 'LOC',
  'score': 0.89198345,
  'start': 38,
  'word': 'Tisbury'},
 {'end': 67,
  'entity_group': 'LOC',
  'score': 0.97118497,
  'start': 50,
  'word': 'Martha ’ s Vineyard'}]

In [17]:
ner('Its official Nintendo announced today that they Will release the Nintendo 3DS in north America march 27.')

[{'end': 21,
  'entity_group': 'ORG',
  'score': 0.9988274,
  'start': 13,
  'word': 'Nintendo'},
 {'end': 77,
  'entity_group': 'MISC',
  'score': 0.9949834,
  'start': 65,
  'word': 'Nintendo 3DS'},
 {'end': 94,
  'entity_group': 'LOC',
  'score': 0.99817383,
  'start': 87,
  'word': 'America'}]

In [15]:
ner('Jessica Reif, a media analyst at Merrill Lynch & Co., said, "If they can get up and running with exclusive programming within six months, it doesnt set the venture back that far."')

[{'end': 12,
  'entity_group': 'PER',
  'score': 0.9985828,
  'start': 0,
  'word': 'Jessica Reif'},
 {'end': 52,
  'entity_group': 'ORG',
  'score': 0.99609566,
  'start': 33,
  'word': 'Merrill Lynch & Co.'}]
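To compare outputs like the ones above against a manual annotation, both can be mapped to sets of `(label, start, end)` chunks based on character offsets. The following is a sketch under stated assumptions: `to_chunks` and `locate` are hypothetical helpers, the sentence is shortened after "said" (which does not change the offsets), and the gold chunks are the PER and ORG chunks from the manual annotation of sentence 3.

```python
def to_chunks(entities):
    """Map aggregated NER output (a list of dicts) to (label, start, end) chunks."""
    return {(e["entity_group"], e["start"], e["end"]) for e in entities}

def locate(sentence, label, text):
    """Build a gold chunk from the first occurrence of `text` in `sentence`."""
    start = sentence.index(text)
    return (label, start, start + len(text))

# Shortened copy of sentence 3; the entity offsets are unaffected.
sentence = "Jessica Reif, a media analyst at Merrill Lynch & Co., said"

# Tagger output for sentence 3, copied from the cell output above.
predicted = to_chunks([
    {"entity_group": "PER", "score": 0.9985828, "word": "Jessica Reif",
     "start": 0, "end": 12},
    {"entity_group": "ORG", "score": 0.99609566, "word": "Merrill Lynch & Co.",
     "start": 33, "end": 52},
])
gold = {locate(sentence, "PER", "Jessica Reif"),
        locate(sentence, "ORG", "Merrill Lynch & Co.")}

print("matched: ", sorted(gold & predicted))
print("missed:  ", sorted(gold - predicted))
print("spurious:", sorted(predicted - gold))
```

Using offsets rather than the `word` field avoids mismatches caused by tokenization, such as the spaces inserted in 'Martha ’ s Vineyard'.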

## Bonus: NER on CoNLL Text



In this bonus portion, evaluate the performance of the default Transformer model on the development portion of the CoNLL named entity dataset, and compare it against the performance of CRFs from a previous assignment.

To complete this exercise, you may consider reusing code from the CRF assignment and/or NER code from the [Token Classification](https://huggingface.co/course/chapter7/2?fw=pt) section of the HuggingFace course. Similar to the sentiment analysis portion, you may need to run the NER Transformer on small batches of sentences at a time so that Colab does not run out of memory.
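A possible starting point for this exercise is sketched below. It assumes the usual CoNLL layout (one token per line, the entity tag in the last column, blank lines between sentences, `-DOCSTART-` lines between documents); the helper names and the sample lines are hypothetical and not taken from the course data.

```python
def read_conll(lines):
    """Group CoNLL-style lines into sentences of (token, tag) pairs."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        # Blank lines and document markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        current.append((fields[0], fields[-1]))
    if current:
        sentences.append(current)
    return sentences

def bio_to_chunks(tags):
    """Convert a tag sequence into (label, start, end) chunks, end exclusive."""
    chunks, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last chunk
        prefix, _, kind = tag.partition("-")
        # A chunk ends on O, on B-, or when the entity type changes.
        if label is not None and (prefix != "I" or kind != label):
            chunks.append((label, start, i))
            start, label = None, None
        # A chunk starts on B-, or on an I- that does not continue a chunk.
        if prefix in ("B", "I") and label is None:
            start, label = i, kind
    return chunks

# Hypothetical sample in the assumed format.
sample = """-DOCSTART- -X- O O

UNC NNP B-NP B-ORG
Charlotte NNP I-NP I-ORG
is VBZ B-VP O
in IN B-PP O
North NNP B-NP B-LOC
Carolina NNP I-NP I-LOC
. . O O
""".splitlines()

for sentence in read_conll(sample):
    tokens, tags = zip(*sentence)
    print([(label, " ".join(tokens[s:e])) for label, s, e in bio_to_chunks(tags)])
# prints: [('ORG', 'UNC Charlotte'), ('LOC', 'North Carolina')]
```

Treating an `I-` tag without a preceding `B-` as the start of a chunk keeps the conversion usable if the file uses the IOB1 variant of the tagging scheme.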

In [None]:
# Only the development portion will be used.
url_train = "https://webpages.charlotte.edu/rbunescu/courses/itcs4111/hw09/ner/eng.train"
url_test = "https://webpages.charlotte.edu/rbunescu/courses/itcs4111/hw09/ner/eng.testb.blind"
url_dev = "https://webpages.charlotte.edu/rbunescu/courses/itcs4111/hw09/ner/eng.testa"

# YOUR CODE HERE





# Bonus: Zero-shot Classification

Zero-shot classification refers to using a model to classify text into labels for which the model was not explicitly trained. The example below, adapted from the HuggingFace course, shows how to use the default Transformer classifier to compute probabilities for new textual labels.

In [None]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a computer science course about Natural Language Processing and Machine Learning.",
    candidate_labels=["education", "politics", "business"],
)

Create or use an existing corpus that has at least 2 textual labels and 200 examples and evaluate the accuracy of the default Transformer model on text classification in the zero-shot setting. If in assignment 5 you created a text classification corpus, you can evaluate the Transformer model on that corpus.
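One way to organize such an evaluation is sketched below. It assumes the classifier behaves like the zero-shot pipeline, returning for each text a dictionary whose `labels` list is sorted by decreasing score. The function `zero_shot_accuracy`, the keyword-based `toy_classifier`, and the three example texts are hypothetical and are there only so that the sketch runs without downloading a model.

```python
def zero_shot_accuracy(classify, texts, gold_labels, candidate_labels):
    """Fraction of texts whose top-ranked candidate label equals the gold label."""
    correct = 0
    for text, gold in zip(texts, gold_labels):
        result = classify(text, candidate_labels=candidate_labels)
        # The first entry of "labels" is assumed to be the highest-scoring label.
        correct += int(result["labels"][0] == gold)
    return correct / len(texts)

def toy_classifier(text, candidate_labels):
    """Hypothetical stand-in: ranks first the labels that occur in the text."""
    ranked = sorted(candidate_labels, key=lambda label: label not in text.lower())
    return {"sequence": text, "labels": ranked}

texts = ["A course on education policy.",
         "The politics of trade.",
         "Quarterly earnings rose."]
gold_labels = ["education", "politics", "business"]
candidates = ["education", "politics", "business"]

accuracy = zero_shot_accuracy(toy_classifier, texts, gold_labels, candidates)
print(f"Accuracy = {accuracy:.2f}")  # Accuracy = 0.67
```

With the real pipeline, `classify` would be the `classifier` object created above, called on one text at a time or on small batches to limit memory use.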

In [None]:
# YOUR CODE HERE





# Bonus: Anything extra goes here

In [None]:
# YOUR CODE HERE



