# Context
Entity extraction is a long-standing problem in NLP, and still one of the most useful ones.
It consists of extracting key entities from a piece of text, where an entity is a grounded concept of interest.

We are going to work with a dataset called `WNUT 17`, which contains _rare_ entities extracted from user-generated content (comments, Twitter). All the text we will be working with is in English.

This is a challenging dataset because of the rarity of the entities and the noisiness of the text. To get an idea of the state of the art, you can look at current results [here](https://paperswithcode.com/sota/named-entity-recognition-on-wnut-2017).

Here we are going to try to hit this nail with a big 🔨: a **large language model**.

# Data
We will use the `datasets` library from Hugging Face for this.

In [None]:
from datasets import load_dataset

In [None]:
dataset = load_dataset('wnut_17')
train = dataset['train']

A few helper functions:

In [None]:
# map tag indexes to names of entity types (WNUT 17 defines 13 BIO tags)
# this should match train.features['ner_tags'].feature.names,
# except that the 'O' tag is spelled out as 'Outside' here
ner_names = [
    'Outside',
    'B-corporation',
    'I-corporation',
    'B-creative-work',
    'I-creative-work',
    'B-group',
    'I-group',
    'B-location',
    'I-location',
    'B-person',
    'I-person',
    'B-product',
    'I-product',
]

class Entity:
    """
        An Entity consists of two things: its type and the string describing it
        taken from https://github.com/cohere-ai/cohere-python/blob/main/cohere/extract.py
    """
    def __init__(self, type: str, value: str) -> None:
        self.type = type
        self.value = value

    def toDict(self) -> dict:
        return {"type": self.type, "value": self.value}

    def __str__(self) -> str:
        return f"[{self.type}] {self.value}"

    def __repr__(self) -> str:
        return str(self.toDict())

    def __eq__(self, other) -> bool:
        return self.type == other.type and self.value == other.value

    
def is_beginning(nt_idx):
    return ner_names[nt_idx][0] == 'B'

def _get_entities(d):
    """
        returns entities as the concatenation of the contiguous strings
    """
    cur_tag, cur_txt = None, None
    for idx,nt in enumerate(d['ner_tags']):
        if nt == 0: 
            continue
        if is_beginning(nt):
            if cur_txt is not None:
                yield ' '.join(cur_txt), ner_names[cur_tag]
            cur_txt = [ d['tokens'][idx ] ] # start new Named Entity
            cur_tag = nt
        elif cur_txt is not None: # Inside tag: extend the current entity (a stray I- tag with no open entity is skipped)
            cur_txt.append(d['tokens'][idx])
    # maybe there is a started NE
    if cur_txt is not None:
        yield ' '.join(cur_txt), ner_names[cur_tag]
        
def entities(d):
    """
        returns list of Entity classes
    """
    result = [] 
    for ents in _get_entities(d):
        result.append(Entity(ents[1][2:], ents[0]))
    return result


def to_text(datapoint):
    return ' '.join(datapoint['tokens'])

    

A few examples

In [None]:
for idx in [2,31, 100, 42, 2021]:
    print('TEXT:', to_text(train[idx]))
    print('ENTITIES: ')
    for e in list(entities(train[idx])):
        print('\t',e)
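
The `B-` / `I-` prefixes used by the helpers above follow the BIO tagging scheme: `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. The sketch below illustrates this on a toy datapoint. The sentence is made up, the labels are written as strings rather than the integer tags stored in the dataset, and `decode_bio` is a hypothetical stand-in for the helpers above, not part of the notebook. Unlike `_get_entities`, it closes an open entity as soon as it meets an `O` tag.

```python
# Toy datapoint, made up for illustration; it mirrors the `tokens` / `ner_tags` fields,
# but uses string labels instead of integer tag indexes.
toy_tokens = ["Just", "saw", "Star", "Wars", "in", "New", "York", "!"]
toy_labels = ["O", "O", "B-creative-work", "I-creative-work",
              "O", "B-location", "I-location", "O"]


def decode_bio(tokens, labels):
    """Group B-/I- labelled tokens into (entity type, entity text) pairs."""
    spans, current = [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # a B- tag always opens a new entity, closing any open one
            if current is not None:
                spans.append(current)
            current = [label[2:], [token]]
        elif label.startswith("I-") and current is not None:
            # an I- tag extends the entity that is currently open
            current[1].append(token)
        else:
            # "O" (or a stray I- tag) closes any open entity
            if current is not None:
                spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    return [(etype, " ".join(words)) for etype, words in spans]


print(decode_bio(toy_tokens, toy_labels))
```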

# Using Large Language Models (LLMs)

Sign up at cohere.com using this [link](https://dashboard.cohere.ai/welcome/register?utm_source=cohere-owned&utm_medium=event&utm_campaign=matthias-course).
Retrieve your API key and copy it into the variable below:

In [None]:
import cohere
from cohere import CohereError
apikey=""
co = cohere.Client(apikey)


In [None]:
prompt = "In a hole in the ground there lived a hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell"
co.generate(prompt)

## In-context learning

Few-shot in-context learning consists of providing a few examples, structured in a consistent way ("prompt engineering"), and leveraging the fact that LLMs are good at pattern matching. Two examples follow.

In [None]:
prompt = """The capital of Germany is Berlin.
The capital of France is Paris.
The capital of Albania is"""

In [None]:
co.generate(prompt)

In [None]:
prompt = """text: I loved that movie.
sentiment: positive
text: It was a disgusting food.
sentiment: negative
text: I felt welcomed as if I was a king.
sentiment: positive
text: The service was truly exceptionally good.
sentiment:"""

In [None]:
co.generate(prompt)
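
Both prompts above share one pattern: a list of solved examples followed by an unsolved one that the model is expected to complete. As a minimal sketch, the sentiment prompt could be assembled programmatically as follows. The helper name `build_few_shot_prompt` and its parameters are hypothetical and not part of any library.

```python
def build_few_shot_prompt(examples, query, input_key="text", output_key="sentiment"):
    """Lay out solved (input, output) pairs, then leave the last output open."""
    lines = []
    for text, label in examples:
        lines.append(f"{input_key}: {text}")
        lines.append(f"{output_key}: {label}")
    # the query uses the same layout, with the answer left for the model to complete
    lines.append(f"{input_key}: {query}")
    lines.append(f"{output_key}:")
    return "\n".join(lines)


examples = [
    ("I loved that movie.", "positive"),
    ("It was a disgusting food.", "negative"),
    ("I felt welcomed as if I was a king.", "positive"),
]
prompt = build_few_shot_prompt(examples, "The service was truly exceptionally good.")
print(prompt)
```

Keeping the layout in one function makes it easy to change the number or the order of the examples without retyping the prompt.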

# Evaluation

To evaluate with the F1 measure, you need to count the entities that the gold labels and the proposed ones have in common.
Since many of the posts do not contain any entity, we will focus on **recall** as the evaluation metric.

In [None]:
def evaluate_one(gold, proposed):
    """ returns (number of common hits, length of retrieved entities, length of gold entities)
    params:
        - gold: set of gold entities
        - proposed: set of retrieved entities
    """
    common = len(gold.intersection(proposed))
    return common, len(proposed), len(gold)

def precision(common, prop, gold):
    # by convention, precision is 1 when nothing was proposed
    return common / prop if prop != 0 else 1

def recall(common, prop, gold):
    # by convention, recall is 1 when there is no gold entity
    return common / gold if gold != 0 else 1

def f1(common, prop, gold):
    prec = precision(common, prop, gold)
    rec = recall(common, prop, gold)
    if prec + rec > 0:
        return 2 * prec * rec / (prec + rec)
    return 0
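
To make these conventions concrete, here is a small self-contained worked example. The entity sets are made up, and `score_sets` is a hypothetical helper that recomputes the same quantities directly from two sets, so that it can run on its own.

```python
def score_sets(gold, proposed):
    """Return (precision, recall, f1) for two sets of entity strings."""
    common = len(gold & proposed)
    # same conventions as above: an empty denominator gives a score of 1
    prec = common / len(proposed) if proposed else 1.0
    rec = common / len(gold) if gold else 1.0
    f1_score = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
    return prec, rec, f1_score


# one of the two gold entities is found, among three proposals
gold = {"star wars", "new york"}
proposed = {"star wars", "london", "bbc"}
print(score_sets(gold, proposed))  # precision 1/3, recall 1/2, F1 0.4

# a post without gold entities: recall is 1 by convention, whatever is proposed
print(score_sets(set(), {"bbc"}))
```

The second call shows why posts without entities push the average recall up: they count as perfect recall even when the model proposes spurious entities.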

# The Exercise
You will need to create a prompt (play around with some alternatives) and obtain a result for each text in the test set. The final average recall should be > 0.6.

In [None]:
# TODO: prompt generation
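
One possible starting point, offered as a sketch and not as the expected solution, is to reuse the few-shot layout seen earlier with one block per post. The `text:` / `entities:` layout, the `--` separator, the `none` marker, and the helper names `format_example` and `build_ner_prompt` are all assumptions made for illustration. The example posts are made up.

```python
def format_example(text, entity_values=None):
    """One block of the prompt; pass None to leave the entities open for the model."""
    block = f"text: {text}\nentities:"
    if entity_values is not None:
        # an explicit marker for posts without entities keeps the pattern regular
        block += " " + (", ".join(entity_values) if entity_values else "none")
    return block


def build_ner_prompt(shots, query_text, separator="\n--\n"):
    """Join solved examples and the open query into a single prompt string."""
    blocks = [format_example(text, values) for text, values in shots]
    blocks.append(format_example(query_text))
    return separator.join(blocks)


shots = [
    ("Just saw Star Wars in New York !", ["Star Wars", "New York"]),
    ("so tired today , going to bed early", []),
]
ner_prompt = build_ner_prompt(shots, "Watching the BBC tonight")
print(ner_prompt)
```

In the notebook, the shots could be built from `to_text(train[i])` and the values of `entities(train[i])` for a few chosen indexes.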

In [None]:
import numpy as np
test = dataset['test']
recalls = []

for datapoint in test:
    thisprompt = ... # generate an appropriate prompt
    thisprompt += ... # concatenate something built from datapoint

    try:
        c = co.generate(thisprompt)
        answer = c.generations[0].text
    except CohereError:
        continue # skip datapoints for which the API call fails

    proposed = ... # extract the set of entities from the answer

    gold = set(e.value.lower() for e in entities(datapoint))
    r_score = recall(*evaluate_one(gold, proposed))
    recalls.append(r_score)

print(np.mean(recalls))
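
How the proposed entities are extracted depends on the answer format that the prompt asks for. As a minimal sketch, assume the model is prompted to answer with a comma-separated list on a single line, with `none` for posts without entities and `--` between examples. This format is hypothetical and not prescribed by the exercise, and `parse_entities` is an illustrative helper name.

```python
def parse_entities(answer, separator="--"):
    """Turn a raw completion into a set of lowercase entity strings."""
    # the model may keep generating further examples; keep only the first block
    first_block = answer.split(separator)[0].strip()
    first_line = first_block.splitlines()[0] if first_block else ""
    values = {part.strip().lower() for part in first_line.split(",")}
    # drop empty fragments and the explicit "no entity" marker
    values.discard("")
    values.discard("none")
    return values


print(parse_entities(" Star Wars, New York\n--\ntext: another post\nentities: BBC"))
print(parse_entities(" none\n--"))
```

Lowercasing the proposals matches the way the gold set is built in the loop above, so the two sets can be intersected directly.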

# (Bonus) 
Now, let's observe how the model behaves under different configurations. In particular, we can ask the following questions: How does the performance of the model depend on the number of training examples used for prompting? On the choice of training samples? On the prompting method?

### Exp1: Varying the train data size

In [None]:
import seaborn as sns
import pandas as pd

In [None]:
def run_with_n_samples(train, test, n_sampling):
    all_recalls = []
    n_train_samples = []
    for n in range(1, n_sampling+1):
        recalls = [] # recalls for exp with n train examples
        
        train_subset = ... # subset of training data of size n
        prompt = ... # Prompt with n train examples
        ... # use what was done before to obtain the results using the current prompt for n train samples
        mean_recall = ... 
        
        all_recalls.append(mean_recall)
        n_train_samples.append(n)
    
    return n_train_samples, all_recalls

In [None]:
n_sampling = ...
n_train_samples, exp1_recalls = run_with_n_samples(train, test, n_sampling)
df = pd.DataFrame(data={"n_train_samples": n_train_samples, "recall": exp1_recalls})
sns.lineplot(data=df, x="n_train_samples", y="recall")

### Exp2: Varying the training samples

In [None]:
def run_with_varying_samples(train, test, n_sampling):
    recalls = []
    for _ in range(n_sampling):
        train_sample = ... # a fresh random training sample (same size on every run)
        ... # use what was done before to obtain the results using the train subset
        mean_recall = ... 
        
        recalls.append(mean_recall)
        
    return recalls


In [None]:
n_sampling = ...
exp2_recalls = run_with_varying_samples(train, test, n_sampling)
print(np.mean(exp2_recalls), np.std(exp2_recalls))
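
To keep the repeated runs reproducible, the random subsets can be drawn with a seeded generator. This is a minimal sketch, assuming the few-shot examples are picked by index. The helper name `sample_shot_indices` and its parameters are hypothetical, and the sizes used below are arbitrary.

```python
import random


def sample_shot_indices(n_train, n_shots, n_repeats, seed=0):
    """Draw n_repeats index subsets of size n_shots, without repetition inside a subset."""
    rng = random.Random(seed)  # a local generator leaves the global random state untouched
    return [rng.sample(range(n_train), n_shots) for _ in range(n_repeats)]


subsets = sample_shot_indices(n_train=1000, n_shots=5, n_repeats=3)
for indices in subsets:
    print(sorted(indices))
```

Each list of indexes could then be used to pick the training examples that go into the prompt for one run, for instance with `train.select(indices)` from the `datasets` library.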

### Exp3: Try different prompting methods and compare the results