# Real-time named entity recognition on the IPU

Integration of the Graphcore Intelligence Processing Unit (IPU) and the Hugging Face Transformers library means that it only takes a few lines of code to perform complex tasks that require deep learning.

In this notebook we perform **named entity recognition (NER)**, also known as token classification: we use natural language processing models to classify each word in the prompt, for example as a person, a location or an organisation.
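
NER models often label each token using the IOB (inside, outside, beginning) scheme: `B-` marks the first token of an entity, `I-` a continuation of it, and `O` a token outside any entity. The minimal sketch below decodes such tags into entity spans. The sentence and its tags are hand-written for illustration rather than produced by a model, and `decode_bio` is a hypothetical helper, not part of any library.

```python
def decode_bio(tokens, tags):
    """Group (token, tag) pairs into (entity_type, text) spans.

    Handles IOB2 tags (every entity starts with B-) as well as IOB1-style
    tags, where an entity may start with I- if it does not follow another
    entity of the same type.
    """
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            prefix, entity_type = "O", None
        else:
            prefix, entity_type = tag.split("-", 1)
        # Continue the open entity only on an I- tag of the same type
        if prefix == "I" and entity_type == current_type:
            current_tokens.append(token)
            continue
        # Otherwise close any open entity...
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        # ...and start a new one unless the token is outside all entities
        if entity_type is None:
            current_type, current_tokens = None, []
        else:
            current_type, current_tokens = entity_type, [token]
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


tokens = ["My", "name", "is", "Wolfgang", "and", "I", "live", "in", "New", "York"]
tags = ["O", "O", "O", "B-PER", "O", "O", "O", "O", "B-LOC", "I-LOC"]
print(decode_bio(tokens, tags))  # [('PER', 'Wolfgang'), ('LOC', 'New York')]
```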


The ease of use of the `pipeline` interface lets us quickly experiment with pre-trained models and identify which one works best.
This simple interface also makes it easy to take advantage of the fast inference performance of the IPU in your application.

<img src="images/token_classification.png" alt="Widget inference on a token classification task" style="width:500px;">

While this notebook focuses on using a model for inference, our [token_classification](token_classification.ipynb) notebook shows how to fine-tune a model for a specific task using the [`datasets`](https://huggingface.co/docs/datasets/index) package.

To run this demo you need a Poplar SDK environment with PopTorch installed
(see the [Getting Started](https://docs.graphcore.ai/en/latest/getting-started.html) guide for your IPU system), as well as Optimum Graphcore.

First of all, let's make sure your environment has the latest version of [🤗 Optimum Graphcore](https://github.com/huggingface/optimum-graphcore) available.

In [None]:
# %pip install "optimum-graphcore>=0.4, <0.5"
# %pip install emoji==0.6.0 gradio

In [None]:
%load_ext autoreload
%autoreload 2

The value for cache directories can be configured through environment variables or directly in the notebook:

In [None]:
import os
executable_cache_dir = os.getenv("POPLAR_EXECUTABLE_CACHE_DIR", "./exe_cache/")

## NER with transformers pipelines on the IPU

Let's load our IPU configuration and get started with using pipelines to run NER on the IPU:

In [None]:
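from optimum.graphcore import pipelines

# IPU settings shared by every pipeline in this notebook: run each model on a
# single IPU, use half-precision partials, and cache compiled executables
inference_config = dict(layers_per_ipu=[40], ipus_per_replica=1, enable_half_partials=True,
                        executable_cache_dir=executable_cache_dir)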

The simplest way to use a model on the IPU is the `pipeline` function. It bundles tokenization of the input text, the model's forward pass and post-processing of the model outputs into a single call, and provides a set of models that have been validated to work on a given task. To get started, choose the task and call the `pipeline` function:

This loads the default model for the `"ner"` task, which in this case is `BERT...`. You can learn more about this pipeline in the [`TokenClassificationPipeline` documentation](https://huggingface.co/docs/transformers/v4.25.1/en/main_classes/pipelines#transformers.TokenClassificationPipeline).

In [None]:
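# Create a token classification ("ner") pipeline that runs on the IPU;
# every input is padded to a fixed length of 256 tokens
ner_pipeline = pipelines.pipeline("ner",
                                  ipu_config=inference_config,
                                  padding='max_length',
                                  max_length=256)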
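
The `padding='max_length'` and `max_length=256` arguments make every input the same length. A likely motivation is that IPU programs are compiled ahead of time for fixed tensor shapes, so fixed-length inputs let one compiled executable serve every prompt. The plain-Python sketch below mimics what padding to a fixed length does to a list of token ids; `pad_to_length`, the `pad_id=0` default and the example ids are illustrative assumptions, not the tokenizer's actual implementation.

```python
def pad_to_length(token_ids, max_length, pad_id=0):
    """Truncate or right-pad token ids to exactly max_length.

    Also returns an attention mask with 1 for real tokens and 0 for padding,
    which lets the model ignore the padded positions.
    """
    ids = token_ids[:max_length]                 # cut inputs that are too long
    n_pad = max_length - len(ids)
    input_ids = ids + [pad_id] * n_pad           # right-pad inputs that are too short
    attention_mask = [1] * len(ids) + [0] * n_pad
    return input_ids, attention_mask


ids, mask = pad_to_length([101, 2026, 2171, 102], max_length=8)
print(ids)   # [101, 2026, 2171, 102, 0, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```

However short the prompt, the padded inputs always have the same shape.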

We can now create a prompt which we can use with our model pipeline:

In [None]:
prompt = """The simplest way to use a model on the IPU is to use the `pipeline` function.
It provides a set of models which have been validated to work on a given task. To get
started choose the task and call the `pipeline` function"""
out = ner_pipeline(prompt)
out
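
In the Hugging Face `TokenClassificationPipeline`, the post-processing step applies a softmax to each token's logits, picks the highest-scoring class, maps it to a label name through the model config's `id2label` table and, by default, drops tokens labelled `O`. The plain-Python sketch below reproduces only that step; `label_tokens`, the three-label `id2label` mapping and the logits are all made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_tokens(tokens, logits_per_token, id2label, ignore_labels=("O",)):
    """Pick the most likely label for each token and drop ignored labels."""
    results = []
    for token, logits in zip(tokens, logits_per_token):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)  # argmax
        label = id2label[best]
        if label not in ignore_labels:
            results.append({"word": token, "entity": label, "score": probs[best]})
    return results


id2label = {0: "O", 1: "B-PER", 2: "B-LOC"}  # hypothetical label set
tokens = ["I", "met", "John", "in", "Paris"]
logits = [[4.0, 0.1, 0.2],
          [3.5, 0.3, 0.1],
          [0.2, 5.0, 0.4],
          [3.9, 0.2, 0.3],
          [0.1, 0.3, 4.2]]
for entity in label_tokens(tokens, logits, id2label):
    print(entity["word"], entity["entity"], round(entity["score"], 3))
```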

Let's give our pipeline some examples to run NER on:

In [None]:
examples = [
    "My name is Wolfgang and I live in Berlin, I work for Graphcore and I really like HuggingFace",
    "I'm from France and I live in the UK, John is happy there.",
    "À Budapest, il y a une grande piscine que les gens visitent.",
    "The hospital was full of patients with many different diseases, many had covid-19, flu and colds."
]

Let's run the pipeline on the first example and look at its raw output:


In [None]:
output_ner = ner_pipeline(examples[0])
output_ner
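
Each entry in the raw output describes a single (sub-word) token, with keys such as `entity`, `word`, `start` and `end`, the last two being character offsets into the input text. The minimal sketch below merges consecutive tokens of the same entity type into whole spans using those offsets. The `raw` list is hand-written in that shape, including a hypothetical `Wolf`/`##gang` word-piece split, and is not actual output of the model above; `merge_entities` is an illustrative helper. In the standard `transformers` pipeline, the `aggregation_strategy` argument performs a more complete version of this grouping.

```python
def merge_entities(text, token_entities):
    """Merge consecutive same-type tokens into entity spans with their text."""
    spans = []
    for tok in sorted(token_entities, key=lambda t: t["start"]):
        entity_type = tok["entity"].split("-", 1)[-1]  # "I-PER" -> "PER"
        prev = spans[-1] if spans else None
        continues = (
            prev is not None
            and prev["entity"] == entity_type
            and not tok["entity"].startswith("B-")   # B- always starts a new entity
            and tok["start"] - prev["end"] <= 1      # adjacent or one character (e.g. a space) apart
        )
        if continues:
            prev["end"] = tok["end"]                 # extend the open span
        else:
            spans.append({"entity": entity_type, "start": tok["start"], "end": tok["end"]})
    for span in spans:
        span["word"] = text[span["start"]:span["end"]]
    return spans


text = "Wolfgang lives in New York"
raw = [
    {"entity": "I-PER", "word": "Wolf", "start": 0, "end": 4},
    {"entity": "I-PER", "word": "##gang", "start": 4, "end": 8},
    {"entity": "I-LOC", "word": "New", "start": 18, "end": 21},
    {"entity": "I-LOC", "word": "York", "start": 22, "end": 26},
]
spans = merge_entities(text, raw)
print([(s["word"], s["entity"]) for s in spans])  # [('Wolfgang', 'PER'), ('New York', 'LOC')]
```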

The raw output is a list of dictionaries, one for each token the model tagged as part of an entity, which makes the predictions hard to take in at a glance.
Fortunately, we can use Gradio to build a quick and simple app that highlights the entities directly in the text.

Let's demo this:

In [None]:
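import gradio as gr

def app_for_pipeline(pipeline, examples=None, description="", label_description=""):
    """Build a Gradio app that highlights the entities found by `pipeline`."""
    return gr.Interface(
        # HighlightedText receives the original text plus the list of entities
        fn=lambda x: dict(text=x, entities=pipeline(x)),
        inputs=[
            gr.Textbox(
                label="Initial text",
                lines=3,
                value=prompt,
            ),
        ],
        outputs=gr.HighlightedText(
            label=label_description,
            combine_adjacent=True,
            postprocess=True,
            # Show this pipeline's predictions for the default prompt on launch
            value=dict(text=prompt, entities=pipeline(prompt))
        ),
        examples=examples,
        description=description,
    )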
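
`gr.HighlightedText` accepts a dictionary with the original `text` and a list of `entities`, each carrying an `entity` (or `entity_group`) label and `start`/`end` character offsets, which is why the pipeline output can be passed straight through. The component also accepts a list of `(substring, label)` pairs, with `None` for unlabelled text. The hypothetical helper below converts the first form into the second, which can be handy for checking what the app will highlight without launching it; the sentence and offsets are made up for illustration.

```python
def to_highlight_pairs(text, entities):
    """Convert {"text", "entities"}-style output into (substring, label) pairs.

    Unlabelled stretches of text get the label None; entities are assumed
    not to overlap.
    """
    pairs, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        if ent["start"] > cursor:
            pairs.append((text[cursor:ent["start"]], None))  # gap before the entity
        label = ent.get("entity_group", ent.get("entity"))
        pairs.append((text[ent["start"]:ent["end"]], label))
        cursor = ent["end"]
    if cursor < len(text):
        pairs.append((text[cursor:], None))  # trailing unlabelled text
    return pairs


text = "I work for Graphcore in Bristol"
entities = [{"entity": "I-ORG", "start": 11, "end": 20},
            {"entity": "I-LOC", "start": 24, "end": 31}]
print(to_highlight_pairs(text, entities))
# [('I work for ', None), ('Graphcore', 'I-ORG'), (' in ', None), ('Bristol', 'I-LOC')]
```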

Launching the app lets us quickly view, test and evaluate our model on our examples:

In [None]:
app_for_pipeline(ner_pipeline, examples=examples).launch()

We have now seen how quick and easy it is to run NER on the IPU and to build an app that presents the results clearly.

### Multilingual model

We can also quickly load and run a model that performs the same task in multiple languages. Let's put that into action:

In [None]:
model = "Davlan/bert-base-multilingual-cased-ner-hrl"
ner_pipeline_multilingual = pipelines.pipeline(
    "ner", model=model, ipu_config=inference_config,
    padding='max_length', max_length=256
)
out = ner_pipeline_multilingual(prompt)

We can plug this model into our Gradio app in the same way:

In [None]:
app_for_pipeline(ner_pipeline_multilingual, examples=examples).launch()

We can also use a different model to identify and locate food in text.

### Food information extraction

A key advantage of using pipelines on the IPU is that we can quickly load different models for different tasks. For instance, let's load a checkpoint that identifies food in text:

In [None]:
model = "chambliss/distilbert-for-food-extraction"
ner_pipeline_3 = pipelines.pipeline(
    "ner", model=model, ipu_config=inference_config,
    padding='max_length', max_length=256
)
out = ner_pipeline_3(prompt)
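
Since every checkpoint is wrapped by the same `pipeline` call, swapping models comes down to changing one string. A minimal sketch of a helper that builds one pipeline per checkpoint on first use and reuses it afterwards, so each model is set up only once per session. `PipelineCache` is a hypothetical helper, and the `fake_factory` below stands in for the real pipeline constructor so that the sketch runs without an IPU.

```python
from typing import Callable, Dict


class PipelineCache:
    """Build one pipeline per checkpoint on first use, then reuse it."""

    def __init__(self, factory: Callable[[str], Callable]):
        self.factory = factory
        self._pipelines: Dict[str, Callable] = {}

    def get(self, model_name: str) -> Callable:
        if model_name not in self._pipelines:
            self._pipelines[model_name] = self.factory(model_name)
        return self._pipelines[model_name]


# Stand-in factory for this sketch: records each build and returns a dummy
# callable instead of a real NER pipeline
build_calls = []

def fake_factory(model_name):
    build_calls.append(model_name)
    return lambda text: [{"model": model_name, "text": text}]


cache = PipelineCache(fake_factory)
food = cache.get("chambliss/distilbert-for-food-extraction")
food_again = cache.get("chambliss/distilbert-for-food-extraction")
print(food is food_again, len(build_calls))  # True 1
```

In this notebook, the factory would mirror the cells above, for example `lambda m: pipelines.pipeline("ner", model=m, ipu_config=inference_config, padding='max_length', max_length=256)`.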

Now we can use our new model to identify food in some new text examples:

In [None]:
app_for_pipeline(
    ner_pipeline_3, 
    examples=["I went to the restaurant last night, the food was excellent. I hate roast carrots with a side of chips, what a meal!"] + examples,
    description="Try prompting me with some food-related text!").launch()


### Biomedical model
A task that may be more directly useful in industry is identifying key terms in medical data. For example, if you had to query a large database of medical records to learn about a specific disease, searching it manually could take a lot of time. A NER model can simplify this by picking out and highlighting very specific information, such as disease names.
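
As a sketch of that idea, the snippet below builds an inverted index from each extracted entity to the records that mention it, so records about a given disease can be looked up directly. The records are invented, and `extract_entities` is a hypothetical stand-in for a NER pipeline (a simple keyword match, so the example runs anywhere); in this notebook it would instead call the biomedical pipeline and read the entity text from its output.

```python
from collections import defaultdict


def extract_entities(text):
    """Hypothetical stand-in for a disease NER pipeline (keyword match only)."""
    known = {"covid-19", "influenza", "flu", "cold"}
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return [w for w in words if w in known]


def build_index(records, extract=extract_entities):
    """Map each lower-cased entity to the ids of the records that mention it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for entity in extract(text):
            index[entity.lower()].add(record_id)
    return index


records = {
    "r1": "Patient admitted with covid-19 and a mild fever.",
    "r2": "Symptoms consistent with flu; advised rest.",
    "r3": "Follow-up visit: the flu has cleared.",
}
index = build_index(records)
print(sorted(index["flu"]))       # ['r2', 'r3']
print(sorted(index["covid-19"]))  # ['r1']
```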

Let's load a biomedical model trained to recognise diseases:

In [None]:
model = "alvaroalon2/biobert_diseases_ner"
ner_pipeline_4 = pipelines.pipeline(
    "ner", model=model, ipu_config=inference_config,
    padding='max_length', max_length=256
)
out = ner_pipeline_4(prompt)

Now let's see how this looks in Gradio!

In [None]:
app_for_pipeline(
    ner_pipeline_4, examples=["I'm ill, I've got a cold or the flu. It's annoying I have shivers and a fever!"] + examples,
    description="Try prompting me with some disease-related text!"
).launch()

This notebook shows how fast, easy and interactive the IPU can be.
We were able to swap out models for different purposes quickly, and to build an app that acts as an interactive interface between us and the IPU, so that we can visualise our results straight away.

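### Additional examples

The cells below contain further experiments: a part-of-speech tagging model, and an alternative Gradio app built with `gr.Blocks` that re-runs the pipeline whenever the input text changes.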

In [None]:
model = "vblagoje/bert-english-uncased-finetuned-pos"

ner_pipeline_2 = pipelines.pipeline(
    "ner", model=model, ipu_config=inference_config,
    padding='max_length', max_length=256
)
out = ner_pipeline_2(prompt)

In [None]:
app_for_pipeline(ner_pipeline_2, examples=examples).launch()

In [None]:
def from_pipeline2(pipeline, examples=None, description="", label_description=""):
    """Alternative app built with gr.Blocks: re-runs the pipeline whenever the text changes."""
    demo = gr.Blocks()
    with demo:
        # gr.Blocks does not take a `description` argument, so show it as Markdown
        gr.Markdown(description)
        inputs = gr.Textbox(
            label="Initial text",
            lines=3,
            value=prompt,
        )
        outputs = gr.HighlightedText(
            label=label_description,
            combine_adjacent=True,
            postprocess=True,
            value=dict(text=prompt, entities=pipeline(prompt))
        )
        if examples:
            # Clicking an example fills in the textbox; the `change` listener below updates the output
            gr.Examples(examples=examples, inputs=inputs, outputs=outputs)
        inputs.change(
            fn=lambda x: dict(text=x, entities=pipeline(x)),
            inputs=inputs, outputs=outputs, postprocess=True
        )
    return demo


In [None]:
demo = from_pipeline2(
    ner_pipeline_4, 
    examples=[
        "I'm ill, I've got a cold or an influenza. It's annoying I have shivers and a fever!"        
    ] + examples,
    description="Try prompting it with something related to diseases",
    label_description="Diseases",
)
demo.launch()
