# **Building a Named Entity Recognition app**

We are building an app that takes a piece of text and labels the named entities in it, such as people's names, organizations, and places, using a BERT-based named entity recognition (NER) model.

- BERT is a machine learning model for natural language processing (NLP).
- A NER model built on BERT extracts specific entities from text, such as names, places, companies, and institutions.
- We are using the **bert-base-NER** model, a 108M-parameter BERT model fine-tuned for Named Entity Recognition (NER).

**bert-base-NER** recognizes 4 types of entities:

- LOC (Location)
- ORG (Organization)
- PER (Person)
- MISC (Miscellaneous)
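Under the hood, the model labels individual tokens rather than whole entities. Assuming the CoNLL-2003-style IOB tags this kind of model is trained with (`B-` marks the beginning of an entity, `I-` a token inside one, `O` a token outside any entity), the sketch below shows how word-level tags collapse into spans of the four entity types. The words and tags are hand-written for illustration, not real model output, and `group_iob` is a hypothetical helper.

```python
def group_iob(words, tags):
    """Group word-level IOB tags (e.g. B-PER, I-PER, O) into (entity_type, text) spans."""
    spans = []
    current_type, current_words = None, []
    for word, tag in zip(words, tags):
        # "B-LOC" -> ("B", "-", "LOC"); "O" -> ("O", "", "")
        prefix, _, ent_type = tag.partition("-")
        if prefix == "I" and ent_type == current_type:
            # An I- tag of the same type continues the open span
            current_words.append(word)
            continue
        # Anything else closes the open span (if any)...
        if current_type is not None:
            spans.append((current_type, " ".join(current_words)))
        # ...then a B-/I- tag opens a new span, while O leaves no span open
        if ent_type:
            current_type, current_words = ent_type, [word]
        else:
            current_type, current_words = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans


# Hypothetical, hand-written tags (not actual bert-base-NER output)
words = ["My", "name", "is", "Andrew", "and", "I", "live", "in", "New", "York"]
tags = ["O", "O", "O", "B-PER", "O", "O", "O", "O", "B-LOC", "I-LOC"]
print(group_iob(words, tags))  # [('PER', 'Andrew'), ('LOC', 'New York')]
```

Starting a new span on every `B-` tag, and on any `I-` tag whose type differs from the open span, lets the same rule work whether `B-` is used for every entity start or only between two adjacent entities of the same type.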

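```python
# Run the model locally with a transformers pipeline (no API endpoint needed)
from transformers import pipeline

# Downloads dslim/bert-base-NER on first use and runs it on this machine
ner_pipeline = pipeline("ner", model="dslim/bert-base-NER")

def ner(text):
    """Return the text and its entities in the format gr.HighlightedText expects."""
    output = ner_pipeline(text)
    return {"text": text, "entities": output}
```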

```python
import json

import gradio as gr
import requests
from google.colab import userdata

# API key stored as a Colab secret
hf_api_key = userdata.get('HF_API_KEY')

# Helper function: send text to a Hugging Face Inference endpoint and return the parsed JSON.
# By default it targets the summarization endpoint stored in the HF_API_SUMMARY_BASE secret;
# pass ENDPOINT_URL to call a different model (e.g. the NER endpoint below).
def get_completion(inputs, parameters=None, ENDPOINT_URL=userdata.get('HF_API_SUMMARY_BASE')):
    headers = {
        "Authorization": f"Bearer {hf_api_key}",
        "Content-Type": "application/json",
    }
    data = {"inputs": inputs}
    if parameters is not None:
        data["parameters"] = parameters
    response = requests.post(ENDPOINT_URL, headers=headers, data=json.dumps(data))
    return response.json()
```
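The helper returns whatever JSON the endpoint sends back. While a hosted model is still loading, or when a request fails, the response is typically a JSON object with an `"error"` key instead of a list of entities, and passing that to Gradio produces a confusing failure. A small guard like the hypothetical `check_entities` below makes the failure explicit; the error shape it checks for is an assumption, not something defined by the Hugging Face client libraries.

```python
def check_entities(response):
    """Return the entity list, or raise if the endpoint sent something else.

    Assumes (hypothetically) that errors arrive as a dict such as {"error": "..."}.
    """
    if isinstance(response, dict) and "error" in response:
        raise RuntimeError(f"Inference endpoint error: {response['error']}")
    if not isinstance(response, list):
        raise TypeError(f"Expected a list of entities, got {type(response).__name__}")
    return response


print(check_entities([{"entity_group": "ORG", "word": "HuggingFace", "start": 28, "end": 39, "score": 0.92}]))
try:
    check_entities({"error": "Model is currently loading"})
except RuntimeError as exc:
    print(exc)  # Inference endpoint error: Model is currently loading
```

To use it, wrap the call inside `ner`, for example `output = check_entities(get_completion(text, parameters=None, ENDPOINT_URL=API_URL))`.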


```python
API_URL = userdata.get('HF_API_NER_BASE')  # NER endpoint stored as a Colab secret
text = "My name is Poli and work at HuggingFace"
get_completion(text, parameters=None, ENDPOINT_URL=API_URL)
```

```
[{'entity_group': 'PER',
  'score': 0.9872257113456726,
  'word': 'Pol',
  'start': 11,
  'end': 14},
 {'entity_group': 'PER',
  'score': 0.4419715702533722,
  'word': '##i',
  'start': 14,
  'end': 15},
 {'entity_group': 'ORG',
  'score': 0.9195873737335205,
  'word': 'HuggingFace',
  'start': 28,
  'end': 39}]
```

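Calling `get_completion` returns a list of dictionaries, one per detected entity, each with the entity type (`entity_group`), a confidence `score`, the matched `word`, and its character offsets (`start`, `end`). Note that the name "Poli" comes back as two word pieces, `Pol` and `##i`.

This raw output is hard for users to read, so we use Gradio to display the entities highlighted directly in the text.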
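Each entity's `start` and `end` are character offsets into the input string, which is what lets Gradio's `HighlightedText` component highlight the right spans. A short check using the sample output above (which matches the sentence "My name is Poli and work at HuggingFace") shows how the offsets line up with the text, and why the name is split in two:

```python
# Sample output shown above, paired with the sentence it came from
text = "My name is Poli and work at HuggingFace"
entities = [
    {"entity_group": "PER", "score": 0.9872257113456726, "word": "Pol", "start": 11, "end": 14},
    {"entity_group": "PER", "score": 0.4419715702533722, "word": "##i", "start": 14, "end": 15},
    {"entity_group": "ORG", "score": 0.9195873737335205, "word": "HuggingFace", "start": 28, "end": 39},
]

for ent in entities:
    span = text[ent["start"]:ent["end"]]  # offsets index into the original text
    print(f'{ent["entity_group"]}: {span!r} (token {ent["word"]!r}, score {ent["score"]:.2f})')
# PER: 'Pol' (token 'Pol', score 0.99)
# PER: 'i' (token '##i', score 0.44)
# ORG: 'HuggingFace' (token 'HuggingFace', score 0.92)
```

The `##` prefix marks a word piece that continues the previous token; it does not appear in the text itself.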

```python
def ner(text):
    """Query the NER endpoint and return the result in the format gr.HighlightedText expects."""
    output = get_completion(text, parameters=None, ENDPOINT_URL=API_URL)
    return {"text": text, "entities": output}


# Create a user interface with Gradio
demo = gr.Interface(fn=ner,
                    inputs=[gr.Textbox(label="Text to find entities", lines=3)],
                    outputs=[gr.HighlightedText(label="Result")],
                    title="Named Entity Recognition App",
                    description="Find entities using the `dslim/bert-base-NER` model under the hood!",
                    # Hide the Flag button (newer Gradio releases call this `flagging_mode`)
                    allow_flagging="never",
                    # `examples` adds clickable sample inputs below the interface
                    examples=["My name is Andrew and I live in California", "My name is Poli and work at HuggingFace"])
# share=True creates a temporary public link to the app
demo.launch(share=True)
```

```
Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://48d2eb8b27b80e8b2d.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
```
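Besides the `{"text": ..., "entities": [...]}` dictionary used here, `gr.HighlightedText` also accepts a list of `(text, label)` pairs, with `None` for unlabeled text. If you ever need that format (for example, to post-process the spans yourself), a minimal conversion might look like this. `to_highlight_pairs` is a hypothetical helper, it assumes the entities are sorted by `start` and do not overlap, and the entity dicts below are simplified to the keys it needs.

```python
def to_highlight_pairs(result):
    """Convert {"text": ..., "entities": [...]} into [(substring, label_or_None), ...].

    Assumes entities are sorted by start offset and do not overlap.
    """
    text, pairs, cursor = result["text"], [], 0
    for ent in result["entities"]:
        if ent["start"] > cursor:
            pairs.append((text[cursor:ent["start"]], None))  # unlabeled gap
        pairs.append((text[ent["start"]:ent["end"]], ent["entity_group"]))
        cursor = ent["end"]
    if cursor < len(text):
        pairs.append((text[cursor:], None))  # trailing unlabeled text
    return pairs


result = {
    "text": "My name is Poli and work at HuggingFace",
    "entities": [
        {"entity_group": "PER", "start": 11, "end": 15},
        {"entity_group": "ORG", "start": 28, "end": 39},
    ],
}
print(to_highlight_pairs(result))
# [('My name is ', None), ('Poli', 'PER'), (' and work at ', None), ('HuggingFace', 'ORG')]
```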




```python
def merge_tokens(tokens):
    """Merge broken words (word pieces) back into complete words.

    For example, 'Pol' + '##i' is merged into the single name 'Poli'. A token is
    merged into the previous one only if both have the same entity_group and the
    token continues the previous one: it is a '##' word piece, or it starts exactly
    where the previous token ended. (Checking only the entity type would also glue
    together two separate names, e.g. 'Andrew' and 'Poli'.)
    """
    merged_tokens = []
    for token in tokens:
        last_token = merged_tokens[-1] if merged_tokens else None
        if (
            last_token is not None
            and last_token['entity_group'] == token['entity_group']
            and (token['word'].startswith('##') or token['start'] == last_token['end'])
        ):
            # The current token continues the entity of the last one, so merge them
            last_token['word'] += token['word'].replace('##', '')
            last_token['end'] = token['end']
            # Simple average of the two confidence scores
            last_token['score'] = (last_token['score'] + token['score']) / 2
        else:
            # Otherwise start a new entity (copied so the API response is not modified)
            merged_tokens.append(dict(token))
    return merged_tokens


# Call merge_tokens inside the `ner` function used by our app
def ner(text):
    output = get_completion(text, parameters=None, ENDPOINT_URL=API_URL)
    merged_tokens = merge_tokens(output)
    return {"text": text, "entities": merged_tokens}

demo = gr.Interface(fn=ner,
                    inputs=[gr.Textbox(label="Text to find entities", lines=2)],
                    outputs=[gr.HighlightedText(label="Result")],
                    title="Named Entity Recognition App",
                    description="Find entities using the `dslim/bert-base-NER` model under the hood!",
                    allow_flagging="never",
                    examples=["My name is Andrew, I'm building DeeplearningAI and I live in California", "My name is Poli, I live in Vienna and work at HuggingFace"])

demo.launch(share=True)
```

```
Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://34041899ab52f2a51d.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
```
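The raw output earlier included a word piece (`##i`) scored at only about 0.44, so you may also want to drop low-confidence entities before display. A minimal sketch, where `filter_entities` and its 0.5 default threshold are hypothetical choices rather than anything the model or Gradio prescribes. Apply it after `merge_tokens`: filtering first would discard `##i` and leave the truncated name "Pol".

```python
def filter_entities(entities, min_score=0.5):
    """Keep only entities whose confidence score is at least min_score (hypothetical threshold)."""
    return [ent for ent in entities if ent["score"] >= min_score]


# Merged result for the sample sentence: 'Pol' + '##i' averaged to about 0.71
merged = [
    {"entity_group": "PER", "word": "Poli", "start": 11, "end": 15,
     "score": (0.9872257113456726 + 0.4419715702533722) / 2},
    {"entity_group": "ORG", "word": "HuggingFace", "start": 28, "end": 39,
     "score": 0.9195873737335205},
]
print([ent["word"] for ent in filter_entities(merged)])                 # ['Poli', 'HuggingFace']
print([ent["word"] for ent in filter_entities(merged, min_score=0.9)])  # ['HuggingFace']
```

Inside `ner`, this would become `merged_tokens = filter_entities(merge_tokens(output))`.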




```python
# Close all running Gradio apps and free their ports
gr.close_all()
```

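**Note**:
- For how to get a Hugging Face Inference API key and endpoint URL, see [this guide](https://blog.futuresmart.ai/mastering-hugging-face-inference-api-integrating-nlp-models-for-real-time-predictions).
- This notebook reads the API key and endpoint URLs from Colab secrets: `HF_API_KEY`, `HF_API_SUMMARY_BASE`, and `HF_API_NER_BASE`.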