### Model description
#### bert-base-NER
<br>bert-base-NER is a fine-tuned BERT model that is ready to use for Named Entity Recognition and, according to its model card, achieves state-of-the-art performance on the NER task. It has been trained to recognize four types of entities: location (LOC), organization (ORG), person (PER) and miscellaneous (MISC).
#### xlm-roberta-large-ner-hrl
<br>xlm-roberta-large-ner-hrl is a Named Entity Recognition model for 10 high-resource languages (Arabic, German, English, Spanish, French, Italian, Latvian, Dutch, Portuguese and Chinese), based on a fine-tuned XLM-RoBERTa large model.
### Labels
<br>bert-base-NER was fine-tuned on the English version of the standard CoNLL-2003 Named Entity Recognition dataset.
<br>The training dataset distinguishes between the beginning and continuation of an entity so that if there are back-to-back entities of the same type, the model can output where the second entity begins. As in the dataset, each token will be classified as one of the following classes:

| Abbreviation | Description |
|---|---|
| O | Outside of a named entity |
| B-MISC | Beginning of a miscellaneous entity right after another miscellaneous entity |
| I-MISC | Miscellaneous entity |
| B-PER | Beginning of a person's name right after another person's name |
| I-PER | Person's name |
| B-ORG | Beginning of an organization right after another organization |
| I-ORG | Organization |
| B-LOC | Beginning of a location right after another location |
| I-LOC | Location |
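
To make the tagging scheme concrete, here is a minimal sketch of how a sequence of these tags could be grouped into entity spans. The function name `decode_bio` and the sample tags are hypothetical and not part of either model's API. The decoder starts a new entity on any `B-` tag, and also on an `I-` tag that does not continue the current entity, so it should cope both with the convention in the table above and with models that emit `B-` at the start of every entity.

```python
def decode_bio(tags):
    """Group per-token tags into (label, start, end) spans, end exclusive."""
    spans = []
    label, start = None, None
    # A trailing "O" sentinel flushes an entity that runs to the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, tag_label = tag.partition("-")
        continues = prefix == "I" and tag_label == label
        if label is not None and not continues:
            spans.append((label, start, i))
            label = None
        if prefix in ("B", "I") and not continues:
            label, start = tag_label, i
    return spans


# Hypothetical tag sequence: two back-to-back person names, then a location.
example_tags = ["B-PER", "I-PER", "B-PER", "O", "I-LOC"]
example_spans = decode_bio(example_tags)
```

In this example the second `B-PER` is what separates the two adjacent person entities; without it, tokens 0 to 2 would be read as a single name.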

### Step 1 - Load Pretrained Model

In [1]:
model = "roberta" #"roberta" or "bert-base"

In [2]:
#imports
import pandas
import pprint
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline


In [3]:
def get_pipeline(model_key):
    """Build a token-classification pipeline for the selected checkpoint."""
    # Short names mapped to Hugging Face Hub model IDs
    model_dict = {
        "roberta": "Davlan/xlm-roberta-large-ner-hrl",
        "bert-base": "dslim/bert-base-NER",
    }
    model_name = model_dict[model_key]

    # Tokenizer and model initialisation
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    ner_model = AutoModelForTokenClassification.from_pretrained(model_name)

    # Pipeline for inference
    return pipeline("ner", model=ner_model, tokenizer=tokenizer)

### Loading Dataset

In [4]:
data = pandas.read_csv('./demo.csv')['sentences']
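
The contents of `demo.csv` are not shown in this notebook; the loading step only assumes a column named `sentences`. As a rough sketch of a compatible file, the snippet below writes the two sentences that appear in the output section to a temporary CSV and reads them back with the standard library. The helper names `write_demo_csv` and `read_sentences` and the temporary path are illustrative assumptions.

```python
import csv
import tempfile
from pathlib import Path

# Two sentences taken from the output shown later in this notebook.
sentences = [
    "A large grey cat was asleep on a rocking chair.",
    "I did not ask the American Medical Association their opinion of this arrangement.",
]


def write_demo_csv(path, rows):
    """Write one sentence per row under a 'sentences' header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["sentences"])
        writer.writerows([s] for s in rows)


def read_sentences(path):
    """Read the 'sentences' column back as a list of strings."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row["sentences"] for row in csv.DictReader(f)]


demo_path = Path(tempfile.mkdtemp()) / "demo.csv"
write_demo_csv(demo_path, sentences)
loaded = read_sentences(demo_path)
```

A file written this way should also load with `pandas.read_csv(demo_path)['sentences']`, as in the cell above.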

### Inference

In [5]:
nlp = get_pipeline(model)
output = {}
for sent in data:
    output[sent] = nlp(sent)

### Output

In [6]:
import pprint
pprint.pprint(output)

{'A large grey cat was asleep on a rocking chair.': [],
 'I did not ask the American Medical Association their opinion of this arrangement.': [{'end': 26,
                                                                                        'entity': 'B-ORG',
                                                                                        'index': 6,
                                                                                        'score': 0.99999094,
                                                                                        'start': 17,
                                                                                        'word': '▁American'},
                                                                                       {'end': 34,
                                                                                        'entity': 'I-ORG',
                                                                                        'index': 7,
            

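
The pipeline output above is one record per subword token, so a multi-word entity is spread over several records. Below is a minimal sketch of merging consecutive records into a single span using their character offsets. The function `merge_entities` is hypothetical. The first sample record copies the values shown in the output above; the `start` of the second record and the whole third record are assumed for illustration, since the printed output is truncated.

```python
def merge_entities(sentence, records):
    """Merge token-level NER records into (label, text) pairs."""
    spans = []
    for rec in records:
        prefix, _, label = rec["entity"].partition("-")
        # An I- record extends the open span only if it is the next token
        # and carries the same entity type.
        extends = (
            bool(spans)
            and prefix == "I"
            and spans[-1]["label"] == label
            and rec["index"] == spans[-1]["last_index"] + 1
        )
        if extends:
            spans[-1]["end"] = rec["end"]
            spans[-1]["last_index"] = rec["index"]
        else:
            spans.append({"label": label, "start": rec["start"],
                          "end": rec["end"], "last_index": rec["index"]})
    # Slice the original sentence; strip() drops a leading space that
    # SentencePiece-style offsets may include.
    return [(s["label"], sentence[s["start"]:s["end"]].strip()) for s in spans]


sentence = ("I did not ask the American Medical Association "
            "their opinion of this arrangement.")
records = [
    {"entity": "B-ORG", "index": 6, "start": 17, "end": 26, "word": "▁American"},
    # Assumed values from here on (not visible in the truncated output).
    {"entity": "I-ORG", "index": 7, "start": 26, "end": 34, "word": "▁Medical"},
    {"entity": "I-ORG", "index": 8, "start": 34, "end": 46, "word": "▁Association"},
]
merged = merge_entities(sentence, records)
```

Slicing the original sentence rather than joining the `word` fields avoids having to undo tokenizer-specific markers such as `▁`. The `transformers` token-classification pipeline also accepts an `aggregation_strategy` argument that performs a similar grouping internally, which may be preferable in practice.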
### References
<br> https://huggingface.co/dslim/bert-base-NER?text=My+name+is+Wolfgang+and+I+live+in+Berlin
<br> https://huggingface.co/Davlan/xlm-roberta-large-ner-hrl?text=%D8%A5%D8%B3%D9%85%D9%8A+%D8%B3%D8%A7%D9%85%D9%8A+%D9%88%D8%A3%D8%B3%D9%83%D9%86+%D9%81%D9%8A+%D8%A7%D9%84%D9%82%D8%AF%D8%B3+%D9%81%D9%8A+%D9%81%D9%84%D8%B3%D8%B7%D9%8A%D9%86.