# Flair Basics

In [None]:
from flair.data import Sentence

We can now define a sentence:

In [None]:
sentence = Sentence('The grass is greeen')

In [None]:
print(sentence)

Sentence: "The grass is greeen"   [− Tokens: 4]


We can access tokens either by their token ID or by their index. Token IDs start at 1, whereas indices start at 0, so both calls below return the same token.

In [None]:
print(sentence.get_token(4))
print(sentence[3])

Token: 4 greeen
Token: 4 greeen
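
The two lookups differ only in their numbering. Below is a minimal plain-Python sketch of that relationship. `ToyToken` and `get_by_id` are hypothetical names used for illustration and are not part of Flair.

```python
from dataclasses import dataclass


@dataclass
class ToyToken:
    """Hypothetical stand-in for a token; this is not Flair's Token class."""
    idx: int   # 1-based token ID
    text: str


words = "The grass is greeen".split()
# Number the tokens starting at 1, as in the printed output above.
tokens = [ToyToken(idx=i, text=w) for i, w in enumerate(words, start=1)]


def get_by_id(token_id: int) -> ToyToken:
    # The token with ID n sits at list position n - 1.
    return tokens[token_id - 1]


by_id = get_by_id(4)
by_index = tokens[3]
print(by_id is by_index)
```
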


We can also iterate over all tokens in a sentence:

In [None]:
for token in sentence:
    print(token)

Token: 1 The
Token: 2 grass
Token: 3 is
Token: 4 greeen


# Tokenization

The library includes a simple tokenizer, built on the lightweight segtok library, for tokenizing text when defining a sentence. Pass `use_tokenizer=True` to the `Sentence` constructor to tokenize the input string before the `Sentence` object is instantiated.

In [None]:
sentence = Sentence('The grass is green.', use_tokenizer=True)
print(sentence)

Sentence: "The grass is green ."   [− Tokens: 5]
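
To see why the period becomes its own token, here is a rough regex-based sketch of punctuation splitting. It is only an illustration and not how segtok works internally. `toy_tokenize` is a hypothetical helper, and a real tokenizer covers far more cases.

```python
import re
from typing import List


def toy_tokenize(text: str) -> List[str]:
    """Separate words from punctuation marks (illustrative only, not segtok)."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


untokenized = "The grass is green.".split()
tokenized = toy_tokenize("The grass is green.")
print(untokenized)  # 4 items: the period stays attached to "green."
print(tokenized)    # 5 items: the period is a separate token
```
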


# Tags on tokens

A token has fields for linguistic annotation:

- lemma
- part-of-speech tag
- named entity tag

We can add a tag by specifying the tag type and the tag value.

Below we add a tag of type 'ner' with the value 'color' to the word 'green'. This marks the word as an entity of type color.

In [None]:
sentence[3].add_tag('ner','color')

In [None]:
print(sentence.to_tagged_string())

The grass is green <color> .


Each tag also has an associated score:

In [None]:
from flair.data import Label
tag = sentence[3].get_tag('ner')
print(f'{sentence[3]} is tagged as {tag.value} with confidence score {tag.score}')

Token: 4 green is tagged as color with confidence score 1.0


The manually added color tag has a score of 1.0. A tag predicted by a sequence labeler has a score that indicates the classifier's confidence.

A sentence can have one or more labels, which can be used, for example, in classification tasks. The example below adds the label 'sports' of type 'topic' and the label 'English' of type 'language' to a sentence.

In [None]:
sentence = Sentence('France is the current world cup winner.')
sentence.add_label('topic','sports')
sentence.add_label('language', 'English')
print(sentence)

Sentence: "France is the current world cup winner ."   [− Tokens: 8  − Sentence-Labels: {'topic': [sports (1.0)], 'language': [English (1.0)]}]


In [None]:
for label in sentence.labels:
    print(label)

sports (1.0)
English (1.0)
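
The printed sentence above suggests that labels are grouped by label type, with each label carrying a value and a score. A plain-Python sketch of that shape follows. The `labels` dictionary and this `add_label` function are hypothetical stand-ins, not Flair's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical container mirroring the printed structure:
# label type -> list of (value, score) pairs.
labels: Dict[str, List[Tuple[str, float]]] = defaultdict(list)


def add_label(label_type: str, value: str, score: float = 1.0) -> None:
    # Manually added labels default to a score of 1.0, as in the output above.
    labels[label_type].append((value, score))


add_label("topic", "sports")
add_label("language", "English")

# Collect only the values stored under one label type.
topic_values = [value for value, _ in labels["topic"]]
print(dict(labels))
print(topic_values)
```
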


# Tagging Text

Using pre-trained sequence tagging models

Flair has numerous pre-trained models. For named entity recognition (NER), there is a model trained on the English CoNLL-03 task that recognizes 4 entity types. Load it using:


In [None]:
from flair.models import SequenceTagger
tagger = SequenceTagger.load('ner')

2021-03-13 11:01:14,590 --------------------------------------------------------------------------------
2021-03-13 11:01:14,591 The model key 'ner' now maps to 'https://huggingface.co/flair/ner-english' on the HuggingFace ModelHub
2021-03-13 11:01:14,591  - The most current version of the model is automatically downloaded from there.
2021-03-13 11:01:14,592  - (you can alternatively manually download the original model at https://nlp.informatik.hu-berlin.de/resources/models/ner/en-ner-conll03-v0.4.pt)
2021-03-13 11:01:14,592 --------------------------------------------------------------------------------


2021-03-13 11:01:23,188 loading file /Users/puneet/.flair/models/ner-english/4f4cdab26f24cb98b732b389e6cebc646c36f54cfd6e0b7d3b90b25656e4262f.8baa8ae8795f4df80b28e7f7b61d788ecbb057d1dc85aacb316f1bd02837a4a4


In [None]:
tagger.predict(sentence)

In [None]:
print(sentence.to_tagged_string())

France <S-LOC> is the current world cup winner .


In [None]:
sentence = Sentence('Tim Cook went to New York City .')

tagger.predict(sentence)

print(sentence.to_tagged_string())

Tim <B-PER> Cook <E-PER> went to New <B-LOC> York <I-LOC> City <E-LOC> .


In [None]:
for entity in sentence.get_spans('ner'):
    print(entity)

Span [1,2]: "Tim Cook"   [− Labels: PER (0.9999)]
Span [5,6,7]: "New York City"   [− Labels: LOC (0.9851)]
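
The prefixes in the tagged string (B, I, E, S) mark the beginning, inside, and end of a multi-token entity, or a single-token entity. The sketch below shows how such tags can be grouped into spans in plain Python. `bioes_to_spans` is a hypothetical helper that assumes well-formed tag sequences, and it is not the code Flair uses.

```python
from typing import List, Tuple

Span = Tuple[List[int], str, str]  # (1-based token IDs, span text, entity type)


def bioes_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Group BIOES tags into spans; assumes a well-formed tag sequence."""
    spans: List[Span] = []
    current_ids: List[int] = []
    for token_id, tag in enumerate(tags, start=1):
        if tag == "O":  # token is outside any entity
            current_ids = []
            continue
        prefix, entity_type = tag.split("-", 1)
        if prefix in ("B", "S"):
            current_ids = [token_id]      # start a new span
        else:
            current_ids.append(token_id)  # "I" or "E" continues the open span
        if prefix in ("E", "S"):          # span is complete
            text = " ".join(tokens[i - 1] for i in current_ids)
            spans.append((current_ids, text, entity_type))
            current_ids = []
    return spans


tokens = "Tim Cook went to New York City .".split()
tags = ["B-PER", "E-PER", "O", "O", "B-LOC", "I-LOC", "E-LOC", "O"]
spans = bioes_to_spans(tokens, tags)
for span in spans:
    print(span)
```

The resulting token IDs and texts line up with the spans printed above. The confidence scores are not reproduced, since they come from the model.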


Each span has a text, a tag value, its position in the sentence, and a score that indicates how confident the tagger is that the prediction is correct. You can also get additional information, such as the position offsets of each entity in the sentence, by calling:

In [None]:
print(sentence.to_dict(tag_type='ner'))

{'text': 'Tim Cook went to New York City .', 'labels': [], 'entities': [{'text': 'Tim Cook', 'start_pos': 0, 'end_pos': 8, 'labels': [PER (0.9999)]}, {'text': 'New York City', 'start_pos': 17, 'end_pos': 30, 'labels': [LOC (0.9851)]}]}
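
The `start_pos` and `end_pos` values are character offsets into the sentence text. As a rough check, they can be recomputed from the token IDs in plain Python, assuming the tokens are joined by single spaces as in this example. `span_offsets` is a hypothetical helper, not a Flair function.

```python
from typing import List, Tuple


def span_offsets(tokens: List[str], first_id: int, last_id: int) -> Tuple[int, int]:
    """Character offsets of tokens first_id..last_id (1-based, inclusive).

    Assumes the text is the tokens joined by single spaces.
    """
    # Every token before the span contributes its length plus one space.
    start = sum(len(t) + 1 for t in tokens[: first_id - 1])
    span_text = " ".join(tokens[first_id - 1 : last_id])
    return start, start + len(span_text)


tokens = "Tim Cook went to New York City .".split()
text = " ".join(tokens)

person = span_offsets(tokens, 1, 2)
location = span_offsets(tokens, 5, 7)
print(person, text[person[0]:person[1]])
print(location, text[location[0]:location[1]])
```
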
