# Tutorial 2: Tagging your Text
## Tagging with Pre-Trained Sequence Tagging Models
This section uses a pre-trained model for named entity recognition (NER). The model was trained on the English CoNLL-03 task and can recognize 4 different entity types.


In [1]:
from flair.data import Sentence
from flair.models import MultiTagger, SequenceTagger
from flair.tokenization import SegtokSentenceSplitter



In [2]:
tagger: SequenceTagger = SequenceTagger.load("ner")

2021-08-01 21:40:46,724 loading file /home/statisticallyfit/.flair/models/en-ner-conll03-v0.4.pt


In [3]:
tagger

SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): WordEmbeddings('/home/aakbik/.flair/embeddings/glove.gensim')
    (list_embedding_1): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
        (decoder): Linear(in_features=2048, out_features=300, bias=True)
      )
    )
    (list_embedding_2): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
        (decoder): Linear(in_features=2048, out_features=300, bias=True)
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (embedding2nn): Linear(in_features=4196, out_features=4196, bias=True)
  (rnn): LSTM(4196, 256, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=512, out_features=20, bias=True)
)

Calling the tagger's `predict()` method on a sentence adds the predicted tags to the tokens in that sentence.

The example below uses a sentence with two named entities:

In [4]:

sentence = Sentence("George Washington went to Washington.")

# Predict NER tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence.to_tagged_string())

George <B-PER> Washington <E-PER> went to Washington <S-LOC> .


## Getting Annotated Spans
Many sequence labeling methods annotate spans that consist of multiple words, like "George Washington" in the example sentence.

To get such spans directly from a tagged sentence, iterate over the result of `get_spans()`:

In [5]:
for entity in sentence.get_spans("ner"):
    print(entity)

Span [1,2]: "George Washington"   [− Labels: PER (0.9968)]
Span [5]: "Washington"   [− Labels: LOC (0.9994)]
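The tag prefixes in the tagged string above (`B-`, `E-`, `S-`) look like the BIOES scheme, where `B`/`I`/`E` mark the beginning, inside and end of a multi-word span and `S` marks a single-token span. The following is a minimal sketch of how such tags can be grouped into spans. It is plain Python for illustration only, not Flair's own implementation, and the helper name `bioes_to_spans` is hypothetical.

```python
from typing import List, Tuple


def bioes_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIOES-tagged tokens into (text, label) spans (illustrative helper)."""
    spans = []
    current = []  # tokens of the multi-word span currently being built
    for token, tag in zip(tokens, tags):
        if tag == "O":  # token is outside any entity
            current = []
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "S":  # single-token span
            spans.append((token, label))
            current = []
        elif prefix == "B":  # start of a multi-token span
            current = [token]
        elif prefix == "I":  # continuation of a span
            current.append(token)
        elif prefix == "E":  # end of a span: emit it
            current.append(token)
            spans.append((" ".join(current), label))
            current = []
    return spans


# Tokens and tags copied by hand from the tagged string printed earlier.
tokens = ["George", "Washington", "went", "to", "Washington", "."]
tags = ["B-PER", "E-PER", "O", "O", "S-LOC", "O"]

spans = bioes_to_spans(tokens, tags)
print(spans)
```

The two resulting spans correspond to the two entities that `get_spans("ner")` reports.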


**Another (longer) example of getting annotated spans:**

In [6]:
folkAirSentence = Sentence("When she doesn't answer, I fugre there's no more point in conversation. I steer her toward the kitchens. We'll have to pass by guards; there's no other way out. She has pasted on a horrible rictus of a smile, but at least she has enough self-possession for that. More worrying is the way she can't stop staring at things. As we walkt toward the guards, the intensity of her gaze is impossible to disguise. I improvise, trying to sound as though I am reciting a memorized message, without inflection in the words. 'Prince Cardan says we are to attend him.' One of the guards turns to the other. 'Balekin won't like that.' I try not to react, but it's hard. I just stand there and wait. If they lunge at us, I am going to have to kill them. 'Very well,' the first guard says. 'Go. But inform Cardan that his brother demands he brings both of you back this time.' ")

folkAirSentence

Sentence: "When she does n't answer , I fugre there 's no more point in conversation . I steer her toward the kitchens . We 'll have to pass by guards ; there 's no other way out . She has pasted on a horrible rictus of a smile , but at least she has enough self-possession for that . More worrying is the way she ca n't stop staring at things . As we walkt toward the guards , the intensity of her gaze is impossible to disguise . I improvise , trying to sound as though I am reciting a memorized message , without inflection in the words . ' Prince Cardan says we are to attend him . ' One of the guards turns to the other . ' Balekin wo n't like that . ' I try not to react , but it 's hard . I just stand there and wait . If they lunge at us , I am going to have to kill them . ' Very well ,' the first guard says . ' Go . But inform Cardan that his brother demands he brings both of you back this time . '"   [− Tokens: 200]

In [7]:
tagger.predict(folkAirSentence)
print(folkAirSentence.to_tagged_string())

When she does n't answer , I fugre there 's no more point in conversation . I steer her toward the kitchens . We 'll have to pass by guards ; there 's no other way out . She has pasted on a horrible rictus of a smile , but at least she has enough self-possession for that . More worrying is the way she ca n't stop staring at things . As we walkt toward the guards , the intensity of her gaze is impossible to disguise . I improvise , trying to sound as though I am reciting a memorized message , without inflection in the words . ' Prince Cardan <S-PER> says we are to attend him . ' One of the guards turns to the other . ' Balekin <S-PER> wo n't like that . ' I try not to react , but it 's hard . I just stand there and wait . If they lunge at us , I am going to have to kill them . ' Very well ,' the first guard says . ' Go . But inform Cardan <S-PER> that his brother demands he brings both of you back this time . '


In [8]:
# Getting the annotated spans: 
for entity in folkAirSentence.get_spans("ner"):
    print(entity)

Span [113]: "Cardan"   [− Labels: PER (0.9991)]
Span [132]: "Balekin"   [− Labels: PER (0.9969)]
Span [186]: "Cardan"   [− Labels: PER (0.9679)]


In [9]:
print(folkAirSentence.to_dict(tag_type = "ner"))

{'text': "When she doesn't answer, I fugre there's no more point in conversation. I steer her toward the kitchens. We'll have to pass by guards; there's no other way out. She has pasted on a horrible rictus of a smile, but at least she has enough self-possession for that. More worrying is the way she can't stop staring at things. As we walkt toward the guards, the intensity of her gaze is impossible to disguise. I improvise, trying to sound as though I am reciting a memorized message, without inflection in the words. 'Prince Cardan says we are to attend him.' One of the guards turns to the other. 'Balekin won't like that.' I try not to react, but it's hard. I just stand there and wait. If they lunge at us, I am going to have to kill them. 'Very well,' the first guard says. 'Go. But inform Cardan that his brother demands he brings both of you back this time.'", 'labels': [], 'entities': [{'text': 'Cardan', 'start_pos': 521, 'end_pos': 527, 'labels': [PER (0.9991)]}, {'text': 'Balekin', 
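The `start_pos` and `end_pos` values in the dictionary appear to be character offsets into the original text, with an exclusive end (`'Cardan'` spans 521 to 527, which is six characters). The sketch below shows how such offsets can be used to slice entities back out of the text. The dictionary is hand-written to mirror the shape of the output above, with plain strings standing in for Flair's label objects; it is not produced by Flair.

```python
# Hand-written stand-in that mirrors the structure printed by to_dict().
tagged = {
    "text": "George Washington went to Washington.",
    "labels": [],
    "entities": [
        {"text": "George Washington", "start_pos": 0, "end_pos": 17, "labels": ["PER"]},
        {"text": "Washington", "start_pos": 26, "end_pos": 36, "labels": ["LOC"]},
    ],
}

# Slice each entity out of the raw text using its character offsets.
recovered = [
    tagged["text"][entity["start_pos"]:entity["end_pos"]]
    for entity in tagged["entities"]
]

for entity, surface in zip(tagged["entities"], recovered):
    print(entity["labels"], repr(surface))
```

Character offsets like these are useful when annotations need to be mapped back onto the untokenized source text, for example for highlighting.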

In [10]:
mistbornSentence = Sentence("In Hemalurgy, the type of metal used in a spike is important, as is the positioning of that spike on the body. For instance, steel spikes take physical Allomantic powers—the ability to burn pewter, tin, steel, or iron—and bestow them upon the person receiving the spike. Which of these four is granted, however, depends on where the spike is placed. Spikes made from other metals steal Feruchemical abilities. For example, all of the original Inquisitors were given a pewter spike, which—after first being pounded through the body of a Feruchemist—gave the Inquisitor the ability to store up healing power. (Though they couldn't do so as quickly as a real Feruchemist, as per the law of Hemalurgic decay.) This, obviously, is where the Inquisitors got their infamous ability to recover from wounds quickly, and was also why they needed to rest so much.")

tagger.predict(mistbornSentence)

print(mistbornSentence.to_tagged_string())

In Hemalurgy <S-LOC> , the type of metal used in a spike is important , as is the positioning of that spike on the body . For instance , steel spikes take physical Allomantic <S-MISC> powers — the ability to burn pewter , tin , steel , or iron — and bestow them upon the person receiving the spike . Which of these four is granted , however , depends on where the spike is placed . Spikes made from other metals steal Feruchemical <S-MISC> abilities . For example , all of the original Inquisitors were given a pewter spike , which — after first being pounded through the body of a Feruchemist <S-MISC> — gave the Inquisitor <S-ORG> the ability to store up healing power . ( Though they could n't do so as quickly as a real Feruchemist <S-MISC> , as per the law of Hemalurgic <S-MISC> decay . ) This , obviously , is where the Inquisitors got their infamous ability to recover from wounds quickly , and was also why they needed to rest so much .


In [11]:
# Not all predictions are correct: Hemalurgy is not a location, it is a type of skill.
for entity in mistbornSentence.get_spans("ner"):
    print(entity)

Span [2]: "Hemalurgy"   [− Labels: LOC (0.8381)]
Span [33]: "Allomantic"   [− Labels: MISC (0.8186)]
Span [82]: "Feruchemical"   [− Labels: MISC (0.9868)]
Span [110]: "Feruchemist"   [− Labels: MISC (0.7525)]
Span [114]: "Inquisitor"   [− Labels: ORG (0.4601)]
Span [135]: "Feruchemist"   [− Labels: MISC (0.9532)]
Span [142]: "Hemalurgic"   [− Labels: MISC (0.9086)]
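Since every span comes with a confidence score, one possible way to handle doubtful predictions is to keep only spans above a threshold. The sketch below works on plain tuples copied from the output above rather than on Flair objects, and the cutoff `MIN_SCORE` is an arbitrary, hypothetical value chosen for illustration.

```python
# (text, label, score) triples copied from the printed spans above.
predictions = [
    ("Hemalurgy", "LOC", 0.8381),
    ("Allomantic", "MISC", 0.8186),
    ("Feruchemical", "MISC", 0.9868),
    ("Feruchemist", "MISC", 0.7525),
    ("Inquisitor", "ORG", 0.4601),
    ("Feruchemist", "MISC", 0.9532),
    ("Hemalurgic", "MISC", 0.9086),
]

MIN_SCORE = 0.8  # hypothetical cutoff, not a Flair default

# Keep only predictions the model is reasonably confident about.
confident = [p for p in predictions if p[2] >= MIN_SCORE]
dropped = [p for p in predictions if p[2] < MIN_SCORE]

print("kept:   ", [text for text, _, _ in confident])
print("dropped:", [text for text, _, _ in dropped])
```

Note that the threshold removes the low-scoring `ORG` label for "Inquisitor" but keeps the incorrect `LOC` label for "Hemalurgy", so a score filter cannot catch predictions that are confident and wrong.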


## Multi-Tagging
Sometimes you want to predict several types of annotation at once, such as NER and part-of-speech (POS) tags. For this, you can use a `MultiTagger` object:

In [12]:
#from flair.models import MultiTagger

# load tagger for POS and NER
tagger = MultiTagger.load(['pos', 'ner'])
tagger

2021-08-01 21:41:12,024 loading file /home/statisticallyfit/.flair/models/en-pos-ontonotes-v0.5.pt


2021-08-01 21:41:15,589 loading file /home/statisticallyfit/.flair/models/en-ner-conll03-v0.4.pt


<flair.models.sequence_tagger_model.MultiTagger at 0x7f81548b8fd0>

In [13]:
# Predict sentence with both models
tagger.predict(mistbornSentence)
print(mistbornSentence)

Sentence: "In Hemalurgy , the type of metal used in a spike is important , as is the positioning of that spike on the body . For instance , steel spikes take physical Allomantic powers — the ability to burn pewter , tin , steel , or iron — and bestow them upon the person receiving the spike . Which of these four is granted , however , depends on where the spike is placed . Spikes made from other metals steal Feruchemical abilities . For example , all of the original Inquisitors were given a pewter spike , which — after first being pounded through the body of a Feruchemist — gave the Inquisitor the ability to store up healing power . ( Though they could n't do so as quickly as a real Feruchemist , as per the law of Hemalurgic decay . ) This , obviously , is where the Inquisitors got their infamous ability to recover from wounds quickly , and was also why they needed to rest so much ."   [− Tokens: 174  − Token-Labels: "In <IN> Hemalurgy <S-LOC/NNP> , <,> the <DT> type <NN> of <IN> metal

## [List of Pre-Trained Sequence Tagger Models](https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_2_TAGGING.md#list-of-pre-trained-sequence-tagger-models)
You can choose which pre-trained model to load by passing the appropriate string to the `load()` method of the `SequenceTagger` class.

**Example: Chunking**

[Meaning of the chunk tags](https://huggingface.co/flair/chunk-english-fast?text=The+happy+man+has+been+eating+at+the+diner)

In [14]:
chunkTagger: SequenceTagger = SequenceTagger.load("chunk-fast")
chunkTagger

2021-08-01 21:41:44,020 loading file /home/statisticallyfit/.flair/models/en-chunk-conll2000-fast-v0.4.pt


SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.25, inplace=False)
        (encoder): Embedding(275, 100)
        (rnn): LSTM(100, 1024)
        (decoder): Linear(in_features=1024, out_features=275, bias=True)
      )
    )
    (list_embedding_1): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.25, inplace=False)
        (encoder): Embedding(275, 100)
        (rnn): LSTM(100, 1024)
        (decoder): Linear(in_features=1024, out_features=275, bias=True)
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (embedding2nn): Linear(in_features=2048, out_features=2048, bias=True)
  (rnn): LSTM(2048, 256, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=512, out_features=45, bias=True)
)

In [15]:
chunkTagger.predict(folkAirSentence)
print(folkAirSentence)



Sentence: "When she does n't answer , I fugre there 's no more point in conversation . I steer her toward the kitchens . We 'll have to pass by guards ; there 's no other way out . She has pasted on a horrible rictus of a smile , but at least she has enough self-possession for that . More worrying is the way she ca n't stop staring at things . As we walkt toward the guards , the intensity of her gaze is impossible to disguise . I improvise , trying to sound as though I am reciting a memorized message , without inflection in the words . ' Prince Cardan says we are to attend him . ' One of the guards turns to the other . ' Balekin wo n't like that . ' I try not to react , but it 's hard . I just stand there and wait . If they lunge at us , I am going to have to kill them . ' Very well ,' the first guard says . ' Go . But inform Cardan that his brother demands he brings both of you back this time . '"   [− Tokens: 200  − Token-Labels: "When <S-ADVP> she <S-NP> does <B-VP> n't <I-VP> answe

## Experimental: Semantic Frame Detection

For English, Flair provides a pre-trained model that detects semantic frames in text using Propbank 3.0 frames.

This amounts to a form of word sense disambiguation for frame-evoking words.

### Example 1: George and Hat

In [16]:
# Load model
semanticFrameTagger = SequenceTagger.load('frame')

2021-08-01 21:41:54,480 loading file /home/statisticallyfit/.flair/models/en-frame-ontonotes-v0.4.pt


In [17]:
semanticFrameTagger

SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): BytePairEmbeddings(model=bpe-en-100000-50)
    (list_embedding_1): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
        (decoder): Linear(in_features=2048, out_features=300, bias=True)
      )
    )
    (list_embedding_2): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
        (decoder): Linear(in_features=2048, out_features=300, bias=True)
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (embedding2nn): Linear(in_features=4196, out_features=4196, bias=True)
  (rnn): LSTM(4196, 256, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=512, out_features=5196, bias=True)
)

In [18]:
# Make English sentence
s1 = Sentence("George returned to Berlin to return his hat.")
s2 = Sentence("He had a look at different hats.")


# Predict semantic frame tags
semanticFrameTagger.predict(s1)
semanticFrameTagger.predict(s2)

# Print sentence with predicted tags
print(s1.to_tagged_string())
print(s2.to_tagged_string())

George returned <return.01> to Berlin to return <return.02> his hat .
He had <have.03> a look <look.01> at different hats .


### Example 2: Drive

In [19]:
# Load the model
semanticFrameTagger = SequenceTagger.load("frame")
semanticFrameTagger

2021-08-01 21:42:00,148 loading file /home/statisticallyfit/.flair/models/en-frame-ontonotes-v0.4.pt


SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): BytePairEmbeddings(model=bpe-en-100000-50)
    (list_embedding_1): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
        (decoder): Linear(in_features=2048, out_features=300, bias=True)
      )
    )
    (list_embedding_2): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
        (decoder): Linear(in_features=2048, out_features=300, bias=True)
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (embedding2nn): Linear(in_features=4196, out_features=4196, bias=True)
  (rnn): LSTM(4196, 256, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=512, out_features=5196, bias=True)
)

In [20]:
# Make English sentence
#kiwiSentence = Sentence("The girl sliced open the furry brown kiwi to reveal a juicy green interior, while the kiwi sang merrily on the branch in the tropical forest where kiwi hung from branches.")


# checkSentence = Sentence("Sally left the porcelain on the table before she left to visit her grandmother.")

#sentence = Sentence("While Marie drove the cart, she admired the driving ambition of the steady, plodding horses pulling her through the country as the pull of ocean waves tugged her eyes to the store.")

sentence = Sentence("Marie drove the cart through the countryside, admiring the drive of the plodding horses pulling it. She drove a nail through the tarp to protect the lumber from rain. The farmer drove away gophers from his crop. The mayor drove people into poverty with the new tax rules. The storms and tides drove the boats toward shore. ")

#checkSentence = Sentence("The girl checked her arrow before letting it fly, and after watching it whip smoothly into the target, she checked her watch for the time of day, and then remembering an urgent appointment, she hurriedly checked to make sure her equipment was packed away before cashing a check in the bank and leaving to see her friend. ")

# Predict semantic frame tags
semanticFrameTagger.predict(sentence)

# Print sentence with predicted tags
print(sentence.to_tagged_string())

Marie drove <drive.01> the cart through the countryside , admiring <admire.01> the drive <drive.01> of the plodding horses pulling <pull.01> it . She drove <drive.01> a nail through the tarp to protect <protect.01> the lumber from rain . The farmer drove <drive.01> away gophers from his crop . The mayor drove <drive.02> people into poverty with the new tax rules <rule.01> . The storms and tides drove <drive.01> the boats toward shore .
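To see at a glance which senses were assigned to each verb, the tagged string can be parsed with a regular expression. This is a minimal sketch that assumes the `token <lemma.NN>` layout shown in the output above; the input is an excerpt copied from that output, and the variable names are illustrative.

```python
import re
from collections import defaultdict

# Excerpt copied from the tagged output above.
tagged = (
    "The farmer drove <drive.01> away gophers from his crop . "
    "The mayor drove <drive.02> people into poverty with the new tax rules <rule.01> ."
)

# Match "token <lemma.sense>" pairs, e.g. "drove <drive.01>".
pattern = re.compile(r"(\S+) <([\w-]+)\.(\d+)>")

senses = defaultdict(set)  # lemma -> set of predicted sense ids
for token, lemma, sense in pattern.findall(tagged):
    senses[lemma].add(sense)

for lemma, ids in sorted(senses.items()):
    print(lemma, sorted(ids))
```

A lemma that maps to more than one sense id, like `drive` here, is a case where the model distinguished between usages.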


### Example 3: Firing

In [21]:
sentence = Sentence("The general fired four gunshot rounds, while the second general fired the lieutenants.Curiosity sparked my imagination. The flame sparked the bonfire that ravaged the forest.")

semanticFrameTagger.predict(sentence)

print(sentence.to_tagged_string())

The general fired <fire.01> four gunshot rounds , while the second general fired <fire.01> the lieutenants.Curiosity sparked <spark.01> my imagination <imagine.01> . The flame sparked <spark.01> the bonfire that ravaged <destroy.01> the forest .


### Example 4: Absorb
sentence = Sentence("The villagers were absorbed in their own affairs so did not notice how the fortifications were absorbing the floodwaters.")

semanticFrameTagger.predict(sentence)

print(sentence.to_tagged_string())

## Tagging a List of Sentences
Often, you want to tag an entire text corpus. In that case, split the corpus into sentences and pass a list of `Sentence` objects to the `.predict()` method.

For instance, you can use the segtok sentence splitter:

In [22]:
#from flair.models import SequenceTagger
#from flair.tokenization import SegtokSentenceSplitter

# Example 5: Text ("Fell") with many sentences

In [23]:
text: str = "The rock fell through the air. The responsibility fell on his shoulders to protect the herd from the thunderstorm. Multiple animals fell into order to evade lightning strikes."

# initialize sentence splitter
splitter = SegtokSentenceSplitter()

# Use splitter to split text into multiple (list) of sentences
sentences = splitter.split(text)

# Predict tags for sentences
tagger = SequenceTagger.load("frame")
tagger.predict(sentences)

# Iterate through sentences and print predicted labels
for sentence in sentences: 
    print(sentence.to_tagged_string())

# TODO HELP: why doesn't this Frame model differentiate the different senses of the word "fell"?
# 1) "fell" as in object falling through the air
# 2) "fell" as in an intangible weight being laid on someone.
# 3) "fell" as in organize themselves in line

2021-08-01 21:42:11,569 loading file /home/statisticallyfit/.flair/models/en-frame-ontonotes-v0.4.pt


The rock fell <fall.01> through the air .
The responsibility fell <fall.01> on his shoulders to protect <protect.01> the herd from the thunderstorm .
Multiple animals fell <fall.01> into order to evade <deter.01> lightning strikes <strike.01> .
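To make the splitting step concrete, here is a deliberately naive stand-in that splits on whitespace following sentence-final punctuation. It is not segtok's algorithm and will mishandle abbreviations, quotations and similar cases; the function name `naive_split` is hypothetical. It only illustrates that the splitter turns one string into a list of sentence strings.

```python
import re
from typing import List


def naive_split(text: str) -> List[str]:
    """Split text after '.', '!' or '?' followed by whitespace (illustration only)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


text = (
    "The rock fell through the air. The responsibility fell on his shoulders "
    "to protect the herd from the thunderstorm. Multiple animals fell into "
    "order to evade lightning strikes."
)

sentences = naive_split(text)
for index, sentence in enumerate(sentences, start=1):
    print(index, sentence)
```

For real corpora, a dedicated splitter such as the `SegtokSentenceSplitter` used above is the safer choice.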


## Tagging with Pre-Trained Text Classification Models
This section uses a pre-trained model for detecting positive or negative comments. The model was trained on a mix of product and movie review datasets and can recognize positive and negative sentiment in English text.

In [24]:
from flair.models import TextClassifier

# Load the sentiment classifier
classifier = TextClassifier.load("sentiment")
classifier

2021-08-01 21:42:19,183 loading file /home/statisticallyfit/.flair/models/sentiment-en-mix-distillbert.pt


TextClassifier(
  (document_embeddings): TransformerDocumentEmbeddings(
    (model): DistilBertModel(
      (embeddings): Embeddings(
        (word_embeddings): Embedding(30522, 768, padding_idx=0)
        (position_embeddings): Embedding(512, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (transformer): Transformer(
        (layer): ModuleList(
          (0): TransformerBlock(
            (attention): MultiHeadSelfAttention(
              (dropout): Dropout(p=0.1, inplace=False)
              (q_lin): Linear(in_features=768, out_features=768, bias=True)
              (k_lin): Linear(in_features=768, out_features=768, bias=True)
              (v_lin): Linear(in_features=768, out_features=768, bias=True)
              (out_lin): Linear(in_features=768, out_features=768, bias=True)
            )
            (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (ffn

All that is required is to call the `predict()` method of the classifier on a sentence. This adds the predicted label to the sentence.

In [25]:
# Predict for example sentence
sentence = Sentence("enormously entertaining for moviegoers of any age")
classifier.predict(sentence)

# check prediction
print(sentence)

Sentence: "enormously entertaining for moviegoers of any age"   [− Tokens: 7  − Sentence-Labels: {'label': [POSITIVE (0.9979)]}]


In [26]:
sentence = Sentence("A real critical thinker of this day and age; offers deeper insights than any other film about the sad, horrifying state of humanity today")

classifier.predict(sentence)

print(sentence)

Sentence: "A real critical thinker of this day and age ; offers deeper insights than any other film about the sad , horrifying state of humanity today"   [− Tokens: 26  − Sentence-Labels: {'label': [POSITIVE (0.9999)]}]


## Communicative Functions Text Classification

In [27]:
# TODO doesn't work now
#functionsClassifier = TextClassifier.load('communicative-functions')
#functionsClassifier