Flair embeddings are a special type of contextual string embedding that models words as sequences of characters (Flair's secret sauce)
+ the reason for Flair's excellent sequence-tagging performance
+ the original motivation for introducing the Flair NLP framework; see the paper 'Contextual String Embeddings for Sequence Labeling'
+ two main properties: contextuality and character-level sequence modeling

+ contextuality: a word embedding is defined not only by the word's syntactic-semantic meaning but also by the context it appears in.
+ each pre-trained Flair model offers a <strong>forward</strong> version and a <strong>backward</strong> version
    + the <strong>forward</strong> version takes into account the context that precedes the word
    + the <strong>backward</strong> version takes into account the context that follows the word

In [None]:
#already installed on my local machine
#%pip install flair

In [2]:
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

embedding = FlairEmbeddings('news-forward')
s1 = Sentence("nice shirt")
s2 = Sentence("nice pants")

embedding.embed(s1)
embedding.embed(s2)

print(s1[0].embedding.tolist() == s2[0].embedding.tolist()) #will be true, since there are no contextual words in front of 'nice'

True


In [3]:
s1 = Sentence('very nice shirt')
s2 = Sentence('pretty nice pants')

embedding.embed(s1)
embedding.embed(s2)

print(s1[1].embedding.tolist() == s2[1].embedding.tolist()) # will be false, since there are different contextual words in front of 'nice'

False
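To see why the first comparison prints True and the second prints False, here is a toy, untrained character-level recurrent encoder in plain Python. It is not Flair's language model: the weights are random and the hidden size `DIM` and all function names are hypothetical. It only illustrates the information flow. The forward pass reads characters left to right and takes each word's state after its last character; the backward pass reads right to left and takes the state at the word's first character. This is roughly how Flair extracts word embeddings from its forward and backward character language models.

```python
import math
import random

DIM = 8  # hidden size of the toy encoder (arbitrary, hypothetical choice)


def random_matrix(seed):
    # fixed seed -> reproducible, untrained recurrent weights
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]


W_FORWARD = random_matrix(1)   # separate weights per direction, mirroring
W_BACKWARD = random_matrix(2)  # Flair's separate forward/backward models


def char_vector(c):
    # deterministic random input vector for each character
    rng = random.Random(ord(c))
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def step(weights, h, c):
    # one recurrent update: h' = tanh(W h + x_c)
    x = char_vector(c)
    return [
        math.tanh(sum(weights[i][j] * h[j] for j in range(DIM)) + x[i])
        for i in range(DIM)
    ]


def word_spans(text):
    # (start, end) character offsets of each space-separated word
    spans, start = [], 0
    for word in text.split(" "):
        spans.append((start, start + len(word)))
        start += len(word) + 1
    return spans


def forward_embeddings(text):
    h, states = [0.0] * DIM, []
    for c in text:  # left to right
        h = step(W_FORWARD, h, c)
        states.append(h)
    # state after a word's last character: depends on the word and everything before it
    return [states[end - 1] for _, end in word_spans(text)]


def backward_embeddings(text):
    h, states = [0.0] * DIM, [None] * len(text)
    for pos in range(len(text) - 1, -1, -1):  # right to left
        h = step(W_BACKWARD, h, text[pos])
        states[pos] = h
    # state at a word's first character: depends on the word and everything after it
    return [states[start] for start, _ in word_spans(text)]


# True: identical characters up to the end of 'nice'
print(forward_embeddings("nice shirt")[0] == forward_embeddings("nice pants")[0])
# False: the backward pass has read 'shirt' vs. 'pants' before reaching 'nice'
print(backward_embeddings("nice shirt")[0] == backward_embeddings("nice pants")[0])
# False: different words precede 'nice'
print(forward_embeddings("very nice shirt")[1] == forward_embeddings("pretty nice pants")[1])
```

Flair's real models are trained LSTMs, but the dependency pattern is the same. This is also why a forward and a backward model are commonly combined (for example with `StackedEmbeddings`), so each word is informed by context on both sides.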


Character-level sequence modeling in Flair embeddings
+ OOV = out of vocabulary
+ some word embedding techniques, such as GloVe, have a fixed vocabulary and offer no support for OOV words; a vector of zeros is returned instead.
+ character-level modeling gives good performance on words that were not seen during training or are misspelled
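Flair handles OOV words through its character-level language model. The sketch below swaps in a much simpler character-level representation, a bag of character trigrams (similar in spirit to fastText subwords, and not what Flair uses), purely to contrast it with a fixed word-level lookup. The vocabulary and vector values in `WORD_VECTORS` are made up for illustration.

```python
import math
from collections import Counter

# Toy static (word-level) lookup table with a fixed vocabulary, like GloVe's.
# The vectors are made-up values for illustration only.
WORD_VECTORS = {
    "eating": [0.2, 0.7, 0.1],
    "potato": [0.9, 0.1, 0.4],
}


def word_lookup(word):
    # OOV words get a zero vector, as described above
    return WORD_VECTORS.get(word, [0.0] * 3)


def char_trigrams(word):
    # pad with boundary markers so prefixes and suffixes become distinct features
    padded = f"<{word}>"
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    # cosine similarity of two sparse count vectors stored as Counters
    dot = sum(a[k] * b[k] for k in a)  # missing keys count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


print(word_lookup("potatoo"))  # [0.0, 0.0, 0.0]: the misspelling is OOV
print(round(cosine(char_trigrams("potato"), char_trigrams("potatoo")), 2))  # 0.77
```

'potato' and 'potatoo' share 5 of their 6 and 7 trigrams, so the character-level view still sees them as close, while the word-level lookup has nothing to offer for the misspelling.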

In [9]:
from sklearn.metrics.pairwise import cosine_similarity as sim
from flair.embeddings import FlairEmbeddings
from flair.data import Sentence

s1 = Sentence('eating potato')
s2 = Sentence('eating potatoo')

embedding = FlairEmbeddings('news-forward')
embedding.embed(s1)
embedding.embed(s2)
e1 = s1[1].embedding.tolist()
e2 = s2[1].embedding.tolist()

print(sim([e1],[e2]))


[[0.86473531]]


A cosine similarity close to 1 (here 0.86) indicates a strong similarity between the two word embeddings, even though 'potatoo' is misspelled.

# Cosine similarity, reminder:

similarity(A,B) = $\frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$

The result is a number between -1 and 1:
+ 1 = both vectors point in the same direction (maximal similarity)
+ 0 = the vectors are orthogonal (no similarity)
+ -1 = both vectors point in exactly opposite directions
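The formula translates directly into code. Below is a minimal pure-Python sketch (`cosine_sim` is a hypothetical helper name, not scikit-learn's function) that checks the three reference cases listed above.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity (A . B) / (||A|| * ||B||) for equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # undefined for a zero vector, such as a zero-filled OOV embedding
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_sim([1, 2, 3], [2, 4, 6]))     # same direction: approximately 1
print(cosine_sim([1, 0], [0, 1]))           # orthogonal: 0.0
print(cosine_sim([1, 2, 3], [-1, -2, -3]))  # opposite directions: approximately -1
```

Floating-point rounding can make the first and last results differ from exactly 1 and -1 in the last digits. The zero-vector check also shows why a zero vector for an OOV word is unhelpful: it cannot be compared with anything by cosine similarity.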

# Pooled Flair embeddings
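+ same syntax as Flair embeddings, e.g. `PooledFlairEmbeddings('news-forward')`
+ often better performance, but higher memory usage, since a memory of embeddings is kept for every distinct word seen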
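Pooled Flair embeddings keep a memory of the embeddings computed for each distinct word and pool them across occurrences. The class below is a hypothetical toy (`ToyPooledEmbeddings` is not part of Flair) sketching that idea under two assumptions: the pooled vector is concatenated with the current contextual vector, as in the pooled-embeddings paper, and pooling is element-wise, with 'min' assumed here to match Flair's default. The contextual vectors are passed in by hand, since no language model is involved.

```python
class ToyPooledEmbeddings:
    """Hypothetical toy, not Flair's PooledFlairEmbeddings.

    Keeps one running aggregate per distinct word and returns the current
    contextual vector concatenated with the pooled vector.
    """

    def __init__(self, pooling="min"):
        if pooling not in ("min", "max", "mean"):
            raise ValueError(f"unsupported pooling: {pooling}")
        self.pooling = pooling
        self.memory = {}  # word -> (aggregate vector, number of occurrences)

    def embed(self, word, contextual_vector):
        vec = [float(x) for x in contextual_vector]
        if word not in self.memory:
            agg, count = vec, 1
        else:
            agg, count = self.memory[word]
            if self.pooling == "min":
                agg = [min(a, v) for a, v in zip(agg, vec)]
            elif self.pooling == "max":
                agg = [max(a, v) for a, v in zip(agg, vec)]
            else:  # "mean": keep a running sum, divide when reading it out
                agg = [a + v for a, v in zip(agg, vec)]
            count += 1
        self.memory[word] = (agg, count)
        pooled = [a / count for a in agg] if self.pooling == "mean" else list(agg)
        # the output is twice as long as the contextual vector
        return vec + pooled


pooled = ToyPooledEmbeddings(pooling="min")
# made-up contextual vectors for two occurrences of the same word
print(pooled.embed("Paris", [1.0, 5.0]))  # [1.0, 5.0, 1.0, 5.0]
print(pooled.embed("Paris", [3.0, 2.0]))  # [3.0, 2.0, 1.0, 2.0]
```

In this sketch only one aggregate is stored per distinct word, so the memory grows with the vocabulary seen rather than with the number of occurrences. That per-word memory is the source of the extra memory usage mentioned above.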

# List of Flair embeddings
https://github.com/flairNLP/flair/blob/master/resources/docs/embeddings/FLAIR_EMBEDDINGS.md