# Tutorial

This tutorial takes you through the Flair library. 

## NLP base types

The `Sentence` object is central to our library. It holds a sentence made up of `Token` objects, and various layers of linguistic annotation can be added to it. It is also the object you pass to the embedding classes when you embed your text.

Let's illustrate this with an example sentence.

In [3]:
# The Sentence object holds a sentence that we may want to embed
from flair.data import Sentence

# Make a Sentence object by passing a whitespace-tokenized string
sentence = Sentence('The grass is green .')

# Print the object to see what's in there
print(sentence)

Sentence: "The grass is green ." - 5 Tokens


Each word in a sentence is a `Token` object. You can access a token directly by its id; ids start at 1, so `sentence[4]` is the fourth token. Each token has attributes, such as an id and a text.

In [10]:
print(sentence[4])
print(sentence[4].text)
print(sentence[4].idx)  # attribute names `text` and `idx` are assumed here

Token: 4 green
green
4


You can also iterate over all tokens in a sentence.

In [11]:
for token in sentence:
    print(token) 

Token: 1 The
Token: 2 grass
Token: 3 is
Token: 4 green
Token: 5 .


Tokens can also have tags, such as a named entity tag. In this example, we're adding an NER tag of type 'color' to 
the word 'green' in the example sentence.


In [12]:
# add a tag to a word in the sentence
sentence[4].add_tag('ner', 'color')

# print the sentence with all tags of this type
print(sentence.to_ner_string())

The grass is green <color> .
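
To make the data model above concrete, here is a minimal, dependency-free sketch of how a sentence of tokens with 1-based ids and tags could be represented. The classes `ToyToken` and `ToySentence` and the method `to_tagged_string()` are hypothetical stand-ins written for this illustration. They are not Flair's implementation and only mimic the behavior visible in the outputs above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class ToyToken:
    """Hypothetical stand-in for a token: a 1-based id, a text, and tags."""
    idx: int
    text: str
    tags: Dict[str, str] = field(default_factory=dict)

    def add_tag(self, tag_type: str, tag_value: str) -> None:
        # Store one value per tag type, e.g. 'ner' -> 'color'.
        self.tags[tag_type] = tag_value

    def __str__(self) -> str:
        return f"Token: {self.idx} {self.text}"


class ToySentence:
    """Hypothetical stand-in for a sentence built from a whitespace-tokenized string."""

    def __init__(self, text: str) -> None:
        # Split on whitespace and number the tokens starting at 1.
        self.tokens: List[ToyToken] = [
            ToyToken(idx=i, text=word)
            for i, word in enumerate(text.split(), start=1)
        ]

    def __getitem__(self, token_id: int) -> ToyToken:
        # Token ids start at 1, so id 4 is stored at list position 3.
        if not 1 <= token_id <= len(self.tokens):
            raise IndexError(f"no token with id {token_id}")
        return self.tokens[token_id - 1]

    def __iter__(self) -> Iterator[ToyToken]:
        return iter(self.tokens)

    def to_tagged_string(self, tag_type: str) -> str:
        # Append "<value>" after every token that carries a tag of this type.
        parts: List[str] = []
        for token in self.tokens:
            parts.append(token.text)
            if tag_type in token.tags:
                parts.append(f"<{token.tags[tag_type]}>")
        return " ".join(parts)


toy = ToySentence("The grass is green .")
toy[4].add_tag("ner", "color")
print(toy[4])                       # Token: 4 green
print(toy.to_tagged_string("ner"))  # The grass is green <color> .
```

The sketch keeps the id-to-position conversion in one place (`__getitem__`), which is one simple way to get the 1-based access shown in the tutorial output.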


## Tagging with Pre-Trained Models

Now, let's use a pre-trained model for named entity recognition (NER). 
This model was trained on the English CoNLL-03 task and can recognize 4 different entity
types.



You choose which pre-trained model to load by passing the appropriate 
string to the `load()` method of the `SequenceTaggerLSTM` class, for example `SequenceTaggerLSTM.load('ner')`. Currently, the following pre-trained models
are provided (more coming): 
 
'ner': English NER

## Embeddings

We provide a set of classes with which you can embed the words in sentences in various ways. Note that all embedding 
classes inherit from the `TextEmbeddings` class and implement the `embed()` method which you need to call 
to embed your text. This means that for most users of Flair, the complexity of different embeddings remains hidden 
behind this interface. Simply instantiate the embedding class you require and call `embed()` to embed your text.

All embeddings produced with our methods are PyTorch vectors, so they can be used immediately for training and 
fine-tuning.
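
The idea of one shared `embed()` entry point can be sketched in a few lines of plain Python. This is only an illustration of the interface pattern described above, not Flair's code: the names `ToyEmbeddings`, `LengthEmbeddings`, `CaseEmbeddings`, and `embed_word()` are hypothetical, and plain lists of floats stand in for PyTorch vectors to keep the example dependency-free.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class ToyEmbeddings(ABC):
    """Hypothetical base class: every embedding type exposes the same embed() call."""

    @abstractmethod
    def embed_word(self, word: str) -> List[float]:
        """Return the vector for a single word."""

    def embed(self, words: List[str]) -> List[Tuple[str, List[float]]]:
        # Shared entry point: callers never need to know which subclass they hold.
        return [(word, self.embed_word(word)) for word in words]


class LengthEmbeddings(ToyEmbeddings):
    """Toy embedding: a 1-dimensional vector holding the word length."""

    def embed_word(self, word: str) -> List[float]:
        return [float(len(word))]


class CaseEmbeddings(ToyEmbeddings):
    """Toy embedding: [starts with uppercase letter, is purely alphabetic]."""

    def embed_word(self, word: str) -> List[float]:
        return [
            1.0 if word[:1].isupper() else 0.0,
            1.0 if word.isalpha() else 0.0,
        ]


words = "The grass is green .".split()
for embedder in (LengthEmbeddings(), CaseEmbeddings()):
    # The calling code is identical for both embedding types.
    for word, vector in embedder.embed(words):
        print(type(embedder).__name__, word, vector)
```

Swapping one embedding type for another changes only the line that instantiates the class, which is the point of hiding the differences behind a single method.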

### Classic Word Embeddings

Classic word embeddings are static and word-level, meaning that each distinct word gets exactly one pre-computed 
embedding, regardless of the context it appears in. Most embeddings fall under this class, including the popular GloVe and Komninos embeddings. 
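
What "static" means can be shown with a small lookup table. The sketch below is a toy under stated assumptions: the 3-dimensional vectors in `TOY_VECTORS` are made up, and the lowercasing and the zero vector for unknown words are choices made for this example, not a description of how any particular pre-trained embedding handles them.

```python
from typing import Dict, List

# Made-up 3-dimensional vectors; real pre-trained tables are far larger.
TOY_VECTORS: Dict[str, List[float]] = {
    "the": [0.1, 0.3, 0.5],
    "grass": [0.7, 0.2, 0.1],
    "is": [0.0, 0.4, 0.4],
    "green": [0.6, 0.9, 0.2],
}

# Assumption for this sketch: words missing from the table map to zeros.
UNKNOWN: List[float] = [0.0, 0.0, 0.0]


def embed_static(words: List[str]) -> List[List[float]]:
    """Look up each word on its own; neighboring words play no role."""
    return [TOY_VECTORS.get(word.lower(), UNKNOWN) for word in words]


first = embed_static("The grass is green .".split())
second = embed_static("green is the grass".split())

# 'green' receives the same vector in both sentences.
print(first[3] == second[0])  # True
```

Because the lookup ignores context, the vector for a word is identical wherever the word occurs.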

Simply instantiate the `WordEmbeddings` class and pass a string identifier of the embedding you wish to load. So, if 
you want to use GloVe embeddings, pass the string 'glove' to the constructor, as in `WordEmbeddings('glove')`, and then call `embed()` on your sentence.




Printing the tokens of an embedded sentence shows each token together with its embedding. GloVe embeddings are PyTorch vectors of dimensionality 100.

You choose which pre-trained embeddings to load by passing the appropriate 
string to the constructor of the `WordEmbeddings` class. Currently, the following static embeddings
are provided (more coming): 

 'glove'     : GloVe embeddings 

 'extvec'    : Komninos embeddings 

 'ft-crawl'  : FastText embeddings 

 'ft-german' : German FastText embeddings 

So, if you want to load German FastText embeddings, instantiate the class as follows:




In [None]:
from flair.embeddings import WordEmbeddings
german_embedding = WordEmbeddings('ft-german')
