In [None]:
!pip install flair

## Contextual Word Vectors with BERT and Stacking Embeddings
We will later use BERT, a state-of-the-art transformer model that was pretrained on a very large corpus and can be fine-tuned for a custom task. Flair can also derive contextual word embeddings from BERT and its successors. For fine-tuning, where the weights of the BERT model itself are adapted to our task, we will use a different package and provide sample code for it.

Below, you can try BERT, RoBERTa, XLNet, or other models provided in Flair for contextual word embeddings.

In [None]:
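# Recent Flair releases replace the older BertEmbeddings class with
# TransformerWordEmbeddings, which loads transformer models by name.
from flair.embeddings import TransformerWordEmbeddings

# init BERT
bert_embedding = TransformerWordEmbeddings('bert-base-uncased')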

## Stack Embeddings in Flair
Flair also provides a simple way to stack embeddings: the vectors that different embedding methods produce for a token are concatenated into a single, longer vector.
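To make the idea concrete before using Flair itself, here is a minimal sketch of stacking as plain concatenation. The vectors, their dimensions, and the `stack` helper are made up for illustration and are not part of the Flair API.

```python
# Toy per-token vectors from three hypothetical embedding models.
# Values and dimensions are invented for illustration only.
forward_vec = [0.1, 0.2]            # e.g. from a forward language model
backward_vec = [0.3, 0.4]           # e.g. from a backward language model
transformer_vec = [0.5, 0.6, 0.7]   # e.g. from a transformer model


def stack(*vectors):
    """Concatenate several vectors for the same token into one longer vector."""
    stacked = []
    for vec in vectors:
        stacked.extend(vec)
    return stacked


stacked_vec = stack(forward_vec, backward_vec, transformer_vec)
print(stacked_vec)
print(len(stacked_vec))  # equals the sum of the individual vector lengths
```

The stacked vector keeps every component of each input, so its length grows with every embedding you add. This is worth keeping in mind when stacking several large models.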

In [None]:
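# Recent Flair releases replace the older BertEmbeddings class with
# TransformerWordEmbeddings, which loads transformer models by name.
from flair.embeddings import FlairEmbeddings, TransformerWordEmbeddings

# init Flair embeddings (forward and backward character language models)
flair_forward_embedding = FlairEmbeddings('multi-forward')
flair_backward_embedding = FlairEmbeddings('multi-backward')

# init BERT
bert_embedding = TransformerWordEmbeddings('bert-base-uncased')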

In [None]:
# Sentence lives in flair.data, not flair.embeddings
from flair.data import Sentence
from flair.embeddings import StackedEmbeddings

# now create the StackedEmbeddings object that combines all embeddings
stacked_embeddings = StackedEmbeddings(
    embeddings=[flair_forward_embedding, flair_backward_embedding, bert_embedding])

In [None]:
sentence = Sentence('The grass is green .')

# just embed a sentence using the StackedEmbedding as you would with any single embedding.
stacked_embeddings.embed(sentence)

# now check out the embedded tokens.
for token in sentence:
    print(token)
    print(token.embedding)

**Try it yourself:** Train a classifier using stacked embeddings from different models. Does performance improve compared with using a single embedding?
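A classifier needs one vector per document rather than one per token, so the stacked token vectors have to be combined first. One simple option is to average them. The sketch below shows this with toy values; `mean_pool` is a hypothetical helper written for illustration, not a Flair function. Flair's `DocumentPoolEmbeddings` class offers pooling of this kind over token embeddings.

```python
def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one document vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(vec) != dim for vec in token_vectors):
        raise ValueError("all token vectors must have the same length")
    n = len(token_vectors)
    # Average each dimension across all tokens.
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


# Toy stacked vectors for a two-token document (made-up values).
token_vectors = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
]
doc_vector = mean_pool(token_vectors)
print(doc_vector)
```

The resulting document vector has the same length as each token vector, so it can be passed to any standard classifier as a fixed-size feature vector.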