# Testing Other Emoji Embeddings

How to encode a sequence of tokens. 
- Alternative: Use GloVe (or other) vectors and average them.
- Alternative: Use Flair document embeddings https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_5_DOCUMENT_EMBEDDINGS.md
resources/docs/TUTORIAL_5_DOCUMENT_EMBEDDINGS.md
##### Tutorial 5: Document Embeddings

>Document embeddings are different from [word embeddings](/resources/docs/TUTORIAL_3_WORD_EMBEDDING.md) in that they give you one embedding for an entire text, whereas word embeddings give you embeddings for individual words.
Show more
<https://github.com/flairNLP/flair|flairNLP/flair>flairNLP/flair | Added by GitHub

- https://colab.research.google.com/drive/1uNYVIFBTphidKWBZdF_6M2nMDAWuhatD?usp=sharing
- Alternative: Use Spacy document embeddings https://spacy.io/api/doc#vector (edited) 

https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_5_DOCUMENT_EMBEDDINGS.md

<a id='table'></a>
## Table of Contents
<ul>
<li><a href="#loading_emojis">1. Loading Emojipedia DataFrame and some preprocessing</a></li>
<li><a href="#glove_embedding">2. Glove Embedding</a></li>
<li><a href="#zalando_flair">3. Zalando Flair</a></li>
</ul>

In [2]:
import pandas as pd
import numpy as np
import pickle
from scipy import spatial
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

import spacy 
nlp = spacy.load('en_core_web_lg')

### Text used for test 

In [2]:
aladdin = open("aladdin_page.txt").read()
file = nlp(aladdin)

In [3]:
for num, sentence in enumerate(file.sents):
    print(f'{num}: {sentence}')

0: THE STORY OF ALADDIN AND HIS MAGICAL LAMP

1: There once lived, in one of the large and rich cities of China, a tailor, named Mustapha.
2: He was very poor.
3: He could hardly, by his daily labor, maintain himself and his family, which consisted only of his wife and a son.


4: His son, who was called Aladdin, was a very careless and idle fellow.
5: He was disobedient to his father and mother, and would go out early in the morning and stay out all day, playing in the streets and public places with idle children of his own age.


6: When he was old enough to learn a trade, his father took him into his own shop, and taught him how to use his needle; but all his father’s endeavors to keep him to his work were vain, for no sooner was his back turned than he was gone for that day.
7: Mustapha chastised him; but Aladdin was incorrigible, and his father, to his great grief, was forced to abandon him to his idleness, and was so much troubled about him that he fell sick and died in a few mon

<a id='loading_emojis'></a>
## 1. Loading Emojipedia DataFrame
<a href="#table">Back to the top </a>

In [4]:
csv_loc = "emojis_cleaned.csv"
emojis = pd.read_csv(csv_loc)

In [6]:
emojis.shape

(1752, 8)

> We'll put the descriptions in lower cases, remove the stopwords and punctuations

In [9]:
import nltk
from nltk.corpus import stopwords

In [10]:
def some_preprocessing(description):
    # Place to lower case
    prep_descr = description.lower()
    # Tokenize and remove non alphanumeric tokens
    tokens = nltk.word_tokenize(prep_descr)
    token_words = [w for w in tokens if w.isalpha()]
    # Removing stopwords
    stops = set(stopwords.words("english")) 
    without_stopwords = [w for w in token_words if not w in stops]
    return without_stopwords

### b. Saving and Loading the lookup dictionnary with Pickle

In [16]:
filename = 'glove_lookup'

In [18]:
# Loading the Glove Embedding Lookup Table
infile = open(filename,'rb')
glove_lookup = pickle.load(infile)
infile.close()

### c. Average Glove vector for the descriptions

> **Basic Idea:** Getting the Glove Vector representation of each word present in the cleaned description and aggregating those by averaging to 1 Vector. 

In [19]:
## Input: list of preprocessed description (no punctuations, no stopwords, ...)
## Output: Vector of with same dimension as the Glove Embedding used (here 100 words)

## Comment : 
#            Glove Lookup = dictionnary with as keys the words of the corpus and the values the vector 
##           of that word

def avg_glove_vector(descr_list):
    # counting number of vectors found in the lookup table
    n_vectors = 0
    # Getting back the size of the Glove Vectors
    glove_dim = len(glove_lookup['a'])
    # Average Vector
    avg_vector = np.zeros(glove_dim)
    
    # Going over each word in the input_list
    for word in descr_list:
        if word in glove_lookup.keys():
            n_vectors += 1
            avg_vector += glove_lookup[word]
        else:
            continue
    
    if n_vectors == 0:
        return ""
    else:
        return avg_vector / n_vectors

#### Check if some some empty vectors

### d. Vector Visualization with  t-SNE 

#### 2D

> After initializing the t-SNE class, we need to get a list of every emoji, and the corresponding vector to that emoji.

In [27]:
emoji_symbols =  list(emojis.emoji_symbol)
emoji_names =  list(emojis.emoji_name)
vectors = list(emojis.avg_glove_embedding)

In [28]:
def building_tsne_df(emoji_symbol_list, emoji_name_list, emb_emoji_vector_list):
    tsne = TSNE(n_components=2, random_state=0)
    Y = tsne.fit_transform(emb_emoji_vector_list)
    tsne_2d_df = pd.DataFrame({'emoji_names': np.asarray(emoji_name_list), \
                               'emoji_symbols': np.asarray(emoji_symbol_list),\
                              'X': Y[:, 0], 'Y': Y[:, 1]})
    
    return tsne_2d_df

In [29]:
import plotly.express as px

def tsne_plot(tsne_2d_df, graph_title):
    fig = px.scatter(tsne_2d_df, x='X', y='Y', text='emoji_names')
    fig.update_traces(textposition='top center')
    fig.update_layout(
        height=1200,
        width=1000,
        title_text=graph_title)
    fig.show()

#### 3D

In [32]:
tsne_3D = TSNE(n_components=3, random_state=0)

In [33]:
Y_3D = tsne_3D.fit_transform(vectors)

In [34]:
tsne_3d_df = pd.DataFrame({'emoji_names': np.asarray(emoji_names), 'emoji_symbols': np.asarray(emoji_symbols),\
                          'X': Y_3D[:, 0], 'Y': Y_3D[:, 1], 'Z':Y_3D[:,2]})

In [35]:
fig = px.scatter_3d(tsne_3d_df, x='X', y='Y', z='Z',
              )
fig.show()

### e. Finding Similar Vectors

> source: https://medium.com/analytics-vidhya/basics-of-using-pre-trained-glove-vectors-in-python-d38905f356db

In [8]:
with open('avg_glove_embedding.pkl', 'rb') as f:
    emoji_symb2emb_dic = pickle.load(f)

In [9]:
emoji_symb2emb_dic

{'😀': array([-0.05930327,  0.35218397,  0.43395171, -0.12258554,  0.06305937,
         0.37891134, -0.2883654 ,  0.09169994, -0.05414372, -0.32154709,
        -0.26650945,  0.02116586,  0.39788769,  0.07167966,  0.31061497,
         0.210968  , -0.17650148,  0.20071566,  0.11206828, -0.10764008,
         0.0686942 ,  0.19454968, -0.10344411, -0.18674517,  0.50585883,
         0.4840676 , -0.13399481, -0.277326  ,  0.10005874, -0.24886603,
         0.21497052,  0.16937448, -0.00785789, -0.10838272,  0.34476989,
         0.05951737, -0.18924697,  0.02182089,  0.20252986, -0.03828259,
         0.02880191, -0.43547434,  0.08455885, -0.14119326, -0.15976694,
         0.11818975,  0.16721946,  0.23364474, -0.01106368, -0.377144  ,
         0.0034518 , -0.19946888,  0.21372275,  0.83450255, -0.056626  ,
        -1.51512717,  0.12700671, -0.01079069,  0.77324663,  0.11053886,
         0.18606523,  0.85109485, -0.29809348,  0.10459246,  0.37532843,
         0.08264265,  0.45009277,  0.10899173,

In [37]:
## Input: Sentence for which, we are looking for the closest Emoji (according to it's description average in 
## Glove's Vector space)
## Output: List of closest emojis in descending order

def find_closest_emoji_emb(sentence):
    # Preprocess the sentence: removing punctuations, stopswords, ...
    preprocessed_list = some_preprocessing(sentence)
    # Take the Avg of the GloVe Embedded Vectors
    embedded_sentence = avg_glove_vector(preprocessed_list)
    closest_emojis = sorted(emoji_symb2emb_dic.keys(), key= lambda emoji_symbol:\
                            spatial.distance.euclidean(emoji_symb2emb_dic[emoji_symbol], embedded_sentence))
    return closest_emojis

In [38]:
def translate_text(spacy_nlp_file):
    for num, sentence in enumerate(spacy_nlp_file.sents):
        print(f'{num}: {sentence}')
        closest_emojis = find_closest_emoji_emb(str(sentence))
        print(closest_emojis[:3])
        print("")

In [39]:
translate_text(file)

0: THE STORY OF ALADDIN AND HIS MAGICAL LAMP

['💎', '🔦', '🕯️']

1: There once lived, in one of the large and rich cities of China, a tailor, named Mustapha.
['🥠', '☪️', '🎍']

2: He was very poor.
['🚸', '🏥', '🚋']

3: He could hardly, by his daily labor, maintain himself and his family, which consisted only of his wife and a son.


['👪', '🙋', '🏠']

4: His son, who was called Aladdin, was a very careless and idle fellow.
['🙉', '🙈', '👹']

5: He was disobedient to his father and mother, and would go out early in the morning and stay out all day, playing in the streets and public places with idle children of his own age.


['💑', '💏', '🚶']

6: When he was old enough to learn a trade, his father took him into his own shop, and taught him how to use his needle; but all his father’s endeavors to keep him to his work were vain, for no sooner was his back turned than he was gone for that day.
['👻', '🤗', '🚶']

7: Mustapha chastised him; but Aladdin was incorrigible, and his father, to his great gri

In [40]:
def translate_by_keywords(spacy_nlp_file):
    for num, sentence in enumerate(spacy_nlp_file.sents):
        print("\033[1m" + f'{num}: {sentence}' + "\033[0m")
        for token in sentence:
            token_pos = token.pos_
            if token_pos == 'PROPN' or token_pos == 'NOUN':
                closest_emojis = find_closest_emoji_emb(str(token))
                print(token, " --- EMOJI --->  ", closest_emojis[0])
        print("")

In [41]:
translate_by_keywords(file)

[1m0: THE STORY OF ALADDIN AND HIS MAGICAL LAMP
[0m
STORY  --- EMOJI --->   📘
ALADDIN  --- EMOJI --->   🦋
LAMP  --- EMOJI --->   🛋️

[1m1: There once lived, in one of the large and rich cities of China, a tailor, named Mustapha.[0m
cities  --- EMOJI --->   🚌
China  --- EMOJI --->   🇹🇼
tailor  --- EMOJI --->   💁
Mustapha  --- EMOJI --->   🍈

[1m2: He was very poor.[0m

[1m3: He could hardly, by his daily labor, maintain himself and his family, which consisted only of his wife and a son.

[0m
labor  --- EMOJI --->   👨‍⚕️
family  --- EMOJI --->   👨‍👩‍👧
wife  --- EMOJI --->   👨‍👩‍👧‍👦
son  --- EMOJI --->   👨‍👩‍👧‍👦

[1m4: His son, who was called Aladdin, was a very careless and idle fellow.[0m
son  --- EMOJI --->   👨‍👩‍👧‍👦
Aladdin  --- EMOJI --->   🦋
fellow  --- EMOJI --->   🎓

[1m5: He was disobedient to his father and mother, and would go out early in the morning and stay out all day, playing in the streets and public places with idle children of his own age.

[0m
father  --- E

provisions  --- EMOJI --->   🩹
utensils  --- EMOJI --->   🍢
neighbors  --- EMOJI --->   🖕

[1m28: She spent the whole day in preparing the supper; and at night, when it was ready, said to her son, “Perhaps the stranger knows not how to find our house; go and bring him, if you meet with him.”
[0m
day  --- EMOJI --->   🙌
supper  --- EMOJI --->   🥐
night  --- EMOJI --->   🌃
son  --- EMOJI --->   👨‍👩‍👧‍👦
stranger  --- EMOJI --->   👺
house  --- EMOJI --->   🏠



<a id='zalando_flair'></a>
## 3. Testing out spacymoji
<a href="#table">Back to the top </a>

## 1. Here we test for baseline comparision the preexisting embedding from Spacymoji

In [42]:
emojis

Unnamed: 0.1,Unnamed: 0,emoji_symbol,emoji_name,emoji_code,emoji_description,preprocessed_description,avg_glove_embedding
0,0,😀,Grinning Face,U+1F600,"A yellow face with simple, open eyes and a bro...","[yellow, face, simple, open, eyes, broad, open...","[-0.05930326535765614, 0.35218397080898284, 0...."
1,1,😃,Grinning Face with Big Eyes,U+1F603,"A yellow face with smiling eyes and a broad, o...","[yellow, face, smiling, eyes, broad, open, smi...","[-0.04583876235410571, 0.34958994686603545, 0...."
2,2,😄,Grinning Face with Smiling Eyes,U+1F604,"A yellow face with smiling eyes and a broad, o...","[yellow, face, smiling, eyes, broad, open, smi...","[-0.07637538687036984, 0.3653278946876526, 0.4..."
3,3,😁,Beaming Face with Smiling Eyes,U+1F601,A yellow face with smiling eyes and full-tooth...,"[yellow, face, smiling, eyes, grin, sayingchee...","[-0.058055011515534716, 0.21979669058782747, 0..."
4,4,😆,Grinning Squinting Face,U+1F606,"A yellow face with a broad, open smile and scr...","[yellow, face, broad, open, smile, scrunched, ...","[-0.1096445924735495, 0.4070707155125482, 0.27..."
...,...,...,...,...,...,...,...
1759,1759,🇿🇼,🇼 Flag: Zimbabwe,"U+1F1FF, U+1F1FC","The flag forZimbabwe, which may show as the le...","[flag, forzimbabwe, may, show, letterszwon, fl...","[-0.14868279459575812, 0.3756362621982892, 0.0..."
1760,1760,🏴󠁧󠁢󠁥󠁮󠁧󠁿,󠁧󠁢󠁥󠁮󠁧󠁿 Flag: England,"U+1F3F4, U+E0067, U+E0062, U+E0065, U+E006E, U...","The flag for England, a country in theUnited K...","[flag, england, country, theunited, kingdom, m...","[-0.1737559937464539, 0.4466676575539168, 0.08..."
1761,1761,🏴󠁧󠁢󠁳󠁣󠁴󠁿,󠁧󠁢󠁳󠁣󠁴󠁿 Flag: Scotland,"U+1F3F4, U+E0067, U+E0062, U+E0073, U+E0063, U...","The flag for Scotland, a country in theUnited ...","[flag, scotland, country, theunited, flag, sco...","[-0.17220392764235537, 0.4779988704559704, 0.0..."
1762,1762,🏴󠁧󠁢󠁷󠁬󠁳󠁿,󠁧󠁢󠁷󠁬󠁳󠁿 Flag: Wales,"U+1F3F4, U+E0067, U+E0062, U+E0077, U+E006C, U...","The flag for Wales, a country in theUnited Kin...","[flag, wales, country, theunited, kingdom, may...","[-0.19631805751123466, 0.4601589393278118, 0.0..."


In [43]:
#things to do -- compare with spacy embeddings
import spacy
from spacymoji import Emoji

nlp_new = spacy.load('en')
emoji = Emoji(nlp_new)
nlp_new.add_pipe(emoji, first=True)

emojis


Unnamed: 0.1,Unnamed: 0,emoji_symbol,emoji_name,emoji_code,emoji_description,preprocessed_description,avg_glove_embedding
0,0,😀,Grinning Face,U+1F600,"A yellow face with simple, open eyes and a bro...","[yellow, face, simple, open, eyes, broad, open...","[-0.05930326535765614, 0.35218397080898284, 0...."
1,1,😃,Grinning Face with Big Eyes,U+1F603,"A yellow face with smiling eyes and a broad, o...","[yellow, face, smiling, eyes, broad, open, smi...","[-0.04583876235410571, 0.34958994686603545, 0...."
2,2,😄,Grinning Face with Smiling Eyes,U+1F604,"A yellow face with smiling eyes and a broad, o...","[yellow, face, smiling, eyes, broad, open, smi...","[-0.07637538687036984, 0.3653278946876526, 0.4..."
3,3,😁,Beaming Face with Smiling Eyes,U+1F601,A yellow face with smiling eyes and full-tooth...,"[yellow, face, smiling, eyes, grin, sayingchee...","[-0.058055011515534716, 0.21979669058782747, 0..."
4,4,😆,Grinning Squinting Face,U+1F606,"A yellow face with a broad, open smile and scr...","[yellow, face, broad, open, smile, scrunched, ...","[-0.1096445924735495, 0.4070707155125482, 0.27..."
...,...,...,...,...,...,...,...
1759,1759,🇿🇼,🇼 Flag: Zimbabwe,"U+1F1FF, U+1F1FC","The flag forZimbabwe, which may show as the le...","[flag, forzimbabwe, may, show, letterszwon, fl...","[-0.14868279459575812, 0.3756362621982892, 0.0..."
1760,1760,🏴󠁧󠁢󠁥󠁮󠁧󠁿,󠁧󠁢󠁥󠁮󠁧󠁿 Flag: England,"U+1F3F4, U+E0067, U+E0062, U+E0065, U+E006E, U...","The flag for England, a country in theUnited K...","[flag, england, country, theunited, kingdom, m...","[-0.1737559937464539, 0.4466676575539168, 0.08..."
1761,1761,🏴󠁧󠁢󠁳󠁣󠁴󠁿,󠁧󠁢󠁳󠁣󠁴󠁿 Flag: Scotland,"U+1F3F4, U+E0067, U+E0062, U+E0073, U+E0063, U...","The flag for Scotland, a country in theUnited ...","[flag, scotland, country, theunited, flag, sco...","[-0.17220392764235537, 0.4779988704559704, 0.0..."
1762,1762,🏴󠁧󠁢󠁷󠁬󠁳󠁿,󠁧󠁢󠁷󠁬󠁳󠁿 Flag: Wales,"U+1F3F4, U+E0067, U+E0062, U+E0077, U+E006C, U...","The flag for Wales, a country in theUnited Kin...","[flag, wales, country, theunited, kingdom, may...","[-0.19631805751123466, 0.4601589393278118, 0.0..."


In [44]:
# there is a problem with converting some to vectors but we will ignore this

n_errors = 0
for emoji in emojis['emoji_symbol']:
    try:
        doc = nlp(emoji)
    except ValueError as e:
        print(e)
        print(emoji)
        n_errors += 1

In [45]:
def extract_vector(emoji):
    try:
        vec = nlp_new(emoji).vector
    except:
        vec = np.zeros(96)
    return vec
        

In [46]:
emojis['spacy_embedding'] = emojis.apply(lambda x :extract_vector(x['emoji_symbol']), axis=1)

emojis

Unnamed: 0.1,Unnamed: 0,emoji_symbol,emoji_name,emoji_code,emoji_description,preprocessed_description,avg_glove_embedding,spacy_embedding
0,0,😀,Grinning Face,U+1F600,"A yellow face with simple, open eyes and a bro...","[yellow, face, simple, open, eyes, broad, open...","[-0.05930326535765614, 0.35218397080898284, 0....","[-1.6294127, 2.101685, 1.8814864, -2.8542278, ..."
1,1,😃,Grinning Face with Big Eyes,U+1F603,"A yellow face with smiling eyes and a broad, o...","[yellow, face, smiling, eyes, broad, open, smi...","[-0.04583876235410571, 0.34958994686603545, 0....","[2.2437549, -1.5684885, -0.40939862, -2.736048..."
2,2,😄,Grinning Face with Smiling Eyes,U+1F604,"A yellow face with smiling eyes and a broad, o...","[yellow, face, smiling, eyes, broad, open, smi...","[-0.07637538687036984, 0.3653278946876526, 0.4...","[1.8132592, -0.45015156, 0.15728128, -0.840531..."
3,3,😁,Beaming Face with Smiling Eyes,U+1F601,A yellow face with smiling eyes and full-tooth...,"[yellow, face, smiling, eyes, grin, sayingchee...","[-0.058055011515534716, 0.21979669058782747, 0...","[1.3395119, -1.1522889, -0.87755823, -1.756431..."
4,4,😆,Grinning Squinting Face,U+1F606,"A yellow face with a broad, open smile and scr...","[yellow, face, broad, open, smile, scrunched, ...","[-0.1096445924735495, 0.4070707155125482, 0.27...","[-0.30889916, -1.2340499, -2.1786036, -2.68890..."
...,...,...,...,...,...,...,...,...
1759,1759,🇿🇼,🇼 Flag: Zimbabwe,"U+1F1FF, U+1F1FC","The flag forZimbabwe, which may show as the le...","[flag, forzimbabwe, may, show, letterszwon, fl...","[-0.14868279459575812, 0.3756362621982892, 0.0...","[0.03218031, -0.84441435, -2.352934, -1.112183..."
1760,1760,🏴󠁧󠁢󠁥󠁮󠁧󠁿,󠁧󠁢󠁥󠁮󠁧󠁿 Flag: England,"U+1F3F4, U+E0067, U+E0062, U+E0065, U+E006E, U...","The flag for England, a country in theUnited K...","[flag, england, country, theunited, kingdom, m...","[-0.1737559937464539, 0.4466676575539168, 0.08...","[-0.4405098, -2.3296275, -1.0480746, -0.386221..."
1761,1761,🏴󠁧󠁢󠁳󠁣󠁴󠁿,󠁧󠁢󠁳󠁣󠁴󠁿 Flag: Scotland,"U+1F3F4, U+E0067, U+E0062, U+E0073, U+E0063, U...","The flag for Scotland, a country in theUnited ...","[flag, scotland, country, theunited, flag, sco...","[-0.17220392764235537, 0.4779988704559704, 0.0...","[1.5263562, -1.8564854, 0.50159216, -0.5016355..."
1762,1762,🏴󠁧󠁢󠁷󠁬󠁳󠁿,󠁧󠁢󠁷󠁬󠁳󠁿 Flag: Wales,"U+1F3F4, U+E0067, U+E0062, U+E0077, U+E006C, U...","The flag for Wales, a country in theUnited Kin...","[flag, wales, country, theunited, kingdom, may...","[-0.19631805751123466, 0.4601589393278118, 0.0...","[0.24251169, 0.65490943, 0.6117422, -0.2369406..."


In [50]:
# Building the emoji to averaged description GloVe Embedding Dictionnary
emoji_symb2emb_dic_spacy = {}
list_emoji_symb = list(emojis["emoji_symbol"])
list_spacy_avg_emb = list(emojis["spacy_embedding"])

for i in range(len(list_emoji_symb)):
    emoji_symb2emb_dic_spacy[list_emoji_symb[i]] = list_spacy_avg_emb[i]


def find_closest_emoji_spacy(sentence):
    embedded_sentence = nlp_new(sentence).vector
    closest_emojis = sorted(emoji_symb2emb_dic_spacy.keys(), key= lambda emoji_symbol:\
                            spatial.distance.euclidean(emoji_symb2emb_dic_spacy[emoji_symbol], embedded_sentence))
    return closest_emojis

def translate_text_spacy(spacy_nlp_file):
    for num, sentence in enumerate(spacy_nlp_file.sents):
        print(f'{num}: {sentence}')
        closest_emojis = find_closest_emoji_spacy(str(sentence))
        print(closest_emojis[:3])
        print("")
        
def translate_by_keywords_spacy(spacy_nlp_file):
    for num, sentence in enumerate(spacy_nlp_file.sents):
        print("\033[1m" + f'{num}: {sentence}' + "\033[0m")
        for token in sentence:
            token_pos = token.pos_
            if token_pos == 'PROPN' or token_pos == 'NOUN':
                closest_emojis = find_closest_emoji_spacy(str(token))
                print(token, " --- EMOJI --->  ", closest_emojis[0])
        print("")

In [None]:
# we could also use the google api - database of images and descriptions = we need a labelled dataset then we use the embeddign of the description and compare this to th
# to the embedding of the sentence - find the closest matching picture

In [48]:
translate_by_keywords_spacy(file)

# we can see that this makes no sense

[1m0: THE STORY OF ALADDIN AND HIS MAGICAL LAMP
[0m
STORY  --- EMOJI --->   📋
ALADDIN  --- EMOJI --->   🔨
LAMP  --- EMOJI --->   🇺🇿

[1m1: There once lived, in one of the large and rich cities of China, a tailor, named Mustapha.[0m
cities  --- EMOJI --->   🈵
China  --- EMOJI --->   🌵
tailor  --- EMOJI --->   🐪
Mustapha  --- EMOJI --->   ⤴️

[1m2: He was very poor.[0m

[1m3: He could hardly, by his daily labor, maintain himself and his family, which consisted only of his wife and a son.

[0m
labor  --- EMOJI --->   🇫🇰
family  --- EMOJI --->   🤽‍♂️
wife  --- EMOJI --->   🤘
son  --- EMOJI --->   🐜

[1m4: His son, who was called Aladdin, was a very careless and idle fellow.[0m
son  --- EMOJI --->   🐜
Aladdin  --- EMOJI --->   🥮
fellow  --- EMOJI --->   🇵🇱

[1m5: He was disobedient to his father and mother, and would go out early in the morning and stay out all day, playing in the streets and public places with idle children of his own age.

[0m
father  --- EMOJI --->   🐜
mother

supper  --- EMOJI --->   🤘
night  --- EMOJI --->   ⬇️
son  --- EMOJI --->   🐜
stranger  --- EMOJI --->   🦸‍♀️
house  --- EMOJI --->   🆙



In [55]:
# this looks very strange so we will plot them out
tsne_dataframe = building_tsne_df(list_emoji_symb, emojis['emoji_name'], list_spacy_avg_emb)
graph_title = 'Emojis with Spacymoji built in Embeddings'
tsne_plot(tsne_dataframe, graph_title)

## This shows there is no semantic meaning captured with the spacymoji embeddings

# 2. Zalando Flair

In [108]:
from flair.data import Sentence
from flair.embeddings import WordEmbeddings, DocumentRNNEmbeddings, TransformerDocumentEmbeddings

# see also SentenceTransformerEmbeddings
# this uses Glove word embeddings as before, then passes them through an RNN?

# this uses classical word embeddings 
glove_embedding = WordEmbeddings('glove')


# ELMoEmbeddings could also be used to provide contextualised word embeddings (do you then feed this into the transformer or just the Document RNN Embedding)? 
# maybe good to perform max pooling operation with ELMO embeddings just to say

embeddings = DocumentRNNEmbeddings([glove_embedding])

# So this embedding needs to be trained in order to make sense. - otherwise it is just initalize from the average pooling of the glove ? as before???

# glove_doc_embeddings = DocumentRNNEmbeddings([glove_embedding], rnn_type='LSTM')
# embedding dimensionality depends on the number of hidden states

# from flair.embeddings import StackedEmbeddings
# now create the StackedEmbedding object that combines all embeddings - then we do some kind of average??
# stacked_embeddings = StackedEmbeddings(
# embeddings=[flair_forward_embedding, flair_backward_embedding, bert_embedding])


sentence = Sentence('the sky is blue')
embeddings.embed(sentence)

print(sentence.get_embedding())

tensor([-0.2768, -0.1580,  0.0204, -0.2831, -0.2596,  0.0181, -0.3400, -0.2917,
        -0.2027,  0.4078, -0.0148,  0.0176, -0.3534,  0.0739, -0.0994, -0.2642,
        -0.0478,  0.2153,  0.1188, -0.0798, -0.4131,  0.0992,  0.0852,  0.0395,
         0.5449,  0.1140, -0.0790,  0.0741, -0.0565, -0.1915,  0.4346,  0.2743,
         0.0124, -0.6078,  0.1130,  0.1818, -0.0115, -0.4273, -0.2827, -0.2990,
         0.1069, -0.0837,  0.0865,  0.0857,  0.2111,  0.0796,  0.1587,  0.4549,
         0.0428, -0.0977, -0.1179, -0.1062, -0.1565,  0.0254,  0.1673,  0.1251,
         0.0674,  0.4939,  0.1676, -0.2928,  0.3219, -0.1951, -0.4568, -0.0834,
         0.1051,  0.3595, -0.1788, -0.1863, -0.2132,  0.5668, -0.2077, -0.0924,
         0.1979,  0.1993, -0.2867,  0.0878,  0.1099, -0.1849,  0.1994, -0.0886,
        -0.2534,  0.4657, -0.2908,  0.0530,  0.2298,  0.0305,  0.4274, -0.4689,
        -0.2034,  0.3097, -0.0310, -0.0839,  0.0319,  0.0898,  0.3435, -0.2323,
         0.0742,  0.0326,  0.0287, -0.21

In [111]:
# The transformer does not need to be trained, since you can get embeddings from pretrained transformer models
# the RNN will not actually return anything, important just at least the average pooling of the glove word embedding since you actually need to train it on something

embedding = TransformerDocumentEmbeddings('bert-base-uncased')
#embedding = TransformerDocumentEmbeddings('roberta-base')
embedding.embed(sentence)

[Sentence: "the sky is blue"   [− Tokens: 4]]

In [117]:
emojis.head()
emojis.iloc[:5, :]

Unnamed: 0.1,Unnamed: 0,emoji_symbol,emoji_name,emoji_code,emoji_description,preprocessed_description,avg_glove_embedding,spacy_embedding
0,0,😀,Grinning Face,U+1F600,"A yellow face with simple, open eyes and a bro...","[yellow, face, simple, open, eyes, broad, open...","[-0.05930326535765614, 0.35218397080898284, 0....","[-1.6294127, 2.101685, 1.8814864, -2.8542278, ..."
1,1,😃,Grinning Face with Big Eyes,U+1F603,"A yellow face with smiling eyes and a broad, o...","[yellow, face, smiling, eyes, broad, open, smi...","[-0.04583876235410571, 0.34958994686603545, 0....","[2.2437549, -1.5684885, -0.40939862, -2.736048..."
2,2,😄,Grinning Face with Smiling Eyes,U+1F604,"A yellow face with smiling eyes and a broad, o...","[yellow, face, smiling, eyes, broad, open, smi...","[-0.07637538687036984, 0.3653278946876526, 0.4...","[1.8132592, -0.45015156, 0.15728128, -0.840531..."
3,3,😁,Beaming Face with Smiling Eyes,U+1F601,A yellow face with smiling eyes and full-tooth...,"[yellow, face, smiling, eyes, grin, sayingchee...","[-0.058055011515534716, 0.21979669058782747, 0...","[1.3395119, -1.1522889, -0.87755823, -1.756431..."
4,4,😆,Grinning Squinting Face,U+1F606,"A yellow face with a broad, open smile and scr...","[yellow, face, broad, open, smile, scrunched, ...","[-0.1096445924735495, 0.4070707155125482, 0.27...","[-0.30889916, -1.2340499, -2.1786036, -2.68890..."


In [130]:
from tqdm import tqdm

In [None]:
# we embed the descriptions using the Flair RNN

def embed_descr(descr, embedding):
    sentence = Sentence(descr)
    embedding.embed(sentence)
    # the returned object should be a numpy array not a tensor
    return sentence.get_embedding().detach().numpy()

tqdm.pandas()
# this takes AGES !!!!!
emojis['flair_rnn_embedding'] = emojis.apply(lambda x : embed_descr(x['emoji_description'], DocumentRNNEmbeddings([glove_embedding])), axis=1)
emojis['flair_bert_embedding'] = emojis.progress_apply(lambda x : embed_descr(x['emoji_description'], TransformerDocumentEmbeddings('bert-base-uncased')), axis=1)

 98%|█████████▊| 1723/1752 [1:38:36<01:40,  3.45s/it]

In [126]:
#sentence.get_embedding().detach().numpy()
sentence.get_embedding()

tensor([-2.7684e-01, -1.5799e-01,  2.0387e-02, -2.8305e-01, -2.5961e-01,
         1.8089e-02, -3.3997e-01, -2.9169e-01, -2.0268e-01,  4.0779e-01,
        -1.4796e-02,  1.7552e-02, -3.5337e-01,  7.3870e-02, -9.9396e-02,
        -2.6422e-01, -4.7751e-02,  2.1530e-01,  1.1884e-01, -7.9754e-02,
        -4.1307e-01,  9.9167e-02,  8.5184e-02,  3.9458e-02,  5.4487e-01,
         1.1400e-01, -7.8976e-02,  7.4117e-02, -5.6517e-02, -1.9149e-01,
         4.3458e-01,  2.7428e-01,  1.2430e-02, -6.0776e-01,  1.1301e-01,
         1.8179e-01, -1.1523e-02, -4.2728e-01, -2.8273e-01, -2.9901e-01,
         1.0694e-01, -8.3696e-02,  8.6512e-02,  8.5697e-02,  2.1105e-01,
         7.9641e-02,  1.5871e-01,  4.5486e-01,  4.2769e-02, -9.7693e-02,
        -1.1795e-01, -1.0616e-01, -1.5653e-01,  2.5370e-02,  1.6735e-01,
         1.2512e-01,  6.7439e-02,  4.9388e-01,  1.6758e-01, -2.9278e-01,
         3.2190e-01, -1.9507e-01, -4.5684e-01, -8.3438e-02,  1.0511e-01,
         3.5952e-01, -1.7881e-01, -1.8627e-01, -2.1

In [123]:
emojis.loc[1, 'flair_rnn_embedding']

tensor([-2.1330e-01, -3.7281e-02,  1.2532e-02, -1.6843e-01, -2.6051e-01,
         1.9966e-02, -3.5443e-01,  6.5127e-02, -6.0910e-02,  3.7181e-01,
        -1.2933e-01,  1.5397e-01, -4.1938e-01,  2.8373e-01, -1.1225e-01,
        -1.9926e-01,  1.1170e-01,  4.9037e-01, -5.9120e-02, -2.5536e-01,
        -3.9496e-01, -7.0523e-02, -5.2945e-03,  2.2540e-01,  6.2947e-01,
         4.5019e-02, -8.0362e-02,  2.2945e-01, -5.4344e-02, -3.6315e-01,
         3.4525e-01,  3.0827e-01,  7.0979e-02, -5.1717e-01,  9.6649e-02,
        -2.5714e-02, -1.6190e-01, -2.7905e-01, -2.6472e-01, -3.3276e-01,
         2.3159e-02,  8.1838e-02,  1.3180e-01,  8.6585e-02,  4.1910e-01,
         2.7662e-01,  2.7126e-01,  5.4434e-01, -5.8407e-02, -1.3673e-01,
        -8.7098e-02,  1.3875e-01, -3.9323e-01,  1.1915e-01,  1.8402e-01,
         2.1936e-01, -5.9475e-02,  2.3506e-01,  2.2032e-01,  1.2590e-01,
         1.7522e-01, -2.4573e-01, -2.5330e-01, -1.4440e-01,  9.8038e-02,
         2.9824e-01, -8.3228e-02, -2.5060e-01, -3.7

## 2. Testing with transformers from hugging face

In [57]:
!pip install git+https://github.com/huggingface/transformers.git 

Collecting git+https://github.com/huggingface/transformers.git
  Cloning https://github.com/huggingface/transformers.git to /private/var/folders/2j/41vzwd817mggmlw9q7vlb67m0000gn/T/pip-req-build-dcw0kb6q
  Running command git clone -q https://github.com/huggingface/transformers.git /private/var/folders/2j/41vzwd817mggmlw9q7vlb67m0000gn/T/pip-req-build-dcw0kb6q
Collecting tokenizers==0.8.1.rc2
  Downloading tokenizers-0.8.1rc2-cp37-cp37m-macosx_10_14_x86_64.whl (2.1 MB)
[K     |████████████████████████████████| 2.1 MB 1.8 MB/s eta 0:00:01
Collecting filelock
  Using cached filelock-3.0.12-py3-none-any.whl (7.6 kB)
Collecting sentencepiece!=0.1.92
  Downloading sentencepiece-0.1.91-cp37-cp37m-macosx_10_6_x86_64.whl (1.1 MB)
[K     |████████████████████████████████| 1.1 MB 20.4 MB/s eta 0:00:01
[?25hCollecting sacremoses
  Downloading sacremoses-0.0.43.tar.gz (883 kB)
[K     |████████████████████████████████| 883 kB 31.5 MB/s eta 0:00:01
Building wheels for collected packages: transfo

HBox(children=(FloatProgress(value=0.0, description='Downloading', max=908.0, style=ProgressStyle(description_…




HBox(children=(FloatProgress(value=0.0, description='Downloading', max=898823.0, style=ProgressStyle(descripti…




HBox(children=(FloatProgress(value=0.0, description='Downloading', max=456318.0, style=ProgressStyle(descripti…




HBox(children=(FloatProgress(value=0.0, description='Downloading', max=1629486723.0, style=ProgressStyle(descr…




Some weights of the model checkpoint at facebook/bart-large-mnli were not used when initializing BartForSequenceClassification: ['model.encoder.version', 'model.decoder.version']
- This IS expected if you are initializing BartForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BartForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [59]:
from transformers import pipeline
classifier = pipeline("zero-shot-classification")

Some weights of the model checkpoint at facebook/bart-large-mnli were not used when initializing BartForSequenceClassification: ['model.encoder.version', 'model.decoder.version']
- This IS expected if you are initializing BartForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BartForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [82]:
# this took way too long?? did I do something wrong?? 
for i, sent in enumerate(file.sents):
    print(sent)
    result = classifier(str(sent), list_emoji_symb)
    print(result['labels'][np.argmax(result['scores'])])
    
    if i==2:
        break
            
# the results make no sense      

THE STORY OF ALADDIN AND HIS MAGICAL LAMP

😃
There once lived, in one of the large and rich cities of China, a tailor, named Mustapha.
⬜
He was very poor.
🏳️


In [100]:
from transformers import BertTokenizer, BertModel
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

inputs = tokenizer("We are testing Aladdin", return_tensors="pt")
outputs = model(**inputs)

embedding = outputs[0]

In [95]:
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

inputs = tokenizer("We are testing Aladdin", return_tensors="pt")
outputs = model(**inputs)
embedding = outputs[0]


Some weights of GPT2Model were not initialized from the model checkpoint at gpt2 and are newly initialized: ['h.0.attn.masked_bias', 'h.1.attn.masked_bias', 'h.2.attn.masked_bias', 'h.3.attn.masked_bias', 'h.4.attn.masked_bias', 'h.5.attn.masked_bias', 'h.6.attn.masked_bias', 'h.7.attn.masked_bias', 'h.8.attn.masked_bias', 'h.9.attn.masked_bias', 'h.10.attn.masked_bias', 'h.11.attn.masked_bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [101]:
def get_embedding():
    pass

def embedding(emoji_descr):
    inputs = tokenizer(emoji_descr)
    outputs = model(**inputs)
    embedding = outputs[0]
    return embedding

In [103]:
emojis

Unnamed: 0.1,Unnamed: 0,emoji_symbol,emoji_name,emoji_code,emoji_description,preprocessed_description,avg_glove_embedding,spacy_embedding
0,0,😀,Grinning Face,U+1F600,"A yellow face with simple, open eyes and a bro...","[yellow, face, simple, open, eyes, broad, open...","[-0.05930326535765614, 0.35218397080898284, 0....","[-1.6294127, 2.101685, 1.8814864, -2.8542278, ..."
1,1,😃,Grinning Face with Big Eyes,U+1F603,"A yellow face with smiling eyes and a broad, o...","[yellow, face, smiling, eyes, broad, open, smi...","[-0.04583876235410571, 0.34958994686603545, 0....","[2.2437549, -1.5684885, -0.40939862, -2.736048..."
2,2,😄,Grinning Face with Smiling Eyes,U+1F604,"A yellow face with smiling eyes and a broad, o...","[yellow, face, smiling, eyes, broad, open, smi...","[-0.07637538687036984, 0.3653278946876526, 0.4...","[1.8132592, -0.45015156, 0.15728128, -0.840531..."
3,3,😁,Beaming Face with Smiling Eyes,U+1F601,A yellow face with smiling eyes and full-tooth...,"[yellow, face, smiling, eyes, grin, sayingchee...","[-0.058055011515534716, 0.21979669058782747, 0...","[1.3395119, -1.1522889, -0.87755823, -1.756431..."
4,4,😆,Grinning Squinting Face,U+1F606,"A yellow face with a broad, open smile and scr...","[yellow, face, broad, open, smile, scrunched, ...","[-0.1096445924735495, 0.4070707155125482, 0.27...","[-0.30889916, -1.2340499, -2.1786036, -2.68890..."
...,...,...,...,...,...,...,...,...
1759,1759,🇿🇼,🇼 Flag: Zimbabwe,"U+1F1FF, U+1F1FC","The flag forZimbabwe, which may show as the le...","[flag, forzimbabwe, may, show, letterszwon, fl...","[-0.14868279459575812, 0.3756362621982892, 0.0...","[0.03218031, -0.84441435, -2.352934, -1.112183..."
1760,1760,🏴󠁧󠁢󠁥󠁮󠁧󠁿,󠁧󠁢󠁥󠁮󠁧󠁿 Flag: England,"U+1F3F4, U+E0067, U+E0062, U+E0065, U+E006E, U...","The flag for England, a country in theUnited K...","[flag, england, country, theunited, kingdom, m...","[-0.1737559937464539, 0.4466676575539168, 0.08...","[-0.4405098, -2.3296275, -1.0480746, -0.386221..."
1761,1761,🏴󠁧󠁢󠁳󠁣󠁴󠁿,󠁧󠁢󠁳󠁣󠁴󠁿 Flag: Scotland,"U+1F3F4, U+E0067, U+E0062, U+E0073, U+E0063, U...","The flag for Scotland, a country in theUnited ...","[flag, scotland, country, theunited, flag, sco...","[-0.17220392764235537, 0.4779988704559704, 0.0...","[1.5263562, -1.8564854, 0.50159216, -0.5016355..."
1762,1762,🏴󠁧󠁢󠁷󠁬󠁳󠁿,󠁧󠁢󠁷󠁬󠁳󠁿 Flag: Wales,"U+1F3F4, U+E0067, U+E0062, U+E0077, U+E006C, U...","The flag for Wales, a country in theUnited Kin...","[flag, wales, country, theunited, kingdom, may...","[-0.19631805751123466, 0.4601589393278118, 0.0...","[0.24251169, 0.65490943, 0.6117422, -0.2369406..."


In [109]:
bert_embedding = emojis.iloc[:10, :].apply(lambda x : embedding(x['emoji_description']), axis=1)

AttributeError: 'list' object has no attribute 'size'

In [None]:
import tensorflow as tf
import tensorflow_datasets
from transformers import *
'''
# Load dataset, tokenizer, model from pretrained model/vocabulary
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-cased')
#data = tensorflow_datasets.load('glue/mrpc')

# Prepare dataset for GLUE as a tf.data.Dataset instance
train_dataset = glue_convert_examples_to_features(data['train'], tokenizer, max_length=128, task='mrpc')
valid_dataset = glue_convert_examples_to_features(data['validation'], tokenizer, max_length=128, task='mrpc')
train_dataset = train_dataset.shuffle(100).batch(32).repeat(2)
valid_dataset = valid_dataset.batch(64)

# Prepare training: Compile tf.keras model with optimizer, loss and learning rate schedule
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])

# Train and evaluate using tf.keras.Model.fit()
history = model.fit(train_dataset, epochs=2, steps_per_epoch=115,
                    validation_data=valid_dataset, validation_steps=7)

# Load the TensorFlow model in PyTorch for inspection
model.save_pretrained('./save/')
pytorch_model = BertForSequenceClassification.from_pretrained('./save/', from_tf=True)

# Quickly test a few predictions - MRPC is a paraphrasing task, let's see if our model learned the task
sentence_0 = "This research was consistent with his findings."
sentence_1 = "His findings were compatible with this research."
sentence_2 = "His findings were not compatible with this research."
inputs_1 = tokenizer(sentence_0, sentence_1, add_special_tokens=True, return_tensors='pt')
inputs_2 = tokenizer(sentence_0, sentence_2, add_special_tokens=True, return_tensors='pt')

pred_1 = pytorch_model(inputs_1['input_ids'], token_type_ids=inputs_1['token_type_ids'])[0].argmax().item()
pred_2 = pytorch_model(inputs_2['input_ids'], token_type_ids=inputs_2['token_type_ids'])[0].argmax().item()

print("sentence_1 is", "a paraphrase" if pred_1 else "not a paraphrase", "of sentence_0")
print("sentence_2 is", "a paraphrase" if pred_2 else "not a paraphrase", "of sentence_0")

'''

In [None]:
'''
The simplest type of document embedding does a pooling operation over all word embeddings in a sentence to obtain an embedding for the whole sentence. The default is mean pooling, meaning that the average of all word embeddings is used.

To instantiate, you need to pass a list of word embeddings to pool over:

from flair.embeddings import WordEmbeddings, DocumentPoolEmbeddings

# initialize the word embeddings
glove_embedding = WordEmbeddings('glove')

# initialize the document embeddings, mode = mean
document_embeddings = DocumentPoolEmbeddings([glove_embedding])
Now, create an example sentence and call the embedding's embed() method.

# create an example sentence
sentence = Sentence('The grass is green . And the sky is blue .')

# embed the sentence with our document embedding
document_embeddings.embed(sentence)

# now check out the embedded sentence.
print(sentence.embedding)
This prints out the embedding of the document. Since the document embedding is derived from word embeddings, its dimensionality depends on the dimensionality of word embeddings you are using. For more details on these embeddings, check here.

One advantage of DocumentPoolEmbeddings is that they do not need to be trained, you can immediately use them to embed your documents.

Document RNN Embeddings

These embeddings run an RNN over all words in sentence and use the final state of the RNN as embedding for the whole document. In order to use the DocumentRNNEmbeddings you need to initialize them by passing a list of token embeddings to it:

from flair.embeddings import WordEmbeddings, DocumentRNNEmbeddings

glove_embedding = WordEmbeddings('glove')

document_embeddings = DocumentRNNEmbeddings([glove_embedding])
By default, a GRU-type RNN is instantiated. Now, create an example sentence and call the embedding's embed() method.

# create an example sentence
sentence = Sentence('The grass is green . And the sky is blue .')

# embed the sentence with our document embedding
document_embeddings.embed(sentence)

# now check out the embedded sentence.
print(sentence.get_embedding())
'''
'''
Document RNN Embeddings

These embeddings run an RNN over all words in sentence and use the final state of the RNN as embedding for the whole document. In order to use the DocumentRNNEmbeddings you need to initialize them by passing a list of token embeddings to it:

from flair.embeddings import WordEmbeddings, DocumentRNNEmbeddings

glove_embedding = WordEmbeddings('glove')

document_embeddings = DocumentRNNEmbeddings([glove_embedding])
By default, a GRU-type RNN is instantiated. Now, create an example sentence and call the embedding's embed() method.

# create an example sentence
sentence = Sentence('The grass is green . And the sky is blue .')

# embed the sentence with our document embedding
document_embeddings.embed(sentence)

# now check out the embedded sentence.
print(sentence.get_embedding())
This will output a single embedding for the complete sentence. The embedding dimensionality depends on the number of hidden states you are using and whether the RNN is bidirectional or not. For more details on these embeddings, check here.

Note that when you initialize this embedding, the RNN weights are randomly initialized. So this embedding needs to be trained in order to make sense.

TransformerDocumentEmbeddings

You can get embeddings for a whole sentence directly from a pre-trained transformer. There is a single class for all transformer embeddings that you instantiate with different identifiers get different transformers. For instance, to load a standard BERT transformer model, do:

from flair.embeddings import TransformerDocumentEmbeddings

# init embedding
embedding = TransformerDocumentEmbeddings('bert-base-uncased')

# create a sentence
sentence = Sentence('The grass is green .')

# embed the sentence
embedding.embed(sentence)
If instead you want to use RoBERTa, do:

from flair.embeddings import TransformerDocumentEmbeddings

# init embedding
embedding = TransformerDocumentEmbeddings('roberta-base')

# create a sentence
sentence = Sentence('The grass is green .')

# embed the sentence
embedding.embed(sentence)
Here is a full list of all models (BERT, RoBERTa, XLM, XLNet etc.). You can use any of these models with this class.

SentenceTransformerDocumentEmbeddings

You can also get several embeddings from the sentence-transformer library. These models are pre-trained to give good general-purpose vector representations for sentences.

from flair.data import Sentence
from flair.embeddings import SentenceTransformerDocumentEmbeddings

# init embedding
embedding = SentenceTransformerDocumentEmbeddings('bert-base-nli-mean-tokens')

# create a sentence
sentence = Sentence('The grass is green .')

# embed the sentence
embedding.embed(sentence)
You can find a full list of their pretained models here.

Note: To use this embedding, you need to install sentence-transformers with pip install sentence-transformers.

'''