## Relation embeddings

This is a small **tutorial on how to use relation embeddings**. In general, you can think of them as word embeddings, but instead of representing a single word (e.g. *france*), they represent a word pair (e.g. *france\_\_belgium*). First, we need to load our pre-trained relation embeddings with gensim:

In [None]:
import gensim
from gensim.models import KeyedVectors

In [None]:
path_vectors="./relative_wikipedia_en_300d.bin" # Path to the pre-trained relation embeddings (e.g. "/home/relative_wikipedia_en_300d.bin")
model=KeyedVectors.load_word2vec_format(path_vectors,binary=True) # You can write binary=False if vectors are in .txt format
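# gensim >= 4.0 exposes the vocabulary as key_to_index; gensim 3.x uses vocab
vocab=model.key_to_index if hasattr(model,"key_to_index") else model.vocab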

Then, we can try to output the nearest neighbours of a given word pair in the vector space.
*Note*: Words are separated by a double underscore.

In [None]:
top_words=20 # Number of top nearest neighbours to display
input_pair="belgium__france" # Input pair
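ms=model.most_similar(positive=[input_pair],topn=top_words) # Raises KeyError if the pair is not in the vocabulary
for pair in ms: # Each item is a (pair, cosine similarity) tuple
    print(pair)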
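
Since a pair that is missing from the vocabulary raises a `KeyError`, it can help to look the key up before querying. The following is only a minimal sketch: `make_pair_key`, `find_pair_key` and `toy_vocab` are hypothetical names introduced here, and `toy_vocab` stands in for the real vocabulary loaded above. Trying both word orders is an assumption; check whether your model stores a pair in one order only and whether both orders mean the same thing for you.

```python
SEPARATOR = "__"  # double underscore used in the pair keys of this tutorial


def make_pair_key(first, second):
    """Join two words into a pair key such as 'belgium__france'."""
    return first + SEPARATOR + second


def find_pair_key(first, second, vocab):
    """Return the key under which the pair is stored, trying both orders.

    Returns None when neither order is present in the vocabulary.
    """
    for key in (make_pair_key(first, second), make_pair_key(second, first)):
        if key in vocab:
            return key
    return None


# Toy stand-in for the model vocabulary (hypothetical data, not real output).
toy_vocab = {"belgium__france", "guitar__piano"}

print(find_pair_key("france", "belgium", toy_vocab))
print(find_pair_key("france", "piano", toy_vocab))
```

With the real model, you would pass the vocabulary loaded earlier instead of `toy_vocab` and only call `most_similar` when the returned key is not `None`.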

To check all the pairs available in the vocabulary given a word, you can use the following snippet:

In [None]:
input_word="belgium"
# Keep the pairs in which input_word appears as the first or as the second word
list_pairs=[pair for pair in vocab if (pair.startswith(input_word+"__") or pair.endswith("__"+input_word))]
for pair in list_pairs:
    print(pair)
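
If you only need the words that a given word is paired with, rather than the full pair keys, the keys can be split on the separator. This is a sketch under the assumption that every pair key contains the double underscore exactly once; `partner_words` and `toy_vocab` are hypothetical names used for illustration only.

```python
def partner_words(word, vocab, separator="__"):
    """Return the sorted list of words that form a pair with `word`."""
    partners = []
    for pair in vocab:
        first, sep, second = pair.partition(separator)
        if not sep:
            continue  # Not a pair key, skip it
        if first == word:
            partners.append(second)
        elif second == word:
            partners.append(first)
    return sorted(partners)


# Toy stand-in for the model vocabulary (hypothetical data).
toy_vocab = ["belgium__france", "germany__belgium", "guitar__piano"]

print(partner_words("belgium", toy_vocab))
```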

If you are using our relative or relative-init models, remember that they **share the same space with the word embeddings used as input**. In the case of our pre-trained models, these are fastText word embeddings. You can therefore display the words nearest to a given word pair in the vector space as follows:

In [None]:
path_words="./fasttext_wikipedia_en_300d.bin" # Path to your word embeddings
model_words=KeyedVectors.load_word2vec_format(path_words,binary=True) # You can write binary=False if vectors are in .txt format

In [None]:
top_words=20 # Number of top nearest neighbours to display
input_pair="guitar__piano"
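input_vector=model[input_pair] # Relation vector of the input pair
ms=model_words.similar_by_vector(input_vector,topn=top_words) # Optionally pass restrict_vocab=N to search only the first N words of the vocabulary
for word in ms: # Each item is a (word, cosine similarity) tuple
    print(word)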
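
To make the idea behind this lookup more concrete, here is a minimal sketch of a cosine-similarity ranking between a pair vector and a set of word vectors, which is conceptually what the nearest-neighbour query does. It is not gensim's implementation. The 3-dimensional vectors are made-up toy values, and `cosine`, `nearest_words`, `toy_words` and `toy_pair_vector` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_words(query, word_vectors, topn=2):
    """Rank words by cosine similarity to `query`, most similar first."""
    scored = [(word, cosine(query, vec)) for word, vec in word_vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:topn]


# Toy word vectors and a toy pair vector living in the same space
# (made-up numbers, only meant to illustrate the ranking).
toy_words = {
    "instrument": [1.0, 0.0, 0.0],
    "music": [0.8, 0.6, 0.0],
    "river": [0.0, 0.0, 1.0],
}
toy_pair_vector = [0.9, 0.1, 0.0]

for word, score in nearest_words(toy_pair_vector, toy_words):
    print(word, round(score, 3))
```

The ranking is only meaningful because the pair vector and the word vectors share the same space, which is the property of the relative and relative-init models mentioned above.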


Just out of curiosity, try to **play with the relative and relative-init models** and see how different they are in terms of closest words! While relative-init will consistently return very frequent words (e.g., punctuation marks), the relative model will return a wider variety.