In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from whatlies import Embedding, EmbeddingSet
from whatlies.transformers import Pca, Umap
import spacy 

## Making Plots ... More Cool 

The `Embedding` object only supports matplotlib, but the `EmbeddingSet` supports Altair too! You can plot a set interactively by passing the names of the tokens you'd like to use as the axes.

In [None]:
nlp = spacy.load("en_core_web_sm")
words = ["prince", "princess", "nurse", "doctor", "banker", "man", "woman", 
         "cousin", "niece", "king", "queen", "dude", "guy", "gal", "fire", 
         "dog", "cat", "mouse", "red", "blue", "green", "yellow", "water", 
         "person", "family", "brother", "sister"]
emb = EmbeddingSet({t.text: Embedding(t.text, t.vector) for t in nlp.pipe(words)})

In [None]:
orig_chart = emb.plot_interactive('man', 'woman')
orig_chart
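
To get a feel for what the two axes mean, here is a minimal sketch in plain Python. It assumes the chart coordinates are scalar projections of each embedding onto the axis embeddings; the helper `scalar_projection` and the toy vectors are made up for illustration and are not part of whatlies.

```python
# Hypothetical helper (not a whatlies function): how far v reaches along
# `axis`, measured in units of the axis vector's own length.
def scalar_projection(v, axis):
    dot_va = sum(a * b for a, b in zip(v, axis))
    dot_aa = sum(a * a for a in axis)
    return dot_va / dot_aa

# Toy 2-d vectors standing in for real word embeddings.
toy = {"man": [1.0, 0.0], "woman": [0.0, 2.0], "king": [3.0, 1.0]}

# (x, y) coordinates of every token on the ('man', 'woman') axes.
coords = {
    name: (scalar_projection(vec, toy["man"]),
           scalar_projection(vec, toy["woman"]))
    for name, vec in toy.items()
}
print(coords)
```

Under this assumption each axis token lands at exactly 1 on its own axis, which makes it a handy reference point when reading the chart.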

In [None]:
new_ts = emb | (emb['king'] - emb['queen'])
new_chart = new_ts.plot_interactive('man', 'woman')
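
The `|` used on the set above can be read as "remove this direction from every embedding". A rough sketch of that idea in plain Python follows, assuming the operator subtracts the component that lies along the given vector. The names `dot`, `remove_direction` and the toy vectors are hypothetical and only serve the illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical helper: subtract from v its component along `direction`,
# leaving a vector that is orthogonal to `direction`.
def remove_direction(v, direction):
    scale = dot(v, direction) / dot(direction, direction)
    return [a - scale * d for a, d in zip(v, direction)]

# Toy 2-d vectors standing in for real word embeddings.
toy = {"king": [3.0, 1.0], "queen": [1.0, 1.0], "man": [1.0, 2.0]}

# The direction to remove: king - queen.
direction = [k - q for k, q in zip(toy["king"], toy["queen"])]

cleaned = {name: remove_direction(vec, direction) for name, vec in toy.items()}
print(cleaned)
```

In this toy example `king` and `queen` end up at the same point, because the only thing that separated them was the direction we removed.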

Note that Altair has a convenient syntax for plotting two charts next to each other. This is really cool when you want to compare. Feel free to zoom in and play as well!

In [None]:
orig_chart | new_chart

The charts that we output here are from the Altair library. This means that you can, among other things, customise the size if you prefer.

In [None]:
s = 250
(orig_chart.properties(width=s, height=s) | new_chart.properties(width=s, height=s))

Chaining steps together as a pipeline is pretty neat too. You can also use the operators from before as a step in such a pipeline.

In [None]:
emb.transform(lambda e: e | (e["man"] - e["woman"])).transform(Pca(2))
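
Why do a lambda and `Pca(2)` both work as steps? A plausible reading is that `transform` just calls whatever it is given with the set as input. The sketch below mimics that pattern with a hypothetical `TinySet` class and a hypothetical `KeepDims` step; neither exists in whatlies, and the real implementation may differ.

```python
class TinySet:
    """Hypothetical stand-in for an EmbeddingSet: maps names to vectors."""

    def __init__(self, vectors):
        self.vectors = dict(vectors)

    def transform(self, step):
        # A step is any callable that takes a set and returns a new set.
        return step(self)


class KeepDims:
    """Hypothetical step object: keeps the first n dimensions."""

    def __init__(self, n):
        self.n = n

    def __call__(self, s):
        return TinySet({k: v[:self.n] for k, v in s.vectors.items()})


ts = TinySet({"man": [1.0, 2.0, 3.0], "woman": [4.0, 5.0, 6.0]})

# A lambda step followed by an object step, chained like in the cell above.
result = (ts
          .transform(lambda s: TinySet({k: [2 * x for x in v]
                                        for k, v in s.vectors.items()}))
          .transform(KeepDims(2)))
print(result.vectors)
```

Because every step returns a new set, the original `ts` is left untouched and steps can be reordered or swapped freely.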

## Transformations

There's something extra too. So far we've been mapping vectors onto other vectors in order to plot them. But we could go a step further and transform the whole set before plotting.

In [None]:
orig_chart = emb.plot_interactive('man', 'woman')
pca_emb = emb.transform(Pca(2))
umap_emb = emb.transform(Umap(2))

pca_emb.plot_interactive('pca_0', 'pca_1') | umap_emb.plot_interactive('umap_0', 'umap_1')

Note that we can increase the number of components and still only plot a few. 

In [None]:
pca_emb = emb.transform(Pca(3))

(pca_emb.plot_interactive('pca_0', 'pca_1').properties(width=s, height=s) 
 | pca_emb.plot_interactive('pca_2', 'pca_1').properties(width=s, height=s))

But why go with only two plots when you can have an entire matrix? 

In [None]:
pca_emb.plot_interactive_matrix('pca_0', 'pca_1', 'pca_2')

What is particularly interesting here is the PCA axes. They seem to encode information, and we can attempt to understand them by glancing at the charts.

But the overlap makes it hard to read. So let's apply one more transformation here.

In [None]:
from whatlies.transformers import Noise 

(emb
 .transform(Pca(3))
 .transform(Noise(2))
 .plot_interactive_matrix('king', 'queen', 'man', 'woman', annot=True, width=200, height=200))
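
The noise step is essentially jitter: points that sit on top of each other get nudged apart so their labels become readable. Here is a small sketch of that idea using only the standard library. It assumes Gaussian noise is added independently to every coordinate; `add_noise` and its `seed` argument are hypothetical names, not the whatlies API.

```python
import random

# Hypothetical helper: add Gaussian noise with standard deviation `sigma`
# to every coordinate of every vector.
def add_noise(vectors, sigma, seed=42):
    rng = random.Random(seed)  # seeded so the jitter is reproducible
    return {name: [x + rng.gauss(0.0, sigma) for x in vec]
            for name, vec in vectors.items()}

# Two toy points that overlap exactly before the jitter.
overlapping = {"a": [1.0, 1.0], "b": [1.0, 1.0]}
jittered = add_noise(overlapping, sigma=0.1)
print(jittered)
```

The trade-off is that a larger `sigma` separates the labels more but also distorts the positions more, so it is worth keeping it small relative to the spread of the data.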

Note that we also offer the ability to add a few random embeddings. This can be a useful sanity check.

In [None]:
from whatlies.transformers import AddRandom

(emb
 .transform(AddRandom(n=10, sigma=0.1))
 .transform(Pca(2))
 .plot_interactive_matrix('pca_0', 'pca_1', annot=True, width=200, height=200))
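
To make the sanity check concrete, here is a sketch of what adding random embeddings could look like. It assumes the extra vectors are drawn from a zero-centred Gaussian with standard deviation `sigma`; the function `add_random` and the `rand_i` names are hypothetical and only illustrate the idea.

```python
import random

# Hypothetical helper: return a copy of `vectors` with n extra random
# vectors of the same dimensionality.
def add_random(vectors, n=10, sigma=0.1, seed=42):
    rng = random.Random(seed)
    dim = len(next(iter(vectors.values())))
    out = dict(vectors)
    for i in range(n):
        out[f"rand_{i}"] = [rng.gauss(0.0, sigma) for _ in range(dim)]
    return out

# Toy 3-d vectors standing in for real word embeddings.
toy = {"king": [0.5, 0.1, 0.3], "queen": [0.4, 0.2, 0.3]}
with_random = add_random(toy, n=10, sigma=0.1)
print(len(with_random))
```

If the random points show the same clusters or trends as the real words after a transformation, that is a hint the pattern may come from the transformation rather than from the embeddings.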