:Path: pimlico.modules.visualization.embeddings_plot
:Executable: yes
Plot vectors from embeddings, trained by some other module, in a 2D space using an MDS or t-SNE reduction and Matplotlib.
They might, for example, come from :mod:`pimlico.modules.embeddings.word2vec`. The embeddings are read in using Pimlico's generic word embedding storage type.

Uses scikit-learn to perform the MDS/t-SNE reduction.
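To illustrate the kind of projection the module performs (this is not the module's actual code, which uses scikit-learn), classical MDS over cosine distances can be sketched with NumPy alone. The embedding matrix here is randomly generated stand-in data:

```python
import numpy as np

def cosine_distances(X):
    # Pairwise cosine distance matrix: 1 - cosine similarity
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

def classical_mds(D, dims=2):
    # Classical MDS: double-centre the squared distance matrix,
    # then project onto the top eigenvectors scaled by sqrt(eigenvalue)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dims]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 100))   # 50 fake "word vectors", 100 dims
coords = classical_mds(cosine_distances(emb))
print(coords.shape)  # (50, 2)
```

The resulting 2D coordinates are what get scattered onto the Matplotlib plot, one point per word.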
Inputs
======

``vectors``
    :class:`list <pimlico.datatypes.base.MultipleInputs>` of :class:`Embeddings <pimlico.datatypes.embeddings.Embeddings>`
Outputs
=======

``plot``
    :class:`~pimlico.datatypes.plotting.PlotOutput`
Options
=======

``skip`` (int)
    Number of most frequent words to skip, taking the next most frequent after these. Default: 0

``metric`` ('cosine', 'euclidean' or 'manhattan')
    Distance metric to use. Default: 'cosine'

``reduction`` ('mds' or 'tsne')
    Dimensionality reduction technique to use to project to 2D. Available: mds (multi-dimensional scaling), tsne (t-distributed stochastic neighbor embedding). Default: mds

``colors`` (comma-separated list of strings)
    List of colours to use for different embedding sets. Should be a list of Matplotlib colour strings, one for each embedding set given in input_vectors.

``cmap`` (JSON string)
    Mapping from word prefixes to Matplotlib plotting colours. Every word beginning with the given prefix has the prefix removed and is plotted in the corresponding colour. Specify as a JSON dictionary mapping prefix strings to colour strings.

``words`` (int)
    Number of most frequent words to plot. Default: 50
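As a sketch of how these options might be set in a pipeline config file (the section name ``plot_embeddings`` and the input module name ``word2vec_trainer`` are hypothetical, chosen only for illustration):

```ini
[plot_embeddings]
type=pimlico.modules.visualization.embeddings_plot
input_vectors=word2vec_trainer
words=100
skip=10
metric=cosine
reduction=tsne
; Plot words prefixed "N|" in red and "V|" in blue, stripping the prefix
cmap={"N|": "red", "V|": "blue"}
```

Note that ``cmap`` takes a JSON dictionary, while ``colors`` (not shown) would take a plain comma-separated list, one colour per input embedding set.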