
Word Embedding Visualizer #1419

Closed
aneesh-joshi opened this issue Jun 16, 2017 · 21 comments
Labels
difficulty easy Easy issue: required small fix documentation Current issue related to documentation feature Issue described a new feature

Comments

@aneesh-joshi
Contributor

Currently, there is no direct way to visualise word embeddings made by the gensim word2vec model.
I have made a visualiser using matplotlib.
It uses Incremental PCA to reduce the vectors to a manageable number of dimensions, then uses t-SNE to bring them down to 2 dimensions.

Note: I have taken some of my code from https://github.com/jeffThompson/Word2VecAndTsne
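A minimal sketch of that reduction pipeline, assuming scikit-learn (the function name, the `use_pca` flag, and the array shapes are illustrative, not the actual script):

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.manifold import TSNE

def reduce_to_2d(vectors, use_pca=True, pca_dims=50):
    """Reduce word vectors to 2-D, optionally via Incremental PCA first."""
    if use_pca and vectors.shape[1] > pca_dims:
        # first cut the dimensionality down so t-SNE has less work to do
        vectors = IncrementalPCA(n_components=pca_dims).fit_transform(vectors)
    # t-SNE performs the final reduction to two dimensions
    tsne = TSNE(n_components=2, perplexity=5, random_state=0)
    return tsne.fit_transform(vectors)

# e.g. 100 stand-in "word vectors" of dimensionality 200
emb = np.random.rand(100, 200)
coords = reduce_to_2d(emb)  # one (x, y) pair per word, ready for matplotlib
```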

This will be my first contribution (if at all), so please guide me.

(Also, I am not sure if this is the right place to submit a feature request, so please don't mind)

@gojomo
Collaborator

gojomo commented Jun 16, 2017

Word2vec visualizations are very useful; this seems like a very good contribution idea!

It might work best as a demo notebook, or as extensions to the existing word2vec notebooks – though of course if a few general-usefulness methods support the visualizations, they could become improvements to existing gensim classes/modules. (For example, perhaps new utility functions on the KeyedVectors class, the model for sets-of-vectors keyed-by-strings... typically keyed-by-words.)

Also, is reduction by PCA before applying tSNE really necessary? (I could be wrong but thought tSNE could, on its own, do all necessary dim-reduction on word2vec-like hundreds-of-dimensions spaces.)

@aneesh-joshi
Contributor Author

> is reduction by PCA before applying tSNE really necessary?

tSNE can do the work, but it tends to be pretty slow. I have set a flag so that the user can decide whether they want to use PCA first; if it is not set, tSNE is used throughout.

Plotting the visualisation in Jupyter notebooks takes away the ability to scroll, pan and zoom.
Unless there is a workaround for that, it might have to be implemented as a script.

My script only requires the path to the Word2Vec model; it does the rest. Although, to avoid holding too much in active memory, I save the reduced 2D vectors to a CSV file.

Could you suggest where I should implement my code and submit a PR?

@anotherbugmaster
Contributor

@aneesh-joshi

You can probably use plotly for zooming and panning.

@aneesh-joshi
Contributor Author

@anotherbugmaster
Thanks, I will try it.
Will it allow zooming, panning, etc. from within a notebook?

@anotherbugmaster
Contributor

Yes, the interface is the same as in the web version.

@aneesh-joshi
Contributor Author

@gojomo
So where exactly do you want me to implement it?
In a demo notebook, appended to an existing notebook, or in the KeyedVectors utility?

@parulsethi
Contributor

parulsethi commented Jun 18, 2017

Sorry for commenting late, but you can directly visualize the word embeddings using the TensorBoard projector.

Save the gensim model embeddings using model.wv.save_word2vec_format("filename") and use this script to convert the saved embedding file to the tsv format that TensorBoard requires. See the usage instructions in the script's docstring.
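The conversion itself is straightforward; a stdlib-only sketch of the idea (not the actual gensim script, and the sample data is made up), splitting a word2vec-format text file into the vectors and metadata TSVs that TensorBoard's projector expects:

```python
# word2vec text format: a "count dims" header, then "word v1 v2 ..." per line
w2v_text = """3 4
king 0.1 0.2 0.3 0.4
queen 0.2 0.3 0.4 0.5
man 0.9 0.8 0.7 0.6
"""

def word2vec_to_tsv(src):
    """Split word2vec text into (vectors tsv, metadata tsv) strings."""
    vectors, metadata = [], []
    for line in src.strip().splitlines()[1:]:  # skip the header line
        word, *vals = line.split()
        metadata.append(word)                  # one word per metadata row
        vectors.append("\t".join(vals))        # tab-separated components
    return "\n".join(vectors), "\n".join(metadata)

vec_tsv, meta_tsv = word2vec_to_tsv(w2v_text)
# write vec_tsv to vectors.tsv and meta_tsv to metadata.tsv,
# then load both into the TensorBoard projector
```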

@aneesh-joshi
Contributor Author

Ah, there goes my first contribution.
I had just added it to the word2vec ipynb.

(two screenshots of the notebook visualization, dated 2017-06-18, attached)

@aneesh-joshi
Contributor Author

@parulsethi So I'll close this issue?

@aneesh-joshi
Contributor Author

Also,
please review my PR
#1426

@parulsethi
Contributor

I think it might be good to make a small mention of tensorboard for visualizing gensim word embeddings, in word2vec.ipynb. @menshikh-iv wdyt?

@aneesh-joshi
Contributor Author

In my opinion, the tensorboard visualisations aren't very intuitive.
They are just dots on the screen that need hovering to reveal the words.
Recognising clusters, etc. becomes difficult.

@aneesh-joshi
Contributor Author

(screenshot attached)

@menshikh-iv
Contributor

@parulsethi I think several visualizations in the word2vec notebook would not be superfluous.
@aneesh-joshi feel free to add some visualization to the notebook.

@aneesh-joshi
Contributor Author

@menshikh-iv made the PR #1440

@menshikh-iv menshikh-iv added documentation Current issue related to documentation feature Issue described a new feature difficulty easy Easy issue: required small fix labels Oct 2, 2017
@halflings

@aneesh-joshi I think a lot of work has gone into the tensorflow embedding visualizer:

If you click on "A", it will show a label of your choosing (you can pass strings, or any other value) and show it instead of the dot. This is useful when you have a small number of words, and want to visualize them directly instead of hovering.

It also has search, conditional coloring of datapoints (e.g. you can pass a label for sentiment score and visualize that as a color, or a label for each class of words, etc.), lets you interactively visualize the nearest neighbors of each dot, lets you run PCA and t-SNE, and lets you "isolate" certain words you want to focus on and only show those (which re-runs PCA or t-SNE on those points alone, showing more contrast between them).

I suspect all these features would be very hard to replicate, and as such I would really like to see this integrated with gensim rather than trying to reinvent the wheel.

@aneesh-joshi
Contributor Author

@halflings
I agree with you.
My thought process was, especially considering it's in a notebook, that somebody new should be able to visualise the vectors they've made, and the tensorboard visualizations didn't feel so intuitive.

However, I wasn't aware of the full extent of the tensorboard features. These would indeed be hard to replicate. Not to mention the additional dependency on the plotly package that my version introduces.

Do you suggest we make something like:
model.plot()

which brings up the tensorboard visualisations?

@menshikh-iv
Contributor

menshikh-iv commented Dec 21, 2017

@halflings

@menshikh-iv : Thanks, did not know about the word2vec2tensor script.
I guess it makes sense to also have a more lightweight way to visualize things directly from a notebook.

I can try reaching out to the people working on the embeddings projector; there might be a possibility of running it directly from a notebook, but that should not stop you from doing a smaller version using plotly!

@menshikh-iv
Contributor

@halflings an "embedded" tensorboard probably would not look good in this case (too little space in a notebook).

IMO, "embedding" isn't needed for this case, but I could be wrong.

@menshikh-iv
Contributor

Resolved in #1800
