Word Embedding Visualizer #1419
Comments
Word2vec visualizations are very useful, so this seems like a very good contribution idea! It might work best as a demo notebook, or as extensions to the existing word2vec notebooks – though of course, if a few generally useful methods support the visualizations, they could become improvements to existing gensim classes/modules (for example, perhaps new utility functions on the …). Also, is reduction by PCA before applying t-SNE really necessary? (I could be wrong, but I thought t-SNE could, on its own, do all the necessary dimensionality reduction on word2vec-like hundreds-of-dimensions spaces.)
t-SNE can do the work, but it tends to be pretty slow. I have added a flag so the user can decide whether to apply PCA first; if it is not set, t-SNE is used throughout. Plotting the visualisation in a Jupyter notebook takes away the ability to scroll, pan and zoom. My script only requires the path to the Word2Vec model and does the rest. Although, to avoid holding too much in active memory, I do save the reduced 2D vectors to a CSV file. Could you suggest where I should implement my code and submit a PR?
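The optional PCA-before-t-SNE step discussed above can be sketched with scikit-learn (the function name, `use_pca` flag, and toy data below are illustrative, not the actual script from this thread):

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.manifold import TSNE

def reduce_to_2d(vectors, use_pca=True, pca_dims=50):
    """Reduce word vectors to 2D, optionally pre-reducing with
    IncrementalPCA to speed up t-SNE (which is slow in high dimensions)."""
    X = np.asarray(vectors, dtype=np.float64)
    if use_pca and X.shape[1] > pca_dims:
        X = IncrementalPCA(n_components=pca_dims).fit_transform(X)
    # perplexity must be smaller than the number of samples
    tsne = TSNE(n_components=2, perplexity=min(30, X.shape[0] - 1),
                init="pca", random_state=0)
    return tsne.fit_transform(X)

# e.g. 200 stand-in "word vectors" of dimensionality 100
coords = reduce_to_2d(np.random.rand(200, 100))
```

With `use_pca=False`, t-SNE alone handles the full-dimensional input, matching the behaviour the flag controls.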
You can probably use plotly for zooming and panning.
@anotherbugmaster
Yes, the interface is the same as in the web version.
@gojomo
Sorry for commenting late, but you can directly visualize the word embeddings using the Tensorboard projector. Save the gensim model embeddings using …
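The standalone projector (projector.tensorflow.org) accepts plain TSV files, so one simple way to export embeddings is roughly the following (the helper name and toy data are illustrative; gensim also ships a `gensim.scripts.word2vec2tensor` converter for this purpose):

```python
import os
import tempfile
import numpy as np

def export_for_projector(words, vectors, out_dir):
    """Write tensors.tsv + metadata.tsv, the file pair the
    Tensorboard embedding projector loads."""
    with open(os.path.join(out_dir, "tensors.tsv"), "w") as vec_f, \
         open(os.path.join(out_dir, "metadata.tsv"), "w") as meta_f:
        for word, vec in zip(words, vectors):
            vec_f.write("\t".join("%.6f" % x for x in vec) + "\n")
            meta_f.write(word + "\n")

# toy stand-in for model.wv: 3 words, 100-dimensional vectors
words = ["king", "queen", "man"]
vectors = np.random.rand(3, 100)
out_dir = tempfile.mkdtemp()
export_for_projector(words, vectors, out_dir)
```

The two TSV files can then be uploaded to the projector, or pointed at from a Tensorboard checkpoint config.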
@parulsethi So I'll close this issue?
Also,
I think it might be good to make a small mention of Tensorboard for visualizing gensim word embeddings in word2vec.ipynb. @menshikh-iv wdyt?
In my opinion, the Tensorboard visualisations aren't very intuitive.
@parulsethi I think a few visualizations in the word2vec notebook would not be superfluous.
@menshikh-iv made the PR #1440
@aneesh-joshi I think a lot of work has gone into the tensorflow embedding visualizer: if you click on "A", it will show a label of your choosing (you can pass strings, or any other value) instead of the dot, which is useful when you have a small number of words and want to see them directly instead of hovering. It also has search, conditional coloring of datapoints (e.g. you can pass a label for sentiment score and visualize it as a color, or a label for each class of words, etc.), lets you interactively visualize the nearest neighbors of each dot, lets you run PCA and t-SNE clustering, and lets you "isolate" certain words you want to focus on and show only those (which re-runs PCA or t-SNE on those points alone, showing more contrast between them). I suspect all these features would be very hard to replicate, and as such I would really like to see this integrated with gensim rather than trying to reinvent the wheel.
@halflings However, I wasn't aware of the full extent of the Tensorboard features. These would indeed be hard to replicate, not to mention the additional dependency on the plotly package that comes with my version. Do you suggest we make something like … which brings up the Tensorboard visualisations?
@halflings we already have a "how to" on passing data to Tensorboard: https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/Tensorboard_visualizations.ipynb
@menshikh-iv: Thanks, I did not know about the … . I can try reaching out to the people working on the embedding projector; there might be a possibility of running it directly from a notebook, but that should not stop you from doing a smaller version using plotly!
@halflings an "embedded" Tensorboard would probably not look good for this case (too little space in a notebook). IMO, embedding isn't needed here, but I could be wrong.
Resolved in #1800
Currently, there is no direct way to visualise word embeddings made by the gensim word2vec model.
I have made a visualiser using matplotlib.
It uses Incremental PCA to reduce the vectors to a manageable number of dimensions, and then uses t-SNE to bring them down to 2 dimensions.
Note: I have taken some of my code from https://github.com/jeffThompson/Word2VecAndTsne
This will be my first contribution (if at all), so please guide me.
(Also, I am not sure if this is the right place to submit a feature request, so please don't mind)