Topic worth investigating: 'vector rejection' #595
Labels: difficulty medium, feature, wishlist
@benSchmidt has written an interesting blog post on the use of a method he calls 'vector rejection' to separate words with ambiguous meanings.
While experimenting with a Nepali news corpus, I found his method more useful for discarding unwanted vectors than the existing approach with most_similar.
I have reimplemented his method (originally written in R) in this gist and have been working with it for the last few days. In my (admittedly limited) experiments it seems to have quite a lot of value. Yoav Goldberg has a Twitter thread about the operation and the post here.
I bring this up in case someone wants to look it over and see whether it fits the project. Please close the issue if you believe it does not.
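For anyone unfamiliar with the operation: in the standard linear-algebra sense, the rejection of a vector a from a vector b is what remains of a after subtracting its projection onto b, so the result is orthogonal to b. Below is a minimal NumPy sketch of that definition. The function name `reject` and the toy 3-d vectors are illustrative assumptions; this is not the code from the blog post or the gist.

```python
import numpy as np


def reject(a, b):
    """Return the part of `a` that is orthogonal to `b` (rejection of a from b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.dot(b, b)
    if denom == 0.0:
        # Nothing to project onto: the rejection is `a` itself.
        return a.copy()
    # Subtract the projection of `a` onto `b`.
    return a - (np.dot(a, b) / denom) * b


# Toy vectors standing in for word embeddings (hypothetical values).
bank = np.array([2.0, 1.0, 0.5])
river = np.array([1.0, 0.0, 0.0])

bank_without_river = reject(bank, river)
```

With a trained gensim model, the resulting vector could presumably be passed to `similar_by_vector` on the model's keyed vectors to list the nearest remaining neighbours; I have not checked how that compares with the gist.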
edit: correct link.