Add smart information retrieval system for TFIDF #1785

Closed
markroxor opened this issue Dec 13, 2017 · 1 comment
Labels
difficulty medium, feature, wishlist

Comments

@markroxor
Contributor

markroxor commented Dec 13, 2017

https://en.wikipedia.org/wiki/SMART_Information_Retrieval_System

The current TFIDF model uses natural TF and IDF for computing TFIDF. The idea is to try various transformations, such as logarithmic, augmented, and boolean, before computing the vectors.

More about this - http://www.cs.odu.edu/~jbollen/IR04/readings/article1-29-03.pdf and https://nlp.stanford.edu/IR-book/pdf/06vect.pdf
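To make the idea concrete, here is a minimal sketch of SMART-style weighting in plain Python. It is not gensim's API: the function names (`tf_weight`, `idf_weight`, `smart_tfidf`) and the three-letter scheme string are hypothetical and only for illustration. The letters follow the SMART notation from the references above (term frequency `n`/`l`/`a`/`b`, document frequency `n`/`t`, normalization `n`/`c`), and the natural logarithm is an assumption, since implementations differ on the log base.

```python
import math
from collections import Counter


def tf_weight(tf, max_tf, scheme):
    """Term-frequency component (hypothetical helper, not gensim's API)."""
    if scheme == "n":   # natural: raw count
        return float(tf)
    if scheme == "l":   # logarithmic: 1 + log(tf)
        return 1.0 + math.log(tf)
    if scheme == "a":   # augmented: 0.5 + 0.5 * tf / max tf in the document
        return 0.5 + 0.5 * tf / max_tf
    if scheme == "b":   # boolean: 1 if the term occurs at all
        return 1.0 if tf > 0 else 0.0
    raise ValueError("unknown tf scheme: %r" % scheme)


def idf_weight(df, n_docs, scheme):
    """Document-frequency component; natural log is an assumption here."""
    if scheme == "n":   # none: ignore document frequency
        return 1.0
    if scheme == "t":   # idf: log(N / df)
        return math.log(n_docs / df)
    raise ValueError("unknown idf scheme: %r" % scheme)


def smart_tfidf(docs, scheme="ntc"):
    """Weight tokenized documents with a three-letter SMART-style scheme."""
    tf_s, idf_s, norm_s = scheme
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        max_tf = max(counts.values(), default=1)
        vec = {
            term: tf_weight(tf, max_tf, tf_s) * idf_weight(df[term], n_docs, idf_s)
            for term, tf in counts.items()
        }
        if norm_s == "c":   # cosine: scale the vector to unit length
            length = math.sqrt(sum(w * w for w in vec.values()))
            if length > 0:
                vec = {term: w / length for term, w in vec.items()}
        elif norm_s != "n":
            raise ValueError("unknown normalization scheme: %r" % norm_s)
        vectors.append(vec)
    return vectors


docs = [["a", "a", "b"], ["b", "c"]]
for scheme in ("nnn", "lnn", "ann", "bnn", "ntc"):
    print(scheme, smart_tfidf(docs, scheme))
```

Under this sketch, `"ntc"` corresponds to natural TF with IDF and cosine normalization, while swapping the first letter switches only the term-frequency transformation.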

Will send a PR tomorrow.

@menshikh-iv added the feature and wishlist labels Dec 15, 2017
@menshikh-iv
Contributor

related issue #220

@menshikh-iv added the difficulty medium label Dec 15, 2017
sj29-innovate pushed a commit to sj29-innovate/gensim that referenced this issue Feb 21, 2018
…y#1785 (piskvorky#1791)

* fixing appveyor

* verify weights

* verify weights

* smartirs ready

* change old tests

* remove lambdas

* address suggestions

* minor fix

* pep8 fix

* fix pickle problem

* flake8 fix

* fix bug in docstring

* added few tests

* fix normalize issue for pickling

* fix normalize issue for pickling

* test without sklearn api

* hanging idents and new tests

* add docstring

* add docstring

* better way cmparing floats

* old way of cmp floats

* doc fix[1]

* doc fix[2]

* fix description TODOs

* fix irksome comparision
2 participants