Maybe a token filter that limits tokens by frequency would be useful for MinHash #26487

Closed
cclient opened this issue Sep 4, 2017 · 2 comments

cclient commented Sep 4, 2017

Feature request:

Filter the tokens down to the most frequent ones, for example

select count(*) as num, token from tokens group by token order by num desc limit 128;

and then apply MinHash to the tokens that remain.
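
A minimal sketch of what the request seems to describe, assuming "frequency" means the number of occurrences of a token in a single token stream. The names `top_k_tokens` and `minhash_signature` are hypothetical, and the hashing is a plain textbook MinHash, not the implementation behind Elasticsearch's `min_hash` token filter.

```python
import hashlib
from collections import Counter


def top_k_tokens(tokens, k=128):
    """Keep the k most frequent distinct tokens (the SQL query's GROUP BY / ORDER BY / LIMIT)."""
    counts = Counter(tokens)
    # Sort by descending count, then by token text so ties break deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [token for token, _ in ranked[:k]]


def _seeded_hash(token, seed):
    """64-bit hash of a token under a given seed (illustrative choice of hash)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=16):
    """Return one minimum hash value per seed over the distinct tokens."""
    distinct = set(tokens)
    if not distinct:
        return []
    return [min(_seeded_hash(t, seed) for t in distinct) for seed in range(num_hashes)]


if __name__ == "__main__":
    stream = "the quick fox and the lazy dog and the cat".split()
    kept = top_k_tokens(stream, k=3)
    print(kept)
    print(minhash_signature(kept, num_hashes=4))
```

Whether the limit should apply per document or across the whole index is not stated in the request; the sketch only covers the per-stream reading.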

jimczi (Contributor) commented Sep 8, 2017

Can you describe the feature in a bit more detail? What do you mean by frequency: the frequency of the terms inside the document? If so, I think it would be better to implement it directly inside the minhash filter, since it seems to be useful only there.
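
To make the ambiguity concrete, here is a small sketch contrasting two possible readings of "frequency": occurrences inside one document versus the number of documents that contain a token. Both helper names are hypothetical and the toy corpus is invented for illustration.

```python
from collections import Counter


def term_frequency(doc_tokens):
    """Count how often each token occurs inside a single document."""
    return Counter(doc_tokens)


def document_frequency(corpus):
    """Count in how many documents each token occurs at least once."""
    df = Counter()
    for doc_tokens in corpus:
        # A set ensures each document contributes at most 1 per token.
        df.update(set(doc_tokens))
    return df


if __name__ == "__main__":
    corpus = [["a", "a", "b"], ["b", "c"]]
    print(term_frequency(corpus[0]))
    print(document_frequency(corpus))
```

The first reading needs only the token stream of the document being analyzed, while the second needs statistics gathered across documents.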

colings86 (Contributor) commented

No further feedback. If you can provide more details on the feature request, please post them here and we can reopen the issue.
