
tf-idf of terms, add synonyms from forum train set #3

Closed
reality opened this issue Jun 13, 2019 · 0 comments
Labels
task To-do item

reality commented Jun 13, 2019

After #1, calculate term frequencies normalized by thread length using tf-idf. Doing this iteratively (each time removing terms that are already directly expressed in the ontology) will point you to synonyms and new terms to add, until the ontology covers all the relevant terms.
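A minimal sketch of this loop, assuming each thread is already tokenized into a list of lowercase terms and the ontology's labels and synonyms are available as a plain set of strings. The function names `tfidf_by_thread` and `candidate_terms`, the toy threads, and the `ontology.update` call (a stand-in for the manual review step, where a person decides whether a candidate is a synonym or a genuinely new term) are all hypothetical.

```python
import math
from collections import Counter


def tfidf_by_thread(threads):
    """Score each term per thread as (count / thread length) * log(N / df)."""
    n_threads = len(threads)
    # Document frequency: the number of threads that contain each term.
    df = Counter(term for thread in threads for term in set(thread))
    scores = []
    for thread in threads:
        counts = Counter(thread)
        length = len(thread)
        scores.append({
            term: (count / length) * math.log(n_threads / df[term])
            for term, count in counts.items()
        })
    return scores


def candidate_terms(threads, ontology_terms, top_n=5):
    """Return the highest-scoring terms (max over threads) not yet in the ontology."""
    best = Counter()
    for thread_scores in tfidf_by_thread(threads):
        for term, score in thread_scores.items():
            if term not in ontology_terms:
                best[term] = max(best[term], score)
    return [term for term, _ in best.most_common(top_n)]


# Hypothetical toy data standing in for tokenized forum threads.
threads = [
    ["password", "reset", "login", "password"],
    ["login", "error", "timeout"],
    ["password", "forgot", "reset"],
]
ontology = {"login"}

for round_no in range(2):
    new_terms = candidate_terms(threads, ontology, top_n=3)
    print(f"round {round_no}: {new_terms}")
    # Hypothetical stand-in for manual curation: accept every candidate.
    ontology.update(new_terms)
```

Each round only surfaces terms the ontology does not cover yet, so the candidate list shrinks as coverage grows; in practice a reviewer would map some candidates onto existing classes as synonyms rather than adding them all.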

It looks like tf-idf isn't provided directly in NLTK, but NLTK can still be used to calculate the measure: https://nlpforhackers.io/tf-idf/
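For reference, one common formulation (other variants exist, e.g. with smoothed idf), with term frequency normalized by thread length as suggested above. Here $f_{t,d}$ is the number of times term $t$ occurs in thread $d$, $|d|$ is the number of tokens in $d$, $N$ is the number of threads, and $\mathrm{df}(t)$ is the number of threads containing $t$:

$$
\mathrm{tf}(t,d) = \frac{f_{t,d}}{|d|}, \qquad
\mathrm{idf}(t) = \log\frac{N}{\mathrm{df}(t)}, \qquad
\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
$$

For example, a term that occurs twice in a 4-token thread and appears in 2 of 3 threads scores $\frac{2}{4}\cdot\ln\frac{3}{2} \approx 0.203$.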

It looks like scikit-learn does provide the functionality directly, though (`TfidfVectorizer`): https://www.bogotobogo.com/python/NLTK/tf_idf_with_scikit-learn_NLTK.php


2 participants