Coherence score on new data Key Error #2711
Comments
Can you please make your example reproducible? Ideally, you should be able to copy-paste it into a REPL shell to demonstrate the problem.
We have 2 topics and want to calculate the coherence score of those topics on new data. If a word like "human" is present in a topic but absent from the new data we want to score against, a KeyError is thrown.
If I calculate the coherence score over test_WITH_human, it works and we get a score of 1.0. If I calculate it over the test data test_WITHOUT_human, an error is thrown.
Error message:
I have the same problem. The traceback shows a KeyError raised while handling another exception, ending with:
KeyError: 'de arma'
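Both reports hit the same mechanism: coherence measures resolve each topic word to an id in the dictionary built over the evaluation texts, and a word that never appears in those texts has no id. A minimal pure-Python sketch of that lookup (illustrative only, not gensim's actual code; the variable names are my own):

```python
# Topic learned on training data; "human" never occurs in the test texts.
topics = [["human", "computer", "interface"]]
test_without_human = [["computer", "interface", "system"]]

# Build a token -> id mapping from the test texts, as a corpus
# dictionary does internally.
token2id = {}
for doc in test_without_human:
    for tok in doc:
        token2id.setdefault(tok, len(token2id))

# Resolving topic words against that mapping fails on the missing word.
try:
    ids = [[token2id[w] for w in topic] for topic in topics]
except KeyError as missing:
    print("KeyError:", missing)  # 'human' has no id in the test vocabulary
```

With test_WITH_human the lookup succeeds because every topic word has an id, which matches the behavior described above.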
* Fixed coherence model issue #2711
* Handled token or id formatting of topics
* Raised error with wrong formatting
* removed blank lines
* updated code
* updated code
* revision on coherencemodel.py
* added new tests
* rm trailing whitespace
* more flake8 fixes
* still more flake8 fixes
* update changelog

Co-authored-by: Michael Penkov <misha.penkov@gmail.com>
I want to compare different models (LDA, Mallet, etc.) with cross-validation. I train the model on training data and want to calculate the coherence score (c_v) on the test data. I do something like this:
When a word from a topic learned on the training data is not present in the test data, this raises a KeyError in the dictionary lookup at some point. Does someone know something about this? Is this a bug, or how do I solve this issue?
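One common workaround is to drop topic words that the evaluation texts do not contain before handing the topics to `CoherenceModel(topics=...)`. A hypothetical helper (a sketch under that assumption; `filter_topics` is my name, not a gensim function):

```python
def filter_topics(topics, vocab):
    """Drop topic words absent from the evaluation vocabulary.

    Hypothetical helper, not part of gensim's API: pass the filtered
    topics to CoherenceModel(topics=...) so every word can be resolved
    to an id in the evaluation dictionary.
    """
    filtered = [[w for w in topic if w in vocab] for topic in topics]
    # Discard topics left empty, since coherence over an empty
    # topic is undefined.
    return [t for t in filtered if t]


vocab = {"computer", "interface", "system"}
print(filter_topics([["human", "computer", "interface"]], vocab))
# [['computer', 'interface']]
```

Note this changes what is being scored: a heavily filtered topic is easier to make coherent, so scores across models are only comparable if they are filtered against the same vocabulary.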