BigramCollocationFinder.score_ngrams with BigramAssocMeasures.likelihood_ratio raises ValueError: math domain error #2200
Comments
The problem is with the calculation of the contingency table.
When you call [...] So, for example, for the word token "gerne" and the period token "." [...] The formula for the log-likelihood ratio test statistic is:
where m_ii is the expected value of n_ii, and thus m_ii = n_ix (the row total) * n_xi (the column total) / n_xx, and likewise for m_oi and so on... Trying to compute the logarithm of a negative value would of course raise an error.
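To make the failure mode concrete, here is a minimal, self-contained sketch of the computation described above. It assumes the standard form of the statistic, G² = 2 Σ n_ij ln(n_ij / m_ij), and uses only the Python standard library. The function names `contingency`, `expected_values` and `log_likelihood_ratio` are hypothetical helpers written for illustration, not NLTK's internals, and NLTK's actual implementation may differ in details such as smoothing.

```python
import math


def contingency(n_ii, n_ix, n_xi, n_xx):
    """Build the 2x2 table (n_ii, n_oi, n_io, n_oo) from the marginals.

    n_ii: count of the bigram (w1, w2)
    n_ix: count of w1 (row total), n_xi: count of w2 (column total)
    n_xx: total number of observations
    """
    n_oi = n_xi - n_ii  # w2 seen without w1
    n_io = n_ix - n_ii  # w1 seen without w2
    n_oo = n_xx - n_ii - n_oi - n_io  # neither word
    return n_ii, n_oi, n_io, n_oo


def expected_values(n_ix, n_xi, n_xx):
    """Expected cell counts under independence: row total * column total / n_xx."""
    m_ii = n_ix * n_xi / n_xx
    m_oi = (n_xx - n_ix) * n_xi / n_xx
    m_io = n_ix * (n_xx - n_xi) / n_xx
    m_oo = (n_xx - n_ix) * (n_xx - n_xi) / n_xx
    return m_ii, m_oi, m_io, m_oo


def log_likelihood_ratio(n_ii, n_ix, n_xi, n_xx):
    """G^2 = 2 * sum(obs * ln(obs / exp)), treating 0 * ln(0) as 0."""
    observed = contingency(n_ii, n_ix, n_xi, n_xx)
    expected = expected_values(n_ix, n_xi, n_xx)
    return 2 * sum(
        obs * math.log(obs / exp)
        for obs, exp in zip(observed, expected)
        if obs != 0  # a negative obs still reaches math.log and raises
    )


# Consistent counts: every cell is non-negative, so the score is defined.
print(contingency(8, 10, 12, 100))
print(log_likelihood_ratio(8, 10, 12, 100))

# Inconsistent counts: the bigram count exceeds both word counts,
# so two cells become negative and math.log raises ValueError.
print(contingency(5, 3, 4, 100))
try:
    log_likelihood_ratio(5, 3, 4, 100)
except ValueError as err:
    print("ValueError:", err)
```

In this sketch the error appears as soon as the bigram count n_ii is larger than either marginal, because n_oi or n_io then goes negative. Checking the output of `contingency` for negative cells before scoring is one way to locate the offending bigrams.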
Ahh, thank you. Seems I was not as thorough as I thought. My workplace has a Java tool that computes word and sentence co-occurrences, but for 'quick' prototyping it is somewhat complicated to use. So I found the NLTK [...] The reason I don't just paste the whole text or concatenate the sentences is that I want to compute the collocations/co-occurrences for sentences, not texts. Our sentences have no references to their origin text and are therefore 'isolated'. I would not like to have [...] And thank you for giving an example computation. I may use it later to look up details. Again, thank you. Next time I will look through the source more carefully. I think this issue can be closed?
You mean you want to find collocations in each sentence of your text?
Yes and no? I have a whole corpus but the sentences are not contiguous. I want to generate statistics for all the sentences, so the a
I want to compute bigram collocations per sentence and tried to output my demo results in a Jupyter notebook. Only for those four or five sentence combinations, and with a window size larger than 4, does the ordering by log-likelihood ratio fail, probably because the logarithm of a negative number is being computed.
I'm not sure what to do at this point. Below is my demo code to reproduce the error.