func CosineSimilarity() returns NaN #7
The NaN usually occurs when one of the vectors being compared has an L2 norm of 0, which means the vector contains no non-zero values. Cosine similarity is undefined for such a vector, because it divides by the product of the two vectors' norms. Have you trained (fitted) the vectoriser before using it to transform your document(s)? I suspect at least some of the documents in your test data consist entirely of words that were not present in your training corpus. Perhaps try a larger training data set so that the vectoriser learns more words during training.
Let me know how you get on.
Yeah! My training data set is quite small at the moment. It's true that some of the documents in my test data consist of words that were not present in the training corpus. I'll give it another try.
Not a problem, and you are most welcome. You could also try the HashingVectoriser as a direct drop-in replacement for the CountVectoriser; it doesn't require training.
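A hashing vectoriser needs no training because it uses feature hashing (the "hashing trick"): each token is hashed directly into one of a fixed number of buckets, so no learned vocabulary exists and no word is ever "unknown". The sketch below shows the general technique only. The `hashVectorise` function, FNV-1a as the hash, lower-cased whitespace tokenisation, and the bucket count are all assumptions and do not describe the library's actual implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// hashVectorise maps each whitespace-separated token to one of numFeatures
// buckets using FNV-1a and counts occurrences per bucket. There is no
// vocabulary to fit, so previously unseen words still contribute counts.
func hashVectorise(doc string, numFeatures int) []float64 {
	vec := make([]float64, numFeatures)
	for _, tok := range strings.Fields(strings.ToLower(doc)) {
		h := fnv.New32a()
		h.Write([]byte(tok)) // writes to an FNV hash never return an error
		idx := int(h.Sum32() % uint32(numFeatures))
		vec[idx]++
	}
	return vec
}

func main() {
	v := hashVectorise("the quick brown fox", 16)
	total := 0.0
	for _, c := range v {
		total += c
	}
	fmt.Println(total) // 4: every token lands in some bucket
}
```

The trade-off is that different words can collide in the same bucket. Also note that a test document which shares no buckets with a training document will score a similarity of 0 against it rather than NaN.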
Dear James Bowman,
I am using your library to calculate similarity, but the function CosineSimilarity() returns NaN in many cases, so I can't continue my work.
I have only changed your vectorisers.go file. All my changes are as follows:
```go
func (v *CountVectoriser) Transform(docs ...string) (mat.Matrix, error) { // function begins here
	mat := sparse.NewDOK(len(v.Vocabulary), len(docs))
	// ... (the rest of the modified function was not included in the issue)
}
```
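To see why a test document made entirely of unseen words ends up with an L2 norm of 0, here is a minimal standalone sketch of count vectorisation over a fixed vocabulary. It assumes, as the maintainer's explanation implies, that `Transform` simply skips words missing from the fitted vocabulary. `fitVocabulary` and `countVector` are hypothetical helpers, not the library's API:

```go
package main

import (
	"fmt"
	"strings"
)

// fitVocabulary assigns a row index to every distinct token in the training
// documents, in order of first appearance.
func fitVocabulary(train []string) map[string]int {
	vocab := make(map[string]int)
	for _, doc := range train {
		for _, tok := range strings.Fields(strings.ToLower(doc)) {
			if _, ok := vocab[tok]; !ok {
				vocab[tok] = len(vocab)
			}
		}
	}
	return vocab
}

// countVector counts only tokens present in vocab. Unseen tokens are
// silently dropped, so a document made only of unseen words yields an
// all-zero vector, whose L2 norm is 0.
func countVector(doc string, vocab map[string]int) []float64 {
	vec := make([]float64, len(vocab))
	for _, tok := range strings.Fields(strings.ToLower(doc)) {
		if i, ok := vocab[tok]; ok {
			vec[i]++
		}
	}
	return vec
}

func main() {
	vocab := fitVocabulary([]string{"cats chase mice", "dogs chase cats"})
	fmt.Println(countVector("cats and dogs", vocab)) // [1 0 0 1]
	fmt.Println(countVector("birds sing", vocab))    // [0 0 0 0]
}
```

Feeding the second vector into a cosine similarity calculation divides 0 by 0, which is where the NaN comes from.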