func CosineSimilarity() return NaN #7

WhisperRain · 2018-06-07T10:06:48Z

Dear James Bowman,
I use your library to calculate similarity. The function CosineSimilarity() returns many NaN values, so I can't continue my work.
I have only changed your vectorisers.go file. All my changes are as follows.

func (v *CountVectoriser) Transform(docs ...string) (mat.Matrix, error) { // function begins here
	mat := sparse.NewDOK(len(v.Vocabulary), len(docs))

	for d, doc := range docs {
		v.Tokeniser.ForEachIn(doc, func(word string) {
			i, exists := v.Vocabulary[word]

			if exists {
				weight, weightExists := TrainingData.WeightMap[word]
				// weight values: unimportant 1, normal 2, important 3
				if weightExists {
					mat.Set(i, d, mat.At(i, d)+weight)
				} else {
					mat.Set(i, d, mat.At(i, d)+1)
				}
			}
		})
	}
	return mat.ToCSR(), nil
}


james-bowman · 2018-06-07T22:36:26Z

The NaN usually occurs when one of the vectors being compared has an L2 norm of 0, which means that vector has no non-zero values. Cosine similarity is undefined for such a vector. Have you trained (fitted) the vectoriser before using it to transform your document(s)? I suspect at least some of the documents in your test data consist entirely of words that were not present in your training corpus. Perhaps try a larger training data set so the vectoriser learns more words during training.

james-bowman · 2018-06-07T22:37:42Z

Let me know how you get on.

WhisperRain · 2018-06-08T03:37:47Z

Yeah! My training data set is pretty small right now. It's true that some of the documents in my test data consist of words that were not present in the training corpus. I'll try again with more data.
Thanks for answering my question so quickly.

james-bowman · 2018-06-08T05:09:47Z

Not a problem, and you are most welcome. You could also try the HashingVectoriser, which doesn't require training, as a direct drop-in replacement for the CountVectoriser.

james-bowman closed this as completed Jun 28, 2018
