Measuring similarity in textual data (Bag of Words model) using Jaccard distance, Cosine similarity and Euclidean distance.

syedhadi816/Similarity-Measurement-in-Text-Data

There are 3 files in this folder:
data50.csv
label.csv
group.csv

data50.csv holds a sparse representation of the bags-of-words: each row contains 3 fields, articleId, wordId, and count.
To find out which group an article belongs to, use label.csv: line i of label.csv contains the groupId for articleId i.
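
As a rough illustration of the layout described above, the sparse rows can be collected into one wordId-to-count mapping per article, and the label file read into a list indexed by line. This is only a sketch under assumptions not stated in this README: the files are comma-separated with no header row, all fields are integers, and articleId counting starts at 1. The helper names `load_bags`, `load_labels`, and `group_id_of` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_bags(lines):
    """Build {articleId: {wordId: count}} from rows of articleId, wordId, count.

    Assumes comma-separated integer fields and no header row.
    """
    bags = defaultdict(dict)
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        article_id, word_id, count = (int(field) for field in row)
        bags[article_id][word_id] = count
    return dict(bags)


def load_labels(lines):
    """Return the groupIds in file order, one per non-empty line."""
    return [int(line) for line in lines if line.strip()]


def group_id_of(article_id, labels):
    """Look up the groupId on line `article_id`, assuming lines start at 1."""
    return labels[article_id - 1]


# Toy stand-ins for data50.csv and label.csv (not the real data).
toy_data = io.StringIO("1,3,2\n1,7,1\n2,3,1\n")
toy_labels = io.StringIO("4\n9\n")

bags = load_bags(toy_data)
labels = load_labels(toy_labels)
```

With the real files, the `io.StringIO` objects would be replaced by handles such as `open("data50.csv", newline="")`.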

Finally, the group names are in group.csv: line i contains the name of group i.
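
The three measures named in the repository description can be sketched directly on sparse bags of words, where each article is a mapping from wordId to count. This README does not say how the author computes them, so the definitions below are assumptions: the Jaccard distance uses the weighted (min/max) form so that counts matter, and the function names are hypothetical.

```python
import math


def jaccard_distance(a, b):
    """1 - sum(min counts) / sum(max counts); 0.0 when both bags are empty."""
    words = set(a) | set(b)
    overlap = sum(min(a.get(w, 0), b.get(w, 0)) for w in words)
    total = sum(max(a.get(w, 0), b.get(w, 0)) for w in words)
    return 1.0 - overlap / total if total else 0.0


def cosine_similarity(a, b):
    """Dot product over the product of L2 norms; 0.0 if either bag is empty."""
    dot = sum(count * b.get(w, 0) for w, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_distance(a, b):
    """L2 distance between count vectors, treating missing words as 0."""
    words = set(a) | set(b)
    return math.sqrt(sum((a.get(w, 0) - b.get(w, 0)) ** 2 for w in words))


# Two toy articles as {wordId: count} (not taken from data50.csv).
article_a = {1: 2, 2: 1}
article_b = {1: 1, 3: 1}

print(jaccard_distance(article_a, article_b))
print(cosine_similarity(article_a, article_b))
print(euclidean_distance(article_a, article_b))
```

Working on the sparse mappings avoids building a dense article-by-word matrix, since only words present in at least one of the two articles contribute to any of the three measures.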
