Skip to content

liquidsunset/similarity_search

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

58 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

similarity_search

make, then run:

similarity_search <#lines(sets) to find common integers(tokens,words)> <jaccard-threshold(0..1)>

demo with threshold 0.9 (almost ident sets/lines)

./similarity_search dblp_first500.txt 50 0.9

demo for original implementation

./set_sim_join --timings --statistics --whitespace '/home/liquid/similarity_search/enron.format' allpairs 0.9