NDD_DocSets

Data set for near-duplicate document detection (NDD). This repository contains two document sets:

1- A subset of the R8 document collection containing 1000 randomly selected documents. Near-duplicate documents: file00 to file49 (50 files).

2- A subset of the WebKB document collection containing 1000 randomly selected documents. Near-duplicate documents: file00 to file49 (50 files).
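
When evaluating a detector against either set, it may help to have the near-duplicate names as a ground-truth list. The snippet below is only a sketch: it assumes the names are exactly `file00` through `file49` with two-digit zero padding, as written above, and that any file extension can be ignored. The helper names `near_duplicate_names` and `is_near_duplicate` are hypothetical and not part of this repository.

```python
from pathlib import PurePath


def near_duplicate_names(count: int = 50, prefix: str = "file") -> list[str]:
    """Build the names file00 .. file49 (assumed two-digit zero padding)."""
    return [f"{prefix}{i:02d}" for i in range(count)]


def is_near_duplicate(name: str) -> bool:
    """Check a file name or path against the list, ignoring any extension (assumption)."""
    return PurePath(name).stem in set(near_duplicate_names())


if __name__ == "__main__":
    names = near_duplicate_names()
    print(names[0], names[-1], len(names))
```

The same list applies to both document sets, since each one marks file00 to file49 as its near-duplicate documents.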