Dataset for Near-Duplicate Document Detection (NDD)
This repository contains two document sets:
1- A subset of the R8 document collection containing 1000 randomly selected documents. Near-duplicate documents: file00 to file49 (50 documents).
2- A subset of the WebKB document collection containing 1000 randomly selected documents. Near-duplicate documents: file00 to file49 (50 documents).
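Because the near-duplicate documents in both sets are identified only by file name (file00 to file49), the ground truth can be derived directly from the names. The sketch below shows one way to do that and to score a detector's output with precision and recall. It assumes each file name starts with "file" followed by its numeric index (an optional extension such as ".txt" is tolerated). The helper names `is_near_duplicate` and `precision_recall` are hypothetical and not part of this repository.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Iterable

# Assumption: each document's file name starts with "file" plus its index, e.g. "file07".
_INDEX_PATTERN = re.compile(r"^file(\d+)")

# Per this README, file00 to file49 (50 documents) are the near-duplicate group.
NEAR_DUPLICATE_RANGE = range(0, 50)


def is_near_duplicate(filename: str) -> bool:
    """Return True if the file is one of the labelled near-duplicate documents."""
    match = _INDEX_PATTERN.match(Path(filename).name)
    if match is None:
        return False
    return int(match.group(1)) in NEAR_DUPLICATE_RANGE


def precision_recall(predicted: Iterable[str]) -> tuple[float, float]:
    """Score the set of file names a detector flagged as near duplicates."""
    flagged = set(predicted)
    true_positives = sum(1 for name in flagged if is_near_duplicate(name))
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall is measured against the 50 labelled near-duplicate documents.
    recall = true_positives / len(NEAR_DUPLICATE_RANGE)
    return precision, recall


if __name__ == "__main__":
    p, r = precision_recall(["file00", "file01", "file75"])
    print(f"precision={p:.3f} recall={r:.3f}")
```

For example, flagging file00, file01 and file75 gives two true positives, so precision is 2/3 and recall is 2/50. Matching on the parsed integer rather than the exact string keeps the check independent of zero-padding, which is an assumption about the file layout.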