UHFerret is a copy-detection tool that analyses large sets of documents to find pairs containing substantial amounts of lexical copying. Documents may contain either natural language (e.g. English) or computer programs (in C-family languages).
This library provides a Ruby wrapper around uhferret suitable for scripting, a command-line executable, 'uhferret-ruby', and a simple server version, 'start-uhferret-server'.
For a version of Ferret designed to be run as an application, see ferret.
Copyright © 2011-12, Peter Lane.
To install uhferret:
$ gem install uhferret
Ensure the EXECUTABLE DIRECTORY shown by
$ gem env
is in your PATH, to use the 'uhferret-ruby' or 'start-uhferret-server' executables.
The 'examples' folder contains examples of using UHFerret from a Ruby script.
Usage: uhferret-ruby [options] file1 file2 ...
    -h, --help                     help message
    -c, --code                     process documents as code
    -t, --text                     process documents as text (default)
    -d, --data-table               output similarity table (default)
    -l, --list-trigrams            output trigram list
    -a, --all-comparisons          output list of all comparisons
    -x, --xml-report FILE          generate xml report from two documents
    -f, --definition-file FILE     read document names from file
To compute the similarities of a set of files, use:
$ uhferret-ruby file1.txt file2.txt ...
An XML report can be generated for a pair of files using:
$ uhferret-ruby -x outfile.xml file1.txt file2.txt
The XML report can be displayed in a browser using the style sheet 'uhferret.xsl' from the examples folder, and then printed from the browser.
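The two invocations above can also be driven from a Ruby script by shelling out to the executable. The following is a minimal sketch under stated assumptions: the helper names 'uhferret_command' and 'run_uhferret' are hypothetical, only flags listed in the usage text are passed, and 'uhferret-ruby' is assumed to be on your PATH. It does not use the gem's own Ruby API, which is illustrated in the 'examples' folder.

```ruby
require 'open3'

# Build the argument list for the 'uhferret-ruby' executable.
# Hypothetical helper: only flags from the usage text are used.
def uhferret_command(files, code: false, xml_report: nil)
  if xml_report && files.size != 2
    # The usage text says the XML report is generated from two documents.
    raise ArgumentError, 'an XML report needs exactly two files'
  end

  cmd = ['uhferret-ruby']
  cmd << '-c' if code                      # process documents as code
  cmd += ['-x', xml_report] if xml_report  # write an XML report to this file
  cmd + files
end

# Run the command and return whatever it prints on standard output.
# Raises if the process exits with a non-zero status.
def run_uhferret(files, **options)
  output, status = Open3.capture2(*uhferret_command(files, **options))
  raise "uhferret-ruby failed (exit #{status.exitstatus})" unless status.success?

  output
end

# Example calls (these need the executable and the named files to exist):
#   puts run_uhferret(['file1.txt', 'file2.txt'])
#   run_uhferret(['file1.txt', 'file2.txt'], xml_report: 'outfile.xml')
```

Passing the command as an array avoids shell quoting problems with file names that contain spaces.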
Usage: start-uhferret-server [options]
    -h, --help                     help message
    -p, --port n                   port number
    -f, --folder FOLDER            base folder
The folder used to store processed files defaults to 'FerretFiles' and the port to 2000. With these defaults, the initial address is localhost:2000/ferret/home
NB: The server uses some *nix commands, and so currently does not work under Windows.
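As a quick check that the server is reachable, a script can request the initial address given above. This is a minimal sketch, assuming the server speaks plain HTTP on the default host and port; the helper names 'ferret_home_uri' and 'ferret_server_up?' are hypothetical and not part of uhferret.

```ruby
require 'uri'
require 'net/http'

# Build the initial address from the documented defaults.
# The 'http' scheme is an assumption; the text gives only host, port and path.
def ferret_home_uri(host: 'localhost', port: 2000)
  URI::HTTP.build(host: host, port: port, path: '/ferret/home')
end

# Return true if the home page answers with a success or redirect status,
# and false if the connection is refused, times out, or the host is unknown.
def ferret_server_up?(uri = ferret_home_uri)
  response = Net::HTTP.get_response(uri)
  response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
rescue SystemCallError, SocketError, Timeout::Error
  false
end

# Example call (needs a running server):
#   puts ferret_server_up? ? 'server is up' : 'server is not responding'
```

If the server was started with a different port via '-p', pass the same value as 'port:' when building the address.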
uhferret is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
uhferret is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with uhferret. If not, see <www.gnu.org/licenses/>.
UHFerret has been developed at the University of Hertfordshire by members of the Plagiarism Detection Group. The original concept of using trigrams for measuring copying was developed by Caroline Lyon and James Malcolm. JunPeng Bao, Ruth Barrett and Bob Dickerson also contributed to the development of earlier versions of Ferret.
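The trigram idea mentioned above can be illustrated with a short script. This is a sketch of one common way to score trigram overlap, assuming word-level trigrams and a Jaccard-style ratio (shared trigrams divided by distinct trigrams in either document). It is not UHFerret's implementation, and the tokenisation rule and the method names 'trigrams' and 'trigram_similarity' are assumptions made for illustration.

```ruby
require 'set'

# Split text into lowercase word tokens and collect every run of three
# consecutive words (a word trigram) into a set.
def trigrams(text)
  words = text.downcase.scan(/[a-z0-9']+/)
  words.each_cons(3).map { |triple| triple.join(' ') }.to_set
end

# Jaccard-style ratio: trigrams shared by both texts divided by the
# distinct trigrams found in either text. Returns 0.0 when neither
# text is long enough to contain a trigram.
def trigram_similarity(text_a, text_b)
  a = trigrams(text_a)
  b = trigrams(text_b)
  union = a | b
  return 0.0 if union.empty?

  (a & b).size.to_f / union.size
end

# Example call:
#   trigram_similarity('the cat sat on the mat', 'the cat sat on the sofa')
```

In the example call, each sentence has four trigrams, three of which are shared, so the ratio is 3 shared out of 5 distinct trigrams.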