Ruby wrapper around the uhferret copy-detection tool
UHFerret

UHFerret is a copy-detection tool that analyses large sets of documents to find pairs of documents with substantial amounts of lexical copying. Documents containing either natural language (e.g. English) or computer programs (in C-family languages) may be processed.

This library provides a Ruby wrapper around uhferret suitable for scripting, a command-line executable, 'uhferret-ruby', and a simple server version, 'start-uhferret-server'.

For a version of Ferret designed to be run as an application, see ferret.

Copyright © 2011-12, Peter Lane.

Install

To install uhferret:

$ gem install uhferret

Ensure the EXECUTABLE DIRECTORY shown by

$ gem env

is in your PATH, to use the 'uhferret-ruby' or 'start-uhferret-server' executables.
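
The PATH check can also be scripted. The following is a minimal sketch, not part of uhferret: the helper name dir_in_path? is hypothetical, and it relies on Gem.bindir returning the directory that 'gem env' reports as EXECUTABLE DIRECTORY.

```ruby
require 'rubygems'

# Hypothetical helper: true when +dir+ is one of the entries of +path_string+.
def dir_in_path?(dir, path_string = ENV.fetch('PATH', ''))
  wanted = File.expand_path(dir)
  path_string.split(File::PATH_SEPARATOR).any? do |entry|
    !entry.empty? && File.expand_path(entry) == wanted
  end
end

# Gem.bindir is the directory where RubyGems installs executables.
bindir = Gem.bindir
if dir_in_path?(bindir)
  puts "#{bindir} is in PATH"
else
  puts "#{bindir} is not in PATH; add it to use the uhferret executables"
end
```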

Use

The 'examples' folder contains examples of using UHFerret from a Ruby script.

Command Line

Usage: uhferret-ruby [options] file1 file2 ...
    -h, --help                       help message
    -c, --code                       process documents as code
    -t, --text                       process documents as text (default)
    -d, --data-table                 output similarity table (default)
    -l, --list-trigrams              output trigram list
    -a, --all-comparisons            output list of all comparisons
    -x, --xml-report FILE            generate xml report from two documents
    -f, --definition-file FILE       read document names from file

To compute the similarities of a set of files, use:

$ uhferret-ruby file1.txt file2.txt ...
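
The option list mentions trigrams, and the Acknowledgements credit the idea of using trigrams to measure copying. As a rough illustration of that idea only, and not UHFerret's own implementation, the sketch below scores two texts by the share of word trigrams they have in common. The tokeniser, the method names and the choice of the Jaccard coefficient are all assumptions made for this example.

```ruby
require 'set'

# Assumed, simplified tokeniser: lowercase runs of letters, digits and apostrophes.
def tokens(text)
  text.downcase.scan(/[a-z0-9']+/)
end

# Set of every run of three consecutive words in the text.
def trigrams(text)
  tokens(text).each_cons(3).to_set
end

# Jaccard coefficient: shared trigrams divided by all distinct trigrams.
def resemblance(text_a, text_b)
  a = trigrams(text_a)
  b = trigrams(text_b)
  union = a | b
  return 0.0 if union.empty?
  (a & b).size.to_f / union.size
end

puts resemblance('the cat sat on the mat', 'the cat sat on a mat')
```

In this example the two sentences share 2 of their 6 distinct trigrams, so the score is about 0.33. Identical texts score 1.0.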

An XML report can be generated for a pair of files using:

$ uhferret-ruby -x outfile.xml file1.txt file2.txt

The XML report can be displayed in a browser using the style sheet 'uhferret.xsl' in the examples folder, and then printed from the browser.
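
The executable can also be driven from another Ruby script. This is a minimal sketch that uses only the options listed above and assumes 'uhferret-ruby' is on your PATH. The helper names uhferret_command and run_uhferret are hypothetical and not part of the gem.

```ruby
require 'open3'

# Build the argument list for uhferret-ruby from the documented options.
def uhferret_command(files, code: false, xml_report: nil)
  cmd = ['uhferret-ruby']
  cmd << '--code' if code
  cmd.concat(['--xml-report', xml_report]) if xml_report
  cmd.concat(files)
end

# Run the command and return whatever it writes to standard output.
# Raises Errno::ENOENT if the executable cannot be found.
def run_uhferret(files, **opts)
  output, status = Open3.capture2(*uhferret_command(files, **opts))
  raise "uhferret-ruby failed (exit #{status.exitstatus})" unless status.success?
  output
end

# Example calls (commented out so that loading this file runs nothing):
#   puts run_uhferret(%w[file1.txt file2.txt])
#   run_uhferret(%w[file1.txt file2.txt], xml_report: 'outfile.xml')
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.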

Server

Usage: start-uhferret-server [options]
    -h, --help                       help message
    -p, --port n                     port number
    -f, --folder FOLDER              base folder

The folder used to store processed files defaults to 'FerretFiles' and the port defaults to 2000. Initial address: localhost:2000/ferret/home
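
As a small sketch of how the options and the initial address fit together, the following builds the start command and the matching home page address for a chosen port. The helper names and the 'MyFerretFiles' folder are hypothetical, and the http scheme is an assumption, since the address above does not state one.

```ruby
require 'uri'

DEFAULT_PORT = 2000 # the documented default port

# Build the argument list for start-uhferret-server from the documented options.
def server_command(port: nil, folder: nil)
  cmd = ['start-uhferret-server']
  cmd.concat(['--port', port.to_s]) if port
  cmd.concat(['--folder', folder]) if folder
  cmd
end

# Address of the server's home page (assumes plain http on localhost).
def home_url(port = DEFAULT_PORT)
  URI::HTTP.build(host: 'localhost', port: port, path: '/ferret/home').to_s
end

puts server_command(port: 3000, folder: 'MyFerretFiles').join(' ')
puts home_url(3000)
```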

NB: The server uses some *nix commands, and so currently does not work under Windows.

License

uhferret is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

uhferret is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with uhferret. If not, see <www.gnu.org/licenses/>.

Acknowledgements

UHFerret has been developed at the University of Hertfordshire by members of the Plagiarism Detection Group. The original concept of using trigrams for measuring copying was developed by Caroline Lyon and James Malcolm. JunPeng Bao, Ruth Barrett and Bob Dickerson also contributed to the development of earlier versions of Ferret.
