Simple tf-idf based reverse index experiment.
Python
lt09
tools
README
build_index.py
containment.py
query.py

README

Simple scripts to generate a reverse index from a collection of text files,
based on tf-idf weights.
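The idea of a tf-idf weighted reverse index can be illustrated with a minimal sketch. This is not the code in `build_index.py` or `query.py`. The function names `tokenize`, `build_index` and `query` are hypothetical. The sketch also assumes a plain lowercase word tokenizer with no lemmatization, term frequency normalized by document length, and idf computed as `log(N / df)`.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs only (no lemmatization, no stop words)
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build a reverse index mapping term -> {doc_id: tf-idf weight}.

    docs: dict mapping a document id to its raw text.
    """
    n_docs = len(docs)
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}

    # Document frequency: how many documents contain each term at least once
    df = Counter()
    for counts in term_counts.values():
        df.update(counts.keys())

    index = defaultdict(dict)
    for doc_id, counts in term_counts.items():
        total = sum(counts.values())
        for term, count in counts.items():
            tf = count / total                   # assumed: length-normalized term frequency
            idf = math.log(n_docs / df[term])    # assumed: plain log(N / df) idf
            index[term][doc_id] = tf * idf
    return dict(index)


def query(index, text):
    # Score each document by summing the weights of the query terms it contains,
    # then return (doc_id, score) pairs sorted by descending score
    scores = Counter()
    for term in tokenize(text):
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    return scores.most_common()


if __name__ == "__main__":
    docs = {
        "a": "the cat sat on the mat",
        "b": "the dog sat on the log",
        "c": "cats and dogs",
    }
    idx = build_index(docs)
    print(query(idx, "cat"))
```

With these three toy documents, only `"a"` matches the query `"cat"`, because the assumed tokenizer does not stem `"cats"`. A term that appears in every document gets an idf of zero, so it contributes nothing to any score.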

We also use a shingling technique to calculate text containment between the
files of the collection, i.e. how much of one file's text also appears in another.
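Shingle-based containment can be shown with a minimal sketch. It assumes word-level k-shingles with a hypothetical default of `k=3`. It also assumes Broder's containment measure, c(A, B) = |S(A) ∩ S(B)| / |S(A)|, where S(X) is the set of shingles of X. The functions `shingles` and `containment` are illustrative names, and `containment.py` in this repository may choose its shingles or measure differently.

```python
import re


def shingles(text, k=3):
    # Set of contiguous word k-grams ("shingles"); k=3 is an assumed default
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Texts shorter than k words yield a single (short) shingle, or none if empty
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(a, b, k=3):
    """Fraction of a's shingles that also occur in b (containment of a in b)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa:
        return 0.0
    return len(sa & sb) / len(sa)


if __name__ == "__main__":
    short = "the quick brown fox"
    long = "the quick brown fox jumps over the lazy dog"
    print(containment(short, long))  # every shingle of `short` appears in `long`
    print(containment(long, short))  # only 2 of the 7 shingles of `long` appear in `short`
```

Unlike Jaccard resemblance, containment is asymmetric. That makes it suited to detecting when one file is largely copied into a bigger one.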

The English parameter file for TreeTagger is available from the [TreeTagger homepage](http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html).