Tiny Text Mining Tools

This is the very beginning of a Perl library used to support simple and introductory text mining analysis -- tiny text mining tools.

Presently the library is implemented as a set of subroutines stored in a single file, supporting:

  • simple in-memory indexing and single-term searching

  • relevancy ranking through term frequency-inverse document frequency (TF-IDF) for searching and classification

  • cosine similarity for clustering and "finding more items like this one"
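
To make the three features above concrete, here is a minimal sketch of how they might look in plain Perl. It is not the library's own code: the subroutine names (`build_index`, `doc_lengths`, `search`, `vector`, `cosine`), the toy corpus, and the particular TF-IDF weighting (term count divided by document length, times the natural log of total documents over matching documents) are all assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical toy corpus: document name => text
my %corpus = (
    walden => 'the pond is deep and the woods are quiet',
    moby   => 'the whale is white and the sea is deep',
    origin => 'species vary and selection acts on variation',
);

# Split text into lowercase word tokens
sub tokenize {
    my ($text) = @_;
    return grep { length } split /\W+/, lc $text;
}

# In-memory index: word => { document => number of occurrences }
sub build_index {
    my ($docs) = @_;
    my %index;
    for my $doc (keys %$docs) {
        $index{$_}{$doc}++ for tokenize($docs->{$doc});
    }
    return \%index;
}

# Number of word tokens in each document
sub doc_lengths {
    my ($docs) = @_;
    my %lengths;
    for my $doc (keys %$docs) {
        my @words = tokenize($docs->{$doc});
        $lengths{$doc} = scalar @words;
    }
    return \%lengths;
}

# Single-term search; returns [document, score] pairs, best score first
sub search {
    my ($index, $lengths, $term) = @_;
    my $postings = $index->{ lc $term } or return ();
    my $total    = keys %$lengths;     # documents in the corpus
    my $matching = keys %$postings;    # documents containing the term
    my $idf      = log($total / $matching);
    my %score;
    for my $doc (keys %$postings) {
        $score{$doc} = ($postings->{$doc} / $lengths->{$doc}) * $idf;
    }
    return map { [ $_, $score{$_} ] }
           sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
}

# Term-count vector for one document: word => count
sub vector {
    my ($index, $doc) = @_;
    my %vec;
    for my $word (keys %$index) {
        $vec{$word} = $index->{$word}{$doc} if exists $index->{$word}{$doc};
    }
    return \%vec;
}

# Cosine similarity between two sparse vectors (hash references)
sub cosine {
    my ($u, $v) = @_;
    my ($dot, $norm_u, $norm_v) = (0, 0, 0);
    $dot    += $u->{$_} * ($v->{$_} // 0) for keys %$u;
    $norm_u += $_ ** 2 for values %$u;
    $norm_v += $_ ** 2 for values %$v;
    return 0 unless $norm_u && $norm_v;
    return $dot / (sqrt($norm_u) * sqrt($norm_v));
}
```

With these in place, `search(build_index(\%corpus), doc_lengths(\%corpus), 'whale')` returns the matching documents in ranked order, and comparing `cosine` scores between one document's vector and all the others is one simple way to "find more items like this one." Note that with this weighting a term appearing in every document scores zero, since its inverse document frequency is log(1).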

I use these subroutines and the associated Perl scripts to do quick & dirty analysis against corpora of journal articles, books, and websites.

I know, I know. It would be better to implement these things as a set of Perl modules, but I'm practicing what I preach: "Give it away even if it is not ready." The ultimate idea is to package these things into a single distribution so that researchers have them at their fingertips, as opposed to relying on a Web-based application.

-- Eric Lease Morgan emorgan@nd.edu April 2, 2014