GitHub - ericleasemorgan/Tiny-Text-Mining-Tools: a Perl library used to support simple and introductory text mining analysis

Tiny Text Mining Tools

This is the very beginnings of Perl library used to support simple and introductory text mining analysis -- tiny text mining tools.

Presently the library is implemented in a set of subroutines stored in a single file supporting:

simple in-memory indexing and single-term searching
relevancy ranking through term-frequency inverse document frequency (TFIDF) for searching and classification
cosine similarity for clustering and "finding more items like this one"

I use these subroutines and the associated Perl scripts to do quick & dirty analysis against corpuses of journal articles, books, and websites.

I know, I know. It would be better to implement these thing as a set of Perl modules, but I'm practicing what I preach. "Give it away even if it is not ready." The ultimate idea is to package these things into a single distribution, and enable researchers to have them at their finger tips as opposed to a Web-based application.

-- Eric Lease Morgan emorgan@nd.edu April 2, 2014

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
bin		bin
corpus		corpus
etc		etc
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

ericleasemorgan/Tiny-Text-Mining-Tools

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages