
a Perl library used to support simple and introductory text mining analysis


ericleasemorgan/Tiny-Text-Mining-Tools


Tiny Text Mining Tools

This is the very beginning of a Perl library used to support simple, introductory text mining analysis: tiny text mining tools.

Presently, the library is implemented as a set of subroutines stored in a single file, supporting:

  • simple in-memory indexing and single-term searching

  • relevancy ranking through term frequency-inverse document frequency (TFIDF) for searching and classification

  • cosine similarity for clustering and "finding more items like this one"
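To make the three capabilities listed above concrete, here is a minimal sketch of how they fit together. It is written in Python, not Perl, and it is not the library's actual subroutines: the function names (build_index, search, vector, cosine, more_like_this), the lowercase alphabetic tokenizer, and the particular weighting (relative term frequency times log(N / document frequency)) are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of a-z letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(corpus):
    """Simple in-memory index: map each document id to a Counter of its term frequencies."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in corpus.items()}


def idf(index, term):
    """Inverse document frequency, log(N / df); a term found in every document scores 0."""
    df = sum(1 for counts in index.values() if term in counts)
    return math.log(len(index) / df) if df else 0.0


def tfidf(index, doc_id, term):
    """Relative frequency of term in one document multiplied by the term's IDF."""
    counts = index[doc_id]
    total = sum(counts.values())
    tf = counts[term] / total if total else 0.0
    return tf * idf(index, term)


def search(index, term):
    """Single-term search: documents containing term, ranked by descending TFIDF score."""
    term = term.lower()
    hits = [(doc_id, tfidf(index, doc_id, term))
            for doc_id, counts in index.items() if term in counts]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def vector(index, doc_id):
    """Represent one document as a sparse {term: TFIDF weight} vector."""
    return {term: tfidf(index, doc_id, term) for term in index[doc_id]}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_this(index, doc_id):
    """'Find more items like this one': rank all other documents by cosine similarity."""
    target = vector(index, doc_id)
    scores = [(other, cosine(target, vector(index, other)))
              for other in index if other != doc_id]
    return sorted(scores, key=lambda score: score[1], reverse=True)


if __name__ == "__main__":
    # A toy, hypothetical corpus used only to exercise the functions.
    corpus = {
        "a": "Text mining finds patterns in text.",
        "b": "Mining coal is hard work.",
        "c": "Patterns in text reveal themes.",
    }
    index = build_index(corpus)
    print(search(index, "text"))
    print(more_like_this(index, "a"))
```

With this toy corpus, searching for "text" ranks document "a" ahead of "c" because the term makes up a larger share of "a", and `more_like_this(index, "a")` places "c" ahead of "b" because "a" and "c" share more weighted vocabulary. The same pairwise cosine scores could feed a clustering step.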

I use these subroutines and the associated Perl scripts to do quick-and-dirty analysis of corpora of journal articles, books, and websites.

I know, I know. It would be better to implement these things as a set of Perl modules, but I'm practicing what I preach: "Give it away even if it is not ready." The ultimate idea is to package these things into a single distribution and put them at researchers' fingertips, as opposed to offering them as a Web-based application.

-- Eric Lease Morgan emorgan@nd.edu April 2, 2014
