Term-Frequency Inverse Document Frequency calculator for Perl 6.

Text::TFIdf

Given a set of documents, Text::TFIdf builds TF-IDF (term frequency-inverse document frequency) vectors for them and scores each document against a query.

Installation

```
panda install Text::TFIdf
```

Usage

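```perl6
use Text::TFIdf;

# Stop words to pass to the constructor (left empty in this example).
my %stop-words;

my $doc-store = TFIdf.new(:trim(True), :stop-list(%stop-words));

# Add the documents; the ids reported below (0, 1, 2, 3) follow insertion order.
$doc-store.add('perl is cool');
$doc-store.add('i like node');
$doc-store.add('java is okay');
$doc-store.add('perl and node are interesting meh about java');

# Callback invoked once per document with its id and its score for the query.
sub results($id, $score) {
  say $id ~ " got " ~ $score;
}

# Score every stored document against the query terms.
$doc-store.tfids('node perl java', &results);
```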

Output:

```
0 got 0.858454714967854
1 got 0.858454714967854
2 got 0.858454714967854
3 got 2.17296349726238
```
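
Documents 0 to 2 each contain one of the three query terms and tie, while document 3 contains all three and scores highest. The following is a minimal sketch of a textbook TF-IDF scorer, included only to illustrate the idea behind those scores. It does not use Text::TFIdf, and the helper names `idf` and `score` and the weighting (relative term frequency times `log(N / df)`) are assumptions made for illustration, not the module's formula. The absolute values it prints will therefore differ from the output above.

```perl6
# Textbook TF-IDF sketch. NOT Text::TFIdf's implementation; the
# weighting scheme and the helper names are illustrative assumptions.
my @docs = 'perl is cool', 'i like node', 'java is okay',
           'perl and node are interesting meh about java';

# Split each document on whitespace and count how often each term occurs.
my @bags = @docs.map({ .words.Bag });

# Inverse document frequency: log(N / df), or 0 when no document has the term.
sub idf(Str $term) {
    my $df = @bags.grep(-> $b { $b{$term} > 0 }).elems;
    $df ?? log(@bags.elems / $df) !! 0;
}

# Query score for one document: sum over query terms of
# (term count / document length) * idf(term).
sub score(Str $query, Bag $bag) {
    [+] $query.words.map(-> $t { ($bag{$t} / $bag.total) * idf($t) });
}

for @bags.kv -> $id, $bag {
    say "$id got { score('node perl java', $bag) }";
}
```

Under these assumptions the ranking has the same shape as the module's output: the first three documents tie and the last one ranks first.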

Acknowledgements