counting words in multilingual texts

SYNOPSIS

use Text::WordCounter;

my $counter = Text::WordCounter->new();

my $word_count = $counter->word_count( $text );

DESCRIPTION

The counting is quite heuristic: for example, '-' and digits that appear inside a word are treated as word characters. See the tests to find out how all the special cases are resolved.
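To make the hyphen-and-digit rule more concrete, here is a minimal standalone sketch. It is not the module's implementation: the helper name sketch_count_words and the exact regex are assumptions chosen for illustration, and the module's tests remain the authority on the real special cases. The sketch assumes a word must start and end with a letter, with '-' and digits allowed only in between.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Text::WordCounter.
# Assumption: a word starts and ends with a letter; '-' and digits
# count as word characters only when they sit between letters.
sub sketch_count_words {
    my ($text) = @_;
    my %count;
    while ( $text =~ /(\p{L}(?:[\p{L}\p{Nd}-]*\p{L})?)/g ) {
        $count{ lc $1 }++;    # lowercase so "Tools" and "tools" merge
    }
    return \%count;
}

# The lone '-' and the bare number '42' are not counted as words here.
my $counts = sketch_count_words('State-of-the-art b2b tools - 42 tools');

printf "%s: %d\n", $_, $counts->{$_} for sort keys %$counts;
```

Under these assumptions "State-of-the-art" and "b2b" each count as a single word, while a free-standing hyphen or number is skipped.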

The features parameter should be a hashref; it serves as an accumulator for the features found.
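The accumulator idea can be sketched as follows. This is an illustration of the general pattern only: sketch_accumulate is a hypothetical helper, and the simple whitespace split stands in for the module's real word splitting. The point is that passing the same hashref to several calls lets the counts add up across texts.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Text::WordCounter.
# It adds the words of $text to the $features hashref and returns it.
sub sketch_accumulate {
    my ( $text, $features ) = @_;
    $features ||= {};    # start a fresh accumulator when none is passed
    for my $word ( grep { length } split /\s+/, $text ) {
        $features->{ lc $word }++;
    }
    return $features;
}

# The same hashref is reused, so counts accumulate across both calls.
my %features;
sketch_accumulate( 'red green',  \%features );
sketch_accumulate( 'green blue', \%features );

printf "%s: %d\n", $_, $features{$_} for sort keys %features;
```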

ATTRIBUTES

stemming

If set, stemming via Lingua::Stem is performed on the words. We never managed to make it work sanely on multilingual texts.

stopwords

A hashref with words to discard.
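A stopword hashref of this kind might be used as in the sketch below. It assumes the words to discard are the keys of the hashref and that words are lowercased before the lookup; sketch_filter is a hypothetical helper for illustration, not the module's is_stop_word method.

```perl
use strict;
use warnings;

# Assumption: the words to discard are the hash keys; the values are ignored.
my $stopwords = { the => 1, a => 1, of => 1 };

# Hypothetical helper, NOT part of Text::WordCounter.
# Lowercases each word, then drops the ones found in the stopword hashref.
sub sketch_filter {
    my ( $stop, @words ) = @_;
    return grep { !exists $stop->{$_} } map { lc($_) } @words;
}

my @kept = sketch_filter( $stopwords, qw(The cat of A house) );

print "@kept\n";
```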

INSTANCE METHODS

is_stop_word

normalize

Lowercases words and stems them if the stemming attribute is true.

split_scripts

word_count

Returns a hashref with word counts.

LIMITATIONS

Of the languages that don't separate words with spaces, only Chinese is currently supported (using Lingua::ZH::MMSEG).

SEE ALSO
