operasoftware/Text-WordCounter

SYNOPSIS

use Text::WordCounter;

my $counter = Text::WordCounter->new();

my $word_count = $counter->word_count( $text );

DESCRIPTION

The word counting is quite heuristic. For example, '-' and digits that appear between word characters are treated as word characters. See the tests to find out how all the special cases are resolved.
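As an illustration of the kind of rule described above, here is a minimal sketch in plain Perl. It is not the module's implementation: the helper name `sketch_tokens` and the exact regular expression are assumptions, and it also assumes that a run of digits standing alone is not a word. The module's tests remain the authority on the real special cases.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::WordCounter: extract tokens in which
# '-' and digits count as word characters only when they sit between letters.
sub sketch_tokens {
    my ($text) = @_;

    # A token starts with a letter, may continue with letters, digits or '-',
    # and must end with a letter (a single letter also qualifies).
    return $text =~ /(\p{L}(?:[\p{L}\p{N}-]*\p{L})?)/g;
}

my @tokens = sketch_tokens('A well-known web2py fan, born 1984 - really.');
print join( ', ', @tokens ), "\n";
# prints: A, well-known, web2py, fan, born, really
```

In this sketch the free-standing '-' and the number 1984 are dropped because neither sits inside a run of letters.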

The features parameter should be a hashref; it acts as an accumulator for the features found.
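The accumulator pattern can be sketched in plain Perl as follows. The function `count_into` and its signature are hypothetical stand-ins used only to show how one hashref can collect counts across several calls; they are not the module's API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a counting routine; the name and signature are
# assumptions made for this example.
sub count_into {
    my ( $text, $features ) = @_;
    $features ||= {};    # start fresh if no accumulator was passed in

    # Naive whitespace split, just enough to demonstrate accumulation.
    $features->{ lc $_ }++ for split ' ', $text;
    return $features;
}

my %features;
count_into( 'red green red', \%features );
count_into( 'green blue',    \%features );

print "$_ => $features{$_}\n" for sort keys %features;
# prints:
# blue => 1
# green => 2
# red => 2
```

Passing the same hashref to every call lets counts from several texts add up in one place, while omitting it yields a fresh hashref per call.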

ATTRIBUTES

stemming

If set, stemming via Lingua::Stem is performed on the words. We never managed to make it work sanely on multilingual texts.

stopwords

A hashref with words to discard.
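A hashref of this shape is easy to build from a word list. The sketch below assumes that the keys are the words to discard and that any true value will do; the filtering step uses a plain `grep` rather than the module itself, so it only illustrates the data structure.

```perl
use strict;
use warnings;

# Keys are the words to discard; the values only need to be true
# (an assumption about how the lookup is performed).
my $stopwords = { map { $_ => 1 } qw(the a of and) };

my @words = qw(the history of the opera browser);
my @kept  = grep { !$stopwords->{$_} } @words;

print "@kept\n";
# prints: history opera browser
```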

INSTANCE METHODS

is_stop_word

normalize

Lowercases words and stems them if the stemming attribute is true.
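The two steps can be pictured with a small, self-contained sketch. `sketch_normalize` is a hypothetical function, and the toy stemmer below merely strips a trailing "s"; it stands in for Lingua::Stem and does not reproduce its behaviour.

```perl
use strict;
use warnings;

# Hypothetical sketch: always lowercase, stem only when a stemmer is supplied.
sub sketch_normalize {
    my ( $word, $stemmer ) = @_;
    my $normalized = lc $word;
    $normalized = $stemmer->($normalized) if $stemmer;
    return $normalized;
}

# Toy stand-in for a real stemmer: strips a single trailing "s".
my $toy_stemmer = sub { my ($w) = @_; $w =~ s/s\z//; return $w };

my $plain   = sketch_normalize('Browsers');
my $stemmed = sketch_normalize( 'Browsers', $toy_stemmer );

print "$plain $stemmed\n";
# prints: browsers browser
```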

split_scripts

word_count

Returns a hashref with word counts.

LIMITATIONS

Of the languages that do not separate words with spaces, only Chinese is currently supported (using Lingua::ZH::MMSEG).

SEE ALSO
