Lingua::Stem::UniNE (Perl 5): University of Neuchâtel stemmers

NAME

Lingua::Stem::UniNE - University of Neuchâtel stemmers

VERSION

This document describes Lingua::Stem::UniNE v0.02.

SYNOPSIS

    use Lingua::Stem::UniNE;

    # create Bulgarian stemmer
    $stemmer = Lingua::Stem::UniNE->new(language => 'bg');

    # get stem for word
    $stem = $stemmer->stem($word);

    # get list of stems for list of words
    @stems = $stemmer->stem(@words);

    # replace words in array reference with stems
    $stemmer->stem(\@words);

DESCRIPTION

This module contains a collection of stemmers for multiple languages based on stemming algorithms provided by Jacques Savoy of the University of Neuchâtel. The languages currently implemented are Bulgarian, Czech, and Persian. Work is ongoing for Arabic, Bengali, Finnish, French, German, Hindi, Hungarian, Italian, Marathi, Portuguese, Russian, Spanish, and Swedish. Languages for which no stemmers are yet available on CPAN are the top priority.

Attributes

language

The following language codes are currently supported.

    ┌───────────┬────┐
    │ Bulgarian │ bg │
    │ Czech     │ cs │
    │ Persian   │ fa │
    └───────────┴────┘

These codes use the two-letter ISO 639-1 format. They are case-insensitive when set but are always returned in lowercase.

    # instantiate a stemmer object
    $stemmer = Lingua::Stem::UniNE->new(language => $language);

    # get current language
    $language = $stemmer->language;

    # change language
    $stemmer->language($language);

Country codes such as cz for the Czech Republic are not supported, nor are IETF language tags such as pt-PT or pt-BR.
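The case-insensitive handling described above can be pictured as a simple getter/setter. This is a minimal sketch, not the module's actual implementation: the package name My::LanguageHolder and the %SUPPORTED hash are hypothetical, and the supported list simply mirrors the table above.

```perl
use strict;
use warnings;

package My::LanguageHolder;    # hypothetical package, for illustration only

use Carp qw(croak);

# Assumed list of supported ISO 639-1 codes, mirroring the table above
my %SUPPORTED = map { $_ => 1 } qw(bg cs fa);

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->language($args{language});
    return $self;
}

# Getter/setter: accepts any letter case, stores and returns lowercase
sub language {
    my $self = shift;
    if (@_) {
        my $code = shift;
        $code = lc($code // '');
        croak "Unsupported language '$code'" unless $SUPPORTED{$code};
        $self->{language} = $code;
    }
    return $self->{language};
}

package main;

my $holder = My::LanguageHolder->new(language => 'BG');
my $lang   = $holder->language;
print "$lang\n";    # prints "bg"

# Country codes and IETF language tags are rejected
for my $bad (qw(cz pt-PT)) {
    my $ok = eval { $holder->language($bad); 1 };
    print "$bad rejected\n" unless $ok;
}
```

Because the code is normalized once, when it is set, the getter never has to worry about case.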

Methods

stem

When a list of strings is provided, each string is stemmed and a list of stems is returned. The returned list always has the same number of elements, in the same order, as the list provided.

    @stems = $stemmer->stem(@words);

    # get the stem for a single word
    $stem = $stemmer->stem($word);

When an array reference is provided, each element is stemmed and replaced with the resulting stem.

    $stemmer->stem(\@words);
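The three calling conventions (list in, list out; single word in scalar context; array reference modified in place) can be illustrated with a plain function. This is a hedged sketch, not the module's code: toy_stem_word is a hypothetical placeholder that only strips a trailing "s" and is not a UniNE algorithm, and returning the first stem in scalar context is an assumption.

```perl
use strict;
use warnings;

# Toy placeholder, NOT a UniNE algorithm: strips one trailing "s"
# so that the calling conventions below are easy to observe.
sub toy_stem_word {
    my ($word) = @_;
    (my $stem = $word) =~ s/s\z//;
    return $stem;
}

# Hypothetical function mirroring the documented stem() conventions
sub stem {
    my @args = @_;

    # A single array reference: stem each element in place
    if (@args == 1 && ref $args[0] eq 'ARRAY') {
        $_ = toy_stem_word($_) for @{ $args[0] };
        return;
    }

    # A list of words: one stem per word, in the same order
    my @stems = map { toy_stem_word($_) } @args;

    # Assumption: in scalar context, return the stem of the first word
    return wantarray ? @stems : $stems[0];
}

my @words = qw(cats dogs bird);
my @stems = stem(@words);    # ("cat", "dog", "bird")
my $stem  = stem('trees');   # "tree"
stem(\@words);               # @words is now ("cat", "dog", "bird")
print "@stems | $stem | @words\n";
```

The in-place form works because the loop variable $_ is aliased to each element of the referenced array, so assigning to it updates the caller's array directly.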

Words must be provided as character strings, and stems are returned as character strings. Byte strings in arbitrary character encodings are not supported, so decode any encoded input (for example, UTF-8 read from a file) before stemming.
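The difference between byte strings and character strings can be seen with the core Encode module. This sketch only demonstrates decoding and encoding; the actual call to the stemmer is left as a comment so that the example runs without Lingua::Stem::UniNE installed. The sample word is the Czech "města", written as raw UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 bytes for the Czech word "města" (as they might arrive from a file or socket)
my $bytes = "m\xC4\x9Bsta";

# Byte string: "ě" occupies two bytes, so the length is 6
printf "bytes: %d\n", length $bytes;

# Decode to a character string before stemming; "ě" is now one character
my $chars = decode('UTF-8', $bytes);
printf "chars: %d\n", length $chars;

# ... pass $chars to $stemmer->stem(...) here ...

# Encode the result back to UTF-8 bytes for output
print encode('UTF-8', $chars), "\n";
```

Passing $bytes instead of $chars would make the stemmer see two unrelated Latin-1 characters where the Czech letter "ě" should be, so suffix rules could not match correctly.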

languages

Returns a list of supported two-letter language codes using lowercase letters.

    # object method
    @languages = $stemmer->languages;

    # class method
    @languages = Lingua::Stem::UniNE->languages;

In scalar context it returns the number of supported languages.
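In Perl, a method that returns an array naturally yields the elements in list context and the element count in scalar context, and a method that ignores its invocant works as both a class and an object method. The following is a minimal sketch of that pattern, not the module's implementation; the package name My::Stemmer and the @LANGUAGES array are hypothetical.

```perl
use strict;
use warnings;

package My::Stemmer;    # hypothetical package, for illustration only

# Assumed list of supported codes, mirroring the table above
my @LANGUAGES = qw(bg cs fa);

# Callable as a class or object method (the invocant is ignored).
# Returning an array gives its elements in list context and its
# element count in scalar context.
sub languages { return @LANGUAGES }

package main;

my @codes = My::Stemmer->languages;    # ("bg", "cs", "fa")
my $count = My::Stemmer->languages;    # 3

my $stemmer = bless {}, 'My::Stemmer';
my @same    = $stemmer->languages;     # ("bg", "cs", "fa")

print "@codes ($count)\n";             # prints "bg cs fa (3)"
```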

SEE ALSO

IR Multilingual Resources at UniNE provides the original stemming algorithms that were implemented in this module.

Lingua::Stem::Snowball provides alternative stemming algorithms for Finnish, French, German, Hungarian, Italian, Portuguese, Russian, Spanish, and Swedish, as well as other languages.

ACKNOWLEDGEMENTS

Jacques Savoy and Ljiljana Dolamic of the University of Neuchâtel authored the original stemming algorithms that were implemented in this module.

This module is brought to you by Shutterstock (@ShutterTech). Additional open source projects from Shutterstock can be found at code.shutterstock.com.

AUTHOR

Nick Patch <patch@cpan.org>

COPYRIGHT AND LICENSE

© 2012–2013 Nick Patch

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
