LibIndic Stemmer


LibIndic's stemmer module extracts the stems of the words in a sentence. It uses a rule-based model and strips suffixes iteratively to handle multiple levels of inflection. At present, it supports only Malayalam.
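
Iterative suffix stripping can be sketched roughly as follows. This is a minimal illustration and not the library's implementation: the rule table, the tag names, and the `strip_suffixes` helper are all hypothetical, and English suffixes are used only to keep the example readable.

```python
# Hypothetical rule table: suffix -> (replacement, tag).
# These are NOT the rules or tags used by LibIndic's stemmer.
RULES = {
    "ness": ("", "NOUN"),
    "ful": ("", "ADJ"),
    "s": ("", "PLURAL"),
}


def strip_suffixes(word, rules=RULES, min_stem_len=2):
    """Repeatedly strip the longest matching suffix until no rule applies."""
    tags = []
    changed = True
    while changed:
        changed = False
        # Try longer suffixes first so "ness" wins over "s".
        for suffix in sorted(rules, key=len, reverse=True):
            replacement, tag = rules[suffix]
            new_len = len(word) - len(suffix) + len(replacement)
            if word.endswith(suffix) and new_len >= min_stem_len:
                word = word[: len(word) - len(suffix)] + replacement
                tags.append(tag)
                changed = True
                break  # restart the scan on the shortened word
    return {"stem": word, "inflection": tags}


print(strip_suffixes("hopefulness"))
```

Each pass removes one level of inflection and records a tag for it, so a word carrying several suffixes ends up with several tags, which mirrors the shape of the `inflection` list in the output format.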

Installation

  1. Clone the repository: git clone https://github.com/libindic/indicstemmer.git
  2. Change to the cloned directory: cd indicstemmer
  3. Run setup.py to create an installable source distribution: python setup.py sdist
  4. Install using pip: pip install dist/libindic-stemmer*.tar.gz

Note: Prefer installing inside a virtualenv, as the library is still at an experimental stage.

Usage

Input: String <str> containing words word1 word2 word3 ...
Output: Dict <dict> of the format
{
    'word1': {
                        'stem': 'stem1',
                        'inflection': ['tag1', 'tag2', ...]
             },
    'word2': {
                        'stem': 'stem2',
                        'inflection': ['tag1', 'tag2', ...]
             },
    .
    .
    .
}

>>> from libindic.stemmer import Stemmer
>>> stemmer = Stemmer()
>>> result = stemmer.stem(language='malayalam', text='രാമന്റെ വീട്ടിലേക്ക്')
>>> for word, output in result.items():
...    print(word, " : ", output['stem'], " : ", output['inflection'])
രാമന്റെ  :  രാമൻ  :  ['SAMB1']
വീട്ടിലേക്ക്  :  വീട്  :  ['MISC1', 'ADH1', 'UDH1']
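
Since the result is a plain dict, it can be post-processed with ordinary Python. The sketch below assumes only the output format documented above; the helpers `stems_only` and `words_with_tag` are hypothetical and not part of the library, and `sample` simply copies the output shown in the session above so the snippet runs without the stemmer installed.

```python
def stems_only(result):
    """Map each input word to its stem, dropping the inflection tags."""
    return {word: info["stem"] for word, info in result.items()}


def words_with_tag(result, tag):
    """Return the input words whose inflection list contains the given tag."""
    return [word for word, info in result.items() if tag in info["inflection"]]


# Sample data in the documented output format, copied from the session above.
sample = {
    "രാമന്റെ": {"stem": "രാമൻ", "inflection": ["SAMB1"]},
    "വീട്ടിലേക്ക്": {"stem": "വീട്", "inflection": ["MISC1", "ADH1", "UDH1"]},
}

for word, stem in stems_only(sample).items():
    print(word, "->", stem)
print(words_with_tag(sample, "ADH1"))
```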

For more details, read the docs.
