Stemmer algorithm used in varnam:
STEM(word, stem_results)
1. Compile the scheme file
2. Initialize "word_buffer", "suffix" and "new_ending" to empty strings
3. copy word to word_buffer
4. while(word_buffer is not empty)
4.1 end_char = last Unicode character in word_buffer
4.2 insert end_char at the beginning of suffix
4.3 remove end_char from word_buffer
4.4 Search the database for a stemrule matching the contents of "suffix"; if one is found, copy the RHS of the stemrule into the string buffer new_ending
4.5 if (stemrule found)
4.5.1 Check the exceptions table to see whether stemming should be skipped for this suffix.
This matters for rules like ന് => ൻ. That rule must not be applied when ന് is part of ന്ന്, because internally ന്ന is ന് + ന, so the suffix would match inside the longer cluster.
4.5.1.1 continue (execute loop again) if exception check returns true
4.5.2 append the contents of new_ending to word_buffer
4.5.3 Add the contents of word_buffer to stem_results
4.5.4 clear string buffer "suffix"
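
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not varnam's actual implementation: the compiled scheme file and the database lookup (steps 1 and 4.4) are replaced by a plain dict called "rules", the exceptions table is modelled as a hypothetical dict mapping a suffix to the longer endings that block it, and the function returns stem_results instead of filling an output argument. It also assumes that no rule's output re-triggers the same rule, since otherwise the loop would not terminate.

```python
# Illustrative stand-ins for the stemrules and exceptions tables.
# Only the rule itself comes from the text above; the table shapes are assumed.
NA_VIRAMA = "\u0d28\u0d4d"        # the suffix in the example rule
CHILLU_N = "\u0d7b"               # the RHS of the example rule
DOUBLE_NA = NA_VIRAMA + NA_VIRAMA  # the cluster that must not be stemmed

RULES = {NA_VIRAMA: CHILLU_N}
EXCEPTIONS = {NA_VIRAMA: [DOUBLE_NA]}


def is_exception(word_buffer, suffix, exceptions):
    """Step 4.5.1: True if the suffix is part of a longer blocked ending."""
    candidate = word_buffer + suffix
    return any(candidate.endswith(blocked)
               for blocked in exceptions.get(suffix, ()))


def stem(word, rules, exceptions):
    """Return every stem produced while walking the word right to left."""
    stem_results = []
    word_buffer = word            # step 3
    suffix = ""                   # step 2
    while word_buffer:            # step 4
        end_char = word_buffer[-1]        # 4.1: str indexes by code point
        suffix = end_char + suffix        # 4.2
        word_buffer = word_buffer[:-1]    # 4.3
        new_ending = rules.get(suffix)    # 4.4: dict lookup replaces the DB
        if new_ending is None:
            continue
        if is_exception(word_buffer, suffix, exceptions):  # 4.5.1
            continue                                       # 4.5.1.1
        word_buffer += new_ending         # 4.5.2
        stem_results.append(word_buffer)  # 4.5.3
        suffix = ""                       # 4.5.4
    return stem_results
```

For example, stem("x" + NA_VIRAMA, RULES, EXCEPTIONS) yields a single stem, "x" followed by the chillu, while stem("x" + DOUBLE_NA, RULES, EXCEPTIONS) yields nothing because the exception check blocks the rule. The "x" prefix is a placeholder, not a real word.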