Cgo binding for Snowball C library


Description

Snowball stemmer port (cgo wrapper) for Go. It extracts word stems using the Snowball C library. For more details, see http://snowball.tartarus.org/

Installing

go get github.com/goodsign/snowball
go test github.com/goodsign/snowball (Must PASS)

Done! Use it in your Go files (import "github.com/goodsign/snowball").

Usage

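  // algorithm and encoding are names from the "Algorithms & encodings"
  // table, e.g. "english" and "UTF_8".
  stemmer, err := snowball.NewWordStemmer(algorithm, encoding)
  if err != nil {
    // ...handle error...
  }
  defer stemmer.Close()

  wordStem, err := stemmer.Stem(word)
  if err != nil {
    // ...handle error...
  }

  // Use wordStem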

Usage notes

According to Snowball documentation:

Creating a stemmer is a relatively expensive operation - the expected
usage pattern is that a new stemmer is created when needed, used
to stem many words, and deleted after some time.
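The reuse pattern described above can be sketched as follows. This is only an illustration: the Stemmer interface, the toyStemmer stand-in, and the StemAll helper are hypothetical names that are not part of the binding, and the []byte signature of Stem is an assumption. The stand-in strips a trailing "s" so that the sketch runs without cgo; it is not a Snowball algorithm.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Stemmer is a hypothetical interface covering the two methods shown in the
// Usage section. The []byte signature is an assumption of this sketch.
type Stemmer interface {
	Stem(word []byte) ([]byte, error)
	Close()
}

// toyStemmer is a stand-in so the sketch runs without cgo. It only strips a
// trailing "s" and is NOT a Snowball algorithm.
type toyStemmer struct{ closed bool }

func (t *toyStemmer) Stem(word []byte) ([]byte, error) {
	if t.closed {
		return nil, errors.New("stemmer is closed")
	}
	return []byte(strings.TrimSuffix(string(word), "s")), nil
}

func (t *toyStemmer) Close() { t.closed = true }

// StemAll stems every word with one shared stemmer instead of creating a
// new stemmer per word.
func StemAll(s Stemmer, words []string) ([]string, error) {
	stems := make([]string, 0, len(words))
	for _, w := range words {
		stem, err := s.Stem([]byte(w))
		if err != nil {
			return nil, fmt.Errorf("stemming %q: %w", w, err)
		}
		stems = append(stems, string(stem))
	}
	return stems, nil
}

func main() {
	// With the real binding, the stemmer would be created once here
	// via NewWordStemmer and closed when the batch is done.
	s := &toyStemmer{}
	defer s.Close()

	stems, err := StemAll(s, []string{"cats", "dogs", "fish"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(stems) // prints [cat dog fish]
}
```

The point of the design is that the expensive step (creation) happens once per batch, while the cheap step (Stem) happens once per word.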

Algorithms & encodings

The file modules.txt lists the main algorithm for each language. Each algorithm is available in UTF-8 and, for most languages, in the most commonly used legacy encoding as well.

Language        Encodings               Algorithms

danish          UTF_8,ISO_8859_1        danish,da,dan
dutch           UTF_8,ISO_8859_1        dutch,nl,dut,nld
english         UTF_8,ISO_8859_1        english,en,eng
finnish         UTF_8,ISO_8859_1        finnish,fi,fin
french          UTF_8,ISO_8859_1        french,fr,fre,fra
german          UTF_8,ISO_8859_1        german,de,ger,deu
hungarian       UTF_8,ISO_8859_1        hungarian,hu,hun
italian         UTF_8,ISO_8859_1        italian,it,ita
norwegian       UTF_8,ISO_8859_1        norwegian,no,nor
portuguese      UTF_8,ISO_8859_1        portuguese,pt,por
romanian        UTF_8,ISO_8859_2        romanian,ro,rum,ron
russian         UTF_8,KOI8_R            russian,ru,rus
spanish         UTF_8,ISO_8859_1        spanish,es,esl,spa
swedish         UTF_8,ISO_8859_1        swedish,sv,swe
turkish         UTF_8                   turkish,tr,tur
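As an illustration of how the table above could be used to validate an algorithm name and encoding before creating a stemmer, here is a minimal sketch. The Module type and the ParseModules and Supports helpers are hypothetical and not part of the binding; the sketch assumes rows of three whitespace-separated columns, as in the table, and uses only three sample rows.

```go
package main

import (
	"fmt"
	"strings"
)

// Module describes one table row: a language, the encodings it is
// available in, and the algorithm names (aliases) that select it.
type Module struct {
	Language  string
	Encodings []string
	Aliases   []string
}

// Supports reports whether the module is available in the given encoding.
func (m Module) Supports(encoding string) bool {
	for _, e := range m.Encodings {
		if e == encoding {
			return true
		}
	}
	return false
}

// ParseModules reads rows of "language encodings aliases" and indexes the
// resulting modules by alias. Blank lines and lines starting with '#' are
// skipped.
func ParseModules(text string) (map[string]Module, error) {
	byAlias := make(map[string]Module)
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) != 3 {
			return nil, fmt.Errorf("malformed row: %q", line)
		}
		m := Module{
			Language:  fields[0],
			Encodings: strings.Split(fields[1], ","),
			Aliases:   strings.Split(fields[2], ","),
		}
		for _, alias := range m.Aliases {
			byAlias[alias] = m
		}
	}
	return byAlias, nil
}

// sampleRows holds three rows copied from the table above.
const sampleRows = `
german          UTF_8,ISO_8859_1        german,de,ger,deu
russian         UTF_8,KOI8_R            russian,ru,rus
turkish         UTF_8                   turkish,tr,tur
`

func main() {
	mods, err := ParseModules(sampleRows)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	m := mods["ru"]
	fmt.Println(m.Language, m.Supports("KOI8_R"), m.Supports("ISO_8859_1"))
	// prints: russian true false
}
```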

Thread-safety

The original Snowball documentation says:

Stemmers are re-entrant, but not threadsafe.  In other words, if
you wish to access the same stemmer object from multiple threads,
you must ensure that all access is protected by a mutex or similar
device.

For this reason, the Go wrapper guards each stem operation with a sync.Mutex, so a single stemmer can be shared safely between goroutines.
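The general technique of guarding a non-threadsafe object with a mutex can be sketched as below. This is not the binding's source code: rawStemmer, SafeStemmer, and StemConcurrently are hypothetical names, and rawStemmer is a pure-Go stand-in for a C stemmer handle that simply counts calls.

```go
package main

import (
	"fmt"
	"sync"
)

// rawStemmer is a hypothetical stand-in for a non-threadsafe stemmer handle.
// It keeps unsynchronized internal state, like a C stemmer's work buffer.
type rawStemmer struct{ calls int }

func (r *rawStemmer) stem(word string) string {
	r.calls++ // would be a data race if called concurrently without a lock
	return word
}

// SafeStemmer serialises all access to the raw stemmer with a mutex.
type SafeStemmer struct {
	mu  sync.Mutex
	raw rawStemmer
}

// Stem holds the lock for the whole operation, so only one goroutine
// touches the raw stemmer at a time.
func (s *SafeStemmer) Stem(word string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.raw.stem(word)
}

// Calls returns how many Stem operations have completed.
func (s *SafeStemmer) Calls() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.raw.calls
}

// StemConcurrently stems each word in its own goroutine and waits for all
// of them to finish.
func StemConcurrently(s *SafeStemmer, words []string) {
	var wg sync.WaitGroup
	for _, w := range words {
		wg.Add(1)
		go func(w string) {
			defer wg.Done()
			s.Stem(w)
		}(w)
	}
	wg.Wait()
}

func main() {
	s := &SafeStemmer{}
	StemConcurrently(s, []string{"a", "b", "c", "d"})
	fmt.Println(s.Calls()) // prints 4
}
```

Because every call takes the same lock, concurrent callers of one stemmer run one after another. Workloads that need parallel throughput may prefer one stemmer per goroutine.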

Snowball Licence

The Snowball library is released under the BSD Licence.

Licence

The goodsign/snowball binding is released under the BSD Licence. See the LICENCE file.