STOPWORDS

REALLY JUST A LIST OF STOPWORDS WITH SOME HELPERS

This was extracted from a larger project, but it is useful enough on its own to break out for reuse.

USAGE


	
require 'stopwords'

# List all stop words
Stopwords::STOP_WORDS

# Test whether a token is a stop word
Stopwords.is?('and')
# => true

# Check that a token is both a 'word' and not a stop word
Stopwords.valid?('vector')
# => true
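
A common use for these helpers is dropping stop words from a token list. The sketch below is self-contained so it runs without the library: `DEMO_STOP_WORDS` and `stop_word?` are hypothetical stand-ins written for this example, not part of the library. With the library loaded, `Stopwords.is?` would take the place of `stop_word?`.

```ruby
require 'set'

# Hypothetical stand-in for Stopwords::STOP_WORDS (a tiny sample, for illustration only).
DEMO_STOP_WORDS = Set.new(%w[a an and of the to]).freeze

# Hypothetical stand-in for Stopwords.is?
def stop_word?(token)
  DEMO_STOP_WORDS.include?(token)
end

tokens = %w[the vector and the matrix]

# Keep only the tokens that are not stop words.
content_tokens = tokens.reject { |token| stop_word?(token) }
# content_tokens is ["vector", "matrix"]
```

The lookup is an exact string match against the sample list, so tokens should be lowercased before they are tested.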

SPECS


$ rake specs

SANITIZE

Sanitizing is not part of the library, but you should probably sanitize tokens before using them (if your tokenizer doesn't already).


# Strip straight and curly single quotes, double quotes, slashes, and backslashes
SANITIZE_REGEXP = /['"‘’\/\\]/
text.downcase.gsub(SANITIZE_REGEXP, '')
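
To show what the sanitizing step does to real input, here is a minimal sketch that wraps it in a helper and splits the result into tokens. The `sanitized_tokens` name and the whitespace split are assumptions made for this example; the regular expression matches the same characters as the pattern above.

```ruby
# Matches straight and curly single quotes, double quotes, slashes, and backslashes.
SANITIZE_REGEXP = /['"‘’\/\\]/

# Hypothetical helper: lowercase the text, delete the matched characters,
# then split on whitespace to get tokens.
def sanitized_tokens(text)
  text.downcase.gsub(SANITIZE_REGEXP, '').split
end

sanitized_tokens("Don’t use 'Vector' and/or \\Matrix")
# => ["dont", "use", "vector", "andor", "matrix"]
```

Note that the matched characters are deleted rather than replaced with a space, so "and/or" becomes the single token "andor". If that matters for your text, replace slashes with a space before splitting.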

ENDAX

Software Services shop (primarily Ruby) in Brooklyn, NY.
