REALLY JUST A LIST OF STOPWORDS WITH SOME HELPERS
Obviously part of something bigger but worth breaking out for reuse.
    require 'stopwords'

    # List all stop words
    Stopwords::STOP_WORDS

    # Test to see if a token is a stop word
    Stopwords.is?('and') # => true

    # Ensures a token is both a 'word' and not a stop word
    Stopwords.valid?('vector') # => true
$ rake specs
Not part of the library, but you should probably sanitize tokens before using them (if your tokenizer doesn't already).
    SANITIZE_REGEXP = /('|\"|‘|’|\/|\\)/
    text.downcase.gsub(SANITIZE_REGEXP, '')
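To show how sanitizing and stop word filtering might fit together, here is a minimal sketch. It is not part of the library: `DEMO_STOP_WORDS`, `sanitized_tokens`, and `content_tokens` are hypothetical names, the tiny word list is a stand-in so the snippet runs without the gem installed, and the whitespace split is only an assumption about your tokenizer.

```ruby
require 'set'

# Same regexp as above: strips straight and curly quotes, slashes and backslashes.
SANITIZE_REGEXP = /('|\"|‘|’|\/|\\)/

# Hypothetical stand-in for Stopwords::STOP_WORDS (NOT the real list).
DEMO_STOP_WORDS = Set.new(%w[a an and the of is it])

# Downcase, sanitize, then split on whitespace (a naive tokenizer, assumed here).
def sanitized_tokens(text)
  text.downcase.gsub(SANITIZE_REGEXP, '').split
end

# Drop every token found in the stand-in stop word list.
def content_tokens(text)
  sanitized_tokens(text).reject { |token| DEMO_STOP_WORDS.include?(token) }
end

content_tokens("The 'Vector' and the Matrix") # => ["vector", "matrix"]
```

With the gem loaded, the `reject` step could presumably be swapped for `select { |token| Stopwords.valid?(token) }`, which also checks that the token is a 'word'.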
Software Services shop (primarily Ruby) in Brooklyn, NY.