Separa

Separa splits chunks of text into tokens to be indexed

Description

Separa splits chunks of text or Ruby objects into tokens to be indexed by Busca, the simple Redis search.

Installation

As usual, you can install it using RubyGems.

$ gem install separa

Usage

The simplest possible usage is with default options:

require "separa"

separa = Separa.new
words = "This is a bunch of words. Separated"
result = separa.call(words)
puts result.inspect
# ["This", "is", "a", "bunch", "of", "words", "", "Separated"]

You'll notice a few things here:

  • There's an empty element between words and Separated
  • Words kept their capitalization

That's all intended. Separa only takes care of splitting the string into an array. It is up to you to filter that array afterwards.
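As a minimal sketch of that later filtering step, the snippet below starts from the array shown above and uses only plain Ruby array methods. Dropping empty strings, lowercasing, and removing duplicates are assumptions about what an index might want, not something Separa does for you.

```ruby
# Tokens as returned in the example above.
tokens = ["This", "is", "a", "bunch", "of", "words", "", "Separated"]

# Drop empty strings, normalize case, and remove duplicates.
cleaned = tokens.reject(&:empty?).map(&:downcase).uniq

puts cleaned.inspect
# ["this", "is", "a", "bunch", "of", "words", "separated"]
```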

Separa comes bundled with two 'Separators', but you can roll your own (more on that later). Using a separator is simple: pass it to the Separa.new constructor.

Separa::Text Splits a string of text into an array. You can pass a regexp to be used on the split.
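To illustrate how the choice of regexp affects the result, here is a small sketch using Ruby's own String#split rather than Separa itself. This document does not show the option name Separa::Text expects for the regexp, so none is assumed here, and the patterns below are only examples.

```ruby
text = "This is a bunch of words. Separated"

# Splitting on every single non-word character leaves an empty token
# wherever two such characters are adjacent (here, ". ").
single = text.split(/\W/)

# Splitting on runs of non-word characters avoids the empty token.
runs = text.split(/\W+/)

puts single.inspect
# ["This", "is", "a", "bunch", "of", "words", "", "Separated"]
puts runs.inspect
# ["This", "is", "a", "bunch", "of", "words", "Separated"]
```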

Separa::Obj Splits a Ruby hash into an array. This is where things get interesting. Let's see an example:

separa = Separa.new(Separa::Obj)
h = { uno: 1, dos: 2, tres: {uno: 'one', dos: 'two'} }
result = separa.call(h)
puts result.inspect
# ["uno:1", "dos:2", "tres.uno:one", "tres.dos:two"]

By default, Separa::Obj uses a colon to divide the object key and its value. You can change that by passing a different divider.

separa = Separa.new(Separa::Obj, divider: '-')
h = { uno: 1, dos: 2, tres: {uno: 'one', dos: 'two'} }
result = separa.call(h)
puts result.inspect
# ["uno-1", "dos-2", "tres.uno-one", "tres.dos-two"]
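For intuition, the flattening shown in the two examples above can be sketched in plain Ruby. The flatten_hash helper below is hypothetical and is not Separa::Obj's actual implementation. It only reproduces the same output shape, assuming nested keys are joined with a dot.

```ruby
# Hypothetical helper, not part of Separa: flattens a nested hash into
# "path<divider>value" tokens, joining nested keys with a dot.
def flatten_hash(hash, divider: ":", prefix: nil)
  hash.flat_map do |key, value|
    path = prefix ? "#{prefix}.#{key}" : key.to_s
    if value.is_a?(Hash)
      # Recurse into nested hashes, carrying the path built so far.
      flatten_hash(value, divider: divider, prefix: path)
    else
      ["#{path}#{divider}#{value}"]
    end
  end
end

h = { uno: 1, dos: 2, tres: { uno: 'one', dos: 'two' } }
puts flatten_hash(h).inspect
# ["uno:1", "dos:2", "tres.uno:one", "tres.dos:two"]
puts flatten_hash(h, divider: '-').inspect
# ["uno-1", "dos-2", "tres.uno-one", "tres.dos-two"]
```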

Roll your own separator

Writing your own separator is fairly simple. You only need to take care of three things.

  • It should respond to a call method.
  • The call method should receive 2 parameters: the string to split and a hash with options.
  • It should return an array. (Actually, returning an array isn't required, but recommended. I mean, that's half of the objective of this library, right?)
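A minimal sketch of a separator that meets these three requirements might look like this. CommaSeparator and its :delimiter option are hypothetical names made up for illustration. They are not part of Separa.

```ruby
# Hypothetical separator: splits on a delimiter and trims whitespace.
module CommaSeparator
  # string  - the text to split
  # options - a hash of options; :delimiter is an assumed key for this sketch
  def self.call(string, options = {})
    delimiter = options.fetch(:delimiter, ",")
    string.split(delimiter).map(&:strip)
  end
end

puts CommaSeparator.call("red, green , blue").inspect
# ["red", "green", "blue"]
puts CommaSeparator.call("a|b|c", delimiter: "|").inspect
# ["a", "b", "c"]
```

Following the pattern of the bundled separators, it would presumably be passed to the constructor as Separa.new(CommaSeparator).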

Take a look at the bundled separators if you need inspiration:

Separa::Text

Separa::Obj

The code is pretty straightforward.

Have fun splitting your strings, and drop a line to julian@porta.sh if you have something to say.
