Character substituters configuration

floere edited this page Dec 28, 2012 · 10 revisions

Character substituters

Character substituters tell an indexer or a query how to replace special characters, such as “ç”, with plain equivalents.

Example
Possible character substituters
  WestEuropean
  (There’s a built-in Polish one)
  (And a Vietnamese one: https://github.com/duyleekun/picky-vietnamese)

Q: Why don’t you provide a single substituter for all the world’s characters?
A: Substituting every possible character is expensive, so it is better to use a substituter that handles only the characters you need.
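
To illustrate the point, here is a minimal sketch of a substituter that handles only a handful of characters. It assumes that the substitutes_characters_with option accepts any object that responds to substitute(text) and returns a String; check this against your Picky version. The class name MinimalSubstituter and its mapping are hypothetical and not part of Picky.

```ruby
# Hypothetical substituter (not part of Picky): it replaces only the
# characters listed in MAPPING and leaves everything else untouched.
class MinimalSubstituter
  MAPPING = { 'ç' => 'c', 'Ç' => 'C', 'ñ' => 'n' }.freeze
  PATTERN = Regexp.union(MAPPING.keys)

  # Assumed interface: takes a String, returns a new String
  # with each mapped character replaced.
  def substitute(text)
    text.gsub(PATTERN, MAPPING)
  end
end

substituter = MinimalSubstituter.new
puts substituter.substitute('garçon') # prints "garcon"
```

Because the mapping is small, the work per token stays small as well. Under the assumption above, an instance of such a class could be passed wherever CharacterSubstituters::WestEuropean.new is used on this page.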

Example

The option substitutes_characters_with: character_substituter is available on both the indexing and the searching method.

class PickySearch < Application

  indexing  substitutes_characters_with: CharacterSubstituters::WestEuropean.new
  searching substitutes_characters_with: CharacterSubstituters::WestEuropean.new

end

The substitutes_characters_with option defines how characters are handled. If none is defined, no characters are substituted.

Possible character substituters

WestEuropean

Use as follows (you do not necessarily have to set it for both indexing and searching):

  indexing substitutes_characters_with: CharacterSubstituters::WestEuropean.new
  searching substitutes_characters_with: CharacterSubstituters::WestEuropean.new

This substitutes as follows, straight from the specs:

  it_should_substitute 'ä', 'ae'
  it_should_substitute 'Ä', 'Ae'
  it_should_substitute 'ö', 'oe'
  it_should_substitute 'Ö', 'Oe'
  it_should_substitute 'ü', 'ue'
  it_should_substitute 'Ü', 'Ue'
  it_should_substitute 'ë', 'e'
  it_should_substitute 'Ë', 'E'
  it_should_substitute 'ï', 'i'
  it_should_substitute 'Ï', 'I'
  it_should_substitute 'é', 'e'
  it_should_substitute 'É', 'E'
  it_should_substitute 'à', 'a'
  it_should_substitute 'À', 'A'
  it_should_substitute 'è', 'e'
  it_should_substitute 'È', 'E'
  it_should_substitute 'ì', 'i'
  it_should_substitute 'ò', 'o'
  it_should_substitute 'â', 'a'
  it_should_substitute 'ê', 'e'
  it_should_substitute 'Ê', 'E'
  it_should_substitute 'î', 'i'
  it_should_substitute 'Î', 'I'
  it_should_substitute 'ô', 'o'
  it_should_substitute 'Ô', 'O'
  it_should_substitute 'û', 'u'
  it_should_substitute 'ç', 'c'
  it_should_substitute 'Ç', 'C'
  it_should_substitute 'ß', 'ss'
  it_should_substitute 'å', 'a'
  it_should_substitute 'Å', 'A'
  it_should_substitute 'ñ', 'n'
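
Applied to whole words, these mappings behave roughly as in the following sketch. It is plain Ruby that copies a subset of the spec lines above into a Hash; it is not Picky's implementation, and the constant WEST_EUROPEAN_SUBSET and the helper substitute_subset are hypothetical names used only for illustration.

```ruby
# Subset of the WestEuropean mappings, copied from the spec lines above.
WEST_EUROPEAN_SUBSET = {
  'ä' => 'ae', 'Ä' => 'Ae', 'ö' => 'oe', 'Ö' => 'Oe',
  'ü' => 'ue', 'Ü' => 'Ue', 'é' => 'e', 'ç' => 'c',
  'ß' => 'ss', 'å' => 'a', 'ñ' => 'n'
}.freeze

# Hypothetical helper (not part of Picky): replaces every mapped
# character in text and returns the result as a new String.
def substitute_subset(text)
  text.gsub(Regexp.union(WEST_EUROPEAN_SUBSET.keys), WEST_EUROPEAN_SUBSET)
end

indexed = substitute_subset('Zürich')  # form stored at indexing time
queried = substitute_subset('Zuerich') # form produced at query time

puts indexed                      # prints "Zuerich"
puts indexed == queried           # prints "true"
puts substitute_subset('Straße')  # prints "Strasse"
```

The comparison shows why the same substituter is typically set for both indexing and searching: both spellings end up as the same token, so a query for either one can find the indexed entry.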