Fixed specs, added support for utf-8 chars, #3

Closed
wants to merge 9 commits into from

2 participants

@aokon

I fixed the broken specs, added the possibility to update the WordsToIgnore list, and added support for UTF-8 characters in the stemmer.
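
As a rough illustration of the configuration described above, here is a minimal sketch of a configure block that extends an ignore list. `HuntSketch` and its two-word default list are hypothetical stand-ins, not the gem's code; only the option names (`searches_index_name`, `transliteration_option`, `additional_words_to_ignore`) come from this pull request.

```ruby
# Hypothetical stand-in for Hunt; it only illustrates the configure pattern.
module HuntSketch
  # Assumed default list, shortened for the example.
  DEFAULT_WORDS_TO_IGNORE = %w[the how].freeze

  class << self
    attr_accessor :searches_index_name, :transliteration_option

    # Yields the module itself so options can be set inside a block.
    def configure
      yield self if block_given?
    end

    # Starts from a copy of the defaults so the constant is never mutated.
    def words_to_ignore
      @words_to_ignore ||= DEFAULT_WORDS_TO_IGNORE.dup
    end

    # Appends extra words to the current list, dropping duplicates.
    def additional_words_to_ignore=(words)
      @words_to_ignore = (words_to_ignore + Array(words)).uniq
    end
  end
end

HuntSketch.configure do |config|
  config.searches_index_name = :"searches.default"
  config.additional_words_to_ignore = ["bang", "yabadabaduu"]
end

# Words on the ignore list are filtered out before indexing.
words = "bang the drum".split.reject { |word| HuntSketch.words_to_ignore.include?(word) }
```

Using a setter (`additional_words_to_ignore=`) means the option must be assigned with `=` inside the block; calling it without `=` would look for a differently named method.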

@jnunemaker
Owner

This is cool.

@jnunemaker

I intentionally don't force this index as it might not be the one you want. If you are scoping search to user_id for example, you want a compound index and having this one is just cruft that never gets used.
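
To make the point about compound indexes concrete, here is a deliberately simplified sketch of the index prefix idea. The `serves_all_fields?` helper is hypothetical and is not how MongoDB's query planner works internally; `ASCENDING = 1` stands in for `Mongo::ASCENDING` so the example runs without the driver.

```ruby
ASCENDING = 1 # stand-in for Mongo::ASCENDING

SINGLE_INDEX   = [[:"searches.default", ASCENDING]]
COMPOUND_INDEX = [[:user_id, ASCENDING], [:"searches.default", ASCENDING]]

# Simplified model: an index serves every filtered field when its leading
# fields are exactly the fields the query filters on.
def serves_all_fields?(index, filter_fields)
  leading = index.map(&:first).first(filter_fields.size)
  leading.sort_by(&:to_s) == filter_fields.sort_by(&:to_s)
end

scoped_search = [:user_id, :"searches.default"]

serves_all_fields?(COMPOUND_INDEX, scoped_search) # => true
serves_all_fields?(SINGLE_INDEX, scoped_search)   # => false
```

Under this model, a search scoped to `user_id` is fully served by the compound index, so a forced single-key index on `searches.default` would add write cost without helping that query.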

aokon and others added some commits
@jnunemaker jnunemaker closed this
Commits on Jul 28, 2011
  1. mongomapper 0.9.0 required that plugins will extend ActiveSupport::Concern, updated gems version in Gemfile

    Adam Okoń authored
  2. fixed self.configure was replaced with self.included to enable before_save callback

    Adam Okoń authored
  3. Added posibility to update Hunt words to ignore list

    Adam Okoń authored
Commits on Jul 29, 2011
  1. Updated documentation => Usage section

    Adam Okoń authored
Commits on Aug 17, 2011
  1. added ensure_index for searches.default in searches

    Adam Okoń authored
Commits on Oct 15, 2011
  1. @aokon
Commits on Oct 24, 2011
  1. Added transliteration_option config option for Hunt

    Adam Okoń authored
    Update configure block => use equals for additional_words_to_ignore
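
For readers unfamiliar with transliteration, the following sketch shows the general idea of folding non-ASCII words to ASCII before splitting them into search terms. It does not use babosa: `transliterate_to_ascii`, `to_ascii_words`, and the small `FALLBACK` table are hypothetical, hand-rolled helpers built on Ruby's standard `String#unicode_normalize`, and they cover far fewer characters and no language-specific rules.

```ruby
# Characters that do not decompose under NFD need an explicit mapping.
# This table is an assumption for the example and is far from complete.
FALLBACK = { "ł" => "l", "Ł" => "L", "ß" => "ss" }.freeze

def transliterate_to_ascii(text)
  text.unicode_normalize(:nfd)  # split letters from their combining marks
      .gsub(/\p{Mn}/, "")       # drop the combining marks
      .gsub(/[^[:ascii:]]/) { |char| FALLBACK.fetch(char, "?") }
end

def to_ascii_words(text)
  return [] if text.nil?
  ascii = text.ascii_only? ? text : transliterate_to_ascii(text)
  ascii.downcase.split(/\s+/).reject { |word| word.size < 2 }.uniq
end

to_ascii_words("łąkę źródło łódź Börse äußert")
# => ["lake", "zrodlo", "lodz", "borse", "aussert"]
```

A language-aware option such as the `:german` or `:cyrillic` setting used in this pull request can produce different output than this generic folding, which is why the option is configurable.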
3  Gemfile
@@ -1,6 +1,7 @@
source "http://rubygems.org"
-gem 'bson_ext', '1.1.5', :require => false
+gem 'bson_ext', '1.3.1', :require => false
+gem 'babosa'
gemspec
39 Gemfile.lock
@@ -1,29 +1,33 @@
PATH
remote: .
specs:
- hunt (0.2)
+ hunt (0.3)
fast-stemmer (~> 1.0)
- mongo_mapper (~> 0.8.6)
+ mongo_mapper (>= 0.9.0)
GEM
remote: http://rubygems.org/
specs:
- activesupport (3.0.3)
- bson (1.1.5)
- bson_ext (1.1.5)
+ activemodel (3.0.5)
+ activesupport (= 3.0.5)
+ builder (~> 2.1.2)
+ i18n (~> 0.4)
+ activesupport (3.0.5)
+ babosa (0.3.5)
+ bson (1.3.1)
+ bson_ext (1.3.1)
+ builder (2.1.2)
diff-lcs (1.1.2)
fast-stemmer (1.0.0)
i18n (0.4.2)
- jnunemaker-validatable (1.8.4)
- activesupport (>= 2.3.4)
- mongo (1.1.5)
- bson (>= 1.1.5)
- mongo_mapper (0.8.6)
- activesupport (>= 2.3.4)
- jnunemaker-validatable (~> 1.8.4)
- plucky (~> 0.3.6)
- plucky (0.3.6)
- mongo (~> 1.1)
+ mongo (1.3.1)
+ bson (>= 1.3.1)
+ mongo_mapper (0.9.1)
+ activemodel (~> 3.0)
+ activesupport (~> 3.0)
+ plucky (~> 0.3.8)
+ plucky (0.3.8)
+ mongo (~> 1.3)
rspec (2.3.0)
rspec-core (~> 2.3.0)
rspec-expectations (~> 2.3.0)
@@ -37,9 +41,8 @@ PLATFORMS
ruby
DEPENDENCIES
- bson_ext (= 1.1.5)
- fast-stemmer (~> 1.0)
+ babosa
+ bson_ext (= 1.3.1)
hunt!
i18n
- mongo_mapper (~> 0.8.6)
rspec (~> 2.3)
13 README.rdoc
@@ -18,12 +18,23 @@ Declare the plugin.
searches :title, :body, :tags
end
+You can configure Hunt by calling Hunt.configure with a block of options:
+
+ Hunt.configure do |config|
+ config.searches_index_name = :"searches.default"
+ config.additional_words_to_ignore = ["bang", 'yabadabaduu']
+ end
+
This creates a key named searches that is a Hash. Title, body, and tags get mashed together before save into a unique array of stemmed words and stored in searches.default.
+If a word contains non-ASCII characters, they are transliterated to ASCII using the babosa gem.
You can index the terms individually or with any other combination of keys.
Note.ensure_index :'searches.default' # or ...
- Note.ensure_index [[:user_id, Mongo::Ascending], [:'searches.default', Mongo::Ascending]]
+ Note.ensure_index [[:user_id, Mongo::ASCENDING], [:'searches.default', Mongo::ASCENDING]] # or use config block
+ Hunt.configure do |config|
+ config.searches_index_name = :"searches.default"
+ end
You also get a search class method that returns a scope.
2  hunt.gemspec
@@ -13,7 +13,7 @@ Gem::Specification.new do |s|
s.description = %q{Really basic search for MongoMapper models.}
s.add_dependency 'fast-stemmer', '~> 1.0'
- s.add_dependency 'mongo_mapper', '~> 0.8.6'
+ s.add_dependency 'mongo_mapper', '>= 0.9.0'
s.add_development_dependency 'rspec', '~> 2.3'
37 lib/hunt.rb
@@ -3,8 +3,38 @@
require 'hunt/util'
module Hunt
- def self.configure(model)
- model.before_save(:index_search_terms)
+ extend ActiveSupport::Concern
+
+ @@searches_index_name = nil
+
+ class << self
+ def included(model)
+ model.before_save(:index_search_terms)
+ end
+
+ def configure(&block)
+ block.call(self) if block_given?
+ end
+
+ def additional_words_to_ignore=(words)
+ Util.update_words_to_ignore(words) if words.any?
+ end
+
+ def searches_index_name=(value)
+ @@searches_index_name = value
+ end
+
+ def searches_index_name
+ @@searches_index_name
+ end
+
+ def transliteration_option=(value)
+ Util.transliteration_option = value
+ end
+
+ def transliteration_option
+ Util.transliteration_option
+ end
end
module ClassMethods
@@ -15,6 +45,7 @@ def search_keys
def searches(*keys)
# Using a hash to support multiple indexes per document at some point
key(:searches, Hash)
+ ensure_index(Hunt.searches_index_name) unless Hunt.searches_index_name.nil?
@search_keys = keys
end
@@ -32,4 +63,4 @@ def index_search_terms
self.searches['default'] = Util.to_stemmed_words(concatted_search_values)
end
end
-end
+end
23 lib/hunt/util.rb
@@ -27,6 +27,24 @@ module Util
"the", "how"
]
+ @@transliteration_option = nil
+
+ def words_to_ignore
+ @@words_to_ignore ||= WordsToIgnore
+ end
+
+ def update_words_to_ignore(value)
+ @@words_to_ignore = words_to_ignore + value
+ end
+
+ def transliteration_option=(value)
+ @@transliteration_option = value
+ end
+
+ def transliteration_option
+ @@transliteration_option
+ end
+
def strip_puncuation(value)
value.to_s.gsub(StripPunctuationRegex, PunctuationReplacement)
end
@@ -36,13 +54,14 @@ def stem(word)
end
def to_words(value)
+ value = value.ascii_only? ? value : value.to_slug.transliterate(transliteration_option) unless value.nil?
value.
to_s.
squeeze(Separator).
split(Separator).
map { |word| word.downcase }.
reject { |word| word.size < 2 }.
- reject { |word| WordsToIgnore.include?(word) }.
+ reject { |word| words_to_ignore.include?(word) }.
map { |word| strip_puncuation(word) }.
reject { |word| word.blank? }.
uniq
@@ -54,4 +73,4 @@ def to_stemmed_words(value)
extend self
end
-end
+end
6 spec/data/samples.yml
@@ -0,0 +1,6 @@
+german:
+ note_title: 'Jürgen Müller'
+ search_phrase: "Jürgen"
+
+cyrillic:
+ sample: "Карта сайта"
6 spec/helper.rb
@@ -14,4 +14,8 @@
c.before(:each) do
MongoMapper.database.collections.each(&:remove)
end
-end
+end
+
+def data_samples
+ @data_samples ||= YAML.load(File.open(File.expand_path("../data/samples.yml", __FILE__)))
+end
17 spec/hunt/util_spec.rb
@@ -1,3 +1,4 @@
+# encoding: utf-8
require 'helper'
describe Hunt::Util do
@@ -16,6 +17,11 @@
end
describe ".to_words" do
+
+ after(:all) do
+ Hunt::Util.transliteration_option = nil
+ end
+
it "does not fail with nil" do
Hunt::Util.to_words(nil).should == []
end
@@ -52,6 +58,15 @@
it "removes duplicates" do
Hunt::Util.to_words('boom boom').should == %w(boom)
end
+
+ it 'should transliterate unicode words to ascii format' do
+ Hunt::Util.to_words('łąkę źródło łódź Börse äußert').should == %w(lake zrodlo lodz borse aussert)
+ end
+
+ it "should transliterate cyrillic words with transliteration_option" do
+ Hunt::Util.transliteration_option = :cyrillic
+ Hunt::Util.to_words(data_samples['cyrillic']['sample']).should == %w(karta sajta)
+ end
end
describe ".to_stemmed_words" do
@@ -59,4 +74,4 @@
Hunt::Util.to_stemmed_words('I just Caught you kissing.').should == %w(just caught kiss)
end
end
-end
+end
56 spec/hunt_spec.rb
@@ -1,4 +1,5 @@
require 'helper'
+#encoding: utf-8
class Note
include MongoMapper::Document
@@ -13,6 +14,7 @@ class Note
key :user_id, ObjectId
belongs_to :user
+
end
class User
@@ -42,6 +44,58 @@ class User
Note.search('').count.should == 0
end
+
+ context 'using .configure' do
+ after(:all) do
+ Hunt.configure do |config|
+ config.transliteration_option = nil
+ end
+ end
+
+ it 'returns a query result when black list was not updated' do
+ Hunt.configure
+ Note.create(:title => 'bang yabadabaduu')
+ Note.search('bang').count.should == 1
+ Note.search('yabadabaduu').count.should == 1
+ end
+
+ it 'should omit words which were added to black list' do
+ Hunt.configure do |config|
+ config.additional_words_to_ignore = ["bang", 'yabadabaduu']
+ end
+ Note.create(:title => 'bang yabadabaduu')
+ Note.search('bang').count.should == 0
+ Note.search('yabadabaduu').count.should == 0
+ end
+
+ it "adds index key as symbol if it was defined" do
+ searches_index_name = :"searches.default"
+ Hunt.configure do |config|
+ config.searches_index_name = searches_index_name
+ end
+ Hunt.searches_index_name.should == searches_index_name
+ end
+
+ it "adds index as array if it was defined" do
+ searches_index_name = [[:"searches.default", Mongo::ASCENDING], [:user_dir, Mongo::ASCENDING]]
+ Hunt.configure do |config|
+ config.searches_index_name = searches_index_name
+ end
+ Hunt.searches_index_name.should == searches_index_name
+ end
+
+ it "should set transliteration_option if it was defined" do
+ transliteration_option = :german
+ Hunt.transliteration_option.should be_nil
+ Hunt.configure do |config|
+ config.transliteration_option = transliteration_option
+ end
+ Hunt.transliteration_option.should == transliteration_option
+ Note.create(:title => data_samples["german"]['note_title'] )
+ Note.search(data_samples["german"]['search_phrase']).count.should == 1
+ end
+ end
+
context "chained on scope" do
before(:each) do
@user = User.create
@@ -182,4 +236,4 @@ class User
end
end
end
-end
+end