Add the ability to match on phrase instead of word. #23

Open

Andrew Harvey

There are some cases where it is necessary to match exact phrases rather
than individual words, so I have created a PhraseLoader and a
PhraseMatcher for working with phrases.

That means that if you have the phrase "awesome stuff", the term "awes"
will match, but the term "stuf" will not, because only prefixes of the
whole phrase are indexed.
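
To illustrate the difference, here is a minimal standalone sketch of the two prefix strategies. It does not use Soulmate itself: `phrase_prefixes` and `word_prefixes` are hypothetical names, stop words and normalisation are left out, and the minimum prefix length of 2 is an assumption inferred from the tests in this pull request.

```ruby
# Sketch only: standalone versions of the two prefix strategies.
# MIN_COMPLETE = 2 is an assumption inferred from the test expectations.
MIN_COMPLETE = 2

# Prefixes of the whole phrase, spaces included, with no normalisation.
def phrase_prefixes(phrase)
  (MIN_COMPLETE - 1..phrase.length - 1).map { |l| phrase[0..l] }
end

# Prefixes of each word separately (stop words and normalisation omitted).
def word_prefixes(phrase)
  phrase.split(' ').flat_map do |w|
    (MIN_COMPLETE - 1..w.length - 1).map { |l| w[0..l] }
  end.uniq
end

phrase_index = phrase_prefixes("awesome stuff")
word_index   = word_prefixes("awesome stuff")

puts phrase_index.include?("awes") # true
puts phrase_index.include?("stuf") # false
puts word_index.include?("stuf")   # true
```

With word prefixes, "stuf" is indexed because "stuff" is treated as its own word; with phrase prefixes, only strings that start at the beginning of "awesome stuff" are indexed.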

I have not yet enabled phrase loading through the CLI; for now phrases
can only be loaded explicitly with the PhraseLoader class. The
PhraseLoader has an identical interface to the standard Loader, but it
refrains from normalising anything, because that gets messy with phrases.
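
As a rough picture of what loading without normalisation implies, here is an in-memory sketch. It is not Soulmate's code and does not touch Redis: `PhraseIndexSketch` and `MIN_PREFIX` are hypothetical names, and plain hashes stand in for the Redis hash and sorted sets. The term and each alias are indexed as separate phrases, and lookups are case-sensitive because nothing is normalised.

```ruby
# Sketch only: an in-memory stand-in for the Redis structures.
# PhraseIndexSketch and MIN_PREFIX are hypothetical names.
class PhraseIndexSketch
  MIN_PREFIX = 2 # assumed minimum prefix length

  def initialize
    @items = {}                            # id => item hash
    @index = Hash.new { |h, k| h[k] = {} } # prefix => { id => score }
  end

  # Index the term and every alias as separate phrases, without normalising.
  def add(item)
    raise ArgumentError unless item["id"] && item["term"]
    remove(item) # drop any old item with this id
    @items[item["id"]] = item
    phrases_for(item).each do |phrase|
      prefixes(phrase).each { |p| @index[p][item["id"]] = item["score"].to_f }
    end
  end

  # Undo add; only the id of the given item is used.
  def remove(item)
    prev = @items.delete(item["id"])
    return unless prev
    phrases_for(prev).each do |phrase|
      prefixes(phrase).each do |p|
        @index[p].delete(prev["id"])
        @index.delete(p) if @index[p].empty?
      end
    end
  end

  # Highest-scoring items whose term or alias starts with the given text.
  def matches_for_term(term, limit = 5)
    return [] if term.empty? || !@index.key?(term)
    @index[term].sort_by { |_id, score| -score }.first(limit).map { |id, _| @items[id] }
  end

  private

  def phrases_for(item)
    [item["term"]] + (item["aliases"] || [])
  end

  def prefixes(phrase)
    (MIN_PREFIX - 1..phrase.length - 1).map { |l| phrase[0..l] }
  end
end

index = PhraseIndexSketch.new
index.add("id" => 1, "term" => "awesome stuff", "score" => 10, "aliases" => ["great things"])
index.add("id" => 2, "term" => "awesome sauce", "score" => 20)

p index.matches_for_term("awesome s").map { |i| i["id"] } # [2, 1]
p index.matches_for_term("great").map { |i| i["id"] }     # [1]
p index.matches_for_term("Awes").map { |i| i["id"] }      # []
```

The last lookup returns nothing because "Awes" differs in case from the stored phrase, so callers would need to supply terms in the same form as the loaded data.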

When it comes to matching, supplying phrase=true as part of the query
string tells Soulmate::Server to use the PhraseMatcher, which matches
against the loaded phrases.
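
For reference, a client could build such a query string roughly as follows. This is only a sketch: `search_url` is a hypothetical helper, and the host and the "/search" path are assumptions for illustration; the parameter names (term, types, phrase) are the ones read by the server code in this pull request.

```ruby
require 'uri'

# Sketch only: the host and "/search" path are assumptions for illustration.
def search_url(term, types, phrase: false)
  params = [["term", term]]
  types.each { |t| params << ["types[]", t] }
  # Send the parameter only when phrase matching is wanted: the server-side
  # check is a plain truthiness test on params[:phrase], so any value,
  # including the string "false", would select the PhraseMatcher.
  params << ["phrase", "true"] if phrase
  "http://localhost/search?" + URI.encode_www_form(params)
end

puts search_url("awes", ["venues"], phrase: true)
puts search_url("awes", ["venues"])
```

Omitting the parameter entirely is the only way to fall back to the standard Matcher with the check as written.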

Andrew Harvey mootpointer Add the ability to match on phrase instead of word.
7255823
Andrew Harvey

This code is currently running in production, and working quite well.

Marc MacLeod

I would be interested in this, +1

Andrew Harvey

@marbemac If you look at Westfield.com.au, our search box currently has this pull request powering our typeahead.


Showing 1 unique commit by 1 author.

Apr 02, 2012
2  lib/soulmate.rb
@@ -7,6 +7,8 @@
 require 'soulmate/base'
 require 'soulmate/matcher'
 require 'soulmate/loader'
+require 'soulmate/phrase_loader'
+require 'soulmate/phrase_matcher'
 
 module Soulmate
 
9 lib/soulmate/helpers.rb
@@ -1,13 +1,18 @@
 module Soulmate
   module Helpers
 
-    def prefixes_for_phrase(phrase)
+    def word_prefixes_for_phrase(phrase)
       words = normalize(phrase).split(' ').reject do |w|
         Soulmate.stop_words.include?(w)
       end
       words.map do |w|
         (MIN_COMPLETE-1..(w.length-1)).map{ |l| w[0..l] }
       end.flatten.uniq
+
+    end
+
+    def prefixes_for_phrase(phrase)
+      (MIN_COMPLETE-1..(phrase.length-1)).map{ |l| phrase[0..l] }.flatten.uniq
     end
 
     def normalize(str)
@@ -15,4 +20,4 @@ def normalize(str)
     end
 
   end
-end
+end
6 lib/soulmate/loader.rb
@@ -37,7 +37,7 @@ def add(item, opts = {})
         # store the raw data in a separate key to reduce memory usage
         Soulmate.redis.hset(database, item["id"], MultiJson.encode(item))
         phrase = ([item["term"]] + (item["aliases"] || [])).join(' ')
-        prefixes_for_phrase(phrase).each do |p|
+        word_prefixes_for_phrase(phrase).each do |p|
           Soulmate.redis.sadd(base, p) # remember this prefix in a master set
           Soulmate.redis.zadd("#{base}:#{p}", item["score"], item["id"]) # store the id of this term in the index
         end
@@ -53,7 +53,7 @@ def remove(item)
         Soulmate.redis.pipelined do
           Soulmate.redis.hdel(database, prev_item["id"])
           phrase = ([prev_item["term"]] + (prev_item["aliases"] || [])).join(' ')
-          prefixes_for_phrase(phrase).each do |p|
+          word_prefixes_for_phrase(phrase).each do |p|
             Soulmate.redis.srem(base, p)
             Soulmate.redis.zrem("#{base}:#{p}", prev_item["id"])
           end
@@ -61,4 +61,4 @@ def remove(item)
       end
     end
   end
-end
+end
45 lib/soulmate/phrase_loader.rb
@@ -0,0 +1,45 @@
+module Soulmate
+
+  class PhraseLoader < Loader
+
+    # "id", "term", "score", "aliases", "data"
+    def add(item, opts = {})
+      opts = { :skip_duplicate_check => false }.merge(opts)
+      raise ArgumentError unless item["id"] && item["term"]
+
+      # kill any old items with this id
+      remove("id" => item["id"]) unless opts[:skip_duplicate_check]
+
+      Soulmate.redis.pipelined do
+        # store the raw data in a separate key to reduce memory usage
+        Soulmate.redis.hset(database, item["id"], MultiJson.encode(item))
+        phrases = ([item["term"]] + (item["aliases"] || []))
+        phrases.each do |phrase|
+          prefixes_for_phrase(phrase).each do |p|
+            Soulmate.redis.sadd(base, p) # remember this prefix in a master set
+            Soulmate.redis.zadd("#{base}:#{p}", item["score"], item["id"]) # store the id of this term in the index
+          end
+        end
+      end
+    end
+
+    # remove only cares about an item's id, but for consistency takes an object
+    def remove(item)
+      prev_item = Soulmate.redis.hget(database, item["id"])
+      if prev_item
+        prev_item = MultiJson.decode(prev_item)
+        # undo the operations done in add
+        Soulmate.redis.pipelined do
+          Soulmate.redis.hdel(database, prev_item["id"])
+          phrases = ([prev_item["term"]] + (prev_item["aliases"] || []))
+          phrases.each do |phrase|
+            prefixes_for_phrase(phrase).each do |p|
+              Soulmate.redis.srem(base, p)
+              Soulmate.redis.zrem("#{base}:#{p}", prev_item["id"])
+            end
+          end
+        end
+      end
+    end
+  end
+end
28 lib/soulmate/phrase_matcher.rb
@@ -0,0 +1,28 @@
+module Soulmate
+
+  class PhraseMatcher < Base
+
+    def matches_for_term(term, options = {})
+      options = { :limit => 5, :cache => true }.merge(options)
+
+      return [] if term.empty?
+
+      cachekey = "#{cachebase}:" + term
+
+      if !options[:cache] || !Soulmate.redis.exists(cachekey)
+        interkeys = ["#{base}:#{term}"]
+        Soulmate.redis.zinterstore(cachekey, interkeys)
+        Soulmate.redis.expire(cachekey, 10 * 60) # expire after 10 minutes
+      end
+
+      ids = Soulmate.redis.zrevrange(cachekey, 0, options[:limit] - 1)
+      if ids.size > 0
+        results = Soulmate.redis.hmget(database, *ids)
+        results = results.reject{ |r| r.nil? } # handle cached results for ids which have since been deleted
+        results.map { |r| MultiJson.decode(r) }
+      else
+        []
+      end
+    end
+  end
+end
3  lib/soulmate/server.rb
@@ -23,10 +23,11 @@ class Server < Sinatra::Base
       limit = (params[:limit] || 5).to_i
       types = params[:types].map { |t| normalize(t) }
       term = params[:term]
+      matcher_class = params[:phrase] ? PhraseMatcher : Matcher
 
       results = {}
       types.each do |type|
-        matcher = Matcher.new(type)
+        matcher = matcher_class.new(type)
         results[type] = matcher.matches_for_term(term, :limit => limit)
       end
 
19 test/test_soulmate.rb
@@ -91,15 +91,22 @@ def test_can_update_items
 
   end
 
-  def test_prefixes_for_phrase
+  def test_prefixes_for_phrase_words
     loader = Soulmate::Loader.new('venues')
 
     Soulmate.stop_words = ['the']
 
-    assert_equal ["kn", "kni", "knic", "knick", "knicks"], loader.prefixes_for_phrase("the knicks")
-    assert_equal ["te", "tes", "test", "testi", "testin", "th", "thi", "this"], loader.prefixes_for_phrase("testin' this")
-    assert_equal ["te", "tes", "test", "testi", "testin", "th", "thi", "this"], loader.prefixes_for_phrase("testin' this")
-    assert_equal ["te", "tes", "test"], loader.prefixes_for_phrase("test test")
-    assert_equal ["so", "sou", "soul", "soulm", "soulma", "soulmat", "soulmate"], loader.prefixes_for_phrase("SoUlmATE")
+    assert_equal ["kn", "kni", "knic", "knick", "knicks"], loader.word_prefixes_for_phrase("the knicks")
+    assert_equal ["te", "tes", "test", "testi", "testin", "th", "thi", "this"], loader.word_prefixes_for_phrase("testin' this")
+    assert_equal ["te", "tes", "test", "testi", "testin", "th", "thi", "this"], loader.word_prefixes_for_phrase("testin' this")
+    assert_equal ["te", "tes", "test"], loader.word_prefixes_for_phrase("test test")
+    assert_equal ["so", "sou", "soul", "soulm", "soulma", "soulmat", "soulmate"], loader.word_prefixes_for_phrase("SoUlmATE")
+  end
+
+  def test_prefixes_for_phrase
+    loader = Soulmate::PhraseLoader.new('venues')
+    assert_equal ["th", "the", "the ", "the k", "the kn", "the kni", "the knic", "the knick", "the knicks"], loader.prefixes_for_phrase("the knicks")
+    # We don't normalise because it gets messy with whole phrases
+    assert_equal ["it", "it'", "it's"], loader.prefixes_for_phrase("it's")
   end
 end
