How do I add synonyms to a model? #63

Closed
klaut opened this issue Apr 2, 2014 · 9 comments

Comments

@klaut

klaut commented Apr 2, 2014

Hi all,

I am not sure how to do this:
I have an ActiveRecord model that has some attributes indexed. I would like to add synonyms for some of these fields, but I am not sure how to do this. Should it be put inside the mappings block? Is it even possible to do that with this gem?

Thank you in advance!

klaut changed the title from "How do I add a synonyms to a model?" to "How do I add synonyms to a model?" on Apr 2, 2014
@karmi
Contributor

karmi commented Apr 2, 2014

Yes, you have to set this up in the mapping block (or, generally speaking, in the mapping for the index). Have a look at the documentation and an example.

You will have to re-index all the data. If it's a development setup, the easiest way is to re-import it (Rake task or the import method).

@klaut
Author

klaut commented Apr 2, 2014

Thank you for your fast reply!

I guess what I am asking is how to do this with Elasticsearch::Model.

I have added it to the mappings like so:

settings index: { number_of_shards: 1, number_of_replicas: 0 },
         analysis: {
           filter: {
             synonym: {
               type: "synonym",
               synonyms: File.readlines(Rails.root.join("config", "analysis", "data", "synonym.txt")).map(&:chomp)
             }
           },
           analyzer: {
             synonym: {
               tokenizer: "whitespace",
               filter: ["synonym"]
             }
           }
         } do
  mapping do
    indexes :my_field, analyzer: 'synonym'
  end
end

But it does not seem to work, so I think I might have done this wrong?
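
The file-loading step in the snippet above can be pulled out into a small helper, which makes it easier to check on its own what ends up in the synonyms array. This is only a sketch: the helper name load_synonyms is made up for illustration, and it assumes one rule per line, with blank lines and lines starting with "#" ignored.

```ruby
# Hypothetical helper (not part of elasticsearch-model): read synonym rules
# from a plain-text file, one rule per line, skipping blank lines and
# lines that start with "#".
def load_synonyms(path)
  File.readlines(path, chomp: true)
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?('#') }
end

# Possible use inside the settings hash (the path is an assumption):
#   synonyms: load_synonyms(Rails.root.join('config', 'analysis', 'data', 'synonym.txt'))
```

Printing the result in a console before recreating the index shows whether the file is being read at all.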

@karmi
Contributor

karmi commented Apr 2, 2014

I'll need more info than "it does not seem to work" I'm afraid :)

So, can you configure the synonyms inline (not by external file), and post the config? Are you sure you have re-created the index with correct mappings? (Check localhost:9200/MYINDEX/_mapping or MyModel.mappings.to_hash) How do you verify whether it "works", by searching, or by using the analyze API?

@klaut
Author

klaut commented Apr 2, 2014

Ok, sorry, will try to be more elaborate :)

I have this mapping in my searchable model (I excluded the other fields, where I do not need synonyms, for the sake of readability here):

settings index: { number_of_shards: 1, number_of_replicas: 0 },
         analysis: {
           filter: {
             synonym: {
               type: "synonym",
               synonyms: [
                 "developer, programmer, hacker, web developer, coder, sofwtare developer, software engineer",
                 "designer, web designer, graphic",
                 "ux, user experience"
               ]
             }
           },
           analyzer: {
             synonym: {
               tokenizer: "whitespace",
               filter: ["synonym"]
             }
           }
         } do
  mapping do
    indexes :i_am_role, analyzer: 'synonym'
  end
end

Then I recreated the index, and this is what I get if I inspect it with Marvel:

GET /my_application/_mapping/

{
   "my_application": {
      "mappings": {
         "profile": {
            "properties": {
               "i_am_role": {
                  "type": "string",
                  "analyzer": "synonym"
               }
            }
         }
      }
   }
}

I am verifying by searching through Marvel. I have 7 records indexed, and 5 of them have the field set to either Developer, developer, or Programmer.

But when I do the search, only the record that has "developer" is returned. I am doing something wrong, I am sure. I just can't spot it :)

@klaut
Author

klaut commented Apr 2, 2014

Uhm, I just tried it with the API, and I think the synonym analyzer is not properly loaded or something, because I get an error:

 Profile.__elasticsearch__.client.indices.analyze text: 'programmer', analyzer: 'synonym'
Elasticsearch::Transport::Transport::Errors::BadRequest: [400] {"error":"ElasticsearchIllegalArgumentException[failed to find analyzer [synonym]]","status":400}
from /Users/tanja/.rvm/gems/ruby-2.1.1@headhunted/bundler/gems/elasticsearch-ruby-e4794be90094/elasticsearch-transport/lib/elasticsearch/transport/transport/base.rb:132:in `__raise_transport_error'
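
One thing that may contribute to the error above: the call does not name an index, and a custom analyzer such as "synonym" is defined in an index's settings, so it can only be resolved when the request targets that index. A minimal sketch of a wrapper that always passes the index follows. The helper names are made up, the keyword arguments mirror the call above plus index:, and the response is assumed to contain a "tokens" array of hashes with a "token" key.

```ruby
# Hypothetical helper: run text through an index's analyzer and return the tokens.
# `client` is assumed to respond to `indices.analyze` like the Elasticsearch
# Ruby client used in this thread.
def analyzed_tokens(client, index:, text:, analyzer: 'synonym')
  response = client.indices.analyze(index: index, text: text, analyzer: analyzer)
  response.fetch('tokens', []).map { |token| token['token'] }
end

# Hypothetical check: were all expected synonyms produced for the given text?
def synonyms_applied?(tokens, expected)
  (expected - tokens).empty?
end

# Possible use from a console (requires a running Elasticsearch):
#   tokens = analyzed_tokens(Profile.__elasticsearch__.client,
#                            index: Profile.index_name, text: 'programmer')
#   synonyms_applied?(tokens, %w[developer hacker])
```

Newer client versions may expect the text and analyzer inside a request body instead, so the call may need adapting.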

@karmi
Contributor

karmi commented Apr 17, 2014

You forgot to create the index with proper settings and mappings, I think -- Profile.__elasticsearch__.create_index! force: true

Working code:

require 'pry'

require 'logger'
require 'ansi/core'
require 'active_record'
require 'active_support/core_ext/numeric'
require 'active_support/core_ext/hash'

require 'elasticsearch/model'

ActiveRecord::Base.logger = ActiveSupport::Logger.new(STDOUT)
ActiveRecord::Base.establish_connection( adapter: 'sqlite3', database: ":memory:" )

ActiveRecord::Schema.define(version: 1) do
  create_table :people do |t|
    t.string :name
    t.string :occupation
    t.timestamps
  end
end

Elasticsearch::Model.client= Elasticsearch::Client.new log: true

class Person < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks

  settings index: { number_of_shards: 1, number_of_replicas: 0 },
           analysis: {
             filter: {
               synonym: {
                 type: "synonym",
                 synonyms: [
                   "developer, programmer, hacker, web developer, coder, sofwtare developer, software engineer",
                   "designer, web designer, graphic",
                   "ux, user experience"
                 ]
               }
             },
             analyzer: {
               synonym: {
                 tokenizer: "whitespace",
                 filter: ["synonym"]
               }
             }
           } do
    mapping do
      indexes :name
      indexes :occupation, analyzer: 'synonym'
    end
  end
end

Person.__elasticsearch__.create_index! force: true

# Store data
#
Person.delete_all
Person.create name: 'John', occupation: 'developer'
Person.__elasticsearch__.refresh_index!

puts '', '-'*Pry::Terminal.width!

p Person.__elasticsearch__.client.indices.analyze(index: 'people', text: 'developer', analyzer: 'synonym')['tokens'].map { |d| d['token'] }
# => ["developer",
#     "programmer",
#     "hacker",
#     "web",
#     "coder",
#     "sofwtare",
#     "software",
#     "developer",
#     "developer",
#     "engineer"]

p Person.search('occupation:hacker').to_a.first.name
# => "John"

binding.pry;

@klaut
Author

klaut commented Apr 17, 2014

Uhm, yeah. That is weird. I tried everything again, and now it works. I must have forgotten to reindex, although I could swear I did it.
Sorry about that. Thanks for looking at it! Closing this because it is working now.

These are my final settings and mappings:

settings index: { number_of_shards: 1, number_of_replicas: 0 },
         analysis: {
           filter: {
             synonym: {
               type: "synonym",
               ignore_case: true,
               synonyms: [
                 "developer,programmer,hacker,coder,software,engineer",
                 "designer,web designer,graphic",
                 "ux,user experience",
                 "copywriter,writer,blogger,journalist"
               ]
             }
           },
           analyzer: {
             synonym: {
               tokenizer: "whitespace",
               filter: ["synonym", "lowercase", "stop", "snowball"]
             }
           }
         } do
  mapping do ....

👍 :)

klaut closed this as completed on Apr 17, 2014
@karmi
Contributor

karmi commented Apr 17, 2014

Great, glad it's working :)

@daino3

daino3 commented Mar 27, 2015

Hi guys. I'm not sure if this should be a new issue, or remain on this thread, but I'll mention it here and go from there.

In the example above, multi-word / phrase synonyms are being treated as single terms ("web developer" ~> "web", "developer"). How would we incorporate / treat phrase synonyms? The docs mention contraction, but don't give code / json examples of how to incorporate these.
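
One possible direction for the contraction approach mentioned above, offered as an untested sketch rather than a confirmed answer: the synonym filter accepts Solr-style explicit rules of the form "a, b => c", where the terms on the left are replaced by the term on the right, so a phrase can be collapsed into a single token. The filter and analyzer name role_synonym and the rules themselves are made up for illustration.

```ruby
# Hypothetical analysis settings using explicit ("contraction") synonym rules.
# Everything left of "=>" is replaced by the single term on the right.
CONTRACTION_ANALYSIS = {
  filter: {
    role_synonym: {
      type: 'synonym',
      ignore_case: true,
      synonyms: [
        'web developer, software engineer, programmer, coder => developer',
        'web designer, graphic designer => designer',
        'user experience => ux'
      ]
    }
  },
  analyzer: {
    role_synonym: {
      tokenizer: 'whitespace',
      filter: %w[lowercase role_synonym]
    }
  }
}.freeze

# Possible use in a model (assumption, mirroring the settings calls above):
#   settings index: { number_of_shards: 1, number_of_replicas: 0 },
#            analysis: CONTRACTION_ANALYSIS do ... end
```

Whether a phrase is also contracted at search time likely depends on the query type, since the query text has to reach the analyzer as a whole phrase. Checking both sides with the analyze API is the safest way to find out.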
