SearchFlip
Full-Featured ElasticSearch Ruby Client with a Chainable DSL
Using SearchFlip it is dead-simple to create index classes that correspond to ElasticSearch indices and to manipulate, query and aggregate these indices using a chainable, concise, yet powerful DSL. SearchFlip supports ElasticSearch 1.x, 2.x, 5.x, 6.x and 7.x. See the Feature Support section for version-dependent features.
CommentIndex.search("hello world", default_field: "title").where(visible: true).aggregate(:user_id).sort(id: "desc")
CommentIndex.aggregate(:user_id) do |aggregation|
aggregation.aggregate(histogram: { date_histogram: { field: "created_at", interval: "month" }})
end
CommentIndex.range(:created_at, gt: Date.today - 1.week, lt: Date.today).where(state: ["approved", "pending"])
Updating from previous SearchFlip versions
Check out UPDATING.md for detailed instructions.
Comparison with other gems
There are already great Ruby gems for working with Elasticsearch, e.g. searchkick and elasticsearch-ruby. However, they don't have a chainable API. Compare for yourself.
# elasticsearch-ruby
Comment.search(
query: {
query_string: {
query: "hello world",
default_operator: "AND"
}
}
)
# searchkick
Comment.search("hello world", where: { available: true }, order: { id: "desc" }, aggs: [:username])
# search_flip
CommentIndex.where(available: true).search("hello world").sort(id: "desc").aggregate(:username)
Finally, SearchFlip comes with a minimal set of dependencies (http-rb, hashie and oj only).
Reference Docs
SearchFlip has great documentation. Check it out yourself at http://www.rubydoc.info/github/mrkamel/search_flip
Install
Add this line to your application's Gemfile:
gem 'search_flip'
and then execute
$ bundle
or install it via
$ gem install search_flip
Config
You can change global config options like:
SearchFlip::Config[:environment] = "development"
SearchFlip::Config[:base_url] = "http://127.0.0.1:9200"
Available config options are:
- index_prefix to have a prefix added to your index names automatically. This can be useful to separate the indices of e.g. testing and development environments.
- base_url to tell SearchFlip how to connect to your cluster
- bulk_limit a global limit for bulk requests
- bulk_max_mb a global limit for the payload of bulk requests
- auto_refresh tells SearchFlip to automatically refresh an index after import, index, delete, etc. operations. This is useful e.g. for testing. Defaults to false.
Usage
First, create a separate class for your index and include SearchFlip::Index.
class CommentIndex
include SearchFlip::Index
end
Then tell the index about the index name, the corresponding model and how to serialize the model for indexing.
class CommentIndex
include SearchFlip::Index
def self.index_name
"comments"
end
def self.model
Comment
end
def self.serialize(comment)
{
id: comment.id,
username: comment.username,
title: comment.title,
message: comment.message
}
end
end
Optionally, you can specify a custom type_name, but note that starting with
Elasticsearch 7, types are deprecated.
class CommentIndex
# ...
def self.type_name
"comment"
end
end
You can additionally specify an index_scope which will automatically be
applied to scopes, e.g. ActiveRecord::Relation objects, passed to #import,
#index, etc. This can be used to preload associations that are used when
serializing records or to restrict the records you want to index.
class CommentIndex
# ...
def self.index_scope(scope)
scope.preload(:user)
end
end
CommentIndex.import(Comment.all) # => CommentIndex.import(Comment.all.preload(:user))
To specify a custom mapping:
class CommentIndex
# ...
def self.mapping
{
properties: {
# ...
}
}
end
# ...
end
Please note that you need to specify the mapping without a type name, even for Elasticsearch versions before 7, as SearchFlip will add the type name automatically if necessary.
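As a rough illustration, a mapping for the fields serialized above might look like the following. The field types are assumptions chosen for this sketch (keyword and text require Elasticsearch 5 or later) and the COMMENT_MAPPING constant is a hypothetical name; SearchFlip does not prescribe any of them.

```ruby
# Hypothetical mapping for the id, username, title and message fields.
# The concrete field types are assumptions made for illustration only.
COMMENT_MAPPING = {
  properties: {
    id: { type: "long" },
    username: { type: "keyword" },
    title: { type: "text" },
    message: { type: "text" }
  }
}.freeze

# Inside CommentIndex, self.mapping would simply return this hash,
# without wrapping it in a type name.
```

Note that the hash starts directly with properties and is not nested under a type name.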
To specify index settings:
def self.index_settings
{
settings: {
number_of_shards: 10,
number_of_replicas: 2
}
}
end
Then you can interact with the index:
CommentIndex.create_index
CommentIndex.index_exists?
CommentIndex.delete_index
CommentIndex.update_mapping
CommentIndex.close_index
CommentIndex.open_index
Index records (automatically uses the bulk API):
CommentIndex.import(Comment.all)
CommentIndex.import(Comment.first)
CommentIndex.import([Comment.find(1), Comment.find(2)])
CommentIndex.import(Comment.where("created_at > ?", Time.now - 7.days))
Query records:
CommentIndex.total_entries
# => 2838
CommentIndex.search("title:hello").records
# => [#<Comment ...>, #<Comment ...>, ...]
CommentIndex.where(username: "mrkamel").total_entries
# => 13
CommentIndex.aggregate(:username).aggregations(:username)
# => {1=>#<SearchFlip::Result doc_count=37 ...>, 2=>... }
...
CommentIndex.search("hello world").sort(id: "desc").aggregate(:username).request
# => {:query=>{:bool=>{:must=>[{:query_string=>{:query=>"hello world", :default_operator=>:AND}}]}}, ...}
Delete records:
# for ElasticSearch >= 2.x and < 5.x, the delete-by-query plugin is required
# for the following query:
CommentIndex.match_all.delete
# or delete manually via the bulk API:
CommentIndex.bulk do |indexer|
CommentIndex.match_all.find_each do |record|
indexer.delete record.id
end
end
Working with Elasticsearch Aliases
You can use and manage Elasticsearch Aliases like the following:
class UserIndex
include SearchFlip::Index
def self.index_name
alias_name
end
def self.alias_name
"users"
end
end
Then, create an index, import the records and add the alias like:
new_user_index = UserIndex.with_settings(index_name: "users-#{SecureRandom.hex}")
new_user_index.create_index
new_user_index.import User.all
new_user_index.connection.update_aliases(actions: [
add: { index: new_user_index.index_name, alias: new_user_index.alias_name }
])
If the alias already exists, you of course have to remove it first as well
within update_aliases.
Please note that with_settings(index_name: '...') returns an anonymous, i.e.
temporary, class inheriting from UserIndex and overwriting index_name.
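The following plain-Ruby sketch shows one way to build the actions for switching an existing alias to a new index. The alias_swap_actions helper, its keyword arguments and the index names are hypothetical; only the add/remove action format is taken from the Elasticsearch aliases API.

```ruby
# Builds the actions for an alias switch. The helper name and its keyword
# arguments are hypothetical; the add/remove action format is the one used
# by the Elasticsearch aliases API.
def alias_swap_actions(alias_name:, new_index:, old_indices: [])
  # Detach the alias from every index that currently carries it ...
  removals = old_indices.map do |old_index|
    { remove: { index: old_index, alias: alias_name } }
  end

  # ... and attach it to the new index within the same request.
  removals + [{ add: { index: new_index, alias: alias_name } }]
end

actions = alias_swap_actions(
  alias_name: "users",
  new_index: "users-3f2a",
  old_indices: ["users-91bc"]
)

# The result could then be passed on, e.g.:
# new_user_index.connection.update_aliases(actions: actions)
```

Sending the remove and add actions in a single update_aliases call keeps the switch within one request, so the alias does not have to be dropped in a separate step.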
Advanced Usage
SearchFlip supports even more advanced usages, e.g. post filters, filtered aggregations or nested aggregations, via simple-to-use API methods.
Post filters
All criteria methods (#where, #where_not, #range, etc.) are available
in post filter mode as well, i.e. as filters/queries applied after aggregations
are calculated. Check out the ElasticSearch docs for further info.
query = CommentIndex.aggregate(:user_id)
query = query.post_where(reviewed: true)
query = query.post_search("username:a*")
Check out PostFilterable for a complete API reference.
Aggregations
SearchFlip lets you elegantly specify nested aggregations, no matter how deeply nested:
query = OrderIndex.aggregate(:username, order: { revenue: "desc" }) do |aggregation|
aggregation.aggregate(revenue: { sum: { field: "price" }})
end
Generally, aggregation results returned by ElasticSearch are wrapped in a
SearchFlip::Result, which wraps a Hashie::Mash, such that you can access them
via:
query.aggregations(:username)["mrkamel"].revenue.value
Still, if you want to get the raw aggregations returned by ElasticSearch,
access them without supplying any aggregation name to #aggregations:
query.aggregations # => returns the raw aggregation section
query.aggregations["username"]["buckets"].detect { |bucket| bucket["key"] == "mrkamel" }["revenue"]["value"] # => 238.50
Once again, the criteria methods (#where, #range, etc.) are available in
aggregations as well:
query = OrderIndex.aggregate(average_price: {}) do |aggregation|
aggregation = aggregation.match_all
aggregation = aggregation.where(user_id: current_user.id) if current_user
aggregation.aggregate(average_price: { avg: { field: "price" }})
end
query.aggregations(:average_price).average_price.value
Check out Aggregatable as well as Aggregation for a complete API reference.
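To make the difference between the raw and the wrapped access style more concrete, the following plain-Ruby sketch works on a hand-written hash shaped like a raw terms aggregation response. The sample data is made up (only the 238.50 value is taken from the example above) and the buckets_by_key helper is hypothetical; it merely mimics the convenience of key-based lookup and is not how SearchFlip is implemented.

```ruby
# Hand-written sample shaped like a raw Elasticsearch terms aggregation
# with a nested sum aggregation. The data is invented for illustration.
raw_aggregations = {
  "username" => {
    "buckets" => [
      { "key" => "mrkamel", "doc_count" => 3, "revenue" => { "value" => 238.5 } },
      { "key" => "jane", "doc_count" => 1, "revenue" => { "value" => 19.99 } }
    ]
  }
}

# Hypothetical helper: index the buckets by their key for direct lookup.
def buckets_by_key(raw_aggregations, name)
  raw_aggregations.fetch(name).fetch("buckets").to_h do |bucket|
    [bucket["key"], bucket]
  end
end

by_username = buckets_by_key(raw_aggregations, "username")
revenue = by_username["mrkamel"]["revenue"]["value"]
```

With the buckets keyed this way, the lookup no longer needs the detect call used on the raw response.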
Suggestions
query = CommentIndex.suggest(:suggestion, text: "helo", term: { field: "message" })
query.suggestions(:suggestion).first["text"] # => "hello"
Highlighting
CommentIndex.highlight([:title, :message])
CommentIndex.highlight(:title).highlight(:description)
CommentIndex.highlight(:title, require_field_match: false)
CommentIndex.highlight(title: { type: "fvh" })
query = CommentIndex.highlight(:title).search("hello")
query.results[0]._hit.highlight.title # => "<em>hello</em> world"
Advanced Criteria Methods
There are even more methods to make your life easier, namely source,
scroll, profile, includes, preload, find_in_batches, find_each,
find_results_in_batches, failsafe and unscope to name just a few:
source
In case you want to restrict the returned fields, simply specify
the fields via #source:
CommentIndex.source([:id, :message]).search("hello world")
paginate, page, per
SearchFlip supports
will_paginate and
kaminari compatible pagination. Thus,
you can either use #paginate or #page in combination with #per:
CommentIndex.paginate(page: 3, per_page: 50)
CommentIndex.page(3).per(50)
scroll
You can also use the underlying scroll API directly, i.e. without using higher-level pagination:
query = CommentIndex.scroll(timeout: "5m")
until query.records.empty?
# ...
query = query.scroll(id: query.scroll_id, timeout: "5m")
end
profile
Use #profile to enable query profiling:
query = CommentIndex.profile(true)
query.raw_response["profile"] # => { "shards" => ... }
preload, eager_load and includes
Use the well-known ActiveRecord methods to load associated database records when fetching the respective records themselves. This works with other ORMs as well, if supported.
Using #preload:
CommentIndex.preload(:user, :post).records
PostIndex.preload(comments: :user).records
or #eager_load
CommentIndex.eager_load(:user, :post).records
PostIndex.eager_load(comments: :user).records
or #includes
CommentIndex.includes(:user, :post).records
PostIndex.includes(comments: :user).records
find_in_batches
Used to fetch and yield records in batches using the ElasticSearch scroll API. The batch size and scroll API timeout can be specified.
CommentIndex.search("hello world").find_in_batches(batch_size: 100) do |batch|
# ...
end
find_results_in_batches
Used like find_in_batches, but yielding the raw results instead of database
records. Again, the batch size and scroll API timeout can be specified.
CommentIndex.search("hello world").find_results_in_batches(batch_size: 100) do |batch|
# ...
end
find_each
Like #find_in_batches, #find_each fetches records in batches, but yields
one record at a time.
CommentIndex.search("hello world").find_each(batch_size: 100) do |record|
# ...
end
failsafe
Use #failsafe to prevent any exceptions from being raised for query string
syntax errors or ElasticSearch being unavailable, etc.
CommentIndex.search("invalid/request").execute
# raises SearchFlip::ResponseError
# ...
CommentIndex.search("invalid/request").failsafe(true).execute
# => #<SearchFlip::Response ...>
merge
You can merge criteria, i.e. combine the attributes (constraints, settings, etc.) of two individual criteria:
CommentIndex.where(approved: true).merge(CommentIndex.search("hello"))
# equivalent to: CommentIndex.where(approved: true).search("hello")
unscope
You can even remove certain already added scopes via #unscope:
CommentIndex.aggregate(:username).search("hello world").unscope(:search, :aggregate)
timeout
Specify a timeout to limit query processing time:
CommentIndex.timeout("3s").execute
terminate_after
Activate early query termination to stop query processing after the specified number of records has been found:
CommentIndex.terminate_after(10).execute
For further details and a full list of methods, check out the reference docs.
Using multiple Elasticsearch clusters
To use multiple Elasticsearch clusters, specify a connection within your indices:
class MyIndex
include SearchFlip::Index
def self.connection
@connection ||= SearchFlip::Connection.new(base_url: "http://elasticsearch.host:9200")
end
end
This allows you to use different clusters per index, e.g. when migrating indices to new versions of Elasticsearch.
Routing and other index-time options
Override index_options in case you want to use routing or pass other
index-time options:
class CommentIndex
include SearchFlip::Index
def self.index_options(comment)
{
routing: comment.user_id,
version: comment.version,
version_type: "external_gte"
}
end
end
These options will be passed whenever records get indexed, deleted, etc.
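As a sketch of what such options amount to, the snippet below builds the action line of an Elasticsearch bulk request by hand. It assumes that the options end up as metadata of the bulk action, which is where the bulk API accepts routing, version and version_type. The Comment struct and the bulk_action_line helper are stand-ins invented for this example, not SearchFlip internals.

```ruby
require "json"

# Stand-in for a model instance; invented for this example.
Comment = Struct.new(:id, :user_id, :version, keyword_init: true)

# Same shape as the index_options method shown above.
def index_options(comment)
  {
    routing: comment.user_id,
    version: comment.version,
    version_type: "external_gte"
  }
end

# Hypothetical helper: merges the options into the metadata of a bulk
# "index" action (an assumption about where the options end up).
def bulk_action_line(comment)
  JSON.generate({ index: { _id: comment.id }.merge(index_options(comment)) })
end

comment = Comment.new(id: 1, user_id: 42, version: 3)
line = bulk_action_line(comment)
```

Routing by user_id as above would place all comments of one user on the same shard.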
Non-ActiveRecord models
SearchFlip ships with built-in support for ActiveRecord models, but using
non-ActiveRecord models is very easy. The model must implement a find_each
class method and the Index class needs to implement Index.record_id and
Index.fetch_records. The default implementations for the index class are as
follows:
class MyIndex
include SearchFlip::Index
def self.record_id(object)
object.id
end
def self.fetch_records(ids)
model.where(id: ids)
end
end
Thus, simply add your custom implementation of those methods that work with whatever ORM you use.
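A minimal sketch of such a custom implementation, using an in-memory store instead of an ORM, could look like this. MemoryUser, its STORE constant and its find_all method are hypothetical names made up for the example, and the include of SearchFlip::Index is left out so that the snippet runs without the gem.

```ruby
# Minimal in-memory "model"; a stand-in for whatever ORM is in use.
class MemoryUser
  STORE = {}

  attr_reader :id, :name

  def initialize(id:, name:)
    @id = id
    @name = name
    STORE[id] = self
  end

  # The find_each class method required of the model: yields every record.
  def self.find_each(&block)
    STORE.each_value(&block)
  end

  # Hypothetical lookup used by fetch_records below.
  def self.find_all(ids)
    STORE.values_at(*ids).compact
  end
end

class MemoryUserIndex
  # In a real application: include SearchFlip::Index
  # (omitted here so the sketch runs without the gem).

  def self.model
    MemoryUser
  end

  # Maps a record to the id used for its document.
  def self.record_id(user)
    user.id
  end

  # Loads the records belonging to the given ids.
  def self.fetch_records(ids)
    model.find_all(ids)
  end
end

MemoryUser.new(id: 1, name: "alice")
MemoryUser.new(id: 2, name: "bob")

names = MemoryUserIndex.fetch_records([2, 1]).map(&:name)
```

The same three hooks (find_each, record_id and fetch_records) are what you would adapt for a real ORM.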
Date and Timestamps in JSON
ElasticSearch requires dates and timestamps to have one of the formats listed here: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html#strict-date-time.
However, JSON.generate in Ruby by default outputs something like:
JSON.generate(time: Time.now.utc)
# => "{\"time\":\"2018-02-22 18:19:33 UTC\"}"
This format is not compatible with ElasticSearch by default. If you're on
Rails, ActiveSupport adds its own #to_json methods to Time, Date, etc.
However, ActiveSupport checks whether they are used in combination with
JSON.generate or not and adapts:
Time.now.utc.to_json
# => "\"2018-02-22T18:18:22.088Z\""
JSON.generate(time: Time.now.utc)
# => "{\"time\":\"2018-02-22 18:18:59 UTC\"}"
SearchFlip is using the Oj gem to generate JSON. More concretely, SearchFlip is using:
Oj.dump({ key: "value" }, mode: :custom, use_to_json: true)
This mitigates the issues if you're on Rails:
Oj.dump(Time.now, mode: :custom, use_to_json: true)
# => "\"2018-02-22T18:21:21.064Z\""
However, if you're not on Rails, you need to add #to_json methods to Time,
Date and DateTime to get proper serialization. You can either add them on
your own, via other libraries or by simply using:
require "search_flip/to_json"
Feature Support
- #post_search and #profile are only supported with ElasticSearch version >= 2.
- for ElasticSearch 2.x, the delete-by-query plugin is required to delete records via queries
Keeping your Models and Indices in Sync
Besides the most basic approach to get you started, SearchFlip currently doesn't ship with any means to automatically keep your models and indices in sync, because every method is very much bound to the concrete environment and depends on your concrete requirements. In addition, the methods to achieve model/index consistency can get arbitrarily complex and we want to keep this bloat out of the SearchFlip codebase.
class Comment < ActiveRecord::Base
include SearchFlip::Model
notifies_index(CommentIndex)
end
It uses after_commit (if applicable, after_save, after_destroy and
after_touch otherwise) hooks to synchronously update the index when your
model changes.
Links
- ElasticSearch: https://www.elastic.co/
- Reference Docs: http://www.rubydoc.info/github/mrkamel/search_flip
- Travis: http://travis-ci.org/mrkamel/search_flip
- will_paginate: https://github.com/mislav/will_paginate
- kaminari: https://github.com/kaminari/kaminari
- Oj: https://github.com/ohler55/oj
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request
Running the test suite
Running the tests is super easy. The test suite uses sqlite, such that you only need to install ElasticSearch. You can install ElasticSearch on your own, or you can e.g. use docker-compose:
$ cd search_flip
$ sudo ES_IMAGE=elasticsearch:5.4 docker-compose up
$ rspec
That's it.
