Improved search feature (elasticsearch based, demo available) #455

Closed
wants to merge 22 commits into from
22 commits
e18cfb2
[SEARCH] Added "tire" dependency for searching Rubygems.org with elas…
karmi Aug 24, 2012
6632bc2
[SEARCH] Allow connections to elasticsearch [localhost:9200] in tests…
karmi Aug 24, 2012
79b6857
[SEARCH] Mock HTTP responses to Elasticsearch in unit tests
karmi May 10, 2013
70f0f1f
[SEARCH] Added, that factories trigger `touch` callbacks after create
karmi Aug 24, 2012
2e56fcb
[SEARCH] Added elementary Tire integration into the Rubygem model
karmi Aug 24, 2012
611774a
[SEARCH] Added the simplest possible search with elasticsearch
karmi Aug 24, 2012
2389438
[SEARCH] Added proper analyzer for Rubygem names
karmi Aug 25, 2012
bfd2aa7
[SEARCH] Changed the search definition to a DSL-based syntax, added s…
karmi Aug 25, 2012
db6e1d3
[SEARCH] Changed, that search results are ordered first by downloads,…
karmi Aug 25, 2012
aa63c84
[SEARCH] Added a more complex mapping definition and serialization fo…
karmi Aug 26, 2012
415ba28
[SEARCH] Added a more complex search query in the SearchesController#…
karmi Aug 26, 2012
f0d20d4
[SEARCH] Added a `rescue_from` failed search requests due to incorrec…
karmi Aug 26, 2012
5c13726
[SEARCH] Added a "user enters a search query with incorrect syntax" C…
karmi Aug 26, 2012
ac96085
[SEARCH] Added the "Search Advanced" Cucumber feature
karmi Aug 26, 2012
0346d62
[SEARCH] Added a Cucumber scenario for searching in gem authors
karmi Aug 28, 2012
4eb11a8
[SEARCH] Refactored the search steps to a higher-level nested step "I…
karmi Aug 28, 2012
d389af0
[SEARCH] Added "search tips" sliding panel at the search results page
karmi Aug 29, 2012
cf5beec
[SEARCH] Prevent indexing errors on Rubygem records without a version
karmi Apr 24, 2013
01f479b
[SEARCH] Added starting of "elasticsearch" in the Travis CI configura…
karmi Aug 29, 2012
382b3a6
[SEARCH] Added information about installing Elasticsearch into "Contr…
karmi Apr 25, 2013
c1f99aa
[SEARCH] Handle search engine being not available in user-friendly way
karmi Apr 25, 2013
06f2626
[SEARCH] Changed, that errors when indexing to Elasticsearch are rescued
karmi May 9, 2013
2 changes: 2 additions & 0 deletions .travis.yml
@@ -9,3 +9,5 @@ language: ruby
rvm:
- 1.9.3
script: bundle exec rake default
services:
- elasticsearch
18 changes: 11 additions & 7 deletions CONTRIBUTING.md
@@ -43,7 +43,7 @@ git remote set-url origin git@github.com:rubygems/rubygems.org.git

Otherwise, you can continue to hack away in your own fork.

If you’re looking for things to hack on, please check
If you’re looking for things to hack on, please check
[GitHub Issues](http://github.com/rubygems/rubygems.org/issues). If you’ve
found bugs or have feature ideas don’t be afraid to pipe up and ask the
[mailing list](http://groups.google.com/group/gemcutter) or IRC channel
@@ -127,19 +127,23 @@ running:
**version 2.0 or higher**. If you have homebrew,
do `brew install redis -H`, if you use macports,
do `sudo port install redis`.
* Install [Elasticsearch](http://www.elasticsearch.org).
You can do it with `brew install elasticsearch`,
or just download, unzip and run
a [release](http://www.elasticsearch.org/download/).
* Rubygems is configured to use PostgreSQL (>= 8.4.x),
for MySQL see below. Install with: `brew install postgres`

**Get the code:**

* Clone the repo: `git clone git://github.com/rubygems/rubygems.org`
* Move into your cloned rubygems directory if you haven’t already:
* Move into your cloned rubygems directory if you haven’t already:
`cd rubygems.org`

**Setup the database:**

* Get set up: `./script/setup`
* Run the database rake tasks if needed:
* Run the database rake tasks if needed:
`rake db:create:all db:drop:all db:setup db:test:prepare --trace`

**Running tests:**
@@ -151,7 +155,7 @@ running:

* Set the REDISTOGO_URL environment variable. For example:
`REDISTOGO_URL="redis://localhost:6379"`
* Import gems if you want to seed the database.
* Import gems if you want to seed the database.
`rake gemcutter:import:process PATHTO_GEMS/cache`
* _To import a small set of gems you can point the import process to any
gems cache directory, like a very small `rvm` gemset for instance._
@@ -188,8 +192,8 @@ running:

> **Warning:** Gem names are case sensitive (eg. `BlueCloth` vs.
> `bluecloth` 2). MySQL has a `utf8_bin` collation, but it appears
> that you still need to do `BINARY name = ?` for searching.
> It is recommended that you stick to PostgreSQL >= 8.4.x
> that you still need to do `BINARY name = ?` for searching.
> It is recommended that you stick to PostgreSQL >= 8.4.x
> for development. Some tests will also fail if you use MySQL
> because some queries use SQL functions which don't exist in MySQL.

1 change: 1 addition & 0 deletions Gemfile
@@ -31,6 +31,7 @@ gem 'validates_formatting_of'
gem 'will_paginate'
gem 'xml-simple'
gem 'yajl-ruby', :require => 'yajl'
gem 'tire'

# enable if on heroku, make sure to toss this into an initializer:
# Rails.application.config.middleware.use HerokuAssetCacher
11 changes: 11 additions & 0 deletions Gemfile.lock
@@ -30,6 +30,7 @@ GEM
multi_json (~> 1.0)
addressable (2.3.5)
aggregate (0.2.2)
ansi (1.4.3)
arel (3.0.2)
bcrypt-ruby (3.1.2)
bluepill (0.0.66)
@@ -101,6 +102,7 @@ GEM
gherkin (2.12.1)
multi_json (~> 1.3)
gravtastic (3.2.6)
hashr (0.0.22)
high_voltage (1.2.4)
highline (1.6.19)
hike (1.2.3)
@@ -212,6 +214,14 @@ GEM
thor (0.18.1)
tilt (1.4.1)
timecop (0.6.3)
tire (0.6.0)
activemodel (>= 3.0)
activesupport
ansi
hashr (~> 0.0.19)
multi_json (~> 1.3)
rake
rest-client (~> 1.6)
treetop (1.4.15)
polyglot
polyglot (>= 0.3.1)
@@ -279,6 +289,7 @@ DEPENDENCIES
shoulda
sinatra
timecop
tire
unicorn
validates_formatting_of
webmock
44 changes: 41 additions & 3 deletions app/controllers/searches_controller.rb
@@ -1,11 +1,49 @@
class SearchesController < ApplicationController

# Handle search engine not being available
#
rescue_from Errno::EHOSTUNREACH, Errno::ECONNREFUSED, SocketError do |error|
flash.now[:failure] = "Sorry, search is not available at the moment." if params[:query]
render :show, :status => :internal_server_error
end

# Indicate incorrect query to the user
#
rescue_from Tire::Search::SearchRequestFailed do |error|
Member


Does this cover the case where ES is completely unavailable/disconnected?

Author


No, that would have to be handled by a separate rescue_from clause, displaying an error such as "We're sorry, search is currently not available".
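The split described in this reply, one handler for an unreachable search engine and another for a malformed query, can be sketched in plain Ruby outside of Rails. This is only an illustration: `SearchRequestFailed` below is a local stand-in for `Tire::Search::SearchRequestFailed`, and `search_failure_message` is a hypothetical helper that is not part of this pull request.

```ruby
require 'socket' # defines SocketError

# Local stand-in for Tire::Search::SearchRequestFailed (assumption: the real
# exception carries the Elasticsearch error text in its message).
class SearchRequestFailed < StandardError; end

# Exceptions that indicate the search engine itself cannot be reached.
UNAVAILABLE_ERRORS = [Errno::EHOSTUNREACH, Errno::ECONNREFUSED, SocketError].freeze

# Hypothetical helper: maps an exception to the flash message shown to the
# user, or returns nil when no message applies.
def search_failure_message(error)
  case error
  when *UNAVAILABLE_ERRORS
    "Sorry, search is not available at the moment."
  when SearchRequestFailed
    "Sorry, your query is incorrect." if error.message =~ /SearchParseException/
  end
end
```

Keeping the two cases apart lets the page tell the user whether retrying later or rewriting the query is the right next step.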

flash.now[:failure] = "Sorry, your query is incorrect." if error.message =~ /SearchParseException/ && params[:query]
render :show, :status => :internal_server_error
end

def show
if params[:query]
@gems = Rubygem.search(params[:query]).with_versions.paginate(:page => params[:page])
if params[:query].present?
@gems = Rubygem.tire.search :page => params[:page],
:per_page => Rubygem.per_page,
:load => {:include => 'versions'} do |search|

search.query do |s|
s.filtered do |f|
f.query do |q|
q.boolean do |it|
it.should { |q| q.match 'name.raw', params[:query], :boost => 500 }
it.should { |q| q.match :name, params[:query], :type => 'phrase_prefix', :operator => 'and', :boost => 100 }
it.should { |q| q.string params[:query], :default_operator => 'and' }
end
end
f.filter :term, :indexed => true
end
end

search.sort do
by 'downloads', :desc
by 'name.raw', :asc
end

# STDOUT.puts search.to_curl if Rails.env.development?
end

@exact_match = Rubygem.name_is(params[:query]).with_versions.first

redirect_to rubygem_path(@exact_match) if @gems == [@exact_match]
redirect_to rubygem_path(@exact_match) if @exact_match && @gems.size == 1 && @gems.first.id == @exact_match.id
end
end
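For orientation, the Tire DSL block in `SearchesController#show` corresponds roughly to an Elasticsearch request body like the one built below. This is an approximation written by hand, not the JSON Tire literally serializes: `search_request_body` is a hypothetical helper, the default of 30 results per page is an assumption standing in for `Rubygem.per_page`, and the filter may be wrapped differently by the library.

```ruby
require 'json'

# Hypothetical helper: approximates the request body for the search above.
# The exact structure produced by Tire may differ in detail.
def search_request_body(query, page = 1, per_page = 30)
  {
    :query => {
      :filtered => {
        :query => {
          :bool => {
            :should => [
              # Exact match on the untokenized name scores highest.
              { :match => { 'name.raw' => { :query => query, :boost => 500 } } },
              # Prefix match on the analyzed name comes next.
              { :match => { :name => { :query => query, :type => 'phrase_prefix',
                                       :operator => 'and', :boost => 100 } } },
              # Finally, a Lucene-syntax query across the indexed fields.
              { :query_string => { :query => query, :default_operator => 'and' } }
            ]
          }
        },
        # Only gems with at least one indexed version are returned.
        :filter => { :term => { :indexed => true } }
      }
    },
    :sort => [{ 'downloads' => 'desc' }, { 'name.raw' => 'asc' }],
    :from => (page - 1) * per_page,
    :size => per_page
  }
end

puts JSON.pretty_generate(search_request_body('rack'))
```

The commented-out `search.to_curl` line in the controller is the way to see the request Tire actually sends.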

2 changes: 1 addition & 1 deletion app/helpers/application_helper.rb
@@ -16,7 +16,7 @@ def atom_feed_link(title, url)
end

def short_info(version)
truncate(version.info, :length => 100)
version ? truncate(version.info, :length => 100) : ''
end

def gravatar(size, id = "gravatar", user = current_user)
7 changes: 7 additions & 0 deletions app/helpers/searches_helper.rb
@@ -0,0 +1,7 @@
module SearchesHelper

def link_to_example_search(query)
link_to query, search_url( :query => query, :anchor => 'tips' )
end

end
4 changes: 2 additions & 2 deletions app/models/dependency.rb
@@ -1,8 +1,8 @@
class Dependency < ActiveRecord::Base
LIMIT = 250

belongs_to :rubygem
belongs_to :version
belongs_to :rubygem, :touch => true
belongs_to :version, :touch => true

before_validation :use_gem_dependency,
:use_existing_rubygem,
2 changes: 1 addition & 1 deletion app/models/linkset.rb
@@ -1,5 +1,5 @@
class Linkset < ActiveRecord::Base
belongs_to :rubygem
belongs_to :rubygem, :touch => true
attr_protected :rubygem_id

LINKS = %w(home wiki docs mail code bugs).freeze
2 changes: 1 addition & 1 deletion app/models/ownership.rb
@@ -1,5 +1,5 @@
class Ownership < ActiveRecord::Base
belongs_to :rubygem
belongs_to :rubygem, :touch => true
belongs_to :user

validates :user_id, :uniqueness => {:scope => :rubygem_id}
53 changes: 53 additions & 0 deletions app/models/rubygem.rb
@@ -1,6 +1,8 @@
class Rubygem < ActiveRecord::Base
include Patterns

include Tire::Model::Search

has_many :owners, :through => :ownerships, :source => :user
has_many :ownerships, :dependent => :destroy
has_many :subscribers, :through => :subscriptions, :source => :user
@@ -15,6 +17,50 @@ class Rubygem < ActiveRecord::Base
after_create :update_unresolved
before_destroy :mark_unresolved

after_create :update_elasticsearch_index_with_rescue
after_destroy :update_elasticsearch_index_with_rescue
after_touch :update_elasticsearch_index_with_rescue

tire do
Member


All of this is perfect for a Concern module, something like Searchable.

class Rubygem < ActiveRecord::Base
  include Searchable

And that module has all of the necessary includes, methods, etc. Any thoughts about that approach?

Author


Nothing against such an approach -- normally, I like to keep mapping and similar definitions inside the model, and since the after_create :update_unresolved hook was already there, I just followed the convention. Do you want to extract everything related to search into a module?
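The extraction discussed here could look roughly like the following. It is a plain-Ruby sketch that runs without Rails or Tire: the `included` hook stands in for an `ActiveSupport::Concern` `included` block, and `register_callback`, `callbacks` and `FakeModel` are hypothetical stand-ins used only to show where the pieces would go. It rescues `StandardError` rather than `Exception` so that interrupts and exits still propagate.

```ruby
# Sketch of a Searchable module; stand-ins replace Rails and Tire so it runs alone.
module Searchable
  # Plain-Ruby equivalent of an ActiveSupport::Concern `included` block.
  def self.included(base)
    base.extend(ClassMethods)
    # In the real model this is where `include Tire::Model::Search`, the
    # `tire do ... end` mapping and the after_* callbacks would be declared.
    base.register_callback(:update_elasticsearch_index_with_rescue)
  end

  # Hypothetical, minimal callback registry replacing ActiveRecord callbacks.
  module ClassMethods
    def callbacks
      @callbacks ||= []
    end

    def register_callback(name)
      callbacks << name
    end
  end

  # An indexing failure is logged and swallowed so it never aborts a save.
  def update_elasticsearch_index_with_rescue
    update_elasticsearch_index
  rescue StandardError => e
    warn "Error when updating Elasticsearch. Original exception: #{e.inspect}"
    true
  end
end

# Hypothetical model whose index update always fails, to exercise the rescue.
class FakeModel
  include Searchable

  def update_elasticsearch_index
    raise Errno::ECONNREFUSED
  end
end
```

With this shape the model itself only needs `include Searchable`, as suggested in the review comment.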

index_prefix Rails.env

settings :number_of_shards => 1,
:number_of_replicas => 1,
:analysis => {
:analyzer => {
:rubygem => {
:type => 'pattern',
:pattern => "[\\s#{Regexp.escape(SPECIAL_CHARACTERS)}]+"
}
}
} do
mapping do
indexes :name, :type => 'multi_field',
:fields => {
:name => { :type => 'string', :analyzer => 'rubygem', :boost => 10.0 },
:raw => { :type => 'string', :analyzer => 'keyword', :boost => 10.0 }
}
indexes :indexed, :type => 'boolean', :include_in_all => false, :as => proc { versions.any?(&:indexed?) }
indexes :downloads, :type => 'integer', :include_in_all => false

indexes :summary, :analyzer => 'english', :as => proc { versions.most_recent.try(:summary) }
indexes :description, :analyzer => 'english', :as => proc { versions.most_recent.try(:description) }
indexes :author, :as => proc { versions.most_recent.try(:authors).try(:split, /\s*,\s*/) }

indexes :version, :analyzer => 'keyword', :as => proc { versions.map(&:number) },
:include_in_all => false

indexes :uses, :as => proc { versions.most_recent.dependencies.map(&:name) if versions.most_recent rescue nil },
:include_in_all => false
indexes :depends, :as => proc { versions.most_recent.dependencies.runtime.map(&:name) if versions.most_recent rescue nil },
:include_in_all => false

indexes :created_at, :type => 'date', :include_in_all => false
indexes :updated_at, :type => 'date', :include_in_all => false
end
end
end

def self.with_versions
where("rubygems.id IN (SELECT rubygem_id FROM versions where versions.indexed IS true)")
end
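To see what the `rubygem` pattern analyzer in the mapping above does to a gem name, the same split can be tried in plain Ruby. The value of `SPECIAL_CHARACTERS` is not shown in this diff, so `".-_"` below is an assumption, and `name_tokens` is a hypothetical helper. The lowercasing mirrors the default behavior of the Elasticsearch pattern analyzer.

```ruby
# Assumed value; the real constant lives in the Patterns module, outside this diff.
SPECIAL_CHARACTERS = ".-_"

# Separator pattern: runs of whitespace or special characters.
# Note the doubled backslash: in a double-quoted Ruby string "\s" is a single
# space character, whereas "\\s" passes the regex class \s through.
SEPARATORS = Regexp.new("[\\s#{Regexp.escape(SPECIAL_CHARACTERS)}]+")

# Approximates the tokens a pattern analyzer would emit for a gem name.
def name_tokens(name)
  name.downcase.split(SEPARATORS).reject(&:empty?)
end

p name_tokens("rack-cache")
p name_tokens("Active_Model.Serializers")
```

Because the `name.raw` sub-field uses the `keyword` analyzer instead, the full name is also indexed as a single token, which is what the exact-match boost in the controller relies on.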
@@ -268,6 +314,13 @@ def gittip_enabled?
owners.where('gittip_username is not null').count > 0
end

def update_elasticsearch_index_with_rescue
update_elasticsearch_index
rescue StandardError => e
Rails.logger.error "Error when updating Elasticsearch. Original exception: #{e.inspect}"
return true
end

private

def ensure_name_format
2 changes: 1 addition & 1 deletion app/models/subscription.rb
@@ -1,5 +1,5 @@
class Subscription < ActiveRecord::Base
belongs_to :rubygem
belongs_to :rubygem, :touch => true
belongs_to :user

validates :rubygem_id, :uniqueness => {:scope => :user_id}
2 changes: 1 addition & 1 deletion app/models/version.rb
@@ -1,5 +1,5 @@
class Version < ActiveRecord::Base
belongs_to :rubygem
belongs_to :rubygem, :touch => true
has_many :dependencies, :order => 'rubygems.name ASC', :include => :rubygem, :dependent => :destroy

before_save :update_prerelease
43 changes: 43 additions & 0 deletions app/views/searches/_search_tips.en.html.erb
@@ -0,0 +1,43 @@
<div id="search-tips">
<div>
<p>
When looking for gems, you can use a wide variety of search queries
in the <a href="http://lucene.apache.org/core/3_6_1/queryparsersyntax.html" class="external">Lucene syntax</a>.
</p>

<p>
Quite simply, you can search in gem names, summaries and descriptions with queries like
<code><%= link_to_example_search 'rack' %></code> or
<code><%= link_to_example_search 'imap' %></code>
</p>

<p>You can, of course, restrict the search to gem names only:</p>
<p><code><%= link_to_example_search 'name:rack' %></code></p>

<p>To broaden your search, you can use wildcards:</p>
<p>
<code><%= link_to_example_search 'name:ra*' %></code> or
<code><%= link_to_example_search 'web*' %></code>
</p>

<p>You can search for specific gem authors:</p>
<p><code><%= link_to_example_search 'author:john' %></code></p>

<p>Of course, you can combine these queries into complex ones:</p>
<p>
<code><%= link_to_example_search 'name:ra* AND author:john' %></code> or
<code><%= link_to_example_search 'name:ra* AND version:1*' %></code>
</p>

<p>To discover more gems, you can search by their runtime dependencies:</p>
<p><code><%= link_to_example_search 'depends:rack' %></code></p>
<p>or in development:</p>
<p><code><%= link_to_example_search 'uses:rack' %></code></p>

<p>Lastly, you can restrict your search to gems created or updated in a certain timeframe:</p>
<p><code><%= link_to_example_search "name:rack AND updated_at:[#{Time.now.to_date.beginning_of_month.to_s(:db)} TO #{Time.now.to_date.end_of_month.to_s(:db)}]" %></code></p>

<p class="legend">The searchable fields are <em>name</em>, <em>summary</em>, <em>description</em>, <em>author</em>, <em>version</em>, <em>uses</em>, <em>depends</em>, <em>created_at</em>, <em>updated_at</em> and <em>downloads</em>.</p>

</div>
</div>
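The time-range example in the tips panel builds its query string with ActiveSupport date helpers. The same Lucene range query can be sketched in plain Ruby; `updated_this_month_query` is a hypothetical helper name used only for illustration.

```ruby
require 'date'

# Hypothetical helper: builds a Lucene range query covering the month of `day`.
def updated_this_month_query(name, day = Date.today)
  first = Date.new(day.year, day.month, 1)
  last  = Date.new(day.year, day.month, -1) # day -1 is the last day of the month
  # Date#to_s yields ISO 8601 (YYYY-MM-DD), the same shape as to_s(:db).
  "name:#{name} AND updated_at:[#{first} TO #{last}]"
end

puts updated_this_month_query('rack', Date.new(2013, 5, 9))
# prints: name:rack AND updated_at:[2013-05-01 TO 2013-05-31]
```

Square brackets make both ends of the range inclusive in the Lucene query syntax.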
10 changes: 9 additions & 1 deletion app/views/searches/show.html.erb
@@ -1,6 +1,14 @@
<% @title = "search" %>

<% @subtitle = t('.subtitle', :query => nil) if params[:query].present? %>
<%= form_tag search_url, :id => "in-page-search", :method => :get do %>
<%= text_field_tag :query, params[:query] if params[:query].present? %>
<a href="#" id="search-tips-toggle" title="<%= t '.tips_tooltip' %>"><%= t '.tips' %></a>
<% end %>

<%= render :partial => 'search_tips' %>

<% if @gems %>
<% @subtitle = t('.subtitle', :query => content_tag(:em, h(params[:query]))) %>
<% if @exact_match %>
<p><%= t '.exact_match' %></p>
<div class="gems border">
3 changes: 3 additions & 0 deletions config/environments/test.rb
@@ -1,3 +1,6 @@
require 'webmock' # Allow connections to elasticsearch
WebMock.disable_net_connect!(:allow => /localhost\:9200/)

Gemcutter::Application.configure do
config.cache_classes = true
config.whiny_nils = true
2 changes: 2 additions & 0 deletions config/locales/en.yml
@@ -166,6 +166,8 @@ en:
show:
subtitle: "for %{query}"
exact_match: Exact match
tips: Tips
tips_tooltip: "Show search tips"

sessions:
new: