Skip to content

Latest commit

 

History

History
179 lines (120 loc) · 8.23 KB

common_issues.textile

File metadata and controls

179 lines (120 loc) · 8.23 KB
layout title
ts_en
Common Questions and Issues

Common Questions and Issues

Depending on how you have Sphinx setup, or what database you’re using, you might come across little issues and curiosities. Here’s a few to be aware of.

Viewing Result Weights

To retrieve the weights/rankings of each search result, you can enumerate through your matches using each_with_weighting:

{% highlight ruby %}
results.each_with_weighting do |result, weight|


  1. end
    {% endhighlight %}

However, there is currently no clean way to get the weight of a specific result without looping though the dataset.

Wildcard Searching

Sphinx can support wildcard searching (for example: Austr∗), but it is turned off by default. To enable it, you need to add two settings to your config/sphinx.yml file:

{% highlight yaml }
development:
enable_star: 1
min_infix_len: 1
test:
enable_star: 1
min_infix_len: 1
production:
enable_star: 1
min_infix_len: 1
{
endhighlight %}

You can set the min_infix_len value to something higher if you don’t need single characters with a wildcard being matched. This may be a worthwhile fine-tuning, because the smaller the infixes are, the larger your index files become.

Don’t forget to rebuild your Sphinx indexes after making this change.

{% highlight sh }
rake thinking_sphinx:rebuild
{
endhighlight %}

Slow Indexing

If Sphinx is taking a while to process all your records, there are a few common reasons for this happening. Firstly, make sure you have database indexes on any foreign key columns and any columns you filter or sort by.

Secondly – are you using fixtures? Rails’ fixtures have randomly generated IDs, which are usually extremely large integers, and Sphinx isn’t set up to process disparate IDs efficiently by default. To get around this, you’ll need to set sql_range_step in your config/sphinx.yml file for the appropriate environments:

{% highlight yaml }
development:
sql_range_step: 10000000
{
endhighlight %}

MySQL and Large Fields

If you’ve got a field that is built off multiple values in one column – ie: through a has_many association – then you may hit MySQL’s default limit for string concatenation: 1024 characters. You can increase the group_concat_max_len value by adding the following to your define_index block:

{% highlight rb %}
define_index do

set_property :group_concat_max_len => 8192

end
{% endhighlight %}

If these fields get particularly large though, then there’s another setting you may need to set in your MySQL configuration: max_allowed_packet, which has a default of sixteen megabytes. You can’t set this option via Thinking Sphinx though (it’s a rare edge case).

PostgreSQL with Manual Fields and Attributes

If you’re using fields or attributes defined by strings (raw SQL), then the columns used in them aren’t automatically included in the GROUP BY clause of the generated SQL statement. To make sure the query is valid, you will need to explicitly add these columns to the GROUP BY clause.

A common example is if you’re converting latitude and longitude columns from degrees to radians via SQL.

{% highlight ruby %}
define_index do

has “RADIANS”, :as => :latitude, :type => :float has “RADIANS”, :as => :longitude, :type => :float group_by “latitude”, “longitude”

end
{% endhighlight %}

Delta Indexing Not Working

Often people find delta indexing isn’t working on their production server. Sometimes, this is because Sphinx is running as one user on the system, and the Rails/Merb application is being served as a different user. Check your production.log and Apache/Nginx error log file for mentions of permissions issues to confirm this.

Indexing for deltas is invoked by the web user, and so needs to have access to the index files. The simplest way to ensure this is run all Thinking Sphinx rake tasks by that web user.

If you’re still having issues, and you’re using Passenger, read the next hint.

Running Delta Indexing with Passenger

If you’re using Phusion Passenger on your production server, with delta indexing on some models, a common issue people find is that their delta indexes don’t get processed.

If it’s not a permissions issue (see the previous hint), another common cause is because Passenger has it’s own PATH set up, and can’t execute the Sphinx binaries (indexer and searchd) implicitly.

The way around this is to find out where your binaries are on the server:

{% highlight sh }
which searchd
{
endhighlight %}

And then set the bin_path option in your config/sphinx.yml file for the production environment:

{% highlight yaml }
production:
bin_path: ‘/usr/local/bin’
{
endhighlight %}

Can only access the first thousand search results

This is actually how Sphinx is supposed to behave. Have a read of the Large Result Sets section of the Advanced Configuration page to see why, and how to work around it if you really need to.

Vendored Delayed Job, AfterCommit and Riddle

If you’ve still got Delayed Job vendored as part of Thinking Sphinx and would rather use a more up-to-date version of the former, recent releases of Thinking Sphinx do not have it included any longer.

As for AfterCommit and Riddle, while they are still included for plugin installs, they’re no longer in the Thinking Sphinx gem (since 1.3.3). Instead, they are considered dependencies, and will be installed as separate gems.

Filtering on String Attributes

While you can have string columns as attributes in Sphinx, they aren’t stored as strings. Instead, Sphinx figures out the alphabetical order, and gives each string an integer value to make them useful for sorting. However, this means it’s close to impossible to filter on these attributes.

So, to get around this, there’s two options: firstly, use integer attributes instead, if you possibly can. This works for small result sets (for example: gender). Otherwise, you might want to consider manually converting the string to a CRC integer value:

{% highlight ruby }
has “CRC32”, :as => :category, :type => :integer
{
endhighlight %}

This way, you can filter on it like so:

{% highlight ruby }
Article.search ‘pancakes’, :with => {
:category => ‘Ruby’.to_crc32
}
{
endhighlight %}

Of course, this isn’t amazingly clean, but it will work quite well. You should also take note that CRC32 encoding can have collisions, so it’s not the perfect solution.

Models outside of app/models

If you’re using plugins or other web frameworks (Radiant, Ramaze, etc) that don’t always store their models in app/models, you can tell Thinking Sphinx to look in other locations when building the configuration file:

{% highlight ruby }
ThinkingSphinx::Configuration.instance.
model_directories << “/path/to/models/dir”
{
endhighlight %}

By default, Thinking Sphinx will load all models in app/models and vendor/plugins/*/app/models.

Using Thinking Sphinx with Bundler

If you’re using Thinking Sphinx with the gem manager Bundler, you will need to set the :require_as option to thinking_sphinx.

{% highlight ruby }
gem ‘thinking-sphinx’,
:version => ‘1.3.14’,
:require_as => ‘thinking_sphinx’
{
endhighlight %}

If this isn’t done, it can introduce issues with gem loading order and script/console.