Best Practices

What follows are some general recommendations on how to improve your search. Some tips represent performance benefits, some provide a better search index. You should evaluate these options for yourself and pick the ones that will work best for you. Not all situations are created equal and many of these options could be considered mandatory in some cases and unnecessary premature optimizations in others. Your mileage may vary.

Good Search Needs Good Content

Most search engines work best when they're given corpuses with predominantly text (as opposed to other data like dates, numbers, etc.) in decent quantities (more than a couple words). This is in stark contrast to the databases most people are used to, which rely heavily on non-text data to create relationships and for ease of querying.

To this end, if search is important to you, you should take the time to carefully craft your SearchIndex subclasses to give the search engine the best information you can. This isn't necessarily hard but is worth the investment of time and thought. If you've only ever used the BasicSearchIndex, there are some easy improvements you can make when creating custom SearchIndex classes:

For your document=True field, use a well-constructed template.
Add fields for data you might want to be able to filter by.
If the model has related data, you can squash good content from those related models into the parent model's SearchIndex.
Similarly, if you have heavily de-normalized models, they may be best represented by a single indexed model rather than many indexed models.

Well-Constructed Templates

A relatively unique concept in Haystack is the use of templates associated with SearchIndex fields. These are data templates: they will never be seen by users and ideally contain no HTML. They are used to collect various data from the model and structure it as a document for the search engine to analyze and index.

Note

If you read nothing else, this is the single most important thing you can do to make search on your site better for your users. Good templates can make or break your search and providing the search engine with good content to index is critical.

Good templates structure the data well and incorporate as much pertinent text as possible. This may include additional fields such as titles, author information, metadata, tags/categories. Without being artificial, you want to construct as much context as you can. This doesn't mean you should necessarily include every field, but you should include fields that provide good content or include terms you think your users may frequently search on.

Unless you have very unique numbers or dates, neither of these types of data is a good fit within templates. They are usually better suited to other fields for filtering within a SearchQuerySet.

Additional Fields For Filtering

Documents by themselves are good for generating indexes of content but are generally poor for filtering content, for instance, by date. All search engines supported by Haystack provide a means to associate extra data as attributes/fields on a record. The database analogy would be adding extra columns to the table for filtering.

Good candidates here are date fields, number fields, de-normalized data from related objects, etc. You can expose these things to users in the form of a calendar range to specify, an author to look up or only data from a certain series of numbers to return.

You will need to plan ahead and anticipate what you might need to filter on, though with each field you add, you increase storage space usage. It's generally NOT recommended to include every field from a model, just ones you are likely to use.
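The split between the document and its extra fields can be pictured with a small, library-independent sketch. The `Record` class and `filter_records` function below are hypothetical stand-ins rather than Haystack's API; they only show text living in one place while a few chosen attributes sit beside it for filtering.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    # Hypothetical indexed record, not a Haystack class.
    text: str        # the document the engine analyzes
    author: str      # extra field, kept only for filtering
    pub_date: date   # extra field, kept only for filtering


def filter_records(records, author=None, start=None, end=None):
    """Narrow records by exact author and an inclusive date range."""
    matches = []
    for record in records:
        if author is not None and record.author != author:
            continue
        if start is not None and record.pub_date < start:
            continue
        if end is not None and record.pub_date > end:
            continue
        matches.append(record)
    return matches


records = [
    Record("Intro to queues", "jane", date(2009, 3, 1)),
    Record("Stored fields", "sam", date(2009, 6, 15)),
    Record("Batching updates", "jane", date(2009, 9, 30)),
]
recent_by_jane = filter_records(records, author="jane", start=date(2009, 6, 1))
```

Only two attributes are carried next to the text here, which mirrors the advice to add the fields you expect to filter on rather than every field on the model.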

Related Data

Related data is somewhat problematic to deal with, as most search engines are better with documents than they are with relationships. One way to approach this is to de-normalize a related child object or objects into the parent's document template. The inclusion of a foreign key's relevant data or a simple Django {% for %} templatetag to iterate over the related objects can increase the salient data in your document. Be careful what you include and how you structure it, as this can have consequences on how well a result might rank in your search.
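As a rough illustration of that de-normalization, the sketch below builds the document text for a parent object and folds in text from its related children, much as a `{% for %}` loop in the document template would. The `Note` and `Comment` classes and the `build_document` function are hypothetical and stand in for your own models and template; none of them is part of Haystack.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    # Hypothetical related (child) object.
    author: str
    body: str


@dataclass
class Note:
    # Hypothetical parent object whose document is being indexed.
    title: str
    author: str
    body: str
    tags: list = field(default_factory=list)
    comments: list = field(default_factory=list)


def build_document(note):
    """Return plain text (no HTML) for the search engine to analyze."""
    lines = [note.title, note.author, " ".join(note.tags), note.body]
    # De-normalize the children into the parent's document, the same job a
    # {% for %} loop would do in a data template.
    for comment in note.comments:
        lines.append(f"{comment.author}: {comment.body}")
    # Skip empty pieces so the document holds only real content.
    return "\n".join(line for line in lines if line)


note = Note(
    title="Queue basics",
    author="Jane Doe",
    body="Batch your index updates.",
    tags=["search", "queues"],
    comments=[Comment("sam", "Batching helped us a lot.")],
)
document = build_document(note)
```

Where the child text is placed, and how much of it is included, affects how the parent ranks, so it is worth trying a few arrangements against real queries.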

Avoid Hitting The Database

A very easy but effective way to drastically reduce hits on the database is to pre-render your search results using stored fields and then disable the load_all aspect of your SearchView.

Warning

This technique may cause a substantial increase in the size of your index as you are basically using it as a storage mechanism.

To do this, you set up one or more stored fields (indexed=False) on your SearchIndex classes. You should specify a template for the field, filling it with the data you'd want to display on your search results pages. When the model attached to the SearchIndex is placed in the index, this template will get rendered and stored in the index alongside the record.

Note

The downside of this method is that the HTML for the result will be locked in once it is indexed. To make changes to the structure, you'd have to reindex all of your content. It also limits you to a single display of the content (though you could use multiple fields if that suits your needs).

The second aspect is customizing your SearchView and its templates. First, pass load_all=False to your SearchView, ideally in your URLconf. This prevents the SearchQuerySet from loading all model objects for results ahead of time. Then, in your template, simply display the stored content from your SearchIndex as the HTML result.

Warning

To do this, you must absolutely avoid using {{ result.object }} or any further accesses beyond that. That call will hit the database, not only nullifying your work on lessening database hits but actually making things worse, as there will now be at least one query for each result, up from a single query for each type of model with load_all=True.
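The cost difference can be seen in a small simulation. This is a sketch under stated assumptions, not Haystack code: `Result`, `render_for_index`, `load_from_database` and the `/notes/<pk>/` URL are all hypothetical, and the query counter only imitates a database.

```python
from html import escape

DB_QUERIES = 0  # counts simulated database lookups


def load_from_database(pk):
    """Stand-in for a per-object database query."""
    global DB_QUERIES
    DB_QUERIES += 1
    return {"pk": pk}


class Result:
    """Hypothetical search result carrying fields stored in the index."""

    def __init__(self, pk, rendered):
        self.pk = pk
        self.rendered = rendered  # HTML rendered once, at indexing time

    @property
    def object(self):
        # Lazy load: touching this costs one query per result.
        return load_from_database(self.pk)


def render_for_index(pk, title, summary):
    """Pre-render the result HTML that is stored alongside the record."""
    return f'<li><a href="/notes/{pk}/">{escape(title)}</a> {escape(summary)}</li>'


results = [
    Result(1, render_for_index(1, "Queues & batching", "Why batching helps.")),
    Result(2, render_for_index(2, "Stored fields", "Skip the database.")),
]

# Displaying only stored content issues no queries.
page = "\n".join(r.rendered for r in results)
queries_with_stored_fields = DB_QUERIES

# Touching .object on each result issues one query per result.
for r in results:
    r.object
queries_after_object_access = DB_QUERIES
```

The page built from stored content needs no lookups, while the loop that touches `.object` adds one lookup per result, which is the pattern the warning above describes.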

Real-Time Search

If your site sees heavy search traffic and up-to-date information is very important, Haystack provides a way to constantly keep your index up to date. If you use the RealTimeSearchIndex class instead of the SearchIndex class, Haystack will automatically update the index whenever a model is saved/deleted.

You can find more information within the :doc:`searchindex_api` documentation.

Use Of A Queue For A Better User Experience

By default, you have to manually reindex content. When indexing is instead tied to model saves, Haystack immediately tries to merge each change into the search index. If you have a write-heavy site, this could mean your search engine spends most of its time churning on constant merges. If you can afford a small delay between when a model is saved and when it appears in the search results, queuing these merges is a good idea.

You gain a snappier interface for users as updates go into a queue (a fast operation) and then typical processing continues. You also get a lower churn rate, as most search engines deal with batches of updates better than many single updates. You can also use this to distribute load, as the queue consumer could live on a completely separate server from your webservers, allowing you to tune more efficiently.

Implementing this is relatively simple. There are two parts: creating a new QueuedSearchIndex class and creating a queue processing script to handle the actual updates.

For the QueuedSearchIndex, simply inherit from the SearchIndex provided by Haystack and override the _setup_save/_setup_delete methods. These methods usually attach themselves to their model's post_save/post_delete signals and call the backend to update or remove a record. You should override this behavior and place a message in your queue of choice. At a minimum, you'll want to include the model you're indexing and the id of the model within that message, so that you can retrieve the proper index from the SearchSite in your consumer. Then alter all of your SearchIndex classes to inherit from this new class. Now all saves/deletes will be handled by the queue and you should receive a speed boost.

For the consumer, this is much more specific to the queue used and your desired setup. At a minimum, you will need to periodically consume the queue, fetch the correct index from the SearchSite for your application, load the model from the message and pass that model to the update_object or remove_object methods on the SearchIndex. Proper grouping, batching and intelligent handling are all additional things that could be applied on top to further improve performance.
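The two halves can be sketched without committing to a particular queue or backend. Everything below is an assumption made for illustration: `FakeIndex`, `DATABASE`, `INDEXES`, `enqueue` and `consume` are hypothetical names, the in-process `queue.Queue` stands in for whatever queue you choose, and the `INDEXES` dictionary plays the role the SearchSite has in looking up the right index.

```python
import queue

UPDATE, DELETE = "update", "delete"


class FakeIndex:
    """Hypothetical stand-in for a search index; records what it holds."""

    def __init__(self):
        self.documents = {}

    def update_object(self, obj):
        self.documents[obj["id"]] = obj

    def remove_object(self, obj):
        self.documents.pop(obj["id"], None)


# Stand-ins for the database and for the registry of indexes.
DATABASE = {
    "notes.note": {
        1: {"id": 1, "title": "First"},
        2: {"id": 2, "title": "Second"},
    }
}
INDEXES = {"notes.note": FakeIndex()}
updates = queue.Queue()


def enqueue(action, model_label, pk):
    """Producer side: called from save/delete hooks; a cheap operation."""
    updates.put((action, model_label, pk))


def consume(batch_size=100):
    """Consumer side: drain up to batch_size messages and apply them."""
    latest = {}
    for _ in range(batch_size):
        try:
            action, model_label, pk = updates.get_nowait()
        except queue.Empty:
            break
        # Only the most recent action per object matters within a batch.
        latest[(model_label, pk)] = action
    for (model_label, pk), action in latest.items():
        index = INDEXES[model_label]
        if action == UPDATE:
            obj = DATABASE[model_label].get(pk)
            if obj is not None:  # it may have been deleted since enqueueing
                index.update_object(obj)
        else:
            index.remove_object({"id": pk})
    return len(latest)


enqueue(UPDATE, "notes.note", 1)
enqueue(UPDATE, "notes.note", 2)
enqueue(UPDATE, "notes.note", 1)  # duplicate save, collapsed in the batch
enqueue(DELETE, "notes.note", 2)
processed = consume()
```

Collapsing repeated messages for the same object is one simple form of the batching mentioned above; a real consumer would also need to handle failures and retries in a way that suits the queue it runs on.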
