Properly handle distance information returned in ElasticSearch results.
joshdrake committed May 9, 2012
1 parent f2f5adc commit d5cc42f
Showing 1 changed file with 12 additions and 2 deletions.
haystack/backends/elasticsearch_backend.py (14 changes: 12 additions & 2 deletions)
@@ -299,6 +299,7 @@ def search(self, query_string, sort_by=None, start_offset=0, end_offset=None,
"unit" : "km"
}
}
+ geo_sort = True
else:
if field == 'distance':
warnings.warn("In order to sort by distance, you must call the '.distance(...)' method.")
@@ -495,7 +496,7 @@ def search(self, query_string, sort_by=None, start_offset=0, end_offset=None,
self.log.error("Failed to query Elasticsearch using '%s': %s", query_string, e)
raw_results = {}

- return self._process_results(raw_results, highlight=highlight, result_class=result_class)
+ return self._process_results(raw_results, highlight=highlight, result_class=result_class, distance_point=distance_point, geo_sort=geo_sort)

def more_like_this(self, model_instance, additional_query_string=None,
start_offset=0, end_offset=None, models=None,
@@ -534,7 +535,7 @@ def more_like_this(self, model_instance, additional_query_string=None,

return self._process_results(raw_results, result_class=result_class)

- def _process_results(self, raw_results, highlight=False, result_class=None):
+ def _process_results(self, raw_results, highlight=False, result_class=None, distance_point=None, geo_sort=False):
from haystack import connections
results = []
hits = raw_results.get('hits', {}).get('total', 0)
@@ -587,6 +588,15 @@ def _process_results(self, raw_results, highlight=False, result_class=None):
if 'highlight' in raw_result:
additional_fields['highlighted'] = raw_result['highlight'].get(content_field, '')

+ if distance_point:
+ additional_fields['_point_of_origin'] = distance_point
+
+ if geo_sort and raw_result.get('sort'):
+ from haystack.utils.geo import Distance
+ additional_fields['_distance'] = Distance(km=float(raw_result['sort'][0]))
+ else:
+ additional_fields['_distance'] = None
+
result = result_class(app_label, model_name, source[DJANGO_ID], raw_result['_score'], **additional_fields)
results.append(result)
else:
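
The first hunk only shows the tail of the `_geo_distance` sort clause. Below is a minimal, hypothetical reconstruction of the surrounding sort-building loop, showing where the new `geo_sort = True` flag gets set. `build_sort` is an illustrative name, not a Haystack function, and `distance_point` is assumed to be a dict with a `field` name and a `point` given as a `(lng, lat)` tuple rather than the point object the real backend works with. The diff does not show what happens after the warning for a bare `distance` sort; this sketch simply drops that field.

```python
import warnings


def build_sort(sort_by, distance_point=None):
    """Translate (field, direction) pairs into an Elasticsearch sort list.

    Returns (order_list, geo_sort). geo_sort mirrors the flag added in
    this commit: True only when a _geo_distance clause was emitted.
    Assumption: distance_point is {"field": str, "point": (lng, lat)}.
    """
    order_list = []
    geo_sort = False
    for field, direction in sort_by:
        if field == "distance" and distance_point:
            lng, lat = distance_point["point"]
            # Elasticsearch geo sort: coordinates as [lng, lat], distances in km.
            order_list.append({
                "_geo_distance": {
                    distance_point["field"]: [lng, lat],
                    "order": direction,
                    "unit": "km",
                }
            })
            geo_sort = True
        elif field == "distance":
            # No .distance(...) call: warn and (in this sketch) drop the field.
            warnings.warn("In order to sort by distance, you must call the '.distance(...)' method.")
        else:
            # Ordinary field sort.
            order_list.append({field: {"order": direction}})
    return order_list, geo_sort
```

Setting the flag only when the clause is actually emitted is what allows `_process_results` to treat a hit's `sort` values as distances. Neither this sketch nor the diff records the clause's position in the sort list, so reading `sort[0]` assumes distance is the first sort key.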

2 comments on commit d5cc42f

@jezdez jezdez commented on d5cc42f Jun 20, 2012

I'm not sure why this is needed, can you elaborate?

@joshdrake (Owner, Author)

@jezdez

Er, sorry, I thought I had created a separate pull request for this commit. Basically, the distance information returned by ElasticSearch isn't processed when building search result objects. This commit implements it in a similar fashion to the Solr backend (L379). The canonical example of this usage is shown in the docs.
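
A minimal sketch of that per-hit processing, pulled out of `_process_results` into a standalone function. `distance_fields` is a hypothetical helper, `Distance` is a stand-in dataclass rather than `haystack.utils.geo.Distance`, and the hit dict only mimics the part of an Elasticsearch hit that matters here (its `sort` array).

```python
from dataclasses import dataclass


@dataclass
class Distance:
    """Stand-in for haystack.utils.geo.Distance; stores kilometres only."""
    km: float


def distance_fields(raw_result, distance_point=None, geo_sort=False):
    """Build the extra distance-related fields attached to one search result."""
    additional_fields = {}
    if distance_point:
        # Keep the origin so the result can compute its distance later.
        additional_fields["_point_of_origin"] = distance_point
    if geo_sort and raw_result.get("sort"):
        # With a _geo_distance sort in km, Elasticsearch returns the distance
        # as a sort value; like the diff, assume it is the first one.
        additional_fields["_distance"] = Distance(km=float(raw_result["sort"][0]))
    else:
        additional_fields["_distance"] = None
    return additional_fields
```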

A few things in this commit are worth noting:

  • When sorting by distance, the geo_sort variable is set to True (L302)
  • The point used in the .distance() call is passed to the _process_results method, along with geo_sort (L499)
  • The _point_of_origin additional field is set to the distance point so that Haystack can calculate the distance information with GeoPy. This is needed because ElasticSearch only returns distance information when sorting by distance. (L592)
  • If distance was one of the sorting parameters, ElasticSearch includes distance information in the results. In this case, we populate the distance field with the value returned directly by ElasticSearch, rather than letting GeoPy waste cycles recomputing it. (L594)
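
To illustrate the trade-off in the last two points, here is a hypothetical result object that reuses a backend-supplied distance when there is one and otherwise computes it lazily from the point of origin. It uses a spherical haversine formula as a stand-in for the GeoPy calculation mentioned above. `SketchResult`, `distance_km`, and `haversine_km` are illustrative names, not Haystack's API, and distances are plain kilometre floats.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lng, lat) points."""
    lng1, lat1 = map(math.radians, a)
    lng2, lat2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


class SketchResult:
    """Hypothetical result that prefers a backend-supplied distance."""

    def __init__(self, location, _point_of_origin=None, _distance=None):
        self.location = location                  # (lng, lat) of the indexed object
        self._point_of_origin = _point_of_origin  # {"field": ..., "point": (lng, lat)}
        self._distance = _distance                # km from Elasticsearch, or None

    @property
    def distance_km(self):
        if self._distance is None and self._point_of_origin is not None:
            # No geo sort, so no backend value: compute once and cache it.
            self._distance = haversine_km(self._point_of_origin["point"], self.location)
        # Either the Elasticsearch value, the cached computation, or None.
        return self._distance
```

When Elasticsearch has already supplied the distance, the property returns it without running any trigonometry, which is the shortcut this commit adds.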

Sorry about the confusion, let me know if there's anything else I can help with.
