DATETIME__lt lookup results in ParseException for Elasticsearch #45

Closed
AriHrannar opened this issue Apr 10, 2016 · 12 comments

@AriHrannar

Hi guys,

Great project :)

I am, however, in a bit of a pickle. I've set up Elasticsearch, Django Haystack and drf-haystack in my project and it mostly seems to work great. I've tried a lot of things and I am completely stumped, so I am hoping someone here might have an idea what's going on :)

But when I execute this query (submission is a datetime):

?submission__lt=2016-04-01

I get 0 results and the following error in my PyCharm console:

RequestError: TransportError(400, u'SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed; shardFailures {[AQ0uPhhWQlyk026CpVP89w][haystack][0]: SearchParseException[[haystack][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"filtered": {"filter": {"terms": {"django_ct": ["model.modelitem", "model.modelrepository", "model.modelfamily"]}}, "query": {"query_string": {"fuzzy_max_expansions": 50, "auto_generate_phrase_queries": true, "default_operator": "AND", "analyze_wildcard": true, "query": "(submission:({* TO \\"2016\\\\-04\\\\-01\\"}) AND submission:({* TO \\"2016\\\\\\\\\\\\-04\\\\\\\\\\\\-01\\"}))", "default_field": "text", "fuzzy_min_sim": 0.5}}}}, "from": 0, "highlight": {"fields": {"text": {"store": "yes"}}}}]]]; nested: ElasticsearchParseException[failed to parse date field [2016\\-04\\-01], tried both date format [dateOptionalTime], and timestamp number]; nested: IllegalArgumentException[Invalid format: "2016\\-04\\-01" is malformed at "\\-04\\-01"]; }]')

But when I execute

?submission__gt=2016-04-01
It returns the expected results/documents and no error is displayed

Setup:
OS: Ubuntu 15.10
Python: 2.7.10
Elasticsearch: 1.6.2

From requirements.txt:
Django: 1.9.1
Haystack: 2.5.dev0 (otherwise it doesn't work with Django 1.9)
drf-haystack==1.5.6
elasticsearch==1.9.0

I’ve tried downgrading Django to 1.8 and Haystack to 2.4 but I had the same error. I’ve tried a few other combinations of versions but there is always something that breaks :) Usually it was the same error, or facet search didn't work.

I ran the same query using the python shell (using just haystack):

>>> lt_results = SearchQuerySet().filter(submission__lt=datetime.date(2016, 4, 1))
>>> lt_results.count()
6

Which led me to believe this has something to do with drf-haystack :) Any help would be greatly appreciated, I am completely stumped on how to fix this.

The faceted search works for all fields, apart from the submission field. When I click the narrow_url (that has a count of 6) and execute that query, it returns 0 results. Maybe that is related somehow?

Well here is my code if that helps:

urls.py:

router = routers.DefaultRouter()
router.register("model/search", ModelItemFacetViewSet, base_name="model-search")
router.register("^search/facets/$", ModelItemFacetViewSet, base_name="model-search-facet")
urlpatterns = (
    url(r"", include(router.urls)),
)

views.py:

class ModelItemFacetViewSet(HaystackViewSet):

    index_classes = [ModelItemIndex, ModelFamilyIndex, ModelRepositoryIndex]

    # This will be used to filter and serialize regular queries as well
    # as the results if the `facet_serializer_class` has the
    # `serialize_objects = True` set.
    serializer_class = ModelItemSerializer
    filter_backends = [HaystackHighlightFilter, HaystackAutocompleteFilter]

    # This will be used to filter and serialize faceted results
    facet_serializer_class = ModelItemFacetSerializer  # See example above!
    facet_filter_backends = [HaystackFacetFilter]   # This is the default facet filter, and
                                                    # can be left out.

search_indexes.py:

class SpecimenIndex(indexes.SearchIndex):
    text = indexes.CharField(document=True, use_template=True)
    name = indexes.CharField(model_attr='name')
    submission = indexes.DateTimeField(model_attr='submission', faceted=True)
    type = indexes.CharField(model_attr='type', null=True)
    abbreviation = indexes.CharField(model_attr='abbreviation', null=True)

    class Meta:
        abstract = True

    def get_model(self):
        return Specimen

    def index_queryset(self, using=None):
        return self.get_model().objects.all()


class ReportableIndex(SpecimenIndex):
    class Meta:
        abstract = True

    def get_model(self):
        return Reportable

    def index_queryset(self, using=None):
        return self.get_model().objects.all()


class ReviewableIndex(ReportableIndex):
    id = indexes.CharField(model_attr='id')
    life_cycle_phase = indexes.CharField(model_attr='life_cycle_phase', null=True)
    model_category = indexes.CharField(model_attr='model_category', faceted=True)
    file_size = indexes.CharField(model_attr='generated_information__file_size', faceted=True)
    file_format = indexes.CharField(model_attr='generated_information__file_format', faceted=True)
    number_of_downloads = indexes.CharField(model_attr='generated_information__number_of_downloads', faceted=True)

    number_of_models = indexes.CharField(model_attr='classification_information__number_of_models', faceted=True)
    modeling_language = indexes.CharField(model_attr='classification_information__modeling_language', faceted=True)

    tool_vendor = indexes.CharField(model_attr='classification_information__generating_tool__vendor', null=True, faceted=True)
    tool_product = indexes.CharField(model_attr='classification_information__generating_tool__product', null=True, faceted=True)
    tool_version = indexes.CharField(model_attr='classification_information__generating_tool__version', null=True, faceted=True)

    quality = indexes.CharField(model_attr='assessment_information__quality')
    completeness = indexes.CharField(model_attr='assessment_information__completeness')
    level_of_model = indexes.CharField(model_attr='assessment_information__level_of_model')

    uuid = indexes.CharField(model_attr='uuid', null=True)

    license = indexes.CharField(model_attr='license__name', null=True, faceted=True)

    class Meta:
        abstract = True

    def get_model(self):
        return Reviewable

    def index_queryset(self, using=None):
        return self.get_model().objects.all()


class GroupableIndex(ReviewableIndex):
    parent = indexes.CharField(model_attr='parent', null=True)
    class Meta:
        abstract = True

    def get_model(self):
        return Groupable

    def index_queryset(self, using=None):
        return self.get_model().objects.all()


class ModelItemIndex(GroupableIndex, indexes.Indexable):
    def get_model(self):
        return ModelItem

    def index_queryset(self, using=None):
        return self.get_model().objects.all()


class ModelFamilyIndex(GroupableIndex, indexes.Indexable):
    def get_model(self):
        return ModelFamily

    def index_queryset(self, using=None):
        return self.get_model().objects.all()


class ModelRepositoryIndex(ReviewableIndex, indexes.Indexable):
    def get_model(self):
        return ModelRepository

    def index_queryset(self, using=None):
        return self.get_model().objects.all()

All template _text.txt files (model item, model family and model repository) look like this:

{{object.name}}
{{object.id}}
{{object.submission}}
@rhblind
Owner

rhblind commented Apr 11, 2016

Hi,

Thanks for the (detailed) report. I'll have a closer look at it when I've got a bit more time. I'm a little busy this week, so please be patient with me =)

@AriHrannar
Author

If it helps I just tried downgrading Django to 1.8.9 and Django Haystack to 2.4.1 (to fix another issue I was having with building the index through the haystack management options) and the issue is still there :(

@rhblind
Owner

rhblind commented Apr 12, 2016

No, the problem is in drf-haystack, probably because the HaystackFilter does not convert the date string to a datetime object.
I'll make a fix for this in the upcoming v1.6 release, but as you have noticed, I need to wait for django-haystack to release a new version for it to work with Django 1.9.
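
The conversion described here could be sketched as follows. `parse_date_param` is a hypothetical helper, not part of drf-haystack, and it only handles the `YYYY-MM-DD` form used in this report (the thread later traces the error to the filter backends instead):

```python
from datetime import date, datetime


def parse_date_param(raw):
    """Hypothetical helper: turn 'YYYY-MM-DD' into a date, else return raw unchanged."""
    try:
        return datetime.strptime(raw, "%Y-%m-%d").date()
    except ValueError:
        # Not a date in the expected form; leave the value as a plain string.
        return raw


print(parse_date_param("2016-04-01"))  # 2016-04-01
```

The returned `date` object is the same kind of value that was passed to `SearchQuerySet().filter(...)` in the shell session from the original report.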

@rhblind
Owner

rhblind commented Apr 24, 2016

Hello,
I've tried to reproduce this, but I'm unable to on the current development branch. Could you try to pull the develop branch and see if it works? There are some (potentially breaking) changes, but unless you have overridden drf-haystack internals you should be good to go.

I've also added some test cases for your issue.

@AriHrannar
Author

Hi,
Sorry for the late response :)

So I replaced
drf-haystack==1.5.6
with
-e git://github.com/inonit/drf-haystack.git@develop#egg=drf-haystack

in my requirements.txt. I've never referenced a specific GitHub branch from requirements.txt before; is that done correctly?

I have 9 items indexed: 1 created on April 22, 2016 and 8 created on April 23, 2016.

?submission__gt=2016-04-21
Returns 9
?submission__lt=2016-04-24
Returns 0
?submission__gt=2016-04-22
Returns 8

So unless I did something wrong when referencing the branch, it appears to have the same problem :(
I did however start receiving 401 responses at first when searching, since I had not specified a permission class in my view - which makes me believe I am not using the same version of drf-haystack :)
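
For reference, the counts one would expect from the three queries above can be worked out by hand. The sketch below assumes every item's `submission` falls exactly at midnight on its creation day, which the thread does not state:

```python
from datetime import datetime

# Assumption: 1 item on April 22 and 8 items on April 23, all at 00:00:00.
submissions = [datetime(2016, 4, 22)] + [datetime(2016, 4, 23)] * 8


def count_gt(bound):
    """Number of submissions strictly after `bound`."""
    return sum(1 for s in submissions if s > bound)


def count_lt(bound):
    """Number of submissions strictly before `bound`."""
    return sum(1 for s in submissions if s < bound)


print(count_gt(datetime(2016, 4, 21)))  # 9
print(count_lt(datetime(2016, 4, 24)))  # 9
print(count_gt(datetime(2016, 4, 22)))  # 8
```

Under that assumption the two `__gt` results match what was returned, while `__lt` should have returned 9 rather than 0.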

SMALL OFF TOPIC:
I tried faceting the dates
"dates": {
    "submission": [
        {
            "text": "2016-04-22T00:00:00",
            "count": 1,
            "narrow_url": "/api/model/search/facets/?selected_facets=submission_exact%3A2016-04-22+00%3A00%3A00"
        },
        {
            "text": "2016-04-23T00:00:00",
            "count": 8,
            "narrow_url": "/api/model/search/facets/?selected_facets=submission_exact%3A2016-04-23+00%3A00%3A00"
        }
    ]

But when I click the narrow_url for either of them, the result is always
{
    "objects": {
        "count": 0,
        "next": null,
        "previous": null,
        "results": []
    }
}
},

Submission facet specification in the serializer:
"submission": {
    "start_date": datetime.now() - timedelta(days=3 * 365),
    "end_date": datetime.now(),
    "gap_by": "day",
    "gap_amount": 1
}

All other facets I have work perfectly
Related issue? Should I create a bug for this as well?

@rhblind
Owner

rhblind commented Apr 27, 2016

Hmm, that's weird.
About the faceting stuff; it's probably related.

I did notice one thing in your search_indexes.py example above.
From the django-haystack docs:

To build a SearchIndex, all that’s necessary is to subclass both indexes.SearchIndex & indexes.Indexable, define the fields you want to store data with and define a get_model method.

It seems like you have only inherited from indexes.SearchIndex.
Maybe you can try to create a simpler index class with only date fields and see if it works?
Also, check out the test suite for examples.

If it works with a simpler index, try to build up from there.
Please let me know your findings, if it still doesn't work we need to investigate a bit further...

@AriHrannar
Author

I only inherit from indexes.SearchIndex at the top level; I don't want to index the abstract classes I have. The three classes I want to index all inherit from indexes.Indexable (ModelItem, ModelFamily, ModelRepository).

I will try something simpler and see if I can get anywhere

@AriHrannar
Author

Here is the whole stack trace; I forgot to add it to the original post.

Failed to query Elasticsearch using '(submission:({* TO "2016\-04\-24"}) AND submission:({* TO "2016\\\-04\\\-24"}))': TransportError(400, u'SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed; shardFailures {[_bfbRn60RyC2qSP-3bKyiw][haystack][0]: SearchParseException[[haystack][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"filtered": {"filter": {"terms": {"django_ct": ["model.modelitem", "model.modelrepository", "model.modelfamily"]}}, "query": {"query_string": {"query": "(submission:({* TO \\"2016\\\\-04\\\\-24\\"}) AND submission:({* TO \\"2016\\\\\\\\\\\\-04\\\\\\\\\\\\-24\\"}))", "default_operator": "AND", "default_field": "text", "auto_generate_phrase_queries": true, "analyze_wildcard": true}}}}, "size": 1, "from": 0, "highlight": {"fields": {"text": {"store": "yes"}}}}]]]; nested: ElasticsearchParseException[failed to parse date field [2016\\-04\\-24], tried both date format [dateOptionalTime], and timestamp number]; nested: IllegalArgumentException[Invalid format: "2016\\-04\\-24" is malformed at "\\-04\\-24"]; }]')
Traceback (most recent call last):
  File "/home/ari/Workspace/DTU/moccasin-backend/env/local/lib/python2.7/site-packages/haystack/backends/elasticsearch_backend.py", line 516, in search
    _source=True)
  File "/home/ari/Workspace/DTU/moccasin-backend/env/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/ari/Workspace/DTU/moccasin-backend/env/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 531, in search
    doc_type, '_search'), params=params, body=body)
  File "/home/ari/Workspace/DTU/moccasin-backend/env/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/home/ari/Workspace/DTU/moccasin-backend/env/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 93, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/ari/Workspace/DTU/moccasin-backend/env/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
RequestError: TransportError(400, u'SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed; shardFailures {[_bfbRn60RyC2qSP-3bKyiw][haystack][0]: SearchParseException[[haystack][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"filtered": {"filter": {"terms": {"django_ct": ["model.modelitem", "model.modelrepository", "model.modelfamily"]}}, "query": {"query_string": {"query": "(submission:({* TO \\"2016\\\\-04\\\\-24\\"}) AND submission:({* TO \\"2016\\\\\\\\\\\\-04\\\\\\\\\\\\-24\\"}))", "default_operator": "AND", "default_field": "text", "auto_generate_phrase_queries": true, "analyze_wildcard": true}}}}, "size": 1, "from": 0, "highlight": {"fields": {"text": {"store": "yes"}}}}]]]; nested: ElasticsearchParseException[failed to parse date field [2016\\-04\\-24], tried both date format [dateOptionalTime], and timestamp number]; nested: IllegalArgumentException[Invalid format: "2016\\-04\\-24" is malformed at "\\-04\\-24"]; }]')
Failed to query Elasticsearch using '(submission:({* TO "2016\-04\-24"}) AND submission:({* TO "2016\\\-04\\\-24"}))': TransportError(400, u'SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed; shardFailures {[_bfbRn60RyC2qSP-3bKyiw][haystack][0]: SearchParseException[[haystack][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"filtered": {"filter": {"terms": {"django_ct": ["model.modelitem", "model.modelrepository", "model.modelfamily"]}}, "query": {"query_string": {"query": "(submission:({* TO \\"2016\\\\-04\\\\-24\\"}) AND submission:({* TO \\"2016\\\\\\\\\\\\-04\\\\\\\\\\\\-24\\"}))", "default_operator": "AND", "default_field": "text", "auto_generate_phrase_queries": true, "analyze_wildcard": true}}}}, "from": 0, "highlight": {"fields": {"text": {"store": "yes"}}}}]]]; nested: ElasticsearchParseException[failed to parse date field [2016\\-04\\-24], tried both date format [dateOptionalTime], and timestamp number]; nested: IllegalArgumentException[Invalid format: "2016\\-04\\-24" is malformed at "\\-04\\-24"]; }]')
Traceback (most recent call last):
  File "/home/ari/Workspace/DTU/moccasin-backend/env/local/lib/python2.7/site-packages/haystack/backends/elasticsearch_backend.py", line 516, in search
    _source=True)
  File "/home/ari/Workspace/DTU/moccasin-backend/env/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/ari/Workspace/DTU/moccasin-backend/env/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 531, in search
    doc_type, '_search'), params=params, body=body)
  File "/home/ari/Workspace/DTU/moccasin-backend/env/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/home/ari/Workspace/DTU/moccasin-backend/env/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 93, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/ari/Workspace/DTU/moccasin-backend/env/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
RequestError: TransportError(400, u'SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed; shardFailures {[_bfbRn60RyC2qSP-3bKyiw][haystack][0]: SearchParseException[[haystack][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"filtered": {"filter": {"terms": {"django_ct": ["model.modelitem", "model.modelrepository", "model.modelfamily"]}}, "query": {"query_string": {"query": "(submission:({* TO \\"2016\\\\-04\\\\-24\\"}) AND submission:({* TO \\"2016\\\\\\\\\\\\-04\\\\\\\\\\\\-24\\"}))", "default_operator": "AND", "default_field": "text", "auto_generate_phrase_queries": true, "analyze_wildcard": true}}}}, "from": 0, "highlight": {"fields": {"text": {"store": "yes"}}}}]]]; nested: ElasticsearchParseException[failed to parse date field [2016\\-04\\-24], tried both date format [dateOptionalTime], and timestamp number]; nested: IllegalArgumentException[Invalid format: "2016\\-04\\-24" is malformed at "\\-04\\-24"]; }]')
[27/Apr/2016 13:53:14] "GET /api/model/search/?submission__lt=2016-04-24 HTTP/1.1" 200 52
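
The two clauses in the logged query differ only in how many backslashes precede each hyphen. That shape is what you would get if the same value were escaped once for the first clause and twice for the second. A minimal sketch of that effect, assuming a cleaner that backslash-escapes reserved query characters (the helper name `escape_reserved` and the shortened character list are illustrative, not the backend's actual code):

```python
# Illustrative subset of Lucene query-string reserved characters.
# The backslash comes first so that backslashes added for later characters
# are not escaped again within the same pass.
RESERVED_CHARACTERS = ("\\", "+", "-", "(", ")", "{", "}", '"', ":")


def escape_reserved(value):
    """Backslash-escape every reserved character in `value` (hypothetical helper)."""
    for char in RESERVED_CHARACTERS:
        value = value.replace(char, "\\" + char)
    return value


once = escape_reserved("2016-04-24")
twice = escape_reserved(once)

print(once)   # 2016\-04\-24
print(twice)  # 2016\\\-04\\\-24
```

Both strings match the values in the failing query. Whether two filter backends each contributed one clause is an inference from the log, not something the trace itself confirms.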

@AriHrannar
Author

I think I might have solved it?

I have
filter_backends = [HaystackHighlightFilter, HaystackAutocompleteFilter]

In my view

If I change it to
filter_backends = [HaystackFilter]

Now I get 10x results for ?submission__lt=2016-04-24

So might be a problem with the other two filter backends? I got this idea from looking at your tests and wondering why they worked :)

@AriHrannar
Author

Changed my requirements.txt to reference

drf-haystack==1.5.6

again. Works there as well.

@rhblind
Owner

rhblind commented Apr 28, 2016

Ah, hehe, nicely spotted!
It's of course the HaystackAutocompleteFilter. It doesn't support the __ filtering. The HaystackHighlightFilter is a subclass of HaystackFilter and should support the __ syntax.

Autocomplete should probably have a separate view.
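
For readers unfamiliar with the `__` syntax: the part after the double underscore names the lookup to apply to the field. A minimal sketch of that split, using a hypothetical `split_lookup` helper rather than drf-haystack's own parsing:

```python
def split_lookup(param):
    """Split 'field__lookup' into (field, lookup); lookup is None if absent."""
    field, sep, lookup = param.rpartition("__")
    if not sep:
        # No double underscore present: the whole parameter is the field name.
        return param, None
    return field, lookup


print(split_lookup("submission__lt"))  # ('submission', 'lt')
print(split_lookup("name"))            # ('name', None)
```

A filter backend that supports this syntax can pass the pair on to Haystack as a keyword such as `submission__lt=...`, as in the shell session from the original report.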

@rhblind
Owner

rhblind commented Apr 28, 2016

Could you please just confirm and close the issue?
Good luck with your project, thanks for using my library ;)
