Using scroll search from a node that doesn't hold data (master node or data node) returns data for every call #18885

Closed
astefan opened this issue Jun 15, 2016 · 3 comments
Labels
>bug :Search/Search Search-related issues that do not fall into other categories

astefan commented Jun 15, 2016

Elasticsearch version: 1.7.5

Steps to reproduce:

  • create a 3-node cluster made of two data nodes and one master node
  • create an index like the following
POST /test/test/_bulk
{"index":{"_id":1}}
{"event": "BlackBerry device active", "timestamp": "2016-05-19T01:00:00Z", "username": "a"} 
{"index":{"_id":2}}
{"event": "BlackBerry device active", "timestamp": "2016-05-19T02:00:00Z", "username": "a"} 
{"index":{"_id":3}}
{"event": "BlackBerry device active", "timestamp": "2016-05-19T03:00:00Z", "username": "a"} 
{"index":{"_id":4}}
{"event": "BlackBerry device active", "timestamp": "2016-05-19T04:00:00Z"} 
{"index":{"_id":5}}
{"event": "BlackBerry device active", "timestamp": "2016-05-19T05:00:00Z"} 
  • use the following query using scroll and send the request to the master node:
GET /test/test/_search?scroll=1m&size=2
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "username": {
        "order": "asc",
        "unmapped_type": "string"
      }
    },
    {
      "timestamp": {
        "order": "asc",
        "unmapped_type": "date"
      }
    }
  ],
  "_source": [
    "username",
    "timestamp"
  ]
}
  • the first result will show documents with IDs 1 and 2
  • the second call GET /_search/scroll?scroll=1m&scroll_id=cXVlcnlUaGVuRmV0Y2g7NTs1OlAxWnFBTEEtUWRTaEJQLUQzLVNxVnc7NTpOTGVNaENDVFNYMm9wSUdjSkhpeGV3OzQ6UDFacUFMQS1RZFNoQlAtRDMtU3FWdzszOk5MZU1oQ0NUU1gyb3BJR2NKSGl4ZXc7NDpOTGVNaENDVFNYMm9wSUdjSkhpeGV3OzA7 (still to the master node) will get docs with IDs 3 and 4
  • the third call will bring back docs with IDs 4 and 5, so doc 4 is returned a second time
  • the fourth call will bring back docs with IDs 4 and 5 again, and so on for every later call
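
The symptom in the steps above can be checked mechanically. The following is only a minimal sketch: `drain_scroll` and `canned` are hypothetical helpers, the pages are canned lists rather than real responses, and the "healthy" sequence assumes a correct scroll would return doc 5 alone on the third call and an empty page after that.

```python
from typing import Callable, Iterable, List


def drain_scroll(next_page: Callable[[], List[str]], max_calls: int = 100) -> List[str]:
    """Collect document IDs page by page.

    Stops on the first empty page and raises if any ID is returned twice.
    """
    ordered: List[str] = []
    seen = set()
    for _ in range(max_calls):
        page = next_page()
        if not page:
            # An empty page is taken here as the end of the scroll.
            return ordered
        for doc_id in page:
            if doc_id in seen:
                raise RuntimeError(f"scroll returned document {doc_id} again")
            seen.add(doc_id)
            ordered.append(doc_id)
    raise RuntimeError("scroll did not terminate")


def canned(pages: Iterable[List[str]]) -> Callable[[], List[str]]:
    """Hypothetical stand-in for successive scroll calls: replays fixed pages."""
    iterator = iter(pages)
    return lambda: next(iterator, [])


# Assumed healthy sequence for the five documents with size=2.
healthy = canned([["1", "2"], ["3", "4"], ["5"]])
print(drain_scroll(healthy))

# Sequence reported in this issue: docs 4 and 5 keep coming back.
buggy = canned([["1", "2"], ["3", "4"], ["4", "5"], ["4", "5"]])
try:
    drain_scroll(buggy)
except RuntimeError as err:
    print(err)
```

In a real reproduction, `next_page` would wrap the `GET /_search/scroll` call and return the `_id` of each hit.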

Almost the same behavior can be observed when sending the requests to a data node instead of a master-only node, as long as that data node doesn't hold any data from the index. I used "index.routing.allocation.exclude._name": "NODE_NAME" to make the data node where the requests are sent not hold data for that specific index.
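
For anyone reproducing the data-node variant, here is a small sketch of the request that would carry that setting. It assumes the setting is applied through the index update-settings endpoint (`PUT /<index>/_settings`); the issue gives only the setting itself. `exclude_node_request` is a hypothetical helper that builds the request without sending it.

```python
import json
from typing import Tuple


def exclude_node_request(index: str, node_name: str) -> Tuple[str, str, str]:
    """Build (method, path, body) for excluding one node from an index's allocation."""
    # The setting name is taken from the issue; the endpoint is an assumption.
    body = {"index.routing.allocation.exclude._name": node_name}
    return "PUT", f"/{index}/_settings", json.dumps(body)


method, path, body = exclude_node_request("test", "NODE_NAME")
print(method, path)
print(body)
```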

clintongormley added the >bug and :Search/Search (Search-related issues that do not fall into other categories) labels Jun 15, 2016

jpountz commented Jun 15, 2016

This reproduces for me. I'll dig.

jpountz commented Jun 15, 2016

Found it: there is a bug in the hack that we use to merge top hits on the coordinating node. It checks reference equality instead of logical equality. That is an issue because the data might go over the network, and an object that has been serialized and deserialized is no longer the same reference as the original.
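
To illustrate the class of bug described here (this is not the actual Elasticsearch code), the sketch below compares an object with itself after a simulated network hop. `ShardHits`, `EMPTY`, `to_wire` and `from_wire` are hypothetical names; the sentinel is just an assumed example of something a merge step might test for.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ShardHits:
    """Hypothetical stand-in for one shard's page of results."""
    doc_ids: Tuple[str, ...] = ()


# Hypothetical sentinel meaning "this shard has no more hits".
EMPTY = ShardHits()


def to_wire(hits: ShardHits) -> str:
    """Serialize a result as it might be sent between nodes."""
    return json.dumps({"doc_ids": list(hits.doc_ids)})


def from_wire(payload: str) -> ShardHits:
    """Deserialize on the receiving node; this always builds a new object."""
    return ShardHits(tuple(json.loads(payload)["doc_ids"]))


local = EMPTY                       # result produced on the same node
remote = from_wire(to_wire(EMPTY))  # same content, but it crossed the network

# Reference equality only recognizes the local result.
print(local is EMPTY, remote is EMPTY)   # True False
# Logical equality recognizes both.
print(local == EMPTY, remote == EMPTY)   # True True
```

A node that holds no shards of the index receives every shard result over the network, which would fit with the problem showing up only on such nodes.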

This has been fixed in 2.0+ thanks to the cleanups in #12127, but I cannot backport that change since it has some backward-compatibility breaks. I will open a PR shortly.

jpountz commented Jun 15, 2016

Fixed via #18893.

jpountz closed this as completed Jun 15, 2016