NullPointerException using "has_child" filter after upgrade to v0.90.5 #3965

Closed
ajhalani opened this Issue Oct 24, 2013 · 14 comments

@ajhalani
commented Oct 24, 2013

After upgrading from v0.90.1 to v0.90.5, we noticed that some of our has_child filter queries started to fail on some shards. The request is:

curl -x '' -s -XPOST 'http://localhost:9201/data_index_20131011/vendor/_search?from=0&size=2' -d '
{
  "filter" : {
    "has_child" : {
      "query" : {
        "match" : {
          "set_aside_descriptions" : {
            "query" : "No set aside used."
          }
        }
      },
      "type" : "transaction"
    }
  }
}
'

The response reports failures on two of the four shards:

  "_shards" : {
    "total" : 4,
    "successful" : 2,
    "failed" : 2,
    "failures" : [ {
      "index" : "data_index_20131011",
      "shard" : 0,
      "status" : 500,
      "reason" : "RemoteTransportException[[mach2.node][inet[/<internal ip>:9301]][search/phase/query]]; nested: QueryPhaseExecutionException[[data_index_20131011][0]: query[ConstantScore(cache(_type:vendor))],from[0],size[2]: Query Failed [Failed to execute main query]]; nested: NullPointerException; "
    }, {
      "index" : "data_index_20131011",
      "shard" : 1,
      "status" : 500,
      "reason" : "QueryPhaseExecutionException[[data_index_20131011][1]: query[ConstantScore(cache(_type:vendor))],from[0],size[2]: Query Failed [Failed to execute main query]]; nested: NullPointerException; "
    } ]
  },
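
A minimal sketch, assuming a Python client, of how the `_shards` section above could be checked so that partial failures are not silently ignored. The function name `summarize_shard_failures` is hypothetical. The field names (`_shards`, `failed`, `failures`, `shard`, `reason`) and the sample reasons are taken from the response above.

```python
def summarize_shard_failures(response):
    """Return (failed_count, [(shard, innermost_exception), ...])."""
    shards = response.get("_shards", {})
    summary = []
    for failure in shards.get("failures", []):
        reason = failure.get("reason", "")
        # Reasons are chained as "Outer[...]; nested: Inner[...]; nested: Innermost; "
        innermost = reason.split("nested:")[-1].strip().rstrip("; ")
        summary.append((failure.get("shard"), innermost))
    return shards.get("failed", 0), summary


# The "_shards" section from the response above, with the reasons copied verbatim.
sample = {
    "_shards": {
        "total": 4,
        "successful": 2,
        "failed": 2,
        "failures": [
            {
                "index": "data_index_20131011",
                "shard": 0,
                "status": 500,
                "reason": "RemoteTransportException[[mach2.node][inet[/<internal ip>:9301]][search/phase/query]]; nested: QueryPhaseExecutionException[[data_index_20131011][0]: query[ConstantScore(cache(_type:vendor))],from[0],size[2]: Query Failed [Failed to execute main query]]; nested: NullPointerException; ",
            },
            {
                "index": "data_index_20131011",
                "shard": 1,
                "status": 500,
                "reason": "QueryPhaseExecutionException[[data_index_20131011][1]: query[ConstantScore(cache(_type:vendor))],from[0],size[2]: Query Failed [Failed to execute main query]]; nested: NullPointerException; ",
            },
        ],
    }
}

failed, summary = summarize_shard_failures(sample)
print(failed, summary)
```

Splitting on `nested:` is only a convenience for reading the chained reason string shown here; it is not a documented format.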

The error trace from the logs:

[2013-10-24 08:54:07,330][TRACE][search                   ] [mach2.node] Query phase failed
org.elasticsearch.search.query.QueryPhaseExecutionException: [data_index_20131011][0]: query[ConstantScore(cache(_type:vendor))],from[0],size[2]: Query Failed [Failed to execute main query]
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:138)
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:219)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:623)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:612)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:269)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NullPointerException
        at org.elasticsearch.common.lucene.docset.MatchDocIdSet.shortCircuit(MatchDocIdSet.java:82)
        at org.elasticsearch.index.search.child.HasChildFilter$ParentDocSet.matchDoc(HasChildFilter.java:174)
        at org.elasticsearch.common.lucene.docset.MatchDocIdSet.get(MatchDocIdSet.java:69)
        at org.elasticsearch.common.lucene.search.FilteredCollector.collect(FilteredCollector.java:61)
        at org.apache.lucene.search.Scorer.score(Scorer.java:65)
        at org.apache.lucene.search.ConstantScoreQuery$ConstantScorer.score(ConstantScoreQuery.java:245)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:624)
        at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:162)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:488)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:444)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:134)
        ... 7 more

The same query works fine in v0.90.1. It's tough to provide a gist to reproduce this because the error is data-dependent and the index is larger than 1 GB. It only happens for some queries. For example,
"query" : "blah blah" works fine but
"query" : "No set aside used." fails.

@clintongormley

Member

commented Oct 24, 2013

Could this be because of the stop word "no"?

@ajhalani

Author

commented Oct 24, 2013

I am able to reproduce it with another query that does not contain "no".

@clintongormley

Member

commented Oct 24, 2013

Do all of your child docs have the field set_aside_descriptions?

Do your child docs exist on all shards? or do you just have child docs at the moment?

@ajhalani

Author

commented Oct 24, 2013

Yes, the set_aside_descriptions field is populated for all child documents.
Yes, child docs exist on all shards.

I notice that this issue only happens for filters that match a large number of child docs. For example, for the field set_aside_descriptions, the top 2 match values (matching 1450731 and 249870 child docs) fail, but the rest (46123 and below) do not.
Similarly, for another field where the issue occurs, socioeconomic_indicators_names, the top 3 match values (matching 1373706, 876751 and 804976 child docs) fail, but the rest (350865 and below) do not.
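
As a rough worked check of the numbers above, the sketch below (Python, with hypothetical helper names `failure_window` and `windows_overlap`) computes, per field, the range in which a "fails above this many matches" cutoff would have to lie, and tests whether one cutoff could explain both fields. It assumes the reported counts are index-wide totals, while the failures are per shard, so it only indicates whether total match count alone could be the trigger.

```python
# Child-doc match counts reported above, grouped by field and outcome.
reported = {
    "set_aside_descriptions": {
        "failing": [1450731, 249870],
        "passing_max": 46123,
    },
    "socioeconomic_indicators_names": {
        "failing": [1373706, 876751, 804976],
        "passing_max": 350865,
    },
}


def failure_window(failing, passing_max):
    """Range (exclusive lower, inclusive upper) where a per-field cutoff could lie."""
    return passing_max, min(failing)


def windows_overlap(a, b):
    # Two (lo, hi] windows overlap when each one starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]


windows = {
    field: failure_window(counts["failing"], counts["passing_max"])
    for field, counts in reported.items()
}
single_cutoff_possible = windows_overlap(
    windows["set_aside_descriptions"],
    windows["socioeconomic_indicators_names"],
)
print(windows, single_cutoff_possible)
```

The two windows do not overlap: a value matching 249870 docs fails for one field while a value matching 350865 docs succeeds for the other. Under the stated assumption, total match count by itself does not fully determine whether a query fails.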

@s1monw

Contributor

commented Oct 24, 2013

I don't think 0.90.6 is vulnerable to this since we pull the iterator forcefully during weight creation

DocIdSet docIdSet = new ParentDocSet(context.reader(), parentsBits, collectedUids.v(), idReaderTypeCache);
return ConstantScorer.create(docIdSet, this, queryWeight); // <=== pulls the docIdSet.iterator()

I'd still be curious if we can reproduce this somehow. FYI we literally rewrote ParentChild internally to have proper queries etc.
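
To illustrate why pulling the iterator up front could matter, here is a toy Python model. It is not the Elasticsearch code: the class `ToyMatchDocSet`, its fields, and the condition that triggers the short circuit are all invented for illustration. The only things taken from this thread are the call order in the stack trace (`get` calls `matchDoc`, which calls `shortCircuit`) and the remark that 0.90.6 pulls `iterator()` during weight creation. A `TypeError` on `None` stands in for Java's NullPointerException.

```python
class ToyMatchDocSet:
    """Toy stand-in, NOT the Elasticsearch class. State is created lazily by iterator()."""

    def __init__(self, matching_docs, max_doc):
        self._matching = set(matching_docs)
        self._max_doc = max_doc
        self._iterator_state = None  # only set once iterator() has been called

    def iterator(self):
        self._iterator_state = {"short_circuited": False}
        return iter(sorted(self._matching))

    def short_circuit(self):
        # Touches state that only exists after iterator() has run.
        self._iterator_state["short_circuited"] = True

    def match_doc(self, doc):
        # Invented, data-dependent condition: only the last doc triggers it.
        if doc == self._max_doc - 1:
            self.short_circuit()
        return doc in self._matching

    def get(self, doc):
        # Random-access path, as when a collector calls get(doc) directly.
        return self.match_doc(doc)


def lazy_use(doc_set, doc):
    return doc_set.get(doc)  # iterator() was never pulled first


def eager_use(doc_set, doc):
    doc_set.iterator()  # pull the iterator up front
    return doc_set.get(doc)
```

In this toy, `lazy_use` works for most docs and fails only when the data-dependent branch is reached, while `eager_use` never fails. That is consistent with the symptoms described in this thread, but the actual cause in 0.90.5 is not confirmed here.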

@clintongormley

Member

commented Oct 24, 2013

@ajhalani any chance you could try distilling this failure down to something we can replicate? That way we can make sure that it is fixed in 0.90.6, which is due out soon.

@ajhalani

Author

commented Oct 24, 2013

It will be a little tough to provide reproducible steps since the issue seems to be data-specific. I will try to write a script that populates millions of small child docs and see if it still happens. If not, I can build the v0.90 master branch and test it against our data.

@clintongormley

Member

commented Oct 24, 2013

even if it is long, it would be useful.

i'd prefer to have an actual test case that we can add to our test suite, rather than just making sure that it is fixed for now :)

thanks

@ajhalani

Author

commented Oct 24, 2013

I understand a reproducible test would be very useful, but I am unable to share the original index data. I tried reproducing the issue in a new index with dummy data but could not.

@ajhalani

Author

commented Oct 24, 2013

I updated one of the 3 nodes in the cluster to v0.90 master and the errors stopped happening. I don't understand why upgrading just one of the 3 nodes stopped the failures on the other nodes as well.

After reverting to v0.90.5, they happen again :)

@s1monw

Contributor

commented Oct 24, 2013

Good stuff, so I will close this for now! Thanks so much for verifying, this is very much appreciated!

@s1monw s1monw closed this Oct 24, 2013

@ghost ghost assigned s1monw Oct 24, 2013

@ajhalani

Author

commented Oct 24, 2013

thanks for all the help, looking forward to next release :)

@joelabrahamsson

commented Nov 1, 2013

We had the same issue with 0.90.5 and can confirm that it is resolved when using a build of the master branch instead.

@ajhalani

Author

commented Nov 1, 2013

Would it be possible to provide an ETA for the next ES release? It would help us decide whether to put effort into a temporary workaround.
