Bug in _termvectors with artificial document? #21906
I am using the
When I submit a
More so, if I send an artificial document with the field
Here are the steps to reproduce:
Create the index
Index one document
Verify the document exists in one shard
One of those requests will return 0 hits and the other 1 (in my case, shard 0 did not return the hit).
Get the Term Vectors of that one document
Get the TV of 'one' using an artificial document
From shard 0
From shard 1
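For readers following along, the steps above can be sketched as a list of HTTP requests. This is only a rough reconstruction: the index name `tv_test`, the type `doc`, the field `text`, the document id `1`, the two-shard setting and the use of `preference=_shards:N` to target a shard are all assumptions for illustration, not the exact requests from the report.

```python
import json

INDEX = "tv_test"   # hypothetical index name
DOC_TYPE = "doc"    # hypothetical mapping type (ES 5.x-style paths)
FIELD = "text"      # hypothetical field name


def repro_requests(num_shards=2):
    """Return the repro steps as (method, path, body) tuples.

    Nothing is sent over the network; the tuples can be replayed with
    curl or any HTTP client against a test cluster.
    """
    steps = [
        # Create the index with several shards so that at least one stays empty.
        ("PUT", f"/{INDEX}",
         {"settings": {"number_of_shards": num_shards, "number_of_replicas": 0}}),
        # Index one document containing the term 'one'.
        ("PUT", f"/{INDEX}/{DOC_TYPE}/1?refresh=true", {FIELD: "one"}),
    ]
    # Verify which shard holds the document: one search per shard.
    for shard in range(num_shards):
        steps.append(("GET", f"/{INDEX}/_search?preference=_shards:{shard}",
                      {"query": {"match_all": {}}}))
    # Term vectors of the indexed document.
    steps.append(("GET", f"/{INDEX}/{DOC_TYPE}/1/_termvectors",
                  {"fields": [FIELD], "term_statistics": True}))
    # Term vectors of 'one' as an artificial document, once per shard
    # (assumes the preference parameter routes the request as it does for search).
    for shard in range(num_shards):
        steps.append(("GET", f"/{INDEX}/{DOC_TYPE}/_termvectors?preference=_shards:{shard}",
                      {"doc": {FIELD: "one"}, "term_statistics": True}))
    return steps


if __name__ == "__main__":
    for method, path, body in repro_requests():
        print(method, path)
        print(json.dumps(body))
```

Comparing the `term_statistics` in the two artificial-document responses is what exposes the difference between the empty shard and the one holding the document.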
In my case, shard 0 returns
BTW, and this may be a different bug report, when I send the TV requests with the
Submitting the request multiple times eventually succeeds.
Here's the NPE stacktrace from the console BTW:
After reviewing ES code, I believe I found the issue, in
Since the shard has no documents indexed, it finds no terms for the artificial document's field, and therefore uses the doc's TV as the terms iterator.
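To make that fallback concrete, here is a toy model in Python. It is not Elasticsearch's code: the function names, the dictionary layout, and the assumption that the artificial document's own stats look like `doc_freq = 1` and `ttf = term_freq` are all mine, chosen only to illustrate the behaviour described above.

```python
from collections import Counter

ZERO = {"doc_freq": 0, "ttf": 0}


def doc_term_vector(text):
    """Term -> term frequency for an artificial document (whitespace tokens)."""
    return Counter(text.lower().split())


def observed_stats(shard_terms, doc_tv):
    """Model of the reported behaviour.

    shard_terms maps term -> {"doc_freq", "ttf"} for one shard's field.
    If the shard has no terms at all, the document's own term vector is
    used as the source of statistics (assumed: doc_freq=1, ttf=term_freq).
    """
    if shard_terms:
        source = shard_terms
    else:
        source = {t: {"doc_freq": 1, "ttf": tf} for t, tf in doc_tv.items()}
    return {t: dict(source.get(t, ZERO)) for t in doc_tv}


def expected_stats(shard_terms, doc_tv):
    """What one might expect instead: stats always come from the shard."""
    return {t: dict(shard_terms.get(t, ZERO)) for t in doc_tv}


if __name__ == "__main__":
    tv = doc_term_vector("one")
    empty_shard = {}
    print(observed_stats(empty_shard, tv))
    print(expected_stats(empty_shard, tv))
```

In this model an empty shard reports the same numbers as a shard that really contains one document with the term, which is the confusion the report describes.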
So if I send:
I still think it's a bug: it's fine to receive the artificial doc's TV, but I don't expect term_statistics to report the doc's own stats as if they were what's in the index.
Thanks for the quick response @clintongormley. As for what I use it for, see the discussion I started at https://discuss.elastic.co/t/terms-stats-api/67508 and feature request #21886. Basically, I want to get term statistics (currently for re-ranking, and for now outside of 'scripting'). The lack of an API led me to try TV with artificial documents: I send the list of terms I want stats for as an artificial document to all the shards.
I would love to send a PR, but I'll need to do some work to set up a dev environment. I don't have an ES fork, or even Gradle installed.
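A minimal sketch of that workaround, under stated assumptions: the field name `text` and both helper functions are hypothetical, and joining the terms with spaces assumes the field's analyzer splits on whitespace and leaves the terms otherwise unchanged. The response layout (`term_vectors` → field → `terms`) follows the term vectors API.

```python
def artificial_doc_body(terms, field="text"):
    """Build a _termvectors request body that asks for stats on `terms`."""
    return {
        "doc": {field: " ".join(terms)},
        "term_statistics": True,
        "field_statistics": False,
        "positions": False,
        "offsets": False,
    }


def merge_shard_stats(responses, field="text"):
    """Sum doc_freq and ttf per term over one _termvectors response per shard."""
    totals = {}
    for resp in responses:
        terms = resp.get("term_vectors", {}).get(field, {}).get("terms", {})
        for term, stats in terms.items():
            agg = totals.setdefault(term, {"doc_freq": 0, "ttf": 0})
            # Terms absent from a shard may come back without these keys.
            agg["doc_freq"] += stats.get("doc_freq", 0)
            agg["ttf"] += stats.get("ttf", 0)
    return totals
```

With the behaviour discussed in this issue, an empty shard would contribute the artificial document's own counts to these sums, so the totals can be inflated until that is fixed.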
Also, what about that NPE? Do you prefer I report that in a separate bug report?
just a few easy clicks away :)
No, a drive-by-fix would be fine :)