New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make _index optional for pinned query docs #97450
Make _index optional for pinned query docs #97450
Conversation
I think this change makes sense, but we should probably update the pinned query docs as well if we make this change:
|
182cbd1
to
0a94cbb
Compare
0a94cbb
to
95267d1
Compare
There is one side effect that I want to call out here, it is possible to pin the same documents multiple times with the pinned query - however the results will not contain duplicate pinned documents. This example should explain it better:
results:
I think this should be okay, because a similar behaviour can be achieved on main with the use of aliases:
results:
|
Pinging @elastic/es-search (Team:Search) |
Hi @ioanatia, I've created a changelog YAML for you. |
I've noticed that the actual GitHub issue isn't mentioned in this PR. Is this because it's a public repository? 🤔 |
What does organic match mean? I am trying to understand the meaning by reading this doc on pinned query. Is there any other related documentation? |
Yes.
The organic query is the query that is run, the results are underneath all pinned documents. |
@@ -105,13 +102,13 @@ private Item(String id) { | |||
* Read from a stream. | |||
*/ | |||
public Item(StreamInput in) throws IOException { | |||
index = in.readString(); | |||
index = in.readOptionalString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BWC serialization must be handled on input and output.
- skip: | ||
version: " - 8.9.99" | ||
reason: "'_index' was made optional in 8.10" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs bumped to 8.10.99
if this is still going to be merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PinnedQueryBuilderTests
should be updated to account for the index
being randomly null
or not when generating a query.
if (out.getTransportVersion().onOrAfter(OPTIONAL_INDEX_IN_DOCS_VERSION)) { | ||
out.writeOptionalString(index); | ||
} else { | ||
out.writeString(index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will throw an NPE. It is possible that the coordinator is upgraded to the latest version of Elasticsearch but the relevant data nodes are not. Consequently, if the user does not provide an _index
, the coordinator will accept the call, attempt to serialize, and an NPE will throw.
We should instead be nice here and throw a better error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch - I've included an IllegalArgumentException
that hopefully makes sense 👍
@@ -129,7 +126,7 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws | |||
); | |||
|
|||
static { | |||
PARSER.declareString(constructorArg(), INDEX_FIELD); | |||
PARSER.declareStringOrNull(optionalConstructorArg(), INDEX_FIELD); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
declareStringOrNull
This means we allow the value to be string or explicitly null
. I think we instead allow it to be string
or not provided at all. Consequently, it should just be PARSER.declareString(optionalConstructorArg(), INDEX_FIELD);
out.writeOptionalString(index); | ||
} else { | ||
out.writeString(index); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below in the toXContent
method, we should not write the INDEX_FIELD
if this.index == null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersion.java
Thanks for checking this - I've added that to the tests 👍 |
@elasticmachine update branch |
merge conflict between base and head |
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersion.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, thanks for taking it over the line!
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersion.java
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersion.java
Currently pinned queries require either the `ids` or `docs` parameter. `docs` allows pinning documents from specific indices. However for `docs` the `_index` field is always required: ``` GET test/_search { "query": { "pinned": { "organic": { "query_string": { "query": "something" } }, "docs": [ { "_id": "1" } ] } } } ``` returns an error: ``` { "error": { "root_cause": [ { "type": "parsing_exception", "reason": "[10:22] [pinned] failed to parse field [docs]", "line": 10, "col": 22 } ], "type": "parsing_exception", "reason": "[10:22] [pinned] failed to parse field [docs]", "line": 10, "col": 22, "caused_by": { "type": "x_content_parse_exception", "reason": "[10:22] [pinned] failed to parse field [docs]", "caused_by": { "type": "illegal_argument_exception", "reason": "Required [_index]" } } }, "status": 400 } ``` The proposal here is to make `_index` optional. I don't think we have a strong requirement for making `_index` required, when it was initially introduced in elastic#74873, we mostly wanted the ability to pin docs from specific indices. Making `_index` optional can give more flexibility to use a combination of pinned documents from specific indices or just document ids. This change can also help with pinned query rules. Currently pinned query rules can accept either `ids` or `docs`. If multiple pinned query rules match and they use a combination of `ids` and `docs`, we cannot build a pinned query and we would need to return an error. This is because a pinned query cannot accept both `ids` and `docs`. By making `_index` optional we would no longer need to return an error when pinned query rules use a combination of `ids` and `docs`, because we can easily translate `ids` in `docs`. The following pinned queries would be equivalent: ``` GET test/_search { "query": { "pinned": { "organic": { "query_string": { "query": "something" } }, "docs": [ { "_id": "1" } ] } } } GET test/_search { "query": { "pinned": { "organic": { "query_string": { "query": "something" } }, "ids": [1] } } } ``` The scores should be consistent when using a combination of _docs that might use _index or not - see example <details> <summary>Example </summary> ``` PUT test-1/_doc/1 { "title": "doc 1" } PUT test-1/_doc/2 { "title": "doc 2" } PUT test-2/_doc/1 { "title": "doc 1" } PUT test-2/_doc/3 { "title": "lalala" } POST test-1,test-2/_search { "query": { "pinned": { "organic": { "query_string": { "query": "lalala" } }, "docs": [ { "_id": "2", "_index": "test-1" }, { "_id": "1" } ] } } } ``` response: ``` { "took": 1, "timed_out": false, "_shards": { "total": 2, "successful": 2, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 4, "relation": "eq" }, "max_score": 1.7014124e+38, "hits": [ { "_index": "test-1", "_id": "2", "_score": 1.7014124e+38, "_source": { "title": "doc 2" } }, { "_index": "test-1", "_id": "1", "_score": 1.7014122e+38, // same score as doc with id 1 from test-2 "_source": { "title": "doc 1" } }, { "_index": "test-2", "_id": "1", "_score": 1.7014122e+38, // same score as doc with id 1 from test-1 "_source": { "title": "doc 1" } }, { "_index": "test-2", "_id": "3", "_score": 0.8025915, // organic result "_source": { "title": "lalala" } } ] } } ``` </details> For query rules, if we have two query rules that both match and use a combination of `ids` and `pinned`: ``` PUT _query_rules/test-ruleset { "ruleset_id": "test-ruleset", "rules": [ { "rule_id": "1", "type": "pinned", "criteria": [ { "type": "exact", "metadata": "query_string", "value": "country" } ], "actions": { "docs": [ { "_index": "singers", "_id": "1" } ] } }, { "rule_id": "2", "type": "pinned", "criteria": [ { "type": "exact", "metadata": "query_string", "value": "country" } ], "actions": { "ids": [ 2 ] } } ] } ``` and the following query: ``` POST singers/_search { "query": { "rule_query": { "organic": { "query_string": { "default_field": "name", "query": "country" } }, "match_criteria": { "query_string": "country" }, "ruleset_id": "test-ruleset" } } } ``` then this would get translated into the following pinned query: ``` POST singers/_search { "query": { "pinned": { "organic": { "query_string": { "default_field": "name", "query": "country" } }, "docs": [ { "_index": "singers", "_id": "1" }, {"_id": 2 } ] } } } ``` I think we can also simplify the pinned query rule so that it only receives `docs`: ``` PUT _query_rules/test-ruleset { "ruleset_id": "test-ruleset", "rules": [ { "rule_id": "1", "type": "pinned", "criteria": [ { "type": "exact", "metadata": "query_string", "value": "country" } ], "actions": { "docs": [ { "_id": "1" }, { "_id": "2", "_index": "singers" } ] } } ] } ```
Currently pinned queries require either the
ids
ordocs
parameter.docs
allows pinning documents from specific indices.However for
docs
the_index
field is always required:returns an error:
The proposal here is to make
_index
optional. I don't think we have a strong requirement for making_index
required, when it was initially introduced in #74873, we mostly wanted the ability to pin docs from specific indices.Making
_index
optional can give more flexibility to use a combination of pinned documents from specific indices or just document ids.This change can also help with pinned query rules. Currently pinned query rules can accept either
ids
ordocs
.If multiple pinned query rules match and they use a combination of
ids
anddocs
, we cannot build a pinned query and we would need to return an error. This is because a pinned query cannot accept bothids
anddocs
.By making
_index
optional we would no longer need to return an error when pinned query rules use a combination ofids
anddocs
, because we can easily translateids
indocs
.The following pinned queries would be equivalent:
The scores should be consistent when using a combination of _docs that might use _index or not - see example
Example
response:
For query rules, if we have two query rules that both match and use a combination of
ids
andpinned
:and the following query:
then this would get translated into the following pinned query:
I think we can also simplify the pinned query rule so that it only receives
docs
: