Fail shards early when we can detect a type missmatch #79869

not-napoleon · 2021-10-26T20:47:16Z

Resolves #72276

Generally speaking, we can't detect field type mismatches between different shards until reduce time, at which point the whole aggregation fails. There is an exception, though, when the user has specified a value type. Since the value type gets pushed out to all the shards, each shard can detect when its field type doesn't match the specified value type and fail on its own, which still allows a partial result for the aggregation. If the user also supplies a script, we don't fail the shard, because it's possible the script changes the type (this was a pattern before runtime fields).
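To illustrate the rule described above, here is a minimal Java sketch of the per-shard decision. It is not the code from this PR: `ValueTypeCheckSketch`, `ValueType`, `ShardTypeMismatchException`, and `checkOnShard` are hypothetical names, and the handling of unmapped fields is an assumption made only for this sketch.

```java
import java.util.Optional;

// Hypothetical sketch; none of these names come from the Elasticsearch code base.
final class ValueTypeCheckSketch {

    enum ValueType { STRING, NUMERIC, IP, DATE }

    /** Thrown on one shard only, so other shards can still contribute results. */
    static final class ShardTypeMismatchException extends IllegalArgumentException {
        ShardTypeMismatchException(String message) {
            super(message);
        }
    }

    /**
     * Runs on a shard before any documents are collected.
     *
     * @param field         name of the field being aggregated
     * @param fieldType     type the field is mapped to on this shard (empty if unmapped)
     * @param userValueType value_type from the request (empty if the user gave none)
     * @param hasScript     whether the request also supplies a script
     */
    static void checkOnShard(
        String field,
        Optional<ValueType> fieldType,
        Optional<ValueType> userValueType,
        boolean hasScript
    ) {
        // Without a user-supplied value type there is nothing to compare against,
        // so a mismatch can only surface later, at reduce time.
        if (!userValueType.isPresent()) {
            return;
        }
        // Assumption of this sketch: an unmapped field cannot conflict.
        if (!fieldType.isPresent()) {
            return;
        }
        // A script may change the type of the values, so stay lenient.
        if (hasScript) {
            return;
        }
        if (fieldType.get() != userValueType.get()) {
            throw new ShardTypeMismatchException(
                "Field [" + field + "] is mapped as " + fieldType.get()
                    + " on this shard but value_type is " + userValueType.get()
            );
        }
    }
}
```

The check needs only the shard's mapping and the request, which is why it can run before any documents are visited.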

csoulios

LGTM! This completes the VS framework.

I was only thinking that if a shard contains no documents, we could be more lenient instead of failing. I don't think this is a big problem though. It's more a question than a suggestion.

csoulios · 2021-11-09T14:52:32Z

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search.aggregation/20_terms.yml

@@ -1368,3 +1368,67 @@ huge size:
  - match: { aggregations.str_terms.buckets.1.doc_count: 2 }
  - match: { aggregations.str_terms.buckets.2.key: c }
  - match: { aggregations.str_terms.buckets.2.doc_count: 3 }
+
+---
+Value type missmatch fails shard:


Typo: mismatch (not missmatch)

csoulios · 2021-11-09T15:11:26Z

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search.aggregation/20_terms.yml

+                field: ip
+                value_type: ip
+
+  - match: { _shards.failed: 1 }


I think this is a special case of the problem, because valuetype_test_1 contains no documents. Wouldn't it be ok to silently skip its shards instead of failing them?

Also, would it make sense to add a test where the index may contain documents?

So, there are two problems I can think of with silently skipping shards that don't have docs. First, we don't know that a shard has no docs until later in the process, and by that time we don't have an obvious way to check whether we have a conflict. So we'd need to carry some "has type conflict" flag and fail if that's true and we have docs, which seems clunky. Second, it adds some leniency and unpredictability: if the query changes to include docs on those shards, the aggregation starts failing, which seems confusing.

Furthermore, if there are no matching docs on the shard, failing it doesn't change the results. It just lets the user know there's a potential problem. More information is better.

And, finally, yes, it makes sense to add a test with docs in both indices. Will push one up shortly. Thanks!
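The point above, that failing a shard with no matching docs leaves the buckets unchanged and only surfaces the problem, can be sketched from the coordinating side. This is an illustrative model rather than the actual reduce code: `PartialReduceSketch`, `ShardResult`, and `Reduced` are hypothetical names, and bucket contents are simplified to key/count maps.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a reduce step that tolerates failed shards.
final class PartialReduceSketch {

    /** One shard's outcome: bucket counts, or null buckets if the shard failed. */
    static final class ShardResult {
        final Map<String, Long> buckets;

        private ShardResult(Map<String, Long> buckets) {
            this.buckets = buckets;
        }

        static ShardResult ok(Map<String, Long> buckets) {
            return new ShardResult(buckets);
        }

        static ShardResult failed() {
            return new ShardResult(null);
        }
    }

    /** Merged buckets plus the number of shards that failed. */
    static final class Reduced {
        final Map<String, Long> buckets;
        final int failedShards;

        Reduced(Map<String, Long> buckets, int failedShards) {
            this.buckets = buckets;
            this.failedShards = failedShards;
        }
    }

    static Reduced reduce(List<ShardResult> shards) {
        Map<String, Long> merged = new TreeMap<>();
        int failed = 0;
        for (ShardResult shard : shards) {
            if (shard.buckets == null) {
                // A failed shard contributes no buckets; it is only counted.
                failed++;
                continue;
            }
            for (Map.Entry<String, Long> entry : shard.buckets.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return new Reduced(merged, failed);
    }
}
```

In this model, a shard that fails and a shard that succeeds with zero buckets produce the same merged buckets; the only difference is the failure count, which plays the role of `_shards.failed` in the YAML test.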

not-napoleon · 2021-11-29T18:58:18Z

@elasticmachine run elasticsearch-ci/part-2

* upstream/master: (150 commits)
  Fix ComposableIndexTemplate equals when composed_of is null (elastic#80864)
  Optimize DLS bitset building for matchAll query (elastic#81030)
  URL option for BaseRunAsSuperuserCommand (elastic#81025)
  Less Verbose Serialization of Snapshot Failure in SLM Metadata (elastic#80942)
  Fix shadowed vars pt7 (elastic#80996)
  Fail shards early when we can detect a type missmatch (elastic#79869)
  Delegate Ref Counting to ByteBuf in Netty Transport (elastic#81096)
  Clarify `unassigned.reason` docs (elastic#81017)
  Strip blocks from settings for reindex targets (elastic#80887)
  Split off the values supplier for ScriptDocValues (elastic#80635)
  [ML] Switch message and detail for model snapshot deprecations (elastic#81108)
  [DOCS] Update xrefs for snapshot restore docs (elastic#81023)
  [ML] Updates visiblity of validate API (elastic#81061)
  Track histogram of transport handling times (elastic#80581)
  [ML] Fix datafeed preview with remote indices (elastic#81099)
  [ML] Fix acceptable model snapshot versions in ML deprecation checker (elastic#81060)
  [ML] Add logging for failing PyTorch test (elastic#81044)
  Extending the timeout waiting for snapshot to be ready (elastic#81018)
  [ML] Fix incorrect logging of unexpected model size error (elastic#81089)
  [ML] Make inference timeout test more reliable (elastic#81094)
  ...

# Conflicts:
# server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java

* upstream/master: (55 commits)
  Fix ComposableIndexTemplate equals when composed_of is null (elastic#80864)
  Optimize DLS bitset building for matchAll query (elastic#81030)
  URL option for BaseRunAsSuperuserCommand (elastic#81025)
  Less Verbose Serialization of Snapshot Failure in SLM Metadata (elastic#80942)
  Fix shadowed vars pt7 (elastic#80996)
  Fail shards early when we can detect a type missmatch (elastic#79869)
  Delegate Ref Counting to ByteBuf in Netty Transport (elastic#81096)
  Clarify `unassigned.reason` docs (elastic#81017)
  Strip blocks from settings for reindex targets (elastic#80887)
  Split off the values supplier for ScriptDocValues (elastic#80635)
  [ML] Switch message and detail for model snapshot deprecations (elastic#81108)
  [DOCS] Update xrefs for snapshot restore docs (elastic#81023)
  [ML] Updates visiblity of validate API (elastic#81061)
  Track histogram of transport handling times (elastic#80581)
  [ML] Fix datafeed preview with remote indices (elastic#81099)
  [ML] Fix acceptable model snapshot versions in ML deprecation checker (elastic#81060)
  [ML] Add logging for failing PyTorch test (elastic#81044)
  Extending the timeout waiting for snapshot to be ready (elastic#81018)
  [ML] Fix incorrect logging of unexpected model size error (elastic#81089)
  [ML] Make inference timeout test more reliable (elastic#81094)
  ...

Lots of tests!

47119d8

elasticsearchmachine added the v8.0.0 label Oct 26, 2021

not-napoleon added 9 commits October 27, 2021 16:39

slightly cleaner script compile injection

7e374a3

more VSC resolve tests

887bdc1

javadoc

64d82ea

Move value type/field conflict detection earlier in the process

4a0d8e6

formatting fix

77d621d

Merge branch 'master' into 72276-fail-shard-on-valuetype-missmatch

af886a1

formatting fix

04fcb13

Merge branch 'master' into 72276-fail-shard-on-valuetype-missmatch

179fcbe

fix skip version

3739806

not-napoleon added the :Analytics/Aggregations, >bug, and v8.1.0 labels and removed the v8.0.0 label Nov 5, 2021

not-napoleon marked this pull request as ready for review November 5, 2021 14:15

elasticmachine added the Team:Analytics label Nov 5, 2021

not-napoleon added >enhancement and removed >bug labels Nov 5, 2021

csoulios approved these changes Nov 9, 2021

not-napoleon added 2 commits November 29, 2021 11:29

Merge branch 'master' into 72276-fail-shard-on-valuetype-missmatch

08f61ac

add another test, this time with docs

345b24a

not-napoleon merged commit 76e935e into elastic:master Nov 29, 2021

not-napoleon deleted the 72276-fail-shard-on-valuetype-missmatch branch November 29, 2021 19:56
