Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail shards early when we can detect a type missmatch #79869

Merged

Conversation

not-napoleon
Copy link
Member

@not-napoleon not-napoleon commented Oct 26, 2021

Resolves #72276

Generally speaking, we can't detect field type mismatches between different shards until reduce time, which then causes us to fail the whole aggregation. There is an exception though when the user has specified a value type. Since the value type gets pushed out to all the shards, we can detect on the shard if the field type doesn't match the specified value type, and fail only that shard allowing for a partial result on the aggregation. In the case where the user supplies a script as well, we don't fail the shard, because it's possible the script changes the type (this was a pattern before runtime fields)

@not-napoleon not-napoleon marked this pull request as ready for review November 5, 2021 14:15
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 5, 2021
Copy link
Contributor

@csoulios csoulios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! This completes the VS framework.

I was only thinking that if a shard contains no documents, we could be more lenient instead of failing. I don't think this is a big problem though. It's more a question than a suggestion.

@@ -1368,3 +1368,67 @@ huge size:
- match: { aggregations.str_terms.buckets.1.doc_count: 2 }
- match: { aggregations.str_terms.buckets.2.key: c }
- match: { aggregations.str_terms.buckets.2.doc_count: 3 }

---
Value type missmatch fails shard:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: mismatch (not missmatch)

field: ip
value_type: ip

- match: { _shards.failed: 1 }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a special case of the problem, because valuetype_test_1 contains no documents. Wouldn't it be ok to silently skip its shards instead of failing them?

Also, would it make sense to add a test where the index may contain documents?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, there are two problems with silently skipping shards that don't have docs that I can think of. First, we don't know that until later in the process, and by that time we don't have an obvious way to check if we have a conflict. So we'd need to carry some "has type conflict" flag and fail if that's true and we have docs. Seems clunky. It also adds some leniency and unpredictability. If the query changes to include docs on those shards, now the aggregation starts failing. Seems confusing.

Furthermore, if there are no matching docs on the shard, failing it doesn't change the results. It just lets the user know there's a potential problem. More information is better.

And, finally, yes, it makes sense to add a test with docs in both indices. Will push one up shortly. Thanks!

@not-napoleon
Copy link
Member Author

@elasticmachine run elasticsearch-ci/part-2

@not-napoleon not-napoleon merged commit 76e935e into elastic:master Nov 29, 2021
@not-napoleon not-napoleon deleted the 72276-fail-shard-on-valuetype-missmatch branch November 29, 2021 19:56
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Nov 30, 2021
* upstream/master: (150 commits)
  Fix ComposableIndexTemplate equals when composed_of is null (elastic#80864)
  Optimize DLS bitset building for matchAll query (elastic#81030)
  URL option for BaseRunAsSuperuserCommand (elastic#81025)
  Less Verbose Serialization of Snapshot Failure in SLM Metadata (elastic#80942)
  Fix shadowed vars pt7 (elastic#80996)
  Fail shards early when we can detect a type missmatch (elastic#79869)
  Delegate Ref Counting to ByteBuf in Netty Transport (elastic#81096)
  Clarify `unassigned.reason` docs (elastic#81017)
  Strip blocks from settings for reindex targets (elastic#80887)
  Split off the values supplier for ScriptDocValues (elastic#80635)
  [ML] Switch message and detail for model snapshot deprecations (elastic#81108)
  [DOCS] Update xrefs for snapshot restore docs (elastic#81023)
  [ML] Updates visiblity of validate API (elastic#81061)
  Track histogram of transport handling times (elastic#80581)
  [ML] Fix datafeed preview with remote indices (elastic#81099)
  [ML] Fix acceptable model snapshot versions in ML deprecation checker (elastic#81060)
  [ML] Add logging for failing PyTorch test (elastic#81044)
  Extending the timeout waiting for snapshot to be ready (elastic#81018)
  [ML] Fix incorrect logging of unexpected model size error (elastic#81089)
  [ML] Make inference timeout test more reliable (elastic#81094)
  ...

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Nov 30, 2021
* upstream/master: (55 commits)
  Fix ComposableIndexTemplate equals when composed_of is null (elastic#80864)
  Optimize DLS bitset building for matchAll query (elastic#81030)
  URL option for BaseRunAsSuperuserCommand (elastic#81025)
  Less Verbose Serialization of Snapshot Failure in SLM Metadata (elastic#80942)
  Fix shadowed vars pt7 (elastic#80996)
  Fail shards early when we can detect a type missmatch (elastic#79869)
  Delegate Ref Counting to ByteBuf in Netty Transport (elastic#81096)
  Clarify `unassigned.reason` docs (elastic#81017)
  Strip blocks from settings for reindex targets (elastic#80887)
  Split off the values supplier for ScriptDocValues (elastic#80635)
  [ML] Switch message and detail for model snapshot deprecations (elastic#81108)
  [DOCS] Update xrefs for snapshot restore docs (elastic#81023)
  [ML] Updates visiblity of validate API (elastic#81061)
  Track histogram of transport handling times (elastic#80581)
  [ML] Fix datafeed preview with remote indices (elastic#81099)
  [ML] Fix acceptable model snapshot versions in ML deprecation checker (elastic#81060)
  [ML] Add logging for failing PyTorch test (elastic#81044)
  Extending the timeout waiting for snapshot to be ready (elastic#81018)
  [ML] Fix incorrect logging of unexpected model size error (elastic#81089)
  [ML] Make inference timeout test more reliable (elastic#81094)
  ...
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Nov 30, 2021
* upstream/master: (55 commits)
  Fix ComposableIndexTemplate equals when composed_of is null (elastic#80864)
  Optimize DLS bitset building for matchAll query (elastic#81030)
  URL option for BaseRunAsSuperuserCommand (elastic#81025)
  Less Verbose Serialization of Snapshot Failure in SLM Metadata (elastic#80942)
  Fix shadowed vars pt7 (elastic#80996)
  Fail shards early when we can detect a type missmatch (elastic#79869)
  Delegate Ref Counting to ByteBuf in Netty Transport (elastic#81096)
  Clarify `unassigned.reason` docs (elastic#81017)
  Strip blocks from settings for reindex targets (elastic#80887)
  Split off the values supplier for ScriptDocValues (elastic#80635)
  [ML] Switch message and detail for model snapshot deprecations (elastic#81108)
  [DOCS] Update xrefs for snapshot restore docs (elastic#81023)
  [ML] Updates visiblity of validate API (elastic#81061)
  Track histogram of transport handling times (elastic#80581)
  [ML] Fix datafeed preview with remote indices (elastic#81099)
  [ML] Fix acceptable model snapshot versions in ML deprecation checker (elastic#81060)
  [ML] Add logging for failing PyTorch test (elastic#81044)
  Extending the timeout waiting for snapshot to be ready (elastic#81018)
  [ML] Fix incorrect logging of unexpected model size error (elastic#81089)
  [ML] Make inference timeout test more reliable (elastic#81094)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fail aggregations on shards if the value_type is incompatible with the field
4 participants