Support widening of numeric types in union-types #112610

craigtaverner · 2024-09-06T17:16:14Z

Only two lines of this PR are the actual fix.

All the rest is updating the CSV-spec testing infrastructure to make it easier to test this, and adding the tests.

The refactoring also involved some cleanup and simplifications. This update allows us to add alternative mappings of existing data files without copying the files and changing the header line. Some of the existing union-types test files were deleted as a result, which is a step in the right direction.

Fixes #111277

Only two lines of this PR are the actual fix. All the rest is updating the CSV-spec testing infrastructure to make it easier to test this, and adding the tests. The refactoring involve some cleanup and simplifications also. This update allows us to add alternative mappings of existing data files without copying the files and changing the header line. Some of the existing union-types test files were deleted as a result, which is a step in the right direction.

elasticsearchmachine · 2024-09-06T17:16:38Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2024-09-06T17:17:00Z

Hi @craigtaverner, I've created a changelog YAML for you.

alex-spies

I love the improvements to the csv data loader.

I think we could increase test coverage a little; and, while we're at it, tidy up usage of widened data types in field attributes a little, since I think there are some footguns in place in the form of unwritten invariants, that we may very well make written.

But this is a nice PR and can also be merged as-is IMHO.

alex-spies · 2024-09-09T14:22:30Z

...org/elasticsearch/xpack/esql/expression/function/scalar/convert/AbstractConvertFunction.java

@@ -63,7 +63,7 @@ protected AbstractConvertFunction(StreamInput in) throws IOException {
     * Build the evaluator given the evaluator a multivalued field.
     */
    protected final ExpressionEvaluator.Factory evaluator(ExpressionEvaluator.Factory fieldEval) {
-        DataType sourceType = field().dataType();
+        DataType sourceType = field().dataType().widenSmallNumeric();


Uhh, how did that field not end up being widened in the first place? In Analyzer.mappingAsAttributes, line 240, each field attribute's field should already have been widened, including the contained EsField's type.

What fails if we do not widen here? I'd like to figure out if this is really necessary.

Also, to avoid confusion in the future, do you think we should add an assert to FieldAttribute's constructor to ensure it only ever gets built with already widened data types?

I was surprised this failed too. It fails later in planning, I don't remember the exact stack trace, but the error was that this function was being called with an unsupported type (the narrow type). I understood the evaluator function to be called much later, so I was surprised by this too. I could re-investigate because perhaps there is a code path to this that is happening unnecessarily early?

I think this deserves a follow up, because there begin to be many places were widening is necessary - and we only find out by failed queries/tests :/ I opened #112691.

I investigated, and this exception is thrown down during query execution (ie. after local physical planning) during the setup of the type converting block loader at https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/EsPhysicalOperationProviders.java#L384.

This is well past any analyzing, planning phases, so it is clear that the types are not widened in general, but only in specific cases. A followup investigation makes a lot of sense.

For completeness, this is the stack trace:

org.elasticsearch.xpack.esql.EsqlIllegalArgumentException: illegal data type [short] at org.elasticsearch.xpack.esql.EsqlIllegalArgumentException.illegalDataType(EsqlIllegalArgumentException.java:43) at org.elasticsearch.xpack.esql.EsqlIllegalArgumentException.illegalDataType(EsqlIllegalArgumentException.java:39) at org.elasticsearch.xpack.esql.expression.function.scalar.convert.AbstractConvertFunction.evaluator(AbstractConvertFunction.java:69) at org.elasticsearch.xpack.esql.expression.function.scalar.convert.AbstractConvertFunction.toEvaluator(AbstractConvertFunction.java:128) at org.elasticsearch.xpack.esql.planner.EsPhysicalOperationProviders.<init>(EsPhysicalOperationProviders.java:384) at org.elasticsearch.xpack.esql.planner.EsPhysicalOperationProviders.getBlockLoaderFor(EsPhysicalOperationProviders.java:143) at org.elasticsearch.xpack.esql.planner.EsPhysicalOperationProviders.lambda(EsPhysicalOperationProviders.java:123) at org.elasticsearch.compute.lucene.ValuesSourceReaderOperator.newShard(ValuesSourceReaderOperator.java:470) at org.elasticsearch.compute.lucene.ValuesSourceReaderOperator.positionFieldWork(ValuesSourceReaderOperator.java:195) at org.elasticsearch.compute.lucene.ValuesSourceReaderOperator.loadFromSingleLeaf(ValuesSourceReaderOperator.java:220) at org.elasticsearch.compute.lucene.ValuesSourceReaderOperator.process(ValuesSourceReaderOperator.java:143) at org.elasticsearch.compute.operator.AbstractPageMappingOperator.getOutput(AbstractPageMappingOperator.java:76) at org.elasticsearch.compute.operator.Driver.runSingleLoopIteration(Driver.java:258) at org.elasticsearch.compute.operator.Driver.run(Driver.java:189) at org.elasticsearch.compute.operator.Driver.doRun(Driver.java:378)

alex-spies · 2024-09-09T14:24:55Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java

-                    // TODO: Shouldn't we perform widening of small numerical types here?
-                    if (supportedTypes.contains(type)) {
+                    if (supportedTypes.contains(type.widenSmallNumeric())) {


Contrary to the funny TODO that I left here, haha, maybe we should actually widen at the point where the InvalidMappedField gets created? That'd be consistent with how we widen the types of regular EsFields in Analyzer.mappingAsAttributes. If we do so, we could enforce this as an invariant of FieldAttribute (so that when it contains an IMF, it has to be widened).

alex-spies · 2024-09-09T14:33:11Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/union_types.csv-spec

+required_capability: union_types_numeric_widening
+
+FROM apps, apps_short METADATA _index
+| EVAL id = id::integer


This is cool!

However, to reduce risk some more (some data types can be quirky and stuff could happen at block loading time, I guess, to types like float and byte ...); how about we add to this/replace this by a parameterized integration test that exercises all widened data types?

RestEsqlTestCase is quite nice and, like csv tests, is executed in multiple environments (single/multi node etc.).

Tests in RestEsqlTestCase also have the benefit that we can more easily test cases with more than 2 different types becoming a union type. E.g. float, byte and, I dunno, keyword being stuffed into to_string.

alex-spies · 2024-09-09T14:47:10Z

...ugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/CsvTestsDataLoader.java

+    private static final TestsDataset EMPLOYEES = new TestsDataset("employees", "mapping-default.json", "employees.csv").noSubfields();
+    private static final TestsDataset HOSTS = new TestsDataset("hosts");
+    private static final TestsDataset APPS = new TestsDataset("apps");
+    private static final TestsDataset APPS_SHORT = APPS.withIndex("apps_short").withTypeMapping(Map.of("id", "short"));


…tion-ironbank-ubi * upstream/main: (302 commits) Deduplicate BucketOrder when deserializing (elastic#112707) Introduce test utils for ingest pipelines (elastic#112733) [Test] Account for auto-repairing for shard gen file (elastic#112778) Do not throw in task enqueued by CancellableRunner (elastic#112780) Mute org.elasticsearch.script.StatsSummaryTests testEqualsAndHashCode elastic#112439 Mute org.elasticsearch.repositories.blobstore.testkit.integrity.RepositoryVerifyIntegrityIT testTransportException elastic#112779 Use a dedicated test executor in MockTransportService (elastic#112748) Estimate segment field usages (elastic#112760) (Doc+) Inference Pipeline ignores Mapping Analyzers (elastic#112522) Fix verifyVersions task (elastic#112765) (Doc+) Terminating Exit Codes (elastic#112530) (Doc+) CAT Nodes default columns (elastic#112715) [DOCS] Augment installation warnings (elastic#112756) Mute org.elasticsearch.repositories.blobstore.testkit.integrity.RepositoryVerifyIntegrityIT testCorruption elastic#112769 Bump Elasticsearch to a minimum of JDK 21 (elastic#112252) ESQL: Compute support for filtering ungrouped aggs (elastic#112717) Bump Elasticsearch version to 9.0.0 (elastic#112570) add CDR related data streams to kibana_system priviliges (elastic#112655) Support widening of numeric types in union-types (elastic#112610) Introduce data stream options and failure store configuration classes (elastic#109515) ...

* Support widening of numeric types in union-types Only two lines of this PR are the actual fix. All the rest is updating the CSV-spec testing infrastructure to make it easier to test this, and adding the tests. The refactoring involve some cleanup and simplifications also. This update allows us to add alternative mappings of existing data files without copying the files and changing the header line. Some of the existing union-types test files were deleted as a result, which is a step in the right direction. * Update docs/changelog/112610.yaml * Link capability to PR

…112821) * Fix union-types where one index is missing the field (#111932) * Fix union-types where one index is missing the field When none of the indexes has the field, a validation error is correctly thrown, and when all indexes have the field, union-types works as normal. But when some indexes have the field and some do not, we were getting and internal error. We treat this case similarly to when some documents are missing the field, in which case `null` values are produced. So now a multi-index query where some indexes are missing the field will produce nulls for the documents coming from those indexes. * Update docs/changelog/111932.yaml * Added capability for this fix (missing-field) * Support widening of numeric types in union-types (#112610) * Support widening of numeric types in union-types Only two lines of this PR are the actual fix. All the rest is updating the CSV-spec testing infrastructure to make it easier to test this, and adding the tests. The refactoring involve some cleanup and simplifications also. This update allows us to add alternative mappings of existing data files without copying the files and changing the header line. Some of the existing union-types test files were deleted as a result, which is a step in the right direction. * Update docs/changelog/112610.yaml * Link capability to PR

craigtaverner added >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Sep 6, 2024

craigtaverner requested a review from alex-spies September 6, 2024 17:16

craigtaverner requested a review from a team as a code owner September 6, 2024 17:16

elasticsearchmachine added the v8.16.0 label Sep 6, 2024

Update docs/changelog/112610.yaml

ad0bf9e

Link capability to PR

b06894c

alex-spies approved these changes Sep 9, 2024

View reviewed changes

alex-spies mentioned this pull request Sep 10, 2024

ESQL: Ensure data types in field attributes are always widened #112691

Open

craigtaverner merged commit bb872e6 into elastic:main Sep 11, 2024
15 checks passed

craigtaverner mentioned this pull request Sep 12, 2024

Backporting two union-types fixes to 8.15 (from 111932 and 112610) #112821

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support widening of numeric types in union-types #112610

Support widening of numeric types in union-types #112610

craigtaverner commented Sep 6, 2024

elasticsearchmachine commented Sep 6, 2024

elasticsearchmachine commented Sep 6, 2024

alex-spies left a comment

alex-spies Sep 9, 2024

craigtaverner Sep 9, 2024

alex-spies Sep 10, 2024

craigtaverner Sep 11, 2024

craigtaverner Sep 11, 2024

alex-spies Sep 9, 2024

alex-spies Sep 9, 2024

alex-spies Sep 9, 2024

alex-spies Sep 9, 2024

Support widening of numeric types in union-types #112610

Support widening of numeric types in union-types #112610

Conversation

craigtaverner commented Sep 6, 2024

elasticsearchmachine commented Sep 6, 2024

elasticsearchmachine commented Sep 6, 2024

alex-spies left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment