Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: emit warnings from single-value functions processing multi-values #102417

Merged
merged 21 commits into from
Dec 5, 2023

Conversation

bpintea
Copy link
Contributor

@bpintea bpintea commented Nov 21, 2023

When encountering a multi-value, a single-value function (i.e. all non-mv_xxx()) returns a null. This behaviour is opaque to the user. This PR adds the functionality for these functions to emit a Warning header, so the user is informed about the cause for the nulls.

Within testing, there are some differences between the emulated CSV-based tests (TestPhysical*) and the REST CSV-tests and thus the exact messages in the warnings:

  • The REST ones can push operations to Lucene; when this happens, a query containing a negation, not <predicate>, can be translated to a must_not query, that will include the not in the Source. But outside of Lucene, the execution would consider the predicate first, then the negation. So when the predicate contains a SV function, only this part's Source will show up in the warning.
  • When pushed to Lucene, a query is wrapped within the SingleValueQuery. This emits now warnings when encountering MVs (and returning no match). However, this only happens once the query that it wraps returns something itself. Comparatively, the TestPhysical* filters will issue a warning for every encountered MV (irrespective of sigle values within the MV matching or not).

To differentiate between the slightly differing values of the warnings, one can now append the #[Emulated: prefix to a warning, followed by the value of the warning for the emulated checks, then a corresponding ].
Example: warning:Line 1:24: evaluation of [not(salary_change < 1)] failed, treating result as null. Only first 20 failures recorded.#[Emulated:Line 1:28: evaluation of [salary_change < 1] failed, treating result as null. Only first 20 failures recorded.]

Closes #98743.

This adds Warning headers when scalar singlevalue functions enounter
multivalues (and emits a `null`).
This adds warning generation to SingleValueQuery.

Two issues with this approch, however:
- the source is serialized with the text, the EsqlConfiguration isn't
  made available.
- the translation of queries for Lucene differs from the exec engine, in
  respect to not(): `must_not` query "contains" the `not` and includes
  it in its source, while in the exec engine its execution happens in
  two separate steps (so the warning is added to the predicate the not()
  negates).
Allow slightly different warnings for Rest and CSV tests.
Currently, this was done on a `"values_count()" != 1`, which is wrong,
since it has to differentiate from a 0 count.
@elasticsearchmachine
Copy link
Collaborator

Hi @bpintea, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Hi @bpintea, I've updated the changelog YAML for you.

@bpintea bpintea marked this pull request as ready for review November 21, 2023 18:38
@elasticsearchmachine elasticsearchmachine added the Team:QL (Deprecated) Meta label for query languages team label Nov 21, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I just left a comment about the serial compatibility.

}

Builder(StreamInput in) throws IOException {
super(in);
this.next = in.readNamedWriteable(QueryBuilder.class);
this.field = in.readString();
this.stats = new Stats();
this.source = readSource(in);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to bump the transport version for this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, where do we serialise this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to bump the transport version for this change?

Good point, we do. I've added a new transport version.

Actually, where do we serialise this?

The doWriteTo() method below updates the serialisation based on the transport version.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I only have minor remarks except for what Chris already pointed out.

Like Chris, I think we'll need to bump transport version since we're now sending the source along with single value queries.

Do we need to account for this in other ways as well? I guess if we have a transport connection to 8.11 nodes, we could just not send the source to avoid breaking bwc?

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bpintea bpintea added the auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Dec 4, 2023
@bpintea
Copy link
Contributor Author

bpintea commented Dec 4, 2023

@elasticsearchmachine run elasticsearch-ci/docs

@costin
Copy link
Member

costin commented Dec 5, 2023

This is a big usability improvement - let's get this in for 8.12.
From my side, I wonder if we could remove the exception classname and just rely on the error message itself as that's what the user cares about.
Talking about the message, how about pointing the user either to the docs (chapter/section) or specifying the workaround - use mv_expand.

@bpintea
Copy link
Contributor Author

bpintea commented Dec 5, 2023

@elasticsearchmachine run elasticsearch-ci/part-3 elasticsearch-ci

@elasticsearchmachine elasticsearchmachine merged commit 8d0551e into elastic:main Dec 5, 2023
15 checks passed
@bpintea bpintea deleted the esql/warn_sv_func_on_mv2 branch December 5, 2023 10:51
@bpintea
Copy link
Contributor Author

bpintea commented Dec 5, 2023

I've merged this as-is, since it bumps the TransportVersions, which is popular before FF.

From my side, I wonder if we could remove the exception classname and just rely on the error message itself as that's what the user cares about.

We could, yes. The only two concerns I'd have is that:

  • it would be slightly misaligned with the general failure response;
  • it might be misleading in some of the failures, like number parsing: java.lang.NumberFormatException: For input string: "xx" -- the way the warning is generated is reused in SingleValueQuery.

Talking about the message, how about pointing the user either to the docs (chapter/section) or specifying the workaround - use mv_expand.

I'll update the message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >feature Team:QL (Deprecated) Meta label for query languages team v8.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESQL: emit warnings from all functions returning nulls on multivalues
6 participants