Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support metrics counter types in ESQL #107877

Merged
merged 14 commits into from Apr 26, 2024
Merged

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Apr 24, 2024

This commit adds support for numeric metrics counter fields in ES|QL. These counter types, including counter_long, counter_integer, and counter_double, are different from their parent types. Users will have limited interaction with these counter types, restricted to:

  • Retrieving values without any processing
  • Casting to their root type (e.g., to_long(a_long_counter))
  • Using them in the metrics rate aggregation

These restrictions are intentional to prevent misuse. If users want to use them as numeric values, explicit casting to their root types is required.

@dnhatn dnhatn changed the title Support counter type in ESQL Support metrics counter types in ESQL Apr 24, 2024
@dnhatn dnhatn force-pushed the enable-counter-type branch 11 times, most recently from 84cd4ee to 68efa9a Compare April 25, 2024 05:23
@elasticsearchmachine
Copy link
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

@dnhatn dnhatn marked this pull request as ready for review April 25, 2024 06:25
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine labels Apr 25, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, but there are edge cases that might need some refining:

  • stats min(tx) -> argument of [min(tx)] must be [datetime or numeric except unsigned_long], found value [min(tx)] type [counter_long]. It's good we have a verification exception, but its message is not entirely true
  • tests are needed in VerifierTests for error messages when using these types in various places in queries
  • it seems I cannot aggregate on these types, but I can group by them (ie min(time) by counter). Is this expected?
  • count(counter) seems to also work...

@astefan astefan requested a review from bpintea April 25, 2024 13:55
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some questions. Probably good to go but I'll recheck after you answer.

@@ -212,7 +219,8 @@ public static boolean isRepresentable(DataType t) {
&& t != FLOAT
&& t != SCALED_FLOAT
&& t != SOURCE
&& t != HALF_FLOAT;
&& t != HALF_FLOAT
&& isCounterType(t) == false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think counters are representable though. They can be loaded into blocks now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disabled this mainly to avoid the map in AbstractFunctionTestCase #107877 (comment). Would you be okay if I enable this in a follow-up?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Delaying is fine.

DataTypes.NULL
),
"boolean or counter_double or datetime or numeric or string"
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've grown to hate this Map. I made it, but I dislike it....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've done the right thing to it, but I'm sorry you had to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's another version of errorsForCasesWithoutExamples which takes another parameter that outputs a matcher for hte expected type. You could use that rather than changing the map.

@dnhatn dnhatn requested a review from nik9000 April 25, 2024 16:29
@dnhatn
Copy link
Member Author

dnhatn commented Apr 25, 2024

count(counter) seems to also work...

Should we allow counter_int and counter_long in all of these too?

Good question. Some usages, such as count(counter) or to_string(counter), seem to make sense. Should we support them all, or still require casting in these cases?

@dnhatn
Copy link
Member Author

dnhatn commented Apr 25, 2024

@astefan I pushed 5b09bdf to add verification tests and restrict usages in count, aggregation, and grouping.

@dnhatn dnhatn requested a review from astefan April 25, 2024 17:59
@astefan
Copy link
Contributor

astefan commented Apr 25, 2024

count(counter) seems to also work...

Should we allow counter_int and counter_long in all of these too?

Good question. Some usages, such as count(counter) or to_string(counter), seem to make sense. Should we support them all, or still require casting in these cases?

I would be cautious and start with requiring casting everywhere, there may be edge cases that we couldn't think of atm and it may be difficult to remove the support once it's merged. If we get demand for this, we'll analyze these requests and adapt at that time.

@dnhatn
Copy link
Member Author

dnhatn commented Apr 25, 2024

I would be cautious and start with requiring casting everywhere, there may be edge cases that we couldn't think of atm and it may be difficult to remove the support once it's merged. If we get demand for this, we'll analyze these requests and adapt at that time.

Thanks @astefan. We are on the same page.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks Nhat. Outside some superficial nits, the main comment I have is around CSV tests with counters in various places such as:

  • filtering (WHERE counter > 10)
  • expressions both named and anonymous + scalar functions (EVAL X = counter % 3 > 2 | WHERE counter_long > counter_double)
  • using it in agg functions MAX(counter), MIN(counter % 2)

Thanks!

@@ -344,7 +344,8 @@ private PhysicalOperation planTopN(TopNExec topNExec, LocalExecutionPlannerConte
case "version" -> TopNEncoder.VERSION;
case "boolean", "null", "byte", "short", "integer", "long", "double", "float", "half_float", "datetime", "date_period",
"time_duration", "object", "nested", "scaled_float", "unsigned_long", "_doc" -> TopNEncoder.DEFAULT_SORTABLE;
case "geo_point", "cartesian_point", "geo_shape", "cartesian_shape" -> TopNEncoder.DEFAULT_UNSORTABLE;
case "geo_point", "cartesian_point", "geo_shape", "cartesian_shape", "counter_long", "counter_integer", "counter_double" ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate from this PR but it makes sense to centralize these strings in one class to avoid bugs due to typos.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from Costin's comments (especially the additional tests), it's LGTM.

@nik9000
Copy link
Member

nik9000 commented Apr 26, 2024

Outside some superficial nits, the main comment I have is around CSV tests with counters in various places such as:

* filtering (WHERE counter > 10)

* expressions both named and anonymous  + scalar functions (EVAL X = counter % 3 > 2 | WHERE counter_long > counter_double)

* using it in agg functions MAX(counter), MIN(counter % 2)

I think all of those should fail unless you wrap them in to_integer or whatever. Which we may as well test paranoia. Tests are cheap and run fast and make sure future me doesn't break them. Future me is a menace.

@dnhatn
Copy link
Member Author

dnhatn commented Apr 26, 2024

Thank you all for your reviews. I'll have to delay the CSV tests because I'm going to build a CSV dataset for the tsdb suite. Also, the CSV test infra doesn't support error checking yet. The YAML tests should have good coverage.

@dnhatn dnhatn merged commit 22aad7b into elastic:main Apr 26, 2024
14 checks passed
@dnhatn dnhatn deleted the enable-counter-type branch April 26, 2024 19:15
dnhatn added a commit that referenced this pull request Apr 29, 2024
We should widen the numeric root types before converting 
them into counter types.

Relates #107877
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants