Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rollup adding support for date field metrics (#34185) #34200

Merged

Conversation

benwtrent
Copy link
Member

This allows date data field types to be rolled up with our supported metric aggregations.

closes #34185

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are going to want to restrict the supported metrics to min, max, and value_counthere. The reason is that apart fromsumandavg` not making much sense, it would take less than 10,000,000 rolled up documents to overflow a long with dates which would probably lead to some very strange behaviours. I can see people reaching 10,000,000 original documents in a single rolled up document but more than that I can definitely see this overflow happening on the query side.

@benwtrent
Copy link
Member Author

@colings86 Think I addressed your issue, just waiting on @polyfractal for the final stamp of approval :)

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a nit comment but otherwise LGTM. I would like @polyfractal to give it the green light before this is merged though

@@ -36,6 +37,9 @@
public static final String ROLLUP_MISSING = "ROLLUP_MISSING_40710B25931745D4B0B8B310F6912A69";
public static final List<String> SUPPORTED_METRICS = Arrays.asList(MaxAggregationBuilder.NAME, MinAggregationBuilder.NAME,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we rename this SUPPORTED_NUMERIC_METRICS now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should have SUPPORTED_NUMERIC_METRICS, SUPPORTED_DATE_METRICS, and then a union of those in a third list SUPPORTED_METRICS?

E.g. in the MetricsConfig constructor (and a few other places), we validate that an agg is one of the supported metrics. We don't yet have the field mapper type, so we can't validate numeric vs date field and their associated metrics, but it's a quick check to make sure the config is at least sane.

The current SUPPORTED_METRICS is technically a superset of the date metrics too so everything is fine. But I'm worried in the future the two lists may diverge for whatever reason, and we miss the validation because dates support something numerics don't, or vice versa.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I am making SUPPORTED_NUMERIC_METRICS and SUPPORTED_DATE_METRICS two separate lists with overlapping metrics.

Then there is a Set<String> that will be SUPPORTED_METRICS that is a union of the two (need to verify that the type change will not break other logic). This should lay the ground work/pattern for future non-overlapping supported metrics.

Copy link
Contributor

@polyfractal polyfractal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few questions, minor tweaks.

Also, I think we need to add a test to RollupRequestTranslationTests too. I believe the translation will "work" (not throw an error), but won't translate correctly due to field naming conventions.

Timestamps are stored as foo.date_histogram.timestamp + foo.date_histogram.interval + foo.date_histogram.time_zone. Metric values, however, are stored as foo.max.value.

Which means we treat all metrics as defaulting to .value when rewriting, and that won't match the .timestamp that date_histos use. I think that will break the rewriting, and we need to figure out a way to change from .value to .timestamp. I haven't looked to closely yet, but we could probably do something like detect if the agg is targeting the string "date_histogram" (since the convention is defined) and switch on that.

Probably good to verify on the response translation side too... but I think that side should "just work" once the request is rewritten correctly.

And yes, it would have been better if date_histos just used .value for their timestamp. alas :(

@@ -36,6 +37,9 @@
public static final String ROLLUP_MISSING = "ROLLUP_MISSING_40710B25931745D4B0B8B310F6912A69";
public static final List<String> SUPPORTED_METRICS = Arrays.asList(MaxAggregationBuilder.NAME, MinAggregationBuilder.NAME,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should have SUPPORTED_NUMERIC_METRICS, SUPPORTED_DATE_METRICS, and then a union of those in a third list SUPPORTED_METRICS?

E.g. in the MetricsConfig constructor (and a few other places), we validate that an agg is one of the supported metrics. We don't yet have the field mapper type, so we can't validate numeric vs date field and their associated metrics, but it's a quick check to make sure the config is at least sane.

The current SUPPORTED_METRICS is technically a superset of the date metrics too so everything is fine. But I'm worried in the future the two lists may diverge for whatever reason, and we miss the validation because dates support something numerics don't, or vice versa.

@@ -108,18 +109,26 @@ public void validateMappings(Map<String, Map<String, FieldCapabilities>> fieldCa
Map<String, FieldCapabilities> fieldCaps = fieldCapsResponse.get(field);
if (fieldCaps != null && fieldCaps.isEmpty() == false) {
fieldCaps.forEach((key, value) -> {
if (RollupField.NUMERIC_FIELD_MAPPER_TYPES.contains(key)) {
if (RollupField.NUMERIC_FIELD_MAPPER_TYPES.contains(key) || RollupField.DATE_FIELD_MAPPER_TYPE.equals(key)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that there are two cases, I wonder if we should push the isAggregatable() check out of the type conditionals (e.g. apply to everything), then have an if...if else...else for the types? That way we won't need another conditional inside the first that specializes for the date type.

Not sure how that would look, so the current way is fine if that ends up being messier. :)

Has a nice side effect that the validation error will include both a message about non-aggregatable field as well as the metric missing at the same time.

RollupField.SUPPORTED_DATE_METRICS.containsAll(metrics) == false) {
List<String> unsupportedMetrics = new ArrayList<>(metrics);
unsupportedMetrics.removeAll(RollupField.SUPPORTED_DATE_METRICS);
validationException.addValidationError("Only the metrics " + RollupField.SUPPORTED_DATE_METRICS.toString() +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super tiny nit: could we wrap the SUPPORTED_DATE_METRICS with brackets ([...])? Makes it easier for the user to see the difference in message and dynamic values.

Ditto to below with unsupportedMetrics

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@polyfractal A Collection<String> object when transformed to a String transforms to a [Metric1, Metric2]. To clarify, you want me to add another bracket so that the printout will be [[Metric1, Metric2]] ?

Reference Repl: https://repl.it/@ben_w_trent/QuerulousSubtleApplicationframework

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nope, I'm just bad at Java. [Metric1, Metric2] is exactly what I was wanting. :)

@polyfractal
Copy link
Contributor

Chatted on Slack, the rewriting .timestamp business doesn't apply (I confused myself). We will be adding distinct min/max metrics on the timestamp field, rather than reusing the .timestamp field that the date_histo generates.

Introduces a bit of duplication in the document but keeps everything simple.

@polyfractal
Copy link
Contributor

LGTM thanks! <3

@benwtrent benwtrent merged commit af2877c into elastic:master Oct 16, 2018
@benwtrent benwtrent deleted the feature/rollup-add-date-metric-support branch October 16, 2018 13:55
benwtrent added a commit that referenced this pull request Oct 16, 2018
* Rollup adding support for date field metrics (#34185)

* Restricting supported metrics for `date` field rollup

* fixing expected error message for yaml test

* Addressing PR comments
kcm pushed a commit that referenced this pull request Oct 30, 2018
* Rollup adding support for date field metrics (#34185)

* Restricting supported metrics for `date` field rollup

* fixing expected error message for yaml test

* Addressing PR comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add date value type support for certain metrics in Rollup
5 participants