New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rollup adding support for date field metrics (#34185) #34200
Rollup adding support for date field metrics (#34185) #34200
Conversation
Pinging @elastic/es-search-aggs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are going to want to restrict the supported metrics to min
, max, and
value_counthere. The reason is that apart from
sumand
avg` not making much sense, it would take less than 10,000,000 rolled up documents to overflow a long with dates which would probably lead to some very strange behaviours. I can see people reaching 10,000,000 original documents in a single rolled up document but more than that I can definitely see this overflow happening on the query side.
@colings86 Think I addressed your issue, just waiting on @polyfractal for the final stamp of approval :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a nit comment but otherwise LGTM. I would like @polyfractal to give it the green light before this is merged though
@@ -36,6 +37,9 @@ | |||
public static final String ROLLUP_MISSING = "ROLLUP_MISSING_40710B25931745D4B0B8B310F6912A69"; | |||
public static final List<String> SUPPORTED_METRICS = Arrays.asList(MaxAggregationBuilder.NAME, MinAggregationBuilder.NAME, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can we rename this SUPPORTED_NUMERIC_METRICS
now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should have SUPPORTED_NUMERIC_METRICS
, SUPPORTED_DATE_METRICS
, and then a union of those in a third list SUPPORTED_METRICS
?
E.g. in the MetricsConfig constructor (and a few other places), we validate that an agg is one of the supported metrics. We don't yet have the field mapper type, so we can't validate numeric vs date field and their associated metrics, but it's a quick check to make sure the config is at least sane.
The current SUPPORTED_METRICS
is technically a superset of the date metrics too so everything is fine. But I'm worried in the future the two lists may diverge for whatever reason, and we miss the validation because dates support something numerics don't, or vice versa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I am making SUPPORTED_NUMERIC_METRICS
and SUPPORTED_DATE_METRICS
two separate lists with overlapping metrics.
Then there is a Set<String>
that will be SUPPORTED_METRICS
that is a union of the two (need to verify that the type change will not break other logic). This should lay the ground work/pattern for future non-overlapping supported metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few questions, minor tweaks.
Also, I think we need to add a test to RollupRequestTranslationTests
too. I believe the translation will "work" (not throw an error), but won't translate correctly due to field naming conventions.
Timestamps are stored as foo.date_histogram.timestamp
+ foo.date_histogram.interval
+ foo.date_histogram.time_zone
. Metric values, however, are stored as foo.max.value
.
Which means we treat all metrics as defaulting to .value
when rewriting, and that won't match the .timestamp
that date_histos use. I think that will break the rewriting, and we need to figure out a way to change from .value
to .timestamp
. I haven't looked to closely yet, but we could probably do something like detect if the agg is targeting the string "date_histogram"
(since the convention is defined) and switch on that.
Probably good to verify on the response translation side too... but I think that side should "just work" once the request is rewritten correctly.
And yes, it would have been better if date_histos just used .value
for their timestamp. alas :(
@@ -36,6 +37,9 @@ | |||
public static final String ROLLUP_MISSING = "ROLLUP_MISSING_40710B25931745D4B0B8B310F6912A69"; | |||
public static final List<String> SUPPORTED_METRICS = Arrays.asList(MaxAggregationBuilder.NAME, MinAggregationBuilder.NAME, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should have SUPPORTED_NUMERIC_METRICS
, SUPPORTED_DATE_METRICS
, and then a union of those in a third list SUPPORTED_METRICS
?
E.g. in the MetricsConfig constructor (and a few other places), we validate that an agg is one of the supported metrics. We don't yet have the field mapper type, so we can't validate numeric vs date field and their associated metrics, but it's a quick check to make sure the config is at least sane.
The current SUPPORTED_METRICS
is technically a superset of the date metrics too so everything is fine. But I'm worried in the future the two lists may diverge for whatever reason, and we miss the validation because dates support something numerics don't, or vice versa.
@@ -108,18 +109,26 @@ public void validateMappings(Map<String, Map<String, FieldCapabilities>> fieldCa | |||
Map<String, FieldCapabilities> fieldCaps = fieldCapsResponse.get(field); | |||
if (fieldCaps != null && fieldCaps.isEmpty() == false) { | |||
fieldCaps.forEach((key, value) -> { | |||
if (RollupField.NUMERIC_FIELD_MAPPER_TYPES.contains(key)) { | |||
if (RollupField.NUMERIC_FIELD_MAPPER_TYPES.contains(key) || RollupField.DATE_FIELD_MAPPER_TYPE.equals(key)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that there are two cases, I wonder if we should push the isAggregatable() check out of the type conditionals (e.g. apply to everything), then have an if...if else...else
for the types? That way we won't need another conditional inside the first that specializes for the date type.
Not sure how that would look, so the current way is fine if that ends up being messier. :)
Has a nice side effect that the validation error will include both a message about non-aggregatable field as well as the metric missing at the same time.
RollupField.SUPPORTED_DATE_METRICS.containsAll(metrics) == false) { | ||
List<String> unsupportedMetrics = new ArrayList<>(metrics); | ||
unsupportedMetrics.removeAll(RollupField.SUPPORTED_DATE_METRICS); | ||
validationException.addValidationError("Only the metrics " + RollupField.SUPPORTED_DATE_METRICS.toString() + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super tiny nit: could we wrap the SUPPORTED_DATE_METRICS
with brackets ([...]
)? Makes it easier for the user to see the difference in message and dynamic values.
Ditto to below with unsupportedMetrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@polyfractal A Collection<String>
object when transformed to a String
transforms to a [Metric1, Metric2]
. To clarify, you want me to add another bracket so that the printout will be [[Metric1, Metric2]]
?
Reference Repl: https://repl.it/@ben_w_trent/QuerulousSubtleApplicationframework
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, nope, I'm just bad at Java. [Metric1, Metric2]
is exactly what I was wanting. :)
Chatted on Slack, the rewriting Introduces a bit of duplication in the document but keeps everything simple. |
LGTM thanks! <3 |
This allows
date
data field types to be rolled up with our supported metric aggregations.closes #34185