Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Changes default destination index field mapping and adds scripted_metric agg #40750

Conversation

benwtrent
Copy link
Member

This PR adjusts the default behavior for aggregations whose destination mapping cannot be accurately determined either through agg type or source field mapping. Previously, setting the field mapping to double was the default behavior. This will not work for more complex scenarios. So, instead of enforcing an explicit mapping for indeterminable fields, I allow the mapping to dynamically determine the appropriate type.

The reason for adding scripted_metric in this PR as well, is so we have an aggregation that will exercise the new default behavior code path.

I did NOT attempt to determine the explicit mapping for scripted_metric and simply let it always be dynamic. The reasoning for this is that we would simply be doing what the ES mapping system already does for us. It would add complexity with little benefit.

Also, this simply adds the support for the aggregation. This PR does NOT include any new abstractions, interfaces, fields, etc. in the API. If we want to obfuscate the complexities in the scripted_metric aggregation, that will have to be future work.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@benwtrent
Copy link
Member Author

run elasticsearch-ci/bwc

@hendrikmuhs
Copy link
Contributor

hendrikmuhs commented Apr 3, 2019

Nice addition.

This PR adjusts the default behavior for aggregations whose destination mapping cannot be accurately determined either through agg type or source field mapping. Previously, setting the field mapping to double was the default behavior.

I now get what you mean: this is if deduction fails, sounds good to me.

The field mapping is defined in dataframe/transforms/pivot/Aggregations.java, double is the default for avg, cardinality, value_count default to long, min, max default to the the type defined in the source. I would like to use the existing logic there but it requires changes, will comment on the places in code regarding a concrete suggestion.

@@ -29,7 +29,8 @@ private Aggregations() {}
VALUE_COUNT("value_count", "long"),
MAX("max", null),
MIN("min", null),
SUM("sum", null);
SUM("sum", null),
SCRIPTED_METRIC("scripted_metric", null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 1st argument is the name known by aggregations, the 2nd the output type, which is either the concrete type or null if the source type should be used. So null is a magic variable and actually it should not be null for scripted_magic, but that's due to the lack of another magic.

null is a bad magic anyway and the types are well defined and limited. So I suggest to change this to:

enum AggregationType {
        AVG("avg", "double"),
        CARDINALITY("cardinality", "long"),
        VALUE_COUNT("value_count", "long"),
        MAX("max", "source"),
        MIN("min", "source"),
        SUM("sum", "source"),
        SCRIPTED_METRIC("scripted_metric", "dynamic");

There should never be a type source and dynamic, alternatively we could use _source, _dynamic to make the special meaning more visible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method resolveTargetMapping needs to be changed, it should return the concrete source type or dynamic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice, for sure! Let me crank that out today. It would be better than implicitly having different behavior for null between max and scripted_metric.

@@ -75,6 +76,8 @@ public static void deduceMappings(final Client client,
ValuesSourceAggregationBuilder<?, ?> valueSourceAggregation = (ValuesSourceAggregationBuilder<?, ?>) agg;
aggregationSourceFieldNames.put(valueSourceAggregation.getName(), valueSourceAggregation.field());
aggregationTypes.put(valueSourceAggregation.getName(), valueSourceAggregation.getType());
} else if(agg instanceof ScriptedMetricAggregationBuilder) {
// do nothing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to put the fieldname/mapping (dynamic) but only aggregationTypes and not aggregationSourceFieldNames

(I am not sure what happens further down, might require additional checks)

@@ -134,8 +137,7 @@ public static void getDestinationFieldMappings(final Client client,
if (destinationMapping != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs an additional else if to check for dynamic, would be good to do a logger.info for this case

@benwtrent
Copy link
Member Author

@hendrikmuhs sorry for the miscommunication. Anything that is the else block, I consider as "default behavior". Basically, "if none of the required conditions met, then do this". So, by this definition, double was indeed the default field for anything where the conditions of "discovering the field mapping in the source" and "does not have a valid default that we know of" are not met.

Copy link
Contributor

@hendrikmuhs hendrikmuhs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@benwtrent benwtrent merged commit f7cd046 into elastic:master Apr 4, 2019
@benwtrent benwtrent deleted the feature/data-frame-allow-dynamic-mapped-types-add-scripted-metric branch April 4, 2019 12:44
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Apr 4, 2019
…d_metric agg (elastic#40750)

* [ML] Allowing destination index mappings to have dynamic types, adds script_metric agg

* Making dynamic|source mapping explicit
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 4, 2019
* elastic/master: (25 commits)
  Rollup ignores time_zone on date histogram (elastic#40844)
  HLRC: fix uri encode bug when url path starts with '/' (elastic#34436)
  Mutes GatewayIndexStateIT.testRecoverBrokenIndexMetadata
  Docs: Pin two IDs in the rest client (elastic#40785)
  Adds version 6.7.2
  [DOCS] Remind users to include @ symbol when applying license file (elastic#40688)
  HLRC: Convert xpack methods to client side objects (elastic#40705)
  Allow ILM to stop if indices have nonexistent policies (elastic#40820)
  Add an OpenID Connect authentication realm (elastic#40674)
  [DOCS] Rewrite query def (elastic#40746)
  [ML] Changes default destination index field mapping and adds scripted_metric agg (elastic#40750)
  Docs: Drop inline callouts from painless book (elastic#40805)
  [ML] refactoring start task a bit, removing unused code (elastic#40798)
  [DOCS] Document index.load_fixed_bitset_filters_eagerly (elastic#40780)
  Make remote cluster resolution stricter (elastic#40419)
  [ML] Add created_by info to usage stats (elastic#40518)
  SQL: [Docs] Small fixes for CURRENT_TIMESTAMP docs (elastic#40792)
  convert modules to use testclusters (elastic#40804)
  Replace javax activation with jakarta activation (elastic#40247)
  Document restrictions on fuzzy matching when using synonyms (elastic#40783)
  ...
benwtrent added a commit that referenced this pull request Apr 5, 2019
…d_metric agg (#40750) (#40846)

* [ML] Allowing destination index mappings to have dynamic types, adds script_metric agg

* Making dynamic|source mapping explicit
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
…d_metric agg (elastic#40750)

* [ML] Allowing destination index mappings to have dynamic types, adds script_metric agg

* Making dynamic|source mapping explicit
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants