Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] adding pivot.max_search_page_size option for setting paging size #41920

Merged

Conversation

benwtrent
Copy link
Member

This adds a size parameter to the pivot configuration in a data frame transform.

I put the parameter inside the pivot configuration and not in the root object, as it relates directly to how we are pivoting the data. Since we are unsure what the next transform looks like, I thought it best to restricted it instead of making it a transform field.

If it is null we still opt to use the internal default of 500.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@benwtrent benwtrent force-pushed the feature/ml-df-add-max-page-size-option branch from 8cbb8e7 to e9944c4 Compare May 8, 2019 00:04
Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. A few points to discuss.

this.groups = ExceptionsHelper.requireNonNull(groups, DataFrameField.GROUP_BY.getPreferredName());
this.aggregationConfig = ExceptionsHelper.requireNonNull(aggregationConfig, DataFrameField.AGGREGATIONS.getPreferredName());
this.size = size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could require a non-null value for this. Then from the REST action we set it to the default value if the user did not specify it. This way the default is not hidden and the user becomes aware of the parameter. We tend to follow this approach unless the default value is dynamically calculated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the current approach as I would not promote this setting like query. My opinion: should be an expert setting.

@@ -27,6 +27,7 @@
public static final ParseField SOURCE = new ParseField("source");
public static final ParseField DESTINATION = new ParseField("dest");
public static final ParseField FORCE = new ParseField("force");
public static final ParseField SIZE = new ParseField("size");
Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou May 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know size is what the search expects and in the context of a search it is a size. However, in datafeeds and now data frame transforms, this is better described as the scroll_size or page_size. I'm not feeling too strongly about this but it's worth considering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, rollup uses page_size, I find size to unspecific

@@ -29,6 +30,7 @@
private static final String NAME = "data_frame_transform_pivot";
private final GroupConfig groups;
private final AggregationConfig aggregationConfig;
private final Integer size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On having size in pivot instead of the transform object itself:

I think the decision on this depends on whether we imagine that a transform will always use a composite aggregation. If yes, then size should be on the top level.

@hendrikmuhs
Copy link
Contributor

hendrikmuhs commented May 8, 2019

Looks good, we should decide on the naming. For me this is an expert setting, so I would not promote it.

API level I find it confusing from the user perspective: the usage of composite aggs is an implementation detail and a parameter "size" isn't descriptive enough.

It would be better to give it a more descriptive, explicit name: max_search_page_size (I like the very descriptive names in CCR: https://www.elastic.co/guide/en/elasticsearch/reference/current/ccr-put-follow.html). Long names make usage harder, but this is an expert setting.

About placement: If it's just 1 setting it's ok to put it into pivot, however I already see that we will have more settings in future, therefore I think it will get confusing.

My proposal, a top-level parameters section together with very descriptive names:

{
  "source": {...},
  "dest": {...},
  "pivot": {...},
  "parameters": {
      "max_search_page_size": 1000,
      ...
   }
}

The parameters might not apply to the function used in the transform, but I think that's simpler than having a parameters objects in every function. If a parameter does not make sense in the context of the function I would just ignore that, as said this is expert level.

@benwtrent benwtrent changed the title [ML] adding pivot.size option for setting paging size [ML] adding pivot.max_search_page_size option for setting paging size May 8, 2019
@benwtrent
Copy link
Member Author

run elasticsearch-ci/1

@benwtrent benwtrent requested review from hendrikmuhs and dimitris-athanasiou and removed request for hendrikmuhs May 9, 2019 16:13
@benwtrent
Copy link
Member Author

run elasticsearch-ci/1

1 similar comment
@hendrikmuhs
Copy link
Contributor

run elasticsearch-ci/1

Copy link
Contributor

@hendrikmuhs hendrikmuhs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@benwtrent benwtrent merged commit da4899f into elastic:master May 10, 2019
@benwtrent benwtrent deleted the feature/ml-df-add-max-page-size-option branch May 10, 2019 14:31
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request May 10, 2019
…elastic#41920)

* [ML] adding pivot.size option for setting paging size

* Changing field name to address PR comments

* fixing ctor usage

* adjust hlrc for field name change
benwtrent added a commit that referenced this pull request May 10, 2019
…#41920) (#42079)

* [ML] adding pivot.size option for setting paging size

* Changing field name to address PR comments

* fixing ctor usage

* adjust hlrc for field name change
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request May 10, 2019
* elastic/master: (84 commits)
  [ML] adds geo_centroid aggregation support to data frames (elastic#42088)
  Add documentation for calendar/fixed intervals (elastic#41919)
  Remove global checkpoint assertion in peer recovery (elastic#41987)
  Don't create tempdir for cli scripts (elastic#41913)
  Fix debian-8 update (elastic#42056)
  Cleanup plugin bin directories (elastic#41907)
  Prevent order being lost for _nodes API filters (elastic#42045)
  Change IndexAnalyzers default analyzer access (elastic#42011)
  Remove reference to fs.data.spins in docs
  Mute failing AsyncTwoPhaseIndexerTests
  Remove close method in PageCacheRecycler/Recycler (elastic#41917)
  [ML] adding pivot.max_search_page_size option for setting paging size (elastic#41920)
  Docs: Tweak list formatting
  Simplify handling of keyword field normalizers (elastic#42002)
  [ML] properly nesting objects in document source (elastic#41901)
  Remove extra `ms` from log message (elastic#42068)
  Increase the sample space for random inner hits name generator (elastic#42057)
  Recognise direct buffers in heap size docs (elastic#42070)
  shouldRollGeneration should execute under read lock (elastic#41696)
  Wait for active shard after close in mixed cluster (elastic#42029)
  ...
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
…elastic#41920)

* [ML] adding pivot.size option for setting paging size

* Changing field name to address PR comments

* fixing ctor usage

* adjust hlrc for field name change
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants