Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rollup index patterns with live data #21187

Merged
merged 5 commits into from
Jul 27, 2018

Conversation

jen-huang
Copy link
Contributor

@jen-huang jen-huang commented Jul 25, 2018

Part of #20004

Rollup index pattern creation changes:

  • Allow wildcards so that an index pattern can have live indices and rollup index
    • Remove getIndexPatternCreationQuery() from registry, since we no longer need to filter list of indices in wizard
    • Remove getAllowWildcards() from registry, since we now allow wildcards
    • Remove illegalCharacters() from registry, this was only used to exclude * for wildcards
    • Add getFetchForWildcardOptions () to registry to allow additional information to be passed to /api/index_patterns/rollup/_fields_for_wildcard endpoint
    • Remove one matching index restriction, replace with one rollup index restriction, since that is a limitation of ES rollup search API
  • Use new ES rollup capabilities API, added as rollup.capabilitiesByRollupIndex to JS lib
    • Removed old capabilities mapping helpers, added mergeJobConfigurations() and areJobsCompatible() helpers in case a rollup index has more than one job configured
      • Jobs with differing date histogram configurations cannot be merged
      • Other aggregations and buckets will be merged together
      • To do: use real LCM (lowest common multiple) algorithm for merging intervals
  • Structure of capabilities information stored on typeMeta of a rollup index pattern has been changed to be friendlier for Kibana usage (keyed by agg then fields), it will now look something like like:
{
    ...
    type: 'rollup',
    typeMeta: {
      params: {
        rollup_index: 'logstash_rollup',
      },
      aggs: {
        terms: {
          node: {
            agg: 'terms'
          },
          host: {
            agg: 'terms'
          }
        },
        min: {
          temperature: {
            agg: 'min'
          }
        },
        date_histogram: {
          timestamp: {
            agg: 'date_histogram',
            time_zone: 'UTC',
            interval: '1h',
            delay: '7d'
          }
        },
        histogram: {
          voltage: {
            agg: 'histogram',
            interval: 20
          }
        }
      }
    }
}

Disable rollup index patterns from:

  • Timelion index suggestions
  • ML
  • Graph

Discover

  • (Patch from @cjcenizal) Add setPreferredSearchStrategyId() to search source registry to force Discover to use _msearch
  • While this will return documents for a rollup index pattern that includes live data, it does not return rollup data due to _msearch/_search not being able to handle range queries on rollup date histogram field
  • To do: Figure out how to return rollup documents, possibly from ES side by removing size: 0 limitation in _rollup_search? Or revisit the necessity of supporting rollup index patterns in Discover.

@elasticmachine
Copy link
Contributor

💔 Build Failed

@jen-huang
Copy link
Contributor Author

@alexfrancoeur

If you want to play around with creating rollup jobs and index patterns, generate some makelogs data. Then create a rollup job and run it.

The configuration I've been using which rolls up logstash-* data into logstash_rollup:

PUT _xpack/rollup/job/1
{
    "index_pattern": "logstash-*",
    "rollup_index": "logstash_rollup",
    "cron": "*/30 * * * * ?",
    "page_size": 1000,
    "groups" : {
      "date_histogram": {
        "field": "utc_time",
        "interval": "1h",
        "time_zone": "UTC"
      },
      "terms": {
        "fields": ["extension.keyword", "index.keyword"]
      },
      "histogram": {
        "interval": 1000,
        "fields": [
          "bytes"
        ]
      }
    },
    "metrics": [
        {
            "field": "bytes",
            "metrics": ["min", "max", "avg"]
        },
        {
            "field": "memory",
            "metrics": ["avg"]
        },
        {
            "field": "phpmemory",
            "metrics": ["value_count"]
        }
    ]
}

Start job:

POST _xpack/rollup/job/1/_start

Let job run for about a minute, or until you see that it has been triggered and documents have been processed. You can get these stats at this endpoint:

GET _xpack/rollup/job/1

I always stop my jobs after the documents have been processed since I'm not feeding in continuous data:

POST _xpack/rollup/job/1/_stop

- Timelion `index` suggestions
- ML
- Graph
@elasticmachine
Copy link
Contributor

💔 Build Failed

@elasticmachine
Copy link
Contributor

💔 Build Failed

@jen-huang jen-huang requested a review from cjcenizal July 25, 2018 22:26
@jen-huang jen-huang changed the title [WIP] Rollup index patterns with live data Rollup index patterns with live data Jul 25, 2018
Copy link
Contributor

@cjcenizal cjcenizal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 Looks awesome! Just had a few suggestions, mostly about readability.

$scope.editingId = $route.current.params.indexPatternId;
$scope.$watch('defaultIndex', () => renderList());
config.bindToScope($scope, 'defaultIndex');
$scope.$apply();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, what motivated these changes? And why call $scope.$apply()? I think it's generally considered a bad practice because it can cause a $digest already in progress error. The safer way would be to wrap some code inside of $evalAsync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a change on top of a previous change that converted the sidebar to react and I ran into some issues with rendering the react component. I agree $scope.$apply() should be avoided, will take a closer look at this next pass with my work for scripted fields


illegalCharacters = (characters = []) => {
return ['*'].concat(characters);
isRollupIndex = (indexName) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is pretty simple and straightforward, but because it's so far removed from the context in which it's consumed, maybe some unit tests would be useful for establishing intention and expected use cases? I'm mostly thinking for the getIndexPatternCreationOption, isRollupIndex, getIndexTags, and checkIndicesForErrors methods. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 will add tests next pass too

const jobs = indexPattern.typeMeta && indexPattern.typeMeta.jobs;
const capabilities = jobs && indexPattern.typeMeta.capabilities && indexPattern.typeMeta.capabilities[jobs[0]].fields[field];
const allAggs = indexPattern.typeMeta && indexPattern.typeMeta.aggs;
const fieldAggs = allAggs && Object.keys(allAggs).filter(agg => allAggs[agg][field]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would 0 or an empty string be valid values for allAggs[agg][field] here? In which case we'd need to check against null/undefined instead of falsiness.

Copy link
Contributor Author

@jen-huang jen-huang Jul 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, allAggs[agg][field] should be an array - will add an array check here

// date histogram field.
if(
!allAggs[aggName] ||
(!allAggs[aggName][fieldName] && aggName !== 'date_histogram')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about extracting these into variables to make the code more self-descriptive?

        const aggDoesntExist = !allAggs[aggName];
        const fieldDoesntExist = !allAggs[aggName][fieldName];
        const aggIsntDateHistogram = aggName !== 'date_histogram';
        if(
          aggDoesntExist ||
          (fieldDoesntExist && aggIsntDateHistogram)
        ) {
          allAggs[aggName] = allAggs[aggName] || {};
          allAggs[aggName][fieldName] = { ...agg };
        }

Copy link
Contributor Author

@jen-huang jen-huang Jul 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if allAggs[aggName] is undefined, const fieldDoesntExist = !allAggs[aggName][fieldName]; will throw an error. will change this to const fieldDoesntExist = allAggs[aggName] && !allAggs[aggName][fieldName];

// For date histograms, if it is on the same field, check that the configuration is identical,
// otherwise reject. If not the same field, reject;
case 'date_histogram':
if(!allAggs[aggName][fieldName] || !isEqual(allAggs[aggName][fieldName], agg)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can reuse one of the variables here:

              const isDifferentDateHistogram = !isEqual(allAggs[aggName][fieldName], agg);
              if(fieldDoesntExist || isDifferentDateHistogram) {
                throw new Error('Multiple date histograms configured');
              }

} catch(err) {
if (isEsError(err)) {
return reply(wrapEsError(err));
switch(agg) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check my logic on this, but it looks like we're duplicating some code here? Also, maybe we could add a comment on why we're caching the filtered fields?

          const regularFields = fields.filter(field => !rollupFieldNames.includes(field));

          // Cache this for later for some reason?
          rollupFieldNames.push(...regularFields);

          switch(agg) {
            case 'date_histogram':
              const fieldsWithRollupCapabilities = map(field => {
                return {
                  ...fieldsFromFieldCapsApi[`${field}.${agg}.timestamp`],
                  ...defaultField,
                  name: field,
                };
              });

              rollupFields.push(...fieldsWithRollupCapabilities);
              break;
            default:
              const fieldsWithRollupCapabilities = map(field => {
                return {
                  ...fieldsFromFieldCapsApi[`${field}.${agg}.value`],
                  ...defaultField,
                  name: field,
                };
              });

              rollupFields.push(...fieldsWithRollupCapabilities);
              break;
          }

Copy link
Contributor Author

@jen-huang jen-huang Jul 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I de-duped the code here, got rid of the switch altogether. the only thing that actually needed to be different is ${field}.${agg}.timestamp vs ${field}.${agg}.value so thanks for catching that.

the rationale behind this code is:

  • We get rollup aggregation capabilities information back in this kind of format. Note that bytes belongs to more than one aggregation, and we don't have any information as what type fields are (numeric, keyword, etc):
{
  logstash_rollup: {
    aggs: {
        avg: { bytes: {agg: "avg"}, memory: {agg: "avg"}},
        min: { bytes: {agg: "min"} }
        ...
    }
  },
  ...
}
  • We get fields information back via regular field capabilities (getFieldCapabilities()) that tells us type, but they are keyed with internal rollup field names:
{
  fields: {
    "bytes.min.value": {
      name: "bytes.min.value",
      type: "number",
      searchable: true,
      aggregatable: true,
      readFromDocValues: true
    },
    ...
  }
}
  • Logic in this endpoint looks through each aggregation, and for each field in the aggregation, pulls the relevant information from field capabilities by using the key ${field}.${agg}.value (or ${field}.${agg}.date_histogram for date histogram aggs)
  • Since a field can have multiple aggregations, if we don't cache and check the rollup fields we have already collected, we end up with duplicate field information

return { name: savedObject.attributes.title };
});
return resp.savedObjects
.filter(o => !o.get('type'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename o to savedObject?

@@ -661,7 +661,7 @@ app.controller('graphuiPlugin', function ($scope, $route, $interval, $http, kbnU
}
}

$scope.indices = $route.current.locals.indexPatterns;
$scope.indices = $route.current.locals.indexPatterns.filter(o => !o.get('type'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename it to indexPattern here?

@@ -48,7 +48,7 @@ module.controller('MlNewJobStepIndexOrSearch',
timefilter.disableTimeRangeSelector(); // remove time picker from top of page
timefilter.disableAutoRefreshSelector(); // remove time picker from top of page

$scope.indexPatterns = getIndexPatterns();
$scope.indexPatterns = getIndexPatterns().filter(o => !o.get('type'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here?

@elasticmachine
Copy link
Contributor

💔 Build Failed

@jen-huang jen-huang merged commit fdc5de7 into elastic:feature/rollups Jul 27, 2018
Copy link
Contributor

@peteharverson peteharverson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment about the filter used for the ml index patterns edit.

@@ -48,7 +48,7 @@ module.controller('MlNewJobStepIndexOrSearch',
timefilter.disableTimeRangeSelector(); // remove time picker from top of page
timefilter.disableAutoRefreshSelector(); // remove time picker from top of page

$scope.indexPatterns = getIndexPatterns();
$scope.indexPatterns = getIndexPatterns().filter(indexPattern => !indexPattern.get('type'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to explicitly filter out index patterns why type==='rollup', rather than just anything where type is set? Are rollups the only ones that curently set type? In the future, if any new types were added they would get filtered out too, which not be the behavior we want. There is a constants file, x-pack/plugins/ml/common/constants/index_patterns.js, which could have a TYPE_ROLLUP const added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peteharverson type is a new field on index patterns, so rollup ones are the only ones that will have a value. I don't foresee any other types in the near future. This way of filtering is to avoid rollup references in OSS code, but I suppose it is different here since ML is x-pack 😃Thanks for the constants file path. I'll think on this a little more, including our policies around referencing x-pack plugins from another x-pack plugin, and may adjust this filtering in my next pass.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to bring this up later, but I would like to reopen the discussion around avoiding referencing rollup index patterns from within OSS Kibana (and from within X-Pack plugins by extension).

I think that it makes sense to avoid referencing X-Pack features from within OSS (e.g. monitoring, reporting), because those things literally cannot exist unless X-Pack is installed. An OSS installation of the stack will never encounter those things. But rollups are different, because they generate artifacts in the form of rollup indices, which can exist within OSS installations. I think it makes sense for OSS Kibana to have awareness of these artifacts and the capability of dealing with them intelligently. So I think this is one very good case for making an exception to the rule.

jen-huang added a commit that referenced this pull request Jul 31, 2018
* Use new rollup capabilities keyed by rollup index name ES api, remove wildcard restriction
* Adjust plugins to work with new typeMeta information on rollup index patterns
* Disable rollup index patterns from:
- Timelion `index` suggestions
- ML
- Graph
* Set Discover to use default search strategy
jen-huang added a commit that referenced this pull request Aug 10, 2018
* Use new rollup capabilities keyed by rollup index name ES api, remove wildcard restriction
* Adjust plugins to work with new typeMeta information on rollup index patterns
* Disable rollup index patterns from:
- Timelion `index` suggestions
- ML
- Graph
* Set Discover to use default search strategy
cjcenizal pushed a commit to cjcenizal/kibana that referenced this pull request Aug 17, 2018
* Use new rollup capabilities keyed by rollup index name ES api, remove wildcard restriction
* Adjust plugins to work with new typeMeta information on rollup index patterns
* Disable rollup index patterns from:
- Timelion `index` suggestions
- ML
- Graph
* Set Discover to use default search strategy
cjcenizal pushed a commit to cjcenizal/kibana that referenced this pull request Aug 24, 2018
* Use new rollup capabilities keyed by rollup index name ES api, remove wildcard restriction
* Adjust plugins to work with new typeMeta information on rollup index patterns
* Disable rollup index patterns from:
- Timelion `index` suggestions
- ML
- Graph
* Set Discover to use default search strategy
cjcenizal pushed a commit that referenced this pull request Sep 4, 2018
* Use new rollup capabilities keyed by rollup index name ES api, remove wildcard restriction
* Adjust plugins to work with new typeMeta information on rollup index patterns
* Disable rollup index patterns from:
- Timelion `index` suggestions
- ML
- Graph
* Set Discover to use default search strategy
jen-huang added a commit that referenced this pull request Sep 7, 2018
* Use new rollup capabilities keyed by rollup index name ES api, remove wildcard restriction
* Adjust plugins to work with new typeMeta information on rollup index patterns
* Disable rollup index patterns from:
- Timelion `index` suggestions
- ML
- Graph
* Set Discover to use default search strategy
cjcenizal pushed a commit that referenced this pull request Sep 18, 2018
* Use new rollup capabilities keyed by rollup index name ES api, remove wildcard restriction
* Adjust plugins to work with new typeMeta information on rollup index patterns
* Disable rollup index patterns from:
- Timelion `index` suggestions
- ML
- Graph
* Set Discover to use default search strategy
cjcenizal pushed a commit that referenced this pull request Sep 19, 2018
* Use new rollup capabilities keyed by rollup index name ES api, remove wildcard restriction
* Adjust plugins to work with new typeMeta information on rollup index patterns
* Disable rollup index patterns from:
- Timelion `index` suggestions
- ML
- Graph
* Set Discover to use default search strategy
cjcenizal pushed a commit that referenced this pull request Sep 28, 2018
* Use new rollup capabilities keyed by rollup index name ES api, remove wildcard restriction
* Adjust plugins to work with new typeMeta information on rollup index patterns
* Disable rollup index patterns from:
- Timelion `index` suggestions
- ML
- Graph
* Set Discover to use default search strategy
cjcenizal pushed a commit that referenced this pull request Oct 2, 2018
* Use new rollup capabilities keyed by rollup index name ES api, remove wildcard restriction
* Adjust plugins to work with new typeMeta information on rollup index patterns
* Disable rollup index patterns from:
- Timelion `index` suggestions
- ML
- Graph
* Set Discover to use default search strategy
cjcenizal pushed a commit that referenced this pull request Oct 4, 2018
* Use new rollup capabilities keyed by rollup index name ES api, remove wildcard restriction
* Adjust plugins to work with new typeMeta information on rollup index patterns
* Disable rollup index patterns from:
- Timelion `index` suggestions
- ML
- Graph
* Set Discover to use default search strategy
jen-huang added a commit that referenced this pull request Oct 17, 2018
* Use new rollup capabilities keyed by rollup index name ES api, remove wildcard restriction
* Adjust plugins to work with new typeMeta information on rollup index patterns
* Disable rollup index patterns from:
- Timelion `index` suggestions
- ML
- Graph
* Set Discover to use default search strategy
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants