Enhance telemetry data collection for stack monitoring #73141

Closed
7 tasks
ravikesarwani opened this issue Jul 23, 2020 · 15 comments

@ravikesarwani
Contributor

Enhance telemetry data collection for stack monitoring feature set.

Collect additional telemetry data for stack monitoring focused around the usage of features by our customers.
Telemetry data should help us answer the following kinds of questions:

  • Is stack monitoring enabled (we may already have this)
  • What license type is being used (we may already have this)
  • What collection method is being used: local exporter, HTTP exporter, or Metricbeat (we may have exporter information but not Metricbeat)
  • What components are being monitored, and which versions: ES, Kibana, Logstash, Beats
  • Count of page views by customers in the stack monitoring UI (ES overview, Node listing, Indices, ...)
  • What features users tried: Setup UI, ...
  • Whether out-of-the-box alerts are deployed (applicable for Kibana 7.9+)

The answer to each of these questions helps us gather data points that, in turn, help us focus our efforts on the most critical areas of the feature.

cc: @chrisronline @igoristic @sgrodzicki @jasonrhodes

@elasticmachine
Contributor

Pinging @elastic/stack-monitoring (Team:Monitoring)

@igoristic
Contributor

igoristic commented Jul 27, 2020

@ravikesarwani
Thank you for starting this thread! I have always wondered why we don't have this yet, and what kind of strategy we should implement. The checkboxes make it very clear, though, and make it easy to start a discussion around.

I was wondering if this is something we should start investigating for 7.10 (or perhaps even earlier)?

@chrisronline
I actually don't see any telemetry "reporting" logic anywhere in our code base (we only use it for getting Kibana stats), or maybe I'm missing something?

@chrisronline
Contributor

We don't have any telemetry right now, so we'd need to add a new collector and all that jazz

@ravikesarwani
Contributor Author

Our telemetry server shows information about stack monitoring => https://kibana.telemetry.elastic.co/

@ravikesarwani
Contributor Author

[Screenshot taken 2020-07-27 at 2:56:25 PM]

@chrisronline
Contributor

Ah cool, that must be coming from Elasticsearch telemetry. I don't think we are reporting any telemetry from Kibana for stack monitoring though.

@ravikesarwani
Contributor Author

@igoristic For 7.10 my thinking was we need to continue our focus on out of box alerting work "Add more alerts and switching from Watcher to Kibana alerts" to make this a meaningful feature for our customers.

@igoristic
Contributor

igoristic commented Jul 27, 2020

For 7.10 my thinking was we need to continue our focus on out of box alerting

++ I agree

@chrisronline chrisronline added this to the Stack Monitoring UI 7.10 milestone Jul 30, 2020
@chrisronline chrisronline added this to In progress in Stack Monitoring UI Aug 24, 2020
@chrisronline chrisronline self-assigned this Aug 24, 2020
@chrisronline
Contributor

@ravikesarwani I've started looking into this and I'd like to understand more what you are hoping to build at the end of it. What charts do you want to build based on this data? It will help me understand how to properly format it.

@ravikesarwani
Contributor Author

@chrisronline Right now we do not have any data, so this is an attempt to start collecting it and to be able to visualize it.
Things I am looking to visualize from this are:

  • Count/Percent of clusters where stack monitoring is enabled
  • Then be able to group that data by:
    • Various license type
    • Internal vs. metricbeat
    • Components configured (ES, Kibana, Logstash, Beats, APM ...)
  • Then the second layer data is around what features are most used within stack monitoring?
    • Setup UI
    • Out of box alerts deployed
    • Different page views
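The first layer above (percent of clusters with monitoring enabled, grouped by license or by collection method) boils down to a grouped count. A minimal sketch, assuming one telemetry document per cluster with hypothetical fields `monitoringEnabled`, `license`, and `collection`; these are illustrative names, not the schema that was eventually proposed.

```typescript
// Hypothetical shape of one per-cluster telemetry document (illustrative only).
interface ClusterReport {
  clusterUuid: string;
  monitoringEnabled: boolean;
  license: string; // e.g. "basic", "gold", "platinum"
  collection: "internal" | "metricbeat";
}

interface GroupStats {
  total: number;
  enabled: number;
  percentEnabled: number;
}

// Count clusters per group and the percentage with stack monitoring enabled.
function percentEnabledBy(
  reports: ClusterReport[],
  key: (r: ClusterReport) => string,
): Map<string, GroupStats> {
  const groups = new Map<string, GroupStats>();
  for (const r of reports) {
    const k = key(r);
    const g = groups.get(k) ?? { total: 0, enabled: 0, percentEnabled: 0 };
    g.total += 1;
    if (r.monitoringEnabled) g.enabled += 1;
    g.percentEnabled = (100 * g.enabled) / g.total;
    groups.set(k, g);
  }
  return groups;
}

const reports: ClusterReport[] = [
  { clusterUuid: "a", monitoringEnabled: true, license: "basic", collection: "metricbeat" },
  { clusterUuid: "b", monitoringEnabled: false, license: "basic", collection: "internal" },
  { clusterUuid: "c", monitoringEnabled: true, license: "gold", collection: "internal" },
];
const byLicense = percentEnabledBy(reports, (r) => r.license);
// byLicense.get("basic") → { total: 2, enabled: 1, percentEnabled: 50 }
const byCollection = percentEnabledBy(reports, (r) => r.collection);
```

Passing the grouping key as a function lets the same aggregation serve license, collection method, or any other dimension.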

@chrisronline
Contributor

Update here.

It seems that collecting ui metric data is not in good shape, so we might not be able to collect the second layer of data you're requesting.

I think the rest should be fine and I have a WIP PR for this: #75878

My proposed mappings are:

```js
{
  isEnabled: { type: "boolean" },
  license: { type: "keyword" },
  clusterUuid: { type: "keyword" },
  stackProductCount: { type: "long" },
  stackProductMbCount: { type: "long" },
  stackProductMbRatio: { type: "double" },
  elasticsearch: {
    properties: {
      count: { type: "long" },
      mbCount: { type: "long" },
      mbPercentage: { type: "double" },
      versions: { type: "keyword" },
    },
  },
  kibana: {
    properties: {
      count: { type: "long" },
      mbCount: { type: "long" },
      mbPercentage: { type: "double" },
      versions: { type: "keyword" },
    },
  },
  logstash: {
    properties: {
      count: { type: "long" },
      mbCount: { type: "long" },
      mbPercentage: { type: "double" },
      versions: { type: "keyword" },
    },
  },
  beats: {
    properties: {
      count: { type: "long" },
      mbCount: { type: "long" },
      mbPercentage: { type: "double" },
      versions: { type: "keyword" },
    },
  },
  apm: {
    properties: {
      count: { type: "long" },
      mbCount: { type: "long" },
      mbPercentage: { type: "double" },
      versions: { type: "keyword" },
    },
  },
  timestamp: { type: "date" },
}
```

This should enable us to build the proper dashboards/visualizations, but it might be worth exploring it a bit on your end @ravikesarwani to ensure this structure allows you to build what you need. I'd suggest using Kibana dev tools to create an index with the above mappings, index some documents into it, then make sure it works for you.
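To index test documents against the proposed mappings, it helps to have a typed view of one document. A minimal sketch: the types mirror the mapping (`long`/`double` as `number`, `keyword` as `string[]` where multi-valued, `date` as an ISO-8601 string). Reading `mbPercentage` as "share of instances collected by Metricbeat, 0 to 100" is an assumption, and `Instance` / `productStats` are hypothetical helpers, not part of the PR.

```typescript
// Products that get a per-product stats object in the proposed mapping.
type Product = "elasticsearch" | "kibana" | "logstash" | "beats" | "apm";

// One per-product object from the mapping.
interface ProductStats {
  count: number; // long
  mbCount: number; // long
  mbPercentage: number; // double; ASSUMED: % of instances collected by Metricbeat
  versions: string[]; // keyword, multi-valued
}

// Whole document, mirroring the top-level fields of the proposed mapping.
type MonitoringTelemetryDoc = {
  isEnabled: boolean;
  license: string;
  clusterUuid: string;
  stackProductCount: number;
  stackProductMbCount: number;
  stackProductMbRatio: number;
  timestamp: string; // date, serialized as ISO-8601
} & Record<Product, ProductStats>;

// Hypothetical input: one monitored instance of a product.
interface Instance {
  version: string;
  viaMetricbeat: boolean;
}

// Build one per-product stats object from the instances seen for that product.
function productStats(instances: Instance[]): ProductStats {
  const count = instances.length;
  const mbCount = instances.filter((i) => i.viaMetricbeat).length;
  return {
    count,
    mbCount,
    mbPercentage: count === 0 ? 0 : (100 * mbCount) / count,
    // Deduplicate and sort versions for stable keyword values.
    versions: Array.from(new Set(instances.map((i) => i.version))).sort(),
  };
}

const esStats = productStats([
  { version: "7.9.0", viaMetricbeat: true },
  { version: "7.8.1", viaMetricbeat: false },
  { version: "7.9.0", viaMetricbeat: false },
  { version: "7.9.0", viaMetricbeat: false },
]);
// esStats → { count: 4, mbCount: 1, mbPercentage: 25, versions: ["7.8.1", "7.9.0"] }
```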

@chrisronline
Contributor

Changes:

  • Rename isEnabled to isMonitoringEnabled, meaning we have monitoring data
  • Add an allClusterUuids field listing all clusters detected in the monitoring data
  • Collect a single metricbeatEnabled boolean that is true if we see any monitoring data from Metricbeat
  • Change "Components configured" (ES, Kibana, Logstash, Beats, APM ...) to { elasticsearch: true|false, kibana: true|false, logstash: true|false ... }
  • Add a count of nodes/instances for each product
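The revised fields can all be derived in one pass over the monitoring data. A minimal sketch, assuming a hypothetical `MonitoringDoc` record per observed document; `components` and `instanceCounts` are illustrative names for the per-product booleans and counts, not the field names that shipped.

```typescript
type Product = "elasticsearch" | "kibana" | "logstash" | "beats" | "apm";
const PRODUCTS: Product[] = ["elasticsearch", "kibana", "logstash", "beats", "apm"];

// Hypothetical minimal view of one monitoring document.
interface MonitoringDoc {
  clusterUuid: string;
  product: Product;
  instanceId: string;
  collectedByMetricbeat: boolean;
}

// Revised payload; `components` and `instanceCounts` are illustrative names.
interface RevisedTelemetry {
  isMonitoringEnabled: boolean;
  allClusterUuids: string[];
  metricbeatEnabled: boolean;
  components: Record<Product, boolean>;
  instanceCounts: Record<Product, number>;
}

function summarize(docs: MonitoringDoc[]): RevisedTelemetry {
  const clusters = new Set<string>();
  const instances = new Map<Product, Set<string>>();
  for (const p of PRODUCTS) instances.set(p, new Set<string>());
  let metricbeatEnabled = false;

  for (const d of docs) {
    clusters.add(d.clusterUuid);
    // Key by cluster + instance so identical IDs in different clusters stay distinct.
    instances.get(d.product)!.add(`${d.clusterUuid}:${d.instanceId}`);
    if (d.collectedByMetricbeat) metricbeatEnabled = true;
  }

  const components = {} as Record<Product, boolean>;
  const instanceCounts = {} as Record<Product, number>;
  for (const p of PRODUCTS) {
    const n = instances.get(p)!.size;
    components[p] = n > 0;
    instanceCounts[p] = n;
  }

  return {
    isMonitoringEnabled: docs.length > 0, // "we have the monitoring data"
    allClusterUuids: Array.from(clusters).sort(),
    metricbeatEnabled,
    components,
    instanceCounts,
  };
}

const summary = summarize([
  { clusterUuid: "c1", product: "elasticsearch", instanceId: "n1", collectedByMetricbeat: false },
  { clusterUuid: "c1", product: "elasticsearch", instanceId: "n1", collectedByMetricbeat: false },
  { clusterUuid: "c2", product: "kibana", instanceId: "k1", collectedByMetricbeat: true },
]);
```

Counting distinct instance keys rather than documents keeps repeated samples from the same node from inflating the node/instance counts.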

@ravikesarwani
Contributor Author

Thanks Chris! It was a good discussion.
Another thing we talked about is that we need to look into sending data for a different cluster each day when a monitoring cluster is used to monitor multiple production clusters. Randomly selecting a cluster can miss the data for certain clusters. A round-robin method will at least ensure that, when aggregating over, say, 30 days, we see data for all clusters (as long as there are no more clusters than days in the window).
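A minimal sketch of the round-robin idea, assuming the reporter has the full list of monitored cluster UUIDs each day (for example, something like the proposed allClusterUuids). `clusterForDay` is a hypothetical helper. Sorting gives a stable order, and the UTC day number picks the next cluster in turn; if the set of clusters changes, the rotation shifts, and with more clusters than days in the window some clusters are still missed.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the cluster whose data should be reported on `date`,
// or undefined when no clusters are known.
function clusterForDay(clusterUuids: string[], date: Date): string | undefined {
  const n = clusterUuids.length;
  if (n === 0) return undefined;
  // Sort so the rotation order does not depend on query result order.
  const sorted = clusterUuids.slice().sort();
  // Whole days since the Unix epoch (UTC); constant for the whole day.
  const day = Math.floor(date.getTime() / MS_PER_DAY);
  return sorted[((day % n) + n) % n];
}

// Over a 30-day window with three clusters, each one is reported 10 times.
const seen = new Map<string, number>();
const start = Date.UTC(2020, 8, 1);
for (let i = 0; i < 30; i++) {
  const uuid = clusterForDay(["b", "a", "c"], new Date(start + i * MS_PER_DAY))!;
  seen.set(uuid, (seen.get(uuid) ?? 0) + 1);
}
```

Unlike random sampling, consecutive days never pick the same cluster twice while there is more than one, so coverage over the window is guaranteed rather than probable.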

@TinaHeiligers
Contributor

Then the second layer data is around what features are most used within stack monitoring?
Setup UI
Out of box alerts deployed
Different page views

@chrisronline, the Kibana Telemetry team has been looking into improving the usefulness of ui_metric, but tracking "Setup UI" counts and "Out of the box alerts deployed" could perhaps be done following a (possibly modified) implementation as in AppSearch.
We're also extending Application Usage to monitor different page views, targeted for 7.10 but will likely be pushed to 7.11.
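Whatever mechanism is chosen, tracking "Setup UI" usage and deployed alerts comes down to a per-event counter that the telemetry collector reads when it reports. A minimal in-memory sketch; the event names and `MonitoringUsageCounter` class are hypothetical, and this is not the AppSearch implementation. Counts here are lost on restart, so a real version would need to persist them (for example, in a saved object).

```typescript
// Hypothetical event names; not existing Kibana identifiers.
type MonitoringEvent = "setup_ui_opened" | "alert_deployed";
const EVENTS: MonitoringEvent[] = ["setup_ui_opened", "alert_deployed"];

class MonitoringUsageCounter {
  private readonly counts = new Map<MonitoringEvent, number>();

  increment(event: MonitoringEvent, by = 1): void {
    this.counts.set(event, (this.counts.get(event) ?? 0) + by);
  }

  // Report every known event, including ones that never fired (as 0),
  // so downstream visualizations see a consistent set of fields.
  snapshot(): Record<MonitoringEvent, number> {
    const out = {} as Record<MonitoringEvent, number>;
    for (const e of EVENTS) out[e] = this.counts.get(e) ?? 0;
    return out;
  }
}

const counter = new MonitoringUsageCounter();
counter.increment("setup_ui_opened");
counter.increment("setup_ui_opened");
const snap = counter.snapshot(); // { setup_ui_opened: 2, alert_deployed: 0 }
```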

@chrisronline
Contributor

With #75878 merged, we just need some transformations to happen on the telemetry side (https://github.com/elastic/infra/issues/23272) and we'll be able to see all of this data. Closing this out.
