Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metricbeat] Add statistic method into configuration for aws module #12370

Closed
kaiyan-sheng opened this issue May 30, 2019 · 21 comments · Fixed by #12840
Closed

[Metricbeat] Add statistic method into configuration for aws module #12370

kaiyan-sheng opened this issue May 30, 2019 · 21 comments · Fixed by #12840
Assignees
Labels
discuss Issue needs further discussion. enhancement Metricbeat Metricbeat Team:Integrations Label for the Integrations team v7.4.0

Comments

@kaiyan-sheng
Copy link
Contributor

Screen Shot 2019-05-30 at 10 05 40 AM

Looking at various statistic method that AWS supports on their dashboards made me start thinking it might be useful to add a config option into aws module for different statistic methods when querying cloudwatch metrics. Currently in aws module, for each metricset, there is a default/hardcoded statistic method. If we make this configurable, users would have more flexibility to query metrics and the aws module itself will behave more similar to Cloudwatch itself.

With statistic method option, the config for aws module will look like:

- module: aws
  period: 300s
  metricsets:
    - ec2
  access_key_id: '${AWS_ACCESS_KEY_ID}'
  secret_access_key: '${AWS_SECRET_ACCESS_KEY}'
  session_token: '${AWS_SESSION_TOKEN}'
  default_region: '${AWS_REGION:us-west-1}'
  regions:
    - us-west-1
  statistic: Average

Two problems here:

  1. Should we allow multiple statistic methods?
- module: aws
  period: 300s
  metricsets:
    - ec2
  statistic: 
    - Average
    - Sum

If we do allow multiple methods, then same metrics will be reported multiple times with different statistic methods applied.

Also we need to consider if the code should treat the config example above same as the config below (how to make Cloudwatch query more efficient that way).

- module: aws
  period: 300s
  metricsets:
    - ec2
  statistic: 
    - Average
- module: aws
  period: 300s
  metricsets:
    - ec2
  statistic: 
    - Sum
  1. By looking at Cloudwatch metrics from different AWS services, I can see some services have the same statistic method applied to all metrics, for example EC2. Some services have different statistic methods applied to their cloudwatch metrics as default, for example in S3,

For NumberOfMessagesSent, it's using Sum by default:
Screen Shot 2019-05-30 at 10 10 50 AM

For ApproximateNumberOfMessagesVisible, it's using Average by default:
Screen Shot 2019-05-30 at 10 10 40 AM

This problem is the main thing that's stopping me from implementing: the ability to specify different statistic methods for different metrics in the same metricset.

@kaiyan-sheng kaiyan-sheng self-assigned this May 30, 2019
@kaiyan-sheng kaiyan-sheng added Team:Integrations Label for the Integrations team discuss Issue needs further discussion. enhancement Metricbeat Metricbeat labels May 30, 2019
@kaiyan-sheng kaiyan-sheng changed the title Add statistic method into configuration for aws module [Metricbeat] Add statistic method into configuration for aws module May 30, 2019
@kaiyan-sheng
Copy link
Contributor Author

kaiyan-sheng commented Jun 5, 2019

After discussing with @exekias , we think it will be good to support both statistic methods and a list of metric names that user wants to collect.

- module: aws
  period: 300s
  metricsets:
    - cloudwatch
  access_key_id: '${AWS_ACCESS_KEY_ID}'
  secret_access_key: '${AWS_SECRET_ACCESS_KEY}'
  session_token: '${AWS_SESSION_TOKEN}'
  default_region: '${AWS_REGION:us-west-1}'
  regions:
    - us-west-1
  statistic_methods: ["Average", "Sum"]
  cloudwatch_metrics:
    - namespace: AWS/EC2
      metric_names: ["CPUUtilization", "DiskReadOps", "DiskWriteOps"]

Please correct me if I'm wrong 😬 I just realized I didn't update this right after our meeting. So my memory is a little fuzzy now 😅

@exekias
Copy link
Contributor

exekias commented Jun 6, 2019

that sounds good to me!

I'm wondering now, if you can provide several Statistic methods, how will the metric names end up looking? Does it make sense to require several satistics for the same metric?

@kaiyan-sheng
Copy link
Contributor Author

@exekias Good point, I didn't think about changing names. I was thinking just add a field name in the event called aws.statistic_method or something. So if you require two statistics for the same metric, you will get two events hmmm

To the question of if it's useful, I think if we support metric_names, then the answer I think is no. Users will have the flexibility to specify different statistic methods on different metrics and that's what it matters. Then the config will look like:

- module: aws
  period: 300s
  metricsets:
    - cloudwatch
  access_key_id: '${AWS_ACCESS_KEY_ID}'
  secret_access_key: '${AWS_SECRET_ACCESS_KEY}'
  session_token: '${AWS_SESSION_TOKEN}'
  default_region: '${AWS_REGION:us-west-1}'
  regions:
    - us-west-1
  cloudwatch_metrics:
    - namespace: AWS/EC2
      metric_names: ["CPUUtilization", "DiskReadOps", "DiskWriteOps"]
      statistic_method: Average
    - namespace: AWS/EC2
      metric_names: ["NetworkIn", "NetworkOut"]
      statistic_method: Sum

Should we only make this available on cloudwatch metricset?

@exekias
Copy link
Contributor

exekias commented Jun 6, 2019

that sounds good to me

@kaiyan-sheng
Copy link
Contributor Author

Yay, I will move this issue to ready then 👍 Thank you for all the great ideas/discussions!

@roncohen
Copy link
Contributor

just checking:

  1. the default is that we pull all the available statistic_methods? I think it should be. It would give users the best out of the box experience and not require them to explicitly say which methods, but still give them the flexibility to do so
  2. isn't is normally the case that you'd name the metrics mymetric.avg instead of adding a label for them?

@roncohen
Copy link
Contributor

will it be possible to supply a list of statistic methods?

cloudwatch_metrics:
    - namespace: AWS/EC2
      metric_names: ["CPUUtilization", "DiskReadOps", "DiskWriteOps"]
      statistic_method: ["Average", "Minimum"]

without it, you'd need a config block for each statistics function, right?

cloudwatch_metrics:
    - namespace: AWS/EC2
      metric_names: ["CPUUtilization", "DiskReadOps", "DiskWriteOps"]
      statistic_method: "Average"
    - namespace: AWS/EC2
      metric_names: ["CPUUtilization", "DiskReadOps", "DiskWriteOps"]
      statistic_method: "Minimum"
    - namespace: AWS/EC2
      metric_names: ["CPUUtilization", "DiskReadOps", "DiskWriteOps"]
      statistic_method: "Maximum"
    - namespace: AWS/EC2
      metric_names: ["CPUUtilization", "DiskReadOps", "DiskWriteOps"]
      statistic_method: "Count"

@kaiyan-sheng
Copy link
Contributor Author

@roncohen Right now we are only hardcoding the statistic method for each metricset. I just find the default one statistic method from AWS Cloudwatch for each metricset because I was worries about how many metrics we will be reporting if default is for all statistic method. But I do agree, if default to be all statistic method, then it will give users the best out of the box experience.

For supporting a list of statistic_methods with mymetric.avg naming convention, I think if we use that, we can support a list of statistic_methods. @exekias Do you know if we have any ECS convention on statistic method naming? I thought I saw some before but I can't find any now.

@exekias
Copy link
Contributor

exekias commented Jun 24, 2019

I guess retrieving all methods by default is ok for cloudwatch metricset users, but really hope we can get opinionated on our modules using it, for 2 main reasons:

  • Disk space resulting from retrieving more data
  • Some methods don't even make sense, or offer little information depending on the metric type.

If we go down this path, I'm also +1 to allowing defining a list of aggregations for statistic_method.

For supporting a list of statistic_methods with mymetric.avg naming convention, I think if we use that, we can support a list of statistic_methods. @exekias Do you know if we have any ECS convention on statistic method naming? I thought I saw some before but I can't find any now.

I don't think this is fully described anywhere, maybe you saw https://www.elastic.co/guide/en/beats/devguide/current/event-conventions.html (let's open a separate PR to add this maybe?).

In any case, I think that's the notation we normally use, so appending .avg, .max, .sum etc sounds good to me.

As a note, this will be a breaking change, but should be ok as cloudwatch metricset is in beta.

@kaiyan-sheng
Copy link
Contributor Author

Oh no... breaking change...! But since cloudwatch is in beta, can we still make the change and merge it into the 7.x release right? Or we have to wait for 8.0?

@exekias
Copy link
Contributor

exekias commented Jun 28, 2019

We try to keep them to a minimum, but do them if needed. It is ok to to this kind of changes to a Beta module. I would say it's a requirement before moving this metricset to GA

@exekias
Copy link
Contributor

exekias commented Jul 5, 2019

Sorry for coming back to this one, but this is also relevant in the context of ongoing Azure monitor design. I really hope we can agree on a common settings layout for both AWS cloudwatch and Azure monitor metricsets. Each one has different naming (ie AWS uses the term statistic, while Azure uses aggregation), I think we should probably respect that but keep the same layout for settings. So far we have these concepts (I'm keeping AWS terminology for this issue):

Namespace - the root for a group of metrics
Metric name - name of the metric within the namespace
Statistic methods - statistic methods to use when retrieving a metric

Our last proposed layout has the drawback of repeating the namespace when we want a different set of statistics depending on the metric, we could do a different grouping to solve that:

- module: aws
  ...
  metricsets:
    - cloudwatch
  cloudwatch_metrics:
    - namespace: AWS/EC2
      metrics:
        - name: ["DiskReadOps", "DiskWriteOps"] # this could accept also a single value
          statistic: ["Average", "Maximum"]
        - name: CPUUtilization
          statistic: Average

We could also decide that namespace is tied to the metricset like this:

- module: aws
  ...
  metricsets:
    - cloudwatch
  cloudwatch:
    namespace: AWS/EC2
    metrics:
      - name: ["DiskReadOps", "DiskWriteOps"] # this could accept also a single value
        statistic: ["Average", "Maximum"]
      - name: CPUUtilization
        statistic: Average

I'm leaning towards this last example, as we plan to use this metricset as an input for lightweight modules, where normally a single namespace is required. If needed, you can always create several instances of the module for different namespaces.

Any opinions? Do you expect several namespaces being needed when monitoring a single cloud service (AWS ELB, Azure API Management, etc)?

In all cases this would return in documents looking like this:

{
   "aws": {
      "cloudwatch": {
         "namespace": "AWS/EC2",
         "metrics": {
            "DiskReadOps.avg": 30.2,
            "DiskReadOps.max": 34,
            "CPUUtilization.avg": 1.2,
         }
      }
   }
}

cc @narph, would love your input here too

@kaiyan-sheng
Copy link
Contributor Author

@exekias The only thing I can think of regarding to "Do you expect several namespaces being needed when monitoring a single cloud service" is: when users monitoring their EC2 instances, they might be interested in both AWS/EC2 namespace as well as the AWS/RDS or AWS/EBS at the same time.

But as you said, since users can always create more instances of the cloudwatch metricset for different namespaces, I think the second option will work.

@narph
Copy link
Contributor

narph commented Jul 8, 2019

@exekias , there is a notable difference between the azure monitor and aws cloudwatch configuration and that is the resources that have to be identified before listing the namespaces.
This is why I went for:

- module: azure
  metricsets:
  - monitor
  enabled: true
  period: 60s
  client_id: ''
  client_secret: ''
  tenant_id: ''
  subscription_id: ''
  refresh_list_interval: 120s
  resources:
  - resource_id: "/subscriptions/xxxxx/resourceGroups/obs-infrastructure/providers/Microsoft.ApiManagement/service/obsapimanagement"
    namespaces:
    - name: "Microsoft.ApiManagement/service"
      metrics:
      - name: "Requests"
        aggregations: ["Maximum"]
        dimensions:
        - name: "Hostname"
          value: "obsapimanagement.azure-api.net"
  - resource_group: "obs-infrastructure"
    resource_type: "Microsoft.Compute/virtualMachines"
    namespaces:
    - name: "Microsoft.Compute/virtualMachines"
      metrics:
      - name: "Percentage CPU"
        aggregations: ["Average", "Minimum", "Maximum"]
    - name: "Microsoft.Compute/virtualMachines"
      metrics:
      - name: ["Disk Read Bytes", "Data Disk Write Operations/Sec", "Network Out"]
        aggregations: ["Average", "Maximum"]
  - resource_query: ""
  ....

AWS does not have that.
The reason I am not leaning at all towards "monitor_metrics" and then repeat the "metrics" naming inside is because I am not following up with the metrics or namespaces but with the actual resource id or option of identifying these resources so I went with "resources", alternatives that I can think about are "monitor_resources".
Resources is a very common notion in azure and I think whatever we decide to name it, it should represent the fact that we are expecting users to configure ways to select resources and under each option the desired metrics.

Are we final on the document format returned?
The example above is different to what we were thinking about in the proposal and https://www.elastic.co/guide/en/beats/devguide/current/event-conventions.html
Ex:

"azure" : {
            "monitor" : {
              "resource" : {
                "name" : "obstestmemleak",
                "location" : "West Europe",
                "type" : "Microsoft.Compute/virtualMachines"
              },
              "namespace" : "Microsoft.Compute/virtualMachines",
              "subscriptionId" : "unique identifier",
              "metrics" : {
                "percentage_cpu.max" : 1.7,
                "percentage_cpu.avg" : 1.34,
                "percentage_cpu.min" : 1.04
              }
            }
          },

The metric name is "Percentage CPU" and the aggregations selected are max, avg, min.
Several changes have been applied to the metric name , replaced " " with "_", lowercase the name. In some cases I am checking for "../sec" and replacing it with "_per_sec".
Should we just keep the original metric naming then and list the aggregations inside?

@exekias
Copy link
Contributor

exekias commented Jul 8, 2019

Thank you both for the answers

@exekias , there is a notable difference between the azure monitor and aws cloudwatch configuration and that is the resources that have to be identified before listing the namespaces.
This is why I went for:

- module: azure
  metricsets:
  - monitor
  enabled: true
  period: 60s
  client_id: ''
  client_secret: ''
  tenant_id: ''
  subscription_id: ''
  refresh_list_interval: 120s
  resources:
  - resource_id: "/subscriptions/xxxxx/resourceGroups/obs-infrastructure/providers/Microsoft.ApiManagement/service/obsapimanagement"
    namespaces:
    - name: "Microsoft.ApiManagement/service"
      metrics:
      - name: "Requests"
        aggregations: ["Maximum"]
        dimensions:
        - name: "Hostname"
          value: "obsapimanagement.azure-api.net"
  - resource_group: "obs-infrastructure"
    resource_type: "Microsoft.Compute/virtualMachines"
    namespaces:
    - name: "Microsoft.Compute/virtualMachines"
      metrics:
      - name: "Percentage CPU"
        aggregations: ["Average", "Minimum", "Maximum"]
    - name: "Microsoft.Compute/virtualMachines"
      metrics:
      - name: ["Disk Read Bytes", "Data Disk Write Operations/Sec", "Network Out"]
        aggregations: ["Average", "Maximum"]
  - resource_query: ""
  ....

AWS does not have that.

From your examples it would seem that for each resource filter you always use the same namespace, which probably makes sense. Perhaps resource filter and namespace should have a 1:1 mapping? That would mean the namespace is defined at the same level where you define the resource filter.

The reason I am not leaning at all towards "monitor_metrics" and then repeat the "metrics" naming inside is because I am not following up with the metrics or namespaces but with the actual resource id or option of identifying these resources so I went with "resources", alternatives that I can think about are "monitor_resources".

Resources is a very common notion in azure and I think whatever we decide to name it, it should represent the fact that we are expecting users to configure ways to select resources and under each option the desired metrics.

+1

Are we final on the document format returned?
The example above is different to what we were thinking about in the proposal and https://www.elastic.co/guide/en/beats/devguide/current/event-conventions.html
Ex:

"azure" : {
            "monitor" : {
              "resource" : {
                "name" : "obstestmemleak",
                "location" : "West Europe",
                "type" : "Microsoft.Compute/virtualMachines"
              },
              "namespace" : "Microsoft.Compute/virtualMachines",
              "subscriptionId" : "unique identifier",
              "metrics" : {
                "percentage_cpu.max" : 1.7,
                "percentage_cpu.avg" : 1.34,
                "percentage_cpu.min" : 1.04
              }
            }
          },

The metric name is "Percentage CPU" and the aggregations selected are max, avg, min.
Several changes have been applied to the metric name , replaced " " with "_", lowercase the name. In some cases I am checking for "../sec" and replacing it with "_per_sec".
Should we just keep the original metric naming then and list the aggregations inside?

Having the aggregation as a suffix or subdocument makes no difference for Elasticsearch, so your document looks good to me! I've updated my example to avoid confusion, as implementation wise, this is easier.

@kaiyan-sheng
Copy link
Contributor Author

kaiyan-sheng commented Jul 8, 2019

@exekias How about we remove the cloudwatch: in config and directly specify the namespace and/or metrics? If there is no metrics specified, then by default it will collect all available metrics from that namespace and all statistic methods.

- module: aws
  period: 300s
  metricsets:
    - cloudwatch
  namespace: AWS/EC2
  metrics:
    - name: ["CPUUtilization", "DiskReadOps", "DiskWriteOps"]
      statistic: ["Average", "Maximum"]
      dimensions:
        - name: InstanceId
          value: i-0686946e22cf9494a
- module: aws
  period: 300s
  metricsets:
    - cloudwatch
  namespace: AWS/RDS
  metrics:
    - statistic: ["Average", "Maximum"]
- module: aws
  period: 300s
  metricsets:
    - cloudwatch
  namespace: AWS/ELB

@exekias
Copy link
Contributor

exekias commented Jul 9, 2019

We have done this kind of namespacing (cloudwatch.*) for metricset setings in the past (check system module for instance, with process settings). That said, there is no standard way defined for this, and the proposed examples look good to me 👍

It seems that in Azure, in some cases, multiple namespaces may be needed. For instance to retrieve custom metrics along with VM metrics in Azure: MicrosoftDocs/azure-docs#28457.

So I would say the safe play is doing this for AWS:

- module: aws
  period: 300s
  metricsets:
    - cloudwatch
  metrics:
    - name: ["CPUUtilization", "DiskWriteOps"]
      namespace: AWS/EC2
      statistic: ["Average", "Maximum"]
    - name: ["DiskReadOps"]
      namespace: AWS/EC2
      statistic: ["Sum"]

And figure out Azure afterwards, as it may be different (to be seen)

@narph
Copy link
Contributor

narph commented Jul 9, 2019

It seems that in Azure, in some cases, multiple namespaces may be needed. For instance to retrieve custom metrics along with VM metrics in Azure: MicrosoftDocs/azure-docs#28457.

Yes, so it seems that users can create and send the custom metrics themselves using the following methods to Azure Monitor.
I chose the second option in the list:

  • Install the Windows Azure Diagnostics (WAD) extension on your Azure VM, virtual machine scale set, classic VM, or classic Cloud Services and send performance counters to Azure Monitor.

and I could already see the added metric namespace to my resource:

{  
   "value":[  
      {  
         "id":"/subscriptions/xxxxxx/resourceGroups/obs-infrastructure/providers/Microsoft.Compute/virtualMachines/obstestmemleak/providers/microsoft.insights/metricNamespaces/Azure.VM.Windows.GuestMetrics",
         "name":"Azure.VM.Windows.GuestMetrics",
         "type":"Microsoft.Insights/metricNamespaces",
         "classification":"Custom",
         "properties":{  
            "metricNamespaceName":"Azure.VM.Windows.GuestMetrics"
         }
      },
      {  
         "id":"/subscriptions/xxxxxxxxx/resourceGroups/obs-infrastructure/providers/Microsoft.Compute/virtualMachines/obstestmemleak/providers/microsoft.insights/metricNamespaces/Microsoft.Compute-virtualMachines",
         "name":"Microsoft.Compute-virtualMachines",
         "type":"Microsoft.Insights/metricNamespaces",
         "classification":"Platform",
         "properties":{  
            "metricNamespaceName":"Microsoft.Compute/virtualMachines"
         }
      }
   ]
}

I was then able to retrieve the metrics specified:

- resource_group: "obs-infrastructure"
  resource_type: "Microsoft.Compute/virtualMachines"
  namespaces:
  - name: "Microsoft.Compute/virtualMachines"
    metrics:
    - name: "Percentage CPU"
       aggregations: ["Average", "Minimum", "Maximum"]
    - name: "Microsoft.Compute/virtualMachines"
      metrics:
      - name: ["Disk Read Bytes", "Data Disk Write Operations/Sec", "Network Out"]
         aggregations: ["Average", "Maximum"]
    - name: "Azure.VM.Windows.GuestMetrics"
      metrics:
      - name: ["Memory\\Committed Bytes", "System\\Processes"]
        aggregations: ["Average", "Maximum"]

So a resource can allow multiple namespaces.

@exekias
Copy link
Contributor

exekias commented Jul 9, 2019

Thank you for sharing @narph!

So it seems clear that there are some differences between Azure and AWS, we should not try to force the same config. That said, I see an opportunity for a more similar layout, from your example:

- resource_group: "obs-infrastructure"
  resource_type: "Microsoft.Compute/virtualMachines"
  metrics:
    - name: "Percentage CPU"
      namespace: "Microsoft.Compute/virtualMachines"
      aggregations: ["Average", "Minimum", "Maximum"]

    - name: ["Disk Read Bytes", "Data Disk Write Operations/Sec", "Network Out"]
      namespace: "Microsoft.Compute/virtualMachines"
      aggregations: ["Average", "Maximum"]

    - name: ["Memory\\Committed Bytes", "System\\Processes"]
      namespace: "Azure.VM.Windows.GuestMetrics"
      aggregations: ["Average", "Maximum"]

WDYT?

@narph
Copy link
Contributor

narph commented Jul 9, 2019

sounds good, looks more compact already

@kaiyan-sheng
Copy link
Contributor Author

Awesome! I will start the implementation on the aws side. @exekias @narph Thanks for all the discussion!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issue needs further discussion. enhancement Metricbeat Metricbeat Team:Integrations Label for the Integrations team v7.4.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants