Discovery jobs filtering behaviour not working as expected #821

thepalbi · 2023-02-22T14:40:29Z

Description

Discovery jobs have complex behaviour in how they fetch resources, fetch metrics, and cross-match the two to apply the corresponding tags to each. The behaviour can be summarized in the following steps:

Steps 1, 1.5, 2 and 3 work with resources, and step 4 fetches all metrics with all their possible dimension combinations. Step 5 is crucial: it combines the discovered resources with the metrics, associating each metric stream found with the resource it "best" describes. This is important because the resource carries the tags, and if the mapping goes wrong, the metric ends up with tags and an ARN that belong to a different resource.

Right now, when scraping metrics for services that contain several resource types, for example GlobalAccelerator, the mapping is not working as expected. See the test cases in the following link for an example.

The expected/got results of the tests are listed below.

So far, this problem only seems to apply to services that have the following characteristics:

- There's a metric that could match more than one resource. For example, in GlobalAccelerator, the ProcessedBytesOut metric can have the following dimension sets: (Accelerator), (Accelerator, Listener), (Accelerator, Listener, EndpointGroup). These can be thought of as the same metric at different granularities.
- The ARN schema the service uses has a common prefix. For the example above, the ARN for each resource type is:
  - Accelerator: arn:aws:globalaccelerator::012345678901:accelerator/super-accelerator
  - Listener: arn:aws:globalaccelerator::012345678901:accelerator/super-accelerator/listener/some_listener
  - EndpointGroup: arn:aws:globalaccelerator::012345678901:accelerator/super-accelerator/listener/some_listener/endpoint-group/eg1

The root cause is that the matching algorithm looks at only one dimension at a time, while it should consider all of a metric's dimensions together.
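To make the difference concrete, here is a minimal sketch in Python. It is not the exporter's actual code: `naive_match`, `match_on_all_dimensions`, `dimensions_of` and the regular expressions are hypothetical names and patterns made up for illustration, using the GlobalAccelerator ARNs from the description.

```python
import re

ARNS = [
    "arn:aws:globalaccelerator::012345678901:accelerator/super-accelerator",
    "arn:aws:globalaccelerator::012345678901:accelerator/super-accelerator/listener/some_listener",
    "arn:aws:globalaccelerator::012345678901:accelerator/super-accelerator/listener/some_listener/endpoint-group/eg1",
]

# Hypothetical patterns, one per resource type. Each named group stands for
# a CloudWatch dimension that can be read back from the ARN.
ARN_PATTERNS = [
    re.compile(r"accelerator/(?P<Accelerator>[^/]+)$"),
    re.compile(r"accelerator/(?P<Accelerator>[^/]+)/listener/(?P<Listener>[^/]+)$"),
    re.compile(
        r"accelerator/(?P<Accelerator>[^/]+)/listener/(?P<Listener>[^/]+)"
        r"/endpoint-group/(?P<EndpointGroup>[^/]+)$"
    ),
]


def naive_match(metric_dims, arns):
    """Look at one dimension value at a time; the first ARN containing it wins."""
    for value in metric_dims.values():
        for arn in arns:
            if value in arn:
                return arn
    return None


def dimensions_of(arn):
    """Return the full dimension set encoded in an ARN, or None if unknown."""
    for pattern in ARN_PATTERNS:
        match = pattern.search(arn)
        if match:
            return match.groupdict()
    return None


def match_on_all_dimensions(metric_dims, arns):
    """Pick the resource whose dimension set equals the metric's exactly."""
    for arn in arns:
        if dimensions_of(arn) == metric_dims:
            return arn
    return None


listener_metric = {"Accelerator": "super-accelerator", "Listener": "some_listener"}
# Because every ARN shares the accelerator prefix, the naive version stops at
# the accelerator ARN; the full-set version picks the listener ARN.
print(naive_match(listener_metric, ARNS))
print(match_on_all_dimensions(listener_metric, ARNS))
```

With the naive version the result also depends on the order in which resources are listed, which is consistent with metrics picking up labels from an arbitrary resource.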

Work

- Add tests and refactor: Refactor CW metrics to resource association logic and add tests #831
- Fix

Related issues

[BUG] ecs-svc discovery includes all services in a cluster, and metric labels come from an arbitrary service #627


thepalbi · 2023-02-28T10:45:03Z

Tried running the exporter and confirmed the error occurs: some metrics are associated with the wrong resource. For example, given the following resources:

- ECS cluster named scorekeep-cluster
- ECS service named scorekeep-service, living in the cluster above

And the following YACE config:

    sts_region: us-east-1
    discovery:
      jobs:
        - type: AWS/ECS
          regions: [us-east-1]
          metrics:
            # cluster wide metrics
            - name: CPUReservation
              period: 5m
              statistics:
                - Average
                - Maximum
            - name: MemoryReservation
              period: 5m
              statistics:
                - Average
                - Maximum
            # cluster/service metrics
            - name: CPUUtilization
              period: 5m
              statistics:
                - Average
                - Maximum
            - name: MemoryUtilization
              period: 5m
              statistics:
                - Average
                - Maximum

According to the docs, the CPUReservation and MemoryReservation metrics apply only to the ClusterName dimension; hence, they should be associated with the ecs:cluster resource.

If we look at the metrics scraped with the configuration above:

We can see all reservation metrics are associated with the ECS service ARN, instead of the ECS cluster ARN.
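The expected association for this setup can be written down as a small lookup keyed by the complete dimension set. This is only a sketch of the expectation, not exporter code: the account ID in the ARNs is a placeholder and `expected_resource` is a hypothetical helper.

```python
# Placeholder account ID; cluster and service names are the ones used above.
CLUSTER_ARN = "arn:aws:ecs:us-east-1:012345678901:cluster/scorekeep-cluster"
SERVICE_ARN = (
    "arn:aws:ecs:us-east-1:012345678901:service/scorekeep-cluster/scorekeep-service"
)

# Each resource is indexed by the full set of dimensions that identifies it.
RESOURCES_BY_DIMENSIONS = {
    frozenset({("ClusterName", "scorekeep-cluster")}): CLUSTER_ARN,
    frozenset(
        {("ClusterName", "scorekeep-cluster"), ("ServiceName", "scorekeep-service")}
    ): SERVICE_ARN,
}


def expected_resource(dimensions):
    """Return the ARN whose dimension set matches the metric's exactly."""
    return RESOURCES_BY_DIMENSIONS.get(frozenset(dimensions.items()))


# CPUReservation is reported with ClusterName only, so it belongs to the cluster.
reservation_dims = {"ClusterName": "scorekeep-cluster"}
# CPUUtilization at service granularity carries both dimensions.
utilization_dims = {
    "ClusterName": "scorekeep-cluster",
    "ServiceName": "scorekeep-service",
}

print(expected_resource(reservation_dims))
print(expected_resource(utilization_dims))
```

Under this expectation a reservation metric never resolves to the service ARN, because the service's dimension set includes ServiceName and the metric's does not.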

This was referenced Feb 28, 2023

Refactor CW metrics to resource association logic and add tests #831

Merged

Fix resource association on discovery mode #833

Closed

cristiangreco closed this as completed in #831 Feb 28, 2023

cristiangreco reopened this Feb 28, 2023

thepalbi mentioned this issue Mar 30, 2023

[BUG] ecs-svc discovery includes all services in a cluster, and metric labels come from an arbitrary service #627

Closed

1 task

cristiangreco closed this as completed May 15, 2024
