Discovery jobs filtering behaviour not working as expected #821

Closed
2 tasks
thepalbi opened this issue Feb 22, 2023 · 1 comment · Fixed by #831
Comments

@thepalbi
Contributor

thepalbi commented Feb 22, 2023

Description

Discovery jobs follow a complex process: they fetch resources, fetch metrics, and cross-match the two so that each metric gets the corresponding tags. The behaviour can be summarized in the following steps:

image

Steps 1, 1.5, 2, and 3 work with resources, and step 4 fetches all metrics with all their possible dimension combinations. Step 5 is crucial: it combines the discovered resources with the metrics, associating each metric stream found with the resource it "best" describes. This is important because the resource carries the tags, and if the mapping goes wrong, the metric will end up with tags and an ARN that belong to a different resource.

Right now, when scraping metrics for services that contain several resources, for example GlobalAccelerator, the mapping does not work as expected. See the test cases in the following link for an example.

The expected/got results of the tests are listed below.

image

So far, this problem only seems to apply to services that have the following characteristics:

  • There's a metric that could match more than one resource. For example, in GlobalAccelerator, the ProcessedBytesOut metric can have the following dimension sets: (Accelerator), (Accelerator, Listener), (Accelerator, Listener, EndpointGroup). These can be thought of as the same metric at different granularities.
  • The ARN schema this service uses has a common prefix. For the example above, the ARN for each resource type is:
    • Accelerator: arn:aws:globalaccelerator::012345678901:accelerator/super-accelerator
    • Listener: arn:aws:globalaccelerator::012345678901:accelerator/super-accelerator/listener/some_listener
    • EndpointGroup: arn:aws:globalaccelerator::012345678901:accelerator/super-accelerator/listener/some_listener/endpoint-group/eg1
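To illustrate why the shared prefix is a problem, here is a minimal Python sketch. It is not the exporter's actual code; `ARNS` and `arns_containing` are hypothetical names, and the substring check is only an assumed stand-in for any matching that considers a single dimension value in isolation.

```python
# Example ARNs copied from the list above, keyed by resource type.
ARNS = {
    "Accelerator": "arn:aws:globalaccelerator::012345678901:accelerator/super-accelerator",
    "Listener": "arn:aws:globalaccelerator::012345678901:accelerator/super-accelerator/listener/some_listener",
    "EndpointGroup": "arn:aws:globalaccelerator::012345678901:accelerator/super-accelerator/listener/some_listener/endpoint-group/eg1",
}


def arns_containing(value):
    """Return the resource types whose ARN contains the given dimension value.

    Hypothetical helper: it looks at one dimension value at a time.
    """
    return [kind for kind, arn in ARNS.items() if value in arn]


# The Accelerator value alone cannot tell the three resources apart.
print(arns_containing("super-accelerator"))
# The Listener value still matches both the listener and its endpoint group.
print(arns_containing("some_listener"))
```

Because every child ARN embeds its parent's ARN, a single dimension value is never enough to single out the accelerator or the listener.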

The root cause is that the matching algorithm looks at just one dimension at a time, while it should look at multiple dimensions together.
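One possible way to look at multiple dimensions at once is sketched below in Python. This is an assumption about how such matching could work, not the fix that was actually implemented: the regular expressions in `PATTERNS`, and the helpers `dimensions_of` and `match_resource`, are all hypothetical. Each pattern extracts the full set of dimensions a resource ARN identifies, and a metric is associated only with the resource whose extracted set equals the metric's own dimension set.

```python
import re

# Hypothetical per-resource-type ARN patterns. Each named group is a
# CloudWatch dimension name; the trailing "$" keeps a parent pattern from
# matching a child ARN that merely shares its prefix.
PATTERNS = [
    re.compile(r":accelerator/(?P<Accelerator>[^/]+)$"),
    re.compile(r":accelerator/(?P<Accelerator>[^/]+)/listener/(?P<Listener>[^/]+)$"),
    re.compile(
        r":accelerator/(?P<Accelerator>[^/]+)/listener/(?P<Listener>[^/]+)"
        r"/endpoint-group/(?P<EndpointGroup>[^/]+)$"
    ),
]


def dimensions_of(arn):
    """Return the complete dimension set encoded in an ARN, or None."""
    for pattern in PATTERNS:
        match = pattern.search(arn)
        if match:
            return match.groupdict()
    return None


def match_resource(metric_dimensions, arns):
    """Return the ARN whose full dimension set equals the metric's, or None."""
    for arn in arns:
        if dimensions_of(arn) == metric_dimensions:
            return arn
    return None


BASE = "arn:aws:globalaccelerator::012345678901:accelerator/super-accelerator"
RESOURCES = [
    BASE,
    BASE + "/listener/some_listener",
    BASE + "/listener/some_listener/endpoint-group/eg1",
]

# A metric with the (Accelerator, Listener) dimension set resolves to the
# listener ARN only, not to the accelerator or the endpoint group.
print(match_resource(
    {"Accelerator": "super-accelerator", "Listener": "some_listener"},
    RESOURCES,
))
```

Comparing whole dimension sets means the same metric at different granularities resolves to a different resource each time, which is the behaviour the issue expects.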

Work

Related issues

@thepalbi
Contributor Author

Tried running the exporter and confirmed the error occurs: some metrics are associated with the wrong resource. For example, given the following resources:

  • ECS cluster named scorekeep-cluster
  • ECS service named scorekeep-service, living in the cluster above

And the following YACE config:

    sts_region: us-east-1
    discovery:
      jobs:
        - type: AWS/ECS
          regions: [us-east-1]
          metrics:
            # cluster wide metrics
            - name: CPUReservation
              period: 5m
              statistics:
                - Average
                - Maximum
            - name: MemoryReservation
              period: 5m
              statistics:
                - Average
                - Maximum
            # cluster/service metrics
            - name: CPUUtilization
              period: 5m
              statistics:
                - Average
                - Maximum
            - name: MemoryUtilization
              period: 5m
              statistics:
                - Average
                - Maximum

According to the docs, both the CPUReservation and MemoryReservation metrics apply only to the ClusterName dimension; hence, they should be associated with the ecs:cluster resource.

If we look at the metrics scraped with the configuration above:

image

We can see all reservation metrics are associated with the ECS service ARN, instead of the ECS cluster ARN.
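The expected association for this ECS case can be written down as a small Python sketch. The mapping and the `expected_resource_type` helper are hypothetical and only restate the expectation described above. The `ecs:service` entry for the (ClusterName, ServiceName) dimension set is an assumption, since only `ecs:cluster` is named in this report.

```python
# Hypothetical mapping from a metric's dimension names to the resource type
# it should be associated with.
RESOURCE_TYPE_BY_DIMENSIONS = {
    frozenset({"ClusterName"}): "ecs:cluster",
    # Assumed: service-level metrics carry both dimensions.
    frozenset({"ClusterName", "ServiceName"}): "ecs:service",
}


def expected_resource_type(dimension_names):
    """Return the expected resource type for a set of dimension names, or None."""
    return RESOURCE_TYPE_BY_DIMENSIONS.get(frozenset(dimension_names))


# CPUReservation and MemoryReservation only have the ClusterName dimension,
# so they should resolve to the cluster, not the service.
print(expected_resource_type(["ClusterName"]))
```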
