Service Discovery on AWS API denied using EC2 role #4686

Open
DrHashi opened this Issue Oct 1, 2018 · 11 comments

DrHashi commented Oct 1, 2018

Proposal

Fix the AWS API calls that describe instances when using an EC2 instance profile/role.

Bug Report

What did you expect to see?
Nodes being discovered and populated automatically.

  • System information:
	Linux 4.4.0-1067-aws x86_64
  • Prometheus version:
prometheus, version 2.4.2 (branch: HEAD, revision: c305ffaa092e94e9d2dbbddf8226c4813b1190a0)
  build user:       root@dcde2b74c858
  build date:       20180921-07:22:29
  go version:       go1.10.3
  • Prometheus configuration file:
global:
  scrape_interval: 5s
  evaluation_interval: 5s

scrape_configs:
  - job_name: 'node'
    ec2_sd_configs:
      - role_arn: arn:aws:iam::XXXXXXXXXX:role/prometheus_ec2_readonly
        region: us-east-2
  • EC2 Trust relationship:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
  • EC2 Instance Profile configuration:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ec2:Describe*",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "elasticloadbalancing:Describe*",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:ListMetrics",
                "cloudwatch:GetMetricStatistics",
                "cloudwatch:Describe*"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "autoscaling:Describe*",
            "Resource": "*"
        }
    ]
}
  • Logs:
Oct  1 15:01:13 ip-10-2-254-92 prometheus[12325]: level=error ts=2018-10-01T19:01:13.087549155Z caller=ec2.go:184 component="discovery manager scrape" discovery=ec2 msg="Refresh failed" err="could not describe instances: AccessDenied: Access denied\n\tstatus code: 403, request id: 5da066a2-c5ac-11e8-b787-43656437a7ca"
Oct  1 15:02:13 ip-10-2-254-92 prometheus[12325]: level=error ts=2018-10-01T19:02:13.082507682Z caller=ec2.go:184 component="discovery manager scrape" discovery=ec2 msg="Refresh failed" err="could not describe instances: AccessDenied: Access denied\n\tstatus code: 403, request id: 8163ad26-c5ac-11e8-b787-43656437a7ca"
Oct  1 15:03:13 ip-10-2-254-92 prometheus[12325]: level=error ts=2018-10-01T19:03:13.085338666Z caller=ec2.go:184 component="discovery manager scrape" discovery=ec2 msg="Refresh failed" err="could not describe instances: AccessDenied: Access denied\n\tstatus code: 403, request id: a526f299-c5ac-11e8-b787-43656437a7ca"
Oct  1 15:04:13 ip-10-2-254-92 prometheus[12325]: level=error ts=2018-10-01T19:04:13.084593055Z caller=ec2.go:184 component="discovery manager scrape" discovery=ec2 msg="Refresh failed" err="could not describe instances: AccessDenied: Access denied\n\tstatus code: 403, request id: c8ea5f9e-c5ac-11e8-b787-43656437a7ca"
Oct  1 15:05:13 ip-10-2-254-92 prometheus[12325]: level=error ts=2018-10-01T19:05:13.089946781Z caller=ec2.go:184 component="discovery manager scrape" discovery=ec2 msg="Refresh failed" err="could not describe instances: AccessDenied: Access denied\n\tstatus code: 403, request id: ecada609-c5ac-11e8-b787-43656437a7ca"
Oct  1 15:06:13 ip-10-2-254-92 prometheus[12325]: level=error ts=2018-10-01T19:06:13.088427672Z caller=ec2.go:184 component="discovery manager scrape" discovery=ec2 msg="Refresh failed" err="could not describe instances: AccessDenied: Access denied\n\tstatus code: 403, request id: 1070c529-c5ad-11e8-b787-43656437a7ca"
Oct  1 15:07:13 ip-10-2-254-92 prometheus[12325]: level=error ts=2018-10-01T19:07:13.083561777Z caller=ec2.go:184 component="discovery manager scrape" discovery=ec2 msg="Refresh failed" err="could not describe instances: AccessDenied: Access denied\n\tstatus code: 403, request id: 34340ac4-c5ad-11e8-b787-43656437a7ca"
Oct  1 15:08:13 ip-10-2-254-92 prometheus[12325]: level=error ts=2018-10-01T19:08:13.084569986Z caller=ec2.go:184 component="discovery manager scrape" discovery=ec2 msg="Refresh failed" err="could not describe instances: AccessDenied: Access denied\n\tstatus code: 403, request id: 57f750e3-c5ad-11e8-b787-43656437a7ca"

lostick commented Oct 8, 2018

You might need to add the ec2:DescribeTags action.

DrHashi (Author) commented Oct 16, 2018

> You might need to add ec2:DescribeTags action

I already gave it access as per the following:

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "ec2:Describe*", "Resource": "*" }

which includes ec2:DescribeTags.
The only way I got discovery to work was by giving Prometheus restricted programmatic access using an AWS access key and secret key.
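The claim that `ec2:Describe*` already covers `ec2:DescribeTags` can be checked mechanically. Below is a minimal Python sketch, assuming IAM's wildcard semantics for action names (`*` matches any run of characters, `?` a single character, comparison is case-insensitive). The helper name `action_allowed` is hypothetical, and the check deliberately ignores Deny statements, NotAction, conditions, and resources.

```python
import fnmatch
import json

# The first statement of the instance-profile policy quoted in this thread.
POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "ec2:Describe*", "Resource": "*"}
  ]
}
""")


def action_allowed(policy: dict, action: str) -> bool:
    """Return True if any Allow statement's Action pattern matches `action`.

    Simplified on purpose: Deny, NotAction, Condition and Resource are ignored.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        patterns = stmt.get("Action", [])
        if isinstance(patterns, str):
            patterns = [patterns]
        for pattern in patterns:
            # Action names are case-insensitive, so compare lowercased.
            if fnmatch.fnmatchcase(action.lower(), pattern.lower()):
                return True
    return False


if __name__ == "__main__":
    for name in ("ec2:DescribeTags", "ec2:DescribeInstances", "sts:AssumeRole"):
        print(name, action_allowed(POLICY, name))
```

In this sketch both EC2 actions match the wildcard and `sts:AssumeRole` does not, which is consistent with the observation that adding `ec2:DescribeTags` separately changes nothing.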

anshulshrivastava commented Nov 15, 2018

Same issue for me.
I created an IAM role (ROLE-1) with EC2 read-only access, then attached a policy to the Prometheus server's IAM role (ROLE-2) allowing it to assume ROLE-1, and passed ROLE-2 into the prometheus.yml file.

prometheus.yml file:

global:
  scrape_interval:     15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
  - job_name: node
    scrape_interval: 50s
    scrape_timeout: 15s
    ec2_sd_configs:
      - region: us-east-1
        role_arn: arn:aws:iam::IAM_ROLE_ARN_HERE
        port: 9100
    relabel_configs:
      - source_labels: [__meta_ec2_tag_Name]
        regex: PRODUCT_CODE.*
        action: keep
      - source_labels: [__meta_ec2_instance_id]
        target_label: instance

Logs from the Prometheus container:

level=info ts=2018-11-15T22:41:15.136244498Z caller=web.go:397 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-11-15T22:41:15.141908958Z caller=main.go:564 msg="TSDB started"
level=info ts=2018-11-15T22:41:15.141950489Z caller=main.go:624 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2018-11-15T22:41:15.142918604Z caller=main.go:650 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2018-11-15T22:41:15.142949318Z caller=main.go:523 msg="Server is ready to receive web requests."
2018/11/15 22:41:15 Request.EachPage deprecated. Use Pagination type for configurable pagination of API operations
level=error ts=2018-11-15T22:41:15.213598245Z caller=ec2.go:170 component="discovery manager scrape" discovery=ec2 msg="Refresh failed" err="could not describe instances: AccessDenied: Access denied\n\tstatus code: 403, request id: 8f4c9fe7-"

I am not sure what I am missing here.
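One thing worth ruling out in a setup like this, offered as an assumption rather than a confirmed diagnosis: when `role_arn` is set, the role named there has to be assumable by the credentials Prometheus already holds, so that role's trust policy must name the caller as a principal. Below is a minimal Python sketch of that check. `trust_allows` and `with_aws_principal` are hypothetical helpers, the account ID and role name are placeholders, and only exact-match `AWS` and `Service` principals are handled.

```python
import copy
import json

# Trust policy shaped like the one in the original report: only the EC2
# service is allowed to assume the role.
EC2_ONLY_TRUST = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "ec2.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }
  ]
}
""")


def _as_list(value):
    """Policy fields may hold a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def trust_allows(trust_policy: dict, kind: str, principal: str) -> bool:
    """True if an Allow statement lets `principal` (kind "AWS" or "Service")
    call sts:AssumeRole. Conditions and wildcard ARNs are not evaluated."""
    for stmt in _as_list(trust_policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(stmt.get("Action")):
            continue
        principals = stmt.get("Principal", {})
        if principals == "*":
            return True
        if isinstance(principals, dict) and principal in _as_list(principals.get(kind)):
            return True
    return False


def with_aws_principal(trust_policy: dict, caller_arn: str) -> dict:
    """Return a copy of the policy with one extra statement trusting caller_arn."""
    updated = copy.deepcopy(trust_policy)
    updated["Statement"] = _as_list(updated.get("Statement")) + [{
        "Effect": "Allow",
        "Principal": {"AWS": caller_arn},
        "Action": "sts:AssumeRole",
    }]
    return updated


if __name__ == "__main__":
    caller = "arn:aws:iam::111111111111:role/ROLE-2"  # placeholder ARN
    print(trust_allows(EC2_ONLY_TRUST, "AWS", caller))
    print(trust_allows(with_aws_principal(EC2_ONLY_TRUST, caller), "AWS", caller))
```

If the check fails for the role Prometheus actually runs as, the AssumeRole call itself would be rejected, regardless of how broad the `ec2:Describe*` permissions on the target role are.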

DasFranck commented Dec 3, 2018

I have the same problem, with the same config.
Any news on this issue?

nizam001 commented Jan 14, 2019

I also faced the same issue, but after some trial and error I found that "role_arn" is the culprit. Use "profile" instead. It worked for me.

ec2_sd_configs:
  - region: 'us-west-2'
    profile: 'arn:aws:iam::XXXXX:instance-profile/my-ec2-role'
    filters:
      - name: tag:Service
        values:
          - abc

ldormoy commented Jan 29, 2019

I face the exact same issue with prometheus in a multi-account AWS structure.

Using "profile" instead of "role_arn" results in prometheus discovering the EC2 instances of the AWS account where it runs. I actually want it to perform cross-account discovery. I also do not want to create extra AWS access keys for prometheus in each account: assumable IAM roles exist for this purpose.

DrHashi (Author) commented Jan 29, 2019

I used profile instead and it worked. However, I still believe the right fix is to be able to give just the role name and have the API called with it directly, instead of having to specify the role_arn or, in this case, the profile ARN.

For example instead of

role_arn: arn:aws:iam::XXXXXXXXXX:role/prometheus_ec2_readonly
or
role_arn: arn:aws:iam::XXXXXXXXXX:profile/prometheus_ec2_readonly

we should be able to refer to the machine role directly and discover the ARN from within the instance:

role: prometheus_ec2_readonly
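Discovering the attached instance profile from inside the instance is possible through the EC2 instance metadata service. The following is a rough Python sketch, not Prometheus code. It assumes the IMDSv1-style endpoint `http://169.254.169.254/latest/meta-data/iam/info` is reachable (IMDSv2 would additionally need a session token) and that the response carries an `InstanceProfileArn` field; the function names and the sample account ID are hypothetical.

```python
import json
import urllib.request

IMDS_IAM_INFO_URL = "http://169.254.169.254/latest/meta-data/iam/info"


def parse_instance_profile(iam_info_body: str) -> dict:
    """Extract the ARN, account ID and name of the attached instance profile
    from the JSON body returned by the iam/info metadata path."""
    info = json.loads(iam_info_body)
    arn = info["InstanceProfileArn"]
    # ARN layout: arn:partition:iam::account-id:instance-profile/[path/]name
    parts = arn.split(":", 5)
    if len(parts) != 6 or not parts[5].startswith("instance-profile/"):
        raise ValueError(f"unexpected instance profile ARN: {arn!r}")
    return {
        "arn": arn,
        "account_id": parts[4],
        "name": parts[5].split("/")[-1],
    }


def fetch_instance_profile(timeout: float = 2.0) -> dict:
    """Query the metadata service; only works when run on an EC2 instance."""
    with urllib.request.urlopen(IMDS_IAM_INFO_URL, timeout=timeout) as resp:
        return parse_instance_profile(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Sample body with a placeholder account ID, so the sketch runs anywhere.
    sample = json.dumps({
        "Code": "Success",
        "InstanceProfileArn": "arn:aws:iam::123456789012:instance-profile/prometheus_ec2_readonly",
    })
    print(parse_instance_profile(sample))
```

Note that the instance profile name and the role name can differ; the role name itself is listed under the `iam/security-credentials/` metadata path.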

simonpasquier (Member) commented Jan 30, 2019

If anyone has a clue of how this can be achieved with the AWS Go SDK, feel free to open a PR. Other than that I'm afraid that none of the Prometheus maintainers is familiar enough with the EC2 SD mechanism to tackle this.

nizam001 commented Feb 22, 2019

Guys, we have actually been confused by the "role_arn" option. Think about it: if that IAM role is attached to the instance itself, why should I pass role_arn explicitly? While investigating the Prometheus code for EC2 service discovery, I saw that the role_arn parameter is being used badly; there is no need to pass it at all. The AWS Go SDK takes care of this: if a role is attached to the instance, it creates temporary credentials via STS assume role. There is also no need to pass the profile parameter; the only options should be API keys and region. As long as someone passes API keys OR the instance has an IAM role attached, that is enough. If an IAM role is attached to the instance, the following config works. Don't pass the profile or role_arn parameter:

  - job_name: ec2-instance-resources
    metrics_path: '/metrics'
    ec2_sd_configs:
      - region: 'us-west-2'
        port: 9100
        filters:
          - name: tag-key
            values:
              - node

grockeek commented Mar 22, 2019

@ldormoy: I was able to achieve this by creating a single access/secret key pair:

  • A "main" account in which you create a user with an access/secret key. The AWS credentials file needs to be deployed on your Prometheus machine and, if necessary, referenced in the systemd service file (through the Environment=AWS_SHARED_CREDENTIALS_FILE=your_file/location directive). You also need an AssumeRole policy attached directly to that user.

  • N accounts, each with a role that has the AmazonEC2ReadOnlyAccess policy attached. You also need to create a trust relationship between each role and the user created in your "main" account.
    You also need the AWS config file created and deployed on your Prometheus machine (its profiles will contain the role ARNs you previously created).

With that in place, you can configure prometheus.yaml this way:

- job_name: AWS - my-region-x - my_exporter
  ec2_sd_configs:
    - region: my-region-x
      profile: my-profile-n
      port: my_exporter_port
  relabel_configs:
    - source_labels: [__meta_ec2_tag_my_exporter_tag]
      regex: true
      action: keep
    # Use the private IP as the instance label
    - source_labels: [__meta_ec2_private_ip]
      target_label: instance

Repeat this block n times (once per role) to scrape exporters in different accounts and/or regions ;)
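The AWS config file mentioned in this setup holds one profile per target account, each pointing at a role ARN and at the credentials used to assume it. Below is a small Python sketch that renders such a file. The profile names, account IDs, and role name are placeholders, `render_aws_config` is a hypothetical helper, and `source_profile = default` assumes the "main" account keys live under `[default]` in the credentials file.

```python
import configparser
import io


def render_aws_config(roles: dict, source_profile: str = "default") -> str:
    """Render an AWS shared config file with one assume-role profile per entry.

    `roles` maps a profile name to the ARN of the role in the target account.
    """
    config = configparser.ConfigParser()
    for profile_name, role_arn in roles.items():
        # Named profiles in the shared config file use "profile <name>" sections.
        config[f"profile {profile_name}"] = {
            "role_arn": role_arn,
            "source_profile": source_profile,
        }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder account IDs and role name, one profile per target account.
    print(render_aws_config({
        "my-profile-1": "arn:aws:iam::111111111111:role/prometheus_ec2_readonly",
        "my-profile-2": "arn:aws:iam::222222222222:role/prometheus_ec2_readonly",
    }))
```

Each generated profile name is what would go into the `profile:` field of the corresponding `ec2_sd_configs` entry.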

DrHashi (Author) commented Mar 22, 2019

@grockeek The point is to NOT use keys or store keys locally, as that is insecure and can easily be exploited, but rather to use the EC2 machine role (AKA role_arn in this case).

The API SHOULD be able to work out which role to use without role_arn, but it does not.
