
Credentials are not retrieved from AWS IMDSv2 when running on EC2 #2840

Closed
SamuelDudley opened this issue Dec 9, 2020 · 49 comments

Labels: AWS (Issues with AWS plugins or experienced by users running on AWS)

Comments

@SamuelDudley commented Dec 9, 2020

Bug Report

Describe the bug

Credentials are not retrieved from AWS Instance Metadata Service v2 (IMDSv2) when running on EC2. This causes plugins that require credentials to fail (e.g.: cloudwatch).

To Reproduce

Steps to reproduce the problem:

  • Create an EC2 instance with metadata version 2 only selected on the Advanced Details section of the Configure Instance step.
    NB: I used the Amazon Linux 2 AMI (HVM), SSD Volume Type - ami-09f765d333a8ebb4b (64-bit x86) in this example.

  • As I will be using the cloudwatch output to demonstrate this issue, I have assigned a very permissive role to the instance.

  • I created and assigned a fully open security group to remove that as a potential issue.

  • Install Fluent Bit as per https://docs.fluentbit.io/manual/installation/linux/amazon-linux

  • Apply the following configuration:

[SERVICE]
    flush        1
    daemon       Off
    log_level    info
    parsers_file parsers.conf
    plugins_file plugins.conf
    http_server  Off
    http_listen  0.0.0.0
    http_port    2020
    storage.metrics on

[INPUT]
    Name                systemd
    Path                /var/log/journal
    Buffer_Chunk_Size   32000
    Buffer_Max_Size     64000

[OUTPUT]
    Name cloudwatch_logs
    Match   *
    region ap-southeast-2
    log_group_name testing
    log_stream_name bazz
    auto_create_group true

  • Restart the service: sudo service td-agent-bit restart
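
To observe the failure after the restart, the service logs can be tailed (assuming the td-agent-bit systemd unit from the install instructions above; exact messages vary by version):

# Follow Fluent Bit's logs and watch for credential/CloudWatch errors
sudo journalctl -u td-agent-bit -f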

Expected behaviour

Expected Fluent Bit to obtain temporary credentials from the instance metadata service and forward the logs to CloudWatch.

Observed behaviour

Fluent Bit fails to obtain credentials, so the CloudWatch log stream is not created and logs are not sent.

Your Environment

  • Version used: Fluent Bit v1.6
  • Configuration: (see above)
  • Environment name and version (e.g. Kubernetes? What version?): N/A
  • Server type and version: AWS EC2 (t2.micro) IMDSv2 enabled and IMDSv1 disabled
  • Operating System and version: Amazon Linux 2 (AMI: ami-09f765d333a8ebb4b)
  • Filters and plugins: cloudwatch (output) systemd (input)

Additional context

Firstly, thank you for this great bit of software 👍

In an AWS environment, disabling IMDSv1 is considered security best practice due to the vulnerabilities that leaving it enabled creates. We would like to follow this recommendation but currently can't because of the issue described above.

I note that the AWS Metadata filter has an option to select between IMDSv1 and v2, and it appears that the code to retrieve the token and pass it in the metadata request header, as required by IMDSv2, is already implemented in the codebase but is not used for obtaining credentials.

NB: The above configuration works fine and without issue when IMDSv1 is enabled on the EC2 instance.
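
For reference, here is a minimal sketch of the IMDSv2 credential flow done by hand with curl (the role name is read from the instance itself; 21600 is the maximum token TTL in seconds):

# 1. Get a session token (IMDSv2 requires this PUT first)
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
# 2. List the IAM role attached to the instance
ROLE=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/)
# 3. Fetch the temporary credentials for that role
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" "http://169.254.169.254/latest/meta-data/iam/security-credentials/$ROLE"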

@zandernelson

This exact issue is affecting our Fluent Bit Kubernetes DaemonSet. We are using IMDSv2 on our EKS nodes and Fluent Bit is unable to communicate with our Elasticsearch cluster. As a result, we have to turn off the AWS_Auth parameter.

This should be a high priority as this is a security risk for many users.

@LukaszRacon

Check if you are affected by the hop limit - increase it to 2:
aws ec2 modify-instance-metadata-options --instance-id i-00000000000 --http-put-response-hop-limit 2

https://aws.amazon.com/about-aws/whats-new/2020/08/amazon-eks-supports-ec2-instance-metadata-service-v2/

IMDSv2 requires a PUT request to initiate a session to the instance metadata service and retrieve a token. By default, the response to PUT requests has a response hop limit (time to live) of 1 at the IP protocol level. However, this limit is incompatible with containerized applications on Kubernetes that run in a separate network namespace from the instance.
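
To check whether an instance is affected before changing anything, the current metadata options can be inspected like this (the instance ID is a placeholder):

aws ec2 describe-instances --instance-ids i-00000000000 --query 'Reservations[].Instances[].MetadataOptions'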

github-actions bot commented Mar 9, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Mar 9, 2021
@SamuelDudley (Author)

Still an issue, nothing to do with the hop limit. The code to handle IMDSv2 simply is not used for obtaining credentials.

@github-actions github-actions bot removed the Stale label Mar 10, 2021
github-actions bot commented Apr 9, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Apr 9, 2021
@SamuelDudley (Author)

Commenting to keep this issue alive, as I can't edit/remove labels.

@github-actions github-actions bot removed the Stale label Apr 10, 2021
@smithdebug

Hi, I'm trying to run Fluent Bit on Windows Server 2016, and the CloudWatch plugin seems unable to authenticate using the instance profile.

@agup006 (Member) commented Apr 18, 2021

Can we try this with 1.7.x and see if it reproduces?

github-actions bot

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label May 19, 2021
github-actions bot

This issue was closed because it has been stalled for 5 days with no activity.

@PettitWesley PettitWesley reopened this Jul 16, 2021
@PettitWesley PettitWesley self-assigned this Jul 16, 2021
@PettitWesley PettitWesley added AWS Issues with AWS plugins or experienced by users running on AWS and removed Stale labels Jul 16, 2021
@PettitWesley (Contributor)

Sorry folks. This is a feature gap which I had meant to address late last year, but it got lost among too many other higher-priority feature requests and bugs.

We will get someone to work on this soon.

github-actions bot

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Aug 16, 2021
@agup006 agup006 removed the Stale label Aug 16, 2021
@shalevutnik

This issue prevents us from using the S3 output plugin. The current workaround of using IMDSv1 is a security risk. Do you have an ETA for resolving it?

@PettitWesley (Contributor)

@shalevutnik Yes, I understand this is very important, but I am stretched very thin lately. Unfortunately I can't promise an exact ETA yet, but I have someone from my team assigned to start work on this soon.

@matthewfala (Contributor)

Hi 👋 I am currently working on adding IMDSv2 support to AWS Fluent Bit plugins. Thank you for your patience. I will update you on the progress of this feature.

@matthewfala (Contributor)

Hi @kdalporto. This is no longer the hop limit issue, since you have the hop limit correctly set to 2, and from your error logs it looks like Fluent Bit is not hitting that problem. It seems like IMDS may be unreachable. Is it possible for you to curl 169.254.169.254 on your instance?

curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"

This should return a token.
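
If it does, the token can be exercised end to end with a follow-up GET (instance-id is just a convenient path to test with):

TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id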

@kdalporto commented Nov 10, 2021

@matthewfala yes, that returns a ~56-character token when run on the node instance where Fluent Bit is running. I'm also able to manually upload objects to the destination bucket via the CLI. I currently have HttpTokens set to required.

@matthewfala (Contributor) commented Nov 10, 2021

That's strange. Your error message, [imds] unable to evaluate IMDS version, should only come up if the following request does not complete.

The following curl should return a status code of 401, which indicates IMDSv2 availability.

curl -H "X-aws-ec2-metadata-token: INVALID" -v http://169.254.169.254/

It's not clear why this request is failing (returning nothing when a 401 is expected).

@kdalporto

That curl does indeed lead to a 401:

* About to connect() to 169.254.169.254 port 80 (#0)
*   Trying 169.254.169.254...
* Connected to 169.254.169.254 (169.254.169.254) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 169.254.169.254
> Accept: */*
> X-aws-ec2-metadata-token: INVALID
>
< HTTP/1.1 401 Unauthorized
< Content-Length: 0
< Date: Wed, 10 Nov 2021 23:38:57 GMT
< Server: EC2ws
< Connection: close
< Content-Type: text/plain
<
* Closing connection 0

@kdalporto commented Nov 11, 2021

@matthewfala, I have a bit of an update. I've noticed on two separate occasions that logs were sent to S3, but I wasn't sure why. This morning I realized it had occurred again: when I deleted my Kubernetes deployment, the logs were sent to S3. This is consistent with the documentation snippet:

"If Fluent Bit is stopped suddenly it will try to send all data and complete all uploads before it shuts down."

At the moment, I don't understand why it seems to be able to send to S3 on shutdown, but fails during normal operations.

Update: I tried to reproduce the above scenario; however, no logs were sent on shutdown this time.

@matthewfala (Contributor)

I'm not sure what the issue could be. The process of obtaining credentials during shutdown is the same as during normal operation, provided the inputs (some of which have network activity) are not interfering with our requests. One thing that might be happening is that on shutdown the input collectors are stopped while the output plugins are still sending out logs. If an input plugin that interferes with our network requests is stopped, that might explain why we can reach IMDS on shutdown but not during normal operation. What input plugins are you using? Anything that might require networking, such as Prometheus?

I have a custom image which adds IMDSv1 fallback support (if IMDSv2 fails, IMDSv1 will be tried) and some extra debug statements for IMDS problems. If you want to test it out and send the resulting logs, they could help us figure out what the problem is.

Here's the image repo and tag:

826489191740.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-for-fluent-bit:1.8.8-imds-fallback-patch

@kdalporto

Yes, Prometheus is running in our deployment. I'll try to utilize that image and grab the logs.

@kdalporto

Circling back on this... The issue was that the overall Kubernetes deployment repo we use specifically blocks pods from accessing IMDS in the namespace fluent-bit is deployed in, but access is still available at the instance level. I've confirmed that running fluent-bit in its own separate namespace allows it to send logs to S3 with IMDS.

@PettitWesley (Contributor)

@kdalporto Thanks for this post. I had forgotten about that; I believe it's recommended in EKS and ECS to block containers from accessing IMDS.
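
For reference, that blocking is commonly implemented with a host-level iptables rule along these lines (a sketch of the widely documented pattern for the VPC CNI's eni+ pod interfaces, not necessarily what any given deployment uses):

# Drop pod traffic to IMDS while leaving host-level access intact
sudo iptables --insert FORWARD 1 --in-interface eni+ --destination 169.254.169.254/32 --jump DROP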

@matthewfala (Contributor)

Awesome @kdalporto. I'm glad to hear that this is no longer an issue for you. Thank you for letting us know.

@Ahlaee commented Apr 29, 2022

Hi, I'm using Fluent Bit v1.8.15 / aws-for-fluent-bit 2.23.4 on AWS EKS and I'm still getting this in the logs:

[2022/04/29 11:16:43] [error] [filter:aws:aws.3] Could not retrieve ec2 metadata from IMDS

I'm using IMDSv2 with the correct hop limit:
{
    "State": "applied",
    "HttpTokens": "required",
    "HttpPutResponseHopLimit": 2,
    "HttpEndpoint": "enabled",
    "HttpProtocolIpv6": "disabled",
    "InstanceMetadataTags": "disabled"
}

curl -H "X-aws-ec2-metadata-token: INVALID" -v http://169.254.169.254/ is reporting 401
curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600" returns a token

Sending logs to CloudWatch does work, though (at least for now), so I'm not sure whether this is an error message that refers to IMDSv1 while IMDSv2 is working fine.

@PettitWesley (Contributor)

@Ahlaee Is there more log output than that?

CC @matthewfala

@Ahlaee commented Apr 29, 2022

@PettitWesley Everything else looks ok:

Fluent Bit v1.8.15

  • Copyright (C) 2015-2021 The Fluent Bit Authors
  • Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
  • https://fluentbit.io

[2022/04/29 10:33:43] [ info] [engine] started (pid=1)
[2022/04/29 10:33:43] [ info] [storage] version=1.1.6, initializing...
[2022/04/29 10:33:43] [ info] [storage] root path '/var/fluent-bit/state/flb-storage/'
[2022/04/29 10:33:43] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/04/29 10:33:43] [ info] [storage] backlog input plugin: storage_backlog.8
[2022/04/29 10:33:43] [ info] [cmetrics] version=0.2.2
[2022/04/29 10:33:43] [ info] [input:systemd:systemd.3] seek_cursor=s=bfc76bb2c6464c94b13827824290ea6a;i=14f... OK
[2022/04/29 10:33:43] [ info] [input:storage_backlog:storage_backlog.8] queue memory limit: 4.8M
[2022/04/29 10:33:43] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc port=443
[2022/04/29 10:33:43] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2022/04/29 10:33:43] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2022/04/29 10:33:43] [ info] [filter:kubernetes:kubernetes.0] connectivity OK
[2022/04/29 10:33:43] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS on initialization
[2022/04/29 10:33:43] [error] [filter:aws:aws.3] Could not retrieve ec2 metadata from IMDS on initialization
[2022/04/29 10:33:43] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2022/04/29 10:33:43] [ info] [sp] stream processor started

After that it creates the Log Streams. And then it repeats indefinitely:

[2022/04/29 20:16:46] [error] [filter:aws:aws.3] Could not retrieve ec2 metadata from IMDS

Logs are forwarded to cloudwatch nonetheless.

@PettitWesley (Contributor)

@Ahlaee Ah, this is the EC2 filter... and I think I might know the problem: you might have IMDS blocked for containers; this is a common best practice. Does your setup include any of this? https://aws.amazon.com/premiumsupport/knowledge-center/ecs-container-ec2-metadata/

@Ahlaee commented May 3, 2022

@PettitWesley No, our setup runs on EKS, not ECS. I never configured anything related to networking modes when spinning up the cluster using the console. As far as I understand from the linked article, having IMDS blocked is an intentional setting that must be included in the user data of the EC2 instance. I didn't include anything related to this; it might be implicitly included by AWS in the cluster creation process.

@PettitWesley (Contributor) commented May 3, 2022

@Ahlaee Hmm you're right, this looks like the right link for EKS IMDS related things: aws/containers-roadmap#1109

After that it creates the Log Streams. And then it repeats indefinitely:

[2022/04/29 20:16:46] [error] [filter:aws:aws.3] Could not retrieve ec2 metadata from IMDS

Logs are forwarded to cloudwatch nonetheless.

Yeah, so the filter is failing but creds must be succeeding. Can you please share your full config?

Also, since you have IMDSv2 required (tokens required), you need to set this option in the AWS filter: https://docs.fluentbit.io/manual/pipeline/filters/aws-metadata

[FILTER]
    Name aws
    Match *
    imds_version v2

@PettitWesley PettitWesley reopened this May 3, 2022
@Ahlaee commented May 4, 2022

I was following the AWS documentation when setting up fluent-bit for EKS: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-logs-FluentBit.html

Their fluent-bit.yaml, which is linked at

https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/fluent-bit/fluent-bit.yaml

contains an older image of the software that doesn't support IMDSv2, and also has imds_version set to v1 in the filter.

Setting the image version to 2.23.4 and the filter to imds_version v2 as you described above solved the issue for me. :)

Thank you!

@cgill27 commented May 12, 2022

I concur with @Ahlaee. Using EKS with the AWS-supplied docs for setting up fluent-bit to CloudWatch, setting the image to 2.23.4 and imds_version to v2 solved the issue for me as well.

@mconigliaro commented Jun 14, 2022

Just setting imds_version v2 fixed this for me. FWIW, it looks like the current stable version is 2.23.3.

@babebort

Seems like just changing imds_version to v2 helped for me too.

@whereisaaron

In Oct 2022 the container image version in this manifest was new enough for IMDSv2, but the configuration still contained 'imds_version v1' in two places. Updating 'v1' to 'v2' in both places was enough to fix it.

https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml
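
If you hit the same manifest, a one-liner along these lines can flip both occurrences (the ConfigMap name and namespace are guesses based on the quickstart; verify them against your deployment before applying):

kubectl -n amazon-cloudwatch get configmap fluent-bit-config -o yaml | sed 's/imds_version v1/imds_version v2/g' | kubectl apply -f -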

sylvan-mostert-th added a commit to thrivehealth/amazon-cloudwatch-container-insights that referenced this issue Jun 1, 2023
imds_version causing logging errors ([2023/06/01 18:43:56] [error] [filter:aws:aws.3] Could not retrieve ec2 metadata from IMDS)

fluent/fluent-bit#2840 (comment)
@geocomm-shenningsgard commented Jun 2, 2023

FWIW, I just re-deployed fluent-bit public.ecr.aws/aws-observability/aws-for-fluent-bit@sha256:ff702d8e4a0a9c34d933ce41436e570eb340f56a08a2bc57b2d052350bfbc05d and started receiving the error [error] [filter:aws:aws.3] Could not retrieve ec2 metadata from IMDS. I changed the value for imds_version to v2 in both spots in the ConfigMap (and restarted the DaemonSet) and am still seeing the error.
