Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign up1.6.1 becomes unresponsive when fails to describe EC2 instances when using auto discovery #2876
Comments
This comment has been minimized.
This comment has been minimized.
|
This happened with 1.7.1, around the same time last night, so wondering if this might be something up with AWS credentials:
From researching online it appears this can happen if the timestamp is off, but checking the date the time seems fine, so will continue looking at that. However, it still appears that Prometheus doesn't recover well from this situation (or it could be that AWS access is continually flipping out at that time and not providing info to scrape) |
brian-brazil
added
the
component/service discovery
label
Jul 7, 2017
brian-brazil
added
priority/P2
kind/bug
help wanted
labels
Jul 14, 2017
gouthamve
added
the
hacktoberfest
label
Sep 28, 2017
This comment has been minimized.
This comment has been minimized.
Sleashe
commented
Nov 7, 2017
|
Experiencing the exact same issue with tag latest (from docker hub). I suppose the prometheus version is the latest stable one (1.8.2).
|
This comment has been minimized.
This comment has been minimized.
|
The startup message in the logs tells you the exact version. But indeed, I believe the EC2 fix in 1.8.2 is different from the issue described here. |
This comment has been minimized.
This comment has been minimized.
kanand2
commented
Nov 7, 2017
|
I too got the same error and was able to fix that by passing proxy while running the prometheus docker image and it worked for me. |
This comment has been minimized.
This comment has been minimized.
dbonatto
commented
Dec 12, 2017
|
Got the same issue described here while using 1.8.2. |
This comment has been minimized.
This comment has been minimized.
|
Anyone still seeing the issue with the latest v2.x Prometheus? |
This comment has been minimized.
This comment has been minimized.
|
Closing as it relates to Prometheus 1.x. Feel free to (re)open the issue if it happens with the latest stable version too. |
simonpasquier
closed this
Oct 1, 2018
This comment has been minimized.
This comment has been minimized.
vietthang207
commented
Dec 28, 2018
|
@simonpasquier I am using 2.3.1 binary executable on ubuntu 16.04 and I've experience this error: level=error ts=2018-12-28T06:20:25.252856805Z caller=ec2.go:180 component="discovery manager scrape" discovery=ec2 msg="Refresh failed" err="could not describe instances: AuthFailure: AWS was not able to validate the provided access credentials\n\tstatus code: 401, request id: c43d1de4-b68a-4953-92ea-dae04b421f0b" After that the whole server crash because it ran out of Memory and suffer high load. Here is part of my config:
Thanks in advanced! |
This comment has been minimized.
This comment has been minimized.
|
@vietthang207 please use our user mailing list, which you can also search. The message |
damienmarshall commentedJun 26, 2017
What did you do?
I'm running Prometheus in a docker container on Ubuntu 16.04.02 on AWS using EC2 Auto discovery feature. Every few days I see the following errors in logs, after which CPU usage appears to spike and Prometheus appears to crash, requiring a restart
Environment
AWS
Linux 4.4.0-1020-aws x86_64
1.6.1 (have since upgrade to 1.7.1 to see if this helps)