ci instability: status command only waits for 1 minute to resolve instance URLs #981
Another one, this time it failed to connect to
It was, however, able to connect a few minutes earlier, just after the cluster was created.
Status command debug logs:
Loki querier logs show the
A couple of minutes later it was still trying to fetch data:
This could be a network hiccup preventing the queriers from fetching data from the ingesters and/or S3.
Should we bump the status command timeout to, e.g., 5 minutes? That would likely help CI and has little downside anywhere else. :)
I tested locally with the request timeout increased from 10s to 60s, but that didn't fix the issue. Oddly, the status command works right after the create command finishes, yet fails when run again a few minutes later.
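The two comments above touch two different knobs: the per-request timeout (10s vs 60s) and the overall deadline of the status command (1 minute vs the proposed 5 minutes). If a request fails fast (for example with getaddrinfo ENOTFOUND), a longer per-request timeout changes nothing; what matters is how long the command keeps retrying. Below is a minimal TypeScript sketch of that distinction, assuming a hypothetical probe function that throws when an instance URL is not reachable. waitUntilReachable, withTimeout and the option names are illustrative only, not the opstrace CLI's actual code.

```typescript
// Hypothetical sketch; these names are illustrative and not taken from the opstrace CLI.

interface WaitOptions {
  overallDeadlineMs: number; // total budget for the whole check (e.g. 60_000 vs 300_000)
  perAttemptTimeoutMs: number; // budget for a single probe (e.g. 10_000 vs 60_000)
  retryDelayMs: number; // pause between failed attempts
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Settle with p's outcome, or reject if it takes longer than ms.
// The underlying operation is abandoned, not cancelled.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`attempt timed out after ${ms} ms`)), ms);
  });
  return Promise.race([p, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (err) => {
      clearTimeout(timer);
      throw err;
    },
  );
}

// Retry probe until it succeeds or the overall deadline is used up.
// Resolves with the number of attempts it took.
async function waitUntilReachable(probe: () => Promise<void>, opts: WaitOptions): Promise<number> {
  const start = Date.now();
  const remaining = (): number => opts.overallDeadlineMs - (Date.now() - start);
  let attempts = 0;
  let lastError: unknown;

  while (remaining() > 0) {
    attempts += 1;
    try {
      // Never let a single attempt run past the overall deadline.
      await withTimeout(probe(), Math.min(opts.perAttemptTimeoutMs, remaining()));
      return attempts;
    } catch (err) {
      lastError = err; // fast failures (e.g. DNS errors) land here and are retried
    }
    const left = remaining();
    if (left > 0) await sleep(Math.min(opts.retryDelayMs, left));
  }
  throw new Error(
    `deadline of ${opts.overallDeadlineMs} ms exceeded after ${attempts} attempt(s); last error: ${String(lastError)}`,
  );
}
```

With these assumptions, waitUntilReachable(probe, { overallDeadlineMs: 300_000, perAttemptTimeoutMs: 10_000, retryDelayMs: 5_000 }) would model the proposed 5-minute budget while keeping the 10s request timeout (the 5s retry delay is an arbitrary illustrative value). Capping each attempt at the remaining budget keeps the total wait close to overallDeadlineMs.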
I was able to reproduce this a few times locally by deleting the loki namespace and waiting for the controller to recreate its resources before running the status command.
Closing, as we haven't seen this again since we bumped the timeout. We can reopen if it recurs.
https://buildkite.com/opstrace/scheduled-main-builds/builds/3415
https://buildkite.com/opstrace/prs/builds/4923#58765aca-2f2f-4bd9-b19e-3469e67742c8/2087-2584
The error "Error: timeout checking the status of the cluster" was raised after 1 minute. Is that timeout constant a little small?

I think a short timeout makes sense if status is supposed to check the current state of the cluster w.r.t. some externally visible properties. Of course, what counts as "current" then determines which timeout constants are appropriate.
The status command is also a little underspecified, so this issue is not a big deal, and potentially not an issue at all.

This is actually also a manifestation of #834, because getaddrinfo ENOTFOUND was raised for a DNS name that had been resolvable shortly before.
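Given that getaddrinfo ENOTFOUND appeared for a name that had resolved shortly before, a status checker could treat such DNS errors as possibly transient and keep retrying within its deadline instead of failing immediately. A minimal TypeScript sketch of that classification, assuming Node.js-style errors carrying a string code property; isTransientNetworkError and the chosen set of codes are hypothetical, not taken from the opstrace code base:

```typescript
// Hypothetical helper; the name and the code list are illustrative, not opstrace's logic.

// Node.js attaches a string `code` to DNS and socket errors,
// e.g. "ENOTFOUND" for a failed getaddrinfo lookup.
const TRANSIENT_CODES: ReadonlySet<string> = new Set([
  "ENOTFOUND", // name did not resolve; possibly transient if the record was just (re)created
  "EAI_AGAIN", // temporary resolver failure
  "ECONNREFUSED", // nothing accepting connections (yet)
  "ECONNRESET", // connection dropped mid-request
  "ETIMEDOUT", // connect or read timed out at the socket level
]);

// Extract err.code if it is a string, without assuming err is an Error.
function errorCode(err: unknown): string | undefined {
  if (typeof err === "object" && err !== null && "code" in err) {
    const code = (err as { code: unknown }).code;
    return typeof code === "string" ? code : undefined;
  }
  return undefined;
}

// True if the failure looks worth retrying until the overall deadline expires.
function isTransientNetworkError(err: unknown): boolean {
  const code = errorCode(err);
  return code !== undefined && TRANSIENT_CODES.has(code);
}
```

Treating ENOTFOUND as transient is a trade-off: it also delays reporting a genuinely wrong hostname until the deadline runs out, so including the last underlying error in the final timeout message helps.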