prometheus: better resiliency: consider add --continue-on-error #236

Closed
Dentrax opened this issue Jan 20, 2022 · 0 comments · Fixed by grafana/mimir#3052 · May be fixed by #237

Dentrax commented Jan 20, 2022

The Prometheus analysis process always crashes on a seemingly random query after running for a long time. There are a few ways to handle this situation better:

  • improve resiliency by applying a retry/timeout per query
  • continue to the next query when an error is thrown

The $ cortextool analyse prometheus ... command fails during analysis with an error like this:

...
DEBU[0218] additional repository_duration_seconds_bucket 900
DEBU[0218] additional repository_duration_seconds_count 75
DEBU[0218] additional repository_duration_seconds_sum 75
cortextool: error: error querying count by (job) (request_duration_seconds_bucket): server_error: server error: 503, try --help

It reports a 503 error, but querying the server directly returns a 200 response:

$ curl <ADDR>/api/v1/query?query=count%20by%20(job)%20(consul_k8s_p_beholder_p2_1venus_worker_64_runtime_sys_bytes)

# 200 OK

As with the $ cortextool analyse grafana ... command, we could keep querying Prometheus after a failure and collect the errors in a dedicated field such as query_errors, just as the Grafana analysis already does with its parse_errors field.

$ cortextool analyse grafana --address <ADDR> --key <KEY>
unmarshal board: json: cannot unmarshal object into Go struct field Current.templating.list.current.text of type []string for MJvznCp7z Prometheus / Remote Write

cc @developer-guy @eminaktas @yasintahaerol

Dentrax added a commit to Dentrax/cortex-tools that referenced this issue Jan 20, 2022
Fixes grafana#236

Signed-off-by: Furkan <furkan.turkal@trendyol.com>
Co-authored-by: Emin <emin.aktas@trendyol.com>
Co-authored-by: Yasin <yasintaha.erol@trendyol.com>
Co-authored-by: Batuhan <batuhan.apaydin@trendyol.com>
Dentrax added a commit to Dentrax/mimir that referenced this issue Sep 27, 2022
Fixes: grafana/cortex-tools#236

Signed-off-by: Furkan <furkan.turkal@trendyol.com>
pracucci pushed a commit to grafana/mimir that referenced this issue Sep 27, 2022
Fixes: grafana/cortex-tools#236

Signed-off-by: Furkan <furkan.turkal@trendyol.com>

Signed-off-by: Furkan <furkan.turkal@trendyol.com>
sysedwinistrator pushed a commit to sysedwinistrator/mimir that referenced this issue Nov 21, 2022
Fixes: grafana/cortex-tools#236

Signed-off-by: Furkan <furkan.turkal@trendyol.com>

Signed-off-by: Furkan <furkan.turkal@trendyol.com>
friedrichg pushed a commit to cortexproject/cortex-tools that referenced this issue Aug 1, 2023
* add option to set ingesters podManagementPolicy

Signed-off-by: Andre Ziviani <andrepziviani@gmail.com>

* update changelog

Signed-off-by: Andre Ziviani <andrepziviani@gmail.com>

* update readme

Signed-off-by: Andre Ziviani <andrepziviani@gmail.com>

* add a warning about podmanagementpolicy

Signed-off-by: Andre Ziviani <andrepziviani@gmail.com>