Monitor effects on timeouts and performance. (follow up #473 & #596) #618
- check warning on observability-thanos-store-shard-0-0 (https://access.redhat.com/support/cases/#/case/03764352)
- check retention on OBS
- /CC @computate
schwesig added a commit to schwesig/OCP-on-NERC_nerc-ocp-config that referenced this issue Jul 1, 2024
This PR addresses the retention rate issues as discussed in nerc-project/operations#618 (comment) (having more than 30d raw etc.). The changes include updating the retention and concurrency settings for the Thanos Compactor to enhance observability and metrics performance. We will stay with the defaults where possible, adding remarks with the defaults to better understand the next changes or possible errors.

Changes focus on the needs for class, cost, and invoice analysis, as well as future predictions:

- Updated `retentionResolutionRaw` from 30d to 90d (quarterly high detail for deep analysis, especially GPUs)
- Updated `retentionResolution5m` from 90d to 360d (for cost, usage, and invoices; 15 minutes could be enough, but is not a default option)
- Set `retentionResolution1h` to 0d (retain forever, following the default and recommendation)
- Added `blockDuration`, `cleanupInterval`, `deleteDelay`, `retentionInLocal`, `consistencyDelay`, `compactConcurrency`, and `downsampleConcurrency` settings (even where they stay at the default, this makes the options visible for possible future changes)

These changes aim to optimize data retention and resolution for the needed use cases and ensure better performance.

References:

1. [Thanos Compact Component](https://thanos.io/tip/components/compact.md/)
2. [Recommendations for Running Thanos and Prometheus](https://zapier.com/blog/five-recommendations-when-running-thanos-and-prometheus/)
3. [Red Hat Advanced Cluster Management Observability](https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.9/html/observability/customizing-observability#adding-advanced-config:~:text=is%20not%20displayed.-,4.3.%C2%A0Adding%20advanced%20configuration%20for%20retention,-Add%20the%20advanced)

Signed-off-by: /Thor(sten)?/ Schwesig <89909507+schwesig@users.noreply.github.com>
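To make the retention values in the commit message above easier to reason about, here is a minimal Python sketch that parses them and runs one sanity check. It is not part of the PR. The helper names (`parse_days`, `retention_is_monotonic`) are hypothetical, and the check itself is an assumption for illustration rather than a documented Thanos requirement. The check is that each coarser resolution is kept at least as long as the finer one. Only the setting names and values come from the commit message, including the reading of `0d` as "retain forever".

```python
import math
import re

# Values after the change, as quoted in the commit message.
NEW_RETENTION = {
    "retentionResolutionRaw": "90d",
    "retentionResolution5m": "360d",
    "retentionResolution1h": "0d",  # 0d = retain forever, per the commit message
}


def parse_days(value):
    """Return the retention in days, or None when the value is 0d (forever).

    Hypothetical helper: only whole-day values such as "90d" are handled,
    because those are the only ones that appear in the commit message.
    """
    match = re.fullmatch(r"(\d+)d", value)
    if match is None:
        raise ValueError(f"unsupported retention value: {value!r}")
    days = int(match.group(1))
    return None if days == 0 else days


def retention_is_monotonic(settings):
    """Check raw <= 5m <= 1h retention, treating 'forever' as infinity.

    This ordering is an illustrative assumption, not a Thanos rule.
    """
    order = [
        "retentionResolutionRaw",
        "retentionResolution5m",
        "retentionResolution1h",
    ]
    days = []
    for key in order:
        parsed = parse_days(settings[key])
        days.append(math.inf if parsed is None else parsed)
    return all(a <= b for a, b in zip(days, days[1:]))


if __name__ == "__main__":
    for name, value in NEW_RETENTION.items():
        parsed = parse_days(value)
        label = "forever" if parsed is None else f"{parsed} days"
        print(f"{name}: {label}")
    print("monotonic:", retention_is_monotonic(NEW_RETENTION))
```

A check like this could run before a config change is merged, so that a typo in one of the retention values is caught early.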
larsks pushed a commit to schwesig/OCP-on-NERC_nerc-ocp-config that referenced this issue Jul 2, 2024
This was referenced Jul 3, 2024
Follow up from
after nodes were successfully added.
Known checks
Status
Currently in the monitoring state
all needed nodes are available
currently memcache works fine
observability-thanos-store-shard-0-1 and observability-thanos-store-shard-0-2 are good
observability-thanos-store-shard-0-0
creates