Always include all destinations #1399

Conversation
Move destination configs to the end; they're the least interesting, typically.

Signed-off-by: Pete Wall <pete.wall@grafana.com>
@petewall Is there any specific reason why you added all destinations to all deployments? For example, our alloy-logs deployment now also gets the prometheus and otel/traces configurations, and one of our configurations is broken because of that:

===== /ConfigMap monitoring-dev/monitoring-dev-alloy-logs ======
4,23d3
< // Destination: logs_service (loki)
< otelcol.exporter.loki "logs_service" {
< forward_to = [loki.write.logs_service.receiver]
< }
<
< loki.write "logs_service" {
< endpoint {
< url = "http://loki-central-gateway.loki-central.svc.cluster.local/loki/api/v1/push"
< tls_config {
< insecure_skip_verify = false
< }
< min_backoff_period = "500ms"
< max_backoff_period = "5m"
< max_backoff_retries = "0"
< }
< external_labels = {
< "cluster" = "dev-us-east-1",
< "k8s_cluster_name" = "dev-us-east-1",
< }
< }
256a237,355
> }
> }
> // Destination: metrics_service (prometheus)
> otelcol.exporter.prometheus "metrics_service" {
> add_metric_suffixes = true
> forward_to = [prometheus.remote_write.metrics_service.receiver]
> }
>
> prometheus.remote_write "metrics_service" {
> endpoint {
> url = "http://mimir-central-nginx.mimir-central.svc.cluster.local/api/v1/push"
> headers = {
> }
> tls_config {
> insecure_skip_verify = false
> }
> send_native_histograms = false
>
> queue_config {
> capacity = 10000
> min_shards = 1
> max_shards = 50
> max_samples_per_send = 2000
> batch_send_deadline = "5s"
> min_backoff = "30ms"
> max_backoff = "5s"
> retry_on_http_429 = true
> sample_age_limit = "0s"
> }
>
> write_relabel_config {
> source_labels = ["cluster"]
> regex = ""
> replacement = "dev-us-east-1"
> target_label = "cluster"
> }
> write_relabel_config {
> source_labels = ["k8s_cluster_name"]
> regex = ""
> replacement = "dev-us-east-1"
> target_label = "k8s_cluster_name"
> }
> }
>
> wal {
> truncate_frequency = "20m"
> min_keepalive_time = "5m"
> max_keepalive_time = "30m"
> }
> }
> // Destination: logs_service (loki)
> otelcol.exporter.loki "logs_service" {
> forward_to = [loki.write.logs_service.receiver]
> }
>
> loki.write "logs_service" {
> endpoint {
> url = "http://loki-central-gateway.loki-central.svc.cluster.local/loki/api/v1/push"
> tls_config {
> insecure_skip_verify = false
> }
> min_backoff_period = "500ms"
> max_backoff_period = "5m"
> max_backoff_retries = "0"
> }
> external_labels = {
> "cluster" = "dev-us-east-1",
> "k8s_cluster_name" = "dev-us-east-1",
> }
> }
> // Destination: traces_service (otlp)
>
> otelcol.processor.attributes "traces_service" {
> output {
> metrics = [otelcol.processor.transform.traces_service.input]
> logs = [otelcol.processor.transform.traces_service.input]
> traces = [otelcol.processor.transform.traces_service.input]
> }
> }
>
> otelcol.processor.transform "traces_service" {
> error_mode = "ignore"
>
> trace_statements {
> context = "resource"
> statements = [
> `set(attributes["cluster"], "dev-us-east-1")`,
> `set(attributes["k8s.cluster.name"], "dev-us-east-1")`,
> ]
> }
>
> output {
> traces = [otelcol.processor.batch.traces_service.input]
> }
> }
>
> otelcol.processor.batch "traces_service" {
> timeout = "2s"
> send_batch_size = 8192
> send_batch_max_size = 0
>
> output {
> traces = [otelcol.exporter.otlphttp.traces_service.input]
> }
> }
> otelcol.exporter.otlphttp "traces_service" {
> client {
> endpoint = "http://tempo-central-gateway.tempo-central.svc.cluster.local:80"
> tls {
> insecure = false
> insecure_skip_verify = false
> }
> }
>
> retry_on_failure {
> enabled = true
> initial_interval = "5s"
> max_interval = "30s"
> max_elapsed_time = "5m"
> client {

That leads to

From what I guess, I wish I could just disable the unused config again, but now I have to fix a problem in configuration that is not even used. Especially given the vision of https://github.com/grafana/k8s-monitoring-helm/blob/039a96d76c347dd165cf70777cf5217bf8b7299d/charts/k8s-monitoring/docs/Migration.md
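For context, here is a minimal sketch of the kind of values that would produce the ConfigMap diffed above, assuming a single shared top-level destinations list. The destination names, types, and URLs are copied from the diff; all other keys are assumptions about the values layout, not a verified chart API.

# Hypothetical values.yaml sketch; keys other than the destination entries are assumptions
cluster:
  name: dev-us-east-1

# All destinations are declared once, at the top level ...
destinations:
  - name: metrics_service
    type: prometheus
    url: http://mimir-central-nginx.mimir-central.svc.cluster.local/api/v1/push
  - name: logs_service
    type: loki
    url: http://loki-central-gateway.loki-central.svc.cluster.local/loki/api/v1/push
  - name: traces_service
    type: otlp
    url: http://tempo-central-gateway.tempo-central.svc.cluster.local:80

# ... and with this change every collector, including alloy-logs (which only ships logs),
# gets all three destination blocks rendered into its generated Alloy configuration,
# as shown in the diff above.
alloy-logs:
  enabled: true
alloy-metrics:
  enabled: true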
A workaround for that seems to be to manually delete

Edit: I don't know how, but this only fixed 80% of the daemonset's pods. Some pods now error with

Edit 2: This might have been a race condition with still half-running alloy-logs instances. After I shut down the DS, ran the cleanup DS, and re-spawned the alloy-logs DS, the error was gone.

Edit 3: Well, because of another issue with our installation we recreate the alloy-logs pods every 15 min, and the original issue is back now. So the pods error with the initial error again.

Edit 4: Please let us disable the unused config :)
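Purely to illustrate the request in Edit 4, here is one hypothetical shape such an opt-out could take in the values. This is not an existing chart option, and every key below is an assumption.

# Hypothetical, wished-for values (not an existing chart option):
alloy-logs:
  enabled: true
  # only render the destinations this collector actually forwards to,
  # instead of every destination defined at the top level
  destinations: ["logs_service"]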