fix: promtail; clean up metrics generated from logs after a config reload. #11882

ptodev · 2024-02-06T19:06:08Z

It is possible to generate metrics from log lines in Promtail. If the config file is reloaded, those metrics are currently not cleaned up. This causes a few issues:

- If a metric is no longer necessary, it is still visible on the /metrics endpoint.
- A metric with the same name might have a different meaning after the config file reload, so it'd make sense to clean these up.
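
To make the two issues concrete, here is a minimal sketch of the behaviour, assuming a toy in-memory registry. The names `toyRegistry`, `toyStage`, and `reload` are hypothetical and exist only for illustration; this is not the Prometheus client API or Promtail's actual implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// toyRegistry is a hypothetical stand-in for a metrics registry.
// It only models registering, incrementing, and unregistering counters.
type toyRegistry struct {
	counters map[string]float64
}

func newToyRegistry() *toyRegistry {
	return &toyRegistry{counters: map[string]float64{}}
}

// register adds a counter at zero. If the name already exists, the old
// value is kept, which models a counter that was never cleaned up.
func (r *toyRegistry) register(name string) {
	if _, ok := r.counters[name]; !ok {
		r.counters[name] = 0
	}
}

func (r *toyRegistry) inc(name string)        { r.counters[name]++ }
func (r *toyRegistry) unregister(name string) { delete(r.counters, name) }

// names returns the sorted metric names a scrape would expose.
func (r *toyRegistry) names() []string {
	out := make([]string, 0, len(r.counters))
	for n := range r.counters {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

// toyStage models a metrics stage that owns the counters it registered.
type toyStage struct {
	reg   *toyRegistry
	owned []string
}

func newToyStage(reg *toyRegistry, metricNames ...string) *toyStage {
	s := &toyStage{reg: reg}
	for _, n := range metricNames {
		reg.register(n)
		s.owned = append(s.owned, n)
	}
	return s
}

// cleanup unregisters every metric this stage created.
func (s *toyStage) cleanup() {
	for _, n := range s.owned {
		s.reg.unregister(n)
	}
	s.owned = nil
}

// reload swaps the old stage for a new one, optionally cleaning up first.
func reload(reg *toyRegistry, old *toyStage, withCleanup bool, newMetrics ...string) *toyStage {
	if withCleanup {
		old.cleanup()
	}
	return newToyStage(reg, newMetrics...)
}

func main() {
	// Without cleanup: the dropped metric lingers and the kept one is not reset.
	reg := newToyRegistry()
	stage := newToyStage(reg, "log_lines_total", "log_lines_total2")
	reg.inc("log_lines_total")
	reload(reg, stage, false, "log_lines_total")
	fmt.Println("without cleanup:", reg.names(), reg.counters["log_lines_total"])

	// With cleanup: only the metrics from the new config exist, starting fresh.
	reg = newToyRegistry()
	stage = newToyStage(reg, "log_lines_total", "log_lines_total2")
	reg.inc("log_lines_total")
	reload(reg, stage, true, "log_lines_total")
	fmt.Println("with cleanup:", reg.names(), reg.counters["log_lines_total"])
}
```

In this toy model, skipping the cleanup leaves the removed metric exposed and carries the old counter value into the new config, which mirrors the two issues listed above.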

This will help us fix a bug reported in the Grafana Agent.

I suppose no changelog entry is required because PromQL is already designed to handle counter resets? But if you think it's appropriate, I could add a changelog entry?

I tested this change locally using a config file like this:

Promtail config

server:
  http_listen_port: 9080
  grpc_listen_port: 0
  enable_runtime_reload: true
positions:
  filename: /Users/paulintodev/Desktop/log_metrics_bug/tmp_promtail/positions
clients:
  - url: "https://logs-prod-008.grafana.net/loki/api/v1/push"
    basic_auth:
      username: ""
      password: ""
scrape_configs:
- job_name: log files
  static_configs:
  - labels:
      __path__: /Users/paulintodev/Desktop/log_metrics_bug/*.log
  pipeline_stages:
    - metrics:
        log_lines_total:
          type: Counter
          description: "total number of log lines"
          prefix: my_promtail_custom_
          max_idle_duration: 24h
          config:
            match_all: true
            action: inc
        log_lines_total2:
          type: Counter
          description: "total number of log lines"
          prefix: my_promtail_custom2_
          max_idle_duration: 24h
          config:
            match_all: true
            action: inc

Then I'd do a curl localhost:9080/reload and check http://localhost:9080/metrics.
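
For anyone repeating this check, a small helper can make the before/after comparison less error-prone than eyeballing the page. The sketch below assumes the /metrics response body has already been fetched as a string; the function name `metricNamesWithPrefix` and the sample input in `main` are made up for illustration and are not real Promtail output.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// metricNamesWithPrefix scans Prometheus text exposition output and returns
// the distinct sample names that start with prefix, in first-seen order.
func metricNamesWithPrefix(body, prefix string) []string {
	seen := map[string]bool{}
	var names []string
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		// The sample name ends at the first '{' (labels) or space (value).
		end := strings.IndexAny(line, "{ ")
		if end < 0 {
			end = len(line)
		}
		name := line[:end]
		if strings.HasPrefix(name, prefix) && !seen[name] {
			seen[name] = true
			names = append(names, name)
		}
	}
	return names
}

func main() {
	// Invented sample input, shaped like the text exposition format.
	sample := `# HELP my_promtail_custom_log_lines_total total number of log lines
# TYPE my_promtail_custom_log_lines_total counter
my_promtail_custom_log_lines_total 3
my_promtail_custom2_log_lines_total2 3
some_other_metric 1
`
	fmt.Println(metricNamesWithPrefix(sample, "my_promtail_custom"))
}
```

Running the helper on the body captured before and after the reload, with one of the metrics removed from the config in between, should show whether the removed metric is still exposed.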

I didn't add a whole lot of unit tests because TBH we should probably focus on adding tests in the Agent, given that we're sunsetting Promtail over time.

CLAassistant · 2024-02-06T19:06:14Z

All committers have signed the CLA.

cstyan

@ptodev need to rebase/merge in master to get some CI updates

cstyan · 2024-02-23T20:40:00Z

clients/pkg/logentry/stages/metrics_test.go

+	pl.Cleanup()
+
+	if err := testutil.GatherAndCompare(registry,
+		strings.NewReader("")); err != nil {
+		t.Fatalf("mismatch metrics: %v", err)
+	}


shouldn't we specifically check for the absence of the metric?

After pl.Cleanup(), no metrics should be visible. I don't see an advantage in checking individual metrics when there shouldn't be any of them.

What if there are originally two metrics stages, and after the reload there should be only one? We should still check for its existence and correct value. I think right now we're deleting metrics for stages that will still exist after the reload, which would not be correct.

I'd argue that deleting all the metrics is the right thing to do. The fact that a metric exists doesn't mean that it's used in the same way in the new config. It's possible that the user changed the meaning of each metric, but retained the metric names. In that case it would make sense to start with fresh metrics.

In the future we could make the code smart and reload only the stages which changed, but I think that's a separate issue. To do that, we could avoid calling the Cleanup method for stages which are completely unchanged as a whole.
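
One way that future improvement could look is sketched below, assuming each stage can be reduced to a name plus its raw config text. The `stageConfig` type and the `fingerprint` and `stagesToCleanup` helpers are hypothetical; nothing like them is part of this PR, which resets all metrics on every reload.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// stageConfig is a hypothetical, simplified view of one pipeline stage:
// a name plus the raw text of its configuration.
type stageConfig struct {
	Name string
	Raw  string
}

// fingerprint hashes the stage name and raw config so that two stages
// compare equal only if both are byte-for-byte identical.
func fingerprint(c stageConfig) string {
	sum := sha256.Sum256([]byte(c.Name + "\x00" + c.Raw))
	return hex.EncodeToString(sum[:])
}

// stagesToCleanup returns the names of old stages that are missing from,
// or different in, the new config. Unchanged stages are left alone.
func stagesToCleanup(oldCfg, newCfg []stageConfig) []string {
	kept := make(map[string]bool, len(newCfg))
	for _, c := range newCfg {
		kept[fingerprint(c)] = true
	}
	var out []string
	for _, c := range oldCfg {
		if !kept[fingerprint(c)] {
			out = append(out, c.Name)
		}
	}
	return out
}

func main() {
	oldCfg := []stageConfig{
		{Name: "metrics-a", Raw: "action: inc"},
		{Name: "metrics-b", Raw: "action: inc"},
	}
	newCfg := []stageConfig{
		{Name: "metrics-a", Raw: "action: inc"}, // unchanged, keeps its metrics
		{Name: "metrics-b", Raw: "action: add"}, // changed, should be reset
	}
	fmt.Println(stagesToCleanup(oldCfg, newCfg))
}
```

Comparing whole-stage fingerprints keeps the rule simple: any edit to a stage, however small, resets its metrics, so a reused metric name never silently carries over a value with a different meaning.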

we should document this clearly in the metrics config section

loki/docs/sources/send-data/promtail/configuration.md

Line 678 in a00f1f1

#### metrics

and maybe as a note here as well

I added this line to both doc pages:

If Promtail's configuration is reloaded, all metrics will be reset.

clients/pkg/logentry/stages/metrics.go

cstyan

@ptodev sorry for the delay. I think we just need some docs. Also, have you guys been using this code downstream in Agent/Alloy already or would merging this be the first usage of this code?

cstyan · 2024-04-23T00:07:06Z

we should document this clearly in the metrics config section

loki/docs/sources/send-data/promtail/configuration.md

Line 678 in a00f1f1

#### metrics

and maybe as a note here as well

ptodev · 2024-04-23T15:38:49Z

have you guys been using this code downstream in Agent/Alloy already or would merging this be the first usage of this code?

Thank you for the review! This is the first usage. Alloy and Agent import the Promtail code, so in order to update them I will need to merge this PR first.

cstyan

I'm going to approve/merge this with a note that if we get reports of issues I'll revert this change. Hopefully we hear from the Alloy team after a short period of time whether they have confirmed usages that are working well or not.

ptodev requested a review from a team as a code owner February 6, 2024 19:06

pull-request-size bot added the size/L label Feb 6, 2024

cstyan reviewed Feb 23, 2024

ptodev force-pushed the ptodev/reset-promtail-metrics branch from 62e6140 to a4e8ae0 Compare February 26, 2024 10:05

cstyan changed the title ~~[promtail] Clean up metrics generated from logs after a config reload.~~ fix: promtail; clean up metrics generated from logs after a config reload. Feb 26, 2024

cstyan reviewed Feb 26, 2024

ptodev force-pushed the ptodev/reset-promtail-metrics branch from a4e8ae0 to 85b0900 Compare February 27, 2024 11:01

cstyan reviewed Apr 23, 2024

Unregister metrics generated from logs after a config reload.

d4960c7

ptodev force-pushed the ptodev/reset-promtail-metrics branch from 85b0900 to 1d1261d Compare April 23, 2024 15:31

github-actions bot added the type/docs label Apr 23, 2024

Update docs regarding metric reset.

a300762

ptodev force-pushed the ptodev/reset-promtail-metrics branch from 1d1261d to a300762 Compare April 23, 2024 15:34

ptodev requested a review from cstyan April 23, 2024 15:38

cstyan approved these changes Apr 24, 2024

cstyan merged commit 39a7181 into main Apr 24, 2024
58 checks passed

cstyan deleted the ptodev/reset-promtail-metrics branch April 24, 2024 21:45

loki-gh-app bot mentioned this pull request Apr 29, 2024

chore(k200): release 3.1.0 #12812

Open

loki-gh-app bot mentioned this pull request May 6, 2024

chore(k201): release 3.1.0 #12894

Open

MasslessParticle added the backport k190 label May 10, 2024

grafanabot pushed a commit that referenced this pull request May 10, 2024

fix: promtail; clean up metrics generated from logs after a config re…

d2cc31d

…load. (#11882) (cherry picked from commit 39a7181)

grafanabot mentioned this pull request May 10, 2024

chore: [k190] fix: promtail; clean up metrics generated from logs after a config reload. #12938

Merged

loki-gh-app bot mentioned this pull request May 13, 2024

chore(k202): release 3.1.0 #12945

Open

This was referenced May 13, 2024

Update Loki and sync some of the Promtail code grafana/alloy#836

Merged

Update Loki dependency grafana/agent#6905

Merged

loki-gh-app bot mentioned this pull request May 20, 2024

chore(k203): release 3.1.0 #12988

Open

This was referenced May 27, 2024

chore(k204): release 3.1.0 #13037

Open

chore(k205): release 3.1.0 #13102

Open

This was referenced Jun 10, 2024

chore(k206): release 3.1.0 #13184

Open

chore(k207): release 3.1.0 #13225

Open

loki-gh-app bot mentioned this pull request Jun 24, 2024

chore(k208): release 3.1.0 #13291

Open

ptodev mentioned this pull request Jun 27, 2024

Fix issue with config reload when using a log pipeline with a metric stage grafana/agent#6971

Draft

4 tasks
