Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Promtail 2.2.0 crashes when pushing to multiple Loki instances: "concurrent map read and map write" #3571

Closed
mxsiegle opened this issue Apr 1, 2021 · 1 comment · Fixed by #3573

Comments

@mxsiegle
Copy link

mxsiegle commented Apr 1, 2021

Hey,

we are running Promtail 2.2.0 as Pushserver (as upstream for other Promtail instances) and are trying to send the logs to multiple Loki instances.
As soon as I configure a 2nd Loki instance in the Promtail clients: section, it fails to start and crashes immediately with the following error:

level=info ts=2021-04-01T08:34:36.786819377Z caller=pushtarget.go:78 msg="starting push server" job=pushserver
level=info ts=2021-04-01T08:34:36.787261628Z caller=server.go:229 http=[::]:3500 grpc=[::]:3600 msg="server listening on addresses"
level=info ts=2021-04-01T08:34:36.788144852Z caller=kubernetes.go:264 component=discovery discovery=kubernetes msg="Using pod service account via in-cluster config"
level=info ts=2021-04-01T08:34:36.789287805Z caller=server.go:229 http=[::]:3101 grpc=[::]:9095 msg="server listening on addresses"
level=info ts=2021-04-01T08:34:36.789499173Z caller=main.go:112 msg="Starting Promtail" version="(version=2.2.0, branch=HEAD, revision=a27c79913)"
[...truncated...]
ts=2021-04-01T08:34:42.041918277Z caller=log.go:124 level=info msg="Re-opening truncated file /var/log/pods/default_nginx-ingress-local-controller-78c48fc9f6-jh664_27fe488e-01dc-496b-a037-510c6ac74acd/nginx-ingress-controller/0.log ..."
ts=2021-04-01T08:34:42.042127803Z caller=log.go:124 level=info msg="Successfully reopened truncated /var/log/pods/default_nginx-ingress-local-controller-78c48fc9f6-jh664_27fe488e-01dc-496b-a037-510c6ac74acd/nginx-ingress-controller/0.log"
fatal error: concurrent map read and map write

goroutine 296 [running]:
runtime.throw(0x2c03e70, 0x21)
	/usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0xc00008bbc0 sp=0xc00008bb90 pc=0x43ae72
runtime.mapaccess2_faststr(0x29d64e0, 0xc000a21590, 0x2bd2073, 0xd, 0x14c1c950e, 0x444a360)
	/usr/local/go/src/runtime/map_faststr.go:116 +0x4a5 fp=0xc00008bc30 sp=0xc00008bbc0 pc=0x4179e5
github.com/grafana/loki/pkg/promtail/client.(*client).getTenantID(...)
	/src/loki/pkg/promtail/client/client.go:375
github.com/grafana/loki/pkg/promtail/client.(*client).processEntry(0xc000b1d040, 0xc000a21590, 0xc0117f48916f2051, 0x14c1d844a, 0x444a360, 0xc001485680, 0x222, 0x0, 0x0, 0x0, ...)
	/src/loki/pkg/promtail/client/client.go:406 +0x8c fp=0xc00008bc80 sp=0xc00008bc30 pc=0xd29a0c
github.com/grafana/loki/pkg/promtail/client.(*client).run(0xc000b1d040)
	/src/loki/pkg/promtail/client/client.go:234 +0x285 fp=0xc00008bfd8 sp=0xc00008bc80 pc=0xd27545
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc00008bfe0 sp=0xc00008bfd8 pc=0x473ae1
created by github.com/grafana/loki/pkg/promtail/client.New
	/src/loki/pkg/promtail/client/client.go:198 +0x4b2

The full log is over 1000 lines long, so I attached it as a file:
Promtail_crash_multiple_targets.txt

The Promtail config looks like this:

clients:
  - url: https://<loki1>/loki/api/v1/push
    basic_auth:
      username: <user>
      password: <pass>
  - url: https://<loki2>/loki/api/v1/push
    basic_auth:
      username: <user>
      password: <pass>

I configured it with the syntax described in the docs: https://grafana.com/docs/loki/latest/clients/promtail/configuration/#supported-contents-and-default-values-of-configyaml

It crashes 99% of the time immediately after startup, but sometimes it randomly works and pushes the logs to both Loki instances. I assume it's a synchronicity issue with that map object (would match the error message too).
Let me know if you need any other infos.

Thanks.

cyriltovena added a commit to cyriltovena/loki that referenced this issue Apr 2, 2021
This was because each client would try to mutate the original set of labels.
Instead of cloning the map which can be expensive, I created a little helper that generate the string
from the labelset without a set of label name.

Fixes grafana#3571

I also added a test to see if this was reproducible and it was indeed:

```
~/go/src/github.com/grafana/loki master*
❯ go test -timeout 30s -tags dev,gofuzz -race -run ^TestMultiClient_Handle_Race$ github.com/grafana/loki/pkg/promtail/client -v -count=1 -timeout=0s
=== RUN   TestMultiClient_Handle_Race
==================
WARNING: DATA RACE
Read at 0x00c0000b77d0 by goroutine 22:
  runtime.mapaccess2_faststr()
      /usr/local/Cellar/go/1.16.2/libexec/src/runtime/map_faststr.go:107 +0x0
  github.com/grafana/loki/pkg/promtail/client.(*client).getTenantID()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:376 +0xcb
  github.com/grafana/loki/pkg/promtail/client.(*client).processEntry()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:407 +0x9a
  github.com/grafana/loki/pkg/promtail/client.(*client).run()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:235 +0x36a

Previous write at 0x00c0000b77d0 by goroutine 21:
  runtime.mapdelete_faststr()
      /usr/local/Cellar/go/1.16.2/libexec/src/runtime/map_faststr.go:297 +0x0
  github.com/grafana/loki/pkg/promtail/client.(*client).processEntry()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:408 +0x144
  github.com/grafana/loki/pkg/promtail/client.(*client).run()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:235 +0x36a

Goroutine 22 (running) created at:
  github.com/grafana/loki/pkg/promtail/client.New()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:198 +0x7dc
  github.com/grafana/loki/pkg/promtail/client.TestMultiClient_Handle_Race()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/multi_test.go:126 +0x2c7
  testing.tRunner()
      /usr/local/Cellar/go/1.16.2/libexec/src/testing/testing.go:1194 +0x202

Goroutine 21 (running) created at:
  github.com/grafana/loki/pkg/promtail/client.New()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:198 +0x7dc
  github.com/grafana/loki/pkg/promtail/client.TestMultiClient_Handle_Race()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/multi_test.go:124 +0x187
  testing.tRunner()
      /usr/local/Cellar/go/1.16.2/libexec/src/testing/testing.go:1194 +0x202
==================
    testing.go:1093: race detected during execution of test
--- FAIL: TestMultiClient_Handle_Race (0.00s)
=== CONT
    testing.go:1093: race detected during execution of test
FAIL
FAIL	github.com/grafana/loki/pkg/promtail/client	0.022s
FAIL
```

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
slim-bean added a commit that referenced this issue Apr 6, 2021
* Fixes a race when using specific tenant and multi-client.

This was because each client would try to mutate the original set of labels.
Instead of cloning the map which can be expensive, I created a little helper that generate the string
from the labelset without a set of label name.

Fixes #3571

I also added a test to see if this was reproducible and it was indeed:

```
~/go/src/github.com/grafana/loki master*
❯ go test -timeout 30s -tags dev,gofuzz -race -run ^TestMultiClient_Handle_Race$ github.com/grafana/loki/pkg/promtail/client -v -count=1 -timeout=0s
=== RUN   TestMultiClient_Handle_Race
==================
WARNING: DATA RACE
Read at 0x00c0000b77d0 by goroutine 22:
  runtime.mapaccess2_faststr()
      /usr/local/Cellar/go/1.16.2/libexec/src/runtime/map_faststr.go:107 +0x0
  github.com/grafana/loki/pkg/promtail/client.(*client).getTenantID()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:376 +0xcb
  github.com/grafana/loki/pkg/promtail/client.(*client).processEntry()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:407 +0x9a
  github.com/grafana/loki/pkg/promtail/client.(*client).run()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:235 +0x36a

Previous write at 0x00c0000b77d0 by goroutine 21:
  runtime.mapdelete_faststr()
      /usr/local/Cellar/go/1.16.2/libexec/src/runtime/map_faststr.go:297 +0x0
  github.com/grafana/loki/pkg/promtail/client.(*client).processEntry()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:408 +0x144
  github.com/grafana/loki/pkg/promtail/client.(*client).run()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:235 +0x36a

Goroutine 22 (running) created at:
  github.com/grafana/loki/pkg/promtail/client.New()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:198 +0x7dc
  github.com/grafana/loki/pkg/promtail/client.TestMultiClient_Handle_Race()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/multi_test.go:126 +0x2c7
  testing.tRunner()
      /usr/local/Cellar/go/1.16.2/libexec/src/testing/testing.go:1194 +0x202

Goroutine 21 (running) created at:
  github.com/grafana/loki/pkg/promtail/client.New()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:198 +0x7dc
  github.com/grafana/loki/pkg/promtail/client.TestMultiClient_Handle_Race()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/multi_test.go:124 +0x187
  testing.tRunner()
      /usr/local/Cellar/go/1.16.2/libexec/src/testing/testing.go:1194 +0x202
==================
    testing.go:1093: race detected during execution of test
--- FAIL: TestMultiClient_Handle_Race (0.00s)
=== CONT
    testing.go:1093: race detected during execution of test
FAIL
FAIL	github.com/grafana/loki/pkg/promtail/client	0.022s
FAIL
```

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

* don't overload variable name

Co-authored-by: Edward Welch <edward.welch@grafana.com>
slim-bean pushed a commit that referenced this issue Apr 6, 2021
* Fixes a race when using specific tenant and multi-client.

This was because each client would try to mutate the original set of labels.
Instead of cloning the map which can be expensive, I created a little helper that generate the string
from the labelset without a set of label name.

Fixes #3571

I also added a test to see if this was reproducible and it was indeed:

```
~/go/src/github.com/grafana/loki master*
❯ go test -timeout 30s -tags dev,gofuzz -race -run ^TestMultiClient_Handle_Race$ github.com/grafana/loki/pkg/promtail/client -v -count=1 -timeout=0s
=== RUN   TestMultiClient_Handle_Race
==================
WARNING: DATA RACE
Read at 0x00c0000b77d0 by goroutine 22:
  runtime.mapaccess2_faststr()
      /usr/local/Cellar/go/1.16.2/libexec/src/runtime/map_faststr.go:107 +0x0
  github.com/grafana/loki/pkg/promtail/client.(*client).getTenantID()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:376 +0xcb
  github.com/grafana/loki/pkg/promtail/client.(*client).processEntry()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:407 +0x9a
  github.com/grafana/loki/pkg/promtail/client.(*client).run()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:235 +0x36a

Previous write at 0x00c0000b77d0 by goroutine 21:
  runtime.mapdelete_faststr()
      /usr/local/Cellar/go/1.16.2/libexec/src/runtime/map_faststr.go:297 +0x0
  github.com/grafana/loki/pkg/promtail/client.(*client).processEntry()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:408 +0x144
  github.com/grafana/loki/pkg/promtail/client.(*client).run()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:235 +0x36a

Goroutine 22 (running) created at:
  github.com/grafana/loki/pkg/promtail/client.New()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:198 +0x7dc
  github.com/grafana/loki/pkg/promtail/client.TestMultiClient_Handle_Race()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/multi_test.go:126 +0x2c7
  testing.tRunner()
      /usr/local/Cellar/go/1.16.2/libexec/src/testing/testing.go:1194 +0x202

Goroutine 21 (running) created at:
  github.com/grafana/loki/pkg/promtail/client.New()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/client.go:198 +0x7dc
  github.com/grafana/loki/pkg/promtail/client.TestMultiClient_Handle_Race()
      /Users/ctovena/go/src/github.com/grafana/loki/pkg/promtail/client/multi_test.go:124 +0x187
  testing.tRunner()
      /usr/local/Cellar/go/1.16.2/libexec/src/testing/testing.go:1194 +0x202
==================
    testing.go:1093: race detected during execution of test
--- FAIL: TestMultiClient_Handle_Race (0.00s)
=== CONT
    testing.go:1093: race detected during execution of test
FAIL
FAIL	github.com/grafana/loki/pkg/promtail/client	0.022s
FAIL
```

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

* don't overload variable name

Co-authored-by: Edward Welch <edward.welch@grafana.com>
(cherry picked from commit 0107a11)
@mxsiegle
Copy link
Author

mxsiegle commented Apr 6, 2021

Thank you very much for the quick reaction.
I can confirm that v2.2.1 fixes the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant