Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in ExecuteTextString #3277

Closed
grobinson-grafana opened this issue Mar 3, 2023 · 5 comments · Fixed by #3278
Closed

Race condition in ExecuteTextString #3277

grobinson-grafana opened this issue Mar 3, 2023 · 5 comments · Fixed by #3278

Comments

@grobinson-grafana
Copy link
Contributor

What did you do?

I have been looking into a data corruption issue in Slack notifications in Grafana (grafana/grafana#63584) and have found a race condition in ExecuteTextString in Alertmanager that I suspect to be the cause. I have been able to reproduce this bug in main, commit 41eb121.

I have the following Alertmanager configuration:

receivers:
- name: test
  slack_configs:
  - api_url: redacted
    title: '{{ template "slack.title" . }}'
route:
  receiver: test
  group_by: ['...']
  group_wait: 5s
  group_interval: 10s
  repeat_interval: 15s
templates:
  - templates/*.tmpl

and the following template in test.tmpl:

{{ define "slack.title" }}
  {{ reReplaceAll ":" "-" .ExternalURL | title }}
{{ end }}

I created a single rule that has 4 alert instances:

Screenshot 2023-03-03 at 15 34 44

What did you expect to see?

I expected to see the following in all messages:

Http-//Georges-Air.fritz.box-9093
Http-//Georges-Air.fritz.box-9093
Http-//Georges-Air.fritz.box-9093
Http-//Georges-Air.fritz.box-9093

What did you see instead? Under which circumstances?

However most messages are corrupted. For example:

HHtt//EegeeS-Air..tz.box-9093
Http-//Georges-Air.fritz.box-
Http-//Georges-Air.fritz.box-9093
Ht

I think the issue is related to how Alertmanager uses text/template across goroutine as the bug is no longer reproducible with the following changes:

diff --git a/template/template.go b/template/template.go
index a42cf03d..9e56bf1f 100644
--- a/template/template.go
+++ b/template/template.go
@@ -23,6 +23,7 @@ import (
        "regexp"
        "sort"
        "strings"
+       "sync"
        tmpltext "text/template"
        "time"

@@ -38,6 +39,7 @@ import (
 type Template struct {
        text *tmpltext.Template
        html *tmplhtml.Template
+       mu   sync.Mutex

        ExternalURL *url.URL
 }
@@ -131,6 +133,8 @@ func (t *Template) FromGlob(path string) error {

 // ExecuteTextString needs a meaningful doc comment (TODO(fabxc)).
 func (t *Template) ExecuteTextString(text string, data interface{}) (string, error) {
+       t.mu.Lock()
+       defer t.mu.Unlock()
        if text == "" {
                return "", nil
        }
@grobinson-grafana
Copy link
Contributor Author

I've just spent some more time looking into the cause of this issue, and I think the issue is that the title function in DefaultFuncs is not thread-safe:

"title":   cases.Title(language.AmericanEnglish).String,

I have been able to replicate this in a simple Go program:

package main

import (
	"bytes"
	"log"
	"text/template"
	"sync"

	"golang.org/x/text/cases"
	"golang.org/x/text/language"
)

func main() {
	data := map[string]string{"Message": "Waltz, bad nymph, for quick jigs vex"}
	tmplText := `{{ lower .Message | title }}`

	tmpl, err := template.New("").Parse("")
	if err != nil {
		log.Fatal(err)
	}
	funcs := map[string]any{
		"lower": cases.Lower(language.AmericanEnglish).String,
		"title": cases.Title(language.AmericanEnglish).String,
	}
	tmpl = tmpl.Funcs(funcs)

	wg := sync.WaitGroup{}
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			newTmpl, err := tmpl.Clone()
			if err != nil {
				log.Fatal(err)
			}
			newTmpl, err = newTmpl.New("").Parse(tmplText)
			if err != nil {
				log.Fatal(err)
			}
			buf := bytes.Buffer{}
			newTmpl.Execute(&buf, data)
			log.Println(buf.String())
		}()
	}
	wg.Wait()
}

Here is an example of the output:

2023/03/03 16:40:51 Waltz, BaD
2023/03/03 16:40:51 Waltz, Bad Nymph, For Quick Jigs Vex
2023/03/03 16:40:51 Waltz, Bad Nymph, For Quick Jigs Vex
2023/03/03 16:40:51 Waltz, Bad Nymph, For Quick Jigs Vex
2023/03/03 16:40:51 Waltz, Bad Nymph, For Quick Jigs Vex
2023/03/03 16:40:51 Waltz, Bad Nymph, For Quick Jigs Vex
2023/03/03 16:40:51  Nym
2023/03/03 16:40:51 Wph, For icc Jigs Vex
2023/03/03 16:40:51 Waltz, Bad Nymph, For Quick Jigs Vex
2023/03/03 16:40:51 Waltz, Bad Nymph, For Quick Jigs Vex

@grobinson-grafana
Copy link
Contributor Author

If we look at the docs for golang.org/x/text/cases:

https://pkg.go.dev/golang.org/x/text/cases#Caser

A Caser transforms given input to a certain case. It implements transform.Transformer.
A Caser may be stateful and should therefore not be shared between goroutines.

I think we either need to change golang.org/x/text/cases to strings or use a mutex.

@gotjosh
Copy link
Member

gotjosh commented Mar 3, 2023

It seems like strings.Title is deprecated - I think we can instantiate a new copy of the case whenever we call the function or implement our own function that uses a mutex just for this use of this function.

I don't think it's worth locking the whole of ExecuteTextString for just a couple of functions.

@grobinson-grafana
Copy link
Contributor Author

I agree! It looks like this issue can be fixed with a new caser for each function call:

funcs := map[string]any{
	"lower": func(text string) string { return cases.Lower(language.AmericanEnglish).String(text) },
	"title": func(text string) string { return cases.Title(language.AmericanEnglish).String(text) },
}
2023/03/03 21:33:15 Waltz, Bad Nymph, For Quick Jigs Vex
2023/03/03 21:33:15 Waltz, Bad Nymph, For Quick Jigs Vex
2023/03/03 21:33:15 Waltz, Bad Nymph, For Quick Jigs Vex
2023/03/03 21:33:15 Waltz, Bad Nymph, For Quick Jigs Vex
2023/03/03 21:33:15 Waltz, Bad Nymph, For Quick Jigs Vex
2023/03/03 21:33:15 Waltz, Bad Nymph, For Quick Jigs Vex
2023/03/03 21:33:15 Waltz, Bad Nymph, For Quick Jigs Vex
2023/03/03 21:33:15 Waltz, Bad Nymph, For Quick Jigs Vex
2023/03/03 21:33:15 Waltz, Bad Nymph, For Quick Jigs Vex
2023/03/03 21:33:15 Waltz, Bad Nymph, For Quick Jigs Vex

@joeblubaugh
Copy link
Contributor

There was a similar bug not long ago - it manifested as panics: #3064

If there are any other cases (heh) where a Caser is re-used they should also be updated to discard and create a new caser instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants