Fix panic when ruler.external_url is set to empty string #2915
The panic is:

```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x7126c0]

goroutine 108 [running]:
net/url.(*URL).String(0x0)
	/usr/local/go/src/net/url/url.go:800 +0x40
github.com/grafana/mimir/pkg/ruler.DefaultTenantManagerFactory.func1({0x257e040, 0xc000139b80}, {0xc000945973, 0x9}, 0x0?, {0x25641e0, 0xc000607720}, {0x2577428?, 0xc0004f0050})
	/__w/mimir/mimir/pkg/ruler/compat.go:273 +0x345
github.com/grafana/mimir/pkg/ruler.(*DefaultMultiTenantManager).newManager(0xc00083f900, {0x257e040, 0xc000139b80}, {0xc000945973, 0x9})
	/__w/mimir/mimir/pkg/ruler/manager.go:219 +0x163
```

This happens because the parsing in dskit places a nil in the URL when the value in YAML is an empty string. By contrast, when the value is set as a CLI flag, it invokes `url.Parse("")`, which returns a non-nil `*url.URL`.

In the ruler we need a non-nil URL, otherwise Prometheus code panics.

I didn't change this in dskit because the behaviour there has a unit test to ensure that marshalling to YAML and then unmarshalling is effectively a no-op. This is the [code](https://github.com/grafana/mimir/blob/ecefbb673367c7047b0f9a04c8f614d229dfd656/vendor/github.com/grafana/dskit/flagext/url.go#L35-L39) in dskit, and this is the [test](https://github.com/grafana/dskit/blob/bbabef49ebf558538749d5b339bf81d96edfe512/flagext/url_test.go#L55-L73).

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
pkg/ruler/compat.go
return rules.NewManager(&rules.ManagerOptions{
    Appendable: NewPusherAppendable(p, userID, overrides, totalWrites, failedWrites),
    Queryable: embeddedQueryable,
    QueryFunc: wrappedQueryFunc,
    Context: user.InjectOrgID(ctx, userID),
    GroupEvaluationContextFunc: FederatedGroupContextFunc,
    ExternalURL: cfg.ExternalURL.URL,
    NotifyFunc: SendAlerts(notifier, cfg.ExternalURL.URL.String()),
    ExternalURL: externalURL,
Wouldn't it be safer not to rely on `ExternalURL` being non-nil?
the code that uses this URL is just a single function:
mimir/vendor/github.com/prometheus/prometheus/template/template.go
Lines 131 to 140 in 8188a22
func NewTemplateExpander(
    ctx context.Context,
    text string,
    name string,
    data interface{},
    timestamp model.Time,
    queryFunc QueryFunc,
    externalURL *url.URL,
    options []string,
) *Expander {
I will submit a PR against prometheus to propose this
interesting - the template doesn't cause a panic - it causes an error
mimir/vendor/github.com/prometheus/prometheus/template/template.go
Lines 374 to 383 in 8188a22
if r := recover(); r != nil {
    var ok bool
    resultErr, ok = r.(error)
    if !ok {
        resultErr = fmt.Errorf("panic expanding template %v: %v", te.name, r)
    }
}
if resultErr != nil {
    templateTextExpansionFailures.Inc()
}
which will be included in the rendered template that the user sees
result = fmt.Sprintf("<error expanding template: %s>", err)
So I think Prometheus handles nil well enough. In this case it should be enough for us to prevent only the single panic shown in the description of this PR. I pushed the change in 33755e4. What do you think?
> What do you think?
Looks great, thanks for addressing my comment
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
LGTM if @replay is good
The panic occurs on 2.3.0-rc0.

This happens because the parsing in dskit places a nil in the URL when the value in YAML is an empty string. By contrast, when the value is set as a CLI flag, it invokes `url.Parse("")`, which returns a non-nil `*url.URL`.

In the ruler we need a non-nil URL, otherwise Prometheus code panics.

I didn't change this in dskit because the behaviour there has a unit test to ensure that marshalling to YAML and then unmarshalling is effectively a no-op. This is the code in dskit, and this is the test.

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]