Fix panic when ruler.external_url is set to empty string #2915

Merged
dimitarvdimitrov merged 3 commits into main from dimitar/fix-panic-on-empty-external-url
Sep 8, 2022

@dimitarvdimitrov commented Sep 7, 2022

  • ? Tests updated
  • NA Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Panic is on 2.3.0-rc0:
```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x7126c0]
goroutine 108 [running]:
net/url.(*URL).String(0x0)
	/usr/local/go/src/net/url/url.go:800 +0x40
github.com/grafana/mimir/pkg/ruler.DefaultTenantManagerFactory.func1({0x257e040, 0xc000139b80}, {0xc000945973, 0x9}, 0x0?, {0x25641e0, 0xc000607720}, {0x2577428?, 0xc0004f0050})
	/__w/mimir/mimir/pkg/ruler/compat.go:273 +0x345
github.com/grafana/mimir/pkg/ruler.(*DefaultMultiTenantManager).newManager(0xc00083f900, {0x257e040, 0xc000139b80}, {0xc000945973, 0x9})
	/__w/mimir/mimir/pkg/ruler/manager.go:219 +0x163
```

This happens because the parsing in dskit places a nil in the URL when
the value in YAML is an empty string. By contrast, when the value is set
as a CLI flag, it invokes `url.Parse("")`, which returns a non-nil
`*url.URL`.

In the ruler we need a non-nil URL, otherwise Prometheus code panics.

I didn't change this in dskit because that behaviour there has a unit
test to ensure that marshalling to YAML and then unmarshalling is
effectively a no-op. This is the [code](https://github.com/grafana/mimir/blob/ecefbb673367c7047b0f9a04c8f614d229dfd656/vendor/github.com/grafana/dskit/flagext/url.go#L35-L39) in dskit,
and this is the [test](https://github.com/grafana/dskit/blob/bbabef49ebf558538749d5b339bf81d96edfe512/flagext/url_test.go#L55-L73).

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
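
The difference between the two configuration paths described above can be reproduced with the standard library alone. The following is a minimal sketch, not the dskit or Mimir code: `parseFlag` and `stringPanics` are hypothetical helpers introduced only for illustration, and the nil pointer stands in for what the description says the YAML path produces.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseFlag is a hypothetical stand-in for the CLI-flag path described
// above: the raw value always goes through url.Parse.
func parseFlag(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		return nil
	}
	return u
}

// stringPanics reports whether calling String() on u panics.
func stringPanics(u *url.URL) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	_ = u.String()
	return false
}

func main() {
	// Flag path: an empty string still yields a non-nil, empty URL.
	fromFlag := parseFlag("")
	fmt.Println("flag value is nil:", fromFlag == nil)
	fmt.Println("flag value panics on String():", stringPanics(fromFlag))

	// YAML path as described in this PR: the pointer is left nil.
	var fromYAML *url.URL
	fmt.Println("YAML value panics on String():", stringPanics(fromYAML))
}
```

The nil case is the one that matches the `net/url.(*URL).String(0x0)` frame in the stack trace above.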
@dimitarvdimitrov added the bug and component/ruler labels Sep 7, 2022
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
 		return rules.NewManager(&rules.ManagerOptions{
 			Appendable:                 NewPusherAppendable(p, userID, overrides, totalWrites, failedWrites),
 			Queryable:                  embeddedQueryable,
 			QueryFunc:                  wrappedQueryFunc,
 			Context:                    user.InjectOrgID(ctx, userID),
 			GroupEvaluationContextFunc: FederatedGroupContextFunc,
-			ExternalURL:                cfg.ExternalURL.URL,
-			NotifyFunc:                 SendAlerts(notifier, cfg.ExternalURL.URL.String()),
+			ExternalURL:                externalURL,
Contributor

Wouldn't it be safer not to rely on ExternalURL being non-nil?

Contributor Author

The code that uses this URL is just a single function:

func NewTemplateExpander(
	ctx context.Context,
	text string,
	name string,
	data interface{},
	timestamp model.Time,
	queryFunc QueryFunc,
	externalURL *url.URL,
	options []string,
) *Expander {

I will submit a PR against Prometheus to propose this.

Contributor Author

interesting - the template doesn't cause a panic - it causes an error

if r := recover(); r != nil {
	var ok bool
	resultErr, ok = r.(error)
	if !ok {
		resultErr = fmt.Errorf("panic expanding template %v: %v", te.name, r)
	}
}
if resultErr != nil {
	templateTextExpansionFailures.Inc()
}

which will be included in the rendered template that the user sees

result = fmt.Sprintf("<error expanding template: %s>", err)

So I think Prometheus handles nil well enough. In this case it should be enough for us to just prevent the single panic shown in the description of this PR. I pushed the change in 33755e4. What do you think?
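
One way to prevent that single panic is a nil check before the `String()` call. The contents of 33755e4 are not shown in this thread, so the following is only a sketch of the idea: `externalURLString` is a hypothetical helper, not necessarily what the commit does, and the example URL is made up.

```go
package main

import (
	"fmt"
	"net/url"
)

// externalURLString is a hypothetical helper: it returns the empty
// string for a nil URL instead of dereferencing it.
func externalURLString(u *url.URL) string {
	if u == nil {
		return ""
	}
	return u.String()
}

func main() {
	// Unset external URL, as produced by an empty YAML value.
	var unset *url.URL
	fmt.Printf("%q\n", externalURLString(unset))

	// A configured external URL (illustrative value only).
	set, err := url.Parse("https://mimir.example.com/ruler")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", externalURLString(set))
}
```

The nil pointer itself can still be passed on as `ExternalURL`, since the thread above concludes that Prometheus turns the resulting template failure into an error.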

Contributor

> What do you think?

Looks great, thanks for addressing my comment

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@56quarters left a comment

LGTM if @replay is good

@dimitarvdimitrov merged commit 427d556 into main Sep 8, 2022
@dimitarvdimitrov deleted the dimitar/fix-panic-on-empty-external-url branch September 8, 2022 16:49