scrape: Add global jitter for HA server #5181
Conversation
Force-pushed from 32b4fbd to 971b481
It'd be good to also include the hostname, as that'll also catch test machines using the same config.
Could you elaborate on that? What machines?
Force-pushed from 971b481 to 7bc9128
Any other machine on which someone copies over a production config verbatim.
What about Docker setups? For those who don't specify a hostname, the offset will move around on re-runs. 🤔
Force-pushed from 7bc9128 to 7e6ed45
That's an acceptable tradeoff.
Updated. Could you restart the CI test build?
@xjewer Nice! What is this graph you provided showing?
Force-pushed from d05320f to a677af8
This one is for the HTTP probes.
Force-pushed from a677af8 to 44a9e87
Covers the issue in prometheus#4926 (comment) where the HA setup becomes a problem for targets that cannot be scraped simultaneously. The new per-server jitter relies on the hostname and external labels, which need to be unique. As before, the scrape offset is calculated relative to absolute time, so even a restart/reload doesn't change the scrape time per scrape target + Prometheus instance. Signed-off-by: Aleksei Semiglazov <xjewer@gmail.com>
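For concreteness, here is a minimal sketch of that mechanism in Go. The helper names (computeJitterSeed, offset) and the choice of FNV hashing are illustrative assumptions based on the description above, not necessarily the PR's exact code:

package main

import (
	"fmt"
	"hash/fnv"
	"os"
	"time"
)

// computeJitterSeed hashes the hostname together with the serialized
// external labels, so each Prometheus server in an HA pair gets a
// different but stable seed. (Hypothetical helper for illustration.)
func computeJitterSeed(externalLabels string) uint64 {
	h := fnv.New64a()
	hostname, _ := os.Hostname()
	h.Write([]byte(hostname))
	h.Write([]byte{0xff}) // separator, so "ab"+"c" hashes differently from "a"+"bc"
	h.Write([]byte(externalLabels))
	return h.Sum64()
}

// offset spreads scrapes over the interval. Because it is anchored to
// absolute wall-clock time, a restart/reload yields the same offset for
// the same target and server.
func offset(interval time.Duration, targetHash, jitterSeed uint64) time.Duration {
	now := time.Now().UnixNano()
	base := int64(interval) - now%int64(interval)
	off := (targetHash ^ jitterSeed) % uint64(interval)
	next := base + int64(off)
	if next > int64(interval) {
		next -= int64(interval)
	}
	return time.Duration(next)
}

func main() {
	seed := computeJitterSeed(`monitor="prod",replica="a"`)
	fmt.Println(offset(15*time.Second, 12345, seed))
}

Anchoring base to now % interval is what makes the schedule a property of wall-clock time rather than of process start time.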
Force-pushed from 44a9e87 to 06db8a2
Fixed merge conflicts and addressed the remarks; ready for review.
Any chance of this being merged into the next release? cc @brian-brazil
2.8 is in progress, so it'll be 2.9 at this stage.
"reflect" | ||
"sync" | ||
"time" | ||
|
||
"github.com/go-kit/kit/log" | ||
"github.com/go-kit/kit/log/level" | ||
|
||
"github.com/prometheus/common/model" |
We're trying to get rid of usage of this library.
Any suggestions?
Should I use github.com/prometheus/prometheus/pkg/labels instead?
I can add this to GlobalConfig:
// GetExternalLabels wraps external labels as a labels.Labels.
func (c *GlobalConfig) GetExternalLabels() labels.Labels {
labelSet := make(map[string]string, len(c.ExternalLabels))
for n, v := range c.ExternalLabels {
labelSet[string(n)] = string(v)
}
return labels.FromMap(labelSet)
}
But as this isn't related to the jitter, let's leave it as is and do the clean-up in follow-up PRs; I can help you out with it. I didn't find any open issue about removing the github.com/prometheus/common/model library.
model.LabelSet is already a map[string]string, so you can use FromMap directly. You can also re-use its Hash method.
The problem is, model.LabelSet isn't a map[string]string, it's a map[LabelName]LabelValue. Even though LabelName and LabelValue are strings, the underlying map types differ, so Go can't do the type conversion.
You can cast it
What am I doing wrong?
labelSet := map[string]string(cfg.GlobalConfig.ExternalLabels)
cannot convert cfg.GlobalConfig.ExternalLabels (type model.LabelSet) to type map[string]string
https://play.golang.org/p/EwQjDtB00dJ
prog.go:13:23: cannot convert ls2 (type LabelSet2) to type map[string]string
whereas LabelSet works, as mentioned before.
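A self-contained reproduction of the distinction, with type names mirroring the playground example:

package main

import "fmt"

type (
	LabelName  string
	LabelValue string

	LabelSet  map[string]string        // underlying type: map[string]string
	LabelSet2 map[LabelName]LabelValue // underlying type is a different map type
)

func main() {
	ls := LabelSet{"job": "node"}
	fmt.Println(map[string]string(ls)) // OK: identical underlying types

	ls2 := LabelSet2{"job": "node"}
	// map[string]string(ls2) would not compile:
	// cannot convert ls2 (type LabelSet2) to type map[string]string
	m := make(map[string]string, len(ls2))
	for k, v := range ls2 {
		m[string(k)] = string(v) // element-wise conversion is required
	}
	fmt.Println(m)
}

The named key and value types change the map's underlying type, which is why the direct conversion is rejected and an element-wise copy (as in GetExternalLabels above) is needed.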
Right you are.
Use the FQDN if possible, otherwise fall back to the hostname. This adds an extra random seed to the server hash calculation, so machines with the same hostname but in different DCs can be distinguished. Signed-off-by: Aleksei Semiglazov <xjewer@gmail.com>
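As a sketch of what the FQDN-with-fallback lookup could look like in Go; the reverse-DNS approach and the fqdn helper name are assumptions for illustration, not necessarily the PR's implementation:

package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// fqdn tries to resolve the machine's fully qualified domain name and
// falls back to the bare hostname whenever resolution fails.
func fqdn() (string, error) {
	hostname, err := os.Hostname()
	if err != nil {
		return "", err
	}
	ips, err := net.LookupIP(hostname)
	if err != nil {
		return hostname, nil // fall back: hostname does not resolve
	}
	for _, ip := range ips {
		names, err := net.LookupAddr(ip.String())
		if err != nil || len(names) == 0 {
			continue
		}
		return strings.TrimSuffix(names[0], "."), nil
	}
	return hostname, nil
}

func main() {
	name, _ := fqdn()
	fmt.Println(name)
}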
Force-pushed from 06db8a2 to 8e553d6
Thanks!
Covers the issue in #4926 (comment)
where the HA setup becomes a problem for targets that cannot be scraped simultaneously.
The new per-server jitter relies on the hostname and external labels, which need to be unique.
As before, the scrape offset is calculated relative to absolute time, so even a
restart/reload doesn't change the scrape time per scrape target + Prometheus instance.
The jitter interval was chosen within the boundaries of 0-59 seconds, which should be enough
to bring entropy to the scrapes across an HA setup.
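To make the 0-59 second boundary concrete, a toy illustration; the hash input and the modulo-60 mapping are assumptions based on the description above, not the PR's exact code:

package main

import (
	"fmt"
	"hash/fnv"
	"time"
)

func main() {
	// Clamp the per-server jitter into a 0-59 second window by taking the
	// server hash modulo 60. Each server lands in a fixed slot, so HA
	// replicas spread apart while any one server's slot survives restarts.
	h := fnv.New64a()
	h.Write([]byte(`host-a;monitor="prod",replica="a"`))
	jitter := time.Duration(h.Sum64()%60) * time.Second
	fmt.Println("per-server jitter:", jitter)
}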