
scrape: Add global jitter for HA server #5181

Merged
merged 2 commits into prometheus:master from xjewer:feature_server_jitter on Mar 12, 2019

Conversation

@xjewer (Contributor) commented Feb 4, 2019

Covers the issue in #4926 (comment),
where the HA setup becomes a problem for targets that cannot be scraped simultaneously.
The new per-server jitter relies on the hostname and external labels, which need to be unique.

As before, the scrape offset is calculated with regard to absolute time, so even a
restart/reload doesn't change the scrape time per scrape target + Prometheus instance.

The jitter interval was chosen within the boundaries of 0-59 seconds, which should be enough
to bring entropy to the scrapes across an HA setup.
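
For illustration, a minimal sketch of the idea (not the code merged in this PR): hash the hostname together with the external labels into a per-server seed, then fold that seed into the deterministic, absolute-time-based offset each target already gets. Function and variable names below are illustrative only.

// serverjitter_sketch.go: an illustrative sketch of per-server scrape jitter,
// not the exact implementation merged in this PR.
package main

import (
	"fmt"
	"hash/fnv"
	"os"
	"sort"
	"time"
)

// serverSeed hashes the hostname plus the external labels into a uint64.
// Hashing labels in sorted order keeps the seed stable across restarts.
func serverSeed(externalLabels map[string]string) uint64 {
	h := fnv.New64a()
	hostname, _ := os.Hostname()
	h.Write([]byte(hostname))
	names := make([]string, 0, len(externalLabels))
	for name := range externalLabels {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		h.Write([]byte(name))
		h.Write([]byte(externalLabels[name]))
	}
	return h.Sum64()
}

// offset returns how long to wait until the next scrape of a target.
// It depends only on absolute time, the target hash and the server seed,
// so a restart or reload does not move the scrape time.
func offset(interval time.Duration, targetHash, seed uint64) time.Duration {
	now := time.Now().UnixNano()
	base := int64(interval) - now%int64(interval)
	jitter := int64((targetHash + seed) % uint64(interval))
	next := base + jitter
	if next > int64(interval) {
		next -= int64(interval)
	}
	return time.Duration(next)
}

func main() {
	seed := serverSeed(map[string]string{"cluster": "prod", "replica": "a"})
	fmt.Println(offset(15*time.Second, 12345, seed))
}

Two HA replicas with different external labels (or hostnames) get different seeds, so the same target lands at different points within the scrape interval on each replica, while each replica's own schedule stays stable across reloads.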

@xjewer xjewer force-pushed the feature_server_jitter branch 2 times, most recently from 32b4fbd to 971b481 Compare February 4, 2019 12:07
scrape/target.go (outdated review thread, resolved)
@brian-brazil (Contributor) left a comment

It'd be good to also include the hostname, as that'll also catch test machines using the same config.

@xjewer (Contributor Author) commented Feb 4, 2019

It'd be good to also include the hostname, as that'll also catch test machines using the same config.

Could you elaborate on that? What machines?

@brian-brazil (Contributor):

Could you elaborate on that? What machines?

Any other machine on which someone copies over a production config verbatim.

@xjewer (Contributor Author) commented Feb 4, 2019

someone copies over a production config

What about a Docker setup? For those who don't specify a hostname, it will move the offset around on re-runs. 🤔

@brian-brazil (Contributor):

That's an acceptable tradeoff.

@xjewer (Contributor Author) commented Feb 4, 2019

Updated with the hostname, but something went wrong with an import:

go: gopkg.in/vmihailenco/msgpack.v2@v2.9.1: unrecognized import path "gopkg.in/vmihailenco/msgpack.v2" (parse https://gopkg.in/vmihailenco/msgpack.v2?go-get=1: no go-import meta tags ())

could you restart the CI test build?

@xjewer (Contributor Author) commented Feb 5, 2019

I applied this patch to our 2.6.1 HA setup and it seems to work as expected:

[graph screenshot]

scrape/target.go (outdated review thread, resolved)
scrape/manager.go (outdated review thread, resolved)
scrape/manager.go (outdated review thread, resolved)
@bwplotka (Member) commented Feb 8, 2019

@xjewer Nice! What does this graph show?

@xjewer xjewer force-pushed the feature_server_jitter branch 2 times, most recently from d05320f to a677af8 Compare February 11, 2019 22:04
@xjewer (Contributor Author) commented Feb 11, 2019

@xjewer Nice! What does this graph show?

This one is for HTTP probes.

Covers the issue in prometheus#4926 (comment),
where the HA setup becomes a problem for targets that cannot be scraped simultaneously.
The new per-server jitter relies on the hostname and external labels, which need to be unique.

As before, the scrape offset is calculated with regard to absolute time, so even a
restart/reload doesn't change the scrape time per scrape target + Prometheus instance.

Signed-off-by: Aleksei Semiglazov <xjewer@gmail.com>
@xjewer (Contributor Author) commented Mar 2, 2019

Fixed merge conflicts and addressed the remarks; ready for review.

@xjewer (Contributor Author) commented Mar 5, 2019

Any chance of this being merged into the next release? cc @brian-brazil

@brian-brazil (Contributor) left a comment

2.8 is in progress, so it'll be 2.9 at this stage.

scrape/manager.go (outdated review thread, resolved)
"reflect"
"sync"
"time"

"github.com/go-kit/kit/log"
"github.com/go-kit/kit/log/level"

"github.com/prometheus/common/model"
@brian-brazil (Contributor):

We're trying to get rid of usage of this library.

@xjewer (Contributor Author):

Any suggestions?

@xjewer (Contributor Author):

should I use github.com/prometheus/prometheus/pkg/labels instead?

I can add this to GlobalConfig:

// GetExternalLabels wraps external labels as a labels.Labels.
func (c *GlobalConfig) GetExternalLabels() labels.Labels {
	labelSet := make(map[string]string, len(c.ExternalLabels))
	for n, v := range c.ExternalLabels {
		labelSet[string(n)] = string(v)
	}

	return labels.FromMap(labelSet)
}

But as this isn't related to the jitter, let's leave it as is and do the clean-up in follow-up PRs; I can help you out with it.

I didn't find any issue about removing the github.com/prometheus/common/model library.


@brian-brazil (Contributor):

model.Labelset is already a map[string]string, so you can use FromMap directly. You can also re-use its Hash method.

@xjewer (Contributor Author):

The problem is that model.LabelSet isn't a map[string]string; it's a map[LabelName]LabelValue. Even though LabelName and LabelValue are strings, the types aren't identical, so Go can't do the type conversion.

@brian-brazil (Contributor):

You can cast it

@xjewer (Contributor Author):

what am I doing wrong?
labelSet := map[string]string(cfg.GlobalConfig.ExternalLabels)

cannot convert cfg.GlobalConfig.ExternalLabels (type model.LabelSet) to type map[string]string

https://play.golang.org/p/EwQjDtB00dJ

prog.go:13:23: cannot convert ls2 (type LabelSet2) to type map[string]string

whereas LabelSet works as mentioned before.

@brian-brazil (Contributor):

Right you are.
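
For reference, a minimal playground-style sketch of the distinction discussed above (the type names mirror the playground example and are illustrative only, not Prometheus code):

package main

type LabelName string
type LabelValue string

// LabelSet has underlying type map[string]string, so a direct conversion works.
type LabelSet map[string]string

// LabelSet2 has defined key/element types, so Go rejects a direct conversion.
type LabelSet2 map[LabelName]LabelValue

func main() {
	ls := LabelSet{"job": "node"}
	_ = map[string]string(ls) // OK: identical underlying key/element types

	ls2 := LabelSet2{"job": "node"}
	// _ = map[string]string(ls2) // compile error: cannot convert ls2 (type LabelSet2) to type map[string]string

	// An explicit per-element copy is needed instead.
	m := make(map[string]string, len(ls2))
	for name, value := range ls2 {
		m[string(name)] = string(value)
	}
	_ = m
}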

scrape/manager.go (outdated review thread, resolved)
scrape/target.go (outdated review thread, resolved)
scrape/target.go (outdated review thread, resolved)
scrape/target.go (outdated review thread, resolved)
Use the FQDN if possible, otherwise fall back to the hostname. This adds an extra seed
to the server hash calculation so that machines with the same hostname but in a
different DC can be distinguished.

Signed-off-by: Aleksei Semiglazov <xjewer@gmail.com>
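
As a rough sketch of what this commit message describes (an assumed standard-library approach, not necessarily the merged code): resolve the machine's FQDN via a forward plus reverse lookup and fall back to the plain hostname when that fails.

package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// hostIdentifier tries to return the FQDN and falls back to the short
// hostname, e.g. inside containers without working reverse DNS.
// Sketch only; the merged implementation may differ in details.
func hostIdentifier() string {
	hostname, err := os.Hostname()
	if err != nil {
		return ""
	}
	ips, err := net.LookupIP(hostname)
	if err != nil {
		return hostname
	}
	for _, ip := range ips {
		names, err := net.LookupAddr(ip.String())
		if err != nil || len(names) == 0 {
			continue
		}
		// Reverse lookups usually return a trailing dot; strip it.
		return strings.TrimSuffix(names[0], ".")
	}
	return hostname
}

func main() {
	fmt.Println(hostIdentifier())
}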
@brian-brazil brian-brazil merged commit 0d1a693 into prometheus:master Mar 12, 2019
@brian-brazil (Contributor):

Thanks!

@xjewer xjewer deleted the feature_server_jitter branch March 12, 2019 20:31
csmarchbanks pushed a commit to csmarchbanks/prometheus that referenced this pull request Apr 9, 2019
* scrape: Add global jitter for HA server
