
Add config option for remote job name#6043

Merged
cstyan merged 7 commits into prometheus:master from cstyan:remote-job-name
Dec 12, 2019

Conversation

@cstyan
Member

@cstyan cstyan commented Sep 20, 2019

Track remote write queues via a map so we don't have to track the index in the remote write array. The queue names in metrics now contain the hash as a string instead of the index in the config array, so they look like this:

prometheus_remote_storage_dropped_samples_total{queue="4d50f478186c47dfcffff29ffbc4b5d4:http://localhost:1235/receive"} 0
prometheus_remote_storage_dropped_samples_total{queue="97f072436c72649a7547e91184322b58

Also adds the option of setting a job name for remote write/read in config. If not set, the name will default to the first 6 characters of the config's md5.Sum(). The config parsing does not allow for duplicate job names, the same as scrape jobs.

Note: the base commit is from another PR (#6041)
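The hash-derived default name described above could be sketched roughly as follows. This is an illustrative reconstruction, not the actual Prometheus code: the helper name and the assumption that the config is available in serialized form are mine.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// defaultRemoteName derives a default queue name from a serialized remote
// write config: the first 6 hex characters of the config's MD5 sum.
// Hypothetical helper; the real code may hash the parsed config struct.
func defaultRemoteName(serialized []byte) string {
	sum := md5.Sum(serialized) // [16]byte
	return hex.EncodeToString(sum[:])[:6]
}

func main() {
	cfg := []byte("url: http://localhost:1235/receive\n")
	fmt.Println(defaultRemoteName(cfg)) // prints a 6-character hex prefix
}
```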

@csmarchbanks

@csmarchbanks csmarchbanks self-assigned this Sep 20, 2019
@brian-brazil
Contributor

By the name I would expect this to have semantic meaning, in particular the same meaning as job_name. What's wrong with indexes?

@cstyan
Member Author

cstyan commented Sep 20, 2019

Does it not have semantic meaning? A name can now be provided as a more descriptive way of identifying a remote write queue. The index is really only useful because of the ordering in the config file, which I've never particularly liked. It feels brittle.

This way, as a user with remote configs that write to the same endpoint but that have different relabel or queue configs, I can name them appropriately. For example, the remote write config that uses relabelling to drop everything except dev metrics can have a name of dev.

The PR is also branched off another change I've submitted which would remove the index as part of the queue name in metrics, instead including a hash of the config.

@brian-brazil
Contributor

Does it not have semantic meaning?

No, it has no semantic effect as it's only used in metrics. Whereas job_name is a default for the job label.

@cstyan
Member Author

cstyan commented Sep 20, 2019

The metric label queue, which is the combo of the write_job_name and the remote write url, is applied to all metrics related to a remote write queue. I suppose that could be changed to two labels job and url.

Do you disagree that having a name provides more value than just the index?
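For illustration, the queue label described here is just the name and the URL joined with a colon. This is a sketch of the label format visible in the metrics in the PR description, not the exact Prometheus code:

```go
package main

import "fmt"

// queueLabel composes the "queue" metric label from a remote write queue's
// name (user-provided or hash-derived) and its endpoint URL.
// Illustrative sketch of the format shown in the metrics above.
func queueLabel(name, url string) string {
	return fmt.Sprintf("%s:%s", name, url)
}

func main() {
	fmt.Println(queueLabel("dev", "http://localhost:1235/receive"))
	// prints "dev:http://localhost:1235/receive"
}
```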

@csmarchbanks
Member

I think being able to name queues would be helpful to identify a queue in metrics/logs rather than looking up the order in your configs. It is very similar to the name of a rule group, no real meaning, but nice for display and other internal purposes.

Rather than calling it write_job_name, perhaps just name, or queue_name to avoid any confusion with the idea of a job in Prometheus?

@beorn7
Member

beorn7 commented Oct 15, 2019

@csmarchbanks how does this look to you now?

I don't feel qualified to judge the detailed implementation (unless I invested time to get into it), but I'd like to note that changing the queue names just because the config was reordered or a queue was added at the top rather than the bottom of the config. The queue name shows up in metrics, which is another reason why a stable name is handy.

@csmarchbanks
Member

I'd like to note that changing the queue names just because the config was reordered or a queue was added at the top rather than the bottom of the config.

Is this sentence missing a conclusion?

Generally this implementation seems correct to me; I'm trying to think through some edge cases that might break. Right now:

  1. If you have two duplicate remote write configs, they would pass config validation but then blow up metrics due to having the same hash. This seems pretty far-fetched in practice; should it be protected against?

@cstyan
Member Author

cstyan commented Oct 22, 2019

@csmarchbanks Hmm, so right now we don't do anything to stop people from having multiple remote write queues with the exact same config. I think that should be changed.

It doesn't cause any kind of error/panic with the metrics; the values are just double (or slightly higher than) what they would be with two separate queues running.

I've added a check for duplicate configs (note that configs that are the same in everything except name are still considered duplicates), along with a test for this.
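The duplicate check could look roughly like this: hash each config with its name cleared, so configs that differ only in name still collide. The field set and the string-based hashing here are simplified assumptions, not the actual implementation:

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// RemoteWriteConfig is a simplified stand-in for the real config struct.
type RemoteWriteConfig struct {
	Name string
	URL  string
}

// hashWithoutName hashes everything except Name, so two configs that are
// identical apart from their names produce the same hash.
func hashWithoutName(c RemoteWriteConfig) [md5.Size]byte {
	c.Name = "" // operates on a copy; the caller's config is untouched
	return md5.Sum([]byte(fmt.Sprintf("%+v", c)))
}

// validateRemoteWriteConfigs rejects configs that are duplicates in
// everything except their name.
func validateRemoteWriteConfigs(configs []RemoteWriteConfig) error {
	seen := make(map[[md5.Size]byte]struct{})
	for _, c := range configs {
		h := hashWithoutName(c)
		if _, ok := seen[h]; ok {
			return fmt.Errorf("found duplicate remote write configs")
		}
		seen[h] = struct{}{}
	}
	return nil
}

func main() {
	err := validateRemoteWriteConfigs([]RemoteWriteConfig{
		{Name: "dev", URL: "http://localhost:1235/receive"},
		{Name: "prod", URL: "http://localhost:1235/receive"},
	})
	fmt.Println(err) // duplicate despite the differing names
}
```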

Member

@beorn7 beorn7 left a comment


Just go vet nits for now. I'll let @csmarchbanks have a proper look first.

@cstyan cstyan requested a review from csmarchbanks October 24, 2019 03:54
config/config.go Outdated
Member


Why is name not included? Wouldn't we want to restart the component so the metrics pick up the new name?

Member Author


Hmm, yeah I need to change this a bit. The idea is that two remote write configs that are identical except for their names should also be considered duplicates.

Member Author


@csmarchbanks PTAL, still think we might be able to simplify the hashing/comparison but my brain is dead atm. The current check works.

Member


Why do you want two remote write configs that are identical to be disallowed? I guess it doesn't make much sense to run that way, but moving Name logic to different places is extra complexity.

Member Author


Minimize the chance for users to end up with bad config. If we allow duplicates, but with different names, it means both queues would be writing to the same endpoint with the exact same samples. At the very least they'd end up with lots of write errors in their logs.

Needing both hashing functions is unfortunate, but to me it's better than maintaining a second version of the config struct that just removes the name field, and making one generic hash function. I could rename this function to HashWithoutName so it's clearer that they're different functions?

Contributor


Loadtesting with all the samples duplicated would be problematic

That depends on what you're loadtesting, it doesn't have to be the storage itself. It might be something like how well you deal with duplicate samples.

I don't see how a loadbalancer could get involved in a reasonable way.

I said weird :)

I'd define identical configs as semantically invalid.

I see no semantic issues here, nothing about it breaks our data, querying, or remote storage model. It works today and causes no problems for the Prometheus server. It may not have any "normal" use cases or be efficient, but I don't think we should be 2nd guessing what users put in their config file.

Member


It might be something like how well you deal with duplicate samples.

This line of argument would require the removal of all kinds of correctness checks. Should I now allow duplicates in expositions by client_golang because somebody might want to test how well a consumer deals with duplicate samples?

It works today and causes no problems for the Prometheus server.

Again, with this line of argument, we could justify all kinds of insane output from the Prometheus server.

I expect remote-write config duplication to happen by accident, and I want to protect my users from it. While accidental config duplication is likely to happen, valid use cases for duplicate configs are nothing but speculation at this point. The course of action appears clear to me from that.

Member Author


I agree, I think it's semantically incorrect to accept identical configurations in the case of remote write when duplicates have implications for the downstream remote storage.

Contributor


in the case of remote write when duplicates have implications for the downstream remote storage.

That's something that we get to define rather than the remote storages, and as we're going for at-least-once delivery remote storages already have to handle this semantically. Efficiency is not the same as semantics.

Member


I think that providing some sanity checking of configuration would be nice, but I also think it would be weird that two configs with different backoffs are counted as "valid" while two configs with different names are rejected. As far as this PR goes, my vote would be to include the name in any hash and require it to be unique, due to the metrics we export. That would allow a user to just change the name of the queue in case they want duplicate samples being sent, but would guard against simple copy-paste mistakes where nothing is changed.

I am not opposed to the current implementation, just a preference to include the name in the hash. Finally, perhaps it would be good to open a separate issue/PR for sanity checking effectively duplicate configs more thoroughly?

Signed-off-by: Callum Styan <callumstyan@gmail.com>
them using the name.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
scrape job names.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
not changed, but the remote name has changed.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Member

@csmarchbanks csmarchbanks left a comment


I think this is very close, last couple small comments (I hope!)


Member

@csmarchbanks csmarchbanks left a comment


A couple nits then 👍

check, update test accordingly.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@cstyan
Member Author

cstyan commented Dec 12, 2019

@csmarchbanks thanks, fixed. If everything passes I'll merge it.

igit-cn added a commit to igit-cn/prometheus that referenced this pull request Feb 25, 2020
Add config option for remote job name (prometheus#6043) …