Remote Read #2499
Conversation
Just a quick look...
storage/remote/client.go:

// TODO: POST is weird for a query, but as per http://stackoverflow.com/questions/978061/http-get-with-request-body
// it's not valid to attach meaning to an HTTP body in a GET request.
// What is the right thing to do here?
POST is the done thing.
We'll be adding it to scrapes at some point too, due to URL length limits once ?name[]= becomes popular.
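A minimal sketch of what sending a read query as a POST could look like (the endpoint path and body encoding here are illustrative assumptions; the actual protocol uses snappy-compressed protobuf):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// buildReadRequest sketches issuing a remote read query as a POST with the
// serialized query in the request body, since a GET body carries no meaning.
func buildReadRequest(endpoint string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// Hypothetical content type; the real client sets this for protobuf payloads.
	req.Header.Set("Content-Type", "application/x-protobuf")
	return req, nil
}

func main() {
	req, err := buildReadRequest("http://localhost:9090/api/v1/read", []byte("query"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.Header.Get("Content-Type"))
}
```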
Cool.
	}
}

// mergeSamples merges two lists of sample pairs and removes duplicate
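The merge being discussed can be sketched as a two-pointer walk over two timestamp-ordered sample lists, deduplicating equal timestamps. This is a simplified illustration, not the PR's actual implementation; SamplePair here stands in for model.SamplePair:

```go
package main

import "fmt"

// SamplePair is a simplified stand-in for model.SamplePair.
type SamplePair struct {
	Timestamp int64
	Value     float64
}

// mergeSamples merges two timestamp-ordered sample lists and drops
// duplicates; on equal timestamps the first (e.g. local) sample wins.
func mergeSamples(a, b []SamplePair) []SamplePair {
	result := make([]SamplePair, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].Timestamp < b[j].Timestamp:
			result = append(result, a[i])
			i++
		case a[i].Timestamp > b[j].Timestamp:
			result = append(result, b[j])
			j++
		default: // duplicate timestamp: prefer a, skip b
			result = append(result, a[i])
			i++
			j++
		}
	}
	// Append the remaining tails of whichever list is not exhausted.
	result = append(result, a[i:]...)
	result = append(result, b[j:]...)
	return result
}

func main() {
	a := []SamplePair{{1, 1.0}, {3, 3.0}}
	b := []SamplePair{{2, 2.0}, {3, 9.0}, {4, 4.0}}
	fmt.Println(mergeSamples(a, b))
}
```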
Do we want to give priority to local storage where local storage has data? That'd avoid things like downsampling on the LTS side interfering with rate().
How would we detect that when both storages could have gaps at arbitrary times? Maybe a naive approach where we look at what the oldest sample in the locally returned data is (being fine with gaps in that) and discard remote data younger than that timestamp?
I'd use the naive approach. If you had a single Prometheus reporting to an LTS, it's not possible that the LTS has points that the Prometheus doesn't, and that's the intended setup.
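The naive approach described above could be sketched roughly like this: find the timestamp of the oldest local sample and discard remote samples at or after that cutoff, so local data wins wherever it exists. Plain int64 milliseconds stand in for model.Time; this is an illustration, not the PR's code:

```go
package main

import "fmt"

// dropRemoteAfter keeps only remote sample timestamps older than the oldest
// locally stored sample, implementing the "naive approach" from the thread.
func dropRemoteAfter(remote []int64, oldestLocal int64) []int64 {
	kept := make([]int64, 0, len(remote))
	for _, ts := range remote {
		if ts < oldestLocal {
			kept = append(kept, ts)
		}
	}
	return kept
}

func main() {
	remote := []int64{100, 200, 300, 400}
	// Oldest local sample is at t=250, so remote data from t>=250 is dropped.
	fmt.Println(dropRemoteAfter(remote, 250))
}
```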
Ok, good point, will do that then. Assuming a single Prometheus writing into the remote storage, we could even think about time-pruning the requests we send to the remote storage based on doing a local query first and seeing what its oldest returned samples are (since there can't be different series for the same matchers at the same time in the remote storage if it all came from the same Prometheus). Not sure I'll do that in this PR though.
Actually, uneven purging of local data past the retention cutoff would mean we couldn't just look at the earliest sample returned in whole, but take the retention time into account as well.
That gets into more advanced stuff :) That only works if you're talking to a true LTS and your vector selector doesn't mention any external label names.
(You also don't need to query locally, you just need to know when the local storage was initialized.)
Yeah, I'm gonna leave that out for later then :)
If you had a single Prometheus reporting to a LTS it's not possible that the LTS has points that the Prometheus doesn't
Wasn't there an argument in #10 that someone could have wiped the storage of the local Prometheus? In this case data would exist in the LTS that is not local.
@pilhuhn Here we are talking only about time ranges younger than the oldest sample for a series in Prometheus's local storage. There can basically be no overlapping ranges of times where both local and remote storages have points, but different ones.
This is something we need for Cortex as well, to improve query performance too...
config/config.go:

	RemoteTimeout model.Duration `yaml:"remote_timeout,omitempty"`
	BasicAuth     *BasicAuth     `yaml:"basic_auth,omitempty"`
	TLSConfig     TLSConfig      `yaml:"tls_config,omitempty"`
	ProxyURL      URL            `yaml:"proxy_url,omitempty"`
Does YAML deal with embedded structs gracefully? Could we use one here to share this config with RemoteWriteConfig and remove the copy in remote/{read.go, remote.go}?
It does. Moving this elsewhere is a separate discussion, I personally want it kicked out to common so it can be reused with the blackbox exporter, consul exporter and alertmanager.
There's actually an HTTPClientConfig in config already, so we probably shouldn't create yet another similar HTTP client config struct for embedding here. That one supports more HTTP auth methods (bearer token, ...), but lacks the timeout and URL.
@brian-brazil Was there a specific reason why we only do basic auth for remote write, or would it be possible to reuse the HTTPClientConfig here and for the write path as well?
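YAML does handle embedded structs gracefully via the inline tag. A rough sketch of what sharing the client settings through embedding could look like; the field names and layout here are assumptions for illustration, not the final Prometheus config:

```go
package main

import "fmt"

// HTTPClientConfig is a hypothetical shared HTTP settings struct, loosely
// modeled on the one discussed above. With gopkg.in/yaml.v2, the
// `yaml:",inline"` tag on an embedded struct flattens its fields into the
// parent mapping when (un)marshaling.
type HTTPClientConfig struct {
	BasicAuthUser string `yaml:"basic_auth_user,omitempty"`
	BearerToken   string `yaml:"bearer_token,omitempty"`
	ProxyURL      string `yaml:"proxy_url,omitempty"`
}

// RemoteReadConfig embeds the shared client config instead of copying fields.
type RemoteReadConfig struct {
	URL              string `yaml:"url"`
	RemoteTimeoutSec int    `yaml:"remote_timeout,omitempty"`
	HTTPClientConfig `yaml:",inline"`
}

func main() {
	c := RemoteReadConfig{URL: "http://lts.example/read"}
	// Embedded fields are promoted, so callers access them directly.
	c.BearerToken = "secret"
	fmt.Println(c.URL, c.BearerToken)
}
```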
I think basic auth was added ad-hoc as Weaveworks needed it. I'm for standardising the http settings.
Not ad-hoc; some effort went into unifying the HTTP client code (at least the TLS handling) between the remote write path and the retrieval path: #1957
The HTTPClientConfig struct was extracted in #2215, AFAICT to unify the Alertmanager config and the retrieval config. I think it would make sense to use it for the remote read/write config too; I'm happy to put together a PR for that, @juliusv?
@tomwilkie Thanks for the pointers and the offer! I think it's simple and relevant enough for me to just include it in this PR though?
Go for it!
// RemoteReadConfig is the configuration for reading from remote storage.
type RemoteReadConfig struct {
	URL           *URL           `yaml:"url,omitempty"`
	RemoteTimeout model.Duration `yaml:"remote_timeout,omitempty"`
This is going to end up as a max timeout I expect.
Just noting that I see a lot of types and interfaces being used here that no longer exist in the dev-2.0 branch.
@fabxc That would be me. I would tentatively say yes, although I have not taken a look at the dev-2.0 branch yet and thus have no idea how much work it would be. From a cursory glance, does this implementation still seem generally conceptually compatible with dev-2.0 or would it have to change completely?
storage/fanin/fanin.go:

	}
	for ; j < len(b); j++ {
		result = append(result, b[j])
	}
Those can just be result = append(result, a[i:]...) and result = append(result, b[j:]...), no?
Good point, borrowed from Cortex :)
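A standalone illustration of the suggested idiom, replacing per-element loops with a single variadic append for each remaining tail (the helper name is made up for the example):

```go
package main

import "fmt"

// appendTails shows the suggested idiom: once a merge loop has consumed
// a[:i] and b[:j], the leftovers can be appended in one call each rather
// than with element-by-element loops.
func appendTails(result, a, b []int, i, j int) []int {
	result = append(result, a[i:]...)
	result = append(result, b[j:]...)
	return result
}

func main() {
	a := []int{1, 2, 3, 4}
	b := []int{10, 20}
	// The merge loop stopped at a[2] and b[1]; append both tails at once.
	fmt.Println(appendTails(nil, a, b, 2, 1)) // prints [3 4 20]
}
```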
@juliusv a fair bit will change. The new interface is here: https://github.com/prometheus/prometheus/blob/dev-2.0/storage/interface.go
storage/fanin/fanin.go:

}

func (q querier) LastSampleForLabelMatchers(ctx context.Context, cutoff model.Time, matcherSets ...metric.LabelMatchers) (model.Vector, error) {
	// TODO: implement querying last samples from remote storage.
That's not needed I think given it's just for federation.
For the new storage I brought up yesterday (prometheus-junkyard/tsdb#9) that it possibly shouldn't be a part of the main storage interface.
From a rough look, it should conceptually be the same. Just that we would connect series from local and remote lazily as we scan through them.

Though xxhash used in the new storage (with collision handling) has way fewer collisions and is faster, hashes are still no safe way to identify equal label sets, and we should get rid of that practice in our entire querying layer as well: what's the point of storing collision-safe if we cannot query collision-safe? Ordering and comparing sequentially is a straightforward and safe way to do it. The only alternative would be someone implementing a generalized merger using hashing but also handling collisions internally (i.e. still O(n) comparisons but also building n hashes).

That drifted a bit off-topic. Just trying to make a point that the changes have worthwhile effects beyond making it fit a different interface.
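The ordered, collision-safe comparison argued for above can be sketched as a sequential comparison of sorted label sets, in the spirit of (but not identical to) what later became labels.Compare:

```go
package main

import "fmt"

// Label is a simplified name/value pair; real label sets are kept sorted
// by name so they can be compared element by element.
type Label struct{ Name, Value string }

// compareLabels orders two sorted label sets lexically, without hashing,
// so equal label sets are identified collision-free. It returns a value
// <0, 0, or >0 like bytes.Compare.
func compareLabels(a, b []Label) int {
	for i := 0; i < len(a) && i < len(b); i++ {
		if a[i].Name != b[i].Name {
			if a[i].Name < b[i].Name {
				return -1
			}
			return 1
		}
		if a[i].Value != b[i].Value {
			if a[i].Value < b[i].Value {
				return -1
			}
			return 1
		}
	}
	// Shared prefix is equal; the shorter set orders first.
	return len(a) - len(b)
}

func main() {
	x := []Label{{"__name__", "up"}, {"job", "node"}}
	y := []Label{{"__name__", "up"}, {"job", "prom"}}
	fmt.Println(compareLabels(x, y) < 0, compareLabels(x, x) == 0)
}
```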
@fabxc That interface file looks so beautiful. Thanks. Just wondering now, of course, how much effort to still put into the 1.x version of remote read. Do you have a hunch about a 2.0 ETA, or is it too early to say?

Actually, I think it's still important to use 1.x for test-driving a lot of the semantics around the remote storage so that we can get them right for 2.0. So I think in 1.x it makes sense to focus on those semantics and config options, but not performance improvements or such.
If this PR works and is flagged as experimental anyway, I wouldn't let that stop you from merging. Especially for the reasons you mentioned.

I've benchmarked querying and it all points towards 2.0 being faster there as well. So theoretically everything is set to go. Just that there's a mysterious deadlock in the write path that I just cannot seem to chase down. That's blocking everything right now, basically. Any help would be great, of course.

If that is solved, we should be ready for an early alpha. But I wouldn't count on the alpha phase being short, in particular as we want to revisit all the other stuff we want to break when cutting 2.0.
Pushed some changes:

Now working on some tests and stuff if people are happy with the overall picture.
Ok, added Fanin tests and fixed a bunch of uncovered bugs. This is all working with a local hacked-up Cortex now that supports remote read.

The main remaining issue right now is that rules still go to remote storage as well, but shouldn't. I'm wondering what to do about that. I could instantiate another

I'm currently leaning towards the context value option, but what do people think?
We'll ultimately want to expose this as a per-query option on the HTTP API, so that's one option.
@brian-brazil Ok, I implemented that. It doesn't seem too bad (considering the other options are also not great): 8fda83e
Removed the [WIP], ready for a final review now.
@juliusv I've had a look and it's a +1 from me. httpClient changes look good! Let me know if you want me to test it out too.
I've worked through the semantics of remote read for the 3 use cases; here's what I've come up with: https://docs.google.com/document/d/188YauRgfF0J4CYMigLsVNN34V_kUwKnApBs2dQMfBbs/edit# It turns out that the LTS semantics are sufficient to handle the other cases.
@brian-brazil Awesome, thanks. Very useful. That all makes sense to me. How about starting with your 1st and 2nd rule in that doc, but as a followup PR?
That's fine with me, though these should go in before the alpha, as LTS is unusable without them.
Cool. Since I have a +1 from @tomwilkie here and I maintain remote storage, I'll merge this tomorrow if there are no further objections, and then I'll follow up with the external label handling.
One thing that just came to mind is that we probably want to add a version number as a header or something. That way any breaking changes will be possible to handle on the other end.
Also add Content-Type header.
@brian-brazil I added an I also added a
Merging in 3... 2... 1...
This allows querying Cortex via Prometheus's new generic remote read protocol. See prometheus/prometheus#2499 Fixes #226
* Add Prometheus remote read support
  This allows querying Cortex via Prometheus's new generic remote read protocol. See prometheus/prometheus#2499. Fixes #226
* Move ParseProtoRequest to util package
  It is now also used by the MergeQuerier.
* More review feedback + cleanups
This is a WIP approach for remote read (part of #10) with no tests yet, but I wanted to share it early in case someone sees something fundamentally wrong with it.
Features that are intentionally not included in this first iteration: