Remote Read #2499
Conversation
Just a quick look...
storage/remote/client.go:

// TODO: POST is weird for a query, but as per http://stackoverflow.com/questions/978061/http-get-with-request-body
// it's not valid to attach meaning to an HTTP body in a GET request.
// What is the right thing to do here?
POST is the done thing.
We'll be adding it to scrapes at some point too, due to URL length limits once ?name[]= becomes popular.
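A minimal sketch of what sending a read query as a POST could look like (the endpoint path and body encoding here are illustrative assumptions; the actual protocol uses snappy-compressed protobuf):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// buildReadRequest sketches issuing a remote read query as a POST with the
// serialized query in the request body, since a GET body carries no meaning.
func buildReadRequest(endpoint string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// Hypothetical content type; the real client sets this for protobuf payloads.
	req.Header.Set("Content-Type", "application/x-protobuf")
	return req, nil
}

func main() {
	req, err := buildReadRequest("http://localhost:9090/api/v1/read", []byte("query"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.Header.Get("Content-Type"))
}
```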
Cool.
	}
}

// mergeSamples merges two lists of sample pairs and removes duplicate
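The merge being discussed can be sketched as a two-pointer walk over two timestamp-ordered sample lists, deduplicating equal timestamps. This is a simplified illustration, not the PR's actual implementation; SamplePair here stands in for model.SamplePair:

```go
package main

import "fmt"

// SamplePair is a simplified stand-in for model.SamplePair.
type SamplePair struct {
	Timestamp int64
	Value     float64
}

// mergeSamples merges two timestamp-ordered sample lists and drops
// duplicates; on equal timestamps the first (e.g. local) sample wins.
func mergeSamples(a, b []SamplePair) []SamplePair {
	result := make([]SamplePair, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].Timestamp < b[j].Timestamp:
			result = append(result, a[i])
			i++
		case a[i].Timestamp > b[j].Timestamp:
			result = append(result, b[j])
			j++
		default: // duplicate timestamp: prefer a, skip b
			result = append(result, a[i])
			i++
			j++
		}
	}
	// Append the remaining tails of whichever list is not exhausted.
	result = append(result, a[i:]...)
	result = append(result, b[j:]...)
	return result
}

func main() {
	a := []SamplePair{{1, 1.0}, {3, 3.0}}
	b := []SamplePair{{2, 2.0}, {3, 9.0}, {4, 4.0}}
	fmt.Println(mergeSamples(a, b))
}
```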
Do we want to give priority to local storage where local storage has data? That'd avoid things like downsampling on the LTS side interfering with rate().
How would we detect that when both storages could have gaps at arbitrary times? Maybe a naive approach where we look at what the oldest sample in the locally returned data is (being fine with gaps in that) and discard remote data younger than that timestamp?
I'd use the naive approach. If you had a single Prometheus reporting to an LTS, it's not possible that the LTS has points that the Prometheus doesn't, and that's the intended setup.
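The naive approach described above could be sketched roughly like this: find the timestamp of the oldest local sample and discard remote samples at or after that cutoff, so local data wins wherever it exists. Plain int64 milliseconds stand in for model.Time; this is an illustration, not the PR's code:

```go
package main

import "fmt"

// dropRemoteAfter keeps only remote sample timestamps older than the oldest
// locally stored sample, implementing the "naive approach" from the thread.
func dropRemoteAfter(remote []int64, oldestLocal int64) []int64 {
	kept := make([]int64, 0, len(remote))
	for _, ts := range remote {
		if ts < oldestLocal {
			kept = append(kept, ts)
		}
	}
	return kept
}

func main() {
	remote := []int64{100, 200, 300, 400}
	// Oldest local sample is at t=250, so remote data from t>=250 is dropped.
	fmt.Println(dropRemoteAfter(remote, 250))
}
```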
Ok, good point, will do that then. Assuming a single Prometheus writing into the remote storage, we could even think about time-pruning the requests we send to the remote storage based on doing a local query first and seeing what its oldest returned samples are (since there can't be different series for the same matchers at the same time in the remote storage if it all came from the same Prometheus). Not sure I'll do that in this PR though.
Actually, uneven purging of local data past the retention cutoff would mean we couldn't just look at the earliest sample returned in whole, but take the retention time into account as well.
That gets into more advanced stuff :) That only works if you're talking to a true LTS and your vector selector doesn't mention any external label names.
(You also don't need to query locally, you just need to know when the local storage was initialized.)
Yeah, I'm gonna leave that out for later then :)
If you had a single Prometheus reporting to a LTS it's not possible that the LTS has points that the Prometheus doesn't
Wasn't there an argument in #10 that someone could have wiped the storage of the local Prometheus? In this case data would exist in the LTS that is not local.
@pilhuhn Here we are talking only about time ranges younger than the oldest sample for a series in Prometheus's local storage. There can basically be no overlapping ranges of times where both local and remote storages have points, but different ones.
This is something we need for Cortex as well, to improve query performance too...
config/config.go:

	RemoteTimeout model.Duration `yaml:"remote_timeout,omitempty"`
	BasicAuth     *BasicAuth     `yaml:"basic_auth,omitempty"`
	TLSConfig     TLSConfig      `yaml:"tls_config,omitempty"`
	ProxyURL      URL            `yaml:"proxy_url,omitempty"`
Does YAML deal with embedded structs gracefully? Could we use one here to share this config with RemoteWriteConfig and remove the copy in remote/{read.go, remote.go}?
It does. Moving this elsewhere is a separate discussion, I personally want it kicked out to common so it can be reused with the blackbox exporter, consul exporter and alertmanager.
There's actually an HTTPClientConfig in config already, so we probably shouldn't create yet another similar HTTP client config struct for embedding here. That one supports more HTTP auth methods (bearer token, ...), but lacks the timeout and URL.
@brian-brazil Was there a specific reason why we only do basic auth for remote write, or would it be possible to reuse the HTTPClientConfig here and for the write path as well?
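YAML does handle embedded structs gracefully via the inline tag. A rough sketch of what sharing the client settings through embedding could look like; the field names and layout here are assumptions for illustration, not the final Prometheus config:

```go
package main

import "fmt"

// HTTPClientConfig is a hypothetical shared HTTP settings struct, loosely
// modeled on the one discussed above. With gopkg.in/yaml.v2, the
// `yaml:",inline"` tag on an embedded struct flattens its fields into the
// parent mapping when (un)marshaling.
type HTTPClientConfig struct {
	BasicAuthUser string `yaml:"basic_auth_user,omitempty"`
	BearerToken   string `yaml:"bearer_token,omitempty"`
	ProxyURL      string `yaml:"proxy_url,omitempty"`
}

// RemoteReadConfig embeds the shared client config instead of copying fields.
type RemoteReadConfig struct {
	URL              string `yaml:"url"`
	RemoteTimeoutSec int    `yaml:"remote_timeout,omitempty"`
	HTTPClientConfig `yaml:",inline"`
}

func main() {
	c := RemoteReadConfig{URL: "http://lts.example/read"}
	// Embedded fields are promoted, so callers access them directly.
	c.BearerToken = "secret"
	fmt.Println(c.URL, c.BearerToken)
}
```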
I think basic auth was added ad-hoc as Weaveworks needed it. I'm for standardising the http settings.
Not ad-hoc; some effort went into unifying the HTTP client code (at least the TLS handling) between the remote write path and the retrieval path: #1957
The HTTPClientConfig struct was extracted in #2215, AFAICT to unify the Alertmanager config and the retrieval config. I think it would make sense to use it for the remote read/write config too; I'm happy to put together a PR for that, @juliusv?
@tomwilkie Thanks for the pointers and the offer! I think it's simple and relevant enough for me to just include it in this PR though?
Go for it!
// RemoteReadConfig is the configuration for reading from remote storage.
type RemoteReadConfig struct {
	URL           *URL           `yaml:"url,omitempty"`
	RemoteTimeout model.Duration `yaml:"remote_timeout,omitempty"`
This is going to end up as a max timeout I expect.
Just noting that I see a lot of types and interfaces being used here that no longer exist in the dev-2.0 branch.
@fabxc That would be me. I would tentatively say yes, although I have not taken a look at the dev-2.0 branch yet and thus have no idea how much work it would be. From a cursory glance, does this implementation still seem generally conceptually compatible with dev-2.0 or would it have to change completely?
storage/fanin/fanin.go:

	}
	for ; j < len(b); j++ {
		result = append(result, b[j])
	}
Those can just be result = append(result, a[i:]...) and result = append(result, b[j:]...), no?
Good point, borrowed from Cortex :)
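A standalone illustration of the suggested idiom, replacing per-element loops with a single variadic append for each remaining tail (the helper name is made up for the example):

```go
package main

import "fmt"

// appendTails shows the suggested idiom: once a merge loop has consumed
// a[:i] and b[:j], the leftovers can be appended in one call each rather
// than with element-by-element loops.
func appendTails(result, a, b []int, i, j int) []int {
	result = append(result, a[i:]...)
	result = append(result, b[j:]...)
	return result
}

func main() {
	a := []int{1, 2, 3, 4}
	b := []int{10, 20}
	// The merge loop stopped at a[2] and b[1]; append both tails at once.
	fmt.Println(appendTails(nil, a, b, 2, 1)) // prints [3 4 20]
}
```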
@juliusv a fair bit will change. The new interface is here: https://github.com/prometheus/prometheus/blob/dev-2.0/storage/interface.go
storage/fanin/fanin.go:

}

func (q querier) LastSampleForLabelMatchers(ctx context.Context, cutoff model.Time, matcherSets ...metric.LabelMatchers) (model.Vector, error) {
	// TODO: implement querying last samples from remote storage.
That's not needed I think given it's just for federation.
For the new storage I brought up yesterday (prometheus-junkyard/tsdb#9) that it possibly shouldn't be a part of the main storage interface.
From a rough look, it should conceptually be the same. Just that we would connect series from local and remote lazily as we scan through them.

Though xxhash used in the new storage (with collision handling) has way fewer collisions and is faster, hashes are still no safe way to identify equal label sets, and we should get rid of that practice in our entire querying layer as well: what's the point of storing collision-safe if we cannot query collision-safe? Ordering and comparing sequentially is a straightforward and safe way to do it. The only alternative would be someone implementing a generalized merger using hashing but also handling collisions internally (i.e. still O(n) comparisons but also building n hashes).

That drifted a bit off-topic. Just trying to make a point that the changes have worthwhile effects beyond making it fit a different interface.
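The ordered, collision-safe comparison argued for above can be sketched as a sequential comparison of sorted label sets, in the spirit of (but not identical to) what later became labels.Compare:

```go
package main

import "fmt"

// Label is a simplified name/value pair; real label sets are kept sorted
// by name so they can be compared element by element.
type Label struct{ Name, Value string }

// compareLabels orders two sorted label sets lexically, without hashing,
// so equal label sets are identified collision-free. It returns a value
// <0, 0, or >0 like bytes.Compare.
func compareLabels(a, b []Label) int {
	for i := 0; i < len(a) && i < len(b); i++ {
		if a[i].Name != b[i].Name {
			if a[i].Name < b[i].Name {
				return -1
			}
			return 1
		}
		if a[i].Value != b[i].Value {
			if a[i].Value < b[i].Value {
				return -1
			}
			return 1
		}
	}
	// Shared prefix is equal; the shorter set orders first.
	return len(a) - len(b)
}

func main() {
	x := []Label{{"__name__", "up"}, {"job", "node"}}
	y := []Label{{"__name__", "up"}, {"job", "prom"}}
	fmt.Println(compareLabels(x, y) < 0, compareLabels(x, x) == 0)
}
```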
@fabxc That interface file looks so beautiful. Thanks. Just wondering now, of course, how much effort to still put into the 1.x version of remote read. Do you have a hunch about a 2.0 ETA, or is it too early to say?

Actually, I think it's still important to use 1.x for test-driving a lot of the semantics around the remote storage so that we can get them right for 2.0. So I think in 1.x it makes sense to focus on those semantics and config options, but not performance improvements or such.
If this PR works and is flagged as experimental anyway, I wouldn't let that stop you from merging. Especially for the reasons you mentioned.

I've benchmarked querying and it all points towards 2.0 being faster there as well. So theoretically everything is set to go. Just that there's a mysterious deadlock in the write path that I just cannot seem to chase down. That's blocking everything right now, basically. Any help would be great, of course.

If that is solved, we should be ready for an early alpha. But I wouldn't count on the alpha phase being short, in particular as we want to revisit all the other stuff we want to break when cutting 2.0.
Pushed some changes:

Now working on some tests and stuff if people are happy with the overall picture.
Ok, added Fanin tests and fixed a bunch of uncovered bugs. This is all working with a local hacked-up Cortex now that supports remote read.

The main remaining issue right now is that rules still go to remote storage as well, but shouldn't. I'm wondering what to do about that. I could instantiate another

I'm currently leaning towards the context value option, but what do people think?
We'll ultimately want to expose this as a per-query option on the HTTP API, so that's one option.
@brian-brazil Ok, I implemented that. It doesn't seem too bad (considering the other options are also not great): 8fda83e
Removed the [WIP], ready for a final review now.
@juliusv I've had a look and it's a +1 from me. httpClient changes look good! Let me know if you want me to test it out too.
I've worked through the semantics of remote read for the 3 use cases; here's what I've come up with: https://docs.google.com/document/d/188YauRgfF0J4CYMigLsVNN34V_kUwKnApBs2dQMfBbs/edit# It turns out that the LTS semantics are sufficient to handle the other cases.
@brian-brazil Awesome, thanks. Very useful. That all makes sense to me. How about starting with your 1st and 2nd rule in that doc, but as a followup PR?
That's fine with me, though these should go in before the alpha, as LTS is unusable without them.
Cool. Since I have a +1 from @tomwilkie here and I maintain remote storage, I'll merge this tomorrow if there are no further objections, and then I'll follow up with the external label handling.
One thing that just came to mind is that we probably want to add a version number as a header or something. That way any breaking changes will be possible to handle on the other end.
Also add Content-Type header.
@brian-brazil I added an I also added a
Merging in 3... 2... 1...
This allows querying Cortex via Prometheus's new generic remote read protocol. See prometheus/prometheus#2499 Fixes #226
* Add Prometheus remote read support
  This allows querying Cortex via Prometheus's new generic remote read protocol. See prometheus/prometheus#2499. Fixes #226
* Move ParseProtoRequest to util package
  It is now also used by the MergeQuerier.
* More review feedback + cleanups
This is a WIP approach for remote read (part of #10) with no tests yet, but I wanted to share it early in case someone sees something fundamentally wrong with it.
Features that are intentionally not included in this first iteration: