
Remote Read #2499

Merged: 8 commits merged into master on Mar 27, 2017

Conversation

@juliusv (Member) commented Mar 15, 2017

This is a WIP approach for remote read (part of #10) with no tests yet, but I wanted to share it early in case someone sees something fundamentally wrong with it.

Features that are intentionally not included in this first iteration:

  • service discovery
  • batching multiple label matcher sets into the same read request (although the protobuf interface supports it, it requires some refactoring to make it happen)
  • metadata queries or fetching the last sample of a time series (only regular range/instant queries for now)
  • turning off remote reads for certain queries (especially rules)

@brian-brazil (Contributor) left a comment

Just a quick look...


// TODO: POST is weird for a query, but as per http://stackoverflow.com/questions/978061/http-get-with-request-body
// it's not valid to attach meaning to an HTTP body in a GET request.
// What is the right thing to do here?
Contributor

POST is the done thing.

We'll be adding it to scrapes at some point too, due to URL length limits when ?name[]= becomes popular.

Member Author

Cool.
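
For illustration, a minimal sketch of issuing the read as an HTTP POST carrying the serialized protobuf message in the body, along the lines discussed above. The function name is hypothetical, and compression and headers are left out, so this is not necessarily what the PR's client does:

package remote

import (
	"bytes"
	"context"
	"fmt"
	"net/http"

	"github.com/golang/protobuf/proto"
)

// postReadRequest sends a serialized protobuf read request as an HTTP POST,
// since attaching meaning to a GET body is not valid.
func postReadRequest(ctx context.Context, client *http.Client, url string, req proto.Message) (*http.Response, error) {
	data, err := proto.Marshal(req)
	if err != nil {
		return nil, fmt.Errorf("marshaling read request: %v", err)
	}
	httpReq, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	return client.Do(httpReq.WithContext(ctx))
}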

}
}

// mergeSamples merges two lists of sample pairs and removes duplicate
Contributor

Do we want to give priority to local storage where local storage has data? That'd avoid things like downsampling on the LTS side interfering with rate().

Member Author

How would we detect that when both storages could have gaps at arbitrary times? A naive approach where we look at what the oldest sample in the locally returned data is (being fine with gaps in that) and discard remote data younger than that timestamp?

Contributor

I'd use the naive approach. If you had a single Prometheus reporting to an LTS, it's not possible that the LTS has points that the Prometheus doesn't, and that's the intended setup.

Member Author

Ok, good point, will do that then. Assuming a single Prometheus writing into the remote storage, we could even think about time-pruning the requests we send to the remote storage based on doing a local query first and seeing what its oldest returned samples are (since there can't be different series for the same matchers at the same time in the remote storage if it all came from the same Prometheus). Not sure I'll do that in this PR though.

Member Author

Actually, uneven purging of local data past the retention cutoff would mean we couldn't just look at the earliest returned sample on its own, but would have to take the retention time into account as well.

Contributor

That gets into more advanced stuff :) That only works if you're talking to a true LTS and your vector selector doesn't mention any external label names.

(You also don't need to query locally, you just need to know when the local storage was initialized.)

Member Author

Yeah, I'm gonna leave that out for later then :)

@pilhuhn

If you had a single Prometheus reporting to a LTS it's not possible that the LTS has points that the Prometheus doesn't

Wasn't there an argument in #10 that someone could have wiped the storage of the local Prometheus? In that case data would exist in the LTS that is not local.

Member Author

@pilhuhn Here we are talking only about time ranges younger than the oldest sample for a series in Prometheus's local storage. There can basically be no overlapping ranges of times where both local and remote storages have points, but different ones.
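
A minimal sketch of that pruning rule, assuming timestamp-sorted model.SamplePair slices; the helper name is hypothetical rather than the PR's actual code:

package remote

import "github.com/prometheus/common/model"

// pruneRemoteSamples drops remote samples at or after the oldest local sample
// for a series, so local data wins wherever Prometheus has any. Both slices
// are assumed to be sorted by timestamp.
func pruneRemoteSamples(local, remote []model.SamplePair) []model.SamplePair {
	if len(local) == 0 {
		return remote
	}
	oldestLocal := local[0].Timestamp
	var keep []model.SamplePair
	for _, s := range remote {
		if s.Timestamp < oldestLocal {
			keep = append(keep, s)
		}
	}
	return keep
}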

@tomwilkie (Member)

batching multiple label matcher sets into the same read request (although the protobuf interface supports it, it requires some refactoring to make it happen)

This is something we need for Cortex as well, for index query performance too...

config/config.go Outdated
RemoteTimeout model.Duration `yaml:"remote_timeout,omitempty"`
BasicAuth *BasicAuth `yaml:"basic_auth,omitempty"`
TLSConfig TLSConfig `yaml:"tls_config,omitempty"`
ProxyURL URL `yaml:"proxy_url,omitempty"`
Member

Does YAML deal with embedded structs gracefully? Could we use one here to share this config with RemoteWriteConfig and remove the copy in remote/{read.go, remote.go}?

Contributor

It does. Moving this elsewhere is a separate discussion; I personally want it kicked out to common so it can be reused by the blackbox exporter, consul exporter, and alertmanager.

Member Author

There's actually an HTTPClientConfig in config already, so we probably shouldn't create yet another similar HTTP client config struct for embedding here. That one supports more HTTP auth methods (bearer token, ...), but lacks the timeout and URL.

@brian-brazil Was there a specific reason why we only do basic auth for remote write, or would it be possible to reuse the HTTPClientConfig here and for the write path as well?

Contributor

I think basic auth was added ad-hoc as Weaveworks needed it. I'm for standardising the http settings.

Member

Not ad-hoc; some effort went into unifying the HTTP client code (at least, the TLS handling) between the remote write path and the retrieval path: #1957

The HTTPClientConfig struct was extracted in #2215, AFAICT to unify the Alertmanager config and the retrieval config. I think it would make sense to use it for the remote read/write config too. I'm happy to put together a PR for that, @juliusv?

Member Author

@tomwilkie Thanks for the pointers and the offer! I think it's simple and relevant enough for me to just include it in this PR though?

Member

Go for it!
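
As a sketch of where this could land, reusing HTTPClientConfig via YAML inline embedding keeps the shared HTTP options at the top level of each remote block; the exact field set in the merged code may differ:

// In package config; a sketch only, not necessarily the final shape.
type RemoteReadConfig struct {
	URL           *URL           `yaml:"url,omitempty"`
	RemoteTimeout model.Duration `yaml:"remote_timeout,omitempty"`

	// HTTPClientConfig carries the shared TLS, proxy and auth options;
	// ",inline" flattens its fields into this block in the YAML config.
	HTTPClientConfig HTTPClientConfig `yaml:",inline"`
}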

// RemoteReadConfig is the configuration for reading from remote storage.
type RemoteReadConfig struct {
URL *URL `yaml:"url,omitempty"`
RemoteTimeout model.Duration `yaml:"remote_timeout,omitempty"`
Contributor

This is going to end up as a max timeout I expect.

@fabxc (Contributor) commented Mar 15, 2017

Just noting that I see a lot of types and interfaces being used here that no longer exist in the dev-2.0 branch.
Can I trust the people owning the remote/ code path to do the migration work for 2.0?

@juliusv (Member Author) commented Mar 15, 2017

@fabxc That would be me. I would tentatively say yes, although I have not taken a look at the dev-2.0 branch yet and thus have no idea how much work it would be. From a cursory glance, does this implementation still seem generally conceptually compatible with dev-2.0 or would it have to change completely?

}
for ; j < len(b); j++ {
result = append(result, b[j])
}
@fabxc (Contributor) Mar 16, 2017

Those can just be result = append(result, a[i:]...) and result = append(result, b[j:]...), no?

Member Author

Good point, borrowed from Cortex :)
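
For context, the merge loop with that tail-append simplification applied looks roughly like this; how duplicates are resolved is an assumption here rather than a statement about the PR's final code:

package remote

import "github.com/prometheus/common/model"

// mergeSamples merges two timestamp-sorted lists of sample pairs and drops
// duplicate timestamps, keeping one sample per timestamp.
func mergeSamples(a, b []model.SamplePair) []model.SamplePair {
	result := make([]model.SamplePair, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].Timestamp < b[j].Timestamp:
			result = append(result, a[i])
			i++
		case a[i].Timestamp > b[j].Timestamp:
			result = append(result, b[j])
			j++
		default: // same timestamp: keep one sample, skip the other
			result = append(result, a[i])
			i++
			j++
		}
	}
	// Whichever list still has samples left can be appended in one go,
	// as suggested above.
	result = append(result, a[i:]...)
	result = append(result, b[j:]...)
	return result
}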

@fabxc (Contributor) commented Mar 16, 2017

@juliusv a fair bit will change. The new interface is here: https://github.com/prometheus/prometheus/blob/dev-2.0/storage/interface.go

}

func (q querier) LastSampleForLabelMatchers(ctx context.Context, cutoff model.Time, matcherSets ...metric.LabelMatchers) (model.Vector, error) {
// TODO: implement querying last samples from remote storage.
Contributor

That's not needed, I think, given it's just for federation.
For the new storage, I brought up yesterday (prometheus-junkyard/tsdb#9) that it possibly shouldn't be part of the main storage interface.

@fabxc (Contributor) commented Mar 16, 2017

From a rough look, it should conceptually be the same. Just that we would connect series from local and remote lazily as we scan through the SeriesSet. It wouldn't connect based on fingerprint, but would require those iterators to return label sets in order.
It's fine to materialize the remote SeriesSet and order it before merging if the remote storage does not guarantee ordering.

Though the xxhash used in the new storage (with collision handling) has way fewer collisions and is faster, hashes are still no safe way to identify equal label sets, and we should get rid of that practice in our entire querying layer as well: what's the point of storing collision-safe if we cannot query collision-safe?

Ordering and comparing sequentially is a straightforward and safe way to do it. The only alternative would be someone implementing a generalized merger using hashing but also handling collisions internally (i.e. still O(n) comparisons, but also building n hashes).
Ordered iterators are preferable here: they may theoretically need more comparisons, but if both sets contain roughly the same series, it's still just O(n) comparisons, with no hashes or space allocations for hashmaps.

That drifted a bit off-topic. Just trying to make the point that the changes have worthwhile effects beyond making it fit a different interface.
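
A rough sketch of such an order-based merge over dev-2.0-style SeriesSets; the interfaces are trimmed to a minimum, the labels import path is as in later Prometheus releases, and the equal-label-set case is simplified rather than chaining iterators:

package remote

import "github.com/prometheus/prometheus/pkg/labels"

// Minimal stand-ins for the dev-2.0 SeriesSet/Series interfaces; see the
// linked storage/interface.go for the real definitions.
type Series interface{ Labels() labels.Labels }

type SeriesSet interface {
	Next() bool
	At() Series
	Err() error
}

// mergeSeriesSet lazily walks two SeriesSets whose series come back in
// label-set order and yields one ordered stream, pairing series by comparing
// label sets rather than hashing fingerprints.
type mergeSeriesSet struct {
	a, b     SeriesSet
	aOK, bOK bool
	cur      Series
}

func newMergeSeriesSet(a, b SeriesSet) *mergeSeriesSet {
	return &mergeSeriesSet{a: a, b: b, aOK: a.Next(), bOK: b.Next()}
}

func (s *mergeSeriesSet) Next() bool {
	switch {
	case !s.aOK && !s.bOK:
		return false
	case !s.bOK:
		s.cur, s.aOK = s.a.At(), s.a.Next()
	case !s.aOK:
		s.cur, s.bOK = s.b.At(), s.b.Next()
	default:
		if c := labels.Compare(s.a.At().Labels(), s.b.At().Labels()); c < 0 {
			s.cur, s.aOK = s.a.At(), s.a.Next()
		} else if c > 0 {
			s.cur, s.bOK = s.b.At(), s.b.Next()
		} else {
			// Equal label sets: a real implementation would chain the two
			// series' sample iterators here; this sketch just keeps a's.
			s.cur = s.a.At()
			s.aOK, s.bOK = s.a.Next(), s.b.Next()
		}
	}
	return true
}

func (s *mergeSeriesSet) At() Series { return s.cur }

func (s *mergeSeriesSet) Err() error {
	if err := s.a.Err(); err != nil {
		return err
	}
	return s.b.Err()
}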

@juliusv (Member Author) commented Mar 16, 2017

@fabxc That interface file looks so beautiful. Thanks.

Just wondering now, of course, how much effort to still put into the 1.x version of remote read. Do you have a hunch about a 2.0 ETA or is it too early to say? Actually, I think it's still important to use 1.x for test-driving a lot of the semantics around remote storage so that we can get them right for 2.0. So I think in 1.x it makes sense to focus on those semantics and config options, but not performance improvements and such.

@fabxc (Contributor) commented Mar 16, 2017

If this PR works and is flagged as experimental anyway, I wouldn't stop it from merging. Especially for the reasons you mentioned.

I've benchmarked querying and it all points towards 2.0 being faster there as well. So theoretically everything is set to go. Just that there's a mysterious deadlock in the write path that I just cannot seem to chase down. That's blocking everything right now basically.
There's also an issue where we see query artefacts that PromQL tests don't expose. Haven't gotten around to investigating yet but probably some minor bug.

Any help would be great of course.

If that is solved, we should be ready for an early alpha. But I wouldn't count on the alpha phase to be short. In particular as we want to revisit all the other stuff we want to break when cutting 2.0.

@juliusv (Member Author) commented Mar 20, 2017

Pushed some changes:

  • within a series, use only those remote storage samples that are before the first local sample
  • make both RemoteReadConfig and RemoteWriteConfig use HTTPClientConfig for configuring their HTTP client options
  • move retrieval.NewHTTPClient to httputil.NewClientFromConfig, adjust
  • fix/unify remote storage timeout and context usage

@juliusv (Member Author) commented Mar 20, 2017

Now working on some tests and stuff if people are happy with the overall picture.

@juliusv (Member Author) commented Mar 20, 2017

Ok, added Fanin tests and fixed a bunch of uncovered bugs. This is all working with a local hacked-up Cortex now that supports remote read.

The main remaining issue right now is that rules still go to remote storage as well, but shouldn't. I'm wondering what to do about that. I could instantiate another promql.Engine in main.go that is only based on the local storage and use that for rules, but the current promql.Engine is kinda meant as a semi-singleton, because it contains the global query concurrency limiter and sets global gauge metrics. That could maybe be changed, but I'm wondering about other ways. The only other ways that come to mind are passing something all the way down to the Fanin querier via a context value (more untyped-bag-of-values madness) or modifying all function signatures in the call stack to have another argument to indicate local-only queries (ugh, no?).

I'm currently leaning towards the context value option, but what do people think?

@brian-brazil (Contributor)

I'm currently leaning towards the context value option, but what do people think?

We'll ultimately want to expose this as a per-query option on the HTTP API, so that's one option.

@juliusv (Member Author) commented Mar 20, 2017

@brian-brazil Ok, I implemented that. It doesn't seem too bad (considering the other options are also not great): 8fda83e
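
For reference, the context-value approach that was discussed would look roughly like the sketch below; the names are hypothetical, and the linked commit is authoritative for what was actually implemented:

package remote

import "context"

// localOnlyKey is an unexported context key type so other packages cannot
// collide with it.
type localOnlyKey struct{}

// WithLocalOnly marks a query's context so that the fan-in querier skips the
// remote read path (e.g. for rule evaluation).
func WithLocalOnly(ctx context.Context) context.Context {
	return context.WithValue(ctx, localOnlyKey{}, true)
}

// LocalOnly reports whether the remote read path should be skipped for the
// query carried by ctx.
func LocalOnly(ctx context.Context) bool {
	v, ok := ctx.Value(localOnlyKey{}).(bool)
	return ok && v
}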

@juliusv changed the title from "[WIP] Remote Read" to "Remote Read" on Mar 21, 2017
@juliusv (Member Author) commented Mar 21, 2017

Removed the [WIP], ready for a final review now.

@tomwilkie (Member)

@juliusv I've had a look and it's a +1 from me. The httpClient changes look good! Let me know if you want me to test it out too.

@brian-brazil (Contributor)

I've worked through the semantics of remote read for the 3 use cases; here's what I've come up with: https://docs.google.com/document/d/188YauRgfF0J4CYMigLsVNN34V_kUwKnApBs2dQMfBbs/edit#

It turns out that the LTS semantics are sufficient to handle the other cases.

@juliusv (Member Author) commented Mar 22, 2017

@brian-brazil Awesome, thanks. Very useful. That all makes sense to me.

How about starting with your 1st and 2nd rules from that doc, but as a follow-up PR?

@brian-brazil (Contributor)

That's fine with me; these should go in before the alpha though, as LTS is unusable without them.

@juliusv (Member Author) commented Mar 22, 2017

Cool. Since I have a +1 from @tomwilkie here and I maintain remote storage, I'll merge this tomorrow if there are no further objections, and then I'll follow up with the external label handling.

@brian-brazil (Contributor)

One thing that just came to mind is that we probably want to add a version number as a header or something. That way any breaking changes will be possible to handle on the other end.

@juliusv (Member Author) commented Mar 24, 2017

@brian-brazil I added an X-Prometheus-Remote-Write-Version and X-Prometheus-Remote-Read-Version header, each with version 0.0.1, similar to how we do it for the scrape protocol.

I also added a Content-Type: application/x-protobuf header, as requested in #2522.
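
For illustration, setting those headers on an outgoing read request could look like this; where exactly the PR sets them is not shown here:

package remote

import "net/http"

// addReadHeaders decorates an outgoing remote read request with the protocol
// version and content type headers described above; the write path sets
// X-Prometheus-Remote-Write-Version analogously.
func addReadHeaders(req *http.Request) {
	req.Header.Set("Content-Type", "application/x-protobuf")
	req.Header.Set("X-Prometheus-Remote-Read-Version", "0.0.1")
}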

@juliusv (Member Author) commented Mar 26, 2017

Merging in 3... 2... 1...

juliusv added a commit to cortexproject/cortex that referenced this pull request Mar 26, 2017
This allows querying Cortex via Prometheus's new generic remote read
protocol.

See prometheus/prometheus#2499

Fixes #226
@juliusv merged commit b5b0e00 into master on Mar 27, 2017
@juliusv deleted the remote-read branch on March 27, 2017 at 12:43
juliusv added a commit to cortexproject/cortex that referenced this pull request Mar 29, 2017
* Add Prometheus remote read support

This allows querying Cortex via Prometheus's new generic remote read
protocol.

See prometheus/prometheus#2499

Fixes #226

* Move ParseProtoRequest to util package

It is now also used by the MergeQuerier.

* More review feedback + cleanups