Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using Select() for both sample+metadata queries is unsuitable for remote storage implementations #4057

Open
juliusv opened this issue Apr 6, 2018 · 8 comments

Comments

@juliusv
Copy link
Member

juliusv commented Apr 6, 2018

The storage.Querier interface has a Select() method which can be used to retrieve either only metadata or bulk sample data of series. In the local storage case, this works nicely, as sample data is loaded lazily when accessed, and thus metadata-only queries never over-fetch sample data. This breaks down for remote storage implementations of the same interface, where we cannot afford to lazily fetch sample data only as it's accessed (that would require more remote round trips). We need some way of knowing beforehand (when we call Select()) whether a query is metadata-only or whether it actually needs sample data. Either a separate method or query parameter would work.

Context: Cortex reuses Prometheus's web API and PromQL packages to offer comparable functionality based on a different storage engine. In the past, we were able to solve this issue because the Prometheus web API packages allowed injecting two different queriers: one that would be used as part of the PromQL engine (the full sample querier) and one that would be used for metadata queries. This got broken in 7ccd4b3#diff-d81f5cda89ea7b129ba708b586c2bc83L132, due to a PromQL engine restructuring (the querier is not part of the engine anymore, it gets passed in for every query, and the web API only knows about one querier now, because we cannot encapsulate the other one in the engine anymore).

The issue about this on the Cortex side is cortexproject/cortex#787

@juliusv
Copy link
Member Author

juliusv commented Apr 6, 2018

I already mentioned the option of adding a separate querier method or Select() parameter.

The other option would be to modify the web API package to allow explicitly passing in two different Queryables, one that's used only for metadata queries and another that's used for full (PromQL and remote read) queries.

@jacksontj
Copy link
Contributor

jacksontj commented Jun 6, 2018

I have this same issue for promxy. I could also live with a flag in the SelectParams but that also seems a bit hacky. To me it seems like having a second interface that you could optionally implement (something like MetadataQuerier) which would be called for just metadata queries would be clean.

jacksontj added a commit to jacksontj/promxy that referenced this issue Jul 11, 2018
jacksontj added a commit to jacksontj/promxy that referenced this issue Aug 15, 2018
jacksontj added a commit to jacksontj/promxy that referenced this issue Aug 15, 2018
jacksontj added a commit to jacksontj/promxy that referenced this issue Aug 15, 2018
jacksontj added a commit to jacksontj/prometheus that referenced this issue Aug 27, 2018
When prom2 came out the storage querier interface consolidated to a
single Select() method. While doing this it makes it impossible as the
implementer of the querier to know if you are being called for metadata
or actual data. The workaround has been to check if the SelectParams are
nil, which the federation call is always nil. This has 2 negative
consequences (1) remote implementations interpret this as a metadata
call, which makes the federation endpoint return nothing. (2) this means
that the storage implementations don't get the same information passed
down to them as far as SelectParams goes.

This diff simply adds SelectParams to the Select() call in the
federation handler

Mitigation for prometheus#4057

Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>
jacksontj added a commit to jacksontj/prometheus that referenced this issue Aug 27, 2018
When prom2 came out the storage querier interface consolidated to a
single Select() method. While doing this it makes it impossible as the
implementer of the querier to know if you are being called for metadata
or actual data. The workaround has been to check if the SelectParams are
nil, which the federation call is always nil. This has 2 negative
consequences (1) remote implementations interpret this as a metadata
call, which makes the federation endpoint return nothing. (2) this means
that the storage implementations don't get the same information passed
down to them as far as SelectParams goes.

This diff simply adds SelectParams to the Select() call in the
federation handler

Mitigation for prometheus#4057

Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>
brian-brazil pushed a commit that referenced this issue Aug 28, 2018
When prom2 came out the storage querier interface consolidated to a
single Select() method. While doing this it makes it impossible as the
implementer of the querier to know if you are being called for metadata
or actual data. The workaround has been to check if the SelectParams are
nil, which the federation call is always nil. This has 2 negative
consequences (1) remote implementations interpret this as a metadata
call, which makes the federation endpoint return nothing. (2) this means
that the storage implementations don't get the same information passed
down to them as far as SelectParams goes.

This diff simply adds SelectParams to the Select() call in the
federation handler

Mitigation for #4057

Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>
@krasi-georgiev
Copy link
Contributor

krasi-georgiev commented Nov 9, 2018

@juliusv does anything need to be done on the tsdb side?
If not would you mind removing the local storage label so that it doesn't appear in my queue 😜

@juliusv
Copy link
Member Author

juliusv commented Nov 9, 2018

@krasi-georgiev I changed the labels from local to remote storage.

@krasi-georgiev
Copy link
Contributor

thanks

@beorn7
Copy link
Member

beorn7 commented Feb 27, 2024

Hello from the bug scrub.

Is this still relevant? We assume not and will close this. Please re-open if we are mistaken.

@beorn7 beorn7 closed this as completed Feb 27, 2024
@jacksontj
Copy link
Contributor

This is still relevant and still not resolved. There are some workarounds in place (heuristics based around hints being passed around) but AFAIK no specific API contract for this.

Separate from this specific issue; but as a point of feedback for interfacing with community issues; when an issue hasn't been commented on in ~6 years I think its fair to give some time (literally any time) for people to respond before closing out the issue.

@beorn7
Copy link
Member

beorn7 commented Mar 1, 2024

As said, closing an issue is a reversible action, and anyone can still comment on it. Your concern would be valid if we locked those issues, but we don't. We have almost 800 open issues and a severe lack of qualified people to address them, so we have to apply some scrutiny which bugs to keep open to at least address the most relevant ones.

Re-opening this and PRs welcome to address the issue described above.

@beorn7 beorn7 reopened this Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants