
Allow mechanism for propagating internal errors from the storage #2571

Closed
tomwilkie opened this Issue Apr 4, 2017 · 9 comments


tomwilkie commented Apr 4, 2017

Currently, any error from storage is returned to the user as a 422 (see #1213 for the discussion). The argument for this seems predicated on the idea that "the storage interface doesn't report errors", but that's not the case anymore.

I propose we allow the storage layer to propagate errors by exposing the apiErr type, and adding a case to https://github.com/prometheus/prometheus/blob/master/web/api/v1/api.go#L192 that passes apiErrs straight through. That way, the storage can return an apiErr for internal errors.

We may need to move apiErr to a new package to prevent circular imports.

WDYT?
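
For concreteness, a minimal sketch of that pass-through, assuming an exported apiError type (the names and fields here are stand-ins for the unexported type in web/api/v1, not the actual code):

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// apiError stands in for the unexported error type in web/api/v1; the
// proposal is to export it (or move it to a shared package) so a storage
// implementation can construct one directly.
type apiError struct {
	code int
	err  error
}

func (e *apiError) Error() string { return e.err.Error() }

// respondStatus mimics the API handler: an apiError coming out of query
// execution is passed through as-is, everything else keeps today's 422.
func respondStatus(err error) int {
	if e, ok := err.(*apiError); ok {
		return e.code
	}
	return http.StatusUnprocessableEntity
}

func main() {
	internal := &apiError{code: http.StatusInternalServerError, err: errors.New("chunk store unavailable")}
	fmt.Println(respondStatus(internal))                      // 500
	fmt.Println(respondStatus(errors.New("bad query range"))) // 422
}
```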


juliusv commented Apr 4, 2017

I think the storage layer should have no notion of any API using it (and thus of the API's error types) and should instead define its own error types that can be interpreted by users of the storage layer. These errors should probably live in a relatively non-implementation-specific package (maybe the existing github.com/prometheus/prometheus/storage - or a new one) and should be kept general enough to work for different storage implementations, so that whoever interprets these errors doesn't have to know what exact storage implementation they are dealing with.
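
As a rough illustration of that idea, a sketch of backend-agnostic error types owned by the storage package (the type names are hypothetical, not actual Prometheus code):

```go
// Backend-agnostic error types that any storage implementation could return
// and any consumer could interpret without knowing which backend it talks to.
package storage

import "fmt"

// InternalError marks a failure inside the storage itself (I/O trouble,
// backend unavailability); consumers would surface it as a server error.
type InternalError struct{ Err error }

func (e InternalError) Error() string {
	return fmt.Sprintf("internal storage error: %v", e.Err)
}

// BadDataError marks a request the caller got wrong (e.g. an out-of-order
// sample); consumers would surface it as a client error.
type BadDataError struct{ Err error }

func (e BadDataError) Error() string {
	return fmt.Sprintf("invalid data: %v", e.Err)
}
```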


brian-brazil commented Apr 4, 2017

Do we currently need to know anything more than "the storage reported an error"?


juliusv commented Apr 4, 2017

@brian-brazil Yes, some errors are the user's fault, at least upon ingestion:

  • "sample timestamp out of order"
  • "sample with repeated timestamp but different value"

Cortex has a few more of those related to rate-limiting/throttling. Currently they are all on the ingestion side, but there will also be query-related ones.
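
For illustration only, a sketch of how such user-caused ingestion errors could be exposed as sentinel values that a serving layer maps to 4xx responses, with everything else staying a 500 (the error names are hypothetical, not actual Prometheus or Cortex identifiers):

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

var (
	// Hypothetical sentinel errors for the user-fault cases listed above,
	// plus a Cortex-style rate-limit rejection.
	ErrOutOfOrderSample     = errors.New("sample timestamp out of order")
	ErrDuplicateSample      = errors.New("sample with repeated timestamp but different value")
	ErrIngestionRateLimited = errors.New("ingestion rate limit exceeded")
)

// statusForIngestError maps user-caused ingestion failures to 4xx codes
// and treats everything else as an internal error.
func statusForIngestError(err error) int {
	switch err {
	case ErrOutOfOrderSample, ErrDuplicateSample:
		return http.StatusBadRequest
	case ErrIngestionRateLimited:
		return http.StatusTooManyRequests
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	fmt.Println(statusForIngestError(ErrOutOfOrderSample))           // 400
	fmt.Println(statusForIngestError(errors.New("disk unwritable"))) // 500
}
```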


juliusv commented Apr 4, 2017

(I guess the rate-limiting / throttling could be wrapped around everything though)


tomwilkie commented Apr 4, 2017

I'm not particularly worried about the ingestion side, as we implement it end-to-end in Cortex. It's really only important for the query side, where the promql library sits in the middle. I'm happy to add an "InternalStorageError" type in the storage package that gets interpreted as a 500 by the query layer, if you like?


juliusv commented Apr 4, 2017

Hm yeah, if we're happy with just treating any error from the storage as an internal error for now, then this can indeed all be done in the API handling code.


tomwilkie commented Apr 4, 2017

AFAICT the query.Exec method can still return non-storage-related errors (hence the original error handling), so a special error type is still needed to differentiate storage errors (which should be 500s) from non-storage errors (which are currently 422s).


juliusv commented Apr 5, 2017

Ah yes, query.Exec would need to be changed to let callers know what's an internal vs. user error. The error types for that should be defined in the promql package I think.
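
A hedged sketch of what such promql-owned error types might look like (ErrStorage is a hypothetical name here, not necessarily what was merged in #2572):

```go
// Error types owned by promql so callers of Exec can tell an internal
// storage failure apart from a user error.
package promql

// ErrStorage wraps any error bubbling up from the storage layer during
// evaluation; the web API would map it to a 500.
type ErrStorage struct{ Err error }

func (e ErrStorage) Error() string { return e.Err.Error() }

// Anything else returned from Exec (parse errors, plus timeouts and
// cancellations via their existing types) would keep being treated as a
// user-facing error.
```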

juliusv closed this in #2572 on Apr 6, 2017

