Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose full API error messages to caller #2137

Closed
lrettig opened this issue Sep 10, 2020 · 7 comments
Closed

Expose full API error messages to caller #2137

lrettig opened this issue Sep 10, 2020 · 7 comments

Comments

@lrettig
Copy link
Member

lrettig commented Sep 10, 2020

See my thread with @antonlerner at #2071 (review):

I feel that we should expose only a short, nonspecific error message via the API, and print the full error to the logs (this is how the API is currently implemented). Anton feels that we should expose the full error to the caller, especially for now since we're the only customers of the API.

Should we change this?

@avive
Copy link
Contributor

avive commented Sep 10, 2020

It depends on which api service. Some api services main clients are wallets (is that what you mean by 'we' above?). The error from api services for wallets should be descriptive enough to inform users about what went wrong and the severity. I assume we want to report only critical errors via the api that requires the user to take action. For example, if there's a panic in a node's module and the node requires a restart then we want to display a modal with a short text message that doesn't have to be very detailed and give the user an option to restart the node or quit the app, but the full error technical details will be in the node's logs and will also be reported to us via telemetry for opted-in non-managed and for managed nodes. Also, the design for telemetry also accounts for managed nodes error reporting. See the telemetry smip. So, with this current design we don't rely on the api for full technical error reporting from nodes and the main goal is to inform user that there's a critical error that requires them to take action which is - restart the node or quit the app (when the wallet is configured to use that local managed node).

@avive
Copy link
Contributor

avive commented Sep 10, 2020

So to clarify more, I believe that the Node Service should only report errors to the API which are critical in the sense that the node stopped working properly. e.g. a panic in the p2p module so wallet users can choose to restart (or go check if they have a new node with bug fixes they need to install by updating the app).

2nd client are backup agents and explore / dash backends - they also need to know if the managed node stopped working properly and needs restart.

For both type of clients, client can't do anything to resolve an error error, so it is useless to show them anything that does not require them to decide if to restart, quit or update their node software.

For all other cases such as errors that don't cause modules to stop and the node from functioning properly, there's telemetry and node logs. I can't think of an API client that will be interested in these. The telemetry service can be thought of a kind of an api client to a managed centralized error collecting store API...

@lrettig
Copy link
Member Author

lrettig commented Sep 10, 2020

Sorry for not being clearer in my description. I think we're talking about two different things here. The ErrorStream endpoint in the NodeService streams all errors and panics as they occur (implementation here). I think this is what you're talking about.

What Anton and I were discussing is what happens when an error occurs inside the API itself, such as a failure finding some requested data. Here is one such example:

root, err := s.Mesh.GetLayerStateRoot(layer.Layer.Index())
if err != nil {
log.Error("error retrieving layer data: %s", err)
return status.Errorf(codes.Internal, "error retrieving layer data")
}

@avive
Copy link
Contributor

avive commented Sep 25, 2020

Sorry for not being clearer in my description. I think we're talking about two different things here. The ErrorStream endpoint in the NodeService streams all errors and panics as they occur (implementation here). I think this is what you're talking about.

What Anton and I were discussing is what happens when an error occurs inside the API itself, such as a failure finding some requested data. Here is one such example:

root, err := s.Mesh.GetLayerStateRoot(layer.Layer.Index())
if err != nil {
log.Error("error retrieving layer data: %s", err)
return status.Errorf(codes.Internal, "error retrieving layer data")
}

i guess we can have short error messages and error code that client will display and a field w more tech details which is what goes to the logs so integration tests can output more detailed errors. The expanded tech details are not interesting for non technical users who use apps with data from the api.

@lrettig
Copy link
Member Author

lrettig commented Sep 25, 2020

we can have short error messages and error code that client will display

agree, with one caveat: right now we're using this list of standard codes so there's a limit to how specific we can be with them. If necessary, we can fork, wrap, or otherwise extend this list.

@moshababo
Copy link
Contributor

@lrettig still relevant?

@lrettig
Copy link
Member Author

lrettig commented Jun 27, 2022

No, we don't want to change this behavior

@lrettig lrettig closed this as completed Jun 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants