Expose full API error messages to caller #2137

lrettig · 2020-09-10T09:56:08Z

See my thread with @antonlerner at #2071 (review):

I feel that we should expose only a short, nonspecific error message via the API, and print the full error to the logs (this is how the API is currently implemented). Anton feels that we should expose the full error to the caller, especially for now since we're the only customers of the API.

Should we change this?

avive · 2020-09-10T10:12:45Z

It depends on which api service. Some api services main clients are wallets (is that what you mean by 'we' above?). The error from api services for wallets should be descriptive enough to inform users about what went wrong and the severity. I assume we want to report only critical errors via the api that requires the user to take action. For example, if there's a panic in a node's module and the node requires a restart then we want to display a modal with a short text message that doesn't have to be very detailed and give the user an option to restart the node or quit the app, but the full error technical details will be in the node's logs and will also be reported to us via telemetry for opted-in non-managed and for managed nodes. Also, the design for telemetry also accounts for managed nodes error reporting. See the telemetry smip. So, with this current design we don't rely on the api for full technical error reporting from nodes and the main goal is to inform user that there's a critical error that requires them to take action which is - restart the node or quit the app (when the wallet is configured to use that local managed node).

avive · 2020-09-10T10:53:14Z

So to clarify more, I believe that the Node Service should only report errors to the API which are critical in the sense that the node stopped working properly. e.g. a panic in the p2p module so wallet users can choose to restart (or go check if they have a new node with bug fixes they need to install by updating the app).

2nd client are backup agents and explore / dash backends - they also need to know if the managed node stopped working properly and needs restart.

For both type of clients, client can't do anything to resolve an error error, so it is useless to show them anything that does not require them to decide if to restart, quit or update their node software.

For all other cases such as errors that don't cause modules to stop and the node from functioning properly, there's telemetry and node logs. I can't think of an API client that will be interested in these. The telemetry service can be thought of a kind of an api client to a managed centralized error collecting store API...

lrettig · 2020-09-10T14:15:37Z

Sorry for not being clearer in my description. I think we're talking about two different things here. The ErrorStream endpoint in the NodeService streams all errors and panics as they occur (implementation here). I think this is what you're talking about.

What Anton and I were discussing is what happens when an error occurs inside the API itself, such as a failure finding some requested data. Here is one such example:

go-spacemesh/api/grpcserver/globalstate_service.go

Lines 436 to 440 in ce9d5da

    
           root, err := s.Mesh.GetLayerStateRoot(layer.Layer.Index()) 
        
           if err != nil { 
        
           	log.Error("error retrieving layer data: %s", err) 
        
           	return status.Errorf(codes.Internal, "error retrieving layer data") 
        
           }

avive · 2020-09-25T08:36:19Z

Sorry for not being clearer in my description. I think we're talking about two different things here. The ErrorStream endpoint in the NodeService streams all errors and panics as they occur (implementation here). I think this is what you're talking about.

What Anton and I were discussing is what happens when an error occurs inside the API itself, such as a failure finding some requested data. Here is one such example:

go-spacemesh/api/grpcserver/globalstate_service.go

Lines 436 to 440 in ce9d5da

root, err := s.Mesh.GetLayerStateRoot(layer.Layer.Index())

if err != nil {

log.Error("error retrieving layer data: %s", err)

return status.Errorf(codes.Internal, "error retrieving layer data")

}

i guess we can have short error messages and error code that client will display and a field w more tech details which is what goes to the logs so integration tests can output more detailed errors. The expanded tech details are not interesting for non technical users who use apps with data from the api.

lrettig · 2020-09-25T21:34:22Z

we can have short error messages and error code that client will display

agree, with one caveat: right now we're using this list of standard codes so there's a limit to how specific we can be with them. If necessary, we can fork, wrap, or otherwise extend this list.

moshababo · 2022-06-26T12:29:36Z

@lrettig still relevant?

lrettig · 2022-06-27T21:20:06Z

No, we don't want to change this behavior

lrettig added the API label Sep 10, 2020

lrettig mentioned this issue Sep 10, 2020

[Merged by Bors] - API phase 9: finish implementing grpc services #2071

Closed

lrettig closed this as completed Jun 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose full API error messages to caller #2137

Expose full API error messages to caller #2137

lrettig commented Sep 10, 2020

avive commented Sep 10, 2020 •

edited

avive commented Sep 10, 2020

lrettig commented Sep 10, 2020

avive commented Sep 25, 2020

lrettig commented Sep 25, 2020

moshababo commented Jun 26, 2022

lrettig commented Jun 27, 2022

Expose full API error messages to caller #2137

Expose full API error messages to caller #2137

Comments

lrettig commented Sep 10, 2020

avive commented Sep 10, 2020 • edited

avive commented Sep 10, 2020

lrettig commented Sep 10, 2020

avive commented Sep 25, 2020

lrettig commented Sep 25, 2020

moshababo commented Jun 26, 2022

lrettig commented Jun 27, 2022

avive commented Sep 10, 2020 •

edited