Skip to content

Commit

Permalink
add basc adr related to error handling
Browse files Browse the repository at this point in the history
Signed-off-by: R.I.Pienaar <rip@devco.net>
  • Loading branch information
ripienaar committed May 12, 2021
1 parent e8ea7b5 commit fd2181d
Show file tree
Hide file tree
Showing 3 changed files with 160 additions and 35 deletions.
2 changes: 1 addition & 1 deletion doc/README.md
Expand Up @@ -18,4 +18,4 @@ good starting point. We do not have a fixed format at present.

## ADR Statuses

Each ADR has a status, let's try to use `Proposed`, `Approved` and `Rejected`.
Each ADR has a status, let's try to use `Proposed`, `Approved` `Partially Implemented`, `Implemented` and `Rejected`.
51 changes: 17 additions & 34 deletions doc/adr/0001-jetstream-json-api-design.md
Expand Up @@ -7,25 +7,10 @@ Author: @ripienaar

Partially Implemented

The JSON API has been updated, however the urlencoded replies have not been adopted.

## Context

At present, the API encoding consists of mixed text and JSON, we should improve consistency and error handling.

## Decision

We anticipate many types of consumer for JetStream including very small devices via MQTT and those written in
languages without easy to use JSON support like C.

We will make it easy for light-weight clients by using `+OK` and `-ERR` style prefixes for client interactions,
but the server management API will be JSON all the time. The rationale being that such small devices will most
likely have their Consumers and Streams provisioned as part of the platform and often might not care for the
additional context.

In client responses we will use URL Encoding to provide additional context so multiple fields worth of
context can be provided. URL Encoding has [broad language support](https://www.rosettacode.org/wiki/URL_decoding).

### Admin APIs

#### Requests
Expand Down Expand Up @@ -72,14 +57,15 @@ Successful Stream Info:
}
```

Stream Info Error:
Consumer Info Error:

```json
{
"type": "io.nats.jetstream.api.v1.stream_info",
"type": "io.nats.jetstream.api.v1.consumer_info",
"error": {
"description": "unknown stream bob",
"code": 400
"description": "consumer not found",
"code": 404,
"error_code": 10059
}
}
```
Expand All @@ -88,7 +74,12 @@ Here we have a minimally correct response with the additional error object.

In the `error` struct we have `description` as a short human friendly explanation that should include enough context to
identify what Stream or Consumer acted on and whatever else we feel will help the user while not sharing privileged account
information. `code` will be HTTP like, 400 human error, 500 server error etc.
information. These strings are not part of the API promises, we can update and re-word or translate them at any time. Programmatic
error handling should look at the `code` which will be HTTP like, 4xx human error, 5xx server error etc. Finally, the `error_code`
indicates the specific reason for the 404 - here `10059` means the stream did not exist, helping developers identify the
real underlying cause.

More information about the `error_code` system can be found in [ADR-7](0007-error-codes.md).

Ideally the error response includes a minimally valid body of what was requested but this can be very hard to implement correctly.

Expand All @@ -106,17 +97,6 @@ Today the list API's just return `["ORDERS"]`, these will become:

With the same `error` treatment when some error happens.

### Client Interactions

To keep clients on small devices viable we will standardise our responses as in the examples below.

* `+OK` everything is fine no additional context
* `+OK "stream=ORDERS&sequence=10"` everything is fine, we have additional data to pass using standard URL encoding
* `-ERR` something failed, we have no reason
* `-ERR "reason=reason%20for%20failure&stream=STREAM"` something failed, we have additional context to provide using standard URL encoding

Additional information will be needed for example when there are overlapping Streams and one of them fail.

## Implementation

While implementing this in JetStream the following pattern emerged:
Expand All @@ -128,8 +108,11 @@ type JSApiResponse struct {
}

type ApiError struct {
Code int `json:"code"`
Description string `json:"description,omitempty"`
Code int `json:"code"`
ErrCode int `json:"err_code,omitempty"`
Description string `json:"description,omitempty"`
URL string `json:"-"`
Help string `json:"-"`
}

type JSApiConsumerCreateResponse struct {
Expand Down Expand Up @@ -169,4 +152,4 @@ Validating this in JSON Schema draft 7 is a bit awkward, not impossible and spec

## Consequences

URL Encoding does not carry data types and the response fields will need documenting.
URL Encoding does not carry data types, and the response fields will need documenting.
142 changes: 142 additions & 0 deletions doc/adr/0007-error-codes.md
@@ -0,0 +1,142 @@
# 7. NATS Server Error Codes

Date: 2021-05-12
Author: @ripienaar

## Status

Partially Implemented in [#1811](https://github.com/nats-io/nats-server/issues/1811)

The current focus is JetStream APIs, we will as a followup do a refactor and generalization and move onto other
areas of the server.

## Context

When a developer performs a Consumer Info API request she might get a 404 error, there is no way to know if this is
a 404 due to the Stream not existing or the Consumer not existing. The only way is to parse the returned error description
like `consumer not found`. Further several other error situations might arise which due to our code design would be surfaced
as a 404 when in fact it's more like a 5xx error - I/O errors and such.

If users are parsing our error strings it means our error text form part of the Public API, we can never improve errors,
fix spelling errors or translate errors into other languages.

This ADR describes an additional `error_code` that provides deeper context into the underlying cause of the 404.

## Design

We will adopt a numbering system for our errors where every error has a unique number within a range that indicates the
subsystem it belongs to.

|Range|Description|
|-----|-----------|
|1xxxx|JetStream related errors|
|2xxxx|MQTT related errors|

The JetStream API error will be adjusted like this initially with later work turning this into a more generic error
usable in other parts of the NATS Server code base.

```go
// ApiError is included in all responses if there was an error.
type ApiError struct {
Code int `json:"code"`
ErrCode int `json:"err_code,omitempty"`
Description string `json:"description,omitempty"`
URL string `json:"-"`
Help string `json:"-"`
}
```

Here the `code` and `error_code` is what we'll consider part of the Public API with `description` being specifically
out of scope for SemVer protection and changes to these will not be considered a breaking change.

The `ApiError` type will implement `error` and whenever it will be logged will append the code to the log line, for example:

```
stream not found (10059)
```

The `nats` CLI will have a lookup system like `nats error 10059` that will show details of this error including help,
urls and such. It will also assist in listing and searching errors. The same could be surfaced later in documentation
and other areas.

## Using in code

### Raising an error

Here we raise a `stream not found` error without providing any additional context to the user, the constant is
`JSStreamNotFoundErr` from which you can guess it takes no printf style interpolations vs one that does which would
end in `...ErrF`:

The go doc for this constant would also include the content of the error to assist via intellisense in your IDE.

```go
err = doThing()
if err != nil {
return ApiErrors[JSStreamNotFoundErr]
}
```

If we have to do `printf` style interpolation of the error body, here the `JSStreamRestoreErrF` has the body
`"restore failed: %s"`, the `NewF()` will simply use `Sprintf()` to create a new `ApiError` with the full string:

```go
err = doRestore()
if err != nil {
return ApiErrors[JSStreamRestoreErrF].NewF(err)
}
```

If we had to handle an error that may be an `ApiError` or a traditional go error we can use the `ErrOrNew` function,
this will look at the result from `lookupConsumer()`, if it's an `ApiError` that error will be set else a `JSConsumerNotFoundErr`
will be created. Essentially the `lookupConsumer()` would return a `JSStreamNotFoundErr` if the stream does not exist else
a `JSConsumerNotFoundErr` or go error on I/O failure for example.

```go
var resp = JSApiConsumerCreateResponse{ApiResponse: ApiResponse{Type: JSApiStreamCreateResponseType}}

_, err = lookupConsumer(stream, consumer)
if err != nil {
resp.Error = ApiErrors[JSConsumerNotFoundErr].ErrOrNew(err)
}
```

### Testing Errors

Should you need to determine if a error is of a specific kind (error code) this can be done using the `IsNatsErr()` function:

```go
err = doThing()
if IsNatsErr(err, JSStreamNotFoundErr, JSConsumerNotFoundErr) {
// create the stream and consumer
} else if err !=nil{
// other critical failure
}
```

## Maintaining the errors

The file `errors.json` holds the data used to generate the error constants, lists etc

```json
[
{
"constant": "JSClusterPeerNotMemberErr",
"code": 400,
"error_code": 10040,
"description": "peer not a member"
},
{
"constant": "JSNotEnabledErr",
"code": 503,
"error_code": 10039,
"description": "JetStream not enabled for account",
"help": "This error indicates that JetStream is not enabled either at a global level or at global and account level",
"url": "https://docs.nats.io/jetstream"
}
]
```

After editing this file run `go generate` in the top of the `nats-server` repo, and it will update the needed files. Check
in the result.

When run this will verify that the `error_code` and `constant` is unique in each error

0 comments on commit fd2181d

Please sign in to comment.