Initial EEP68 - JSON
michalmuskala committed Feb 12, 2024
1 parent d1b6618 commit 11358df
Showing 1 changed file with 369 additions and 0 deletions.
369 changes: 369 additions & 0 deletions eeps/eep-0068.md
@@ -0,0 +1,369 @@
Author: Michał Muskała <micmus(at)whatsapp(dot)com>
Status: Draft
Type: Standards Track
Created: 12-02-2024
Erlang-Version:
Post-History:
****
# EEP 68: JSON library

----

## Abstract

This EEP proposes introducing a module `json` to the Erlang standard
library with support for encoding and decoding [JSON][1] documents
from and to Erlang data structures. The main motivation is to close
a gap in the Erlang standard library, which lacks support for this
hugely popular and widespread data format.

## Rationale

JSON is commonly used in many different use-cases:

* by web services as a lightweight and human-readable data interchange format;
* as a configuration language in static files;
* as a data interchange format by developer tooling;
* and more.

There are many existing JSON libraries for Erlang and other BEAM languages;
however, adding such support to the standard library would offer unique
benefits, most notably the ability to use it in situations where leveraging
third-party libraries is complex or cumbersome, such as stand-alone escripts,
fundamental tooling like a build system, or OTP itself.

There have been previous attempts to bring JSON support into OTP, most notably
[EEP 18][EEP], but for various reasons they were ultimately not adopted.
However, I believe the time is right to revisit this subject with a fresh
take on the interface such support could have.

JSON is a well-defined format specified in parallel in [RFC 8259][RFC] and
[ECMA 404][ECMA]; however, how this representation should be translated
into Erlang is not fully clear, since the data structures don't have
a direct, 1:1 mapping. To help with this, this EEP proposes an interface
that offers both a convenient, simple, "canonical" API and an extensible,
highly customisable API, with a common underlying implementation.

This EEP proposes a JSON library which:

* should be easy to adopt in large codebases using one of the popular,
  existing, open-source JSON libraries;
* will allow the existing open-source libraries with custom features
  (like support for Elixir protocols) to become thin wrappers around
  this library;
* will improve, or at least not regress, performance compared to
  leading open-source JSON libraries.

The proposed JSON library will provide:

* JSON encoding, allowing for single-pass encoding of custom data types --
  in particular, for Elixir, integrating with a protocol through a thin layer
  (implemented outside of OTP);
* JSON decoding with some streaming support, allowing messages that
  don't fully fit into memory to be decoded;
* JSON decoding with support for decoding values split across separate
  messages without fully concatenating them upfront;
* focus on high-performance encoding and decoding;
* full conformance to the [RFC 8259][RFC] and [ECMA 404][ECMA] standards
  (the decoder should pass the entire [JSONTestSuite][JSONTestSuite]);
* simple API for common use-cases with canonical data type mapping.

## Design choices

### Data mapping

In the "canonical" API, we propose to map JSON data structures to
Erlang and back in the following way:

| **Decoding from JSON** | **Erlang** | **Encoding into JSON** |
|------------------------|----------------------|------------------------|
| Number | integer() \| float() | Number |
| Boolean | true \| false | Boolean |
| Null | null | Null |
| String | binary() | String |
| | atom() | String |
| Array | list() | Array |
| Object | #{binary() => _} | Object |
| | #{atom() => _} | Object |
| | #{integer() => _} | Object |

Erlang generally has a richer set of data types than JSON, so more
Erlang types can be encoded into JSON than the decoder can ever
produce directly.

However, with the flexible API, as demonstrated below, the user will
be able to customize the decoding & encoding routines to produce and
consume any Erlang term as necessary in the particular application.
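
The asymmetry in the table can be seen in a round trip. The following is a minimal sketch, assuming the `json` module behaves as proposed in this EEP; the module name `json_mapping_sketch` and the function `roundtrip/0` are hypothetical and exist only for illustration.

```erlang
-module(json_mapping_sketch).
-export([roundtrip/0]).

%% Encode a term that uses atoms as keys and values, then decode it back.
%% Assumes json:encode/1 and json:decode/1 as proposed in this EEP.
roundtrip() ->
    Term = #{name => erlang, tags => [otp, <<"beam">>], released => 1986},
    %% encode/1 returns iodata(), so flatten it before decoding.
    Json = iolist_to_binary(json:encode(Term)),
    json:decode(Json).
```

Under the mapping above, atoms used as keys or values come back as binaries, so `roundtrip/0` would return a map with keys such as `<<"name">>` rather than the original term.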

### Streaming vs value-based parser

When it comes to parsers for structured data, it's common to encounter two
types: ones that, given the data, produce a complete parsed value,
and others that, given the same data, produce a stream of events that can
later be processed to extract values.

The first kind, which we'll call value-based here, is generally simpler,
usually more efficient, and more convenient to use. The second one offers
unique advantages in specific use-cases: for example, where data
can't fully fit into memory.

For the proposed `json` library this EEP suggests a hybrid approach.

First, a simple, value-based API:

```erlang
-type value() ::
    integer() |
    float() |
    boolean() |
    null |
    binary() |
    list(value()) |
    #{binary() => value()}.

-spec decode(binary()) -> value().
```

Error handling is achieved through exceptions. The following errors
are possible:
```erlang
-type error() ::
    unexpected_end |
    {unexpected_sequence, binary()} |
    {invalid_byte, byte()}.
```

The exceptions might be enhanced through the [Error Info][ERRINFO] mechanism
with additional metadata, such as the byte offset where the error occurred.
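
Callers that prefer tagged tuples over exceptions can wrap the simple API. The following is a minimal sketch, assuming the three error reasons listed above are raised with the `error` exception class; the module `json_error_sketch` and the function `safe_decode/1` are hypothetical names, not part of the proposal.

```erlang
-module(json_error_sketch).
-export([safe_decode/1]).

%% Turn the exceptions of the proposed json:decode/1 into tagged tuples.
%% Assumption: the decoder raises its errors with class `error`.
safe_decode(Bin) when is_binary(Bin) ->
    try json:decode(Bin) of
        Value -> {ok, Value}
    catch
        error:unexpected_end ->
            {error, unexpected_end};
        error:{unexpected_sequence, Seq} ->
            {error, {unexpected_sequence, Seq}};
        error:{invalid_byte, Byte} ->
            {error, {invalid_byte, Byte}}
    end.
```

Any other exception is deliberately left uncaught, so programming errors such as passing a non-binary argument still crash the caller.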

For the advanced and customizable API, this EEP proposes a callback-based
API that the decoder will use to produce values from the data it parses.

```erlang
-type from_binary_fun() :: fun((binary()) -> dynamic()).
-type array_start_fun() :: fun((Acc :: dynamic()) -> ArrayAcc :: dynamic()).
-type array_push_fun() :: fun((Value :: dynamic(), Acc :: dynamic()) -> NewAcc :: dynamic()).
-type array_finish_fun() :: fun((ArrayAcc :: dynamic()) -> dynamic()).
-type object_start_fun() :: fun((Acc :: dynamic()) -> ObjectAcc :: dynamic()).
-type object_push_fun() :: fun((Key :: dynamic(), Value :: dynamic(), Acc :: dynamic()) -> NewAcc :: dynamic()).
-type object_finish_fun() :: fun((ObjectAcc :: dynamic()) -> dynamic()).

-type decoders() :: #{
    empty_array => term(),
    array_start => array_start_fun(),
    array_push => array_push_fun(),
    array_finish => array_finish_fun(),
    empty_object => term(),
    object_start => object_start_fun(),
    object_push => object_push_fun(),
    object_finish => object_finish_fun(),
    float => from_binary_fun(),
    integer => from_binary_fun(),
    string => from_binary_fun(),
    null => term()
}.

-spec decode(binary(), Acc :: dynamic(), decoders()) ->
    {Value :: dynamic(), FinalAcc :: dynamic(), Rest :: binary()}.
```

This allows the user to fully customize the decoded format, including
features seen in open-source JSON libraries:
* decoding string keys as atoms;
* decoding objects as lists of pairs;
* decoding floats as custom structures with decimal precision;
* decoding `null` as another atom, in particular `undefined` or `nil`;
* using `binary:copy/1` on strings that will be retained in memory;
* decoding multiple JSON messages from a single binary blob;
* and more.
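
Two of the customisations listed above can be sketched with the `null` and `string` entries of `decoders()`. This is an illustration under the assumption that `decode/3` behaves as specified in this EEP; the module `json_decoders_sketch`, the function `decode_custom/1`, and the placeholder accumulator `ok` are hypothetical.

```erlang
-module(json_decoders_sketch).
-export([decode_custom/1]).

%% Decode JSON null as `undefined` and copy every string so that decoded
%% values do not keep the (possibly large) input binary alive.
decode_custom(Bin) when is_binary(Bin) ->
    Decoders = #{
        null => undefined,
        string => fun binary:copy/1
    },
    %% The accumulator is not used by these callbacks; `ok` is a placeholder.
    {Value, _Acc, Rest} = json:decode(Bin, ok, Decoders),
    {Value, Rest}.
```

Returning `Rest` alongside the value lets the caller keep decoding further messages from the same binary.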

Furthermore, this allows the user to retain only parts of the data structure,
achieving results similar to a streaming SAX-like parser for data
that doesn't fully fit into memory.

All the callbacks are optional; each defaults to a value corresponding to the
"simple" API behaviour, using lists as accumulators.

### Incomplete data parsing

We propose a future enhancement to the full `decode/3` API, where
it can return an `{incomplete, continuation()}` value that can be used to
decode values split across multiple binary blobs (for example as received
from a TCP socket).

```erlang
-spec decode_continue(binary(), continuation()) ->
    {Value :: dynamic(), FinalAcc :: dynamic(), Rest :: binary()} |
    {incomplete, continuation()}.
```
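
A caller would typically drive such an API in a loop, feeding chunks until a value is complete. The sketch below is not tied to any implementation: it takes the step function as an argument, on the assumption that it has the shape of the proposed `decode_continue/2`. The module `json_feed_sketch` and the function `feed/3` are hypothetical.

```erlang
-module(json_feed_sketch).
-export([feed/3]).

%% Feed chunks to a step function shaped like the proposed
%% decode_continue/2 until it yields a complete value.
-spec feed(Step, Cont, [binary()]) -> Result when
      Step :: fun((binary(), Cont) -> {incomplete, Cont} | Result),
      Cont :: term(),
      Result :: {Value :: term(), Acc :: term(), Rest :: binary()}.
feed(_Step, _Cont, []) ->
    %% Ran out of chunks before the value was complete.
    error(unexpected_end);
feed(Step, Cont, [Chunk | Chunks]) ->
    case Step(Chunk, Cont) of
        {incomplete, Cont1} -> feed(Step, Cont1, Chunks);
        {_Value, _Acc, _Rest} = Done -> Done
    end.
```

With the proposed API, the initial continuation would come from a `decode/3` call on the first chunk, and `Step` would be `fun json:decode_continue/2`.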

### Encoding API

For encoding, this EEP again proposes two separate sets of APIs.

A simple API using "canonical" data types:

```erlang
-type encode_value() ::
    integer() |
    float() |
    boolean() |
    null |
    binary() |
    atom() |
    list(encode_value()) |
    #{binary() | atom() | integer() => encode_value()}.

-spec encode(encode_value()) -> iodata().
```

And an advanced, callback-based API allowing for single-pass encoding
of custom data structures. This API is accompanied by a set of functions
facilitating the implementation of custom encoding callbacks.

```erlang
-type encoder() :: fun((dynamic(), encoder()) -> iodata()).
-spec encode(dynamic(), encoder()) -> iodata().
-spec encode_value(dynamic(), encoder()) -> iodata().
-spec encode_atom(atom(), encoder()) -> iodata().
-spec encode_integer(integer()) -> iodata().
-spec encode_float(float()) -> iodata().
-spec encode_list(list(), encoder()) -> iodata().
-spec encode_map(map(), encoder()) -> iodata().
-spec encode_map_checked(map(), encoder()) -> iodata().
-spec encode_key_value_list([{dynamic(), dynamic()}], encoder()) -> iodata().
-spec encode_key_value_list_checked([{dynamic(), dynamic()}], encoder()) -> iodata().
-spec encode_binary(binary()) -> iodata().
-spec encode_binary_escape_all(binary()) -> iodata().
```

The `encoder()` callback is invoked on every value during traversal.
The simple API specified above is equivalent to using the
`fun json:encode_value/2` function as the encoder.

The `*_checked/2` variants verify that the encoder
doesn't produce repeated keys.

The default `encode_binary/1` function will emit unescaped Unicode characters,
as allowed by the specifications; however, for compatibility reasons,
we provide the optional `encode_binary_escape_all/1` function,
which always produces purely ASCII output by encoding all non-ASCII
characters with `\u` escape sequences.
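
As an illustration of how these helpers compose, an encoder that produces ASCII-only output could route every binary through `encode_binary_escape_all/1` and delegate everything else to the default. This is a sketch under the assumption that the functions behave as specified above; the module `json_ascii_sketch` and its function names are hypothetical.

```erlang
-module(json_ascii_sketch).
-export([encode_ascii/1]).

%% Encode a term so that the resulting JSON contains only ASCII bytes.
encode_ascii(Term) ->
    json:encode(Term, fun ascii_encoder/2).

%% Binaries are escaped with \u sequences; all other values use the
%% default encoding, which calls back into this encoder for nested values.
ascii_encoder(Bin, _Encode) when is_binary(Bin) ->
    json:encode_binary_escape_all(Bin);
ascii_encoder(Other, Encode) ->
    json:encode_value(Other, Encode).
```

Because the encoder is passed down during traversal, strings nested inside lists and maps are escaped in the same single pass.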


### Formatting and pretty-printing

This EEP further proposes an additional API for formatting (and pretty-printing)
JSON messages. This API transforms a textual JSON message into
a formatted JSON message.

This is the most flexible solution: it orthogonally supports
formatting the results of custom encoding functions like those described above,
without adding the burden of complex formatting options to the
encoders.

Formatting isn't usually done in critical hot paths of high-performance
services, therefore the overhead of two-pass formatting is deemed acceptable.

```erlang
-type format_option() :: #{
    indent => iodata(),
    line_separator => iodata(),
    after_colon => iodata()
}.

-spec format(iodata()) -> iodata().
-spec format(iodata(), format_option()) -> iodata().
```

## Reference Implementation

[PR-8111][PR] implements the `encode/1`, `encode/2`, `decode/1`, and `decode/3`
functions as proposed in this EEP.
The formatting API and the support for incomplete message decoding are left
as a follow-up task.

## Appendix

### Example of a decoding trace

Given the following data:

```json
{"a": [[], {}, true, false, null, {"foo": "baz"}], "b": [1, 2.0, "three"]}
```

the decoding callbacks will be called with the following arguments:

```erlang
object_start(Acc0) => Acc1
string(<<"a">>) => Str1
array_start(Acc1) => Acc2
empty_array() => Arr1
array_push(Arr1, Acc2) => Acc3
empty_object() => Obj1
array_push(Obj1, Acc3) => Acc4
array_push(true, Acc4) => Acc5
array_push(false, Acc5) => Acc6
null() => Null
array_push(Null, Acc6) => Acc7
object_start(Acc7) => Acc8
string(<<"foo">>) => Str2
string(<<"baz">>) => Str3
object_push(Str2, Str3, Acc8) => Acc9
object_finish(Acc9) => Obj2
array_push(Obj2, Acc7) => Acc10
array_finish(Acc10) => Arr2
object_push(Str1, Arr2, Acc1) => Acc11
string(<<"b">>) => Str4
array_start(Acc11) => Acc12
integer(<<"1">>) => Int1
array_push(Int1, Acc12) => Acc13
float(<<"2.0">>) => Float1
array_push(Float1, Acc13) => Acc14
string(<<"three">>) => Str5
array_push(Str5, Acc14) => Acc15
array_finish(Acc15) => Arr3
object_push(Str4, Arr3, Acc11) => Acc16
object_finish(Acc16) => Obj3
% final decode/3 return
{Obj3, Acc16, <<"">>}
```

### Example of a custom encoder

An example of a custom encoder that uses a heuristic
to differentiate object-like lists of key-value pairs from plain
lists of values could look as follows:

```erlang
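custom_encode(Value) -> json:encode(Value, fun encoder/2).

%% The atom null is emitted as the JSON string "null"; nil becomes JSON null.
encoder(null, _Encode) -> <<"\"null\"">>;
encoder(nil, _Encode) -> <<"null">>;
%% Heuristic: a non-empty list whose first element is a pair is an object.
encoder([{_, _} | _] = Value, Encode) ->
    json:encode_key_value_list(Value, Encode);
encoder(Other, Encode) -> json:encode_value(Other, Encode).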
```

Another encoder that supports using Elixir `nil` as Null and protocols for
further customisation could look as follows:

```erlang
encoder(nil, _Encode) -> <<"null">>;
encoder(null, _Encode) -> <<"\"null\"">>;
%% Elixir structs are maps with a '__struct__' key; dispatch them to a protocol.
encoder(#{'__struct__' := _} = Struct, Encode) ->
    'Elixir.JSONProtocol':encode(Struct, Encode);
encoder(Other, Encode) -> json:encode_value(Other, Encode).
```
[1]: https://www.json.org/json-en.html
"Introducing JSON"
[RFC]: https://datatracker.ietf.org/doc/html/rfc8259
"The JavaScript Object Notation (JSON) Data Interchange Format"
[ECMA]: https://ecma-international.org/publications-and-standards/standards/ecma-404/
"The JSON data interchange syntax"
[EEP]: https://github.com/erlang/eep/blob/master/eeps/eep-0018.md
"EEP 18: JSON bifs"
[ERRINFO]: https://github.com/erlang/eep/blob/master/eeps/eep-0054.md
"EEP 54: Provide more information about errors"
[JSONTestSuite]: https://github.com/nst/JSONTestSuite
[PR]: https://github.com/erlang/otp/pull/8111

## Copyright

This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.
