This project implements a JSON parser and serializer which operates around immutable data structures for the JSON elements.
Having read a number of articles on data-oriented programming (cf. Brian Goetz, Data-Oriented Programming, and note the section "Example: JSON" in particular) in which the JSON format is of special interest, and being quite dissatisfied with the experience of using popular JSON libraries like GSON or Jackson, I was motivated enough to start an alternative library.
The JSON type hierarchy is simple and strict enough to apply algebraic data types efficiently: sealed classes and interfaces (union types) and records (product types). These ideas stuck with me, so I started to look around for a parser which returns an immutable JsonElement from a stream of characters.
I then found JEP 198: Light-Weight JSON API, which names immutable data types and a builder-style API as part of its goals. The immutable type hierarchy, the builder-style API, and the implicitly required parser which returns immutable instances of the JSON type hierarchy finally led to this experiment.
Learning a tiny little bit of Clojure taught me another series of important things, the most striking being Rich Hickey's keynote about The Value of Values at the Jaxconf 2012 in San Francisco. Treating values as immutable things changes the mental model of programming, at least if you're coming from the object-oriented world.
Yet reading a potentially large file of JSON text and returning a single immutable instance of some type is an interesting task which requires some intermediate mutable objects, hopefully hidden beneath the facade of the parser. We finally managed to use mutable builders throughout the parsing phase and to return immutable instances in the end.
Maven Coordinates
Group ID: io.github.ralfspoeth
Artifact ID: json
In your pom.xml add
<dependency>
<groupId>io.github.ralfspoeth</groupId>
<artifactId>json</artifactId>
<version>1.0.8</version>
</dependency>
or, when using Gradle (Groovy)
implementation 'io.github.ralfspoeth:json:1.0.8'
or, with Gradle (Kotlin), put
implementation("io.github.ralfspoeth:json:1.0.8")
in your build file.
If you are using JPMS modules with a module-info.java file, add
module your.module {
requires io.github.ralfspoeth.json;
// more
}
The module io.github.ralfspoeth.json exports two packages that you may use in your application:
import io.github.ralfspoeth.json.*;
import io.github.ralfspoeth.json.io.*;
The first package contains the data types (Element and its descendants) and the second contains the JsonReader and JsonWriter classes.
The package io.github.ralfspoeth.json.query is immature and not exported. It is nevertheless accessible when your application does not define a module-info; note that the package may be changed or even deleted.
RFC 7159 specifies the JSON data interchange format, which has become the lingua franca for RESTful web services. JSON serializes structured data in a human-readable text format. It supports four primitive types (strings, double numbers, booleans and null) and two aggregate types: arrays of primitive or aggregate values, and objects, which are basically maps from names (strings) to values of primitive or aggregate type.
[{
    "name": "Gaius",
    "age": 41,
    "pro": false,
    "publications": ["De bello gallico"]
}, {
    "name": "Cicero",
    "senator": true,
    "children": null
}]
This text represents an array of two objects; the outer form reads [a, b] where a and b are the objects. Braces { and } enclose these two objects with name-value pairs separated by commas, like { nvp1, nvp2, ...}. Each name-value pair consists of a name of type string and a value of any of the data types mentioned above. The name-value pairs make up the properties or attributes of an object.
The name property of the first object is associated with the string "Gaius", the pro attribute with the value false. The value of the publications attribute is an array containing the single string "De bello gallico".
Wikipedia has more on JSON here.
JSON is schema-less, that is, you cannot prescribe the structure of a JSON document using some kind of schema. This sets JSON apart from XML, which allows for the specification of document type definitions (DTDs) or XML schema definitions (XSDs). XML, once hyped as the next big thing and still widely in use in numerous applications, has been surpassed by JSON according to Google Trends (try Google Trends: JSON vs. XML).
JSON objects do not expose any notion of a type or class. Two objects are considered equal if their attributes are equal. Arrays may contain any combination of instances, including both primitive and structured types, as in [null, true, false, 1, {"x":5}, [2, 3, 4]].
The first attempt can be easily copied from the sources cited above. Let's define a sealed interface
package json;
sealed interface Element permits ...;
and provide implementations very much like
package json;
final class Boolean implements Element{...}
final class Number implements Element{...}
final class Null implements Element{...}
final class String implements Element{...}
final class Array implements Element{...}
final class Object implements Element{...}
The problem is that, while this is possible, almost all the names collide with class names in the core package java.lang; once we consider modelling the String class as a record with a single component of class java.lang.String, things start to get clumsy. We therefore decided to prefix the class names with Json or JSON.
While JSON is clearly closer to the JSON specification, it is more difficult to read than Json; since following the spec was less of a goal than ease of use, we decided to go with Json instead of JSON as the prefix for the concrete types. We left the Element interface unchanged.
At the top of the hierarchy we then had
public sealed interface Element {}
All implementations must be final, sealed or non-sealed in order to comply with the contract for sealed interfaces; since we don't design for further inheritance, we implement final classes only.
The only two instances of type Boolean are true and false in JSON notation; we model them as an enum because an enum is implicitly final and the behaviour of its equals and hashCode methods comes without any surprises.
public enum JsonBoolean implements Element {
TRUE, FALSE
}
Similarly to the booleans, we decided to implement null as a singleton class. The singleton pattern goes like
final class Singleton {
static final Singleton INSTANCE = new Singleton();
private Singleton(){}
}
and translates into
public final class JsonNull implements Element {
private JsonNull() {} // prevent instantiation
public static final JsonNull INSTANCE = new JsonNull();
}
There is strictly speaking no need to wrap JSON strings into records with a single component of type string. But in order to make JSON strings part of the sealed hierarchy we have to do so:
public record JsonString(String value) implements Element {
}
This comes in handy once we deal with aggregate types like arrays of Element rather than arrays of Element UNION String, which we cannot express in Java.
With the same reasoning we model numbers like this:
public record JsonNumber(double value) implements Element {
}
Note that JavaScript doesn't cater for differences between numerical data types, which is enormously limiting, and that we use the primitive Java type because null values are not acceptable either way.
As with strings, we need to wrap the array in some container (a final class or a record), and we want to make sure the contents are immutable:
public record JsonArray(List<Element> elements) implements Element {
public JsonArray {
elements = List.copyOf(elements); // defensive copy
}
}
The canonical constructor is overridden such that it stores a copy of the list provided. List.copyOf is clever enough NOT to copy the list if the parameter is already an unmodifiable instance, most notably if it has been instantiated using List.of(...). This method also makes sure that no actual null is passed in within the list of elements. (JsonNull instances are acceptable of course.)
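The effect of the defensive copy can be checked in isolation. The following is a minimal sketch, not the library's code: the names DefensiveCopyDemo, Num and Arr are hypothetical stand-ins for the real types, and only List.copyOf is taken from the text above.

```java
import java.util.ArrayList;
import java.util.List;

final class DefensiveCopyDemo {
    // Hypothetical stand-ins for Element, JsonNumber and JsonArray.
    sealed interface Element permits Num, Arr {}
    record Num(double value) implements Element {}
    record Arr(List<Element> elements) implements Element {
        Arr {
            elements = List.copyOf(elements); // unmodifiable copy; rejects null elements
        }
    }

    // Returns true if constructing an array from a list containing null fails.
    static boolean rejectsNullElements() {
        List<Element> withNull = new ArrayList<>();
        withNull.add(null);
        try {
            new Arr(withNull);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Element> source = new ArrayList<>(List.of(new Num(1), new Num(2)));
        Arr arr = new Arr(source);
        source.add(new Num(3));                     // later changes to the source list ...
        System.out.println(arr.elements().size());  // ... do not leak into the record
        System.out.println(rejectsNullElements());
    }
}
```

Whether List.copyOf returns the very same instance for an already unmodifiable list is left to the JDK implementation, so the sketch deliberately does not rely on object identity.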
The same is true for JsonObjects. We model the properties (or attributes, or members) as a map from Strings to Elements. The keys are plain Strings rather than JsonStrings, since wrapping them would not add any value and plain strings are much easier for clients to use:
public record JsonObject(Map<String, Element> members) implements Element {
public JsonObject {
members = Map.copyOf(members); // defensive copy
}
}
Map.copyOf provides a copy but returns the original map when that is already unmodifiable, especially when it was instantiated using Map.of(...).
Since both aggregate types JsonObject and JsonArray are shallowly immutable (or unmodifiable) and all basic types are immutable, the aggregate types are effectively immutable as well. This makes instances of the entire hierarchy immutable.
In line with the JSON specification, which differentiates between primitive and structured types, we differentiate between basic and aggregate types like so:
public sealed interface Element permits Basic, Aggregate {...}
public sealed interface Basic extends Element permits
JsonBoolean, JsonNull, JsonNumber, JsonString {}
public sealed interface Aggregate extends Element permits
JsonArray, JsonObject {...}
Naming primitive types basic and structured types aggregates has been a deliberate decision since the term primitive collides with the notion of primitive types in the Java language.
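One practical benefit of the two-level sealed hierarchy is that a switch over an Element can be checked for exhaustiveness by the compiler. The sketch below assumes Java 21 (pattern matching for switch) and re-declares simplified versions of the types inside a hypothetical SealedSwitchDemo class; for brevity JsonNull is modelled as an enum here, which differs from the singleton class shown earlier.

```java
import java.util.List;
import java.util.Map;

final class SealedSwitchDemo {
    sealed interface Element permits Basic, Aggregate {}
    sealed interface Basic extends Element permits JsonBoolean, JsonNull, JsonNumber, JsonString {}
    sealed interface Aggregate extends Element permits JsonArray, JsonObject {}

    enum JsonBoolean implements Basic { TRUE, FALSE }
    enum JsonNull implements Basic { INSTANCE } // simplification for this sketch
    record JsonNumber(double value) implements Basic {}
    record JsonString(String value) implements Basic {}
    record JsonArray(List<Element> elements) implements Aggregate {}
    record JsonObject(Map<String, Element> members) implements Aggregate {}

    // No default branch: the compiler verifies that every permitted subtype is covered.
    static String describe(Element e) {
        return switch (e) {
            case JsonBoolean b -> "boolean " + (b == JsonBoolean.TRUE);
            case JsonNull n -> "null";
            case JsonNumber n -> "number " + n.value();
            case JsonString s -> "string of length " + s.value().length();
            case JsonArray a -> "array of " + a.elements().size() + " element(s)";
            case JsonObject o -> "object with " + o.members().size() + " member(s)";
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new JsonNumber(41)));
        System.out.println(describe(new JsonArray(List.of(JsonBoolean.TRUE))));
    }
}
```

Adding a seventh permitted type would turn the missing case into a compile error rather than a runtime surprise.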
Both aggregate types serve as functions: JsonObjects are functions of Strings and JsonArrays are functions of an int index:
Map<String, Element> members; // given
var obj = new JsonObject(members);
Function<String, Element> fun = obj; // legal
List<Element> lst; // given
var arr = new JsonArray(lst);
IntFunction<Element> ifun = arr; // legal
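For the assignments above to compile, the aggregate records have to implement the functional interfaces themselves. How the library does this is not shown here; the following sketch illustrates one plausible way, with hypothetical names (AggregateFunctionDemo, Num, Obj, Arr) and the assumption that a missing member simply yields null.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.IntFunction;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class AggregateFunctionDemo {
    interface Element {}
    record Num(double value) implements Element {}

    // An object is a function from member name to element.
    record Obj(Map<String, Element> members) implements Element, Function<String, Element> {
        Obj { members = Map.copyOf(members); }
        @Override public Element apply(String name) { return members.get(name); } // null if absent (assumption)
    }

    // An array is a function from index to element.
    record Arr(List<Element> elements) implements Element, IntFunction<Element> {
        Arr { elements = List.copyOf(elements); }
        @Override public Element apply(int index) { return elements.get(index); }
    }

    public static void main(String[] args) {
        Obj obj = new Obj(Map.of("x", new Num(5)));
        Arr arr = new Arr(List.of(new Num(2), new Num(3)));
        // Because the records are functions, they can be passed straight to map operations.
        System.out.println(Stream.of("x").map(obj).toList());
        System.out.println(IntStream.range(0, 2).mapToObj(arr).toList());
    }
}
```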
The Builder pattern allows for a piecemeal construction of immutable data and works like this:
var immutable = new Builder(...).add(...).add(...).build();
It does not make much sense to provide builders for the basic data types; yet very much so for the aggregate types. This is another reason why we introduced the distinction between the two.
The Builder interface has been implemented as a nested interface of the Aggregate interface with two implementations:
public sealed interface Aggregate extends Element permits JsonArray, JsonObject {
sealed interface Builder<T extends Aggregate> {
T build();
// ...
}
final class ArrayBuilder implements Builder<JsonArray>{...}
final class ObjectBuilder implements Builder<JsonObject>{...}
// ...
}
Since the implementing classes reside within the same compilation unit as the Builder, there is no need for the permits clause.
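The array builder simply provides a method that adds an Element:
final class ArrayBuilder implements Builder<JsonArray> {
    private final List<Element> mutableList = new ArrayList<>();
    public ArrayBuilder item(Element e) {
        mutableList.add(e); // add to mutable list
        return this;        // allow chaining
    }
    public JsonArray build() {
        return new JsonArray(mutableList); // the record's constructor takes the defensive copy
    }
}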
The object builder is not so different:
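final class ObjectBuilder implements Builder<JsonObject> {
    private final Map<String, Element> mutableMap = new HashMap<>();
    public ObjectBuilder named(String name, Element e) {
        mutableMap.put(name, e); // put into mutable map
        return this;             // allow chaining
    }
    public JsonObject build() {
        return new JsonObject(mutableMap); // the record's constructor takes the defensive copy
    }
}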
Both builders can be instantiated only through static methods in the Element interface:
JsonObjectBuilder objectBuilder();
JsonArrayBuilder arrayBuilder();
The implementing classes both need to be public because they provide different methods for adding intermediate data: the array builder provides an item(Element) method and the object builder a named(String, Element) method in order to add data to their internal structures.
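Put together, the two builders allow nested documents to be assembled step by step. This is a self-contained sketch rather than the library's API: the class BuilderDemo and the records Str, Arr and Obj are hypothetical, and the builders are created with plain constructors instead of the static factory methods mentioned above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BuilderDemo {
    sealed interface Element permits Str, Arr, Obj {}
    record Str(String value) implements Element {}
    record Arr(List<Element> elements) implements Element {
        Arr { elements = List.copyOf(elements); }
    }
    record Obj(Map<String, Element> members) implements Element {
        Obj { members = Map.copyOf(members); }
    }

    static final class ArrayBuilder {
        private final List<Element> items = new ArrayList<>();
        ArrayBuilder item(Element e) { items.add(e); return this; }
        Arr build() { return new Arr(items); } // the record copies the list
    }

    static final class ObjectBuilder {
        private final Map<String, Element> members = new LinkedHashMap<>();
        ObjectBuilder named(String name, Element e) { members.put(name, e); return this; }
        Obj build() { return new Obj(members); } // the record copies the map
    }

    // Builds {"name": "Gaius", "publications": ["De bello gallico"]}.
    static Obj sample() {
        return new ObjectBuilder()
                .named("name", new Str("Gaius"))
                .named("publications", new ArrayBuilder().item(new Str("De bello gallico")).build())
                .build();
    }

    public static void main(String[] args) {
        System.out.println(sample());
    }
}
```

The mutable state lives only inside the builders; whatever build() returns can no longer be changed, neither through the builder nor through the returned record.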
The parser implementation named JsonReader in package io.github.ralfspoeth.json.io implements the AutoCloseable interface and is meant to be used in try-with-resources statements like so:
Reader src = ...
try(var rdr = new JsonReader(src)) {
return rdr.readElement();
}
It uses a Lexer internally, which tokenizes a character stream into tokens like braces, brackets, comma, colon, number literals, string literals, and null, true, and false.
The parser uses a stack of nodes which encapsulate builders, special tokens, or an element. It utilizes an inner sealed interface to cater for this limited set of stack elements.
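To give an idea of what the tokenizing step involves, here is a deliberately simplified lexer. It is not the library's Lexer: the names LexerSketch, Kind and Token are hypothetical, it works on a String instead of a character stream, it ignores escape sequences inside strings, and it leans on Double.parseDouble, which accepts more number formats than JSON allows.

```java
import java.util.ArrayList;
import java.util.List;

final class LexerSketch {
    enum Kind { PUNCTUATION, STRING, NUMBER, LITERAL }
    record Token(Kind kind, String text) {}

    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace between tokens is skipped
            } else if ("{}[],:".indexOf(c) >= 0) {
                tokens.add(new Token(Kind.PUNCTUATION, String.valueOf(c)));
                i++;
            } else if (c == '"') {
                int end = src.indexOf('"', i + 1); // simplification: no escape sequences
                if (end < 0) throw new IllegalArgumentException("unterminated string at " + i);
                tokens.add(new Token(Kind.STRING, src.substring(i + 1, end)));
                i = end + 1;
            } else {
                // Read up to the next delimiter, then classify the word.
                int start = i;
                while (i < src.length() && "{}[],:\"".indexOf(src.charAt(i)) < 0
                        && !Character.isWhitespace(src.charAt(i))) {
                    i++;
                }
                String word = src.substring(start, i);
                switch (word) {
                    case "null", "true", "false" -> tokens.add(new Token(Kind.LITERAL, word));
                    default -> {
                        Double.parseDouble(word); // throws NumberFormatException for anything else
                        tokens.add(new Token(Kind.NUMBER, word));
                    }
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("{\"a\": [1, true]}").forEach(System.out::println);
    }
}
```

A parser would then consume these tokens one by one, pushing a builder onto its stack for every opening brace or bracket and popping it on the matching closing token.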
The JsonWriter class is instantiated with its default behaviour of indenting the members of JSON objects by 4 characters and putting each member on a separate line. Arrays are printed on a single line, with elements separated by a comma and a space.
The usage is similar to that of the JsonReader, except that it currently offers a single factory method rather than a constructor:
Element object = ...
Writer w = ...
try(var wrt = JsonWriter.createDefaultWriter(w)){
wrt.write(object);
}
The JsonWriter provides the static method minimize, which safely removes whitespace from a given input stream.
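The default layout described above (object members indented by 4 characters, one per line; arrays on a single line) can be approximated in a few lines. The sketch below is not the JsonWriter implementation: it formats plain Java maps, lists and scalars instead of Elements, the class name WriterSketch is hypothetical, and details such as the ": " separator are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class WriterSketch {
    static String format(Object value, int level) {
        if (value instanceof Map<?, ?> map) {
            if (map.isEmpty()) return "{}";
            String inner = " ".repeat(4 * (level + 1)); // indentation of the members
            String outer = " ".repeat(4 * level);       // indentation of the closing brace
            return map.entrySet().stream()
                    .map(e -> inner + quote(String.valueOf(e.getKey())) + ": "
                            + format(e.getValue(), level + 1))
                    .collect(Collectors.joining(",\n", "{\n", "\n" + outer + "}"));
        }
        if (value instanceof List<?> list) {
            // Arrays stay on one line, separated by a comma and a space.
            return list.stream()
                    .map(v -> format(v, level))
                    .collect(Collectors.joining(", ", "[", "]"));
        }
        if (value instanceof String s) return quote(s);
        return String.valueOf(value); // numbers, booleans and null
    }

    static String quote(String s) {
        return '"' + s.replace("\\", "\\\\").replace("\"", "\\\"") + '"';
    }

    static Map<String, Object> sample() {
        Map<String, Object> m = new LinkedHashMap<>(); // keeps insertion order
        m.put("name", "Gaius");
        m.put("tags", List.of(1, 2));
        return m;
    }

    public static void main(String[] args) {
        System.out.println(format(sample(), 0));
    }
}
```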
The package query provides simple utilities for querying data based on some root element. The Path class is inspired by the XPath specification yet lacks almost all of its features; it's currently just a toy.
A Path instance is created using the factory method Path::of like so:
var path = Path.of("a/b/c");
The path expression is split using the / character.
Given the statement above, we obtain the equivalent of
var path = Path.of("c", Path.of("b", Path.of("a")));
where the second parameter is the parent path.
We then use Path::evaluate, which returns a stream of Elements. Consider this root object root
{
"a": {
"b": {
"c": true
}
}
}
then
assert JsonBoolean.TRUE==path.evaluate(root).findFirst().get();
will not throw an AssertionError.
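The parent-linked structure and its evaluation can be sketched as follows. This is an illustration under stated assumptions, not the library's Path class: PathSketch is a hypothetical name, the root path uses null as its parent, and nested java.util.Map instances stand in for JSON objects.

```java
import java.util.Map;
import java.util.Objects;
import java.util.stream.Stream;

final class PathSketch {
    // A path segment with an optional parent; the first segment has parent == null.
    record Path(String name, Path parent) {
        static Path of(String expression) {
            Path current = null;
            for (String segment : expression.split("/")) {
                current = new Path(segment, current); // each segment points back to its parent
            }
            return current;
        }

        // Evaluates the parent first, then looks up this segment's name in every result.
        Stream<Object> evaluate(Object root) {
            Stream<Object> parents = parent == null ? Stream.of(root) : parent.evaluate(root);
            return parents
                    .filter(Map.class::isInstance)
                    .<Object>map(o -> ((Map<?, ?>) o).get(name))
                    .filter(Objects::nonNull);
        }
    }

    public static void main(String[] args) {
        Object root = Map.of("a", Map.of("b", Map.of("c", true)));
        System.out.println(Path.of("a/b/c").evaluate(root).findFirst());
    }
}
```

Because evaluation returns a stream, a path that matches nothing yields an empty stream instead of failing.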
The syntax for the patterns is
a..b where a and b are integers; a range pattern applicable to arrays;
#regex where regex is a regular expression filtering attributes of objects;
name where name is just the member name of the root object.
Given [2, 3, 5, 7, 11] then Path.of("0..2") yields the stream of the first two array elements 2 and 3.
Given {"a0":true,"a1":false} then Path.of("#a.") yields the stream of true and false.
Given {"a":{"b":5}} then Path.of("a/b") yields the stream of 5d.
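The three pattern kinds can be illustrated with a single selection function. The sketch below is an assumption-laden approximation, not the library's implementation: SegmentSketch and select are hypothetical names, plain Java maps and lists stand in for JSON objects and arrays, the upper bound of a range is treated as exclusive (consistent with the 0..2 example), and a regular expression has to match the whole member name.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Stream;

final class SegmentSketch {
    static Stream<Object> select(String segment, Object node) {
        // #regex: all members of an object whose names match the expression
        if (segment.startsWith("#") && node instanceof Map<?, ?> obj) {
            Pattern regex = Pattern.compile(segment.substring(1));
            return obj.entrySet().stream()
                    .filter(e -> regex.matcher(String.valueOf(e.getKey())).matches())
                    .<Object>map(e -> e.getValue());
        }
        // a..b: the array elements from index a (inclusive) to b (exclusive)
        if (segment.contains("..") && node instanceof List<?> arr) {
            String[] bounds = segment.split("\\.\\.");
            int from = Integer.parseInt(bounds[0]);
            int to = Math.min(Integer.parseInt(bounds[1]), arr.size());
            return arr.subList(from, to).stream().map(o -> (Object) o);
        }
        // name: the member with exactly this name
        if (node instanceof Map<?, ?> named && named.containsKey(segment)) {
            return Stream.of(named.get(segment));
        }
        return Stream.empty();
    }

    public static void main(String[] args) {
        System.out.println(select("0..2", List.of(2, 3, 5, 7, 11)).toList());
        System.out.println(select("#a.", Map.of("a0", true, "a1", false, "b", 1)).count());
        System.out.println(select("b", Map.of("b", 5)).toList());
    }
}
```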
Clojure uses maps to aggregate data and prefers keywords as keys in these maps. Here is a link to a video from Rich Hickey: Just use maps
In order to use this Java library, include this in your deps.edn file:
{:deps {
io.github.ralfspoeth/json {:mvn/version "1.0.8"}
}}
Import the Element and IO classes into your namespace like this
(ns your.name.space
(:import (io.github.ralfspoeth.json Element Basic JsonNull JsonArray JsonObject)
(java.io Reader)
(io.github.ralfspoeth.json.io JsonReader))
(:require [clojure.java.io :as io]))
Use this function in order to read JSON data from some java.io.Reader
(defn read-elem [^Reader rdr]
(with-open [jsrd (JsonReader. rdr)]
(.readElement jsrd)))
and then, in order to turn the resulting Element into a Clojure map
(defn map-json ([^Element elem]
(cond
(instance? JsonNull elem) nil,
(instance? Basic elem) (.value elem)
(instance? JsonArray elem) (mapv map-json (.elements elem))
(instance? JsonObject elem) (zipmap
(map keyword (->> elem (.members) (.keySet))),
(map map-json (->> elem (.members) (.values)))))))
Package conv contains the utility class StandardConversions, which converts Elements into the primitive types int, long, double or boolean and to String or a given Enum type. All conversion methods take any Element type as an argument and, for the sake of simplicity, may throw IllegalArgumentException.
The conversion methods in the StandardConversions class respect that many JSON authors put all values into double quotes, even null, true, and false as well as numbers. These values are parsed into JsonString instances; their contents are converted into numbers, boolean values and null if possible as well.
The methods intValue, longValue and doubleValue utilize the parse<Type> methods of the respective Integer, Long and Double classes for JsonStrings, and standard conversion from double to int and long for JsonNumbers. JsonBoolean values are converted to 1 and 0 for TRUE and FALSE, respectively.
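The rules for intValue can be sketched as follows. This is an illustration, not the StandardConversions source: the types Num, Str and Bool inside the hypothetical ConversionSketch class are stand-ins, and "standard conversion" is assumed to mean a narrowing cast, which truncates toward zero.

```java
final class ConversionSketch {
    sealed interface Element permits Num, Str, Bool {}
    record Num(double value) implements Element {}
    record Str(String value) implements Element {}
    enum Bool implements Element { TRUE, FALSE }

    static int intValue(Element e) {
        if (e instanceof Num n) return (int) n.value();             // narrowing cast (assumption)
        if (e instanceof Str s) return Integer.parseInt(s.value()); // NumberFormatException is an IllegalArgumentException
        if (e instanceof Bool b) return b == Bool.TRUE ? 1 : 0;
        throw new IllegalArgumentException("cannot convert " + e);
    }

    public static void main(String[] args) {
        System.out.println(intValue(new Num(41.9)));
        System.out.println(intValue(new Str("42")));
        System.out.println(intValue(Bool.TRUE));
    }
}
```

Since NumberFormatException extends IllegalArgumentException, a quoted value that is not a number surfaces as the same exception type as an unsupported element.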
The stringValue conversion uses natural conversions for all Basic types, and the toString methods applied on the contained lists and maps of the Aggregate types.
The booleanValue conversion does the obvious conversions for JsonBoolean and JsonString.
The enumValue... methods take two arguments: a class declared with the enum keyword, and the Element, which must be of type JsonString. While enumValue uses the Enum::valueOf method, enumValueIgnoreCase converts the value and all the constants' names defined in the enum class to uppercase strings before selecting the enum constant.
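The case-insensitive lookup can be sketched like this. It is a simplified illustration rather than the library's method: it takes a plain String instead of an Element, the names EnumConversionSketch and Color are hypothetical, and the use of Locale.ROOT for upper-casing is an assumption.

```java
import java.util.Arrays;
import java.util.Locale;

final class EnumConversionSketch {
    enum Color { RED, Green, blue } // mixed-case names on purpose

    static <E extends Enum<E>> E enumValueIgnoreCase(Class<E> type, String text) {
        String wanted = text.toUpperCase(Locale.ROOT);
        return Arrays.stream(type.getEnumConstants())
                .filter(c -> c.name().toUpperCase(Locale.ROOT).equals(wanted))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException(
                        "no constant " + text + " in " + type.getName()));
    }

    public static void main(String[] args) {
        System.out.println(enumValueIgnoreCase(Color.class, "green"));
        System.out.println(enumValueIgnoreCase(Color.class, "BLUE"));
    }
}
```

By contrast, Enum.valueOf(Color.class, "green") would throw an IllegalArgumentException here, because it compares names exactly.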