ralfspoeth/json

Java JSON IO Library

This project implements a JSON parser and serializer which operates around immutable data structures for the JSON elements.

Motivation

I had read a number of articles on data-oriented programming (cf. Brian Goetz, Data-Oriented Programming; note the section "Example: JSON" in particular) in which the JSON format is of special interest, and I was quite dissatisfied with the experience of using popular JSON libraries like GSON or Jackson. Together, that was motivation enough to start an alternative library.

The JSON type hierarchy is very simple and strict enough to apply the algebraic data types introduced through sealed classes and interfaces (union types) and records (product types) efficiently. These ideas stuck with me, so I started to look around for a parser which returns an immutable JsonElement from a stream of characters.

I then found JEP 198: Light-Weight JSON API, which names immutable data types and a builder-style API as part of its goals. The immutable type hierarchy, the builder-style API, and the implicitly required parser which returns immutable instances of the JSON type hierarchy finally led to this experiment.

Learning a tiny little bit of Clojure taught me another series of important things, the most striking being Rich Hickey's keynote The Value of Values at JaxConf 2012 in San Francisco. Treating values as immutable things changes the mental model of programming, at least if you're coming from the object-oriented world.

Yet reading a potentially large file of JSON text and returning a single immutable instance of some type is an interesting task which requires some intermediate mutable objects, hopefully hidden beneath the facade of the parser. We finally managed to use mutable builders throughout the parsing phase and to return immutable instances in the end.

Getting Started

Maven Coordinates

Group ID: io.github.ralfspoeth
Artifact ID: json

In your pom.xml add

<dependency>
    <groupId>io.github.ralfspoeth</groupId>
    <artifactId>json</artifactId>
    <version>1.0.8</version>
</dependency>

or, when using Gradle (Groovy)

implementation 'io.github.ralfspoeth:json:1.0.8'

or, with Gradle (Kotlin), put

implementation("io.github.ralfspoeth:json:1.0.8")

in your build file.

If you are using JPMS modules with a module-info.java file, add

module your.module {
    requires io.github.ralfspoeth.json;
    // more
}

The module io.github.ralfspoeth.json exports two packages that you may use in your application:

import io.github.ralfspoeth.json.*;
import io.github.ralfspoeth.json.io.*;

The first package contains the data types (Element and its descendants) and the second contains the JsonReader and JsonWriter classes.

The package io.github.ralfspoeth.json.query is immature and not exported. It is nevertheless available when you don't define a module-info with your application; note that the package may be changed or even deleted.

JSON

RFC 7159 specifies the JSON data interchange format, which has become the lingua franca for RESTful web services. JSON serializes structured data in a human-readable text format. It supports four primitive types (strings, double numbers, booleans and null) and two aggregate types: arrays, whose elements are of primitive or aggregate types, and objects, which are basically maps from names (strings) to values of primitive or aggregate types.

Example:

[{
    "name": "Gaius",
    "age": 41,
    "pro": false,
    "publications": ["De bello gallico"]
}, {
    "name": "Cicero",
    "senator": true,
    "children": null
}]

This text represents an array of two objects; the outer form reads [a, b] where a and b are the objects. Braces { and } enclose these two objects with name-value-pairs separated by commas, like { nvp1, nvp2, ...}. Each name-value-pair consists of a name of type string and a value of any other data type mentioned above. The name-value-pairs make the properties or attributes of an object. The name property of the first object is associated with the string "Gaius", the pro attribute with the value false. The value of the publications attribute is an array of a single string valued "De bello gallico".

The Wikipedia article on JSON has more.

JSON itself is schema-less, that is, the format provides no built-in means to prescribe the structure of a JSON document. This sets JSON apart from XML, which allows for the specification of document type definitions (DTDs) or XML schema definitions (XSDs). XML, once hyped as the next big thing and with numerous applications still widely in use, has been surpassed by JSON according to Google Trends (try Google Trends: JSON vs. XML).

Remarks

The objects do not expose any notion of a type or class. Two objects are considered equal if their attributes are equal. Arrays may contain any combination of instances, including both primitive and structured types, as in [null, true, false, 1, {"x":5}, [2, 3, 4]].

Modelling the data in Java

First Attempt

The first attempt can be easily copied from the sources cited above. Let's define a sealed interface

package json;
sealed interface Element permits ...;

and provide implementations very much like

package json;
final class Boolean implements Element{...}
final class Number implements Element{...}
final class Null implements Element{...}
final class String implements Element{...}
final class Array implements Element{...}
final class Object implements Element{...}

The problem is that, while this is possible, almost all the names collide with class names in the core package java.lang; once we consider modelling the String class as a record with a single component of class java.lang.String, things start to get clumsy. We therefore decided to prefix the class names with Json or JSON.

While JSON is clearly closer to the JSON specification, it's more difficult to read than Json; since following the spec was not so much a goal as the ease of use we decided to go with Json instead of JSON as the prefix for the concrete types; we left the Element interface unchanged.

At the top of the hierarchy we then had

public sealed interface Element {}

All implementations must be final or non-sealed in order to comply with the contract for sealed interfaces; since we don't design for further inheritance we will implement final classes only.

Modelling Boolean as Enum

The only two instances of type Boolean are true and false in JSON notation; we model them as an enum because it is implicitly final and the behaviour of its equals and hashCode methods comes without any surprises.

public enum JsonBoolean implements Element {
    TRUE, FALSE
}

Modelling null as Singleton

As with booleans, the set of instances is fixed, here to exactly one, so we decided to implement null as a singleton class. The singleton pattern goes like

final class Singleton {
    static final Singleton INSTANCE = new Singleton();
    private Singleton(){}
}

and translates into

public final class JsonNull implements Element {
    private JsonNull() {} // prevent instantiation
    public static final JsonNull INSTANCE = new JsonNull(); 
}

Modelling String as Record of String

There is strictly speaking no need to wrap JSON strings into records with a single component of type string. But in order to make JSON strings part of the sealed hierarchy we have to do so:

public record JsonString(String value) implements Element {
}

This comes in handy once we deal with aggregate types like arrays of Element rather than arrays of Element UNION String which we cannot express in Java.

Modelling Number as Record of double

With the same reasoning we model numbers like this:

public record JsonNumber(double value) implements Element {
}

Note that JavaScript does not distinguish between numerical data types, which is enormously limiting, and that we use the primitive Java type because null values are not acceptable either way.
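As a small illustration of what the choice of double implies, here is a sketch that redeclares the record inside a hypothetical wrapper class NumberSketch (the wrapper and its main method are not part of the library). It shows that integral and fractional literals end up as the same value and that integers beyond 2^53 are rounded.

```java
final class NumberSketch {
    // Same shape as the record above, redeclared so the sketch is self-contained.
    record JsonNumber(double value) {}

    public static void main(String[] args) {
        // Integral and fractional literals collapse into the same double value.
        System.out.println(new JsonNumber(1).equals(new JsonNumber(1.0))); // true

        // 2^53 + 1 has no exact double representation and is rounded on conversion.
        long big = 9_007_199_254_740_993L;
        System.out.println((long) new JsonNumber(big).value() == big); // false
    }
}
```

Applications that need exact 64-bit integers or decimal amounts would have to transport them as strings under this model.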

Modelling Array as Record of an Immutable List

As with strings, we need to wrap the array in some container - a final class or a record - plus we want to make sure the contents are immutable:

public record JsonArray(List<Element> elements) implements Element {
    public JsonArray {
        elements = List.copyOf(elements); // defensive copy
    }
}

The canonical constructor is declared explicitly (in its compact form) such that it stores a copy of the list provided, made by List.copyOf; that method is clever enough NOT to copy the list parameter if it can be sure that the parameter is already an immutable instance -- most notably if it has been instantiated using List.of(...). List.copyOf also makes sure that no actual null reference is passed in within the list of elements; it throws a NullPointerException otherwise. (JsonNulls are acceptable of course.)
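The effect of the defensive copy can be checked with a short sketch. The wrapper class DefensiveCopySketch and the helper rejectsNullEntries are hypothetical names introduced for illustration; the records are reduced versions of the ones described here.

```java
import java.util.ArrayList;
import java.util.List;

final class DefensiveCopySketch {
    sealed interface Element permits JsonString, JsonArray {}
    record JsonString(String value) implements Element {}
    record JsonArray(List<Element> elements) implements Element {
        JsonArray {
            elements = List.copyOf(elements); // defensive copy, rejects null entries
        }
    }

    /** Returns true if constructing a JsonArray from a list with a null entry fails. */
    static boolean rejectsNullEntries() {
        List<Element> withNull = new ArrayList<>();
        withNull.add(null);
        try {
            new JsonArray(withNull);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Element> source = new ArrayList<>();
        source.add(new JsonString("a"));
        JsonArray array = new JsonArray(source);

        source.add(new JsonString("b"));             // later changes to the source ...
        System.out.println(array.elements().size()); // ... do not leak into the array: 1
        System.out.println(rejectsNullEntries());    // true
    }
}
```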

Modelling Object as Record of an Immutable Map

The same is true for JsonObjects. We model the properties or attributes or members as a map of Strings to Elements (not JsonStrings, since wrapping the names would not add any value and plain Strings are much easier for clients to use):

public record JsonObject(Map<String, Element> members) implements Element {
    public JsonObject {
        members = Map.copyOf(members); // defensive copy
    }
}

Map.copyOf provides a copy but returns the original map when that is already immutable, especially when instantiated using Map.of(...).

Since both aggregate types JsonObject and JsonArray are shallowly immutable (or unmodifiable) and all basic types are immutable, the aggregate types are effectively immutable as well. This makes instances of the entire hierarchy immutable.

Differentiating between Aggregate and Basic Types

In line with the JSON specification, which differentiates between primitive and structured types, we differentiate between basic and aggregate types like so:

public sealed interface Element permits Basic, Aggregate {...}
public sealed interface Basic extends Element permits
    JsonBoolean, JsonNull, JsonNumber, JsonString {}
public sealed interface Aggregate extends Element permits
    JsonArray, JsonObject {...}

Naming primitive types basic and structured types aggregates has been a deliberate decision since the term primitive collides with the notion of primitive types in the Java language.
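To see the two-level hierarchy at work, the following sketch assembles reduced versions of all six types inside a hypothetical wrapper class HierarchySketch. The helper countBasics is an assumption made for illustration and not part of the library; it walks a document and relies on the Basic/Aggregate split.

```java
import java.util.List;
import java.util.Map;

final class HierarchySketch {
    sealed interface Element permits Basic, Aggregate {}
    sealed interface Basic extends Element permits JsonBoolean, JsonNull, JsonNumber, JsonString {}
    sealed interface Aggregate extends Element permits JsonArray, JsonObject {}

    enum JsonBoolean implements Basic { TRUE, FALSE }
    static final class JsonNull implements Basic {
        static final JsonNull INSTANCE = new JsonNull();
        private JsonNull() {} // prevent instantiation
    }
    record JsonNumber(double value) implements Basic {}
    record JsonString(String value) implements Basic {}
    record JsonArray(List<Element> elements) implements Aggregate {
        JsonArray { elements = List.copyOf(elements); }
    }
    record JsonObject(Map<String, Element> members) implements Aggregate {
        JsonObject { members = Map.copyOf(members); }
    }

    /** Counts the basic values reachable from the given element. */
    static int countBasics(Element e) {
        if (e instanceof Basic) return 1;
        if (e instanceof JsonArray a) {
            return a.elements().stream().mapToInt(HierarchySketch::countBasics).sum();
        }
        JsonObject o = (JsonObject) e; // the only remaining permitted type
        return o.members().values().stream().mapToInt(HierarchySketch::countBasics).sum();
    }

    public static void main(String[] args) {
        // [null, true, 1, {"x":5}, [2, 3]]
        Element doc = new JsonArray(List.of(
                JsonNull.INSTANCE, JsonBoolean.TRUE, new JsonNumber(1),
                new JsonObject(Map.of("x", new JsonNumber(5))),
                new JsonArray(List.of(new JsonNumber(2), new JsonNumber(3)))));
        System.out.println(countBasics(doc)); // 6
    }
}
```

With a Java version that supports pattern matching for switch, the chain of instanceof checks could be replaced by an exhaustive switch over the sealed types.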

Aggregates are Functions

Both aggregate types serve as functions: JsonObjects are functions of Strings and JsonArrays are functions of an int index:

Map<String, Element> members; // given
var obj = new JsonObject(members);
Function<String, Element> fun = obj; // legal

List<Element> lst; // given
var arr = new JsonArray(lst);
IntFunction<Element> ifun = arr; // legal
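For these assignments to compile, the records have to implement the functional interfaces themselves. The sketch below shows one way this could look. The wrapper class FunctionSketch is hypothetical, and the behaviour for absent members (null) and out-of-range indexes (exception) is an assumption, not a statement about the library.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.IntFunction;

final class FunctionSketch {
    interface Element {}
    record JsonNumber(double value) implements Element {}

    record JsonObject(Map<String, Element> members)
            implements Element, Function<String, Element> {
        JsonObject { members = Map.copyOf(members); }

        @Override
        public Element apply(String name) {
            return members.get(name); // assumption: null when the member is absent
        }
    }

    record JsonArray(List<Element> elements)
            implements Element, IntFunction<Element> {
        JsonArray { elements = List.copyOf(elements); }

        @Override
        public Element apply(int index) {
            return elements.get(index); // throws IndexOutOfBoundsException when out of range
        }
    }

    public static void main(String[] args) {
        Function<String, Element> fun = new JsonObject(Map.of("x", new JsonNumber(5)));
        IntFunction<Element> ifun = new JsonArray(List.of(new JsonNumber(2), new JsonNumber(3)));
        System.out.println(fun.apply("x")); // JsonNumber[value=5.0]
        System.out.println(ifun.apply(1));  // JsonNumber[value=3.0]
    }
}
```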

Builders

The Builder pattern allows for a piecemeal construction of immutable data and works like this:

var immutable = new Builder(...).add(...).add(...).build();

It does not make much sense to provide builders for the basic data types; yet very much so for the aggregate types. This is another reason why we introduced the distinction between the two.

The Builder interface has been implemented as a nested interface of the Aggregate interface with two implementations:

public sealed interface Aggregate extends Element permits JsonArray, JsonObject {
    sealed interface Builder<T extends Aggregate> {
        T build();
        // ...
    }
    final class ArrayBuilder implements Builder<JsonArray>{...}
    final class ObjectBuilder implements Builder<JsonObject>{...}
    // ...
}

Since the implementing classes reside within the same compilation unit as the Builder there is no need for the permits clause.

ArrayBuilder

The array builder simply provides a method that adds an Element:

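final class ArrayBuilder implements Builder<JsonArray> {
    // mutable working state, hidden from callers (a java.util.ArrayList)
    private final List<Element> mutableList = new ArrayList<>();

    // returning this allows chained calls (assumed, in line with the pattern above)
    public ArrayBuilder item(Element e) {
        mutableList.add(e);
        return this;
    }

    @Override
    public JsonArray build() {
        // List.copyOf takes a snapshot; List.of(mutableList) would yield a list of lists
        return new JsonArray(List.copyOf(mutableList));
    }
}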

ObjectBuilder

The object builder is not so different:

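final class ObjectBuilder implements Builder<JsonObject> {
    // mutable working state, hidden from callers (a java.util.HashMap)
    private final Map<String, Element> mutableMap = new HashMap<>();

    // returning this allows chained calls (assumed, in line with the pattern above)
    public ObjectBuilder named(String name, Element e) {
        mutableMap.put(name, e);
        return this;
    }

    @Override
    public JsonObject build() {
        // Map.copyOf takes a snapshot; there is no Map.of overload that accepts a map
        return new JsonObject(Map.copyOf(mutableMap));
    }
}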

Both builders are instantiable through static methods in the Element interface only:

JsonObjectBuilder objectBuilder();
JsonArrayBuilder arrayBuilder();

The implementing classes both need to be public because they provide different methods for adding intermediate data; the array builder provides an item(Element) method and the object builder a named(String, Element) method in order to add data to their internal structures.
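Putting the pieces of this section together, a compilable sketch of the builder design could look as follows. Everything is nested in a hypothetical wrapper class BuilderSketch, the hierarchy is reduced to JsonString plus the two aggregates, and the builder type names follow the ArrayBuilder and ObjectBuilder snippets above; the actual library may differ in names and details.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BuilderSketch {
    sealed interface Element permits JsonString, Aggregate {
        // static factories, as the only way to obtain a builder
        static Aggregate.ArrayBuilder arrayBuilder() { return new Aggregate.ArrayBuilder(); }
        static Aggregate.ObjectBuilder objectBuilder() { return new Aggregate.ObjectBuilder(); }
    }
    record JsonString(String value) implements Element {}
    record JsonArray(List<Element> elements) implements Aggregate {
        JsonArray { elements = List.copyOf(elements); }
    }
    record JsonObject(Map<String, Element> members) implements Aggregate {
        JsonObject { members = Map.copyOf(members); }
    }

    sealed interface Aggregate extends Element permits JsonArray, JsonObject {
        // no permits clause needed: both implementations share this compilation unit
        sealed interface Builder<T extends Aggregate> { T build(); }

        final class ArrayBuilder implements Builder<JsonArray> {
            private final List<Element> items = new ArrayList<>();
            public ArrayBuilder item(Element e) { items.add(e); return this; }
            @Override public JsonArray build() { return new JsonArray(items); } // record copies
        }

        final class ObjectBuilder implements Builder<JsonObject> {
            private final Map<String, Element> members = new LinkedHashMap<>();
            public ObjectBuilder named(String name, Element e) { members.put(name, e); return this; }
            @Override public JsonObject build() { return new JsonObject(members); } // record copies
        }
    }

    public static void main(String[] args) {
        JsonObject gaius = Element.objectBuilder()
                .named("name", new JsonString("Gaius"))
                .named("publications", Element.arrayBuilder()
                        .item(new JsonString("De bello gallico"))
                        .build())
                .build();
        System.out.println(gaius.members().size()); // 2
    }
}
```

Because the records take a defensive copy in their constructors, a builder in this sketch can keep accumulating data after build() without affecting instances it has already produced.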

IO: Reading and Writing JSON Data

JsonReader

The parser implementation named JsonReader in package io.github.ralfspoeth.json.io implements the AutoCloseable interface and is meant to be used in try-with-resources statements like so:

Reader src = ...
try(var rdr = new JsonReader(src)) {
    return rdr.readElement();
}

It uses a Lexer internally which tokenizes a character stream into tokens like braces, brackets, comma, colon, number literals, string literals, and null, true, and false. The parser uses a stack of nodes which encapsulate builders, special tokens, or an element. It utilizes an inner sealed interface to cater for this limited set of stack elements.
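The tokenizing step can be illustrated with a deliberately small sketch. This is not the library's Lexer: the class name LexerSketch and the method tokenize are hypothetical, tokens are plain strings, the input is a String rather than a character stream, and literals are not validated.

```java
import java.util.ArrayList;
import java.util.List;

final class LexerSketch {
    /** Splits JSON text into tokens; string tokens keep their quotes, escapes are not decoded. */
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                        // whitespace separates tokens
            } else if ("{}[],:".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));              // structural character
                i++;
            } else if (c == '"') {
                int start = i++;
                while (i < src.length() && src.charAt(i) != '"') {
                    i += src.charAt(i) == '\\' ? 2 : 1;     // skip the escaped character
                }
                if (i >= src.length()) throw new IllegalArgumentException("unterminated string");
                tokens.add(src.substring(start, ++i));      // include the closing quote
            } else {
                int start = i;
                while (i < src.length() && "{}[],:\"".indexOf(src.charAt(i)) < 0
                        && !Character.isWhitespace(src.charAt(i))) {
                    i++;
                }
                tokens.add(src.substring(start, i));        // number literal, true, false or null
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("{\"a\": [1, true, null]}").forEach(System.out::println);
    }
}
```

A parser built on top of such a token stream would push a builder whenever it sees an opening brace or bracket and pop it, calling build(), on the matching closing token.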

JsonWriter

The JsonWriter class is instantiated with its default behaviour of indenting the members of JSON objects by 4 characters and putting each member in a separate line. Arrays are printed interspersed by commas and a white space but in a single line.
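The layout rules just described can be sketched in a few lines. To stay short, the sketch below works on plain Java values (Map, List, String, numbers, booleans, null) instead of the Element hierarchy; the class WriterSketch and its methods are hypothetical, and only quotes and backslashes are escaped.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class WriterSketch {
    /** Renders a value: object members on separate lines, indented by 4; arrays on one line. */
    static String write(Object value, int level) {
        if (value instanceof Map<?, ?> map) {
            if (map.isEmpty()) return "{}";
            String indent = "    ".repeat(level + 1);
            return map.entrySet().stream()
                    .map(e -> indent + quote(String.valueOf(e.getKey())) + ": "
                            + write(e.getValue(), level + 1))
                    .collect(Collectors.joining(",\n", "{\n", "\n" + "    ".repeat(level) + "}"));
        }
        if (value instanceof List<?> list) {
            return list.stream()
                    .map(v -> write(v, level))
                    .collect(Collectors.joining(", ", "[", "]"));
        }
        if (value instanceof String s) return quote(s);
        return String.valueOf(value); // numbers, booleans and null
    }

    /** Simplified quoting: escapes backslashes and double quotes only. */
    static String quote(String s) {
        return '"' + s.replace("\\", "\\\\").replace("\"", "\\\"") + '"';
    }

    public static void main(String[] args) {
        Map<String, Object> gaius = new LinkedHashMap<>();
        gaius.put("name", "Gaius");
        gaius.put("age", 41);
        gaius.put("publications", List.of("De bello gallico"));
        System.out.println(write(gaius, 0));
    }
}
```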

The usage is similar to that of the JsonReader, with the exception that it currently offers a single factory method rather than a constructor:

Element object = ... 
Writer w = ... 
try(var wrt = JsonWriter.createDefaultWriter(w)){
    wrt.write(object);
}

The JsonWriter provides the static method minimize which removes whitespace safely from a given input stream.
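Removing whitespace "safely" means leaving whitespace inside string literals untouched. A minimal sketch of that idea, operating on a String instead of a stream and assuming well-formed input, could look like this; the class MinimizeSketch is hypothetical and makes no claim about how the library's minimize is implemented.

```java
final class MinimizeSketch {
    /** Removes whitespace outside of string literals; assumes well-formed JSON text. */
    static String minimize(String json) {
        StringBuilder out = new StringBuilder(json.length());
        boolean inString = false; // currently between an opening and a closing quote
        boolean escaped = false;  // the previous character was a backslash inside a string
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c);    // keep everything inside strings, including whitespace
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') inString = false;
            } else if (c == '"') {
                inString = true;
                out.append(c);
            } else if (!Character.isWhitespace(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(minimize("{ \"a b\" : [ 1, 2 ] }")); // {"a b":[1,2]}
    }
}
```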

Querying (Experimental)

The package query provides simple utilities for querying data based on some root element.

The Path Utility

The Path class is inspired by the XPath specification yet lacks almost all of its features; it's currently just a toy.

Basic Usage

A Path instance is instantiated using the factory method Path::of like so:

var path = Path.of("a/b/c");

The path expression is split using the / character. Given the statement above, we obtain the equivalent of

var path = Path.of("c", Path.of("b", Path.of("a")));

where the second parameter is the parent path. We then use Path::evaluate, which returns a stream of Elements. Consider this root object, called root:

{
    "a": {
        "b": {
            "c": true
        }
    }
}

then

assert JsonBoolean.TRUE==path.evaluate(root).findFirst().get();

will not throw an AssertionError.

Syntax

The syntax for the patterns is

  • a..b where a and b are integers; a range pattern applicable to arrays;
  • #regex where regex is a regular expression filtering attributes of objects;
  • name where name is just the member name of the root object.

Examples

Given [2, 3, 5, 7, 11] then Path.of("0..2") yields the stream of the first two array elements 2 and 3.

Given {"a0":true,"a1":false} then Path.of("#a.") yields the stream of true and false.

Given {"a":{"b":5}} then Path.of("a/b") yields the stream of 5d.
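The three patterns can be mimicked with a short evaluator. The sketch below is not the library's Path class: PathSketch, step and evaluate are hypothetical names, the data is represented by plain Map and List values, ranges are treated as end-exclusive (as the first example suggests), and the regular expression is assumed to match the whole member name.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Stream;

final class PathSketch {
    /** Applies one path segment to one value and returns the matching children. */
    static Stream<Object> step(String segment, Object value) {
        if (segment.matches("\\d+\\.\\.\\d+")) {             // a..b: range over an array
            if (!(value instanceof List<?> list)) return Stream.empty();
            String[] bounds = segment.split("\\.\\.");
            int from = Math.min(Integer.parseInt(bounds[0]), list.size());
            int to = Math.max(from, Math.min(Integer.parseInt(bounds[1]), list.size()));
            return list.subList(from, to).stream().map(o -> (Object) o);
        }
        if (!(value instanceof Map<?, ?> map)) return Stream.empty();
        if (segment.startsWith("#")) {                       // #regex: filter member names
            Pattern p = Pattern.compile(segment.substring(1));
            return map.entrySet().stream()
                    .filter(e -> p.matcher(String.valueOf(e.getKey())).matches())
                    .map(e -> (Object) e.getValue());
        }
        return map.containsKey(segment)                      // name: single member lookup
                ? Stream.of((Object) map.get(segment))
                : Stream.empty();
    }

    /** Splits the path at '/' and applies the segments from the root downwards. */
    static List<Object> evaluate(String path, Object root) {
        Stream<Object> current = Stream.of(root);
        for (String segment : path.split("/")) {
            current = current.flatMap(v -> step(segment, v));
        }
        return current.toList();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("0..2", List.of(2, 3, 5, 7, 11)));        // [2, 3]
        System.out.println(evaluate("a/b", Map.of("a", Map.of("b", 5))));     // [5]
        System.out.println(evaluate("#a.", Map.of("a0", true, "a1", false))); // both values, order unspecified
    }
}
```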

Use in Clojure

Clojure uses maps to aggregate data and prefers keywords as keys in these maps; see Rich Hickey's talk "Just use maps".

In order to use this Java library, include this in your deps.edn file:

{:deps {
    io.github.ralfspoeth/json {:mvn/version "1.0.8"}
    }}

Import the Element and IO classes into your namespace like this

(ns your.name.space
  (:import (io.github.ralfspoeth.json Element Basic JsonNull JsonArray JsonObject)
           (java.io Reader)
           (io.github.ralfspoeth.json.io JsonReader))
  (:require [clojure.java.io :as io]))

Use this function in order to read JSON data from some java.io.Reader

(defn read-elem [^Reader rdr]
  (with-open [jsrd (JsonReader. rdr)]
    (.readElement jsrd)))

and then, in order to turn the resulting Element into a clojure map

(defn map-json ([^Element elem]
    (cond
      (instance? JsonNull elem) nil,
      (instance? Basic elem) (.value elem)
      (instance? JsonArray elem) (mapv map-json (.elements elem))
      (instance? JsonObject elem) (zipmap
                                    (map keyword (->> elem (.members) (.keySet))),
                                    (map map-json (->> elem (.members) (.values)))))))

Standard Conversions

Package conv contains the utility class StandardConversions, which converts Elements into the primitive types int, long, double or boolean and to String or a given Enum type. For the sake of simplicity, all conversion methods take any Element type as an argument and may throw IllegalArgumentException. The conversion methods in the StandardConversions class respect that many JSON authors put all values into double quotes, even null, true, and false as well as numbers. These values are parsed into JsonString instances; their contents are converted into numbers, boolean values and null if possible as well.

Numerical Conversions

The methods intValue, longValue and doubleValue utilize the parse<Type> methods of the respective Integer, Long and Double classes for JsonStrings, and the standard conversion from double to int and long for JsonNumbers. JsonBooleans are converted to 1 and 0 for TRUE and FALSE, respectively.
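The rules for intValue can be sketched as follows. The class ConversionSketch and its reduced type hierarchy are hypothetical stand-ins, and the treatment of unsupported inputs is an assumption; the sketch relies on NumberFormatException being a subclass of IllegalArgumentException.

```java
final class ConversionSketch {
    sealed interface Element permits JsonBoolean, JsonNumber, JsonString {}
    enum JsonBoolean implements Element { TRUE, FALSE }
    record JsonNumber(double value) implements Element {}
    record JsonString(String value) implements Element {}

    /** Converts an element to int following the rules described above. */
    static int intValue(Element e) {
        if (e instanceof JsonNumber n) return (int) n.value();             // narrowing, truncates
        if (e instanceof JsonString s) return Integer.parseInt(s.value()); // may throw NumberFormatException
        if (e instanceof JsonBoolean b) return b == JsonBoolean.TRUE ? 1 : 0;
        throw new IllegalArgumentException("cannot convert " + e);
    }

    public static void main(String[] args) {
        System.out.println(intValue(new JsonString("41"))); // 41
        System.out.println(intValue(new JsonNumber(3.9)));  // 3
        System.out.println(intValue(JsonBoolean.TRUE));     // 1
    }
}
```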

String Conversion

The stringValue conversion uses natural conversions for all Basic types, and the toString methods applied on the contained lists and maps of the Aggregate types.

Boolean Conversion

The booleanValue conversion does the obvious conversions for JsonBoolean and JsonString.

Enum Conversion

The enumValue... methods take two arguments: a class declared with the enum keyword, and the Element, which must be of type JsonString. While enumValue uses the Enum::valueOf method, enumValueIgnoreCase converts the value and all the constants' names defined in the enum class to uppercase strings before selecting the enum constant.
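The case-insensitive lookup can be sketched like this. For brevity the sketch takes the raw String instead of a JsonString element; the class EnumConversionSketch, the sample enum Rank and the error message are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

final class EnumConversionSketch {
    enum Rank { SENATOR, Consul } // mixed-case constant name on purpose

    /** Selects the constant whose upper-cased name equals the upper-cased value. */
    static <E extends Enum<E>> E enumValueIgnoreCase(Class<E> type, String value) {
        String wanted = value.toUpperCase(Locale.ROOT);
        return Arrays.stream(type.getEnumConstants())
                .filter(c -> c.name().toUpperCase(Locale.ROOT).equals(wanted))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException(
                        value + " is not a constant of " + type.getName()));
    }

    public static void main(String[] args) {
        System.out.println(enumValueIgnoreCase(Rank.class, "senator")); // SENATOR
        System.out.println(enumValueIgnoreCase(Rank.class, "CONSUL"));  // Consul
    }
}
```

Upper-casing both sides is what makes the second lookup succeed; Enum::valueOf alone would reject "CONSUL" for a constant named Consul.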