Skip to content

justjake/ktree

Repository files navigation

ktree

Kotlin Multiplatorm implementation of Tree Notation.

Tree Notation specifies a very simple language for indented parsing text into a tree data structure. Here's an example of a tree written in Tree Notation:

package ktree

author
 name Jake
 email jake@example.com

dependencies
 multiplatform >=2
  resolved https://example.com/multiplatform
  checksum abcdef1234

With the appropriate notation parser settings, this text parses into the following tree structure:

├─ 0. [package, ktree]
├─ 1. []
├─ 2. [author]
│  ├─ 0. [name, Jake]
│  └─ 1. [email, jake@example.com]
├─ 3. []
└─ 4. [dependencies]
   └─ 0. [multiplatform, >=2]
      ├─ 0. [resolved, https://example.com/multiplatform]
      └─ 1. [checksum, abcdef1234]
As JSON
{
  "children": [
    {
      "cells": [
        "package",
        "ktree"
      ]
    },
    {
    },
    {
      "cells": [
        "author"
      ],
      "children": [
        {
          "cells": [
            "name",
            "Jake"
          ]
        },
        {
          "cells": [
            "email",
            "jake@example.com"
          ]
        }
      ]
    },
    {
    },
    {
      "cells": [
        "dependencies"
      ],
      "children": [
        {
          "cells": [
            "multiplatform",
            ">=2"
          ],
          "children": [
            {
              "cells": [
                "resolved",
                "https://example.com/multiplatform"
              ]
            },
            {
              "cells": [
                "checksum",
                "abcdef1234"
              ]
            }
          ]
        }
      ]
    }
  ]
}

Project Status

Experimental. All features probably contain significant bugs and are under heavy development.

Outstanding TODOs before releasing version 0.0.1:

This project intends to target all of Kotlin's supported platforms eventually. For now, multiplatform kotlin projects are alpha status, so don't expect too much.

Platform feature support:

  • All platforms: "Pure" features like parsing, tree manipulation, and printing
  • JVM, NodeJS: filesystem access

Tree Notation

Official spec.

A Tree Notation variant is defined by three symbols:

data class TreeNotation(
    /**
     * nodeBreakSymbol delimits nodes.
     */
    val nodeBreakSymbol: String = "\n",

    /**
     * wordBreakSymbol delimits cells in the node string.
     */
    val wordBreakSymbol: String = "\t",

    /**
     * edgeSymbol is used to indicate the parent/child relationship between nodes.
     * Nodes prefixed with the same number of edge symbols are siblings.
     * A node prefixed with more edge symbols than the previous node is the previous
     * node's child.
     * If null, parse as Grid Notation, with no parent-child concepts.
     */
    val edgeSymbol: String? = "\t",
)

Nodes

Each line (by default) in the file is a node in the tree. The text in a node is split into cells. The children of a node are the lines underneath a node that are indented by one edgeSymbol ( default: one tab).

The example below contains a parent node author with two child nodes:

author
 name Jake
 email jake@example.com

Tree notation is canonically serialized to JSON as objects with cells: String[] and children: Node[] properties, which are omitted when empty. This serialization should make the result of parsing clear:

{
  "cells": [
    "author"
  ],
  "children": [
    {
      "cells": [
        "name",
        "Jake"
      ]
    },
    {
      "cells": [
        "email",
        "jake@example.com"
      ]
    }
  ]
}

Indentation

The official Tree Notation spec is ambiguous about how to handle over-indented child nodes. ktree allows configuring this behavior.

Strict Indentation

import tl.jake.ktree.TreeNotation

TreeNotation() // default
TreeNotation(overIndentBehavior = TreeNotation.OverIndentBehavior.Strict)

In strict mode (the default), we only parse one additional indentation level for a node. The remaining edge symbols are treated as part of the first cell, or as cell seperators. This has the consiquence that similarly-over-indented nodes that visually appear to be siblings will be parsed as parent-child instead.

parent
   over-indented child 1
   over-indented child 2
       over-indented child 3

parses to

└─ 0. [parent]
   └─ 0. [, , over-indented, child, 1]
      └─ 0. [, over-indented, child, 2]
         └─ 0. [, , , , over-indented, child, 3]
As JSON
{
  "children": [
    {
      "cells": [
        "parent"
      ],
      "children": [
        {
          "cells": [
            "",
            "",
            "over-indented",
            "child",
            "1"
          ],
          "children": [
            {
              "cells": [
                "",
                "over-indented",
                "child",
                "2"
              ],
              "children": [
                {
                  "cells": [
                    "",
                    "",
                    "",
                    "",
                    "over-indented",
                    "child",
                    "3"
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

Equally indented children are siblings

import tl.jake.ktree.TreeNotation

TreeNotation(
    overIndentBehavior = TreeNotation.OverIndentBehavior.EquallyIndentedChildrenAreSiblings
)

With this setting, all children with the same indentation level are treated as siblings. The extra indents are parsed as part of the node's text, as either wordBreakSymbols, or as text of the first word.

parent
   over-indented child 1
   over-indented child 2
       over-indented child 3

parses to

└─ 0. [parent]
   ├─ 0. [, , over-indented, child, 1]
   └─ 1. [, , over-indented, child, 2]
      └─ 0. [, , , , , over-indented, child, 3]
As JSON
{
  "children": [
    {
      "cells": [
        "parent"
      ],
      "children": [
        {
          "cells": [
            "",
            "",
            "over-indented",
            "child",
            "1"
          ]
        },
        {
          "cells": [
            "",
            "",
            "over-indented",
            "child",
            "2"
          ],
          "children": [
            {
              "cells": [
                "",
                "",
                "",
                "",
                "",
                "over-indented",
                "child",
                "3"
              ]
            }
          ]
        }
      ]
    }
  ]
}

Serialization Language (arbito)

ktree provides kotlinx.serialization encoders and decoders that allow you to convert Kotlin data structures to and from a Tree Notation representation. This representation is named "arbito", a shortening of "arbolito", which means "little tree" in Spanish.

We can use the following Kotlin classes to decode the first example in a type-safe manner:

import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable
import tl.jake.ktree.serialization.Inline

@Serializable
data class PackageSpec(
    @SerialName("package") val name: String,
    val author: Author,
    val dependencies: Map<String, Dependency>,
)

@Serializable
data class Author(val name: String, val email: String)
@Serializable
data class Dependency(
    @Inline val constraint: String,
    val resolved: String? = null,
    val checksum: String? = null,
)

To decode Tree Notation as a PackageSpec instance, call decodeFromTree:

import tl.jake.ktree.TreeNotation
import tl.jake.ktree.serialization.decodeFromTree

val tree = TreeNotation.Spaces.parse(example)
val pkg = decodeFromTree<PackageSpec>(tree)
println(pkg.prettyToString())
PackageSpec(
    name=ktree,
    author=Author(
        name=Jake,
        email=jake@example.com
    ),
    dependencies={
        multiplatform=Dependency(
            constraint=>=2,
            resolved=https://example.com/multiplatform,
            checksum=abcdef1234
        )
    }
)

Specification by example

Classes values encoded as a node, and its fields are encoded as child nodes inside the node. By default, each field is encoded as a child node with the first cell as the field name, and the remaining cells and descendants as the field value.

  • If a field is annotated @Inline, it may be encoded as a cell in the value's root node. Inline cells are decoded from the root node in order of declaration.

  • If a field is annotated @Anonymous, it may be encoded as a child node without the field name. Anonymous fields are decoded in order of declaration.

Here's some examples of encoding the following class:

class Team(val lead: Person)
class Person(
    @Inline val name: String,
    val age: Int,
    @Anonymous val occupation: String
)

Compact encoding

lead Jake
 age 29
 Software engineer

Unambiguous encoding

lead
 name Jake
 age 29
 occupation Software engineer

Primitives

Primitives like strings and numbers are typically coded as single cells within a node. There are no type sigils to differentiate between the different numeric and string types, so the serialization is not entirely self-describing.

Numeric primitives are coded as their Kotlin .toString() and String.toType() mirror methods.

Null is encoded as a cell containing the string null. In cases where this is ambiguous, for example a field with type String?, a string "null" is encoded as \null.

Strings are coded as JSON strings without the opening or closing quotes. Control characters like newlines and tabs are escaped as \n and \t in output.

  • Strings that start with a \ character are encoded with an extra \ prepended. For example, to encode the string "\ is the worst character" would encode to cell \\\ is the worst character.

  • TODO If a string cell ends with a \ character, the string continues into the next cell. The \ is removed when decoding and replaced with the cellBreakSymbol.

  • TODO If a string cells contains annotated @Multiline, it may be encoded as a cell containing | and then indented as child nodes. The child nodes will be joined together with newlines and all of their cells joined together with tabs.

    A string containing just a | that is not a multiline string is encoded as a cell containing \|

    Here's an example of encoding a Team if all string fields were @Multiline.

    lead
     name |
      Jake
     age 29
     occupation |
      Software engineer
    

Collections

Lists are encoded as child nodes inside a parent. List items may be prefixed by a cell containing -, which is ignored, but makes visual indentation clearer. If an inline cell contains just a -, it should be written as \-.

Examples serializing class Squad:

class Squad(val members: List<Person>)

Default, unambiguous encoding:

members
 -
  name Jake
  age 29
  occupation SWE
 -
  name Tamago
  age 33
  occupation Product Manager

Member fields compacted:

members
 - Jake
  age 29
  SWE
 - Tamago
  age 33
  Product Manager

Without leading -, but with unambiguous fields. This encoding gives a lot of breathing room to list members.

members
 
  name Jake
  age 29
  occupation SWE
  
  name Tamago
  age 33
  occupation Product Manager

Here's the most compact encoding possible. No - markers, and all inline and anonymous fields compacted.

members
 Jake
  age 29
  SWE
 Tamago
  age 33
  Product Manager

Maps have two encodings, depending on the kind of key used in the map.

Maps with primitive keys are encoded like a class. Each key-value pair is a child node of the map. The first cells of the child encodes the key. The remaining cells and children of the node encode the value. (As usual, a string containing the cell break character can escape it with )

data class GameState(val scores: Map<String, Int>)
GameState(mapOf("Jake TL" to 5, "Tamago" to 12))
scores
 Jake\ TL 5
 Tamago 12

Maps with complex keys are encoded like a list of pairs with fields key and value. This encoding is a bummer.

data class ComplexGameState(val scores: Map<Person, Int>)
GameState(mapOf(
  Person("Jake TL", 29, "SWE") to 5,
  Person("Tamago", 30, "Product manager") to 12,
))
scores
 -
  key Jake\ TL
   age 29
   SWE
  value 12
 -
  key Tamago
   age 30
   SWE
  value 12

It may help to think of a map as an encoding of this Kotlin type:

data class KV<K, V>(
    @Inline @Anonymous val key: K,
    @Inline @Anonymous val value: V,
)

Other

TODO Comments are coded as any node where the first cell contains exactly //. Comments are ignored when decoding. To encode a node with a first cell containing the string "//", prepend the cell with a \.

data class GameState(val scores: Map<String, Int>)
GameState(mapOf("//" to 5))
scores
 \// 5

About

Kotlin implementation of Tree Notation

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages