flynx/serialize.js

serialize.js: Extended JSON serialization

A library for JSON-like extended serialization and deserialization, and for serialization-based isolated deep and semi-deep copying of objects.

This extends the default JSON specification adding the following:

  • Recursive data structure serialization
  • Sparse array serialization
  • undefined/NaN serialization
  • Serialization of Infinity, BigInts, Sets, and Maps
  • Function serialization
  • Deep and partial-deep clean object copy
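
For context, the sketch below shows how the built-in JSON object treats the kinds of values listed above. It uses only standard JavaScript and does not call serialize.js; the variable names are illustrative.

```javascript
// What plain JSON does with values outside of its specification.

// undefined, NaN and Infinity are all flattened to null...
var flattened = JSON.stringify([undefined, NaN, Infinity])	// -> '[null,null,null]'

// empty positions in sparse arrays are also written as null...
var holes = JSON.stringify([,,1])	// -> '[null,null,1]'

// Sets and Maps lose their content...
var collections = JSON.stringify([new Set([1]), new Map([['a', 1]])])	// -> '[{},{}]'

// BigInts and recursive structures throw a TypeError...
function throwsTypeError(func){
	try {
		func()
	} catch(err){
		return err instanceof TypeError
	}
	return false
}

var recursive = {}
recursive.self = recursive

var bigintThrows = throwsTypeError(function(){
	return JSON.stringify(1n) })	// -> true
var recursiveThrows = throwsTypeError(function(){
	return JSON.stringify(recursive) })	// -> true
```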

Data not stored:

  • Attributes on arrays, maps, sets, and functions,
  • Function closures.

Motivation

This was originally built as a companion to a testing module for a programming class, illustrating several concepts, including: guaranteed clean isolation of data structures via serialization, instrumenting code and tooling design, basic parsing, among others.

Installation

For basic use:

$ npm install ig-serialize

Or just download and drop serialize.js into your code.

var serialize = require('ig-serialize')

Introduction

serialize.js provides two toolsets:

  1. A means to serialize and deserialize complex data structures into an extended JSON format.
    var obj = {
    	sparse_array: [,,,,1],
    	bad_number: NaN,
    	really_large_number: 99999999999999999999999n,
    }
    obj.recursive = obj
    obj.re_reference = obj.sparse_array
    
    var str = serialize(obj)
    
    // ...
    
    var copy = deserialize(str)
    This is useful when serializing data structures more complex than pure JSON can handle.
  2. A means to cleanly copy deep data structures with guaranteed isolation.
    var obj = {
    	// ...
    }
    
    var copy = deepCopy(obj)

Long strings and large BigInts

Repeating strings and BigInts longer than MIN_LENGTH_REF are stored by reference by default.

See: MIN_LENGTH_REF

Serializing functions

Due to how JavaScript is designed, it is not possible to trivially and fully clone a function with all of its references. .serialize(..) will not attempt to clone any state a function may have, which leads to losing:

  • Function closure
  • Attributes set on the function or any of its prototypes, including the .__proto__ value if it was changed.

Thus, care must be taken when serializing structures containing functions.
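
The loss described above can be reproduced with plain JavaScript. The sketch below rebuilds a function from its source text using the standard Function constructor; this is an illustration of the language behavior, not necessarily how serialize.js restores functions, and makeCounter(..) is a hypothetical example.

```javascript
// Rebuilding a function from its source text loses closure and attributes.
function makeCounter(){
	var count = 0
	var counter = function(){ return ++count }
	counter.label = 'my counter'
	return counter
}

var original = makeCounter()
var first = original()	// -> 1

// the source text is all that can be written to a string...
var code = original.toString()

// ...evaluating it creates a new function in the global scope, where 
// `count` does not exist...
var rebuilt = new Function('return (' + code + ')')()

var closureLost = false
try {
	rebuilt()
} catch(err){
	closureLost = err instanceof ReferenceError
}
// closureLost -> true
// rebuilt.label -> undefined
```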

API

eJSON

A JSON-API-compatible object providing .stringify(..) and .parse(..) static methods.

serialize(..) / eJSON.stringify(..)

Serialize a JavaScript value into a JSON/eJSON string.

serialize(<value>)
eJSON.stringify(<value>)
	-> <string>

More control:

serialize(obj, options)
serialize(obj, indent, depth=0, options)
	-> <string>

Options format:

{
	// pretty-printing indent...
	// (default: undefined)
	indent: undefined,
	
	// output root indent...
	// (default: 0)
	depth: 0,
	
	// minimal referenced string/bigint length...
	// (default: MIN_LENGTH_REF)
	min_length_ref: MIN_LENGTH_REF,
	
	// functions list...
	// (default: undefined)
	functions: undefined,
}

Supported options:

  • indent controls pretty-printing. If set to a number, that many spaces are used to indent nested values; if set to a string, that string is used for indenting. Note that only whitespace is currently supported. Default: undefined (disabled)
  • depth is the number of indents applied to the top level of the returned string. This can be useful when pretty-printing or nesting the output. Default: 0
  • min_length_ref sets the minimum length a string or BigInt must have to be stored by reference when encountered repeatedly. If set to 0 or Infinity, referencing of strings and BigInts is disabled. Default: MIN_LENGTH_REF
  • functions, if passed an array, receives every encountered function; the functions are pushed to it and stored in the output by index. Default: undefined

deserialize(..) / eJSON.parse(..)

Deserialize a JSON/eJSON string into a value.

deserialize(<string>)
eJSON.parse(<string>)
	-> <value>

Deserializing functions is disabled by default, as it can be a security risk if the eJSON came from an untrusted source.

Enable function deserialization:

deserialize(<string>, true)
eJSON.parse(<string>, true)
deserialize(<string>, {functions: true})
eJSON.parse(<string>, {functions: true})
	-> <value>

Passing a function list (generated by serialize(<value>, {functions: <functions>})) for deserialization:

deserialize(<string>, {functions: <functions>})
eJSON.parse(<string>, {functions: <functions>})
	-> <value>
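
The idea behind a function list can be sketched as follows: the string only carries an index, while the functions themselves stay in memory and are looked up on the way back. storeFunction(..) and restoreFunction(..) are hypothetical helpers, and reusing the index of an already stored function is an assumption, not documented library behavior.

```javascript
// Sketch of index-based function storage (hypothetical helpers, not the 
// library's code).
function storeFunction(func, functions){
	// ASSUMPTION: a function seen before reuses its index...
	var index = functions.indexOf(func)
	if(index < 0){
		index = functions.push(func) - 1
	}
	return index
}

function restoreFunction(index, functions){
	if(!(index in functions)){
		throw new RangeError('no function stored at index ' + index)
	}
	return functions[index]
}

var functions = []
var greet = function(name){ return 'hello ' + name }

var index = storeFunction(greet, functions)	// -> 0
var restored = restoreFunction(index, functions)
// restored === greet, closure and attributes included
```

Since no code is evaluated on the way back, this route avoids evaluating function source taken from the string.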

deepCopy(..)

Deep-copy an object.

deepCopy(<value>)
	-> <value>

The returned object is a clean copy, fully sanitized through serialization.

partialDeepCopy(..)

Partially deep-copy an object, retaining only references to functions.

partialDeepCopy(<value>)
	-> <value>

The returned object is a clean copy, partially sanitized through serialization: function references are copied as-is from the input, retaining and "transferring" all associated function state such as attributes and closures.

Note that this is by definition a controlled state leak from input object to copy, so care must be taken to control object-specific state in function closures and attributes -- keeping function state independent from object state is in general a good practice.
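
The state leak described above can be illustrated without the library. In the sketch below, partialCopySketch(..) is a hypothetical stand-in that handles only flat objects with JSON-compatible data; it is not the library's partialDeepCopy(..).

```javascript
// Sketch of the state leak: data is copied, function references are shared.
function partialCopySketch(obj){
	var copy = {}
	for(var key of Object.keys(obj)){
		var value = obj[key]
		copy[key] = typeof value == 'function' ?
			// functions are passed through as-is...
			value
			// ...everything else is isolated via a serialization round trip.
			: JSON.parse(JSON.stringify(value))
	}
	return copy
}

function makeObject(){
	// object-specific state hidden in a closure...
	var calls = 0
	return {
		data: [1, 2, 3],
		countCalls: function(){ return ++calls },
	}
}

var a = makeObject()
var b = partialCopySketch(a)

// data is isolated...
b.data.push(4)
// a.data.length -> 3

// ...but the closure is shared between the original and the copy...
var fromOriginal = a.countCalls()	// -> 1
var fromCopy = b.countCalls()	// -> 2
```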

MIN_LENGTH_REF / <options>.min_length_ref

Defines the default minimum length of a repeating string or BigInt to include as a reference in the output.

If set to 0, referencing will be disabled.

Default: 96
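
A minimal sketch of how such a threshold could be applied. isReferenceCandidate(..) is a hypothetical helper; whether the comparison is inclusive and how the length of a BigInt is measured are assumptions, not documented behavior.

```javascript
// Sketch of the length check (hypothetical helper, not the library's code).
var MIN_LENGTH_REF = 96

function isReferenceCandidate(value, min_length_ref=MIN_LENGTH_REF){
	// 0 or Infinity disable referencing...
	if(min_length_ref === 0 || min_length_ref === Infinity){
		return false
	}
	// ASSUMPTION: the minimum length itself qualifies...
	if(typeof value == 'string'){
		return value.length >= min_length_ref
	}
	// ASSUMPTION: BigInts are measured by their decimal text...
	if(typeof value == 'bigint'){
		return value.toString().length >= min_length_ref
	}
	return false
}

isReferenceCandidate('x'.repeat(96))	// -> true
isReferenceCandidate('short string')	// -> false
isReferenceCandidate('x'.repeat(200), 0)	// -> false
```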

Format

The output of .serialize(..) is a strict superset of standard JSON, while the input format is a bit more relaxed in several details.

Structural paths

Paths are used for internal references in cases when objects are encountered multiple times, e.g. in recursion.

A path is an array of keys; the semantics of each key depend on the data structure traversed:

  • Array expects a number
  • Set expects a number -- item order in set
  • Map expects a pair of consecutive numbers -- the first indicates item order, the second selects the key if 0 or the value if 1.
  • Object expects a string

An empty path indicates the root object.

Notes:

  • String path items are unambiguous and are always treated as attributes.
    This enables referencing of attributes of any object like arrays, maps, ...etc.
  • Map/set paths are structured as if maps and sets were represented by arrays in the form their respective constructors expect as input.
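
The path semantics above can be sketched as a small resolver. resolvePath(..) is a hypothetical helper written from the rules in this section, not the library's implementation; error handling for malformed paths is kept minimal.

```javascript
// Sketch of path resolution (hypothetical helper, not the library's code).
function resolvePath(root, path){
	var cur = root
	for(var i = 0; i < path.length; i++){
		var key = path[i]
		if(typeof key == 'string'){
			// strings are always attributes...
			cur = cur[key]
		} else if(cur instanceof Set){
			// set: item order...
			cur = [...cur][key]
		} else if(cur instanceof Map){
			// map: item order followed by 0 (key) or 1 (value)...
			cur = [...cur][key][path[++i]]
		} else if(Array.isArray(cur)){
			// array: index...
			cur = cur[key]
		} else {
			throw new TypeError('unexpected path item: ' + key)
		}
	}
	return cur
}

var target = ['target']

resolvePath([target], [0])	// -> target
resolvePath(new Set([target]), [0])	// -> target
resolvePath(new Map([[target, 'value']]), [0, 0])	// -> target
resolvePath(new Map([['key', target]]), [0, 1])	// -> target
resolvePath({a: {b: target}}, ['a', 'b'])	// -> target
resolvePath(target, [])	// -> target (root)
```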

Referencing

If an object is encountered for a second time it will be serialized as a reference by path to the first occurrence.

Grammar:

<ref> ::= 
	'<REF[]>'
	| '<REF[' <path-items> ']>'
	
<path-items> ::=
	<item>
	| <item> ',' <path-items>
	
<item> ::=
	<number>
	| <string> 

Example:

// a recursive array...
var o = []
o[0] = o

// root object reference...
serialize(o) // -> '[<REF[]>]'

// array item...
serialize([o]) // -> '[[<REF[0]>]]'

// set item...
// NOTE: the path here is the same as in the above example -- since we 
//	use ordered topology for paths sets do not differ from arrays.
serialize(new Set([o])) // -> 'Set([[<REF[0]>]])'

// map key...
serialize(new Map([[o, 'value']])) // -> 'Map([[[<REF[0,0]>],"value"]])'

// map value...
serialize(new Map([['key', o]])) // -> 'Map([["key",[<REF[0,1]>]]])'
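
One common way to detect repeated objects is to remember the path of every object when it is first visited. The sketch below does this for arrays and plain objects only. serializeSketch(..) is a hypothetical helper, not the library's serializer, and writing string path items as JSON strings is an assumption.

```javascript
// Sketch of reference detection (hypothetical helper, not the library's code).
function serializeSketch(value, path=[], seen=new Map()){
	if(typeof value != 'object' || value === null){
		return JSON.stringify(value)
	}
	// seen before -> reference the first occurrence by path...
	if(seen.has(value)){
		return '<REF['
			+ seen.get(value)
				.map(function(item){ return JSON.stringify(item) })
				.join(',')
			+ ']>'
	}
	seen.set(value, path)
	if(Array.isArray(value)){
		return '['
			+ value
				.map(function(item, i){
					return serializeSketch(item, path.concat([i]), seen) })
				.join(',')
			+ ']'
	}
	return '{'
		+ Object.keys(value)
			.map(function(key){
				return JSON.stringify(key) + ':'
					+ serializeSketch(value[key], path.concat([key]), seen) })
			.join(',')
		+ '}'
}

// an array containing itself...
var loop = []
loop.push(loop)

serializeSketch(loop)	// -> '[<REF[]>]'
serializeSketch([loop])	// -> '[[<REF[0]>]]'
```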

Null types

In addition to null, serialize.js adds support for undefined and NaN which are stored as-is.

Example:

serialize([null, undefined, NaN]) // -> '[null,undefined,NaN]'

Sparse arrays

Sparse arrays are represented in the same way JavaScript handles them syntactically -- with commas separating empty "positions".

Example:

serialize([,]) // -> '[,]'
serialize([,,]) // -> '[,,]'

Trailing commas are handled in the same way JavaScript handles them:

// trailing commas are ignored...
serialize([1,]) // -> '[1]'

// sparse element...
serialize([1,,]) // -> '[1,,]'
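
For reference, this is how JavaScript itself treats empty positions and trailing commas, and what plain JSON makes of them. The snippet uses only standard JavaScript and does not call serialize.js.

```javascript
// Array literals with empty positions and trailing commas.
var one_hole = [,]
var two_holes = [,,]
var trailing = [1,]
var sparse = [1,,]

// lengths: 1, 2, 1, 2
var lengths = [one_hole, two_holes, trailing, sparse]
	.map(function(array){ return array.length })

// an empty position is not an undefined value -- the index does not exist...
var hole_exists = 0 in one_hole	// -> false
var undefined_exists = 0 in [undefined]	// -> true

// plain JSON can not express this and writes null instead...
var as_json = JSON.stringify(sparse)	// -> '[1,null]'
```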

BigInt

Serialized as represented in JavaScript.

serialize(9999999999n) // -> '9999999999n'
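
A minimal sketch of writing and reading this notation using the standard BigInt API. formatBigInt(..) and parseBigInt(..) are hypothetical helpers, and the handling of negative values is an assumption.

```javascript
// Sketch of BigInt literal handling (hypothetical helpers, not the 
// library's code).
function formatBigInt(value){
	return value.toString() + 'n'
}

function parseBigInt(text){
	// ASSUMPTION: an optional sign, decimal digits and a trailing "n"...
	if(!/^-?\d+n$/.test(text)){
		throw new SyntaxError('not a BigInt literal: ' + text)
	}
	// BigInt(..) accepts the digits without the suffix...
	return BigInt(text.slice(0, -1))
}

formatBigInt(9999999999n)	// -> '9999999999n'
parseBigInt('9999999999n')	// -> 9999999999n
```

Unlike a regular number, the value keeps full precision on the way back regardless of its size.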

Infinity

Serialized as represented in JavaScript.

serialize(Infinity) // -> 'Infinity'
serialize(-Infinity) // -> '-Infinity'

Map / Set

Maps and sets are stored in the same format as their respective constructor calls, dropping the new keyword:

Grammar:

<map> ::=
	'Map([])'
	| 'Map([' <map-items> '])'

<map-items> ::=
	<map-item>
	| <map-item> ',' <map-items> 

<map-item> ::=
	'[' <value> ',' <value> ']'

<set> ::=
	'Set([])'
	| 'Set([' <set-items> '])'

<set-items> ::=
	<value>
	| <value> ',' <set-items> 

Examples:

serialize(new Set([1,2,3])) // -> 'Set([1,2,3])'
serialize(new Map([['a', 1], ['b', 2]])) // -> 'Map([["a",1],["b",2]])'
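
The format follows from the fact that spreading a Set or a Map yields exactly the array its constructor accepts. The sketch below covers JSON-compatible content only; formatSet(..), formatMap(..) and parseMapSketch(..) are hypothetical helpers, not the library's code.

```javascript
// Sketch of the Map / Set notation for JSON-compatible content only.
function formatSet(set){
	// [...set] is the list of items in insertion order...
	return 'Set(' + JSON.stringify([...set]) + ')'
}

function formatMap(map){
	// [...map] is the list of [key, value] pairs in insertion order...
	return 'Map(' + JSON.stringify([...map]) + ')'
}

function parseMapSketch(text){
	// strip the 'Map(' prefix and the closing ')' and feed the pairs back 
	// to the constructor...
	return new Map(JSON.parse(text.slice('Map('.length, -1)))
}

formatSet(new Set([1,2,3]))	// -> 'Set([1,2,3])'
formatMap(new Map([['a', 1], ['b', 2]]))	// -> 'Map([["a",1],["b",2]])'
parseMapSketch('Map([["a",1],["b",2]])').get('b')	// -> 2
```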

Functions

A function can be stored in one of two ways:

  1. As code (default)
  2. As a function index, if an array to store function references is given.

Grammar:

<func> ::=
	'<FUNC[' <length> ',(' <func-spec> ')]>'

<func-spec> ::=
	<code>
	| <index>

<length> ::= <number>

<index> ::= <number>

<length> is the length of the block that follows in characters, including the enclosing parentheses.

serialize(function(){}) // -> '<FUNC[14,(function(){})]>'

var functions = []
serialize(function(){}, {functions}) // -> '<FUNC[3,(0)]>'
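
How the length field relates to the block can be sketched as follows. formatFunction(..) is a hypothetical helper built from the grammar above, not the library's code; it relies on Function.prototype.toString() returning the function's source text.

```javascript
// Sketch of the function notation (hypothetical helper, not the library's 
// code).
function formatFunction(func, functions){
	var spec = functions ?
		// store by index in the given list...
		functions.push(func) - 1
		// ...or store as code.
		: func.toString()
	// the counted block includes the enclosing parentheses...
	var block = '(' + spec + ')'
	return '<FUNC[' + block.length + ',' + block + ']>'
}

formatFunction(function(){})	// -> '<FUNC[14,(function(){})]>'

var stored = []
formatFunction(function(){}, stored)	// -> '<FUNC[3,(0)]>'
```

A length prefix of this kind lets a reader take the block as-is, without having to parse the JavaScript code inside it to find where it ends.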

Note that deserializing functions is disabled by default as it can pose a security risk if the input to deserialize(..) is not trusted. (see: deserialize(..))

Running tests

serialize.js uses 'test.js' for testing.

Get the development dependencies:

$ npm install -D

Run the tests:

$ npm test

To run the tests directly:

$ ./test.js

To run the tests with modifier chains of length 3:

$ ./test.js -m 3

License

BSD 3-Clause License

Copyright (c) 2014-2026, Alex A. Naanou,
All rights reserved.
