Skip to content

kentnl/JSON-PrettyCompact

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 

Repository files navigation

NAME

JSON-PrettyCompact - More Compact, But still Pretty, JSON

SYNOPSIS

use JSON::PrettyCompact;
print JSON::PrettyCompact->new->encode( { ... } );

DESCRIPTION

This library produces a very rudimentary wrapper around the standard JSON modules to pack more characters into a visible area, with minimial line overflows, opting to find a sweet spot between the standards of pretty(0) and pretty(1) typically made available, while still being valid JSON, and thus still being machine readable, while also being easy to edit.

Specifically, the motivation was to have a simple human-modifiable, machine-readable format, that I could still run a "tidy this up" command after editing it, which more portable than simply doing the same thing with a .pl file, kept tidy with perltidy, and consumed with do.

pretty(0) is usually far too compact to read, as the lines spew on forever horizontally, and its unlikely whatever you read that in will soft-wrap it in ways amenable to reading.

pretty(1) is usually far too verbose to read, with simple key-value maps and arrays consuming a line per element, spewing out vertically and giving your scrollbar a workout.

Instead, this module factors for a given amount of horizontal space that is considered "ideal" for easy consumption, and tries to split units elegantly that exceed this given space, resulting in sequences of inherently small units being clustered linewise.

Its clearly not a perfect implementation, and there are places where it could possibly be even more compact, but its too complicated.

But the results speak for themselves

Try flow this code into either pretty(0) or pretty(1) JSON formatting and see how much more painful it is to decipher in both:

{
  "a": 1, "b": {"c": 1}, "d": [1], "e": [1, 2], "f": [1, 2, 3],
  "g": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], "h": [{"a": 1}],
  "i": [{"a": 1}, {"a": 1}],
  "j": [
    {"a": 1}, {"a": 1}, {"a": 1}, {"a": 1}, {"a": 1},
    {"b": {"c": {"d": {"e": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}}}},
    {"f": 1}, {"f": 1}, {"f": 1}, {"f": 1}, {"f": 1}
  ],
  "k": {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
}

And yet, this is our default presentation.

METHODS

new

JSON::PrettyCompact->new( %params );

Create an encoder for pretty compact output.

DEFAULTS

canonical      =>  1 # people typically want this
indent         =>  2 # A compact, but still readable amount of indent
space_after    =>  1 # This really helps with reading
space_before   =>  0 # This ain't too helpful
width          => 69 # A nice default for terminals and text files
width_is_local =>  1 # Reduced vertical spewing
fold_hashes    =>  1 # Useful 60% of the time
fold_arrays    =>  1 # Useful 90% of the time

PARAMETERS

canonical

When processing Hashes, ensure to emit in sorted key order.

This parameter is also passed to the internal JSON encoder, as if you want consistent results, you typically want it everywhere.

This is enabled by default, as the primary audience is human beings, and they don't like things randomly changing.

indent

When determining that a structure must be split over multiple lines, use this many spaces of indent.

This defaults to 2, which gives a very compact result, while still being relatively easy to read.

In conjunction with width_is_local=0, each indent level loses this many additional characters of space, and subsequently tips the hand in favour of more lines of output.

space_after

Controls whether or not to insert a space after both ":" and "," in JSON delimiters.

This parameter is also passed to the internal JSON encoder, as it makes for a more consistent read.

This is enabled by default, as it seems to improve legibility with minimal consequences to overall compaction

space_before

Controls whether or not to insert a space before ":" JSON delimiters.

This parameter is also passed to the internal JSON encoder, again for consistent formatting.

This is disabled by default, as it doesn't seem to really improve overall legibility.

width

Controls how many characters any given data structure is allowed to consume, before crying that Thats too many characters, and splitting the data structure across multiple vertical lines.

Ideally, this should be a moderately large number, as smaller numbers basically regress this entire module back to pretty(1) output, but with all the heavy lifting done by this module instead of letting the JSON backend do it, which, well, don't do that.

When used in conjunction with width_is_local=0, cumulative indentation is factored into the overall space consumption, increasing the chances that a deeply nested, but small, element may be needlessly split across many lines.

This is NOT a HARD limit, as there's not much that can be done once you run out of places to stuff a \n, and you reach width characters of indentation, you're still going to need to put records somewhere.

But if your objective is to try to fit inside a terminal or editor window without horizontal scrolling, as opposed to just general visual horizontal space minimization, you'll want this value to reflect that.

When width_is_local=1, much smaller values of width are viable.

But in general, you don't want rediculously small values of width, say, smaller than 10, or you'll just be wasting your time.

The default value of width is 69 as this gives nice results in either condition.

width_is_local

This controls whether the space allocation declared by width is local to the unit, or if the width should be a global goal for a wrap limit, preferring to snap even shorter elements across multiple lines when the indentation pushes them up against the width.

width_is_local=0 is best used in conjunction with large values of width, or it will regress this entire module back to pretty(1)-like behaviour.

width_is_local=1 is more flexible, and only aims to keep each unit horizontally small, while not caring too much if the unit is 300 columns deep, and works well with both large and small values

The default is width_is_local=1, as this seems to have much less surprising formatting.

fold_hashes

This controls how aggressively the internal multi-line splitter for hashes tries to pack mutiple small sub-units into a single line.

For instance, if a hash has 100 elements, and you're only getting a width of 30, with fold_hashes=0, each key will get its own line, regardless of the fact it may only have a value 1 character wide, leaving a lot of unused horizontal space on its right.

With fold_hashes=1, several key: value pairs can be squashed into a single line of output, greatly reducing vertical space consumption.

This is really useful when you have a lot of data structures with mostly consistent key lengths, and mostly consistent value lengths.

However, with wildly varying key/value lengths, this can be harmful, as it becomes prone to having hash printouts where it is mostly one line per result, and occasional lines with trailing second entries, and these can be overlooked.

For example:

"prereqs": {
  "build": {"requires": {"List::Util": "0", "Test::More": "0"}},
  "configure": {"requires": {"Module::Build": 0.36}},
  "runtime": {
    "requires": {
      "Text::Aligner": "0.05", "perl": "5.008", "strict": "0",
      "warnings": "0"
    }
  }
},

This is not the worst example, but you can imagine how it might be bad if your data was:

{
  ...
  "romestupidlylongkeynamehereomgedfghedfekyepryue": 4,
  "somestupidlylongkeynamehereomgwtfbbq": 5, "tome": 6,
  "tomestupidlylongkeynamehereomgwtfbbqaegttyyh767": 7,
  ...
}

You could easily miss that.

But as I've found it on balance mostly useful, the default is fold_hashes=1.

fold_arrays

Similar to fold_hashes, this tries to compact more units inside each line of a multi-line array when the array is deemed to need prettifying to fit within the width goals.

And similar to fold_hashes, there are caveats that come in some cases when arrays are filled with wildy varying element lengths.

However, in general arrays seem to be more self-consistent, and arrays of numerical data benefit greatly from this option, and yeilding nice results like:

[
  1, 2, 3, 4, 5,
  6, 7, 8, 9, 10
]

In situations that would have otherwise degraded to

[
  1,
  2,
  3,
  4,
  5,
  6,
  7,
  8,
  9,
  10
]

Subsequently, by default, fold_arrays=1.

encode

my $string = $insance->encode($data);

This is the only method you're really interested in, it formats $data as best it can as per parameters passed to new.

clone

my $copy = $instance->clone(%changed_params);

This is a simple adapter to create a new JSON::PrettyCompact instance, re-using the settings from an existing one, in order to change a parameter or two.

It is very unsophisticated and you could do the same thing by keeping your original parameters hash around, modifying them, and then calling new with them.

Though I guess this is also useful if you're a bad person and modify internal state directly.

Just be aware that clone has no way to clone the internal encoders, and will create them fresh from the final parameters from combining "old" settings and the passed ones.

ENCODING

Presently this module doesn't provide any explict knobs for handling encoding, though if I'm bored enough I may eventually.

Subsequently, its return value is a character string, and you'll probably want slosh it through an encoder like utf8::encode before you pass it off to print.

CAVEATS

This can in no way be used everywhere standard JSON tools are used.

Particularly, heaps of the features are unsupported. And any exotic data types outside simple scalars, hashes and arrays have no support implemented.

If you're lucky, and those exist, they might be able to fit within a width limit and not break anything. Other times you might just get a big fat die.

Also, allow_nonref behaviour permitting encode($scalar) is functionally hard enabled.

The output format is also really not intended for general purpose serialization, and has limitations with regards to anything line-based, like diff, in the same ways compact representation can be a bit of a nightmare if you hit it in a git rebase.

Its slightly less bad than pure compact representation in this regard, because there's still a chance a change will only affect one line, instead of changing the one line that comprises the whole file, but you'll also get confused when a structure length changes somewhere and an entire subtree gets folded/unfolded, or folded subsections get reflowed due to being able to suddenly fit fewer or more elements.

Reasonable patches welcome, I guess.

AUTHOR

Kent Fredric <kentnl@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2020 by Kent Fredric <kentfredric@gmail.com>.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

About

More Compact, But still Pretty, JSON

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages