Reorganize
sstephenson committed Mar 25, 2019
1 parent 6f608dd commit 6aeb7eb
Showing 7 changed files with 48 additions and 22 deletions.
1 change: 0 additions & 1 deletion README

This file was deleted.

28 changes: 28 additions & 0 deletions README.md
@@ -0,0 +1,28 @@
# jwalk

jwalk is a streaming JSON parser for Unix:

* _streaming_, in that individual JSON tokens are parsed as soon as they are read
* _for Unix_, in that its line-based output is designed to be used and manipulated by the standard Unix toolset

jwalk is written in POSIX-compliant awk, sed, and sh, and does not require a C compiler. It is intended to run from source on any contemporary Unix system.

It can parse large documents slowly, but steadily, in constant memory space.

Each line of jwalk output consists of tab-separated fields describing a JSON token:

* zero or more fields, collectively the _path_, containing the string keys used to access the token, followed by
* one field specifying the token’s _type_, followed by
* one field containing the token’s string _value_
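As a rough illustration of the layout described above, the sketch below treats the last two tab-separated fields of each line as the type and value, and everything before them as the path. The sample lines are hand-written to match that description rather than captured from jwalk, and `split_fields` is a hypothetical helper name.

```sh
# Hypothetical helper: print "<path depth> <type> <value>" for each line.
# The last two tab-separated fields are the type and value; any fields
# before them make up the key path.
split_fields() {
  awk -F '\t' '{ printf "%d %s %s\n", NF - 2, $(NF - 1), $NF }'
}

# Hand-written sample input (an assumption, not real jwalk output).
printf 'user\tname\tstring\tSam\nuser\tage\tnumber\t36\n' | split_fields
```

Counting fields from the right keeps the helper independent of path depth, which is why the path can be "zero or more" fields without any special casing.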

String values are encoded as UTF-8, and are unescaped with the exception of `\n`, `\t`, and `\\`.
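If a consumer needs the original characters back, the three remaining escapes have to be decoded left to right, so that `\\n` becomes a backslash followed by `n` rather than a newline. The following is a minimal sketch of one way to do that; `unescape_value` is a hypothetical name and is not jwalk's own `unescape()` implementation.

```sh
# Hypothetical helper: decode \n, \t and \\ in each input line, scanning
# left to right so an escaped backslash is never re-read as the start of
# another escape sequence.
unescape_value() {
  awk '{
    s = $0
    out = ""
    while ((i = index(s, "\\")) > 0) {
      c = substr(s, i + 1, 1)
      if (c == "n") r = "\n"
      else if (c == "t") r = "\t"
      else if (c == "\\") r = "\\"
      else r = "\\" c
      out = out substr(s, 1, i - 1) r
      s = substr(s, i + 2)
    }
    printf "%s%s\n", out, s
  }'
}

# The single-quoted argument contains a literal backslash followed by n.
printf '%s\n' 'first line\nsecond line' | unescape_value
```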

When you need more control over the output than `grep` and `cut` provide, you can write a jwalk _examiner_. An examiner is an Awk script with [easy access to parser fields](lib/jwalk/examine.awk).

To install jwalk, create an executable symlink to `lib/jwalk.sh` named `jwalk` and place it in your path.
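One possible way to carry out that step is sketched below. The `install_jwalk` function name and both of its arguments are assumptions for illustration: the first is the path to a jwalk checkout, the second is a directory that is already on your `PATH`.

```sh
# Hypothetical helper: link <checkout>/lib/jwalk.sh into <bindir> as "jwalk".
# Usage (paths are examples only): install_jwalk "$HOME/src/jwalk" "$HOME/bin"
install_jwalk() {
  src="$1"      # path to the jwalk checkout
  bindir="$2"   # a directory on PATH
  chmod +x "$src/lib/jwalk.sh"
  mkdir -p "$bindir"
  # Use an absolute target so the link works from any working directory.
  ln -sf "$(cd "$src" && pwd)/lib/jwalk.sh" "$bindir/jwalk"
}
```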

You can easily embed jwalk in another project. Just include jwalk’s `lib/` directory and run `sh lib/jwalk.sh`.

---

© 2019 Sam Stephenson
2 changes: 1 addition & 1 deletion bin/jwalk
19 changes: 9 additions & 10 deletions libexec/jwalk/jwalk.sh → lib/jwalk.sh
@@ -21,7 +21,7 @@ usage() {

# Find ourselves

-abs_dirname() {
+realpath_dirname() {
path="$1"
while :; do
cd -P "${path%/*}"
@@ -36,19 +36,14 @@ abs_dirname() {
pwd
}

-LIBEXEC="$(abs_dirname "$0")"
+LIB="$(realpath_dirname "$0")"
TMPDIR="${TMPDIR:-/tmp}"


# Process command-line arguments

unset args stored_scripts json_file

-append() {
-var="$1"
-eval "shift; $var=\"\$$var\$@ \""
-}
-
store() {
path="${TMPDIR%/}/jwalk.$$.$1"
escaped_path="$(escape "$path")"
@@ -57,6 +52,10 @@ store() {
trap 'eval "rm -f $stored_scripts"' EXIT
}

+append() {
+eval "shift; $1=\"\$$1\$@ \""
+}
+
escape() {
printf '%s\n' "$1" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/'/"
}
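
The `escape` helper shown in this hunk wraps its argument in single quotes and rewrites each embedded `'` as `'\''`, which is what lets `store` pass paths through `eval` later. A small round-trip sketch of that behavior follows; the function body is copied from the hunk, while `original` and `restored` are illustrative variable names.

```sh
# Copied from the hunk above: single-quote the argument, rewriting each
# embedded ' as '\'' so the result can be passed safely through eval.
escape() {
  printf '%s\n' "$1" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/'/"
}

# Illustrative round trip: the escaped form evaluates back to the input.
original="it's a /tmp/path with spaces"
eval "restored=$(escape "$original")"
[ "$restored" = "$original" ] && echo "round trip ok"
```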
@@ -105,15 +104,15 @@ walk() {
}

examine() {
-awk -f "$LIBEXEC/jwalk-examine.awk" "$@"
+awk -f "$LIB/jwalk/examine.awk" "$@"
}

parse() {
-awk -f "$LIBEXEC/jwalk-parse.awk"
+awk -f "$LIB/jwalk/parse.awk"
}

tokenize() {
-sh "$LIBEXEC/jwalk-tokenize.sh"
+sh "$LIB/jwalk/tokenize.sh"
}

if [ -n "$json_file" ] && [ "$json_file" != "-" ]; then
14 changes: 7 additions & 7 deletions libexec/jwalk/jwalk-examine.awk → lib/jwalk/examine.awk
@@ -1,21 +1,21 @@
# jwalk: a streaming JSON parser for Unix
# (c) Sam Stephenson / https://jwalk.sh

-# jwalk-examine.awk provides an awk runtime environment for working with the
-# output generated by jwalk-parse.awk. For each line of parser input,
-# jwalk-examine.awk sets the following awk variables:
+# jwalk/examine.awk provides an awk runtime environment for working with the
+# output generated by jwalk/parse.awk. For each line of parser input,
+# jwalk/examine.awk sets the following awk variables:
#
# keys an array of zero or more strings, representing the key path,
-# indexed forward and backward by 1-based index
+# indexed forward starting at 1 and backward at -1
# path the key path as a string, with each key separated by a tab (FS)
# key the rightmost (i.e., last) key of the key path
-# type the type of the value: one of "number", "string", "literal",
-# "array", or "object"
+# type the type of the value: one of "number", "string", "boolean",
+# "null", "array", or "object"
# leaf 0/false when type is "array" or "object"; 1/true otherwise
# value (aliased as _) the string representation of the JSON value
#
# The special characters newline, tab, and backslash remain escaped in these
-# variables' values, as they are in jwalk-parse.awk output. The unescape()
+# variables' values, as they are in jwalk/parse.awk output. The unescape()
# function will replace escaped characters with their corresponding values.

BEGIN {
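
The header comment above describes `keys` as indexed forward from 1 and backward from -1. The sketch below shows one way such an array could be populated from a tab-separated line. It is an assumption-laden illustration, not the code in `examine.awk`; `index_keys` is a hypothetical name and the sample line is hand-written.

```sh
# Hypothetical helper: for each line, store the path fields so that
# keys[1] is the first key and keys[-1] is the last, then print both.
index_keys() {
  awk -F '\t' '{
    split("", keys)              # portable way to empty the array
    n = NF - 2                   # the last two fields are type and value
    for (i = 1; i <= n; i++) {
      keys[i] = $i               # forward index: 1 .. n
      keys[i - n - 1] = $i       # backward index: -n .. -1
    }
    if (n > 0) print keys[1] " " keys[-1]
  }'
}

# Hand-written sample line (an assumption, not real jwalk output).
printf 'a\tb\tc\tstring\tv\n' | index_keys
```

With both index directions filled in, a script can read the last key as `keys[-1]` without first computing the path length.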
4 changes: 2 additions & 2 deletions libexec/jwalk/jwalk-parse.awk → lib/jwalk/parse.awk
@@ -1,8 +1,8 @@
# jwalk: a streaming JSON parser for Unix
# (c) Sam Stephenson / https://jwalk.sh

-# jwalk-parse.awk parses a stream of JSON tokens, one per line, as generated
-# by jwalk-tokenize.sh. For each value in the token stream, jwalk-parse.awk
+# jwalk/parse.awk parses a stream of JSON tokens, one per line, as generated
+# by jwalk/tokenize.sh. For each value in the token stream, jwalk/parse.awk
# writes a line of tab-separated values in the following form:
#
# [key '\t' ...] type '\t' value
2 changes: 1 addition & 1 deletion libexec/jwalk/jwalk-tokenize.sh → lib/jwalk/tokenize.sh
@@ -1,7 +1,7 @@
# jwalk: a streaming JSON parser for Unix
# (c) Sam Stephenson / https://jwalk.sh

-# jwalk-tokenize.sh reads a well-formed JSON value from standard input and
+# jwalk/tokenize.sh reads a well-formed JSON value from standard input and
# writes a stream of JSON tokens to standard output, one token per line.

set -e
