Update documentation

sstephenson committed Aug 25, 2019
1 parent 4e17c4c commit 4cd1c78dcbc1815afd6b4ec55572e002b95fa65e
Showing with 217 additions and 70 deletions.
  1. +76 −0 CODE_OF_CONDUCT.md
  2. +141 −70 README.md
@@ -0,0 +1,76 @@
# Contributor Covenant Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies within all project spaces, and it also applies when
an individual is representing the project or its community in public spaces.
Examples of representing a project or community include using an official
project e-mail address, posting via an official social media account, or acting
as an appointed representative at an online or offline event. Representation of
a project may be further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project owner at sstephenson@gmail.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq
211 README.md
@@ -1,21 +1,21 @@
# jwalk

jwalk is a streaming JSON parser for Unix:
jwalk is a **streaming JSON parser for Unix:** _streaming_, in that individual [JSON][json] tokens are parsed as soon as they are read from the input stream, and _for Unix_, in that its tab-delimited output is designed to be used and manipulated by the standard Unix toolset. jwalk…

* _streaming_, in that individual JSON tokens are parsed as soon as they are read from input;
* _for Unix_, in that its line-based output is designed to be used and manipulated by the standard Unix toolset.
* parses large documents slowly, but steadily, in memory space proportional to the key depth of the document
* runs from source on any contemporary POSIX system
* is written in standard [awk][awk], [sed][sed], and [sh][sh], and does not require a C compiler or precompiled binaries
* can easily be embedded in another project

jwalk is written in standard [awk][awk], [sed][sed], and [sh][sh], and does not require a C compiler. It is intended to run from source on any contemporary POSIX system.
jwalk is useful for working with data from JSON APIs in shell scripts, especially in bootstrap environments, but can be applied to a variety of other situations. It is a powerful command-line tool in its own right, with built-in pattern matching and support for awk scripts called _examiners_.

It can parse large documents slowly, but steadily, in memory space proportional to the key depth of the document.
## How It Works

## Reading Records From JSON
The `jwalk` command reads a JSON document from standard input or from a file specified as an argument.

The `jwalk` command is a filter which transforms a stream of JSON _tokens_ from standard input into a stream of tab-delimited, line-separated _records_ on standard output.
A pipeline inside jwalk transforms the document stream into a series of _tokens_, and then parses the tokens into _records_, one record per line, on standard output.

A token is an indivisible, non-whitespace span of JSON, such as a number, string, boolean, bracket, or brace.

Every line of jwalk output is a record, arranged as follows, with each field separated by a tab character:
Each record is a sequence of tab-separated fields:

* zero or more fields, collectively the _path_, containing the string keys used to access the value, followed by
* one field specifying the value's _type_, followed by
@@ -25,114 +25,185 @@ The type is one of `number`, `string`, `boolean`, `null`, `array`, or `object`.

### Examples

$ echo 123.45 | jwalk
number ▷ 123.45
(In this documentation, ` ▷ ` represents a tab character.)

$ echo true | jwalk
boolean ▷ true
Basic JSON values produce one record each:

$ echo '123.45' | jwalk
number ▷ 123.45

$ echo '"acab"' | jwalk
string ▷ acab

$ echo null | jwalk
$ echo 'true' | jwalk
boolean ▷ true

$ echo 'null' | jwalk
null ▷

$ echo '[123,"acab"]' | jwalk
Arrays and objects produce one record representing the type, followed by zero or more records representing their key-value pairs:

$ echo '[80,"http"]' | jwalk
array ▷
0 ▷ number ▷ 123
1 ▷ string ▷ acab
0 ▷ number ▷ 80
1 ▷ string ▷ http

$ echo '{"version":"1.0.0"}' | jwalk
object ▷
version ▷ string ▷ 1.0.0

In general, records of type `array` and `object` provide structural information. Use the `-l` (or `--leaf-only`) flag to skip these records.
You can use the `-l` (or `--leaf-only`) command-line option to omit the type record:

$ echo '[123,"acab"]' | jwalk -l
0 ▷ number ▷ 123
1 ▷ string ▷ acab
$ echo '[80,"http"]' | jwalk -l
0 ▷ number ▷ 80
1 ▷ string ▷ http

$ echo '{"version":"1.0.0"}' | jwalk -l
version ▷ string ▷ 1.0.0

## Processing Records As Text
An array of objects looks like:

For simple array documents, pipe jwalk's output to `cut -f 3` to see the array's values:
$ echo '[{"lat":45.1,"lng":13.6,"name":"Rovinj"},
> {"lat":44.9,"lng":13.8,"name":"Pula"}]' | jwalk
array ▷
0 ▷ object ▷
0 ▷ lat ▷ number ▷ 45.1
0 ▷ lng ▷ number ▷ 13.6
0 ▷ name ▷ string ▷ Rovinj
1 ▷ object ▷
1 ▷ lat ▷ number ▷ 44.9
1 ▷ lng ▷ number ▷ 13.8
1 ▷ name ▷ string ▷ Pula

With `-l`, the same array looks like:

$ echo '[{"lat":45.1,"lng":13.6,"name":"Rovinj"},
> {"lat":44.9,"lng":13.8,"name":"Pula"}]' | jwalk -l
0 ▷ lat ▷ number ▷ 45.1
0 ▷ lng ▷ number ▷ 13.6
0 ▷ name ▷ string ▷ Rovinj
1 ▷ lat ▷ number ▷ 44.9
1 ▷ lng ▷ number ▷ 13.8
1 ▷ name ▷ string ▷ Pula
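
Leaf-only records like these can be regrouped per array element with plain awk. The following is a minimal sketch, not part of jwalk: the `records` and `summarize` functions are hypothetical names, and the input is hard-coded with `printf` to mirror the output shown above so that the sketch runs without jwalk installed. It assumes each element's `name` record arrives after its `lat` and `lng` records, as it does in this example.

```shell
# Hard-coded stand-in for `jwalk -l` output (tabs written as \t),
# copied from the example above so the sketch runs without jwalk.
records() {
  printf '0\tlat\tnumber\t45.1\n'
  printf '0\tlng\tnumber\t13.6\n'
  printf '0\tname\tstring\tRovinj\n'
  printf '1\tlat\tnumber\t44.9\n'
  printf '1\tlng\tnumber\t13.8\n'
  printf '1\tname\tstring\tPula\n'
}

# Remember lat/lng per array index ($1), then print one summary line
# per element when its "name" record arrives.
summarize() {
  awk -F '\t' '
    $2 == "lat"  { lat[$1] = $4 }
    $2 == "lng"  { lng[$1] = $4 }
    $2 == "name" { print $4 ": " lat[$1] ", " lng[$1] }
  '
}

records | summarize
```

Setting the field separator to a tab with `-F '\t'` keeps values that contain spaces in a single field, which awk's default whitespace splitting would not.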

## Filtering Records By Path

You can use the `-p <pattern>` (or `--pattern <pattern>`) command-line option to instruct jwalk to print only the records whose keys match the given _pattern_.

A pattern describes a key or sequence of keys present anywhere in a record's path. For example, the pattern `a` matches only the records whose path contains a key `"a"`.

Patterns may contain any of the following special characters:

Character | Matches
--------- | -------
`^` | the beginning of the path
`$` | the end of the path
`.` | the boundary between two adjacent keys
`*` | zero or more occurrences of any character in a key
`.**` | zero or more keys

### Example Patterns

Pattern | Matches records
-------- | ---------------
`^a` | starting with the key `"a"`
`*.*` | with at least two keys
`a` | with the key `"a"`
(empty) | with the key `""`
`a.b.c.` | with the keys `"a"`, `"b"`, and `"c"`, followed by the key `""`
`a*c` | having any key which starts with `a` and ends with `c`
`a.*.c` | with the key `"a"`, followed by one key, followed by the key `"c"`
`a.**.c` | with the key `"a"`, followed by zero or more keys, followed by the key `"c"`
`c$` | ending with the key `"c"`
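
To make the `$` anchor concrete, the effect of a pattern such as `name$` can be approximated on already-parsed records with awk alone. This is only a sketch under assumptions: `last_key_is` is a hypothetical helper, it is not how jwalk implements patterns, and it relies on the record layout described earlier (path fields, then type, then value), with sample records hard-coded via `printf`.

```shell
# Hypothetical helper: keep records whose last path key equals $1.
# Assumes tab-separated records laid out as path..., type, value,
# so the last key is the third field from the end.
last_key_is() {
  awk -F '\t' -v want="$1" 'NF > 2 && $(NF - 2) == want'
}

# Three sample records; only the first ends its path with "name".
# The third contains the key "name", but not in last position.
printf '0\tname\tstring\tRovinj\n0\tlat\tnumber\t45.1\nname\tfirst\tstring\tSam\n' \
  | last_key_is name
```

Only the first record is printed, which matches the table's description of `c$` as matching records "ending with" a key rather than merely containing it.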

$ echo '[1,2,3]' | jwalk -l | cut -f 3
1
2
3

## Examining Records With awk

Or `wc -l` to count the number of elements in the array:
jwalk's tab-delimited, line-separated output is designed to be consumed by standard Unix tools such as `awk`, `cut`, `grep`, and `sed`.

$ echo '[1,2,3]' | jwalk -l | wc -l
3
In particular, awk's default field and record separators handle jwalk's output, such that each record's fields are accessible as `$1`, `$2`, and so on:

For simple object documents, pipe jwalk's output to `cut -f 1` to see the object's keys:
$ echo '["awk","cut","grep","sed"]' \
> | jwalk -l | awk '{print $3}'
awk
cut
grep
sed
$ echo '{"first":"Sam","last":"Stephenson"}' | jwalk -l | cut -f 1
first
last

A jwalk _examiner_ is an [awk script][awk] with a runtime environment tailored for parsing jwalk output. Specifically, examiners have access to special variables with details about the record.

Or `cut -f 1,3` to see the key-value pairs:
Pass one or more `-e <script>` options on the command line to specify examiners inline:

$ echo '{"first":"Sam","last":"Stephenson"}' | jwalk -l | cut -f 1,3
first ▷ Sam
last ▷ Stephenson
$ echo '["awk","cut","grep","sed"]' \
> | jwalk -l -e '{print value}'
awk
cut
grep
sed

The `jwalk` command also accepts a filename from the command line.

You can also store examiners in files and load them with the `-f <scriptfile>` command-line option.

Use `grep` to filter records of interest by path:
### Special Variables

$ curl -sLO https://unpkg.com/turbolinks@beta/package.json
$ jwalk -l package.json | grep -E 'scripts\t' | cut -f 2
clean
build
watch
start
test
In addition to the full set of [special variables][awk-special-variables] available to all awk programs, examiners have access to the following additional variables:

## Examining Records With awk
Variable name | Description
-------------- | -----------
`keys` | an array of zero or more strings, representing the key path, indexed forward starting at 1 and backward at -1
`path` | the key path as a string, with each key separated by a tab (or `FS`)
`key` | the rightmost or last key of the key path; equivalent to `keys[-1]`
`type` | the type of the JSON value
`leaf` | false when the type is `array` or `object`; true otherwise
`value` or `_` | the string representation of the JSON value

When a situation calls for more control over record output than `grep` and `cut` can provide, consider writing a jwalk _examiner_. An examiner is an [awk script][awk] pre-configured with variables for accessing record data.
### Unescaping String Values

Variable name | Description
------------- | -----------
`keys` | an array of zero or more strings, representing the key path, indexed forward starting at 1 and backward at -1
`path` | the key path as a string, with each key separated by a tab (or `FS`)
`key` | the rightmost or last key of the key path; equivalent to `keys[-1]`
`type` | the type of the JSON value
`leaf` | false when the type is `array` or `object`; true otherwise
`value` | (aliased as `_`) the string representation of the JSON value
The characters `\n`, `\t`, and `\` remain escaped in special variables. Pass these variables through the `unescape()` function to replace the escaped characters with unescaped values.
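
For illustration, unescaping can be sketched as a single left-to-right pass in awk. This is a stand-alone approximation and may differ from jwalk's own `unescape()`: the names `unescape_sketch` and `unesc` are hypothetical, and the sketch assumes the three escapes are written as `\n`, `\t`, and a doubled backslash.

```shell
# Stand-alone approximation of unescaping; jwalk's own unescape()
# may differ. Handles \n, \t, and a doubled backslash in one pass,
# so an escaped backslash followed by "n" is not misread as \n.
unescape_sketch() {
  awk '
    function unesc(s,    out, i, c, d) {
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "\\" && i < length(s)) {
          d = substr(s, ++i, 1)
          if (d == "n") out = out "\n"
          else if (d == "t") out = out "\t"
          else if (d == "\\") out = out "\\"
          else out = out c d      # leave unknown escapes untouched
        } else {
          out = out c
        }
      }
      return out
    }
    { print unesc($0) }
  '
}

# The input holds a literal backslash followed by "n".
printf '%s\n' 'line one\nline two' | unescape_sketch
```

A character-by-character pass is used instead of chained `gsub()` calls because chained substitutions can reinterpret the output of an earlier replacement.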

Pass one or more `-e <script>` options on the command line to specify examiners inline:
## Configuring jwalk

$ jwalk -l -e '$1 == "scripts" {print key}' package.json
clean
build
watch
start
test
By default, jwalk uses the `awk` and `sed` commands found in your `PATH`. You can tell it to use specific commands by setting the `JWALK_AWK` or `JWALK_SED` environment variables, such as with `JWALK_AWK=gawk` or `JWALK_SED=/usr/local/bin/gsed`.

Store more complex examiners in files and load them with the `-f <scriptfile>` command-line option.
You can log the shell commands issued by jwalk to standard error by setting the `JWALK_DEBUG` environment variable to `1`.
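
The fallback behavior described above follows a common shell idiom. As a rough sketch only (the `resolve_tools` function is hypothetical and is not taken from jwalk's source), a script can honor such variables with default-value parameter expansion:

```shell
# Hypothetical wrapper logic: fall back to the commands on PATH when
# JWALK_AWK or JWALK_SED is unset or empty.
resolve_tools() {
  awk_cmd=${JWALK_AWK:-awk}
  sed_cmd=${JWALK_SED:-sed}
  printf 'awk=%s sed=%s\n' "$awk_cmd" "$sed_cmd"
}

# Run each case in a subshell so the settings do not leak.
( unset JWALK_AWK JWALK_SED; resolve_tools )
( JWALK_AWK=gawk; JWALK_SED=/usr/local/bin/gsed; resolve_tools )
```

The first call reports the plain `awk` and `sed` names; the second reports the overrides from the example values in the paragraph above.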

## Installing and Embedding jwalk

To install jwalk, run `sh lib/jwalk.sh --install` with the path to the directory where jwalk should be installed. For example:
To install jwalk, run `bin/jwalk --install` with the path to the directory where jwalk should be installed. The directory must already exist. For example:

$ sh lib/jwalk.sh --install /usr/local
$ sudo bin/jwalk --install /usr/local

Once you have a `jwalk` command in your path, you can run `jwalk --install` to embed jwalk into another project:
Once you have a `jwalk` command installed in your path, you can run `jwalk --install` to embed jwalk into another project:

$ mkdir -p vendor/jwalk
$ jwalk --install vendor/jwalk
$ vendor/jwalk/bin/jwalk -l ...

To install a git checkout of jwalk for development, either place a symlink to `bin/jwalk` somewhere in your `PATH`, or place jwalk's `bin` directory in your `PATH`.

## Testing jwalk

Run `test/check` to start the jwalk test suite. This script runs each test case in `test/cases/` and logs the results in TAP format to standard output. If any test case fails, the script exits with a non-zero status.

Input data lives in `test/corpus/` and expected output lives in `test/fixtures/`. When writing new test cases, use the existing test cases and file hierarchy as a guide.

## Contributing Back

jwalk is open-source software, freely distributable under the terms of an [MIT-style license][license]. The [source code][source] is hosted on GitHub.

We welcome contributions in the form of bug reports, pull requests, or thoughtful discussions in the GitHub [issue tracker][issues].

Please note that this project is released with a [Contributor Code of Conduct][conduct]. By participating in this project you agree to abide by its terms.

---

[© Sam Stephenson](LICENSE)
[© Sam Stephenson][license] • Part of the [Shellbound Project][shellbound]

[awk]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
[awk-special-variables]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html#tag_20_06_13_03
[conduct]: CODE_OF_CONDUCT
[issues]: https://github.com/shellbound/jwalk/issues
[json]: http://www.json.org
[license]: LICENSE
[sed]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html
[source]: https://github.com/shellbound/jwalk
[sh]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html
[shellbound]: https://github.com/shellbound
