jtc. Walk-path easy. Tutorial (under construction)

A walk-path is a way of telling jtc how the input JSON must be walked.

  1. Walk-path Lexemes
  2. Subscript lexemes
  3. Search lexemes
  4. Directives and Namespaces

Walk-path Lexemes

A walk-path is the argument of the -w option (other options may also accept walk-paths).

A walk-path is made of lexemes (optionally separated by white space). A lexeme is an atomic walk-step that jtc applies when traversing the JSON tree. jtc always begins walking any walk-path from the JSON root.

If applying a lexeme (a.k.a. walk-step) fails during walking, the walk-path is considered empty (non-existent) and is therefore not displayed. Only successfully finished walk-paths are displayed: for a walk-path to succeed, all of its lexemes must be walked successfully.

There are only two types of lexemes: subscript lexemes and search lexemes, though each type comes in several variants.

Subscript lexemes

There are a few variants of subscripts: numerical offsets, literal subscripts, range subscripts and parent addressing. Let's start with the most common one, the numerical offset.

Numerical offsets

[n] - like in most programming languages, numerical offsets in jtc select the nth element in the currently selected (walked) JSON, starting from 0 (indices are always zero-based):

Let's work with this JSON:

bash $ JSN='["abc", false, null, { "pi": 3.14}, [ 1,"two", {"number three": 3}] ]'
bash $ <<<$JSN jtc
[
   "abc",
   false,
   null,
   {
      "pi": 3.14
   },
   [
      1,
      "two",
      {
         "number three": 3
      }
   ]
]
  • select 1st element in JSON array:
bash $ <<<$JSN jtc -w[0]
"abc"

- select 5th element in JSON array:

bash $ <<<$JSN jtc -w[4]
[
   1,
   "two",
   {
      "number three": 3
   }
]

If the selected element is non-atomic (a.k.a. iterable), i.e., a JSON array or a JSON object, then you may continue digging further into the selected (walked) JSON tree:

- select 5th element in JSON array and then 3rd one:

bash $ <<<$JSN jtc -w[4][2]
{
   "number three": 3
}

If we try selecting a 2nd element from the resulting JSON (which has only a single element), walking will fail and the output will be blank:

bash $ <<<$JSN jtc -w[4][2][1]
bash $
bash $ <<<$JSN jtc -w[4][2][0]
3

Note: a numerical offset is treated as such only if spelled as shown ([n]): no white space is allowed and n must be a valid number, otherwise it's treated as a literal subscript. E.g.: [ 0 ] will address an element with the label " 0 ".
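The way a chain of numerical offsets either succeeds or fails as a whole can be sketched in Python. This is only a model of the behavior described above, not jtc's implementation; the names `walk` and `FAILED` are made up for illustration, and object members are visited in sorted label order because that is how jtc arranges them (see the literal subscripts section).

```python
import json

JSN = json.loads('["abc", false, null, {"pi": 3.14}, [1, "two", {"number three": 3}]]')

FAILED = object()  # hypothetical sentinel: the walk did not succeed (null is a valid result)

def walk(root, offsets):
    """Apply numerical offsets one by one; give up as soon as one of them fails."""
    node = root
    for n in offsets:
        if isinstance(node, list):
            children = node
        elif isinstance(node, dict):
            # model assumption: object members are ordered by label
            children = [node[label] for label in sorted(node)]
        else:
            return FAILED          # atomic values have no children to select
        if not 0 <= n < len(children):
            return FAILED          # offset is outside of the iterable
        node = children[n]
    return node

print(walk(JSN, [4, 2]))               # the object holding "number three"
print(walk(JSN, [4, 2, 0]))            # 3
print(walk(JSN, [4, 2, 1]) is FAILED)  # True: the walk yields nothing
```

A sentinel is used instead of `None` because JSON null is itself a value that a walk may legitimately select.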

Literal subscripts

[text] - literal subscripts allow addressing (selecting) elements within JSON objects by their key (label).

There are two elements in the above JSON that are addressable with literal subscripts, let's get to them using literal subscripts. First, let's get to pi's value:

bash $ <<<$JSN jtc -w[3]
{
   "pi": 3.14
}
bash $ <<<$JSN jtc -w[3][pi]
3.14

Now let's get to the number three's value:

bash $ <<<$JSN jtc -w[4]
[
   1,
   "two",
   {
      "number three": 3
   }
]
bash $ <<<$JSN jtc -w[4][2]
{
   "number three": 3
}
bash $ <<<$JSN jtc -w[4][2][number three]
jtc json exception: unexpected_end_of_string

- why? - it happens because of shell word splitting. The shell treats a space (' ') as an argument separator, therefore option -w ends up with only a partial argument, namely [4][2][number, which is an invalid walk.

In fact, jtc complains there for a different reason: the second part of the walk (three]) is passed to jtc as a standalone argument, which jtc treats as a filename. It tries opening and reading it, but because no such file exists, an empty result is returned. However, empty input is invalid JSON (per the JSON standard), which is why a JSON parsing error is given. Here is how a walk-path parsing error looks:

bash $ <<<$JSN jtc -w[4][2][number three] -
jtc json exception: walk_offset_missing_closure
bash $

To protect the walk from shell word splitting, either the whole argument must be quoted, or the space symbol must be escaped (the former variant is preferred, but both will work):

bash $ <<<$JSN jtc -w'[4][2][number three]'
3
bash $ <<<$JSN jtc -w[4][2][number\ three]
3
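The splitting itself can be observed without jtc. The sketch below uses Python's `shlex.split`, which follows POSIX shell quoting rules; it only models how the command line is cut into arguments (a real shell may additionally apply filename globbing to the brackets, which `shlex` does not do).

```python
import shlex

# Unquoted: the space splits the walk-path into two separate arguments
unquoted = shlex.split("jtc -w[4][2][number three]")

# Quoted and escaped: the walk-path survives as a single argument
quoted = shlex.split("jtc -w'[4][2][number three]'")
escaped = shlex.split(r"jtc -w[4][2][number\ three]")

print(unquoted)  # ['jtc', '-w[4][2][number', 'three]']
print(quoted)    # ['jtc', '-w[4][2][number three]']
print(escaped)   # ['jtc', '-w[4][2][number three]']
```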

The elements within objects also could be addressed using numerical offsets:

bash $ <<<$JSN jtc -w[4][2][0]
3

- it seems the numerical notation is more succinct, so why bother using literal subscripts? Because our assumptions about the order of elements within JSON objects are fragile.

Say, there's a following JSON:

bash $ ANML='{ "ANDEAN BEAR": "Bono", "AMUR TIGER": "Shadow", "GRIZZLY BEAR": "Goofy" }'

And we want to get the name of ANDEAN BEAR. Being lazy, one could do it by a numerical offset, assuming that the index of the required entry is 0 (indeed, it's listed first in the object). Let's see:

bash $ <<<$ANML jtc -w[0]
"Shadow"

Bummer! Instead of selecting the name of ANDEAN BEAR we got the name of AMUR TIGER. That's because our assumption of the index was wrong.

The JSON standard defines JSON objects as an unordered set of name/value pairs. That means that the order of elements (name/value pairs) within JSON objects is defined by the processing program (and not by the user, as in JSON arrays). Some programs will retain the same order, others will reorder the elements: it all boils down to the internal implementation specifics.

jtc always rearranges all the elements within the JSON objects by their keys (labels) in the alphabetical order, thus for jtc the above JSON looks like this:

bash $ <<<$ANML jtc
{
   "AMUR TIGER": "Shadow",
   "ANDEAN BEAR": "Bono",
   "GRIZZLY BEAR": "Goofy"
}

That is a serious enough reason to select elements in JSON objects by their keys/labels:

bash $ <<<$ANML jtc -w'[ANDEAN BEAR]'
"Bono"

There's a curious case, when the label matches a numerical subscript, i.e. consider:

bash $ <<<'{ "0": 12345, "#": "abcde"}' jtc
{
   "#": "abcde",
   "0": 12345
}
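The ordering seen in both outputs above can be reproduced with a few lines of Python. This is a sketch under one assumption: Python's `sorted` compares labels by code point, which agrees with the order jtc prints for these particular labels; jtc's exact collation rules are not restated here.

```python
import json

ANML = json.loads('{ "ANDEAN BEAR": "Bono", "AMUR TIGER": "Shadow", "GRIZZLY BEAR": "Goofy" }')
labels = sorted(ANML)               # the order in which the object's members end up
by_offset = ANML[labels[0]]         # what a numerical offset 0 would select
by_label = ANML["ANDEAN BEAR"]      # what addressing by the label selects

print(labels)     # ['AMUR TIGER', 'ANDEAN BEAR', 'GRIZZLY BEAR']
print(by_offset)  # Shadow
print(by_label)   # Bono

# The same ordering explains the second example: '#' sorts before '0'
ODD = json.loads('{ "0": 12345, "#": "abcde"}')
odd_labels = sorted(ODD)
print(odd_labels)  # ['#', '0']
```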

Addressing JSON root with [0] will return "abcde":

bash $ <<<'{ "0": 12345, "#": "abcde"}' jtc -w[0]
"abcde"
  • How to get to the value of the label "0"? For that we need to use a non-recursive search lexeme (namely, >0<l).

NOTE: there's a generic rule for subscripts: if parsing of a subscript does not yield any of the other types (i.e. it's neither a numerical offset, nor a range subscript, nor parent addressing), then it's treated as a literal subscript.

Range subscripts

[n:N] - selects each element in the iterable, starting from the nth index and ending with the (N-1)th, i.e. N is the index of the element following the last one in the range. Both values n and N are optional and either (or both) may be omitted.

For those who are familiar with Python addressing, grasping this one is easy: it matches Python's addressing concept entirely.

Range subscript makes the walk-path iterable, i.e. it's like selecting multiple elements with just one iterable walk instead of specifying multiple offsets, compare:

bash $ <<<$JSN jtc -w[0] -w[1] -w[2]
"abc"
false
null
bash $ <<<$JSN jtc -w[0:3]
"abc"
false
null

Default range indices

Either of the indices n or N in the range subscript may be omitted; the index in the omitted position then takes a default value.

I.e. a default index in the first position means: from the very first value in the iterable, while a default index in the second position means: till the last value in the iterable.

It's quite handy when we need to select only a portion of the elements in the iterable, either starting from its beginning or till its last element, because sometimes we might not know the number of elements in the iterable upfront.

  • select 2 elements from the beginning of the JSON root's iterable:
bash $ <<<$JSN jtc -w[:2]
"abc"
false
  • select all elements starting from the 3rd one:
bash $ <<<$JSN jtc -w[2:]
null
{
   "pi": 3.14
}
[
   1,
   "two",
   {
      "number three": 3
   }
]

When both indices are omitted ([:]), each element in the iterable will be selected (walked):

bash $ <<<$JSN jtc -w[:]
"abc"
false
null
{
   "pi": 3.14
}
[
   1,
   "two",
   {
      "number three": 3
   }
]

The range indices (as well as any lexemes) can appear in the walk-path any number of times. The above example shows iterating over the top iterable (or, the first tier) in JSON tree hierarchy, to iterate over all iterables in the second tier of the JSON tree, do this:

bash $ <<<$JSN jtc -w[:][:]
3.14
1
"two"
{
   "number three": 3
}

- each element in the top iterable will be walked, and then jtc will attempt to walk the children of the walked element, one by one. Because the first three elements are not iterable, they will not be shown (they cannot be iterated over):

bash $ <<<$JSN jtc -w[0][:]
bash $

If you'd like to see (print) both the walk of the top iterable and the walk of each iterable at the second tier, then provide two walk-paths:

bash $ <<<$JSN jtc -w[:] -w[:][:]
"abc"
false
null
{
   "pi": 3.14
}
3.14
[
   1,
   "two",
   {
      "number three": 3
   }
]
1
"two"
{
   "number three": 3
}

- Note how jtc interleaves the walks: it orders the results so that related walks appear next to each other, rather than dumping the results of the first walk and then of the second. If one prefers the latter behavior, option -n will do the trick, compare:

bash $ <<<$JSN jtc -w[:] -w[:][:] -n
"abc"
false
null
{
   "pi": 3.14
}
[
   1,
   "two",
   {
      "number three": 3
   }
]
3.14
1
"two"
{
   "number three": 3
}

Alternative range notation

[+n] is the alternative range notation for [n:], they both do exactly the same thing - walk each element in the iterable starting from nth element:

bash $ <<<$JSN jtc -w[+3]
{
   "pi": 3.14
}
[
   1,
   "two",
   {
      "number three": 3
   }
]
bash $ <<<$JSN jtc -w[3:]
{
   "pi": 3.14
}
[
   1,
   "two",
   {
      "number three": 3
   }
]

Which notation to use is a matter of personal preference and has no impact on the way the JSON tree is walked.

Ranges with positive indices

Positive indices (and 0) in the range notation ([n:N]) always refer to the index offset from the beginning of the iterable.

When both n and N are positive, naturally N must be > n. If N <= n, the result is a blank output:

bash $ <<<$JSN jtc -w[2:1]
bash $
bash $ <<<$JSN jtc -w[2:2]
bash $

Case where N = n + 1, e.g., [3:4] is equal to spelling just a numerical offset alone:

bash $ <<<$JSN jtc -w[3:4]
{
   "pi": 3.14
}
bash $ <<<$JSN jtc -w[3]
{
   "pi": 3.14
}

Ranges with negative indices

A negative index in the range subscript refers to the offset from the end of the iterable. In the range subscripts it's okay to mix and match positive and negative indices in any position.

  • select last 3 elements from the top array:
bash $ <<<$JSN jtc -w[-3:]
null
{
   "pi": 3.14
}
[
   1,
   "two",
   {
      "number three": 3
   }
]
  • select all elements in the range from the 2nd till the one before the last one:
bash $ <<<$JSN jtc -w[1:-1]
false
null
{
   "pi": 3.14
}

When either of indices is given outside of the actual range of the iterable, jtc tolerates it fine re-adjusting respective range indices properly to the beginning and the end of actual range of the iterable:

bash $ <<<$JSN jtc -w[-100:100]
"abc"
false
null
{
   "pi": 3.14
}
[
   1,
   "two",
   {
      "number three": 3
   }
]

However, when the range is unknown, it's best to use the notation with the default range values (i.e., [:]).
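Since range subscripts follow Python's slice semantics, the examples of this section can be cross-checked with plain list slicing. This is a sketch that models the top-level array only; note that the alternative notation [+n] has no Python counterpart, and that a lone negative index ([-n]) is not a slice in jtc but addresses parents.

```python
JSN = ["abc", False, None, {"pi": 3.14}, [1, "two", {"number three": 3}]]

first_three = JSN[0:3]       # like -w[0:3]
head = JSN[:2]               # like -w[:2], default first index
tail = JSN[2:]               # like -w[2:] (or the alternative -w[+2])
everything = JSN[:]          # like -w[:]
empty = JSN[2:1]             # N <= n selects nothing
single = JSN[3:4]            # N = n + 1 selects one element, like -w[3]
last_three = JSN[-3:]        # like -w[-3:]
middle = JSN[1:-1]           # like -w[1:-1]
clamped = JSN[-100:100]      # out-of-range indices are adjusted to the actual range

print(first_three)  # ['abc', False, None]
print(middle)       # [False, None, {'pi': 3.14}]
```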

Addressing parents

One of the nifty features that makes jtc very powerful when coming up with queries is the ability to address parents of the walked elements, i.e., those JSON elements higher up in the JSON hierarchy tree that make up the path towards the currently walked element.

There are 2 ways to address parents:

  • [-n] will address parent(s) in the path (made of offsets from the JSON root to the currently walked element), offsetting from the currently walked element
  • [^n] will also address parent(s), but offsetting from the JSON root

Not sure if the definition above is easy to understand, but the concept is, so it's probably much easier to show it with an example. Let's see the walk-path where we selected the JSON element 3:

bash $ <<<$JSN jtc -w'[4][2][number three]'
3

The walk path from the JSON root towards the element 3 is [4][2][number three].

In fact, every walk at any given step (even when it's done via recursive search lexemes) internally always maintains a representation expressed via subscript and literal offsets only. E.g. the same number 3 could have been selected using a recursive search walk:

bash $ <<<$JSN jtc -w'<3>d'
3

but internally, the path towards this JSON element would be built as:

bash $ <<<$JSN jtc -w'<3>d' -dddd 2>&1 | grep "built path vector"
....walk_(), built path vector: 00000004-> 00000002-> number three
....walk_(), finished walking: with built path vector: 00000004-> 00000002-> number three

i.e. it would still be [4][2][number three]. That's why jtc is known to be a walk-path based utility.

Offsetting path from a leaf

Thus, if we list indices for the above walk-path starting from the leaf, it'll be like this:

Index from the leaf:   3    2  1      0
          walk-path: (root)[4][2][number three]

Thus in order to select either of parents, we just need to pick a respective index in the path. E.g.:

  • [-1] will address an immediate parent of the value 3
  • [-2] will address a parent of the parent of the value 3
  • [-3] will address the JSON root itself

Note: [-0] will address the value 3 itself, so there's not much point in using such addressing, while indices beyond the root's (in this example [-4], [-5], etc.) will keep addressing the JSON root. Take a look:

bash $ <<<$JSN jtc -w'[4][2][number three][-1]'
{
   "number three": 3
}
bash $ <<<$JSN jtc -w'[4][2][number three][-2]'
[
   1,
   "two",
   {
      "number three": 3
   }
]
bash $ <<<$JSN jtc -w'[4][2][number three][-3]'
[
   "abc",
   false,
   null,
   {
      "pi": 3.14
   },
   [
      1,
      "two",
      {
         "number three": 3
      }
   ]
]

Offsetting path from the root

Now, let's list all the indices for the same walk-path starting from the root:

Index from the root:   0    1  2      3
          walk-path: (root)[4][2][number three]

You probably get the idea already: addressing a parent from the root uses these indices:

bash $ <<<$JSN jtc -w'[4][2][number three][^0]'
[
   "abc",
   false,
   null,
   {
      "pi": 3.14
   },
   [
      1,
      "two",
      {
         "number three": 3
      }
   ]
]
bash $ <<<$JSN jtc -w'[4][2][number three][^1]'
[
   1,
   "two",
   {
      "number three": 3
   }
]
bash $ <<<$JSN jtc -w'[4][2][number three][^2]'
{
   "number three": 3
}
bash $ <<<$JSN jtc -w'[4][2][number three][^3]'
3

Let's recap both addressing schemas (for the given walk in the example) on the same diagram:

                                                           etc.
                                                           [^4]
to address a parent from the root: [^0]   [^1]  [^2]       [^3]
                                    |      |     |          |
                                    v      v     v          v
                        walk-path: root > [4] > [2] > [number three]
                                    ^      ^     ^          ^
                                    |      |     |          |
  to address a parent from a leaf: [-3]   [-2]  [-1]       [-0]
                                   [-4]
                                   etc.
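Both addressing schemes from the diagram amount to truncating the built path, which a short Python model makes explicit. The helper names (`resolve`, `from_leaf`, `from_root`) are hypothetical and the code is only a sketch of the behavior shown in the examples, including the clamping of indices that reach beyond the root or the leaf.

```python
import json

JSN = json.loads('["abc", false, null, {"pi": 3.14}, [1, "two", {"number three": 3}]]')
path = [4, 2, "number three"]   # the built path towards the value 3

def resolve(root, path):
    """Follow a path made of array indices and object labels."""
    node = root
    for step in path:
        node = node[step]
    return node

def from_leaf(path, n):
    """Model of [-n]: drop n steps from the end (never going above the root)."""
    return path[:max(len(path) - n, 0)]

def from_root(path, n):
    """Model of [^n]: keep the first n steps (never going below the leaf)."""
    return path[:min(n, len(path))]

print(resolve(JSN, from_leaf(path, 1)))  # {'number three': 3}
print(resolve(JSN, from_root(path, 1)))  # [1, 'two', {'number three': 3}]
print(resolve(JSN, from_root(path, 3)))  # 3
```

An empty path resolves to the JSON root, which is why both [-3] and [^0] return the whole document in this example.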

Yes, agreed: addressing parents when a walk-path is made of subscripts only is probably a dull idea (it's done here only for instructive purposes). Indeed, we have just walked that path from the root, so why go back using parent addressing instead of stopping at the required place? Ergo, it makes sense to use parent addressing together with (after) search lexemes.

Search lexemes

Search lexemes allow performing various searches across the JSON tree. There are two major notations for search lexemes:

  • <expr> - performs a recursive search of expr from the currently selected JSON element
  • >expr< - performs a non-recursive search of expr in the currently selected JSON iterable

A complete notation for search lexemes (both recursive and non-recursive) looks like this: <expr>SQ (>expr<SQ), where:

  • expr is the content of the lexeme; depending on the lexeme suffix, its semantics may vary, it could be any of:
    • a value to match
    • a Regular Expression to search for
    • a namespace (think of a namespace as a variable that can hold any JSON type/structure)
    • a template
    • empty
  • S is an optional one-letter suffix that defines the behavior of the lexeme
  • Q is a quantifier, whose function is generally analogous to the function of numerical offset and range subscripts, but in some cases might also vary, as per the documentation. The quantifier must follow the suffix (if one is present).

Also, there are a few lexemes that look like search lexemes but in fact don't perform any type of search; instead they apply a certain action. They are known as directives and are distinguishable from the searches only by the suffix.

String searches

r, R, P - these are suffixes to perform JSON string searches. Suffix r is default and can be omitted:

  • <text> - searches for the occurrence of exact match of text in the JSON tree (off the currently walked element)
  • <Regexp>R - performs an RE search for the regular expression Regexp
  • <>P, <namespace>P - matches any JSON string value (a.k.a. a JSON string type match), similar to <.*>R but faster. The lexeme might be empty or hold the namespace where the matched value will be stored

Examples:

  • Find an exact string value:
bash $ <<<$JSN jtc -w'<two>'
"two"
  • Find a string value matching RE:
bash $ <<<$JSN jtc -w'<^t>R'
"two"
  • Find the first JSON string value:
bash $ <<<$JSN jtc -w'<>P'
"abc"

Quantifiers

By default any search lexeme is going to find only a first match occurrence. That is facilitated by a default quantifier 0. If there's a need to find any other match instances (or range of instances) a quantifier must be given.

Quantifiers may be given in either of following forms:

  • n - search will find nth match
  • n: - search will find all matches starting from nth till the last matched one
  • :N - search will find all matches starting from the first (index 0) till Nth - 1
  • n:N - search will find all matches starting from nth till Nth - 1
  • : - search will find all matches

Observe following rules applied to all forms of quantifiers:

  1. in any of the above notations indices (n, N) are zero based
  2. both indices n, N must be positive numbers (or 0). There's only one case where quantifier may go negative (see Relative quantifiers (>..<l,>..<t))
  3. either or both of indices n, N may take a form of {Z}, where Z is a namespace holding a JSON numeric value representing an index
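One way to reason about quantifiers is to imagine the full list of matches and then slice it. The Python sketch below does that for a recursive "any string" search over a small sample JSON; it is a model only, assuming a depth-first visit with object members in sorted label order, and the function name `strings` is made up for illustration.

```python
import json

JSS = json.loads('["one", "two", ["three", "four", '
                 '{"5 to 7": ["five", "six", "seven"], "second 1": "one"}]]')

def strings(node):
    """Yield every JSON string depth-first (objects visited in label order)."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, list):
        for child in node:
            yield from strings(child)
    elif isinstance(node, dict):
        for label in sorted(node):
            yield from strings(node[label])

matches = list(strings(JSS))

default = matches[0:1]    # quantifier omitted, same as 0: first match only
third = matches[2:3]      # quantifier n=2: the third match only
from_5th = matches[4:]    # quantifier 4:
up_to_2 = matches[:2]     # quantifier :2
window = matches[1:5]     # quantifier 1:5
every = matches[:]        # quantifier :

print(window)  # ['two', 'three', 'four', 'five']
```

Slicing with `[n:n+1]` rather than indexing with `[n]` mirrors the fact that a missing nth match produces no output instead of an error.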

Some examples: let's work with this JSON:

bash $ JSS='["one", "two", ["three", "four", {"5 to 7": [ "five", "six", "seven"], "second 1": "one"  } ] ]'
bash $ <<<$JSS jtc
[
   "one",
   "two",
   [
      "three",
      "four",
      {
         "5 to 7": [
            "five",
            "six",
            "seven"
         ],
         "second 1": "one"
      }
   ]
]
  • among all JSON strings find those from 2nd till 5th inclusive:
bash $ <<<$JSS jtc -w'<>P1:5'
"two"
"three"
"four"
"five"

As mentioned, the quantifier indices may take values from namespaces. Namespaces will be covered later, together with directives; for now just take it as given that one way to set a value in a namespace is <var:value>v.

So, let's repeat the last example, but now with the quantifier indices referenced from namespaces:

  • among all JSON strings find those from 2nd till 5th inclusive:
bash $ <<<$JSS jtc -w'<Start:1>v<End:5>v <>P{Start}:{End}'
"two"
"three"
"four"
"five"

  • find all the string occurrences where letter e is present:
bash $ <<<$JSS jtc -w'<e>R:'
"one"
"three"
"five"
"seven"
"one"
  • find all the occurrences of string "one":
bash $ <<<$JSS jtc -w'<one>:'
"one"
"one"

Recursive vs Non-recursive search

In the last example, 2 instances of the string "one" were found. That's because a recursive search was applied (and hence the entire JSON tree was searched). Sometimes there's a need to perform a non-recursive search, i.e. to look for a match only among the immediate children of the current iterable.

The JSON's root in the example is an array, so if we apply a non-recursive search to the root's array, only one match will be found:

bash $ <<<$JSS jtc -w'>one<:'
"one"

NOTE: the other subtle but crucial difference is that a non-recursive search can be applied only to JSON iterables (i.e. arrays and objects) and it will fail on any other (atomic) type, while a recursive search can be applied to any JSON type (even atomic).

The recursive search always begins from checking the currently selected (walked) entry, that's why it's possible to apply it even onto atomic types and match those:

bash $ <<<$JSS jtc -w'[0]<one>'
"one"
  • that feature of the recursive search comes handy when validating various JSON types (covered later)
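The difference between the two search forms can be modelled in Python as follows. This is a sketch of the described behavior, not jtc's code; `search_recursive` and `search_non_recursive` are hypothetical names, and `None` stands for a failed walk.

```python
import json

JSS = json.loads('["one", "two", ["three", "four", '
                 '{"5 to 7": ["five", "six", "seven"], "second 1": "one"}]]')

def children_of(node):
    """Immediate children of an iterable, or None for atomic values."""
    if isinstance(node, list):
        return node
    if isinstance(node, dict):
        return [node[label] for label in sorted(node)]
    return None

def search_recursive(node, value):
    """Check the walked entry itself first, then descend into all children."""
    hits = [node] if node == value else []
    for child in children_of(node) or []:
        hits += search_recursive(child, value)
    return hits

def search_non_recursive(node, value):
    """Look only among immediate children; fails (None) on atomic values."""
    children = children_of(node)
    if children is None:
        return None
    return [child for child in children if child == value]

print(search_recursive(JSS, "one"))         # ['one', 'one']
print(search_non_recursive(JSS, "one"))     # ['one']
print(search_recursive(JSS[0], "one"))      # ['one']
print(search_non_recursive(JSS[0], "one"))  # None
```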

Numerical searches

d, D, N - these are numerical search suffixes; they share the same semantics as the respective string searches:

  • <number>d - searches for the occurrence(s) of exact match of a number in the JSON tree
  • <Regexp>D - performs an RE search for the regular expression Regexp among JSON numericals
  • <>N, <namespace>N - matches any JSON numerical value (a.k.a. JSON numerical type match), similar to <.*>D but faster. The lexeme might be empty or hold the namespace where matched value will be preserved (upon a match)
bash $ <<<$JSN jtc -w'<[13]>D1:'
1
3
bash $ <<<$JSN jtc -w'<3.14>d:'
3.14

Boolean and Null searches

b suffix stands for a boolean match, while n is a null match.

A boolean lexeme can be given in the following forms:

  • <>b, <namespace>b - in these forms, the search is performed among JSON boolean values only and the matched value will be preserved in the namespace, should it be present in the lexeme
  • <true>b, <false>b - when a JSON boolean is spelled as a lexeme parameter, then it's not a namespace reference, but rather a spelled boolean value will be matched
bash $ <<<$JSN jtc -w'<>b:'
false

JSON type searches

There are quite a handful of lexemes that search and match JSON types, in fact there are lexemes to cover all JSON type matches and even more. Four of those already have been covered: string type match <>P, numerical type match <>N, boolean type match <>b and null type match <>n. The others are:

  • <>a: atomic match, will match any of JSON atomic type (string, numerical, boolean, null)
  • <>o: object match, will match a JSON object type ({..})
  • <>i: array (indexable) match, will match a JSON array type ([..])
  • <>c: container type match, will match either of JSON iterable type (objects and/or arrays)
  • <>e: end node (leaf) match type, will match any of atomic types, or empty containers ({}, [])
  • <>w: wide type range match - will match any JSON type/value

All of those lexemes can stay empty, or hold the namespace that will be filled upon a successful match.

bash $ <<<$JSN jtc -rw'<>c:'
[ "abc", false, null, { "pi": 3.14 }, [ 1, "two", { "number three": 3 } ] ]
{ "pi": 3.14 }
[ 1, "two", { "number three": 3 } ]
{ "number three": 3 }
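The type-matching suffixes listed above can be summarized as a small classification table in Python. This is a sketch of the descriptions in the list, with hypothetical helper names (`matches_type`, `walk`); it then reproduces the container search from the example by visiting the tree depth-first.

```python
import json

JSN = json.loads('["abc", false, null, {"pi": 3.14}, [1, "two", {"number three": 3}]]')

def matches_type(value, suffix):
    """Model of the type-match suffixes a, o, i, c, e, w."""
    is_object = isinstance(value, dict)
    is_array = isinstance(value, list)
    is_container = is_object or is_array
    return {
        "a": not is_container,                     # atomic: string, number, boolean, null
        "o": is_object,                            # object
        "i": is_array,                             # array (indexable)
        "c": is_container,                         # container: object or array
        "e": not is_container or len(value) == 0,  # end node: atomic or empty container
        "w": True,                                 # wide: anything
    }[suffix]

def walk(node):
    """Visit the node itself, then all of its descendants, depth-first."""
    yield node
    if isinstance(node, list):
        for child in node:
            yield from walk(child)
    elif isinstance(node, dict):
        for label in sorted(node):
            yield from walk(node[label])

containers = [v for v in walk(JSN) if matches_type(v, "c")]
for value in containers:
    print(json.dumps(value))
```

The loop prints four containers in the same order as the jtc output above: the root, the "pi" object, the inner array, and the "number three" object.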

Arbitrary JSON searches

A lexeme with the suffix j can match any arbitrary JSON value:

bash $ <<<$JSN jtc -w'<{ "pi":3.14 }>j'
{
   "pi": 3.14
}

Even more, the parameter in the j lexeme can be a templated JSON:

bash $ <<<$JSN jtc -w'[4][2][0] <Nr3>v [^0] <{"pi": {Nr3}.14}>j [pi]'
3.14

That was the first complex walk-path shown, so, let's break it down:

  • [4][2][0] will get to the value of "number three": 3 through offset subscripts
  • <Nr3>v - directive v will memorize the JSON number 3 in the namespace Nr3
  • [^0] will reset the walk path back to JSON root
  • <{"pi": {Nr3}.14}>j will first interpolate number 3 from the namespace Nr3 and then will find recursively the resulted JSON (which will be {"pi": 3.14})
  • [pi] will address the value in found JSON by the label offset, resulting in the final value 3.14
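The interpolation step of that breakdown can be imitated with plain string substitution in Python. This is a deliberately simplified sketch: it only handles the single `{name}` token form used in the example and does not claim to cover jtc's actual interpolation rules; `interpolate` is a hypothetical helper.

```python
import json

JSN = json.loads('["abc", false, null, {"pi": 3.14}, [1, "two", {"number three": 3}]]')

namespace = {"Nr3": JSN[4][2]["number three"]}   # what <Nr3>v memorized: the number 3
template = '{"pi": {Nr3}.14}'

def interpolate(template, namespace):
    """Replace every {name} token with the JSON text of the namespace value."""
    result = template
    for name, value in namespace.items():
        result = result.replace("{" + name + "}", json.dumps(value))
    return result

interpolated = interpolate(template, namespace)
wanted = json.loads(interpolated)                # the JSON the j lexeme searches for
found = [item for item in JSN if item == wanted]

print(interpolated)     # {"pi": 3.14}
print(found[0]["pi"])   # 3.14
```

Parsing the interpolated text with `json.loads` also illustrates why the result must not be empty: an empty string is not valid JSON and would raise an error here as well.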

Obviously the j lexeme cannot be empty or result in an empty lexeme after template interpolation (as the empty space is not a valid JSON, as per spec).

There's another search lexeme suffix - s - that one will find a JSON pointed by a namespace:

bash $  <<<$JSN jtc -w'<PI:{"pi": 3.14}>v <PI>s'
{
   "pi": 3.14
}
bash $

The s lexeme also cannot be empty (it always must point to some namespace).

Original and Duplicate searches

q and Q lexemes allow finding original (first time seen) and duplicate elements respectively within the selected (walked) JSON tree. The lexemes cannot be empty - they point to a namespace which will be overwritten during the search and will be set to the found element (original or duplicate) once the match is found.

These lexemes search for original or duplicate entries of any JSON type, not necessarily atomic ones. Here's an example:

bash $ JSD='{"Orig 1": 1, "Orig 2": "two", "list": [ "three", { "dup 1": 1, "dup 2": "two", "second dup 1": 1 } ]}'
bash $ <<<$JSD jtc
{
   "Orig 1": 1,
   "Orig 2": "two",
   "list": [
      "three",
      {
         "dup 1": 1,
         "dup 2": "two",
         "second dup 1": 1
      }
   ]
}

Let's see all the original elements in the above JSON:

bash $ <<<$JSD jtc -lrw'<org>q:'
{ "Orig 1": 1, "Orig 2": "two", "list": [ "three", { "dup 1": 1, "dup 2": "two", "second dup 1": 1 } ] }
"Orig 1": 1
"Orig 2": "two"
"list": [ "three", { "dup 1": 1, "dup 2": "two", "second dup 1": 1 } ]
"three"
{ "dup 1": 1, "dup 2": "two", "second dup 1": 1 }

As you can see, all first-seen JSON values were listed (including the root itself).

Now, let's list all the duplicates:

bash $ <<<$JSD jtc -lrw'<dup>Q:'
"dup 1": 1
"dup 2": "two"
"second dup 1": 1

CAUTION: both of the lexemes facilitate their functions by memorizing in the namespace all the original values (from walked JSON node), thus both of them are quite memory hungry - keep it in mind when walking huge JSONs
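The memorizing behavior mentioned in the caution can be illustrated with a small Python model. It is a sketch, not jtc's algorithm: every visited value is turned into canonical JSON text (an assumption made here so that containers can be stored in a set), and a value counts as a duplicate when its text has been seen before.

```python
import json

JSD = json.loads('{"Orig 1": 1, "Orig 2": "two", "list": '
                 '["three", {"dup 1": 1, "dup 2": "two", "second dup 1": 1}]}')

def walk(node):
    """Visit the node itself, then all of its descendants, depth-first."""
    yield node
    if isinstance(node, list):
        for child in node:
            yield from walk(child)
    elif isinstance(node, dict):
        for label in sorted(node):
            yield from walk(node[label])

seen = set()        # grows with every original value: this is the memory cost
originals, duplicates = [], []
for value in walk(JSD):
    key = json.dumps(value, sort_keys=True)   # canonical text of the value
    if key in seen:
        duplicates.append(value)
    else:
        seen.add(key)
        originals.append(value)

print(len(originals))  # 6
print(duplicates)      # [1, 'two', 1]
```

The six originals and three duplicates line up with the two jtc listings above.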

Label searches

l, L, t - are suffixes to perform label based searches, they facilitate different kinds of label matching depending on the type of a search:

  • <lbl>l - finds recursively a JSON value with the label matching "lbl" exactly
  • <RE>L - finds recursively a JSON value with the label matching a regular expression RE
  • <NS>t - finds recursively a JSON value with the label matching exactly the JSON string (or number) from the namespace NS
  • >lbl<l - addresses an immediate child in the JSON object with the label "lbl"
  • >RE<L - finds among immediate children in the JSON object a value with the label matching a regular expression RE
  • >NS<t - addresses an immediate child in the JSON object with the label from the namespace NS, or addresses an immediate child in the JSON iterable with the numerical index from the namespace NS

First two variants should not require much of a clarification, let's work with the following JSON:

bash $ JSL='{"One": 1, "obj": { "One": true, "Two": 2, "": 3 }, "45": "forty-five"}'
bash $ <<<$JSL jtc
{
   "45": "forty-five",
   "One": 1,
   "obj": {
      "": 3,
      "One": true,
      "Two": 2
   }
}
bash $ <<<$JSL jtc -rlw'<[oO]>L:'
"One": 1
"obj": { "": 3, "One": true, "Two": 2 }
"One": true
"Two": 2
bash $ <<<$JSL jtc -rlw'<One>l:'
"One": 1
"One": true

The recursive form <NS>t will try matching a label from the specified namespace. The JSON type in the NS might be either a JSON string or a JSON numeric; in the latter case, it's automatically converted to a string value and can also match a label expressed as a numerical value:

bash $ <<<$JSL jtc -lrw'<idx:45>v <idx>t'
"45": "forty-five"
bash $ <<<$JSL jtc -lrw'<idx:"45">v <idx>t'
"45": "forty-five"

All other JSON types in the NS are ignored: such a search will never match.

Non-recursive behavior of label lexemes

Normally, a non-recursive search will try matching a value among immediate children of the JSON iterable. But for matching a label or index, the actual search (i.e., iterating over children of an iterable) is a superfluous task: indeed, when we want to match a value by a label (in a JSON object) or by an index (in a JSON array), we should be able to do so just by addressing them.

The non-recursive lexeme >..<l can match/address labels in JSON objects only. That makes it (with a default quantifier) just another variant of a literal subscript. The latter lacks one ability: addressing labels spelled as numerical values. In the above JSON, it's not possible to address the JSON value "forty-five" via a literal subscript, but with the >..<l lexeme it is:

bash $ <<<$JSL jtc -rlw'[45]'
bash $
bash $ <<<$JSL jtc -rlw'>45<l'
"45": "forty-five"

The lexeme >..<t can handle both JSON objects and arrays:

- if the namespace referred to by the lexeme holds a JSON string, then it will address the label (from the namespace) in the JSON object:

bash $ <<<$JSL jtc -lw'<lbl:"45">v >lbl<t'
"45": "forty-five"

- if the lexeme's namespace holds a JSON numeric, then it can address JSON iterables by the index:

bash $ <<<$JSL jtc -lw'<idx:2>v [obj]>idx<t'
"Two": 2

Relative quantifiers

There's another feature of how these lexemes operate. Think of a quantifier instance for the lexemes: any parsed objects will hold only 1 unique label - there cannot be two equal labels among immediate children of the same JSON object.

Even though it's possible to pass to a parser an object that will hold non-unique labels, the JSON RFC (8259) does not define software behavior in that regard. jtc in such case retains the first parsed value:

bash $ <<<'{"abc":1, "abc":2}' jtc
{
   "abc": 1
}

NOTE: holding two non-unique labels would render such JSON object non-addressable: indeed, in the above JSON, if the object were to hold both values, which value then to select when addressed by the label "abc"?
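That parsers genuinely differ on duplicate labels is easy to demonstrate: Python's standard `json` module keeps the last value, the opposite of what jtc is shown to do above. The sketch below also models the first-value-wins policy with an `object_pairs_hook`; the function `first_wins` is a hypothetical helper written for this illustration.

```python
import json

text = '{"abc":1, "abc":2}'

# Python's json module: the last duplicate overrides the earlier one
last_wins = json.loads(text)

def first_wins(pairs):
    """Build an object keeping only the first value seen for each label."""
    obj = {}
    for label, value in pairs:
        obj.setdefault(label, value)
    return obj

# Model of the policy shown for jtc: the first parsed value is retained
first_kept = json.loads(text, object_pairs_hook=first_wins)

print(last_wins)   # {'abc': 2}
print(first_kept)  # {'abc': 1}
```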

Getting back to the quantifiers: any parsed object will hold only unique labels, and so does an array with its indices. If so, the usual semantics of a quantifier as a search instance in the lexemes >..<l and >..<t is moot: we know that in the parsed object there cannot be a second, third, fourth (and so on) instance of the same label, there can be only one. The same applies to arrays: there can be only one unique index in there, thus only instance 0 (the first and the only instance) makes sense.

Thus, a usual semantic of quantifier as a match instance (except instance 0) in these lexemes is meaningless.

Therefore it was overloaded with a different and quite handy one: a quantifier in these non-recursive lexemes allows addressing neighbors (sibling) of the matched entry. I.e., the quantifier here becomes relative (to the matched entry) and therefore can take a negative value - that is the only case when a quantifier may go negative.
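A relative quantifier can be pictured as an offset applied to the position of the matched label among its siblings. The Python sketch below models this for JSON objects, assuming members are ordered by label as jtc arranges them; the helper `sibling` is hypothetical, and returning `None` for an offset that leaves the object is an assumption of this model rather than documented jtc behavior.

```python
import json

JSL = json.loads('{"One": 1, "obj": {"One": true, "Two": 2, "": 3}, "45": "forty-five"}')

def sibling(obj, label, offset=0):
    """Return (label, value) of the member `offset` positions away from `label`."""
    labels = sorted(obj)                  # members ordered by label
    if label not in obj:
        return None                       # nothing matched
    index = labels.index(label) + offset
    if not 0 <= index < len(labels):
        return None                       # assumption: no such neighbor, walk fails
    return labels[index], obj[labels[index]]

print(sibling(JSL["obj"], "One"))       # ('One', True)
print(sibling(JSL["obj"], "One", -1))   # ('', 3)
print(sibling(JSL["obj"], "One", 1))    # ('Two', 2)
```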

Observe a relative quantifier in action:

bash $ <<<$JSL jtc -w'[obj]'
{
   "": 3,
   "One": true,
   "Two": 2
}
bash $ <<<$JSL jtc -lw'[obj] >One<l'
"One": true
bash $ <<<$JSL jtc -lw'[obj] >One<l-1'
"": 3
bash $ <<<$JSL jtc -lw'[obj] >One<l1'
"Two": 2

The relative quantifiers though are fully compatible with the quantifiers range-notation:

bash $ <<<$JSL jtc -lw'[obj] >One<l:'
"": 3
"One": true
"Two": 2

Scoped searches

When you would like to perform a search but only among values under a specific label, it's known as a scoped search. The syntax for a scoped search is rather simple: a literal subscript is joined with the search lexeme by a colon (:), e.g.: [some label]:<search lexeme>. All forms of quantifiers and search suffixes are supported, except the label searches l, L and t: understandably, a label cannot be scoped by a label.

For example:

bash $ <<<$JSL jtc -w'[One]:<org>q:'
1
true

Regex searches

All search lexemes supporting regex based searching have been already covered: <..>R, <..>L, <..>D, they support (RE) matching of strings, labels and numerical JSON values respectively.

There's one more aspect to it though: the regular expressions also support sub-groups. Upon a successful match, the sub-groups will automatically set up respective namespaces, e.g.: the first sub-group will populate the namespace $1, the second will populate $2 and so on. Plus, the entire match will populate the namespace $0.

That way it's possible to extract any part(s) from the found JSON values for a later re-use.

bash $ <<<$JSL jtc -w'<(.*)[oO](.*)>L:' -T'{ "sub-group 1":{{$1}}, "sub-group 2":{{$2}}, "entire match":{{$0}} }'
{
   "entire match": "One",
   "sub-group 1": "",
   "sub-group 2": "ne"
}
{
   "entire match": "obj",
   "sub-group 1": "",
   "sub-group 2": "bj"
}
{
   "entire match": "One",
   "sub-group 1": "",
   "sub-group 2": "ne"
}
{
   "entire match": "Two",
   "sub-group 1": "Tw",
   "sub-group 2": ""
}
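
The way the sub-groups split each label can be reproduced with any regex engine that has greedy quantifiers. Below is a small Python sketch (Python's re module stands in for jtc's regex engine here, which is an assumption; the helper match_namespaces is hypothetical) that builds the same $0, $1, $2 mapping for the labels matched above:

```python
import re


def match_namespaces(pattern, text):
    """Return {"$0": entire match, "$1": first sub-group, ...}, or None on no match."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return {f"${i}": match.group(i) for i in range(match.re.groups + 1)}


for label in ["One", "obj", "Two"]:
    print(label, match_namespaces(r"(.*)[oO](.*)", label))
```

The first (.*) is greedy, so it gives characters back only until [oO] can match: for Two the letter o is the last character and $2 ends up empty, while for One and obj the only o/O is the first character and $1 ends up empty.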

Directives and Namespaces

Directives:

There are a few lexemes that look like searches but do not do any searching/matching; instead they apply certain actions to the currently walked JSON elements or paths. They are known as directives. Directives are distinguishable from the search lexemes only by the suffix.

Directives are typically agnostic to the recursive or non-recursive form of spelling, except one - F, where the spelling carries a semantic meaning. Also, the directives do not support quantifiers (the quantifiers are parsed, but silently ignored).

Namespaces:

Namespace is a way to facilitate variables in jtc. The namespace is implemented as a container (in the currently processed JSON) that holds all (sub) JSON values preserved during walking a walk-path.

The JSON values found in the namespace could be re-used later either during the interpolation, or in other lexemes of the same walk or in different (subsequent) walks.

The namespace could be populated (set up) by a lexeme either from the currently walked JSON element, or with an arbitrary JSON value (given in the lexeme), e.g.:

- In this walk-path example a search lexeme a will find (recursively) the first occurrence of any atomic JSON and will populate the namespace Atomic with the found JSON element: -w'<Atomic>a'
- In this walk-path example, the same search lexeme will populate the namespace Atomic with the empty JSON array [] instead of the found atomic JSON: -w'<Atomic:[]>a'

The ability (and the form) to set up an arbitrary JSON value is universal for all lexemes (and directives) that are capable of preserving values in the namespace.
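
The two ways of populating a namespace can be pictured with a plain dictionary. The following Python sketch is only a model: the names first_atomic and walk_atomic are hypothetical, and leaving the namespace untouched when the search fails is an assumption of the sketch.

```python
_NOT_FOUND = object()  # sentinel, because None (JSON null) is itself an atomic value


def first_atomic(node):
    """Depth-first search for the first atomic (non-iterable) JSON value."""
    if not isinstance(node, (dict, list)):
        return node
    children = node.values() if isinstance(node, dict) else node
    for child in children:
        found = first_atomic(child)
        if found is not _NOT_FOUND:
            return found
    return _NOT_FOUND


def walk_atomic(root, namespace, name, preset=_NOT_FOUND):
    """Model of <name>a (no preset) and <name:preset>a (with a preset value)."""
    found = first_atomic(root)
    if found is _NOT_FOUND:
        return False  # the search failed; assumed to leave the namespace untouched
    namespace[name] = found if preset is _NOT_FOUND else preset
    return True


ns = {}
walk_atomic([{"a": []}, "x"], ns, "Atomic")             # like -w'<Atomic>a'
print(ns)
walk_atomic([{"a": []}, "x"], ns, "Atomic", preset=[])  # like -w'<Atomic:[]>a'
print(ns)
```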

Preserve a currently walked value in the namespace

The directive <NS>v preserves the currently walked value in the namespace NS. Many search lexemes are capable of doing the same on their own, but for others, as well as for the subscripts, it's still a useful feature.

bash $ <<<$JSN jtc
[
   "abc",
   false,
   null,
   {
      "pi": 3.14
   },
   [
      1,
      "two",
      {
         "number three": 3
      }
   ]
]
bash $ <<<$JSN jtc -w'[4][0]<Idx>v[-1]>Idx<t'
"two"

It's fun to see how jtc works in slow-mo, building a walk-path step by step, one lexeme at a time:

bash $ <<<$JSN jtc -w'[4]'
[
   1,
   "two",
   {
      "number three": 3
   }
]

- addressed the 5th JSON element in the JSON root (walking always begins from the root)

bash $ <<<$JSN jtc -w'[4][0]'
1

- addressed the 1st JSON value in the JSON iterable (found in the prior step)

bash $ <<<$JSN jtc -w'[4][0]<Idx>v'
1

- memorized a currently walked JSON in the namespace Idx (which is the JSON numeric 1)

bash $ <<<$JSN jtc -w'[4][0]<Idx>v[-1]'
[
   1,
   "two",
   {
      "number three": 3
   }
]

- stepped one level up (towards the root) from the last walked JSON

bash $ <<<$JSN jtc -w'[4][0]<Idx>v[-1]>Idx<t'
"two"

- using the value from the namespace Idx, found the offset in the JSON iterable (the numeric value 1 stored in Idx points to the 2nd element in the JSON array, because all indices are zero-based)
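
The same five steps can be replayed in Python, with a list used as the stack of walked nodes and a dictionary as the namespace. This is just an illustrative model of the walk (the variable names path and namespace are made up for the sketch), not a description of jtc's internals:

```python
import json

JSN = json.loads('["abc", false, null, {"pi": 3.14}, [1, "two", {"number three": 3}]]')

namespace = {}
path = [JSN]                 # stack of walked nodes, the root comes first

path.append(path[-1][4])     # [4]     -> the 5th element of the root
path.append(path[-1][0])     # [0]     -> the 1st element of that array, i.e. 1
namespace["Idx"] = path[-1]  # <Idx>v  -> memorize the currently walked value
path.pop()                   # [-1]    -> step one level up, back to the array
path.append(path[-1][namespace["Idx"]])  # >Idx<t -> subscript by the stored value

print(json.dumps(path[-1]))
```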

Preserve a label of a currently walked element

The directive <NS>k functions pretty much like <NS>v, but instead of preserving a JSON value, it'll store in the namespace NS its label (if the currently walked element is a child of a JSON object), or its index (if the currently walked element is a child of a JSON array):

bash $ <<<$JSN jtc
[
   "abc",
   false,
   null,
   {
      "pi": 3.14
   },
   [
      1,
      "two",
      {
         "number three": 3
      }
   ]
]
bash $ <<<$JSN jtc -w'<{"pi":3.14}>j<idx>k' -T'{idx}'
3
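
What gets stored by <idx>k can be sketched in Python as a search that returns the key under which the match sits rather than the match itself. The helper key_of is hypothetical and the traversal is a plain depth-first search, which is an assumption about ordering rather than a statement about jtc:

```python
import json

JSN = json.loads('["abc", false, null, {"pi": 3.14}, [1, "two", {"number three": 3}]]')


def key_of(node, target, key=None):
    """Return the label (object child) or index (array child) under which `target` is found."""
    if node == target:
        return key  # None for the root, which has neither a label nor an index
    if isinstance(node, dict):
        children = node.items()
    elif isinstance(node, list):
        children = enumerate(node)
    else:
        children = ()
    for child_key, child in children:
        found = key_of(child, target, child_key)
        if found is not None:
            return found
    return None


print(key_of(JSN, {"pi": 3.14}))  # an index, the match is a child of an array
print(key_of(JSN, 3))             # a label, the match is a child of an object
```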

If the lexeme is empty (<>k) AND is the last one in the walk-path, then it does not memorize (obviously) the label/index in the namespace, but instead re-interprets the label as the JSON value. That way it becomes possible to rewrite labels in update (-u) operations, or to re-use them in template interpolation.

bash $ <<<$JSN jtc -w'<{"pi":3.14}>j<>k'
3
bash $ <<<$JSN jtc -w'<{"pi":3.14}>j<>k' -T'{"idx": {{}}}' -r
{ "idx": 3 }

The described effect occurs only if the empty <>k lexeme appears last in the walk-path. If the lexeme appears somewhere in the middle of the walk-path, it is completely meaningless in that form and has no effect at all.

Erase namespace

The directive <NS>z allows erasing the namespace NS. Mostly, this would be required when used together with walk branching.

For example, let's replace all even numbers in the array with their negative values:

bash $ <<<$'[1,2,3,4,5,6,7,8,9]' jtc -w'<Num>z[:]<>f<[02468]$>D:<Num>v' -T'-{Num}' -jr
[ 1, -2, 3, -4, 5, -6, 7, -8, 9 ]

If the walk began without the initial lexeme erasing the namespace Num, then the whole attempt would fail:

bash $ <<<$'[1,2,3,4,5,6,7,8,9]' jtc -w'[:]<>f<[02468]$>D:<Num>v' -T'-{Num}' -jr
[ 1, -2, -2, -4, -4, -6, -6, -8, -8 ]
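
The difference between the two walks above comes down to a stale value surviving in the namespace between iterations. A minimal Python model of that effect follows; the function negate_evens and its erase_first flag are hypothetical, and the model assumes that the template is applied only when Num is present in the namespace:

```python
def negate_evens(values, erase_first):
    """Model of [:]<>f<[02468]$>D:<Num>v with template '-{Num}', with or without <Num>z."""
    namespace = {}
    result = []
    for value in values:
        if erase_first:
            namespace.pop("Num", None)  # <Num>z: erase the namespace on every iteration
        if value % 2 == 0:              # the regex matches numbers ending in an even digit
            namespace["Num"] = value    # <Num>v: memorize the matched number
        # otherwise the fail-safe keeps the walk alive, but Num is left as it was
        result.append(-namespace["Num"] if "Num" in namespace else value)
    return result


print(negate_evens(range(1, 10), erase_first=True))
print(negate_evens(range(1, 10), erase_first=False))
```

Without the erasure, every odd number is rendered with the value memorized for the preceding even number, which is exactly the pattern seen in the failed attempt.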

Of course, knowing how Regex lexemes work, it's possible to rewrite the walk-path in a bit more succinct way:

bash $ <<<$'[1,2,3,4,5,6,7,8,9]' jtc -w'<$0>z[:]<>f<[02468]$>D:' -T'-{$0}' -jr
[ 1, -2, 3, -4, 5, -6, 7, -8, 9 ]

Walk branching

Normally, all the lexemes in the walk-path are concatenated with the logical operator AND (i.e., a walk is successful only if all lexemes are). Directives <..>f, <..>F and ><F introduce walk branching (the easiest way is to think of it as if .. else ..), i.e. they facilitate a control-flow logic of the walk-path execution.

Note: the directive F is sensitive to the lexeme spelling (a recursive vs a non-recursive form) and provides a different reaction for each form (this is the only directive so far that is sensitive to the lexeme encasement, all others are not).

Fail-safe directive

<>f is a fail-safe directive (facilitating the if part). Once walked, it memorizes the internally maintained path to the currently walked JSON element and reinstates it, should the walk past the <>f directive fail: