Cookbook

For delicacies too choice for the manual.

Find the maximal elements of an array or stream

# Given an array of values as input, generate a stream of values of the 
# maximal elements as determined by f.
# Notes:
# 1. If the input is [] then the output stream is empty.
# 2. If f evaluates to null for all the input elements,
#    then the output stream will be the stream of all the input items.

def maximal_by(f):
  (map(f) | max) as $mx
  | .[] | select(f == $mx);

Example:

[ {"a":1, "id":1},  {"a":2, "id":2}, {"a":2, "id":3}, {"a":1, "id":4} ] | maximal_by(.a)

emits the objects with "id" equal to 2 and 3.

The above can also be used to find the maximal elements of a stream, but if the stream has a very large number of items, then an approach that requires less space might be warranted. Here is a stream-oriented function that avoids constructing any arrays:

def maximals_by(stream; f):
   (reduce stream as $x (null;  ($x|f) as $y | if . == null or . < $y then $y else . end)) as $mx
   | stream
   | select(f == $mx);
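
Note that maximals_by evaluates stream twice, so the stream argument must be an expression that can be re-evaluated, such as $data[] below; a one-shot source such as inputs would be exhausted by the reduce step. For example:

[ {"a":1, "id":1}, {"a":2, "id":2}, {"a":2, "id":3} ] as $data
| maximals_by($data[]; .a)

again emits the objects with "id" equal to 2 and 3.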

Emit the ids of JSON objects in a Riak database

The following script illustrates how curl and jq can work nicely together, especially if the entities stored at each Riak key are JSON entities.

The specific task we consider is as follows:

Task: Given a Riak database at $RIAK with a bucket $BUCKET, and assuming that the value at each Riak key is a JSON entity, then for each top-level object (or each object in a top-level array), emit the value, if any, of its "id" key. The values should be emitted as a stream; any object that does not have an "id" key is simply skipped.

The following script has been tested as a bash script with these values for RIAK and BUCKET:

RIAK=http://127.0.0.1:8098
BUCKET=test
curl -Ss "$RIAK/buckets/$BUCKET/keys?keys=stream" |\
  jq -r '.keys[] | @uri' |\
while read key
do
  curl -Ss "$RIAK/buckets/$BUCKET/keys/$key?keys"
done | jq 'if type == "array" then .[] | (.id // empty) elif type == "object" then (.id // empty) else empty end'

Filter objects based on the contents of a key

E.g., I only want objects whose genre key contains "house".

$ json='[{"genre":"deep house"}, {"genre": "progressive house"}, {"genre": "dubstep"}]'
$ echo "$json" | jq -c '.[] | select(.genre | contains("house"))'
{"genre":"deep house"}
{"genre":"progressive house"}

If it is possible that some objects might not contain the key you want to check, and you just want to ignore the objects that don't have it, then the above will need to be modified. For example:

$ json='[{"genre":"deep house"}, {"genre": "progressive house"}, {"volume": "wubwubwub"}]'
$ echo "$json" | jq -c '.[] | select(.genre | . and contains("house"))'

If your version of jq supports the postfix ? operator, it could also be used:

$ echo "$json" | jq -c '.[] | select(.genre | contains("house"))?'

In sufficiently recent versions of jq (after 1.4), you can also use regular expressions, e.g. using the first "$json" variable defined above:

$ echo "$json" | jq -c 'map( select(.genre | test("HOUSE"; "i")))'
[{"genre":"deep house"},{"genre":"progressive house"}]

Note: use a semi-colon (";") to separate the arguments of test.

Find the most recent object in an S3 bucket

$ json=`aws s3api list-objects --bucket my-bucket-name`
$ echo "$json" | jq '.Contents | max_by(.LastModified) | {Key}' 

Add an element to an object array

Given an array of objects, I want to add another key to each of those objects, computed from its existing keys:

  $ json='[{"a":1,"b":2},{"a":1,"b":1}]'
  $ echo "$json" | jq 'map(. + {color:(if (.a/.b) == 1 then "red" else "green" end)})'

>> [{"color":"green","b":2,"a":1},{"color":"red","b":1,"a":1}]

Explanation: This example uses the map() operator. For each object in the array, the filter given to map merges the input object (.) with the new color object using the + operator. The color object itself is formed using an if conditional.

Note that this could also be done in the following manner:

jq 'map(.color = if (.a/.b) == 1 then "red" else "green" end)'

Zip column headers with their rows

Given the following JSON:

{
    "columnHeaders": [
        {
            "name": "ga:pagePath",
            "columnType": "DIMENSION",
            "dataType": "STRING"
        },
        {
            "name": "ga:pageviews",
            "columnType": "METRIC",
            "dataType": "INTEGER"
        }
    ],
    "rows": [
        [ "/" , 8 ],
        [ "/a", 4 ],
        [ "/b", 3 ],
        [ "/c", 2 ],
        [ "/d", 1 ]
    ]
}

How can I convert this into a form like:

[
    { "ga:pagePath": "/", "ga:pageviews": 8 },
    { "ga:pagePath": "/a", "ga:pageviews": 4 },
    { "ga:pagePath": "/b", "ga:pageviews": 3 },
    { "ga:pagePath": "/c", "ga:pageviews": 2 },
    { "ga:pagePath": "/d", "ga:pageviews": 1 }
]

Explanation: Okay, so first we want to get the columnHeaders as an array of names:

(.columnHeaders | map(.name)) as $headers

Then, for each row, we take the $headers as entries (if this doesn't mean anything to you, refer to the with_entries section of the manual) and use those to create a new object: each entry's value (a header name) becomes a key, and the row item at the entry's key (its index) becomes the corresponding value. Tricky, I know.

.rows
  | map(. as $row
        | $headers
        | with_entries({ "key": .value,
                         "value": $row[.key]}) )

Then we put it all together (wrapping it in a function is left as an exercise for the reader):

(.columnHeaders | map(.name)) as $headers
| .rows
| map(. as $row
      | $headers
      | with_entries({"key": .value,
                      "value": $row[.key]}) )

(This recipe is from #623.)
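
As an aside, here is an equivalent formulation (a sketch, assuming jq 1.5's transpose builtin) that pairs headers with row values and builds each object via from_entries:

(.columnHeaders | map(.name)) as $headers
| .rows
| map( [$headers, .]
       | transpose
       | map( {key: .[0], value: .[1]} )
       | from_entries )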

Delete elements from objects recursively

A straightforward and general way to delete key/value pairs from all objects, no matter where they occur, is to use walk/1. (If your jq does not have walk/1, then you can copy its definition from https://github.com/stedolan/jq/blob/master/src/builtin.jq)

For example, to delete all "foo" keys, you could use the filter:

walk(if type == "object" then del(.foo) else . end)

It may also be possible to use the recurse builtin, as shown in the following example.

Let's take the recurse example from the manual, and add a bunch of useless {"foo": "bar"} to it:

{"name": "/", "foo": "bar", "children": [
  {"name": "/bin", "foo": "bar", "children": [
    {"name": "/bin/ls", "foo": "bar", "children": []},
    {"name": "/bin/sh", "foo": "bar", "children": []}]},
  {"name": "/home", "foo": "bar", "children": [
    {"name": "/home/stephen", "foo": "bar", "children": [
      {"name": "/home/stephen/jq", "foo": "bar", "children": []}]}]}]}

recurse(.children[]) | .name will give me all the names, but destroy the structure of the JSON in the process.

Is there a way to get that information, but preserve the structure?

That is, with the JSON above as input, the desired output would be:

{"name": "/", "children": [
  {"name": "/bin", "children": [
    {"name": "/bin/ls", "children": []},
    {"name": "/bin/sh", "children": []}]},
  {"name": "/home", "children": [
    {"name": "/home/stephen", "children": [
      {"name": "/home/stephen/jq", "children": []}]}]}]}

Explanation

In order to remove the "foo" attribute from each element of the structure, you want to recurse through the structure and set each element to the result of deleting the foo attribute from itself. This translates to jq as:

recurse(.children[]) |= del(.foo)

If, instead of blacklisting foo, you'd rather whitelist name and children, you could do something like:

recurse(.children[]) |= {name, children}

(This recipe is from #263.)

Extract Specific Data for While Loop in Shell Script

Thanks to @pkoppstein and @wtlangford in Issue #663, I (@RickCogley) was able to finalize a shell script to pull descriptive metadata from a database of ours, which has a REST interface.

This cookbook entry makes use of curl, while read loops, and of course jq in a bash shell script. Once the JSON metadata files are output, they can be git pushed to a git repo, and diffed to see how the database settings change over time.

We assume a JSON stream like the following, with unique values for table ids, aliases, and names:

{
  "id": "99999",
  "name": "My Database",
  "description": "Lorem ipsum, the description.",
  "culture": "en-US",
  "timeZone": "CST",
  "tables": [
    {
      "id": 12341,
      "recordName": "Company",
      "recordsName": "Companies",
      "alias": "t_12341",
      "showTab": true,
      "color": "#660000"
    },
    {
      "id": 12342,
      "recordName": "Order",
      "recordsName": "Orders",
      "alias": "t_12342",
      "showTab": true,
      "color": "#006600"
    },
    {
      "id": 12343,
      "recordName": "Order Item",
      "recordsName": "Order Items",
      "alias": "t_12343",
      "showTab": true,
      "color": "#000099"
    }
  ]
}

... the goal is to use curl against the database's REST interface to extract just the table aliases (and record names) to a file, then use that file as input to a while loop, in which curl can again be used to grab the details about each table.

First we set variables, then run curl against the REST API. The resulting JSON stream has no newlines, so piping it through jq '.' pretty-prints it (bonus: if you also have XML, you can pipe it through xmllint to get a similar effect: xmllint --format -). The result is output to a file which contains JSON like the above.

#!/bin/bash
db_id="98765"
db_rest_token="ABCDEFGHIJK123456789"
compcode="ACME"

curl -k "https://mydb.tld/api/$db_id/$db_rest_token/getinfo.json" |\
  jq '.' > "$compcode-$db_id-Database-describe.json"
jq -r '.tables[] | "\(.alias) \(.recordName)"' \
  "$compcode-$db_id-Database-describe.json" > "$compcode-tables.txt"

The filter '.tables[] | "\(.alias) \(.recordName)"' streams the elements of the "tables" array and, for each one, uses string interpolation of the form "\(.foo) \(.bar)" to create a string with just those elements. Note that the -r option gives you raw output in the file, which is what you need for the while read loop.

The output file looks like:

t_12341 Company
t_12342 Order
t_12343 Order Item

Next, the shell script uses a while read loop to parse the output file $compcode-tables.txt, running curl again to get table-specific info using each table alias $talias as input. It passes the raw JSON output from the REST interface through jq '.' to add newlines, then writes it to a file whose name is built from the two loop variables (as well as the variables set at the top of the script).

while read talias tname
do
  curl -k "https://mydb.tld/api/$db_id/$db_rest_token/$talias/getinfo.json" |\
    jq '.' >"$compcode-$db_id-Table-$talias-$tname-getinfo.json"
done < "$compcode-tables.txt"

The result is a collection of files like these:

ACME-98765-Table-t_12341-Company-getinfo.json
ACME-98765-Table-t_12342-Order-getinfo.json
ACME-98765-Table-t_12343-Order Item-getinfo.json

... that can be committed to a git repo, for diffing.

Extract data and set shell variables

A variation on the preceding entry:

$ eval "$(jq -r '@sh "a=\(.a) b=\(.b)"')"

This works because the @sh format type quotes strings to be shell-eval safe.
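
For example, with a hypothetical input containing a string and a number:

$ echo '{"a": "hello world", "b": 2}' | jq -r '@sh "a=\(.a) b=\(.b)"'
a='hello world' b=2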

Another variant:

$ jq -r '@sh "a=\(.a) b=\(.b)"' | while read -r line; do eval "$line"; ...; done

To share multiple values without using eval, consider setting a bash array variable, e.g.

    vars=( $(jq -n -r '[1,2] | .[] | @sh' ) )
    for f in "${vars[@]}"; do echo "$f" ; done
    1
    2

For Windows, here is a .bat file that illustrates two approaches using jq. In the first example, the name of the variable is determined in the .bat file; in the second example, the name is determined by the jq program:

@echo off
setlocal

for /f "delims=" %%I in ('jq -n -r "\"123\""') do set A=%%I
echo A is %A%

jq -n -r  "@sh \"set B=123\"" > setvars.bat
call .\setvars.bat
echo B is %B%

Convert a CSV file with Headers to JSON

There are several freely available tools for converting CSV files to JSON. For example, the npm package d3-dsv (npm install -g d3-dsv) includes a command-line program named csv2json, which expects the first line of the input file to be a header row, and uses these as keys. Such tools may be more convenient than jq for converting CSV files to JSON, not least because there are several "standard" CSV file formats.

For trivially simple CSV files, however, the jq invocation jq -R 'split(",")' can be used to convert each line to a JSON array. If the trivially simple CSV file has a row of headers, then as shown below, jq can also be used to produce a stream or array of objects using the header values as keys.

In this recipe, therefore, we will assume that either the CSV is trivially simple or that a suitable tool for performing the basic row-by-row conversion to JSON arrays is available. One such tool is any-json.

The following jq program expects as input an array, the first element of which is to be interpreted as a row of headers, and the other elements of which are to be interpreted as rows.

    # Requires: jq 1.5

    # objectify/1 expects an array of atomic values as inputs, and packages
    # these into an object with keys specified by the "headers" array and
    # values obtained by trimming string values, replacing empty strings
    # by null, and converting strings to numbers if possible.
    def objectify(headers):
      def tonumberq: tonumber? // .;
      def trimq: if type == "string" then sub("^ +";"") | sub(" +$";"") else . end;
      def tonullq: if . == "" then null else . end;
      . as $in
      | reduce range(0; headers|length) as $i
          ({}; .[headers[$i]] = ($in[$i] | trimq | tonumberq | tonullq) );

    def csv2jsonHelper:
      .[0] as $headers
      | reduce (.[1:][] | select(length > 0) ) as $row
          ([]; . + [ $row|objectify($headers) ]);

    csv2jsonHelper

Usage example:

$ any-json input.csv | jq -f csv2json-helper.jq
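
If the CSV really is trivially simple, jq itself can perform the row-by-row conversion, so no external tool is needed at all. A sketch, assuming the program above is saved in csv2json-helper.jq:

$ jq -R -s 'split("\n") | map(select(length > 0) | split(","))' input.csv |
    jq -f csv2json-helper.jq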

Processing a large number of lines or JSON entities

Using jq 1.4 to process a file consisting of a large number of JSON entities or lines of raw text can be very challenging if any kind of reduction step is necessary, as the --slurp option requires the input to be stored in memory. One way to circumvent the limitations of jq 1.4 in this respect would be to break up the input file into smaller pieces, process them separately (perhaps in parallel), and then combine the results. Examples and utilities for parallel processing using jq can be found in jq-hopkok's parallelism folder.

The introduction of the inputs builtin in jq 1.5 allows files to be read in efficiently on an entity-by-entity or line-by-line basis. That is, the entire file no longer need be read in using the "slurp" option.

Here is an example drawn from http://stackoverflow.com/questions/31035704/use-jq-to-count-on-multiple-levels.

The input file consists of JSON entities, like so:

{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071870}
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071875}
{"machine": "possible_victim01", "domain": "soevil.com", "timestamp":1435071877}
{"machine": "possible_victim02", "domain": "bad.com", "timestamp":1435071877}
{"machine": "possible_victim03", "domain": "soevil.com", "timestamp":1435071879}

The task is to produce a report consisting of a single object, like so:

    {
      "possible_victim01": {
        "total": 3,
        "evildoers": {
          "evil.com": 2,
          "soevil.com": 1
        }
      },
      "possible_victim02": {
        "total": 1,
        "evildoers": {
          "bad.com": 1
        }
      },
      "possible_victim03": {
        "total": 1,
        "evildoers": {
          "soevil.com": 1
        }
      }
    }

Here is a straightforward jq program that will do the job:

reduce inputs as $line
  ({};
   $line.machine as $machine
   | $line.domain as $domain
   | .[$machine].total as $total
   | .[$machine].evildoers as $evildoers
   | . + { ($machine): {"total": (1 + $total),
                        "evildoers": ($evildoers | (.[$domain] += 1)) }} )

The program would be invoked with the -n option, e.g. like so:

jq -n -f program.jq data.json

The -n option is required as the invocation of inputs does the reading of the file.

If the task requires both per-line (or per-entity) processing as well as some kind of reduction, then the foreach builtin, also introduced in jq 1.5, is very useful, as it obviates the need to accumulate anything that is not required for the reduction.

The trick is to use foreach (inputs, null) rather than just foreach inputs: the appended null serves as an end-of-input marker, giving the program a chance to emit the accumulated state after the last input has been processed. As a simple example, suppose we have a file consisting of a large number of JSON objects, some of which have a key, say "n", and we are required to extract the corresponding values as well as determine the number of objects for which the "n" value is present and neither null nor false.

foreach (inputs, null) as $line 
  (0;
   if $line.n then .+1 else . end;
   if $line == null then . else $line.n // empty end)
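
For example, if the program above is saved in a hypothetical file named extract.jq:

$ printf '%s\n' '{"n":1}' '{"m":2}' '{"n":3}' | jq -n -f extract.jq
1
3
2

The first two numbers are the extracted "n" values; the final 2 is the count.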

Processing a huge JSON array of items

In jq 1.5 there is a --stream option that enables a streaming JSON parser. Streaming can process huge JSON texts incrementally without having to wait until the whole input is parsed.

Here is a simple example that shows how to produce a stream of "leaflets", that is, a stream of JSON entities, one for each "leaf", where each "leaflet" reflects the original structure:

jq -c --stream '. as $in | select(length == 2) | {}|setpath($in[0]; $in[1])' # stream of leaflets

Input:

{"a":1, "b": {"c": 3}}

Output:

{"a":1}
{"b":{"c":3}}

For further information about the streaming parser, see the jq Manual and the FAQ.

List keys used in any object in a list

If you have an array of JSON objects and want to obtain a listing of the top-level keys in these objects, consider:

add | keys
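
For example, given the array [{"a": {"b":1}}, {"a": {"c":2}}] used below, add | keys produces ["a"], since add merges the objects and keys then lists the top-level keys of the merged object.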

If you want to obtain a listing of all the keys in all the objects, no matter how deeply nested, you can use this filter:

[.. | objects | keys[]] | unique

For example, given the array: [{"a": {"b":1}}, {"a": {"c":2}}]

the previous filter will produce: ["a", "b", "c"]

Import a module and call its functions

  • Specify the search path with -L <path> # the path may be absolute
  • import "string" as symbol; # where string is the file name of the module without the .jq extension
  • :: is the scope resolution operator, e.g. builtin::walk

Example:

  1. Copy the definition of walk/1 to $HOME/jq/library.jq (see e.g. https://github.com/stedolan/jq/blob/master/src/builtin.jq)

  2. Invoke jq with the -L option:

jq -L $HOME/jq 'import "library" as lib;
   lib::walk(if type == "object" then del(.foo) else . end)' <<< '{"a":1, "foo": 2}'