Exercism jq Cookbook #5055

Closed
sshine opened this issue Oct 1, 2019 · 4 comments

sshine commented Oct 1, 2019

For a while I've wanted to make a jq cookbook aimed at Exercism. So far this has only amounted to draft blog posts, so to get the ball rolling, I'm collecting practical snippets here. Please comment with corrections or improvements, additional snippets, or questions about how to do something, and I will move them to the top.

For now, the "tutorial" part of the cookbook is just embedded as it goes along. I worked on a separate section, but since the scope wasn't clear, I kept going back and revising it constantly, and it became a drag. I'll extract this part eventually.

Eventually we may find a better place to keep this cookbook.

A list of all exercise slugs for a track

$ jq '[ .exercises[].slug ]' config.json
[
  "hello-world",
  "leap",
  "space-age",
  ...
]

$ jq -c '.exercises | map(.slug)' config.json
["hello-world","leap","space-age",...]

$ jq -r '.exercises[].slug' config.json
hello-world
leap
space-age
...

Explanation:

  • --compact-output / -c: Print each result on a single line instead of pretty-printing it.
  • The top-level object in config.json becomes the implicit input to the jq expression.
  • [ .exercises[].slug ] is equivalent to [ .exercises | .[] | .slug ].
  • The list of exercise objects extracted with .exercises is passed through the array value iterator, .[], which produces each entry as a separate result. Each of those results is then passed through .slug, which reduces an exercise object to the value stored under the key "slug".
  • [ .[] ] is the no-op filter for lists: .[] breaks the list apart, and [ ... ] assembles it again.
  • To get a feel for "multiple results", try omitting the [ ... ] constructor.
  • --raw-output / -r: "With this option, if the filter's result is a string then it will be written directly to standard output rather than being formatted as a JSON string with quotes. This can be useful for making jq filters talk to non-JSON-based systems."
  • Notice that the -r version doesn't pack the result into a list, so each of the multiple results is a string and gets printed without the JSON string wrapping.
  • map() is a built-in higher-order combinator. If it didn't exist, it could be written as:
$ jq -c 'def map(f): [ .[] | f ]; .exercises | map(.slug)' config.json
["hello-world","leap","space-age",...]
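To try the forms above without a track checkout, here is a small sketch that feeds a made-up, three-exercise config through jq. The sample JSON and the shell variable names are hypothetical; only the jq filters come from the snippets above.

```shell
# Hypothetical miniature config.json; real track configs have many more fields.
config='{"exercises":[{"slug":"hello-world"},{"slug":"leap"},{"slug":"space-age"}]}'

# [ ... ] collects all results of .exercises[].slug into one array.
with_brackets=$(printf '%s' "$config" | jq -c '[ .exercises[].slug ]')

# map(f) is shorthand for [ .[] | f ], so this yields the same array.
with_map=$(printf '%s' "$config" | jq -c '.exercises | map(.slug)')

# Without [ ... ] there are three separate results, one JSON string per line.
multiple=$(printf '%s' "$config" | jq -c '.exercises[].slug')

# -r prints each string result raw, without the JSON quotes.
raw=$(printf '%s' "$config" | jq -r '.exercises[].slug')

printf '%s\n' "$with_brackets" "$multiple" "$raw"
```

The middle output ("hello-world", "leap" and "space-age" in quotes on separate lines) is what "multiple results" looks like when nothing collects them.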

A list of non-deprecated exercises for a track

$ jq -c '[ .exercises[] | select(.deprecated != true) | .slug ]' config.json
["hello-world","leap","space-age",...]

$ jq -c '.exercises | map(select(.deprecated != true)) | map(.slug)' config.json
["hello-world","leap","space-age",...]

$ jq -c '.exercises | map(select(.deprecated | not) | .slug)' config.json
["hello-world","leap","space-age",...]

Explanation:

  • Equivalent to the previous snippet, but with an extra select(.deprecated != true).
  • select(): "The function select(foo) produces its input unchanged if foo returns true for that input, and produces no output otherwise."
  • Because the .[] operator splits the list into multiple results, the subsequent filters select(.deprecated != true) and .slug each take one of those results as their implicit input.
  • map(f) | map(g) = [ .[] | f ] | [ .[] | g ] = [ .[] | (f | g) ] = map(f | g).
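  • not takes no arguments; like any other filter it operates on its input, hence .deprecated | not. Both null and false count as false, so exercises with no "deprecated" key (or with "deprecated": null) are kept.
  • It may be confusing, even to functional programmers, that map(select(...)) filters. This works because select() outputs either zero or one result depending on its predicate, and map(f) collects every result of f, so map(f) is better understood as "concat ∘ map f" (Haskell's concatMap).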
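The sketch below runs both predicates on a made-up config where "deprecated" is missing, false, true and null, and then shows the concatMap behaviour of map on plain numbers. The sample data and variable names are hypothetical.

```shell
# Hypothetical sample: "deprecated" is missing, false, true and null.
config='{"exercises":[
  {"slug":"hello-world"},
  {"slug":"leap","deprecated":false},
  {"slug":"old-exercise","deprecated":true},
  {"slug":"space-age","deprecated":null}]}'

# != true keeps everything that is not literally true.
by_compare=$(printf '%s' "$config" | jq -c '[ .exercises[] | select(.deprecated != true) | .slug ]')

# not turns null and false into true, so missing/null/false are all kept.
by_not=$(printf '%s' "$config" | jq -c '.exercises | map(select(.deprecated | not) | .slug)')

# map(f) keeps every output of f: zero outputs drop the element...
dropped=$(jq -n -c '[1, 2, 3] | map(select(. != 2))')
# ...and two outputs duplicate it, just like concatMap.
doubled=$(jq -n -c '[1, 2, 3] | map(., .)')

printf '%s\n' "$by_compare" "$by_not" "$dropped" "$doubled"
```

The two predicates agree on these four cases; they would only differ for a truthy non-boolean value such as "deprecated": "yes", which != true keeps and not drops.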

A list of core exercises for a track

$ jq -c '[ (.exercises[] | select(.core)) | .slug ]' config.json
["hello-world","leap","space-age",...]

$ jq -c '.exercises | map(select(.core)) | map(.slug)' config.json
["hello-world","leap","space-age",...]

$ jq -c '.exercises | map(select(.core) | .slug)' config.json
["hello-world","leap","space-age",...]

It should be reasonable to assume that no exercise is marked as both core and deprecated, but I don't think this is documented anywhere, and configlet lint . does not complain if one is.
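Since configlet lint doesn't check this, one way to test the assumption on a track is to ask jq for exercises that have both flags set. The sketch below runs the query on a made-up config with one offending entry; on a real track you would run the same filter against config.json.

```shell
# Hypothetical config with one exercise marked as both core and deprecated.
config='{"exercises":[
  {"slug":"hello-world","core":true},
  {"slug":"leap","core":false},
  {"slug":"odd-one-out","core":true,"deprecated":true}]}'

# and treats null/false as false, so only entries with both flags survive.
both=$(printf '%s' "$config" | jq -c '.exercises | map(select(.core and .deprecated) | .slug)')
echo "$both"
```

An empty array, [], would mean the assumption holds for that track.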

A list of canonical exercises

$ (cd ~/exercism/problem-specifications/exercises && printf '%s\0' *) | \
    jq --raw-input --slurp 'sub("\u0000$"; "") | split("\u0000")'

Explanation:

  • (cd ... && ...) runs in a subshell, so the directory change only affects that subshell and not your current shell.
  • cd ... && printf '%s\0' * is safer than ls ... because the output of ls cannot distinguish line breaks between filenames from line breaks inside filenames. * is glob expansion. \0 is picked as the separator because filenames cannot contain NUL bytes.
  • --slurp / -s: "Instead of running the filter for each JSON object in the input, read the entire input stream into a large array and run the filter just once."
  • --raw-input / -R: "Don't parse the input as JSON. Instead, each line of text is passed to the filter as a string. If combined with --slurp, then the entire input is passed to the filter as a single long string."
  • split("separator") works as you may expect. "\u0000" is a NUL byte in a jq string.
  • Because jq currently (and for no good reason) strips some NUL bytes when slurping, but not always, a forward-compatible fix is to deliberately strip the trailing NUL byte with sub("\u0000$"; ""). Otherwise, whenever the trailing NUL survives, split leaves an empty string as the last element. Yes, that's a regex.
  • Thanks to @Niiiil and @geirha for the advice on reading directory contents safely (filenames may contain line breaks) and for the trailing NUL byte fix. This advice applies not only to jq but to shell scripting in general.
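To see what sub("\u0000$"; "") protects against, you can hand jq the kind of string that printf '%s\0' * produces, written as a JSON literal so that no raw NUL bytes pass through the shell. The filenames are made up; "two\nlines" stands in for a filename containing a line break.

```shell
# Simulated output of printf '%s\0' *: every name, including the last, ends in NUL.
input='"hello-world\u0000leap\u0000two\nlines\u0000"'

# Splitting directly leaves an empty string after the trailing separator.
naive=$(jq -n -c --argjson s "$input" '$s | split("\u0000")')

# Removing the trailing NUL first gives exactly one element per filename.
fixed=$(jq -n -c --argjson s "$input" '$s | sub("\u0000$"; "") | split("\u0000")')

printf '%s\n' "$naive" "$fixed"
```

The line break inside "two\nlines" stays part of a single element, which is the whole point of using NUL as the separator.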

A list of canonical exercises not implemented for a track

$ jq -c --argjson canonical \
        "$(cd ~/exercism/problem-specifications/exercises \
           && printf '%s\0' * | jq -Rs 'sub("\u0000$"; "") | split("\u0000")')" \
    '$canonical - (.exercises | map(.slug))' config.json
["affine-cipher","book-store","circular-buffer",...]

$ jq -r --argjson canonical \
        "$(cd ~/exercism/problem-specifications/exercises \
           && printf '%s\0' * | jq -Rs 'sub("\u0000$"; "") | split("\u0000")')" \
    '$canonical - (.exercises | map(.slug)) | .[]' config.json
affine-cipher
book-store
circular-buffer
...

Explanation:

  • Thanks to @pkoppstein for this comment on reading the lines of a command into a jq variable. Unfortunately, bash 4.4 warns when a NUL byte is passed through $() (because bash can't actually store NUL bytes in strings), so as a workaround an inner jq converts the NUL-separated list to JSON inside the $( ... ), and only that JSON reaches bash.
  • --raw-input / -R: "Don't parse the input as JSON. Instead, each line of text is passed to the filter as a string. If combined with --slurp, then the entire input is passed to the filter as a single long string."
  • --argjson varname value: "This option passes a JSON-encoded value to the jq program as a predefined variable. If you run jq with --argjson foo 123, then $foo is available in the program and has the value 123."
  • Subtracting one array from another removes every element of the first that also occurs in the second. The parentheses around .exercises | map(.slug) are necessary because - binds tighter than |.
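Here is a small sketch of the set difference on made-up data, passing the canonical list with --argjson as above, but as an inline JSON array instead of a directory listing. The slugs and variable names are hypothetical.

```shell
# Hypothetical canonical list and track config.
canonical='["hello-world","leap","book-store"]'
config='{"exercises":[{"slug":"hello-world"},{"slug":"leap"}]}'

# --argjson parses the value as JSON and binds it to $canonical.
missing=$(printf '%s' "$config" \
  | jq -c --argjson canonical "$canonical" '$canonical - (.exercises | map(.slug))')

# Array subtraction removes every occurrence, duplicates included.
dups=$(jq -n -c '["a", "b", "c", "b"] - ["b"]')

printf '%s\n' "$missing" "$dups"
```

Without the parentheses, jq would read the filter as ($canonical - .exercises) | map(.slug) and fail, since map(.slug) would then be applied to an array of strings.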

A list of all used topics for a track

$ jq -c '.exercises | map(.topics | select(. != null)[]) | unique' config.json
["accumulator_strictness","algorithms","bitwise_operations",...]

$ jq -r '.exercises | map(.topics | select(. != null)[]) | unique[]' config.json
accumulator_strictness
algorithms
bitwise_operations
...

Explanation:

  • .exercises | map(.topics) is a list of lists of strings.
  • To concatenate them into a single list, .exercises | map(.topics[]) would work, but because .topics can be null and null[] is a run-time error, we remove the nulls with select(. != null).
  • The combined list (hopefully) has a lot of duplicates, which are removed with the built-in unique; unique also sorts its output.
  • In the -r version an [] array value iterator is added at the end to produce multiple string results instead of a single list result with strings in it.
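A sketch comparing select(. != null) with two shorter spellings that, as far as I can tell, behave the same on this data: the alternative operator //, which substitutes [] when .topics is null or missing, and the error-suppressing iterator .[]?. The sample exercises are made up.

```shell
# Hypothetical exercises: topics present, null, missing, and overlapping.
config='{"exercises":[
  {"slug":"a","topics":["strings","loops"]},
  {"slug":"b","topics":null},
  {"slug":"c"},
  {"slug":"d","topics":["loops","algorithms"]}]}'

with_select=$(printf '%s' "$config" | jq -c '.exercises | map(.topics | select(. != null)[]) | unique')

# .topics // [] falls back to an empty array when .topics is null or missing.
with_alt=$(printf '%s' "$config" | jq -c '.exercises | map(.topics // [] | .[]) | unique')

# .[]? iterates like .[] but produces no error (and no output) for null.
with_opt=$(printf '%s' "$config" | jq -c '.exercises | map(.topics[]?) | unique')

printf '%s\n' "$with_select" "$with_alt" "$with_opt"
```

One difference to keep in mind: .[]? also silently skips a "topics" value that is a string or a number, which may hide a malformed config.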

A map of (non-deprecated) exercise slugs to unlocking exercise slug

$ jq '.exercises | map(select(.deprecated | not) |
                       { (.slug): .unlocked_by })
                 | add' config.json
{
  "hello-world": null,
  "leap": null,
  "space-age": null,
  ...
  "armstrong-numbers": "sum-of-multiples",
  "difference-of-squares": "nucleotide-count",
  "acronym": "hello-world",
  ...
}

Explanation:

  • .exercises | ... reduces the config to the list of its exercises.
  • map(select(.deprecated | not)) filters out deprecated exercises.
  • map(... | { (.slug): .unlocked_by }) turns each remaining exercise into a one-key object mapping its slug to its unlocked_by value (null if it has none).
  • ... | add merges all those objects into a single one with each slug key being unique.
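A quick sketch of the same filter on a made-up config, plus one edge case: add on an empty array returns null rather than {}, so a // {} fallback may be useful if the list of exercises can be empty. The sample data is hypothetical.

```shell
# Hypothetical config: one exercise without unlocked_by, one deprecated.
config='{"exercises":[
  {"slug":"hello-world"},
  {"slug":"acronym","unlocked_by":"hello-world"},
  {"slug":"old-exercise","deprecated":true,"unlocked_by":"hello-world"}]}'

unlocks=$(printf '%s' "$config" | jq -c '.exercises | map(select(.deprecated | not) | { (.slug): .unlocked_by }) | add')

# add on an empty array yields null; // {} turns that into an empty object.
no_exercises=$(jq -n -c '[] | add // {}')

printf '%s\n' "$unlocks" "$no_exercises"
```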
iHiD commented Oct 1, 2019

This is awesome!

I want to really revamp the docs repo, so this will be perfect there when that's done.

@ErikSchierboom

This is awesome work! I've seen several people use jq very effectively, but so far I haven't yet used it myself. This has got me very intrigued!

workingjubilee commented Apr 26, 2020

I recently used this to make every exercise that is unlocked by one exercise (atbash-cipher) be unlocked by another (luhn) instead.

jq 'setpath(["exercises"]; [] + .exercises |
    map(if .unlocked_by == "atbash-cipher"
        then setpath(["unlocked_by"]; "luhn")
        else . end))' ./exercism-rust/config.json
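For comparison, the same rewrite can be expressed with the update-assignment operator |=, which applies a filter to the value at a path and leaves the rest of the document alone. The sketch below checks both versions against a made-up config; the slugs other than atbash-cipher and luhn are hypothetical.

```shell
# Hypothetical config: one exercise unlocked by atbash-cipher, one by something else.
config='{"exercises":[
  {"slug":"rail-fence-cipher","unlocked_by":"atbash-cipher"},
  {"slug":"acronym","unlocked_by":"hello-world"}]}'

# The setpath-based version from the comment above.
with_setpath=$(printf '%s' "$config" | jq -c 'setpath(["exercises"]; [] + .exercises |
    map(if .unlocked_by == "atbash-cipher"
        then setpath(["unlocked_by"]; "luhn")
        else . end))')

# Same update with |= on .exercises and = on .unlocked_by.
with_update=$(printf '%s' "$config" | jq -c '.exercises |= map(
    if .unlocked_by == "atbash-cipher" then .unlocked_by = "luhn" else . end)')

echo "$with_update"
# jq does not edit files in place, so to update the real file:
#   jq '...' config.json > config.json.tmp && mv config.json.tmp config.json
```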

@ErikSchierboom

Hello 👋

With the launch of Exercism v3, we are closing all issues in this repository to help give us a clean slate to detect new problems. If this issue is still relevant to Exercism v3 (e.g. it's a feature that we haven't implemented in v3, or a bug that still exists), please reopen it and we will review it and post an update on it as soon as we get a chance.

Thanks for helping make Exercism better, and we hope you enjoy v3 🙂
