bash parallel
ghdrako edited this page Mar 20, 2023 · 8 revisions
- https://www.gnu.org/software/parallel/parallel_design.html#pipepart-vs-pipe
- https://thenybble.de/posts/json-analysis/
Syntax (runs prog1 and prog2 as two parallel jobs):
```shell
parallel ::: prog1 prog2
```
- GNU Parallel can split a file into blocks efficiently with the `--pipepart` option.
To keep the output in the original order, we add the `--keep-order` argument. The default output mode, `--group`, buffers the output of each job until that job has finished. Depending on your exact query, this may require buffering to disk if the query output does not fit in main memory. For most queries it will fit, so `--group` would be fine. However, we can do slightly better with `--line-buffer`: combined with `--keep-order`, it starts printing the first job's output immediately and buffers only the output of the other jobs. This should need slightly less disk space or memory, at the cost of some CPU time. Both options are fine for “normal” queries, but benchmark them if your query generates large amounts of output.
```shell
parallel -a '<file>' --pipepart --keep-order --line-buffer --block 100M --recend '}\n' "jq '<query>'"
```