map -- making xargs simpler and more powerful at the same time!
map command was something I wrote a long time ago and have used pretty
much forever, in a sort of "taken for granted" way. I would never have put it
out there if I had not, by chance, discovered something called GNU
Parallel and started reading the huge list of examples on its pages.
And yet, casually looking at the examples, I found that
map could do pretty
much all of the generic ones! So much so that I sat down and started writing
map equivalents of GNU Parallel's examples, and before I knew it I was
about half way through their list with only a few that
map could not do!
The end result was this [feature comparison][mapgp].
mapis 330 lines of perl. GNU Parallel is 5000 lines.
maphas 3 options (if you don't count -h, -q, and -v). GNU Parallel has almost a 100.
(In all fairness, [here][cantwont]'s a list of things
map can't/won't do
which GNU Parallel can/will, although a lot of them are "kitchen sink" items!)
And that was when I decided to put this out there as its own little project.
If you use it, please let me know. Some quick documentation is right here in
this file. Examples are [here][mapgp].
map responds to
-h as you would
expect, if you need to refresh your memory.
'map' is like xargs in many ways, except for having very few options, and a
fixed set of "replace strings", all using the
Here are the highlights:
map will treat each input line as one single argument; it will not space-separate them like xargs does.
This makes it usable by default for filenames with spaces etc (but please see the IMPORTANT NOTES section later for details).
map's replace strings are
%%. A single % means substitute exactly one input item and run the command, while a double % means to substitute as many input items as possible (subject to '-n' and total command length) before running the command (like xargs).
See later for defaults if you don't specify them, as well as the D, B, and E modifiers.
map will happily treat the first argument as the command and the rest as input "lines" if STDIN is not a pipe. This lets you do things like this:
map gzip *.pdf # same as: ls *.pdf | map gzip
or, using quotes for the first argument:
map "zip -q -r % %" src doc conf contrib hooks # same as: \ls -d src doc conf contrib hooks | map zip -q -r % %
map also has a pretty cool "delimiter mode" that at first seems totally unrelated to xargs but actually is not. See later for examples.
#drs default replacement string
If no replacement string (%, %%, or variants) exist anywhere in the command, the default is to assume a '%%' at the end.
However, if the '-p' option is used without the '-n' option, the default becomes '%'.
% is replaced by the current input line, with a trailing slash removed if
%D is replaced by the directory name of the current filename.
%B is the basename and
%E is the extension. (This means that
pretty much equal to
As said above, these replacements use only one input line per run, so
seq 1 3 | map echo %
1 2 3
Most often, you want all the arguments tacked on to one "run" of the
the command. Do this by specifying a
seq 1 3 | map echo %%
1 2 3
Since this is the most common reason for using map, this is the default if you don't specify either % or %%:
seq 1 3 | map echo # returns: 1 2 3
%% (and similarly
%%E) get replaced by as many input
lines as possible (subject to internal limit of command line length and the
-n value if used).
Just like GNU Parallel, this replacement even works within a word, replicating the entire word:
seq 1 3 | map echo abc-%%-def
abc-1-def abc-2-def abc-3-def
multiple jobs in parallel
When you run something like:
map -p 4 gzip *.pdf
you are running 4 jobs in parallel. This indicates that the job might be CPU bound (usually, though not always) so it's best to run each job on one input line rather than give it as many as it will take.
So when you run in parallel mode, the default is
% because that is what
specifying maximum arguments per invocation
However, if you use
-n, (even if you are also using
-p) the default
switches back to
%%. The logic is that specifying "maximum arguments per
invocation" implicitly gives permission to actually have more than one
argument, overriding the
So yeah this is an exception to an exception but I don't think it's too hard to remember.
And if in doubt you can always specify what you want you know...
Here's an example; more documentation may follow if anyone asks but notice the delimiter character (colon) and the specification of field 1 and field 7:
cat /etc/passwd | egrep -v 'nologin|bash' | map -d=: echo %1 use %7 as shell
The default delimiter is whitespace. For convenience, '-d=t' uses tabs. Anything else, like ':', is specified literally, like above.
Here's another example: report users who have some shell as login but no GECOS field:
< /etc/passwd map -d=: -- '[[ %7 =~ sh ]] && [ -z "%5" ] && echo %1 || :'
#IN IMPORTANT NOTES
Filenames with unusual characters
Map will work fine with such filenames except that you have to do some extra quoting for parallel mode (since that invokes xargs), or if you want to use redirection or multiple commands.
For example (assuming some command sending in a list of filenames), this will fail:
... | map 'echo -n `gzip < % | wc -c`; echo -n '*100/'; wc -c < % ' | bc
but this will succeed:
... | map 'echo -n `gzip < "%" | wc -c`; echo -n '*100/'; wc -c < "%"' | bc
(by the way, this computes the size of the gzipped file as a percentage of the original)
never mix the 3 styles of replacement strings ('%' and its cousins, '%%' and its cousins, and '%1', '%2', etc for delimiter mode). Odd things will happen -- I don't check for sanity.
parallel mode defaults to number of CPUs.