Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
branch: master
Fetching contributors…

Cannot retrieve contributors at this time

197 lines (125 sloc) 6.438 kb

map -- making xargs simpler and more powerful at the same time!

TOC


The map command was something I wrote a long time ago and have used pretty much forever, in a sort of "taken for granted" way. I would never have put it out there if I had not, by chance, discovered something called GNU Parallel and started reading the huge list of examples on its pages.

And yet, casually looking at the examples, I found that map could do pretty much all of the generic ones! So much so that I sat down and started writing down map equivalents of GNU Parallel's examples, and before I knew it I was about half way through their list with only a few that map could not do! The end result was this [feature comparison][mapgp].

But...

  • map is 330 lines of perl. GNU Parallel is 5000 lines.
  • map has 3 options (if you don't count -h, -q, and -v). GNU Parallel has almost a 100.

(In all fairness, [here][cantwont]'s a list of things map can't/won't do which GNU Parallel can/will, although a lot of them are "kitchen sink" items!)

And that was when I decided to put this out there as its own little project.

If you use it, please let me know. Some quick documentation is right here in this file. Examples are [here][mapgp]. map responds to -h as you would expect, if you need to refresh your memory.


#concepts concepts

'map' is like xargs in many ways, except for having very few options, and a fixed set of "replace strings", all using the % character.

Here are the highlights:

  • map will treat each input line as one single argument; it will not space-separate them like xargs does.

    This makes it usable by default for filenames with spaces etc (but please see the IMPORTANT NOTES section later for details).

  • map's replace strings are % and %%. A single % means substitute exactly one input item and run the command, while a double % means to substitute as many input items as possible (subject to '-n' and total command length) before running the command (like xargs).

    See later for defaults if you don't specify them, as well as the D, B, and E modifiers.

  • map will happily treat the first argument as the command and the rest as input "lines" if STDIN is not a pipe. This lets you do things like this:

    map gzip *.pdf
    # same as: ls *.pdf | map gzip
    

    or, using quotes for the first argument:

    map "zip -q -r % %" src doc conf contrib hooks
    # same as: \ls -d src doc conf contrib hooks | map zip -q -r % %
    
  • map also has a pretty cool "delimiter mode" that at first seems totally unrelated to xargs but actually is not. See later for examples.

#drs default replacement string

If no replacement string (%, %%, or variants) exist anywhere in the command, the default is to assume a '%%' at the end.

However, if the '-p' option is used without the '-n' option, the default becomes '%'.

#details details

single replacements

% is replaced by the current input line, with a trailing slash removed if present. %D is replaced by the directory name of the current filename. %B is the basename and %E is the extension. (This means that % is pretty much equal to %D/%B.%E).

As said above, these replacements use only one input line per run, so

seq 1 3 | map echo %

gives you

1
2
3

multiple replacements

Most often, you want all the arguments tacked on to one "run" of the the command. Do this by specifying a %%:

seq 1 3 | map echo %%

returns

1 2 3

Since this is the most common reason for using map, this is the default if you don't specify either % or %%:

seq 1 3 | map echo
# returns:
1 2 3

A %% (and similarly %%D, %%B, and %%E) get replaced by as many input lines as possible (subject to internal limit of command line length and the user-specified -n value if used).

Just like GNU Parallel, this replacement even works within a word, replicating the entire word:

seq 1 3 | map echo abc-%%-def

produces

abc-1-def abc-2-def abc-3-def

multiple jobs in parallel

When you run something like:

map -p 4 gzip *.pdf

you are running 4 jobs in parallel. This indicates that the job might be CPU bound (usually, though not always) so it's best to run each job on one input line rather than give it as many as it will take.

So when you run in parallel mode, the default is % because that is what makes sense.

specifying maximum arguments per invocation

However, if you use -n, (even if you are also using -p) the default switches back to %%. The logic is that specifying "maximum arguments per invocation" implicitly gives permission to actually have more than one argument, overriding the -p exception.

So yeah this is an exception to an exception but I don't think it's too hard to remember.

And if in doubt you can always specify what you want you know...

delimiter mode

Here's an example; more documentation may follow if anyone asks but notice the delimiter character (colon) and the specification of field 1 and field 7:

cat /etc/passwd | egrep -v 'nologin|bash' | map -d=: echo %1 use %7 as shell

The default delimiter is whitespace. For convenience, '-d=t' uses tabs. Anything else, like ':', is specified literally, like above.

Here's another example: report users who have some shell as login but no GECOS field:

 < /etc/passwd map -d=: -- '[[ %7 =~ sh ]] && [ -z "%5" ] && echo %1 || :'

#IN IMPORTANT NOTES

Filenames with unusual characters

Map will work fine with such filenames except that you have to do some extra quoting for parallel mode (since that invokes xargs), or if you want to use redirection or multiple commands.

For example (assuming some command sending in a list of filenames), this will fail:

... | map 'echo -n `gzip <  %  | wc -c`; echo -n '*100/'; wc -c <  % ' | bc

but this will succeed:

... | map 'echo -n `gzip < "%" | wc -c`; echo -n '*100/'; wc -c < "%"' | bc

(by the way, this computes the size of the gzipped file as a percentage of the original)

OTHER WARNINGS

  • never mix the 3 styles of replacement strings ('%' and its cousins, '%%' and its cousins, and '%1', '%2', etc for delimiter mode). Odd things will happen -- I don't check for sanity.

  • parallel mode defaults to number of CPUs.

Jump to Line
Something went wrong with that request. Please try again.