Interact with python's numpy package from the command line. Useful as part of pipelines.
Influenced by, and liberally taking ideas from from Wes Turner's pyline utility.
For convenience certain features provide interfaces to
These are not installed by default to minimize the number of dependencies for basic usage.
Tested with python 2.7 and python 3.5.
Command line pipelines are wonderful things. Some nice properties they have include:
- Complete searchable history of everything you have run
- Being able to use shell commands you already know rather than learning new python libraries
- Being able to compose disparate commands through string input and output
However the command line is sometimes slightly... lacking. Particularly when it comes to
things like maths. There are ad-hoc, single purpose commands that can help: things like
sum or similar, but they will always only solve one problem.
Here we try to solve a general class of problems by welding python (any numpy) to the command line. This means that anything you can do in python can be done in a way that easily interacts with a the command line.
# The squares of the numbers 1 to 100 seq 100 | npcli 'd**2' # Work out the mean of some random numebrs npcli 'np.random.random(10000)' -m numpy.random | npcli 'np.mean(d)' # Plot a graph seq 100 | npcli -nK 'pylab.plot(d); pylab.show()' # Produce a histogram of when most lines in syslog are printed sudo cat /var/log/syslog | cut -d " " -f 1-4 | xargs -L 1 -I A date -d A +%s | npcli 'd % 86400' | npcli 'd // 3600 * 3600' | uniq -c | npcli -Kn 'pylab.plot(d[:,1], d[:,0]); pylab.show()' # Generate some random data npcli -K 'random(100)' # Summarize the last 100 days of GOOG's share price curl "http://real-chart.finance.yahoo.com/table.csv?s=GOOG" | head -n 100 | npcli -I pandas 'd["Close"].describe()' -D # Chain together operations seq 10 | npcli 'd' -e 'd*2' -e 'd + 4' -e 'd * 3' -e 'd - 12' -e 'd / 6' # Multiple data sources npcli --name one <(seq 100) --name two <(seq 201 300) 'one + two'
usage: npcli [-h] [--expr EXPR] [--code] [--debug] [--input-format INPUT_FORMAT] [--kitchen-sink] [--name NAME NAME] [--output-format OUTPUT_FORMAT | --raw | --repr | --no-result] [--module MODULE] [-f data_source] expr [data_sources [data_sources ...]] Interact with numpy from the command line positional arguments: expr Expression involving d, a numpy array data_sources Files to read data from. Stored in d1, d2 etc optional arguments: -h, --help show this help message and exit --expr EXPR, -e EXPR Expression involving d, a numpy array. Multipe expressions get chained --code Produce python code rather than running --debug Print debug output --input-format INPUT_FORMAT, -I INPUT_FORMAT Dtype of the data read in. "lines" for a list of lines. "str" for a string. "csv" for csv, "pandas" for a pandas csv, "json" for json data --kitchen-sink, -K Import a lot of useful things into the execution scope --name NAME NAME, -N NAME NAME A named data source --output-format OUTPUT_FORMAT, -O OUTPUT_FORMAT Output as a flat numpy array with this format. "str" for a string --raw Result is a string that should be written to standard out --repr, -D Output a repr of the result. Often used for _D_ebug --no-result, -n Discard result --module MODULE, -m MODULE Result is a string that should be written to standard out -f data_source
pip install git+https://github.com/facetframer/npcli#egg=npcli
pip install git+git://firstname.lastname@example.org#egg=npcli
Alternatives and prior work
- perl command line invocation
- [Rio] (https://cran.r-project.org/web/packages/rio/README.html): A similar tool in R (useful with ggplot)
There are unit tests: you can run them with
python setup.py test
This will run the tests using
tox, testing the installation commands and different versions of python.
For a quicker test run use:
argparse appears to be not be able to deal with repeated flags (
-e 1 -e second) and repeated optional position args (i.e. data sources), it may error out when given valid input.
This can be circumvented by using the
-f flag in preference to positional arguments.
However, we still allow positional arguments in the interest of discoverability.
I'm open to this being a bad decision.
Just open a file for goodness sake
It is very easy to do more on the command line than one should.
Everything that is done here can be done in a python file with calls to
Above a certain size, one-liners become unwieldy
The cost of scripting in python is that you actually have to go to the effort of opening file, and doing the kind of things npcli automates can take quite a lot of boilerplate.
One also loses the simplicity of the shell debug cycle: "modify", "press enter", "see if it works".