Say "ni" to data of any size

ni
ni says "ni" to your data.


Installing ni

$ git clone git://github.com/spencertipping/ni
$ sudo ln -s $PWD/ni/ni /usr/bin/

What is ni?

ni is a fast, portable tool that reduces most data processing operations to a handful of keystrokes.

ni basics

ni is efficient for big and small data

ni can process terabytes or petabytes of data in constant space, and it knows about things like GNU sort's --compress-program option, which makes it possible to process more data than will fit on disk. It can interoperate with Hadoop and self-install on workers if you have a cluster available. Commands written in ni are typically as fast as or faster than hand-written equivalents.

ni can process full datasets on one machine, e.g. Wikipedia (~40GB), OpenStreetMap (~400GB), and Reddit (~1.5TB). Intermediate streams aren't written to disk unless you sort them.
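As a rough illustration of the spill-file compression mentioned above, the following sketch wraps GNU sort so that any temporary files it writes are gzip-compressed. The helper name sort_compressed is hypothetical, GNU coreutils is assumed, and this is not necessarily how ni itself invokes sort.

```shell
# Hypothetical helper: sort stdin with GNU sort, asking it to compress
# its temporary spill files with gzip. Extra arguments (such as -n) are
# passed straight through to sort.
sort_compressed() {
  sort --compress-program=gzip "$@"
}

# Usage: numeric sort of a small stream. Inputs this small never spill
# to disk, so the compressor only matters for large datasets.
printf '3\n1\n2\n' | sort_compressed -n
```

Compressing temporaries trades CPU time for disk space, which tends to pay off when the data being sorted is larger than the free space under the temporary directory.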

ni is cat and less (and zless, bzless, etc)

$ ni /etc/passwd
$ ni /usr/share/dict/words
$ ni /usr/share/man/man1/ls.1.gz
$ find . | ni

ni is gzip -dc, xz -dc, lz4 -dc, etc

ni knows the magic number for common compression formats and invokes the correct decompressor automatically.

$ cat mystery-file | ni > decoded-file
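Magic-number detection of this kind can be sketched with standard tools. The example below is an illustration only, not ni's implementation: the helper names sniff_format and decode_file are hypothetical, it works on a named file rather than on stdin, and it assumes head -c and od are available.

```shell
# Hypothetical helper: print the compression format of a file, judged
# only by its leading bytes.
sniff_format() {
  # First six bytes as lowercase hex with no separators.
  magic=$(head -c 6 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    1f8b*)         echo gzip ;;
    425a68*)       echo bzip2 ;;
    fd377a585a00*) echo xz ;;
    04224d18*)     echo lz4 ;;
    *)             echo plain ;;
  esac
}

# Hypothetical helper: write the decoded contents of a file to stdout,
# picking the decompressor from the detected format.
decode_file() {
  case "$(sniff_format "$1")" in
    gzip)  gzip -dc "$1" ;;
    bzip2) bzip2 -dc "$1" ;;
    xz)    xz -dc "$1" ;;
    lz4)   lz4 -dc "$1" ;;
    *)     cat "$1" ;;
  esac
}
```

Calling decode_file mystery-file > decoded-file would then behave much like the ni command above for these four formats, provided the matching decompressor is installed.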

ni is pv/pipemeter

$ find / | ni > /dev/null               # == cat, but show data throughput

(NB: if you're not redirecting data to /dev/null or a file, ni may intermittently print monitor updates that temporarily overwrite your output; use Ctrl+L twice to refresh the screen.)

ni is ls

...but often faster because it doesn't look at file attributes; it just gives you the listing.

$ ni /
$ ni /etc
$ ni .

ni is curl/sftp

$ ni https://google.com
$ ni http://wikipedia.org http://github.com

ni is seq

$ ni n100                               # integers 1..100
$ ni n01000                             # n0 is zero-based: integers 0..999
$ ni nE6                                # E6 == 10^6 = 1000000

ni is grep

ni's r// operator searches for rows which match a regular expression:

$ ni n1000 | ni r/77/

ni is |

In general, ni X Y == ni X | ni Y. Data generators like files are appended to the stream: ni /etc/passwd == cat - /etc/passwd.

$ ni n1000 r/77/
$ ni n1000 r/77/ r/3/
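For comparison, the two commands above correspond roughly to the following seq and grep pipelines. This is a sketch that assumes seq is installed; the function names are hypothetical, and grep's regex dialect may differ from ni's for patterns more complex than these.

```shell
# Rough equivalent of: ni n1000 r/77/
rows_with_77() {
  seq 1000 | grep 77
}

# Rough equivalent of: ni n1000 r/77/ r/3/
# Each additional r// operator becomes one more grep stage in the pipe.
rows_with_77_and_3() {
  seq 1000 | grep 77 | grep 3
}

rows_with_77_and_3
```

The final call prints 377 and 773, the only integers from 1 to 1000 that contain both "77" and "3".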

ni is echo

$ ni ifoo                               # == echo foo
$ ni i[foo bar]                         # == echo -e "foo\tbar"

ni is xargs ni (xargs cat)

$ ni /etc \<                            # \< == xargs ni, give or take
$ ni /usr/share/man/man1 \<             # \< auto-decompresses files
$ ni ihttps://google.com /etc \<        # \< recognizes URL formats

ni is hadoop fs -cat and hadoop fs -text

$ ni hdfs:///path/to/file               # == hadoop fs -cat /path/to/file
$ ni hdfst:///path/to/file              # == hadoop fs -text /path/to/file

ni can also run Hadoop Streaming jobs with itself nondestructively installed on worker nodes.

ni is git ls-tree etc

$ ni git://.                            # show all branches/tags
$ ni githistory://.:develop             # full history of develop branch
$ ni githistory://.:develop::a/file     # full history of a file on develop
$ ni gittree://.:develop                # file listing for develop branch
$ ni gittree://.:develop::folder        # directory listing at develop revision
$ ni gitsnap://.:master^                # all blobs one commit before master
$ ni gitblob://.:18891afd4              # file contents
$ ni gitblob://.:develop::ni            # file contents of 'ni' on develop
$ ni gitdiff://.:master..develop        # regular diff
$ ni gitpdiff://.:develop               # processed diff
$ ni gitpdiff://.:develop::path/path    # processed diff for a specific path

ni is sqlite3

$ ni sqlite:///path/to/file.db          # list tables in database
$ ni sqlitet:///path/to/file.db:table   # output all table data as TSV
$ ni sqlites:///path/to/file.db:table   # output table schema as SQL
$ ni sqliteq:///path/to/file.db:'sql'   # output SQL results as TSV

ni is unzip and tar -x/-t, but better

$ ni tar://myfile.tgz                   # == tar -tzf myfile.tgz (requires tar)
$ ni zip://myfile.zip                   # == zip file listing (requires unzip)
$ ni 7z://myfile.7z                     # == 7zip file listing (requires 7z, 7za, 7zr, or p7zip)
$ ni tarentry://myfile.tgz:foo.txt      # contents of specific tar entry
$ ni zipentry://myfile.zip:foo.txt      # contents of specific zip entry
$ ni 7zentry://myfile.7z:foo.txt        # contents of specific 7zip entry

ni reads xlsx

$ ni xlsx://spreadsheet.xlsx            # list of sheets
$ ni xlsxsheet://spreadsheet.xlsx:1     # contents of sheet 1 as TSV

ni is xargs -P for data

$ find /usr -type f \
    | ni \< S4[ r'/all your base/' ]    # use four workers for r// operator
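A loosely similar effect can be had with standard tools by letting xargs -P run several grep processes at once. The sketch below assumes find and xargs support -print0, -0, and -P (the GNU and BSD versions do), and the helper name parallel_grep is hypothetical. It splits the work by file, which may differ from how ni divides the stream among workers, and the order of output lines is not guaranteed.

```shell
# Hypothetical helper: search every regular file under a directory,
# running up to four grep processes in parallel.
#   $1 = directory to search, $2 = pattern
parallel_grep() {
  dir=$1
  pattern=$2
  # -print0 / -0 keep filenames with spaces or newlines intact;
  # -n 8 hands each grep at most eight files; -h omits filename prefixes.
  find "$dir" -type f -print0 \
    | xargs -0 -P 4 -n 8 grep -h -e "$pattern"
}
```

For example, parallel_grep /usr 'all your base' searches the same files as the ni command above.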

ni is ssh

...and nondestructively self-installs on remote hosts.

$ ni shost[ /etc/hostname ]             # == ssh host ni /etc/hostname | ni

ni is interoperable

ni is realtime visualization for big data

$ ni --js                               # start the webserver (Ctrl+C to exit)
http://localhost:8090                   # open this link in a browser

ni explain

RocketChat support forums

Ni By Example

An excellent guide by Michael Bilow:

ni license

MIT license

Copyright (c) 2016-2018 Spencer Tipping

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Contributors