A collection of coreutils-ish tools that are CSV-aware, TSV-aware etc.
Think grep, cut, tac etc. but CSV-aware.
dsv is not built to be fast, but intuitive to use
for people who spend a lot of time in the terminal and are used to using shell commands.
If you need something fast, or have big data to process,
or are not familiar with coreutils tools, dsv may not be for you.
I used to use xsv and sometimes csvkit, but I could never remember the different command names and flags. I use the shell a lot and just wanted something with a similar interface. miller is the closest I've found, but eventually I decided to write my own.
`<file.csv dsv grep -C5 -w REGEX` does basically the same thing as `grep -C5 -w REGEX`, except that
it always prints the header and REGEX matches will not span multiple columns.
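To make those two differences concrete, here is a minimal Python sketch of a header-preserving, per-field grep. It is not dsv's actual implementation: the function name `csv_grep` and its `word` parameter are hypothetical, and context lines (`-C5`) are left out for brevity.

```python
import csv
import io
import re


def csv_grep(text, pattern, word=False):
    """Yield the header, then each row with at least one matching field."""
    if word:
        # Rough equivalent of grep -w: require word boundaries around the match.
        pattern = rf"\b(?:{pattern})\b"
    regex = re.compile(pattern)
    rows = csv.reader(io.StringIO(text))
    header = next(rows, None)
    if header is None:
        return  # empty input: nothing to print
    yield header  # the header is always emitted, even if it does not match
    for row in rows:
        # The regex is tested against each field separately,
        # so a match can never span two columns.
        if any(regex.search(field) for field in row):
            yield row


data = "name,city\nann,bobville\nbob,paris\ncarl,rome\n"

for row in csv_grep(data, "bob", word=True):
    print(",".join(row))
```

With plain `grep`, the pattern `n,b` would match the raw line `ann,bobville`; in this sketch it matches nothing, because the comma is a field separator rather than part of any field.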
ermm....
If you have python3, the quickest way is to just run dsv (or symlink it into your $PATH or something).
Shell completion scripts are in `completions/`.
- If you have `pypy3`, you can also run `dsv.pypy` (which is actually the same code) and this will be faster than cpython
  - for small inputs, `pypy3` actually runs slower than `cpython` due to the JIT startup cost
- If you want even more performance, build the rust version (`cargo build --release`). You should get a `./target/release/dsv`
  - you can also download automated builds from: https://github.com/lincheney/dsv/releases/tag/nightly
  - note that there are differences between the rust and python versions
  - to run the python-based commands, you will still need `python3`
    - the other commands should work fine without python however
Note:
- many commands have an additional `-k` flag to restrict their effects to certain columns, e.g. `dsv grep -k COLUMN ...` (why `-k`? because that's what `sort` uses)
- most commands take only input from stdin (i.e. no filename argument)
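As an illustration of what restricting a command to certain columns means, the sketch below limits a per-field regex search to named columns. It is only a model of the idea behind `-k`: the function `grep_columns` and its `keys` parameter are hypothetical, and it assumes columns are selected by header name, which may not be the only form dsv accepts.

```python
import csv
import io
import re


def grep_columns(text, pattern, keys=None):
    """Return the header plus rows where `pattern` matches in the chosen columns."""
    regex = re.compile(pattern)
    rows = csv.reader(io.StringIO(text))
    header = next(rows, None)
    if header is None:
        return []
    if keys is None:
        indices = list(range(len(header)))  # no restriction: search every column
    else:
        indices = [header.index(key) for key in keys]  # ValueError on unknown column
    result = [header]
    for row in rows:
        # Skip indices that a short (ragged) row does not have.
        if any(i < len(row) and regex.search(row[i]) for i in indices):
            result.append(row)
    return result


data = "id,name,note\n1,alice,met bob\n2,bob,\n"

print(grep_columns(data, "bob"))                 # searches all columns
print(grep_columns(data, "bob", keys=["name"]))  # searches only the name column
```

Without `keys`, both rows match because row 1 mentions bob in its `note` field; with `keys=["name"]`, only row 2 remains.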
- `!`: pipe multiple commands together
  - e.g. `dsv ! grep something ! cut -f column ! head -n10 ! tojson`
- `cat`: like coreutils
- `cut`: like coreutils
- `flip`: prints each column on a separate line
- `fromhtml`: convert from html table
- `fromjson`: convert from json
- `frommarkdown`: convert from markdown table
- `grep`: like coreutils (also a bit like https://github.com/BurntSushi/ripgrep)
- `head`: like coreutils
- `join`: like coreutils
- `page`: view the file in a pager (`less`)
- `paste`: like coreutils
- `pipe`: pipe rows through a process
  - e.g. `dsv pipe -- tr [:lower:] [:upper:]`
- `pretty`: pretty prints the file
- `py`: run python on each row
- `py-filter`: filter rows using python
- `py-groupby`: aggregate rows using python
- `replace`: replace text
  - similar to `rg --replace ...` (see https://github.com/BurntSushi/ripgrep)
- `reshape-long`: reshape to long format
- `reshape-wide`: reshape to wide format
- `set-header`: sets the header labels
- `sort`: like coreutils
- `sqlite`: use sql on the data
- `summary`: produce automatic summaries of the data, kind of like `summary()` in R
- `tac`: like coreutils
- `tail`: like coreutils
- `tocsv`: convert to csv
- `tojson`: convert to json
- `tomarkdown`: convert to markdown table
- `totsv`: convert to tsv
- `uniq`: like `sort | uniq ...`
- `xargs`: like `xargs` and GNU `parallel`
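The general idea of "use sql on the data" can be sketched with Python's standard `sqlite3` module: load the rows into an in-memory table, then run a query against it. This is an illustration of the technique only, not how `dsv sqlite` is implemented; the function `query_csv` and the table name `input` are assumptions made for this example.

```python
import csv
import io
import sqlite3


def query_csv(text, sql, table="input"):
    """Load CSV text into an in-memory SQLite table and run `sql` against it."""
    rows = csv.reader(io.StringIO(text))
    header = next(rows, None)
    if header is None:
        return []
    conn = sqlite3.connect(":memory:")
    # Quote identifiers so that header labels with spaces or quotes still work.
    columns = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    # Every row is assumed to have as many fields as the header.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    cursor = conn.execute(sql)
    names = [description[0] for description in cursor.description]
    result = [names] + [list(row) for row in cursor]
    conn.close()
    return result


data = "name,city\nann,rome\nbob,paris\ncarl,rome\n"

for row in query_csv(data, "SELECT city, COUNT(*) AS n FROM input GROUP BY city ORDER BY city"):
    print(row)
```

All values are inserted as text here, so numeric comparisons in a query would need an explicit `CAST`.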
Why is there a rust and a python version? Because I wrote the python code first, then did the rust kinda for fun.
Differences:
- different regex engine/syntax: https://docs.python.org/3/library/re.html vs https://docs.rs/regex/latest/regex/#syntax
- different html engine: https://docs.python.org/3/library/html.html vs https://docs.rs/quick-xml/latest/quick_xml/
- rust pipeline `!` command is actually faster
  - the pipeline command allows you to chain multiple commands together: `dsv ! CMD ARG ! CMD ARG ...`. Theoretically this is faster than `dsv CMD ARG | dsv CMD ARG ...` because it avoids having to re-parse the contents, but in practice with python it is actually slower, because python is single threaded whereas real shell pipes effectively allow multiprocessing. Rust does not have this problem.
- rust may be slower than `pypy3` with heavy python-based command usage
  - this is because it uses `python3`
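The re-parsing argument for the pipeline command can be sketched in Python. In the chained form, rows are parsed once and flow between stages as lists; in the piped form, every stage serializes its output and the next stage parses it again, as separate processes joined by shell pipes would. The stage functions below are simplified stand-ins written for this example, not dsv's own code.

```python
import csv
import io
import re


def parse(text):
    """Parse CSV text into an iterator of rows (lists of strings)."""
    return csv.reader(io.StringIO(text))


def serialize(rows):
    """Turn rows back into CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def grep(rows, pattern):
    """Keep the header and every row with a field matching `pattern`."""
    regex = re.compile(pattern)
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return
    yield header
    for row in rows:
        if any(regex.search(field) for field in row):
            yield row


def cut(rows, column):
    """Keep only the named column."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return
    index = header.index(column)
    yield [header[index]]
    for row in rows:
        yield [row[index]]


def head(rows, n):
    """Keep the header and the first `n` data rows."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return
    yield header
    for _, row in zip(range(n), rows):
        yield row


def run_chained(text):
    # One parse and one serialize, however many stages there are.
    return serialize(head(cut(grep(parse(text), "o"), "city"), 1))


def run_piped(text):
    # Each stage re-parses the previous stage's output, like separate processes.
    text = serialize(grep(parse(text), "o"))
    text = serialize(cut(parse(text), "city"))
    return serialize(head(parse(text), 1))


data = "name,city\nann,rome\nbob,paris\ncarl,oslo\n"
print(run_chained(data) == run_piped(data))
```

Both forms produce the same output; the chained form does less parsing work, but it runs every stage in a single thread, which is the trade-off described above for the python version.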
- if you really need high performance or have some seriously huge data, consider https://github.com/dathere/qsv instead
- I used this for a long time: https://github.com/johnkerl/miller
- `gnuplot` is great for graphs on the terminal, especially the "braille" mode: http://www.gnuplot.info/