A collection of coreutils-ish tools that are CSV-aware, TSV-aware etc.
Think grep, cut, tac etc. but CSV-aware.
dsv is not built to be fast, but intuitive to use
for people who spend a lot of time in the terminal and are used to using shell commands.
If you need something fast, or have big data to process,
or are not familiar with coreutils tools, dsv may not be for you.
I used to use xsv and sometimes csvkit, but I could never remember the different command names and flags. I use the shell a lot and just wanted something with a similar interface. miller is the closest I've found, but eventually I decided to write my own.
`<file.csv dsv grep -C5 -w REGEX` does basically the same thing as `grep -C5 -w REGEX`, except that
it always prints the header and REGEX matches will not span multiple columns.
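
To make those two differences concrete, here is a rough sketch in plain python. It is not dsv's actual code: the name `grep_rows` and its `word` parameter are made up for illustration, and context lines (`-C5`) are left out.

```python
import csv
import io
import re


def grep_rows(text, pattern, word=False):
    """Return the header plus every row that has a cell matching `pattern`."""
    if word:
        # rough equivalent of grep -w: the match must sit on word boundaries
        pattern = rf"\b(?:{pattern})\b"
    regex = re.compile(pattern)
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    # the header is always kept, and each cell is searched on its own,
    # so a match can never span two columns
    return [header] + [row for row in body if any(regex.search(cell) for cell in row)]


data = "name,city\nann,oslo\nbob,rome\n"
for row in grep_rows(data, "rome"):
    print(",".join(row))
```

With plain `grep`, the pattern `n,o` would match the raw line `ann,oslo`. In this sketch it matches nothing, because `ann` and `oslo` are searched separately.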
ermm....
If you have python3, the quickest way is to just run dsv (or symlink it into your $PATH or something).
Also find shell completion scripts in completions/.
- If you have `pypy3`, you can also run `dsv.pypy` (which is actually the same code) and this will be faster than `cpython`
  - for small inputs `pypy3` actually runs slower than `cpython` due to the JIT startup cost
- If you want even more performance, build the rust version (`cargo build --release`). You should get a `./target/release/dsv`
  - note that there are differences between the rust and python versions
  - to run the python-based commands, you will still need `python3`
    - the other commands should work fine without python however
Note:
- many commands have an additional `-k` flag to restrict their effects to certain columns, e.g. `dsv grep -k COLUMN ...` (why `-k`? because that's what `sort` uses)
- most commands take only input from stdin (i.e. no filename argument)
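
As an illustration of what restricting a command to certain columns means, here is a small sketch of a column-restricted search in python. It is only an assumption about the general idea, not dsv's implementation: `grep_columns` and its `keys` parameter are hypothetical names.

```python
import csv
import io
import re


def grep_columns(text, pattern, keys=None):
    """Keep the header and the rows where `pattern` matches in one of `keys`."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    # resolve column names to indices; keys=None means "search every column"
    indices = range(len(header)) if keys is None else [header.index(k) for k in keys]
    regex = re.compile(pattern)
    return [header] + [
        row
        for row in body
        if any(regex.search(row[i]) for i in indices if i < len(row))
    ]


data = "name,city\nrome,oslo\nbob,rome\n"
print(grep_columns(data, "rome"))                 # searches both columns
print(grep_columns(data, "rome", keys=["city"]))  # searches only the city column
```

Without `keys`, both data rows are kept. With `keys=["city"]`, the row whose `name` happens to be `rome` is dropped.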
- `!`: pipe multiple commands together
  - e.g. `dsv ! grep something ! cut -f column ! head -n10 ! tojson`
- `cat`: like coreutils
- `cut`: like coreutils
- `flip`: prints each column on a separate line
- `fromhtml`: convert from html table
- `fromjson`: convert from json
- `frommarkdown`: convert from markdown table
- `grep`: like coreutils (also a bit like https://github.com/BurntSushi/ripgrep)
- `head`: like coreutils
- `join`: like coreutils
- `page`: view the file in a pager (`less`)
- `paste`: like coreutils
- `pipe`: pipe rows through a process
  - e.g. `dsv pipe -- tr [:lower:] [:upper:]`
- `pretty`: pretty prints the file
- `py`: run python on each row
- `py-filter`: filter rows using python
- `py-groupby`: aggregate rows using python
- `replace`: replace text
  - similar to `rg --replace ...` (see https://github.com/BurntSushi/ripgrep)
- `reshape-long`: reshape to long format
- `reshape-wide`: reshape to wide format
- `set-header`: sets the header labels
- `sort`: like coreutils
- `sqlite`: use sql on the data
- `summary`: produce automatic summaries of the data, kind of like `summary()` in R
- `tac`: like coreutils
- `tail`: like coreutils
- `tocsv`: convert to csv
- `tojson`: convert to json
- `tomarkdown`: convert to markdown table
- `totsv`: convert to tsv
- `uniq`: like `sort | uniq ...`
- `xargs`: like `xargs` and GNU `parallel`
Why is there a rust and a python version? Because I wrote the python code first, then did the rust kinda for fun.
Differences:
- different regex engine/syntax: https://docs.python.org/3/library/re.html vs https://docs.rs/regex/latest/regex/#syntax
- different html engine: https://docs.python.org/3/library/html.html vs https://docs.rs/quick-xml/latest/quick_xml/
- rust pipeline `!` command is actually faster
  - the pipeline command allows you to chain multiple commands together: `dsv ! CMD ARG ! CMD ARG ...`. Theoretically this is faster than `dsv CMD ARG | dsv CMD ARG ...` because it avoids having to re-parse the contents, but in practice with python, it is actually slower because python is single threaded whereas real shell pipes effectively allow multiprocessing. Rust does not have this problem.
- rust may be slower than `pypy3` for heavy python-based command usage
  - this is because it uses `python3`
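
The re-parsing point above can be sketched in a few lines of python. This is only a toy model, not how dsv is implemented: `head`, `cut`, `run_pipeline` and `run_shell_style` are made-up helpers. It shows only the parsing cost, not the multiprocessing you get from real shell pipes.

```python
import csv
import io


def parse(text):
    return list(csv.reader(io.StringIO(text)))


def serialise(rows):
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def head(rows, n):
    # keep the header plus the first n data rows
    return rows[: n + 1]


def cut(rows, field):
    # keep only the named column
    i = rows[0].index(field)
    return [[row[i]] for row in rows]


def run_pipeline(text, stages):
    # like `dsv ! CMD ! CMD`: parse once, pass rows between stages, serialise once
    rows = parse(text)
    for stage in stages:
        rows = stage(rows)
    return serialise(rows)


def run_shell_style(text, stages):
    # like `dsv CMD | dsv CMD`: every stage parses and serialises again
    for stage in stages:
        text = serialise(stage(parse(text)))
    return text


data = "a,b\n1,2\n3,4\n5,6\n"
stages = [lambda rows: head(rows, 2), lambda rows: cut(rows, "b")]
print(run_pipeline(data, stages) == run_shell_style(data, stages))
```

Both functions produce the same output. The difference is that `run_shell_style` does one parse and one serialise per stage, whereas `run_pipeline` does one of each in total.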
- if you really need high performance or have some seriously huge data, consider https://github.com/dathere/qsv instead
- I used this for a long time: https://github.com/johnkerl/miller
- `gnuplot` is great for graphs on the terminal, especially the "braille" mode: http://www.gnuplot.info/