catcsv: Concatenate directories of possibly-compressed CSV files
This is a small utility that we use to reassemble many small CSV files into much larger ones. In our case, the small CSV files are generated by highly parallel Pachyderm pipelines doing map/reduce-style operations.
    catcsv - Combine many CSV files into one

    Usage:
      catcsv <input-file-or-dir>...
      catcsv (--help | --version)

    Options:
      --help     Show this screen.
      --version  Show version.

    Input files must have the extension *.csv or *.csv.sz. The latter are assumed
    to be in Google's "snappy framed" format: https://github.com/google/snappy

    If passed a directory, this will recurse over all files in that directory.
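To make the behavior described in the usage text more concrete, here is a minimal sketch of the two steps involved: walking the inputs recursively and concatenating what is found. It is not catcsv's actual implementation. The function names (`is_csv_input`, `collect_inputs`, `concat_csv`) are hypothetical, and the sketch assumes that every file starts with the same one-line header, that only the first header is kept, and that entries are visited in sorted order. It uses only the standard library, so `*.csv.sz` files are discovered but rejected at concatenation time, because decoding them would need a snappy decoder crate.

```rust
use std::fs;
use std::io::{self, BufRead, BufReader, Write};
use std::path::{Path, PathBuf};

/// Returns true for the two extensions named in the usage text.
fn is_csv_input(path: &Path) -> bool {
    match path.file_name().and_then(|n| n.to_str()) {
        Some(name) => name.ends_with(".csv") || name.ends_with(".csv.sz"),
        None => false,
    }
}

/// Recursively collect input files under `root` (a file or a directory).
/// Files with other extensions are silently skipped in this sketch.
fn collect_inputs(root: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    if root.is_dir() {
        let mut entries: Vec<PathBuf> = fs::read_dir(root)?
            .map(|entry| entry.map(|e| e.path()))
            .collect::<io::Result<Vec<_>>>()?;
        entries.sort(); // deterministic order: an assumption of this sketch
        for entry in entries {
            collect_inputs(&entry, out)?;
        }
    } else if is_csv_input(root) {
        out.push(root.to_path_buf());
    }
    Ok(())
}

/// Concatenate uncompressed CSV files, writing the header line only once.
/// Assumption: the first line of each file is its header. A real CSV parser
/// would be needed for headers with quoted fields that contain newlines.
fn concat_csv<W: Write>(paths: &[PathBuf], out: &mut W) -> io::Result<()> {
    let mut header: Option<String> = None;
    for path in paths {
        if path.to_string_lossy().ends_with(".sz") {
            // Snappy decoding is outside the standard library.
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                format!("{}: decompression is not part of this sketch", path.display()),
            ));
        }
        let mut lines = BufReader::new(fs::File::open(path)?).lines();
        let first = match lines.next() {
            Some(line) => line?,
            None => continue, // empty file: nothing to copy
        };
        if header.is_none() {
            writeln!(out, "{}", first)?;
            header = Some(first);
        } else if header.as_deref() != Some(first.as_str()) {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("{}: header does not match the first file", path.display()),
            ));
        }
        // Copy the remaining rows unchanged.
        for line in lines {
            writeln!(out, "{}", line?)?;
        }
    }
    Ok(())
}
```

Calling `collect_inputs` once per command-line argument and then passing the resulting paths to `concat_csv` with a locked `std::io::stdout()` would approximate the overall shape of the command, under the assumptions stated above.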
If you'd like to add support for other common compression formats,
we'll happily accept PRs that either depend on pure Rust crates or
include C code in the crate but still cross-compile easily with musl.
If you're interested in this utility, you might also be interested in:
- BurntSushi's excellent xsv utility, which features a wide variety of
subcommands for working with CSV files. Among these is a powerful
xsv cat command, which has many options that
catcsv doesn't (but which doesn't do directory walking or automatic decompression as far as I know).
- Faraday's scrubcsv utility, which attempts to normalize non-standard CSV files.