Concatenate CSV files, directories of CSV files, and snappy-compressed CSV files.

catcsv: Concatenate directories of possibly-compressed CSV files

This is a small utility that we use to reassemble many small CSV files into much larger ones. In our case, the small CSV files are generated by highly-parallel Pachyderm pipelines doing map/reduce-style operations.

Usage:

catcsv - Combine many CSV files into one

Usage:
  catcsv <input-file-or-dir>...
  catcsv (--help | --version)

Options:
  --help        Show this screen.
  --version     Show version.

Input files must have the extension *.csv or *.csv.sz.  The latter are assumed
to be in Google's "snappy framed" format: https://github.com/google/snappy

If passed a directory, this will recurse over all files in that directory.
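
The input rules above (accept *.csv and *.csv.sz, recurse into directories) can be sketched with only the Rust standard library. This is an illustration and not catcsv's actual source: `InputKind`, `classify`, and `collect_inputs` are hypothetical names. The sketch also assumes two things the help text does not state: files with other extensions found inside a directory are silently skipped, and entries are visited in sorted order.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// How an input file would need to be read (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
enum InputKind {
    /// A plain `*.csv` file.
    PlainCsv,
    /// A `*.csv.sz` file, assumed to be in the "snappy framed" format.
    SnappyCsv,
}

/// Classify a path by its file name, returning `None` for anything
/// that is neither `*.csv` nor `*.csv.sz`.
fn classify(path: &Path) -> Option<InputKind> {
    let name = path.file_name()?.to_str()?;
    if name.ends_with(".csv.sz") {
        Some(InputKind::SnappyCsv)
    } else if name.ends_with(".csv") {
        Some(InputKind::PlainCsv)
    } else {
        None
    }
}

/// Recursively collect every recognized input under `root`.
/// Assumption: unrecognized files are skipped and entries are sorted,
/// which may differ from what catcsv itself does.
fn collect_inputs(root: &Path, out: &mut Vec<(PathBuf, InputKind)>) -> io::Result<()> {
    if root.is_dir() {
        let mut entries: Vec<PathBuf> = fs::read_dir(root)?
            .map(|entry| entry.map(|e| e.path()))
            .collect::<io::Result<_>>()?;
        entries.sort();
        for entry in entries {
            collect_inputs(&entry, out)?;
        }
    } else if let Some(kind) = classify(root) {
        out.push((root.to_path_buf(), kind));
    }
    Ok(())
}
```

A caller would run `collect_inputs` once per command-line argument and then open each collected path according to its `InputKind`. Decompressing the snappy-framed files is left out here because it needs a crate outside the standard library.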

Wish list

If you'd like to add support for other common compression formats, such as *.gz, we'll happily accept PRs that depend on either pure Rust crates, or which include C code in the crate but still cross-compile easily with musl.

Related utilities

If you're interested in this utility, you might also be interested in:

  • BurntSushi's excellent xsv utility, which features a wide variety of subcommands for working with CSV files. Among these is a powerful xsv cat command, which has many options that catcsv doesn't (but which, as far as I know, doesn't do directory walking or automatic decompression).
  • Faraday's scrubcsv utility, which attempts to normalize non-standard CSV files.