sen-ltd/csv-sort

csv-sort

A small PHP CLI that sorts CSV files correctly:

  • quoted commas survive (unlike sort -t,)
  • keys can be typed as string, int, float, or date
  • multiple keys, with per-column reverse (--key name,-age)
  • --unique dedupes by the sort key
  • tabs, custom delimiters, headerless CSVs, --out file

Built on PHP 8.2 stdlib (fgetcsv, fputcsv, usort). Zero runtime deps.

Why not sort -t,?

# This is BROKEN on quoted commas:
$ cat users.csv
name,city
alice,"Tokyo, Japan"
bob,Osaka
$ sort -t, -k2 users.csv
alice,"Tokyo, Japan"    # <- sort split this row at the quoted comma, field count wrong
bob,Osaka
name,city               # <- header row sorted as if it were data

# This works, and is type-aware:
$ csv-sort users.csv --key city
name,city
bob,Osaka
alice,"Tokyo, Japan"

And sort compares text by default, so numbers come out in lexicographic order:

$ printf 'n\n2\n10\n1\n' | sort -k1
1
10        # <- lexicographic, "10" < "2"
2
n
$ printf 'n\n2\n10\n1\n' | csv-sort - --key n --type n=int    # type-aware

Install

Docker (recommended, zero install)

docker build -t csv-sort .
docker run --rm -v "$PWD:/work" csv-sort /work/users.csv --key age --type age=int

From source

composer install
./bin/csv-sort --help

Or run it on a bare PHP 8.2+ install with no composer install at all — the CLI registers its own PSR-4 autoloader via src/Bootstrap.php when it can't find vendor/autoload.php.
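As a rough illustration of that fallback, the sketch below prefers vendor/autoload.php and otherwise registers a minimal PSR-4 loader with spl_autoload_register. The `CsvSort\` namespace prefix, the `psr4Path` helper, and the directory layout are assumptions made for the example; the real src/Bootstrap.php may be organised differently.

```php
<?php
declare(strict_types=1);

/**
 * Map a class under $prefix to a file under $baseDir (PSR-4 style).
 * Returns null when the class is outside the prefix.
 * Hypothetical helper, not taken from the project.
 */
function psr4Path(string $class, string $prefix, string $baseDir): ?string
{
    if (!str_starts_with($class, $prefix)) {
        return null;
    }
    $relative = substr($class, strlen($prefix));
    return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
}

$vendorAutoload = __DIR__ . '/vendor/autoload.php';
if (is_file($vendorAutoload)) {
    // Composer is installed: use its autoloader.
    require $vendorAutoload;
} else {
    // Bare PHP install: load classes straight from src/.
    // 'CsvSort\\' is an assumed namespace prefix.
    spl_autoload_register(static function (string $class): void {
        $path = psr4Path($class, 'CsvSort\\', __DIR__ . '/src');
        if ($path !== null && is_file($path)) {
            require $path;
        }
    });
}
```

With this layout, a class named `CsvSort\Cli\Options` would be looked up at src/Cli/Options.php.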

Usage

csv-sort <file.csv> [--key COL,...] [options]

  --key COL[,COL...]    sort keys, in order. Prefix a column with `-` to
                        reverse that column only (e.g. --key name,-age).
                        Columns may be named or 1-indexed numbers.
  --type NAME=T,...     declare column types: string|int|float|date
                        (default string). Example: --type age=int,joined=date
  --order asc|desc      global order (default asc).
  --unique              deduplicate rows by the sort key (keeps the first).
  --tabs                shorthand for --delim '\t'
  --delim CHAR          field delimiter (single character, default ',')
  --header              CSV has a header row (default)
  --no-header           CSV has no header — columns become col0, col1, ...
  --out FILE            write sorted CSV to FILE instead of stdout
  -h, --help            print help
  -V, --version         print version
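
One way the --key syntax could be interpreted is sketched below: split on commas, treat a leading `-` as "reverse this column", and resolve each column by header name or 1-indexed number. The function names are hypothetical, and giving names precedence over numbers is an assumption. Types and the global --order flag are left out to keep the example short.

```php
<?php
declare(strict_types=1);

/** Parse "name,-age" into a list of ['column' => ..., 'reverse' => ...]. */
function parseKeySpec(string $spec): array
{
    $keys = [];
    foreach (explode(',', $spec) as $part) {
        $part = trim($part);
        $reverse = str_starts_with($part, '-');
        $column = $reverse ? substr($part, 1) : $part;
        if ($column === '') {
            throw new InvalidArgumentException("empty column in --key: $spec");
        }
        $keys[] = ['column' => $column, 'reverse' => $reverse];
    }
    return $keys;
}

/** Resolve a header name or a 1-indexed number to a 0-based index. */
function resolveColumn(string $column, array $header): int
{
    $index = array_search($column, $header, true);
    if ($index !== false) {
        return $index; // assumption: a matching name wins over a number
    }
    if (ctype_digit($column) && (int) $column >= 1 && (int) $column <= count($header)) {
        return (int) $column - 1;
    }
    throw new InvalidArgumentException("column not found: $column");
}

/** Compare two rows key by key (string comparison only in this sketch). */
function compareByKeys(array $a, array $b, array $keys, array $header): int
{
    foreach ($keys as $key) {
        $i = resolveColumn($key['column'], $header);
        $cmp = strcmp($a[$i] ?? '', $b[$i] ?? '');
        if ($cmp !== 0) {
            return $key['reverse'] ? -$cmp : $cmp;
        }
    }
    return 0; // all keys equal
}

$header = ['name', 'age'];
$rows = [['bob', '30'], ['alice', '25'], ['alice', '41']];
$keys = parseKeySpec('name,-age');
usort($rows, fn(array $a, array $b): int => compareByKeys($a, $b, $keys, $header));
// $rows is now alice/41, alice/25, bob/30
```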

Exit codes

code  meaning
0     success
1     bad CSV, column not found, I/O error
2     bad args

Examples

# Sort users by age numerically
csv-sort users.csv --key age --type age=int

# Most recent signup first
csv-sort users.csv --key joined --type joined=date --order desc

# Multi-key: name ascending, age descending within each name
csv-sort users.csv --key name,-age --type age=int

# Dedupe by city, write to file
csv-sort users.csv --key city --unique --out by-city.csv

# Headerless TSV: sort by the 2nd column, which is named col1
csv-sort data.tsv --tabs --no-header --key 2 --type col1=float

Design notes

fgetcsv handles the quoted-comma case for you, plus newlines inside quoted cells, plus ""-escaped literal quotes. A round-trip through fgetcsv → usort → fputcsv preserves all of that without custom parsing.
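
A minimal sketch of that round-trip, assuming a header row, a single string key, and a comma delimiter. The `sortCsvStream` function is a hypothetical name for illustration and is not the tool's actual code.

```php
<?php
declare(strict_types=1);

/**
 * Read CSV from $in, sort the data rows by one named column, write CSV to $out.
 * Hypothetical helper: header row assumed, string comparison only.
 */
function sortCsvStream($in, $out, string $column): void
{
    // An empty escape string turns off PHP's backslash escaping,
    // leaving only standard ""-style quoting.
    $header = fgetcsv($in, null, ',', '"', '');
    if ($header === false) {
        return; // empty input
    }
    $index = array_search($column, $header, true);
    if ($index === false) {
        throw new InvalidArgumentException("column not found: $column");
    }

    $rows = [];
    while (($row = fgetcsv($in, null, ',', '"', '')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv reports a blank line as [null]
        }
        $rows[] = $row;
    }

    usort($rows, fn(array $a, array $b): int => strcmp($a[$index] ?? '', $b[$index] ?? ''));

    fputcsv($out, $header, ',', '"', '');
    foreach ($rows as $row) {
        fputcsv($out, $row, ',', '"', ''); // re-quotes fields that need it
    }
}

// Usage with in-memory streams, mirroring the users.csv example above.
$in = fopen('php://memory', 'w+');
fwrite($in, "name,city\nalice,\"Tokyo, Japan\"\nbob,Osaka\n");
rewind($in);
$out = fopen('php://memory', 'w+');
sortCsvStream($in, $out, 'city');
rewind($out);
$result = stream_get_contents($out);
echo $result;
```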

The comparator is type-aware. When you declare a column as int, float, or date, values that can't be parsed sort after the parseable rows instead of silently coercing to 0. A dirty row therefore surfaces instead of quietly landing at the top of an ascending sort.
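
The "unparseable sorts last" rule could look roughly like the sketch below. `parseTyped` and `compareTyped` are hypothetical names, and the exact parsing rules (a digits-only regex for int, is_numeric for float, strtotime for date) are assumptions rather than the tool's confirmed behaviour.

```php
<?php
declare(strict_types=1);

/** Parse $raw as $type; null means "could not parse". Hypothetical helper. */
function parseTyped(string $raw, string $type): int|float|string|null
{
    $trimmed = trim($raw);
    switch ($type) {
        case 'int':
            return preg_match('/^[+-]?\d+$/', $trimmed) === 1 ? (int) $trimmed : null;
        case 'float':
            return is_numeric($trimmed) ? (float) $trimmed : null;
        case 'date':
            $ts = strtotime($trimmed);
            return $ts === false ? null : $ts;
        default:
            return $raw; // string: compared byte-wise, never unparseable
    }
}

/** Compare two raw cell values under a declared column type. */
function compareTyped(string $a, string $b, string $type): int
{
    $pa = parseTyped($a, $type);
    $pb = parseTyped($b, $type);
    if ($pa === null || $pb === null) {
        // Unparseable values go after parseable ones; two unparseable values tie.
        return ($pa === null) <=> ($pb === null);
    }
    return is_string($pa) ? strcmp($pa, $pb) : $pa <=> $pb;
}

$ages = ['30', 'n/a', '4', '25'];
usort($ages, fn(string $a, string $b): int => compareTyped($a, $b, 'int'));
echo implode(',', $ages), "\n"; // 4,25,30,n/a
```

Returning null for "could not parse" keeps a genuine 0 distinct from a dirty cell, which is the distinction the paragraph above depends on.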

PHP's usort has been stable since 8.0, which the --unique step relies on. Sorting already places rows with equal keys next to each other, so --unique only compares each row against the previous kept row instead of maintaining a seen-set. Stability is what makes "keeps the first" mean the first matching row in input order.
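
A small sketch of that adjacent-duplicate pass, assuming the rows are already sorted by the same key used for deduplication. `uniqueAdjacent` is a hypothetical name, not the tool's API.

```php
<?php
declare(strict_types=1);

/**
 * Drop every row whose key equals the key of the previously kept row.
 * Only correct when $rows is already sorted by that key.
 */
function uniqueAdjacent(array $rows, callable $keyOf): array
{
    $kept = [];
    $hasPrev = false;
    $prevKey = null;
    foreach ($rows as $row) {
        $key = $keyOf($row);
        if ($hasPrev && $key === $prevKey) {
            continue; // same key as the last kept row: duplicate
        }
        $kept[] = $row;
        $prevKey = $key;
        $hasPrev = true;
    }
    return $kept;
}

$rows = [
    ['bob', 'Osaka'],
    ['alice', 'Tokyo'],
    ['carol', 'Osaka'],
];
// Stable sort by city: bob stays ahead of carol, as in the input.
usort($rows, fn(array $a, array $b): int => strcmp($a[1], $b[1]));
$unique = uniqueAdjacent($rows, fn(array $r): array => [$r[1]]);
// $unique keeps bob/Osaka and alice/Tokyo; carol/Osaka is dropped.
```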

Scope (what this is not)

  • Not a streaming sorter. The whole file lives in memory during usort. For multi-GB CSVs you want a merge-sort with temp files, which is a different tool.
  • No Unicode collation. string comparisons are pure strcmp on UTF-8 bytes.
  • date accepts whatever strtotime accepts. Custom format strings are out of scope.
  • Only single-character delimiters.

License

MIT. See LICENSE.
