The following is a list of text-based file formats and command line tools for manipulating each.
- DSV
- XML, HTML
- JSON
- YAML, TOML
- INI
- Log files
- Configuration files
- Bonus round: CLIs for single-file databases
- License
- Disclosure
Delimiter-separated values, including CSV, TSV, etc.
Awk is a POSIX-standard command line tool and programming language for processing DSV data. If you use Linux, macOS or a BSD, you almost certainly have it installed. See below for Windows.
- If you already know how to program, the nawk man page is a great way to learn Awk quickly. What you learn from it will apply to other implementations on different platforms. Read it first if you feel overwhelmed by the sheer size of the GNU Awk manual.
- Awk.info archive — an extensive resource on Awk.
- AWK Vs NAWK Vs GAWK — a comparison of implementations' features.
- busybox-w32 includes a full implementation of POSIX Awk and other tools like `sed` in a single Windows executable.
- GNU Awk 4 binaries for Windows by EZWinPorts.
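For readers who know Python better than Awk, here is a rough sketch of Awk's basic model for DSV data: split each line on a delimiter and act on numbered fields. `awk_sum` is a hypothetical helper that approximates `awk -F, '{ sum += $N } END { print sum }'`. Like Awk with `-F,`, it splits naively, so quoted CSV fields that contain commas are not handled.

```python
import io
from typing import Iterable


def awk_sum(lines: Iterable[str], field: int, sep: str = ",") -> float:
    """Sum field number `field` (1-based, like Awk's $N) over all lines.

    A naive split on `sep`, as with `awk -F,`: quoted fields that contain
    the separator are not treated specially.
    """
    total = 0.0
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if field > len(fields):
            continue  # a missing field counts as 0 in Awk
        try:
            total += float(fields[field - 1])
        except ValueError:
            # Simplification: Awk would use any numeric prefix ("12abc" -> 12);
            # this sketch counts the whole field as 0 instead.
            pass
    return total


data = io.StringIO("apples,3,1.5\npears,5,2.0\n")
print(awk_sum(data, 3))  # 3.5
```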
Name | Description |
---|---|
comm | Select lines common to two sorted files or those contained in only one of them. (Manual: `man 1 comm` on your system, GNU, FreeBSD.) |
cut | Select portions of each line in one or several files. Can work with delimiter-separated fields. (Manual: `man 1 cut`, GNU, FreeBSD.) |
grep | Select lines from one or several files. (Manual: `man 1 grep`, GNU, FreeBSD.) |
join | Join the lines of two files on a common field. Both files must be sorted on that field. (Manual: `man 1 join`, GNU, FreeBSD.) |
paste | Merge corresponding lines of several files into one line, or combine consecutive lines of a single file (`paste -s`). (Manual: `man 1 paste`, GNU, FreeBSD.) |
sort | Sort lines by key fields. (Manual: `man 1 sort`, GNU, FreeBSD.) |
uniq | Find or remove repeated adjacent lines, which is why it is usually run on sorted input. (Manual: `man 1 uniq`, GNU, FreeBSD.) |
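`comm` requires both inputs to be sorted because that lets it compare them in a single merge pass. The following is a Python sketch of that pass; `comm_columns` is a hypothetical name, it returns the three columns `comm` prints (lines only in the first input, only in the second, and in both) as lists rather than tab-indented text, and it compares with Python's string ordering instead of the locale collation that `comm` may use.

```python
from typing import List, Tuple


def comm_columns(a: List[str], b: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Merge two sorted lists the way comm(1) does.

    Returns (only_in_a, only_in_b, in_both). Both inputs must already be
    sorted in the same order; duplicate lines are matched one-for-one.
    """
    only_a: List[str] = []
    only_b: List[str] = []
    both: List[str] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            only_a.append(a[i])
            i += 1
        elif a[i] > b[j]:
            only_b.append(b[j])
            j += 1
        else:
            both.append(a[i])
            i += 1
            j += 1
    # Whatever is left over appears in only one of the inputs.
    only_a.extend(a[i:])
    only_b.extend(b[j:])
    return only_a, only_b, both


print(comm_columns(["apple", "banana", "cherry"], ["banana", "date"]))
# (['apple', 'cherry'], ['date'], ['banana'])
```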
Name and link | Description |
---|---|
csv2md | Convert CSV to Markdown tables. |
csv2html | Convert CSV to HTML tables. |
csvfaker | Generate CSV files with fake data. Supports different types of fake data in different locales: names, cities, jobs, email addresses, and others. |
csvfix | A multitool. Compare, filter, normalize, split, and validate CSV files. Reorder, remove, split, and merge fields. Convert data between fixed-width, multi-line, XML, and DSV formats. Generate SQL statements. Documentation. |
csvtk | Search, sample, cut, join, transpose, and sort CSV/TSV files. Rename columns. Replace fields and generate new fields from existing fields. Plot data as vector or raster histograms and box, line, and scatter plots. Convert CSV to Markdown. Convert XLSX to CSV. Split XLSX sheets. |
GNU datamash | Perform statistical operations on text input. |
jp (sgreben) | Plot data. See the JSON section. |
Miller | `sed`, `awk`, `cut`, `join`, and `sort` for name-indexed data such as CSV and tabular JSON. |
pawk | Process text with Awk-like patterns, but Python code. |
rows | A Python library with a CLI. Convert between a number of file formats for tabular data: CSV, XLS, XLSX, ODS, and others. Query the data (via SQLite). Combine tables. Generate schemas. |
tab | A non-Turing-complete statically typed programming language for data processing. An alternative to Awk. |
eBay's TSV utilities | Filter, summarize, join, and perform other operations on TSV files. Written in D. |
VisiData | Interactively explore data in TSV, CSV, XLS, XLSX, HDF5, JSON, and other formats. Introduction. |
xsv | Index, slice, analyze, split, and join CSV files. |
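A typical GNU datamash job groups rows by one column and computes a statistic over another, roughly what `datamash -s -g 1 sum 2` does with tab-separated input. The sketch below shows that group-by-and-sum operation with Python's `csv` module; it illustrates the operation only, not datamash's implementation, and `group_sum` is a hypothetical name.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO


def group_sum(stream: TextIO, key_col: int, value_col: int, delimiter: str = "\t") -> Dict[str, float]:
    """Sum `value_col` for each distinct `key_col` (both 1-based, as in datamash)."""
    totals: Dict[str, float] = defaultdict(float)
    # Unlike a naive split, csv.reader understands quoted fields.
    for row in csv.reader(stream, delimiter=delimiter):
        if not row:
            continue  # csv.reader yields [] for blank lines
        totals[row[key_col - 1]] += float(row[value_col - 1])
    # Return the groups in sorted key order.
    return dict(sorted(totals.items()))


tsv = io.StringIO("b\t2\na\t1\nb\t3\n")
print(group_sum(tsv, 1, 2))  # {'a': 1.0, 'b': 5.0}
```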
See the Grand Comparison Table of SQL-based Tools. It covers
- AlaSQL CLI
- csvsql
- fsql
- q
- rows
- Sqawk (dbohdan)
- sqawk (tjunier)
- Squawk
- termsql
- trdsql
- textql
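Many of the tools in that comparison share one idea: load delimited text into a relational engine (often SQLite) and run SQL against it. Here is a minimal sketch of that idea using Python's built-in `csv` and `sqlite3` modules. The table name `t`, treating the first row as a header, and storing every value as text are assumptions of this sketch, not the behavior of any particular tool.

```python
import csv
import io
import sqlite3
from typing import List, TextIO, Tuple


def query_csv(stream: TextIO, sql: str) -> List[Tuple]:
    """Load CSV (first row = header) into an in-memory SQLite table `t` and run `sql`."""
    rows = csv.reader(stream)
    header = next(rows)
    conn = sqlite3.connect(":memory:")
    try:
        # Quote column names so headers with spaces or quotes still work.
        cols = ", ".join('"' + h.replace('"', '""') + '"' for h in header)
        conn.execute(f"CREATE TABLE t ({cols})")
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f"INSERT INTO t VALUES ({placeholders})", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


data = io.StringIO("name,qty\napples,3\npears,5\n")
# Values are stored as strings, so cast before comparing numerically.
print(query_csv(data, "SELECT name FROM t WHERE CAST(qty AS INTEGER) > 3"))  # [('pears',)]
```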
XML, HTML
Name and link | Description |
---|---|
xml-to-json-fast | Convert XML to JSON. Can handle very large XML files. |
html-xml-utils | A set of simple utilities from the W3C (like `hxcopy`, `hxpipe`, `hxunent`, and `hxselect`) for manipulating HTML and XML files. Written in C, quite old-fashioned, but still relevant and maintained. |
pup | Query HTML pages with CSS selectors. Static binaries available for releases. Inspired by jq. |
Saxon | Query XML and HTML data with XPath. Documentation. |
Temme | Query HTML with CSS-like selectors to extract JSON. Temme extends CSS selectors with value capture patterns. |
tidy-html5 | Validate, fix, and reformat HTML(5), XHTML, and XML documents. Convert HTML to XHTML. |
tq | Query HTML with CSS selectors. |
Xidel | Query or modify XML and HTML pages with XPath, XQuery 3, and CSS selectors. |
xml2 | Convert XML and HTML to and from flat, greppable lists of "path=value" statements. Source code mirror. |
XMLLint | Query (with XPath), validate, and reformat XML documents. |
XMLStarlet | Query, modify, and validate XML documents. |
xq | jq wrapper for XML documents. |
xsltproc | Transform XML documents using XSLT and EXSLT. |
See also: Grep and Sed Equivalent for XML Command Line Processing on StackOverflow.
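Several tools in the table (Saxon, Xidel, XMLLint, XMLStarlet) select nodes with XPath. Python's standard `xml.etree.ElementTree` supports only a limited XPath subset, but it is enough to sketch what such a query does. The document and the `titles_by_lang` helper below are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

DOC = """<catalog>
  <book lang="en"><title>Dune</title></book>
  <book lang="fr"><title>Vendredi</title></book>
</catalog>"""


def titles_by_lang(xml_text: str, lang: str) -> List[str]:
    """Return the <title> text of every <book> whose lang attribute equals `lang`.

    In full XPath this would be //book[@lang='en']/title; ElementTree's
    findall() accepts the equivalent relative path used here.
    """
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall(f".//book[@lang='{lang}']/title")]


print(titles_by_lang(DOC, "en"))  # ['Dune']
```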
JSON
Name and link | Description |
---|---|
fx | Run arbitrary JavaScript on JSON input. Standalone binaries available. |
gron | Convert JSON to and from flat, greppable lists of "path=value" statements. |
jid | Explore JSON interactively with filtering queries like jq. |
jj | Query and modify values in JSON or JSON lines with a key path. |
jl | Query and manipulate JSON using a tiny functional language. |
jo | Create JSON objects from the shell. |
jp (jmespath) | A command-line interface to JMESPath, a query language for JSON. |
jp (sgreben) | Plot JSON and CSV data in the terminal. Supports different kinds of plots: bar charts, line charts, scatter plots, histograms, and heatmaps. |
jplot | Plot real-time JSON data in the terminal (works with terminals supporting graphic rendering). |
jq | Create and manipulate JSON with a functional (as in "functional programming") DSL. Can convert JSON to other formats. |
jshon | Create and manipulate JSON using getopt-style command-line options. |
json2 | Convert JSON to and from flat, greppable lists of "path=value" statements. Modeled after xml2. |
jsonaxe | Create and manipulate JSON with a Python-based DSL. Inspired by jq. |
json | Run arbitrary JavaScript on JSON input. |
json-table | Convert nested JSON into CSV or TSV for processing in the shell. |
json.tool (Python 3 docs) | Validate and pretty-print JSON. This module is part of the standard library of Python 2/3 and is likely to be available wherever Python is installed. |
jsonwatch | Track changes in JSON data from the command line. Works like `watch -d`. |
lobar | Explore JSON interactively or process it in batch with a wrapper for `lodash.chain()`. An alternative to jq with a JavaScript syntax. |
rq | Create and manipulate JSON with a DSL inspired by Rust, C and JavaScript. Similar to jq. Supports JSON, YAML and TOML as well as binary formats like Apache Avro and MessagePack. |
validjson | Validate or pretty-print JSON. |
VisiData | Explore data interactively. See the DSV/Other tools section. |
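gron, json2, and (for XML) xml2 make nested data greppable by printing one `path=value` line per leaf value. The sketch below flattens parsed JSON the same way. Its path syntax is a simplified one of its own, not the exact output of any of these tools, and `flatten` is a hypothetical name.

```python
import json
from typing import Any, Iterator


def flatten(value: Any, path: str = "json") -> Iterator[str]:
    """Yield one 'path=value' line per leaf of a parsed JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Leaves are re-encoded as JSON so strings stay quoted and null stays null.
        yield f"{path}={json.dumps(value)}"


doc = json.loads('{"name": "jq", "tags": ["cli", "json"], "stars": null}')
for line in flatten(doc):
    print(line)
# json.name="jq"
# json.tags[0]="cli"
# json.tags[1]="json"
# json.stars=null
```

Empty objects and arrays produce no lines here, and keys containing dots are not escaped, so unlike gron's output this flattening cannot always be turned back into the original JSON.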
YAML, TOML
With a format converter like Remarshal (below), you can use JSON tools to process YAML and TOML, but make sure you do not lose data in the conversion.
Name and link | Description |
---|---|
Remarshal | Convert between YAML, TOML, and JSON. Validate or pretty-print each of the three formats. |
rq | See the JSON section. |
shyaml | Query YAML. Can output null-terminated strings for use in shell scripts. |
validtoml | Validate TOML. |
validyaml | Validate or pretty-print YAML. |
yq (kislyuk) | jq wrapper for YAML. |
yq (mikefarah) | Query, modify, and merge YAML. Convert to and from JSON. |
INI
Name and link | Platform | License | Description |
---|---|---|---|
confget | Linux, FreeBSD | Two-clause BSD | Retrieve properties and sections as shell script commands to set the corresponding variables. Retrieve properties' values as plain text. Check for existence of properties. List sections. Find values that match a pattern. Read-only. |
crudini | Any with Python 2.x | GNU GPLv2 | Retrieve properties and sections as INI fragments or shell script commands to set the corresponding variables. Retrieve properties' values as plain text. Set properties. Remove properties and sections. Create empty sections. Merge INI files. Changes files in place. |
IniFile (DOS version) | Windows (x86, x86-64), MS-DOS | Closed-source freeware | Retrieve properties and sections as batch file commands to set the corresponding variables. Set properties. Remove properties and sections. Changes files in place. |
initool | Linux, FreeBSD, Windows | MIT | Retrieve properties and sections as INI fragments. Retrieve properties' values as plain text. Set properties. Check for existence of properties and sections. Remove properties and sections. Outputs the updated INI file. |
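confget and crudini can print INI properties as shell commands that set variables. Here is a sketch of that output mode using Python's `configparser` and `shlex.quote`. The `SECTION_KEY` naming scheme is this sketch's own assumption, not the format either tool produces, and `configparser`'s INI dialect (for example, it lower-cases keys) may differ from theirs.

```python
import configparser
import shlex
from typing import List


def ini_to_shell(text: str, section: str) -> List[str]:
    """Return NAME=value shell assignments for every property in `section`.

    Names are SECTION_KEY in upper case; any character that is not an ASCII
    letter, digit, or underscore becomes an underscore. Values are quoted
    with shlex.quote so the lines can be eval'ed in a POSIX shell.
    """
    # interpolation=None keeps values like "100%" literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    lines = []
    # Note: items() also includes keys inherited from a [DEFAULT] section.
    for key, value in parser.items(section):
        raw = f"{section}_{key}".upper()
        name = "".join(c if (c.isascii() and c.isalnum()) or c == "_" else "_" for c in raw)
        if name[0].isdigit():
            name = "_" + name  # shell variable names cannot start with a digit
        lines.append(f"{name}={shlex.quote(value)}")
    return lines


ini = "[server]\nhost = example.com\nmotd = it's up\n"
print("\n".join(ini_to_shell(ini, "server")))
# SERVER_HOST=example.com
# SERVER_MOTD='it'"'"'s up'
```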
Log files
Name and link | Description |
---|---|
Squawk | Query Apache and Nginx log files. See the SQL-based tool comparison. |
lnav | Query and watch log files. Has batch and interactive modes. Supported formats include the Common Log Format, CUPS page_log, syslog, strace, and generic timestamped messages. Can perform SQL queries. |
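Both tools above understand web server logs in the Common Log Format, where each line reads `host ident authuser [date] "request" status bytes`. As a rough sketch of what parsing one such line involves (not how Squawk or lnav implement it), here is a regular expression with named groups; the pattern and `parse_clf` are this example's own.

```python
import re
from typing import Dict, Optional

# host ident authuser [date] "request" status bytes ("-" when no body was sent)
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of a Common Log Format line, or None if it doesn't match."""
    match = CLF.match(line)
    return match.groupdict() if match else None


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(parse_clf(sample)["status"])  # 200
```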
Configuration files
Name and link | Description |
---|---|
Augeas | Query and modify a number of file formats. Not all of the formats are equally well supported by Augeas and for some only a limited subset of all valid files can be parsed. |
Elektra | Query and modify configuration files. Shares Augeas' limitations when it comes to application-specific configuration files (it uses the same lenses), but has better support for generic formats such as JSON and INI. |
Bonus round: CLIs for single-file databases
Name and link | Description | File format |
---|---|---|
Firebird | Firebird is a FOSS database that can be used from a single file, like SQLite. "isql is a program that allows the user to issue arbitrary SQL commands". | Binary |
GNU Recutils | "[A] set of tools and libraries to access human-editable, plain text databases called recfiles." | Text-based, roughly "key: value" |
SDB | "[A] simple string key/value database based on djb's cdb disk storage and supports JSON and arrays introspection." | Binary |
sqlite3(1) | "[A] simple command-line utility [...] that allows the user to manually enter and execute SQL statements against an SQLite database." | Binary |
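The recfiles used by GNU Recutils are plain text, roughly "key: value" lines grouped into records separated by blank lines. To show how simple that core structure is, here is a sketch that parses only that basic subset; `parse_records` is hypothetical, and it does not handle recutils features such as continuation lines, while record descriptors (`%rec:` and similar) simply come back as ordinary records.

```python
from typing import Dict, List


def parse_records(text: str) -> List[Dict[str, List[str]]]:
    """Split 'key: value' text into records separated by blank lines."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line
        if not line.strip():
            # A blank line ends the current record, if there is one.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        # A field name may repeat within a record, so keep every value.
        current.setdefault(key.strip(), []).append(value.strip())
    if current:
        records.append(current)
    return records


books = "# books\nTitle: Dune\nAuthor: Frank Herbert\n\nTitle: Emma\nTag: classic\nTag: novel\n"
for record in parse_records(books):
    print(record)
```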
License
The contents of this document are licensed under the Creative Commons Attribution 4.0 International License. By contributing you agree to release your contribution under this license.
Disclosure
csv2html, Sqawk, jsonwatch, Remarshal and initool are developed by the curator of this document.