dkit

A unified CLI to convert, query, and explore data across formats.

Convert between JSON, CSV, YAML, TOML, XML, Parquet, and more with a single CLI. Query nested data, compare files, preview as tables, and pipe everything together.

Quick Start

# Convert JSON to CSV
dkit convert data.json --to csv

# Query nested data
dkit query config.yaml '.database.host'

# Filter and aggregate
dkit query sales.csv '.[] | where region == "US" | sum revenue'

# Preview as a table
dkit view users.csv --limit 10 --border rounded --color

# Compare two configs
dkit diff config_dev.yaml config_prod.yaml

Installation

From crates.io

cargo install dkit

From source

git clone https://github.com/syang0531/dkit.git
cd dkit
cargo install --path dkit-cli

Pre-built binaries

Download from GitHub Releases.

With cargo-binstall

cargo binstall dkit

Shell completions

After installing, generate shell completions for tab-completion support:

# Bash
dkit completions bash > ~/.bash_completion.d/dkit && source ~/.bash_completion.d/dkit

# Zsh
dkit completions zsh > ~/.zfunc/_dkit

# Fish
dkit completions fish > ~/.config/fish/completions/dkit.fish

# PowerShell
dkit completions powershell > dkit.ps1 && . ./dkit.ps1

Supported Formats

Format	Extensions	Read	Write	Notes
JSON	`.json`	✓	✓	Pretty / compact output
JSONL	`.jsonl`, `.ndjson`	✓	✓	One JSON object per line
CSV	`.csv`	✓	✓	Configurable delimiter
TSV	`.tsv`	✓	✓	Tab-separated values
YAML	`.yaml`, `.yml`	✓	✓
TOML	`.toml`	✓	✓
XML	`.xml`	✓	✓	Namespace handling, attributes
MessagePack	`.msgpack`	✓	✓	Binary format
Parquet	`.parquet`	✓	✓	Snappy / Gzip / Zstd compression
Excel	`.xlsx`	✓	—	Sheet selection, input-only
SQLite	`.db`, `.sqlite`	✓	—	Custom SQL queries, input-only
INI/CFG	`.ini`, `.cfg`	✓	✓	Section-based config files
.properties	`.properties`	✓	✓	Java properties files
.env	`.env`	✓	✓	Environment variable files
HCL	`.hcl`, `.tf`, `.tfvars`	✓	✓	Terraform / HashiCorp configs
plist	`.plist`	✓	✓	macOS Property List (XML)
Markdown	`.md`	—	✓	GFM table, output-only
HTML	`.html`	—	✓	Styled tables, output-only

All conversion paths between readable and writable formats are supported. Format is auto-detected from file extension, or via content sniffing for stdin.

Commands Overview

Command	Description
`convert`	Convert between any supported formats
`query`	Query and transform data with expressions
`join`	Join two files on a common key (inner/left/right/full)
`profile`	Profile data quality (types, nulls, cardinality)
`view`	Preview data as a formatted table
`stats`	Show statistics (count, avg, percentiles, etc.)
`schema`	Inspect data structure as a tree
`diff`	Compare two data files (cross-format supported)
`merge`	Combine multiple files into one
`validate`	Validate data against JSON Schema
`sample`	Random / systematic / stratified sampling
`flatten`	Flatten nested structures (`a.b.c` keys)
`unflatten`	Restore flattened structures
`config`	Manage user / project configuration
`alias`	Define and use command shortcuts
`completions`	Generate shell completion scripts

Usage

`convert` — Format conversion

# Basic conversion (format detected from extension)
dkit convert data.json --to yaml
dkit convert users.csv --to json -o users.json

# Batch conversion
dkit convert *.csv --to json --outdir ./converted/

# Pipe from stdin
cat data.json | dkit convert --from json --to csv

# Format-specific options
dkit convert data.json --to json --compact           # Minified JSON
dkit convert data.json --to json --indent 4          # 4-space indentation
dkit convert data.json --to json --sort-keys         # Alphabetically sorted keys
dkit convert data.csv --to json --no-header           # CSV without header row
dkit convert data.json --to xml --root-element items  # Custom XML root
dkit convert data.json --to html --styled --full-html # Styled HTML document

# Excel / SQLite input
dkit convert data.xlsx --to json --sheet Products
dkit convert data.db --to csv --sql "SELECT * FROM users WHERE age > 25"

# Parquet with compression
dkit convert data.csv --to parquet --compression zstd -o out.parquet

# Encoding support
dkit convert data.csv --to json --encoding euc-kr
dkit convert data.csv --to json --detect-encoding

# Streaming for large files
dkit convert large.jsonl --from jsonl -f csv --chunk-size 1000 -o out.csv

# Deduplication
dkit convert data.json --to csv --unique             # Remove duplicate records
dkit convert data.json --to csv --unique-by email    # Deduplicate by field

# Computed fields and transformations
dkit convert data.json --to json --add-field 'total = price * qty'
dkit convert data.json --to json --map 'name = upper(name)'

# Column selection and aggregation
dkit convert data.json --to csv --select 'name, email'
dkit convert sales.csv --to json --group-by category --agg 'count(), sum(amount)'
dkit convert data.json --to csv --select 'name, age' --filter 'age > 30' --sort-by age

# Dry-run (preview without writing)
dkit convert huge.json --to csv -o output.csv --dry-run

# INI / .properties format
dkit convert config.ini --to json                    # INI → JSON
dkit convert app.properties --to yaml                # Properties → YAML

# .env format
dkit convert .env --to json                         # .env → JSON
dkit convert config.json --to env -o .env            # JSON → .env
dkit diff .env.dev .env.prod                         # Compare environments

# HCL / Terraform format
dkit convert main.tf --to json                      # Terraform → JSON
dkit convert variables.json --to hcl -o vars.tf     # JSON → HCL
dkit query main.tf '.resource.aws_instance'         # Query Terraform config

# plist (macOS Property List) format
dkit convert Info.plist --to json                   # plist → JSON
dkit convert config.json --to plist -o Info.plist   # JSON → plist
dkit query Info.plist '.CFBundleVersion'             # Query plist value

# Explode (unnest arrays into rows)
dkit convert data.json --to csv --explode tags      # Unnest array field

# Pivot / Unpivot (reshape data)
dkit convert wide.csv --to json --unpivot 'jan,feb,mar' --key month --value sales
dkit convert long.csv --to json --pivot --index name --columns month --values sales

# Watch mode (re-convert on file change)
dkit convert data.json --to csv --watch

# Log file parsing
dkit convert access.log --to json --log-format apache        # Apache log → JSON
dkit convert app.log --to csv --log-format nginx             # nginx log → CSV
dkit convert app.log --to json --log-format syslog           # syslog → JSON
dkit convert app.log --to json --log-format '{timestamp} [{level}] {message}'  # Custom pattern
dkit convert app.log --to json --log-format apache --log-error raw  # Include unparseable lines

# Parallel batch conversion
dkit convert *.json --to csv --outdir ./out/ --parallel 4    # 4 threads
dkit convert *.json --to csv --outdir ./out/ --parallel auto # Auto-detect CPU cores

# Execution timing (available on all commands)
dkit convert data.json --to csv --time
dkit query data.json '.users[]' --time

`query` — Data querying

# Field access
dkit query config.yaml '.database.host'
dkit query data.json '.users[0].name'

# Array iteration and filtering
dkit query data.json '.users[] | where age > 20 | select name, email'
dkit query data.json '.items[] | sort price desc | limit 5'
dkit query data.json '.users[] | where name contains "Kim"'

# Array slicing and wildcards
dkit query data.json '.[0:3]'                        # First 3 elements
dkit query data.json '.[-2:]'                        # Last 2 elements
dkit query data.json '.[*].name'                     # All names (wildcard)

# IN / NOT IN, matches operators
dkit query data.json '.[] | where status in ("active", "pending")'
dkit query data.json '.[] | where name matches "^[A-C]"'

# Recursive descent (..) — find keys at any depth
dkit query nested.json '..email'                     # All 'email' fields
dkit query terraform.json '..instance_type'          # Deep key search

# Conditional expressions
dkit query data.json '.[] | select name, if(age < 18, "minor", "adult") as category'
dkit query data.json '.[] | select name, case when score >= 90 then "A" when score >= 70 then "B" else "C" end as grade'

# Aggregate functions
dkit query data.csv '.[] | count'
dkit query data.csv '.[] | sum price'
dkit query data.csv '.[] | avg score'

# GROUP BY
dkit query data.csv '.[] | group_by category count(), sum(price)'
dkit query data.csv '.[] | group_by category count() having count > 1'

# Built-in functions in select
dkit query data.csv '.[] | select upper(name), round(price, 2)'
dkit query data.json '.[] | select name, coalesce(email, "N/A")'

# Output in any format
dkit query data.json '.users[]' --to csv -o users.csv

# Query execution plan (--explain)
dkit query data.json '.[] | where age > 20 | select name | sort name desc | limit 5' --explain

# Query log files
dkit query access.log '.[] | where status == 404' --log-format apache

Query syntax reference:

Syntax	Description
`.field`	Object field access
`.field.sub`	Nested field access
`.[0]`, `.[-1]`	Array index (0-based, negative from end)
`.[]`	Iterate all elements
`.[*]`	Wildcard (same as `[]`)
`.[0:3]`, `.[-2:]`, `.[::2]`	Array slicing (start:end:step)
`..key`	Recursive descent (find key at any depth)
`where .field == value`	Filter (`==`, `!=`, `>`, `<`, `>=`, `<=`)
`where .field contains "str"`	String filter (`contains`, `starts_with`, `ends_with`)
`where .field in ("a", "b")`	Membership filter (`in`, `not in`)
`where .field matches "regex"`	Regex filter (`matches`, `not matches`)
`select .f1, .f2`	Select specific fields
`sort .field [desc]`	Sort ascending/descending
`limit N`	Limit results
`\|`	Pipeline chaining

Built-in functions:

Category	Functions
String	`upper`, `lower`, `trim`, `ltrim`, `rtrim`, `length`, `substr`, `concat`, `replace`, `split`, `index_of`, `rindex_of`, `starts_with`, `ends_with`, `reverse`, `repeat`, `pad_left`, `pad_right`
Math	`round`, `ceil`, `floor`, `abs`, `sqrt`, `pow`
Date	`now`, `date`, `year`, `month`, `day`
Type	`to_int`, `to_float`, `to_string`, `to_bool`
Util	`coalesce`, `if_null`
Conditional	`if(cond, then, else)`, `case when ... then ... end`

Aggregate functions: count, sum, avg, min, max, distinct, median, percentile, stddev, variance, mode, group_concat

See Query Syntax Reference for the full grammar and more examples.

`join` — Cross-file data joining

# Inner join on same key name
dkit join users.json orders.json --on user_id -f json

# Left join with different key names
dkit join users.csv orders.csv --on id=user_id --type left -f csv

# Cross-format join (YAML + CSV)
dkit join users.yaml transactions.csv --on email -f table

# Full outer join
dkit join a.json b.json --on id --type full -f json --pretty

`profile` — Data profiling

dkit profile data.csv                            # Table output
dkit profile data.csv --output-format json        # JSON output for scripting
dkit profile data.csv --detailed                  # Detailed stats (histograms, outliers)
dkit profile data.json                            # JSON input
dkit profile data.csv --output-format md           # Markdown output

`query` — Window functions

# Ranking
dkit query sales.json '.[] | select name, revenue, rank() over (order by revenue desc) as rank'

# Partition-based ranking
dkit query sales.json '.[] | select dept, name, revenue, row_number() over (partition by dept order by revenue desc) as dept_rank'

# Previous/next row reference
dkit query timeseries.json '.[] | select date, value, lag(value, 1) over (order by date) as prev_value'

# Running total
dkit query transactions.json '.[] | select date, amount, sum(amount) over (order by date) as running_total'

`convert` — Template output

# Custom text output with Tera templates (requires --features template)
dkit convert users.json -f template --template '{{ name }} <{{ email }}>'
dkit convert data.json -f template --template-file report.hbs
dkit convert data.json -f template --template '{{ name | upper }}'   # Filters: upper, lower, default

`view` — Table preview

dkit view users.csv
dkit view data.json --path '.users' --limit 20
dkit view data.csv --columns name,email --border rounded --color
dkit view data.json --row-numbers --max-width 30
dkit view data.json --select 'name, email'         # Select specific columns
dkit view sales.csv --group-by status --agg 'count()'  # Aggregation table
dkit view data.json --format md                   # Output as Markdown table
dkit view data.xlsx --list-sheets                 # List Excel sheets
dkit view data.db --list-tables                   # List SQLite tables
dkit view data.csv --watch                        # Auto-refresh on change

`stats` — Data statistics

dkit stats data.csv                               # Overall statistics
dkit stats data.csv --column revenue               # Per-field details
dkit stats data.csv --column age --histogram       # Text histogram
dkit stats data.json --path .users --format json   # JSON output
dkit stats data.csv --output-format json            # Structured JSON for scripting

`schema` — Structure inspection

dkit schema config.yaml
dkit schema data.json
cat data.json | dkit schema - --from json

`diff` — Compare data files

dkit diff old.json new.json
dkit diff config_dev.yaml config_prod.yaml         # Cross-format OK
dkit diff a.json b.json --path '.database'         # Compare nested path
dkit diff a.json b.json --mode value               # Value changes only
dkit diff a.json b.json --diff-format side-by-side # Side-by-side view
dkit diff a.json b.json --array-diff key=id        # Match arrays by key
dkit diff a.json b.json --ignore-order --ignore-case
dkit diff a.json b.json --quiet                    # Exit code only

`validate` — JSON Schema validation

dkit validate data.json --schema schema.json
dkit validate data.yaml --schema schema.json       # Any format works
dkit validate data.json --schema schema.json --quiet

`sample` — Data sampling

dkit sample data.csv -n 100                         # 100 random records
dkit sample data.json --ratio 0.1                   # 10% sample
dkit sample data.csv -n 50 --seed 42                # Reproducible
dkit sample data.csv -n 100 --method systematic     # Every k-th element
dkit sample data.csv -n 50 --method stratified --stratify-by category

`flatten` / `unflatten`

dkit flatten data.json                              # {"a.b.c": 1}
dkit flatten data.json --separator '/'              # {"a/b/c": 1}
dkit flatten data.json --array-format bracket       # {"items[0]": ...}
dkit unflatten flat.json                            # Restore nested structure

`merge` — Combine files

dkit merge a.json b.json --to json
dkit merge users1.csv users2.csv --to json -o merged.json

`config` — Configuration

dkit config show                    # Show effective config with sources
dkit config init                    # Create user-level config
dkit config init --project          # Create project-level .dkit.toml

Config priority: CLI options > .dkit.toml (project) > ~/.dkit.toml (user) > defaults.

`alias` — Command shortcuts

dkit alias list                     # Show all aliases
dkit alias set myjson "convert --to json --pretty"
dkit alias remove myjson

# Built-in aliases
dkit j2c data.json                  # JSON → CSV
dkit c2j data.csv                   # CSV → JSON
dkit y2j config.yaml                # YAML → JSON

Comparison with Other Tools

Feature	dkit	jq	miller	yq
JSON	✓	✓	✓	✓
CSV/TSV	✓	—	✓	—
YAML	✓	—	—	✓
TOML	✓	—	—	—
XML	✓	—	—	✓
MessagePack	✓	—	—	—
Parquet	✓	—	—	—
Excel (.xlsx) input	✓	—	—	—
SQLite input	✓	—	—	—
INI / .properties	✓	—	—	—
.env files	✓	—	—	—
HCL / Terraform	✓	—	—	—
plist (macOS)	✓	—	—	—
Cross-format convert	✓	—	Partial	Partial
Query (where/select/sort)	✓	✓	✓	✓
Aggregate / GROUP BY	✓	Partial	✓	—
Built-in functions	✓	✓	✓	—
Table preview	✓	—	✓	—
Statistics	✓	—	✓	—
Schema inspection	✓	—	—	—
File diff	✓	—	—	—
JSON Schema validation	✓	—	—	—
Sampling	✓	—	—	—
Flatten / unflatten	✓	—	—	—
Streaming (large files)	✓	—	✓	—
Log parsing	✓	—	—	—
Parallel batch conversion	✓	—	—	—
Execution timing	✓	—	—	—
Query explain plan	✓	—	—	—
Window functions	✓	—	—	—
Cross-file JOIN	✓	—	—	—
Data profiling	✓	—	—	—
Template output	✓	—	✓	—
Watch mode	✓	—	—	—
Config file	✓	—	—	—
Shell completions	✓	✓	✓	✓

dkit focuses on seamless conversion between all supported formats with a unified query syntax, eliminating the need for separate tools per format.

Documentation

Tutorial — Step-by-step getting started guide
Cookbook — Practical recipes for common tasks
Migration Guide — Switching from jq / csvkit / yq
Query Syntax — Full query language reference
CLI Specification — Complete CLI reference
Architecture — Project internals
Technical Spec — Core types and traits
API Reference — Library API documentation

Building from Source

cargo build                     # Build
cargo test                      # Run tests
cargo clippy -- -D warnings     # Lint
cargo fmt -- --check            # Format check

Minimum supported Rust version: 1.75.0

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 315 Commits
.github/workflows		.github/workflows
Formula		Formula
dkit-cli		dkit-cli
dkit-core		dkit-core
docs		docs
fuzz		fuzz
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
install.sh		install.sh

Folders and files

Latest commit

History

Repository files navigation

dkit

Quick Start

Installation

From crates.io

From source

Pre-built binaries

With cargo-binstall

Shell completions

Supported Formats

Commands Overview

Usage

convert — Format conversion

query — Data querying

join — Cross-file data joining

profile — Data profiling

query — Window functions

convert — Template output

view — Table preview

stats — Data statistics

schema — Structure inspection

diff — Compare data files

validate — JSON Schema validation

sample — Data sampling

flatten / unflatten

merge — Combine files

config — Configuration

alias — Command shortcuts

Comparison with Other Tools

Documentation

Building from Source

Contributing

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 17

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`convert` — Format conversion

`query` — Data querying

`join` — Cross-file data joining

`profile` — Data profiling

`query` — Window functions

`convert` — Template output

`view` — Table preview

`stats` — Data statistics

`schema` — Structure inspection

`diff` — Compare data files

`validate` — JSON Schema validation

`sample` — Data sampling

`flatten` / `unflatten`

`merge` — Combine files

`config` — Configuration

`alias` — Command shortcuts

Packages