fileshifter

Convert between CSV, TSV, JSON, JSONL, YAML, TOML, Parquet, SQLite, and Excel from the command line.

Why?

Developers constantly convert between data formats using ad-hoc scripts, jq one-liners, and pandas snippets. fileshifter replaces them with a single fast CLI tool that auto-detects the input format, infers the schema, and converts to any supported output format. It streams large CSV and JSONL files in chunks and provides a Python API for programmatic use.

Installation

pip install fileshifter

Quick Start

# CSV to JSON
fileshifter data.csv data.json

# JSON to Parquet with type inference
fileshifter data.json data.parquet

# CSV to Excel with sheet name
fileshifter data.csv data.xlsx --sheet Sales

# Flatten nested JSON to CSV
fileshifter data.json data.csv --flatten

# SQLite table to CSV
fileshifter data.db data.csv --table users

# Select specific columns
fileshifter data.json data.csv --columns name,email

# Filter with jq-style expressions
fileshifter data.json out.json --filter '.results[]'

# Stdin streaming
cat stream.jsonl | fileshifter - out.parquet --input-format jsonl
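
The stdin example reads JSONL, one of the two formats the Features list names for chunked streaming. How fileshifter chunks its input internally is not documented here, so the following is only a minimal sketch of the general technique. The names `read_jsonl_chunks` and `chunk_size` are hypothetical and not part of the fileshifter API.

```python
import json
from itertools import islice
from typing import Iterator, TextIO


def read_jsonl_chunks(stream: TextIO, chunk_size: int = 1000) -> Iterator[list[dict]]:
    """Yield lists of parsed JSONL records, at most chunk_size records per list.

    Hypothetical helper: only one chunk is held in memory at a time,
    which is the idea behind chunked reads of large files.
    """
    # Skip blank lines lazily so the whole input is never loaded at once.
    lines = (line for line in stream if line.strip())
    while True:
        chunk = [json.loads(line) for line in islice(lines, chunk_size)]
        if not chunk:
            return
        yield chunk
```

Passing `sys.stdin` as `stream` would mirror the piped invocation above; each yielded chunk could then be written out before the next one is read.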

Supported Formats

Format   Read  Write  Extension(s)
CSV      yes   yes    .csv
TSV      yes   yes    .tsv
JSON     yes   yes    .json
JSONL    yes   yes    .jsonl
YAML     yes   yes    .yaml, .yml
TOML     yes   yes    .toml
Parquet  yes   yes    .parquet
SQLite   yes   yes    .sqlite, .db
Excel    yes   yes    .xlsx
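
Since formats are detected from the file extension, the table above amounts to a lookup from extension to format. The sketch below illustrates that lookup; it is not fileshifter's actual code. The helper `detect_format` and the lowercase format names are assumptions (only `jsonl` appears as a format name in this README).

```python
from pathlib import Path

# Extension-to-format table copied from the Supported Formats list.
# The format names on the right are assumed, not confirmed by the README.
EXTENSION_FORMATS = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".json": "json",
    ".jsonl": "jsonl",
    ".yaml": "yaml",
    ".yml": "yaml",
    ".toml": "toml",
    ".parquet": "parquet",
    ".sqlite": "sqlite",
    ".db": "sqlite",
    ".xlsx": "excel",
}


def detect_format(path: str) -> str:
    """Hypothetical helper: map a file path to a format name by its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return EXTENSION_FORMATS[suffix]
    except KeyError:
        # Paths without a known extension (for example "-" for stdin)
        # need an explicit format instead.
        raise ValueError(
            f"cannot detect format for {path!r}; pass an explicit format"
        ) from None
```

A path such as `-` has no extension, which is why the stdin example in Quick Start passes `--input-format` explicitly.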

CLI Reference

fileshifter

fileshifter INPUT OUTPUT [OPTIONS]

Arguments:

  • INPUT — Input file path (use - for stdin)
  • OUTPUT — Output file path

Options:

  • --input-format, -i — Override input format detection
  • --output-format, -o — Override output format detection
  • --flatten, -f — Flatten nested structures (dot notation)
  • --columns, -c — Comma-separated list of columns to select
  • --filter — jq-style filter expression (e.g., .results[])
  • --table, -t — Table name for SQLite read/write
  • --sheet, -s — Sheet name for Excel read/write

fileshifter formats

List all supported formats and their read/write capabilities.

fileshifter --version

Show version and exit.

Python API

from fileshifter import convert

# Basic conversion
convert("data.csv", "data.json")

# With options
convert("data.json", "data.parquet", flatten=True, columns=["name", "email"])

# SQLite with table name
convert("data.db", "data.csv", table="users")

# Excel with sheet name
convert("data.csv", "data.xlsx", sheet="Sales")
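
Because `convert()` takes an input path and an output path, it can be wrapped to process a whole directory. The sketch below assumes only the two-argument call shown above; `batch_convert` and its parameters are hypothetical and not part of fileshifter. The converter is passed in as an argument, so the helper can be tried without the package installed.

```python
from pathlib import Path
from typing import Callable


def batch_convert(
    src_dir,
    dst_dir,
    converter: Callable[[str, str], object],
    pattern: str = "*.csv",
    out_ext: str = ".json",
) -> list[Path]:
    """Hypothetical helper: run converter(input, output) for each matching file."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort for a deterministic processing order.
    for source in sorted(src.glob(pattern)):
        target = dst / (source.stem + out_ext)
        converter(str(source), str(target))
        written.append(target)
    return written


# Intended usage, assuming fileshifter is installed:
#   from fileshifter import convert
#   batch_convert("raw", "converted", convert)
```

The output format follows from `out_ext`, relying on the extension-based detection described in this README.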

Features

  • Format auto-detection — Infers format from the file extension (override with --input-format or --output-format)
  • Schema inference — Detects column types (int, float, bool, datetime, str) and applies them during conversion
  • Streaming mode — Processes large files via chunked reads (CSV, JSONL)
  • Nested structure handling — Optional flattening to dot-notation keys when converting to flat formats
  • Filter and transform — jq-style filtering and column selection during conversion
  • Python API — convert() function for embedding in scripts and notebooks
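
As an illustration of the schema inference item, the sketch below picks one of the five listed types (int, float, bool, datetime, str) for a column of string values. It is not fileshifter's actual algorithm. The function names, the order in which types are tried, and the treatment of empty cells as missing are all assumptions.

```python
from datetime import datetime


def _all_parse(values, parse) -> bool:
    """Return True if parse() accepts every value without raising ValueError."""
    try:
        for value in values:
            parse(value)
    except ValueError:
        return False
    return True


def infer_type(values) -> str:
    """Hypothetical helper: name the narrowest type that fits every non-empty value."""
    # Empty cells are treated as missing and ignored (an assumption).
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not present:
        return "str"
    if all(v.lower() in {"true", "false"} for v in present):
        return "bool"
    # Try the stricter parsers first; fall back to str if none fits.
    candidates = (
        ("int", int),
        ("float", float),
        ("datetime", datetime.fromisoformat),
    )
    for name, parse in candidates:
        if _all_parse(present, parse):
            return name
    return "str"
```

A column mixing `"1"` and `"2.5"` is reported as float, while a column containing any unparseable value falls back to str.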

Development

# Clone and install
git clone https://github.com/ndcorder/fileshifter.git
cd fileshifter
uv sync

# Run tests
uv run pytest

# Run linter
uv run ruff check .

# Format code
uv run ruff format .

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feat/my-feature)
  3. Write tests for your changes
  4. Ensure all tests pass (uv run pytest)
  5. Ensure linter passes (uv run ruff check .)
  6. Submit a pull request

License

MIT — see LICENSE for details.
