A CLI-first, declarative ETL/ELT pipeline tool driven by YAML.
Modern data pipelines often become bogged down by repetitive boilerplate and fragile scripts. Loafer solves this by treating data movement and transformation as configuration.
Loafer is a CLI-first tool that allows you to extract data from databases, files, and APIs, apply powerful transformations (both custom and AI-generated), and load the results into target systems — all configured within a single, highly readable YAML file.
Key Idea: Describe your pipeline in YAML, run it from the CLI, and let Loafer handle the execution.
- Declarative Configuration: Define your inputs, outputs, and processing logic in simple YAML. No complex workflow orchestrators required.
- ETL & ELT Support: Choose between Extract-Transform-Load (in-memory/streaming Python transformations) or Extract-Load-Transform (in-database SQL executions).
- AI-Powered Transformations: Optionally use natural language to auto-generate complex Python transforms or target-database SQL using modern LLMs (Gemini, OpenAI, Claude).
- Streaming & Batch Processing: Automatically switches to memory-efficient streaming for large datasets based on configurable thresholds.
- Data Quality Guards: Built-in validation steps to catch malformed data or high null-rates before they hit your target.
- Developer-Friendly & Extensible: Use custom Python files for advanced transformations when declarative logic isn't enough.
The easiest way to install Loafer is via pip:
pip install loafer-etlLoafer is officially published to the GitHub Container Registry. Available tags include specific versions (e.g. 0.2.0, 0.2) and latest.
docker pull ghcr.io/lupppig/loafer:latestTo run Loafer using Docker, mount your current working directory so Loafer can access your configuration and local files:
docker run --rm -v $(pwd):/workspace -w /workspace ghcr.io/lupppig/loafer:latest run pipeline.yamlTo contribute or use the latest unreleased features:
git clone https://github.com/lupppig/loafer.git
cd loafer
pip install -e .Create a single YAML file describing your pipeline. This example extracts orders from PostgreSQL, normalizes the data via AI, and saves it to a clean CSV.
pipeline.yaml
name: Daily Orders Pipeline
mode: etl
source:
type: postgres
url: ${DATABASE_URL}
query: "SELECT * FROM orders WHERE created_at >= NOW() - INTERVAL '1 day'"
target:
type: csv
path: ./output/clean_orders.csv
write_mode: overwrite
transform:
type: ai
instruction: >
Drop cancelled orders, normalize currency to USD, and
combine first_name and last_name into full_name.
llm:
# Supported providers: gemini, openai, claude, qwen
provider: gemini
model: gemini-2.5-flash
api_key: ${GEMINI_API_KEY}Run the pipeline from your terminal:
export DATABASE_URL="postgresql://user:pass@localhost/db"
export GEMINI_API_KEY="your-api-key"
loafer run pipeline.yamlExpected Outcome:
Loafer will connect to the Postgres database, extract the past day's orders, execute the transformation to clean the data, and successfully write the normalized output to ./output/clean_orders.csv.
Loafer pipelines consist of three main stages, executed as a directed graph:
- Extract: Loafer connects to your source (e.g., a SQL database, CSV, or REST API) and pulls the data. Large datasets are automatically chunked and streamed to prevent memory exhaustion.
- Transform: The data is manipulated according to your YAML instructions. This can be AI-generated Python code running in a safe isolated context, a custom Python script you provide, or skipped entirely for simple EL (Extract-Load) operations.
- Load: The transformed data is pushed to your designated target connector (e.g. CSV, Snowflake, Postgres) adhering to your specified write mode (append, overwrite).
In ELT mode, the steps differ slightly: data is first loaded raw into the target database, and transformations are executed as native SQL queries against the target engine.
Loafer provides a streamlined CLI tailored for day-to-day data engineering.
Command Syntax:
loafer <command> [options]Common Commands:
loafer run <config.yaml>: Execute a pipeline defined in the given YAML file.--dry-run: Extracts and transforms data without writing to the target.--verbose: Prints detailed execution logs and agent outputs.
loafer validate <config.yaml>: Check a configuration file for syntax and connection errors without running the pipeline.loafer connectors: List all available source and target connectors.
loafer/
├── cli.py # CLI entrypoints and command routing
├── config.py # Configuration parsing and validation
├── runner.py # Pipeline execution logic and LangGraph orchestration
├── connectors/ # Integrations (sources and targets)
│ ├── sources/ # Postgres, CSV, REST API, Excel, etc.
│ └── targets/ # Postgres, CSV, Snowflake, etc.
├── transform/ # Transformation engines (AI, custom, SQL)
├── graph/ # LangGraph state management and pipeline DAGs
├── llm/ # LLM provider integrations (Gemini, Claude, OpenAI)
└── agents/ # Individual workflow nodes (extract, transform, load)
Loafer pipelines are driven by a single YAML configuration file. Here is the structure:
# Pipeline metadata
name: User Sync Pipeline
mode: etl # Supports 'etl' or 'elt'
# Extract configuration
source:
type: rest_api
url: "https://api.example.com/users"
method: GET
# Load configuration
target:
type: postgres
url: ${TARGET_DB_URL}
table: users_dim
# Transform logic
transform:
type: custom
path: ./transforms/clean_users.py
# Optional: LLM Configuration (required if using type: ai or mode: elt)
llm:
# Supported providers: gemini, openai, claude, qwen
provider: gemini
model: gemini-2.5-flash
api_key: ${GEMINI_API_KEY}
# Performance & Validation
chunk_size: 1000
streaming_threshold: 10000
validation:
strict: true
max_null_rate: 0.1Setting up Loafer for local development requires uv (or standard pip / hatch).
-
Clone and Install dependencies:
git clone https://github.com/yourusername/loafer.git cd loafer # If using uv for fast dependencies: uv sync # Or using standard pip: pip install -e ".[dev]"
-
Run Tests: Loafer uses
pytestfor unit and integration testing.pytest
-
Code Style: This project uses standard Python linting and formatting tools. We recommend running
ruffprior to committing:ruff check . ruff format .
Contributions are highly encouraged! Whether it’s a new data connector, a bug fix, or an improvement to the documentation, we'd love your help.
- Fork the repository.
- Create a new branch for your feature (
git checkout -b feature/amazing-connector). - Commit your changes with clear messages.
- Open a Pull Request against the
mainbranch.
Keep things simple and ensure any new features include appropriate tests.
- GitHub Repository: https://github.com/lupppig/loafer
- PyPI Package: https://pypi.org/project/loafer-etl/
This project is licensed under the MIT License. See the LICENSE file for details.