db-toolkit

Arrow-based data I/O for Go — like io.Reader/io.Writer, but for databases and file formats.

Overview

db-toolkit provides a unified streaming interface for moving data between databases and file formats using Apache Arrow as the in-memory representation. It follows a pull model: readers produce BatchStreams of Arrow records, writers consume them, and dio.Copy wires them together.

    Reader           BatchStream           Writer
┌────────────┐    ┌──────────────┐    ┌────────────┐
│ Source     │───▶│ Schema()     │───▶│ Destination│
│ Schema()   │    │ Next(ctx)    │    │            │
│ Read(ctx)  │    │ Close()      │    │ WriteStream│
└────────────┘    └──────────────┘    └────────────┘

Features:

  • Pull-model streaming — readers produce batches on demand, keeping memory usage bounded
  • Filter pushdown — push predicates down to the source (SQL WHERE clauses, Parquet row-group pruning)
  • Column projection — read only the columns you need
  • Platform registry — database platforms self-register via init(), activated with a blank import

Package Layout

Package            Description
dio                Core interfaces: Reader, Writer, BatchStream, Filter, Copy
dio/csv            CSV reader and writer
dio/parquet        Parquet reader and writer with row-group pruning via filter pushdown
dio/gen            Synthetic data generation for testing
platform           Platform registry (Register, Get, List)
platform/postgres  PostgreSQL reader, writer, and filter pushdown

Quick Start

package main

import (
    "context"
    "os"

    "github.com/rnestertsov/db-toolkit/dio"
    "github.com/rnestertsov/db-toolkit/dio/csv"
    "github.com/rnestertsov/db-toolkit/platform"
    "github.com/rnestertsov/db-toolkit/platform/postgres"
)

func main() {
    ctx := context.Background()

    pg, err := platform.Get("postgres")
    if err != nil {
        panic(err)
    }

    conn, err := pg.OpenConnection(ctx, postgres.ConnectionConfig{
        Name: "mydb",
        User: "myuser",
        Host: "localhost",
        Port: "5432",
    })
    if err != nil {
        panic(err)
    }
    defer conn.Close(ctx)

    reader, err := conn.Query(ctx, "SELECT id, name, email FROM users")
    if err != nil {
        panic(err)
    }

    writer := csv.NewWriter(os.Stdout)

    if err := dio.Copy(ctx, reader, writer); err != nil {
        panic(err)
    }
}

Key Concepts

BatchStream

A BatchStream is a pull-based iterator over Arrow record batches. Call Next(ctx) to get the next batch; it returns io.EOF when exhausted. Callers must call Release() on each returned record.

stream, err := reader.Read(ctx)
if err != nil {
    return err
}
defer stream.Close()

for {
    rec, err := stream.Next(ctx)
    if errors.Is(err, io.EOF) {
        break
    }
    if err != nil {
        return err
    }
    // process rec
    rec.Release()
}

Filter Pushdown

Pass filters as ReadOptions to push predicates to the source. SQL databases translate them into WHERE clauses; Parquet readers use them for row-group pruning.

stream, err := reader.Read(ctx,
    dio.WithFilter([]dio.Filter{
        dio.Eq{Column: 0, Value: 42},
        dio.GtEq{Column: 1, Value: "2024-01-01"},
    }),
)

Readers optionally implement dio.FilterClassifier to report how precisely they handle each filter (FilterExact, FilterInexact, or FilterUnsupported).
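As an illustration, a SQL-backed reader might classify filter types it can translate into WHERE clauses as exact and anything it does not recognize as unsupported. The concrete types below are a hedged sketch — the names mirror the ones mentioned above, but the real dio definitions may differ:

```go
package main

import "fmt"

// FilterKind mirrors the classification levels the README names:
// FilterExact, FilterInexact, FilterUnsupported. Shapes are assumptions.
type FilterKind int

const (
	FilterExact FilterKind = iota
	FilterInexact
	FilterUnsupported
)

type Filter interface{ isFilter() }

// Eq and GtEq are sketches of the predicate types used in the example above.
type Eq struct {
	Column int
	Value  any
}

type GtEq struct {
	Column int
	Value  any
}

func (Eq) isFilter()   {}
func (GtEq) isFilter() {}

// classify reports how precisely a hypothetical SQL reader applies each
// filter: Eq and GtEq translate directly into WHERE clauses (exact), and
// anything it does not recognize is left for the caller (unsupported).
func classify(f Filter) FilterKind {
	switch f.(type) {
	case Eq, GtEq:
		return FilterExact
	default:
		return FilterUnsupported
	}
}

func main() {
	filters := []Filter{
		Eq{Column: 0, Value: 42},
		GtEq{Column: 1, Value: "2024-01-01"},
	}
	for _, f := range filters {
		fmt.Printf("%T -> %v\n", f, classify(f))
	}
}
```

A Parquet reader would instead report FilterInexact for predicates it applies via row-group min/max statistics, since pruning can skip row groups but not individual rows.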

Column Projection

Read only the columns you need:

stream, err := reader.Read(ctx,
    dio.WithProjection([]int{0, 2, 5}),
)
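For a source with no native projection support, a reader can apply the index list itself. A minimal sketch over plain string rows — the real library selects Arrow columns, so this is only the shape of the operation:

```go
package main

import "fmt"

// project selects the given column indices from a row, in order.
// A stand-in for Arrow column selection; out-of-range indices would
// panic, matching slice semantics.
func project(row []string, cols []int) []string {
	out := make([]string, 0, len(cols))
	for _, c := range cols {
		out = append(out, row[c])
	}
	return out
}

func main() {
	row := []string{"id", "name", "email", "phone", "city", "zip"}
	fmt.Println(project(row, []int{0, 2, 5})) // [id email zip]
}
```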

Platform Registry

Platforms register themselves at import time. Activate a platform with a blank import, then retrieve it by name:

import _ "github.com/rnestertsov/db-toolkit/platform/postgres"

pg, err := platform.Get("postgres")
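A minimal version of such a registry — a sketch, not the actual platform package — is a mutex-guarded map populated from each platform package's init function, which is why a blank import is enough to activate a platform:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Platform is a stand-in for whatever interface the real platform
// package defines; here it only carries a name.
type Platform interface{ Name() string }

var (
	mu        sync.RWMutex
	platforms = map[string]Platform{}
)

// Register is called from a platform package's init, so importing the
// package (even blankly) makes the platform available.
func Register(p Platform) {
	mu.Lock()
	defer mu.Unlock()
	platforms[p.Name()] = p
}

// Get looks up a previously registered platform by name.
func Get(name string) (Platform, error) {
	mu.RLock()
	defer mu.RUnlock()
	p, ok := platforms[name]
	if !ok {
		return nil, fmt.Errorf("platform %q not registered", name)
	}
	return p, nil
}

// List returns the registered platform names in sorted order.
func List() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(platforms))
	for n := range platforms {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

type postgresPlatform struct{}

func (postgresPlatform) Name() string { return "postgres" }

// In the real layout this init lives in platform/postgres.
func init() { Register(postgresPlatform{}) }

func main() {
	pg, err := Get("postgres")
	if err != nil {
		panic(err)
	}
	fmt.Println(pg.Name(), List()) // postgres [postgres]
}
```

This is the same pattern the standard library uses for database/sql drivers, where sql.Register is called from each driver's init.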

Supported Platforms

Platform    Package            Status
PostgreSQL  platform/postgres  Supported

Adding a new platform? See docs/adding-a-platform.md.

Development

Prerequisites

  • Go 1.24+
  • Docker (for integration tests using testcontainers)

Build

make build

Test

make test

Integration tests spin up real database instances via testcontainers, so Docker must be running.

License

This project is licensed under the MIT License — see the LICENSE file for details.
