Add a native SQLite output connector #212

@zxqfd555

Description

Is your feature request related to a problem? Please describe.

Pathway currently has pw.io.sqlite.read, which is useful for exchanging data between Pathway and external processes (other Pathway pipelines, third-party applications, etc.). However, there is no corresponding pw.io.sqlite.write, making SQLite a read-only integration and breaking the symmetry that exists for all other database connectors (PostgreSQL, MySQL, MS SQL Server, MongoDB). Users who want to use SQLite as a shared data store between two Pathway processes have no native way to do so.

Describe the solution you'd like

Add a native SQLite output connector implemented in Rust using the existing rusqlite crate (MIT license), exposed as pw.io.sqlite.write.

The connector must be consistent with the interface established by the other database output connectors (e.g. pw.io.mysql.write, pw.io.postgres.write). Specifically:

Crate: rusqlite (MIT license). Already used by pw.io.sqlite.read; no new dependency needed.

API:

pw.io.sqlite.write(
    table,
    path,
    table_name,
    *,
    max_batch_size=None,
    init_mode="default",                    # "default" | "create_if_not_exists" | "replace"
    output_table_type="stream_of_changes",  # "stream_of_changes" | "snapshot"
    # ... some other parameters consistent with other DB connectors (primary_key, name, sort_by, etc.)
)
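A hedged sketch of how the proposed call might look in a pipeline. The schema, input source, and paths are placeholders, and pw.io.sqlite.write itself does not exist yet, so this is illustrative pseudocode of the proposed shape rather than working code:

```python
import pathway as pw

class WordCount(pw.Schema):
    word: str
    count: int

# Placeholder input; any Pathway table would work here.
words = pw.io.csv.read("./input/", schema=WordCount)

# Proposed usage: keep ./cache.db holding the current state of `words`,
# creating the target table on first run.
pw.io.sqlite.write(
    words,
    "./cache.db",
    "word_counts",
    primary_key=["word"],
    init_mode="create_if_not_exists",
    output_table_type="snapshot",
)

pw.run()
```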

init_mode options (matching the convention from other DB connectors):

  • "default" — write to an existing table; raise an error if it does not exist.
  • "create_if_not_exists" — create the table if it does not already exist, inferred from the Pathway schema.
  • "replace" — drop and recreate the table at startup.
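For reference, the three modes map onto plain SQLite DDL roughly as follows. This is a minimal stdlib sqlite3 sketch of the intended semantics only; the table and column names are made up, and the real connector would do this inside the engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "create_if_not_exists": create the target table only when it is missing.
conn.execute("CREATE TABLE IF NOT EXISTS words (word TEXT, count INTEGER)")

# "replace": drop and recreate at startup, discarding previous contents.
conn.execute("INSERT INTO words VALUES ('stale', 1)")
conn.execute("DROP TABLE IF EXISTS words")
conn.execute("CREATE TABLE words (word TEXT, count INTEGER)")
assert conn.execute("SELECT COUNT(*) FROM words").fetchone() == (0,)

# "default": the table must already exist; writing to a missing one errors.
try:
    conn.execute("INSERT INTO missing VALUES (1)")
except sqlite3.OperationalError as exc:
    print(exc)  # no such table: missing
```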

output_table_type options:

  • "stream_of_changes" (default) — append a log of all changes to the target table, with additional time (INTEGER) and diff (INTEGER, +1 for insertion, -1 for deletion) columns, consistent with other connectors.
  • "snapshot" — maintain the current state of the data, applying upserts and deletes atomically per minibatch. Requires primary_key to be specified.
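To make the relationship between the two modes concrete, here is a stdlib sqlite3 sketch (with invented table and column names) of how a stream_of_changes log with time/diff columns collapses into the corresponding snapshot:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE changes (word TEXT, count INTEGER, time INTEGER, diff INTEGER)")
conn.executemany(
    "INSERT INTO changes VALUES (?, ?, ?, ?)",
    [
        ("cat", 1, 100, 1),   # first "cat" arrives
        ("cat", 1, 102, -1),  # old count retracted...
        ("cat", 2, 102, 1),   # ...and replaced by the updated one
        ("dog", 1, 104, 1),
    ],
)

# Summing diff per row keeps exactly the rows present in the final state.
snapshot = conn.execute(
    "SELECT word, count FROM changes GROUP BY word, count HAVING SUM(diff) > 0 ORDER BY word"
).fetchall()
print(snapshot)  # [('cat', 2), ('dog', 1)]
```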

Type mapping and round-trip guarantee: the type serialization must be consistent with how pw.io.sqlite.read currently deserializes values, so that any value written via pw.io.sqlite.write survives a round-trip through pw.io.sqlite.read unchanged. The finalized type mapping table should be added to the pw.io.sqlite API docs page as part of this work, analogously to how it is done for pw.io.mysql.
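At the raw SQLite level, the round-trip property already holds for the standard storage classes, as this small sqlite3 sanity check shows; the connector work is about guaranteeing the same property for every Pathway type through its chosen mapping:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (i INTEGER, f REAL, s TEXT, b BLOB)")

row = (42, 3.5, "héllo", b"\x00\x01")
conn.execute("INSERT INTO t VALUES (?, ?, ?, ?)", row)

# Each value comes back through the INTEGER/REAL/TEXT/BLOB storage
# classes unchanged.
assert conn.execute("SELECT i, f, s, b FROM t").fetchone() == row
```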

Describe alternatives you've considered

Users can currently write to SQLite via a Python output connector, but this bypasses the engine-level batching and transaction management and is significantly more error-prone. There is no reason SQLite should be treated differently from the other database connectors.

Additional context

Testing: Since SQLite requires no external server or Docker container, the integration tests do not need to go into a separate Docker Compose-based test suite. They should be added directly to the existing unit test suite (e.g. test_io), alongside the existing pw.io.sqlite.read tests. The test coverage should include:

  • Writing in stream_of_changes mode and reading back, asserting equality.
  • Writing in snapshot mode and reading back, asserting equality.
  • All three init_mode variants (default, create_if_not_exists, replace).
  • Round-trip tests for every Pathway type with a defined SQLite mapping — write via pw.io.sqlite.write, read back via pw.io.sqlite.read, assert the value is unchanged. These are analogous to the *_parsing tests that exist for other connectors.
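One of these round-trip tests might be sketched as follows (pytest style; pw.io.sqlite.write is the connector proposed here, so this cannot run yet, and the read-back comparison is left elided):

```python
import pathway as pw

def test_sqlite_write_read_roundtrip(tmp_path):
    db_path = str(tmp_path / "test.db")

    table = pw.debug.table_from_markdown(
        """
        word | count
        cat  | 2
        dog  | 1
        """
    )
    pw.io.sqlite.write(table, db_path, "words", init_mode="create_if_not_exists")
    pw.run()

    # Read the rows back (via pw.io.sqlite.read or a raw sqlite3 query)
    # and assert they match the input, ignoring the time/diff columns.
    ...
```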

Since this is an entirely new connector, no backward-compatibility issues are expected.

Labels: enhancement (New feature or request)