Add replication plugin for external-to-internal data pull #143
Open
brone1323 wants to merge 1 commit into outerbase:main
Pulls rows from an external Postgres or MySQL source into the DO SQLite on a configurable per-table interval, advancing a per-table watermark on every successful run. Driven by REPLICATION_CONFIG_JSON; runs on a Cloudflare Cron Trigger via the Worker scheduled() handler. Resolves outerbase#72
Author
Demo video
Generated programmatically (HTML deck → Playwright recordVideo → ffmpeg). Hosted as a release asset on the fork to comply with the Algora claim requirement.
Sammy / @brone1323
/claim #72
Closes #72.
What this PR does
Adds a `ReplicationPlugin` (`plugins/replication/`) that pulls rows from an external Postgres or MySQL source into the DO SQLite on a configurable per-table interval. Append-only polling, watermark-per-table, audit log per tick. Schema is reflected from the source on the first sync so users don't have to maintain a parallel DDL.

Driven by a single env var, `REPLICATION_CONFIG_JSON`, validated at plugin construction. Runs on the Worker's `scheduled()` handler: no new scheduling primitive, no external service.

Architecture
Adapter pattern: `ReplicationAdapter` (`types.ts`) is the contract. Two built-in adapters:
- `PostgresAdapter` (postgres.js, already in `dependencies`): schema reflection via `information_schema.columns` + `pg_index` for PK detection. Paged pull `SELECT * FROM ... WHERE wm > $1 ORDER BY wm ASC LIMIT pageSize`.
- `MysqlAdapter` (mysql2/promise, already in `dependencies`): same shape, MySQL `information_schema` for the metadata.

Custom adapters can be plugged in via `adapterFactory` on the plugin constructor (so users can replicate from sources this PR doesn't ship, such as SQL Server, ClickHouse, or REST, without forking the plugin).

Watermarks live in `_starbase_replication_watermarks(source, table, watermark_column, last_value, last_run_ts)` and are upserted per page, not per run, so a mid-run failure on page N+1 doesn't redo pages 1..N on the next tick.

Audit log in `_starbase_replication_log(ts, source, table, rows_pulled, ok, error)` records every tick, successful and failed, for observability via the normal `/query` endpoint.

Failure isolation: if pulling table A fails, the watermark for A is not advanced and the failure is logged, but tables B and C in the same tick still run. Next tick re-attempts A from its prior watermark.
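
The per-page watermark and failure-isolation behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's code: `SketchAdapter`, `SketchStore`, `syncTable`, and `runTick` are hypothetical names, the in-memory store stands in for the two `_starbase_replication_*` tables, and the watermark is assumed to be numeric.

```typescript
// Hypothetical, simplified shapes for illustration only; the PR's real
// contract lives in plugins/replication/types.ts and is not reproduced here.
type SketchRow = Record<string, unknown>;

interface SketchAdapter {
  // Assumed to return up to `pageSize` rows whose watermark is > `after`,
  // ordered ascending by the watermark column.
  pull(table: string, after: number, pageSize: number): Promise<SketchRow[]>;
}

interface SketchTable { name: string; watermarkColumn: string }
interface SketchLogEntry { table: string; rowsPulled: number; ok: boolean; error: string | null }

interface SketchStore {
  watermarks: Record<string, number>;  // stand-in for the watermark table
  log: SketchLogEntry[];               // stand-in for the audit log table
  rows: Record<string, SketchRow[]>;   // stand-in for the replicated tables
}

// Pull one table page by page, starting from its stored watermark.
async function syncTable(
  adapter: SketchAdapter, store: SketchStore, t: SketchTable, pageSize: number,
): Promise<number> {
  let pulled = 0;
  for (;;) {
    const after = store.watermarks[t.name] ?? 0;
    const page = await adapter.pull(t.name, after, pageSize);
    if (page.length === 0) return pulled;
    store.rows[t.name] = (store.rows[t.name] ?? []).concat(page);
    // Advance the watermark after every page, so a failure on the next page
    // does not force the pages already written to be pulled again.
    store.watermarks[t.name] = Number(page[page.length - 1][t.watermarkColumn]);
    pulled += page.length;
    if (page.length < pageSize) return pulled;
  }
}

// Run every table in one tick; a failure in one table is logged and the
// remaining tables still run.
async function runTick(
  adapter: SketchAdapter, store: SketchStore, tables: SketchTable[], pageSize: number,
): Promise<void> {
  for (const t of tables) {
    try {
      const n = await syncTable(adapter, store, t, pageSize);
      store.log.push({ table: t.name, rowsPulled: n, ok: true, error: null });
    } catch (err) {
      // Rows from pages that completed before the failure are kept; this
      // sketch simply does not count them in the failed log entry.
      const message = err instanceof Error ? err.message : String(err);
      store.log.push({ table: t.name, rowsPulled: 0, ok: false, error: message });
    }
  }
}
```

In this sketch a table that fails on its second page keeps the watermark written after its first page, so the next tick resumes from there rather than from the start.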
HTTP surface is intentionally minimal, with two admin-only endpoints:
- `POST /replication/run`: manually fire all due tables (admin token required)
- `GET /replication/status`: read current watermarks (admin token required)

That's it. No CRUD admin API, no web UI, no mutable runtime config. Config is the env var, observability is the audit table, and any further introspection is a `SELECT` against the two replication tables.

Differentiation from PR #138
PR #138 ships a sprawling REST API + CronPlugin coupling + 33 tests but reinvents scheduling and configuration plumbing the platform already provides. This PR is intentionally smaller: it leans on the Worker's existing `scheduled()` handler, validates config at construction, and keeps the surface to the watermark + audit table + adapter contract that the issue actually asks for. It also exposes an `adapterFactory` so non-Postgres/MySQL sources are a one-line plug-in, not a new in-tree adapter.

Tests
`plugins/replication/index.test.ts`: 17 tests, all passing:
- `buildCreateTable` renders correct CREATE TABLE with PK clause, omits PK when none, throws on zero columns
- `buildInsert` uses `INSERT OR REPLACE` when PK is configured, `INSERT OR IGNORE` for append-only
- `intervalSeconds`
- `CREATE TABLE` is issued only on the first sync per table, not every tick
- `pull()` failure does not block others, and the failure is recorded in the audit log
- `close()` releases every instantiated adapter

The 4 failing tests in `src/rls/index.test.ts` already fail on `main` and are unrelated to this PR.

Files
- `plugins/replication/index.ts`: plugin + scheduling
- `plugins/replication/sql.ts`: SQL builders, watermark/log DDL
- `plugins/replication/types.ts`: `ReplicationAdapter`, `ReplicationConfig`
- `plugins/replication/adapters/postgres.ts`
- `plugins/replication/adapters/mysql.ts`
- `plugins/replication/index.test.ts`
- `plugins/replication/README.md`
- `plugins/replication/meta.json`
- `src/index.ts`: registers the plugin and adds the `scheduled()` entry point
- `wrangler.toml`: example config + cron trigger comment

Wiring (for the maintainer reviewing)
- `[triggers] crons = ["* * * * *"]` in `wrangler.toml` (the plugin handles per-table cadence on top, so you only need the smallest interval).
- `REPLICATION_CONFIG_JSON` in `[vars]` or as a secret.

If `REPLICATION_CONFIG_JSON` is unset the `scheduled()` handler is a no-op and the plugin is invisible.
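
As a rough illustration of the wiring above, the sketch below shows how a once-a-minute cron tick might be reduced to the tables that are actually due, and how an unset `REPLICATION_CONFIG_JSON` becomes a no-op. The config shape (`table`, `watermarkColumn`, `intervalSeconds` as a flat JSON array) and the helper names `parseConfig`, `dueTables`, and `onScheduled` are assumptions made for this example; the PR's real `ReplicationConfig` may differ.

```typescript
// Hypothetical config shape, assumed for this sketch only.
interface SketchTableConfig {
  table: string;
  watermarkColumn: string;
  intervalSeconds: number;
}

// Returns null when the variable is unset (replication disabled) and throws
// on malformed input, mirroring "validated at plugin construction".
function parseConfig(raw: string | undefined): SketchTableConfig[] | null {
  if (raw === undefined || raw.trim() === "") return null;
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) throw new Error("config must be a JSON array of tables");
  return parsed.map((entry: unknown, i: number) => {
    if (typeof entry !== "object" || entry === null) {
      throw new Error(`tables[${i}] must be an object`);
    }
    const { table, watermarkColumn, intervalSeconds } = entry as Record<string, unknown>;
    if (typeof table !== "string" || table === "") {
      throw new Error(`tables[${i}].table must be a non-empty string`);
    }
    if (typeof watermarkColumn !== "string" || watermarkColumn === "") {
      throw new Error(`tables[${i}].watermarkColumn must be a non-empty string`);
    }
    if (typeof intervalSeconds !== "number" || !(intervalSeconds > 0)) {
      throw new Error(`tables[${i}].intervalSeconds must be a positive number`);
    }
    return { table, watermarkColumn, intervalSeconds };
  });
}

// A table is due when it has never run or its own interval has elapsed.
// `lastRunMs` maps table name to the last run time in epoch milliseconds.
function dueTables(
  config: SketchTableConfig[], lastRunMs: Record<string, number>, nowMs: number,
): string[] {
  return config
    .filter((t) => {
      const last = lastRunMs[t.table];
      return last === undefined || nowMs - last >= t.intervalSeconds * 1000;
    })
    .map((t) => t.table);
}

// What a scheduled() tick would decide to run: nothing when config is unset.
function onScheduled(
  raw: string | undefined, lastRunMs: Record<string, number>, nowMs: number,
): string[] {
  const config = parseConfig(raw);
  return config === null ? [] : dueTables(config, lastRunMs, nowMs);
}
```

With a table configured at `intervalSeconds: 300`, four out of five one-minute ticks would skip it, which is why only the smallest interval needs to appear in the cron expression.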