Multi-tenant refactor + Network integration (8 commits)#10

Merged
keithfawcett merged 11 commits into main from multi-tenant on Apr 27, 2026


@keithfawcett

Summary

The full multi-tenant refactor + the integration with the new openpartner-network repo.

Multi-tenant (7 commits):

  • feat(db): multi-tenant foundation + RLS — Tenant + tenantId on every data table; FORCE RLS policies + openpartner_app role.
  • feat(api): tenancy middleware + connection split — privileged `db` (admin pool) + `appDb` (app pool, RLS-engaged).
  • feat(tenancy): add tenantOf(req) helper — route ergonomics.
  • feat(api): route + helper refactor — every handler uses `tenantOf(req)`; helpers (auth-sessions, auth.resolvePrincipal, attribution, payouts, usage-billing, webhook-dispatcher, mailer, mail-settings, config) take `(db, tenantId, ...)`.
  • feat(api): public /signup + test seed tenantId fixes.
  • chore(ops): multi-tenant env + docker + DO + docs.
  • fix(tenancy): bypass RLS on privileged db, commit trx pre-response, add isolation tests — fixed two real bugs (privileged-pool RLS gating, trx-commit-after-response race) and added 9 isolation tests.

Network integration (3 commits):

  • feat(network): creator self-signup + vendor↔Network protocol — new POST /partner-signup; network-client.ts (NetworkOutbox, push/upsert/revoke, backfill, drainOutbox); /config/network settings + backfill endpoint; partners.ts wired to push on admin-create + revoke; NETWORK_FEDERATION constants; openpartner-network repo's protocol layer + 9 round-trip tests.
  • feat(network): vendor-side onboarding integration — signupWithNetwork + completeNetworkConnect helpers; POST /config/network/start-connect + /complete-connect; hosted /signup auto-registers when NETWORK_URL is set.
  • feat(portal): vendor admin Network UI — Connection, Offerings, Requests pages + complete-connect callback. NetworkProxy backend routes that decrypt the vendorToken and proxy to Network.

Test plan

  • `pnpm typecheck` clean
  • 82/82 vendor-side tests pass (against docker-compose postgres on :5433)
  • 49/49 network-side tests pass (in the openpartner-network repo, against :5435)
  • Vendor portal builds clean (318 KB JS, 92 KB gzip)
  • Manual: spin up a fresh tenant via `/signup`, confirm magic-link onboarding works end-to-end with the deployed Network
  • Manual: verify partner-signup → admin approve → Network upsert flow
  • Manual: connect a self-host instance to the Network via Settings → Network → Connect

Keith Fawcett and others added 11 commits April 26, 2026 01:27
Three new migrations and matching type updates lay the groundwork for
multi-tenant deployments while keeping single-tenant self-host working
identically (just with tenantId='default' baked in).

20260507000000_multi_tenant
  - New Tenant table; seeded 'default' tenant.
  - tenantId column on every data table (Partner, Campaign, Link, Click,
    Identity, Event, Attribution, Commission, Payout, ApiKey, Config,
    Admin, MagicLinkToken, Session, WebhookEndpoint, WebhookDelivery).
  - Backfill existing rows to the default tenant.
  - Re-scope unique constraints to be per-tenant: Partner.email,
    Admin.email, Link.linkKey, Config.(key→tenantId,key).

20260507010000_rls_policies
  - PlatformAdmin table (cross-tenant Coherence support staff).
  - RLS ENABLE + FORCE on every tenanted table.
  - Policy: row visible iff tenantId matches `app.tenant_id` GUC OR
    `app.platform_admin` GUC = 'on'.
  - Tenant table: row visible iff its id matches app.tenant_id (or
    platform admin). Same for PlatformAdmin.
  - Policies use COALESCE / current_setting(..., true) so an unset GUC
    returns 0 rows (default deny) instead of erroring.
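The policy semantics can be modeled as a pure predicate. The sketch below is a TypeScript model of the behavior described above, not the migration's SQL; `Gucs`, `readSetting`, and `rowVisible` are hypothetical names, and `readSetting` stands in for `current_setting(name, true)`, which returns NULL instead of raising an error when a setting is unset.

```typescript
// Session settings visible to the policy; a missing key means "never set".
type Gucs = Map<string, string>;

// Models current_setting(name, true): NULL instead of an error when unset.
function readSetting(gucs: Gucs, name: string): string | null {
  return gucs.get(name) ?? null;
}

// Row visible iff its tenantId equals app.tenant_id OR app.platform_admin = 'on'.
// Both comparisons are false when the GUC is unset, so the default is deny.
function rowVisible(rowTenantId: string, gucs: Gucs): boolean {
  const tenant = readSetting(gucs, "app.tenant_id");
  const platformAdmin = readSetting(gucs, "app.platform_admin");
  return (tenant !== null && tenant === rowTenantId) || platformAdmin === "on";
}
```

For the Tenant table the same predicate applies with the row's own `id` in place of `tenantId`.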

20260507020000_app_role
  - Provisions a non-superuser openpartner_app role from
    OPENPARTNER_APP_DB_PASSWORD. Postgres bypasses RLS for superusers
    and BYPASSRLS roles regardless of FORCE, so RLS only protects when
    the app connects as a constrained role.
  - Grants DML (no DDL) on every tenanted table.
  - Idempotent: rotates password if the role already exists.
  - Skipped (with notice) when OPENPARTNER_APP_DB_PASSWORD is unset —
    self-host installs that don't need RLS isolation can run the app as
    the same role as migrations.
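Provisioning reduces to a create-or-rotate branch followed by DML-only grants. A minimal sketch, assuming the caller has already queried `pg_roles` to learn whether the role exists; `appRoleStatements`, `sqlLiteral`, and the shortened `TENANTED_TABLES` list are illustrative rather than the migration's actual code.

```typescript
// Abbreviated for illustration; the migration covers every tenanted table.
const TENANTED_TABLES = ["Partner", "Campaign", "Link", "Click"];

// SQL string literal with embedded single quotes doubled
// (assumes standard_conforming_strings = on, the PostgreSQL default).
function sqlLiteral(value: string): string {
  return "'" + value.replace(/'/g, "''") + "'";
}

// Statements to run, or null when provisioning is skipped because no
// password was configured (self-host without RLS isolation).
function appRoleStatements(
  password: string | undefined,
  roleExists: boolean,
): string[] | null {
  if (!password) return null;
  const pw = sqlLiteral(password);
  const statements = [
    roleExists
      ? `ALTER ROLE openpartner_app WITH LOGIN PASSWORD ${pw}` // rotate in place
      : `CREATE ROLE openpartner_app WITH LOGIN NOSUPERUSER NOBYPASSRLS PASSWORD ${pw}`,
  ];
  for (const table of TENANTED_TABLES) {
    // DML only: the role gets no ownership or DDL rights on these tables.
    statements.push(`GRANT SELECT, INSERT, UPDATE, DELETE ON "${table}" TO openpartner_app`);
  }
  return statements;
}
```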

Migration runner sets `row_security = off` at session start so DDL
runs unrestricted.

Verified: connecting as openpartner_app, queries return 0 rows when
app.tenant_id is unset or mismatched, and only the in-scope tenant's
rows when set correctly. Platform-admin override works.

Types: every Row interface gained `tenantId: string`; new TenantRow,
PlatformAdminRow types and DEFAULT_TENANT_ID constant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two knex instances now:
  db (admin pool, DATABASE_URL)
    - migrations, signup, platform-admin tooling, jobs that need
      cross-tenant access
    - bypasses RLS (superuser/owner role)

  appDb (app pool, DATABASE_URL_APP if set, else DATABASE_URL)
    - normal request handling. When pointed at the openpartner_app role
      every query is subject to RLS.
    - per-request transaction in tenancy middleware sets
      `app.tenant_id` (and optionally `app.platform_admin = 'on'`)
      so RLS policies match correctly.
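Choosing the two connection strings is a small fallback rule. A sketch assuming the environment variable names given above; `resolvePoolUrls` is a hypothetical helper, and the knex construction itself is omitted.

```typescript
interface PoolUrls {
  admin: string; // privileged pool: migrations, signup, cross-tenant jobs
  app: string;   // request pool: RLS-constrained when it is the openpartner_app role
}

function resolvePoolUrls(env: Record<string, string | undefined>): PoolUrls {
  const admin = env.DATABASE_URL;
  if (!admin) throw new Error("DATABASE_URL is required");
  // Without DATABASE_URL_APP, appDb connects as the same role as db.
  const app = env.DATABASE_URL_APP || admin;
  return { admin, app };
}
```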

OPENPARTNER_TENANCY env (defaults 'single'):
  single  — every request runs as tenantId = DEFAULT_TENANT_ID. Self-host.
  multi   — path-based tenant resolution (/t/<slug>/...). Reserved
            slugs (www, api, app, signup, etc.) reject.
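Request-to-tenant resolution for the two modes can be sketched as follows, assuming a `/t/<slug>/...` URL shape and a lookup callback that maps a slug to a tenant id; `resolveTenantId` is hypothetical and `RESERVED_SLUGS` is abbreviated to the examples named above.

```typescript
type Tenancy = "single" | "multi";

const DEFAULT_TENANT_ID = "default";
const RESERVED_SLUGS = new Set(["www", "api", "app", "signup"]); // abbreviated

// Returns the tenant id for a request path, or null if it must be rejected.
function resolveTenantId(
  mode: Tenancy,
  path: string,
  lookupTenantId: (slug: string) => string | undefined,
): string | null {
  if (mode === "single") return DEFAULT_TENANT_ID; // self-host: one tenant
  const match = /^\/t\/([^/]+)(?:\/|$)/.exec(path);
  if (!match) return null; // no /t/<slug> prefix
  const slug = match[1];
  if (RESERVED_SLUGS.has(slug)) return null;
  return lookupTenantId(slug) ?? null; // unknown slug: reject
}
```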

tenantMiddleware:
  - resolves tenantId for the request
  - opens a transaction on appDb
  - stamps req.db, req.tenantId, req.tenantSlug
  - awaits response finish before committing/rolling back so handler
    queries land in the right transaction context.

Routes will switch from `db('Partner')...` to `req.db('Partner')...`
and add `tenantId: req.tenantId` to inserts. That refactor is the next
commit; this one just lays the wiring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Architecture decisions, what's committed, file-by-file refactor plan,
test fixup plan, and how to resume. Read this first before continuing
the multi-tenant work on this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Section A + B + C + E of the multi-tenant refactor: every route handler
now uses tenantOf(req) for a per-request transaction with app.tenant_id
pinned. Helpers (auth-sessions, auth.resolvePrincipal, config, mail-
settings, mailer, attribution, payouts, usage-billing, webhook-dispatcher)
take Knex + tenantId as parameters. tenantMiddleware is mounted in
app.ts; install + metrics stay public above it. Stripe webhook resolves
tenantId from event metadata and runs each event in appDb.transaction
with SET LOCAL app.tenant_id. Scheduler iterates active tenants per
tick. Typecheck passes.
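A helper like `tenantOf(req)` is mainly useful because it stops handlers from silently falling back to an unscoped connection. A hedged sketch; the request shape and the throw-on-missing behavior are assumptions, since the commit does not show the helper's body.

```typescript
// Fields tenantMiddleware stamps onto the request (hypothetical typing).
interface TenantScoped<Trx> {
  db?: Trx;
  tenantId?: string;
  tenantSlug?: string;
}

// Returns the per-request transaction and tenantId, failing loudly when a
// route is mounted outside tenantMiddleware.
function tenantOf<Trx>(req: TenantScoped<Trx>): { db: Trx; tenantId: string } {
  if (req.db === undefined || req.tenantId === undefined) {
    throw new Error("tenantOf(req) used on a route that is not behind tenantMiddleware");
  }
  return { db: req.db, tenantId: req.tenantId };
}
```

A handler would then write `const { db, tenantId } = tenantOf(req);` and stamp `tenantId` on inserts, for example `db('Partner').insert({ ...row, tenantId })`.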

What this leaves: section D (public /signup), F (test seed updates so
35 of 64 currently-failing tests go green), G (multi-tenant isolation
tests), H (env config + ops). Documented in docs/multi-tenant-refactor.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Section D + F of the multi-tenant refactor.

D — POST /signup creates a Tenant + first Admin and emails an activation
magic link. Public, IP rate-limited (10/min), gated by slug validation
(/^[a-z0-9-]{3,30}$/, not in RESERVED_SLUGS, not already taken). Mounted
before tenantMiddleware in app.ts and uses the privileged db. Multi-mode
only — single-mode operators use /install.
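The slug gate described above is three checks in order. A sketch assuming an `isTaken` callback backed by a Tenant lookup (synchronous here for brevity); `validateSignupSlug` and its error codes are hypothetical, and `RESERVED_SLUGS` is abbreviated.

```typescript
const SLUG_PATTERN = /^[a-z0-9-]{3,30}$/;
const RESERVED_SLUGS = new Set(["www", "api", "app", "signup"]); // abbreviated

type SlugCheck =
  | { ok: true }
  | { ok: false; error: "invalid_format" | "reserved" | "taken" };

function validateSignupSlug(slug: string, isTaken: (slug: string) => boolean): SlugCheck {
  if (!SLUG_PATTERN.test(slug)) return { ok: false, error: "invalid_format" };
  if (RESERVED_SLUGS.has(slug)) return { ok: false, error: "reserved" };
  // Checked last: it is the only step that needs a database round trip.
  if (isTaken(slug)) return { ok: false, error: "taken" };
  return { ok: true };
}
```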

F — every direct db().insert() in integration.test.ts, regressions.test.ts,
stripe-webhook.test.ts, and webhooks.test.ts now stamps tenantId:
DEFAULT_TENANT_ID. Test setups force OPENPARTNER_TENANCY=single. Cannot
verify against a live Postgres in this session; flagged as DONE BUT NOT
VALIDATED in docs/multi-tenant-refactor.md so the next pass runs the
suite first.

Handoff doc updated with current branch state and remaining work
(sections G, H + test validation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Section H of the multi-tenant refactor.

- .env.example: OPENPARTNER_TENANCY, OPENPARTNER_APP_DB_PASSWORD,
  DATABASE_URL_APP with explanatory comments.
- docker-compose.yml: mount docker/initdb so postgres provisions the
  openpartner_app role on first boot. Role is NOLOGIN if no password
  set so RLS isolation can still be exercised via SET ROLE in tests.
- .do/app.yaml: add OPENPARTNER_TENANCY=multi (default for hosted),
  DATABASE_URL_APP + OPENPARTNER_APP_DB_PASSWORD secrets on the api
  component.
- docs/deploy-production.md: rows for the new secrets in the env
  table; new "Multi-tenant rollout" subsection covering URL routing,
  signup, RLS engagement, Stripe webhook tenant resolution, reserved
  slugs, and the migration path from single-tenant.

The route, helper, signup, and stripe-webhook refactors plus this
ops layer make the multi-tenant branch deployable. What's left in
docs/multi-tenant-refactor.md is section G (live-Postgres isolation
tests) — needs a real DB to write meaningfully.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…dd isolation tests

Section G of the multi-tenant refactor + two real bugs the existing
suite surfaced once it ran against a real Postgres.

Bug 1: privileged db was subject to FORCE RLS. The migration role
owns the tenanted tables but FORCE RLS still gates the owner unless
row_security is explicitly off or app.tenant_id is set. Without
either, /metrics, /signup, the stripe-webhook tenant resolver, the
scheduler, and every test's direct cleanup query silently saw zero
rows. Fixed by adding bypassRls: true to createDb (sets row_security
= off in afterCreate) and turning it on for the privileged pool. The
appDb (tenant pool) keeps RLS engaged.
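Knex lets a pool run setup SQL on every new connection through its `pool.afterCreate(conn, done)` hook, which is one way to implement a `bypassRls` option. The sketch below builds such a config with a stand-in connection type so it runs without knex or pg installed; `createDbConfig` is hypothetical. One caveat worth checking against the deployment: the PostgreSQL documentation describes `row_security = off` as making queries that would be filtered by a policy fail with an error, and as having no effect for superusers and BYPASSRLS roles, so the privileged role's attributes determine what this setting actually does.

```typescript
// Minimal stand-in for the pg client object knex passes to pool.afterCreate.
interface PgConnection {
  query(sql: string, cb: (err: Error | null) => void): void;
}

type AfterCreate = (
  conn: PgConnection,
  done: (err: Error | null, conn: PgConnection) => void,
) => void;

interface DbConfig {
  client: "pg";
  connection: string;
  pool: { min: number; max: number; afterCreate?: AfterCreate };
}

function createDbConfig(url: string, opts: { bypassRls?: boolean } = {}): DbConfig {
  const config: DbConfig = { client: "pg", connection: url, pool: { min: 0, max: 10 } };
  if (opts.bypassRls) {
    // Runs once per new physical connection, before the pool hands it out.
    config.pool.afterCreate = (conn, done) => {
      conn.query("SET row_security = off", (err) => done(err, conn));
    };
  }
  return config;
}
```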

Bug 2: tenantMiddleware committed the per-request transaction on
res.on('finish'), which fires AFTER the response is sent. Tests doing
`await request(app).post(...)` then `await db(...).insert(...)` raced
the commit and got FK violations because the route's writes weren't
yet visible. Fixed by patching res.json/send/end so the trx commits
(or rolls back on 5xx) before any byte goes out. Belt-and-suspenders
res.on('close') still rolls back if the patched methods are bypassed.
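The fix amounts to intercepting the response methods so that bytes are written only after the transaction has settled. A minimal sketch using stand-in `Res` and `Trx` interfaces rather than the Express and knex types; `commitBeforeSend` and its `onError` callback are hypothetical, and a real version would also need to turn a failed commit into an error response.

```typescript
// Stand-ins for the parts of an Express response and a knex transaction used here.
interface Trx {
  commit(): Promise<void>;
  rollback(): Promise<void>;
}
interface Res {
  statusCode: number;
  json(...args: unknown[]): Res;
  send(...args: unknown[]): Res;
  end(...args: unknown[]): Res;
  on(event: "close", listener: () => void): Res;
}

function commitBeforeSend(res: Res, trx: Trx, onError: (err: unknown) => void): void {
  let settled = false;

  // Commit, or roll back on a 5xx, exactly once.
  const settle = (): Promise<void> => {
    if (settled) return Promise.resolve();
    settled = true;
    return res.statusCode >= 500 ? trx.rollback() : trx.commit();
  };

  for (const method of ["json", "send", "end"] as const) {
    const original = res[method];
    res[method] = (...args: unknown[]): Res => {
      // Defer the real write until the transaction is settled.
      settle().then(() => {
        original.apply(res, args);
      }, onError);
      return res;
    };
  }

  // Backstop: if the connection closes before any patched method ran, roll back.
  res.on("close", () => {
    if (settled) return;
    settled = true;
    trx.rollback().catch(onError);
  });
}
```

Because the settle step is idempotent, Express's internal `json` → `send` → `end` chain only commits once and then passes straight through.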

Section G: apps/api/src/__tests__/multi-tenant.test.ts — 9 tests
that connect as openpartner_app via SET ROLE inside a privileged-pool
transaction (so RLS engages because openpartner_app has neither
BYPASSRLS nor superuser). Covers default deny, per-tenant visibility,
WITH CHECK rejection on cross-tenant inserts, platform_admin override,
session isolation, and the Tenant table self-policy. Suite skips
cleanly with a warning if the openpartner_app role isn't provisioned.

Stripe webhook tenant resolution: customer/invoice/charge events that
don't carry our metadata now fall back to a local Identity → Click
lookup so checkout-stitched customers still route to the right tenant
on subsequent invoice.paid / charge.refunded.
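The resolution order is metadata first, then a local lookup keyed by the Stripe customer. A sketch assuming a hypothetical `tenantId` metadata key and a `tenantForCustomer` callback that performs the Identity → Click lookup; neither name comes from the codebase.

```typescript
// Only the fields this sketch reads from a Stripe event payload.
interface StripeEventLike {
  type: string;
  data: {
    object: {
      object: string; // Stripe object type, e.g. "customer", "invoice", "charge"
      id: string;
      customer?: string | null;
      metadata?: Record<string, string>;
    };
  };
}

// Hypothetical key; the commit only says events may carry "our metadata".
const TENANT_METADATA_KEY = "tenantId";

function resolveStripeTenant(
  event: StripeEventLike,
  tenantForCustomer: (customerId: string) => string | undefined,
): string | undefined {
  const obj = event.data.object;
  const fromMetadata = obj.metadata?.[TENANT_METADATA_KEY];
  if (fromMetadata) return fromMetadata;
  // Fallback: the customer stitched at checkout (Identity -> Click -> tenantId).
  const customerId = obj.object === "customer" ? obj.id : obj.customer;
  return customerId ? tenantForCustomer(customerId) : undefined;
}
```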

Result: 73/73 tests pass against the docker-compose postgres.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three pieces, designed so the same code paths cover hosted multi-tenant
and self-host:

1. Public POST /partner-signup (apps/api/src/routes/partner-signup.ts).
   Tenant-scoped, IP rate-limited, creates a Partner row + magic link.
   Honors a per-tenant partner_signup config (auto_approve vs
   require_review, with disabled override). On hosted multi-tenant the
   URL is /t/<slug>/partner-signup; on self-host it's /partner-signup.

2. Vendor-side Network client (apps/api/src/network-client.ts) +
   NetworkOutbox migration. Fire-and-forget POSTs to /partners/upsert
   on creator events (signup, admin invite, revoke); failures persist
   to the outbox and the scheduler drains them every 5 min with
   exponential backoff (~24h max). vendorToken stored AES-GCM
   encrypted in Config (network_membership), never returned by GET
   /config/network. backfillPartners(...) reconciles a vendor's
   existing roster when they enable Network membership later — the
   Network dedups on email and returns alreadyExisted=true for
   creators who joined another vendor first.

3. Network protocol spec (docs/network-protocol.md). Defines the
   /vendors/register, /partners/upsert, /vendors/backfill-partners,
   and /vendors/me/heartbeat surface that openpartner-network
   implements. Spells out the identity model (vendorId,
   vendorPartnerId, networkCreatorId), auth rotation, and the
   late-join reconciliation behavior.
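AES-GCM storage of the vendor token can be done with Node's built-in `crypto` module. A minimal sketch; the `iv | authTag | ciphertext` base64 layout, the function names, and where the 32-byte key comes from are assumptions, not details from the commit.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Serialized as base64(iv | authTag | ciphertext).
function encryptToken(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, the recommended size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16 bytes by default
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptToken(encoded: string, key: Buffer): string {
  const raw = Buffer.from(encoded, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if the stored value was tampered with
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Only the encrypted string would be persisted in `network_membership`; a GET handler can then omit the field entirely rather than decrypting it.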

Wired into existing flows:
- POST /partners (admin invite) + /partners/:id/revoke push to Network
  when membership is enabled. autoEnroll gates new-partner upserts;
  revokes mirror unconditionally so a Network-known creator stops
  being matched after the vendor cuts them off.
- Settings router exposes GET/POST /config/network,
  POST /config/network/backfill, and GET/POST /config/partner-signup.
- Scheduler runs network-outbox-drain every 5 min per active tenant.
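The outbox retry policy can be expressed as a few pure functions. A sketch assuming a 5-minute base delay that doubles per failure and a 24-hour give-up window measured from the first failure; the commit only states exponential backoff with a ~24h cap, so `OutboxRow`, `afterFailure`, `dueRows`, and the constants are illustrative.

```typescript
const BASE_DELAY_MS = 5 * 60 * 1000;    // assumed: one scheduler tick
const MAX_AGE_MS = 24 * 60 * 60 * 1000; // stop retrying after ~24h

interface OutboxRow {
  attempts: number;      // failed deliveries so far
  firstFailedAt: number; // epoch ms
  nextAttemptAt: number; // epoch ms
}

// 5m, 10m, 20m, 40m, ... for attempts 1, 2, 3, 4, ...
function backoffDelayMs(attempts: number): number {
  return BASE_DELAY_MS * 2 ** (attempts - 1);
}

// Record another failed POST; "dead" means the next retry would fall
// outside the 24h window and the row should be abandoned.
function afterFailure(row: OutboxRow, now: number): OutboxRow | "dead" {
  const attempts = row.attempts + 1;
  const nextAttemptAt = now + backoffDelayMs(attempts);
  if (nextAttemptAt - row.firstFailedAt > MAX_AGE_MS) return "dead";
  return { ...row, attempts, nextAttemptAt };
}

// Rows the drain should retry on this tick. Since the drain only runs every
// 5 minutes, retries land on the first tick at or after nextAttemptAt.
function dueRows(rows: OutboxRow[], now: number): OutboxRow[] {
  return rows.filter((row) => row.nextAttemptAt <= now);
}
```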

Tests (apps/api/src/__tests__/network-and-signup.test.ts, 9 cases)
spin up an in-process HTTP receiver to act as the Network and verify:
signup without Network is silent; with Network on stamps
networkCreatorId on Partner.metadata.network; with Network down
enqueues outbox; drain retries succeed; require_review still pushes
status=pending; admin invite + revoke push; late-join backfill flips
preExisting=true for emails the Network already knew; GET
/config/network never leaks the vendor token.

82/82 tests pass against the docker-compose postgres.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Wires the openpartner vendor side to the openpartner-network
self-serve onboarding flow.

network-client.ts: signupWithNetwork() POSTs to /vendors/signup;
completeNetworkConnect() POSTs to /vendors/verify-and-issue-token.
Failures surface immediately to the admin (no outbox queueing — a
failed signup is something the admin retries by hand).

routes/settings.ts: POST /config/network/start-connect mints a fresh
scoped key with NETWORK_FEDERATION_SCOPES, calls signupWithNetwork
with inferred instanceUrl + portalCallbackUrl, stashes partial state
in network_membership Config (enabled=false until verify lands).
POST /config/network/complete-connect consumes the magic-link ntoken,
calls Network /vendors/verify-and-issue-token, saves the returned
vendorToken with enabled=true. Same shape works for hosted multi-
tenant tenants (slug-aware URL inference) and self-host (request host).
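URL inference for the two deployment shapes might look like the following. This assumes hosted tenants serve both the API and the portal under the `/t/<slug>` prefix and that the portal callback lives at `/admin/network/complete`; both that path and `inferConnectUrls` are hypothetical.

```typescript
type Tenancy = "single" | "multi";

interface ConnectUrls {
  instanceUrl: string;       // where the Network reaches this vendor's API
  portalCallbackUrl: string; // where the onboarding email's ?ntoken= link lands
}

const CALLBACK_PATH = "/admin/network/complete"; // hypothetical route

function inferConnectUrls(opts: {
  mode: Tenancy;
  protocol: string; // e.g. req.protocol behind a trusted proxy
  host: string;     // e.g. the request Host header
  slug?: string;    // tenant slug in multi mode
}): ConnectUrls {
  if (opts.mode === "multi" && !opts.slug) throw new Error("multi mode needs a tenant slug");
  const origin = `${opts.protocol}://${opts.host}`;
  const base = opts.mode === "multi" ? `${origin}/t/${opts.slug}` : origin;
  return { instanceUrl: base, portalCallbackUrl: base + CALLBACK_PATH };
}
```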

routes/signup.ts: hosted multi-tenant signup auto-calls
signupWithNetwork after Tenant/Admin creation when NETWORK_URL env
is set. Best-effort: a Network outage doesn't fail the signup; the
admin can finish later via Settings → Network → Connect button.
Returns network: { status, vendorId } in the signup response so the
portal can show the right next-step UI.
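The best-effort behavior is a try/catch that turns a Network failure into a status instead of an exception. A sketch; the status values and the injected `signup` function are assumptions, since the commit only specifies that the response carries `network: { status, vendorId }`.

```typescript
type NetworkSignupResult =
  | { status: "skipped"; vendorId: null }   // NETWORK_URL unset
  | { status: "pending"; vendorId: string } // accepted, awaiting the magic link
  | { status: "failed"; vendorId: null };   // Network unreachable; finish via Settings

async function bestEffortNetworkSignup(
  networkUrl: string | undefined,
  signup: (url: string) => Promise<{ vendorId: string }>,
  log: (msg: string) => void,
): Promise<NetworkSignupResult> {
  if (!networkUrl) return { status: "skipped", vendorId: null };
  try {
    const { vendorId } = await signup(networkUrl);
    return { status: "pending", vendorId };
  } catch (err) {
    // Never fail the tenant signup because the Network is down.
    log(`network signup failed: ${err instanceof Error ? err.message : String(err)}`);
    return { status: "failed", vendorId: null };
  }
}
```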

.env.example: NETWORK_URL added with explanatory comment.

82/82 vendor-side tests still pass (no regressions; the new endpoints
are additive).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes the gap where vendors had backend wiring for Network membership
but no UI to use it. Without this, Network was invisible to vendor
admins on hosted multi-tenant + self-host.

Backend (apps/api):
- network-client.ts: NetworkProxyError + networkProxy.{listOfferings,
  createOffering, updateOffering, deleteOffering, listRequests,
  approveRequest, rejectRequest, whoami}. Decrypts the vendor token
  from network_membership Config and proxies to Network endpoints
  with the right bearer.
- routes/settings.ts: /admin/network/{me,offerings,offerings/:id,
  requests,requests/:id/approve,requests/:id/reject}. Each is a thin
  wrapper around networkProxy.* that turns NetworkProxyError into
  the appropriate HTTP status. Required because the vendorToken is
  a server-side secret — the portal can't hold it.
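Turning proxy failures into HTTP statuses is a small mapping. The sketch below is one plausible mapping, not the one in routes/settings.ts: the `kind` and `upstreamStatus` fields and the chosen status codes are assumptions. Upstream 401/403 map to 502 here because the rejected credential is the server-held vendorToken, not anything the portal user sent.

```typescript
type ProxyFailure = "not_connected" | "unreachable" | "upstream";

class NetworkProxyError extends Error {
  readonly kind: ProxyFailure;
  readonly upstreamStatus: number | null;

  constructor(kind: ProxyFailure, message: string, upstreamStatus: number | null = null) {
    super(message);
    this.name = "NetworkProxyError";
    this.kind = kind;
    this.upstreamStatus = upstreamStatus;
    // Keep instanceof working when compiling to ES5 targets.
    Object.setPrototypeOf(this, NetworkProxyError.prototype);
  }
}

function httpStatusFor(err: unknown): number {
  if (!(err instanceof NetworkProxyError)) return 500;
  switch (err.kind) {
    case "not_connected":
      return 409; // membership not set up yet
    case "unreachable":
      return 502;
    case "upstream": {
      const s = err.upstreamStatus ?? 502;
      if (s === 401 || s === 403) return 502; // server-side credential problem
      // Pass other client errors (404, 422, ...) through; upstream 5xx becomes 502.
      return s >= 400 && s < 500 ? s : 502;
    }
    default:
      return 500;
  }
}
```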

Portal (apps/portal):
- pages/admin/Network.tsx: connection status, contact-email/display-
  name form for the Connect button, autoEnroll toggle, backfill
  panel for late-join reconciliation.
- pages/admin/NetworkComplete.tsx: handles ?ntoken= callback from
  the Network onboarding email; calls /config/network/complete-connect,
  redirects to /admin/network on success. StrictMode-safe (one-shot
  guard).
- pages/admin/NetworkOfferings.tsx: list + create + publish/unpublish
  + delete. Campaign dropdown pulls from /campaigns. Form fields:
  title, description, productUrl, campaign, commission summary,
  cookie window.
- pages/admin/NetworkRequests.tsx: pending requests list with creator
  bio + pitch; approve dispatches federation (creates Partner +
  Link on this instance); reject + status filter (pending /
  approved / rejected / cancelled).
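The one-shot guard matters because React's StrictMode intentionally runs effects twice in development, so without a guard the complete-connect call would fire twice for one `ntoken`. A framework-free sketch of such a guard; in the page it would presumably be held in a `useRef` so both effect runs share it. `oneShot` is a hypothetical name.

```typescript
// Wraps an async action so it runs at most once; later calls get the same promise.
function oneShot<T>(action: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (inFlight === null) inFlight = action();
    return inFlight;
  };
}
```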

Wired into App.tsx routes + a new "Network" sidebar section
(Connection, Offerings, Requests).

Typecheck passes; portal builds (318 KB JS, 92 KB gzip).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@keithfawcett keithfawcett merged commit d895e65 into main Apr 27, 2026
3 of 5 checks passed