
feat: adopt dedicated PostgreSQL schema for Recoco internal tracking tables #94

Merged
bashandbone merged 3 commits into main from copilot/adopt-db-schema-for-recoco on Mar 16, 2026

Conversation


Copilot AI commented Mar 13, 2026

Syncs the upstream CocoIndex architecture change (#1459): all Recoco-internal tables (cocoindex_setup_metadata, <flow>__cocoindex_tracking, <flow>__cocoindex_srcstate) can now be isolated in a dedicated PostgreSQL schema, keeping them out of user-facing schemas.

Changes

Settings struct (settings.rs)

  • Added db_schema_name: Option<String> — when set, all internal tables are placed in that schema; defaults to None (existing behavior, uses connection default schema)

Global schema store (lib_context.rs)

  • Added INTERNAL_DB_SCHEMA (LazyLock<RwLock<Option<String>>>) — stores the configured schema at init time for synchronous access throughout DB helpers
  • Added get_internal_db_schema() -> Option<String> public accessor
  • create_lib_context persists the schema; clear_lib_context clears it (test isolation)
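
The global store described above can be sketched with the standard library alone. This is a minimal illustration, not the code in lib_context.rs: `set_internal_db_schema` and `clear_internal_db_schema` are hypothetical names standing in for what `create_lib_context` and `clear_lib_context` are described as doing.

```rust
use std::sync::{LazyLock, RwLock};

// Process-wide slot for the configured schema name (None = connection default schema).
static INTERNAL_DB_SCHEMA: LazyLock<RwLock<Option<String>>> =
    LazyLock::new(|| RwLock::new(None));

/// Hypothetical setter: stores the schema at init time.
pub fn set_internal_db_schema(schema: Option<String>) {
    *INTERNAL_DB_SCHEMA.write().unwrap() = schema;
}

/// Synchronous accessor; returns a clone so callers never hold the lock.
pub fn get_internal_db_schema() -> Option<String> {
    INTERNAL_DB_SCHEMA.read().unwrap().clone()
}

/// Hypothetical reset: clears the slot so tests do not leak state into each other.
pub fn clear_internal_db_schema() {
    *INTERNAL_DB_SCHEMA.write().unwrap() = None;
}
```

Returning an owned `Option<String>` keeps the accessor usable from synchronous SQL-building helpers without passing the settings through every call.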

Schema qualification helpers

  • qualify_table_name_with_schema(table: &str) -> String in db_tracking_setup.rs — returns schema.table or plain table depending on config
  • get_setup_metadata_table_name() -> String in setup/db_metadata.rs — same for the metadata table
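
A rough sketch of the qualification rule, assuming the schema is passed in explicitly rather than read from the global store (the real helper takes only the table name). The function name `qualify_table_name` and the flow name `myflow` are illustrative only.

```rust
/// Returns `schema.table` when a schema is configured, otherwise the bare table name.
/// Assumption for this sketch: the schema is an explicit argument, which keeps the
/// function pure and easy to test.
pub fn qualify_table_name(schema: Option<&str>, table: &str) -> String {
    match schema {
        Some(s) => format!("{s}.{table}"),
        None => table.to_string(),
    }
}
```

With `Some("recoco_state")` and `cocoindex_setup_metadata` this yields `recoco_state.cocoindex_setup_metadata`; with `None` the table name passes through unchanged, which matches the unchanged default behavior.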

SQL operations (db_tracking.rs, db_tracking_setup.rs, setup/db_metadata.rs)

  • All table CREATE/INSERT/SELECT/UPDATE/DELETE/DROP/RENAME operations updated to use schema-qualified names
  • MetadataTableSetup::apply_change issues CREATE SCHEMA IF NOT EXISTS before table creation when a schema is configured
  • Fixed schema-aware existence check in read_setup_metadata — parameterized schemaname/tablename query instead of hardcoded schemaname = 'public'
  • Table RENAME: source is schema-qualified; destination uses only the sanitized table name (PostgreSQL syntax requirement)
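
The RENAME rule can be illustrated with a small SQL builder. This is a sketch under assumptions: `rename_table_sql` is a hypothetical name, and the table names in the usage are made up. PostgreSQL's `ALTER TABLE ... RENAME TO` accepts a schema-qualified source but only a bare new name, because a rename cannot move a table between schemas.

```rust
/// Builds an ALTER TABLE ... RENAME TO statement.
/// The source is schema-qualified when a schema is configured; the destination is
/// always the bare table name, as PostgreSQL requires.
pub fn rename_table_sql(schema: Option<&str>, old_table: &str, new_table: &str) -> String {
    let source = match schema {
        Some(s) => format!("{s}.{old_table}"),
        None => old_table.to_string(),
    };
    format!("ALTER TABLE {source} RENAME TO {new_table}")
}
```

The renamed table stays in the schema of the source, so no qualification of the destination is needed.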

Usage

use recoco::settings::{Settings, DatabaseConnectionSpec};

let settings = Settings {
    database: Some(DatabaseConnectionSpec {
        url: "******localhost:5432/mydb".to_string(),
        ..
    }),
    // Internal tables go to `recoco_state` schema; schema is auto-created
    db_schema_name: Some("recoco_state".to_string()),
    ..Default::default()
};
recoco::lib_context::init_lib_context(Some(settings)).await?;

When db_schema_name is None (default), behavior is unchanged — tables land in the connection's default schema.

Documentation (site/src/content/docs/reference/configuration.md)

  • Added db_schema_name to the settings reference with example and behavioral notes

Warning

Firewall rules blocked me from connecting to one or more addresses:

  • test (dns block), triggered three times while running the recoco_core test binary (/home/REDACTED/work/recoco/recoco/target/debug/deps/recoco_core-e9addf7bc802a9c3)

Original prompt

This section details the original issue you should resolve

<issue_title>[Upstream-Sync] [ARCHITECTURE] Adopt dedicated DB schema for Recoco internal tracking tables (upstream PR #1459)</issue_title>
<issue_description>### Summary

Adopt the upstream architecture change from cocoindex-io/cocoindex#1459: introduce a dedicated DB schema (namespace) for Recoco internal tracking tables, following CocoIndex upstream. This cleanly separates Recoco's bookkeeping tables from user/application tables for improved maintainability, multi-tenant safety, and DB hygiene.

Motivation

  • Upstream now uses a specific DB schema for all internal state tables; this keeps tracking/metadata tables (used for state, setup, etc.) out of user-facing schemas.
  • Benefits: avoids name collisions, simplifies migrations/backups, supports multi-tenant and shared DB environments, respects user DB boundaries.

What changes upstream?

  • PR cocoindex-io/cocoindex#1459: Internal tables moved into a named schema, with settings/configs to allow overrides.
  • Major files: execution/db_tracking.rs, db_tracking_setup.rs, source_indexer.rs, settings.rs, setup/db_metadata.rs (in Recoco: likely crates/recoco-core/src/execution/ and settings).
  • By default, tracking tables are now created in a dedicated schema such as recoco_state rather than in the default DB schema.

Action for Recoco

  • Audit all state/tracking tables for use of hardcoded or default schema.
  • Update schema creation, queries, and migration logic to use a dedicated internal namespace (e.g., recoco_state).
  • Provide a config option for advanced deployments to override the namespace if needed.
  • Update integration tests to expect the new schema separation.

References

Comments on the Issue (you are @copilot in this section)



…m PR #1459)

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

cloudflare-workers-and-pages bot commented Mar 13, 2026

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Updated (UTC) |
|--------|------|---------------|---------------|
| ❌ Deployment failed | recoco-docs | acf16dd | Mar 13 2026, 09:41 PM |

Copilot AI changed the title from "[WIP] [Upstream-Sync] Adopt dedicated DB schema for Recoco internal tracking tables" to "feat: adopt dedicated PostgreSQL schema for Recoco internal tracking tables" Mar 13, 2026
Copilot AI requested a review from bashandbone March 13, 2026 20:50
@bashandbone bashandbone marked this pull request as ready for review March 13, 2026 21:08
Copilot AI review requested due to automatic review settings March 13, 2026 21:08

Copilot AI left a comment


Pull request overview

Adds an optional dedicated PostgreSQL schema for Recoco/CocoIndex internal tables to keep bookkeeping tables out of user-facing schemas.

Changes:

  • Introduces Settings.db_schema_name and persists it globally for DB helpers.
  • Schema-qualifies internal metadata/tracking table SQL and creates the schema when configured (metadata path).
  • Documents the new configuration option and adds deserialization tests.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

| File | Description |
|------|-------------|
| site/src/content/docs/reference/configuration.md | Documents db_schema_name and shows configuration example. |
| crates/recoco-core/src/setup/db_metadata.rs | Qualifies setup metadata table name and creates the schema before table creation. |
| crates/recoco-core/src/settings.rs | Adds db_schema_name to settings plus JSON deserialization tests. |
| crates/recoco-core/src/lib_context.rs | Stores configured internal schema in a global lock for synchronous access. |
| crates/recoco-core/src/execution/db_tracking_setup.rs | Adds helper to qualify internal table names with schema and updates rename/drop usage. |
| crates/recoco-core/src/execution/db_tracking.rs | Updates tracking table queries to use schema-qualified table names. |


Comment on lines +67 to +72
+ let schema_name = get_internal_db_schema().unwrap_or_else(|| "public".to_string());
  let exists: Option<bool> = sqlx::query_scalar(
- "SELECT EXISTS (SELECT 1 FROM pg_tables WHERE schemaname = 'public' AND tablename = $1)",
+ "SELECT EXISTS (SELECT 1 FROM pg_tables WHERE schemaname = $1 AND tablename = $2)",
  )
- .bind(SETUP_METADATA_TABLE_NAME)
+ .bind(&schema_name)
+ .bind(SETUP_METADATA_TABLE_NAME_UNQUALIFIED)
@@ -38,14 +57,14 @@ async fn upgrade_tracking_table(
existing_version_id: i32,
) -> Result<()> {
if existing_version_id < 1 && desired_state.version_id >= 1 {
Comment on lines +66 to +67
let query = format!(
"CREATE TABLE IF NOT EXISTS {qualified_table_name} (
}),

// Optional: store internal tracking tables in a dedicated schema
// instead of the default public schema

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `db_schema_name` | `Option<String>` | `None` | PostgreSQL schema for internal Recoco tracking/metadata tables. When set, a schema is auto-created and all internal tables are placed there, keeping them separate from application tables. When unset, the default schema (usually `public`) is used. |
Comment on lines +346 to +348
pub fn get_internal_db_schema() -> Option<String> {
INTERNAL_DB_SCHEMA.read().unwrap().clone()
}
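
The accessor above panics if the lock was poisoned by a thread that panicked while holding the write guard. A minimal sketch of the recovery pattern, using only the standard library: `read_schema` and `poison` are hypothetical helpers for illustration, and the lock is passed in explicitly instead of using the global.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Reads the stored schema, recovering the guard from a poisoned lock instead of panicking.
pub fn read_schema(lock: &RwLock<Option<String>>) -> Option<String> {
    lock.read().unwrap_or_else(|e| e.into_inner()).clone()
}

/// Poisons the lock on purpose by panicking in another thread while holding the write guard.
pub fn poison(lock: &Arc<RwLock<Option<String>>>) {
    let shared = Arc::clone(lock);
    let _ = thread::spawn(move || {
        let _guard = shared.write().unwrap();
        panic!("simulated failure while holding the write lock");
    })
    .join();
}
```

Recovering with `into_inner()` is reasonable here because the stored value is a plain `Option<String>` that cannot be left half-updated by a panic; for more complex shared state that trade-off would need a closer look.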
@bashandbone

@copilot apply changes based on the comments in this thread

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

Copilot AI commented Mar 13, 2026

@copilot apply changes based on the comments in this thread

Applied all four review fixes in acf16dd:

  1. Existence check (db_metadata.rs): Replaced the pg_tables WHERE schemaname = 'public' query with SELECT to_regclass($1) IS NOT NULL using the already-computed schema-qualified table name — correctly handles any search_path and custom schema.

  2. Schema creation before table DDL (db_tracking_setup.rs): Added ensure_internal_schema_exists() helper (mirrors MetadataTableSetup::apply_change's pattern) and call it at the top of both upgrade_tracking_table and create_source_state_table before any CREATE TABLE.

  3. RwLock poisoning (lib_context.rs): All three INTERNAL_DB_SCHEMA lock accesses now use .unwrap_or_else(|e| e.into_inner()) to recover from a poisoned lock rather than panicking.

  4. Docs (configuration.md): Updated all references from "default public schema" to "connection's default schema (often public, but determined by search_path)".
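
Fixes 1 and 2 can be sketched as plain SQL builders. This is an illustration under assumptions, not the merged code: `quote_ident`, `create_schema_sql`, and `regclass_argument` are hypothetical names, and quoting identifiers is an added precaution that the PR description does not mention.

```rust
/// Hypothetical helper: quotes a PostgreSQL identifier by doubling embedded
/// double quotes and wrapping the result in double quotes.
pub fn quote_ident(ident: &str) -> String {
    format!("\"{}\"", ident.replace('"', "\"\""))
}

/// Statement to run before any CREATE TABLE when a schema is configured.
pub fn create_schema_sql(schema: &str) -> String {
    format!("CREATE SCHEMA IF NOT EXISTS {}", quote_ident(schema))
}

/// Text value to bind as `$1` in the existence check below.
pub fn regclass_argument(schema: Option<&str>, table: &str) -> String {
    match schema {
        Some(s) => format!("{}.{}", quote_ident(s), quote_ident(table)),
        None => quote_ident(table),
    }
}

/// to_regclass returns NULL (rather than raising an error) when the relation is missing,
/// and resolves unqualified names through the session's search_path.
pub const TABLE_EXISTS_SQL: &str = "SELECT to_regclass($1) IS NOT NULL";
```

Because `to_regclass` resolves names the same way ordinary queries do, the check agrees with where the subsequent `CREATE TABLE` or `SELECT` would actually look, whether or not a schema is configured.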

Warning

Firewall rules blocked me from connecting to one or more addresses:

  • test (dns block), triggered while running the recoco_core test binary (/home/REDACTED/work/recoco/recoco/target/debug/deps/recoco_core-b2f1c4eca26c649c)

@bashandbone bashandbone merged commit 074a2a9 into main Mar 16, 2026
7 of 8 checks passed
@bashandbone bashandbone deleted the copilot/adopt-db-schema-for-recoco branch March 16, 2026 00:25
@github-project-automation github-project-automation bot moved this from Backlog to Done in Recoco v1.0.0 Mar 16, 2026