Skip to content

Harden DuckDB extension persistence#185

Open
nicosuave wants to merge 8 commits into
review/python-native-contractfrom
review/duckdb-extension-hardening
Open

Harden DuckDB extension persistence#185
nicosuave wants to merge 8 commits into
review/python-native-contractfrom
review/duckdb-extension-hardening

Conversation

@nicosuave
Copy link
Copy Markdown
Member

Base: #184

  • Make DuckDB extension persistence atomic and context-scoped across database connections.
  • Persist metrics, dimensions, and segments under their owning model block so autoload rebuilds the same graph.
  • Tighten CREATE MODEL parsing, sidecar validation, lock handling, and extension build linking across native targets.
  • Update DuckDB extension SQL tests and docs for the supported load/install path.

@nicosuave nicosuave force-pushed the review/python-native-contract branch from cd7a08e to 4b9c638 Compare May 31, 2026 18:49
@nicosuave nicosuave force-pushed the review/duckdb-extension-hardening branch from 66c6541 to c712342 Compare May 31, 2026 18:49
@nicosuave nicosuave marked this pull request as ready for review May 31, 2026 21:47
@nicosuave nicosuave force-pushed the review/duckdb-extension-hardening branch 3 times, most recently from 9a6d2d2 to 67dd8ce Compare May 31, 2026 22:35
@nicosuave
Copy link
Copy Markdown
Member Author

@codex review

@nicosuave nicosuave force-pushed the review/python-native-contract branch from b3ee67c to 1b91738 Compare May 31, 2026 23:24
@nicosuave nicosuave force-pushed the review/duckdb-extension-hardening branch from 67dd8ce to 6cdc4af Compare May 31, 2026 23:24
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6cdc4afeab

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread sidemantic-rs/src/ffi.rs Outdated
impl Drop for DefinitionsFileLock {
fn drop(&mut self) {
self.file.take();
let _ = fs::remove_file(&self.path);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard lock cleanup against stolen locks

If a writer is paused or slow for longer than DEFINITIONS_STALE_LOCK_AFTER, another process can classify its lock as stale, remove and recreate the same .lock file, and then this original guard will unconditionally remove that newer lock when it drops. That opens a window for two writers to enter the critical section and lose or corrupt persisted definitions; write an owner token into the lock and only unlink when it still matches, or avoid stealing locks that may still be live.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2baf6794d9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread sidemantic-rs/src/ffi.rs
idx += 1;
}

if let Some(statement_start) = start {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Split persisted model blocks without semicolons

When definitions are created through clients that omit trailing semicolons (for example con.execute("SEMANTIC CREATE MODEL ...")), append_definition_to_content preserves the semicolonless text. This fallback then makes statement_ranges return one statement spanning the rest of the sidecar, so model_definition_ranges sees all subsequent MODEL definitions as part of the first block; replacing the first model can drop later models, and adding a metric to a later model reports that it is not present in the persisted file. Split on top-level definition keywords as well as semicolons, or normalize persisted definitions to include semicolons.

Useful? React with 👍 / 👎.

@nicosuave nicosuave force-pushed the review/duckdb-extension-hardening branch 3 times, most recently from e598752 to 9a62d99 Compare June 1, 2026 01:01
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9a62d99e9a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread sidemantic-rs/src/ffi.rs Outdated
Comment on lines +820 to +821
if is_stale_definitions_lock(&lock_path) {
let _ = fs::remove_file(&lock_path);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Verify stale lock before unlinking

When two processes concurrently encounter the same stale sidecar lock, both can pass is_stale_definitions_lock before either recreates it; one process can then remove the old file and create its lock, only for the other process to execute this unconditional remove_file and delete that new lock before acquiring its own. That leaves two writers in the critical section despite the owner-token cleanup on drop; the fresh evidence in the current code is that the stale-removal path still unlinks by path without verifying the token or metadata it classified as stale.

Useful? React with 👍 / 👎.

@nicosuave nicosuave force-pushed the review/duckdb-extension-hardening branch from 9a62d99 to 17f7749 Compare June 1, 2026 01:11
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17f77491f3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +400 to +401
} else if (ch == ',' && depth == 0) {
break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Track braces when scanning model properties

When the new CREATE MODEL name (...) shorthand is used with a structured property before the top-level name, this scanner treats commas inside {} or [] values as top-level separators because it only tracks parentheses. For example SEMANTIC CREATE MODEL orders (metadata {owner analytics, name internal}, table orders) is a valid body shape for the Rust parser's metadata object support, but this code resumes at the nested name key and reports a spurious name mismatch against orders, blocking the model from being created. Track bracket/brace depth here (as the Rust property parser does) before breaking on commas.

Useful? React with 👍 / 👎.

@nicosuave nicosuave force-pushed the review/duckdb-extension-hardening branch 4 times, most recently from 3791e32 to 9c48e6c Compare June 1, 2026 03:22
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9c48e6cbdd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

return false;
}
}
return std::isspace(value[pos + keyword.size()]);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Recognize all supported name separators

When a SEMANTIC CREATE MODEL outer (...) body uses the supported name: or name= property syntax, this check returns false because it only accepts whitespace after name. The Rust SQL parser accepts : and = separators, so the later injection path adds a second name instead of validating the existing one; for example SEMANTIC CREATE MODEL orders (name: customers, table orders, primary_key order_id) is accepted and parsed/persisted as customers rather than being rejected as a mismatch.

Useful? React with 👍 / 👎.

@nicosuave nicosuave force-pushed the review/duckdb-extension-hardening branch from 9c48e6c to ce1456b Compare June 1, 2026 03:42
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ab3938df65

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread sidemantic-rs/src/ffi.rs Outdated
if !result.is_empty() {
result.push_str("\n\n");
}
result.push_str(trimmed_definition);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Terminate persisted definitions before appending

When a client creates a simple AS definition without a trailing semicolon (for example SEMANTIC CREATE METRIC revenue AS SUM(amount)) and a later model/metric is appended, this preserves the semicolonless text. The Rust SQL parser's simple AS parsers read the expression until a semicolon or EOF, so on autoload the next MODEL line is consumed as part of the metric SQL instead of being loaded as a separate definition; normalize persisted definitions with a semicolon before adding the newline.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14aeffbb61

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread sidemantic-rs/src/ffi.rs
Comment on lines +462 to +465
bytes
.get(idx + keyword.len())
.map(|byte| byte.is_ascii_whitespace())
.unwrap_or(true)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Accept compact definition keywords

When definitions use the parser-supported compact form without a space after the keyword, such as MODEL(name orders, ...) or METRIC(name revenue, ...), this boundary check returns false because it only accepts whitespace. The previous scanner accepted ( and parse_sql_model still accepts these forms, so a model created through the FFI as MODEL(name orders, ...) is persisted but later sidemantic_add_definition reports the model is not present in the sidecar; compact item replacements also fail to remove the old item and then fail validation as duplicates. Treat ( as a valid keyword boundary here as well.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant