fix: serialize same-node cold writers with a node-local advisory lock#36
Conversation
|
Warning Review limit reached
Next review available in: 10 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (12)
📝 WalkthroughWalkthroughThis PR adds a transaction-scoped advisory lock to ChangesSame-node claim serialization
Estimated code review effort: 3 (Moderate) | ~20 minutes Related issues: Suggested labels: bakery, concurrency, ci, mesh Suggested reviewers: maintainers familiar with the coldfront bakery/claim protocol and mesh CI journeys 🐰 In the bakery, tickets queue with care, 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ci/journey.sh`:
- Around line 1153-1166: Replace the predictable /tmp/journey-mw.* per-writer
log files in the multi-writer test block with files created inside a fresh
mktemp -d directory, and register a trap to remove that directory on exit.
Update the q command redirects and the later cat/rm cleanup in the same
journey.sh test section so they reference the mktemp-managed paths rather than
fixed /tmp names.
In `@docs/formal/Bakery_v2.tla`:
- Around line 33-43: The current Bakery_v2 TLA model only documents the
deployment refinement and still encodes a single writer/claim per node, so it
does not check the new multi-writer-per-node topology. Update the PlusCal/TLA
state machine in Bakery_v2.tla and the corresponding Bakery.tla translation so
the checked-in model represents the new per-node concurrency, then re-run
pcal.trans and the required TLC configs. Verify that all safe configs still
report no errors and that Bakery_v2_race.cfg continues to violate
NoLakekeeperConflict.
In `@extension/coldfront/coldfront--1.0.sql`:
- Around line 2065-2072: The advisory lock in the coldfront claim/commit path is
using a 32-bit key via hashtext(), which can collide across different
p_iceberg_table values and incorrectly serialize unrelated tables. Update the
lock acquisition in the same-node serialization section to use a 64-bit key with
hashtextextended(..., 0) and pass that result to pg_advisory_xact_lock(bigint),
keeping the existing behavior of holding the lock for the whole claim+commit
flow.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f497cf8f-768f-4ebe-aea8-545bdfea9162
📒 Files selected for processing (3)
ci/journey.shdocs/formal/Bakery_v2.tlaextension/coldfront/coldfront--1.0.sql
| rm -f /tmp/journey-mw.* 2>/dev/null | ||
| # Fire N writers on db1 AND N on the peer, all at once, all to iceonly. | ||
| for k in $(seq 1 "$N"); do | ||
| q "$HOST" "INSERT INTO iceonly VALUES ($((7000+k)),date_trunc('month',now()) + interval '3 months' + interval '$k hours','mw','{}');" >"/tmp/journey-mw.a$k" 2>&1 & | ||
| pids+=("$!") | ||
| q "$p1" "INSERT INTO iceonly VALUES ($((8000+k)),date_trunc('month',now()) + interval '3 months' + interval '$k hours 30 minutes','mw','{}');" >"/tmp/journey-mw.b$k" 2>&1 & | ||
| pids+=("$!") | ||
| done | ||
| for k in "${pids[@]}"; do wait "$k"; done | ||
| assert_eq "no multi-writer-per-node cross-node cold writer errored (no 409)" "0" \ | ||
| "$(cat /tmp/journey-mw.* 2>/dev/null | grep -cEi 'error|conflict|409')" | ||
| assert_eq "all $((2 * N)) multi-writer-per-node cross-node cold writes landed" "$((2 * N))" \ | ||
| "$(q "$HOST" "SELECT count(*) FROM iceonly WHERE status='mw';")" | ||
| rm -f /tmp/journey-mw.* 2>/dev/null |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟡 Minor | ⚡ Quick win
Use mktemp for the per-writer output files.
These /tmp/journey-mw.* paths are predictable, so another local process can pre-create or symlink them and corrupt the test run. Please write into a mktemp -d directory and clean it up with trap.
Suggested hardening
- rm -f /tmp/journey-mw.* 2>/dev/null
+ local tmpdir
+ tmpdir="$(mktemp -d "${TMPDIR:-/tmp}/journey-mw.XXXXXX")" || { fail "mesh: mktemp failed"; return; }
+ trap 'rm -rf "$tmpdir"' RETURN
@@
- q "$HOST" "INSERT INTO iceonly VALUES ($((7000+k)),date_trunc('month',now()) + interval '3 months' + interval '$k hours','mw','{}');" >"/tmp/journey-mw.a$k" 2>&1 &
+ q "$HOST" "INSERT INTO iceonly VALUES ($((7000+k)),date_trunc('month',now()) + interval '3 months' + interval '$k hours','mw','{}');" >"$tmpdir/a$k" 2>&1 &
@@
- q "$p1" "INSERT INTO iceonly VALUES ($((8000+k)),date_trunc('month',now()) + interval '3 months' + interval '$k hours 30 minutes','mw','{}');" >"/tmp/journey-mw.b$k" 2>&1 &
+ q "$p1" "INSERT INTO iceonly VALUES ($((8000+k)),date_trunc('month',now()) + interval '3 months' + interval '$k hours 30 minutes','mw','{}');" >"$tmpdir/b$k" 2>&1 &
@@
- "$(cat /tmp/journey-mw.* 2>/dev/null | grep -cEi 'error|conflict|409')"
+ "$(cat "$tmpdir"/* 2>/dev/null | grep -cEi 'error|conflict|409')"
@@
- rm -f /tmp/journey-mw.* 2>/dev/null
+ rm -rf "$tmpdir"
+ trap - RETURN📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| rm -f /tmp/journey-mw.* 2>/dev/null | |
| # Fire N writers on db1 AND N on the peer, all at once, all to iceonly. | |
| for k in $(seq 1 "$N"); do | |
| q "$HOST" "INSERT INTO iceonly VALUES ($((7000+k)),date_trunc('month',now()) + interval '3 months' + interval '$k hours','mw','{}');" >"/tmp/journey-mw.a$k" 2>&1 & | |
| pids+=("$!") | |
| q "$p1" "INSERT INTO iceonly VALUES ($((8000+k)),date_trunc('month',now()) + interval '3 months' + interval '$k hours 30 minutes','mw','{}');" >"/tmp/journey-mw.b$k" 2>&1 & | |
| pids+=("$!") | |
| done | |
| for k in "${pids[@]}"; do wait "$k"; done | |
| assert_eq "no multi-writer-per-node cross-node cold writer errored (no 409)" "0" \ | |
| "$(cat /tmp/journey-mw.* 2>/dev/null | grep -cEi 'error|conflict|409')" | |
| assert_eq "all $((2 * N)) multi-writer-per-node cross-node cold writes landed" "$((2 * N))" \ | |
| "$(q "$HOST" "SELECT count(*) FROM iceonly WHERE status='mw';")" | |
| rm -f /tmp/journey-mw.* 2>/dev/null | |
| local tmpdir | |
| tmpdir="$(mktemp -d "${TMPDIR:-/tmp}/journey-mw.XXXXXX")" || { fail "mesh: mktemp failed"; return; } | |
| trap 'rm -rf "$tmpdir"' RETURN | |
| # Fire N writers on db1 AND N on the peer, all at once, all to iceonly. | |
| for k in $(seq 1 "$N"); do | |
| q "$HOST" "INSERT INTO iceonly VALUES ($((7000+k)),date_trunc('month',now()) + interval '3 months' + interval '$k hours','mw','{}');" >"$tmpdir/a$k" 2>&1 & | |
| pids+=("$!") | |
| q "$p1" "INSERT INTO iceonly VALUES ($((8000+k)),date_trunc('month',now()) + interval '3 months' + interval '$k hours 30 minutes','mw','{}');" >"$tmpdir/b$k" 2>&1 & | |
| pids+=("$!") | |
| done | |
| for k in "${pids[@]}"; do wait "$k"; done | |
| assert_eq "no multi-writer-per-node cross-node cold writer errored (no 409)" "0" \ | |
| "$(cat "$tmpdir"/* 2>/dev/null | grep -cEi 'error|conflict|409')" | |
| assert_eq "all $((2 * N)) multi-writer-per-node cross-node cold writes landed" "$((2 * N))" \ | |
| "$(q "$HOST" "SELECT count(*) FROM iceonly WHERE status='mw';")" | |
| rm -rf "$tmpdir" | |
| trap - RETURN |
🧰 Tools
🪛 ast-grep (0.44.0)
[warning] 1155-1155: Writing to or reading from a hardcoded, predictable path under /tmp is vulnerable to symlink and TOCTOU attacks: a local attacker can pre-create the file (or a symlink pointing elsewhere) and hijack or corrupt the contents. Generate a unique, unpredictable temporary file with mktemp instead, e.g. tmpfile="$(mktemp)" (or mktemp -d for directories) and reference "$tmpfile".
Context: "/tmp/journey-mw.a$k"
Note: [CWE-377] Insecure Temporary File.
(predictable-tmp-file-bash)
[warning] 1157-1157: Writing to or reading from a hardcoded, predictable path under /tmp is vulnerable to symlink and TOCTOU attacks: a local attacker can pre-create the file (or a symlink pointing elsewhere) and hijack or corrupt the contents. Generate a unique, unpredictable temporary file with mktemp instead, e.g. tmpfile="$(mktemp)" (or mktemp -d for directories) and reference "$tmpfile".
Context: "/tmp/journey-mw.b$k"
Note: [CWE-377] Insecure Temporary File.
(predictable-tmp-file-bash)
[warning] 1162-1162: Writing to or reading from a hardcoded, predictable path under /tmp is vulnerable to symlink and TOCTOU attacks: a local attacker can pre-create the file (or a symlink pointing elsewhere) and hijack or corrupt the contents. Generate a unique, unpredictable temporary file with mktemp instead, e.g. tmpfile="$(mktemp)" (or mktemp -d for directories) and reference "$tmpfile".
Context: /tmp/journey-mw.*
Note: [CWE-377] Insecure Temporary File.
(predictable-tmp-file-bash)
[warning] 1165-1165: Writing to or reading from a hardcoded, predictable path under /tmp is vulnerable to symlink and TOCTOU attacks: a local attacker can pre-create the file (or a symlink pointing elsewhere) and hijack or corrupt the contents. Generate a unique, unpredictable temporary file with mktemp instead, e.g. tmpfile="$(mktemp)" (or mktemp -d for directories) and reference "$tmpfile".
Context: /tmp/journey-mw.*
Note: [CWE-377] Insecure Temporary File.
(predictable-tmp-file-bash)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ci/journey.sh` around lines 1153 - 1166, Replace the predictable
/tmp/journey-mw.* per-writer log files in the multi-writer test block with files
created inside a fresh mktemp -d directory, and register a trap to remove that
directory on exit. Update the q command redirects and the later cat/rm cleanup
in the same journey.sh test section so they reference the mktemp-managed paths
rather than fixed /tmp names.
Source: Linters/SAST tools
| -- Same-node serialization: hold a node-local advisory xact lock for this | ||
| -- iceberg table across the whole claim+commit, so at most ONE cold writer | ||
| -- per node is in the bakery at once. That keeps per-node concurrency at 1 | ||
| -- (the topology Bakery_v2 proves safe), so the cross-node Ricart-Agrawala | ||
| -- ack/defer only ever arbitrates a single same-node claim. Cross-node | ||
| -- writers are unaffected (advisory locks are instance-local); in the async | ||
| -- path this runs after the parquet upload, so same-node uploads pipeline. | ||
| PERFORM pg_advisory_xact_lock(hashtext('coldfront_iceberg:' || p_iceberg_table)); |
There was a problem hiding this comment.
🚀 Performance & Scalability | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the surrounding code and any related advisory-lock usage.
FILE="extension/coldfront/coldfront--1.0.sql"
sed -n '2045,2090p' "$FILE"
printf '\n--- related usages ---\n'
rg -n "pg_advisory_xact_lock|hashtext\\(" extension/coldfront -SRepository: pgEdge/coldfront
Length of output: 4643
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check PostgreSQL docs already present in repo, if any, for hashtext/advisory lock semantics.
rg -n "hashtext|pg_advisory_xact_lock|advisory lock" -S README* docs extension 2>/dev/null || trueRepository: pgEdge/coldfront
Length of output: 4075
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="extension/coldfront/coldfront--1.0.sql"
sed -n '2045,2090p' "$FILE"
printf '\n--- related usages ---\n'
rg -n "pg_advisory_xact_lock|hashtext\\(" extension/coldfront -SRepository: pgEdge/coldfront
Length of output: 4643
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "hashtext|pg_advisory_xact_lock|advisory lock" -S README* docs extension 2>/dev/null || trueRepository: pgEdge/coldfront
Length of output: 4075
🌐 Web query:
PostgreSQL hashtextextended pg_advisory_xact_lock bigint advisory lock 64-bit hash function docs
💡 Result:
The PostgreSQL function hashtextextended is a built-in function that generates a 64-bit hash (bigint) from text input [1][2]. It is often used in conjunction with pg_advisory_xact_lock to derive a 64-bit integer lock key from a string value [3][4]. Key Technical Details: - Function Signature: hashtextextended(text, bigint) returns bigint [1]. The second argument serves as a seed for the hash computation [5][6]. - Purpose: It allows users to convert arbitrary text into a 64-bit integer, which is required by PostgreSQL advisory lock functions like pg_advisory_xact_lock [7][8]. - Behavior: The function is collation-aware [5][6]. For deterministic collations, it hashes the text directly using hash_any_extended [6][9]. For non-deterministic collations, it performs a transformation (using pg_strnxfrm) before hashing the resulting byte sequence [5][6]. - Limitations/Risk: While hashtextextended provides a convenient 64-bit hash, it is subject to hash collisions [3][10]. Because it maps an effectively infinite input space (text) to a finite 64-bit space, there is a non-zero probability that different strings will produce the same hash value, potentially causing unexpected lock contention or deadlocks if multiple distinct resources are mapped to the same key [3][4]. When using this or any hashing approach for advisory locks: - Collision Handling: Users should be aware that hash collisions can occur [3][4]. If absolute uniqueness is required, developers may need to implement a strategy to handle collisions, such as using a composite key or secondary verification [4][10]. - Lock Ordering: To avoid deadlocks when acquiring multiple advisory locks, it is common practice to sort the keys before acquisition [3][4]. Sorting by the resulting hash values is a frequent strategy, though it does not eliminate the underlying risk of hash collisions [4]. - Alternative Approaches: Some developers use other hashing methods, such as MD5 (truncated to 64 bits) [11][12] or non-cryptographic 64-bit hashes like xxHash or FNV, to derive advisory lock keys depending on their specific requirements for distribution, speed, or collision resistance [13][14].
Citations:
- 1: https://www.postgresql.org/message-id/9434.1568839177%40sss.pgh.pa.us
- 2: https://doxygen.postgresql.org/hashfunc_8c_source.html
- 3: https://www.postgresql.org/message-id/b461c689-1d2a-4e20-be33-e75a439d5823%40posteo.de
- 4: https://www.postgresql.org/message-id/777cfa1c-fd1f-479f-b9b8-217b0f7a40b7%40posteo.de
- 5: https://ryogrid.github.io/create_pg_super_document/h/hashtextextended.html
- 6: https://github.com/postgres/postgres/blob/master/src/backend/access/hash/hashfunc.c
- 7: https://www.postgresql.org/docs/9.1/functions-admin.html
- 8: https://vladmihalcea.com/how-do-postgresql-advisory-locks-work/
- 9: https://doxygen.postgresql.org/hashfunc_8c.html
- 10: https://www.postgresql.org/message-id/71f6fa8c-de7b-4af0-a952-1bd4db4e3007%40posteo.de
- 11: https://softwareengineering.stackexchange.com/questions/403955/using-postgresql-md5-hash-to-calculate-a-64-bit-hash-value-for-advisory-lock-fun
- 12: https://klapacz.dev/blog/0001-solving-concurrency-issues-with-postgresql-advisory-locks/
- 13: https://docs.rs/entelix-persistence/latest/entelix_persistence/advisory_key/index.html
- 14: https://riverqueue.com/blog/uniqueness-with-advisory-locks
🌐 Web query:
PostgreSQL advisory lock bigint and hashtextextended documentation
💡 Result:
PostgreSQL advisory locks and the hashtextextended function serve distinct purposes in managing application-level concurrency and data hashing, respectively. PostgreSQL Advisory Locks Advisory locks allow you to define application-specific locks that the database engine tracks but does not automatically enforce on rows or tables [1]. These locks are identified by a key, which can be either a single 64-bit bigint or a pair of 32-bit integers [2][1]. The two key spaces (single bigint vs. two-int) are independent and do not overlap [2][3]. Key concepts include: - Scope: Locks can be session-level (held until explicitly released or the session ends) or transaction-level (automatically released when the transaction finishes) [4][5]. - Reentrancy: Locks are reentrant; if a session acquires the same lock multiple times, it must release it the same number of times to fully unlock it [3][5]. - Functions: Key functions include pg_advisory_lock(key), pg_try_advisory_lock(key) (returns boolean, non-blocking), and their transaction-level equivalents (e.g., pg_advisory_xact_lock(key)) [4][2]. hashtextextended Function The hashtextextended function is a built-in PostgreSQL tool used to generate a 64-bit hash value from text data, which is useful when you need to map complex data types into the bigint format required by many advisory lock functions [6]. Key features include: - Signature: It accepts two arguments: the text to be hashed and a 64-bit seed value (int64) [7][6]. - Seed-based hashing: The seed allows for creating different hash values for the same input, which is beneficial for avoiding collisions or managing hash table resizing [7]. - Compatibility: When the seed is 0, the low 32 bits of the 64-bit hash often align with the output of the standard 32-bit hashtext function [6]. - Usage: It is commonly used to generate a unique 64-bit identifier for an arbitrary string resource so that it can be passed to pg_advisory_lock(bigint) [8][6]. When using these together, you can hash an arbitrary string resource into a 64-bit bigint using hashtextextended(your_string, 0), then pass that result into an advisory lock function to manage concurrency for that specific resource [8][6].
Citations:
- 1: https://www.postgresscripts.com/post/postgresql-advisory-locks-with-pg-advisory-lock/
- 2: https://flaviodelgrosso.com/blog/postgresql-advisory-locks
- 3: https://www.postgresql.org/docs/9.1/functions-admin.html
- 4: https://vladmihalcea.com/how-do-postgresql-advisory-locks-work/
- 5: https://www.postgresql.org/docs/current/explicit-locking.html
- 6: https://stackoverflow.com/questions/9809381/hashing-a-string-to-a-numeric-value-in-postgresql
- 7: https://ryogrid.github.io/create_pg_super_document/h/hashtextextended.html
- 8: https://dba.stackexchange.com/questions/198334/postgres-advisory-lock-when-key-doesnt-fit
Use a 64-bit advisory-lock key here. hashtext() is 32-bit, so different p_iceberg_table values can alias and serialize unrelated tables. hashtextextended(..., 0) fits pg_advisory_xact_lock(bigint) and cuts the collision risk substantially.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@extension/coldfront/coldfront--1.0.sql` around lines 2065 - 2072, The
advisory lock in the coldfront claim/commit path is using a 32-bit key via
hashtext(), which can collide across different p_iceberg_table values and
incorrectly serialize unrelated tables. Update the lock acquisition in the
same-node serialization section to use a 64-bit key with hashtextextended(...,
0) and pass that result to pg_advisory_xact_lock(bigint), keeping the existing
behavior of holding the lock for the whole claim+commit flow.
a49656e to
971231b
Compare
On a Spock mesh the bakery serialized cold writers per node via the claim's same-node ordering, but with more than one writer per node on 2+ nodes the cross-node ack/defer could let two commits reach Lakekeeper at once (CatalogCommitConflicts 409). _claim_iceberg_lock now takes a node-local advisory xact lock before claiming, so at most one cold writer per node is in the bakery at a time, which is the one-claim-per-node topology Bakery_v2 already proves.
Adds a mesh reproducer (story_mesh_multiwriter, red before / green after) and a deployment-refinement note in the model. Verified: decoupled mesh journey 41/0, TLC safe config No error, race config still violates.
Fixes #35