
Speed up delta calculation on schema insert/update#767

Merged
jviotti merged 2 commits into main from optimisations-delta on Mar 20, 2026

Conversation

@jviotti jviotti (Member) commented Mar 20, 2026

Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
@augmentcode augmentcode bot commented Mar 20, 2026

🤖 Augment PR Summary

Summary: This PR optimizes BuildState::save() to avoid rebuilding the entire on-disk state when only a small set of schema outputs/resolver entries change.

Changes:

  • Adds a can_patch mode that reuses the existing hash table capacity and copies the prior slot table, applying deletions in-place.
  • Detects which overlay/resolver overlay keys are updates vs inserts, and updates existing slots rather than rewriting all disk entries.
  • Introduces combined “old pool + appended pool” offsets so new strings can be appended without rewriting the old string pool.
  • Speeds up persisted schema-index generation by reusing the previously persisted schema table and recomputing only affected schemas when patching.
  • Writes the state to a temporary file and renames it into place, reducing risk of partial writes.

Technical Notes: Correctness now depends on careful handling of slot key/data offsets across two pools (old mapped pool and newly appended pool) and on keeping the persisted schema index consistent with incremental deletions/updates.


@augmentcode augmentcode bot left a comment

Review completed. 4 suggestions posted.

while (slots[index * SLOT_SIZE + SLOT_OCCUPIED] != 0) {
const auto *slot{slots.data() + index * SLOT_SIZE};
if (read_field<std::uint64_t>(slot, SLOT_HASH) == hash &&
slot_key(slot, this->string_pool) == entry_key) {

In write_new_slot’s update probe, slot_key(slot, this->string_pool) assumes all probed slots’ keys live in the old pool; after earlier updates/inserts in this save, a probed slot can have a key offset into the appended pool, which can lead to invalid reads if a hash collision forces a key-compare.

Severity: medium


const auto new_offset{key_offset - old_pool_size};
return {pool.data() + new_offset, key_length};
}
if (this->string_pool != nullptr) {

read_slot_key falls back to this->string_pool even when can_patch is false; in the rebuild path, slot key offsets refer to the newly-built pool, so this can produce incorrect keys or out-of-bounds reads during schema index generation.

Severity: high


const std::uint8_t *base;
if (can_patch && offset >= old_pool_size) {
base =
reinterpret_cast<const std::uint8_t *>(pool.data()) - old_pool_size;

read_slot_pool_string computes base = pool.data() - old_pool_size, which forms a pointer before the start of the pool allocation; even if the later base+offset lands back in-range, creating/using an out-of-bounds base pointer is undefined behavior in C++.

Severity: high


const auto sentinel_position{after.find(sentinel_marker)};
if (sentinel_position == std::string_view::npos) {
continue;
save_schema_index.erase(

When patching from persisted_schema_table, this erases the entire schema record for any deleted target under that schema; if a single per-schema artifact is removed (but others remain), the persisted schema index may incorrectly drop the whole schema and later consumers could treat it as missing rather than “present but missing targets”.

Severity: medium


@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 1 file

Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
@github-actions github-actions bot (Contributor) left a comment

Benchmark Index (community)

| Benchmark suite | Current: fbe2b46 | Previous: 3fb41b6 | Ratio |
| --- | --- | --- | --- |
| Add one schema (0 existing) | 21 ms | 21 ms | 1 |
| Add one schema (100 existing) | 25 ms | 26 ms | 0.96 |
| Add one schema (1000 existing) | 75 ms | 86 ms | 0.87 |
| Add one schema (10000 existing) | 623 ms | 686 ms | 0.91 |
| Update one schema (1 existing) | 18 ms | 19 ms | 0.95 |
| Update one schema (101 existing) | 26 ms | 28 ms | 0.93 |
| Update one schema (1001 existing) | 75 ms | 82 ms | 0.91 |
| Update one schema (10001 existing) | 634 ms | 637 ms | 1.00 |
| Cached rebuild (1 existing) | 10 ms | 11 ms | 0.91 |
| Cached rebuild (101 existing) | 12 ms | 12 ms | 1 |
| Cached rebuild (1001 existing) | 27 ms | 26 ms | 1.04 |
| Cached rebuild (10001 existing) | 195 ms | 188 ms | 1.04 |
| Index 100 schemas | 116 ms | 118 ms | 0.98 |
| Index 1000 schemas | 965 ms | 1154 ms | 0.84 |
| Index 10000 schemas | 13462 ms | 14786 ms | 0.91 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot (Contributor) left a comment

Benchmark Index (enterprise)

| Benchmark suite | Current: fbe2b46 | Previous: 3fb41b6 | Ratio |
| --- | --- | --- | --- |
| Add one schema (0 existing) | 23 ms | 23 ms | 1 |
| Add one schema (100 existing) | 29 ms | 29 ms | 1 |
| Add one schema (1000 existing) | 82 ms | 84 ms | 0.98 |
| Add one schema (10000 existing) | 691 ms | 700 ms | 0.99 |
| Update one schema (1 existing) | 21 ms | 21 ms | 1 |
| Update one schema (101 existing) | 28 ms | 28 ms | 1 |
| Update one schema (1001 existing) | 81 ms | 83 ms | 0.98 |
| Update one schema (10001 existing) | 666 ms | 620 ms | 1.07 |
| Cached rebuild (1 existing) | 12 ms | 12 ms | 1 |
| Cached rebuild (101 existing) | 14 ms | 14 ms | 1 |
| Cached rebuild (1001 existing) | 33 ms | 28 ms | 1.18 |
| Cached rebuild (10001 existing) | 233 ms | 193 ms | 1.21 |
| Index 100 schemas | 124 ms | 127 ms | 0.98 |
| Index 1000 schemas | 1065 ms | 1069 ms | 1.00 |
| Index 10000 schemas | 13988 ms | 13933 ms | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@jviotti jviotti merged commit 780a4bc into main Mar 20, 2026
6 checks passed
@jviotti jviotti deleted the optimisations-delta branch March 20, 2026 19:50
