
Fix subpar caching usage during ingestion #698

Merged
jviotti merged 1 commit into main from bad-cache
Mar 4, 2026

Conversation

@jviotti (Member) commented Mar 3, 2026

Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>

@github-actions bot (Contributor) left a comment

Benchmark Index (community)

| Benchmark suite | Current: bd2ea84 | Previous: ef1287c | Ratio |
|---|---|---|---|
| Add one schema (0 existing) | 40 ms | 41 ms | 0.98 |
| Add one schema (100 existing) | 472 ms | 455 ms | 1.04 |
| Add one schema (1000 existing) | 4594 ms | 4594 ms | 1 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions bot (Contributor) left a comment

Benchmark Index (enterprise)

| Benchmark suite | Current: bd2ea84 | Previous: ef1287c | Ratio |
|---|---|---|---|
| Add one schema (0 existing) | 44 ms | 51 ms | 0.86 |
| Add one schema (100 existing) | 473 ms | 476 ms | 0.99 |
| Add one schema (1000 existing) | 4771 ms | 4757 ms | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@jviotti jviotti changed the title from "[WIP] Fix subpar caching usage during ingestion" to "Fix subpar caching usage during ingestion" Mar 3, 2026
@jviotti jviotti marked this pull request as ready for review March 3, 2026 20:37
@cubic-dev-ai bot left a comment

2 issues found across 4 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/index/index.cc">

<violation number="1" location="src/index/index.cc:811">
P2: Built artifacts are only hardlinked back to staging when missing, so stale existing files are left behind and can cause repeated cache misses on subsequent runs.</violation>
</file>

<file name="src/index/output.h">

<violation number="1" location="src/index/output.h:89">
P1: `files()` exposes `tracker` by reference without synchronization, allowing unsynchronized reads during concurrent writes and causing undefined behavior.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@augmentcode bot commented Mar 3, 2026

🤖 Augment PR Summary

Summary: This PR improves cache tracking during one index runs that use a staging directory, aiming to avoid unnecessary rebuilds on subsequent runs.

Changes:

  • Introduces Output::FileStatus (Unseen/Skipped/Built) to distinguish cached vs rebuilt outputs.
  • Updates the indexing dispatch path to explicitly track both generated targets and their .deps files as built or skipped.
  • Adds a post-atomic_directory_swap step that hardlinks files built in the current run from the new committed output back into the staging directory to improve cache hits.
  • Exposes tracked outputs via Output::files() to drive the hardlinking step.
  • Adds a new CLI regression test (rebuild-two-to-three) to validate full cache hits after expanding the schema set.
  • Registers the new test in the CLI CMake test list.

Technical Notes: The approach relies on hardlinking (not copying) to keep staging in sync with the committed output for incremental runs while preserving the atomic commit behavior.


@augmentcode bot left a comment

Review completed. 2 suggestions posted.


const auto relative_path{entry.first.lexically_relative(staging_path)};
const auto source{final_output_path / relative_path};
if (std::filesystem::is_regular_file(source) &&
    !std::filesystem::exists(entry.first)) {

The post-swap hardlink step only runs when !std::filesystem::exists(entry.first), so targets rebuilt during this run (status Built but already present in the old output) will remain stale in staging_path and may still miss cache hits on the next run. Is it intentional to only backfill new files rather than also refreshing existing ones that were rebuilt?

Severity: medium


return match == this->tracker.cend() || match->second == FileStatus::Unseen;
}

auto files() const

Output::files() returns tracker without taking tracker_mutex; since track() is called from multiple threads, calling files() before all work finishes would be undefined behavior (similar to the note in is_untracked_file). Consider guarding this accessor or returning a snapshot to make it safe to use in the presence of concurrent tracking.

Severity: low
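The snapshot option suggested above could look something like the sketch below. `TrackedOutput` is a hypothetical, simplified stand-in for the real `Output` class. Only the names `track`, `files`, `tracker`, `tracker_mutex`, and `FileStatus` come from the review; the signatures and container type are assumptions.

```cpp
#include <filesystem>
#include <map>
#include <mutex>

enum class FileStatus { Unseen, Skipped, Built };

// Hypothetical, simplified stand-in for the Output class
class TrackedOutput {
public:
  // Record the status of a path. Safe to call from multiple threads.
  auto track(const std::filesystem::path &path, const FileStatus status)
      -> void {
    std::lock_guard<std::mutex> lock{this->tracker_mutex};
    this->tracker[path] = status;
  }

  // Return a copy taken under the lock instead of a reference, so the
  // caller never iterates the map while another thread mutates it.
  auto files() const -> std::map<std::filesystem::path, FileStatus> {
    std::lock_guard<std::mutex> lock{this->tracker_mutex};
    return this->tracker;
  }

private:
  // mutable so that the const accessor can still lock
  mutable std::mutex tracker_mutex;
  std::map<std::filesystem::path, FileStatus> tracker;
};
```

Copying the map costs time proportional to the number of tracked files. If `files()` is only ever called after all worker threads have finished, returning a reference and documenting that precondition is a cheaper alternative.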


@jviotti jviotti force-pushed the bad-cache branch 2 times, most recently from 9ba7bfc to db50ab6 March 4, 2026 00:05
Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
@jviotti jviotti merged commit db1a7c6 into main Mar 4, 2026
5 checks passed
@jviotti jviotti deleted the bad-cache branch March 4, 2026 13:18