
Search: Add content_last_updated for content-only change tracking #3016

Merged — reakaleek merged 10 commits into main from handsomely-lighter on Apr 14, 2026

Conversation

@reakaleek (Member)

What

Add a content_last_updated field to DocumentationDocument that only advances when page content (stripped_body) actually changes, as opposed to last_updated which advances on any change (metadata, navigation, mappings).

Why

The sitemap and search features need to know when content was last meaningfully updated, not when metadata like navigation ordering or synonym configs changed. last_updated conflates content and metadata changes.

How

  • New content_hash field: whitespace-normalized SHA-256 hash of stripped_body (survives reformatting)
  • New content_last_updated field: only advances when content_hash changes
  • Persistent lookup index (docs-{type}-content-dates-{env}) stores {url, content_hash, content_last_updated} — survives index rollovers since it's a fixed-name index independent of the document index lifecycle
  • Lookup loads in parallel with orchestrator start to minimize latency
  • Sitemap generation now uses content_last_updated instead of last_updated
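
The hashing step above can be sketched as follows — a hypothetical Python analog of the C# ContentHash.CreateNormalized helper, assuming "whitespace-normalized" means collapsing whitespace runs to a single space and trimming before hashing:

```python
import hashlib
import re


def create_normalized_hash(stripped_body: str) -> str:
    """Hypothetical analog of ContentHash.CreateNormalized: collapse all
    whitespace runs to one space and trim, so pure reformatting of a page
    (re-wrapping, indentation changes) does not change the hash."""
    normalized = re.sub(r"\s+", " ", stripped_body).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Under that assumption, two bodies that differ only in line breaks or indentation produce the same digest, so content_last_updated is left untouched.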

Test plan

  • ContentHash.CreateNormalized unit tests (whitespace normalization)
  • DocumentationDocument serialization round-trip test with new fields
  • All 2,970 unit tests pass
  • Manual: run export twice with same content → content_last_updated unchanged; modify a page → it advances; modify only navigation → it stays the same

🤖 Generated with Claude Code

reakaleek and others added 2 commits April 2, 2026 00:50
Add a new content_last_updated field to DocumentationDocument that only
advances when the page content (stripped_body) actually changes, ignoring
metadata-only changes like navigation reordering or mapping rollovers.

Uses a persistent lookup index (docs-{type}-content-dates-{env}) to
preserve timestamps across index rollovers. Content hashing normalizes
whitespace so reformatting doesn't trigger false updates.

Also updates the sitemap to use content_last_updated instead of
last_updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use JsonObject instead of raw string literal for index mapping
to satisfy the dotnet-format JSON002 rule.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@reakaleek reakaleek requested a review from a team as a code owner April 1, 2026 22:53
@reakaleek reakaleek requested a review from Mpdreamz April 1, 2026 22:53
@coderabbitai coderabbitai bot added feature and removed enhancement labels Apr 1, 2026

coderabbitai bot commented Apr 1, 2026

Caution

Review failed

The pull request is closed.

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 04fe9f80-5773-472e-87ea-bf6163f27a16

📥 Commits

Reviewing files that changed from the base of the PR and between 7d776e3 and f990ada.

📒 Files selected for processing (11)
  • src/Elastic.Documentation/Search/ContentHash.cs
  • src/Elastic.Documentation/Search/DocumentationDocument.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchMarkdownExporter.Export.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchMarkdownExporter.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchOperations.cs
  • src/services/Elastic.Documentation.Assembler/Building/AssemblerBuildService.cs
  • src/services/Elastic.Documentation.Assembler/Building/EsSitemapReader.cs
  • src/tooling/docs-builder/Commands/Assembler/AssemblerSitemapCommand.cs
  • tests/Elastic.Markdown.Tests/Search/ContentHashTests.cs
  • tests/Elastic.Markdown.Tests/Search/DocumentationDocumentSerializationTests.cs

📝 Walkthrough

Adds content-date tracking via a new public ContentDateEnrichment class that provisions and maintains a lookup index, an enrich policy, and an ingest pipeline. DocumentationDocument gains ContentLastUpdated and ContentBodyHash; ContentHash gains CreateNormalized to compute a whitespace-normalized hash. The Elasticsearch exporter computes and assigns ContentBodyHash during export flows and initializes ContentDateEnrichment before publishing synonyms/query rules. On shutdown the exporter completes ingestion orchestration then calls SyncLookupIndexAsync to persist lookup state. ElasticsearchOperations gains a ReindexAsync helper used by the enrichment sync.

Sequence Diagram

sequenceDiagram
    participant Exporter as ElasticsearchMarkdownExporter
    participant CDEnrich as ContentDateEnrichment
    participant ES as Elasticsearch
    participant Ingest as Ingest Pipeline

    Exporter->>CDEnrich: InitializeAsync()
    CDEnrich->>ES: Ensure lookup index (mappings: url, content_hash, content_last_updated)
    CDEnrich->>ES: Create enrich policy (match on url -> content_hash, content_last_updated)
    CDEnrich->>ES: Execute enrich policy
    CDEnrich->>ES: Create ingest pipeline (set timestamp, enrich, painless script)
    CDEnrich-->>Exporter: Ready

    Exporter->>Exporter: For each doc: ContentBodyHash = ContentHash.CreateNormalized(doc.StrippedBody)
    Exporter->>ES: Index document (includes content_hash)

    note over Ingest,ES: On ingestion
    Ingest->>ES: Enrich using lookup index (_content_date_lookup)
    Ingest->>Ingest: Painless script: if lookup.content_hash == ctx.content_hash then set content_last_updated = lookup.content_last_updated else set content_last_updated = ctx._ingest.timestamp
    Ingest->>ES: Finalize indexing with content_last_updated

    Exporter->>Exporter: Stop -> Complete orchestrator
    Exporter->>CDEnrich: SyncLookupIndexAsync(lexicalAlias)
    CDEnrich->>ES: Delete lookup docs (delete-by-query)
    CDEnrich->>ES: Reindex lexical -> lookup (uses ReindexAsync)
    CDEnrich->>ES: Refresh lookup & re-execute enrich policy
    CDEnrich-->>Exporter: Sync complete


🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 38.46%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: the title accurately and concisely describes the main change: adding a content_last_updated field for content-only change tracking, which is the core objective across all modified files.
  • Description check — ✅ Passed: the description is directly related to the changeset, clearly explaining the What, Why, and How of the content-only change tracking feature, including the persistent lookup index and sitemap updates.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateLookup.cs`:
- Around line 178-179: EscapeJson currently only handles backslashes and quotes
and misses control characters; update the EscapeJson implementation in
ContentDateLookup (the EscapeJson method) to use a proper JSON escaper such as
System.Text.Json: either return JsonSerializer.Serialize(value).Trim('"') or use
JsonEncodedText.Encode(value).ToString() so all control characters and Unicode
are correctly escaped; replace the manual Replace chain with one of these calls
in the EscapeJson method.
- Around line 139-146: The Resolve method uses the non-thread-safe fields
_existing and _changed from ContentDateLookup while being called concurrently;
replace both Dictionary usages with
System.Collections.Concurrent.ConcurrentDictionary instances, update any
TryGetValue/_existing lookups to use ConcurrentDictionary.TryGetValue (or
TryGetValue remains but on the concurrent type), and replace the assignment
_changed[url] = (newContentHash, batchTimestamp) with a thread-safe update
(e.g., TryAdd/AddOrUpdate) so Resolve and any other methods (referencing
_existing, _changed, Resolve) are safe under Parallel.ForEachAsync.
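
The escaping finding generalizes beyond C#. A minimal Python sketch of the same bug and the serializer-backed fix (json.dumps standing in here for JsonSerializer.Serialize(value).Trim('"') / JsonEncodedText.Encode; the function names are illustrative):

```python
import json


def escape_json_naive(value: str) -> str:
    # The pattern the review flags: handles backslash and quote only,
    # leaving control characters such as newlines and tabs unescaped,
    # which yields invalid JSON string contents.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def escape_json(value: str) -> str:
    # Serializer-backed escaping: let the JSON library produce a quoted
    # string literal, then strip the surrounding quotes. Control
    # characters and quotes are all escaped correctly.
    return json.dumps(value)[1:-1]
```

Delegating to the serializer also covers edge cases (Unicode escapes, embedded NULs) that a hand-rolled Replace chain tends to miss.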

In
`@src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchMarkdownExporter.cs`:
- Around line 199-210: The current pattern starts _orchestrator.StartAsync(...)
and then awaits loadTask first, risking an unobserved exception on
orchestratorTask if LoadAsync fails; modify the logic in
ElasticsearchMarkdownExporter (around _contentDateLookup.LoadAsync and
_orchestrator.StartAsync) to await both tasks together (e.g., use Task.WhenAll
or equivalent) so exceptions from both tasks are observed/aggregated, then
extract the orchestratorContext from the completed orchestratorTask (or its
result) to log the strategy and aliases.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ff336ae1-4274-4a37-8ae8-f5e7670f43e4

📥 Commits

Reviewing files that changed from the base of the PR and between e578fac and 86f436a.

📒 Files selected for processing (10)
  • src/Elastic.Documentation/Search/ContentHash.cs
  • src/Elastic.Documentation/Search/DocumentationDocument.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateLookup.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchMarkdownExporter.Export.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchMarkdownExporter.cs
  • src/services/Elastic.Documentation.Assembler/Building/AssemblerBuildService.cs
  • src/services/Elastic.Documentation.Assembler/Building/EsSitemapReader.cs
  • src/tooling/docs-builder/Commands/Assembler/AssemblerSitemapCommand.cs
  • tests/Elastic.Markdown.Tests/Search/ContentHashTests.cs
  • tests/Elastic.Markdown.Tests/Search/DocumentationDocumentSerializationTests.cs

- Use ConcurrentDictionary for _existing and _changed since Resolve
  is called from Parallel.ForEachAsync
- Use JsonEncodedText.Encode for AOT-safe JSON escaping that handles
  control characters and Unicode
- Use Task.WhenAll to observe both tasks when running lookup load and
  orchestrator start in parallel

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@Mpdreamz Mpdreamz left a comment


Review: content_last_updated via ContentDateLookup

The ContentHash.CreateNormalized addition, the new DocumentationDocument fields, and the unit tests are all good — those should stay regardless.

The main concern is ContentDateLookup: it drains the entire side index into a ConcurrentDictionary on startup, then bulk-upserts changes on shutdown. For the current corpus (~5–10k pages) this works, but it sets a precedent that scales poorly and duplicates infrastructure the project already has.

Hand-rolled PIT pagination

LoadAsync (lines 68–130 of ContentDateLookup.cs) manually implements PIT open / search_after / close — the exact pattern already available as PointInTimeSearch<T> in Elastic.Ingest.Elasticsearch.Helpers. The existing helper handles typed deserialization, automatic PIT lifecycle via IAsyncDisposable, sliced parallel search, and configurable page size / keep-alive / sort / query. The ~80 lines of OpenPitAsync, ClosePitAsync, BuildSearchBody, and the do/while loop could be replaced by roughly:

await using var pit = new PointInTimeSearch<ContentDateEntry>(transport, new PointInTimeSearchOptions
{
    Index = _indexName,
    Size = 1000
}, jsonOptions);

await foreach (var entry in pit.SearchDocumentsAsync(ct))
    _existing[entry.Url] = (entry.ContentHash, entry.ContentLastUpdated);

(Same observation applies to the existing EsSitemapReader — separate cleanup.)

Suggested alternative: Elasticsearch Enrich Policy

The project already uses this exact pattern for AI enrichment (AiEnrichmentOrchestrator). The same approach eliminates the in-memory drain entirely by letting Elasticsearch do the join at index time:

  1. Lookup index — docs-{type}-content-dates-{env} stores {url, content_hash, content_last_updated} (same as now).
  2. Enrich policy — a match policy on field url, enriching with content_hash and content_last_updated.
  3. Ingest pipeline — an enrich processor runs on every document. A script processor compares the incoming content_hash with the enriched value: if they differ (or no match exists), it sets content_last_updated to _ingest.timestamp; otherwise it preserves the looked-up date.
  4. Post-index save — after indexing, the application already knows every {url, content_hash, timestamp} it just wrote. Bulk-upsert those into the lookup index (a single bulk call for the whole corpus), then re-execute the enrich policy so the next run sees current data.

This is self-sufficient — no separate reconciliation scan of the main index is needed. The app tracks what it indexed during the normal export flow.
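
Steps 2 and 3 map to Elasticsearch requests roughly like the following sketch (the policy and pipeline names are hypothetical; the index name follows the docs-{type}-content-dates-{env} pattern from the PR, shown here with placeholder values):

```json
PUT /_enrich/policy/content-dates-policy
{
  "match": {
    "indices": "docs-docs-content-dates-prod",
    "match_field": "url",
    "enrich_fields": ["content_hash", "content_last_updated"]
  }
}

POST /_enrich/policy/content-dates-policy/_execute
```

Executing the policy snapshots the lookup index into a hidden enrich index, which is what the ingest pipeline's enrich processor reads at index time — so the policy must be re-executed after the lookup data changes.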

Benefits:

  • Zero memory overhead — no startup drain, no ConcurrentDictionary
  • Reuses the exact infrastructure patterns from AiEnrichmentOrchestrator (EnsureLookupIndex, EnsureEnrichPolicy, ExecuteEnrichPolicy, EnsurePipeline)
  • Scales to any corpus size
  • Stale entry cleanup follows the existing CleanupOrphanedAsync pattern (PIT-scan lookup, terms-check against main index, delete orphans)

Missing cleanup

The current PR has no mechanism to remove lookup entries for pages that have been deleted or whose URLs changed. Over time the lookup index accumulates orphans. The enrich-policy approach naturally supports cleanup via the same CleanupOrphanedAsync pattern the AI enrichment already uses.

Move content_last_updated resolution from an in-memory drain-and-compare
approach to an Elasticsearch enrich policy + ingest pipeline. This
eliminates the startup memory overhead of loading the entire lookup index
into a ConcurrentDictionary and removes ~200 lines of hand-rolled PIT
pagination.

The ingest pipeline compares content hashes at index time via an enrich
processor and painless script. After indexing, the lookup index is synced
via reindex from the lexical index, which also implicitly cleans up
orphaned entries for deleted pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
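
The reindex-based sync described in that commit might look like this illustrative request (index names are assumptions following the docs-{type}-...-{env} naming pattern; _source filtering keeps only the lookup fields):

```json
POST /_reindex?wait_for_completion=false
{
  "source": {
    "index": "docs-docs-lexical-prod",
    "_source": ["url", "content_hash", "content_last_updated"]
  },
  "dest": { "index": "docs-docs-content-dates-prod" }
}
```

With wait_for_completion=false the call returns a task id that must be polled via the _tasks API — which is why a null task id from the start call is treated as a hard failure in a later commit.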
@reakaleek (Member Author)

61141fc


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs (2)

47-56: Consider adding a refresh before policy execution to ensure reindexed documents are visible.

After ReindexToLookupAsync completes, the documents may not be immediately searchable due to the default 1-second refresh interval. The subsequent ExecutePolicyAsync might build the enrich index from stale data.

Add explicit refresh
 public async Task SyncLookupIndexAsync(string lexicalAlias, Cancel ct)
 {
     logger.LogInformation("Syncing content date lookup index {Index} from {Source}", _lookupIndex, lexicalAlias);

     await DeleteLookupContentsAsync(ct);
     await ReindexToLookupAsync(lexicalAlias, ct);
+    await RefreshLookupIndexAsync(ct);
     await ExecutePolicyAsync(ct);

     logger.LogInformation("Content date lookup sync complete");
 }
+
+private async Task RefreshLookupIndexAsync(Cancel ct)
+{
+    await operations.WithRetryAsync(
+        () => transport.PostAsync<StringResponse>($"{_lookupIndex}/_refresh", PostData.Empty, ct),
+        $"POST {_lookupIndex}/_refresh",
+        ct
+    );
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs` around
lines 47 - 56, SyncLookupIndexAsync may call ExecutePolicyAsync before reindexed
documents are visible; after awaiting ReindexToLookupAsync (and before
ExecutePolicyAsync) call the Elasticsearch refresh for the target lookup index
(the same index referenced by _lookupIndex) and await its completion so newly
reindexed documents are searchable; place the refresh call between
ReindexToLookupAsync(...) and ExecutePolicyAsync(...) and ensure any
client/response errors are handled/logged similarly to DeleteLookupContentsAsync
and ReindexToLookupAsync.

58-94: Infrastructure failures are logged but not propagated — documents may silently lack content_last_updated.

All setup methods (EnsureLookupIndexAsync, PutEnrichPolicyAsync, ExecutePolicyAsync, PutPipelineAsync) log warnings on failure but don't throw. If the pipeline fails to create, documents will be indexed without content_last_updated being set, and the issue may go unnoticed until production.

If this is intentional (graceful degradation), consider at minimum emitting a diagnostic error via the collector to surface it to CI/CD.

Also applies to: 96-118, 120-132, 134-180

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs` around
lines 58 - 94, The setup methods (EnsureLookupIndexAsync, PutEnrichPolicyAsync,
ExecutePolicyAsync, PutPipelineAsync) currently only log failures and swallow
errors, which can let documents be indexed without content_last_updated; change
each method to propagate failures instead of silently continuing: after the
transport/operations call check ApiCallDetails.HasSuccessfulStatusCode, and if
false either throw a descriptive exception (e.g., InvalidOperationException)
including response.ApiCallDetails.DebugInformation and the operation name
(HEAD/PUT/POST) or call the existing diagnostics/collector API (e.g.,
DiagnosticCollector.ReportError) with the same contextual details, then rethrow
or return a failed Task; keep the log entry but ensure the error is surfaced to
CI by throwing or reporting so callers of
EnsureLookupIndexAsync/PutEnrichPolicyAsync/ExecutePolicyAsync/PutPipelineAsync
cannot continue as if creation succeeded.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1293e388-205f-4210-b83d-a088d2ee48e0

📥 Commits

Reviewing files that changed from the base of the PR and between 541007d and 61141fc.

📒 Files selected for processing (4)
  • src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchMarkdownExporter.Export.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchMarkdownExporter.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchOperations.cs

Throw InvalidOperationException on setup failures (index creation, enrich
policy, pipeline) instead of logging warnings and continuing silently.
This ensures CI fails fast when infrastructure setup is broken rather than
indexing documents without content_last_updated.

Add an explicit index refresh between reindex and enrich policy execution
in SyncLookupIndexAsync so newly reindexed documents are visible when the
policy snapshots the lookup index.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@reakaleek reakaleek requested a review from Mpdreamz April 2, 2026 09:11

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs`:
- Around line 157-168: The script in ContentDateEnrichment currently references
ctx._ingest.timestamp which is not available inside script processors; update
the processor so the script uses metadata().now for the fallback timestamp
(i.e., inside the painless script in the script processor replace
ctx._ingest.timestamp usage with metadata().now), or alternatively remove the
fallback timestamp assignment from the script and add a subsequent set processor
that assigns content_last_updated = {{_ingest.timestamp}} when needed; locate
this change in the JsonObject assigned to the "script" for the script processor
in ContentDateEnrichment and apply one of the two fixes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 176e9e68-c3a8-4c81-a1df-b33c4d5afa90

📥 Commits

Reviewing files that changed from the base of the PR and between 61141fc and 3899203.

📒 Files selected for processing (1)
  • src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs

Replace ctx._ingest.timestamp (not available in Painless script
processors) with a set processor using Mustache {{{_ingest.timestamp}}}.
The set processor pre-sets content_last_updated to the ingest timestamp,
and the script processor only overwrites it when the enrich lookup finds
a matching content hash.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
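
The resulting pipeline shape might look like this sketch (pipeline, policy, and target-field names are assumptions based on the discussion): a set processor pre-fills the ingest timestamp via Mustache, and the script processor overwrites it only when the enriched lookup hash matches the incoming one.

```json
PUT /_ingest/pipeline/content-dates-pipeline
{
  "processors": [
    { "set": {
        "field": "content_last_updated",
        "value": "{{{_ingest.timestamp}}}" } },
    { "enrich": {
        "policy_name": "content-dates-policy",
        "field": "url",
        "target_field": "_content_date_lookup",
        "ignore_missing": true } },
    { "script": {
        "source": "if (ctx._content_date_lookup != null && ctx._content_date_lookup.content_hash == ctx.content_hash) { ctx.content_last_updated = ctx._content_date_lookup.content_last_updated; } ctx.remove('_content_date_lookup');" } }
  ]
}
```

This ordering sidesteps the ctx._ingest.timestamp limitation entirely: the Mustache template in the set processor is the only place the ingest timestamp is read.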

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs`:
- Around line 207-237: The DeleteLookupContentsAsync and ReindexToLookupAsync
methods currently await operations.DeleteByQueryAsync and
operations.ReindexAsync but ignore the start task result, allowing silent
failures when PostAsyncTaskAsync returns null; update both methods to capture
the returned value (the task id/response) from DeleteByQueryAsync and
ReindexAsync, check if that value is null (or indicates failure), and throw a
descriptive exception (e.g., InvalidOperationException) including context like
the operation name and target index (referencing DeleteLookupContentsAsync,
ReindexToLookupAsync, DeleteByQueryAsync, ReindexAsync and the
PostAsyncTaskAsync result/taskId) so callers can detect and handle failed start
of the async operation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c810e016-be94-4171-a2ef-4d1e83c9f05e

📥 Commits

Reviewing files that changed from the base of the PR and between 3899203 and fc7c5cc.

📒 Files selected for processing (1)
  • src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs

reakaleek and others added 3 commits April 2, 2026 17:46
DeleteByQueryAsync, ReindexAsync, and UpdateByQueryAsync now throw
InvalidOperationException when PostAsyncTaskAsync returns null instead
of silently skipping the poll. These are wait-for-completion methods
where callers expect the operation to have succeeded on return.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@reakaleek (Member Author)

@coderabbitai review


coderabbitai bot commented Apr 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@reakaleek (Member Author)

@coderabbitai full review


coderabbitai bot commented Apr 14, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs`:
- Around line 51-52: The current sequence calls DeleteLookupContentsAsync which
removes the only durable lookup before ReindexToLookupAsync writes a
replacement; instead, change the flow to reindex into a new staging index (e.g.,
derive a staging name from lexicalAlias), run the full copy and refresh against
that staging index, verify success, then atomically swap the lexicalAlias to
point to the new index and only then delete the old lookup; update usages in
InitializeAsync to continue referencing the alias so the swap is transparent and
ensure ReindexToLookupAsync and DeleteLookupContentsAsync are replaced or
refactored to support staging + alias-swap semantics.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: eee48ab1-51b6-48f2-adfa-2f64d4511ffb

📥 Commits

Reviewing files that changed from the base of the PR and between 3899203 and f990ada.

📒 Files selected for processing (4)
  • src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchOperations.cs
  • src/services/Elastic.Documentation.Assembler/Building/AssemblerBuildService.cs
  • src/tooling/docs-builder/Commands/Assembler/AssemblerSitemapCommand.cs
✅ Files skipped from review due to trivial changes (2)
  • src/services/Elastic.Documentation.Assembler/Building/AssemblerBuildService.cs
  • src/tooling/docs-builder/Commands/Assembler/AssemblerSitemapCommand.cs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchOperations.cs

Comment on lines +51 to +52
await DeleteLookupContentsAsync(ct);
await ReindexToLookupAsync(lexicalAlias, ct);


⚠️ Potential issue | 🟠 Major

Don’t rebuild the persisted lookup in place.

Line 51 deletes the only durable content-date state before Line 52 has produced a replacement. If shutdown is cancelled or the reindex fails partway through, the lookup index stays empty/partial, and the next InitializeAsync will execute the enrich policy against that truncated data set. Unchanged pages will then get a fresh content_last_updated. Rebuild into a staging index and swap it in only after the copy and refresh succeed.
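
The swap the reviewer suggests is a single atomic _aliases call, roughly like this sketch (index names are hypothetical; both actions in one request apply atomically, so readers of the alias never observe an empty lookup):

```json
POST /_aliases
{
  "actions": [
    { "add":    { "index": "docs-docs-content-dates-prod-2",
                  "alias": "docs-docs-content-dates-prod" } },
    { "remove": { "index": "docs-docs-content-dates-prod-1",
                  "alias": "docs-docs-content-dates-prod" } }
  ]
}
```

The old physical index can then be deleted separately once the swap has succeeded, keeping a durable copy around for the failure window.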


@reakaleek reakaleek merged commit 36e15c1 into main Apr 14, 2026
28 checks passed
@reakaleek reakaleek deleted the handsomely-lighter branch April 14, 2026 08:15
