
Search: Add content_last_updated for content-only change tracking #3016

Merged — reakaleek merged 10 commits into main from handsomely-lighter on Apr 14, 2026

Conversation

@reakaleek (Member)

What

Add a content_last_updated field to DocumentationDocument that only advances when page content (stripped_body) actually changes, as opposed to last_updated which advances on any change (metadata, navigation, mappings).

Why

The sitemap and search features need to know when content was last meaningfully updated, not when metadata like navigation ordering or synonym configs changed. last_updated conflates content and metadata changes.

How

  • New content_hash field: whitespace-normalized SHA-256 hash of stripped_body (survives reformatting)
  • New content_last_updated field: only advances when content_hash changes
  • Persistent lookup index (docs-{type}-content-dates-{env}) stores {url, content_hash, content_last_updated} — survives index rollovers since it's a fixed-name index independent of the document index lifecycle
  • Lookup loads in parallel with orchestrator start to minimize latency
  • Sitemap generation now uses content_last_updated instead of last_updated
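
The hashing step above can be sketched as follows — a hypothetical Python analog of the C# ContentHash.CreateNormalized helper, assuming "whitespace-normalized" means collapsing whitespace runs to a single space and trimming before hashing:

```python
import hashlib
import re


def create_normalized_hash(stripped_body: str) -> str:
    """Hypothetical analog of ContentHash.CreateNormalized: collapse all
    whitespace runs to one space and trim, so pure reformatting of a page
    (re-wrapping, indentation changes) does not change the hash."""
    normalized = re.sub(r"\s+", " ", stripped_body).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Under that assumption, two bodies that differ only in line breaks or indentation produce the same digest, so content_last_updated is left untouched.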

Test plan

  • ContentHash.CreateNormalized unit tests (whitespace normalization)
  • DocumentationDocument serialization round-trip test with new fields
  • All 2,970 unit tests pass
  • Manual: run export twice with same content → content_last_updated unchanged; modify a page → it advances; modify only navigation → it stays the same

🤖 Generated with Claude Code

reakaleek and others added 2 commits April 2, 2026 00:50
Add a new content_last_updated field to DocumentationDocument that only
advances when the page content (stripped_body) actually changes, ignoring
metadata-only changes like navigation reordering or mapping rollovers.

Uses a persistent lookup index (docs-{type}-content-dates-{env}) to
preserve timestamps across index rollovers. Content hashing normalizes
whitespace so reformatting doesn't trigger false updates.

Also updates the sitemap to use content_last_updated instead of
last_updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use JsonObject instead of raw string literal for index mapping
to satisfy the dotnet-format JSON002 rule.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@reakaleek reakaleek requested a review from a team as a code owner April 1, 2026 22:53
@reakaleek reakaleek requested a review from Mpdreamz April 1, 2026 22:53
@coderabbitai coderabbitai bot added feature and removed enhancement labels Apr 1, 2026

coderabbitai bot commented Apr 1, 2026

Caution

Review failed

The pull request is closed.

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 04fe9f80-5773-472e-87ea-bf6163f27a16

📥 Commits

Reviewing files that changed from the base of the PR and between 7d776e3 and f990ada.

📒 Files selected for processing (11)
  • src/Elastic.Documentation/Search/ContentHash.cs
  • src/Elastic.Documentation/Search/DocumentationDocument.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchMarkdownExporter.Export.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchMarkdownExporter.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchOperations.cs
  • src/services/Elastic.Documentation.Assembler/Building/AssemblerBuildService.cs
  • src/services/Elastic.Documentation.Assembler/Building/EsSitemapReader.cs
  • src/tooling/docs-builder/Commands/Assembler/AssemblerSitemapCommand.cs
  • tests/Elastic.Markdown.Tests/Search/ContentHashTests.cs
  • tests/Elastic.Markdown.Tests/Search/DocumentationDocumentSerializationTests.cs

📝 Walkthrough

Adds content-date tracking via a new public ContentDateEnrichment class that provisions and maintains a lookup index, an enrich policy, and an ingest pipeline. DocumentationDocument gains ContentLastUpdated and ContentBodyHash; ContentHash gains CreateNormalized to compute a whitespace-normalized hash. The Elasticsearch exporter computes and assigns ContentBodyHash during export flows and initializes ContentDateEnrichment before publishing synonyms/query rules. On shutdown the exporter completes ingestion orchestration then calls SyncLookupIndexAsync to persist lookup state. ElasticsearchOperations gains a ReindexAsync helper used by the enrichment sync.

Sequence Diagram

sequenceDiagram
    participant Exporter as ElasticsearchMarkdownExporter
    participant CDEnrich as ContentDateEnrichment
    participant ES as Elasticsearch
    participant Ingest as Ingest Pipeline

    Exporter->>CDEnrich: InitializeAsync()
    CDEnrich->>ES: Ensure lookup index (mappings: url, content_hash, content_last_updated)
    CDEnrich->>ES: Create enrich policy (match on url -> content_hash, content_last_updated)
    CDEnrich->>ES: Execute enrich policy
    CDEnrich->>ES: Create ingest pipeline (set timestamp, enrich, painless script)
    CDEnrich-->>Exporter: Ready

    Exporter->>Exporter: For each doc: ContentBodyHash = ContentHash.CreateNormalized(doc.StrippedBody)
    Exporter->>ES: Index document (includes content_hash)

    note over Ingest,ES: On ingestion
    Ingest->>ES: Enrich using lookup index (_content_date_lookup)
    Ingest->>Ingest: Painless script: if lookup.content_hash == ctx.content_hash then set content_last_updated = lookup.content_last_updated else set content_last_updated = ctx._ingest.timestamp
    Ingest->>ES: Finalize indexing with content_last_updated

    Exporter->>Exporter: Stop -> Complete orchestrator
    Exporter->>CDEnrich: SyncLookupIndexAsync(lexicalAlias)
    CDEnrich->>ES: Delete lookup docs (delete-by-query)
    CDEnrich->>ES: Reindex lexical -> lookup (uses ReindexAsync)
    CDEnrich->>ES: Refresh lookup & re-execute enrich policy
    CDEnrich-->>Exporter: Sync complete


🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 38.46%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: the title accurately and concisely describes the main change: adding a content_last_updated field for content-only change tracking, which is the core objective across all modified files.
  • Description check — ✅ Passed: the description is directly related to the changeset, clearly explaining the What, Why, and How of the content-only change tracking feature, including the persistent lookup index and sitemap updates.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateLookup.cs`:
- Around line 178-179: EscapeJson currently only handles backslashes and quotes
and misses control characters; update the EscapeJson implementation in
ContentDateLookup (the EscapeJson method) to use a proper JSON escaper such as
System.Text.Json: either return JsonSerializer.Serialize(value).Trim('"') or use
JsonEncodedText.Encode(value).ToString() so all control characters and Unicode
are correctly escaped; replace the manual Replace chain with one of these calls
in the EscapeJson method.
- Around line 139-146: The Resolve method uses the non-thread-safe fields
_existing and _changed from ContentDateLookup while being called concurrently;
replace both Dictionary usages with
System.Collections.Concurrent.ConcurrentDictionary instances, update any
TryGetValue/_existing lookups to use ConcurrentDictionary.TryGetValue (or
TryGetValue remains but on the concurrent type), and replace the assignment
_changed[url] = (newContentHash, batchTimestamp) with a thread-safe update
(e.g., TryAdd/AddOrUpdate) so Resolve and any other methods (referencing
_existing, _changed, Resolve) are safe under Parallel.ForEachAsync.
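
The escaping finding generalizes beyond C#. A minimal Python sketch of the same bug and the serializer-backed fix (json.dumps standing in here for JsonSerializer.Serialize(value).Trim('"') / JsonEncodedText.Encode; the function names are illustrative):

```python
import json


def escape_json_naive(value: str) -> str:
    # The pattern the review flags: handles backslash and quote only,
    # leaving control characters such as newlines and tabs unescaped,
    # which yields invalid JSON string contents.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def escape_json(value: str) -> str:
    # Serializer-backed escaping: let the JSON library produce a quoted
    # string literal, then strip the surrounding quotes. Control
    # characters and quotes are all escaped correctly.
    return json.dumps(value)[1:-1]
```

Delegating to the serializer also covers edge cases (Unicode escapes, embedded NULs) that a hand-rolled Replace chain tends to miss.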

In
`@src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchMarkdownExporter.cs`:
- Around line 199-210: The current pattern starts _orchestrator.StartAsync(...)
and then awaits loadTask first, risking an unobserved exception on
orchestratorTask if LoadAsync fails; modify the logic in
ElasticsearchMarkdownExporter (around _contentDateLookup.LoadAsync and
_orchestrator.StartAsync) to await both tasks together (e.g., use Task.WhenAll
or equivalent) so exceptions from both tasks are observed/aggregated, then
extract the orchestratorContext from the completed orchestratorTask (or its
result) to log the strategy and aliases.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ff336ae1-4274-4a37-8ae8-f5e7670f43e4

📥 Commits

Reviewing files that changed from the base of the PR and between e578fac and 86f436a.

📒 Files selected for processing (10)
  • src/Elastic.Documentation/Search/ContentHash.cs
  • src/Elastic.Documentation/Search/DocumentationDocument.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateLookup.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchMarkdownExporter.Export.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchMarkdownExporter.cs
  • src/services/Elastic.Documentation.Assembler/Building/AssemblerBuildService.cs
  • src/services/Elastic.Documentation.Assembler/Building/EsSitemapReader.cs
  • src/tooling/docs-builder/Commands/Assembler/AssemblerSitemapCommand.cs
  • tests/Elastic.Markdown.Tests/Search/ContentHashTests.cs
  • tests/Elastic.Markdown.Tests/Search/DocumentationDocumentSerializationTests.cs

- Use ConcurrentDictionary for _existing and _changed since Resolve
  is called from Parallel.ForEachAsync
- Use JsonEncodedText.Encode for AOT-safe JSON escaping that handles
  control characters and Unicode
- Use Task.WhenAll to observe both tasks when running lookup load and
  orchestrator start in parallel

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@Mpdreamz Mpdreamz left a comment


Review: content_last_updated via ContentDateLookup

The ContentHash.CreateNormalized addition, the new DocumentationDocument fields, and the unit tests are all good — those should stay regardless.

The main concern is ContentDateLookup: it drains the entire side index into a ConcurrentDictionary on startup, then bulk-upserts changes on shutdown. For the current corpus (~5–10k pages) this works, but it sets a precedent that scales poorly and duplicates infrastructure the project already has.

Hand-rolled PIT pagination

LoadAsync (lines 68–130 of ContentDateLookup.cs) manually implements PIT open / search_after / close — the exact pattern already available as PointInTimeSearch<T> in Elastic.Ingest.Elasticsearch.Helpers. The existing helper handles typed deserialization, automatic PIT lifecycle via IAsyncDisposable, sliced parallel search, and configurable page size / keep-alive / sort / query. The ~80 lines of OpenPitAsync, ClosePitAsync, BuildSearchBody, and the do/while loop could be replaced by roughly:

await using var pit = new PointInTimeSearch<ContentDateEntry>(transport, new PointInTimeSearchOptions
{
    Index = _indexName,
    Size = 1000
}, jsonOptions);

await foreach (var entry in pit.SearchDocumentsAsync(ct))
    _existing[entry.Url] = (entry.ContentHash, entry.ContentLastUpdated);

(Same observation applies to the existing EsSitemapReader — separate cleanup.)

Suggested alternative: Elasticsearch Enrich Policy

The project already uses this exact pattern for AI enrichment (AiEnrichmentOrchestrator). The same approach eliminates the in-memory drain entirely by letting Elasticsearch do the join at index time:

  1. Lookup index — docs-{type}-content-dates-{env} stores {url, content_hash, content_last_updated} (same as now).
  2. Enrich policy — a match policy on field url, enriching with content_hash and content_last_updated.
  3. Ingest pipeline — an enrich processor runs on every document. A script processor compares the incoming content_hash with the enriched value: if they differ (or no match exists), it sets content_last_updated to _ingest.timestamp; otherwise it preserves the looked-up date.
  4. Post-index save — after indexing, the application already knows every {url, content_hash, timestamp} it just wrote. Bulk-upsert those into the lookup index (a single bulk call for the whole corpus), then re-execute the enrich policy so the next run sees current data.

This is self-sufficient — no separate reconciliation scan of the main index is needed. The app tracks what it indexed during the normal export flow.
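
Steps 2 and 3 map to Elasticsearch requests roughly like the following sketch (the policy and pipeline names are hypothetical; the index name follows the docs-{type}-content-dates-{env} pattern from the PR, shown here with placeholder values):

```json
PUT /_enrich/policy/content-dates-policy
{
  "match": {
    "indices": "docs-docs-content-dates-prod",
    "match_field": "url",
    "enrich_fields": ["content_hash", "content_last_updated"]
  }
}

POST /_enrich/policy/content-dates-policy/_execute
```

Executing the policy snapshots the lookup index into a hidden enrich index, which is what the ingest pipeline's enrich processor reads at index time — so the policy must be re-executed after the lookup data changes.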

Benefits:

  • Zero memory overhead — no startup drain, no ConcurrentDictionary
  • Reuses the exact infrastructure patterns from AiEnrichmentOrchestrator (EnsureLookupIndex, EnsureEnrichPolicy, ExecuteEnrichPolicy, EnsurePipeline)
  • Scales to any corpus size
  • Stale entry cleanup follows the existing CleanupOrphanedAsync pattern (PIT-scan lookup, terms-check against main index, delete orphans)

Missing cleanup

The current PR has no mechanism to remove lookup entries for pages that have been deleted or whose URLs changed. Over time the lookup index accumulates orphans. The enrich-policy approach naturally supports cleanup via the same CleanupOrphanedAsync pattern the AI enrichment already uses.

Move content_last_updated resolution from an in-memory drain-and-compare
approach to an Elasticsearch enrich policy + ingest pipeline. This
eliminates the startup memory overhead of loading the entire lookup index
into a ConcurrentDictionary and removes ~200 lines of hand-rolled PIT
pagination.

The ingest pipeline compares content hashes at index time via an enrich
processor and painless script. After indexing, the lookup index is synced
via reindex from the lexical index, which also implicitly cleans up
orphaned entries for deleted pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
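
The reindex-based sync described in that commit might look like this illustrative request (index names are assumptions following the docs-{type}-...-{env} naming pattern; _source filtering keeps only the lookup fields):

```json
POST /_reindex?wait_for_completion=false
{
  "source": {
    "index": "docs-docs-lexical-prod",
    "_source": ["url", "content_hash", "content_last_updated"]
  },
  "dest": { "index": "docs-docs-content-dates-prod" }
}
```

With wait_for_completion=false the call returns a task id that must be polled via the _tasks API — which is why a null task id from the start call is treated as a hard failure in a later commit.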
@reakaleek (Member Author)

61141fc


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs (2)

47-56: Consider adding a refresh before policy execution to ensure reindexed documents are visible.

After ReindexToLookupAsync completes, the documents may not be immediately searchable due to the default 1-second refresh interval. The subsequent ExecutePolicyAsync might build the enrich index from stale data.

Add explicit refresh
 public async Task SyncLookupIndexAsync(string lexicalAlias, Cancel ct)
 {
     logger.LogInformation("Syncing content date lookup index {Index} from {Source}", _lookupIndex, lexicalAlias);

     await DeleteLookupContentsAsync(ct);
     await ReindexToLookupAsync(lexicalAlias, ct);
+    await RefreshLookupIndexAsync(ct);
     await ExecutePolicyAsync(ct);

     logger.LogInformation("Content date lookup sync complete");
 }
+
+private async Task RefreshLookupIndexAsync(Cancel ct)
+{
+    await operations.WithRetryAsync(
+        () => transport.PostAsync<StringResponse>($"{_lookupIndex}/_refresh", PostData.Empty, ct),
+        $"POST {_lookupIndex}/_refresh",
+        ct
+    );
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs` around
lines 47 - 56, SyncLookupIndexAsync may call ExecutePolicyAsync before reindexed
documents are visible; after awaiting ReindexToLookupAsync (and before
ExecutePolicyAsync) call the Elasticsearch refresh for the target lookup index
(the same index referenced by _lookupIndex) and await its completion so newly
reindexed documents are searchable; place the refresh call between
ReindexToLookupAsync(...) and ExecutePolicyAsync(...) and ensure any
client/response errors are handled/logged similarly to DeleteLookupContentsAsync
and ReindexToLookupAsync.

58-94: Infrastructure failures are logged but not propagated — documents may silently lack content_last_updated.

All setup methods (EnsureLookupIndexAsync, PutEnrichPolicyAsync, ExecutePolicyAsync, PutPipelineAsync) log warnings on failure but don't throw. If the pipeline fails to create, documents will be indexed without content_last_updated being set, and the issue may go unnoticed until production.

If this is intentional (graceful degradation), consider at minimum emitting a diagnostic error via the collector to surface it to CI/CD.

Also applies to: 96-118, 120-132, 134-180

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs` around
lines 58 - 94, The setup methods (EnsureLookupIndexAsync, PutEnrichPolicyAsync,
ExecutePolicyAsync, PutPipelineAsync) currently only log failures and swallow
errors, which can let documents be indexed without content_last_updated; change
each method to propagate failures instead of silently continuing: after the
transport/operations call check ApiCallDetails.HasSuccessfulStatusCode, and if
false either throw a descriptive exception (e.g., InvalidOperationException)
including response.ApiCallDetails.DebugInformation and the operation name
(HEAD/PUT/POST) or call the existing diagnostics/collector API (e.g.,
DiagnosticCollector.ReportError) with the same contextual details, then rethrow
or return a failed Task; keep the log entry but ensure the error is surfaced to
CI by throwing or reporting so callers of
EnsureLookupIndexAsync/PutEnrichPolicyAsync/ExecutePolicyAsync/PutPipelineAsync
cannot continue as if creation succeeded.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1293e388-205f-4210-b83d-a088d2ee48e0

📥 Commits

Reviewing files that changed from the base of the PR and between 541007d and 61141fc.

📒 Files selected for processing (4)
  • src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchMarkdownExporter.Export.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchMarkdownExporter.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchOperations.cs

Throw InvalidOperationException on setup failures (index creation, enrich
policy, pipeline) instead of logging warnings and continuing silently.
This ensures CI fails fast when infrastructure setup is broken rather than
indexing documents without content_last_updated.

Add an explicit index refresh between reindex and enrich policy execution
in SyncLookupIndexAsync so newly reindexed documents are visible when the
policy snapshots the lookup index.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@reakaleek reakaleek requested a review from Mpdreamz April 2, 2026 09:11

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs`:
- Around line 157-168: The script in ContentDateEnrichment currently references
ctx._ingest.timestamp which is not available inside script processors; update
the processor so the script uses metadata().now for the fallback timestamp
(i.e., inside the painless script in the script processor replace
ctx._ingest.timestamp usage with metadata().now), or alternatively remove the
fallback timestamp assignment from the script and add a subsequent set processor
that assigns content_last_updated = {{_ingest.timestamp}} when needed; locate
this change in the JsonObject assigned to the "script" for the script processor
in ContentDateEnrichment and apply one of the two fixes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 176e9e68-c3a8-4c81-a1df-b33c4d5afa90

📥 Commits

Reviewing files that changed from the base of the PR and between 61141fc and 3899203.

📒 Files selected for processing (1)
  • src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs

Replace ctx._ingest.timestamp (not available in Painless script
processors) with a set processor using Mustache {{{_ingest.timestamp}}}.
The set processor pre-sets content_last_updated to the ingest timestamp,
and the script processor only overwrites it when the enrich lookup finds
a matching content hash.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
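
The resulting pipeline shape might look like this sketch (pipeline, policy, and target-field names are assumptions based on the discussion): a set processor pre-fills the ingest timestamp via Mustache, and the script processor overwrites it only when the enriched lookup hash matches the incoming one.

```json
PUT /_ingest/pipeline/content-dates-pipeline
{
  "processors": [
    { "set": {
        "field": "content_last_updated",
        "value": "{{{_ingest.timestamp}}}" } },
    { "enrich": {
        "policy_name": "content-dates-policy",
        "field": "url",
        "target_field": "_content_date_lookup",
        "ignore_missing": true } },
    { "script": {
        "source": "if (ctx._content_date_lookup != null && ctx._content_date_lookup.content_hash == ctx.content_hash) { ctx.content_last_updated = ctx._content_date_lookup.content_last_updated; } ctx.remove('_content_date_lookup');" } }
  ]
}
```

This ordering sidesteps the ctx._ingest.timestamp limitation entirely: the Mustache template in the set processor is the only place the ingest timestamp is read.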

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs`:
- Around line 207-237: The DeleteLookupContentsAsync and ReindexToLookupAsync
methods currently await operations.DeleteByQueryAsync and
operations.ReindexAsync but ignore the start task result, allowing silent
failures when PostAsyncTaskAsync returns null; update both methods to capture
the returned value (the task id/response) from DeleteByQueryAsync and
ReindexAsync, check if that value is null (or indicates failure), and throw a
descriptive exception (e.g., InvalidOperationException) including context like
the operation name and target index (referencing DeleteLookupContentsAsync,
ReindexToLookupAsync, DeleteByQueryAsync, ReindexAsync and the
PostAsyncTaskAsync result/taskId) so callers can detect and handle failed start
of the async operation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c810e016-be94-4171-a2ef-4d1e83c9f05e

📥 Commits

Reviewing files that changed from the base of the PR and between 3899203 and fc7c5cc.

📒 Files selected for processing (1)
  • src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs

reakaleek and others added 3 commits April 2, 2026 17:46
DeleteByQueryAsync, ReindexAsync, and UpdateByQueryAsync now throw
InvalidOperationException when PostAsyncTaskAsync returns null instead
of silently skipping the poll. These are wait-for-completion methods
where callers expect the operation to have succeeded on return.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@reakaleek (Member Author)

@coderabbitai review


coderabbitai bot commented Apr 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@reakaleek (Member Author)

@coderabbitai full review


coderabbitai bot commented Apr 14, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs`:
- Around line 51-52: The current sequence calls DeleteLookupContentsAsync which
removes the only durable lookup before ReindexToLookupAsync writes a
replacement; instead, change the flow to reindex into a new staging index (e.g.,
derive a staging name from lexicalAlias), run the full copy and refresh against
that staging index, verify success, then atomically swap the lexicalAlias to
point to the new index and only then delete the old lookup; update usages in
InitializeAsync to continue referencing the alias so the swap is transparent and
ensure ReindexToLookupAsync and DeleteLookupContentsAsync are replaced or
refactored to support staging + alias-swap semantics.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: eee48ab1-51b6-48f2-adfa-2f64d4511ffb

📥 Commits

Reviewing files that changed from the base of the PR and between 3899203 and f990ada.

📒 Files selected for processing (4)
  • src/Elastic.Markdown/Exporters/Elasticsearch/ContentDateEnrichment.cs
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchOperations.cs
  • src/services/Elastic.Documentation.Assembler/Building/AssemblerBuildService.cs
  • src/tooling/docs-builder/Commands/Assembler/AssemblerSitemapCommand.cs
✅ Files skipped from review due to trivial changes (2)
  • src/services/Elastic.Documentation.Assembler/Building/AssemblerBuildService.cs
  • src/tooling/docs-builder/Commands/Assembler/AssemblerSitemapCommand.cs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchOperations.cs

Comment on lines +51 to +52
await DeleteLookupContentsAsync(ct);
await ReindexToLookupAsync(lexicalAlias, ct);


⚠️ Potential issue | 🟠 Major

Don’t rebuild the persisted lookup in place.

Line 51 deletes the only durable content-date state before Line 52 has produced a replacement. If shutdown is cancelled or the reindex fails partway through, the lookup index stays empty/partial, and the next InitializeAsync will execute the enrich policy against that truncated data set. Unchanged pages will then get a fresh content_last_updated. Rebuild into a staging index and swap it in only after the copy and refresh succeed.
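
The swap the reviewer suggests is a single atomic _aliases call, roughly like this sketch (index names are hypothetical; both actions in one request apply atomically, so readers of the alias never observe an empty lookup):

```json
POST /_aliases
{
  "actions": [
    { "add":    { "index": "docs-docs-content-dates-prod-2",
                  "alias": "docs-docs-content-dates-prod" } },
    { "remove": { "index": "docs-docs-content-dates-prod-1",
                  "alias": "docs-docs-content-dates-prod" } }
  ]
}
```

The old physical index can then be deleted separately once the swap has succeeded, keeping a durable copy around for the failure window.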


@reakaleek reakaleek merged commit 36e15c1 into main Apr 14, 2026
28 checks passed
@reakaleek reakaleek deleted the handsomely-lighter branch April 14, 2026 08:15
