Skip to content

feat(search): add task subtype to opensearch#2183

Merged
gbirman merged 5 commits intomainfrom
gab/feat/opensearch-sub-type
Mar 25, 2026
Merged

feat(search): add task subtype to opensearch#2183
gbirman merged 5 commits intomainfrom
gab/feat/opensearch-sub-type

Conversation

@gbirman
Copy link
Copy Markdown
Contributor

@gbirman gbirman commented Mar 25, 2026

No description provided.

@gbirman gbirman requested review from a team as code owners March 25, 2026 18:54
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 25, 2026

Walkthrough

This pull request introduces support for a new sub_type field across the document storage and search infrastructure. Changes include adding a sub_type keyword mapping in OpenSearch indices, extending CLI filtering to support --sub_type alongside date-range filtering, updating the database query layer to handle the new filter parameters, and adding created_at timestamp tracking to search document models. The search processing service now derives sub_type from document metadata and populates it in upsert operations. A new dependency for document sub-type definitions was added to support these changes.

Poem

🐰 ✨ Hop, skip, and a jump through the code we go,
A new sub_type field makes the documents glow,
With filters and mappings and timestamps so bright,
From OpenSearch to storage, everything's right,
The search service dances with metadata delight! 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive No pull request description was provided by the author. Add a description explaining the purpose of adding task subtype support to OpenSearch and how it enables new functionality.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding a task subtype field to OpenSearch across the codebase (TypeScript mappings, Rust models, and upsert logic).
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch gab/feat/opensearch-sub-type

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@infra/stacks/opensearch/helpers/scripts/create_indices.ts`:
- Around line 144-148: The new sub_type keyword mapping is only applied when
DOCUMENT_INDEX is created, leaving existing indices without that mapping; update
createDocumentIndex to detect existing DOCUMENT_INDEX and call the same
putMapping flow used in add_field.ts (the addField/putMapping logic) to add
sub_type: { type: 'keyword', index: true, doc_values: true } to existing
indices; ensure you reuse the utility from
infra/stacks/opensearch/helpers/utils/add_field.ts (or extract its putMapping
call) so createDocumentIndex applies the mapping whether the index is new or
already exists.

In `@rust/cloud-storage/document_storage_service/src/bin/backfill_search.rs`:
- Around line 17-25: Validate the CLI filters at startup in backfill_search.rs:
ensure the optional sub_type value (sub_type) is checked against
allowed/expected values or at least rejected if empty/invalid, and validate the
date bounds so that if both created_after and created_before are present then
created_after <= created_before; on any invalid input return an error / exit
non‑zero before starting the backfill (i.e., perform these checks in main or the
argument-parsing block that prepares the backfill job, referencing sub_type,
created_after, and created_before), and mirror the same validation for the
duplicate flag handling present around the other occurrence (lines 37-43).

In `@rust/cloud-storage/macro_db_client/src/document/get_documents_search.rs`:
- Around line 72-73: The pagination ORDER BY in the backfill query (used by
backfill_search) is non-deterministic because it only sorts by d."createdAt";
update the SQL ORDER BY to include a stable tiebreaker such as d.id (e.g. ORDER
BY d."createdAt" ASC, d.id ASC) so rows with identical timestamps have a
deterministic order and pages won't duplicate or skip documents.

In
`@rust/cloud-storage/search_processing_service/src/process/document/raw_document.rs`:
- Line 393: The test currently only asserts the None branch for
upserts[0].sub_type; add a positive assertion covering propagation of a subtype
by creating the input that sets DocumentSubType::Task and then asserting
upserts[0].sub_type == Some(DocumentSubType::Task). Update the test in
raw_document.rs to include that case (ensure the constructed document/input
includes the subtype field) so the new behavior that sends sub_type is
exercised.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d6005e57-183e-4e82-9d81-d3aa96441db3

📥 Commits

Reviewing files that changed from the base of the PR and between 310eae8 and ae94385.

📒 Files selected for processing (9)
  • infra/stacks/opensearch/helpers/scripts/create_indices.ts
  • rust/cloud-storage/document_storage_service/src/bin/backfill_search.rs
  • rust/cloud-storage/macro_db_client/.sqlx/query-0977b687594b92e6637fff17b6bf4c14b115a7329a678b647755c0d67e5a40e4.json
  • rust/cloud-storage/macro_db_client/.sqlx/query-594e0e7bd80d920c3d37e6739f7a01623ab9cde30224640c159eec0a7ba7d52b.json
  • rust/cloud-storage/macro_db_client/.sqlx/query-e51c29c9e584cce0a3d2657316db8269b604689722628144eca8c19838aa36e9.json
  • rust/cloud-storage/macro_db_client/src/document/get_documents_search.rs
  • rust/cloud-storage/model/src/document/mod.rs
  • rust/cloud-storage/opensearch_client/src/upsert/document.rs
  • rust/cloud-storage/search_processing_service/src/process/document/raw_document.rs
💤 Files with no reviewable changes (2)
  • rust/cloud-storage/macro_db_client/.sqlx/query-594e0e7bd80d920c3d37e6739f7a01623ab9cde30224640c159eec0a7ba7d52b.json
  • rust/cloud-storage/macro_db_client/.sqlx/query-0977b687594b92e6637fff17b6bf4c14b115a7329a678b647755c0d67e5a40e4.json

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
rust/cloud-storage/macro_db_client/src/document/get_documents_search.rs (1)

11-11: 🛠️ Refactor suggestion | 🟠 Major

Add err to tracing::instrument since the function returns Result.

Per coding guidelines, functions returning Result should include err in the #[tracing::instrument] attribute to log errors automatically.

♻️ Proposed fix
-#[tracing::instrument(skip(db))]
+#[tracing::instrument(skip(db), err)]

Based on learnings: "Include err when adding the tracing::instrument attribute to functions that return Result. Do not include err on functions that return Option, (), or other non-Result types."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/cloud-storage/macro_db_client/src/document/get_documents_search.rs` at
line 11, The tracing attribute on the function that returns Result (the function
in get_documents_search.rs, e.g., get_documents_search) must include error
logging: change the attribute from #[tracing::instrument(skip(db))] to
#[tracing::instrument(skip(db), err)] so that returned errors are automatically
recorded by tracing; update the attribute above the get_documents_search
function accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@rust/cloud-storage/macro_db_client/src/document/get_documents_search.rs`:
- Line 11: The tracing attribute on the function that returns Result (the
function in get_documents_search.rs, e.g., get_documents_search) must include
error logging: change the attribute from #[tracing::instrument(skip(db))] to
#[tracing::instrument(skip(db), err)] so that returned errors are automatically
recorded by tracing; update the attribute above the get_documents_search
function accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 68db41b6-36e5-4e70-a19b-85c667edf8f6

📥 Commits

Reviewing files that changed from the base of the PR and between ae94385 and 0c60371.

⛔ Files ignored due to path filters (1)
  • rust/cloud-storage/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • rust/cloud-storage/macro_db_client/.sqlx/query-929e8e21a0b181340c4d74adbb26e14ec0c63b5f7da47d260f84eb0cfb15c575.json
  • rust/cloud-storage/macro_db_client/src/document/get_documents_search.rs
  • rust/cloud-storage/search_processing_service/Cargo.toml
  • rust/cloud-storage/search_processing_service/src/process/document/raw_document.rs

@gbirman gbirman merged commit abb324e into main Mar 25, 2026
40 checks passed
@gbirman gbirman deleted the gab/feat/opensearch-sub-type branch March 25, 2026 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant