feat(search): add task subtype to opensearch#2183
Conversation
WalkthroughThis pull request introduces support for a new Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@infra/stacks/opensearch/helpers/scripts/create_indices.ts`:
- Around line 144-148: The new sub_type keyword mapping is only applied when
DOCUMENT_INDEX is created, leaving existing indices without that mapping; update
createDocumentIndex to detect existing DOCUMENT_INDEX and call the same
putMapping flow used in add_field.ts (the addField/putMapping logic) to add
sub_type: { type: 'keyword', index: true, doc_values: true } to existing
indices; ensure you reuse the utility from
infra/stacks/opensearch/helpers/utils/add_field.ts (or extract its putMapping
call) so createDocumentIndex applies the mapping whether the index is new or
already exists.
In `@rust/cloud-storage/document_storage_service/src/bin/backfill_search.rs`:
- Around line 17-25: Validate the CLI filters at startup in backfill_search.rs:
ensure the optional sub_type value (sub_type) is checked against
allowed/expected values or at least rejected if empty/invalid, and validate the
date bounds so that if both created_after and created_before are present then
created_after <= created_before; on any invalid input return an error / exit
non‑zero before starting the backfill (i.e., perform these checks in main or the
argument-parsing block that prepares the backfill job, referencing sub_type,
created_after, and created_before), and mirror the same validation for the
duplicate flag handling present around the other occurrence (lines 37-43).
In `@rust/cloud-storage/macro_db_client/src/document/get_documents_search.rs`:
- Around line 72-73: The pagination ORDER BY in the backfill query (used by
backfill_search) is non-deterministic because it only sorts by d."createdAt";
update the SQL ORDER BY to include a stable tiebreaker such as d.id (e.g. ORDER
BY d."createdAt" ASC, d.id ASC) so rows with identical timestamps have a
deterministic order and pages won't duplicate or skip documents.
In
`@rust/cloud-storage/search_processing_service/src/process/document/raw_document.rs`:
- Line 393: The test currently only asserts the None branch for
upserts[0].sub_type; add a positive assertion covering propagation of a subtype
by creating the input that sets DocumentSubType::Task and then asserting
upserts[0].sub_type == Some(DocumentSubType::Task). Update the test in
raw_document.rs to include that case (ensure the constructed document/input
includes the subtype field) so the new behavior that sends sub_type is
exercised.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d6005e57-183e-4e82-9d81-d3aa96441db3
📒 Files selected for processing (9)
infra/stacks/opensearch/helpers/scripts/create_indices.tsrust/cloud-storage/document_storage_service/src/bin/backfill_search.rsrust/cloud-storage/macro_db_client/.sqlx/query-0977b687594b92e6637fff17b6bf4c14b115a7329a678b647755c0d67e5a40e4.jsonrust/cloud-storage/macro_db_client/.sqlx/query-594e0e7bd80d920c3d37e6739f7a01623ab9cde30224640c159eec0a7ba7d52b.jsonrust/cloud-storage/macro_db_client/.sqlx/query-e51c29c9e584cce0a3d2657316db8269b604689722628144eca8c19838aa36e9.jsonrust/cloud-storage/macro_db_client/src/document/get_documents_search.rsrust/cloud-storage/model/src/document/mod.rsrust/cloud-storage/opensearch_client/src/upsert/document.rsrust/cloud-storage/search_processing_service/src/process/document/raw_document.rs
💤 Files with no reviewable changes (2)
- rust/cloud-storage/macro_db_client/.sqlx/query-594e0e7bd80d920c3d37e6739f7a01623ab9cde30224640c159eec0a7ba7d52b.json
- rust/cloud-storage/macro_db_client/.sqlx/query-0977b687594b92e6637fff17b6bf4c14b115a7329a678b647755c0d67e5a40e4.json
rust/cloud-storage/macro_db_client/src/document/get_documents_search.rs
Outdated
Show resolved
Hide resolved
rust/cloud-storage/search_processing_service/src/process/document/raw_document.rs
Show resolved
Hide resolved
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
rust/cloud-storage/macro_db_client/src/document/get_documents_search.rs (1)
11-11: 🛠️ Refactor suggestion | 🟠 MajorAdd
errtotracing::instrumentsince the function returnsResult.Per coding guidelines, functions returning
Resultshould includeerrin the#[tracing::instrument]attribute to log errors automatically.♻️ Proposed fix
-#[tracing::instrument(skip(db))] +#[tracing::instrument(skip(db), err)]Based on learnings: "Include
errwhen adding thetracing::instrumentattribute to functions that returnResult. Do not includeerron functions that returnOption,(), or other non-Resulttypes."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust/cloud-storage/macro_db_client/src/document/get_documents_search.rs` at line 11, The tracing attribute on the function that returns Result (the function in get_documents_search.rs, e.g., get_documents_search) must include error logging: change the attribute from #[tracing::instrument(skip(db))] to #[tracing::instrument(skip(db), err)] so that returned errors are automatically recorded by tracing; update the attribute above the get_documents_search function accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@rust/cloud-storage/macro_db_client/src/document/get_documents_search.rs`:
- Line 11: The tracing attribute on the function that returns Result (the
function in get_documents_search.rs, e.g., get_documents_search) must include
error logging: change the attribute from #[tracing::instrument(skip(db))] to
#[tracing::instrument(skip(db), err)] so that returned errors are automatically
recorded by tracing; update the attribute above the get_documents_search
function accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 68db41b6-36e5-4e70-a19b-85c667edf8f6
⛔ Files ignored due to path filters (1)
rust/cloud-storage/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
rust/cloud-storage/macro_db_client/.sqlx/query-929e8e21a0b181340c4d74adbb26e14ec0c63b5f7da47d260f84eb0cfb15c575.jsonrust/cloud-storage/macro_db_client/src/document/get_documents_search.rsrust/cloud-storage/search_processing_service/Cargo.tomlrust/cloud-storage/search_processing_service/src/process/document/raw_document.rs
No description provided.