Skip to content

fix: extensionsless key upload problems#2159

Merged
gbirman merged 9 commits intomainfrom
gab/fix/add-better-error-handling-for-search-upload
Mar 24, 2026
Merged

fix: extensionsless key upload problems#2159
gbirman merged 9 commits intomainfrom
gab/fix/add-better-error-handling-for-search-upload

Conversation

@gbirman
Copy link
Copy Markdown
Contributor

@gbirman gbirman commented Mar 24, 2026

  • fixes missing dss auth env key
  • removes .pdf extension match requirement for document text extractor lambda since we no longer have the extension in the key. this means that the lambda will be triggered more often but it's not really a problem at current scale imo. the lambda already has code for doing db read to get file type so this is a sufficient fix. in a future pr we can get the file type separately as a single upload handler and then provide that info the search lambda/bypass text process lambda for non-pdf files
  • also adds a dlq for search upload and text extractor lambdas

deployed this to dev stack already

@gbirman gbirman requested a review from a team as a code owner March 24, 2026 21:27
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 24, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3e303c18-a843-4b2f-99d2-47fdab7fee4c

📥 Commits

Reviewing files that changed from the base of the PR and between 3492897 and ead842f.

📒 Files selected for processing (1)
  • infra/stacks/document-storage-bucket-integrations/index.ts

Walkthrough

Added getEventRuleArn and switched EventBridge rule handling to use derived rule-name constants; refactored the Lambda permission helper to accept ruleName and use getEventRuleArn. Created two SQS DLQs (search-upload and text-extractor) with 14-day retention and QueuePolicy resources allowing events.amazonaws.com to SendMessage constrained by the specific EventBridge rule ARNs; wired DLQs into EventBridge targets via deadLetterConfig. Changed the text-extractor rule to trigger on all S3 Object Created events. Added Pulumi config keys for document_storage_service_auth_key, fetched the secret and injected DOCUMENT_STORAGE_SERVICE_AUTH_KEY into the search-upload Lambda env. Bumped @biomejs/biome devDependency to 2.2.6.

Poem

🐇 I hopped through stacks and stitched each rule with care,
Queues for lost messages tucked into their lair,
I fetched a secret from the vault so deep,
Pinned it to Lambdas so they wake from sleep,
I twitch my nose and watch infra dance with flair.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title 'fix: extensionsless key upload problems' does not align with the actual changes. The PR focuses on EventBridge DLQ setup, secrets management, and error handling—not extensionless key uploads. Update the title to reflect the main changes, such as 'fix: add EventBridge DLQ and DSS auth key to search-upload stack' or use the more descriptive commit message.
✅ Passed checks (2 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description check ✅ Passed The PR description clearly describes the changeset: fixes missing DSS auth key, removes PDF extension requirement, and adds DLQ for error handling.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch gab/fix/add-better-error-handling-for-search-upload

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added the infra label Mar 24, 2026
@gbirman gbirman force-pushed the gab/fix/add-better-error-handling-for-search-upload branch from 53491b7 to cf1981d Compare March 24, 2026 21:31
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@infra/stacks/document-storage-bucket-integrations/index.ts`:
- Around line 43-49: The new SQS queue declaration (searchUploadDlq using
aws.sqs.Queue) is failing the infra formatter; run `biome format` on
infra/stacks/document-storage-bucket-integrations/index.ts (or at repo root) to
reformat the block and commit the formatted changes so the queue block conforms
to the project's Biome formatting rules.
- Around line 51-66: The SQS DLQ policy currently allows any EventBridge rule to
send messages because it lacks a Condition; update the QueuePolicy for
searchUploadDlq so the policy Statement includes a Condition that restricts
SendMessage to the specific EventBridge rule (search-upload-rule-${stack})
and/or the account. Modify the policy JSON returned in the
searchUploadDlq.arn.apply callback to add "Condition": { "ArnEquals": {
"aws:SourceArn": "<arn-of-search-upload-rule-${stack}>" } } (or include
"aws:SourceAccount": "<account-id>")—use the actual EventBridge rule ARN
(search-upload-rule-${stack}) value or build it from stack/account/region rather
than leaving it unrestricted.

In `@infra/stacks/search-upload/index.ts`:
- Around line 14-18: Replace the permissive config.get(...) ?? '' usage with a
required lookup so missing config fails fast: in the
DOCUMENT_STORAGE_SERVICE_AUTH_KEY assignment (the
aws.secretsmanager.getSecretVersionOutput(...).apply(...) expression), call
config.require('document_storage_service_auth_key') instead of
config.get('document_storage_service_auth_key') ?? '' so the deployment errors
clearly if the secret id is not provided; leave the rest of the secret lookup
and .apply(...) logic unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d72c15a3-6e23-4bc0-aecb-7595d54237ba

📥 Commits

Reviewing files that changed from the base of the PR and between 50ce720 and ec4e20a.

📒 Files selected for processing (5)
  • infra/stacks/document-storage-bucket-integrations/index.ts
  • infra/stacks/search-upload/Pulumi.dev.yaml
  • infra/stacks/search-upload/Pulumi.prod.yaml
  • infra/stacks/search-upload/index.ts
  • infra/stacks/search-upload/search-upload-lambda.ts

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
infra/stacks/document-storage-bucket-integrations/index.ts (1)

48-63: ⚠️ Potential issue | 🟠 Major

Scope the DLQ QueuePolicy to the intended EventBridge rule.

Line 56 currently grants events.amazonaws.com send access without source conditions. Add aws:SourceArn (and ideally aws:SourceAccount) so only search-upload-rule-${stack} can write to this DLQ.

🔐 Suggested hardening diff
 new aws.sqs.QueuePolicy(`search-upload-dlq-policy-${stack}`, {
   queueUrl: searchUploadDlq.url,
-  policy: searchUploadDlq.arn.apply((arn) =>
+  policy: pulumi
+    .all([searchUploadDlq.arn, aws.getCallerIdentityOutput({}).accountId])
+    .apply(([arn, accountId]) =>
       JSON.stringify({
         Version: '2012-10-17',
         Statement: [
           {
             Effect: 'Allow',
             Principal: { Service: 'events.amazonaws.com' },
             Action: 'sqs:SendMessage',
             Resource: arn,
+            Condition: {
+              ArnEquals: {
+                'aws:SourceArn': `arn:aws:events:${aws.config.region}:${accountId}:rule/search-upload-rule-${stack}`,
+              },
+              StringEquals: {
+                'aws:SourceAccount': accountId,
+              },
+            },
           },
         ],
       })
-    ),
+    ),
 });
#!/bin/bash
set -euo pipefail

FILE="infra/stacks/document-storage-bucket-integrations/index.ts"

# Inspect current DLQ policy block
sed -n '48,63p' "$FILE"

# Verify whether source-scoping conditions exist
rg -n '"Condition"|"aws:SourceArn"|"aws:SourceAccount"' "$FILE" -C2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/stacks/document-storage-bucket-integrations/index.ts` around lines 48 -
63, The DLQ QueuePolicy created by QueuePolicy for searchUploadDlq currently
allows events.amazonaws.com to SendMessage broadly; update the policy JSON
inside searchUploadDlq.arn.apply (the QueuePolicy instantiation) to add a
Condition that scopes access to the intended EventBridge rule by adding
"Condition": { "ArnEquals": { "aws:SourceArn": /* ARN of
search-upload-rule-${stack} */ }, "StringEquals": { "aws:SourceAccount": /*
account id */ } } so only the rule search-upload-rule-${stack} (and optionally
the account) can send to the DLQ; keep the Effect, Principal and Action as-is
but nest them under this Condition.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@infra/stacks/document-storage-bucket-integrations/index.ts`:
- Around line 48-63: The DLQ QueuePolicy created by QueuePolicy for
searchUploadDlq currently allows events.amazonaws.com to SendMessage broadly;
update the policy JSON inside searchUploadDlq.arn.apply (the QueuePolicy
instantiation) to add a Condition that scopes access to the intended EventBridge
rule by adding "Condition": { "ArnEquals": { "aws:SourceArn": /* ARN of
search-upload-rule-${stack} */ }, "StringEquals": { "aws:SourceAccount": /*
account id */ } } so only the rule search-upload-rule-${stack} (and optionally
the account) can send to the DLQ; keep the Effect, Principal and Action as-is
but nest them under this Condition.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d415bab4-9b5d-48ff-9f05-336c3ad1881f

📥 Commits

Reviewing files that changed from the base of the PR and between ec4e20a and d11db4c.

⛔ Files ignored due to path filters (1)
  • infra/bun.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • infra/package.json
  • infra/stacks/document-storage-bucket-integrations/index.ts
  • infra/stacks/search-upload/index.ts

@gbirman gbirman changed the title fix(search): add better error handling for search upload fix(search): upload problems Mar 24, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
infra/stacks/document-storage-bucket-integrations/index.ts (1)

133-137: 🧹 Nitpick | 🔵 Trivial

Consider adding DLQ for text-extractor target for consistency.

The search-upload-target now has a DLQ configured, but text-extractor-target does not. For consistent error handling across both triggers, consider adding a DLQ here as well.

♻️ Optional: Add DLQ for text-extractor

If desired, create a similar DLQ and wire it:

+      // Add DLQ for text extractor
+      const textExtractorDlq = new aws.sqs.Queue(`text-extractor-dlq-${stack}`, {
+        name: `text-extractor-dlq-${stack}`,
+        messageRetentionSeconds: 14 * 24 * 60 * 60,
+      });

       // Add the Lambda as a target
       new aws.cloudwatch.EventTarget('text-extractor-target', {
         rule: textExtractorRule.name,
         arn: extractorArn,
+        deadLetterConfig: {
+          arn: textExtractorDlq.arn,
+        },
       });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/stacks/document-storage-bucket-integrations/index.ts` around lines 133
- 137, The text-extractor CloudWatch EventTarget ('text-extractor-target' using
rule textExtractorRule and arn extractorArn) lacks a dead-letter queue while the
'search-upload-target' has one; create an SQS DLQ (e.g.,
textExtractorDeadLetterQueue) and pass its ARN into the EventTarget's
deadLetterConfig when constructing 'text-extractor-target' so failed invocations
are routed to that queue for consistency with the search-upload target.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@infra/stacks/document-storage-bucket-integrations/index.ts`:
- Around line 5-7: The function getEventRuleArn currently hardcodes account ID;
update it to use aws.getCallerIdentityOutput().accountId so the ARN is portable
across accounts. Replace the literal 569036502058 in the pulumi.interpolate call
with aws.getCallerIdentityOutput().accountId (using the Output value inside the
interpolation), keeping the function name getEventRuleArn and the
pulumi.interpolate pattern so callers remain unchanged.

---

Outside diff comments:
In `@infra/stacks/document-storage-bucket-integrations/index.ts`:
- Around line 133-137: The text-extractor CloudWatch EventTarget
('text-extractor-target' using rule textExtractorRule and arn extractorArn)
lacks a dead-letter queue while the 'search-upload-target' has one; create an
SQS DLQ (e.g., textExtractorDeadLetterQueue) and pass its ARN into the
EventTarget's deadLetterConfig when constructing 'text-extractor-target' so
failed invocations are routed to that queue for consistency with the
search-upload target.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4983aac4-b7ef-459a-b75f-b6453f39bdb2

📥 Commits

Reviewing files that changed from the base of the PR and between d11db4c and 3492897.

📒 Files selected for processing (1)
  • infra/stacks/document-storage-bucket-integrations/index.ts

@gbirman gbirman changed the title fix(search): upload problems fix: extensionsless key upload problems Mar 24, 2026
@gbirman gbirman merged commit 72924be into main Mar 24, 2026
21 of 22 checks passed
@gbirman gbirman deleted the gab/fix/add-better-error-handling-for-search-upload branch March 24, 2026 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant