feat: hybrid Shopify JSON enrichment (deterministic + guarded oss-120b)#58
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Caution Review failedThe pull request is closed. WalkthroughIntroduces a Shopify product-JSON enrichment flow that derives deterministic power signals, optionally requests LLM enrichment via an AI gateway, and integrates those enriched signals into the existing ingest/processWorkflowItem pipeline prior to standard extraction and persistence. Changes
Sequence DiagramsequenceDiagram
participant Ingest as Ingest Pipeline
participant Shopify as Shopify Product API
participant Enrich as Shopify JSON Enrichment
participant AIGW as AI Gateway
participant Store as Cable Data Store
Ingest->>Shopify: fetchShopifyProductJsonPayload(product URL)
Shopify-->>Ingest: product JSON (or error)
Ingest->>Enrich: buildShopifyJsonEnrichmentInput(JSON)
Enrich->>Enrich: deriveDeterministicPowerSignals()
Enrich->>Enrich: applyShopifyJsonPowerSignals()
alt AI gateway enabled & provider configured
Enrich->>AIGW: generateObject(enrichment prompt)
AIGW-->>Enrich: LLM enrichment result
Enrich->>Enrich: applyShopifyJsonLlmEnrichment()
end
Enrich-->>Ingest: enriched cables
Ingest->>Store: persist extraction & evidence
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
packages/backend/convex/shopifyJsonEnrichment.ts (2)
158-171: Prefer a type guard overasintoOptionValue.Keeps the unknown-object path aligned with the type-narrowing guideline.
As per coding guidelines: Leverage TypeScript's type narrowing instead of type assertions.♻️ Suggested refactor
+const isRecord = (value: unknown): value is Record<string, unknown> => + typeof value === "object" && value !== null; + const toOptionValue = (value: unknown): string => { if (typeof value === "string") { return cleanText(value); } - if (value && typeof value === "object") { - const candidate = value as { label?: unknown; value?: unknown }; - const label = cleanText(candidate.label); - if (label) { - return label; - } - return cleanText(candidate.value); - } + if (isRecord(value)) { + const label = cleanText(value.label); + if (label) { + return label; + } + return cleanText(value.value); + } return ""; };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/backend/convex/shopifyJsonEnrichment.ts` around lines 158 - 171, In toOptionValue, avoid the assertion "value as { label?: unknown; value?: unknown }" and instead narrow the unknown object with a type guard: check that value is an object (not null), then use 'in' checks and typeof checks on label and value (e.g., 'label' in value && typeof (value as any).label === "string") to safely read and cleanText the fields; make sure to prefer the cleaned label if present, otherwise clean the value, and return "" for non-string/non-present fields—update the logic in toOptionValue to use these runtime checks rather than a direct 'as' cast.
7-8: Extract the snippet cap into a named constant.Avoiding the magic number makes the intent clearer and easier to reuse.
As per coding guidelines: Use meaningful variable names instead of magic numbers - extract constants with descriptive names.♻️ Suggested refactor
-const MAX_WATTAGE = 500; +const MAX_WATTAGE = 500; +const MAX_SNIPPET_LENGTH = 240; @@ - return cleanText(value).slice(0, 240); + return cleanText(value).slice(0, MAX_SNIPPET_LENGTH);Also applies to: 154-156
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/backend/convex/shopifyJsonEnrichment.ts` around lines 7 - 8, Replace the magic-number "snippet cap" values found in this file (the numeric literals at the block noted around lines 154-156) with a descriptive named constant (similar to the existing MAX_WATTAGE) — e.g., declare a top-level constant like SNIPPET_CHAR_LIMIT or SNIPPET_CAP and use that constant wherever the snippet length/cap number is used instead of the hard-coded literal; update any references in the function(s) that construct the enriched JSON so they read from the new constant.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/backend/convex/ingest.ts`:
- Around line 328-448: The enrichment step can throw and abort optional
Shopify-only ingestion; wrap the external calls in try/catch so failures are
swallowed and ingestion continues using deterministic signals. Specifically, in
enrichShopifyCablesFromProductJson call fetchShopifyProductJsonPayload inside a
try/catch and treat any thrown error as a null payload (proceed without LLM
enrichment), and wrap the generateObject invocation (inside the providerConfig
block) in try/catch and set llmEnrichment = null on error (optionally log the
error) rather than letting it bubble; leave function signatures like
fetchShopifyProductJsonPayload, enrichShopifyCablesFromProductJson,
getProviderConfig, canUseAiGatewayEnrichment and the
shopifyJsonLlmEnrichmentSchema usage unchanged.
In `@packages/backend/convex/shopifyJsonEnrichment.ts`:
- Around line 345-374: hasExplicitWattsToken builds a regex by interpolating the
numeric value which breaks on decimals and differs from the shared POWER_REGEX
approach; replace the dynamic RegExp with using the existing POWER_REGEX (the
same one used by getMaxWattsFromSegments) to iterate matches via matchAll on
haystack, parse the numeric portion of each match to a number, and return true
if any parsed number equals the provided value (taking decimals into account);
update the function named hasExplicitWattsToken to use POWER_REGEX + matchAll
and numeric comparison instead of new RegExp interpolation.
---
Nitpick comments:
In `@packages/backend/convex/shopifyJsonEnrichment.ts`:
- Around line 158-171: In toOptionValue, avoid the assertion "value as { label?:
unknown; value?: unknown }" and instead narrow the unknown object with a type
guard: check that value is an object (not null), then use 'in' checks and typeof
checks on label and value (e.g., 'label' in value && typeof (value as any).label
=== "string") to safely read and cleanText the fields; make sure to prefer the
cleaned label if present, otherwise clean the value, and return "" for
non-string/non-present fields—update the logic in toOptionValue to use these
runtime checks rather than a direct 'as' cast.
- Around line 7-8: Replace the magic-number "snippet cap" values found in this
file (the numeric literals at the block noted around lines 154-156) with a
descriptive named constant (similar to the existing MAX_WATTAGE) — e.g., declare
a top-level constant like SNIPPET_CHAR_LIMIT or SNIPPET_CAP and use that
constant wherever the snippet length/cap number is used instead of the
hard-coded literal; update any references in the function(s) that construct the
enriched JSON so they read from the new constant.
| const fetchShopifyProductJsonPayload = async ( | ||
| url: string | ||
| ): Promise<unknown | null> => { | ||
| let parsed: URL; | ||
| try { | ||
| parsed = new URL(url); | ||
| } catch { | ||
| return null; | ||
| } | ||
|
|
||
| const basePath = parsed.pathname.endsWith("/") | ||
| ? parsed.pathname.slice(0, -1) | ||
| : parsed.pathname; | ||
| if (!basePath) { | ||
| return null; | ||
| } | ||
|
|
||
| const productJsonUrl = new URL(parsed.origin); | ||
| productJsonUrl.pathname = `${basePath}.js`; | ||
|
|
||
| const response = await fetch(productJsonUrl.toString(), { | ||
| headers: { | ||
| accept: "application/json", | ||
| }, | ||
| }); | ||
|
|
||
| if (response.status === 404) { | ||
| return null; | ||
| } | ||
| if (!response.ok) { | ||
| throw new Error( | ||
| `Failed to fetch Shopify product JSON (${response.status}) for ${productJsonUrl.toString()}` | ||
| ); | ||
| } | ||
|
|
||
| return await response.json(); | ||
| }; | ||
|
|
||
| const canUseAiGatewayEnrichment = (): boolean => { | ||
| return Boolean(process.env.AI_GATEWAY_API_KEY); | ||
| }; | ||
|
|
||
| const enrichShopifyCablesFromProductJson = async ( | ||
| parsedCables: readonly ParsedCable[], | ||
| productUrl: string, | ||
| canonicalUrl: string, | ||
| workflowRunId: Id<"ingestionWorkflows">, | ||
| workflowItemId: Id<"ingestionWorkflowItems">, | ||
| getProviderConfig: () => ProviderConfig | ||
| ): Promise<ParsedCable[]> => { | ||
| if ( | ||
| !parsedCables.some((parsed) => shouldAttemptShopifyJsonEnrichment(parsed)) | ||
| ) { | ||
| return [...parsedCables]; | ||
| } | ||
|
|
||
| const payload = await fetchShopifyProductJsonPayload(productUrl); | ||
| if (!payload) { | ||
| return [...parsedCables]; | ||
| } | ||
|
|
||
| const firstCandidate = | ||
| parsedCables.find((parsed) => shouldAttemptShopifyJsonEnrichment(parsed)) ?? | ||
| parsedCables[0]; | ||
| if (!firstCandidate) { | ||
| return [...parsedCables]; | ||
| } | ||
|
|
||
| const input = buildShopifyJsonEnrichmentInput(payload, { | ||
| sku: firstCandidate.sku, | ||
| variant: firstCandidate.variant, | ||
| }); | ||
| if (!input) { | ||
| return [...parsedCables]; | ||
| } | ||
|
|
||
| const deterministicSignals = deriveDeterministicPowerSignals(input); | ||
| const inputForPrompt = formatShopifyJsonInputForPrompt(input); | ||
|
|
||
| let llmEnrichment: ReturnType< | ||
| typeof shopifyJsonLlmEnrichmentSchema.parse | ||
| > | null = null; | ||
| const needsLlm = | ||
| canUseAiGatewayEnrichment() && | ||
| parsedCables.some((parsed) => { | ||
| const withSignals = applyShopifyJsonPowerSignals( | ||
| parsed, | ||
| canonicalUrl, | ||
| deterministicSignals | ||
| ); | ||
| return shouldAttemptShopifyJsonEnrichment(withSignals); | ||
| }); | ||
|
|
||
| if (needsLlm) { | ||
| let providerConfig: ProviderConfig | null = null; | ||
| try { | ||
| providerConfig = getProviderConfig(); | ||
| } catch { | ||
| providerConfig = null; | ||
| } | ||
| if (providerConfig) { | ||
| const { object } = await generateObject({ | ||
| model: gateway(providerConfig.model), | ||
| schema: shopifyJsonLlmEnrichmentSchema, | ||
| system: SHOPIFY_JSON_ENRICHMENT_SYSTEM_PROMPT, | ||
| prompt: buildShopifyJsonEnrichmentPrompt(canonicalUrl, inputForPrompt), | ||
| temperature: 0, | ||
| experimental_telemetry: { | ||
| isEnabled: providerConfig.aiTelemetryEnabled, | ||
| functionId: "convex.ingest.enrichShopifyFromProductJson", | ||
| metadata: { | ||
| canonicalUrl, | ||
| workflowRunId, | ||
| workflowItemId, | ||
| }, | ||
| recordInputs: providerConfig.aiTelemetryRecordInputs, | ||
| recordOutputs: providerConfig.aiTelemetryRecordOutputs, | ||
| }, | ||
| }); | ||
| llmEnrichment = object; | ||
| } |
There was a problem hiding this comment.
Optional enrichment failures shouldn’t abort ingestion.
fetchShopifyProductJsonPayload or generateObject errors currently bubble and can fail a Shopify-only ingest even though enrichment is optional. Prefer swallowing those failures and continuing with deterministic signals.
🛠️ Suggested fix
- const payload = await fetchShopifyProductJsonPayload(productUrl);
+ let payload: unknown | null = null;
+ try {
+ payload = await fetchShopifyProductJsonPayload(productUrl);
+ } catch {
+ return [...parsedCables];
+ }
if (!payload) {
return [...parsedCables];
}
@@
- if (providerConfig) {
- const { object } = await generateObject({
- model: gateway(providerConfig.model),
- schema: shopifyJsonLlmEnrichmentSchema,
- system: SHOPIFY_JSON_ENRICHMENT_SYSTEM_PROMPT,
- prompt: buildShopifyJsonEnrichmentPrompt(canonicalUrl, inputForPrompt),
- temperature: 0,
- experimental_telemetry: {
- isEnabled: providerConfig.aiTelemetryEnabled,
- functionId: "convex.ingest.enrichShopifyFromProductJson",
- metadata: {
- canonicalUrl,
- workflowRunId,
- workflowItemId,
- },
- recordInputs: providerConfig.aiTelemetryRecordInputs,
- recordOutputs: providerConfig.aiTelemetryRecordOutputs,
- },
- });
- llmEnrichment = object;
- }
+ if (providerConfig) {
+ try {
+ const { object } = await generateObject({
+ model: gateway(providerConfig.model),
+ schema: shopifyJsonLlmEnrichmentSchema,
+ system: SHOPIFY_JSON_ENRICHMENT_SYSTEM_PROMPT,
+ prompt: buildShopifyJsonEnrichmentPrompt(canonicalUrl, inputForPrompt),
+ temperature: 0,
+ experimental_telemetry: {
+ isEnabled: providerConfig.aiTelemetryEnabled,
+ functionId: "convex.ingest.enrichShopifyFromProductJson",
+ metadata: {
+ canonicalUrl,
+ workflowRunId,
+ workflowItemId,
+ },
+ recordInputs: providerConfig.aiTelemetryRecordInputs,
+ recordOutputs: providerConfig.aiTelemetryRecordOutputs,
+ },
+ });
+ llmEnrichment = object;
+ } catch {
+ llmEnrichment = null;
+ }
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/backend/convex/ingest.ts` around lines 328 - 448, The enrichment
step can throw and abort optional Shopify-only ingestion; wrap the external
calls in try/catch so failures are swallowed and ingestion continues using
deterministic signals. Specifically, in enrichShopifyCablesFromProductJson call
fetchShopifyProductJsonPayload inside a try/catch and treat any thrown error as
a null payload (proceed without LLM enrichment), and wrap the generateObject
invocation (inside the providerConfig block) in try/catch and set llmEnrichment
= null on error (optionally log the error) rather than letting it bubble; leave
function signatures like fetchShopifyProductJsonPayload,
enrichShopifyCablesFromProductJson, getProviderConfig, canUseAiGatewayEnrichment
and the shopifyJsonLlmEnrichmentSchema usage unchanged.
| const addEvidenceIfMissing = ( | ||
| evidence: ExtractionOutput["evidence"], | ||
| fieldPath: string, | ||
| sourceUrl: string, | ||
| snippet?: string | ||
| ): void => { | ||
| const hasFieldPath = evidence.some((item) => item.fieldPath === fieldPath); | ||
| if (hasFieldPath) { | ||
| return; | ||
| } | ||
|
|
||
| evidence.push({ | ||
| fieldPath, | ||
| snippet: snippet ? safeSnippet(snippet) : undefined, | ||
| sourceUrl, | ||
| }); | ||
| }; | ||
|
|
||
| const hasExplicitWattsToken = (value: number, haystack: string): boolean => { | ||
| const pattern = new RegExp(`\\b${value}(?:\\.0+)?\\s*W\\b`, "i"); | ||
| return pattern.test(haystack); | ||
| }; | ||
|
|
||
| const llmSnippetForField = ( | ||
| enrichment: ShopifyJsonLlmEnrichment, | ||
| fieldPath: z.infer<typeof llmEvidenceFieldPathSchema> | ||
| ): string | undefined => { | ||
| return enrichment.evidence.find((item) => item.fieldPath === fieldPath) | ||
| ?.snippet; | ||
| }; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Search for POWER_REGEX in the codebase
rg "POWER_REGEX" -t ts -t tsx -t js -t jsxRepository: anand-testcompare/cable-intel
Length of output: 100
🏁 Script executed:
# Examine the shopifyJsonEnrichment.ts file to understand the context and find POWER_REGEX
head -n 400 packages/backend/convex/shopifyJsonEnrichment.ts | tail -n 100Repository: anand-testcompare/cable-intel
Length of output: 2577
🏁 Script executed:
# Check if there are any imports related to POWER_REGEX or regex patterns
head -n 100 packages/backend/convex/shopifyJsonEnrichment.tsRepository: anand-testcompare/cable-intel
Length of output: 2840
Avoid dynamic regex construction with numeric values—use the shared POWER_REGEX for consistency and to prevent unescaped special characters.
The current pattern interpolates the numeric value directly, which will fail if the value contains a decimal point (e.g., 100.5 produces \b100.5... where the dot matches any character, not a literal dot). Instead, use POWER_REGEX with matchAll and numeric comparison, following the same pattern as getMaxWattsFromSegments.
Suggested fix
const hasExplicitWattsToken = (value: number, haystack: string): boolean => {
- const pattern = new RegExp(`\\b${value}(?:\\.0+)?\\s*W\\b`, "i");
- return pattern.test(haystack);
+ if (!Number.isFinite(value)) {
+ return false;
+ }
+ for (const match of haystack.matchAll(POWER_REGEX)) {
+ if (Number(match[1]) === value) {
+ return true;
+ }
+ }
+ return false;
};🧰 Tools
🪛 ast-grep (0.40.5)
[warning] 363-363: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(\\b${value}(?:\\.0+)?\\s*W\\b, "i")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/backend/convex/shopifyJsonEnrichment.ts` around lines 345 - 374,
hasExplicitWattsToken builds a regex by interpolating the numeric value which
breaks on decimals and differs from the shared POWER_REGEX approach; replace the
dynamic RegExp with using the existing POWER_REGEX (the same one used by
getMaxWattsFromSegments) to iterate matches via matchAll on haystack, parse the
numeric portion of each match to a number, and return true if any parsed number
equals the provided value (taking decimals into account); update the function
named hasExplicitWattsToken to use POWER_REGEX + matchAll and numeric comparison
instead of new RegExp interpolation.
Summary
.jsattributesopenai/gpt-oss-120bonly as a guarded assist for ambiguous fields, with strict no-hallucination checks for wattagesatechi-usb4-c-to-c-cableto ensure 100W is recovered from Shopify JSON attributesValidation
bun test packages/backend/convex/shopifyJsonEnrichment.test.tsbun test packages/backend/convex/shopify.ingest.integration.test.tsbun test packages/backend/convex/ingest.quality.test.tsbun run check-typesbun x ultracite check packages/backend/convex/ingest.ts packages/backend/convex/shopifyJsonEnrichment.ts packages/backend/convex/shopifyJsonEnrichment.test.ts packages/backend/convex/shopify.ingest.integration.test.tsSummary by CodeRabbit
New Features
Reliability
Tests