feat: hybrid Shopify JSON enrichment (deterministic + guarded oss-120b) by anand-testcompare · Pull Request #58 · shpitdev/cable-intel

anand-testcompare · 2026-02-19T21:02:33Z

Summary

- add a Shopify JSON enrichment stage for Shopify URLs before persistence
- deterministically extract power signals (watts/PD/EPR) from stable Shopify .js attributes
- run openai/gpt-oss-120b only as a guarded assist for ambiguous fields, with strict no-hallucination checks for wattage
- keep deterministic fields authoritative and immutable
- add focused unit tests for enrichment helpers
- add integration coverage for satechi-usb4-c-to-c-cable to ensure 100W is recovered from Shopify JSON attributes
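
The split between authoritative deterministic fields and a guarded LLM assist could look roughly like the sketch below. Every name in it (`PowerFields`, `WATTS_TOKEN`, `sourceStatesWatts`, `mergeGuarded`) is hypothetical and not taken from the PR, and the regex is a simplified stand-in for whatever pattern the module actually uses.

```typescript
// Hypothetical shape; the PR's real types are not shown in this thread.
interface PowerFields {
  maxWatts?: number;
  pdSupported?: boolean;
  eprSupported?: boolean;
}

// Simplified stand-in pattern: a number followed by "W" (e.g. "100W", "60 W").
const WATTS_TOKEN = /\b\d+(?:\.\d+)?\s*W\b/gi;

// True only when the source text literally states the proposed wattage.
const sourceStatesWatts = (watts: number, sourceText: string): boolean => {
  const tokens = sourceText.match(WATTS_TOKEN) ?? [];
  // parseFloat reads the leading number and ignores the trailing "W".
  return tokens.some((token) => Number.parseFloat(token) === watts);
};

// Deterministic values always win. LLM values only fill gaps, and an LLM
// wattage is dropped unless the source text contains it explicitly.
const mergeGuarded = (
  deterministic: PowerFields,
  llm: PowerFields,
  sourceText: string
): PowerFields => {
  const llmWatts =
    llm.maxWatts !== undefined && sourceStatesWatts(llm.maxWatts, sourceText)
      ? llm.maxWatts
      : undefined;
  return {
    maxWatts: deterministic.maxWatts ?? llmWatts,
    pdSupported: deterministic.pdSupported ?? llm.pdSupported,
    eprSupported: deterministic.eprSupported ?? llm.eprSupported,
  };
};
```

Under these assumptions, an LLM-proposed 240W is discarded when the product text only mentions 100W, while a deterministic 60W is never overwritten.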

Validation

bun test packages/backend/convex/shopifyJsonEnrichment.test.ts
bun test packages/backend/convex/shopify.ingest.integration.test.ts
bun test packages/backend/convex/ingest.quality.test.ts
bun run check-types
bun x ultracite check packages/backend/convex/ingest.ts packages/backend/convex/shopifyJsonEnrichment.ts packages/backend/convex/shopifyJsonEnrichment.test.ts packages/backend/convex/shopify.ingest.integration.test.ts

Summary by CodeRabbit

New Features
- Shopify JSON enrichment: automatically extract power specs, max wattage, PD/EPR support, and connector signals from product pages.
- Optional AI-powered enrichment to improve product details when available.
Reliability
- Safer product fetches with request timeouts to reduce hangs and improve ingestion stability.
Tests
- Added tests validating Shopify JSON enrichment and ingestion for representative products.

vercel · 2026-02-19T21:02:38Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
cable-intel-web	Ready	Preview, Comment	Feb 19, 2026 9:10pm

coderabbitai · 2026-02-19T21:02:52Z

Caution

Review failed

The pull request is closed.

Walkthrough

Introduces a Shopify product-JSON enrichment flow that derives deterministic power signals, optionally requests LLM enrichment via an AI gateway, and integrates those enriched signals into the existing ingest/processWorkflowItem pipeline prior to standard extraction and persistence.

Changes

Cohort / File(s)	Summary
Core Shopify JSON Enrichment `packages/backend/convex/shopifyJsonEnrichment.ts`, `packages/backend/convex/shopifyJsonEnrichment.test.ts`	New module: zod schemas, input builders, deterministic power-signal extraction, prompt formatting, LLM-enrichment application, evidence tracking, gating logic, and unit tests for signals and enrichment behaviors.
Ingest Pipeline Integration `packages/backend/convex/ingest.ts`	Added Shopify JSON fetch (`fetchShopifyProductJsonPayload`), gating (`canUseAiGatewayEnrichment`), enrichment orchestration (`enrichShopifyCablesFromProductJson`), system prompt constant, and conditional enrichment step inside `processWorkflowItem`.
Integration Tests `packages/backend/convex/shopify.ingest.integration.test.ts`	New integration test for Satechi USB4 product ingestion from Shopify JSON validating rows, connectors, maxWatts, qualityState, and evidence references.
Shopify Source Fetch Timeout `packages/shopify-cable-source/src/source.ts`	Added `FETCH_TIMEOUT_MS` and `fetchWithTimeout`; replaced direct fetches with timeout-aware fetch calls for build ID, search suggest, Next data, and product JSON requests.
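
A timeout-aware fetch like the one described in the last row is commonly built on `AbortController`. The following is a minimal sketch under that assumption, not the PR's code: the timeout value, the `FetchLike` type, and the injectable `fetchImpl` parameter are all illustrative.

```typescript
// Hypothetical default; the PR's actual FETCH_TIMEOUT_MS value is not shown.
const FETCH_TIMEOUT_MS = 10000;

// Narrow fetch signature so a fake implementation can be injected in tests.
type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

const fetchWithTimeout = async (
  url: string,
  init: RequestInit = {},
  timeoutMs: number = FETCH_TIMEOUT_MS,
  fetchImpl: FetchLike = fetch
): Promise<Response> => {
  const controller = new AbortController();
  // Abort the request if it has not settled within timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // Note: this overrides any signal the caller passed in init.
    return await fetchImpl(url, { ...init, signal: controller.signal });
  } finally {
    // Always clear the timer so it cannot fire after the request settles.
    clearTimeout(timer);
  }
};
```

On timeout the returned promise rejects with whatever error the underlying fetch raises for an aborted signal, so callers can treat it like any other network failure.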

Sequence Diagram

sequenceDiagram
    participant Ingest as Ingest Pipeline
    participant Shopify as Shopify Product API
    participant Enrich as Shopify JSON Enrichment
    participant AIGW as AI Gateway
    participant Store as Cable Data Store

    Ingest->>Shopify: fetchShopifyProductJsonPayload(product URL)
    Shopify-->>Ingest: product JSON (or error)
    Ingest->>Enrich: buildShopifyJsonEnrichmentInput(JSON)
    Enrich->>Enrich: deriveDeterministicPowerSignals()
    Enrich->>Enrich: applyShopifyJsonPowerSignals()
    alt AI gateway enabled & provider configured
        Enrich->>AIGW: generateObject(enrichment prompt)
        AIGW-->>Enrich: LLM enrichment result
        Enrich->>Enrich: applyShopifyJsonLlmEnrichment()
    end
    Enrich-->>Ingest: enriched cables
    Ingest->>Store: persist extraction & evidence

Possibly related PRs

Gate catalog visibility by quality + add enrichment queue #34 — modifies ingestion/enrichment flow in packages/backend/convex/ingest.ts; overlaps where this PR integrates Shopify JSON enrichment into the same processing path.
Add Shopify template source package and Anker integration #1 — earlier changes to processWorkflowItem and enrichment orchestration; closely related to the integration points added here.

Poem

🐰 I nibble JSON leaves by moonlit beam,
Watts and connectors sewn into a dream,
AI whispers numbers, traces each clue,
Cables bloom brighter — hooray, and chew-chew!

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the main change: introducing hybrid Shopify JSON enrichment combining deterministic signal extraction with guarded LLM assistance (oss-120b).


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

packages/backend/convex/shopifyJsonEnrichment.ts (2)

158-171: Prefer a type guard over as in toOptionValue.

Keeps the unknown-object path aligned with the type-narrowing guideline.

♻️ Suggested refactor

+const isRecord = (value: unknown): value is Record<string, unknown> =>
+  typeof value === "object" && value !== null;
+
 const toOptionValue = (value: unknown): string => {
   if (typeof value === "string") {
     return cleanText(value);
   }
-  if (value && typeof value === "object") {
-    const candidate = value as { label?: unknown; value?: unknown };
-    const label = cleanText(candidate.label);
-    if (label) {
-      return label;
-    }
-    return cleanText(candidate.value);
-  }
+  if (isRecord(value)) {
+    const label = cleanText(value.label);
+    if (label) {
+      return label;
+    }
+    return cleanText(value.value);
+  }
   return "";
 };

As per coding guidelines: Leverage TypeScript's type narrowing instead of type assertions.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/backend/convex/shopifyJsonEnrichment.ts` around lines 158 - 171, In
toOptionValue, avoid the assertion "value as { label?: unknown; value?: unknown
}" and instead narrow the unknown object with a type guard: check that value is
an object (not null), then use 'in' checks and typeof checks on label and value
(e.g., 'label' in value && typeof (value as any).label === "string") to safely
read and cleanText the fields; make sure to prefer the cleaned label if present,
otherwise clean the value, and return "" for non-string/non-present
fields—update the logic in toOptionValue to use these runtime checks rather than
a direct 'as' cast.
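
The `in`-based narrowing this prompt describes can be written without any cast, including the `as any` that appears in its inline example. A minimal sketch, assuming TypeScript 4.9 or later (where `"key" in value` narrows an `object` to one that has that property); `cleanText` here is a simplified stand-in, since the module's real helper is not shown:

```typescript
// Stand-in for the module's cleanText: non-strings become "", strings are
// trimmed and have internal whitespace collapsed.
const cleanText = (value: unknown): string =>
  typeof value === "string" ? value.trim().replace(/\s+/g, " ") : "";

const toOptionValue = (value: unknown): string => {
  if (typeof value === "string") {
    return cleanText(value);
  }
  if (typeof value === "object" && value !== null) {
    // After the "in" check, value.label is typed as unknown (no cast needed).
    if ("label" in value) {
      const label = cleanText(value.label);
      if (label) {
        return label;
      }
    }
    if ("value" in value) {
      return cleanText(value.value);
    }
  }
  return "";
};
```

The behaviour matches the original function: a non-empty cleaned `label` is preferred, then `value`, and anything else yields an empty string.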

7-8: Extract the snippet cap into a named constant.

Avoiding the magic number makes the intent clearer and easier to reuse.

♻️ Suggested refactor

-const MAX_WATTAGE = 500;
+const MAX_WATTAGE = 500;
+const MAX_SNIPPET_LENGTH = 240;
@@
-  return cleanText(value).slice(0, 240);
+  return cleanText(value).slice(0, MAX_SNIPPET_LENGTH);

As per coding guidelines: Use meaningful variable names instead of magic numbers - extract constants with descriptive names.

Also applies to: 154-156

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/backend/convex/shopifyJsonEnrichment.ts` around lines 7 - 8, Replace
the magic-number "snippet cap" values found in this file (the numeric literals
at the block noted around lines 154-156) with a descriptive named constant
(similar to the existing MAX_WATTAGE) — e.g., declare a top-level constant like
SNIPPET_CHAR_LIMIT or SNIPPET_CAP and use that constant wherever the snippet
length/cap number is used instead of the hard-coded literal; update any
references in the function(s) that construct the enriched JSON so they read from
the new constant.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/backend/convex/ingest.ts`:
- Around line 328-448: The enrichment step can throw and abort optional
Shopify-only ingestion; wrap the external calls in try/catch so failures are
swallowed and ingestion continues using deterministic signals. Specifically, in
enrichShopifyCablesFromProductJson call fetchShopifyProductJsonPayload inside a
try/catch and treat any thrown error as a null payload (proceed without LLM
enrichment), and wrap the generateObject invocation (inside the providerConfig
block) in try/catch and set llmEnrichment = null on error (optionally log the
error) rather than letting it bubble; leave function signatures like
fetchShopifyProductJsonPayload, enrichShopifyCablesFromProductJson,
getProviderConfig, canUseAiGatewayEnrichment and the
shopifyJsonLlmEnrichmentSchema usage unchanged.

In `@packages/backend/convex/shopifyJsonEnrichment.ts`:
- Around line 345-374: hasExplicitWattsToken builds a regex by interpolating the
numeric value which breaks on decimals and differs from the shared POWER_REGEX
approach; replace the dynamic RegExp with using the existing POWER_REGEX (the
same one used by getMaxWattsFromSegments) to iterate matches via matchAll on
haystack, parse the numeric portion of each match to a number, and return true
if any parsed number equals the provided value (taking decimals into account);
update the function named hasExplicitWattsToken to use POWER_REGEX + matchAll
and numeric comparison instead of new RegExp interpolation.

---

Nitpick comments:
In `@packages/backend/convex/shopifyJsonEnrichment.ts`:
- Around line 158-171: In toOptionValue, avoid the assertion "value as { label?:
unknown; value?: unknown }" and instead narrow the unknown object with a type
guard: check that value is an object (not null), then use 'in' checks and typeof
checks on label and value (e.g., 'label' in value && typeof (value as any).label
=== "string") to safely read and cleanText the fields; make sure to prefer the
cleaned label if present, otherwise clean the value, and return "" for
non-string/non-present fields—update the logic in toOptionValue to use these
runtime checks rather than a direct 'as' cast.
- Around line 7-8: Replace the magic-number "snippet cap" values found in this
file (the numeric literals at the block noted around lines 154-156) with a
descriptive named constant (similar to the existing MAX_WATTAGE) — e.g., declare
a top-level constant like SNIPPET_CHAR_LIMIT or SNIPPET_CAP and use that
constant wherever the snippet length/cap number is used instead of the
hard-coded literal; update any references in the function(s) that construct the
enriched JSON so they read from the new constant.

coderabbitai · 2026-02-19T21:14:56Z

packages/backend/convex/ingest.ts

+const fetchShopifyProductJsonPayload = async (
+  url: string
+): Promise<unknown | null> => {
+  let parsed: URL;
+  try {
+    parsed = new URL(url);
+  } catch {
+    return null;
+  }
+
+  const basePath = parsed.pathname.endsWith("/")
+    ? parsed.pathname.slice(0, -1)
+    : parsed.pathname;
+  if (!basePath) {
+    return null;
+  }
+
+  const productJsonUrl = new URL(parsed.origin);
+  productJsonUrl.pathname = `${basePath}.js`;
+
+  const response = await fetch(productJsonUrl.toString(), {
+    headers: {
+      accept: "application/json",
+    },
+  });
+
+  if (response.status === 404) {
+    return null;
+  }
+  if (!response.ok) {
+    throw new Error(
+      `Failed to fetch Shopify product JSON (${response.status}) for ${productJsonUrl.toString()}`
+    );
+  }
+
+  return await response.json();
+};
+
+const canUseAiGatewayEnrichment = (): boolean => {
+  return Boolean(process.env.AI_GATEWAY_API_KEY);
+};
+
+const enrichShopifyCablesFromProductJson = async (
+  parsedCables: readonly ParsedCable[],
+  productUrl: string,
+  canonicalUrl: string,
+  workflowRunId: Id<"ingestionWorkflows">,
+  workflowItemId: Id<"ingestionWorkflowItems">,
+  getProviderConfig: () => ProviderConfig
+): Promise<ParsedCable[]> => {
+  if (
+    !parsedCables.some((parsed) => shouldAttemptShopifyJsonEnrichment(parsed))
+  ) {
+    return [...parsedCables];
+  }
+
+  const payload = await fetchShopifyProductJsonPayload(productUrl);
+  if (!payload) {
+    return [...parsedCables];
+  }
+
+  const firstCandidate =
+    parsedCables.find((parsed) => shouldAttemptShopifyJsonEnrichment(parsed)) ??
+    parsedCables[0];
+  if (!firstCandidate) {
+    return [...parsedCables];
+  }
+
+  const input = buildShopifyJsonEnrichmentInput(payload, {
+    sku: firstCandidate.sku,
+    variant: firstCandidate.variant,
+  });
+  if (!input) {
+    return [...parsedCables];
+  }
+
+  const deterministicSignals = deriveDeterministicPowerSignals(input);
+  const inputForPrompt = formatShopifyJsonInputForPrompt(input);
+
+  let llmEnrichment: ReturnType<
+    typeof shopifyJsonLlmEnrichmentSchema.parse
+  > | null = null;
+  const needsLlm =
+    canUseAiGatewayEnrichment() &&
+    parsedCables.some((parsed) => {
+      const withSignals = applyShopifyJsonPowerSignals(
+        parsed,
+        canonicalUrl,
+        deterministicSignals
+      );
+      return shouldAttemptShopifyJsonEnrichment(withSignals);
+    });
+
+  if (needsLlm) {
+    let providerConfig: ProviderConfig | null = null;
+    try {
+      providerConfig = getProviderConfig();
+    } catch {
+      providerConfig = null;
+    }
+    if (providerConfig) {
+      const { object } = await generateObject({
+        model: gateway(providerConfig.model),
+        schema: shopifyJsonLlmEnrichmentSchema,
+        system: SHOPIFY_JSON_ENRICHMENT_SYSTEM_PROMPT,
+        prompt: buildShopifyJsonEnrichmentPrompt(canonicalUrl, inputForPrompt),
+        temperature: 0,
+        experimental_telemetry: {
+          isEnabled: providerConfig.aiTelemetryEnabled,
+          functionId: "convex.ingest.enrichShopifyFromProductJson",
+          metadata: {
+            canonicalUrl,
+            workflowRunId,
+            workflowItemId,
+          },
+          recordInputs: providerConfig.aiTelemetryRecordInputs,
+          recordOutputs: providerConfig.aiTelemetryRecordOutputs,
+        },
+      });
+      llmEnrichment = object;
+    }


⚠️ Potential issue | 🟠 Major

Optional enrichment failures shouldn’t abort ingestion.

fetchShopifyProductJsonPayload or generateObject errors currently bubble and can fail a Shopify-only ingest even though enrichment is optional. Prefer swallowing those failures and continuing with deterministic signals.

🛠️ Suggested fix

-  const payload = await fetchShopifyProductJsonPayload(productUrl);
+  let payload: unknown | null = null;
+  try {
+    payload = await fetchShopifyProductJsonPayload(productUrl);
+  } catch {
+    return [...parsedCables];
+  }
   if (!payload) {
     return [...parsedCables];
   }
@@
-    if (providerConfig) {
-      const { object } = await generateObject({
-        model: gateway(providerConfig.model),
-        schema: shopifyJsonLlmEnrichmentSchema,
-        system: SHOPIFY_JSON_ENRICHMENT_SYSTEM_PROMPT,
-        prompt: buildShopifyJsonEnrichmentPrompt(canonicalUrl, inputForPrompt),
-        temperature: 0,
-        experimental_telemetry: {
-          isEnabled: providerConfig.aiTelemetryEnabled,
-          functionId: "convex.ingest.enrichShopifyFromProductJson",
-          metadata: {
-            canonicalUrl,
-            workflowRunId,
-            workflowItemId,
-          },
-          recordInputs: providerConfig.aiTelemetryRecordInputs,
-          recordOutputs: providerConfig.aiTelemetryRecordOutputs,
-        },
-      });
-      llmEnrichment = object;
-    }
+    if (providerConfig) {
+      try {
+        const { object } = await generateObject({
+          model: gateway(providerConfig.model),
+          schema: shopifyJsonLlmEnrichmentSchema,
+          system: SHOPIFY_JSON_ENRICHMENT_SYSTEM_PROMPT,
+          prompt: buildShopifyJsonEnrichmentPrompt(canonicalUrl, inputForPrompt),
+          temperature: 0,
+          experimental_telemetry: {
+            isEnabled: providerConfig.aiTelemetryEnabled,
+            functionId: "convex.ingest.enrichShopifyFromProductJson",
+            metadata: {
+              canonicalUrl,
+              workflowRunId,
+              workflowItemId,
+            },
+            recordInputs: providerConfig.aiTelemetryRecordInputs,
+            recordOutputs: providerConfig.aiTelemetryRecordOutputs,
+          },
+        });
+        llmEnrichment = object;
+      } catch {
+        llmEnrichment = null;
+      }
+    }

As per coding guidelines: Handle errors appropriately in async code with try-catch blocks.
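
One way to keep this pattern from being repeated at each call site is a small wrapper that runs an optional async step and returns a fallback on failure. This is only a sketch of that idea: `withFallback` is a hypothetical helper and does not appear in the PR.

```typescript
// Hypothetical helper, not part of the PR: run an optional async step and
// return a fallback instead of letting the error abort the caller.
const withFallback = async <T>(
  step: () => Promise<T>,
  fallback: T,
  onError: (error: unknown) => void = () => {}
): Promise<T> => {
  try {
    return await step();
  } catch (error) {
    // Report the failure (for example to a logger) but keep going.
    onError(error);
    return fallback;
  }
};
```

With such a helper, the payload fetch in the suggested fix could be wrapped as `withFallback(() => fetchShopifyProductJsonPayload(productUrl), null, console.warn)`, which also logs the error rather than discarding it silently.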

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/backend/convex/ingest.ts` around lines 328 - 448, The enrichment step can throw and abort optional Shopify-only ingestion; wrap the external calls in try/catch so failures are swallowed and ingestion continues using deterministic signals. Specifically, in enrichShopifyCablesFromProductJson call fetchShopifyProductJsonPayload inside a try/catch and treat any thrown error as a null payload (proceed without LLM enrichment), and wrap the generateObject invocation (inside the providerConfig block) in try/catch and set llmEnrichment = null on error (optionally log the error) rather than letting it bubble; leave function signatures like fetchShopifyProductJsonPayload, enrichShopifyCablesFromProductJson, getProviderConfig, canUseAiGatewayEnrichment and the shopifyJsonLlmEnrichmentSchema usage unchanged.

coderabbitai · 2026-02-19T21:14:56Z

packages/backend/convex/shopifyJsonEnrichment.ts

+const addEvidenceIfMissing = (
+  evidence: ExtractionOutput["evidence"],
+  fieldPath: string,
+  sourceUrl: string,
+  snippet?: string
+): void => {
+  const hasFieldPath = evidence.some((item) => item.fieldPath === fieldPath);
+  if (hasFieldPath) {
+    return;
+  }
+
+  evidence.push({
+    fieldPath,
+    snippet: snippet ? safeSnippet(snippet) : undefined,
+    sourceUrl,
+  });
+};
+
+const hasExplicitWattsToken = (value: number, haystack: string): boolean => {
+  const pattern = new RegExp(`\\b${value}(?:\\.0+)?\\s*W\\b`, "i");
+  return pattern.test(haystack);
+};
+
+const llmSnippetForField = (
+  enrichment: ShopifyJsonLlmEnrichment,
+  fieldPath: z.infer<typeof llmEvidenceFieldPathSchema>
+): string | undefined => {
+  return enrichment.evidence.find((item) => item.fieldPath === fieldPath)
+    ?.snippet;
+};


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Search for POWER_REGEX in the codebase
rg "POWER_REGEX" -t ts -t tsx -t js -t jsx

Repository: anand-testcompare/cable-intel

Length of output: 100

🏁 Script executed:

# Examine the shopifyJsonEnrichment.ts file to understand the context and find POWER_REGEX
head -n 400 packages/backend/convex/shopifyJsonEnrichment.ts | tail -n 100

Repository: anand-testcompare/cable-intel

Length of output: 2577

🏁 Script executed:

# Check if there are any imports related to POWER_REGEX or regex patterns
head -n 100 packages/backend/convex/shopifyJsonEnrichment.ts

Repository: anand-testcompare/cable-intel

Length of output: 2840

Avoid dynamic regex construction with numeric values—use the shared POWER_REGEX for consistency and to prevent unescaped special characters.

The current pattern interpolates the numeric value directly, which will fail if the value contains a decimal point (e.g., 100.5 produces \b100.5... where the dot matches any character, not a literal dot). Instead, use POWER_REGEX with matchAll and numeric comparison, following the same pattern as getMaxWattsFromSegments.

Suggested fix

 const hasExplicitWattsToken = (value: number, haystack: string): boolean => {
-  const pattern = new RegExp(`\\b${value}(?:\\.0+)?\\s*W\\b`, "i");
-  return pattern.test(haystack);
+  if (!Number.isFinite(value)) {
+    return false;
+  }
+  for (const match of haystack.matchAll(POWER_REGEX)) {
+    if (Number(match[1]) === value) {
+      return true;
+    }
+  }
+  return false;
 };
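
The decimal pitfall is easy to reproduce in isolation. In the sketch below, `hasWattsDynamic` copies the interpolated pattern from the reviewed code, while `POWER_REGEX_SKETCH` is a hypothetical stand-in for the shared `POWER_REGEX`, whose real definition is not shown in this thread; it assumes a global regex with the number in capture group 1.

```typescript
// Dynamic pattern as in the reviewed code: the "." of a decimal value is
// interpolated unescaped, so it matches any character.
const hasWattsDynamic = (value: number, haystack: string): boolean =>
  new RegExp(`\\b${value}(?:\\.0+)?\\s*W\\b`, "i").test(haystack);

// Hypothetical stand-in for POWER_REGEX (global flag, number in group 1).
const POWER_REGEX_SKETCH = /\b(\d+(?:\.\d+)?)\s*W\b/gi;

// Static pattern plus numeric comparison, following the suggested fix.
const hasWattsStatic = (value: number, haystack: string): boolean => {
  if (!Number.isFinite(value)) {
    return false;
  }
  for (const match of haystack.matchAll(POWER_REGEX_SKETCH)) {
    if (Number(match[1]) === value) {
      return true;
    }
  }
  return false;
};

console.log(hasWattsDynamic(100.5, "rated 100x5 W")); // true (false positive)
console.log(hasWattsStatic(100.5, "rated 100x5 W")); // false
console.log(hasWattsStatic(100, "up to 100.0W PD")); // true
```

Comparing parsed numbers also makes `100` and `100.0` equivalent without the `(?:\.0+)?` special case in the dynamic pattern.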

🧰 Tools

🪛 ast-grep (0.40.5)

[warning] 363-363: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(`\\b${value}(?:\\.0+)?\\s*W\\b`, "i")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/backend/convex/shopifyJsonEnrichment.ts` around lines 345 - 374, hasExplicitWattsToken builds a regex by interpolating the numeric value which breaks on decimals and differs from the shared POWER_REGEX approach; replace the dynamic RegExp with using the existing POWER_REGEX (the same one used by getMaxWattsFromSegments) to iterate matches via matchAll on haystack, parse the numeric portion of each match to a number, and return true if any parsed number equals the provided value (taking decimals into account); update the function named hasExplicitWattsToken to use POWER_REGEX + matchAll and numeric comparison instead of new RegExp interpolation.

feat(ingest): hybrid Shopify JSON enrichment with guarded LLM assist

d801b91

vercel bot deployed to Preview February 19, 2026 21:03 View deployment

test(shopify-source): add request timeout guard for flaky network calls

017003c

vercel bot deployed to Preview February 19, 2026 21:10 View deployment

anand-testcompare merged commit 25f0c8a into main Feb 19, 2026
4 checks passed

anand-testcompare deleted the feat/shopify-json-hybrid-enrichment branch February 19, 2026 21:11

coderabbitai bot reviewed Feb 19, 2026
