Skip to content

feat: hybrid Shopify JSON enrichment (deterministic + guarded oss-120b)#58

Merged
anand-testcompare merged 2 commits intomainfrom
feat/shopify-json-hybrid-enrichment
Feb 19, 2026
Merged

feat: hybrid Shopify JSON enrichment (deterministic + guarded oss-120b)#58
anand-testcompare merged 2 commits intomainfrom
feat/shopify-json-hybrid-enrichment

Conversation

@anand-testcompare
Copy link
Collaborator

@anand-testcompare anand-testcompare commented Feb 19, 2026

Summary

  • add a Shopify JSON enrichment stage for Shopify URLs before persistence
  • deterministically extract power signals (watts/PD/EPR) from stable Shopify .js attributes
  • run openai/gpt-oss-120b only as a guarded assist for ambiguous fields, with strict no-hallucination checks for wattage
  • keep deterministic fields authoritative and immutable
  • add focused unit tests for enrichment helpers
  • add integration coverage for satechi-usb4-c-to-c-cable to ensure 100W is recovered from Shopify JSON attributes

Validation

  • bun test packages/backend/convex/shopifyJsonEnrichment.test.ts
  • bun test packages/backend/convex/shopify.ingest.integration.test.ts
  • bun test packages/backend/convex/ingest.quality.test.ts
  • bun run check-types
  • bun x ultracite check packages/backend/convex/ingest.ts packages/backend/convex/shopifyJsonEnrichment.ts packages/backend/convex/shopifyJsonEnrichment.test.ts packages/backend/convex/shopify.ingest.integration.test.ts

Summary by CodeRabbit

  • New Features

    • Shopify JSON enrichment: automatically extract power specs, max wattage, PD/EPR support, and connector signals from product pages.
    • Optional AI-powered enrichment to improve product details when available.
  • Reliability

    • Safer product fetches with request timeouts to reduce hangs and improve ingestion stability.
  • Tests

    • Added tests validating Shopify JSON enrichment and ingestion for representative products.

@vercel
Copy link

vercel bot commented Feb 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
cable-intel-web Ready Ready Preview, Comment Feb 19, 2026 9:10pm

Request Review

@coderabbitai
Copy link

coderabbitai bot commented Feb 19, 2026

Caution

Review failed

The pull request is closed.

Walkthrough

Introduces a Shopify product-JSON enrichment flow that derives deterministic power signals, optionally requests LLM enrichment via an AI gateway, and integrates those enriched signals into the existing ingest/processWorkflowItem pipeline prior to standard extraction and persistence.

Changes

Cohort / File(s) Summary
Core Shopify JSON Enrichment
packages/backend/convex/shopifyJsonEnrichment.ts, packages/backend/convex/shopifyJsonEnrichment.test.ts
New module: zod schemas, input builders, deterministic power-signal extraction, prompt formatting, LLM-enrichment application, evidence tracking, gating logic, and unit tests for signals and enrichment behaviors.
Ingest Pipeline Integration
packages/backend/convex/ingest.ts
Added Shopify JSON fetch (fetchShopifyProductJsonPayload), gating (canUseAiGatewayEnrichment), enrichment orchestration (enrichShopifyCablesFromProductJson), system prompt constant, and conditional enrichment step inside processWorkflowItem.
Integration Tests
packages/backend/convex/shopify.ingest.integration.test.ts
New integration test for Satechi USB4 product ingestion from Shopify JSON validating rows, connectors, maxWatts, qualityState, and evidence references.
Shopify Source Fetch Timeout
packages/shopify-cable-source/src/source.ts
Added FETCH_TIMEOUT_MS and fetchWithTimeout; replaced direct fetches with timeout-aware fetch calls for build ID, search suggest, Next data, and product JSON requests.

Sequence Diagram

sequenceDiagram
    participant Ingest as Ingest Pipeline
    participant Shopify as Shopify Product API
    participant Enrich as Shopify JSON Enrichment
    participant AIGW as AI Gateway
    participant Store as Cable Data Store

    Ingest->>Shopify: fetchShopifyProductJsonPayload(product URL)
    Shopify-->>Ingest: product JSON (or error)
    Ingest->>Enrich: buildShopifyJsonEnrichmentInput(JSON)
    Enrich->>Enrich: deriveDeterministicPowerSignals()
    Enrich->>Enrich: applyShopifyJsonPowerSignals()
    alt AI gateway enabled & provider configured
        Enrich->>AIGW: generateObject(enrichment prompt)
        AIGW-->>Enrich: LLM enrichment result
        Enrich->>Enrich: applyShopifyJsonLlmEnrichment()
    end
    Enrich-->>Ingest: enriched cables
    Ingest->>Store: persist extraction & evidence
Loading

Possibly related PRs

Poem

🐰 I nibble JSON leaves by moonlit beam,
Watts and connectors sewn into a dream,
AI whispers numbers, traces each clue,
Cables bloom brighter — hooray, and chew-chew!

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: introducing hybrid Shopify JSON enrichment combining deterministic signal extraction with guarded LLM assistance (oss-120b).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/shopify-json-hybrid-enrichment

Comment @coderabbitai help to get the list of available commands and usage tips.

@anand-testcompare anand-testcompare merged commit 25f0c8a into main Feb 19, 2026
4 checks passed
@anand-testcompare anand-testcompare deleted the feat/shopify-json-hybrid-enrichment branch February 19, 2026 21:11
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
packages/backend/convex/shopifyJsonEnrichment.ts (2)

158-171: Prefer a type guard over as in toOptionValue.

Keeps the unknown-object path aligned with the type-narrowing guideline.

♻️ Suggested refactor
+const isRecord = (value: unknown): value is Record<string, unknown> =>
+  typeof value === "object" && value !== null;
+
 const toOptionValue = (value: unknown): string => {
   if (typeof value === "string") {
     return cleanText(value);
   }
-  if (value && typeof value === "object") {
-    const candidate = value as { label?: unknown; value?: unknown };
-    const label = cleanText(candidate.label);
-    if (label) {
-      return label;
-    }
-    return cleanText(candidate.value);
-  }
+  if (isRecord(value)) {
+    const label = cleanText(value.label);
+    if (label) {
+      return label;
+    }
+    return cleanText(value.value);
+  }
   return "";
 };
As per coding guidelines: Leverage TypeScript's type narrowing instead of type assertions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/backend/convex/shopifyJsonEnrichment.ts` around lines 158 - 171, In
toOptionValue, avoid the assertion "value as { label?: unknown; value?: unknown
}" and instead narrow the unknown object with a type guard: check that value is
an object (not null), then use 'in' checks and typeof checks on label and value
(e.g., 'label' in value && typeof (value as any).label === "string") to safely
read and cleanText the fields; make sure to prefer the cleaned label if present,
otherwise clean the value, and return "" for non-string/non-present
fields—update the logic in toOptionValue to use these runtime checks rather than
a direct 'as' cast.

7-8: Extract the snippet cap into a named constant.

Avoiding the magic number makes the intent clearer and easier to reuse.

♻️ Suggested refactor
-const MAX_WATTAGE = 500;
+const MAX_WATTAGE = 500;
+const MAX_SNIPPET_LENGTH = 240;
@@
-  return cleanText(value).slice(0, 240);
+  return cleanText(value).slice(0, MAX_SNIPPET_LENGTH);
As per coding guidelines: Use meaningful variable names instead of magic numbers - extract constants with descriptive names.

Also applies to: 154-156

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/backend/convex/shopifyJsonEnrichment.ts` around lines 7 - 8, Replace
the magic-number "snippet cap" values found in this file (the numeric literals
at the block noted around lines 154-156) with a descriptive named constant
(similar to the existing MAX_WATTAGE) — e.g., declare a top-level constant like
SNIPPET_CHAR_LIMIT or SNIPPET_CAP and use that constant wherever the snippet
length/cap number is used instead of the hard-coded literal; update any
references in the function(s) that construct the enriched JSON so they read from
the new constant.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/backend/convex/ingest.ts`:
- Around line 328-448: The enrichment step can throw and abort optional
Shopify-only ingestion; wrap the external calls in try/catch so failures are
swallowed and ingestion continues using deterministic signals. Specifically, in
enrichShopifyCablesFromProductJson call fetchShopifyProductJsonPayload inside a
try/catch and treat any thrown error as a null payload (proceed without LLM
enrichment), and wrap the generateObject invocation (inside the providerConfig
block) in try/catch and set llmEnrichment = null on error (optionally log the
error) rather than letting it bubble; leave function signatures like
fetchShopifyProductJsonPayload, enrichShopifyCablesFromProductJson,
getProviderConfig, canUseAiGatewayEnrichment and the
shopifyJsonLlmEnrichmentSchema usage unchanged.

In `@packages/backend/convex/shopifyJsonEnrichment.ts`:
- Around line 345-374: hasExplicitWattsToken builds a regex by interpolating the
numeric value which breaks on decimals and differs from the shared POWER_REGEX
approach; replace the dynamic RegExp with using the existing POWER_REGEX (the
same one used by getMaxWattsFromSegments) to iterate matches via matchAll on
haystack, parse the numeric portion of each match to a number, and return true
if any parsed number equals the provided value (taking decimals into account);
update the function named hasExplicitWattsToken to use POWER_REGEX + matchAll
and numeric comparison instead of new RegExp interpolation.

---

Nitpick comments:
In `@packages/backend/convex/shopifyJsonEnrichment.ts`:
- Around line 158-171: In toOptionValue, avoid the assertion "value as { label?:
unknown; value?: unknown }" and instead narrow the unknown object with a type
guard: check that value is an object (not null), then use 'in' checks and typeof
checks on label and value (e.g., 'label' in value && typeof (value as any).label
=== "string") to safely read and cleanText the fields; make sure to prefer the
cleaned label if present, otherwise clean the value, and return "" for
non-string/non-present fields—update the logic in toOptionValue to use these
runtime checks rather than a direct 'as' cast.
- Around line 7-8: Replace the magic-number "snippet cap" values found in this
file (the numeric literals at the block noted around lines 154-156) with a
descriptive named constant (similar to the existing MAX_WATTAGE) — e.g., declare
a top-level constant like SNIPPET_CHAR_LIMIT or SNIPPET_CAP and use that
constant wherever the snippet length/cap number is used instead of the
hard-coded literal; update any references in the function(s) that construct the
enriched JSON so they read from the new constant.

Comment on lines +328 to +448
const fetchShopifyProductJsonPayload = async (
url: string
): Promise<unknown | null> => {
let parsed: URL;
try {
parsed = new URL(url);
} catch {
return null;
}

const basePath = parsed.pathname.endsWith("/")
? parsed.pathname.slice(0, -1)
: parsed.pathname;
if (!basePath) {
return null;
}

const productJsonUrl = new URL(parsed.origin);
productJsonUrl.pathname = `${basePath}.js`;

const response = await fetch(productJsonUrl.toString(), {
headers: {
accept: "application/json",
},
});

if (response.status === 404) {
return null;
}
if (!response.ok) {
throw new Error(
`Failed to fetch Shopify product JSON (${response.status}) for ${productJsonUrl.toString()}`
);
}

return await response.json();
};

const canUseAiGatewayEnrichment = (): boolean => {
return Boolean(process.env.AI_GATEWAY_API_KEY);
};

const enrichShopifyCablesFromProductJson = async (
parsedCables: readonly ParsedCable[],
productUrl: string,
canonicalUrl: string,
workflowRunId: Id<"ingestionWorkflows">,
workflowItemId: Id<"ingestionWorkflowItems">,
getProviderConfig: () => ProviderConfig
): Promise<ParsedCable[]> => {
if (
!parsedCables.some((parsed) => shouldAttemptShopifyJsonEnrichment(parsed))
) {
return [...parsedCables];
}

const payload = await fetchShopifyProductJsonPayload(productUrl);
if (!payload) {
return [...parsedCables];
}

const firstCandidate =
parsedCables.find((parsed) => shouldAttemptShopifyJsonEnrichment(parsed)) ??
parsedCables[0];
if (!firstCandidate) {
return [...parsedCables];
}

const input = buildShopifyJsonEnrichmentInput(payload, {
sku: firstCandidate.sku,
variant: firstCandidate.variant,
});
if (!input) {
return [...parsedCables];
}

const deterministicSignals = deriveDeterministicPowerSignals(input);
const inputForPrompt = formatShopifyJsonInputForPrompt(input);

let llmEnrichment: ReturnType<
typeof shopifyJsonLlmEnrichmentSchema.parse
> | null = null;
const needsLlm =
canUseAiGatewayEnrichment() &&
parsedCables.some((parsed) => {
const withSignals = applyShopifyJsonPowerSignals(
parsed,
canonicalUrl,
deterministicSignals
);
return shouldAttemptShopifyJsonEnrichment(withSignals);
});

if (needsLlm) {
let providerConfig: ProviderConfig | null = null;
try {
providerConfig = getProviderConfig();
} catch {
providerConfig = null;
}
if (providerConfig) {
const { object } = await generateObject({
model: gateway(providerConfig.model),
schema: shopifyJsonLlmEnrichmentSchema,
system: SHOPIFY_JSON_ENRICHMENT_SYSTEM_PROMPT,
prompt: buildShopifyJsonEnrichmentPrompt(canonicalUrl, inputForPrompt),
temperature: 0,
experimental_telemetry: {
isEnabled: providerConfig.aiTelemetryEnabled,
functionId: "convex.ingest.enrichShopifyFromProductJson",
metadata: {
canonicalUrl,
workflowRunId,
workflowItemId,
},
recordInputs: providerConfig.aiTelemetryRecordInputs,
recordOutputs: providerConfig.aiTelemetryRecordOutputs,
},
});
llmEnrichment = object;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Optional enrichment failures shouldn’t abort ingestion.

fetchShopifyProductJsonPayload or generateObject errors currently bubble and can fail a Shopify-only ingest even though enrichment is optional. Prefer swallowing those failures and continuing with deterministic signals.

🛠️ Suggested fix
-  const payload = await fetchShopifyProductJsonPayload(productUrl);
+  let payload: unknown | null = null;
+  try {
+    payload = await fetchShopifyProductJsonPayload(productUrl);
+  } catch {
+    return [...parsedCables];
+  }
   if (!payload) {
     return [...parsedCables];
   }
@@
-    if (providerConfig) {
-      const { object } = await generateObject({
-        model: gateway(providerConfig.model),
-        schema: shopifyJsonLlmEnrichmentSchema,
-        system: SHOPIFY_JSON_ENRICHMENT_SYSTEM_PROMPT,
-        prompt: buildShopifyJsonEnrichmentPrompt(canonicalUrl, inputForPrompt),
-        temperature: 0,
-        experimental_telemetry: {
-          isEnabled: providerConfig.aiTelemetryEnabled,
-          functionId: "convex.ingest.enrichShopifyFromProductJson",
-          metadata: {
-            canonicalUrl,
-            workflowRunId,
-            workflowItemId,
-          },
-          recordInputs: providerConfig.aiTelemetryRecordInputs,
-          recordOutputs: providerConfig.aiTelemetryRecordOutputs,
-        },
-      });
-      llmEnrichment = object;
-    }
+    if (providerConfig) {
+      try {
+        const { object } = await generateObject({
+          model: gateway(providerConfig.model),
+          schema: shopifyJsonLlmEnrichmentSchema,
+          system: SHOPIFY_JSON_ENRICHMENT_SYSTEM_PROMPT,
+          prompt: buildShopifyJsonEnrichmentPrompt(canonicalUrl, inputForPrompt),
+          temperature: 0,
+          experimental_telemetry: {
+            isEnabled: providerConfig.aiTelemetryEnabled,
+            functionId: "convex.ingest.enrichShopifyFromProductJson",
+            metadata: {
+              canonicalUrl,
+              workflowRunId,
+              workflowItemId,
+            },
+            recordInputs: providerConfig.aiTelemetryRecordInputs,
+            recordOutputs: providerConfig.aiTelemetryRecordOutputs,
+          },
+        });
+        llmEnrichment = object;
+      } catch {
+        llmEnrichment = null;
+      }
+    }
As per coding guidelines: Handle errors appropriately in async code with try-catch blocks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/backend/convex/ingest.ts` around lines 328 - 448, The enrichment
step can throw and abort optional Shopify-only ingestion; wrap the external
calls in try/catch so failures are swallowed and ingestion continues using
deterministic signals. Specifically, in enrichShopifyCablesFromProductJson call
fetchShopifyProductJsonPayload inside a try/catch and treat any thrown error as
a null payload (proceed without LLM enrichment), and wrap the generateObject
invocation (inside the providerConfig block) in try/catch and set llmEnrichment
= null on error (optionally log the error) rather than letting it bubble; leave
function signatures like fetchShopifyProductJsonPayload,
enrichShopifyCablesFromProductJson, getProviderConfig, canUseAiGatewayEnrichment
and the shopifyJsonLlmEnrichmentSchema usage unchanged.

Comment on lines +345 to +374
const addEvidenceIfMissing = (
evidence: ExtractionOutput["evidence"],
fieldPath: string,
sourceUrl: string,
snippet?: string
): void => {
const hasFieldPath = evidence.some((item) => item.fieldPath === fieldPath);
if (hasFieldPath) {
return;
}

evidence.push({
fieldPath,
snippet: snippet ? safeSnippet(snippet) : undefined,
sourceUrl,
});
};

const hasExplicitWattsToken = (value: number, haystack: string): boolean => {
const pattern = new RegExp(`\\b${value}(?:\\.0+)?\\s*W\\b`, "i");
return pattern.test(haystack);
};

const llmSnippetForField = (
enrichment: ShopifyJsonLlmEnrichment,
fieldPath: z.infer<typeof llmEvidenceFieldPathSchema>
): string | undefined => {
return enrichment.evidence.find((item) => item.fieldPath === fieldPath)
?.snippet;
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Search for POWER_REGEX in the codebase
rg "POWER_REGEX" -t ts -t tsx -t js -t jsx

Repository: anand-testcompare/cable-intel

Length of output: 100


🏁 Script executed:

# Examine the shopifyJsonEnrichment.ts file to understand the context and find POWER_REGEX
head -n 400 packages/backend/convex/shopifyJsonEnrichment.ts | tail -n 100

Repository: anand-testcompare/cable-intel

Length of output: 2577


🏁 Script executed:

# Check if there are any imports related to POWER_REGEX or regex patterns
head -n 100 packages/backend/convex/shopifyJsonEnrichment.ts

Repository: anand-testcompare/cable-intel

Length of output: 2840


Avoid dynamic regex construction with numeric values—use the shared POWER_REGEX for consistency and to prevent unescaped special characters.

The current pattern interpolates the numeric value directly, which will fail if the value contains a decimal point (e.g., 100.5 produces \b100.5... where the dot matches any character, not a literal dot). Instead, use POWER_REGEX with matchAll and numeric comparison, following the same pattern as getMaxWattsFromSegments.

Suggested fix
 const hasExplicitWattsToken = (value: number, haystack: string): boolean => {
-  const pattern = new RegExp(`\\b${value}(?:\\.0+)?\\s*W\\b`, "i");
-  return pattern.test(haystack);
+  if (!Number.isFinite(value)) {
+    return false;
+  }
+  for (const match of haystack.matchAll(POWER_REGEX)) {
+    if (Number(match[1]) === value) {
+      return true;
+    }
+  }
+  return false;
 };
🧰 Tools
🪛 ast-grep (0.40.5)

[warning] 363-363: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(\\b${value}(?:\\.0+)?\\s*W\\b, "i")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/backend/convex/shopifyJsonEnrichment.ts` around lines 345 - 374,
hasExplicitWattsToken builds a regex by interpolating the numeric value which
breaks on decimals and differs from the shared POWER_REGEX approach; replace the
dynamic RegExp with using the existing POWER_REGEX (the same one used by
getMaxWattsFromSegments) to iterate matches via matchAll on haystack, parse the
numeric portion of each match to a number, and return true if any parsed number
equals the provided value (taking decimals into account); update the function
named hasExplicitWattsToken to use POWER_REGEX + matchAll and numeric comparison
instead of new RegExp interpolation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant