updated artifact parsing #2278

Merged
tim-inkeep merged 2 commits into main from bugfix/artifact-parsing
Feb 24, 2026

Conversation

@tim-inkeep
Contributor

No description provided.

@vercel

vercel bot commented Feb 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agents-api | Ready | Preview, Comment | Feb 24, 2026 1:37am |
| agents-docs | Ready | Preview, Comment | Feb 24, 2026 1:37am |
| agents-manage-ui | Ready | Preview, Comment | Feb 24, 2026 1:37am |

@changeset-bot

changeset-bot bot commented Feb 24, 2026

🦋 Changeset detected

Latest commit: f84226e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages
| Name | Type |
| --- | --- |
| @inkeep/agents-api | Patch |
| @inkeep/agents-manage-ui | Patch |
| @inkeep/agents-cli | Patch |
| @inkeep/agents-core | Patch |
| @inkeep/agents-mcp | Patch |
| @inkeep/agents-sdk | Patch |
| @inkeep/agents-work-apps | Patch |
| @inkeep/ai-sdk-provider | Patch |
| @inkeep/create-agents | Patch |

Contributor

@claude claude bot left a comment

PR Review Summary

(0) Total Issues | Risk: Low


✅ Summary

This is a well-executed bugfix that addresses a regex parsing issue where > characters inside quoted attribute values (e.g., JMESPath expressions like docs[?title=='Platform > llms.txt']) were incorrectly interpreted as tag terminators.

What's Good:

  • The fix consistently applies the quote-aware pattern (?:"[^"]*"|'[^']*'|[^>])+? across all 5 regex instances
  • Comprehensive test coverage (271 lines) with excellent edge case coverage including:
    • Standard tags, double-quoted values with >, single-quoted values with >
    • Compound JMESPath expressions with &&
    • Real-world reproduction case
    • Streaming boundary scenarios
  • Changeset included with appropriate scope

Regex Pattern Explanation:
The new pattern (?:"[^"]*"|'[^']*'|[^>])+? works by:

  1. "[^"]*" — Match double-quoted strings (any chars except ")
  2. '[^']*' — Match single-quoted strings (any chars except ')
  3. [^>] — Match any char except > (for unquoted portions)

This allows > characters inside quoted values while still correctly detecting tag boundaries.
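The difference between a plain `[^>]` attribute matcher and the quote-aware alternation can be reproduced in isolation. The sketch below is illustrative only: `NAIVE_TAG` and `QUOTE_AWARE_TAG` are hypothetical names, and the text around the attribute group (`<artifact:create\s+ ... \s*\/>`) is an assumption, not the exact expression used in ArtifactParser.ts. Only the inner pattern `(?:"[^"]*"|'[^']*'|[^>])+?` is taken from this review.

```typescript
// Hypothetical patterns for illustration; the real expressions live in ArtifactParser.ts.
// Naive form: any unquoted or quoted '>' ends the attribute run.
const NAIVE_TAG = /<artifact:create\s+([^>]+?)\s*\/>/;

// Quote-aware form: whole quoted strings are consumed first, so a '>' inside them is skipped.
const QUOTE_AWARE_TAG = /<artifact:create\s+((?:"[^"]*"|'[^']*'|[^>])+?)\s*\/>/;

// Tag modelled on the JMESPath example quoted in the summary above.
const tag =
  `<artifact:create id='a1' tool='t1' type='citation' ` +
  `base="docs[?title=='Platform > llms.txt']" />`;

const naiveMatch = NAIVE_TAG.exec(tag);
const awareMatch = QUOTE_AWARE_TAG.exec(tag);

// The naive pattern stops at the '>' inside the quoted value and never finds '/>'.
console.log(naiveMatch === null);

// The quote-aware pattern captures the full attribute string, '>' included.
console.log(awareMatch ? awareMatch[1] : 'no match');
```

Note that the third alternative, `[^>]`, also accepts a lone quote character, so a tag with an unbalanced quote can still match on a best-effort basis rather than being rejected outright.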


💭 Consider (2) 💭

The following are optional improvements for test coverage completeness. They're noted as inline comments:

💭 1) ArtifactParser.test.ts:270 artifact:ref parsing test
Tests cover artifact:create thoroughly but artifact:ref data retrieval via getArtifactSummary is only verified at detection level.

💭 2) ArtifactParser.test.ts:256-261 Exception handling test
The null return case is tested, but the thrown exception code path (lines 283-291 of ArtifactParser.ts) is untested.

Inline Comments:

  • 💭 Consider: ArtifactParser.test.ts:270 artifact:ref parsing test suggestion
  • 💭 Consider: ArtifactParser.test.ts:256-261 exception handling test suggestion

Discarded (3)

| Location | Issue | Reason Discarded |
| --- | --- | --- |
| parseObject | Missing tests for parseObject method | Pre-existing untested code; method wasn't modified in this PR |
| parseCreateAttributes | Missing validation test for required attributes | Valid but nitpick; the validation is straightforward and failures would surface via other test failures |
| nested quotes | Missing test for complex nested quote patterns | Edge case with low likelihood; current prompts enforce quote usage patterns |

Reviewers (2)

| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
| --- | --- | --- | --- | --- | --- | --- | --- |
| pr-review-standards | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| pr-review-tests | 5 | 0 | 2 | 0 | 2 | 0 | 3 |
| Total | 5 | 0 | 2 | 0 | 2 | 0 | 3 |

✅ APPROVE

Summary: Clean bugfix with solid test coverage. The regex changes are correct and consistently applied. The two "Consider" items are optional test coverage enhancements — they don't block approval. Ship it! 🚀


Note: Unable to submit formal approval due to repository access permissions. This review recommends approval.

const parts = await parser.parseText(text);
expect(parts.filter((p) => p.kind === 'data')).toHaveLength(1);
});
});

💭 Consider: Missing artifact:ref parsing test

Issue: The tests thoroughly cover artifact:create tags but don't include a test for artifact:ref tag parsing in parseText. The hasArtifactMarkers test at line 99 only verifies detection, not the actual data retrieval path.

Why: Lines 340-342 of ArtifactParser.ts handle artifact:ref parsing via getArtifactSummary, and this code path could regress silently.

Fix: Consider adding:

it('parses artifact:ref tags via getArtifactSummary', async () => {
  mockArtifactService.getArtifactSummary.mockResolvedValue(mockArtifactData);
  const text = `<artifact:ref id='a1' tool='t1' />`;
  const parts = await parser.parseText(text);
  expect(parts).toHaveLength(1);
  expect(parts[0].kind).toBe('data');
  expect(mockArtifactService.getArtifactSummary).toHaveBeenCalledWith('a1', 't1', undefined);
});

Comment on lines +256 to +261
it('removes the artifact tag when the service returns null', async () => {
  mockArtifactService.createArtifact.mockResolvedValue(null);
  const text = `Before <artifact:create id='a1' tool='t1' type='citation' base='result' /> After`;
  const parts = await parser.parseText(text);
  const dataParts = parts.filter((p) => p.kind === 'data');
  expect(dataParts).toHaveLength(0);

💭 Consider: Missing error handling test for thrown exceptions

Issue: This test covers when createArtifact returns null, but doesn't test when it throws an exception. Lines 283-291 of ArtifactParser.ts have a catch block that removes failed annotations and logs errors.

Why: Different code path than null return — exception handling removes the tag from output, which should be verified.

Fix: Consider adding:

it('removes the artifact tag and continues when createArtifact throws', async () => {
  mockArtifactService.createArtifact.mockRejectedValue(new Error('Service error'));
  const text = `Before <artifact:create id='a1' tool='t1' type='citation' base='result' /> After`;
  const parts = await parser.parseText(text);
  const dataParts = parts.filter((p) => p.kind === 'data');
  expect(dataParts).toHaveLength(0);
});

@github-actions github-actions bot deleted a comment from claude bot Feb 24, 2026
Collaborator

@amikofalvy amikofalvy left a comment

The changes look good. The new test suite is much needed.

Do an audit to see if there are other character issues that need to be addressed, not just the > one that we discovered.

@tim-inkeep tim-inkeep merged commit 6a7dc67 into main Feb 24, 2026
17 checks passed
@tim-inkeep tim-inkeep deleted the bugfix/artifact-parsing branch February 24, 2026 02:30
@itoqa

itoqa bot commented Feb 24, 2026

Ito Test Report ✅

22 test cases ran. 22 passed.

This test run verified the ArtifactParser regex update in PR #2278, which fixes parsing of <artifact:create> tags when quoted attribute values contain > characters (e.g., JMESPath expressions, document titles). All 22 test cases passed successfully, covering the core regex logic, edge cases, adversarial inputs, streaming behavior, and regression scenarios. The 29 unit tests in the ArtifactParser test suite all passed, confirming the quoted-value-aware regex patterns work correctly across all six locations in the parser. E2E tests via the Manage UI playground and API endpoints demonstrated that artifact tags with > in quoted attributes are correctly parsed and rendered without exposing raw XML to users.

✅ Passed (22)
| Test Case | Summary | Timestamp | Screenshot |
| --- | --- | --- | --- |
| ROUTE-1 | Manage UI loaded with auto-login, playground opened, chat message sent and mock response streamed successfully with Completed status. Response rendered as text paragraphs without stalling or raw XML. | 52:27 | ROUTE-1_52-27.png |
| ROUTE-2 | Sent chat message via playground triggering mock response containing two artifact:create tags with > in double-quoted attributes. The streaming parser correctly recognized both tags as complete artifacts - no raw XML text appeared in chat output. | 31:48 | ROUTE-2_31-48.png |
| ROUTE-3 | POST to /run/v1/chat/completions returned HTTP 200 with proper SSE stream. Response contains data: prefixed JSON chunks with data-operation objects and text content streamed character-by-character. | 35:04 | ROUTE-3_35-04.png |
| LOGIC-1 | Unit test suite passed 29/29 tests. Specific tests verify hasArtifactMarkers detects standard tags, double-quoted > in base, single-quoted > (JMESPath), nested quotes+JSON details, and returns false for plain text with > and incomplete tags. | 36:12 | LOGIC-1_36-12.png |
| LOGIC-2 | Unit tests verify: returns true for incomplete tag ending mid-attributes, returns false for complete self-closing tag, returns false for complete tag with > in double-quoted base, returns true when chunk ends after > in quoted value but before closing />. | 36:24 | LOGIC-2_36-24.png |
| LOGIC-3 | Unit tests verify: returns full length for text with no artifacts, full length for complete tag, index before incomplete tag, full length for complete tag with > in quoted value, and index before incomplete tag with > in quoted value. | 36:25 | LOGIC-3_36-25.png |
| LOGIC-4 | Unit tests verify: parses artifact:create with > inside double-quoted base attribute, parses exact real-world tag that exposed the bug, passes correct baseSelector (including >) to createArtifact, preserves surrounding text. Regex consistency check confirms all 6 locations use quoted-value-aware pattern. | 36:26 | LOGIC-4_36-26.png |
| EDGE-1 | Unit test 'detects artifact:create when base contains a JMESPath > comparator in a single-quoted value' passed. Single-quoted > handled same as double-quoted. | 36:49 | EDGE-1_36-49.png |
| EDGE-2 | Unit tests passed: 'detects artifact:create with details as JSON and > in base', 'detects artifact:create with both > in base and double quotes inside details JSON', 'handles artifact:create with single-quoted JSON in details attribute'. Complex real-world tag with nested quotes fully parsed. | 36:50 | EDGE-2_36-50.png |
| EDGE-3 | Unit test 'returns true when streaming chunk ends after > in a quoted value but before closing />' passed. hasIncompleteArtifact correctly buffers partial tags when chunk boundary falls within quoted attribute with >. | 36:51 | EDGE-3_36-51.png |
| EDGE-4 | Unit test 'returns false for complete tag when base attribute contains > in a double-quoted value' passed. hasIncompleteArtifact returns false for a complete self-closing tag with > in double-quoted base attribute, preventing false positive incomplete detection. | 36:52 | EDGE-4_36-52.png |
| EDGE-5 | Unit test 'returns false for text containing > but no artifact tags' passed. hasArtifactMarkers correctly returns false for plain text with > characters. findSafeTextBoundary test also confirms correct boundary detection with preceding text. | 36:53 | EDGE-5_36-53.png |
| EDGE-6 | Verified via unit test (ArtifactParser.test.ts line 256-262: 'removes the artifact tag when the service returns null' passes) and source code review. E2E playground confirms no raw artifact XML appears in chat output. | 42:38 | EDGE-6_42-38.png |
| EDGE-7 | Unit test 'handles multiple artifact:create tags' passed. parseText correctly handles multiple tags with > in attributes interleaved with text, producing separate data parts for each artifact. | 36:54 | EDGE-7_36-54.png |
| ADV-1 | Regex parses tag with 1000+ > chars in quoted attribute in 0ms. No catastrophic backtracking detected. Unit tests also pass (29/29). | 37:52 | ADV-1_37-52.png |
| ADV-2 | Regex correctly matches tag with /> inside double-quoted attribute. The /> inside quotes does not prematurely close the tag. | 38:18 | ADV-2_38-18.png |
| ADV-3 | Regex correctly matches tag with </artifact:create> inside double-quoted attribute. Full attribute value preserved including the closing tag text. | 38:28 | ADV-3_38-28.png |
| ADV-4 | Verified via 5 inline adversarial regex tests (all passed - no false positives). Unit test 'returns plain text unchanged when no artifacts are present' passes. E2E playground confirms text with artifact-like patterns rendered as plain text without false positive artifact detection. | 45:17 | ADV-4_45-17.png |
| ADV-5 | When chunk ends inside quoted attr with > (before closing />), hasIncompleteArtifact correctly returns true. Combined chunks parse correctly with full attribute value preserved. | 39:27 | ADV-5_39-27.png |
| ADV-6 | Parser handles mismatched quotes with > gracefully - matches best-effort without crash or hang. Elapsed 0ms. | 39:42 | ADV-6_39-42.png |
| REGR-1 | Standard self-closing tags, double-quoted tags, and tags with body content all parse correctly. No regression from the quoted-value-aware regex changes. 29/29 unit tests pass. | 39:58 | REGR-1_39-58.png |
| REGR-2 | Prefix patterns, mid-tag states, and complete tags all correctly classified. The !text.includes('</artifact:create>') guard prevents false positives for complete tags with closing tags. 29/29 unit tests pass. | 40:33 | REGR-2_40-33.png |
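
The streaming cases in the report (LOGIC-2, EDGE-3, EDGE-4, ADV-5) all depend on deciding whether a chunk ends inside an unfinished tag while ignoring `>` characters that sit inside quotes. One way to picture that decision is a small quote-tracking scan. The helper below is a hypothetical sketch written for this illustration: `endsInsideArtifactTag` is an invented name, and this is not how `hasIncompleteArtifact` is implemented in ArtifactParser.ts, which this review describes as regex-based.

```typescript
// Hypothetical helper for illustration only; not the ArtifactParser implementation.
// Returns true when the chunk ends before the last '<artifact:' tag has been closed
// by a '>' that sits outside of any quoted attribute value.
function endsInsideArtifactTag(chunk: string): boolean {
  const start = chunk.lastIndexOf('<artifact:');
  if (start === -1) return false; // no artifact tag opener in this chunk

  let quote: string | null = null; // the quote character we are currently inside, if any
  for (let i = start; i < chunk.length; i++) {
    const ch = chunk.charAt(i);
    if (quote !== null) {
      if (ch === quote) quote = null; // closing quote ends the quoted value
    } else if (ch === '"' || ch === "'") {
      quote = ch; // opening quote: ignore '>' until it closes
    } else if (ch === '>') {
      return false; // unquoted '>' means the opening tag is complete
    }
  }
  return true; // ran out of input before the tag closed
}

const complete = `Before <artifact:create id='a1' base="a > b" /> After`;
const cutInQuote = `Before <artifact:create id='a1' base="a > b`;
const cutBeforeClose = `Before <artifact:create id='a1' base="a > b" `;

console.log(endsInsideArtifactTag(complete));
console.log(endsInsideArtifactTag(cutInQuote));
console.log(endsInsideArtifactTag(cutBeforeClose));
console.log(endsInsideArtifactTag('plain text with x > y'));
```

This sketch deliberately leaves out cases the report mentions elsewhere, such as a chunk that ends partway through the tag name itself or tags with body content and a separate closing tag.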