Skip to content

Refine XML sanitization to convert tags to parentheses format while removing character escaping#765

Merged
pelikhan merged 4 commits intomainfrom
copilot/fix-f17e7136-90cb-45c3-a4d5-5db05b12fe2c
Sep 12, 2025
Merged

Refine XML sanitization to convert tags to parentheses format while removing character escaping#765
pelikhan merged 4 commits intomainfrom
copilot/fix-f17e7136-90cb-45c3-a4d5-5db05b12fe2c

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Sep 12, 2025

  • Analyzed the XML tag detection regex pattern in both compute_text.cjs and sanitize_output.cjs
  • Identified the issue: regex required at least one whitespace character before attributes, causing self-closing tags without spaces to be missed
  • Fixed regex pattern from (?:\s+[^>]*)? to (?:\s[^>]*|\/)? to handle:
    • Self-closing tags without whitespace: <br/>, <img src="test.jpg"/>
    • Self-closing tags with whitespace: <br />, <img src="test.jpg" />
    • Tags with various whitespace patterns: spaces, tabs, newlines
    • Regular opening/closing tags: <div>, </div>, <div class="test">
  • Added comprehensive test cases for all whitespace patterns and self-closing tag scenarios
  • Verified all existing functionality continues to work (mathematical operators preserved, security maintained)
  • Confirmed all tests pass (Go and JavaScript test suites)

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

…heses format

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Review sanitize_output function and refine the xml tags regex to be more precise. Currently it just escapes all characters but it is too imprecise. Only escape xml tags, and replace < "<foo...>" to "(foo...)" Refine XML sanitization to convert tags to parentheses format while preserving comparison operators Sep 12, 2025
Copilot AI requested a review from pelikhan September 12, 2025 12:41
Comment thread pkg/workflow/js/compute_text.cjs Outdated
// XML tag neutralization - convert XML tags to parentheses format
sanitized = convertXmlTagsToParentheses(sanitized);

// XML character escaping for remaining characters
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot xml char escaping can be removed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed XML character escaping for &, ", and ' characters as requested. These characters now remain unescaped while XML tags continue to be converted to parentheses format. (6530724)

Comment thread pkg/workflow/js/sanitize_output.cjs Outdated
// XML tag neutralization - convert XML tags to parentheses format
sanitized = convertXmlTagsToParentheses(sanitized);

// XML character escaping for remaining characters
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot xml char escaping can removed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed XML character escaping for &, ", and ' characters as requested. These characters now remain unescaped while XML tags continue to be converted to parentheses format. (6530724)

…ving XML tag conversion

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Sep 12, 2025

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • api.contoso.com
    • Triggering command: /tmp/go-build2828290525/b278/cli.test -test.testlogfile=/tmp/go-build2828290525/b278/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI changed the title Refine XML sanitization to convert tags to parentheses format while preserving comparison operators Refine XML sanitization to convert tags to parentheses format while removing character escaping Sep 12, 2025
Copilot AI requested a review from pelikhan September 12, 2025 12:51
Comment thread pkg/workflow/js/compute_text.cjs Outdated
return (
s
// Standard XML tags: <tag>, <tag attr="value">, <tag/>, </tag>
.replace(/<\/?[a-zA-Z][a-zA-Z0-9\-_:]*(?:\s+[^>]*)?>/g, match => {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot review regex and make sure all allowed white space patterns are handled. Review again and again.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed and fixed the regex to handle all whitespace patterns correctly. The issue was that the pattern (?:\s+[^>]*)? required at least one whitespace character before attributes, causing self-closing tags like <br/> to be missed while <br /> worked fine.

Fixed regex: (?:\s[^>]*|\/)? now handles:

  • ✅ Self-closing without space: <br/>, <img src="test.jpg"/>
  • ✅ Self-closing with space: <br />, <img src="test.jpg" />
  • ✅ Various whitespace: tabs, newlines, multiple spaces
  • ✅ Regular tags: <div>, </div>, <div class="test">

Added comprehensive test coverage and verified all existing functionality remains intact. (c54bf99)

- Changed regex from `(?:\s+[^>]*)? ` to `(?:\s[^>]*|\/)?` in both compute_text.cjs and sanitize_output.cjs
- Now correctly handles self-closing tags without whitespace (e.g., `<br/>`, `<img src="test.jpg"/>`)
- Supports all whitespace patterns: spaces, tabs, newlines in any quantity
- Added comprehensive tests for self-closing tags and various whitespace patterns
- Preserves mathematical operators and maintains security by converting XML tags to parentheses format

Fixes issue where tags like `<tag/>` were not being converted while `<tag />` worked correctly.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Sep 12, 2025

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • api.contoso.com
    • Triggering command: /tmp/go-build469538158/b278/cli.test -test.testlogfile=/tmp/go-build469538158/b278/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true (dns block)
    • Triggering command: /tmp/go-build3784943454/b278/cli.test -test.testlogfile=/tmp/go-build3784943454/b278/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

@pelikhan pelikhan marked this pull request as ready for review September 12, 2025 13:07
@pelikhan pelikhan merged commit d39e9a8 into main Sep 12, 2025
13 of 14 checks passed
@pelikhan pelikhan deleted the copilot/fix-f17e7136-90cb-45c3-a4d5-5db05b12fe2c branch September 12, 2025 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants