
Escape HTML in markdown instead of stripping it #88

Merged: ValwareIRC merged 3 commits into main from fix/markdown on Oct 16, 2025

@ValwareIRC ValwareIRC commented Oct 16, 2025

This PR changes the markdown rendering behavior to escape HTML tags instead of completely stripping them.

Changes

  • Modified renderMarkdown in src/lib/ircUtils.tsx to escape HTML tags (< and >) to HTML entities (&lt; and &gt;) instead of removing them entirely
  • Removed the output HTML stripping since escaped HTML is safe to render
  • Added tests in tests/lib/messageFormatter.test.ts to verify HTML escaping behavior

Security Benefits

  • Prevents XSS attacks by ensuring HTML tags are displayed as literal text
  • Preserves message content integrity by showing what HTML was originally present
  • Maintains backward compatibility with existing functionality

Testing

  • All existing tests pass (252 tests)
  • New tests verify HTML escaping works correctly
  • Manual testing confirms HTML tags are displayed as text rather than being stripped

Closes a security vulnerability in which HTML embedded in markdown could be rendered unsafely.

Summary by CodeRabbit

  • Bug Fixes

    • Improved text processing for message rendering by adjusting how HTML escaping and markdown parsing work together.
  • Tests

    • Added test coverage for markdown rendering and HTML escaping behavior.

coderabbitai Bot commented Oct 16, 2025

Walkthrough

The PR modifies the renderMarkdown function's security approach, replacing post-parse HTML sanitization with pre-parse input escaping. Raw text is now escaped before markdown parsing, and the resulting HTML is used directly without an additional sanitization pass that previously stripped dangerous tags and event handlers.
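As a rough illustration of the two approaches contrasted above, the sketch below applies the removed regex chain and the new angle-bracket escaping to the same string. The function names `stripDangerous` and `escapeAngleBrackets` and the payload are hypothetical, and the regexes are applied directly to a string here rather than to the output of marked, so this is a sketch of the idea and not the project's actual pipeline.

```typescript
// Post-parse stripping: the regex chain this PR removes.
// (Hypothetical wrapper name; the regexes are copied from the diff.)
function stripDangerous(html: string): string {
  return html
    .replace(/<script[^>]*>.*?<\/script>/gi, "")
    .replace(/<iframe[^>]*>.*?<\/iframe>/gi, "")
    .replace(/<object[^>]*>.*?<\/object>/gi, "")
    .replace(/<embed[^>]*>.*?<\/embed>/gi, "")
    .replace(/on\w+="[^"]*"/gi, "") // only matches double-quoted handlers
    .replace(/javascript:/gi, "#");
}

// Pre-parse escaping: what this PR adds (angle brackets only).
function escapeAngleBrackets(text: string): string {
  return text.replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical payload using a single-quoted event handler.
const payload = "<img src=x onerror='alert(1)'>";

// The handler regex requires double quotes, so the payload passes through unchanged.
const stripped = stripDangerous(payload);

// Escaping turns the tag into literal text: "&lt;img src=x onerror='alert(1)'&gt;"
const escaped = escapeAngleBrackets(payload);
```

The point of the comparison is that a deny-list of patterns has to anticipate every variant, while escaping the delimiters removes the tag entirely. It says nothing about markdown-generated links or attributes, which escaping the input does not cover.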

Changes

| Cohort / File(s) | Summary |
|---|---|
| Markdown rendering security pipeline: src/lib/ircUtils.tsx | Restructures markdown rendering to escape input text before parsing instead of sanitizing output HTML afterward; removes sanitization block stripping dangerous tags and event handlers |
| Markdown rendering tests: tests/lib/messageFormatter.test.ts | Adds new test suite for renderMarkdown function verifying HTML escaping and markdown rendering behavior |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Security-critical changes to HTML sanitization and escaping logic require careful analysis of threat models and markdown parser behavior. The shift from post-parse sanitization to pre-parse escaping is an architectural change with implications for XSS protection that demands thorough review of escape/parse interaction.

Possibly related PRs

Suggested reviewers

  • matheusfillipe

Poem

🐰 Input escapes before the parse,
No sanitizing come to pass—
Markdown flows through cleaner hands,
A rabbit's secure HTML stands! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately reflects the primary change of updating markdown rendering to escape HTML tags instead of stripping them, providing a concise and clear summary of the pull request’s intent. |

- Force all img tags to have style='max-height: 150px;' regardless of source
- Override any user-provided styles on images
- Maintain consistent image sizing across markdown and raw HTML

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7687b19 and 0428c0c.

📒 Files selected for processing (2)
  • src/lib/ircUtils.tsx (1 hunks)
  • tests/lib/messageFormatter.test.ts (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/lib/messageFormatter.test.ts (1)
src/lib/ircUtils.tsx (1)
  • renderMarkdown (193-263)
🪛 ast-grep (0.39.6)
src/lib/ircUtils.tsx

[warning] 259-259: Usage of dangerouslySetInnerHTML detected. This bypasses React's built-in XSS protection. Always sanitize HTML content using libraries like DOMPurify before injecting it into the DOM to prevent XSS attacks.
Context: dangerouslySetInnerHTML
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation [REFERENCES]
- https://reactjs.org/docs/dom-elements.html#dangerouslysetinnerhtml
- https://cwe.mitre.org/data/definitions/79.html

(react-unsafe-html-injection)

Comment thread src/lib/ircUtils.tsx
Comment on lines +243 to +260
+  // Escape HTML tags in input so they render as text
+  const escapedText = text.replace(/</g, "&lt;").replace(/>/g, "&gt;");
+
   marked.setOptions({
     breaks: true,
     gfm: true,
     renderer: renderer,
   });

   // Parse markdown to HTML
-  const html = marked.parse(text) as string;
-
-  // Additional security: remove any remaining script tags or dangerous content that might have slipped through
-  const sanitizedHtml = html
-    .replace(/<script[^>]*>.*?<\/script>/gi, "")
-    .replace(/<iframe[^>]*>.*?<\/iframe>/gi, "")
-    .replace(/<object[^>]*>.*?<\/object>/gi, "")
-    .replace(/<embed[^>]*>.*?<\/embed>/gi, "")
-    .replace(/on\w+="[^"]*"/gi, "") // Remove event handlers
-    .replace(/javascript:/gi, "#");
+  const html = marked.parse(escapedText) as string;

   // Return a div with dangerouslySetInnerHTML
   return (
     <div
       className="markdown-content"
       // biome-ignore lint/security/noDangerouslySetInnerHtml: HTML is sanitized above
-      dangerouslySetInnerHTML={{ __html: sanitizedHtml }}
+      dangerouslySetInnerHTML={{ __html: html }}

⚠️ Potential issue | 🔴 Critical

Escaping misses ampersands—numeric entities can still inject HTML

By only replacing < and >, inputs like &#x3C;script&#x3E;alert(1) survive untouched until dangerouslySetInnerHTML. The browser resolves those numeric entities to <script> during parsing, so the script executes. This regression is critical because the previous sanitizer stripped such payloads. Escape ampersands first (or reintroduce a proper HTML sanitizer) before handing the string to marked.

-  const escapedText = text.replace(/</g, "&lt;").replace(/>/g, "&gt;");
+  const escapedText = text
+    .replace(/&/g, "&amp;")
+    .replace(/</g, "&lt;")
+    .replace(/>/g, "&gt;");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  // Escape HTML tags in input so they render as text
  const escapedText = text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");

  marked.setOptions({
    breaks: true,
    gfm: true,
    renderer: renderer,
  });

  // Parse markdown to HTML
  const html = marked.parse(escapedText) as string;

  // Return a div with dangerouslySetInnerHTML
  return (
    <div
      className="markdown-content"
      // biome-ignore lint/security/noDangerouslySetInnerHtml: HTML is sanitized above
      dangerouslySetInnerHTML={{ __html: html }}

🤖 Prompt for AI Agents
In src/lib/ircUtils.tsx around lines 243 to 260, the current escaping only
replaces `<` and `>` so numeric/hex entities like `&#x3C;script&#x3E;` are still
decoded by the browser and can inject HTML; fix by escaping ampersands first
(replace `&` with `&amp;`) before replacing `<`/`>` and before passing the
string into marked, or better yet run a proper HTML sanitizer (e.g., DOMPurify)
on the resulting HTML from marked to remove any decoded/unsafe tags and
attributes before using dangerouslySetInnerHTML.
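
The ampersand-first ordering suggested in this comment can be sketched in isolation. The helper names `escapeHtml` and `escapeWrongOrder` are hypothetical and exist only for this illustration; the sample strings are made up. What the sketch shows with certainty is narrower than the comment's claim: escaping `&` first makes any entity the user typed display literally, and running the ampersand replacement last double-escapes the entities produced by the earlier steps.

```typescript
// Escape the three characters in the order suggested above: "&" first.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run before the other two
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Same replacements with "&" last, to show why the order matters.
function escapeWrongOrder(text: string): string {
  return text
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/&/g, "&amp;"); // re-escapes the "&" introduced just above
}

// A numeric entity typed by the user is neutralised: "&amp;#x3C;b&amp;#x3E;"
const typedEntity = escapeHtml("&#x3C;b&#x3E;");

// Ordinary text is escaped once: "1 &lt; 2 &amp; 3 &gt; 2"
const plainText = escapeHtml("1 < 2 & 3 > 2");

// With the wrong order, "<" ends up double-escaped: "&amp;lt;"
const doubleEscaped = escapeWrongOrder("<");
```

Whether this alone is sufficient depends on how marked treats entities inside link destinations and attributes, which this sketch does not exercise; the comment's alternative of sanitizing marked's output addresses that separately.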

Comment on lines +313 to +330
describe("renderMarkdown", () => {
  it("should escape HTML tags and render them as text", () => {
    const input = 'Hello <script>alert("xss")</script> world';
    const result = renderMarkdown(input);

    // The HTML should be escaped and visible as text
    expect(result).toBeDefined();
    // We can't easily test the exact React element output, but we can verify it doesn't contain unescaped HTML
    // The key is that <script> tags should be escaped to &lt;script&gt;
  });

  it("should render markdown while escaping HTML", () => {
    const input = "**bold** <em>not html</em> *italic*";
    const result = renderMarkdown(input);

    expect(result).toBeDefined();
    // Markdown should be rendered, HTML should be escaped
  });

⚠️ Potential issue | 🟠 Major

Tests do not assert the escaping behavior

Both new specs only check that renderMarkdown returns “something”. They never confirm that <script> becomes &lt;script&gt; or that markdown still renders, so the tests will pass even if the escaping logic regresses. Please assert on the rendered markup (e.g. use renderToStaticMarkup(renderMarkdown(input))) to ensure the output contains escaped tags while still converting markdown.

🤖 Prompt for AI Agents
In tests/lib/messageFormatter.test.ts around lines 313 to 330, the two specs
only assert the result is defined and do not verify escaping or markdown
conversion; update each test to render the React output to static markup (e.g.
renderToStaticMarkup(renderMarkdown(input))) and assert the resulting string
contains the escaped HTML entities (e.g. "&lt;script&gt;" and "&lt;/script&gt;"
or "&lt;em&gt;") and also contains expected markdown output (e.g.
"<strong>bold</strong>" or "<em>italic</em>"/ "<em>not html</em>" depending on
case) so the tests fail if escaping or markdown rendering regresses.
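
The kind of string-level assertion requested above might look like the following sketch. To keep it self-contained, `renderStandIn` is a hypothetical stand-in that plays the role of `renderToStaticMarkup(renderMarkdown(input))`: it escapes HTML and converts only `**bold**` and `*italic*`. It is not the project's renderer and not marked, and the `expectContains` / `expectNotContains` helpers are likewise made up here in place of the test runner's `expect(...).toContain(...)`.

```typescript
// Hypothetical stand-in for renderToStaticMarkup(renderMarkdown(input)).
// Escapes HTML first, then handles just two markdown constructs.
function renderStandIn(input: string): string {
  const escapedInput = input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return escapedInput
    .replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>")
    .replace(/\*(.+?)\*/g, "<em>$1</em>");
}

// Minimal assertion helpers (stand-ins for the test runner's matchers).
function expectContains(haystack: string, needle: string): void {
  if (!haystack.includes(needle)) {
    throw new Error(`expected output to contain ${needle}`);
  }
}

function expectNotContains(haystack: string, needle: string): void {
  if (haystack.includes(needle)) {
    throw new Error(`expected output not to contain ${needle}`);
  }
}

const markup = renderStandIn("**bold** <em>not html</em> *italic*");

// Markdown is still rendered...
expectContains(markup, "<strong>bold</strong>");
expectContains(markup, "<em>italic</em>");
// ...while the raw tag appears only in escaped form.
expectContains(markup, "&lt;em&gt;not html&lt;/em&gt;");
expectNotContains(markup, "<em>not html</em>");
```

Unlike `toBeDefined()`, each of these assertions fails if either the escaping or the markdown conversion regresses, which is the property the comment asks for.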

- Change renderMarkdown to escape HTML tags (< >) to entities (&lt; &gt;) instead of removing them
- This preserves content while preventing XSS attacks
- HTML tags are now displayed as literal text rather than being stripped
- Add tests to verify HTML escaping behavior
- All existing tests still pass
@ValwareIRC ValwareIRC merged commit df72ee1 into main Oct 16, 2025
3 checks passed
@github-actions

Pages Preview
Preview URL: https://fix-markdown.obsidianirc.pages.dev

Automated deployment preview for this PR on Cloudflare Pages.

@ValwareIRC ValwareIRC deleted the fix/markdown branch October 16, 2025 16:25
zocram4cc pushed a commit to zocram4cc/ObsidianIRC that referenced this pull request Feb 17, 2026
* Ensure all img tags have controlled max-height styling

- Force all img tags to have style='max-height: 150px;' regardless of source
- Override any user-provided styles on images
- Maintain consistent image sizing across markdown and raw HTML

* Escape HTML in markdown instead of stripping it

- Change renderMarkdown to escape HTML tags (< >) to entities (&lt; &gt;) instead of removing them
- This preserves content while preventing XSS attacks
- HTML tags are now displayed as literal text rather than being stripped
- Add tests to verify HTML escaping behavior
- All existing tests still pass

* lint