Escape HTML in markdown instead of stripping it by ValwareIRC · Pull Request #88 · obbyworld/obby

ValwareIRC · 2025-10-16T16:10:32Z

This PR changes the markdown rendering behavior to escape HTML tags instead of completely stripping them.

Changes

Modified `renderMarkdown` in `src/lib/ircUtils.tsx` to escape HTML tags (`<` and `>`) to HTML entities (`&lt;` and `&gt;`) instead of removing them entirely
Removed the output HTML stripping since escaped HTML is safe to render
Added tests in `tests/lib/messageFormatter.test.ts` to verify HTML escaping behavior

Security Benefits

Prevents XSS attacks by ensuring HTML tags are displayed as literal text
Preserves message content integrity by showing what HTML was originally present
Maintains backward compatibility with existing functionality

Testing

All existing tests pass (252 tests)
New tests verify HTML escaping works correctly
Manual testing confirms HTML tags are displayed as text rather than being stripped

Closes security vulnerability where HTML could be rendered dangerously in markdown.

Summary by CodeRabbit

Bug Fixes
- Improved text processing for message rendering by adjusting how HTML escaping and markdown parsing work together.
Tests
- Added test coverage for markdown rendering and HTML escaping behavior.

coderabbitai · 2025-10-16T16:11:00Z

Walkthrough

The PR modifies the renderMarkdown function's security approach, replacing post-parse HTML sanitization with pre-parse input escaping. Raw text is now escaped before markdown parsing, and the resulting HTML is used directly without an additional sanitization pass that previously stripped dangerous tags and event handlers.
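To make the difference between the two approaches concrete, here is a minimal sketch that applies both to a raw string. The six replacements are copied from the block this PR removes; the function names and the payload are hypothetical, and the sketch does not model `marked` or the custom renderer, so it only illustrates the string-level behavior.

```typescript
// The post-parse replacements removed by this PR, copied from the diff.
// The function name is hypothetical.
function regexSanitize(html: string): string {
  return html
    .replace(/<script[^>]*>.*?<\/script>/gi, "")
    .replace(/<iframe[^>]*>.*?<\/iframe>/gi, "")
    .replace(/<object[^>]*>.*?<\/object>/gi, "")
    .replace(/<embed[^>]*>.*?<\/embed>/gi, "")
    .replace(/on\w+="[^"]*"/gi, "") // only matches double-quoted handlers
    .replace(/javascript:/gi, "#");
}

// The pre-parse escaping this PR introduces (angle brackets only).
function escapeAngleBrackets(text: string): string {
  return text.replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical payload: an event handler in single quotes.
const payload = "<img src=x onerror='alert(1)'>";

// The handler regex requires double quotes, so nothing is stripped.
const sanitized = regexSanitize(payload);

// Escaping turns the whole tag into literal text.
const escaped = escapeAngleBrackets(payload);

console.log(sanitized); // <img src=x onerror='alert(1)'>
console.log(escaped);   // &lt;img src=x onerror='alert(1)'&gt;
```

The regex list has to anticipate every dangerous pattern, while escaping only has to neutralize the characters that start a tag, which is one reason to prefer the pre-parse approach or a dedicated sanitizer over hand-written stripping.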

Changes

Cohort / File(s)	Summary
Markdown rendering security pipeline `src/lib/ircUtils.tsx`	Restructures markdown rendering to escape input text before parsing instead of sanitizing output HTML afterward; removes sanitization block stripping dangerous tags and event handlers
Markdown rendering tests `tests/lib/messageFormatter.test.ts`	Adds new test suite for `renderMarkdown` function verifying HTML escaping and markdown rendering behavior

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Security-critical changes to HTML sanitization and escaping logic require careful analysis of threat models and markdown parser behavior. The shift from post-parse sanitization to pre-parse escaping is an architectural change with implications for XSS protection that demands thorough review of escape/parse interaction.

Possibly related PRs

Remove style attributes from markdown-rendered HTML and enhance security #86: Directly modifies the same renderMarkdown function and HTML sanitization pipeline, making coordinated changes to the security model.

Suggested reviewers

matheusfillipe

Poem

🐰 Input escapes before the parse,
No sanitizing come to pass—
Markdown flows through cleaner hands,
A rabbit's secure HTML stands! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title accurately reflects the primary change of updating markdown rendering to escape HTML tags instead of stripping them, providing a concise and clear summary of the pull request’s intent.



- Force all img tags to have style='max-height: 150px;' regardless of source
- Override any user-provided styles on images
- Maintain consistent image sizing across markdown and raw HTML

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7687b19 and 0428c0c.

📒 Files selected for processing (2)

src/lib/ircUtils.tsx (1 hunks)
tests/lib/messageFormatter.test.ts (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

tests/lib/messageFormatter.test.ts (1)

src/lib/ircUtils.tsx (1)

renderMarkdown (193-263)

🪛 ast-grep (0.39.6)

src/lib/ircUtils.tsx

[warning] 259-259: Usage of dangerouslySetInnerHTML detected. This bypasses React's built-in XSS protection. Always sanitize HTML content using libraries like DOMPurify before injecting it into the DOM to prevent XSS attacks.
Context: dangerouslySetInnerHTML
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation [REFERENCES]
- https://reactjs.org/docs/dom-elements.html#dangerouslysetinnerhtml
- https://cwe.mitre.org/data/definitions/79.html

(react-unsafe-html-injection)

coderabbitai · 2025-10-16T16:14:38Z

+  // Escape HTML tags in input so they render as text
+  const escapedText = text.replace(/</g, "&lt;").replace(/>/g, "&gt;");
+
  marked.setOptions({
    breaks: true,
    gfm: true,
    renderer: renderer,
  });

  // Parse markdown to HTML
-  const html = marked.parse(text) as string;
-
-  // Additional security: remove any remaining script tags or dangerous content that might have slipped through
-  const sanitizedHtml = html
-    .replace(/<script[^>]*>.*?<\/script>/gi, "")
-    .replace(/<iframe[^>]*>.*?<\/iframe>/gi, "")
-    .replace(/<object[^>]*>.*?<\/object>/gi, "")
-    .replace(/<embed[^>]*>.*?<\/embed>/gi, "")
-    .replace(/on\w+="[^"]*"/gi, "") // Remove event handlers
-    .replace(/javascript:/gi, "#");
+  const html = marked.parse(escapedText) as string;

  // Return a div with dangerouslySetInnerHTML
  return (
    <div
      className="markdown-content"
      // biome-ignore lint/security/noDangerouslySetInnerHtml: HTML is sanitized above
-      dangerouslySetInnerHTML={{ __html: sanitizedHtml }}
+      dangerouslySetInnerHTML={{ __html: html }}


⚠️ Potential issue | 🔴 Critical

Escaping misses ampersands—numeric entities can still inject HTML

By only replacing `<` and `>`, inputs like `&#60;script&#62;alert(1)&#60;/script&#62;` survive untouched until dangerouslySetInnerHTML. The browser resolves those numeric entities to `<script>` during parsing, so the script executes. This regression is critical because the previous sanitizer stripped such payloads. Escape ampersands first (or reintroduce a proper HTML sanitizer) before handing the string to marked.

- const escapedText = text.replace(/</g, "&lt;").replace(/>/g, "&gt;");
+ const escapedText = text
+   .replace(/&/g, "&amp;")
+   .replace(/</g, "&lt;")
+   .replace(/>/g, "&gt;");
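The order of the replacements matters, which a small standalone sketch can show. Both function names below are hypothetical, and the sketch deliberately does not involve `marked`, so it says nothing about how the parser itself treats entities in its input.

```typescript
// Ampersand first: every character is escaped exactly once.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Ampersand last (shown only for contrast): the "&" produced by the
// earlier replacements is escaped again, giving "&amp;lt;".
function escapeHtmlWrongOrder(text: string): string {
  return text
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/&/g, "&amp;");
}

const sample = "a < b &#60; c";

console.log(escapeHtml(sample));           // a &lt; b &amp;#60; c
console.log(escapeHtmlWrongOrder(sample)); // a &amp;lt; b &amp;#60; c
```

With the ampersand escaped first, a user-typed `&#60;` reaches the page as the literal text `&#60;` instead of being decoded to `<`.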

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

  // Escape HTML tags in input so they render as text
  const escapedText = text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");

  marked.setOptions({
    breaks: true,
    gfm: true,
    renderer: renderer,
  });

  // Parse markdown to HTML
  const html = marked.parse(escapedText) as string;

  // Return a div with dangerouslySetInnerHTML
  return (
    <div
      className="markdown-content"
      // biome-ignore lint/security/noDangerouslySetInnerHtml: HTML is sanitized above
      dangerouslySetInnerHTML={{ __html: html }}

🤖 Prompt for AI Agents

In src/lib/ircUtils.tsx around lines 243 to 260, the current escaping only replaces `<` and `>` so numeric/hex entities like `&#60;script&#62;` are still decoded by the browser and can inject HTML; fix by escaping ampersands first (replace `&` with `&amp;`) before replacing `<`/`>` and before passing the string into marked, or better yet run a proper HTML sanitizer (e.g., DOMPurify) on the resulting HTML from marked to remove any decoded/unsafe tags and attributes before using dangerouslySetInnerHTML.

coderabbitai · 2025-10-16T16:14:38Z

+  describe("renderMarkdown", () => {
+    it("should escape HTML tags and render them as text", () => {
+      const input = 'Hello <script>alert("xss")</script> world';
+      const result = renderMarkdown(input);
+
+      // The HTML should be escaped and visible as text
+      expect(result).toBeDefined();
+      // We can't easily test the exact React element output, but we can verify it doesn't contain unescaped HTML
+      // The key is that <script> tags should be escaped to &lt;script&gt;
+    });
+
+    it("should render markdown while escaping HTML", () => {
+      const input = "**bold** <em>not html</em> *italic*";
+      const result = renderMarkdown(input);
+
+      expect(result).toBeDefined();
+      // Markdown should be rendered, HTML should be escaped
+    });


⚠️ Potential issue | 🟠 Major

Tests do not assert the escaping behavior

Both new specs only check that renderMarkdown returns “something”. They never confirm that `<script>` becomes `&lt;script&gt;` or that markdown still renders, so the tests will pass even if the escaping logic regresses. Please assert on the rendered markup (e.g. use renderToStaticMarkup(renderMarkdown(input))) to ensure the output contains escaped tags while still converting markdown.
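As a rough illustration of what such assertions could look like, the sketch below checks a rendered string for escaped tags and converted markdown. `renderStandIn` is a hypothetical toy that stands in for `renderToStaticMarkup(renderMarkdown(input))` so the example runs on its own; it handles only `**bold**` and is not the project's renderer. `check` is likewise a hypothetical helper.

```typescript
// Hypothetical stand-in for renderToStaticMarkup(renderMarkdown(input)).
// It escapes the input, then converts **bold** only.
function renderStandIn(input: string): string {
  const escapedInput = input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  const body = escapedInput.replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>");
  return `<p>${body}</p>`;
}

// Minimal assertion helper so the sketch needs no test framework.
function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

const rendered = renderStandIn('**bold** <script>alert("xss")</script>');

// The raw tag must not survive, its escaped form must be present,
// and the markdown must still be converted.
check(!rendered.includes("<script>"), "raw <script> tag leaked into output");
check(rendered.includes("&lt;script&gt;"), "escaped <script> tag is missing");
check(rendered.includes("<strong>bold</strong>"), "markdown bold was not rendered");
```

In the real suite the same three checks would be written with the test framework's `expect` calls against the markup produced by the actual `renderMarkdown`.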

🤖 Prompt for AI Agents

In tests/lib/messageFormatter.test.ts around lines 313 to 330, the two specs only assert the result is defined and do not verify escaping or markdown conversion; update each test to render the React output to static markup (e.g. renderToStaticMarkup(renderMarkdown(input))) and assert the resulting string contains the escaped HTML entities (e.g. "&lt;script&gt;" and "&lt;/script&gt;" or "&lt;em&gt;") and also contains expected markdown output (e.g. "<strong>bold</strong>" or "<em>italic</em>" / "&lt;em&gt;not html&lt;/em&gt;" depending on case) so the tests fail if escaping or markdown rendering regresses.

- Change renderMarkdown to escape HTML tags (< >) to entities (&lt; &gt;) instead of removing them
- This preserves content while preventing XSS attacks
- HTML tags are now displayed as literal text rather than being stripped
- Add tests to verify HTML escaping behavior
- All existing tests still pass

github-actions · 2025-10-16T16:16:14Z

Preview URL: https://fix-markdown.obsidianirc.pages.dev

Automated deployment preview for the PR in the Cloudflare Pages.

* Ensure all img tags have controlled max-height styling
  - Force all img tags to have style='max-height: 150px;' regardless of source
  - Override any user-provided styles on images
  - Maintain consistent image sizing across markdown and raw HTML
* Escape HTML in markdown instead of stripping it
  - Change renderMarkdown to escape HTML tags (< >) to entities (&lt; &gt;) instead of removing them
  - This preserves content while preventing XSS attacks
  - HTML tags are now displayed as literal text rather than being stripped
  - Add tests to verify HTML escaping behavior
  - All existing tests still pass
* lint

Ensure all img tags have controlled max-height styling

c05f1d4

- Force all img tags to have style='max-height: 150px;' regardless of source
- Override any user-provided styles on images
- Maintain consistent image sizing across markdown and raw HTML

coderabbitai Bot reviewed Oct 16, 2025


ValwareIRC added 2 commits October 16, 2025 17:14

lint

3334726

ValwareIRC force-pushed the fix/markdown branch from 0428c0c to 3334726 Compare October 16, 2025 16:15

ValwareIRC merged commit df72ee1 into main Oct 16, 2025
3 checks passed

ValwareIRC deleted the fix/markdown branch October 16, 2025 16:25

coderabbitai Bot mentioned this pull request Mar 11, 2026

fix/markdown rendering issues and xss #158

Merged

ValwareIRC merged 3 commits into main from fix/markdown
