-
-
Notifications
You must be signed in to change notification settings - Fork 78
Description
Issue: Exponential Time Complexity Bug Causing Hangs with Nested <p> Tags
Summary
The library experiences severe performance degradation or complete hangs when processing HTML with deeply nested <p> tags, particularly common in malformed HTML fromWYSIWYG editors and user-generated content. Conversion that should take milliseconds can take 10+ seconds or never complete.
Root Cause
File: src/ReverseMarkdown/Converters/P.cs
Lines: 19-24 (in the buggy version)
The P.Convert() method had a critical bug where it called TreatChildren(node) to compute content, stored it in a variable, but then called TreatChildren(node) again in the return statement without using the cached result:
public override string Convert(HtmlNode node)
{
var indentation = IndentationFor(node);
var newlineAfter = NewlineAfter(node);
// Computes TreatChildren and stores in 'content'
var content = Converter.Config.CleanupUnnecessarySpaces
? TreatChildren(node).Trim() // First call
: TreatChildren(node); // Or second call
return $"{indentation}{TreatChildren(node)}{newlineAfter}"; // THIRD call - ignores 'content'!
}This causes the entire child tree to be processed 2-3 times for every <p> tag. With nested <p> tags, this becomes O(2^n) exponential complexity.
Performance Impact
| Nesting Depth | Recursive Calls | Time Impact |
|---|---|---|
| 5 levels | 2^5 = 32x | ~30ms → 1s |
| 10 levels | 2^10 = 1,024x | ~50ms → 10s+ |
| 20 levels | 2^20 = 1,048,576x | Effectively hangs |
| 26 levels (real-world) | 2^26 = 67,108,864x | Never completes |
Reproduction
Minimal Example:
var html = "<p>L1<p>L2<p>L3<p>L4<p>L5<p>L6<p>L7<p>L8<p>L9<p>L10</p></p></p></p></p></p></p></p></p></p>";
var converter = new Converter();
var result = converter.Convert(html); // Takes several seconds instead of <1msReal-World Example (from e-commerce product description):
<p><span><strong>Product Title</strong></span></p> Description text...
<p><span><strong>Includes:</strong><br> Item details
<p><span>Warning text
<p><span>Section 1
<p><span>INGREDIENTS:
<p><span>Long ingredient list...When HtmlAgilityPack parses this malformed HTML with unclosed <p> and <span> tags, it creates a deeply nested structure (~26 levels), causing the converter to hang indefinitely.
Why This Pattern Is Common
- E-commerce platforms - Many product description editors generate malformed HTML
- User-generated content - Copy-paste from various sources
- Legacy HTML - Old content without proper validation
- WYSIWYG editors - Some editors produce nested unclosed tags
Call Stack Pattern
The repeating pattern in the call stack confirms exponential recursion:
P.Convert (calls TreatChildren)
→ TreatChildren
→ Treat
→ ByPass.Convert (for <span>)
→ TreatChildren
→ Treat
→ P.Convert (calls TreatChildren AGAIN)
→ [exponential recursion continues...]
Expected Behavior
- Conversion should complete in linear O(n) time
- Deeply nested tags (even malformed) should convert in milliseconds
- No hangs regardless of nesting depth
Actual Behavior
- Exponential O(2^n) time complexity
- 10+ second delays with moderate nesting (10 levels)
- Complete hangs with deep nesting (20+ levels)
- Unresponsive applications when processing user-uploaded HTML
Environment
- Library Version: All versions (bug present in current codebase)
- Platform: All platforms (.NET 8.0, .NET 9.0)
- Impact: Critical - causes application hangs
Proposed Fix
Change the P.Convert() method to use the cached content variable instead of recalculating:
public override string Convert(HtmlNode node)
{
var indentation = IndentationFor(node);
var newlineAfter = NewlineAfter(node);
var content = TreatChildren(node); // Call ONCE
if (Converter.Config.CleanupUnnecessarySpaces)
{
content = content.Trim();
}
return $"{indentation}{content}{newlineAfter}"; // Use cached result
}Expected improvement:
- From O(2^n) to O(n) time complexity
- 1,000x+ speedup for moderately nested tags
- 1,000,000x+ speedup for deeply nested tags
- No more hangs regardless of nesting depth
Additional Notes
- The bug only affects
<p>tags - other converters don't have this issue - The fix is simple and low-risk (one-line change)
- Comprehensive test coverage has been added for regression testing
- No breaking changes to API or output format