Kevin Cheng edited this page Jan 20, 2018 · 1 revision

Welcome to the MarkupSanity wiki!

MarkupSanity is an HTML cleanup library. It uses HtmlAgilityPack to parse the input HTML and applies whitelists to ensure that the output does not contain harmful elements, particularly those that enable cross-site scripting (XSS). The library is still in an early stage of development and does not yet cover all advanced attack vectors, but the goal is to cover as many as possible within its scope.

MarkupSanity is primarily intended for cleaning up user input where formatting is allowed, whether it comes from regular text boxes where users can type HTML or from text editors that allow raw editing.

In its simplest usage, MarkupSanity is an extension method on String that checks the value against a predefined set of default whitelisted tags and attributes. These tags and attributes come from the W3C standards, excluding those that exist for JavaScript (e.g. the onclick event attribute). Some attributes can also carry JavaScript as a secondary use (e.g. <a href="javascript:alert('gotcha!');">Click Me</a>), so these are handled as well.

As an example,

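// Inner double quotes must be escaped inside a C# string literal.
String inputValue = "<a href=\"javascript:alert('Gotcha again!');\" onclick=\"javascript:alert('Gotcha!');\">Click Me</a>";
// SanitizeHtml() is the extension method applied with the default whitelist.
String cleanValue = inputValue.SanitizeHtml();
Console.WriteLine(cleanValue);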

The output is:

<a>Click Me</a>

The onclick attribute is removed because it exists only for scripting. The href attribute is valid in itself, but here its value uses the javascript: scheme to trigger an alert, which is considered dangerous, so the attribute is removed as well.

Note that because MarkupSanity works from a whitelist rather than blocking from a blacklist, any custom or non-standard tags and attributes are also removed by default. If you commonly use such custom elements, you will need to define your own whitelist.
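
To make the whitelist idea more concrete, here is a minimal sketch of how whitelist-based cleaning can be done on top of HtmlAgilityPack. It is not MarkupSanity's actual implementation or API: the class name WhitelistSketch, the Clean method, and the two tiny whitelists are hypothetical and exist only for this illustration. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Illustrative only: NOT MarkupSanity's real code or API.
public static class WhitelistSketch
{
    // Hypothetical, deliberately tiny whitelists for the example.
    private static readonly HashSet<string> AllowedTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "a", "b", "i", "p" };

    private static readonly HashSet<string> AllowedAttributes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "href", "title" };

    public static string Clean(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Snapshot the elements first, because nodes are removed while looping.
        var elements = doc.DocumentNode.Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element)
            .ToList();

        foreach (var element in elements)
        {
            // Anything not on the tag whitelist is dropped, children included.
            if (!AllowedTags.Contains(element.Name))
            {
                element.Remove();
                continue;
            }

            foreach (var attribute in element.Attributes.ToList())
            {
                bool allowedName = AllowedAttributes.Contains(attribute.Name);

                // Naive check for script URLs; real attack vectors are more varied.
                bool scriptUrl = (attribute.Value ?? string.Empty)
                    .TrimStart()
                    .StartsWith("javascript:", StringComparison.OrdinalIgnoreCase);

                if (!allowedName || scriptUrl)
                {
                    attribute.Remove();
                }
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling WhitelistSketch.Clean on the earlier example input strips both the onclick attribute and the javascript: href while keeping the link text. A tag such as <script> or a custom element is removed entirely because it never appears in the allowed set, which is the behaviour the whitelist approach trades for safety.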