A common approach taken to mitigate this risk is to allow some HTML content, but block content that is potentially harmful. One problem with a straightforward approach to blocking such content is that HTML parsing in browsers differs from the ideal, and nefarious individuals can take advantage of these differences to obscure content.
DeXSS uses TagSoup, an open-source HTML parser that attempts to mimic how web browsers work. TagSoup reads wild HTML and generates SAX2 events. DeXSS invokes TagSoup and follows it with a pipeline of SAX2 filters to remove HTML tags such as script and attribute values containing such
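The parse-then-filter idea described above can be sketched with nothing but the JDK. This is a minimal illustration, not DeXSS's actual code: the class names ScriptStripSketch and DropScriptFilter are hypothetical, and the JDK's strict XML parser stands in for TagSoup, so the sketch only accepts well-formed input. In a real setup the parent reader would be TagSoup's parser, which is also a SAX2 XMLReader.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class ScriptStripSketch {

    /** Hypothetical SAX2 filter: drops script elements and everything inside them. */
    public static class DropScriptFilter extends XMLFilterImpl {
        private int depth = 0; // greater than 0 while inside a script element

        public DropScriptFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (depth > 0 || "script".equalsIgnoreCase(qName)) {
                depth++; // swallow the script element and any nested elements
                return;
            }
            super.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (depth > 0) {
                depth--;
                return;
            }
            super.endElement(uri, local, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depth == 0) {
                super.characters(ch, start, length);
            }
        }
    }

    /** Parses the input, runs it through the filter, and serializes the surviving events. */
    public static String clean(String markup) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Assumption: the JDK parser is used here in place of TagSoup.
        XMLReader reader = factory.newSAXParser().getXMLReader();
        XMLReader filtered = new DropScriptFilter(reader);

        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(
                new SAXSource(filtered, new InputSource(new StringReader(markup))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(clean("<p>Hello<script>alert(1)</script> world</p>"));
    }
}
```

Because each stage is an XMLFilter wrapping the previous reader, further filters (for example, one that inspects attribute values) can be chained by passing one filter as the parent of the next.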
See also https://dexss.org
DeXSS 1.2 is an Alpha release. You should be aware of the following issues:
- This release implements a blacklist approach, which has advantages over a whitelist approach, but also has inherent risks. There are still a number of known XSS attacks that DeXSS does not yet detect.
- DeXSS is aggressive about removing style attributes that fail the CSS analyzer. There are probably other CSS attacks that DeXSS does not protect against.
- Elements that TagSoup thinks should be in the head are discarded by the default settings; changing the BODY_ONLY flag to allow head content will reduce effectiveness greatly. Consequently, DeXSS should not be used to parse entire user-provided HTML files, but only parts that are destined for inclusion.
- The output of DeXSS is intended for browsers, not for storage. As a result, some constructs may be overly verbose.
- Configurability and test suites are lacking.
- DeXSS does not specially handle any HTML5 elements or attributes not present in HTML4; see HTMLCleaner below.
If you have an interest in working on these issues, please consider contributing to the project.
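To make the blacklist versus whitelist trade-off mentioned in the issues above concrete, here is a small sketch of both policies applied to SAX2 attributes. Everything in it is an assumption for illustration: the class name AttributePolicySketch, the two predicates, and the particular attribute names are hypothetical and are not taken from the DeXSS source.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.BiPredicate;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

public class AttributePolicySketch {

    private static boolean isScriptUrl(String value) {
        return value.trim().toLowerCase(Locale.ROOT).startsWith("javascript:");
    }

    /** Blacklist: drop what is known to be dangerous; everything else passes. */
    public static final BiPredicate<String, String> BLACKLIST = (name, value) ->
            !(name.toLowerCase(Locale.ROOT).startsWith("on") || isScriptUrl(value));

    /** Whitelist: keep only attributes named here (a hypothetical set); drop the rest. */
    public static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("href", "title", "alt"));

    public static final BiPredicate<String, String> WHITELIST = (name, value) ->
            ALLOWED.contains(name.toLowerCase(Locale.ROOT)) && !isScriptUrl(value);

    /** Returns a copy of atts containing only the attributes the policy keeps. */
    public static Attributes apply(Attributes atts, BiPredicate<String, String> keep) {
        AttributesImpl kept = new AttributesImpl();
        for (int i = 0; i < atts.getLength(); i++) {
            if (keep.test(atts.getQName(i), atts.getValue(i))) {
                kept.addAttribute(atts.getURI(i), atts.getLocalName(i),
                        atts.getQName(i), atts.getType(i), atts.getValue(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        AttributesImpl in = new AttributesImpl();
        in.addAttribute("", "onclick", "onclick", "CDATA", "steal()");
        in.addAttribute("", "href", "href", "CDATA", "https://example.org/");
        in.addAttribute("", "style", "style", "CDATA", "width: expression(alert(1))");

        // The blacklist removes onclick but lets the style attribute through,
        // because nothing in its rules mentions style.
        System.out.println("blacklist keeps " + apply(in, BLACKLIST).getLength());
        // The whitelist keeps only href.
        System.out.println("whitelist keeps " + apply(in, WHITELIST).getLength());
    }
}
```

The style attribute in the example shows the inherent risk of a blacklist: anything the rules do not anticipate passes unchanged, whereas a whitelist drops it at the cost of also dropping harmless attributes it does not list.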
DeXSS includes the following classes for direct use:
- Test, a command-line utility for testing XSS removal.
- DeXSS, which implements a string-to-string conversion of HTML, with XSS removal.
- DeXSSParser, which can be used directly as a SAX2 parser to produce SAX2 events from an input stream.
- DeXSSFilterPipeline, which can be used as a SAX2 filter if you have already used TagSoup to produce SAX2 events.
How to build
ant dist -emacs
- DeXSS includes tagsoup-1.2.1.jar from http://tagsoup.info. If you need to change the TagSoup version, edit the file etc/build/build.properties.
- DeXSS includes osbcp-css-parser-1.4.jar from http://github.com/corgrath/osbcp-css-parser. If you need to change the OSBCP CSS Parser version, edit the file etc/build/build.properties.
How to test
- Test for false positives
java -classpath lib/tagsoup-1.2.1.jar:lib/osbcp-css-parser-1.4.jar:dist/lib/dexss-1.2.jar org.dexss.Test tests/benign/*.txt
java -classpath lib\tagsoup-1.2.1.jar\;lib\osbcp-css-parser-1.4.jar\;dist\lib\dexss-1.2.jar org.dexss.Test tests/benign/*.txt
- Test for false negatives
java -classpath lib/tagsoup-1.2.1.jar:lib/osbcp-css-parser-1.4.jar:dist/lib/dexss-1.2.jar org.dexss.Test tests/xss/*.txt
java -classpath lib\tagsoup-1.2.1.jar\;lib\osbcp-css-parser-1.4.jar\;dist\lib\dexss-1.2.jar org.dexss.Test tests/xss/*.txt
Other Similar Projects
If DeXSS does not meet your needs, see freecode.com for a list of similar libraries in other languages such as PHP and Perl.
- The CSS analyzer is still not applied to style elements, only to style attributes.
- Should upgrade from TagSoup to HTMLCleaner to get HTML5 support.
- Should offer both blacklist and whitelist configurations for pipeline.
- Should upgrade build and test system to modern standards.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- This software is released under the Apache 2.0 License (see file LICENSE).
- Copyright (C) 2005, 2006, 2007, 2012 Xerox Corporation
- Copyright (C) 2012 Leigh L. Klotz, Jr.
- Portions of the file build.xml were derived from TagSoup http://tagsoup.info Copyright (c) 2007 John Cowan licensed under Apache 2.0.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Portions of the file build.xml were derived from TagSoup http://tagsoup.info Copyright (c) 2005-2008 John Cowan licensed under Apache 2.0.