Merge pull request #323 from willkg/298-sanitize-characters
Convert invisible characters to ? in Characters tokens
willkg committed Sep 28, 2017
2 parents 091b3c6 + c0fa1aa commit 5490eb6
Showing 3 changed files with 53 additions and 0 deletions.
13 changes: 13 additions & 0 deletions CHANGES
@@ -6,6 +6,15 @@ Version 2.1 (in development)

**Security fixes**

* Convert control characters (backspace particularly) to "?" preventing
  malicious copy-and-paste situations. (#298)

  See `<https://github.com/mozilla/bleach/issues/298>`_ for more details.

  This affects all previous versions of Bleach. Check the comments on that
  issue for ways to alleviate the issue if you can't upgrade to Bleach 2.1.


**Backwards incompatible changes**

* Redid versioning. ``bleach.VERSION`` is no longer available. Use the string
@@ -17,8 +26,10 @@ Version 2.1 (in development)

* clean, linkify: accept only unicode or utf-8-encoded str (#176)


**Features**


**Bug fixes**

* ``bleach.clean()`` no longer unescapes entities including ones that are missing
@@ -39,13 +50,15 @@ Version 2.1 (in development)
* add test website and scripts to test ``bleach.clean()`` output in browser;
  thank you, Greg Guthe!


Version 2.0 (March 8th, 2017)
-----------------------------

**Security fixes**

* None


**Backwards incompatible changes**

* Removed support for Python 2.6. #206
20 changes: 20 additions & 0 deletions bleach/sanitizer.py
@@ -1,4 +1,5 @@
from __future__ import unicode_literals
from itertools import chain
import re
import string

@@ -60,6 +61,19 @@

AMP_SPLIT_RE = re.compile('(&)')

#: Invisible characters--0 to and including 31 except 9 (tab), 10 (lf), and 13 (cr)
INVISIBLE_CHARACTERS = ''.join([chr(c) for c in chain(range(0, 9), range(11, 13), range(14, 32))])

#: Regexp for characters that are invisible
INVISIBLE_CHARACTERS_RE = re.compile(
    '[' + INVISIBLE_CHARACTERS + ']',
    re.UNICODE
)

#: String to replace invisible characters with. This can be a character, a
#: string, or even a function that takes a Python re matchobj
INVISIBLE_REPLACEMENT_CHAR = '?'


class BleachHTMLTokenizer(HTMLTokenizer):
    def consumeEntity(self, allowedChar=None, fromAttribute=False):
@@ -435,6 +449,12 @@ def sanitize_characters(self, token):
        """
        data = token.get('data', '')

        if not data:
            return token

        data = INVISIBLE_CHARACTERS_RE.sub(INVISIBLE_REPLACEMENT_CHAR, data)
        token['data'] = data

        # If there isn't a & in the data, we can return now
        if '&' not in data:
            return token
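
The substitution added to `sanitize_characters` can be tried in isolation. The following is a minimal sketch using only the standard library, assuming Python 3 (where `chr` returns text). The helper names `replace_invisible` and `as_hex_escape` are hypothetical and are not part of Bleach; the second one illustrates the comment in the diff that the replacement may also be a function taking a match object.

```python
import re
from itertools import chain

# Code points 0 through 31, excluding 9 (tab), 10 (lf), and 13 (cr),
# built the same way as in the diff above.
INVISIBLE_CHARACTERS = ''.join(
    chr(c) for c in chain(range(0, 9), range(11, 13), range(14, 32))
)
INVISIBLE_CHARACTERS_RE = re.compile(
    '[' + INVISIBLE_CHARACTERS + ']',
    re.UNICODE
)


def replace_invisible(data, replacement='?'):
    """Hypothetical helper: substitute every invisible character in data.

    replacement may be a string or a callable that receives the match
    object, exactly as re.sub allows.
    """
    return INVISIBLE_CHARACTERS_RE.sub(replacement, data)


def as_hex_escape(match):
    """Hypothetical callable replacement: show the character as \\xNN."""
    return '\\x{:02x}'.format(ord(match.group(0)))


if __name__ == '__main__':
    print(replace_invisible('1\b23'))
    print(replace_invisible('1\b23', as_hex_escape))
    print(repr(replace_invisible('a\tb\nc\rd')))
```

Tab, line feed, and carriage return are deliberately left out of the character class, so ordinary whitespace passes through unchanged.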
20 changes: 20 additions & 0 deletions tests/test_security.py
@@ -156,6 +156,26 @@ def test_feed_protocol():
    assert clean('<a href="feed:file:///tmp/foo">foo</a>') == '<a>foo</a>'


@pytest.mark.parametrize('data, expected', [
    # Convert bell
    ('1\a23', '1?23'),
    # Convert backspace
    ('1\b23', '1?23'),
    # Convert vertical tab
    ('1\v23', '1?23'),
    # Convert formfeed
    ('1\f23', '1?23'),
    # Convert a bunch of characters in a string
    ('import y\bose\bm\bi\bt\be\b', 'import y?ose?m?i?t?e?'),
])
def test_invisible_characters(data, expected):
    assert clean(data) == expected
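
For reference, the escape sequences used in the parametrized cases correspond to the following ASCII code points. This is a small standard-library sketch; the `ESCAPES` table and the `describe` helper are hypothetical names introduced only for illustration.

```python
# Escape sequences from the test cases, with their code points and
# conventional ASCII names.
ESCAPES = {
    '\a': (7, 'bell'),
    '\b': (8, 'backspace'),
    '\v': (11, 'vertical tab'),
    '\f': (12, 'form feed'),
}


def describe(char):
    """Hypothetical helper: return a readable label for a test character."""
    code, name = ESCAPES[char]
    # Guard against a mislabeled table entry.
    assert ord(char) == code
    return '{} (code point {})'.format(name, code)


if __name__ == '__main__':
    for char in ESCAPES:
        print(describe(char))
```

All four fall inside the 0 to 31 range and are none of 9, 10, or 13, which is why each is expected to come back from `clean()` as `?`.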


def get_tests():
    """Retrieves regression tests from data/ directory