Skip to content

Commit

Permalink
[3.9] bpo-42967: only use '&' as a query string separator (GH-24297) (#…
Browse files Browse the repository at this point in the history
…24528)

(cherry picked from commit fcbe0cb)

* [3.9] bpo-42967: only use '&' as a query string separator (GH-24297)

bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().

urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator.

Co-authored-by: Éric Araujo <merwok@netwok.org>
Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
Co-authored-by: Adam Goldschmidt <adamgold7@gmail.com>
  • Loading branch information
4 people committed Feb 15, 2021
1 parent 242f6c9 commit c9f0781
Show file tree
Hide file tree
Showing 11 changed files with 176 additions and 46 deletions.
9 changes: 6 additions & 3 deletions Doc/library/cgi.rst
Original file line number Diff line number Diff line change
Expand Up @@ -277,14 +277,14 @@ These are useful if you want more control, or if you want to employ some of the
algorithms implemented in this module in other circumstances.


.. function:: parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False)
.. function:: parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False, separator="&")

Parse a query in the environment or from a file (the file defaults to
``sys.stdin``). The *keep_blank_values* and *strict_parsing* parameters are
``sys.stdin``). The *keep_blank_values*, *strict_parsing* and *separator* parameters are
passed to :func:`urllib.parse.parse_qs` unchanged.


.. function:: parse_multipart(fp, pdict, encoding="utf-8", errors="replace")
.. function:: parse_multipart(fp, pdict, encoding="utf-8", errors="replace", separator="&")

Parse input of type :mimetype:`multipart/form-data` (for file uploads).
Arguments are *fp* for the input file, *pdict* for a dictionary containing
Expand All @@ -303,6 +303,9 @@ algorithms implemented in this module in other circumstances.
Added the *encoding* and *errors* parameters. For non-file fields, the
value is now a list of strings, not bytes.

.. versionchanged:: 3.9.2
Added the *separator* parameter.


.. function:: parse_header(string)

Expand Down
21 changes: 19 additions & 2 deletions Doc/library/urllib.parse.rst
Original file line number Diff line number Diff line change
Expand Up @@ -165,7 +165,7 @@ or on combining URL components into a URL string.
now raise :exc:`ValueError`.


.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)
.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

Parse a query string given as a string argument (data of type
:mimetype:`application/x-www-form-urlencoded`). Data are returned as a
Expand All @@ -190,6 +190,9 @@ or on combining URL components into a URL string.
read. If set, then throws a :exc:`ValueError` if there are more than
*max_num_fields* fields read.

The optional argument *separator* is the symbol to use for separating the
query arguments. It defaults to ``&``.

Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
parameter set to ``True``) to convert such dictionaries into query
strings.
Expand All @@ -201,8 +204,14 @@ or on combining URL components into a URL string.
.. versionchanged:: 3.8
Added *max_num_fields* parameter.

.. versionchanged:: 3.9.2
Added *separator* parameter with the default value of ``&``. Python
versions earlier than Python 3.9.2 allowed using both ``;`` and ``&`` as
query parameter separator. This has been changed to allow only a single
separator key, with ``&`` as the default separator.


.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)
.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

Parse a query string given as a string argument (data of type
:mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
Expand All @@ -226,6 +235,8 @@ or on combining URL components into a URL string.
read. If set, then throws a :exc:`ValueError` if there are more than
*max_num_fields* fields read.

The optional argument *separator* is the symbol to use for separating the query arguments. It defaults to ``&``.

Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
query strings.

Expand All @@ -235,6 +246,12 @@ or on combining URL components into a URL string.
.. versionchanged:: 3.8
Added *max_num_fields* parameter.

.. versionchanged:: 3.9.2
Added *separator* parameter with the default value of ``&``. Python
versions earlier than Python 3.9.2 allowed using both ``;`` and ``&`` as
query parameter separator. This has been changed to allow only a single
separator key, with ``&`` as the default separator.


.. function:: urlunparse(parts)

Expand Down
13 changes: 13 additions & 0 deletions Doc/whatsnew/3.6.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2443,3 +2443,16 @@ because of the behavior of the socket option ``SO_REUSEADDR`` in UDP. For more
details, see the documentation for ``loop.create_datagram_endpoint()``.
(Contributed by Kyle Stanley, Antoine Pitrou, and Yury Selivanov in
:issue:`37228`.)

Notable changes in Python 3.6.13
================================

Earlier Python versions allowed using both ``;`` and ``&`` as
query parameter separators in :func:`urllib.parse.parse_qs` and
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
newer W3C recommendations, this has been changed to allow only a single
separator key, with ``&`` as the default. This change also affects
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
functions internally. For more details, please see their respective
documentation.
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)
13 changes: 13 additions & 0 deletions Doc/whatsnew/3.7.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2556,3 +2556,16 @@ because of the behavior of the socket option ``SO_REUSEADDR`` in UDP. For more
details, see the documentation for ``loop.create_datagram_endpoint()``.
(Contributed by Kyle Stanley, Antoine Pitrou, and Yury Selivanov in
:issue:`37228`.)

Notable changes in Python 3.7.10
================================

Earlier Python versions allowed using both ``;`` and ``&`` as
query parameter separators in :func:`urllib.parse.parse_qs` and
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
newer W3C recommendations, this has been changed to allow only a single
separator key, with ``&`` as the default. This change also affects
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
functions internally. For more details, please see their respective
documentation.
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)
13 changes: 13 additions & 0 deletions Doc/whatsnew/3.8.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2234,3 +2234,16 @@ because of the behavior of the socket option ``SO_REUSEADDR`` in UDP. For more
details, see the documentation for ``loop.create_datagram_endpoint()``.
(Contributed by Kyle Stanley, Antoine Pitrou, and Yury Selivanov in
:issue:`37228`.)

Notable changes in Python 3.8.8
===============================

Earlier Python versions allowed using both ``;`` and ``&`` as
query parameter separators in :func:`urllib.parse.parse_qs` and
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
newer W3C recommendations, this has been changed to allow only a single
separator key, with ``&`` as the default. This change also affects
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
functions internally. For more details, please see their respective
documentation.
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)
13 changes: 13 additions & 0 deletions Doc/whatsnew/3.9.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1516,3 +1516,16 @@ invalid forms of parameterizing :class:`collections.abc.Callable` which may have
passed silently in Python 3.9.1. This :exc:`DeprecationWarning` will
become a :exc:`TypeError` in Python 3.10.
(Contributed by Ken Jin in :issue:`42195`.)

urllib.parse
------------

Earlier Python versions allowed using both ``;`` and ``&`` as
query parameter separators in :func:`urllib.parse.parse_qs` and
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
newer W3C recommendations, this has been changed to allow only a single
separator key, with ``&`` as the default. This change also affects
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
functions internally. For more details, please see their respective
documentation.
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)
23 changes: 14 additions & 9 deletions Lib/cgi.py
Original file line number Diff line number Diff line change
Expand Up @@ -115,7 +115,8 @@ def closelog():
# 0 ==> unlimited input
maxlen = 0

def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
def parse(fp=None, environ=os.environ, keep_blank_values=0,
strict_parsing=0, separator='&'):
"""Parse a query in the environment or from a file (default stdin)
Arguments, all optional:
Expand All @@ -134,6 +135,9 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
strict_parsing: flag indicating what to do with parsing errors.
If false (the default), errors are silently ignored.
If true, errors raise a ValueError exception.
separator: str. The symbol to use for separating the query arguments.
Defaults to &.
"""
if fp is None:
fp = sys.stdin
Expand All @@ -154,7 +158,7 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
if environ['REQUEST_METHOD'] == 'POST':
ctype, pdict = parse_header(environ['CONTENT_TYPE'])
if ctype == 'multipart/form-data':
return parse_multipart(fp, pdict)
return parse_multipart(fp, pdict, separator=separator)
elif ctype == 'application/x-www-form-urlencoded':
clength = int(environ['CONTENT_LENGTH'])
if maxlen and clength > maxlen:
Expand All @@ -178,10 +182,10 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
qs = ""
environ['QUERY_STRING'] = qs # XXX Shouldn't, really
return urllib.parse.parse_qs(qs, keep_blank_values, strict_parsing,
encoding=encoding)
encoding=encoding, separator=separator)


def parse_multipart(fp, pdict, encoding="utf-8", errors="replace"):
def parse_multipart(fp, pdict, encoding="utf-8", errors="replace", separator='&'):
"""Parse multipart input.
Arguments:
Expand All @@ -205,7 +209,7 @@ def parse_multipart(fp, pdict, encoding="utf-8", errors="replace"):
except KeyError:
pass
fs = FieldStorage(fp, headers=headers, encoding=encoding, errors=errors,
environ={'REQUEST_METHOD': 'POST'})
environ={'REQUEST_METHOD': 'POST'}, separator=separator)
return {k: fs.getlist(k) for k in fs}

def _parseparam(s):
Expand Down Expand Up @@ -315,7 +319,7 @@ class FieldStorage:
def __init__(self, fp=None, headers=None, outerboundary=b'',
environ=os.environ, keep_blank_values=0, strict_parsing=0,
limit=None, encoding='utf-8', errors='replace',
max_num_fields=None):
max_num_fields=None, separator='&'):
"""Constructor. Read multipart/* until last part.
Arguments, all optional:
Expand Down Expand Up @@ -363,6 +367,7 @@ def __init__(self, fp=None, headers=None, outerboundary=b'',
self.keep_blank_values = keep_blank_values
self.strict_parsing = strict_parsing
self.max_num_fields = max_num_fields
self.separator = separator
if 'REQUEST_METHOD' in environ:
method = environ['REQUEST_METHOD'].upper()
self.qs_on_post = None
Expand Down Expand Up @@ -589,7 +594,7 @@ def read_urlencoded(self):
query = urllib.parse.parse_qsl(
qs, self.keep_blank_values, self.strict_parsing,
encoding=self.encoding, errors=self.errors,
max_num_fields=self.max_num_fields)
max_num_fields=self.max_num_fields, separator=self.separator)
self.list = [MiniFieldStorage(key, value) for key, value in query]
self.skip_lines()

Expand All @@ -605,7 +610,7 @@ def read_multi(self, environ, keep_blank_values, strict_parsing):
query = urllib.parse.parse_qsl(
self.qs_on_post, self.keep_blank_values, self.strict_parsing,
encoding=self.encoding, errors=self.errors,
max_num_fields=self.max_num_fields)
max_num_fields=self.max_num_fields, separator=self.separator)
self.list.extend(MiniFieldStorage(key, value) for key, value in query)

klass = self.FieldStorageClass or self.__class__
Expand Down Expand Up @@ -649,7 +654,7 @@ def read_multi(self, environ, keep_blank_values, strict_parsing):
else self.limit - self.bytes_read
part = klass(self.fp, headers, ib, environ, keep_blank_values,
strict_parsing, limit,
self.encoding, self.errors, max_num_fields)
self.encoding, self.errors, max_num_fields, self.separator)

if max_num_fields is not None:
max_num_fields -= 1
Expand Down
29 changes: 24 additions & 5 deletions Lib/test/test_cgi.py
Original file line number Diff line number Diff line change
Expand Up @@ -53,12 +53,9 @@ def do_test(buf, method):
("", ValueError("bad query field: ''")),
("&", ValueError("bad query field: ''")),
("&&", ValueError("bad query field: ''")),
(";", ValueError("bad query field: ''")),
(";&;", ValueError("bad query field: ''")),
# Should the next few really be valid?
("=", {}),
("=&=", {}),
("=;=", {}),
# This rest seem to make sense
("=a", {'': ['a']}),
("&=a", ValueError("bad query field: ''")),
Expand All @@ -73,8 +70,6 @@ def do_test(buf, method):
("a=a+b&b=b+c", {'a': ['a b'], 'b': ['b c']}),
("a=a+b&a=b+a", {'a': ['a b', 'b a']}),
("x=1&y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
("x=1;y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
("Hbc5161168c542333633315dee1182227:key_store_seqid=400006&cuyer=r&view=bustomer&order_id=0bb2e248638833d48cb7fed300000f1b&expire=964546263&lobale=en-US&kid=130003.300038&ss=env",
{'Hbc5161168c542333633315dee1182227:key_store_seqid': ['400006'],
'cuyer': ['r'],
Expand Down Expand Up @@ -201,6 +196,30 @@ def test_strict(self):
else:
self.assertEqual(fs.getvalue(key), expect_val[0])

def test_separator(self):
parse_semicolon = [
("x=1;y=2.0", {'x': ['1'], 'y': ['2.0']}),
("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
(";", ValueError("bad query field: ''")),
(";;", ValueError("bad query field: ''")),
("=;a", ValueError("bad query field: 'a'")),
(";b=a", ValueError("bad query field: ''")),
("b;=a", ValueError("bad query field: 'b'")),
("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
("a=a+b;a=b+a", {'a': ['a b', 'b a']}),
]
for orig, expect in parse_semicolon:
env = {'QUERY_STRING': orig}
fs = cgi.FieldStorage(separator=';', environ=env)
if isinstance(expect, dict):
for key in expect.keys():
expect_val = expect[key]
self.assertIn(key, fs)
if len(expect_val) > 1:
self.assertEqual(fs.getvalue(key), expect_val)
else:
self.assertEqual(fs.getvalue(key), expect_val[0])

def test_log(self):
cgi.log("Testing")

Expand Down
68 changes: 46 additions & 22 deletions Lib/test/test_urlparse.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,16 +32,10 @@
(b"&a=b", [(b'a', b'b')]),
(b"a=a+b&b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
(b"a=1&a=2", [(b'a', b'1'), (b'a', b'2')]),
(";", []),
(";;", []),
(";a=b", [('a', 'b')]),
("a=a+b;b=b+c", [('a', 'a b'), ('b', 'b c')]),
("a=1;a=2", [('a', '1'), ('a', '2')]),
(b";", []),
(b";;", []),
(b";a=b", [(b'a', b'b')]),
(b"a=a+b;b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
(b"a=1;a=2", [(b'a', b'1'), (b'a', b'2')]),
(";a=b", [(';a', 'b')]),
("a=a+b;b=b+c", [('a', 'a b;b=b c')]),
(b";a=b", [(b';a', b'b')]),
(b"a=a+b;b=b+c", [(b'a', b'a b;b=b c')]),
]

# Each parse_qs testcase is a two-tuple that contains
Expand All @@ -68,16 +62,10 @@
(b"&a=b", {b'a': [b'b']}),
(b"a=a+b&b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
(b"a=1&a=2", {b'a': [b'1', b'2']}),
(";", {}),
(";;", {}),
(";a=b", {'a': ['b']}),
("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
("a=1;a=2", {'a': ['1', '2']}),
(b";", {}),
(b";;", {}),
(b";a=b", {b'a': [b'b']}),
(b"a=a+b;b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
(b"a=1;a=2", {b'a': [b'1', b'2']}),
(";a=b", {';a': ['b']}),
("a=a+b;b=b+c", {'a': ['a b;b=b c']}),
(b";a=b", {b';a': [b'b']}),
(b"a=a+b;b=b+c", {b'a':[ b'a b;b=b c']}),
]

class UrlParseTestCase(unittest.TestCase):
Expand Down Expand Up @@ -886,10 +874,46 @@ def test_parse_qsl_encoding(self):
def test_parse_qsl_max_num_fields(self):
with self.assertRaises(ValueError):
urllib.parse.parse_qs('&'.join(['a=a']*11), max_num_fields=10)
with self.assertRaises(ValueError):
urllib.parse.parse_qs(';'.join(['a=a']*11), max_num_fields=10)
urllib.parse.parse_qs('&'.join(['a=a']*10), max_num_fields=10)

def test_parse_qs_separator(self):
parse_qs_semicolon_cases = [
(";", {}),
(";;", {}),
(";a=b", {'a': ['b']}),
("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
("a=1;a=2", {'a': ['1', '2']}),
(b";", {}),
(b";;", {}),
(b";a=b", {b'a': [b'b']}),
(b"a=a+b;b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
(b"a=1;a=2", {b'a': [b'1', b'2']}),
]
for orig, expect in parse_qs_semicolon_cases:
with self.subTest(f"Original: {orig!r}, Expected: {expect!r}"):
result = urllib.parse.parse_qs(orig, separator=';')
self.assertEqual(result, expect, "Error parsing %r" % orig)


def test_parse_qsl_separator(self):
parse_qsl_semicolon_cases = [
(";", []),
(";;", []),
(";a=b", [('a', 'b')]),
("a=a+b;b=b+c", [('a', 'a b'), ('b', 'b c')]),
("a=1;a=2", [('a', '1'), ('a', '2')]),
(b";", []),
(b";;", []),
(b";a=b", [(b'a', b'b')]),
(b"a=a+b;b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
(b"a=1;a=2", [(b'a', b'1'), (b'a', b'2')]),
]
for orig, expect in parse_qsl_semicolon_cases:
with self.subTest(f"Original: {orig!r}, Expected: {expect!r}"):
result = urllib.parse.parse_qsl(orig, separator=';')
self.assertEqual(result, expect, "Error parsing %r" % orig)


def test_urlencode_sequences(self):
# Other tests incidentally urlencode things; test non-covered cases:
# Sequence and object values.
Expand Down
Loading

0 comments on commit c9f0781

Please sign in to comment.