Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-42967: only use '&' as a query string separator #24297

Merged
merged 13 commits into from Feb 14, 2021
9 changes: 6 additions & 3 deletions Doc/library/cgi.rst
Expand Up @@ -277,14 +277,14 @@ These are useful if you want more control, or if you want to employ some of the
algorithms implemented in this module in other circumstances.


.. function:: parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False)
.. function:: parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False, separator="&")

Parse a query in the environment or from a file (the file defaults to
``sys.stdin``). The *keep_blank_values* and *strict_parsing* parameters are
``sys.stdin``). The *keep_blank_values*, *strict_parsing* and *separator* parameters are
passed to :func:`urllib.parse.parse_qs` unchanged.


.. function:: parse_multipart(fp, pdict, encoding="utf-8", errors="replace")
.. function:: parse_multipart(fp, pdict, encoding="utf-8", errors="replace", separator="&")

Parse input of type :mimetype:`multipart/form-data` (for file uploads).
Arguments are *fp* for the input file, *pdict* for a dictionary containing
Expand All @@ -303,6 +303,9 @@ algorithms implemented in this module in other circumstances.
Added the *encoding* and *errors* parameters. For non-file fields, the
value is now a list of strings, not bytes.

.. versionchanged:: 3.10
Added the *separator* parameter.


.. function:: parse_header(string)

Expand Down
16 changes: 14 additions & 2 deletions Doc/library/urllib.parse.rst
Expand Up @@ -165,7 +165,7 @@ or on combining URL components into a URL string.
now raise :exc:`ValueError`.


.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)
.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

Parse a query string given as a string argument (data of type
:mimetype:`application/x-www-form-urlencoded`). Data are returned as a
Expand All @@ -190,6 +190,8 @@ or on combining URL components into a URL string.
read. If set, then throws a :exc:`ValueError` if there are more than
*max_num_fields* fields read.

The optional argument *separator* is the symbol to use for separating the query arguments. It defaults to `&`.

Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
parameter set to ``True``) to convert such dictionaries into query
strings.
Expand All @@ -201,8 +203,12 @@ or on combining URL components into a URL string.
.. versionchanged:: 3.8
Added *max_num_fields* parameter.

.. versionchanged:: 3.10
Add *separator* parameter with the default value of `&`. Python versions earlier than Python 3.10 allowed using both ";" and "&" as
AdamGold marked this conversation as resolved.
Show resolved Hide resolved
query parameter separator. This has been changed to allow only a single separator key, with "&" as the default separator.


.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)
.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

Parse a query string given as a string argument (data of type
:mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
Expand All @@ -226,6 +232,8 @@ or on combining URL components into a URL string.
read. If set, then throws a :exc:`ValueError` if there are more than
*max_num_fields* fields read.

The optional argument *separator* is the symbol to use for separating the query arguments. It defaults to `&`.

Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
query strings.

Expand All @@ -235,6 +243,10 @@ or on combining URL components into a URL string.
.. versionchanged:: 3.8
Added *max_num_fields* parameter.

.. versionchanged:: 3.10
Add *separator* parameter with the default value of `&`. Python versions earlier than Python 3.10 allowed using both ";" and "&" as
query parameter separator. This has been changed to allow only a single separator key, with "&" as the default separator.


.. function:: urlunparse(parts)

Expand Down
13 changes: 13 additions & 0 deletions Doc/whatsnew/3.10.rst
Expand Up @@ -459,6 +459,19 @@ Add new method :meth:`~unittest.TestCase.assertNoLogs` to complement the
existing :meth:`~unittest.TestCase.assertLogs`. (Contributed by Kit Yan Choi
in :issue:`39385`.)

urllib.parse
------------

Python versions earlier than Python 3.10 allowed using both ";" and "&" as
AdamGold marked this conversation as resolved.
Show resolved Hide resolved
query parameter separators in :func:`urllib.parse.parse_qs` and
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
newer W3C recommendations, this has been changed to allow only a single
separator key, with "&" as the default. This change also affects
AdamGold marked this conversation as resolved.
Show resolved Hide resolved
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
functions internally. For more details, please see their respective
documentation.
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)

xml
---

Expand Down
13 changes: 13 additions & 0 deletions Doc/whatsnew/3.6.rst
Expand Up @@ -2443,3 +2443,16 @@ because of the behavior of the socket option ``SO_REUSEADDR`` in UDP. For more
details, see the documentation for ``loop.create_datagram_endpoint()``.
(Contributed by Kyle Stanley, Antoine Pitrou, and Yury Selivanov in
:issue:`37228`.)

Notable changes in Python 3.6.13
================================

Earlier Python versions allowed using both ";" and "&" as
query parameter separators in :func:`urllib.parse.parse_qs` and
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
newer W3C recommendations, this has been changed to allow only a single
separator key, with "&" as the default. This change also affects
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
functions internally. For more details, please see their respective
documentation.
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)
13 changes: 13 additions & 0 deletions Doc/whatsnew/3.7.rst
Expand Up @@ -2557,3 +2557,16 @@ because of the behavior of the socket option ``SO_REUSEADDR`` in UDP. For more
details, see the documentation for ``loop.create_datagram_endpoint()``.
(Contributed by Kyle Stanley, Antoine Pitrou, and Yury Selivanov in
:issue:`37228`.)

Notable changes in Python 3.7.10
================================

Earlier Python versions allowed using both ";" and "&" as
AdamGold marked this conversation as resolved.
Show resolved Hide resolved
query parameter separators in :func:`urllib.parse.parse_qs` and
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
newer W3C recommendations, this has been changed to allow only a single
separator key, with "&" as the default. This change also affects
AdamGold marked this conversation as resolved.
Show resolved Hide resolved
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
functions internally. For more details, please see their respective
documentation.
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)
13 changes: 13 additions & 0 deletions Doc/whatsnew/3.8.rst
Expand Up @@ -2234,3 +2234,16 @@ because of the behavior of the socket option ``SO_REUSEADDR`` in UDP. For more
details, see the documentation for ``loop.create_datagram_endpoint()``.
(Contributed by Kyle Stanley, Antoine Pitrou, and Yury Selivanov in
:issue:`37228`.)

Notable changes in Python 3.8.8
===============================

Earlier Python versions allowed using both ";" and "&" as
query parameter separators in :func:`urllib.parse.parse_qs` and
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
newer W3C recommendations, this has been changed to allow only a single
separator key, with "&" as the default. This change also affects
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
functions internally. For more details, please see their respective
documentation.
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)
15 changes: 14 additions & 1 deletion Doc/whatsnew/3.9.rst
Expand Up @@ -1515,4 +1515,17 @@ need to account for this change. A :exc:`DeprecationWarning` may be emitted for
invalid forms of parameterizing :class:`collections.abc.Callable` which may have
passed silently in Python 3.9.1. This :exc:`DeprecationWarning` will
become a :exc:`TypeError` in Python 3.10.
(Contributed by Ken Jin in :issue:`42195`.)
(Contributed by Ken Jin in :issue:`42195`.)

urllib.parse
------------

Earlier Python versions allowed using both ";" and "&" as
query parameter separators in :func:`urllib.parse.parse_qs` and
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
newer W3C recommendations, this has been changed to allow only a single
separator key, with "&" as the default. This change also affects
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
functions internally. For more details, please see their respective
documentation.
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)
23 changes: 14 additions & 9 deletions Lib/cgi.py
Expand Up @@ -115,7 +115,8 @@ def closelog():
# 0 ==> unlimited input
maxlen = 0

def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
def parse(fp=None, environ=os.environ, keep_blank_values=0,
AdamGold marked this conversation as resolved.
Show resolved Hide resolved
strict_parsing=0, separator='&'):
"""Parse a query in the environment or from a file (default stdin)

Arguments, all optional:
Expand All @@ -134,6 +135,9 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
strict_parsing: flag indicating what to do with parsing errors.
If false (the default), errors are silently ignored.
If true, errors raise a ValueError exception.

separator: str. The symbol to use for separating the query arguments.
Defaults to &.
"""
if fp is None:
fp = sys.stdin
Expand All @@ -154,7 +158,7 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
if environ['REQUEST_METHOD'] == 'POST':
ctype, pdict = parse_header(environ['CONTENT_TYPE'])
if ctype == 'multipart/form-data':
return parse_multipart(fp, pdict)
return parse_multipart(fp, pdict, separator=separator)
elif ctype == 'application/x-www-form-urlencoded':
clength = int(environ['CONTENT_LENGTH'])
if maxlen and clength > maxlen:
Expand All @@ -178,10 +182,10 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
qs = ""
environ['QUERY_STRING'] = qs # XXX Shouldn't, really
return urllib.parse.parse_qs(qs, keep_blank_values, strict_parsing,
encoding=encoding)
encoding=encoding, separator=separator)


def parse_multipart(fp, pdict, encoding="utf-8", errors="replace"):
def parse_multipart(fp, pdict, encoding="utf-8", errors="replace", separator='&'):
AdamGold marked this conversation as resolved.
Show resolved Hide resolved
"""Parse multipart input.

Arguments:
Expand All @@ -205,7 +209,7 @@ def parse_multipart(fp, pdict, encoding="utf-8", errors="replace"):
except KeyError:
pass
fs = FieldStorage(fp, headers=headers, encoding=encoding, errors=errors,
environ={'REQUEST_METHOD': 'POST'})
environ={'REQUEST_METHOD': 'POST'}, separator=separator)
return {k: fs.getlist(k) for k in fs}

def _parseparam(s):
Expand Down Expand Up @@ -315,7 +319,7 @@ class FieldStorage:
def __init__(self, fp=None, headers=None, outerboundary=b'',
environ=os.environ, keep_blank_values=0, strict_parsing=0,
limit=None, encoding='utf-8', errors='replace',
max_num_fields=None):
max_num_fields=None, separator='&'):
AdamGold marked this conversation as resolved.
Show resolved Hide resolved
"""Constructor. Read multipart/* until last part.

Arguments, all optional:
Expand Down Expand Up @@ -363,6 +367,7 @@ def __init__(self, fp=None, headers=None, outerboundary=b'',
self.keep_blank_values = keep_blank_values
self.strict_parsing = strict_parsing
self.max_num_fields = max_num_fields
self.separator = separator
if 'REQUEST_METHOD' in environ:
method = environ['REQUEST_METHOD'].upper()
self.qs_on_post = None
Expand Down Expand Up @@ -589,7 +594,7 @@ def read_urlencoded(self):
query = urllib.parse.parse_qsl(
qs, self.keep_blank_values, self.strict_parsing,
encoding=self.encoding, errors=self.errors,
max_num_fields=self.max_num_fields)
max_num_fields=self.max_num_fields, separator=self.separator)
self.list = [MiniFieldStorage(key, value) for key, value in query]
self.skip_lines()

Expand All @@ -605,7 +610,7 @@ def read_multi(self, environ, keep_blank_values, strict_parsing):
query = urllib.parse.parse_qsl(
self.qs_on_post, self.keep_blank_values, self.strict_parsing,
encoding=self.encoding, errors=self.errors,
max_num_fields=self.max_num_fields)
max_num_fields=self.max_num_fields, separator=self.separator)
self.list.extend(MiniFieldStorage(key, value) for key, value in query)

klass = self.FieldStorageClass or self.__class__
Expand Down Expand Up @@ -649,7 +654,7 @@ def read_multi(self, environ, keep_blank_values, strict_parsing):
else self.limit - self.bytes_read
part = klass(self.fp, headers, ib, environ, keep_blank_values,
strict_parsing, limit,
self.encoding, self.errors, max_num_fields)
self.encoding, self.errors, max_num_fields, self.separator)

if max_num_fields is not None:
max_num_fields -= 1
Expand Down
29 changes: 24 additions & 5 deletions Lib/test/test_cgi.py
Expand Up @@ -53,12 +53,9 @@ def do_test(buf, method):
("", ValueError("bad query field: ''")),
("&", ValueError("bad query field: ''")),
("&&", ValueError("bad query field: ''")),
(";", ValueError("bad query field: ''")),
(";&;", ValueError("bad query field: ''")),
# Should the next few really be valid?
("=", {}),
("=&=", {}),
("=;=", {}),
# This rest seem to make sense
("=a", {'': ['a']}),
("&=a", ValueError("bad query field: ''")),
Expand All @@ -73,8 +70,6 @@ def do_test(buf, method):
("a=a+b&b=b+c", {'a': ['a b'], 'b': ['b c']}),
("a=a+b&a=b+a", {'a': ['a b', 'b a']}),
("x=1&y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
("x=1;y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
("Hbc5161168c542333633315dee1182227:key_store_seqid=400006&cuyer=r&view=bustomer&order_id=0bb2e248638833d48cb7fed300000f1b&expire=964546263&lobale=en-US&kid=130003.300038&ss=env",
{'Hbc5161168c542333633315dee1182227:key_store_seqid': ['400006'],
'cuyer': ['r'],
Expand Down Expand Up @@ -201,6 +196,30 @@ def test_strict(self):
else:
self.assertEqual(fs.getvalue(key), expect_val[0])

def test_separator(self):
parse_semicolon = [
("x=1;y=2.0", {'x': ['1'], 'y': ['2.0']}),
AdamGold marked this conversation as resolved.
Show resolved Hide resolved
("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
(";", ValueError("bad query field: ''")),
(";;", ValueError("bad query field: ''")),
("=;a", ValueError("bad query field: 'a'")),
(";b=a", ValueError("bad query field: ''")),
("b;=a", ValueError("bad query field: 'b'")),
("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
("a=a+b;a=b+a", {'a': ['a b', 'b a']}),
]
for orig, expect in parse_semicolon:
env = {'QUERY_STRING': orig}
fs = cgi.FieldStorage(separator=';', environ=env)
if isinstance(expect, dict):
for key in expect.keys():
expect_val = expect[key]
self.assertIn(key, fs)
if len(expect_val) > 1:
self.assertEqual(fs.getvalue(key), expect_val)
else:
self.assertEqual(fs.getvalue(key), expect_val[0])

def test_log(self):
cgi.log("Testing")

Expand Down