Permalink
Browse files

Issue #13703: add a way to randomize the hash values of basic types (…

…str, bytes, datetime)

in order to make algorithmic complexity attacks on (e.g.) web apps much more complicated.

The environment variable PYTHONHASHSEED and the new command line flag -R control this
behavior.
  • Loading branch information...
birkenfeld committed Feb 20, 2012
1 parent ec1712a commit 2daf6ae2495c862adf8bc717bfe9964081ea0b10
@@ -220,8 +220,12 @@ always available.
:const:`ignore_environment` :option:`-E`
:const:`verbose` :option:`-v`
:const:`bytes_warning` :option:`-b`
:const:`hash_randomization` :option:`-R`
============================= =============================
.. versionadded:: 3.1.5
The ``hash_randomization`` attribute.
.. data:: float_info
@@ -1265,6 +1265,8 @@ Basic customization
inheritance of :meth:`__hash__` will be blocked, just as if :attr:`__hash__`
had been explicitly set to :const:`None`.
See also the :option:`-R` command-line option.
.. method:: object.__bool__(self)
@@ -21,7 +21,7 @@ Command line
When invoking Python, you may specify any of these options::
python [-bBdEhiOsSuvVWx?] [-c command | -m module-name | script | - ] [args]
python [-bBdEhiORsSuvVWx?] [-c command | -m module-name | script | - ] [args]
The most common use case is, of course, a simple invocation of a script::
@@ -215,6 +215,29 @@ Miscellaneous options
Discard docstrings in addition to the :option:`-O` optimizations.
.. cmdoption:: -R
Turn on hash randomization, so that the :meth:`__hash__` values of str, bytes
and datetime objects are "salted" with an unpredictable random value.
Although they remain constant within an individual Python process, they are
not predictable between repeated invocations of Python.
This is intended to provide protection against a denial-of-service caused by
carefully-chosen inputs that exploit the worst case performance of a dict
insertion, O(n^2) complexity. See
http://www.ocert.org/advisories/ocert-2011-003.html for details.
Changing hash values affects the order in which keys are retrieved from a
dict. Although Python has never made guarantees about this ordering (and it
typically varies between 32-bit and 64-bit builds), enough real-world code
implicitly relies on this non-guaranteed behavior that the randomization is
disabled by default.
See also :envvar:`PYTHONHASHSEED`.
.. versionadded:: 3.1.5
.. cmdoption:: -s
Don't add user site directory to sys.path
@@ -314,6 +337,7 @@ Miscellaneous options
.. note:: The line numbers in error messages will be off by one.
Options you shouldn't use
~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -328,6 +352,7 @@ Options you shouldn't use
Reserved for alternative implementations of Python to use for their own
purposes.
.. _using-on-envvars:
Environment variables
@@ -435,6 +460,27 @@ These environment variables influence Python's behavior.
import of source modules.
.. envvar:: PYTHONHASHSEED
If this variable is set to ``random``, the effect is the same as specifying
the :option:`-R` option: a random value is used to seed the hashes of str,
bytes and datetime objects.
If :envvar:`PYTHONHASHSEED` is set to an integer value, it is used as a fixed
seed for generating the hash() of the types covered by the hash
randomization.
Its purpose is to allow repeatable hashing, such as for selftests for the
interpreter itself, or to allow a cluster of python processes to share hash
values.
The integer must be a decimal number in the range [0,4294967295]. Specifying
the value 0 will lead to the same hash values as when hash randomization is
disabled.
.. versionadded:: 3.1.5
.. envvar:: PYTHONIOENCODING
Overrides the encoding used for stdin/stdout/stderr, in the syntax
@@ -473,6 +473,12 @@ PyAPI_FUNC(void) Py_ReprLeave(PyObject *);
PyAPI_FUNC(long) _Py_HashDouble(double);
PyAPI_FUNC(long) _Py_HashPointer(void*);
typedef struct {
long prefix;
long suffix;
} _Py_HashSecret_t;
PyAPI_DATA(_Py_HashSecret_t) _Py_HashSecret;
/* Helper for passing objects to printf and the like */
#define PyObject_REPR(obj) _PyUnicode_AsString(PyObject_Repr(obj))
@@ -19,6 +19,7 @@ PyAPI_DATA(int) Py_DivisionWarningFlag;
PyAPI_DATA(int) Py_DontWriteBytecodeFlag;
PyAPI_DATA(int) Py_NoUserSiteDirectory;
PyAPI_DATA(int) Py_UnbufferedStdioFlag;
PyAPI_DATA(int) Py_HashRandomizationFlag;
/* this is a wrapper around getenv() that pays attention to
Py_IgnoreEnvironmentFlag. It should be used for getting variables like
@@ -174,6 +174,8 @@ typedef void (*PyOS_sighandler_t)(int);
PyAPI_FUNC(PyOS_sighandler_t) PyOS_getsig(int);
PyAPI_FUNC(PyOS_sighandler_t) PyOS_setsig(int, PyOS_sighandler_t);
/* Random */
PyAPI_FUNC(int) _PyOS_URandom (void *buffer, Py_ssize_t size);
#ifdef __cplusplus
}
@@ -31,7 +31,9 @@
Compact encoding::
>>> import json
>>> json.dumps([1,2,3,{'4': 5, '6': 7}], separators=(',', ':'))
>>> from collections import OrderedDict
>>> mydict = OrderedDict([('4', 5), ('6', 7)])
>>> json.dumps([1,2,3,mydict], separators=(',', ':'))
'[1,2,3,{"4":5,"6":7}]'
Pretty printing::
@@ -611,23 +611,6 @@ def _pickle_statvfs_result(sr):
except NameError: # statvfs_result may not exist
pass
if not _exists("urandom"):
def urandom(n):
"""urandom(n) -> str
Return a string of n random bytes suitable for cryptographic use.
"""
try:
_urandomfd = open("/dev/urandom", O_RDONLY)
except (OSError, IOError):
raise NotImplementedError("/dev/urandom (or equivalent) not found")
bs = b""
while len(bs) < n:
bs += read(_urandomfd, n - len(bs))
close(_urandomfd)
return bs
# Supply os.popen()
def popen(cmd, mode="r", buffering=-1):
if not isinstance(cmd, str):
@@ -14,7 +14,7 @@ class BasicTestMappingProtocol(unittest.TestCase):
def _reference(self):
"""Return a dictionary of values which are invariant by storage
in the object under test."""
return {1:2, "key1":"value1", "key2":(1,2,3)}
return {"1": "2", "key1":"value1", "key2":(1,2,3)}
def _empty_mapping(self):
"""Return an empty mapping object"""
return self.type2test()
@@ -428,6 +428,11 @@ def main(tests=None, testdir=None, verbose=0, quiet=False, generate=False,
except ValueError:
print("Couldn't find starting test (%s), using all tests" % start)
if randomize:
hashseed = os.getenv('PYTHONHASHSEED')
if not hashseed:
os.environ['PYTHONHASHSEED'] = str(random_seed)
os.execv(sys.executable, [sys.executable] + sys.argv)
return
random.seed(random_seed)
print("Using random seed", random_seed)
random.shuffle(tests)
@@ -3,7 +3,6 @@
import sys
import os
import re
import os.path
import tempfile
import subprocess
@@ -19,11 +18,15 @@ def _assert_python(expected_success, *args, **env_vars):
cmd_line = [sys.executable]
if not env_vars:
cmd_line.append('-E')
cmd_line.extend(args)
# Need to preserve the original environment, for in-place testing of
# shared library builds.
env = os.environ.copy()
# But a special flag that can be set to override -- in this case, the
# caller is responsible to pass the full environment.
if env_vars.pop('__cleanenv', None):
env = {}
env.update(env_vars)
cmd_line.extend(args)
p = subprocess.Popen(cmd_line, stdin=subprocess.PIPE,
stdout=subprocess.PIPE, stderr=subprocess.PIPE,
env=env)
@@ -4,7 +4,6 @@
import os
import test.support, unittest
import os
import sys
import subprocess
@@ -190,6 +189,22 @@ def test_large_PYTHONPATH(self):
self.assertTrue(path1.encode('ascii') in stdout)
self.assertTrue(path2.encode('ascii') in stdout)
def test_hash_randomization(self):
# Verify that -R enables hash randomization:
self.verify_valid_flag('-R')
hashes = []
for i in range(2):
code = 'print(hash("spam"))'
data, rc = self.start_python_and_exit_code('-R', '-c', code)
self.assertEqual(rc, 0)
hashes.append(data)
self.assertNotEqual(hashes[0], hashes[1])
# Verify that sys.flags contains hash_randomization
code = 'import sys; print("random is", sys.flags.hash_randomization)'
data, rc = self.start_python_and_exit_code('-R', '-c', code)
self.assertEqual(rc, 0)
self.assertIn(b'random is 1', data)
def test_main():
test.support.run_unittest(CmdLineTest)
@@ -4300,8 +4300,18 @@ class C(metaclass=M):
def test_repr(self):
# Testing dict_proxy.__repr__
def sorted_dict_repr(repr_):
# Given the repr of a dict, sort the keys
assert repr_.startswith('{')
assert repr_.endswith('}')
kvs = repr_[1:-1].split(', ')
return '{' + ', '.join(sorted(kvs)) + '}'
dict_ = {k: v for k, v in self.C.__dict__.items()}
self.assertEqual(repr(self.C.__dict__), 'dict_proxy({!r})'.format(dict_))
repr_ = repr(self.C.__dict__)
self.assert_(repr_.startswith('dict_proxy('))
self.assert_(repr_.endswith(')'))
self.assertEqual(sorted_dict_repr(repr_[len('dict_proxy('):-len(')')]),
sorted_dict_repr('{!r}'.format(dict_)))
class PTypesLongInitTest(unittest.TestCase):
@@ -3,10 +3,16 @@
#
# Also test that hash implementations are inherited as expected
import datetime
import os
import struct
import unittest
from test import support
from test.script_helper import assert_python_ok
from collections import Hashable
IS_64BIT = (struct.calcsize('l') == 8)
class HashEqualityTestCase(unittest.TestCase):
@@ -118,10 +124,92 @@ def test_hashes(self):
for obj in self.hashes_to_check:
self.assertEqual(hash(obj), _default_hash(obj))
class HashRandomizationTests(unittest.TestCase):
# Each subclass should define a field "repr_", containing the repr() of
# an object to be tested
def get_hash_command(self, repr_):
return 'print(hash(%s))' % repr_
def get_hash(self, repr_, seed=None):
env = os.environ.copy()
env['__cleanenv'] = True # signal to assert_python not to do a copy
# of os.environ on its own
if seed is not None:
env['PYTHONHASHSEED'] = str(seed)
else:
env.pop('PYTHONHASHSEED', None)
out = assert_python_ok(
'-c', self.get_hash_command(repr_),
**env)
stdout = out[1].strip()
return int(stdout)
def test_randomized_hash(self):
# two runs should return different hashes
run1 = self.get_hash(self.repr_, seed='random')
run2 = self.get_hash(self.repr_, seed='random')
self.assertNotEqual(run1, run2)
class StringlikeHashRandomizationTests(HashRandomizationTests):
def test_null_hash(self):
# PYTHONHASHSEED=0 disables the randomized hash
if IS_64BIT:
known_hash_of_obj = 1453079729188098211
else:
known_hash_of_obj = -1600925533
# Randomization is disabled by default:
self.assertEqual(self.get_hash(self.repr_), known_hash_of_obj)
# It can also be disabled by setting the seed to 0:
self.assertEqual(self.get_hash(self.repr_, seed=0), known_hash_of_obj)
def test_fixed_hash(self):
# test a fixed seed for the randomized hash
# Note that all types share the same values:
if IS_64BIT:
h = -4410911502303878509
else:
h = -206076799
self.assertEqual(self.get_hash(self.repr_, seed=42), h)
class StrHashRandomizationTests(StringlikeHashRandomizationTests):
repr_ = repr('abc')
def test_empty_string(self):
self.assertEqual(hash(""), 0)
class BytesHashRandomizationTests(StringlikeHashRandomizationTests):
repr_ = repr(b'abc')
def test_empty_string(self):
self.assertEqual(hash(b""), 0)
class DatetimeTests(HashRandomizationTests):
def get_hash_command(self, repr_):
return 'import datetime; print(hash(%s))' % repr_
class DatetimeDateTests(DatetimeTests):
repr_ = repr(datetime.date(1066, 10, 14))
class DatetimeDatetimeTests(DatetimeTests):
repr_ = repr(datetime.datetime(1, 2, 3, 4, 5, 6, 7))
class DatetimeTimeTests(DatetimeTests):
repr_ = repr(datetime.time(0))
def test_main():
support.run_unittest(HashEqualityTestCase,
HashInheritanceTestCase,
HashBuiltinsTestCase)
HashInheritanceTestCase,
HashBuiltinsTestCase,
StrHashRandomizationTests,
BytesHashRandomizationTests,
DatetimeDateTests,
DatetimeDatetimeTests,
DatetimeTimeTests)
if __name__ == "__main__":
Oops, something went wrong.

0 comments on commit 2daf6ae

Please sign in to comment.