Integer overflow in wrap_lenfunc() on 64-bit build of Windows with len > 2**31-1 #70610
__len__() always returns an int, which on Windows machines is tied to the size of a C long and is always 32 bits even in a 64-bit build. len(), however, returns an int for values up to sys.maxint and a long above that. As a result, __len__() returns negative lengths for objects larger than sys.maxint. Below is a quick test showing how to reproduce it.

Here is an explanation from /u/Rhomboid on Reddit of why we believe the issue happens:

"You'll only see that on Windows. The issue is that, confusingly, the range of the Python int type is tied to the range of the C long type. On Windows, long is always 32 bits even on x64 systems, whereas on Unix systems it is the native machine word size. You can confirm this by checking sys.maxint, which will be 2**31 - 1 even with a 64-bit interpreter on Windows.

The difference in behavior between foo.__len__ and len(foo) is that the former goes through an attribute lookup, which goes through the slot lookup machinery, finally ending in Python/typeobject.c:wrap_lenfunc(). The error is casting Py_ssize_t to long, which truncates on Windows x64 since Py_ssize_t is a proper signed 64-bit integer. It then compounds the injury by creating a Python int object with PyInt_FromLong(), so this is hopelessly broken. In the case of len(foo), you end up in Python/bltinmodule.c:builtin_len(), which skips all the attribute lookup and uses the object protocol directly, calling PyObject_Size() and creating a Python object of the correct type via PyInt_FromSsize_t(), which figures out whether a Python int or long is necessary.

This is definitely a bug that should be reported. In 3.x the int/long distinction is gone and all integers are Python longs, but the bogus cast to a C long still exists in wrap_lenfunc():

That means the bug still exists even though the reason for its existence is gone! Oops. That needs to be updated to get rid of the cast and to call PyLong_FromSsize_t()."

Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Jul 2 2014, 15:12:11) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> a = 'a'*2500000000
>>> a.__len__()
-1794967296
>>> len(a)
2500000000L
>>> a = [1]*2500000000
>>> len(a)
2500000000L
>>> a.__len__()
-1794967296 |
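The truncation described above can be simulated in pure Python. This is only a sketch of the arithmetic; the real cast happens in C inside wrap_lenfunc(), and the helper name below is invented for illustration:

```python
def truncate_to_c_long(n, bits=32):
    # Simulate casting a 64-bit Py_ssize_t to a signed C long of the
    # given width, as the buggy cast in wrap_lenfunc() does on 64-bit
    # Windows, where a C long is only 32 bits wide.
    mask = (1 << bits) - 1
    n &= mask
    if n >= 1 << (bits - 1):
        n -= 1 << bits
    return n

print(truncate_to_c_long(2500000000))  # -1794967296, matching __len__() above
```

The same arithmetic reproduces every negative length quoted in this thread: 2500000000 wraps to -1794967296, and 2**31 + 5 wraps to -2147483643.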
If I understand the report correctly, this affects Python 3, too. Adding 3.5 and 3.6 to the versions. |
Hum, I don't understand this statement. It looks like the code uses Py_ssize_t everywhere, and Py_ssize_t is supposed to be able to store a whole pointer, so it should be 64-bit when Python is compiled in 64-bit mode.
This is a 32-bit build ("win32"), no? sys.maxsize is 2147483647 in 32-bit mode if I recall correctly. On 64-bit, it's 9223372036854775807. By the way, on 64-bit, sys.maxsize == sys.maxint.

In Python 2: len(obj) => builtin_len() => PyObject_Size(), which returns a Py_ssize_t. For a string, PyObject_Size() => string_length() => Py_SIZE(obj) => ((PyVarObject *)obj)->ob_size, and PyVarObject.ob_size has the type Py_ssize_t. builtin_len() gets a Py_ssize_t, which is converted to a Python int or long with PyInt_FromSsize_t(). PyInt_FromSsize_t() creates an int if the value fits into a C long, or calls _PyLong_FromSsize_t() otherwise.

Difference in Python 3: builtin_len() also gets a Py_ssize_t, but it calls PyLong_FromSsize_t() (since Python short integers are gone, long became int in Python 3). string_length() is replaced with unicode_length() => PyUnicode_GET_LENGTH() => ((PyASCIIObject *)obj)->length, and PyASCIIObject.length has the type Py_ssize_t. |
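The int-or-long decision made by PyInt_FromSsize_t() can be modeled in a few lines of pure Python. This is a rough sketch, not the real C logic, and the 32-bit bounds below assume a Windows-style C long:

```python
# Assumed bounds of a 32-bit C long, as on Windows even in x64 builds.
LONG_MAX = 2**31 - 1
LONG_MIN = -(2**31)

def int_from_ssize_t(n):
    # Mimic Python 2's PyInt_FromSsize_t(): values that fit in a C
    # long become an "int"; anything larger is promoted to a "long".
    kind = "int" if LONG_MIN <= n <= LONG_MAX else "long"
    return (kind, n)

print(int_from_ssize_t(2147483647))  # ('int', 2147483647)
print(int_from_ssize_t(2500000000))  # ('long', 2500000000)
```

This is why len() in the transcript above returns 2500000000L while the buggy slot wrapper, which never makes this check, returns a truncated int.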
No. "win32" appears for both 32-bit and 64-bit builds on Windows. This is a 64-bit build. I can reproduce the issue with a stock Python, on both 2.7 and 3.5. |
I tested Python 2.7 on Windows, downloaded from python.org:
Limits :
It looks like the issue is specific to Python compiled by Continuum ("Anaconda" flavor)? |
You need to call |
Ah ok :-) I didn't read the initial message carefully. The attached patch should fix the issue on Python 2.7. It uses the same function as builtin_len() to cast the C Py_ssize_t to a Python int or long. |
Could you write a test? Maybe on 3.x you can use range(sys.maxsize).__len__(). |
Done. |
Using a bigmem test for this issue looks excessive. Could we use the xrange() object for testing? Or weakref? |
The xrange() constructor is implemented in C and appears to use C long for its parameters. This issue is specific to 64-bit Windows with a 32-bit C long, so xrange() doesn't seem to work here.
What do you mean? Which object? weakref.ref() has no length.
Well, I was too lazy to develop a C extension (add something to _testcapi) just for this tiny bug. I also like tests really testing the final feature (the bug reported in this issue). I failed to find an existing Python object with __len__() implemented in C which doesn't need to allocate 2^31 bytes to test this bug. |
mmap.mmap(-1, 2**31 + 5) could be used here. If the pages are never touched, it won't increase the working set size. It merely maps the address range with demand-zero pages.

Unpatched:

>>> mmap.mmap(-1, 2**31 + 5).__len__()
-2147483643

Patched:

>>> mmap.mmap(-1, 2**31 + 5).__len__()
2147483653L |
Yeah, I hesitated to use mmap, but I'm not really a big fan of using a mmap object directly in test_descr. Unit tests are supposed to test only one thing.
It also depends on the kernel configuration: on Solaris, overcommit is disabled by default. I prefer to use a regular Python str. Or maybe use a custom Python object for which we can control the output of __len__()? |
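A pure-Python stand-in along those lines might look like the sketch below (FakeBig is a hypothetical name). Note the caveat, which later comments in this thread run into: a __len__ written in Python does not go through the C-level wrap_lenfunc() slot wrapper, so on its own it cannot trigger the bug:

```python
class FakeBig:
    # Hypothetical test helper: reports a length past 2**31 without
    # allocating any memory. Because this __len__ is implemented in
    # Python, calls to it bypass wrap_lenfunc() entirely.
    def __len__(self):
        return 2**31 + 5

print(FakeBig().__len__())  # 2147483653
```

That limitation is why the discussion keeps looking for an object whose __len__ is implemented in C.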
What about xrange(-1, 2**31-1)? In any case the fact that xrange works not
I see that the sq_length slot in the weakproxy type is set to proxy_length. |
Nice. Its tp_getattro gets in the way of using __len__ directly, but this can be sidestepped by manually binding the descriptor:

class Test(object):
    def __len__(self):
        return 2**31 + 5

>>> t = Test()
>>> p = weakref.proxy(t)
>>> p.__len__()
2147483653L
>>> type(p).__len__.__get__(p)()
-2147483643 |
With fixed bpo-26428 you could use xrange() for testing this issue. |
Could you create a PR for your patch Victor? |
Thanks Zackery Spytz for the fix. It's now applied to 2.7, 3.7 and master branches. |
Oh. The test fails on Python 2.7 on Windows: https://buildbot.python.org/all/#/builders/26/builds/287

ERROR: test_wrap_lenfunc_bad_cast (test.test_descr.OperatorsTest)
Traceback (most recent call last):
  File "C:\buildbot.python.org\2.7.kloth-win64vs9\build\lib\test\test_descr.py", line 407, in test_wrap_lenfunc_bad_cast
    self.assertEqual(xrange(sys.maxsize).__len__(), sys.maxsize)
OverflowError: Python int too large to convert to C long |
Can this be closed? |