Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

non-deterministic behavior of int subclass #58835

Closed
brechtm mannequin opened this issue Apr 20, 2012 · 15 comments
Closed

non-deterministic behavior of int subclass #58835

brechtm mannequin opened this issue Apr 20, 2012 · 15 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@brechtm
Copy link
Mannequin

brechtm mannequin commented Apr 20, 2012

BPO 14630
Nosy @mdickinson, @pitrou, @skrah
Files
  • pdf.py
  • issue14630.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/mdickinson'
    closed_at = <Date 2012-04-20.21:02:16.465>
    created_at = <Date 2012-04-20.07:51:22.379>
    labels = ['interpreter-core', 'type-bug']
    title = 'non-deterministic behavior of int subclass'
    updated_at = <Date 2012-04-20.21:02:16.464>
    user = 'https://bugs.python.org/brechtm'

    bugs.python.org fields:

    activity = <Date 2012-04-20.21:02:16.464>
    actor = 'mark.dickinson'
    assignee = 'mark.dickinson'
    closed = True
    closed_date = <Date 2012-04-20.21:02:16.465>
    closer = 'mark.dickinson'
    components = ['Interpreter Core']
    creation = <Date 2012-04-20.07:51:22.379>
    creator = 'brechtm'
    dependencies = []
    files = ['25286', '25291']
    hgrepos = []
    issue_num = 14630
    keywords = ['patch']
    message_count = 15.0
    messages = ['158803', '158812', '158814', '158815', '158816', '158817', '158819', '158822', '158823', '158854', '158861', '158863', '158877', '158886', '158888']
    nosy_count = 5.0
    nosy_names = ['mark.dickinson', 'pitrou', 'skrah', 'python-dev', 'brechtm']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = 'commit review'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue14630'
    versions = ['Python 3.2', 'Python 3.3']

    @brechtm
    Copy link
    Mannequin Author

    brechtm mannequin commented Apr 20, 2012

    I have subclassed int to add an extra attribute:

    class Integer(int):
        def __new__(cls, value, base=10, indirect=False):
            try:
                obj = int.__new__(cls, value, base)
            except TypeError:
                obj = int.__new__(cls, value)
            return obj
    
        def __init__(self, value, base=10, indirect=False):
            self.indirect = indirect

    Using this class in my application, int(Integer(b'0')) sometimes returns a value of 48 (= ord('0')!) or 192, instead of the correct value 0. str(Integer(b'0')) always returns '0'. This seems to only occur for the value 0. First decoding b'0' to a string, or passing int(b'0') to Integer makes no difference. The problem lies with converting an Integer(0) to an int with int().

    Furthermore, this occurs in a random way. Subsequent runs will produce 48 or 192 at different points in the application (a parser). Both Python 3.2.2 and 3.2.3 behave the same (32-bit, Windows XP). Apparently, the 64-bit Windows Python 3.2.3 does not show this behavior [2]. I haven't tested on other operating systems.

    I cannot seem to reproduce this in a simple test program. The following produces no output:

    for i in range(100000):
        integer = int(Integer(b'0'))
        if integer > 0:
            print(integer)

    Checking for the condition int(Integer()) > 0 in my application (when I know the argument to Integer is b'0') and conditionally printing int(Integer(b'0')) a number of times, the results 48 and 192 do show up now and then.

    As I can't reproduce the problem in a short test program, I have attached the relevant code. It is basically a PDF parser. The output for this [2] PDF file is, for example:

    b'0' 0 Integer(0) 192 0 b'0' 16853712
    b'0' 0 Integer(0) 48 0 b'0' 16938088
    b'0' 0 Integer(0) 192 0 b'0' 17421696
    b'0' 0 Integer(0) 48 0 b'0' 23144888
    b'0' 0 Integer(0) 48 0 b'0' 23185408
    b'0' 0 Integer(0) 48 0 b'0' 23323272

    Search for print function calls in the code to see what this represents.

    [1] http://stackoverflow.com/questions/10230604/non-deterministic-behavior-of-int-subclass#comment13156508_10230604
    [2] http://www.gust.org.pl/projects/e-foundry/math-support/vieth2008.pdf

    @brechtm brechtm mannequin added the type-bug An unexpected behavior, bug, or error label Apr 20, 2012
    @mdickinson
    Copy link
    Member

    I can reproduce this on a 32-bit OS X build of the default branch, so it doesn't seem to be Windows specific (though it may be 32-bit specific).

    Brecht, if you can find a way to reduce the size of your example at all that would be really helpful.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 20, 2012

    Reproduced under 32-bit Linux.
    The problem seems to be that Py_SIZE(x) == 0 when x is Integer(0), but ob_digit[0] is still supposed to be significant. There's probably some overwriting with the trailing attributes.
    By forcing Py_SIZE(x) == 1, the bug disappears, but it probably breaks lots of other stuff in longobject.c.

    @mdickinson
    Copy link
    Member

    If we're accessing ob_digit[0] when Py_SIZE(x) == 0, that sounds like a bug to me.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 20, 2012

    If we're accessing ob_digit[0] when Py_SIZE(x) == 0, that sounds like a
    bug to me.

    _PyLong_Copy does.
    It's ok as long as the object is int(0), because it's part of the small ints and its allocated size is one digit.

    The following hack seems to fix the issue here. Perhaps we can simply fix _PyLong_Copy, but I wonder how many other parts of longobject.c rely on accessing ob_digit[0].

    diff --git a/Objects/longobject.c b/Objects/longobject.c
    --- a/Objects/longobject.c
    +++ b/Objects/longobject.c
    @@ -4194,6 +4194,8 @@ long_subtype_new(PyTypeObject *type, PyO
         n = Py_SIZE(tmp);
         if (n < 0)
             n = -n;
    +    if (n == 0)
    +        n = 1;
         newobj = (PyLongObject *)type->tp_alloc(type, n);
         if (newobj == NULL) {
             Py_DECREF(tmp);
    diff --git a/Objects/object.c b/Objects/object.c
    --- a/Objects/object.c
    +++ b/Objects/object.c
    @@ -1010,6 +1010,8 @@ PyObject **
             tsize = ((PyVarObject *)obj)->ob_size;
             if (tsize < 0)
                 tsize = -tsize;
    +        if (tsize == 0 && PyLong_Check(obj))
    +            tsize = 1;
             size = _PyObject_VAR_SIZE(tp, tsize);
     
             dictoffset += (long)size;
    @@ -1090,6 +1092,8 @@ PyObject *
                     tsize = ((PyVarObject *)obj)->ob_size;
                     if (tsize < 0)
                         tsize = -tsize;
    +                if (tsize == 0 && PyLong_Check(obj))
    +                    tsize = 1;
                     size = _PyObject_VAR_SIZE(tp, tsize);
     
                     dictoffset += (long)size;

    @mdickinson
    Copy link
    Member

    _PyLong_Copy does.

    Grr. So it does. That at least should be fixed, but I agree that it would be good to have the added protection of ensuring that we always allocate space for at least one limb.

    We should also check whether 2.7 is susceptible.

    @mdickinson mdickinson self-assigned this Apr 20, 2012
    @mdickinson
    Copy link
    Member

    Self-contained example that fails for me on 32-bit OS X.

    class Integer(int):
        def __new__(cls, value, base=10, indirect=False):
            try:
                obj = int.__new__(cls, value, base)
            except TypeError:
                obj = int.__new__(cls, value)
            return obj
    
        def __init__(self, value, base=10, indirect=False):
            self.indirect = indirect
    
    
    integers = []
    for i in range(1000):
        integer = Integer(b'0')
        integers.append(integer)
    
    for integer in integers:
        assert int(integer) == 0

    @pitrou
    Copy link
    Member

    pitrou commented Apr 20, 2012

    The fix for _PyLong_Copy is the following:

    diff --git a/Objects/longobject.c b/Objects/longobject.c
    --- a/Objects/longobject.c
    +++ b/Objects/longobject.c
    @@ -156,7 +156,7 @@ PyObject *
         if (i < 0)
             i = -(i);
         if (i < 2) {
    -        sdigit ival = src->ob_digit[0];
    +        sdigit ival = (i == 0) ? 0 : src->ob_digit[0];
             if (Py_SIZE(src) < 0)
                 ival = -ival;
             CHECK_SMALL_INT(ival);

    @pitrou pitrou added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Apr 20, 2012
    @mdickinson
    Copy link
    Member

    Using MEDIUM_VALUE also works.

    I'll cook up a patch tonight, after work.

    diff -r 6762b943ee59 Objects/longobject.c
    --- a/Objects/longobject.c	Tue Apr 17 21:42:07 2012 -0400
    +++ b/Objects/longobject.c	Fri Apr 20 13:18:01 2012 +0100
    @@ -156,9 +156,7 @@
         if (i < 0)
             i = -(i);
         if (i < 2) {
    -        sdigit ival = src->ob_digit[0];
    -        if (Py_SIZE(src) < 0)
    -            ival = -ival;
    +        sdigit ival = MEDIUM_VALUE(src);
             CHECK_SMALL_INT(ival);
         }
         result = _PyLong_New(i);

    @mdickinson mdickinson self-assigned this Apr 20, 2012
    @mdickinson
    Copy link
    Member

    Here's the patch. I searched through the rest of Objects/longobject.c for other occurrences of [0], and found nothing else that looked suspicious, so I'm reasonably confident that this was an isolated case.

    @mdickinson
    Copy link
    Member

    Also, Python 2.7 looks safe here.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 20, 2012

    The patch works fine here, and the test exercises the issue correctly.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Apr 20, 2012

    The patch looks good to me.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 20, 2012

    New changeset cdcc6b489862 by Mark Dickinson in branch '3.2':
    Issue bpo-14630: Fix an incorrect access of ob_digit[0] for a zero instance of an int subclass.
    http://hg.python.org/cpython/rev/cdcc6b489862

    New changeset c7b0f711dc15 by Mark Dickinson in branch 'default':
    Issue bpo-14630: Merge fix from 3.2.
    http://hg.python.org/cpython/rev/c7b0f711dc15

    @mdickinson
    Copy link
    Member

    Fixed. Thanks Brecht for the report (and Antoine for diagnosing the problem).

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants