Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OverflowErrors in TripleDictEraser #14254

Closed
jdemeyer opened this issue Mar 11, 2013 · 42 comments
Closed

OverflowErrors in TripleDictEraser #14254

jdemeyer opened this issue Mar 11, 2013 · 42 comments

Comments

@jdemeyer
Copy link

With the new doctesting framework (#12415), I sometimes see errors like

sage -t --long devel/sage/sage/combinat/sf/sfa.py
**********************************************************************
File "devel/sage/sage/combinat/sf/sfa.py", line 388, in sage.combinat.sf.sfa.is_SymmetricFunctionAlgebra
Failed example:
    is_SymmetricFunctionAlgebra(SymmetricFunctions(FractionField(QQ['q','t'])).macdonald().P())
Expected:
    True
Got:
    Exception OverflowError: 'long int too large to convert to int' in <sage.structure.coerce_dict.TripleDictEraser object at 0x9531b3c> ignored
    True
**********************************************************************

on 32-bit systems.

These appear randomly non-reproducibly in doctests.

Depends on #13387

CC: @simon-king-jena @nbruin

Component: coercion

Author: Jeroen Demeyer

Reviewer: Simon King

Merged: sage-5.8.rc0

Issue created by migration from https://trac.sagemath.org/ticket/14254

@jdemeyer jdemeyer added this to the sage-5.9 milestone Mar 11, 2013
@jdemeyer jdemeyer modified the milestones: sage-5.9, sage-5.8 Mar 11, 2013
@simon-king-jena
Copy link
Member

comment:2

There is a comment

# A note about how to store "id" keys in python structures:
#
# In python a "pointer length integer" (size_t) normally, is encoded
# as a *signed* integer, of type Py_ssize_t. This has an advantage in that
# if the value gets encoded as a *python integer* it can do so in a sign-preserving
# way and still make use of all the bits that python offers to store (small) integers.
#
# There is one place where we have to be careful about signs:
# Our hash values most naturally live in Py_ssize_t. We convert those into
# an index into our bucket list by taking the hash modulo the number of buckets.
# However, the modulo operator in C preserves the sign of the number we take the
# modulus of, which is not what we want.
# The solution is to always do
# (<size_t) h)% modulus
# to ensure we're doing an unsigned modulus.

Could it be that this is relevant here?

Is it really safe to do

        cdef Py_ssize_t k1,k2,k3
        cdef int offset
        k1,k2,k3,offset = r.key
        cdef Py_ssize_t h = (k1 + 13*k2 ^ 503*k3)
        PyList_GET_ITEM(buckets, (<size_t>h) % PyList_GET_SIZE(buckets))

?

Yes, we need to care about the sign. But we also have

 Py_ssize_t PyList_GET_SIZE(object list)
 PyObject* PyList_GET_ITEM(object list, Py_ssize_t i)

(hence, the modulus is Py_ssize_t, and the result will be translated into Py_ssize_t, too)

I am afraid I have no 32 bit machine to test.

@simon-king-jena
Copy link
Member

comment:3

PS: Note the discussion that we had on #715 on that matter.

Is there a reason why we use ssize_t and not size_t?

@jdemeyer
Copy link
Author

comment:4

Any idea on how to debug this? The problem is that currently we have no idea where this exception is generated.

@simon-king-jena
Copy link
Member

comment:5

Google seems to tell me that one should use size_t and not ssize_t here anyway. Not sure though.

Nils, do you think it could help to consistently replace ssize_t and Py_ssize_t by size_t?

@simon-king-jena
Copy link
Member

comment:6

Replying to @jdemeyer:

Any idea on how to debug this? The problem is that currently we have no idea where this exception is generated.

Sage's debug version?

@jdemeyer
Copy link
Author

comment:7

Replying to @simon-king-jena:

        k1,k2,k3,offset = r.key

This line might be problematic if there is no guarantee that the components of r.key fit in the respective types (Py_ssize_t, Py_ssize_t, Py_ssize_t, int).

@jdemeyer
Copy link
Author

comment:8

Replying to @simon-king-jena:

Sage's debug version?

Would it really give more information about where the exception is generated?

@simon-king-jena
Copy link
Member

comment:9

Replying to @jdemeyer:

Replying to @simon-king-jena:

Sage's debug version?

Would it really give more information about where the exception is generated?

I thought that this is the point of a debug version? Is there an environment variable similar to MALLOC_CHECK_ that makes the program segfault on an overflow error, so that one could then analyse the problem using gdb?

@simon-king-jena
Copy link
Member

comment:10

How many bytes is Py_ssize_t using? On 32-bit machines, is Py_ssize_t perhaps using more bytes than size_t? Then, the problem could be here:

        cdef Py_ssize_t h1 = <Py_ssize_t><void *>k1
        cdef Py_ssize_t h2 = <Py_ssize_t><void *>k2
        cdef Py_ssize_t h3 = <Py_ssize_t><void *>k3
        cdef Py_ssize_t h = (h1 + 13*h2 ^ 503*h3)

While h1, h2, and h3 should be fine (they are obtained from a pointer and are thus using 32 bit), I could imagine that Python tries to be clever and uses more byte (long versus int) for h, if 13h2 or 503h3 happen to be larger than the greatest number representable by 32 bit.

If this is the case, then the line

cdef list bucket = <object>PyList_GET_ITEM(all_buckets, (<size_t>h) % PyList_GET_SIZE(all_buckets))

(when the "long" Py_ssize_t h is converted down to 32 bit of size_t) could overflow, isn't it?

Caveat: I am not an expert for the subtleties of Py_ssize_t on 32 versus 64 bit...

@jdemeyer
Copy link
Author

comment:11

Replying to @simon-king-jena:

How many bytes is Py_ssize_t using? On 32-bit machines, is Py_ssize_t perhaps using more bytes than size_t? Then, the problem could be here:

        cdef Py_ssize_t h1 = <Py_ssize_t><void *>k1
        cdef Py_ssize_t h2 = <Py_ssize_t><void *>k2
        cdef Py_ssize_t h3 = <Py_ssize_t><void *>k3
        cdef Py_ssize_t h = (h1 + 13*h2 ^ 503*h3)

While h1, h2, and h3 should be fine (they are obtained from a pointer and are thus using 32 bit), I could imagine that Python tries to be clever and uses more byte (long versus int) for h

No, that cannot be the problem, arithmetic on pure C types (like here) doesn't use Python at all.

@simon-king-jena
Copy link
Member

comment:12

I found the following on PEP 353:

Conceptually, Py_intptr_t and Py_ssize_t are different things: Py_intptr_t needs
to be the same size as void*, and Py_ssize_t the same size as size_t. These could
differ, e.g. on machines where pointers have segment and offset. On current
flat-address space machines, there is no difference, so for all practical purposes,
Py_intptr_t would have worked as well.

Note that we actually do want a conversion from and to <void*>. So, shouldn't we actually use Py_intptr_t, not Py_ssize_t?

@simon-king-jena
Copy link
Member

comment:13

Does Py_intptr_t even exist in Sage? I did a grep for it, to no avail.

@jdemeyer
Copy link
Author

comment:14

Replying to @simon-king-jena:

I found the following on PEP 353:

Conceptually, Py_intptr_t and Py_ssize_t are different things: Py_intptr_t needs
to be the same size as void*, and Py_ssize_t the same size as size_t. These could
differ, e.g. on machines where pointers have segment and offset. On current
flat-address space machines, there is no difference, so for all practical purposes,
Py_intptr_t would have worked as well.

Note that we actually do want a conversion from and to <void*>. So, shouldn't we actually use Py_intptr_t, not Py_ssize_t?

That's all true but it wouldn't solve this error.

@simon-king-jena
Copy link
Member

comment:15

Replying to @jdemeyer:

Note that we actually do want a conversion from and to <void*>. So, shouldn't we actually use Py_intptr_t, not Py_ssize_t?

That's all true but it wouldn't solve this error.

Well, you didn't tell yet were exactly the error is located.

Lacking more information, my guess was that an overflow could occur on systems where sizeof(Py_ssize_t)==sizeof(size_t) differs from sizeof(Py_intptr_t)==sizeof(void*). So, why are you sure that the problem is not related with conversion between size_t and pointers?

@jdemeyer
Copy link
Author

comment:16

Replying to @simon-king-jena:

Well, you didn't tell yet were exactly the error is located.

If I would know that, fixing the error would probably be trivial.

@jdemeyer
Copy link
Author

comment:17

I see it also all over the place when building the docs:

[...]
[rings    ] Exception OverflowError: 'long int too large to convert to int' in <sage.structure.coerce_dict.TripleDictEraser object at 0x949611c> ignored
[rings    ] Exception OverflowError: 'long int too large to convert to int' in <sage.structure.coerce_dict.TripleDictEraser object at 0x949611c> ignored
[rings    ] Exception OverflowError: 'long int too large to convert to int' in <sage.structure.coerce_dict.TripleDictEraser object at 0x949611c> ignored
[rings    ] Exception OverflowError: 'long int too large to convert to int' in <sage.structure.coerce_dict.TripleDictEraser object at 0x949611c> ignored
[rings    ] no targets are out of date.
[rings_num] Exception OverflowError: 'long int too large to convert to int' in <sage.structure.coerce_dict.TripleDictEraser object at 0x949611c> ignored
[rings_num] Exception OverflowError: 'long int too large to convert to int' in <sage.structure.coerce_dict.TripleDictEraser object at 0x949611c> ignored
[rings_num] Exception OverflowError: 'long int too large to convert to int' in <sage.structure.coerce_dict.TripleDictEraser object at 0x949611c> ignored
[rings_num] Exception OverflowError: 'long int too large to convert to int' in <sage.structure.coerce_dict.TripleDictEraser object at 0x949611c> ignored
[rings_num] Exception OverflowError: 'long int too large to convert to int' in <sage.structure.coerce_dict.TripleDictEraser object at 0x949611c> ignored
[rings_num] Exception OverflowError: 'long int too large to convert to int' in <sage.structure.coerce_dict.TripleDictEraser object at 0x949611c> ignored
[rings_num] Exception OverflowError: 'long int too large to convert to int' in <sage.structure.coerce_dict.TripleDictEraser object at 0x949611c> ignored
[rings_num] Exception OverflowError: 'long int too large to convert to int' in <sage.structure.coerce_dict.TripleDictEraser object at 0x949611c> ignored
[rings_num] no targets are out of date.
[...]

@simon-king-jena
Copy link
Member

comment:18

Can you test sizeof(Py_ssize_t)==sizeof(void*) on those machines that exposed the error? Namely, this equality would falsify my guess.

@jdemeyer
Copy link
Author

comment:19

Replying to @simon-king-jena:

Lacking more information, my guess was that an overflow could occur on systems where sizeof(Py_ssize_t)==sizeof(size_t) differs from sizeof(Py_intptr_t)==sizeof(void*). So, why are you sure that the problem is not related with conversion between size_t and pointers?

This is a typical 32-bit system where the types int, long, size_t, Py_ssize_t, void*, intptr_t are all 4 bytes.

@jdemeyer
Copy link
Author

comment:20

Besides, even if void* would be larger than Py_ssize_t, I still don't see how it could cause the error that we're seeing. The cast void* -> Py_ssize_t is pure C, so it couldn't overflow.

@jdemeyer
Copy link
Author

comment:21

I am debugging building the docs, as I can reliably get the failure this way.

@jdemeyer
Copy link
Author

comment:22

The error happens in the last line here:

        cdef TripleDict D = <object>PyWeakref_GetObject(self.D)
        if D is None:
            return
        cdef list buckets = D.buckets
        if buckets is None:
            return
        # r is a (weak) reference (typically to a parent), and it knows the
        # stored key of the unique triple r() had been part of.
        # We remove that unique triple from self.D
        cdef Py_ssize_t k1,k2,k3
        k1,k2,k3 = r.key

@simon-king-jena
Copy link
Member

comment:23

Replying to @jdemeyer:

The error happens in the last line here:

        cdef Py_ssize_t k1,k2,k3
        k1,k2,k3 = r.key

Interesting. So, this would probably mean that the actual error occurs when storing the key. Let's see what is done there...

Just out of curiosity: Does the same problem also occur with #14159 and dependencies?

@simon-king-jena
Copy link
Member

comment:24

Aha!

We have

        cdef size_t h1 = <size_t><void *>k1
        cdef size_t h2 = <size_t><void *>k2
        cdef size_t h3 = <size_t><void *>k3
        try:
            ref1 = KeyedRef(k1,self.eraser,(h1, h2, h3))
        except TypeError:
            ref1 = k1

Hence, the three items of the key originally were size_t. But they are assigned to Py_ssize_t.

@jdemeyer
Copy link
Author

comment:25

I can't find this code, where is it?

@jdemeyer
Copy link
Author

comment:26

Replying to @simon-king-jena:

Interesting. So, this would probably mean that the actual error occurs when storing the key.

Indeed.

And r.key does indeed take values like k1,k2,k3 = (3043219468L, 154417500, 150140108), the first of which would not fit in a Py_ssize_t.

So the question is: where does this tuple come from?

@simon-king-jena
Copy link
Member

comment:27

In sage/categories/homset.py:

    _cache[key] = KeyedRef(H, _cache.eraser, (id(X),id(Y),id(category)))

id() returns <type 'int'>. I don't know if assigning int to Py_ssize_t can be a problem.

@simon-king-jena
Copy link
Member

comment:28

Replying to @jdemeyer:

And r.key does indeed take values like k1,k2,k3 = (3043219468L, 154417500, 150140108), the first of which would not fit in a Py_ssize_t.

So the question is: where does this tuple come from?

There are (as much as I know) only the two places that I mentioned (by the way, #14159 would change it).

Is size_t-->Py_ssize_t a potential problem? Or is id(...)-->Py_ssize_t a potential problem?

@jdemeyer
Copy link
Author

comment:29

Replying to @simon-king-jena:

In sage/categories/homset.py:

    _cache[key] = KeyedRef(H, _cache.eraser, (id(X),id(Y),id(category)))

id() returns <type 'int'>. I don't know if assigning int to Py_ssize_t can be a problem.

Bingo! That's the problem indeed.

@simon-king-jena
Copy link
Member

comment:30

Replying to @jdemeyer:

Replying to @simon-king-jena:

In sage/categories/homset.py:

    _cache[key] = KeyedRef(H, _cache.eraser, (id(X),id(Y),id(category)))

id() returns <type 'int'>. I don't know if assigning int to Py_ssize_t can be a problem.

Bingo! That's the problem indeed.

OK. Then it is solved in #14159, I suppose.

@jdemeyer
Copy link
Author

comment:31

But I would really like a quick fix independently of #14159. Then this quick fix here could be merged in sage-5.8.rc0 while #14159 wouldn't. Would you mind if I create a patch?

@simon-king-jena
Copy link
Member

comment:32

Replying to @jdemeyer:

But I would really like a quick fix independently of #14159. Then this quick fix here could be merged in sage-5.8.rc0 while #14159 wouldn't. Would you mind if I create a patch?

No, I wouldn't mind (rebasing #14159 should be easy enough).

But how can one change the id() thingy in sage/categories/homset.py? After all, it is a Python file, hence, one can not just do <size_t><void*>X.

@jdemeyer
Copy link
Author

comment:33

Replying to @simon-king-jena:

But how can one change the id() thingy in sage/categories/homset.py? After all, it is a Python file, hence, one can not just do <size_t><void*>X.

You can't, but you could write a small function to do just this in coerce_dict.pyx for example.

@jdemeyer
Copy link
Author

Author: Jeroen Demeyer

@jdemeyer
Copy link
Author

comment:34

Attachment: 14254_signed_id.patch.gz

@simon-king-jena
Copy link
Member

Dependencies: #13387

@simon-king-jena
Copy link
Member

comment:36

The patch looks fine to me, and it applies cleanly after #13387. Now I wonder what the patchbots have to tell.

Problem: I have no 32-bit machine available. Will the machines on which you first noticed the problem report here, in the hopefully green) blob?

@jdemeyer
Copy link
Author

comment:37

Replying to @simon-king-jena:

Will the machines on which you first noticed the problem report here, in the (hopefully green) blob?

I don't think so.

In any case, the machine in question managed to build the documentation without errors, so that's a good sign. I'm running tests now, but it's a slow machine so I'll report back tomorrow.

Regardless, I think you can review the patch without access to a 32-bit system and assume that I have done my duty in testing the patch.

@nbruin
Copy link
Contributor

nbruin commented Mar 11, 2013

comment:38

Replying to @simon-king-jena:

_cache[key] = KeyedRef(H, _cache.eraser, (id(X),id(Y),id(category)))

Bingo! That's the problem indeed.

OK. Then it is solved in #14159, I suppose.

Good work guys! That line made me feel uncomfortable when I was reviewing it, because it was reaching into implementation details of _cache.eraser and indeed, it wasn't properly doing that, in a way I did not spot. This is even more confirmation that #14159 is the right solution and not just over-engineering.

@simon-king-jena
Copy link
Member

comment:39

OK, I believe Jeroen when he says that building the documentation on his 32-bit systems works with the patch, while it reproducibly failed without the patch.

I think the solution makes sense: id(X) returns a positive number, and this may very well be beyond what a signed number type of the same size (such as Py_ssize_t) can take.

The patchbot gives a green blob, too. Hence, I hope it is safe to give this a positive review (but I think #14159 will be better, on the long run).

@simon-king-jena
Copy link
Member

Reviewer: SimonKing

@jdemeyer
Copy link
Author

Changed reviewer from SimonKing to Simon King

@jdemeyer
Copy link
Author

Merged: sage-5.8.rc0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants