BUG: avoid segfaults when casting from boost_python types #20507
Conversation
For calls to PyArray_GetCastingImpl where `from` is a dtype of a boost_python class, it's not safe to assume the pointer `(NPY_DType_Slots *)(from)->dt_slots` has been initialized. This causes segfaults when resolving overloaded calls to boost_python methods. Thus we test for a null pointer before dereferencing.
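The shape of the guard described above can be sketched as follows. This is illustrative only: `dtype_slots_sketch` and `get_casting_impl_guarded` are made-up stand-ins, not NumPy's actual private `NPY_DType_Slots` struct or `PyArray_GetCastingImpl` signature.

```c
#include <stddef.h>

/* Stand-in for the private slots struct; for dtypes synthesized outside
 * NumPy (e.g. by boost_python), the slots pointer may never have been
 * initialized. */
typedef struct {
    void *within_dtype_castingimpl;  /* the slot the real code reads */
} dtype_slots_sketch;

/* Return the casting implementation, or NULL if the slots pointer was
 * never set up -- instead of dereferencing it and segfaulting. */
void *get_casting_impl_guarded(dtype_slots_sketch *slots)
{
    if (slots == NULL) {
        return NULL;  /* previously: crash on slots->within_dtype_castingimpl */
    }
    return slots->within_dtype_castingimpl;
}
```

The caller then treats NULL as "no casting implementation available", which turns the segfault into an ordinary cast failure.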
Could you explain how it is possible that the dtype is not fully initialized? Shouldn't NumPy fill all of this in once the dtype is registered?
As demonstrated by @jawsc-alan on the boost issue, this occurs when resolving overloaded boost_python calls. I have debug builds of all this stuff, but I'm not super familiar with it. Willing to experiment with it if you have any suggestions. If it's helpful, here is the defective boost code before and after macro expansion:

minipy.cpp

minipy.cpp expanded
Thanks. My problem right now is that I spent an hour yesterday trying to get a test setup running, and trying to understand what the failing code actually does. For a setup: is there a very simple Dockerfile or similar I could use? Other questions that should help me get started (if you have pointers):
To clarify my problem: as much as I hate that this broke boost_python, I first need to understand what is actually going wrong.
I don't have a Dockerfile
To run the code:

When breaking on the crashing call:
Sorry that I don't immediately have answers to your other questions.

Fully appreciate your point about this band-aid fix, but I have the impression boost_python is not very actively developed anymore, so anything involving a fundamental repair there might be hard to achieve. Thanks very much for your attention to this! As it stands now, cctbx_project and the projects depending on it will be in a really tricky situation once NumPy 1.20 is no longer supported.
Thanks a lot. Yeah, sorry that I am a bit daft about Docker :/. I had run that container yesterday, but I'm not very familiar with it, so I need a bit longer to get it. (The Fedora one failed for some reason during setup; the CentOS one worked, but I got stuck after that.) Thanks a lot for that debug info!
I am strongly expecting 1 here, to be honest. Could you check that it is the same type of nonsense even on the NumPy versions where this succeeds happily? On the upside, if it is 1, maybe we can argue to just add a "sanity check" there.
OK, thanks, I got your Docker setup running now, so I can dig a bit myself.
Yup, I'm sure. I have to step away from this for a bit, but I will run the same code with v1.20.x and compare the calls there.
Odd problem running gdb:
and that's it. It seems to happen on NumPy import (no breakpoints or anything); just running python seems OK.
Interesting, I don't know what that's about. An alternative is like this:
This might skip some of the relevant code, though.
Frustratingly, I never managed to get it to work. My best current guess is some wonky interaction with the host kernel... In any case, I guess I found the magic line. But that is utter nonsense, calling NumPy API with invalid values. What we end up with is that the whole chunk of code in question is broken. After changing it, the crash goes away.
I would be exceedingly surprised if that ever worked, i.e. the simple thing to do would be to just delete the whole second branch and cut your losses. Even the Windows-only path looks suspect. Of course, it should not be hard to fix it up to hard-code the simple equivalences.

As for band-aid fixing NumPy... I'm not too happy about it, but I suppose we could do it. There probably isn't even a point in covering all the possible things that can go wrong, just the obvious nonsense we get here...
Marking for triage-review; we need to make a call on whether to add a "do not break boost_python" check. My guess is we should probably just add the check, but with a note that it was added for boost_python compatibility:

or, more complete, but with a theoretical slight slowdown for paths that are heavy on dtype-equivalence checks:

(Giving a DeprecationWarning may be good, but could be a bit too spammy to be useful, considering how often this path is hit.)
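A minimal sketch of what such a "do not crash" sanity check could look like. The names here (`descr_sketch`, `equiv_types_sketch`, `dt_slots`) are illustrative stand-ins, not NumPy's actual `PyArray_Descr` layout or the real `PyArray_EquivTypes` implementation:

```c
#include <stddef.h>

/* Stand-in for a dtype descriptor; dt_slots is NULL for descriptors
 * that were never properly initialized (the boost_python case). */
typedef struct {
    int type_num;    /* stand-in for the type identity being compared */
    void *dt_slots;  /* NULL means "this descriptor is not valid" */
} descr_sketch;

/* Report "not equivalent" (0) for bogus descriptors instead of
 * dereferencing an uninitialized pointer and segfaulting. */
int equiv_types_sketch(const descr_sketch *a, const descr_sketch *b)
{
    /* Sanity check added for boost_python compatibility. */
    if (a->dt_slots == NULL || b->dt_slots == NULL) {
        return 0;
    }
    return a->type_num == b->type_num;
}
```

The cost is one extra pointer comparison per call, which is why the "more complete" variant above was flagged as a theoretical slowdown on equivalence-heavy paths.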
If I understand correctly, you're saying the problem is in the boost::python converter code. I think it's very likely correct that this was silently returning False in the past when one of the types was a custom one, but I also suspect there are places where the current behavior is relied on. So in summary, I'd really appreciate this band-aid fix, along with you clarifying your thoughts a bit on what could make a longer-term boost::python fix. Do I have the right idea here? Please let me know if I can do anything else to help with this. I can try other stuff with my debug build here if needed.
Yeah, you understand me correctly, except that I think the following line is shady, and, as you rightly say, so is the later one. Looking at it again, the signature is also suspect. So the correct thing to fix it is to ditch those calls, so that you also have a special case for the custom types.
Wait, no... the Windows-only path doesn't use that code.
@dwpaley are you OK with your "fix" here? (i.e. if I translate it to an equivalent hack? – This exact version may not be quite safe and is not quite the right place). The reason I ask is because I now think that by chance, this may have worked somewhat reliably before... It probably was comparing two NumPy scalar types (rather than dtypes), but comparing some random stuff in the type might find a "match" based on them sharing some code as an implementation detail. So some cases could change in behaviour (some conversions that used to work fine start failing), although it may also reject unlikely buggy cases. |
This adds an almost random "sanity check" to `PyArray_EquivTypes` for the sole purpose of allowing boost::python compiled libs to _not crash_. boost::python is buggy, it needs to be fixed. This may break them (I am unsure), because some conversions which used to work may fail here (if they worked, they only worked because random type data may have matched up correctly for our scalar types). We could error, or warn or... but I hope boost::python will just fix this soon enough and future us can just delete the whole branch. Replaces numpygh-20507
Closing, because gh-20507 is (I think) the proper way to achieve the same thing. I still suspect it used to "work", but considering that this is a big bug in |
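For context, dtype `==` comparison from Python ends up calling `PyArray_EquivTypes` internally (if I recall the descriptor richcompare code correctly), so the added sanity check sits on a very common path. For built-in dtypes (no boost_python involved) the observable behaviour stays the same:

```python
import numpy as np

# Equivalent dtypes compare equal even when spelled differently...
assert np.dtype('i4') == np.dtype('int32')

# ...while inequivalent ones do not. With the gh-20616 check, a bogus or
# uninitialized descriptor would also compare unequal instead of segfaulting.
assert np.dtype('i4') != np.dtype('f4')
```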
This reflects the resolution of #627 as discussed in several other issues and PRs: - boostorg/python#376 - numpy/numpy#20507 - numpy/numpy#20616 Leaving this "bibliography" here because the fix in numpy PR 20616 is considered temporary; thus someday we may have to revisit this to fix the underlying bug in boost::python. Co-authored-by: Billy Poon <bkpoon@lbl.gov>
For calls to PyArray_GetCastingImpl where `from` is a dtype of a boost_python class, it's not safe to assume the pointer `(NPY_DType_Slots *)(from)->dt_slots` is non-NULL. This causes segfaults when resolving overloaded calls to boost_python methods. Thus we check the pointer before dereferencing.
This would resolve boostorg/python#376 and several more issues in downstream projects. For example, cctbx_project has been pinned to numpy 1.20 for a while: cctbx/cctbx_project#627
Testing this is a little tricky, because the problem only occurs when building boost_python modules in an environment with numpy. If a regression test is needed, I'm happy to try to figure something out, but I would appreciate any examples of C++ code that gets built for testing, and how to add it to the build system.