BUG: Catch all exceptions raised while calling PyObjectHashTable methods #62892
| ^^^^^ | ||
| - Bug in :class:`DataFrame` when passing a ``dict`` with a NA scalar and ``columns`` that would always return ``np.nan`` (:issue:`57205`) | ||
| - Bug in :class:`Series` ignoring errors when trying to convert :class:`Series` input data to the given ``dtype`` (:issue:`60728`) | ||
| - Bug in :class:`PyObjectHashTable` that would silently suppress exceptions thrown from custom ``__hash__`` and ``__eq__`` methods during hashing (:issue:`57052`) |
Are you able to add a test that uses a public API that would be fixed by your changes?
done
Doing this inside the khash code is definitely more difficult, but probably the Right Way To Do It. Does this entail a perf hit? BTW #62888 is probably going to entail digging into that same bit of khash code.
I tried adapting the suggestion from #57052 (comment) (
I'll set up a tiny benchmark for |
I implemented a new layer called The next problem is fixing the dozens of exceptions that were previously silently suppressed. Most of them seem to be either |
FYI @jbrockmendel this small benchmark I did for

setup

```python
from pandas._libs import hashtable as ht
from random import shuffle


class testkey:
    # Key type with custom __hash__ and __eq__, so lookups go through Python-level calls.
    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        return self.value == other.value


def test_pymap_set_get(indexes: list[int]):
    table = ht.PyObjectHashTable()
    keys = [testkey(f"key{i}") for i in indexes]
    # Insert in random order, then look up in a different random order.
    shuffle(indexes)
    for i in indexes:
        table.set_item(keys[i], i)
    shuffle(indexes)
    for i in indexes:
        assert table.get_item(keys[i]) == i


def test_pymap_set_get_no_shuffle(indexes: list[int]):
    table = ht.PyObjectHashTable()
    keys = [testkey(f"key{i}") for i in indexes]
    for i in indexes:
        table.set_item(keys[i], i)
    for i in indexes:
        assert table.get_item(keys[i]) == i
```

main branch (d597079): with shuffle / without shuffle

this PR (0a4cba8): with shuffle / without shuffle
This is a simple workaround. For a proper solution, khash_python.h would require some refactoring to handle exceptions gracefully. It's a bit tricky, though, because `kh_python_hash_{equal,func}` is exposed to the vendored `khash` implementation, which calls those functions in a loop.

khash_python.h was silently suppressing all exceptions thrown when calling custom `__hash__` and `__eq__` methods. This PR implements a new layer for `pymap` that catches all exceptions thrown during khash computation and raises them properly.
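
For readers who want the general idea without reading the C code, here is a rough pure-Python sketch of the "capture inside the callback, re-raise after the loop" pattern described above. It is not the actual implementation in this PR: the class `ErrorCapturingTable` and its helpers `_hash`, `_equal`, and `_raise_pending` are hypothetical names made up for illustration, and a plain `dict` of buckets stands in for the vendored khash loop.

```python
class ErrorCapturingTable:
    """Toy stand-in for a wrapper layer around a callback-driven hash table.

    Hypothetical illustration only; pandas' real code lives in C/Cython.
    """

    def __init__(self):
        self._buckets = {}    # hash value -> list of (key, value) pairs
        self._pending = None  # first exception captured inside a callback

    def _remember(self, exc):
        # Keep only the first error, since later ones are usually follow-ups.
        if self._pending is None:
            self._pending = exc

    # The two callbacks mimic C functions that must return a plain value
    # and therefore cannot propagate a Python exception to the caller.
    def _hash(self, key):
        try:
            return hash(key)
        except Exception as exc:
            self._remember(exc)
            return 0  # placeholder so the inner loop can finish

    def _equal(self, a, b):
        try:
            return bool(a == b)
        except Exception as exc:
            self._remember(exc)
            return False

    def _raise_pending(self):
        # Called by the wrapper once the inner loop has returned.
        exc, self._pending = self._pending, None
        if exc is not None:
            raise exc

    def set_item(self, key, value):
        h = self._hash(key)
        bucket = self._buckets.get(h, [])
        idx = next(
            (i for i, (k, _) in enumerate(bucket) if self._equal(k, key)), None
        )
        self._raise_pending()  # surface errors before mutating the table
        if idx is None:
            self._buckets.setdefault(h, []).append((key, value))
        else:
            bucket[idx] = (key, value)

    def get_item(self, key):
        h = self._hash(key)
        found = [v for k, v in self._buckets.get(h, []) if self._equal(k, key)]
        self._raise_pending()
        if not found:
            raise KeyError(key)
        return found[0]
```

With this shape, a key whose `__hash__` or `__eq__` raises makes `set_item` / `get_item` raise that same exception instead of quietly treating the key as "not equal" or hashing it to a placeholder value.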