Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Adjust NumPy float hashing to Python's slightly changed hash #18908

Merged
merged 1 commit into from
May 5, 2021

Conversation

seberg
Copy link
Member

@seberg seberg commented May 4, 2021

This is necessary, since we use the Python double hash and the
semi-private function to calculate it in Python has a new signature
to return the identity-hash when the value is NaN.

closes gh-18833, gh-18907


It seems we had no tests at all, so I added some. I honestly did not test this on 3.10b1, since I did not yet bother to get a Py 3.10 version running locally.

@seberg seberg changed the title MAINT: Align float cache with Python's hash MAINT: Align NumPy float hashing with Python's hash May 4, 2021
@seberg seberg changed the title MAINT: Align NumPy float hashing with Python's hash MAINT: Adjust NumPy float hashing to Python's slightly changed hash May 4, 2021
@seberg seberg force-pushed the fix-float-hash branch 2 times, most recently from bf4a1a7 to 6194e3e Compare May 4, 2021 23:04
@seberg
Copy link
Member Author

seberg commented May 4, 2021

OK, I did compile against dev python (which includes the patch) and it works. (I did not run the test suit)

}

/* borrowed from complex_hash */
static npy_hash_t
c@lname@_arrtype_hash(PyObject *obj)
{
npy_hash_t hashreal, hashimag, combined;
hashreal = _Py_HashDouble((double)
PyArrayScalar_VAL(obj, C@name@).real);
hashreal = Npy_HashDouble(obj, (double)PyArrayScalar_VAL(obj, C@name@).real);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hair long.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, how strict are we about the 79 chars? Lately, I have started to not worry if it I figure the meaning is clear even if you don't see the last few chars.

Happy to change though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't limited by the terminal these days, so we could probably increase the line length as long as readability is good, say 85 characters. Might be worth discussing, especially with C++ in the future, C++ tends to long lines.

@@ -3159,7 +3159,7 @@ static npy_hash_t
return y;
}
#endif
/**end repeat**/
/**end repeat**/y
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stray y?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably disappears when the file is generated, so no error.

@charris
Copy link
Member

charris commented May 5, 2021

Nitpicks: Couple of long lines in C code.

This is necessary, since we use the Python double hash and the
semi-private function to calculate it in Python has a new signature
to return the identity-hash when the value is NaN.

closes numpygh-18833, numpygh-18907
@charris charris merged commit a0f4621 into numpy:main May 5, 2021
@charris
Copy link
Member

charris commented May 5, 2021

Thanks Sebastian.

@seberg seberg deleted the fix-float-hash branch May 5, 2021 02:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible excessive hash collisions from hash of nan
2 participants