Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: clean up views/strides/dtypes utilities in cluster.hierarcy #18858

Merged
merged 2 commits into from Jul 11, 2023

Conversation

rgommers
Copy link
Member

The utility functions cleaned up here (_copy_arrays_if_base_present, _convert_to_bool, _convert_to_double) were the cause of a couple of open review comments in gh-18668, where it wasn't clear whether or not changing them was preserving the existing semantics. In the process of looking into that, I also discovered some spurious double checks, weird conversion logic (float32 arrays were converted to float64, but only if they weren't views; float16 input remained unconverted) and some other code oddities.

The key thing for removal of _copy_arrays_if_base_present was checking which input arrays for _hierarchy.pyx functions were being modified in-place, so I added some docs for that.

Spell some conversions uniformly as:
```
np.asarray(Z, order='C', dtype=np.float64)
```
rather than use harder to read helper functions like
`_convert_to_double`.

The semantics are unchanged: use a C-contiguous float64 array, copy only
if needed.

Also removes some unused code (`_randdm`, `_convert_to_bool`).
This internal utility function was quite odd and effectively broken.
Its stated purpose in code comments was about strides, but instead
it would:

1. copy if the input array was a view,
2. convert float32 arrays to float64, but only if they were not a copy.
   this conversion would also fail to correctly convert float16 or other
   input types.

In addition to the above removals, document in the Cython code that
was being called which input arguments modify their input array
in-place. There were no bugs here; all input arrays which were being
modified were allocated inside the public functions with `np.zeros` or
a similar array creation function.
@rgommers rgommers added scipy.cluster maintenance Items related to regular maintenance tasks labels Jul 11, 2023
@rgommers rgommers added this to the 1.12.0 milestone Jul 11, 2023
@rgommers
Copy link
Member Author

The one failure is a network connection issue; this PR is pure Python so I won't bother restarting it. This should be good to merge.

Copy link
Member

@tupui tupui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Ralf. CI is happy, let's get this in so we can move forward with the Array API PR.

@tupui tupui merged commit 2158afe into scipy:main Jul 11, 2023
23 of 24 checks passed
@rgommers rgommers deleted the hierarchy-cleanup branch July 11, 2023 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Items related to regular maintenance tasks scipy.cluster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants