Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add squareform #89

Merged
merged 3 commits into from
Oct 9, 2017
Merged

Add squareform #89

merged 3 commits into from
Oct 9, 2017

Conversation

jakirkham
Copy link
Owner

Provides an implementation of SciPy's squareform from scipy.spatial.distance for Dask Array's of distances. Also includes some tests to make sure it is well behaved.

@jakirkham jakirkham force-pushed the add_squareform branch 5 times, most recently from f1ce9d3 to df2081d Compare October 9, 2017 04:17
@jakirkham jakirkham changed the title WIP: Add squareform Add squareform Oct 9, 2017
Provides an implementation of `squareform` that operates on Dask Arrays.
This can be used with `pdist`'s result to get a dense symmetric matrix.
Provides some tests for `squareform`, which compares its results to
those of SciPy's.
Instead of unrolling the relevant part of the distance matrix to a
sparse distance matrix by hand in `pdist`, use `squareform` to perform
this. After all its behavior borrows from what `pdist` did here
previously.
]))

result = dask.array.stack(result)
elif conv == "tovec":
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this check is not needed, but it seems nice from a readability standpoint and as a fail-safe.

@@ -1,7 +1,7 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import absolute_import
from __future__ import absolute_import, division
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically we don't need this for this change any more. However it is a good idea to add it anyways as we do make use of division in this file.

@jakirkham jakirkham merged commit 307e016 into master Oct 9, 2017
@jakirkham jakirkham deleted the add_squareform branch October 9, 2017 05:04

for j in range(i):
i_j = i - j
col_i.append(X_tri[j][i_j - 1:i_j])
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice if we could do away with this inner for-loop somehow. Seems like this is going to come with some performance penalties as it is breaking one of the dimensions into 1x1 chunks.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved with PR ( #91 ), which avoids slicing the rows/columns beyond what is needed to extract them from the distance vector/sparse pairwise distance matrix.

@jakirkham jakirkham changed the title Add squareform Add squareform Oct 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant