Add squareform
#89
Force-pushed from f1ce9d3 to df2081d.
Provides an implementation of `squareform` that operates on Dask Arrays. This can be used with `pdist`'s result to get a dense symmetric matrix.
Provides some tests for `squareform`, which compare its results to SciPy's.
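For reference, here is a minimal sketch of the SciPy behavior the tests compare against. It uses only `scipy.spatial.distance.pdist` and `squareform`; the sample points are made up for illustration and are not taken from the PR's test suite.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three collinear 2-D points (illustrative data, not from the PR).
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# Condensed distance vector, ordered d(0,1), d(0,2), d(1,2).
condensed = pdist(points)

# Dense symmetric matrix with a zero diagonal.
dense = squareform(condensed)

# Applying squareform to the dense matrix recovers the condensed vector.
roundtrip = squareform(dense)
```

A Dask-based `squareform` would be expected to match `dense` and `roundtrip` once its result is computed.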
Force-pushed from df2081d to 96d4541.
Instead of unrolling the relevant part of the distance matrix to a sparse distance matrix by hand in `pdist`, use `squareform` to perform this. After all, its behavior borrows from what `pdist` did here previously.
]))

result = dask.array.stack(result)
elif conv == "tovec":
Technically this check is not needed, but it seems nice from a readability standpoint and as a fail-safe.
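One reading of this comment, sketched below: once the other conversion has been handled, a bare `else` would suffice, but an explicit `elif` documents the intent and leaves room for a guard against unexpected values. The function name `convert`, the `"tomat"` value, and the return strings are hypothetical; only `conv == "tovec"` appears in the diff.

```python
def convert(conv):
    # "tomat" is an assumed counterpart to "tovec"; it is not shown in the diff.
    if conv == "tomat":
        return "matrix"
    elif conv == "tovec":
        # Explicit check: redundant if conv was validated earlier,
        # but it makes the branch self-describing.
        return "vector"
    else:
        # Fail-safe for values that slipped past earlier validation.
        raise ValueError("unknown conversion: %r" % (conv,))
```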
@@ -1,7 +1,7 @@
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-

-from __future__ import absolute_import
+from __future__ import absolute_import, division
Technically we don't need this for this change any more. However, it is a good idea to add it anyway, as we do make use of division in this file.
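As background: `from __future__ import division` makes `/` perform true division on Python 2, which is already the default on Python 3. A place where this typically matters for condensed distance vectors is recovering the number of observations `n` from the vector length `m = n * (n - 1) / 2`. The helper below is a hypothetical sketch of that calculation, not code from this PR; it uses `//` so the result stays an integer regardless of how `/` behaves.

```python
import math


def num_observations(condensed_len):
    """Return n such that n * (n - 1) / 2 == condensed_len (hypothetical helper)."""
    # Solve the quadratic with integer arithmetic only (math.isqrt needs Python 3.8+).
    n = (1 + math.isqrt(1 + 8 * condensed_len)) // 2
    # Reject lengths that do not correspond to any whole number of observations.
    if n * (n - 1) // 2 != condensed_len:
        raise ValueError("not a valid condensed distance vector length")
    return n
```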
for j in range(i):
    i_j = i - j
    col_i.append(X_tri[j][i_j - 1:i_j])
Would be nice if we could do away with this inner `for`-loop somehow. Seems like this is going to come with some performance penalties as it is breaking one of the dimensions into 1x1 chunks.
Resolved with PR (#91), which avoids slicing the rows/columns beyond what is needed to extract them from the distance vector/sparse pairwise distance matrix.
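To illustrate the general idea of slicing whole rows instead of single elements, here is a rough sketch in NumPy. It is not the code from #91; the function name `squareform_by_rows` is made up, and it relies on the fact that the distances from point `i` to points `i + 1 .. n - 1` sit contiguously in the condensed vector. `dask.array` offers `zeros`, `concatenate`, and `stack` as well, so the same pattern should carry over, with one slice per row rather than one per element.

```python
import numpy as np


def squareform_by_rows(v, n):
    """Build the dense symmetric matrix from condensed vector v for n points."""
    rows = []
    start = 0
    for i in range(n):
        # Row i of the upper triangle holds n - i - 1 contiguous entries of v.
        length = n - i - 1
        pad = np.zeros(i + 1, dtype=v.dtype)  # diagonal and lower part
        rows.append(np.concatenate([pad, v[start:start + length]]))
        start += length
    upper = np.stack(rows)
    # Mirror the strict upper triangle; the diagonal stays zero.
    return upper + upper.T
```

For example, `squareform_by_rows(np.array([5.0, 10.0, 5.0]), 3)` yields a 3x3 matrix whose off-diagonal entries mirror the three input distances.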
Provides an implementation of SciPy's `squareform` from `scipy.spatial.distance` for Dask Arrays of distances. Also includes some tests to make sure it is well behaved.