Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of CoverBounds class #98

Merged
merged 27 commits into from
Jul 12, 2018
Merged

Conversation

leesteinberg
Copy link

Adds a CoverBounds class - works similarly to Cover class, but requires extra variable:
limits: Numpy array (n_dimension, 2)
------ The (min,max) desired value for a given dimension
This allows a cover to be defined where the limits of the cover are user defined, rather than being the minimum/maximum value of the function. Useful to ensure a series of networks are created with identical parameters.

deargle and others added 7 commits May 21, 2018 14:37
Clustering functions such as `DBSCAN` accept `metric='precomputed'`, which requires a square distance (similarity) matrix to be passed as, in this case, `inverse_X`. However, the mapper isn't aware when
precomputed matrices are being passed in, so when it slices according to the filter function values and hypercubes, the square-ness is un-squared.

This PR adds a kludgey way to tell the mapper that a precomputed matrix
is being passed in. When set to `True`, the slice will feed a square
matrix to `clusterer.fit()`
@pep8speaks
Copy link
Contributor

pep8speaks commented Jun 6, 2018

Hello @leesteinberg, Thank you for updating !

Line 81:106: E502 the backslash is redundant between brackets
Line 83:31: E122 continuation line missing indentation or outdented
Line 83:83: E502 the backslash is redundant between brackets
Line 85:31: E122 continuation line missing indentation or outdented
Line 144:51: E712 comparison to False should be 'if cond is False:' or 'if not cond:'

Line 44:51: W291 trailing whitespace
Line 50:9: E303 too many blank lines (2)
Line 117:17: E722 do not use bare except'
Line 121:9: E722 do not use bare except'
Line 272:59: W291 trailing whitespace
Line 344:1: W293 blank line contains whitespace
Line 374:1: W293 blank line contains whitespace
Line 456:60: W291 trailing whitespace
Line 489:77: W291 trailing whitespace

Line 12:1: E303 too many blank lines (4)
Line 77:1: E302 expected 2 blank lines, found 1
Line 95:5: E303 too many blank lines (2)
Line 129:45: E231 missing whitespace after ','
Line 130:28: E231 missing whitespace after ','
Line 131:15: E231 missing whitespace after ','
Line 131:33: E231 missing whitespace after ','
Line 132:15: E231 missing whitespace after ','
Line 134:62: E231 missing whitespace after ','
Line 134:67: E231 missing whitespace after ','
Line 136:14: E225 missing whitespace around operator
Line 140:67: E231 missing whitespace after ','
Line 140:77: E231 missing whitespace after ','

Comment last updated on July 10, 2018 at 08:57 Hours UTC

@codecov-io
Copy link

codecov-io commented Jun 6, 2018

Codecov Report

Merging #98 into master will increase coverage by 0.64%.
The diff coverage is 84.61%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #98      +/-   ##
==========================================
+ Coverage    86.5%   87.15%   +0.64%     
==========================================
  Files           6        6              
  Lines         415      436      +21     
  Branches       89       94       +5     
==========================================
+ Hits          359      380      +21     
  Misses         37       37              
  Partials       19       19
Impacted Files Coverage Δ
kmapper/kmapper.py 82.84% <66.66%> (+0.98%) ⬆️
kmapper/cover.py 96.07% <90%> (-3.93%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a317c07...46a9c77. Read the comment docs.

@leesteinberg leesteinberg mentioned this pull request Jun 6, 2018
@sauln sauln self-requested a review June 19, 2018 09:26
Copy link
Member

@sauln sauln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the pull request. I think the idea is there, but there are a few things I'm hoping can change before we merge the pull request. Right now there is a lot of repeated logic between both Cover and CoverBounds. I think it might work better to merge them both into one class, or rearrange so the repetition can be inherited properly.

kmapper/cover.py Outdated
n_cubes=10,
perc_overlap=0.2,
# Deprecated parameters:
nr_cubes=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rm these parameters, nr_cubes and overlap_perc. only necessary in the base class for backwards compatibility.

kmapper/cover.py Outdated
self.limits = limits

# Check limits can actually be handled and are set appropriately
NoneType = type(None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defining this NoneType variable is weird. Why do it? Please just use type(None) when you need it.

kmapper/cover.py Outdated

# If self.limits is array-like
if isinstance(self.limits, np.ndarray):
dump_arr = np.zeros(self.limits.shape) # dump_arr is used so we can change the values of self.limits from None to the min/max
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put comment on line above code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does dump_arr mean? Maybe a better name would be limits_array?

kmapper/cover.py Outdated
'Actual Minima: %s\tInput Minima: %s\n' % (np.min(indexless_data, axis=0), bounds_arr[:,0])+ \
'Actual Maxima: %s\tInput Maxima: %s\n' % (np.max(indexless_data, axis=0), bounds_arr[:,1]))

else: # It must be None, as we checked to see if it is array-like or None in __init__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this will only apply bounds if bounds are given, otherwise just behave as the base class. What about making this a little more explicit and not duplicating logic.

What do you think about something like

if isinstance(self.limits, np.ndarray):
    <existing logic>
else:
    super().find_bins(<params>)

This won't work exactly, but I think could take advantage of the inheritance a little better as there is a lot of repeated logic.

@leesteinberg
Copy link
Author

I agree with all the changes. Not sure how to deal with the last one. In principle, the CoverBounds class could become the new Cover class (with default automatic bounds the same as currently).

Alternatively, could split define_bins into two functions in Cover, such as a set_bounds and find_edges function. That way, the set_bounds is the only thing that would need to be modified.

Let me know what you think. In the meantime I'll push the other changes up.

@sauln
Copy link
Member

sauln commented Jun 19, 2018

Thank you!

I like your idea of rolling CoverBounds into Cover.

The original intention was for different cover classes to enable things like hexagonal covers or stranger tilings. Bounds seem like the kind of feature that could be applied to many types of covers.

@leesteinberg
Copy link
Author

How would you feel if I did both? A HexagonalCover would only need to modify the find_edges of a split function, as far as I can tell? Or would it be best just to make one Cover class for now, using the define_bins method from CoverBounds?

@sauln
Copy link
Member

sauln commented Jun 27, 2018

Doing both sounds fine. Making it easier to extend sounds preferable.

find_edges would return the edges of each bin?

@leesteinberg
Copy link
Author

I'm just re-looking at the current version of the code.

From what I can tell, a hexagonal (or other) covering would only need to modify the find_entries method of the Cover class.

If this is the case, it might be best to just use the current CoverBounds as the base Cover class, and just set the default bounds to be the maximum and minimum values in every dimension.

Let me know your thoughts.

@sauln
Copy link
Member

sauln commented Jul 9, 2018

That sounds like a fine idea to me!

What do you think about also adding a stub of a class for CubicalCover? We could then convert some of the examples to use this version so it's explicit this is only one type of cover.

It could be as simple as

class CubicalCover(Cover): pass

@leesteinberg
Copy link
Author

Done as requested. Seems to pass the tests, and the class acts as before when no limits are defined. Also added the CubicalCover class

@sauln sauln merged commit 27f7167 into scikit-tda:master Jul 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants