Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Provide public API for fast DisjointSet subset size. #17714

Merged
merged 1 commit into from
Jan 11, 2023

Conversation

anntzer
Copy link
Contributor

@anntzer anntzer commented Jan 4, 2023

Reference issue

Closes #17285.

What does this implement/fix?

Provide public API to quickly get the size of a DisjointSet subset.
This information is already stored internally as disjoint_set._sizes; this PR simply adds a public method to access it.

Additional information

@j-bowhay j-bowhay added enhancement A new feature or improvement scipy.cluster labels Jan 4, 2023
Copy link
Contributor

@fuglede fuglede left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of minor remarks, but otherwise LGTM.

result : int
Size of the subset containing `x`.
"""
return self._sizes[self[x]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other methods here are diligent about checking performing a

        if x not in self._indices:
            raise KeyError(x)

before doing anything. Probably not a must, as the same KeyError will surface almost immediately anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a good thing to have a better error message. But to follow our conventions it should be raise ValueError(message). And this would need a test with pytest.raises(..., match=message)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the KeyError raised by __getitem__ is just as good. Note that merge and connected likewise do not perform explicit checking and rely on the same check in __getitem__.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which does not necessarily mean it's perfect 😉 I will let @fuglede make the call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, sounds good to me.

@@ -210,6 +210,25 @@ def subset(self, x):
nxt = self._nbrs[nxt]
return set(result)

def subset_size(self, x):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also make sense to include this in the examples on how to use the class, i.e. something like

    >>> disjoint_set.subset('a')
    {'a', 3, 'b'}
    >>> disjoint_set.subset_size('a')
    3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done.

This information is already stored internally as `disjoint_set._sizes`;
this PR simply adds a public method to access it.
@tupui
Copy link
Member

tupui commented Jan 10, 2023

@fuglede is this good to go for you?

@fuglede
Copy link
Contributor

fuglede commented Jan 11, 2023

@fuglede is this good to go for you?

Yes. Lint check is failing but for a file that hasn't changed here.

Copy link
Member

@tupui tupui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok let's get this in then. Thank you both!

@tupui tupui merged commit 33e7889 into scipy:main Jan 11, 2023
@anntzer anntzer deleted the dsss branch January 11, 2023 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.cluster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Expose DisjointSet._sizes
4 participants