ENH: Provide public API for fast DisjointSet subset size. #17714
Couple of minor remarks, but otherwise LGTM.
result : int
    Size of the subset containing `x`.
"""
return self._sizes[self[x]]
Other methods here are diligent about performing an explicit check,
    if x not in self._indices:
        raise KeyError(x)
before doing anything. Probably not a must, as the same KeyError will surface almost immediately anyway.
Could be a good thing to have a better error message. But to follow our conventions it should be `raise ValueError(message)`, and this would need a test with `pytest.raises(..., match=message)`.
I think the KeyError raised by `__getitem__` is just as good. Note that `merge` and `connected` likewise do not perform explicit checking and rely on the same check in `__getitem__`.
Which does not necessarily mean it's perfect 😉 I will let @fuglede make the call.
Good point, sounds good to me.
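For readers following the thread, here is a minimal sketch of the behaviour being agreed on: a size lookup with no check of its own still raises KeyError, because the check lives in `__getitem__`. The class `TinySet` and its index-based storage are hypothetical stand-ins for illustration, not SciPy's actual implementation.

```python
class TinySet:
    """Hypothetical stand-in for a disjoint set; not SciPy's code."""

    def __init__(self, elements):
        # Map each element to an integer index; every element starts as its own root.
        self._indices = {x: i for i, x in enumerate(elements)}
        self._parents = list(range(len(self._indices)))
        self._sizes = [1] * len(self._indices)

    def __getitem__(self, x):
        # The only membership check: unknown elements raise KeyError here.
        if x not in self._indices:
            raise KeyError(x)
        root = self._indices[x]
        while self._parents[root] != root:
            root = self._parents[root]
        return root

    def subset_size(self, x):
        # No explicit check; an unknown x fails inside __getitem__.
        return self._sizes[self[x]]


ds = TinySet(["a", "b"])
try:
    ds.subset_size("missing")
except KeyError as exc:
    caught = exc  # the KeyError carries the offending element
```

The trade-off discussed above is visible here: the error is correct but carries only the element, with no custom message.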
@@ -210,6 +210,25 @@ def subset(self, x):
            nxt = self._nbrs[nxt]
        return set(result)

    def subset_size(self, x):
Could also make sense to include this in the examples on how to use the class, i.e. something like
>>> disjoint_set.subset('a')
{'a', 3, 'b'}
>>> disjoint_set.subset_size('a')
3
Sure, done.
@fuglede is this good to go for you?
Yes. Lint check is failing but for a file that hasn't changed here.
Ok let's get this in then. Thank you both!
Reference issue
Closes #17285.
What does this implement/fix?
Provide public API to quickly get the size of a DisjointSet subset.
This information is already stored internally as `disjoint_set._sizes`; this PR simply adds a public method to access it.
Additional information
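To illustrate why the size is available without enumerating the subset, here is a rough sketch of a union-by-size disjoint set that keeps a per-root size table up to date on every merge. The class `SizedDisjointSet` and its dict-based layout are assumptions made for this example; only the names `_sizes`, `merge`, `subset`, and `subset_size` come from the PR, and the real SciPy internals may differ.

```python
class SizedDisjointSet:
    """Hypothetical sketch of union by size; not SciPy's implementation."""

    def __init__(self, elements=()):
        self._parents = {}
        self._sizes = {}  # root element -> number of elements in its subset
        for x in elements:
            self.add(x)

    def add(self, x):
        if x not in self._parents:
            self._parents[x] = x
            self._sizes[x] = 1

    def __getitem__(self, x):
        # Find the root of x; the dict lookup raises KeyError for unknown x.
        while self._parents[x] != x:
            # Path halving: point x at its grandparent as we walk up.
            self._parents[x] = self._parents[self._parents[x]]
            x = self._parents[x]
        return x

    def merge(self, x, y):
        rx, ry = self[x], self[y]
        if rx == ry:
            return False
        # Attach the smaller tree under the larger and update the stored size.
        if self._sizes[rx] < self._sizes[ry]:
            rx, ry = ry, rx
        self._parents[ry] = rx
        self._sizes[rx] += self._sizes[ry]
        return True

    def subset_size(self, x):
        # One root lookup plus one table lookup; no enumeration of members.
        return self._sizes[self[x]]

    def subset(self, x):
        # Enumerating members, by contrast, touches every element in this sketch.
        root = self[x]
        return {y for y in list(self._parents) if self[y] == root}


ds = SizedDisjointSet([1, 2, 3, "a", "b"])
ds.merge(1, 2)
ds.merge(3, "a")
ds.merge("a", "b")
size_a = ds.subset_size("a")
members_a = ds.subset("a")
```

In this sketch `size_a` equals `len(members_a)`, but `subset_size` gets there without building the set.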