Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] OPTICS: add extract_xi method #12077

Merged
merged 107 commits into from Apr 27, 2019

Conversation

Projects
None yet
6 participants
@adrinjalali
Copy link
Member

commented Sep 14, 2018

Related: #12036, #12053, #11677
Fixes #12376

Adding the Xi extraction method to OPTICS.

@espg, I'll need the predecessor_ to add that final touch to it. I'm still testing the implementation, but it'll be nice if I could get some feedback. (done)

Differences with the paper:

  • Definition 11 section 4: comparisons are done with r(e_U + 1), but section 4.3.2 compares values with r(e_U), this implementation takes r(e_U + 1) which seems more consistent.
  • The article assimes min_samples as the minimum size of clusters, which is the default behavior of this implementation, but we also have a min_cluster_size to give more freedom to the user.

@adrinjalali adrinjalali force-pushed the adrinjalali:optics/extractXi branch from 1375b73 to baa94f9 Sep 16, 2018

adrinjalali added some commits Sep 17, 2018

@qinhanmin2014

This comment has been minimized.

Copy link
Member

commented Apr 23, 2019

This should not be possible with a hierarchical approach such as OPTICS. It would violate the condition that a cluster must not have a in-between value that is higher than the ends. That is different if you had an overlapping clustering such as CLIQUE, but in that case it simply does not make sense to flatten this into a single-integer labeling this way at all.

Ahh, I see. Thanks a lot @adrinjalali @kno10. So our method actually produce the same result as R's. That's fine.

@qinhanmin2014

This comment has been minimized.

Copy link
Member

commented Apr 23, 2019

@adrinjalali please mention how we assign the labels.
Please update what's new.
It's very close from my +1 IMO (after CIs are green).
I still think the loop in _xi_cluster can be simplified, but feel free to leave it.

@adrinjalali

This comment has been minimized.

Copy link
Member Author

commented Apr 24, 2019

@qinhanmin2014

Please update what's new.

what do you want me to change? I think I've applied to comments there, sorry if I've missed something, could you please remind me?

@@ -719,7 +719,7 @@ def _correct_predecessor(reachability_plot, predecessor, ordering, s, e):
while s < e:
if reachability_plot[s] > reachability_plot[e]:
return s, e
p_e = predecessor[e]
p_e = ordering[predecessor[e]]

This comment has been minimized.

Copy link
@kno10

kno10 Apr 24, 2019

Contributor

Hmmm. That comes a bit unexpected. I would have expected the predecessor to be an object index, not a plot order index. I'd assume that users will usually expect the predecessor to be in data order, what do you think?

In particular, this is the "externally" used ordering:

predecessor_ : array, shape (n_samples,)
Point that a sample was reached from, indexed by object order.

I'd rather try to get rid of the reindexing here:

clusters = _xi_cluster(reachability[ordering], predecessor[ordering],

or at least use some explicit naming, such as predecessor_plot. Otherwise, this is a maintenance nightmare, and the next author trying to modify this will be similarly confused and also need 4-5 attempts to get the indexing right...

This comment has been minimized.

Copy link
@qinhanmin2014

qinhanmin2014 Apr 25, 2019

Member

that's great, thanks a lot @adrinjalali @kno10

This comment has been minimized.

Copy link
@adrinjalali

adrinjalali Apr 25, 2019

Author Member

yeah, that's why I renamed the parameters to predecessor_plot if the order was the same os reachability_plot. We can/should better document this and clarity in our docs later. But I think this is fine for now.

@qinhanmin2014

This comment has been minimized.

Copy link
Member

commented Apr 25, 2019

@adrinjalali we need to update what's new and mention how we assign the labels, but I'm OK to merge this one first.

@qinhanmin2014
Copy link
Member

left a comment

@qinhanmin2014 qinhanmin2014 changed the title [MRG+1] OPTICS: add extract_xi method [MRG+2] OPTICS: add extract_xi method Apr 25, 2019

@jnothman

This comment has been minimized.

Copy link
Member

commented Apr 25, 2019

@qinhanmin2014
Copy link
Member

left a comment

LGTM, ping @jnothman is your approval still valid? I think we can merge.

@adrinjalali

This comment has been minimized.

Copy link
Member Author

commented Apr 25, 2019

Seems like this is ready for merge :)

@espg

This comment has been minimized.

Copy link

commented Apr 25, 2019

Really good to see this merged-- stellar work everyone! =)

@qinhanmin2014

This comment has been minimized.

Copy link
Member

commented Apr 25, 2019

Seems like this is ready for merge :)

I'd like to wait for @jnothman at least. We've introduced couple of changes since his approval.

@jnothman

This comment has been minimized.

Copy link
Member

commented Apr 27, 2019

We've introduced couple of changes since his approval.

I've been following the conversation more-or-less. Even if I'm a little ashamed at missing lots, I agree with the subsequent changes and am very glad that @qinhanmin2014 and @kno10 have treated this thoroughly.

@jnothman jnothman merged commit a475594 into scikit-learn:master Apr 27, 2019

16 checks passed

LGTM analysis: C/C++ No code changes detected
Details
LGTM analysis: JavaScript No code changes detected
Details
LGTM analysis: Python No new or fixed alerts
Details
ci/circleci: deploy Your tests passed on CircleCI!
Details
ci/circleci: doc Your tests passed on CircleCI!
Details
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
Details
ci/circleci: lint Your tests passed on CircleCI!
Details
codecov/patch 97.18% of diff hit (target 96.62%)
Details
codecov/project 96.94% (+0.31%) compared to badaa15
Details
scikit-learn.scikit-learn Build #20190425.30 succeeded
Details
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
Details
scikit-learn.scikit-learn (Linux py35_np_atlas) Linux py35_np_atlas succeeded
Details
scikit-learn.scikit-learn (Linux pylatest_conda) Linux pylatest_conda succeeded
Details
scikit-learn.scikit-learn (Windows py35_32) Windows py35_32 succeeded
Details
scikit-learn.scikit-learn (Windows py37_64) Windows py37_64 succeeded
Details
scikit-learn.scikit-learn (macOS pylatest_conda) macOS pylatest_conda succeeded
Details
@jnothman

This comment has been minimized.

Copy link
Member

commented Apr 27, 2019

Done.

@qinhanmin2014

This comment has been minimized.

Copy link
Member

commented Apr 27, 2019

Thanks everyone for your great work!

@adrinjalali adrinjalali deleted the adrinjalali:optics/extractXi branch Apr 28, 2019

xhlulu added a commit to xhlulu/scikit-learn that referenced this pull request Apr 28, 2019

xhlulu added a commit to xhlulu/scikit-learn that referenced this pull request Apr 28, 2019

xhlulu added a commit to xhlulu/scikit-learn that referenced this pull request Apr 28, 2019

marcelobeckmann added a commit to marcelobeckmann/scikit-learn that referenced this pull request May 1, 2019

marcelobeckmann added a commit to marcelobeckmann/scikit-learn that referenced this pull request May 1, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.