FastOPTICS in sklearn.cluster #25824

Open
DiTo97 opened this issue Mar 12, 2023 · 4 comments

Comments

@DiTo97

DiTo97 commented Mar 12, 2023

Describe the workflow you want to enable

Is there a plan to add the FastOPTICS algorithm [1] to sklearn.cluster, which already supports OPTICS?

[1] J. Schneider, M. Vlachos, "Fast Parameterless Density-based Clustering via Random Projections", 2013.

Describe your proposed solution

The solution would be to implement FastOPTICS by building on what has already been done for the base OPTICS algorithm, together with the existing code for random projections and the Johnson-Lindenstrauss bound in sklearn.random_projection.
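
To make the building blocks concrete, here is a minimal sketch that only chains the existing pieces named above: the Johnson-Lindenstrauss bound, a Gaussian random projection, and the current OPTICS estimator. It is not FastOPTICS itself, which uses random projections in a different way inside the algorithm; the synthetic data, `eps=0.5`, and `min_samples=10` are arbitrary choices made for illustration.

```python
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs
from sklearn.random_projection import (
    GaussianRandomProjection,
    johnson_lindenstrauss_min_dim,
)

# Synthetic high-dimensional data (assumption: stands in for a real dataset).
X, _ = make_blobs(n_samples=200, n_features=1000, centers=3, random_state=0)

# Number of dimensions that the Johnson-Lindenstrauss bound suggests for a
# pairwise-distance distortion of at most eps; never exceed the input dimension.
n_components = johnson_lindenstrauss_min_dim(n_samples=X.shape[0], eps=0.5)
n_components = min(int(n_components), X.shape[1])

# Project the data to the lower-dimensional space.
projector = GaussianRandomProjection(n_components=n_components, random_state=0)
X_low = projector.fit_transform(X)

# Run the existing OPTICS estimator on the projected data.
# Points labelled -1 are treated as noise.
labels = OPTICS(min_samples=10).fit_predict(X_low)
print(X.shape, "->", X_low.shape)
```

This only shows that both components are already available in the code base; an actual FastOPTICS solver would have to be written from the paper.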

The implementation in the data mining library ELKI (albeit in Java) could be used as an inspiration.

Describe alternatives you've considered, if relevant

No response

Additional context

No response

DiTo97 added the Needs Triage and New Feature labels Mar 12, 2023
@adrinjalali
Member

On its own, the paper doesn't meet our inclusion criteria. But if we treat it as a new solver for OPTICS, and the implementation is not too complex, we could consider including it.

We would also need to see how this interacts with the work being done on HDBSCAN: main...hdbscan (cc @Micky774)

Note that we can NOT use ELKI as inspiration since it's GPL. We went down that road when working on OPTICS, and we had to implement it from the paper rather than from another implementation (which didn't end up being a bad idea anyway).

adrinjalali added the Needs Decision and module:cluster labels and removed the Needs Triage label Mar 12, 2023
@DiTo97
Author

DiTo97 commented Mar 12, 2023

Hi @adrinjalali,

Sorry about the GPL issue; I am not familiar with ELKI and didn't check the license.

As for the inclusion criteria, I guess the one the paper doesn't meet is having 200+ citations, am I right? From a few experiments with ELKI, I would say it really speeds OPTICS up, especially on very high-dimensional data. However, I also noticed that the paper doesn't provide a quantitative analysis of how close its results actually are to those of the base OPTICS in different data scenarios.
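
A rough sketch of how such a closeness check could be set up, assuming the adjusted Rand index as the agreement measure and a plain random projection followed by OPTICS as a stand-in for the approximate method (scikit-learn has no FastOPTICS to compare against). The data, `n_components=50`, and `min_samples=10` are arbitrary illustrative values.

```python
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.random_projection import GaussianRandomProjection

# Synthetic data (assumption: a real study would cover several data scenarios).
X, _ = make_blobs(n_samples=200, n_features=500, centers=3, random_state=0)

# Reference labelling: base OPTICS on the original data.
baseline = OPTICS(min_samples=10).fit_predict(X)

# Stand-in for the approximate method: OPTICS on randomly projected data.
X_low = GaussianRandomProjection(n_components=50, random_state=0).fit_transform(X)
approx = OPTICS(min_samples=10).fit_predict(X_low)

# Agreement between the two labellings (1.0 means identical partitions).
ari = adjusted_rand_score(baseline, approx)
print(f"Adjusted Rand index: {ari:.3f}")
```

Repeating this over different dimensionalities, sample sizes, and noise levels would give the kind of quantitative picture that seems to be missing.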

@Veghit
Contributor

Veghit commented Mar 12, 2023

@adrinjalali what are our criteria? (beginner here trying to learn)

@DiTo97
Author

DiTo97 commented Mar 12, 2023

Hi @Veghit,

I think they're listed here
