Archetypal Analysis #19610

Open
aleixalcacer opened this issue Mar 3, 2021 · 2 comments

aleixalcacer commented Mar 3, 2021

Describe the workflow you want to enable

Archetypal analysis is similar to clustering analysis. However, instead of cluster centers, it seeks extremal points in the dataset (called "archetypes"), sometimes providing more interpretable results than clustering.

Describe your proposed solution

Implement the algorithm proposed in Cutler and Breiman, 1994. Since it is similar to clustering algorithms, it should fit well within the fit / predict / transform interface.
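To make the proposed interface concrete, here is a minimal sketch of what such an estimator could look like. It assumes the usual formulation (minimize ||X - A B X||² with every row of A and B non-negative and summing to 1, so the archetypes B X are convex combinations of the data). Note that it swaps in a simple alternating projected-gradient scheme rather than the alternating constrained least squares of the original paper. The names `ArchetypalAnalysisSketch`, `project_rows_to_simplex`, `archetypes_` and `loss_curve_` are hypothetical and not part of scikit-learn.

```python
import numpy as np


def project_rows_to_simplex(V):
    """Euclidean projection of each row of V onto the probability simplex."""
    n, k = V.shape
    U = np.sort(V, axis=1)[:, ::-1]            # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    ranks = np.arange(1, k + 1)
    rho = (U - css / ranks > 0).sum(axis=1)    # number of entries that stay positive
    theta = css[np.arange(n), rho - 1] / rho   # per-row shift
    return np.maximum(V - theta[:, None], 0.0)


class ArchetypalAnalysisSketch:
    """Hypothetical estimator for illustration; not a scikit-learn class."""

    def __init__(self, n_archetypes=3, n_iter=300, random_state=0):
        self.n_archetypes = n_archetypes
        self.n_iter = n_iter
        self.random_state = random_state

    @staticmethod
    def _step_weights(X, Z, A):
        # One projected-gradient step on ||X - A Z||^2 with archetypes Z fixed.
        step = 1.0 / (np.linalg.norm(Z, 2) ** 2 + 1e-12)
        return project_rows_to_simplex(A - step * (A @ Z - X) @ Z.T)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.random_state)
        n, k = X.shape[0], self.n_archetypes
        A = np.full((n, k), 1.0 / k)
        # Assumption: start the archetypes at k distinct, randomly chosen data points.
        B = np.zeros((k, n))
        B[np.arange(k), rng.choice(n, size=k, replace=False)] = 1.0
        x_norm2 = np.linalg.norm(X, 2) ** 2
        self.loss_curve_ = []
        for _ in range(self.n_iter):
            A = self._step_weights(X, B @ X, A)
            step = 1.0 / (np.linalg.norm(A, 2) ** 2 * x_norm2 + 1e-12)
            B = project_rows_to_simplex(B - step * A.T @ (A @ B @ X - X) @ X.T)
            self.loss_curve_.append(float(np.sum((X - A @ B @ X) ** 2)))
        self.archetypes_ = B @ X               # convex combinations of the rows of X
        return self

    def transform(self, X, n_iter=200):
        """Return convex weights of each sample over the fitted archetypes."""
        X = np.asarray(X, dtype=float)
        A = np.full((X.shape[0], self.n_archetypes), 1.0 / self.n_archetypes)
        for _ in range(n_iter):
            A = self._step_weights(X, self.archetypes_, A)
        return A


# Usage on synthetic points sampled inside a triangle (made-up data).
rng = np.random.default_rng(0)
corners = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
X = rng.dirichlet(np.ones(3), size=60) @ corners
model = ArchetypalAnalysisSketch(n_archetypes=3).fit(X)
weights = model.transform(X)                   # shape (60, 3), each row sums to 1
```

Here `transform` returns the mixture weights of each sample over the archetypes, which is the part that looks more like a decomposition than a clustering.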

In the R ecosystem there is already a specific package for archetypal analysis: archetypes

What do you think about adding this new algorithm? I can open a PR with a proposal :)

jnothman (Member) commented Mar 4, 2021

I think this is much more comparable to decomposition than to clustering. The seminal paper has a relatively modest number of citations given its age, as do the papers cited in CRAN's archetypes, so I am not immediately convinced that this will be widely used by the scikit-learn community. Arguing for it will require an example robustly demonstrating its usefulness for some machine learning problem. Even then, it may be a better candidate for scikit-learn-extra or another external package within scikit-learn's orbit.

aleixalcacer (Author) commented Mar 4, 2021

Yes, it lies somewhere between decomposition and clustering. For example, if the observations are grouped by the closest archetype, archetypal analysis can be used in clustering problems. Here you can see an example of the usefulness of archetypal analysis.
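As a rough sketch of that grouping step, assuming "closest" means smallest Euclidean distance and that the archetypes are already known (the function name `assign_to_closest_archetype` and the numbers below are made up for illustration):

```python
import numpy as np


def assign_to_closest_archetype(X, archetypes):
    """Label each row of X with the index of its nearest archetype."""
    X = np.asarray(X, dtype=float)
    archetypes = np.asarray(archetypes, dtype=float)
    # Squared Euclidean distance from every sample to every archetype.
    d2 = ((X[:, None, :] - archetypes[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


# Made-up archetypes (corners of a triangle) and a few observations.
archetypes = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
X = np.array([[0.5, 0.2], [3.6, 0.1], [2.1, 2.5], [1.0, 0.4]])
labels = assign_to_closest_archetype(X, archetypes)   # array([0, 1, 2, 0])
```

This would be the natural candidate for a `predict` method, while the mixture weights themselves would come from `transform`.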

Anyway, if you are still not fully convinced, I think the best option is to create an external package. That way, other algorithms that do not meet the scikit-learn inclusion requirements could also be implemented there. Are there any guidelines for developing a package that follows the scikit-learn conventions? I have only found this template, but I don't know if it is the right place to start.
