merge Autogmm and GaussianCluster #306

bdpedigo · 2019-12-30T20:28:31Z

There seems to be enough redundant code between the two: they do something similar, and we could still get both behaviors through different options.

bdpedigo · 2020-03-24T22:01:29Z

To clarify, AutoGMM does everything that GaussianCluster does and more. The one difference is that GaussianCluster has an option to run the k-means initialization multiple times, which I have found empirically can lead to better results than any of the AutoGMM initializations. If AutoGMM had a way of running multiple k-means initializations and taking the best, we could deprecate GaussianCluster.
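
For illustration, here is a minimal sketch of "run several k-means initializations and keep the best", built directly on scikit-learn's `GaussianMixture` rather than on AutoGMM or GaussianCluster themselves. The helper name `fit_best_of_kmeans_inits`, the use of BIC to pick the winner, the seed scheme, and the synthetic data are all assumptions made for the example, not the existing API.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_best_of_kmeans_inits(X, n_components, n_kmeans_inits=5):
    """Fit one GMM per k-means-seeded start and keep the lowest-BIC fit.

    Hypothetical helper for illustration; not part of AutoGMM or GaussianCluster.
    """
    best_model, best_bic = None, float("inf")
    for seed in range(n_kmeans_inits):
        # init_params="kmeans" seeds EM from a k-means run; changing
        # random_state gives a different k-means start each time.
        model = GaussianMixture(
            n_components=n_components,
            init_params="kmeans",
            n_init=1,
            random_state=seed,
        ).fit(X)
        bic = model.bic(X)
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model, best_bic


# Two well-separated synthetic blobs (made-up data for the example).
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])
model, bic = fit_best_of_kmeans_inits(X, n_components=2)
```

`GaussianMixture` also accepts `n_init`, which repeats the initialization internally and keeps the best fit; the explicit loop is shown only to make the selection step visible.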

CaseyWeiner · 2020-04-17T21:57:21Z

I'll get this done this semester

bdpedigo · 2020-11-13T17:46:09Z

https://arxiv.org/abs/1909.02688

PerifanosPrometheus · 2020-11-15T23:35:15Z

@bdpedigo I will take this one, sounds fun! : )

DoD:

Implementation:
- Add option in AutoGMM class to run k-means multiple times:
  - Must be simple and intuitive to call for the user
  - Must be able to specify min_components and max_components to use
    - If max_components is None, then it must use min_components as the maximum number (analogously to gclust)
  - Must be integrated with the k-means method already present in the AutoGMM class
Testing:
- Must produce the same output as gclust: the difference between the number of clusters assigned by gclust and by the merged functionality in AutoGMM must be exactly 0, given the same input data and the same settings
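
The min_components/max_components rule above could be sketched as follows. The function name `resolve_component_range` is hypothetical, and the lower bound of 1 when max_components is None is an assumption about gclust's behavior that is not stated in this thread.

```python
def resolve_component_range(min_components, max_components=None):
    """Return the list of component counts to try.

    Hypothetical helper. Assumption: when max_components is None,
    min_components becomes the maximum and the search starts at 1.
    """
    if min_components < 1:
        raise ValueError("min_components must be at least 1")
    if max_components is None:
        lower, upper = 1, min_components
    elif max_components < min_components:
        raise ValueError("max_components must be >= min_components")
    else:
        lower, upper = min_components, max_components
    return list(range(lower, upper + 1))
```

For example, `resolve_component_range(2, 5)` covers 2 through 5 components, while `resolve_component_range(3)` covers 1 through 3 under the stated assumption.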

This is an initial, still largely qualitative assessment of what the final implementation should do. I am sure it will become progressively more quantitative.

Please let me know if you have any suggestions/ideas.

PerifanosPrometheus · 2020-11-16T17:52:36Z

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoLarsIC.html

bdpedigo · 2020-11-16T17:54:13Z

see criterion_ for example
criterion_: array-like of shape (n_alphas,) The value of the information criteria (‘aic’, ‘bic’) across all alphas. The alpha which has the smallest information criterion is chosen. This value is larger by a factor of n_samples compared to Eqns. 2.15 and 2.16 in (Zou et al, 2007).
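
As a small illustration of the attribute quoted above, the sketch below fits `LassoLarsIC` on synthetic regression data and reads `criterion_` alongside `alphas_` and `alpha_`. The data and variable names are made up for the example; how a similar attribute would be exposed on AutoGMM is not decided in this thread.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoLarsIC

# Synthetic data purely for demonstration.
X_reg, y_reg = make_regression(
    n_samples=100, n_features=10, n_informative=3, noise=5.0, random_state=0
)

lasso = LassoLarsIC(criterion="bic").fit(X_reg, y_reg)

# criterion_ holds one information-criterion value per candidate alpha;
# the selected alpha_ is the one with the smallest value.
best_index = int(np.argmin(lasso.criterion_))
selected_alpha = lasso.alphas_[best_index]
```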

bdpedigo · 2020-11-17T21:50:14Z

@PerifanosPrometheus: @tliu68 is working on some changes to the internals of AutoGMM (see #371). I think this will just affect the initialization. You two might want to coordinate a bit so that the merge is easy at the end!

bdpedigo · 2020-11-17T21:51:11Z

@tliu68 this PR from @PerifanosPrometheus will more or less add 1 or more runs of kmeans as an initialization in AutoGMM, and maybe will rename the class itself.

tliu68 · 2020-11-18T06:07:03Z

Cool! @PerifanosPrometheus I'll let you know what specific changes I'm going to make soon. Thanks!

PerifanosPrometheus · 2020-11-18T20:04:56Z

@tliu68 I am currently also in the planning stage, so I haven't finalized any changes yet. Please let me know if there is anything I should be particularly careful with.

tliu68 · 2020-11-19T14:14:51Z

@PerifanosPrometheus I've opened a draft PR, #589. Please see my description of all the major changes I made. My next step is to clean up and reorganize the for loops and see if I can incorporate parallel processing, but I believe the major updates we wanted to make have been addressed. Please let me know if you have any questions; all suggestions and advice are welcome!

jovo · 2021-01-26T18:52:06Z

@tliu68 @PerifanosPrometheus please coordinate

PerifanosPrometheus · 2021-01-28T18:47:59Z

@tliu68 I would like to continue working on this issue, but I noticed that PR #589 has been merged. Could you give me an update on the changes that were merged?

Mainly, I believe that fixing this issue will involve modifying the param_grid dict from the previous implementation. Has that been changed?

Lastly, I heard that you were planning on merging the function into sklearn (I'm not entirely sure this is correct). If so, what would be the timeline for making my changes, if they are even still desirable?

tliu68 · 2021-01-29T12:22:46Z

Yes, I am working on merging this into sklearn (described in issue #601). The previous PR #589 basically changed the initialization process. Instead of running AgglomerativeClustering and GaussianMixture for all combinations in the old param_grid in parallel, it now works as follows:

- For each set of agglomerative parameters:
  - AgglomerativeClustering is run for the smallest n_components (which is why I added another output, param_grid_ag, from _process_paramgrid: to separate out the agglomerative parameters)
  - A hierarchy of clusters is generated based on that assignment
- Then, after calculating all agglomerative clustering assignments, GaussianMixture is run for all combinations in param_grid
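
The general idea of building one hierarchy and reusing it for every n_components can be sketched as below. This is not the code from PR #589: it swaps in SciPy's `linkage` and `cut_tree` in place of AgglomerativeClustering, initializes only the GMM means from the cut, and the function name `fit_gmms_from_one_hierarchy` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import cut_tree, linkage
from sklearn.mixture import GaussianMixture


def fit_gmms_from_one_hierarchy(X, component_range, method="ward"):
    """Build the merge tree once, then fit one GMM per component count.

    Hypothetical helper; returns a dict mapping n_components to BIC.
    """
    tree = linkage(X, method=method)  # the only agglomerative pass
    bics = {}
    for k in component_range:
        # Cut the same tree at k clusters instead of re-clustering.
        labels = cut_tree(tree, n_clusters=k).ravel()
        means = np.vstack([X[labels == c].mean(axis=0) for c in range(k)])
        gmm = GaussianMixture(
            n_components=k, means_init=means, random_state=0
        ).fit(X)
        bics[k] = gmm.bic(X)
    return bics


# Three well-separated synthetic blobs (made-up data for the example).
rng_h = np.random.RandomState(0)
X_h = np.vstack(
    [rng_h.normal(center, 1.0, (60, 2)) for center in (0.0, 10.0, 20.0)]
)
bics = fit_gmms_from_one_hierarchy(X_h, range(1, 5))
```

The point of the sketch is only that the expensive hierarchical step happens once per set of agglomerative parameters, while the mixture model is still fit for every candidate n_components.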

Does that make sense? Also, Ben and I will probably make more changes (to be discussed) but it would be great if we can start a PR into sklearn soon. Do you mind sharing your plan?

bdpedigo assigned bdpedigo and hhelm10 and unassigned bdpedigo and hhelm10 Jan 10, 2020

bdpedigo added the help wanted, good first issue, and summer_students? labels Mar 24, 2020

bdpedigo changed the title ~~merge Autogmm and GaussianCluster~~ merge Autogmm and GaussianCluster, PR to sklearn Apr 17, 2020

bdpedigo assigned CaseyWeiner Apr 17, 2020

bdpedigo removed the good first issue label May 29, 2020

bdpedigo unassigned CaseyWeiner Jun 29, 2020

bdpedigo added this to To Do in pedigo-help-wanted Jun 29, 2020

bdpedigo assigned asaadeldin11 Aug 4, 2020

bdpedigo removed the help wanted and summer_students? labels Aug 4, 2020

bdpedigo mentioned this issue Aug 4, 2020

fix AutoGMM to only run agglomerative clustering once #371

Closed

bdpedigo moved this from Code (To Do) to In Progress in pedigo-help-wanted Aug 4, 2020

bdpedigo mentioned this issue Aug 4, 2020

remove ari return value from AutoGMM.fit_predict #369

Closed

bdpedigo mentioned this issue Aug 21, 2020

Use GridSearch for GClust/KClust internals rather than having our own stuff #204

Closed

daxpryce unassigned asaadeldin11 Sep 15, 2020

daxpryce removed this from In Progress in pedigo-help-wanted Sep 15, 2020

bdpedigo changed the title ~~merge Autogmm and GaussianCluster, PR to sklearn~~ merge Autogmm and GaussianCluster Nov 10, 2020

bdpedigo added this to Code improvements in Neuro Data Design Nov 10, 2020

bdpedigo mentioned this issue Nov 10, 2020

Adding different model selection criteria for GMM #581

Closed

tliu68 mentioned this issue Feb 6, 2021

merge AutoGMM into Sklearn #601

Closed

PerifanosPrometheus mentioned this issue Feb 9, 2021

Add option for more than one kmeans init to autogmm #662

Merged

bdpedigo closed this as completed in #662 Aug 24, 2021
