merge Autogmm and GaussianCluster #306

Closed
bdpedigo opened this issue Dec 30, 2019 · 14 comments · Fixed by #662

Comments

@bdpedigo
Collaborator

There seems to be enough redundant code to merge them: both classes do something similar, and we could still get both behaviors through different options.

@bdpedigo bdpedigo assigned bdpedigo and hhelm10 and unassigned bdpedigo and hhelm10 Jan 10, 2020
@bdpedigo
Collaborator Author

To clarify, AutoGMM does everything that GaussianCluster does and more. The difference is that GaussianCluster has an option to run k-means initialization multiple times, which I have found empirically can lead to better results than any of the AutoGMM initializations. If AutoGMM had a way to run multiple k-means initializations and keep the best, we could deprecate GaussianCluster.
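For reference, the "run k-means initialization several times and keep the best" behavior can be sketched with plain scikit-learn. This is a minimal illustration, not GaussianCluster's actual code, and `best_of_kmeans_inits` is a hypothetical name. It selects by BIC; for a fixed `n_components` and `covariance_type` the parameter count is constant, so this is equivalent to keeping the highest-likelihood fit. scikit-learn's own `n_init` argument to `GaussianMixture` performs a similar best-of-several loop internally.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture


def best_of_kmeans_inits(X, n_components, n_inits=10, covariance_type="full", seed=0):
    """Run several k-means-initialized GMM fits and keep the one with the lowest BIC."""
    rng = np.random.RandomState(seed)
    best_model, best_bic = None, np.inf
    for _ in range(n_inits):
        # A fresh random_state gives each fit a different k-means starting point.
        model = GaussianMixture(
            n_components=n_components,
            covariance_type=covariance_type,
            init_params="kmeans",
            random_state=int(rng.randint(np.iinfo(np.int32).max)),
        ).fit(X)
        bic = model.bic(X)
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model, best_bic


X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
model, bic = best_of_kmeans_inits(X, n_components=3)
```

Looping explicitly (rather than passing `n_init`) makes the per-initialization scores available if one wanted to report them.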

@bdpedigo bdpedigo added help wanted Extra attention is needed good first issue Good for newcomers summer_students? Task that may be of interest for a short project labels Mar 24, 2020
@bdpedigo bdpedigo changed the title merge Autogmm and GaussianCluster merge Autogmm and GaussianCluster, PR to sklearn Apr 17, 2020
@CaseyWeiner
Contributor

I'll get this done this semester

@bdpedigo bdpedigo removed the good first issue Good for newcomers label May 29, 2020
@bdpedigo bdpedigo added this to To Do in pedigo-help-wanted Jun 29, 2020
@bdpedigo bdpedigo removed help wanted Extra attention is needed summer_students? Task that may be of interest for a short project labels Aug 4, 2020
@bdpedigo bdpedigo moved this from Code (To Do) to In Progress in pedigo-help-wanted Aug 4, 2020
@daxpryce daxpryce removed this from In Progress in pedigo-help-wanted Sep 15, 2020
@bdpedigo bdpedigo changed the title merge Autogmm and GaussianCluster, PR to sklearn merge Autogmm and GaussianCluster Nov 10, 2020
@bdpedigo bdpedigo added this to Code improvements in Neuro Data Design Nov 10, 2020
@bdpedigo
Collaborator Author

@PerifanosPrometheus
Contributor

@bdpedigo I will take this one, sounds fun! : )

DoD:

  • Implementation:
    • Add option in AutoGMM class to run k-means multiple times:
      • Must be simple and intuitive to call for the user
      • Must be able to specify min_components and max_components to use
        - If max_components is None, min_components must also be used as the maximum (analogous to gclust)
      • Must be integrated with the k-means method already present in the AutoGMM class
  • Testing:
    • Must produce output identical to gclust: given the same input data and the same settings, the number of clusters assigned by gclust and by the merged AutoGMM functionality must differ by exactly 0

This is an initial, largely qualitative assessment of what the final implementation should do. I am sure it will become more quantitative as the work progresses.

Please let me know if you have any suggestions/ideas.
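The max_components rule in the DoD above could be captured by a small helper like the one below. `resolve_component_range` is a hypothetical name used only for illustration; it is not part of AutoGMM or GaussianCluster.

```python
from typing import List, Optional


def resolve_component_range(min_components: int, max_components: Optional[int] = None) -> List[int]:
    """Return every candidate number of mixture components to try.

    Hypothetical helper: when max_components is None, min_components is also
    used as the maximum, so exactly one value is fit.
    """
    if max_components is None:
        max_components = min_components
    if min_components < 1:
        raise ValueError("min_components must be at least 1")
    if max_components < min_components:
        raise ValueError("max_components must be >= min_components")
    return list(range(min_components, max_components + 1))


print(resolve_component_range(2))     # [2]
print(resolve_component_range(2, 5))  # [2, 3, 4, 5]
```

For the testing criterion, both estimators would likely need the same random_state so that their initializations, and therefore their selected number of clusters, can be compared exactly.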

@PerifanosPrometheus
Contributor

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoLarsIC.html

@bdpedigo
Collaborator Author

See criterion_, for example:
criterion_: array-like of shape (n_alphas,)
The value of the information criteria (‘aic’, ‘bic’) across all alphas. The alpha which has the smallest information criterion is chosen. This value is larger by a factor of n_samples compared to Eqns. 2.15 and 2.16 in (Zou et al, 2007).
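Applied to mixture models, the same convention might look like the minimal sketch below: fit one model per candidate n_components, store the BIC of every candidate in a criterion_ array, and select the smallest. `fit_with_criterion` is a hypothetical helper built on plain scikit-learn `GaussianMixture`, not AutoGMM's API.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture


def fit_with_criterion(X, component_range, n_init=5, random_state=0):
    """Fit one GaussianMixture per candidate n_components and record each BIC.

    Like LassoLarsIC, criterion_ holds the criterion for every candidate and
    the candidate with the smallest value is selected.
    """
    models = [
        GaussianMixture(n_components=k, n_init=n_init, random_state=random_state).fit(X)
        for k in component_range
    ]
    criterion_ = np.array([m.bic(X) for m in models])
    best = int(np.argmin(criterion_))
    return models[best], criterion_


X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
best_model, criterion_ = fit_with_criterion(X, range(1, 7))
print(criterion_.shape)  # (6,)
```

Exposing the whole array, rather than only the winner, lets users plot the criterion curve and judge how decisive the selection was.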

@bdpedigo
Collaborator Author

@PerifanosPrometheus: @tliu68 is working on some changes to the internals of AutoGMM (see #371). I think this will just affect the initialization. You two might want to coordinate a bit so that the merge is easy at the end!

@bdpedigo
Collaborator Author

bdpedigo commented Nov 17, 2020

@tliu68 this PR from @PerifanosPrometheus will more or less add one or more runs of k-means as an initialization in AutoGMM, and may rename the class itself.

@tliu68
Contributor

tliu68 commented Nov 18, 2020

Cool! @PerifanosPrometheus I'll let you know what specific changes I'm going to make soon. Thanks!

@PerifanosPrometheus
Contributor

@tliu68 Currently, I am also in the planning stage, so I haven't finalized any changes yet. Please let me know if there is anything I should be particularly careful with.

@tliu68
Contributor

tliu68 commented Nov 19, 2020

@PerifanosPrometheus I've opened a draft PR #589. Please see my description of all the major changes I made. My next step is to clean up/reorganize the for loops and see whether I can incorporate parallel processing, but I believe the major updates we wanted to make have been addressed. Please let me know if you have any questions; all suggestions and advice are welcome!
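One way the planned parallelization of the for loops could look is sketched below, assuming joblib (which scikit-learn itself depends on) and sklearn's `ParameterGrid`. `_fit_one` and `parallel_grid_bic` are hypothetical names; this is not the code in #589.

```python
from joblib import Parallel, delayed
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import ParameterGrid


def _fit_one(X, params, random_state):
    # Fit one candidate model and report its settings alongside its BIC.
    model = GaussianMixture(random_state=random_state, **params).fit(X)
    return params, model.bic(X)


def parallel_grid_bic(X, param_grid, n_jobs=2, random_state=0):
    # One joblib call replaces a nested for loop over every grid combination.
    # prefer="threads" avoids copying X into worker processes; NumPy releases
    # the GIL for much of the heavy linear algebra.
    return Parallel(n_jobs=n_jobs, prefer="threads")(
        delayed(_fit_one)(X, params, random_state)
        for params in ParameterGrid(param_grid)
    )


X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
grid = {"n_components": [2, 3, 4], "covariance_type": ["full", "diag"]}
results = parallel_grid_bic(X, grid)
best_params, best_bic = min(results, key=lambda r: r[1])
```

Whether threads or processes are faster would depend on data size; switching to joblib's default process-based backend is a one-argument change.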

@jovo

jovo commented Jan 26, 2021

@tliu68 @PerifanosPrometheus please coordinate

@PerifanosPrometheus
Copy link
Contributor

@tliu68 I would like to continue working on this issue, but I noticed that PR #589 has been merged. Could you give me an update on the changes that were merged?

Mainly, I believe that fixing this issue will involve modifying the param_grid dict from the previous implementation. Has that changed?

Lastly, I heard that you are planning to merge the function into sklearn (I'm not entirely sure this is correct). If so, what would the timeline be for my changes (if they are still desirable)?

@tliu68
Contributor

tliu68 commented Jan 29, 2021

Yes, I am working on merging this into sklearn (described in issue #601). The previous PR #589 changed the initialization process so that, instead of running AgglomerativeClustering and GaussianMixture for all combinations in the old param_grid in parallel, it now works as follows:

  • for each set of agglomerative parameters,
  • AgglomerativeClustering is run for the smallest n_components (which is why I added another output param_grid_ag from _process_paramgrid -- to separate out the agglomerative parameters)
  • and a hierarchy of clusters is generated based on that assignment,
  • then after calculating all agglomerative clustering assignments, GaussianMixture is run for all combinations in param_grid

Does that make sense? Also, Ben and I will probably make more changes (to be discussed) but it would be great if we can start a PR into sklearn soon. Do you mind sharing your plan?
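The reorganized initialization tliu68 describes can be loosely sketched as follows. This is an illustration under stated assumptions, not the code from #589: it swaps in SciPy's `linkage`/`fcluster` for sklearn's AgglomerativeClustering, builds the tree once per agglomerative setting and cuts it at every candidate n_components, then uses each cut's cluster means as `means_init` for GaussianMixture over every combination in a separate GMM grid. The helper names and grid keys are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import ParameterGrid


def hierarchical_inits(X, method, component_range):
    # Build the hierarchy once for this agglomerative setting.
    Z = linkage(X, method=method)
    inits = {}
    for k in component_range:
        # Cut the same tree into at most k clusters and take each cluster's mean.
        labels = fcluster(Z, t=k, criterion="maxclust")
        inits[k] = np.array([X[labels == c].mean(axis=0) for c in np.unique(labels)])
    return inits


def fit_all(X, ag_grid, gmm_grid, component_range, random_state=0):
    results = []
    for ag_params in ParameterGrid(ag_grid):
        # All agglomerative assignments are computed before any GMM is fit.
        inits = hierarchical_inits(X, ag_params["method"], component_range)
        for gmm_params in ParameterGrid(gmm_grid):
            for means in inits.values():
                model = GaussianMixture(
                    n_components=len(means),
                    means_init=means,
                    random_state=random_state,
                    **gmm_params,
                ).fit(X)
                results.append((ag_params, gmm_params, len(means), model.bic(X)))
    return results


X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
ag_grid = {"method": ["ward", "average"]}
gmm_grid = {"covariance_type": ["full", "diag"]}
results = fit_all(X, ag_grid, gmm_grid, component_range=range(2, 5))
best = min(results, key=lambda r: r[3])
```

Keeping the agglomerative and GMM grids separate mirrors the idea behind param_grid_ag: the expensive tree is built once per agglomerative setting instead of once per full parameter combination.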

Labels: None yet
Projects: Neuro Data Design (Code improvements)

7 participants