Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option for more than one kmeans init to autogmm #662

Merged
merged 29 commits into from Aug 24, 2021

Conversation

PerifanosPrometheus
Copy link
Contributor

@PerifanosPrometheus PerifanosPrometheus commented Feb 9, 2021

Reference Issues/PRs

Closes #306

What does this implement/fix? Briefly explain your changes.

Adds option to run multiple k-means initializations to autogmm

Any other comments?

@netlify
Copy link

netlify bot commented Feb 9, 2021

❌ Deploy Preview for graspologic failed.

🔨 Explore the source changes: 0c998d5

🔍 Inspect the deploy log: https://app.netlify.com/sites/graspologic/deploys/61250197d1bec00007f1c1e8

@tliu68 tliu68 self-assigned this Feb 11, 2021
@PerifanosPrometheus
Copy link
Contributor Author

@tiliu68 I am open to continue working on improving autogmm if you have any other implementations, even simple such as this one in mind.

@tliu68
Copy link
Contributor

tliu68 commented Feb 17, 2021

@tiliu68 I am open to continue working on improving autogmm if you have any other implementations, even simple such as this one in mind.

Great. Thanks! Any other comments from @bdpedigo? If not, I will update this change into GaussianMixtureIC here

@bdpedigo
Copy link
Collaborator

talked to @PerifanosPrometheus yesterday and I don't think this current PR does what we want, as just changing n_init for each GaussianMixture object that we are already fitting won't change anything (I think).

@bdpedigo
Copy link
Collaborator

I proposed modifying the param_grid somewhere to add a bunch of n_init=1 GaussianMixture objects without any agglomerative initialization, or something like that

@PerifanosPrometheus
Copy link
Contributor Author

PerifanosPrometheus commented Feb 23, 2021

I proposed modifying the param_grid somewhere to add a bunch of n_init=1 GaussianMixture objects without any agglomerative initialization, or something like that

Just pushed the modified code and it should now only append models ran with multiple k-means initializations with no linkage when n_init is an integer and label_init is None. The param_grid is modified in the process_paramgrid function.

@PerifanosPrometheus
Copy link
Contributor Author

@tliu68 I think this PR is ready for review! Please let me know if something in my code isn't clear or could be done more efficiently, I am always open to suggestions and improvements.

@PerifanosPrometheus
Copy link
Contributor Author

@tliu68 I have made some changes requested by Pedigo such as initializing every run with a different seed(even for runs with no multiple initialization), I am now adding rows with n_init: 1. I have yet to remove the redundant run_multiple_init condition as I believe it simplifies reading of the if statement in the process_paramgrid function(in the sense that I think it is much easier to see if run_multiple_init rather than if n_init and label_init is None)but can certainly be removed if requested. Please let me know what you think and if you have suggestions for changing the code.

@PerifanosPrometheus
Copy link
Contributor Author

@tliu68 I should have addressed the feedback provided, please let me know if you would like me to make some other changes : )

@tliu68
Copy link
Contributor

tliu68 commented Apr 29, 2021

No more comments from me. BTW, @PerifanosPrometheus I adapted part of your code into our PR to sklearn and added your name and jhu email here (please let me know if it's not correct).

@PerifanosPrometheus
Copy link
Contributor Author

PerifanosPrometheus commented Apr 29, 2021

No more comments from me. BTW, @PerifanosPrometheus I adapted part of your code into our PR to sklearn and added your name and jhu email here (please let me know if it's not correct).

That is correct! Thank you very much for helping me out throughout the whole process! Will update the code to match that in the sklearn PR

@bdpedigo
Copy link
Collaborator

@tliu68 if you approve can you approve on github?

Copy link
Contributor

@tliu68 tliu68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed 'n_init' to 'kmeans_n_init'. And I think the code looks good.

@PerifanosPrometheus
Copy link
Contributor Author

@tliu68 Thank you very much for everything!

@tliu68
Copy link
Contributor

tliu68 commented May 3, 2021

@tliu68 Thank you very much for everything!

My pleasure!

@bdpedigo bdpedigo changed the title Merge gclust to autogmm Add option for more than one kmeans init to autogmm May 3, 2021
@bdpedigo
Copy link
Collaborator

bdpedigo commented May 3, 2021

@tliu68 can you re-review? there are some issues:

  1. The code doesn't actually run with kmeans_n_init > 1
  2. There are no tests for this new functionality, which I think is why this was not caught.

@PerifanosPrometheus
Copy link
Contributor Author

@tliu68 can you re-review? there are some issues:

  1. The code doesn't actually run with kmeans_n_init > 1
  2. There are no tests for this new functionality, which I think is why this was not caught.

Apologies for the inconvenience! Will quickly check the new functionality and add tests

@PerifanosPrometheus
Copy link
Contributor Author

@Bpedigo Solved the run error. Essentially on the last commits when tliu changed from n_init to kmeans_n_init she forgot that the gmparams appended should not have changed also to kmeans_n_init since GaussianMixture in sklearn only accepts n_init as parameters for kmeans initializations. Would you still like me to add some tests? If so what type of test could I add?

@bdpedigo
Copy link
Collaborator

bdpedigo commented May 4, 2021

@PerifanosPrometheus I'd just like a test showing that the code runs basically. Can copy one of the simple tests that are already there and just change that one parameter to kmeans_n_init=2 or something

@tliu68 tliu68 self-requested a review May 4, 2021 12:55
@tliu68
Copy link
Contributor

tliu68 commented May 4, 2021

Sorry, that was my mistake. As for adding new test, maybe we can have something very similar to the test_two_class here and maybe change this line to AutoGMM = AutoGMMCluster(max_components=5, kmeans_n_init=2) and rename the test to something like test_two_class_multiple_kmeans_inits?

@PerifanosPrometheus
Copy link
Contributor Author

@Bpedigo and @tliu68 will get it done and thank you for the suggestions!

@bdpedigo
Copy link
Collaborator

@tliu68 any chance you'd mind resolving conflicts on this, I'm not sure what has changed since this PR. Do you still think it is good to go?

@tliu68
Copy link
Contributor

tliu68 commented Aug 24, 2021

@tliu68 any chance you'd mind resolving conflicts on this, I'm not sure what has changed since this PR. Do you still think it is good to go?

I believe I've resolved them. And I think it looks good and does what we wanted.

@bdpedigo bdpedigo merged commit 29e1892 into graspologic-org:dev Aug 24, 2021
@bdpedigo
Copy link
Collaborator

@tliu68 any chance you'd mind resolving conflicts on this, I'm not sure what has changed since this PR. Do you still think it is good to go?

I believe I've resolved them. And I think it looks good and does what we wanted.

Thanks for reviewing @tliu68 and thanks for the PR @PerifanosPrometheus !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

merge Autogmm and GaussianCluster
3 participants