Corner cases for sil_score calculation in _gather_results() #19

Closed
Hu-JIN opened this issue Mar 16, 2021 · 1 comment

Hu-JIN commented Mar 16, 2021

Some clusters contain only one sample. For such a cluster, the sil_score of the corresponding signature is currently 0. It might be better to set the sil_score to 1 in those cases.

Example: I ran DenovoSig with mvNMF on simulated PCAWG Lung-AdenoCA data, with init=random and n_replicates=20. SBS3 is difficult to discover. It turns out that in 19 of the 20 replicates, two SBS4's are extracted, while in the remaining replicate, an SBS3 is discovered. So in the final clustering result, there is an SBS3 cluster with 1 sample and an SBS4 cluster with 39 samples.

This example also illustrates why it may not be good to force each cluster to contain one and only one sample from each replicate, as done in SigProfiler.
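
To make the corner case concrete, here is a minimal pure-Python sketch of a per-cluster silhouette with a configurable score for single-sample clusters. It is not the code in _gather_results(): the function name `cluster_silhouettes`, the `singleton_score` parameter, the Euclidean distance, and the toy points are all assumptions made for illustration, and the project may use a different distance metric.

```python
import math
from collections import defaultdict


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cluster_silhouettes(points, labels, singleton_score=1.0):
    """Mean silhouette per cluster (hypothetical helper, not the project's API).

    Clusters with a single member get `singleton_score` instead of the
    conventional 0.
    """
    members = defaultdict(list)
    for idx, lab in enumerate(labels):
        members[lab].append(idx)
    if len(members) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = {}
    for lab, idxs in members.items():
        if len(idxs) == 1:
            # The within-cluster distance is undefined for a singleton,
            # so the score is set by convention rather than computed.
            scores[lab] = singleton_score
            continue
        total = 0.0
        for i in idxs:
            # a: mean distance to the other members of the same cluster
            a = sum(euclidean(points[i], points[j]) for j in idxs if j != i)
            a /= len(idxs) - 1
            # b: smallest mean distance to the members of any other cluster
            b = min(
                sum(euclidean(points[i], points[j]) for j in other) / len(other)
                for other_lab, other in members.items()
                if other_lab != lab
            )
            denom = max(a, b)
            total += (b - a) / denom if denom > 0 else 0.0
        scores[lab] = total / len(idxs)
    return scores


# Toy stand-in for the situation above: one large cluster and one singleton.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
labels = ["SBS4", "SBS4", "SBS4", "SBS3"]
scores = cluster_silhouettes(points, labels)
scores_zero = cluster_silhouettes(points, labels, singleton_score=0.0)
```

With `singleton_score=0.0`, the lone SBS3 cluster scores 0 even though it is well separated from SBS4. With the default of 1, it is treated as a perfectly stable cluster, which is the behavior proposed here.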

Hu-JIN commented Jul 19, 2021

See eebdf3b

@Hu-JIN Hu-JIN closed this as completed Jul 19, 2021