Change the default value of n_replicates_filter_ratio_thresh to 0.2.
We don't want the stability definition to depend too heavily on the ratio n_replicates_after_filtering/n_replicates. Previously we used 0.5. That made sense, since "stable" arguably means that a majority (i.e., > 50%) of the solutions are equivalently good. However, I observed that mvNMF solutions can be quite unstable, because we only tune lambda_tilde for one of the mvNMF runs. This results in a relatively lower ratio of n_replicates_after_filtering/n_replicates, so I'm lowering the threshold here.
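To make the effect of the new default concrete, here is a minimal sketch that compares the old (0.5) and new (0.2) thresholds for 10 replicates. The helper name passes_replicate_ratio_filter is hypothetical and is not the repository's actual function; it only assumes the rule n_replicates_after_filtering / n_replicates >= n_replicates_filter_ratio_thresh. Fraction is used so that boundary cases compare exactly.

```python
from fractions import Fraction


# Hypothetical helper (not the repository's code): a solution passes the
# replicate-ratio filter when at least a fraction `thresh` of the
# replicates survive filtering.
def passes_replicate_ratio_filter(n_after, n_replicates, thresh):
    # Exact rational comparison avoids float rounding at the boundary.
    return Fraction(n_after, n_replicates) >= Fraction(thresh)


OLD_THRESH = "0.5"  # previous default
NEW_THRESH = "0.2"  # default after this commit

n_replicates = 10
for n_after in range(1, n_replicates + 1):
    old = passes_replicate_ratio_filter(n_after, n_replicates, OLD_THRESH)
    new = passes_replicate_ratio_filter(n_after, n_replicates, NEW_THRESH)
    print(f"{n_after}/{n_replicates}: old={old} new={new}")
```

Under this sketch, at least 5 of 10 replicates must survive with the old default, but only 2 of 10 with the new one, which leaves room for the lower survival ratios seen with mvNMF.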
caf9254
Let's think about it this way.
Previously, we didn't use n_replicates_after_filtering at all in defining stability. The problem is that when only one replicate is left after filtering, it receives a sil_score of 1 by our definition, according to eebdf3b. That is unwanted: we want to consider such a solution unstable, because most of the solutions have already been filtered out. As long as more than one replicate remains, sil_score is well defined, and we don't need to rely on n_replicates_after_filtering to define stability.
So, as long as n_replicates is sufficiently large and n_replicates_filter_ratio_thresh is not too small, these cases are taken care of.
With n_replicates_filter_ratio_thresh set to 0.2, cases with a single replicate left are filtered out as long as n_replicates > 5. At the boundary, if n_replicates = 5 and n_replicates_after_filtering = 1, the solution is not filtered out, i.e., it is still considered stable, because 1/5 >= 0.2.
In practice, the smallest n_replicates we would use is 10. In that case, n_replicates_after_filtering = 1 is considered unstable (1/10 < 0.2), and n_replicates_after_filtering = 2 is considered stable (2/10 >= 0.2). In the latter case, sil_score is then used to further assess stability, and since more than one replicate remains, sil_score is well defined, so we are good.
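The boundary arithmetic above can be checked with a short sketch. It assumes only the ratio rule stated in this comment; the helper names are hypothetical and are not taken from the repository. It finds the smallest n_replicates at which a lone surviving replicate gets filtered out, and checks the n_replicates = 5 and n_replicates = 10 cases.

```python
from fractions import Fraction


# Hypothetical helper mirroring the rule described above (not the
# repository's actual code): a solution passes the ratio filter if
# n_replicates_after_filtering / n_replicates >= n_replicates_filter_ratio_thresh.
def passes_replicate_ratio_filter(n_after, n_replicates, thresh=Fraction(1, 5)):
    # Fraction keeps the boundary comparison (e.g. 1/5 >= 0.2) exact.
    return Fraction(n_after, n_replicates) >= thresh


def smallest_n_filtering_lone_survivor(thresh=Fraction(1, 5)):
    # A single surviving replicate is filtered out once 1/n < thresh,
    # i.e. n > 1/thresh; search upward from 1 for the first such n.
    n = 1
    while passes_replicate_ratio_filter(1, n, thresh):
        n += 1
    return n


print(smallest_n_filtering_lone_survivor())  # 6
print(passes_replicate_ratio_filter(1, 5))   # True: 1/5 >= 0.2
print(passes_replicate_ratio_filter(1, 10))  # False: 1/10 < 0.2
print(passes_replicate_ratio_filter(2, 10))  # True: 2/10 >= 0.2
```

With a threshold of 0.2 the search returns 6, matching the "n_replicates > 5" condition; for n_replicates = 10, only solutions with at least two surviving replicates move on to the sil_score check.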