Commit

Change the default value of n_replicates_filter_ratio_thresh to 0.2.
We don't want the stability definition to depend too much on n_replicates_after_filtering/n_replicates.

Previously we used 0.5. That makes sense, since 'stable' arguably means that at least a majority (i.e., > 50%) of the solutions are equivalently good.

But I observed that mvNMF solutions can be quite unstable, because we tune lambda_tilde for only one of the mvNMF runs. This results in a relatively low ratio of n_replicates_after_filtering/n_replicates, so I'm lowering the threshold here.
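To make the effect of the new default concrete, here is a minimal sketch of the ratio check, assuming a solution passes when n_replicates_after_filtering / n_replicates >= n_replicates_filter_ratio_thresh. The helper `passes_ratio_filter` is hypothetical and not part of musical; the real logic inside `_select_n_components` may differ.

```python
def passes_ratio_filter(n_replicates_after_filtering, n_replicates,
                        n_replicates_filter_ratio_thresh=0.2):
    """Hypothetical helper: True if enough replicates survive filtering.

    Assumes the check is a plain ratio compared with >=; the actual
    logic in musical/denovo.py may differ.
    """
    if n_replicates <= 0:
        raise ValueError("n_replicates must be positive")
    ratio = n_replicates_after_filtering / n_replicates
    return ratio >= n_replicates_filter_ratio_thresh


# With 10 replicates, compare the old (0.5) and new (0.2) thresholds.
for kept in (1, 2, 3, 5, 6):
    old = passes_ratio_filter(kept, 10, 0.5)
    new = passes_ratio_filter(kept, 10, 0.2)
    print(f"{kept}/10 kept: old threshold -> {old}, new threshold -> {new}")
```

Under these assumptions, a run that keeps only 2 to 4 of 10 replicates, as can happen with the less stable mvNMF solutions, fails the old 0.5 threshold but passes the new 0.2 one, so its stability is then decided by the silhouette-score thresholds instead.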
Hu-JIN committed Jul 19, 2021
1 parent 46efc98 commit caf9254
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions musical/denovo.py
@@ -240,7 +240,7 @@ def _gather_results(X, Ws, Hs=None, method='cluster_by_matching', n_components=N
def _select_n_components(n_components_all, samplewise_reconstruction_errors_all, sil_score_all,
n_replicates, n_replicates_after_filtering_all,
pthresh=0.05, sil_score_mean_thresh=0.8, sil_score_min_thresh=0.2,
-n_replicates_filter_ratio_thresh=0.5,
+n_replicates_filter_ratio_thresh=0.2,
method='algorithm1'):
"""Select the best n_components based on reconstruction error and stability.
@@ -714,7 +714,7 @@ def fit(self, eng=None):
pthresh=self.pthresh,
sil_score_mean_thresh=0.8,
sil_score_min_thresh=0.2,
-n_replicates_filter_ratio_thresh=0.5,
+n_replicates_filter_ratio_thresh=0.2,
method='algorithm1'
)
self.W = self.W_all[self.n_components]
@@ -735,7 +735,7 @@ def fit(self, eng=None):
pthresh=self.pthresh,
sil_score_mean_thresh=0.8,
sil_score_min_thresh=0.2,
-n_replicates_filter_ratio_thresh=0.5,
+n_replicates_filter_ratio_thresh=0.2,
method='algorithm1.1'
)

@@ -748,7 +748,7 @@ def fit(self, eng=None):
pthresh=self.pthresh,
sil_score_mean_thresh=0.8,
sil_score_min_thresh=0.2,
-n_replicates_filter_ratio_thresh=0.5,
+n_replicates_filter_ratio_thresh=0.2,
method='algorithm2'
)

@@ -761,7 +761,7 @@ def fit(self, eng=None):
pthresh=self.pthresh,
sil_score_mean_thresh=0.8,
sil_score_min_thresh=0.2,
-n_replicates_filter_ratio_thresh=0.5,
+n_replicates_filter_ratio_thresh=0.2,
method='algorithm2.1'
)

1 comment on commit caf9254

@Hu-JIN (Collaborator, Author) commented on caf9254 Jul 19, 2021

Let's think about it this way.

Previously, we didn't use n_replicates_after_filtering at all in defining stability. The problem is that when only 1 replicate is left after filtering, it receives a sil_score of 1 by our definition, according to eebdf3b. That is unwanted: we do want to consider such a solution unstable, because most of the solutions have already been filtered out. As long as more than 1 replicate remains, sil_score is well defined, and we don't have to rely on n_replicates_after_filtering to define stability.
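The degenerate case can be sketched as follows. This assumes a hypothetical `is_stable` rule that checks the replicate ratio first and then compares the mean and minimum silhouette scores with sil_score_mean_thresh and sil_score_min_thresh using `>=`; how algorithm1 actually combines these thresholds is not shown in this commit. Passing `n_replicates_filter_ratio_thresh=None` mimics ignoring n_replicates_after_filtering entirely.

```python
def is_stable(sil_score_mean, sil_score_min,
              n_replicates_after_filtering, n_replicates,
              sil_score_mean_thresh=0.8, sil_score_min_thresh=0.2,
              n_replicates_filter_ratio_thresh=None):
    """Hypothetical stability rule combining the thresholds.

    If n_replicates_filter_ratio_thresh is None, the fraction of surviving
    replicates is ignored, so only the silhouette scores matter.
    """
    if n_replicates_filter_ratio_thresh is not None:
        ratio = n_replicates_after_filtering / n_replicates
        if ratio < n_replicates_filter_ratio_thresh:
            return False
    return (sil_score_mean >= sil_score_mean_thresh
            and sil_score_min >= sil_score_min_thresh)


# Only 1 of 10 replicates survives filtering; by convention it gets sil_score 1.
SINGLE_REPLICATE_SIL = 1.0

# Without the ratio check, the lone survivor looks perfectly stable (True).
print(is_stable(SINGLE_REPLICATE_SIL, SINGLE_REPLICATE_SIL, 1, 10))

# With the ratio check at 0.2, it is rejected because 1/10 < 0.2 (False).
print(is_stable(SINGLE_REPLICATE_SIL, SINGLE_REPLICATE_SIL, 1, 10,
                n_replicates_filter_ratio_thresh=0.2))
```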

So, as long as n_replicates is sufficiently large and n_replicates_filter_ratio_thresh is not too small, the ratio threshold takes care of these cases.

With n_replicates_filter_ratio_thresh set to 0.2, and assuming n_replicates > 5, any case with a single replicate left is filtered out, since 1/n_replicates < 0.2. At the boundary, if n_replicates = 5 and n_replicates_after_filtering = 1, the solution won't be filtered out, i.e., it will still be considered stable, because 1/5 >= 0.2.
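The boundary arithmetic can be checked directly, again assuming the ratio is compared with `>=` as in 1/5 >= 0.2. `single_survivor_passes` is a hypothetical helper, not a musical function.

```python
THRESH = 0.2  # new default of n_replicates_filter_ratio_thresh


def single_survivor_passes(n_replicates, thresh=THRESH):
    """Hypothetical check: does 1 surviving replicate out of n_replicates pass (assumed >=)?"""
    return 1 / n_replicates >= thresh


# A lone survivor passes for n_replicates = 4 and 5, and fails from 6 upward.
for n in range(4, 9):
    print(n, single_survivor_passes(n))
```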

In practice, the smallest n_replicates we would use is 10. In that case, n_replicates_after_filtering = 1 is considered unstable (1/10 < 0.2), while n_replicates_after_filtering = 2 passes the ratio check (2/10 >= 0.2). In the latter case, sil_score is then used to decide stability, and because 2 replicates remain, sil_score is well defined, so we are good.
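More generally, a small hypothetical helper can compute the fewest surviving replicates needed to pass the ratio check (again assumed to use `>=`). Under that assumption, for any n_replicates > 5 with the 0.2 threshold the answer is at least 2, so the sil_score step always sees a well-defined score.

```python
def min_survivors(n_replicates, thresh=0.2):
    """Hypothetical helper: smallest n_replicates_after_filtering that passes
    the assumed check n_replicates_after_filtering / n_replicates >= thresh."""
    for kept in range(1, n_replicates + 1):
        if kept / n_replicates >= thresh:
            return kept
    return None  # no count passes (only possible if thresh > 1)


print(min_survivors(10))       # new default 0.2
print(min_survivors(10, 0.5))  # old default 0.5
```

For example, `min_survivors(10)` gives 2 under the new default, versus 5 under the old 0.5 threshold.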
