Add more customizable hyperparameters for SpectralClustering #995
Conversation
Quick update on the performance (DER %) of this proposed spectral clustering (SC) pipeline compared to the default pyannote/speaker-diarization pipeline based on hierarchical agglomerative clustering (HAC):
Overall, it is significantly worse on DIHARD (currently evaluating on VoxConverse).
That's interesting, but I wouldn't be too surprised by that, since the optimal spectral clustering hyper-params are usually highly dependent on other modules in the diarization system, such as how the speaker embeddings are trained, and how the speaker segmentation is implemented. Some general suggestions that might be interesting to explore:
Also, spectral clustering is known to work very badly when the sequence of embeddings is very short, which is why I added the spectral_min_embeddings hyperparameter. Below that count, we could try to use something like a simpler fallback clusterer.
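A minimal sketch of such a short-sequence guard (the clusterer callables and the threshold handling here are illustrative assumptions, not the PR's actual code):

```python
import numpy as np

def cluster_with_fallback(embeddings, spectral_fn, fallback_fn, min_embeddings=5):
    """Hypothetical guard: route short embedding sequences to a fallback.

    Spectral clustering needs enough embeddings for a stable affinity
    spectrum, so sequences shorter than `min_embeddings` use `fallback_fn`.
    """
    if len(embeddings) < min_embeddings:
        return fallback_fn(embeddings)
    return spectral_fn(embeddings)

# Toy usage with illustrative clusterers (not real spectral clustering).
all_one_cluster = lambda e: np.zeros(len(e), dtype=int)
fake_spectral = lambda e: np.arange(len(e)) % 2

short = np.random.rand(3, 192)    # fewer than min_embeddings -> fallback path
longer = np.random.rand(20, 192)  # enough embeddings -> "spectral" path
print(cluster_with_fallback(short, fake_spectral, all_one_cluster))  # [0 0 0]
```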
Thanks for your feedback. Will try to optimize a bunch of those hyperparameters... but cannot promise any ETA.
Thanks. Just curious, do you have a script to do the data download, extraction, and parsing? If so, I can give it a try as well.
I have something like this for the AMI dataset. |
Shall we merge this PR regardless of the results? We can always update the hyperparameters later.
Codecov Report

```
@@            Coverage Diff             @@
##           develop     #995      +/-  ##
===========================================
- Coverage    35.61%   35.32%   -0.29%
===========================================
  Files           58       58
  Lines         3431     3459      +28
===========================================
  Hits          1222     1222
- Misses        2209     2237      +28
```
```python
elif (
    self.segmentation == "pyannote/segmentation"
    and self.embedding == "speechbrain/spkrec-ecapa-voxceleb"
    and self.klustering == "SpectralClustering"
    and not self.expects_num_speakers
):
    # SpectralClustering has not been optimized.
    return {
        "onset": 0.810,
        "offset": 0.481,
        "min_duration_on": 0.055,
        "min_duration_off": 0.098,
        "min_activity": 6.073,
        "stitch_threshold": 0.040,
        "clustering": {
            "laplacian": "GraphCut",
            "eigengap": "Ratio",
            "spectral_min_embeddings": 5,
            "gaussian_blur_sigma": 1,
            "p_percentile": 0.95,
            "refinement_sequence": ["GaussianBlur", "RowWiseThreshold", "Symmetrize"],
            "symmetrize_type": "Average",
            "thresholding_with_binarization": False,
            "thresholding_preserve_diagonal": False,
            "thresholding_type": "RowMax",
            "use_autotune": True,
        },
    }
```
Until I/you/we/anyone actually optimize this, I'd rather not provide default parameters for this combination.
Sounds good. I have removed this.
```python
self.spectral_min_embeddings = Uniform(1, 10000)

# Hyperparameters for refinement operations.
self.refinement_sequence = Parameter()
```
This will most likely break at some point because pyannote.pipeline.Optimizer does not know how to handle this.
Two (imperfect) options I can think of are:
- hardcode the refinement sequence
- make it a Categorical hyper-parameter, e.g. Categorical(["option1", "option2", ...]), and later use something like:

```python
if self.refinement_sequence == "option1":
    refinement_sequence = ["GaussianBlur", "RowWiseThreshold", "Symmetrize"]
elif self.refinement_sequence == "option2":
    ...
```
I like the second option.
I made it something more readable:

```python
Categorical(
    ["O", "G", "TS", "GTS", "TSD", "GTSD", "TSDN", "GTSDN",
     "TSN", "GTSN", "CTSDN", "CGTSDN"])
```
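The single-letter codes are not decoded anywhere above; presumably each letter abbreviates one affinity-refinement operation (my guess: C = CropDiagonal, G = GaussianBlur, T = RowWiseThreshold, S = Symmetrize, D = Diffuse, N = RowWiseNormalize, with "O" meaning no refinement at all). Under that assumption, decoding a code could look like:

```python
# Hypothetical decoding of the Categorical codes; the letter -> operation
# map is an assumption based on common refinement steps, not the PR itself.
LETTER_TO_OP = {
    "C": "CropDiagonal",
    "G": "GaussianBlur",
    "T": "RowWiseThreshold",
    "S": "Symmetrize",
    "D": "Diffuse",
    "N": "RowWiseNormalize",
}

def decode_refinement_sequence(code):
    if code == "O":  # "O" = no refinement
        return []
    return [LETTER_TO_OP[letter] for letter in code]

print(decode_refinement_sequence("GTS"))
# ['GaussianBlur', 'RowWiseThreshold', 'Symmetrize']
```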
```python
# Hyperparameters for refinement operations.
self.refinement_sequence = Parameter()
self.gaussian_blur_sigma = Uniform(0, 10000)
```
Would it make sense to make it LogUniform?
(I have no intuition whatsoever about the effect of the scale of this hyper-parameter.)
I changed it to self.gaussian_blur_sigma = Categorical([0, 1, 2, 3])
At the same time, I changed spectral_min_embeddings to LogUniform:
self.spectral_min_embeddings = LogUniform(1, 100)
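For intuition on the LogUniform choice: it samples uniformly in log space, so every decade of (1, 100) gets equal probability mass, whereas a plain Uniform(1, 100) would almost never propose small values. A quick numpy sketch (not pyannote.pipeline's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
low, high = 1.0, 100.0

# Log-uniform: draw uniformly in log space, then exponentiate.
log_uniform = np.exp(rng.uniform(np.log(low), np.log(high), size=10_000))
plain_uniform = rng.uniform(low, high, size=10_000)

# Fraction of samples below 10, the first decade of [1, 100].
print((log_uniform < 10).mean())    # ~0.5: half the mass per decade
print((plain_uniform < 10).mean())  # ~0.09: first decade nearly ignored
```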
…l, remove default_parameters, and use LogUniform as appropriate
```python
self.refinement_sequence = Categorical(
    ["O", "G", "TS", "GTS", "TSD", "GTSD", "TSDN", "GTSDN",
     "TSN", "GTSN", "CTSDN", "CGTSDN"])
self.gaussian_blur_sigma = Categorical([0, 1, 2, 3])
```
Why did you choose to switch to integer values? Why not stick with Uniform(0, 3)? Does it make sense to make it even larger?
What does 0 mean in this case? No blur?
Why did you choose to switch to integer values?

In my previous experiments we only tried 0/1/2/3, never float values or even larger values. But sigma could actually be a float. Changed back to Uniform.

What does 0 mean in this case? No blur?

Right, it would be no blur. It's just calling scipy.ndimage.gaussian_filter.
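To illustrate the effect of the sigma scale on an affinity-like matrix, a small demo of scipy.ndimage.gaussian_filter (assumes scipy is installed; the toy matrix is illustrative, not pyannote code):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# A "spiky" toy affinity matrix: one strong entry in a field of zeros.
affinity = np.zeros((7, 7))
affinity[3, 3] = 1.0

mild = gaussian_filter(affinity, sigma=1.0)    # light smoothing
strong = gaussian_filter(affinity, sigma=3.0)  # heavy smoothing

# Larger sigma spreads the mass further, so the central peak shrinks.
print(mild.max() > strong.max() > 0.0)  # True
```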
```
@@ -200,6 +232,19 @@ def __init__(self, metric: str = "cosine", expects_num_clusters: bool = False):
     ["Affinity", "Unnormalized", "RandomWalk", "GraphCut"]
 )
 self.eigengap = Categorical(["Ratio", "NormalizedDiff"])
 self.spectral_min_embeddings = LogUniform(1, 100)
```
My understanding is that spectral_min_embeddings is supposed to be an integer, right? LogUniform makes it a float and artificially increases the dimension of the search space. How critical is this hyperparameter? Can we not just make it a constant, say, 5?
I'd rather avoid adding almost useless hyperparameters to the search space.
Makes sense. Changed to Categorical([5, 10]).
Thanks 🎉
Example config:
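The thread cuts off before the example config itself. As a sketch, a clustering config using the hyperparameters discussed above might look like the following (values copied from the unoptimized, later-removed defaults in the diff earlier in this thread; purely illustrative, not tuned settings):

```python
# Illustrative only: these values come from the unoptimized defaults shown
# earlier in the thread, which were removed from the PR before merging.
clustering_config = {
    "laplacian": "GraphCut",
    "eigengap": "Ratio",
    "spectral_min_embeddings": 5,
    "gaussian_blur_sigma": 1,
    "p_percentile": 0.95,
    "refinement_sequence": ["GaussianBlur", "RowWiseThreshold", "Symmetrize"],
    "symmetrize_type": "Average",
    "thresholding_with_binarization": False,
    "thresholding_preserve_diagonal": False,
    "thresholding_type": "RowMax",
    "use_autotune": True,
}
print(sorted(clustering_config))
```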