
Add more customizable hyperparameters for SpectralClustering #995

Merged: 10 commits into pyannote:develop on Jun 9, 2022
Conversation

wq2012 (Contributor) commented May 26, 2022

Example config:

pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    segmentation: pyannote/segmentation
    embedding: speechbrain/spkrec-ecapa-voxceleb
    clustering: SpectralClustering

params:
  clustering:
    laplacian: GraphCut
    eigengap: Ratio
    gaussian_blur_sigma: 1
    p_percentile: 0.95
    refinement_sequence: ["GaussianBlur", "RowWiseThreshold", "Symmetrize"]
    symmetrize_type: Average
    thresholding_with_binarization: False
    thresholding_preserve_diagonal: False
    thresholding_type: RowMax
    use_autotune: True
  min_activity: 6.073193238899291
  min_duration_off: 0.09791355693027545
  min_duration_on: 0.05537587440407595
  offset: 0.4806866463041527
  onset: 0.8104268538848918
  stitch_threshold: 0.04033955907446252
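
For reference, a config like the one above should be loadable with the standard pyannote.audio 2.x API. A minimal sketch, assuming two hypothetical local files: config.yml containing the YAML above, and audio.wav to diarize.

from pyannote.audio import Pipeline

# Pipeline.from_pretrained also accepts a path to a local YAML config.
pipeline = Pipeline.from_pretrained("config.yml")

# Run diarization and print the speaker turns.
diarization = pipeline("audio.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")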
  

hbredin (Member) commented May 31, 2022

Quick update on the performance (DER %) of this proposed spectral clustering (SC) pipeline compared to the default pyannote/speaker-diarization pipeline based on hierarchical agglomerative clustering (HAC):

Dataset   HAC    SC
AMI       21.5   21.4
DIHARD    22.2   27.3

Overall, it is significantly worse on DIHARD (currently evaluating on VoxConverse).

wq2012 (Contributor, Author) commented May 31, 2022

> Overall, it is significantly worse on DIHARD

That's interesting, but I wouldn't be too surprised, since the optimal spectral clustering hyperparameters usually depend heavily on the other modules in the diarization system, such as how the speaker embeddings are trained and how the speaker segmentation is implemented.

Some general suggestions that might be worth exploring:

  1. gaussian_blur_sigma: This really depends on how "dense" the embeddings are. If they are very dense, maybe a larger sigma would be better (but I never tried sigma>3). If the embeddings are extracted from speaker turns, then usually we don't use Gaussian blur at all.
  2. p_percentile: this was previously the most important and most sensitive hyper-param. But now we have auto-tune, so as long as use_autotune=true, this is no longer important.
  3. thresholding_type: this is very important and needs to be tuned.
  4. thresholding_with_binarization and thresholding_preserve_diagonal: worth tuning but not that critical.
  5. default_autotune: currently I hardcoded it in the CL instead of making it a hyperparameter. We could make p_percentile_max slightly larger and init_search_step slightly smaller to search more steps (while sacrificing some efficiency); see the sketch after this list.
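
To make item 5 concrete, here is a hedged sketch of what exposing those autotune options could look like; it assumes the spectralcluster package provides an AutoTune class with these keyword names.

from spectralcluster import AutoTune

# Assumed keyword names: widen the percentile range and shrink the initial
# search step to evaluate more candidate values of p_percentile.
autotune = AutoTune(
    p_percentile_min=0.60,
    p_percentile_max=0.95,  # slightly larger widens the search range
    init_search_step=0.01,  # slightly smaller means more search steps
    search_level=3,
)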

Also, spectral clustering is known to work very badly when the sequence of embeddings is very short. So I added fallback_options in wq2012/SpectralCluster@d431505

We could try to use something like FallbackOptions.spectral_min_embeddings=5.
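
A minimal sketch of what that could look like, assuming the FallbackOptions class and the fallback_options argument of SpectralClusterer introduced in that commit:

import numpy as np
from spectralcluster import FallbackOptions, SpectralClusterer

# Fall back to a simpler clusterer when fewer than 5 embeddings are given,
# since spectral clustering degrades on very short sequences.
fallback_options = FallbackOptions(spectral_min_embeddings=5)
clusterer = SpectralClusterer(
    min_clusters=1,
    max_clusters=20,
    fallback_options=fallback_options,
)

embeddings = np.random.rand(100, 192)  # (num_embeddings, embedding_dim)
labels = clusterer.predict(embeddings)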

hbredin (Member) commented Jun 1, 2022

Thanks for your feedback. Will try to optimize a bunch of those hyperparameters... but cannot promise any ETA.

wq2012 (Contributor, Author) commented Jun 1, 2022

Thanks. Just curious, do you have a script to handle the data download, extraction, and parsing (so that the audio and reference in evaluation_set: can be used)?

If so, I can give it a try as well.

hbredin (Member) commented Jun 1, 2022

I have something like this for the AMI dataset.
https://github.com/pyannote/AMI-diarization-setup/tree/main/pyannote

wq2012 (Contributor, Author) commented Jun 4, 2022

Shall we merge this PR regardless of the results?

We can always update default_parameters later when we have better results. Currently, spectral clustering is available but not fully configurable from the YAML file.

codecov bot commented Jun 7, 2022

Codecov Report

Merging #995 (0e021b5) into develop (aede20e) will decrease coverage by 0.28%.
The diff coverage is 0.00%.

@@             Coverage Diff             @@
##           develop     #995      +/-   ##
===========================================
- Coverage    35.61%   35.32%   -0.29%     
===========================================
  Files           58       58              
  Lines         3431     3459      +28     
===========================================
  Hits          1222     1222              
- Misses        2209     2237      +28     
Impacted Files                            Coverage Δ
pyannote/audio/pipelines/clustering.py    0.00% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Comment on lines 173 to 200
elif (
    self.segmentation == "pyannote/segmentation"
    and self.embedding == "speechbrain/spkrec-ecapa-voxceleb"
    and self.klustering == "SpectralClustering"
    and not self.expects_num_speakers
):
    # SpectralClustering has not been optimized.
    return {
        "onset": 0.810,
        "offset": 0.481,
        "min_duration_on": 0.055,
        "min_duration_off": 0.098,
        "min_activity": 6.073,
        "stitch_threshold": 0.040,
        "clustering": {
            "laplacian": "GraphCut",
            "eigengap": "Ratio",
            "spectral_min_embeddings": 5,
            "gaussian_blur_sigma": 1,
            "p_percentile": 0.95,
            "refinement_sequence": ["GaussianBlur", "RowWiseThreshold", "Symmetrize"],
            "symmetrize_type": "Average",
            "thresholding_with_binarization": False,
            "thresholding_preserve_diagonal": False,
            "thresholding_type": "RowMax",
            "use_autotune": True,
        },
    }
hbredin (Member):

Until I/you/we/anyone actually optimize this, I'd rather not provide default parameters for this combination.

Suggested change: remove this elif block and its default parameters entirely.
wq2012 (Contributor, Author):

Sounds good. I have removed this.

self.spectral_min_embeddings = Uniform(1, 10000)

# Hyperparameters for refinement operations.
self.refinement_sequence = Parameter()
hbredin (Member):

This will most likely break at some point because pyannote.pipeline.Optimizer does not know how to handle this.

Two (imperfect) options I can think of are:

  • hardcode the refinement sequence
  • make it a Categorical hyper-parameter Categorical(["option1", "option2", ...]) and later use something like:
if self.refinement_sequence == "option1":
    refinement_sequence = ["GaussianBlur", "RowWiseThreshold", "Symmetrize"]
elif self.refinement_sequence == "option2":
    ...
elif ...

wq2012 (Contributor, Author):

I like the second option.

I made it something more readable:

Categorical(
    ["O", "G", "TS", "GTS", "TSD", "GTSD", "TSDN", "GTSDN",
     "TSN", "GTSN", "CTSDN", "CGTSDN"])


# Hyperparameters for refinement operations.
self.refinement_sequence = Parameter()
self.gaussian_blur_sigma = Uniform(0, 10000)
hbredin (Member):

Would it make sense to make it LogUniform?
(I have no intuition whatsoever about the effect of the scale of this hyper-parameter)

wq2012 (Contributor, Author):

I changed it to self.gaussian_blur_sigma = Categorical([0, 1, 2, 3])

At the same time, I changed spectral_min_embeddings to LogUniform:

self.spectral_min_embeddings = LogUniform(1, 100)

…l, remove default_parameters, and use LogUniform as appropriate
self.refinement_sequence = Categorical(
    ["O", "G", "TS", "GTS", "TSD", "GTSD", "TSDN", "GTSDN",
     "TSN", "GTSN", "CTSDN", "CGTSDN"])
self.gaussian_blur_sigma = Categorical([0, 1, 2, 3])
hbredin (Member):

Why did you choose to switch to integer values? Why not stick with Uniform(0, 3)? Does it make sense to make it even larger?

What does 0 mean in this case? No blur?

wq2012 (Contributor, Author):

> Why did you choose to switch to integer values?

In my previous experiments we only tried 0/1/2/3; we never tried float values or anything larger. But sigma can actually be a float, so I changed it back to Uniform.

> What does 0 mean in this case? No blur?

Right, it would be no blur. It's just calling scipy.ndimage.gaussian_filter.
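
A quick check confirms this: scipy.ndimage.gaussian_filter skips axes whose sigma is (near) zero, so sigma=0 leaves the affinity matrix untouched.

import numpy as np
from scipy.ndimage import gaussian_filter

affinity = np.random.rand(50, 50)

# sigma=0: the filter is skipped along every axis, so the input is unchanged.
assert np.allclose(gaussian_filter(affinity, sigma=0), affinity)

# sigma>0: the affinity matrix is smoothed before thresholding.
blurred = gaussian_filter(affinity, sigma=1)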

@@ -200,6 +232,19 @@ def __init__(self, metric: str = "cosine", expects_num_clusters: bool = False):
["Affinity", "Unnormalized", "RandomWalk", "GraphCut"]
)
self.eigengap = Categorical(["Ratio", "NormalizedDiff"])
self.spectral_min_embeddings = LogUniform(1, 100)
hbredin (Member):

My understanding is that spectral_min_embeddings is supposed to be an integer, right?
LogUniform makes it a float and artificially increases the dimension of the search space. How critical is this hyperparameter? Can we not just make it a constant, say 5?
I'd rather avoid adding almost-useless hyperparameters to the search space.
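
For illustration, a hedged sketch of the alternatives under discussion, assuming pyannote.pipeline.parameter provides Categorical, LogUniform, and Frozen (Frozen pinning a hyperparameter to a constant):

from pyannote.pipeline.parameter import Categorical, Frozen, LogUniform

# Continuous log-scale search: sampled values are floats, e.g. 3.7.
spectral_min_embeddings = LogUniform(1, 100)

# Pinning to a constant removes it from the search space entirely.
spectral_min_embeddings = Frozen(5)

# A small categorical set keeps the value an integer and the space tiny.
spectral_min_embeddings = Categorical([5, 10])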

wq2012 (Contributor, Author):

Makes sense. Changed to Categorical([5, 10]).

pyannote/audio/pipelines/clustering.py (resolved review thread)
hbredin merged commit 1c0f1a9 into pyannote:develop on Jun 9, 2022
hbredin (Member) commented Jun 9, 2022

Thanks 🎉
