feat(spk): optionally return per-speaker embedding centroids #1
phoenixray2000 wants to merge 1 commit into
Add a `return_spk_center` option so that `AutoModel.generate` surfaces the per-speaker centroid embeddings (the mean of the clustered chunk embeddings) that diarization already computes in `postprocess()` but currently discards. This lets downstream speaker-voiceprint / identity workflows reuse them without re-embedding.

Backward compatible: off by default; the `postprocess()` return shape is unchanged unless `return_spk_center=True`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a6fa3a8ad0
```python
if return_spk_center:
    # spk_embs[i] is the centroid (mean of clustered chunk embeddings) for
    # corrected speaker label i, aligned with the `spk` ids in sentence_info.
    return distribute_res, spk_embs
```
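A minimal sketch of what the hunk above does, in plain Python so it runs without FunASR: each centroid is the per-label mean of the chunk embeddings, and the tuple return happens only when the flag is set. `speaker_centroids` and `postprocess_sketch` are hypothetical names, not FunASR functions. The real `postprocess()` also merges segments and distributes labels onto sentences, which is omitted here, and labels are assumed to be contiguous ints `0..K-1`.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Segment = Tuple[float, float]  # (start_s, end_s)


def speaker_centroids(embeddings: Sequence[Vector], labels: Sequence[int]) -> List[Vector]:
    """Row i is the mean of all chunk embeddings labelled i.

    Assumes labels are contiguous ints 0..K-1, each used at least once.
    """
    num_spk = max(labels) + 1
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(num_spk)]
    counts = [0] * num_spk
    for emb, lab in zip(embeddings, labels):
        counts[lab] += 1
        for d, value in enumerate(emb):
            sums[lab][d] += value
    return [[s / counts[k] for s in sums[k]] for k in range(num_spk)]


def postprocess_sketch(
    segments: Sequence[Segment],
    embeddings: Sequence[Vector],
    labels: Sequence[int],
    return_spk_center: bool = False,
):
    """Hypothetical stand-in for postprocess(); only attaches labels to segments."""
    distribute_res = [(start, end, lab) for (start, end), lab in zip(segments, labels)]
    if return_spk_center:
        # Opt-in path: also return the centroids, row i aligned with label i.
        return distribute_res, speaker_centroids(embeddings, labels)
    # Default path: the single return value existing callers already expect.
    return distribute_res
```

Keeping the default path as a single return value is what lets existing callers that do `res = postprocess(...)` keep working unchanged.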
Recompute centroids after smoothing speaker labels
For recordings containing diarization regions shorter than `smooth()`'s 0.7 s threshold, `smooth()` can reassign those regions to neighboring speakers, but `spk_embs` was already computed from the pre-smoothing labels. Returning it here means `spk_embedding_center` can include speakers that no longer appear in `sentence_info`, and the remaining speakers' centroids exclude embeddings that were reassigned to them in the final diarization output. Downstream voiceprint matching would then use centroids that do not match the returned `spk` IDs.
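A minimal sketch of the fix this review suggests: smooth first, renumber the surviving labels, then compute centroids from the final labels so that row `i` matches `spk == i`. Everything here is hypothetical and simplified: one embedding per segment, a `smooth_labels` that reassigns a short segment to whichever neighbor is closer in time (an assumption about `smooth()`'s rule, not its confirmed behavior), relabeling by first appearance (also an assumption), and no merging of adjacent same-speaker segments.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float, int]  # (start_s, end_s, speaker_label)
Vector = List[float]


def smooth_labels(segs: Sequence[Segment], min_dur: float = 0.7) -> List[Segment]:
    """Hypothetical smooth(): give segments shorter than min_dur the nearest neighbor's label."""
    out = [list(s) for s in segs]
    if len(out) < 2:
        return [tuple(s) for s in out]
    for i, (start, end, _) in enumerate(out):
        if end - start >= min_dur:
            continue
        if i == 0:
            out[i][2] = out[i + 1][2]
        elif i == len(out) - 1:
            out[i][2] = out[i - 1][2]
        elif start - out[i - 1][1] <= out[i + 1][0] - end:
            out[i][2] = out[i - 1][2]  # previous segment is at least as close
        else:
            out[i][2] = out[i + 1][2]
    return [tuple(s) for s in out]


def relabel_by_first_appearance(labels: Sequence[int]) -> List[int]:
    """Map surviving labels to contiguous ids 0..K-1 in order of first appearance."""
    mapping = {}
    for lab in labels:
        mapping.setdefault(lab, len(mapping))
    return [mapping[lab] for lab in labels]


def centroids_after_smoothing(
    segs: Sequence[Segment], embeddings: Sequence[Vector], min_dur: float = 0.7
) -> Tuple[List[Segment], List[Vector]]:
    """Smooth, relabel, then average embeddings per *final* label."""
    smoothed = smooth_labels(segs, min_dur)
    labels = relabel_by_first_appearance([lab for _, _, lab in smoothed])
    num_spk = max(labels) + 1
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(num_spk)]
    counts = [0] * num_spk
    for emb, lab in zip(embeddings, labels):
        counts[lab] += 1
        for d, value in enumerate(emb):
            sums[lab][d] += value
    centers = [[s / counts[k] for s in sums[k]] for k in range(num_spk)]
    # The same final labels must also be what sentence_info's spk ids are built from.
    final = [(s, e, lab) for (s, e, _), lab in zip(smoothed, labels)]
    return final, centers
```

For example, segments `(0.0, 2.0, 0), (2.0, 2.5, 1), (2.5, 5.0, 0)` collapse to a single speaker, so only one centroid is returned, whereas computing centroids before smoothing would have produced two.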
Resubmitted upstream to modelscope/FunASR instead: modelscope#2967
What

Opt-in `return_spk_center` flag for `AutoModel.generate` (speaker-diarization path). When enabled, the result includes `spk_embedding_center`: a `[num_speakers, embedding_dim]` array of per-speaker centroid embeddings, indexed by the `spk` ids in `sentence_info`.

Why
`postprocess()` already computes these centroids (mean of clustered chunk embeddings) for diarization but discards them. Surfacing them lets downstream speaker-voiceprint / identity workflows reuse the embeddings without a second extraction pass.

Compatibility
Opt-in, default off. The `postprocess()` return shape is unchanged unless `return_spk_center=True`; existing callers (`auto_model`, `auto_frontend`) are unaffected.

Verification
Local run (paraformer-zh + ERes2NetV2, punc_segment): a 2-speaker clip returns `spk_embedding_center` with shape `(2, 192)`, matching the 2 speakers in `sentence_info`; the cross-speaker cosine similarity is 0.34 (distinct).

🤖 Generated with Claude Code
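The cross-speaker cosine reported in the verification run, plus a check that the centroid rows line up with the `spk` ids actually present in `sentence_info` (the mismatch the Codex review warns about), can be sketched as follows. The dict-with-`"spk"`-key shape assumed for `sentence_info` entries is for illustration only, and the helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def cross_speaker_cosines(centers: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine matrix; lower off-diagonal values mean less similar centroids."""
    return [[cosine(a, b) for b in centers] for a in centers]


def centers_match_sentences(
    sentence_info: Sequence[Dict], centers: Sequence[Sequence[float]]
) -> bool:
    """True if every centroid row i has a sentence with spk == i, and vice versa."""
    spk_ids = {s["spk"] for s in sentence_info}
    return spk_ids == set(range(len(centers)))
```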