feat(spk): optionally return per-speaker embedding centroids by phoenixray2000 · Pull Request #1 · phoenixray2000/FunASR

phoenixray2000 · 2026-06-06T18:35:24Z

What

Adds an opt-in return_spk_center flag to AutoModel.generate (speaker diarization path). When enabled, the result includes spk_embedding_center, a [num_speakers, embedding_dim] array of per-speaker centroid embeddings in which row i belongs to the speaker whose spk id in sentence_info is i.
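As a rough illustration of how a caller might consume the new field, the sketch below pairs each sentence with the centroid row selected by its spk id. The `result` dict is a hand-written mock whose layout is assumed from the description above, not real AutoModel.generate output, and `centroid_per_sentence` is a hypothetical helper.

```python
# Mock of one result entry with return_spk_center=True.
# Layout assumed from the PR description; values are made up.
result = {
    "sentence_info": [
        {"spk": 0, "text": "hello"},
        {"spk": 1, "text": "hi there"},
        {"spk": 0, "text": "how are you"},
    ],
    # One row per speaker; 3 dims here for brevity (the PR's test run reports 192).
    "spk_embedding_center": [
        [0.1, 0.9, 0.0],
        [0.8, 0.1, 0.2],
    ],
}


def centroid_per_sentence(res):
    """Pair each sentence text with the centroid row selected by its `spk` id."""
    centers = res["spk_embedding_center"]
    return [(s["text"], centers[s["spk"]]) for s in res["sentence_info"]]


pairs = centroid_per_sentence(result)
```

Sentences from the same speaker share one centroid row, so a voiceprint lookup only needs to run once per speaker rather than once per sentence.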

Why

postprocess() already computes these centroids (mean of clustered chunk embeddings) for diarization but discards them. Surfacing them lets downstream speaker-voiceprint / identity workflows reuse the embeddings without a second extraction pass.
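For readers unfamiliar with the term, "centroid" here means the per-speaker mean of chunk embeddings. A minimal sketch of that computation follows, using plain lists for clarity. It assumes labels are contiguous integers starting at 0 and that every label has at least one chunk; `speaker_centroids` is an illustrative name, not the function used in FunASR.

```python
def speaker_centroids(embeddings, labels):
    """Return one mean embedding per speaker, ordered by label id.

    Assumes labels are 0..K-1 with at least one chunk per label.
    """
    num_speakers = max(labels) + 1
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(num_speakers)]
    counts = [0] * num_speakers
    for emb, lab in zip(embeddings, labels):
        counts[lab] += 1
        for j, value in enumerate(emb):
            sums[lab][j] += value
    # Divide each accumulated row by the number of chunks for that speaker.
    return [[v / c for v in row] for row, c in zip(sums, counts)]


chunk_embs = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
chunk_labels = [0, 0, 1]
centers = speaker_centroids(chunk_embs, chunk_labels)
```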

Compatibility

Opt-in, default off. postprocess() return shape unchanged unless return_spk_center=True; existing callers (auto_model, auto_frontend) unaffected.
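The compatibility claim rests on a simple pattern: the function returns its usual value unless the flag is set, in which case it returns a tuple. The sketch below shows that pattern with a hypothetical stand-in, `postprocess_sketch`, whose body and placeholder centroids are invented for illustration and do not reproduce FunASR's postprocess().

```python
def postprocess_sketch(segments, return_spk_center=False):
    """Hypothetical stand-in showing the opt-in return shape only."""
    distribute_res = list(segments)
    spk_embs = [[0.0, 1.0], [1.0, 0.0]]  # placeholder centroids, not real data
    if return_spk_center:
        # New behaviour: callers that opt in unpack a 2-tuple.
        return distribute_res, spk_embs
    # Default behaviour: same single return value as before the change.
    return distribute_res


segments = [(0.0, 1.5, 0), (1.5, 3.0, 1)]
legacy = postprocess_sketch(segments)
res, centers = postprocess_sketch(segments, return_spk_center=True)
```

Callers that never pass the flag keep receiving the single value, which is why they need no changes.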

Verification

Local run (paraformer-zh + ERes2NetV2, punc_segment): a 2-speaker clip returns spk_embedding_center with shape (2, 192), matching the 2 speakers in sentence_info. The cosine similarity between the two centroids is 0.34, so the speakers are clearly distinct.
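The cross-speaker check can be reproduced with a plain cosine similarity between two centroid rows. The sketch below is a generic implementation with toy vectors; it is not the script used for the run above and does not reproduce the reported value.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for two rows of spk_embedding_center.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])
```

Values near 1 suggest the two centroids describe the same voice; lower values indicate distinct speakers.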

🤖 Generated with Claude Code

Add a return_spk_center option so AutoModel.generate surfaces the per-speaker centroid embeddings (mean of clustered chunk embeddings) that diarization already computes in postprocess() but currently discards. Lets downstream speaker voiceprint / identity reuse them without re-embedding. Backward compatible: default off; postprocess return shape is unchanged unless return_spk_center=True.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a6fa3a8ad0

chatgpt-codex-connector · 2026-06-06T18:38:00Z

+    if return_spk_center:
+        # spk_embs[i] is the centroid (mean of clustered chunk embeddings) for
+        # corrected speaker label i, aligned with the `spk` ids in sentence_info.
+        return distribute_res, spk_embs


Recompute centroids after smoothing speaker labels

For recordings containing diarization regions shorter than smooth()'s 0.7s threshold, smooth() can reassign those regions to neighboring speakers, but spk_embs was already computed from the pre-smoothed labels. Returning it here has two consequences: spk_embedding_center can include speakers that no longer appear in sentence_info, and the remaining speakers' centroids exclude embeddings that were assigned to them in the final diarization output. Downstream voiceprint matching would then use centroids that do not match the returned spk IDs.
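A toy illustration of the mismatch described above, and of recomputing centroids from the final labels. It assumes per-chunk labels are available both before and after smoothing, which may not match how smooth() represents its output; `centroids_from_labels` and the data are hypothetical.

```python
def centroids_from_labels(embeddings, labels):
    """Mean embedding per speaker id that actually occurs in `labels`."""
    grouped = {}
    for emb, lab in zip(embeddings, labels):
        grouped.setdefault(lab, []).append(emb)
    return {
        lab: [sum(col) / len(embs) for col in zip(*embs)]
        for lab, embs in sorted(grouped.items())
    }


chunk_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
pre_smooth = [0, 1, 0]   # labels straight from clustering
post_smooth = [0, 0, 0]  # short speaker-1 region merged into speaker 0

stale = centroids_from_labels(chunk_embs, pre_smooth)   # what the PR would return
fresh = centroids_from_labels(chunk_embs, post_smooth)  # matches the final output
```

In this example `stale` still carries a centroid for speaker 1, who no longer appears in the output, and its speaker-0 centroid ignores the reassigned chunk; `fresh` has neither problem.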

phoenixray2000 · 2026-06-06T18:39:31Z

Resubmitted upstream to modelscope/FunASR instead: modelscope#2967

chatgpt-codex-connector Bot reviewed Jun 6, 2026

phoenixray2000 closed this Jun 6, 2026

phoenixray2000 wants to merge 1 commit into main from feat/spk-embedding-center