diart.stream microphone detects only 2 speakers #133

kaleaniket · 2023-03-23T15:52:05Z

Hello,

I am using diart.stream microphone from the command line for inference, but it does not detect more than 2 speakers even when more are present.

For example, if I play a recording of 3 people speaking (1 female and 2 male), it treats the 2 male speakers as a single speaker. If I play a recording of 2 male speakers or 2 female speakers, it works fine.

I've looked through the files in https://github.com/juanmc2005/StreamingSpeakerDiarization/tree/main/src/diart/blocks to see whether anything related to num_speakers is mentioned, and in most of them I found max_speakers = 20.

Do I have to change any part of the code to detect more speakers?

juanmc2005 · 2023-03-24T10:32:20Z

Hi @kaleaniket,

This has been discussed briefly in issue #4:

New speaker detection is affected by the hyper-parameter delta, which is a threshold on the cosine distance between a speaker's embedding and its closest centroid. A distance lower than delta assigns the speaker to that centroid (reidentification of a known speaker), whereas a distance higher than delta assigns the speaker to a new centroid (new speaker detection).
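To make the rule concrete, here is a minimal Python sketch of the threshold test described above. It is not diart's implementation: the helper names (cosine_distance, assign_speaker, count_speakers) are hypothetical, the 2-D "embeddings" are toy vectors, and the centroids are never updated after creation (diart also refines centroids as it hears more speech). It illustrates why lowering delta makes new speakers more likely: two similar voices 30 degrees apart are merged at delta_new=0.57 but separated at 0.1.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance, in [0, 2]: 0 for identical directions, 2 for opposite ones."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def assign_speaker(embedding, centroids, delta_new):
    """Assign an embedding to a centroid index, creating a new one if needed (hypothetical helper)."""
    if centroids:
        distances = [cosine_distance(embedding, c) for c in centroids]
        closest = int(np.argmin(distances))
        if distances[closest] < delta_new:
            return closest  # close enough: re-identify a known speaker
    centroids.append(embedding.copy())  # too far from every centroid: new speaker
    return len(centroids) - 1


def count_speakers(embeddings, delta_new):
    """Number of centroids created after processing all embeddings in order."""
    centroids = []
    for emb in embeddings:
        assign_speaker(emb, centroids, delta_new)
    return len(centroids)


# Toy 2-D "voice embeddings": speakers A and B are similar (30 degrees apart),
# speaker C is very different (90 degrees from A).
angles = np.radians([0, 30, 90])
embeddings = [np.array([np.cos(t), np.sin(t)]) for t in angles]

for delta in (1.5, 0.57, 0.1):
    print(delta, count_speakers(embeddings, delta))
# 1.5 1   -> everyone merged into one speaker
# 0.57 2  -> A and B merged, C detected as new
# 0.1 3   -> all three speakers separated
```

The trade-off mentioned in this thread shows up directly: a lower threshold separates similar voices, but push it too far and a single speaker may be split into several.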

It is possible that the delta value you're using is not suited to your recordings. If you find that new speakers are not being detected, my first suggestion would be to lower delta.

If you're using diart.stream, you can change it with, for example, --delta=1.0; if you're in Python, you can set it in PipelineConfig:

from diart import PipelineConfig, OnlineSpeakerDiarization

config = PipelineConfig(delta_new=1.0)
diarization = OnlineSpeakerDiarization(config)

Note that there's a trade-off here between recognizing too few and too many speakers.
This threshold strategy is currently a bit too simple; it is a key area of improvement that I'm working on.

someonewating · 2023-04-20T13:40:28Z

Hi there. Thank you for your solution. However, I tried setting delta_new=0.1, delta_new=0.5 and delta_new=0.01, and the result didn't change. Could you suggest anything else?

juanmc2005 · 2023-04-20T14:53:50Z

Hi @someonewating,

Could you post your code and results? Also, can you provide more information about your audio file, such as its duration, number of speakers, expected RTTM, etc.?

someonewating · 2023-04-20T15:07:21Z

Hi @juanmc2005,

Thank you for your reply. I checked my code again, since I didn't want to ask a silly question here, and I have now fixed the problem: it was caused by a mistake in my code. Everything works well now. Thank you.😀

someonewating · 2023-04-20T15:12:51Z

And by the way, would you mind explaining what beta, gamma, rho_update and tau_active mean in the PipelineConfig file? Thank you.

juanmc2005 · 2023-04-20T15:31:43Z

Glad it's working!

Sure, I actually wrote an intuitive explanation of tau_active, rho_update and delta_new in this post (see "Creating the speaker diarization module").

They essentially regulate the sensitivity of speaker recognition:

tau_active=0.5: Only recognize speakers whose probability of speech is higher than 50%.
rho_update=0.1: Diart automatically gathers information from speakers to improve itself. Here we only use speech longer than 100ms per speaker for self-improvement.
delta_new=0.57: This is an internal threshold between 0 and 2 that regulates new speaker detection. The lower the value, the more sensitive the system will be to differences in voices.
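A rough Python sketch of how tau_active and rho_update could filter the speakers of a single chunk, following the descriptions above. Several details are assumptions and are not taken from diart's code: the segmentation is a (frames x speakers) array of speech probabilities, each frame lasts 0.1 s, a frame counts as speech when its probability exceeds tau_active, and rho_update is read as a minimum speech duration in seconds, as in the explanation (diart itself may measure this differently). The helper names are hypothetical.

```python
import numpy as np

FRAME_STEP = 0.1  # seconds per segmentation frame (toy assumption)


def active_speakers(segmentation, tau_active=0.5):
    """Speakers whose speech probability exceeds tau_active in at least one frame."""
    return np.where(np.max(segmentation, axis=0) > tau_active)[0]


def speakers_for_update(segmentation, tau_active=0.5, rho_update=0.1, frame_step=FRAME_STEP):
    """Speakers with more than rho_update seconds of speech (frames above tau_active)."""
    speech_seconds = np.sum(segmentation > tau_active, axis=0) * frame_step
    return np.where(speech_seconds > rho_update)[0]


# One hypothetical chunk: 10 frames x 3 local speakers.
segmentation = np.zeros((10, 3))
segmentation[0:5, 0] = 0.9  # speaker 0: 0.5 s of confident speech
segmentation[7, 1] = 0.7    # speaker 1: a single 0.1 s frame
segmentation[:, 2] = 0.3    # speaker 2: never above tau_active

print(active_speakers(segmentation))      # [0 1]
print(speakers_for_update(segmentation))  # [0]
```

In this toy chunk speaker 1 is recognized as active, but with only 100 ms of speech (not longer than rho_update) it would not be used to improve its centroid; speaker 2 is ignored entirely.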

On the other hand, beta and gamma regulate overlap-aware speaker embedding extraction (see Equation 2 in the paper).

The higher the value of gamma, the more the embedding model ignores audio regions where the segmentation model is not confident.
beta acts as a temperature parameter on per-frame speaker probabilities to determine the predominant speaker (with softmax).
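The beta/gamma weighting can be sketched in a few lines of NumPy. This is one reading of the idea rather than a copy of Equation 2 or of diart's code: each frame's weight for speaker k is that speaker's probability raised to gamma, multiplied by a softmax over speakers (scaled by beta) also raised to gamma. The function name and the defaults gamma=3, beta=10 are assumptions for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)


def overlap_aware_weights(segmentation, gamma=3.0, beta=10.0):
    """Per-frame, per-speaker weights for embedding extraction (hypothetical helper).

    segmentation: (frames, speakers) speech probabilities in [0, 1].
    - softmax(beta * s) favors the predominant speaker in each frame;
      a larger beta makes it closer to a hard argmax.
    - raising to the power gamma pushes low-confidence frames toward 0;
      a larger gamma ignores uncertain regions more aggressively.
    """
    probs = softmax(beta * segmentation, axis=-1)
    return (segmentation ** gamma) * (probs ** gamma)


seg = np.array([
    [0.9, 0.1],  # speaker 0 clearly dominant
    [0.6, 0.6],  # overlap: both speakers equally likely
    [0.3, 0.0],  # low-confidence speech from speaker 0
])
w = overlap_aware_weights(seg)
# Speaker 0's embedding relies mostly on frame 0; the overlapped frame 1 and the
# uncertain frame 2 contribute far less.
```

Raising gamma further shrinks frame 2's weight relative to frame 0, and raising beta further suppresses speaker 1 in frame 0, which matches the intuition given above.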

I know this sounds a bit obscure but I think it's better explained with the figures in the paper.

kaleaniket changed the title ~~diart.stream mictophone detects only 2 speakers~~ diart.stream microphone detects only 2 speakers Mar 24, 2023

juanmc2005 added the "question" label (Further information is requested) Mar 24, 2023

juanmc2005 closed this as completed Nov 9, 2023

thaokimctu mentioned this issue Dec 25, 2023: diart vs whisperx diarization accuracy #226 (closed)

juanmc2005 mentioned this issue Feb 2, 2024: quality concerns #229 (closed)