Question about the settings in speech_data_simulator #9288

Closed
sappho192 opened this issue May 23, 2024 · 8 comments

@sappho192

sappho192 commented May 23, 2024

Hi, I'm currently using NeMo/tools/speech_data_simulator to generate training data for fine-tuning the MSDD model, and I have some questions about the data simulator.

1. How can I ensure that every session has exactly as many speakers as num_speakers?

In my case, the simulator occasionally creates sessions that contain fewer speakers than num_speakers.
This seemed to become more frequent as num_speakers grew beyond 4.
For example, I created 32 sessions with num_speakers set to 4, but 9 of them include only 3 speakers.
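
A check like the one described above can be scripted. The following is a minimal sketch, assuming the simulator writes one standard RTTM file per session (SPEAKER lines with the session ID in the second field and the speaker label in the eighth). The function names and the sample lines are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def speakers_per_session(rttm_lines):
    """Map each session ID to the set of speaker labels seen in its SPEAKER lines."""
    sessions = defaultdict(set)
    for line in rttm_lines:
        fields = line.split()
        # Standard RTTM fields: type, file ID, channel, onset, duration,
        # orthography, speaker type, speaker name, confidence, lookahead.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        sessions[fields[1]].add(fields[7])
    return dict(sessions)


def sessions_missing_speakers(rttm_lines, num_speakers):
    """Return the session IDs that contain fewer than num_speakers speakers."""
    counts = speakers_per_session(rttm_lines)
    return sorted(sid for sid, spk in counts.items() if len(spk) < num_speakers)


# Hypothetical sample: session_0 has two speakers, session_1 has only one.
example = [
    "SPEAKER session_0 1 0.00 2.50 <NA> <NA> spk_a <NA> <NA>",
    "SPEAKER session_0 1 2.60 1.90 <NA> <NA> spk_b <NA> <NA>",
    "SPEAKER session_1 1 0.00 3.10 <NA> <NA> spk_c <NA> <NA>",
    "SPEAKER session_1 1 3.20 2.00 <NA> <NA> spk_c <NA> <NA>",
]
print(sessions_missing_speakers(example, num_speakers=2))
```

In practice the lines would be read from the generated RTTM files rather than from an inline list.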

I used a custom dataset as input to the simulator, with around 50 speakers in total.
Each speaker had at least 300 utterances, and the average utterance length was about 5 seconds.

As far as I can tell, the following parameters are related to this question:

session_config:
  num_speakers: 4 # Number of unique speakers per multispeaker audio session

speaker_enforcement:
  enforce_num_speakers: true # Enforce that all requested speakers are present in the output wav file
  enforce_time: # Percentage of the way through the audio session that enforcement mode is triggered (sampled between time 1 and 2)
    - 0.25
    - 0.75

session_params:
  dominance_var: 0.11 # Variance in speaker dominance (where each speaker's dominance is sampled from a normal distribution centered on 1/`num_speakers`, and then the dominance values are together normalized to 1)
  min_dominance: 0.05 # Minimum percentage of speaking time per speaker (note that this can cause the dominance of the other speakers to be slightly reduced)
  turn_prob: 0.875 # Probability of switching speakers after each utterance
  min_turn_prob: 0.5 # Minimum turn probability when enforce mode is True to prevent from making excessive session length
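
To make the dominance_var and min_dominance comments concrete, here is a rough sketch of one way such sampling could work. It is not taken from the NeMo source: the function name sample_dominance is hypothetical, dominance_var is treated as a variance as the comment states, and the way the minimum share is reserved for each speaker is an assumption chosen so that the guarantee holds exactly.

```python
import math
import random


def sample_dominance(num_speakers, dominance_var, min_dominance, rng=random):
    """Sample per-speaker speaking-time shares that sum to 1.

    Assumed scheme (not NeMo's actual code): draw each share from a normal
    distribution centered on 1/num_speakers, clip negatives to 0, reserve
    min_dominance for every speaker, and split the rest proportionally.
    """
    if num_speakers * min_dominance > 1.0:
        raise ValueError("min_dominance is too large for this many speakers")
    mean = 1.0 / num_speakers
    std = math.sqrt(dominance_var)
    raw = [max(rng.gauss(mean, std), 0.0) for _ in range(num_speakers)]
    total = sum(raw)
    if total == 0.0:
        # Degenerate draw: fall back to equal shares.
        raw = [1.0] * num_speakers
        total = float(num_speakers)
    spare = 1.0 - num_speakers * min_dominance
    return [min_dominance + spare * r / total for r in raw]


rng = random.Random(0)
shares = sample_dominance(num_speakers=4, dominance_var=0.11, min_dominance=0.05, rng=rng)
print([round(s, 3) for s in shares])
```

Under this scheme, a larger min_dominance leaves less of the session to be distributed unevenly, so the shares move closer to equal.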

I tried tweaking the settings to fix this, but nothing worked.

My current setup is as follows:

num_speakers = 4  # varied from 2 to 6 across runs
config.data_simulator.session_config.num_speakers = num_speakers
config.data_simulator.session_config.session_length = 600  # seconds; varied from 10 min (600) to 40 min (2400)
config.data_simulator.session_params.min_dominance = 1 / (num_speakers + 1)
config.data_simulator.session_params.mean_silence = 0.08
config.data_simulator.session_params.turn_prob = 0.875
config.data_simulator.session_params.min_turn_prob = 0.875
config.data_simulator.speaker_enforcement.enforce_num_speakers = True
config.data_simulator.speaker_enforcement.enforce_time = {0: 1.0, 1: 1.0} # I've tried {0: 0.75, 1: 1.0}, {0: 0.99, 1: 1.0}, too

2. Why is the default value of sentence_length_params not an integer?

According to the config comments, k in sentence_length_params must be a positive integer, but the default is 0.4.
Sessions are generated without problems with this setting, but I'd like to know why this is the default.

sentence_length_params: # k,p values for a negative_binomial distribution which is sampled to get the sentence length (in number of words)
  - 0.4 # k (Number of successes until the experiment is stopped) value must be a positive integer.
  - 0.05 # p (Success probability) must be in the range (0, 1]. The average sentence length will be k*(1-p)/p

Thank you in advance.

@sappho192
Author

In case it matters, I installed the latest version of NeMo with:

apt-get update && apt-get install -y libsndfile1 ffmpeg
git clone https://github.com/NVIDIA/NeMo
cd NeMo
./reinstall.sh

@anteju
Collaborator

anteju commented May 23, 2024

@tango4j, could you check the above issue with num_speakers and sentence_length_params?

@stevehuang52
Collaborator

> 1. How can I ensure that every session has exactly as many speakers as num_speakers?

We need a little more time to figure out why enforce_num_speakers: true is not working as expected for 10-minute sessions with more than 4 speakers. @tango4j, we have a primitive fix in mind but need to test it further.

> 2. Why is the default value of sentence_length_params not an integer?

You're right that the k in sentence_length_params should usually be an integer. We use a default of 0.4 to match the segment length distribution in the AMI dataset, but you can set it to an integer value instead, which would generally increase the segment lengths.
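
The negative binomial distribution is still well defined for a non-integer k when its probability mass function is written with the gamma function, which is why a value such as 0.4 can be sampled from. The sketch below uses only the Python standard library to show this, along with the mean formula k*(1-p)/p quoted in the config comment. The helper names nb_mean and nb_pmf are hypothetical, and the sketch says nothing about how the simulator itself draws samples.

```python
import math


def nb_mean(k, p):
    """Mean of a negative binomial distribution: k * (1 - p) / p."""
    return k * (1.0 - p) / p


def nb_pmf(x, k, p):
    """P(X = x) for real k > 0 and 0 < p < 1, computed in log space."""
    log_pmf = (
        math.lgamma(k + x) - math.lgamma(x + 1) - math.lgamma(k)
        + k * math.log(p) + x * math.log1p(-p)
    )
    return math.exp(log_pmf)


p = 0.05
for k in (0.4, 1, 2):
    # Average sentence length in words for each choice of k.
    print(k, round(nb_mean(k, p), 2))
```

With p = 0.05, the mean is about 7.6 words for k = 0.4, 19 words for k = 1, and 38 words for k = 2, which is consistent with the remark that integer values of k generally produce longer segments.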

@sappho192
Author

> We need a little more time to figure out why enforce_num_speakers: true is not working as expected for 10-minute sessions with more than 4 speakers.

@stevehuang52 Thank you for looking into this issue. In case it helps, here are the dataset I used and the simulated meetings I generated:
[dataset(4.0GB)] [alignments in simple,condensed format(2MB)] [generated sim_meet over 2~6 speakers(4.3GB)]

> You're right that the k in sentence_length_params should usually be an integer. We use a default of 0.4 to match the segment length distribution in the AMI dataset, but you can set it to an integer value instead, which would generally increase the segment lengths.

Thanks a lot. Then I'll set it to an integer in my case.

Contributor

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Jun 30, 2024
@sappho192
Author

(Just a bump)

@github-actions github-actions bot removed the stale label Jul 1, 2024
Contributor

github-actions bot commented Aug 1, 2024

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Aug 1, 2024
Contributor

github-actions bot commented Aug 9, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Aug 9, 2024