You can use any dataset that satisfies the definition of supervised clustering, meaning you can extract sequences of features, and associate those features with ground truth labels. Features can be speaker embeddings, face embeddings, etc.
One example is NIST SRE 2000 CALLHOME, which is used for speaker diarization. For any dataset, though, you need to process it yourself to extract the features and align them with the labels. This library only provides the API for the clustering part.
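To make the "align the features with labels" step concrete, here is a minimal sketch using only NumPy. It is not part of this library: the helper `align_labels`, the window timestamps, and the `(start, end, speaker)` segment format are all assumptions made for illustration, and the random array stands in for embeddings you would extract with your own model.

```python
import numpy as np

def align_labels(window_centers, segments):
    """Hypothetical helper: return one label per window center,
    or None where no annotated segment covers that time."""
    labels = []
    for t in window_centers:
        label = None
        for start, end, speaker in segments:
            if start <= t < end:
                label = speaker
                break
        labels.append(label)
    return labels

# Placeholder features: 6 windows, 4-dimensional embeddings (assumed sizes).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 4))

# Assumed center time (in seconds) of each embedding window.
window_centers = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]

# Assumed ground truth annotation: (start, end, speaker) tuples.
segments = [(0.0, 2.0, "A"), (2.0, 3.0, "B"), (4.0, 6.0, "A")]

labels = align_labels(window_centers, segments)

# Drop windows that have no ground truth label (e.g. silence).
keep = [i for i, lab in enumerate(labels) if lab is not None]
sequence = embeddings[keep]                        # shape: (num_windows, dim)
cluster_ids = np.array([labels[i] for i in keep])  # one label per row
```

The result is one row of features per labeled window and a label array of the same length, which is the kind of feature/label pairing described above. Check the README for the exact input format the training API expects.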
More details are in the README.md file and the paper on arXiv.
Which dataset should I use for training the network?