Dataset #21

Closed
DenisSouth opened this issue Jan 1, 2019 · 1 comment
Labels
question Further information is requested

Comments

@DenisSouth

Which dataset should I use to train the network?

@wq2012 wq2012 self-assigned this Jan 1, 2019
@wq2012 wq2012 added the question Further information is requested label Jan 1, 2019
@wq2012
Member

wq2012 commented Jan 1, 2019

It depends on what you want to work on.

You can use any dataset that fits the supervised clustering setting, i.e. one from which you can extract sequences of features and associate each feature with a ground truth label. The features can be speaker embeddings, face embeddings, etc.
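To make "sequences of features with ground truth labels" concrete, here is a minimal sketch of how per-recording data might be packed into one training pair: a single list of feature vectors and a parallel list of string labels of the same length. The helper `make_training_pair` and the utterance-index prefix on labels are hypothetical choices for illustration, not part of this library; check README.md for the exact types the training API expects (for example, whether it wants a NumPy array instead of nested lists).

```python
from typing import List, Optional, Sequence, Tuple

# One utterance: a sequence of feature vectors plus one ground-truth label per vector.
Utterance = Tuple[Sequence[Sequence[float]], Sequence[str]]


def make_training_pair(
    utterances: Sequence[Utterance],
) -> Tuple[List[List[float]], List[str]]:
    """Concatenate per-utterance (features, labels) into one sequence and one label list.

    Labels are prefixed with the utterance index so that, e.g., speaker "A" in two
    different recordings is not treated as the same cluster (a hypothetical choice
    made in this sketch, not a documented requirement of the library).
    """
    sequence: List[List[float]] = []
    cluster_ids: List[str] = []
    dim: Optional[int] = None
    for idx, (features, labels) in enumerate(utterances):
        # Every feature vector needs exactly one ground-truth label.
        if len(features) != len(labels):
            raise ValueError(
                f"utterance {idx}: {len(features)} features but {len(labels)} labels")
        for vec, label in zip(features, labels):
            # All feature vectors must share the same dimension.
            if dim is None:
                dim = len(vec)
            elif len(vec) != dim:
                raise ValueError(
                    f"utterance {idx}: expected dimension {dim}, got {len(vec)}")
            sequence.append(list(vec))
            cluster_ids.append(f"{idx}_{label}")
    return sequence, cluster_ids


if __name__ == "__main__":
    utts = [
        ([[0.1, 0.2], [0.1, 0.3], [0.9, 0.8]], ["A", "A", "B"]),
        ([[0.5, 0.5], [0.4, 0.6]], ["A", "C"]),
    ]
    seq, ids = make_training_pair(utts)
    print(len(seq), ids)  # 5 ['0_A', '0_A', '0_B', '1_A', '1_C']
```

The prefix keeps speaker "A" from the first recording and speaker "A" from the second recording as distinct clusters, since the same letter in two recordings usually refers to different people.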

One example is NIST SRE 2000 CALLHOME for speaker diarization. Whatever dataset you choose, however, you need to process it yourself to extract the features and align them with the labels. This library only provides the API for the clustering part.
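The alignment step depends on the dataset, but one common pattern is sketched below. It assumes embeddings are computed on fixed-length sliding windows (hypothetical `window_sec` and `hop_sec` parameters, with the first window starting at time 0) and that the annotations are time segments with a speaker label, for example parsed from RTTM files. Each embedding takes the label of the segment covering its window center. `align_labels` is a hypothetical helper, not part of this library.

```python
from typing import List, Sequence, Tuple

# A ground-truth annotation: (start_sec, end_sec, label).
Segment = Tuple[float, float, str]


def align_labels(
    embeddings: Sequence[Sequence[float]],
    segments: Sequence[Segment],
    window_sec: float,
    hop_sec: float,
) -> Tuple[List[List[float]], List[str]]:
    """Attach a ground-truth label to each sliding-window embedding.

    Assumes embedding i covers [i * hop_sec, i * hop_sec + window_sec).
    Embeddings whose window center falls in no segment (e.g. silence) or in
    segments with different labels (overlapped speech) are dropped.
    """
    features: List[List[float]] = []
    labels: List[str] = []
    for i, emb in enumerate(embeddings):
        center = i * hop_sec + window_sec / 2.0
        # Collect the distinct labels of all segments covering this center.
        matches = {lab for start, end, lab in segments if start <= center < end}
        if len(matches) == 1:
            features.append(list(emb))
            labels.append(matches.pop())
    return features, labels


if __name__ == "__main__":
    embs = [[0.0], [1.0], [2.0], [3.0], [4.0]]  # window centers: 0.5, 1.0, 1.5, 2.0, 2.5
    segs = [(0.0, 1.2, "spk1"), (1.4, 2.2, "spk2"), (1.8, 3.0, "spk1")]
    feats, labs = align_labels(embs, segs, window_sec=1.0, hop_sec=0.5)
    print(labs)  # ['spk1', 'spk1', 'spk2', 'spk1'] (center 2.0 is in overlap, dropped)
```

Dropping silence and overlapped speech is only one simple choice. It guarantees exactly one label per kept embedding, at the cost of discarding some audio.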

More details are in the README.md file and the paper on arXiv.

@wq2012 wq2012 closed this as completed Jan 1, 2019