Speaker diarization combined with video gives a worse DER than audio alone. Can this be fine-tuned? #90

Closed
Coconut059 opened this issue Apr 12, 2024 · 3 comments

Comments

@Coconut059

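Running speaker diarization with the cam++ model on the MISP2021 dataset gives: audio_only: MISS=23, FA=2.56, SER=9, DER=35; audio_visual: MISS=23, FA=2.56, SER=15, DER=40.
On the eval data the DER gap is even larger: 36% and 48% respectively. Can the clustering part be fine-tuned?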
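As a quick sanity check on the numbers above, DER is conventionally the sum of missed speech, false alarm, and speaker error, all expressed as percentages of scored speech time. The sketch below only redoes that arithmetic with the figures quoted in this comment; the function and variable names are illustrative and not part of the project. Under that definition the reported totals appear to be rounded, and because MISS and FA are identical in both runs, the whole gap comes from speaker error, which is the component the clustering stage influences.

```python
def der_from_components(miss, false_alarm, speaker_error):
    """Return DER as the sum of its three components (all in percent)."""
    return miss + false_alarm + speaker_error


# Figures quoted in the comment above (percent).
audio_only = {"miss": 23.0, "false_alarm": 2.56, "speaker_error": 9.0}
audio_visual = {"miss": 23.0, "false_alarm": 2.56, "speaker_error": 15.0}

der_audio = der_from_components(**audio_only)
der_av = der_from_components(**audio_visual)

# Per-component difference between the two runs.
gap = {name: audio_visual[name] - audio_only[name] for name in audio_only}

print(f"audio_only DER   = {der_audio:.2f}")
print(f"audio_visual DER = {der_av:.2f}")
print("difference by component:", gap)
```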

@wanghuii1
Collaborator

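It can be tuned, but the current pipeline cannot handle overlapping speech, and MISP contains a lot of overlap. If you want good results on the MISP dataset, I suggest following the reports from previous MISP editions and using the multimodal TASVD approach.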

@Coconut059
Author

Thanks! Which datasets does this code perform well on? Also, if the joint audio-visual clustering can be adjusted, how should I tune it?

@wanghuii1
Collaborator

We plan to open-source an audio-visual dataset with less overlap later on. For tuning, you can try adjusting vision_cluster.fix_cos_thr in conf/diar_video.yaml.
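To give a feel for why a fixed cosine threshold matters, here is a minimal toy sketch of threshold-based clustering. It is not the project's implementation: the greedy assignment rule, the function names, and the 2-D toy embeddings are all assumptions made for illustration, and the exact semantics of vision_cluster.fix_cos_thr should be checked in the repository's code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster_by_cos_threshold(embeddings, thr):
    """Greedy toy clustering (hypothetical, for illustration only).

    Each embedding joins the most similar existing cluster if the cosine
    similarity to that cluster's summed vector is at least `thr`;
    otherwise it starts a new cluster.
    """
    cluster_sums = []  # running sum of member vectors per cluster
    labels = []
    for emb in embeddings:
        best_idx, best_sim = -1, -1.0
        for idx, total in enumerate(cluster_sums):
            sim = cosine(emb, total)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= thr:
            cluster_sums[best_idx] = [p + q for p, q in zip(cluster_sums[best_idx], emb)]
            labels.append(best_idx)
        else:
            cluster_sums.append(list(emb))
            labels.append(len(cluster_sums) - 1)
    return labels


# Toy 2-D "embeddings": two tight groups plus one ambiguous point.
toy = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (0.7, 0.7)]

for thr in (0.5, 0.8, 0.999):
    labels = cluster_by_cos_threshold(toy, thr)
    print(f"thr={thr}: {len(set(labels))} clusters, labels={labels}")
```

In this toy setup a higher threshold splits the data into more clusters and a lower one merges more aggressively. When sweeping the real parameter, it is probably worth watching the speaker error component of DER on a held-out set rather than the total alone.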
