Can't predict timestamp, and speaker diarization relies on timestamps. #2121

@TaiYouWeb

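from funasr import AutoModel

# `device` is defined elsewhere in the reporter's script (not shown in the issue).
model = AutoModel(
    model="FunAudioLLM/SenseVoiceSmall",
    vad_model="fsmn-vad",
    punc_model="ct-punc",
    spk_model="cam++",
    vad_kwargs={"max_single_segment_time": 15000},
    batch_size=1,
    hub="hf",
    device=device,
)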

Console error:

ERROR:root:Only 'iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch'
                    and 'iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch'
                    can predict timestamp, and speaker diarization relies on timestamps.
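Going only by the error message above, one way to avoid the error is to drop `spk_model` whenever the ASR model is not one of the two listed ids. The sketch below is an assumption-laden illustration, not FunASR's own logic: `TIMESTAMP_MODELS` and `safe_automodel_kwargs` are hypothetical names, the set is copied from the message, and short aliases or other timestamp-capable models may behave differently.

```python
# Model ids copied verbatim from the error message; per that message,
# only these can predict timestamps (assumption: the list is exhaustive).
TIMESTAMP_MODELS = {
    "iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    "iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
}


def safe_automodel_kwargs(kwargs):
    """Hypothetical helper: return a copy of kwargs without `spk_model`
    when the ASR model is not one the error message lists."""
    cleaned = dict(kwargs)
    if cleaned.get("spk_model") and cleaned.get("model") not in TIMESTAMP_MODELS:
        cleaned.pop("spk_model")
    return cleaned


kwargs = safe_automodel_kwargs({
    "model": "FunAudioLLM/SenseVoiceSmall",
    "vad_model": "fsmn-vad",
    "punc_model": "ct-punc",
    "spk_model": "cam++",
})
# `kwargs` no longer contains "spk_model"; it could then be passed on
# as AutoModel(**kwargs) without requesting speaker diarization.
```

This only sidesteps the error by giving up diarization for SenseVoiceSmall; getting speaker labels would still require a model that produces timestamps.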

Labels: bug
