Working with the new VAD Feature #68

bballboy8 · 2023-02-06T03:04:33Z

I'm currently trying to work with the new VAD feature but I'm getting the following error:

TypeError: transcribe_with_vad() missing 1 required positional argument: 'vad_pipeline'

Is there sample code anywhere for transcribing with vad?

kanjieater · 2023-02-19T17:57:42Z

I'm also hoping for some example code for this feature.

Barabazs · 2023-02-19T19:02:16Z

You'll need this function

whisperX/whisperx/transcribe.py

Line 303 in f7093e6

def transcribe_with_vad(

And you can see how to use it in the cli function.
Most of it here:

whisperX/whisperx/transcribe.py

Lines 643 to 691 in f7093e6

    
    vad_pipeline = None
    if vad_filter:
        if hf_token is None:
            print("Warning, no huggingface token used, needs to be saved in environment variable, otherwise will throw error loading VAD model...")
        from pyannote.audio import Inference
        vad_pipeline = Inference("pyannote/segmentation",
                                 pre_aggregation_hook=lambda segmentation: segmentation,
                                 use_auth_token=hf_token)

    diarize_pipeline = None
    if diarize:
        if hf_token is None:
            print("Warning, no --hf_token used, needs to be saved in environment variable, otherwise will throw error loading diarization model...")
        from pyannote.audio import Pipeline
        diarize_pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2.1",
                                                    use_auth_token=hf_token)

    os.makedirs(output_dir, exist_ok=True)

    if model_name.endswith(".en") and args["language"] not in {"en", "English"}:
        if args["language"] is not None:
            warnings.warn(f'{model_name} is an English-only model but receipted "{args["language"]}"; using English instead.')
        args["language"] = "en"

    temperature = args.pop("temperature")
    temperature_increment_on_fallback = args.pop("temperature_increment_on_fallback")
    if temperature_increment_on_fallback is not None:
        temperature = tuple(np.arange(temperature, 1.0 + 1e-6, temperature_increment_on_fallback))
    else:
        temperature = [temperature]

    threads = args.pop("threads")
    if threads > 0:
        torch.set_num_threads(threads)

    from . import load_model
    model = load_model(model_name, device=device, download_root=model_dir)

    align_language = args["language"] if args["language"] is not None else "en"  # default to loading english if not specified
    align_model, align_metadata = load_align_model(align_language, device, model_name=align_model)

    for audio_path in args.pop("audio"):
        if vad_filter:
            if parallel_bs > 1:
                print("Performing VAD and parallel transcribing ...")
                result = transcribe_with_vad_parallel(model, audio_path, vad_pipeline, temperature=temperature, batch_size=parallel_bs, **args)
            else:
                print("Performing VAD...")
                result = transcribe_with_vad(model, audio_path, vad_pipeline, temperature=temperature, **args)

kanjieater · 2023-02-19T19:17:31Z

Thanks for the example.

I think I'll hold off on importing it into other scripts for now, since it looks like you would have to pull in quite a few other pieces from the CLI snippet to get transcribe_with_vad working.
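
For anyone who does want to call it from their own script, here is a minimal sketch that gathers only the pieces of the CLI snippet that transcribe_with_vad itself appears to need: the VAD pipeline, the model, and the temperature tuple. It assumes the code as of commit f7093e6. The helper names (temperature_schedule, transcribe_file_with_vad), the "small" model name, the "cpu" device and the 0.2 increment are illustrative choices of mine, not part of whisperX, and I have not run the transcription part against that commit.

```python
import math


def temperature_schedule(start=0.0, increment=0.2):
    """Build the fallback temperatures from `start` up to 1.0.

    Pure-Python stand-in for the CLI's
    np.arange(temperature, 1.0 + 1e-6, temperature_increment_on_fallback).
    The default values here are assumptions.
    """
    if increment is None:
        return (start,)
    count = math.ceil((1.0 + 1e-6 - start) / increment)
    return tuple(start + i * increment for i in range(count))


def transcribe_file_with_vad(audio_path, hf_token, model_name="small",
                             device="cpu", **decode_options):
    """Hypothetical wrapper mirroring the non-parallel branch of the CLI snippet."""
    # Imported lazily so this file can be loaded without whisperX or
    # pyannote.audio installed.
    from pyannote.audio import Inference
    from whisperx import load_model
    from whisperx.transcribe import transcribe_with_vad

    # Same VAD setup as in the CLI snippet; needs a Hugging Face token.
    vad_pipeline = Inference(
        "pyannote/segmentation",
        pre_aggregation_hook=lambda segmentation: segmentation,
        use_auth_token=hf_token,
    )
    model = load_model(model_name, device=device)

    # vad_pipeline is the third positional argument, which is the one the
    # TypeError in the first post complains about.
    return transcribe_with_vad(
        model,
        audio_path,
        vad_pipeline,
        temperature=temperature_schedule(),
        **decode_options,
    )
```

Calling transcribe_file_with_vad with an audio path and a Hugging Face token should then return the same kind of result the CLI gets from transcribe_with_vad. Alignment and diarization are left out on purpose, so the align model and diarize pipeline from the snippet are not loaded here.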

m-bain · 2023-04-04T19:56:43Z

turned on by default now

bballboy8 changed the title ~~Working the new VAD Feature~~ Working with the new VAD Feature Feb 6, 2023

m-bain closed this as completed Apr 4, 2023
