Working with the new VAD Feature #68

Closed
bballboy8 opened this issue Feb 6, 2023 · 4 comments

Comments

@bballboy8

I'm currently trying to work with the new VAD feature, but I'm getting the following error:

```
TypeError: transcribe_with_vad() missing 1 required positional argument: 'vad_pipeline'
```

Is there sample code anywhere for transcribing with VAD?
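The error means `transcribe_with_vad` was called without its third positional argument. In the `cli` call quoted later in this thread, that argument is the pyannote VAD pipeline: `transcribe_with_vad(model, audio_path, vad_pipeline, temperature=temperature, **args)`. The sketch below uses a hypothetical stub (not the whisperx implementation; only its first three parameter names mirror that call) to reproduce the message and show the shape of a call that satisfies the signature.

```python
# Hypothetical stub, NOT whisperx's implementation: only the first three
# positional parameters mirror the call made in cli().
def transcribe_with_vad(model, audio, vad_pipeline, **kwargs):
    return {"model": model, "audio": audio, "vad_pipeline": vad_pipeline, **kwargs}


try:
    # Omitting the pipeline reproduces the error reported in this issue.
    transcribe_with_vad("model", "audio.wav")
except TypeError as err:
    print(err)  # transcribe_with_vad() missing 1 required positional argument: 'vad_pipeline'

# None is only a placeholder for the stub; the real function needs an actual
# pyannote pipeline object in this position.
result = transcribe_with_vad("model", "audio.wav", None, temperature=[0.0])
print(result["vad_pipeline"], result["temperature"])
```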

bballboy8 changed the title from "Working the new VAD Feature" to "Working with the new VAD Feature" on Feb 6, 2023
@kanjieater

I'm also hoping for some example code for this feature.

@Barabazs
Contributor

Barabazs commented Feb 19, 2023

You'll need this function:

```python
def transcribe_with_vad(
```

You can see how to use it in the `cli` function; most of it is here:
```python
# Excerpt from whisperx's cli() function. args, hf_token, vad_filter, diarize,
# output_dir, model_name, model_dir, device, align_model and parallel_bs are
# defined earlier in cli(); os, warnings, np, torch and load_align_model come
# from the module's imports.
vad_pipeline = None
if vad_filter:
    if hf_token is None:
        print("Warning, no huggingface token used, needs to be saved in environment variable, otherwise will throw error loading VAD model...")
    from pyannote.audio import Inference
    vad_pipeline = Inference("pyannote/segmentation",
                             pre_aggregation_hook=lambda segmentation: segmentation,
                             use_auth_token=hf_token)

diarize_pipeline = None
if diarize:
    if hf_token is None:
        print("Warning, no --hf_token used, needs to be saved in environment variable, otherwise will throw error loading diarization model...")
    from pyannote.audio import Pipeline
    diarize_pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2.1",
                                                use_auth_token=hf_token)

os.makedirs(output_dir, exist_ok=True)

# English-only checkpoints (names ending in ".en") are forced to English.
if model_name.endswith(".en") and args["language"] not in {"en", "English"}:
    if args["language"] is not None:
        warnings.warn(f'{model_name} is an English-only model but received "{args["language"]}"; using English instead.')
    args["language"] = "en"

# Build the temperature fallback schedule.
temperature = args.pop("temperature")
temperature_increment_on_fallback = args.pop("temperature_increment_on_fallback")
if temperature_increment_on_fallback is not None:
    temperature = tuple(np.arange(temperature, 1.0 + 1e-6, temperature_increment_on_fallback))
else:
    temperature = [temperature]

threads = args.pop("threads")
if threads > 0:
    torch.set_num_threads(threads)

from . import load_model
model = load_model(model_name, device=device, download_root=model_dir)

align_language = args["language"] if args["language"] is not None else "en"  # default to loading english if not specified
align_model, align_metadata = load_align_model(align_language, device, model_name=align_model)

for audio_path in args.pop("audio"):
    if vad_filter:
        if parallel_bs > 1:
            print("Performing VAD and parallel transcribing ...")
            result = transcribe_with_vad_parallel(model, audio_path, vad_pipeline, temperature=temperature, batch_size=parallel_bs, **args)
        else:
            print("Performing VAD...")
            result = transcribe_with_vad(model, audio_path, vad_pipeline, temperature=temperature, **args)
```
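The temperature lines in that excerpt build Whisper's fallback schedule: decoding starts at `temperature` and, when a decode fails its quality thresholds, retries at successively higher temperatures up to 1.0. The pure-Python sketch below (the name `temperature_schedule` is hypothetical; the excerpt itself uses `np.arange`) shows what the resulting tuple looks like. Values are rounded for readability, so they can differ from NumPy's output in the last floating-point digits.

```python
from typing import Optional, Sequence


def temperature_schedule(temperature: float, increment: Optional[float]) -> Sequence[float]:
    """Hypothetical helper mirroring the temperature logic in cli(), without NumPy."""
    if increment is None:
        # No fallback: decode once at the given temperature.
        return [temperature]
    if increment <= 0:
        raise ValueError("increment must be positive")
    temps = []
    i = 0
    # Step up to and including 1.0; the 1e-6 slack mirrors the np.arange stop value.
    while temperature + i * increment <= 1.0 + 1e-6:
        temps.append(round(temperature + i * increment, 6))
        i += 1
    return tuple(temps)


print(temperature_schedule(0.0, 0.2))   # (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
print(temperature_schedule(0.0, None))  # [0.0]
```

With Whisper's default values of 0 and 0.2, the model gets up to six attempts per segment.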

@kanjieater

Thanks for the example.

I think I'll hold off on importing it into other scripts for now, since it seems you would have to pull in quite a few other pieces from the cli snippet to get transcribe_with_vad working.
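Some of those pieces are plain argument handling that can be lifted out of `cli` on their own. As one example, the language logic (the English-only override plus the English default for the alignment model) fits in a small pure function. This is a sketch under that assumption; `resolve_languages` is a hypothetical name, not part of whisperx.

```python
import warnings
from typing import Optional, Tuple


def resolve_languages(model_name: str, language: Optional[str]) -> Tuple[Optional[str], str]:
    """Hypothetical helper: returns (decode_language, align_language) following cli()."""
    # English-only checkpoints (names ending in ".en") are forced to English.
    if model_name.endswith(".en") and language not in {"en", "English"}:
        if language is not None:
            warnings.warn(f'{model_name} is an English-only model but received "{language}"; using English instead.')
        language = "en"
    # The alignment model defaults to English when no language was given.
    align_language = language if language is not None else "en"
    return language, align_language


print(resolve_languages("base", None))    # (None, 'en')
print(resolve_languages("base.en", "de")) # warns, then ('en', 'en')
```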

@m-bain
Owner

m-bain commented Apr 4, 2023

VAD is turned on by default now.

@m-bain m-bain closed this as completed Apr 4, 2023
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants