
Releases: huggingface/speechbox

Patch Release v0.2.1

27 Jan 18:40
79eb397

Fixes the import checks for the ASRDiarizationPipeline class (see #16), displaying a helpful message if either pyannote.audio or torchaudio is not installed.
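As a rough illustration only (the function name and message below are hypothetical, not the actual speechbox code), a soft-dependency check along these lines raises a readable error instead of a bare ImportError when the optional packages are missing:

# Minimal sketch of a soft-dependency check; names and messages are
# illustrative, not the actual speechbox implementation.
import importlib.util

def _is_available(name: str) -> bool:
    # find_spec raises ModuleNotFoundError if a parent package is missing,
    # so treat that case as "not available" too
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False

def require_diarization_deps():
    missing = [name for name in ("pyannote.audio", "torchaudio") if not _is_available(name)]
    if missing:
        raise ImportError(
            f"ASRDiarizationPipeline requires {' and '.join(missing)}. "
            f"Install them with: pip install {' '.join(missing)}"
        )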

v0.2.0

27 Jan 17:37
c757285

Second Release

The second release of Speechbox adds a pipeline for ASR + Speaker Diarization. This allows you to transcribe long audio files and annotate the transcriptions with who spoke when.

To use this feature, you need to install speechbox as well as transformers & pyannote.audio:

pip install --upgrade speechbox transformers pyannote.audio

For an initial example, we recommend also installing datasets:

pip install datasets

Then you can run the following code snippet:

import torch
from speechbox import ASRDiarizationPipeline
from datasets import load_dataset

# use GPU if available
device = "cuda:0" if torch.cuda.is_available() else "cpu"
# instantiate the combined ASR + speaker diarization pipeline from a Whisper checkpoint
pipeline = ASRDiarizationPipeline.from_pretrained("openai/whisper-tiny", device=device)

# load dataset of concatenated LibriSpeech samples
concatenated_librispeech = load_dataset("sanchit-gandhi/concatenated_librispeech", split="train", streaming=True)
# get first sample
sample = next(iter(concatenated_librispeech))

out = pipeline(sample["audio"])
print(out)
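The pipeline returns the transcription split into per-speaker segments. A sketch of the shape of the output (keys assumed from the pipeline's documentation; the values are illustrative, not real results):

# Illustrative output shape only — values are made up:
# [
#     {"speaker": "SPEAKER_01", "text": " Hello, how are you doing today?", "timestamp": (0.0, 3.5)},
#     {"speaker": "SPEAKER_02", "text": " I'm doing well, thanks for asking.", "timestamp": (3.5, 7.2)},
# ]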

Patch Release v0.1.2

29 Dec 09:38

Fixes a bug with beam search. See: 4d15bc9

Beam search (num_beams > 1) has now been checked against greedy search and works as expected.
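As a usage sketch, beam search is enabled by passing num_beams > 1 to the restorer call; this reuses the restorer and sample objects from the initial-release example further down:

# Sketch: num_beams > 1 selects beam search instead of greedy decoding.
# `restorer` and `sample` are set up as in the initial-release snippet below.
restored_text, log_probs = restorer(
    sample["audio"]["array"],
    sample["text"],
    sampling_rate=sample["audio"]["sampling_rate"],
    num_beams=4,  # beam search with 4 beams
)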

Patch Release v0.1.1

28 Dec 22:02

Makes sure a nice error message is given if accelerate is not installed. See: 8671ba2

Initial Release

28 Dec 21:23

Hello world speechbox!

This is the first release of speechbox, providing the Punctuation Restoration task using Whisper.

You need to install speechbox as well as transformers & accelerate in order to use the PunctuationRestorer class:

pip install --upgrade speechbox transformers accelerate

For an initial example, we recommend also installing datasets:

pip install datasets

Then you can run the following code snippet:

from speechbox import PunctuationRestorer
from datasets import load_dataset

streamed_dataset = load_dataset("librispeech_asr", "clean", split="validation", streaming=True)

# get first sample
sample = next(iter(streamed_dataset))

# print out normalized transcript
print(sample["text"])
# => "HE WAS IN A FEVERED STATE OF MIND OWING TO THE BLIGHT HIS WIFE'S ACTION THREATENED TO CAST UPON HIS ENTIRE FUTURE"

# load the restoring class
restorer = PunctuationRestorer.from_pretrained("openai/whisper-tiny.en")
restorer.to("cuda")

# restore punctuation & casing in the normalized transcript, using the audio as reference
restored_text, log_probs = restorer(sample["audio"]["array"], sample["text"], sampling_rate=sample["audio"]["sampling_rate"], num_beams=1)

print("Restored text:\n", restored_text)

Note: This project is very young and intended to be run largely by the community. Please check out the Contribution Guide if you'd like to contribute ❤️

You can also try out the model here: https://huggingface.co/spaces/speechbox/whisper-restore-punctuation

Speechly,
🤗