WhisperPlus: Advancing Speech-to-Text Processing 🚀

πŸ› οΈ Installation

pip install whisperplus
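
After installing, you can confirm the package is importable by loading the entry points used in the examples below (a minimal sanity check; it assumes a working PyTorch/Transformers environment):

# Quick import check after installation; these are the same entry points
# used throughout the usage examples in this README.
from whisperplus import SpeechToTextPipeline, download_and_convert_to_mp3

print("whisperplus imported successfully")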

🤗 Model Hub

You can find the models on HuggingFace Spaces or on the HuggingFace Model Hub.
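
Any Whisper checkpoint on the Hub can be passed as model_id. A minimal sketch, assuming the pipeline accepts other Hub checkpoints the same way it accepts openai/whisper-large-v3 in the examples below:

from whisperplus import SpeechToTextPipeline

# Smaller checkpoints such as openai/whisper-small load faster and use less
# memory than openai/whisper-large-v3, at some cost in transcription accuracy.
pipeline = SpeechToTextPipeline(model_id="openai/whisper-small")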

πŸŽ™οΈ Usage

To use the whisperplus library, follow the steps below for different tasks:

🎵 YouTube URL to Audio

from whisperplus import SpeechToTextPipeline, download_and_convert_to_mp3

# Define the URL of the YouTube video that you want to convert to text.
url = "https://www.youtube.com/watch?v=di3rHkEZuUw"

# Download the audio from the YouTube video and convert it to MP3.
audio_path = download_and_convert_to_mp3(url)

# Initialize the speech-to-text pipeline with the specified Whisper model.
pipeline = SpeechToTextPipeline(model_id="openai/whisper-large-v3")

# Run the pipeline on the audio file.
transcript = pipeline(
    audio_path=audio_path, model_id="openai/whisper-large-v3", language="english"
)

# Print the transcript of the audio.
print(transcript)
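
The same pipeline can also transcribe audio files already on disk. A short sketch, assuming audio_path accepts a local MP3 path (the file name here is hypothetical) and that the returned transcript is a plain string:

# Transcribe a local MP3 file and save the transcript to disk.
local_transcript = pipeline(
    audio_path="my_recording.mp3",  # hypothetical local file
    model_id="openai/whisper-large-v3",
    language="english",
)

with open("transcript.txt", "w", encoding="utf-8") as f:
    f.write(local_transcript)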

Summarization

from whisperplus.pipelines.summarization import TextSummarizationPipeline

# Load a BART-based summarization pipeline and summarize the transcript.
summarizer = TextSummarizationPipeline(model_id="facebook/bart-large-cnn")
summary = summarizer.summarize(transcript)
print(summary[0]["summary_text"])
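
facebook/bart-large-cnn can only attend to roughly 1,024 input tokens, so long transcripts are usually split into chunks and summarized piecewise. A sketch using a hypothetical helper, assuming summarize() accepts arbitrary text and returns the same [{"summary_text": ...}] structure as above:

# Hypothetical helper: split a long transcript into word-based chunks,
# summarize each chunk, then join the partial summaries.
def summarize_long_text(summarizer, text, words_per_chunk=700):
    words = text.split()
    chunks = [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]
    partial = [summarizer.summarize(chunk)[0]["summary_text"] for chunk in chunks]
    return " ".join(partial)

print(summarize_long_text(summarizer, transcript))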

Speaker Diarization

from whisperplus import (
    ASRDiarizationPipeline,
    download_and_convert_to_mp3,
    format_speech_to_dialogue,
)

# Download the YouTube audio and convert it to MP3.
audio_path = download_and_convert_to_mp3("https://www.youtube.com/watch?v=mRB14sFHw2E")

# Combine Whisper for transcription with a pyannote model for speaker diarization.
device = "cuda"  # "cpu" or "mps"
pipeline = ASRDiarizationPipeline.from_pretrained(
    asr_model="openai/whisper-large-v3",
    diarizer_model="pyannote/speaker-diarization",
    use_auth_token=False,
    chunk_length_s=30,
    device=device,
)

# Transcribe the audio and attribute each segment to a speaker.
output_text = pipeline(audio_path, num_speakers=2, min_speaker=1, max_speaker=2)

# Format the diarized segments as a readable dialogue and print it.
dialogue = format_speech_to_dialogue(output_text)
print(dialogue)
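
Note that pyannote/speaker-diarization is a gated model on the Hugging Face Hub, so in practice you typically accept its terms on the model page and pass an access token rather than use_auth_token=False. A sketch, assuming the token is stored in an HF_TOKEN environment variable:

import os

from whisperplus import ASRDiarizationPipeline

# Read a Hugging Face access token from the environment (hypothetical setup);
# the gated pyannote diarization model requires an authenticated download.
hf_token = os.environ.get("HF_TOKEN")

pipeline = ASRDiarizationPipeline.from_pretrained(
    asr_model="openai/whisper-large-v3",
    diarizer_model="pyannote/speaker-diarization",
    use_auth_token=hf_token,
    chunk_length_s=30,
    device="cuda",
)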

Contributing

To set up a development environment, install the dev requirements and enable the pre-commit hooks:

pip install -r dev-requirements.txt
pre-commit install
pre-commit run --all-files

📜 License

This project is licensed under the terms of the Apache License 2.0.

🤗 Acknowledgments

This project is based on the HuggingFace Transformers library.

🤗 Citation

@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
