
community: add video transcript loaders using Whisper for enhanced video transcription #21426

Open
wants to merge 7 commits into master

Conversation

TeodorZlatanov

Description:
This pull request introduces six new loader classes to the community package, adding video-processing capabilities built on the Whisper model. The loaders transcribe video files into either segment- or paragraph-level text, and each returned document carries the transcription along with metadata such as the source, start time, and end time. Three backends are supported: Azure OpenAI, a local Whisper model, and the OpenAI API, each with both paragraph- and segment-based variants. A brief usage sketch follows the class list below.

New Classes:

  • AzureWhisperVideoParagraphLoader: Processes video files into paragraphs using Azure's Whisper API.
  • AzureWhisperVideoSegmentLoader: Processes video files into segments using Azure's Whisper API.
  • LocalWhisperVideoParagraphLoader: Transcribes local video files into paragraphs using the local Whisper model.
  • LocalWhisperVideoSegmentLoader: Transcribes local video files into segments using the local Whisper model.
  • OpenAIWhisperVideoParagraphLoader: Utilizes OpenAI's cloud-based Whisper API to transcribe videos into paragraphs.
  • OpenAIWhisperVideoSegmentLoader: Utilizes OpenAI's cloud-based Whisper API to transcribe videos into segments.
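For illustration, here is a minimal usage sketch of one of the local loaders. Only the class name is taken from this PR; the import path, the file_path constructor argument, and the exact metadata keys (start, end) are assumptions based on the usual LangChain document loader interface and the description above.

    # Hypothetical usage sketch: the class name comes from this PR, but the import
    # path, the file_path argument, and the metadata keys are assumptions.
    from langchain_community.document_loaders import LocalWhisperVideoParagraphLoader

    loader = LocalWhisperVideoParagraphLoader(file_path="lecture.mp4")  # assumed parameter name
    docs = loader.load()  # standard LangChain loader entry point

    for doc in docs:
        # Each Document is expected to carry one transcribed paragraph plus
        # metadata such as the source and start/end times (per the description).
        print(doc.metadata.get("source"), doc.metadata.get("start"), doc.metadata.get("end"))
        print(doc.page_content[:80])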

Dependencies:

  • Whisper: Required for local loaders. Install via pip:
    pip install openai-whisper
    
  • OpenAI: Required for Azure OpenAI and OpenAI loaders. Install via pip:
    pip install openai
    
  • FFmpeg: Required for preprocessing video files into an audio format that Whisper can process (a short preprocessing sketch follows this list). Install FFmpeg:
    # on Ubuntu or Debian
    sudo apt update && sudo apt install ffmpeg
    
    # on Arch Linux
    sudo pacman -S ffmpeg
    
    # on macOS using Homebrew (https://brew.sh/)
    brew install ffmpeg
    
    # on Windows using direct download:
    Download from https://ffmpeg.org/download.html and add the executable to your PATH.
    
    # on Windows using Chocolatey (https://chocolatey.org/)
    choco install ffmpeg
    
    # on Windows using Scoop (https://scoop.sh/)
    scoop install ffmpeg
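To show roughly how these dependencies fit together, here is a hedged sketch of the preprocessing the local loaders depend on: FFmpeg extracts an audio track from the video, and openai-whisper transcribes it into timestamped segments. The file names and the "base" model size are placeholders; the loaders in this PR may handle this differently internally.

    import subprocess
    import whisper  # provided by the openai-whisper package

    # Extract a 16 kHz mono WAV track from the video; Whisper consumes plain audio.
    subprocess.run(
        ["ffmpeg", "-y", "-i", "lecture.mp4", "-ar", "16000", "-ac", "1", "audio.wav"],
        check=True,
    )

    # Transcribe locally; "base" is a placeholder model size.
    model = whisper.load_model("base")
    result = model.transcribe("audio.wav")

    # Each segment carries start/end timestamps, which correspond to the
    # segment-level metadata described for the loaders above.
    for seg in result["segments"]:
        print(f'{seg["start"]:.1f}-{seg["end"]:.1f}: {seg["text"].strip()}')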

TeodorZlatanov (Author) commented:
@hwchase17 , just wanted to bring this PR to your attention.
