Add new document_loader: AssemblyAIAudioTranscriptLoader #9667

patrickloeber
Contributor

This PR adds a new document loader, AssemblyAIAudioTranscriptLoader, which transcribes audio files with the AssemblyAI API and loads the transcribed text into documents.

  • Add new document_loader with class AssemblyAIAudioTranscriptLoader
  • Add optional dependency assemblyai
  • Add unit tests (using a Mock client)
  • Add docs notebook

This is the equivalent of the JS integration already available in LangChain.js. See the LangChain JS docs AssemblyAI page.

At its simplest, you can use the loader to get a transcript back from an audio file like this:

from langchain.document_loaders.assemblyai import AssemblyAIAudioTranscriptLoader

loader = AssemblyAIAudioTranscriptLoader(file_path="./testfile.mp3")
docs = loader.load()

To use it, install the assemblyai Python package and set the environment variable ASSEMBLYAI_API_KEY to your API key. Alternatively, you can pass the API key as an argument.
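
The two ways of supplying the key can be sketched roughly as follows. This is only an illustration, not the loader's actual code: `resolve_api_key` and `transcribe` are hypothetical helpers, and the `api_key` keyword name is an assumption, since the description above only says the key can be passed as an argument.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit argument, else the env var."""
    key = api_key or os.environ.get("ASSEMBLYAI_API_KEY")
    if not key:
        raise ValueError(
            "No AssemblyAI API key found: pass one explicitly "
            "or set ASSEMBLYAI_API_KEY."
        )
    return key


def transcribe(file_path: str, api_key: Optional[str] = None):
    """Hypothetical wrapper around the loader added in this PR."""
    # Imported lazily so the sketch can be read without langchain installed.
    from langchain.document_loaders.assemblyai import (
        AssemblyAIAudioTranscriptLoader,
    )

    loader = AssemblyAIAudioTranscriptLoader(
        file_path=file_path,
        api_key=resolve_api_key(api_key),  # keyword name is an assumption
    )
    return loader.load()
```

Calling `transcribe("./testfile.mp3")` would then rely on the environment variable, while `transcribe("./testfile.mp3", api_key="...")` passes the key explicitly.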

Twitter handles to shout out if so kindly 🙇
@AssemblyAI and @patloeber

@dosubot dosubot bot added Ɑ: doc loader Related to document loader module (not documentation) 🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features labels Aug 23, 2023
- Add new class `AssemblyAIAudioTranscriptLoader`
- Add optional dependency `assemblyai`
- Add unit tests (using a Mock client)
- Add docs notebook

The `AssemblyAIAudioTranscriptLoader` transcribes audio files
with the AssemblyAI API and loads the transcribed text into documents.
@patrickloeber patrickloeber force-pushed the add-assemblyai-audio-transcript-loader branch from 712cbdb to d917776 Compare August 23, 2023 21:19
@baskaryan baskaryan added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Aug 23, 2023
Collaborator

@eyurtsev eyurtsev left a comment

@baskaryan looks good to me feel free to merge

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
@baskaryan baskaryan merged commit 5990651 into langchain-ai:master Aug 24, 2023
27 checks passed
baskaryan pushed a commit that referenced this pull request Aug 24, 2023
…9687)

Uses the shorter import path

`from langchain.document_loaders import` instead of the full path
`from langchain.document_loaders.assemblyai`

Applies those changes to the docs and the unit test.

See #9667, which adds this new loader.
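
As a rough illustration of the two import paths mentioned in that commit, the sketch below looks for the loader class under the shorter path first and then falls back to the full module path. `find_loader_class` and `CANDIDATE_MODULES` are hypothetical names introduced here for illustration; which path resolves depends on the installed langchain version.

```python
import importlib

# Module paths named in the commit message, shortest first.
CANDIDATE_MODULES = (
    "langchain.document_loaders",
    "langchain.document_loaders.assemblyai",
)


def find_loader_class(name: str = "AssemblyAIAudioTranscriptLoader"):
    """Hypothetical helper: return the class from the first path that has it."""
    for module_name in CANDIDATE_MODULES:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, name)
        except (ImportError, AttributeError):
            # Module missing, or it does not export the requested name.
            continue
    return None
```

If neither path is importable (for example, langchain is not installed), the helper returns `None` instead of raising.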