Add new document_loader: AssemblyAIAudioTranscriptLoader #9667

patrickloeber · 2023-08-23T20:41:43Z

This PR adds a new document loader, `AssemblyAIAudioTranscriptLoader`, that transcribes audio files with the AssemblyAI API and loads the transcribed text into documents.

- Add new document_loader with class `AssemblyAIAudioTranscriptLoader`
- Add optional dependency `assemblyai`
- Add unit tests (using a Mock client)
- Add docs notebook
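Using a Mock client lets unit tests run without network access or an API key. The sketch below illustrates that testing pattern under stated assumptions: `SketchTranscriptLoader` and the small `Document` dataclass are hypothetical stand-ins, not the LangChain classes, and the client interface (`transcribe(path)` returning an object with `error`, `text`, and `json_response`) is assumed for illustration. The PR's actual tests may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List
from unittest.mock import Mock


@dataclass
class Document:
    """Minimal stand-in for a LangChain Document: page text plus metadata."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class SketchTranscriptLoader:
    """Hypothetical loader; the transcription client is injected so tests can mock it."""

    def __init__(self, file_path: str, transcriber: Any) -> None:
        self.file_path = file_path
        self.transcriber = transcriber

    def load(self) -> List[Document]:
        # Assumed client interface: transcribe(path) returns an object
        # exposing .error, .text and .json_response.
        transcript = self.transcriber.transcribe(self.file_path)
        if transcript.error:
            raise ValueError(f"Could not transcribe file: {transcript.error}")
        return [Document(page_content=transcript.text, metadata=transcript.json_response)]


def test_load_returns_transcript_text() -> None:
    # The Mock client replaces the real API, so no network or API key is needed.
    transcriber = Mock()
    transcriber.transcribe.return_value = Mock(
        error=None, text="Hello world", json_response={"id": "abc"}
    )

    docs = SketchTranscriptLoader("./testfile.mp3", transcriber).load()

    transcriber.transcribe.assert_called_once_with("./testfile.mp3")
    assert len(docs) == 1
    assert docs[0].page_content == "Hello world"
    assert docs[0].metadata == {"id": "abc"}


def test_load_raises_on_transcription_error() -> None:
    # A transcript with a non-empty error should surface as a ValueError.
    transcriber = Mock()
    transcriber.transcribe.return_value = Mock(error="unsupported audio format")

    try:
        SketchTranscriptLoader("./testfile.mp3", transcriber).load()
    except ValueError as exc:
        assert "unsupported audio format" in str(exc)
    else:
        raise AssertionError("expected a ValueError")
```

Injecting the client through the constructor keeps the sketch easy to test. How the real tests substitute the mock for the `assemblyai` client is not shown on this page.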

This is the equivalent of the JS integration already available in LangChain.js. See the AssemblyAI page in the LangChain.js docs.

At its simplest, you can use the loader to get a transcript back from an audio file like this:

```python
from langchain.document_loaders.assemblyai import AssemblyAIAudioTranscriptLoader

loader = AssemblyAIAudioTranscriptLoader(file_path="./testfile.mp3")
docs = loader.load()
```

To use the loader, you need the `assemblyai` Python package installed and the environment variable `ASSEMBLYAI_API_KEY` set to your API key. Alternatively, the API key can be passed as an argument.
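The key lookup described above can be sketched as follows. `resolve_api_key` is a hypothetical helper, and the precedence it uses (an explicit argument before the environment variable) is an assumption for illustration. It is not the loader's or the `assemblyai` package's actual code, and this page does not name the loader's constructor argument for the key.

```python
import os
from typing import Optional

# Environment variable named in the PR description.
API_KEY_ENV_VAR = "ASSEMBLYAI_API_KEY"


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: use an explicit key if given, else the env var.

    An empty string is treated as "not provided". The argument-before-environment
    precedence is an assumption for this sketch.
    """
    if api_key:
        return api_key
    env_key = os.environ.get(API_KEY_ENV_VAR)
    if env_key:
        return env_key
    raise ValueError(f"No AssemblyAI API key: pass api_key or set {API_KEY_ENV_VAR}.")


if __name__ == "__main__":
    os.environ[API_KEY_ENV_VAR] = "key-from-env"
    print(resolve_api_key())                # key-from-env
    print(resolve_api_key("key-from-arg"))  # key-from-arg
```

Failing early with a clear message when neither source provides a key avoids a less obvious authentication error later, at the first API call.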

Twitter handles for a shout-out, if you'd be so kind 🙇
@AssemblyAI and @patloeber

vercel · 2023-08-23T20:41:46Z

The latest updates on your projects. Learn more about Vercel for Git.

1 Ignored Deployment

Name	Status	Updated (UTC)
langchain	⬜️ Ignored	Aug 24, 2023 5:36am

- Add new class `AssemblyAIAudioTranscriptLoader`
- Add optional dependency `assemblyai`
- Add unit tests (using a Mock client)
- Add docs notebook

The `AssemblyAIAudioTranscriptLoader` transcribes audio files with the AssemblyAI API and loads the transcribed text into documents.

eyurtsev

@baskaryan looks good to me, feel free to merge

libs/langchain/langchain/document_loaders/assemblyai.py

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>

…9687) Uses the shorter import path `from langchain.document_loaders import` instead of the full path `from langchain.document_loaders.assemblyai`, and applies this change to the docs and the unit test. See #9667, which adds this new loader.

dosubot bot added the Ɑ: doc loader and 🤖:enhancement labels Aug 23, 2023

vercel bot deployed to Preview – langchain August 23, 2023 20:52

patrickloeber force-pushed the add-assemblyai-audio-transcript-loader branch from 712cbdb to d917776 August 23, 2023 21:19

baskaryan requested a review from eyurtsev August 23, 2023 21:28

baskaryan approved these changes Aug 23, 2023


baskaryan added the lgtm label Aug 23, 2023

vercel bot deployed to Preview – langchain August 23, 2023 21:29

eyurtsev approved these changes Aug 24, 2023


libs/langchain/langchain/document_loaders/assemblyai.py

Update libs/langchain/langchain/document_loaders/assemblyai.py

3dbce3b

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>

baskaryan merged commit 5990651 into langchain-ai:master Aug 24, 2023
27 checks passed

patrickloeber mentioned this pull request Aug 24, 2023

Fix docs for AssemblyAIAudioTranscriptLoader (shorter import path) #9687

Merged
