Commit
Add new document_loader: AssemblyAIAudioTranscriptLoader
- Add new class `AssemblyAIAudioTranscriptLoader`
- Add optional dependency `assemblyai`
- Add unit tests (using a Mock client)
- Add docs notebook

The `AssemblyAIAudioTranscriptLoader` transcribes audio files with the AssemblyAI API and loads the transcribed text into documents.
1 parent fa05e18 commit d917776
Showing 6 changed files with 490 additions and 2 deletions.
224 changes: 224 additions & 0 deletions
docs/extras/integrations/document_loaders/assemblyai.ipynb
@@ -0,0 +1,224 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# AssemblyAI Audio Transcripts\n",
    "\n",
    "The `AssemblyAIAudioTranscriptLoader` lets you transcribe audio files with the [AssemblyAI API](https://www.assemblyai.com) and loads the transcribed text into documents.\n",
    "\n",
    "To use it, you should have the `assemblyai` Python package installed, and the\n",
    "environment variable `ASSEMBLYAI_API_KEY` set with your API key. Alternatively, the API key can also be passed as an argument.\n",
    "\n",
    "More info about AssemblyAI:\n",
    "\n",
    "- [Website](https://www.assemblyai.com/)\n",
    "- [Get a Free API key](https://www.assemblyai.com/dashboard/signup)\n",
    "- [AssemblyAI API Docs](https://www.assemblyai.com/docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Installation\n",
    "\n",
    "First, you need to install the `assemblyai` Python package.\n",
    "\n",
    "You can find more info about it inside the [assemblyai-python-sdk GitHub repo](https://github.com/AssemblyAI/assemblyai-python-sdk)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#!pip install assemblyai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example\n",
    "\n",
    "The `AssemblyAIAudioTranscriptLoader` needs at least the `file_path` argument. Audio files can be specified as a URL or a local file path."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders.assemblyai import AssemblyAIAudioTranscriptLoader\n",
    "\n",
    "audio_file = \"https://storage.googleapis.com/aai-docs-samples/nbc.mp3\"\n",
    "# or a local file path: audio_file = \"./nbc.mp3\"\n",
    "\n",
    "loader = AssemblyAIAudioTranscriptLoader(file_path=audio_file)\n",
    "\n",
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: Calling `loader.load()` blocks until the transcription is finished."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The transcribed text is available in the `page_content`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "docs[0].page_content"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "\"Load time, a new president and new congressional makeup. Same old ...\"\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `metadata` contains the full JSON response with additional meta information:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "docs[0].metadata"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "{'language_code': <LanguageCode.en_us: 'en_us'>,\n",
    " 'audio_url': 'https://storage.googleapis.com/aai-docs-samples/nbc.mp3',\n",
    " 'punctuate': True,\n",
    " 'format_text': True,\n",
    " ...\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Transcript Formats\n",
    "\n",
    "You can specify the `transcript_format` argument for different formats.\n",
    "\n",
    "Depending on the format, one or more documents are returned. These are the different `TranscriptFormat` options:\n",
    "\n",
    "- `TEXT`: One document with the transcription text\n",
    "- `SENTENCES`: Multiple documents, with the transcription split by sentence\n",
    "- `PARAGRAPHS`: Multiple documents, with the transcription split by paragraph\n",
    "- `SUBTITLES_SRT`: One document with the transcript exported in SRT subtitles format\n",
    "- `SUBTITLES_VTT`: One document with the transcript exported in VTT subtitles format"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders.assemblyai import (\n",
    "    AssemblyAIAudioTranscriptLoader,\n",
    "    TranscriptFormat,\n",
    ")\n",
    "\n",
    "loader = AssemblyAIAudioTranscriptLoader(\n",
    "    file_path=\"./your_file.mp3\",\n",
    "    transcript_format=TranscriptFormat.SENTENCES,\n",
    ")\n",
    "\n",
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Transcription Config\n",
    "\n",
    "You can also specify the `config` argument to use different audio intelligence models.\n",
    "\n",
    "Visit the [AssemblyAI API Documentation](https://www.assemblyai.com/docs) to get an overview of all available models!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import assemblyai as aai\n",
    "\n",
    "config = aai.TranscriptionConfig(speaker_labels=True,\n",
    "                                 auto_chapters=True,\n",
    "                                 entity_detection=True\n",
    ")\n",
    "\n",
    "loader = AssemblyAIAudioTranscriptLoader(\n",
    "    file_path=\"./your_file.mp3\",\n",
    "    config=config\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pass the API Key as an Argument\n",
    "\n",
    "Besides setting the API key in the `ASSEMBLYAI_API_KEY` environment variable, you can also pass it as an argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = AssemblyAIAudioTranscriptLoader(\n",
    "    file_path=\"./your_file.mp3\",\n",
    "    api_key=\"YOUR_KEY\"\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
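The notebook describes two ways to supply the key: the `ASSEMBLYAI_API_KEY` environment variable or the `api_key` argument. A minimal sketch of a preflight check that resolves the key before any loader is built might look like the following. The helper `resolve_api_key` and its `env` parameter are hypothetical names introduced here for illustration; they are not part of langchain or the `assemblyai` SDK.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the explicit key if given, else ASSEMBLYAI_API_KEY from env.

    Hypothetical helper for illustration only; not part of langchain
    or the assemblyai SDK.
    """
    if api_key:
        # An explicitly passed key takes precedence over the environment.
        return api_key
    key = env.get("ASSEMBLYAI_API_KEY", "")
    if not key:
        raise ValueError(
            "No AssemblyAI API key found. Pass api_key=... or set "
            "the ASSEMBLYAI_API_KEY environment variable."
        )
    return key


# Demonstration with an explicit mapping instead of the real environment.
from_env = resolve_api_key(env={"ASSEMBLYAI_API_KEY": "key-from-env"})
explicit = resolve_api_key("explicit-key", env={"ASSEMBLYAI_API_KEY": "key-from-env"})
```

Under these assumptions, the result could be handed to the loader as `AssemblyAIAudioTranscriptLoader(file_path=..., api_key=resolve_api_key())`, so that a missing key is reported before a transcription is requested.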
110 changes: 110 additions & 0 deletions
libs/langchain/langchain/document_loaders/assemblyai.py
@@ -0,0 +1,110 @@
from __future__ import annotations

from enum import Enum
from typing import TYPE_CHECKING, List, Optional

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader

if TYPE_CHECKING:
    import assemblyai


class TranscriptFormat(Enum):
    """Transcript format to use for the document loader."""

    TEXT = "text"
    """One document with the transcription text"""
    SENTENCES = "sentences"
    """Multiple documents, with the transcription split by sentence"""
    PARAGRAPHS = "paragraphs"
    """Multiple documents, with the transcription split by paragraph"""
    SUBTITLES_SRT = "subtitles_srt"
    """One document with the transcript exported in SRT subtitles format"""
    SUBTITLES_VTT = "subtitles_vtt"
    """One document with the transcript exported in VTT subtitles format"""


class AssemblyAIAudioTranscriptLoader(BaseLoader):
    """
    Loader for AssemblyAI audio transcripts.

    It uses the AssemblyAI API to transcribe audio files
    and loads the transcribed text into one or more Documents,
    depending on the specified format.

    To use, you should have the ``assemblyai`` python package installed, and the
    environment variable ``ASSEMBLYAI_API_KEY`` set with your API key.
    Alternatively, the API key can also be passed as an argument.

    Audio files can be specified via a URL or a local file path.
    """

    def __init__(
        self,
        file_path: str,
        transcript_format: TranscriptFormat = TranscriptFormat.TEXT,
        config: Optional[assemblyai.TranscriptionConfig] = None,
        api_key: Optional[str] = None,
    ):
        """
        Initializes the AssemblyAI AudioTranscriptLoader.

        Args:
            file_path: A URL or a local file path.
            transcript_format: Transcript format to use.
                See class ``TranscriptFormat`` for more info.
            config: Transcription options and features. If ``None`` is given,
                the Transcriber's default configuration will be used.
            api_key: AssemblyAI API key.
        """
        try:
            import assemblyai
        except ImportError:
            raise ImportError(
                "Could not import assemblyai python package. "
                "Please install it with `pip install assemblyai`."
            )
        if api_key is not None:
            assemblyai.settings.api_key = api_key

        self.file_path = file_path
        self.transcript_format = transcript_format
        self.transcriber = assemblyai.Transcriber(config=config)

    def load(self) -> List[Document]:
        """Transcribes the audio file and loads the transcript into documents.

        It uses the AssemblyAI API to transcribe the audio file and blocks until
        the transcription is finished.
        """
        transcript = self.transcriber.transcribe(self.file_path)
        # This will raise a ValueError if no API key is set.

        if transcript.error:
            raise ValueError(f"Could not transcribe file: {transcript.error}")

        if self.transcript_format == TranscriptFormat.TEXT:
            return [
                Document(
                    page_content=transcript.text, metadata=transcript.json_response
                )
            ]
        elif self.transcript_format == TranscriptFormat.SENTENCES:
            sentences = transcript.get_sentences()
            return [
                Document(page_content=s.text, metadata=s.dict(exclude={"text"}))
                for s in sentences
            ]
        elif self.transcript_format == TranscriptFormat.PARAGRAPHS:
            paragraphs = transcript.get_paragraphs()
            return [
                Document(page_content=p.text, metadata=p.dict(exclude={"text"}))
                for p in paragraphs
            ]
        elif self.transcript_format == TranscriptFormat.SUBTITLES_SRT:
            return [Document(page_content=transcript.export_subtitles_srt())]
        elif self.transcript_format == TranscriptFormat.SUBTITLES_VTT:
            return [Document(page_content=transcript.export_subtitles_vtt())]
        else:
            raise ValueError("Unknown transcript format.")
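The commit message mentions unit tests that use a mock client, but those tests are not shown here. As a rough, self-contained sketch of how the format dispatch in `load()` could be exercised without network access, the following reimplements only the `TEXT` and `SENTENCES` branches against a `MagicMock` transcript. The `Document` dataclass is a stand-in for langchain's class, and `to_documents`, `make_fake_transcript`, and the `start`/`end` metadata values are illustrative assumptions, not the commit's actual test code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List
from unittest.mock import MagicMock


class TranscriptFormat(Enum):
    # Reduced copy of the enum above; only two members are needed here.
    TEXT = "text"
    SENTENCES = "sentences"


@dataclass
class Document:
    # Stand-in for langchain's Document, limited to the two fields used here.
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_documents(transcript: Any, fmt: TranscriptFormat) -> List[Document]:
    """Mirror the TEXT and SENTENCES branches of the loader's load()."""
    if transcript.error:
        raise ValueError(f"Could not transcribe file: {transcript.error}")
    if fmt == TranscriptFormat.TEXT:
        return [Document(transcript.text, transcript.json_response)]
    if fmt == TranscriptFormat.SENTENCES:
        return [
            Document(s.text, s.dict(exclude={"text"}))
            for s in transcript.get_sentences()
        ]
    raise ValueError("Unknown transcript format.")


def make_fake_transcript() -> MagicMock:
    """Build a mock transcript with two sentences (all values are made up)."""
    sentences = []
    for text in ("First sentence.", "Second sentence."):
        sentence = MagicMock()
        sentence.text = text
        sentence.dict.return_value = {"start": 0, "end": 1000}
        sentences.append(sentence)
    transcript = MagicMock()
    transcript.error = None
    transcript.text = "First sentence. Second sentence."
    transcript.json_response = {"audio_url": "https://example.com/audio.mp3"}
    transcript.get_sentences.return_value = sentences
    return transcript


text_docs = to_documents(make_fake_transcript(), TranscriptFormat.TEXT)
sentence_docs = to_documents(make_fake_transcript(), TranscriptFormat.SENTENCES)
```

With this fake transcript, the `TEXT` format yields a single document and `SENTENCES` yields one document per sentence, which matches the one-versus-many behavior described for `TranscriptFormat`.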