This repository was archived by the owner on Mar 1, 2024. It is now read-only.

@patrickloeber
Contributor

This PR adds a new data reader, AssemblyAIAudioTranscriptReader, that transcribes audio files with the AssemblyAI API and loads the transcribed text into documents.

  • add new reader with class AssemblyAIAudioTranscriptReader
  • add README
  • add unit tests. The tests use a MockerFixture and don't make API calls
  • Add new dev dependency pytest-mock

Description

At its simplest, you can use the loader to get a transcript back from an audio file like this:

from llama_hub.assemblyai.base import AssemblyAIAudioTranscriptReader

audio_file = "https://storage.googleapis.com/aai-docs-samples/nbc.mp3"
# or a local file path: audio_file = "./nbc.mp3"

reader = AssemblyAIAudioTranscriptReader(file_path=audio_file)

docs = reader.load_data()

To use it, install the assemblyai Python package and set the environment variable ASSEMBLYAI_API_KEY to your API key. Alternatively, the API key can be passed as an argument.
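
The key lookup described above could work roughly as follows. This is a minimal sketch, not the reader's actual code: resolve_api_key is a hypothetical helper, and the precedence (an explicit argument wins over the environment variable) is an assumption.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: pick the API key from the argument or the environment."""
    # Assumption: an explicitly passed key takes precedence over the env variable.
    key = api_key or os.environ.get("ASSEMBLYAI_API_KEY")
    if not key:
        raise ValueError(
            "No API key found: pass one as an argument or set ASSEMBLYAI_API_KEY."
        )
    return key
```

Failing early with a clear error, rather than at the first request, makes a missing key easier to diagnose.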

LangChain already has this integration here: https://python.langchain.com/docs/integrations/document_loaders/assemblyai

Twitter handles in case you want to connect 🙇
@AssemblyAI and @patloeber

Type of Change

Please delete options that are not relevant.

  • New Loader/Tool
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Added new unit tests. The tests use a MockerFixture and don't make API calls. For this, the PR adds a new dev dependency, pytest-mock.
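
The general pattern of testing a reader without network access can be sketched as below. This is an illustration only and not the PR's test code: it uses the standard library's unittest.mock in place of pytest-mock's MockerFixture, and StubTranscriber and SketchReader are hypothetical stand-ins rather than the AssemblyAI client or the real reader.

```python
from unittest import mock


class StubTranscriber:
    """Hypothetical stand-in for an API client; the real call would hit the network."""

    def transcribe(self, file_path: str) -> str:
        raise RuntimeError("network call attempted in a unit test")


class SketchReader:
    """Hypothetical, simplified reader used only to illustrate the test pattern."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path
        self.transcriber = StubTranscriber()

    def load_data(self) -> list:
        text = self.transcriber.transcribe(self.file_path)
        return [{"text": text, "source": self.file_path}]


def test_load_data_makes_no_api_call() -> None:
    # Replace the network-bound method with a mock that returns canned text.
    with mock.patch.object(
        StubTranscriber, "transcribe", return_value="Test transcript text"
    ) as fake_transcribe:
        docs = SketchReader("./nbc.mp3").load_data()

    # The mock was called once with the file path; no real request was sent.
    fake_transcribe.assert_called_once_with("./nbc.mp3")
    assert docs == [{"text": "Test transcript text", "source": "./nbc.mp3"}]
```

With pytest-mock, the same patch would typically be set up through the mocker fixture, which undoes it automatically when the test ends.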

Suggested Checklist:

  • I have added a library.json file if a new loader/tool was added
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods

Collaborator

@EmanuelCampos EmanuelCampos left a comment

Nice one!

nit: the lint didn't pass, can you make sure to run make format; make lint?

@patrickloeber
Contributor Author

Yes, sorry about that. I pushed a new change with the formatting updates after running make format; make lint.

Collaborator

@jerryjliu jerryjliu left a comment

this is awesome! thanks @patrickloeber

@jerryjliu jerryjliu merged commit 0eacabf into run-llama:main Oct 5, 2023