Some videos don't have subtitles: app should get the transcript with Whisper, for example #2
Comments
We need to handle videos without subtitles.
The situation is as follows: there is currently no known way to load a video into Gemini directly from its YouTube URL; the video has to be saved to your local drive first. My initial experience with this approach was very positive.
I have been using this API to generate Python scripts, as Markdown, from tutorial videos, particularly from YouTubers who have the unfortunate habit of not providing GitHub links or of reserving them for members-only areas. My need was more for Gemini's vision capabilities. But it was perfectly capable of generating
Regarding videos without subtitles, Whisper or a similar speech-to-text solution could indeed be a viable option. However, it would add complexity to the application and could run into resource constraints. In other words.
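To make the fallback idea concrete, here is a minimal sketch of "use subtitles when they exist, otherwise transcribe the audio". The function name `get_transcript` and both callables are hypothetical and not part of this project; in practice `fetch_subtitles` would wrap whatever subtitle lookup the app already uses, and `transcribe_audio` would wrap Whisper, Gemini, or another speech-to-text backend.

```python
from typing import Callable, Optional, Tuple


def get_transcript(
    video_id: str,
    fetch_subtitles: Callable[[str], Optional[str]],
    transcribe_audio: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (transcript, source), preferring existing subtitles.

    Both callables are placeholders (assumptions for illustration):
    fetch_subtitles returns the subtitle text, or None when the video
    has none; transcribe_audio runs the slower speech-to-text fallback.
    """
    try:
        subtitles = fetch_subtitles(video_id)
    except Exception:
        # Treat any lookup failure as "no subtitles available".
        subtitles = None

    if subtitles and subtitles.strip():
        return subtitles, "subtitles"

    # Only pay the cost of transcription when subtitles are missing.
    return transcribe_audio(video_id), "speech-to-text"


if __name__ == "__main__":
    # Stand-in callables so the sketch runs without any network access.
    text, source = get_transcript(
        "some-video-id",
        fetch_subtitles=lambda vid: None,
        transcribe_audio=lambda vid: "transcribed text",
    )
    print(source, text)
```

Keeping the two backends behind plain callables means the expensive transcription path only runs when needed, and the backend can be swapped without touching the rest of the app.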
Here is a script I developed for this task:

import os
import sys
import time

import google.generativeai as genai
from dotenv import load_dotenv

load_dotenv(override=True)

# Initialize the Gemini API
api_key = os.getenv("GOOGLE_API_KEY")
genai.configure(api_key=api_key)

# Check if a file path was passed
if len(sys.argv) < 2:
    print("Please provide the path to the audio file.")
    sys.exit(1)
audio_file_path = sys.argv[1]

# Upload the audio file
audio_file = genai.upload_file(audio_file_path)

# Wait for the audio file to be processed, polling every 10 seconds
while audio_file.state.name == "PROCESSING":
    print(".", end="", flush=True)
    time.sleep(10)
    audio_file = genai.get_file(audio_file.name)

# Tell Gemini to transcribe the audio file
prompt = "Listen carefully to the following audio file. Transcribe the spoken words accurately. Adjust the spelling accordingly. Remove filler words such as 'um' or similar. There should be no pauses in the text. The text should be easy to read and contain no linguistic or grammatical errors. In any case, generate complete, meaningful and comprehensible sentences from the content."
model = genai.GenerativeModel("models/gemini-1.5-pro-latest")
response = model.generate_content(
    [prompt, audio_file], request_options={"timeout": 600}
)

# Extract the filename (without extension) from the file path
filename, _ = os.path.splitext(os.path.basename(audio_file_path))

# Save the result to a text file with the same name as the audio file
output_file = f"{filename}.txt"
with open(output_file, "w", encoding="utf-8") as f:
    f.write(response.text)
print(f"The transcription has been saved in the file '{output_file}'.")

# Delete the uploaded files from Gemini
genai.delete_file(audio_file.name)

transcribe_with_gemini.py Documentation (sections: Overview, Requirements, How to Use, Notes, Conclusion)
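The polling loop in the script keeps waiting for as long as the file reports `PROCESSING`, and it does not distinguish a failed upload from a finished one. A small helper like the one below could bound the wait. This is only a sketch: `wait_until_processed` is a hypothetical name, the state strings are passed in as plain text, and treating `"FAILED"` as the failure state is an assumption about what the API reports.

```python
import time
from typing import Callable


def wait_until_processed(
    get_state: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it stops returning "PROCESSING".

    get_state is a hypothetical callable that returns the current state
    name of the uploaded file. Raises TimeoutError if the deadline
    passes, and RuntimeError if the final state is "FAILED" (assumed).
    """
    deadline = time.monotonic() + timeout_seconds
    state = get_state()
    while state == "PROCESSING":
        if time.monotonic() >= deadline:
            raise TimeoutError("file was still processing at the deadline")
        sleep(poll_seconds)
        state = get_state()
    if state == "FAILED":
        raise RuntimeError("file processing failed")
    return state


if __name__ == "__main__":
    # Simulated states so the sketch runs without calling any API.
    states = iter(["PROCESSING", "PROCESSING", "ACTIVE"])
    final = wait_until_processed(lambda: next(states), sleep=lambda s: None)
    print(final)
```

With the script above, the callable could be something like `lambda: genai.get_file(audio_file.name).state.name`, which reuses the same calls the script already makes.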