This is a FastAPI-based web application that summarizes YouTube videos by extracting their transcripts, generating detailed summaries in various styles, performing sentiment analysis, and optionally converting summaries to speech. It combines the YouTube Data API, the Groq API (running Meta's Llama 3.3 70B model), and gTTS to produce these outputs.
- Search for YouTube videos by query and retrieve the top result.
- Extract transcripts from YouTube videos using `youtube_transcript_api`.
- Generate detailed summaries (500+ words) in specified styles (e.g., concise, detailed, casual) using the `llama-3.3-70b-versatile` model (Meta's Llama 3.3 70B) via the `groq` API.
- Perform sentiment analysis (250+ words) on the transcript, identifying overall sentiment, specific emotions, and key phrases.
- Optionally convert summaries to audio using Google Text-to-Speech (`gTTS`).
- Serve a simple web interface for user interaction and provide API endpoints for programmatic access.
- Web Interface: Users access the app via a browser at `GET /`, which serves an HTML form (`index.html`) for entering a YouTube query, summary style, and text-to-speech option.
- API Request: The form submits a `POST /api/summarize` request with the query, style, and TTS preference.
- Video Search: The app searches YouTube using `google-api-python-client` to find the video URL and thumbnail.
- Transcript Extraction: The transcript is fetched using `youtube_transcript_api`, with optional proxy support for more reliable fetching.
- Processing: The transcript is sent to the `llama-3.3-70b-versatile` model through the Groq API, which generates the summary and sentiment analysis by calling predefined tools.
- Audio (Optional): If TTS is enabled, the summary is converted to an MP3 file (`summary.mp3`) and played.
- Response: The API returns a JSON object with the summary, video URL, sentiment analysis, and thumbnail URL.
- Audio Playback: Users can access the generated audio via `GET /play_audio`.
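For programmatic access, a client only has to send the same request the form does. The sketch below uses only the Python standard library. The JSON field names (`query`, `style`, `tts`) and the assumption that the endpoint accepts a JSON body rather than form data are guesses, as are the helper names; check them against the request handling in `main.py`.

```python
import json
import urllib.request


def build_summarize_request(base_url: str, query: str, style: str = "detailed",
                            tts: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/summarize.

    Hypothetical payload: the field names must match what main.py expects.
    """
    payload = {"query": query, "style": style, "tts": tts}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/summarize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_via_api(base_url: str, query: str, style: str = "detailed",
                      tts: bool = False) -> dict:
    """Send the request and decode the JSON response (summary, video URL, sentiment, thumbnail)."""
    request = build_summarize_request(base_url, query, style, tts)
    # Generous timeout: transcript extraction plus two long LLM outputs can take a while.
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires the server to be running):
#   result = summarize_via_api("http://127.0.0.1:8000", "Python tutorial", style="concise")
```

Because `build_summarize_request` only constructs the request, it can be inspected without a running server; `summarize_via_api` performs the actual call.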
- Python 3.8+
- API keys for:
  - YouTube Data API (`YOUTUBE_API_KEY`)
  - Groq API (`GROQ_API_KEY`)
  - Webshare proxy (optional; `PROXY_USERNAME` and `PROXY_PASSWORD`)
- FFmpeg (for audio playback on some systems)
- Clone the repository:
  ```bash
  git clone <repository-url>
  cd <repository-name>
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Create a `.env` file in the root directory with your API keys:
  ```
  YOUTUBE_API_KEY=your_youtube_api_key
  GROQ_API_KEY=your_groq_api_key
  PROXY_USERNAME=your_proxy_username
  PROXY_PASSWORD=your_proxy_password
  ```
- Run the FastAPI server:
  ```bash
  uvicorn main:app --reload
  ```
- Access the app at http://127.0.0.1:8000.
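The keys in `.env` are read through python-dotenv. A minimal sketch of that step, assuming the server is started from the project root; the `load_settings` helper and its fail-fast check are illustrative and not taken from the project's code.

```python
import os

try:
    # python-dotenv is one of the project's dependencies; if it is not installed,
    # only variables already present in the environment are used.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None

REQUIRED_KEYS = ("YOUTUBE_API_KEY", "GROQ_API_KEY")


def load_settings() -> dict:
    """Load API keys from ./.env and the environment; fail fast if a required key is missing."""
    if load_dotenv is not None:
        # Assumes the server runs from the project root, where .env lives.
        # By default load_dotenv does not override variables that are already set.
        load_dotenv(os.path.join(os.getcwd(), ".env"))
    missing = [key for key in REQUIRED_KEYS if not os.getenv(key)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    settings = {key: os.environ[key] for key in REQUIRED_KEYS}
    # The Webshare proxy is optional: use it only when both credentials are set.
    username = os.getenv("PROXY_USERNAME")
    password = os.getenv("PROXY_PASSWORD")
    settings["proxy"] = (username, password) if username and password else None
    return settings
```

Failing at startup with a clear message is usually easier to debug than an authentication error from the YouTube or Groq API in the middle of a request.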
- FastAPI: Web framework for building the API and serving static files.
- google-api-python-client: Interacts with the YouTube Data API to search for videos.
- youtube_transcript_api: Extracts transcripts from YouTube videos, with proxy support via `WebshareProxyConfig`.
- gTTS: Converts text summaries to speech.
- groq: Client for the Groq API, which runs the model used for summarization and sentiment analysis.
- python-dotenv: Loads environment variables from a `.env` file.
- json (standard library): Parses tool-call arguments from the model's responses.
`main.py` (API endpoints):
- `get_form()`: Serves the HTML form at the root endpoint (`GET /`).
- `summarize()`: Handles `POST /api/summarize`, processes the query, and returns the summary, sentiment, and metadata.
- `play_audio()`: Serves the generated `summary.mp3` file (`GET /play_audio`).
`summarizer.py` (core logic):
- `search_youtube_video(query)`: Searches YouTube for a video matching the query and returns its URL and thumbnail.
- `extract_transcript(video_url)`: Extracts the transcript from a YouTube video, truncating it to 16,000 characters if necessary.
- `summarize(transcript, style)`: Generates a detailed summary of the transcript in the specified style using the Groq-hosted model.
- `analyze_sentiment(transcript)`: Performs a detailed sentiment analysis of the transcript.
- `process_with_tools(transcript, style)`: Orchestrates summarization and sentiment analysis through the model's tool-calling feature, with fallbacks if the tool calls fail.
- `text_to_speech(summary, output_file)`: Converts the summary to an MP3 file using gTTS and plays it.
- `summarize_youtube_video(query, summary_style, tts_enabled)`: Main function that ties everything together and returns a dictionary with the results.
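`search_youtube_video` relies on the YouTube Data API's `search.list` method. The sketch below is an assumption about how such a function could look, not the project's code: unlike the project's one-argument signature, it takes an already-built client (for example `googleapiclient.discovery.build("youtube", "v3", developerKey=YOUTUBE_API_KEY)`) as a parameter so the response parsing can be exercised without network access.

```python
from typing import Any, Optional, Tuple


def search_youtube_video(youtube: Any, query: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (video_url, thumbnail_url) for the top search result, or (None, None).

    `youtube` is a YouTube Data API v3 client built with googleapiclient.
    """
    response = (
        youtube.search()
        .list(q=query, part="snippet", type="video", maxResults=1)  # only the top result
        .execute()
    )
    items = response.get("items", [])
    if not items:
        return None, None
    top = items[0]
    video_id = top["id"]["videoId"]  # present because type="video" restricts results to videos
    thumbnails = top["snippet"].get("thumbnails", {})
    # Prefer the largest of the common thumbnail sizes, falling back to smaller ones.
    thumbnail = next(
        (thumbnails[size]["url"] for size in ("high", "medium", "default") if size in thumbnails),
        None,
    )
    return f"https://www.youtube.com/watch?v={video_id}", thumbnail
```

Passing the client in also makes it easy to reuse one client across requests instead of rebuilding it for every search.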
The app defines two tools for the model:
- `summarize`: Takes a transcript and style, producing a 500+ word summary.
- `analyze_sentiment`: Takes a transcript and returns a 250+ word sentiment analysis.

With `tool_choice="auto"`, the `llama-3.3-70b-versatile` model decides for itself whether to call these tools, so the calls are not guaranteed; `process_with_tools` has fallbacks for the case where tool calls fail.
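Groq's chat completions API accepts tool definitions in the OpenAI-compatible JSON Schema format and returns each tool call with its arguments as a JSON string. The sketch below shows one way the two tools and a fallback could fit together; the schema descriptions, the `run_tool_calls` and `process_with_fallback` helpers, and the choice of calling the handlers directly as the fallback are all assumptions rather than the project's code.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical tool schemas in the OpenAI-compatible format that Groq accepts.
# The real descriptions and parameter names live in summarizer.py.
TOOLS: List[Dict[str, Any]] = [
    {
        "type": "function",
        "function": {
            "name": "summarize",
            "description": "Write a 500+ word summary of the transcript in the requested style.",
            "parameters": {
                "type": "object",
                "properties": {
                    "transcript": {"type": "string"},
                    "style": {"type": "string", "description": "e.g. concise, detailed, casual"},
                },
                "required": ["transcript", "style"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "analyze_sentiment",
            "description": "Write a 250+ word sentiment analysis of the transcript.",
            "parameters": {
                "type": "object",
                "properties": {"transcript": {"type": "string"}},
                "required": ["transcript"],
            },
        },
    },
]

Handler = Callable[..., str]


def run_tool_calls(tool_calls: Any, handlers: Dict[str, Handler]) -> Dict[str, str]:
    """Run each tool call the model requested and collect the results by tool name.

    In the app, `tool_calls` would come from a request such as:
        response = Groq().chat.completions.create(
            model="llama-3.3-70b-versatile", messages=messages,
            tools=TOOLS, tool_choice="auto")
        tool_calls = response.choices[0].message.tool_calls
    """
    results: Dict[str, str] = {}
    for call in tool_calls or []:  # tool_calls is None when the model answered in plain text
        name = call.function.name
        if name not in handlers:
            continue  # ignore tools that were not defined
        try:
            args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
            results[name] = handlers[name](**args)
        except (json.JSONDecodeError, TypeError):
            continue  # malformed or mismatched arguments: leave it to the fallback
    return results


def process_with_fallback(tool_calls: Any, handlers: Dict[str, Handler],
                          transcript: str, style: str) -> Dict[str, str]:
    """Use the model's tool calls where they succeeded; call the handlers directly otherwise."""
    results = run_tool_calls(tool_calls, handlers)
    if "summarize" not in results:
        results["summarize"] = handlers["summarize"](transcript=transcript, style=style)
    if "analyze_sentiment" not in results:
        results["analyze_sentiment"] = handlers["analyze_sentiment"](transcript=transcript)
    return results
```

Keeping the handlers in a name-to-function dictionary means an unknown or misspelled tool name from the model is simply skipped instead of raising.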
```
├── main.py            # FastAPI app entry point
├── summarizer.py      # Core summarization and processing logic
├── templates/         # HTML templates
│   └── index.html     # Web form
├── static/            # Static files (e.g., CSS, JS)
├── summary.mp3        # Generated audio file (temporary)
├── .env               # Environment variables (not tracked)
└── requirements.txt   # Python dependencies
```
- Visit http://127.0.0.1:8000.
- Enter a query (e.g., "Python tutorial"), select a style (e.g., "detailed"), and choose whether to enable TTS.
- Submit the form to receive a JSON response with the summary, sentiment, video URL, and thumbnail URL.
- If TTS is enabled, the summary will be played as audio.
- The app limits transcripts to 16,000 characters to avoid API token limits.
- Proxy settings are optional but recommended for reliable transcript extraction.
- Summaries and sentiment analyses are designed to be verbose and detailed for maximum insight.
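The 16,000-character cap is simple to apply once the transcript segments are joined into one string. A minimal sketch, assuming the segments come from `youtube_transcript_api` (dicts with a `"text"` key in older releases, snippet objects with a `.text` attribute in newer ones); `video_id_from_url` and `join_transcript` are hypothetical helper names, and the `WebshareProxyConfig` setup is omitted.

```python
from typing import Any, Iterable, List, Optional
from urllib.parse import parse_qs, urlparse

MAX_TRANSCRIPT_CHARS = 16_000  # limit stated in this README


def video_id_from_url(video_url: str) -> Optional[str]:
    """Extract the video ID from youtube.com/watch?v=... or youtu.be/... URLs."""
    parsed = urlparse(video_url)
    host = parsed.netloc.lower()
    if host.endswith("youtu.be"):
        return parsed.path.lstrip("/") or None
    if "youtube.com" in host:
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def join_transcript(segments: Iterable[Any], limit: int = MAX_TRANSCRIPT_CHARS) -> str:
    """Join transcript segments into one string and cut it to at most `limit` characters."""
    parts: List[str] = []
    for seg in segments:
        # Older youtube_transcript_api releases yield dicts; newer ones yield objects with .text.
        text = seg["text"] if isinstance(seg, dict) else seg.text
        parts.append(text.strip())
    transcript = " ".join(p for p in parts if p)
    return transcript[:limit]
```

A plain slice can cut the last word in half; trimming back to the previous space is a small refinement if that matters for the prompt.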