## 📚 **Usage Guide & Tips**

### ✅ **Quick Start**
1. **Set your source** in the first cell (YouTube, Google Drive, Dropbox, or Local)
2. **Choose your API provider** and model in the second cell
3. **Run the setup cell** - it will guide you through API key configuration
4. **Adjust settings** like prompt type and chunk size in the fourth cell
5. **Hit Run** in the final cell and watch the magic happen!

### 🔑 **API Key Setup**
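- **Colab Secrets** (Recommended): Open 🔒 Secrets in the sidebar and add your API key under the name the API configuration cell prints as "Expected env key" (for example `perplexity`, `openai`, or `groq`; Google uses `generativelanguage`)
- **Environment File**: Create a `.env` file with one `name=key` line per provider, such as `perplexity=pplx-your-key-here`
- **Cloud Whisper**: Transcribing without captions also needs a `groq` key, whichever provider you chose for summarization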
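
The setup cell checks Colab secrets first and then environment variables. That lookup can be sketched roughly as below. This is an illustration rather than the notebook's exact code: `resolve_api_key` and its `secret_lookup` parameter are hypothetical names, and in Colab you would pass `google.colab.userdata.get` as the lookup.

```python
import os


def resolve_api_key(name, secret_lookup=None):
    """Return the key stored under `name`, or None if it is not configured.

    `secret_lookup` is a hypothetical hook: any callable that maps a secret
    name to its value (in Colab, `google.colab.userdata.get`).
    """
    if secret_lookup is not None:
        try:
            value = secret_lookup(name)
            if value:
                return value
        except Exception:
            # Secret missing or notebook access not granted: fall through.
            pass
    # Fall back to the process environment (populated from `.env` by load_dotenv()).
    return os.getenv(name) or None
```

Outside Colab, `resolve_api_key("perplexity")` reads only the environment, so the same helper works in a local Jupyter session.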

### 💡 **Tips for Best Results**
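- **YouTube videos**: Leave captions enabled; the notebook then skips audio transcription, which is faster and cheaper
- **Long videos**: Use smaller chunk sizes (8,000-12,000 characters) for better context
- **Multiple languages**: Set `language` to the spoken language of the video (for example `en` or `it`), or to `auto`; it is only used when the audio is transcribed
- **Complex content**: Try "Research" or "Distill Wisdom" prompt types
- **Performance**: Increase parallel calls (20-30) for faster processing; lower the value if your provider returns rate-limit errors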
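
To see how chunk size and parallel calls interact, here is a rough back-of-the-envelope estimate. It assumes fixed-size character chunks with no overlap, which may not match how the summarizer actually splits text; `estimate_workload` and the transcript length are hypothetical.

```python
import math


def estimate_workload(transcript_chars, chunk_size, parallel_calls):
    """Estimate the chunk count and the number of request rounds.

    Assumes fixed-size character chunks with no overlap, which may differ
    from how the summarizer actually splits text.
    """
    if chunk_size <= 0 or parallel_calls <= 0:
        raise ValueError("chunk_size and parallel_calls must be positive")
    chunks = math.ceil(transcript_chars / chunk_size)
    rounds = math.ceil(chunks / parallel_calls)
    return chunks, rounds


# A 90,000-character transcript (hypothetical length)
print(estimate_workload(90_000, 12_000, 20))  # (8, 1)
print(estimate_workload(90_000, 4_000, 5))    # (23, 5)
```

Under these assumptions, raising parallel calls beyond the number of chunks brings no further speed-up.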

### 🎯 **Prompt Types Explained**
- **Summarization**: General overview and key points
- **Questions and answers**: Structured Q&A format
- **Distill Wisdom**: Key insights, quotes, and takeaways
- **Research**: Academic-style analysis and findings
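- **Fact Checker**: Verification and accuracy analysis
- **Other options**: The selector also offers "Only grammar correction with highlights", "DNA Extractor", and "Essay Writing in Paul Graham Style"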

### 🚀 **Advanced Features**
- **Verbose Output**: Enable to see detailed progress with spinners and progress bars
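- **Custom Models**: Uncheck `use_default_model` and set `custom_model` to use a specific model for a task
- **Batch Processing**: Process multiple videos by changing `Source` and re-running the source cell and the final cell
- **Auto Download**: Enable `auto_download` to download the summary file automatically when the run finishes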
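
As an alternative to re-running cells by hand, a batch loop could look like the sketch below. `summarize_many` is a hypothetical helper; the `source_url_or_path` key comes from the notebook's config dict. It assumes the callable you pass in (for example `main` from `summarizer.core`) takes a config dict and returns the summary text, as the final cell treats it. Note that the notebook itself runs `main` in a separate thread to avoid event loop conflicts, so a direct call may need the same treatment.

```python
def summarize_many(sources, base_config, summarize):
    """Run `summarize` once per source and collect the results by source.

    `summarize` is any callable that accepts a config dict and returns the
    summary text; failures are recorded instead of aborting the batch.
    """
    results = {}
    for source in sources:
        # Copy the shared settings and swap in the current source.
        config = {**base_config, "source_url_or_path": source}
        try:
            results[source] = summarize(config)
        except Exception as exc:
            results[source] = f"ERROR: {exc}"
    return results
```

Recording errors per source keeps one unavailable video from discarding the summaries already produced.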


https://github.com/martinopiaggi/summarize

In [None]:
# @markdown ## 🔗 **Source Configuration**

# @markdown **Source Type**
Type_of_source = "YouTube Video"  # @param ["YouTube Video", "Google Drive Video Link", "Dropbox Video Link", "Local File"]

# @markdown **Source URL or Path**
Source = ""  # @param {type:"string"}

# @markdown **Use YouTube Captions** (Recommended for speed and cost efficiency)
use_Youtube_captions = True  # @param {type:"boolean"}

# @markdown ## 🎤 **Transcription Settings** (Only used if captions unavailable)
transcription_method = "Cloud Whisper"  # @param ["Cloud Whisper", "Local Whisper"]
language = "it"  # @param ["en", "es", "fr", "de", "it", "pt", "auto"]

print(f"📁 Source Type: {Type_of_source}")
print(f"🔗 Source: {Source}")
print(f"📋 Using captions: {'✅ Yes' if use_Youtube_captions else '❌ No (will transcribe)'}")

In [None]:
# @markdown ## 🌐 **API Configuration**

predefined_endpoint = "Perplexity"  # @param ["OpenAI", "Groq", "DeepSeek", "Perplexity", "Google", "Hyperbolic", "Custom"]

# Per-provider endpoint URL, default model, and the name (Colab secret / environment variable) under which its API key is stored
endpoints = {
    "OpenAI": {"url": "https://api.openai.com/v1", "default_model": "gpt-4o", "key_env": "openai"},
    "Groq": {"url": "https://api.groq.com/openai/v1", "default_model": "llama-3.3-70b-versatile", "key_env": "groq"},
    "DeepSeek": {"url": "https://api.deepseek.com/v1", "default_model": "deepseek-chat", "key_env": "deepseek"},
    "Perplexity": {"url": "https://api.perplexity.ai", "default_model": "sonar-pro", "key_env": "perplexity"},
    "Google": {"url": "https://generativelanguage.googleapis.com/v1beta/openai", "default_model": "gemini-2.0-flash-exp", "key_env": "generativelanguage"},
    "Hyperbolic": {"url": "https://api.hyperbolic.xyz/v1", "default_model": "meta-llama/Llama-3.3-70B-Instruct", "key_env": "hyperbolic"}
}

use_default_model = True  # @param {type:"boolean"}
custom_model = "gpt-4o"  # @param {type:"string"}
custom_endpoint_url = "https://api.openai.com/v1"  # @param {type:"string"}

# Configure endpoint and model
if predefined_endpoint == "Custom":
    base_url = custom_endpoint_url
    model = custom_model
    key_env = "openai"  # Custom endpoints fall back to the key stored under the name "openai"
else:
    config = endpoints[predefined_endpoint]
    base_url = config["url"]
    model = config["default_model"] if use_default_model else custom_model
    key_env = config["key_env"]

print(f"🌐 Using: {predefined_endpoint}")
print(f"🔗 Endpoint: {base_url}")
print(f"🤖 Model: {model}")
print(f"🔑 Expected env key: {key_env}")

In [None]:
# @markdown ## 🛠️ **Setup Dependencies**

import os
import sys
from IPython.display import display, HTML, clear_output

def show_status(message, status="info"):
    """Display status with nice formatting"""
    colors = {
        "info": "#e3f2fd", "success": "#e8f5e8", 
        "warning": "#fff3e0", "error": "#ffebee"
    }
    symbols = {
        "info": "ℹ️", "success": "✅", 
        "warning": "⚠️", "error": "❌"
    }
    color = colors.get(status, colors["info"])
    symbol = symbols.get(status, symbols["info"])
    
    display(HTML(f"""
    <div style='padding: 12px; background: {color}; border-radius: 8px; 
                 margin: 8px 0; border-left: 4px solid #2196F3;'>
        <b>{symbol} {message}</b>
    </div>
    """))

show_status("Setting up environment...", "info")

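# Install required packages
# Note: `!pip` runs in a subshell, so a failed install prints an error but does
# not raise a Python exception; check the pip output if the imports below fail.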
try:
    # Essential packages for all functionality
    !pip install nest_asyncio python-dotenv groq -q
    
    # Install the summarizer package from the feature/refactor-backend branch
    !pip install --upgrade "git+https://github.com/martinopiaggi/summarize.git@feature/refactor-backend" -q
    
    show_status("Core dependencies installed successfully", "success")
    
    # Optional: Install Local Whisper (only if user wants local transcription)
    if transcription_method == "Local Whisper":
        show_status("Installing OpenAI Whisper for local transcription...", "info")
        !pip install openai-whisper -q
        show_status("Local Whisper installed", "success")
    else:
        show_status("Using Cloud Whisper (via Groq API) - no local Whisper needed", "info")
    
except Exception as e:
    show_status(f"Installation error: {str(e)}", "error")
    show_status("Trying alternative installation...", "warning")
    !pip install nest_asyncio python-dotenv groq
    !pip install --upgrade "git+https://github.com/martinopiaggi/summarize.git@feature/refactor-backend"

# Import and configure
try:
    import nest_asyncio
    import asyncio
    from dotenv import load_dotenv
    from summarizer.core import main
    
    # Fix async event loop for Jupyter
    nest_asyncio.apply()
    
    # Ensure we have a clean event loop
    try:
        loop = asyncio.get_running_loop()
        show_status("Using existing Jupyter event loop", "info")
    except RuntimeError:
        # No running loop, create one
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        show_status("Created new event loop", "info")
    
    load_dotenv()
    
    show_status("Summarizer loaded successfully", "success")
    
except ImportError as e:
    show_status(f"Import error: {str(e)}", "error")
    show_status("Please restart runtime and try again", "warning")

# API Key Management
def get_api_key():
    """Get API key from Colab secrets or environment"""
    api_key = None
    
    # Try Colab secrets first
    try:
        from google.colab import userdata
        api_key = userdata.get(key_env)
        if api_key:
            show_status(f"Found API key in Colab secrets ({key_env})", "success")
            return api_key
    except Exception:  # Not running in Colab, or the secret is missing / access not granted
        pass
    
    # Try environment variables
    api_key = os.getenv(key_env)
    if api_key:
        show_status(f"Found API key in environment ({key_env})", "success")
        return api_key
        
    show_status(f"No API key found for '{key_env}'. Please add it to Colab secrets or .env file", "warning")
    return None

# Get API keys
main_api_key = get_api_key()

# For transcription - check what's needed based on method
if transcription_method == "Cloud Whisper":
    groq_key = None
    try:
        from google.colab import userdata
        groq_key = userdata.get('groq')
    except Exception:  # Not running in Colab, or the secret is missing / access not granted
        pass
    if not groq_key:
        groq_key = os.getenv('groq')
    
    if groq_key:
        show_status("Groq API key found for Cloud Whisper", "success")
    else:
        show_status("Groq API key needed for Cloud Whisper transcription", "warning")
        show_status("Add 'groq' to Colab secrets or .env file", "info")
    
    os.environ['groq'] = groq_key or ""

elif transcription_method == "Local Whisper":
    # Check if openai-whisper is available
    try:
        import whisper
        show_status("Local Whisper (OpenAI) package available", "success")
    except ImportError:
        show_status("Local Whisper package not installed", "warning")
        show_status("Run: !pip install openai-whisper", "info")

# Set environment variables for the summarizer
os.environ[key_env] = main_api_key or ""

print(f"🔧 Setup complete for {predefined_endpoint} with {transcription_method}")

In [None]:
# @markdown ## ⚙️ **Summarization Settings**

prompt_type = "Questions and answers"  # @param ['Summarization', 'Only grammar correction with highlights', 'Distill Wisdom', 'Questions and answers', 'DNA Extractor', 'Research', 'Fact Checker', 'Essay Writing in Paul Graham Style']
chunk_size = 12000  # @param {type:"slider", min:4000, max:28000, step:2000}
parallel_api_calls = 20  # @param {type:"slider", min:1, max:50, step:1}
max_output_tokens = 4096  # @param {type:"slider", min:1024, max:8192, step:1024}

# @markdown **Visual Feedback**
verbose_output = True  # @param {type:"boolean"}

print("⚙️ Configuration Summary:")
print(f"📝 Prompt Type: {prompt_type}")
print(f"📊 Chunk Size: {chunk_size:,} characters")
print(f"🔀 Parallel Calls: {parallel_api_calls}")
print(f"🎯 Max Tokens: {max_output_tokens:,}")
print(f"👁️ Verbose Output: {'✅ Enabled' if verbose_output else '❌ Disabled'}")

In [None]:
# @markdown ## 🚀 **Run Summarization**

auto_download = False  # @param {type:"boolean"}

import time
import asyncio
from datetime import datetime
from concurrent.futures import ThreadPoolExecutor

# Handle local file uploads if needed
if Type_of_source == "Local File" and not Source:
    show_status("Please upload your video file...", "info")
    from google.colab import files
    uploaded = files.upload()
    if uploaded:
        Source = list(uploaded.keys())[0]
        show_status(f"Using uploaded file: {Source}", "success")
    else:
        show_status("No file uploaded. Please try again.", "error")
        raise ValueError("No file provided")

# Mount Google Drive if needed
if Type_of_source == "Google Drive Video Link":
    try:
        from google.colab import drive
        drive.mount('/content/drive')
        show_status("Google Drive mounted successfully", "success")
    except Exception as e:
        show_status(f"Failed to mount Google Drive: {str(e)}", "error")

# Configure the summarizer
config = {
    "type_of_source": Type_of_source,
    "source_url_or_path": Source,
    "use_youtube_captions": use_Youtube_captions if Type_of_source == "YouTube Video" else False,
    "transcription_method": transcription_method,
    "language": language,
    "prompt_type": prompt_type,
    "chunk_size": chunk_size,
    "parallel_api_calls": parallel_api_calls,
    "max_output_tokens": max_output_tokens,
    "base_url": base_url,
    "model": model,
    "verbose": verbose_output  # Enable new visual feedback system
}

# Validate API key before starting
if not main_api_key:
    show_status(f"Missing API key for {predefined_endpoint}. Please configure it in the setup cell.", "error")
    raise ValueError("API key required")

def run_summarizer_in_thread(config):
    """Run summarizer in a separate thread with its own event loop to avoid conflicts"""
    # Create a new event loop for this thread
    new_loop = asyncio.new_event_loop()
    asyncio.set_event_loop(new_loop)
    
    try:
        return main(config)
    finally:
        new_loop.close()

# Show processing start
show_status(f"Starting summarization with {model}...", "info")
start_time = time.time()

try:
    # Run the summarizer in a separate thread to avoid event loop conflicts
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(run_summarizer_in_thread, config)
        summary = future.result()  # This will block until completion
    
    # Calculate processing time
    end_time = time.time()
    duration = end_time - start_time
    mins, secs = divmod(duration, 60)
    
    show_status(f"Summarization completed in {int(mins)}m {int(secs)}s", "success")
    
    # Generate filename with timestamp
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    safe_source = Source.split('/')[-1].replace('?', '_').replace('&', '_')[:50]
    filename = f"summary_{safe_source}_{timestamp}.md"
    
    # Save the summary
    with open(filename, "w", encoding="utf-8") as f:
        f.write("# Video Summary\n\n")
        f.write(f"**Source:** {Source}\n")
        f.write(f"**Model:** {model} ({predefined_endpoint})\n")
        f.write(f"**Generated:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
        f.write(f"**Processing Time:** {int(mins)}m {int(secs)}s\n\n")
        f.write("---\n\n")
        f.write(summary)
    
    show_status(f"Summary saved as: {filename}", "success")
    
    # Display preview (escape the text so markup in the summary is shown literally)
    import html
    preview_length = 500
    preview = html.escape(summary[:preview_length]) + ("..." if len(summary) > preview_length else "")
    
    display(HTML(f"""
    <div style='padding: 15px; background: #f8f9fa; border-radius: 8px; 
                 margin: 15px 0; border: 1px solid #dee2e6;'>
        <h3>📄 Summary Preview</h3>
        <div style='background: white; padding: 15px; border-radius: 5px; 
                    font-family: system-ui; line-height: 1.6; max-height: 300px; 
                    overflow-y: auto;'>
            {preview.replace(chr(10), '<br>')}
        </div>
        <small style='color: #6c757d;'>
            Full summary saved to: <code>{filename}</code> 
            ({len(summary):,} characters)
        </small>
    </div>
    """))
    
    # Auto-download if requested
    if auto_download:
        try:
            from google.colab import files
            files.download(filename)
            show_status("File downloaded successfully", "success")
        except Exception as e:
            show_status(f"Download failed: {str(e)}", "warning")
            
except Exception as e:
    error_msg = str(e)
    show_status(f"Summarization failed: {error_msg}", "error")
    
    # Provide helpful error suggestions
    if "api" in error_msg.lower() or "key" in error_msg.lower():
        show_status("Check your API key configuration", "warning")
    elif "transcript" in error_msg.lower():
        show_status("Try enabling 'Use YouTube Captions' or check the video URL", "warning")
    elif "ffmpeg" in error_msg.lower():
        show_status("Installing ffmpeg...", "info")
        !apt-get update -q && apt-get install -y ffmpeg -q
        show_status("Please run this cell again", "info")
    else:
        show_status("Check your configuration and try again", "warning")