# 🎵 Audio Transcription with Google Gemini 2.5 Pro

This notebook transcribes audio files with Google's Gemini 2.5 Pro model, from upload to downloadable text files, directly in Google Colab.

## Features
- 🎯 **Interactive prompt selection** - Choose from specialized transcription styles
- 🔊 **Multiple audio formats** - MP3, WAV, M4A, FLAC, OGG, WebM, MP4, AAC
- 📝 **Formatted output** - Proper speaker identification and formatting
- 📁 **File management** - Upload, transcribe, and download results
- 🔒 **Secure API handling** - API key entered through a hidden prompt and kept only in the session's environment

---

## 🚀 Setup and Installation

In [None]:
# Install required packages (ipywidgets is preinstalled in Colab)
!pip install -q google-genai

# Import necessary libraries
import os
import mimetypes
import zipfile
from pathlib import Path

from google import genai
from google.genai import types
from IPython.display import display, clear_output
import ipywidgets as widgets

# google.colab only exists inside Colab; the upload cell has a local fallback
try:
    from google.colab import files
except ImportError:
    files = None

print("✅ Installation complete!")

## 🔑 API Key Configuration

You'll need a Google Gemini API key to use this notebook.

1. Get your API key from [Google AI Studio](https://aistudio.google.com/app/apikey)
2. Run the cell below and enter your API key when prompted

**Security Note**: The key is typed into a hidden prompt and kept only in this runtime's environment variables (`GEMINI_API_KEY`). It is not saved in the notebook file.
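If you would rather not paste the key every session, Colab's Secrets panel (the key icon in the left sidebar) can store it. The sketch below assumes a secret named `GEMINI_API_KEY` with notebook access enabled. `load_gemini_api_key` is a hypothetical helper, not part of this notebook. It checks the environment first, then `google.colab.userdata`, and only then prompts.

```python
import getpass
import os

def load_gemini_api_key(name="GEMINI_API_KEY"):
    """Return an API key from the environment, Colab Secrets, or a prompt."""
    # 1. Already set in this runtime (e.g. by the cell below)
    key = os.environ.get(name)
    if key:
        return key
    # 2. Colab Secrets panel; google.colab only exists inside Colab
    try:
        from google.colab import userdata
        key = userdata.get(name)
    except Exception:  # not in Colab, secret missing, or access not granted
        key = None
    if key:
        return key
    # 3. Fall back to an interactive prompt that does not echo the key
    return getpass.getpass("API Key: ").strip() or None
```

You could then set `os.environ['GEMINI_API_KEY'] = load_gemini_api_key()` instead of running the prompt cell.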

In [None]:
import getpass

# Securely input API key
print("🔑 Please enter your Google Gemini API key:")
print("(Get it from: https://aistudio.google.com/app/apikey)")
api_key = getpass.getpass("API Key: ").strip()  # strip stray spaces from pasting

if api_key:
    os.environ['GEMINI_API_KEY'] = api_key
    print("✅ API key configured successfully!")
else:
    print("❌ No API key provided. Please run this cell again.")

## 📝 Transcription Prompts

Different types of audio content benefit from specialized transcription approaches. Choose the prompt that best matches your content type.
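Each style is a dictionary entry with `name`, `description`, and `prompt` fields, so you can add your own. The hypothetical `add_custom_prompt` helper below checks those fields before inserting the entry. The "podcast" style is only an illustration. Run it on `TRANSCRIPTION_PROMPTS` after the cell below and before the style-selection cell, because the dropdown reads the dictionary when it is created.

```python
REQUIRED_FIELDS = ("name", "description", "prompt")

def add_custom_prompt(prompts, key, name, description, prompt):
    """Add a style to a dict shaped like TRANSCRIPTION_PROMPTS."""
    if key in prompts:
        raise ValueError(f"Prompt key '{key}' already exists")
    entry = {"name": name, "description": description, "prompt": prompt.strip()}
    # The selection and transcription cells read all three fields
    missing = [field for field in REQUIRED_FIELDS if not entry[field]]
    if missing:
        raise ValueError(f"Missing values for: {', '.join(missing)}")
    prompts[key] = entry
    return entry

# Usage with a stand-in dict (in the notebook, pass TRANSCRIPTION_PROMPTS)
prompts = {"general": {"name": "General", "description": "Standard", "prompt": "Transcribe."}}
add_custom_prompt(
    prompts,
    key="podcast",
    name="Podcast",
    description="Casual multi-host conversation",
    prompt="Transcribe this podcast. Label hosts by name if they are mentioned.",
)
print(sorted(prompts))  # ['general', 'podcast']
```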

In [None]:
# Define transcription prompts
TRANSCRIPTION_PROMPTS = {
    "general": {
        "name": "General",
        "description": "Standard transcription with speaker identification",
        "prompt": """Please provide an accurate transcription of this audio file.
Format the text with proper punctuation, capitalization, and paragraph breaks.
If there are multiple speakers, please indicate speaker changes with "Speaker 1:", "Speaker 2:", etc.
If you hear background music or sound effects, you may mention them in [brackets].
Focus on clarity and readability of the final transcript."""
    },
    "meeting": {
        "name": "Meeting",
        "description": "Focus on action items, decisions, and meeting structure",
        "prompt": """Please transcribe this meeting audio with a focus on business context.
Structure the transcription to clearly show:
- Speaker identification (use names if mentioned, otherwise "Speaker 1:", "Speaker 2:", etc.)
- Key decisions made
- Action items assigned
- Important discussion points
- Meeting agenda items
Format with clear paragraph breaks and bullet points where appropriate.
Highlight any deadlines or specific commitments mentioned."""
    },
    "interview": {
        "name": "Interview",
        "description": "Question-answer format with conversational flow",
        "prompt": """Please transcribe this interview audio with attention to conversational dynamics.
Format as a natural dialogue with:
- Clear speaker identification (Interviewer:, Interviewee:, or use names if mentioned)
- Preserve the question-and-answer flow
- Include natural speech patterns and hesitations where relevant
- Note significant pauses with [pause]
- Maintain the conversational tone and personality of speakers
Focus on accuracy while preserving the natural flow of conversation."""
    },
    "lecture": {
        "name": "Lecture",
        "description": "Educational content with key concepts highlighted",
        "prompt": """Please transcribe this educational/lecture audio with focus on learning content.
Structure the transcription to emphasize:
- Key concepts and definitions
- Main topics and subtopics
- Examples and explanations
- Questions from students (if any)
- Important terminology
Use clear paragraph breaks to separate different topics.
Maintain academic tone and preserve technical accuracy."""
    }
}

print("📝 Transcription prompts loaded successfully!")
print(f"Available prompts: {', '.join([p['name'] for p in TRANSCRIPTION_PROMPTS.values()])}")

## 🤖 Audio Transcriber Class

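The `ColabAudioTranscriber` class reads each audio file and sends it to Gemini 2.5 Pro together with the selected prompt. It then saves the returned text to a file.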
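Before any audio is sent, `prepare_audio_for_api` has to choose a MIME type. It first checks its own extension table, and only if that fails does it fall back to Python's `mimetypes` module. The standalone sketch below isolates that lookup order. `resolve_mime_type` is a hypothetical name, and the table simply copies the class's `supported_formats`.

```python
import mimetypes
from pathlib import Path

# Same extension table that ColabAudioTranscriber.supported_formats uses
SUPPORTED_FORMATS = {
    '.mp3': 'audio/mpeg', '.wav': 'audio/wav', '.m4a': 'audio/mp4',
    '.flac': 'audio/flac', '.ogg': 'audio/ogg', '.webm': 'audio/webm',
    '.mp4': 'audio/mp4', '.aac': 'audio/aac',
}

def resolve_mime_type(path):
    """Explicit table first, then the stdlib guesser; None means unknown."""
    ext = Path(path).suffix.lower()  # '.MP3' -> '.mp3'
    mime = SUPPORTED_FORMATS.get(ext)
    if mime is None:
        mime, _encoding = mimetypes.guess_type(str(path))
    return mime

print(resolve_mime_type("interview.MP3"))     # audio/mpeg
print(resolve_mime_type("notes.unknownext"))  # None
```

When the result is `None`, `transcribe_audio` skips the file rather than sending data with an unknown type.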

In [None]:
class ColabAudioTranscriber:
    def __init__(self, api_key=None):
        """
        Initialize the Audio Transcriber for Google Colab.
        
        Args:
            api_key (str, optional): Gemini API key.
        """
        self.api_key = api_key or os.environ.get("GEMINI_API_KEY")
        if not self.api_key:
            raise ValueError("GEMINI_API_KEY not found. Please configure your API key first.")
        
        # Initialize the Gemini client
        self.client = genai.Client(api_key=self.api_key)
        
        # Supported audio formats
        self.supported_formats = {
            '.mp3': 'audio/mpeg',
            '.wav': 'audio/wav',
            '.m4a': 'audio/mp4',
            '.flac': 'audio/flac',
            '.ogg': 'audio/ogg',
            '.webm': 'audio/webm',
            '.mp4': 'audio/mp4',
            '.aac': 'audio/aac'
        }
        
        # Default transcription prompt
        self.default_prompt = TRANSCRIPTION_PROMPTS["general"]["prompt"]
    
    def prepare_audio_for_api(self, audio_file_path):
        """
        Prepare audio file for Gemini API by reading it as bytes.
        
        Args:
            audio_file_path: Path to the audio file
            
        Returns:
            tuple: (audio_bytes, mime_type)
        """
        try:
            with open(audio_file_path, 'rb') as audio_file:
                audio_bytes = audio_file.read()
            
            # Get MIME type
            file_extension = Path(audio_file_path).suffix.lower()
            mime_type = self.supported_formats.get(file_extension)
            
            if not mime_type:
                mime_type, _ = mimetypes.guess_type(str(audio_file_path))
            
            return audio_bytes, mime_type
        
        except Exception as e:
            print(f"❌ Error reading audio file {audio_file_path}: {e}")
            return None, None
    
    def transcribe_audio(self, audio_file_path, prompt_type="general"):
        """
        Transcribe a single audio file using Gemini 2.5 Pro.
        
        Args:
            audio_file_path: Path to the audio file
            prompt_type (str): Type of transcription prompt to use
            
        Returns:
            str: Transcribed text or None if error
        """
        print(f"🎵 Transcribing: {Path(audio_file_path).name}")
        
        # Prepare audio data
        audio_bytes, mime_type = self.prepare_audio_for_api(audio_file_path)
        if not audio_bytes or not mime_type:
            print(f"❌ Could not read {Path(audio_file_path).name} or determine its audio format")
            return None
        
        try:
            # Create audio part for the API
            audio_part = types.Part.from_bytes(
                data=audio_bytes,
                mime_type=mime_type
            )
            
            # Get the selected prompt
            prompt = TRANSCRIPTION_PROMPTS.get(prompt_type, TRANSCRIPTION_PROMPTS["general"])["prompt"]
            
            # Generate transcription using Gemini 2.5 Pro
            response = self.client.models.generate_content(
                model='gemini-2.5-pro',
                contents=[prompt, audio_part],
                config=types.GenerateContentConfig(
                    temperature=0.1,
                    max_output_tokens=65536,
                )
            )
            
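            # response.text is None when the model returns no text (e.g. a blocked response)
            text = response.text
            if not text:
                print(f"⚠️ Empty response for {Path(audio_file_path).name}")
                return None
            
            print(f"✅ Transcription completed for {Path(audio_file_path).name}")
            return text.strip()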
        
        except Exception as e:
            print(f"❌ Error transcribing {Path(audio_file_path).name}: {e}")
            return None
    
    def save_transcription(self, transcription, audio_file_path, output_folder="transcriptions"):
        """
        Save transcription to a text file.
        
        Args:
            transcription (str): Transcribed text
            audio_file_path: Original audio file path
            output_folder (str): Output folder for transcriptions
        """
        # Create output folder if it doesn't exist
        os.makedirs(output_folder, exist_ok=True)
        
        # Create output filename
        audio_name = Path(audio_file_path).stem
        output_filename = f"{audio_name}_transcription.txt"
        output_file_path = os.path.join(output_folder, output_filename)
        
        try:
            with open(output_file_path, 'w', encoding='utf-8') as f:
                # Write header with metadata
                f.write(f"Transcription of: {Path(audio_file_path).name}\n")
                f.write("Generated using: Google Gemini 2.5 Pro\n")
                f.write("=" * 50 + "\n\n")
                f.write(transcription)
            
            print(f"💾 Transcription saved: {output_file_path}")
            return output_file_path
        
        except Exception as e:
            print(f"❌ Error saving transcription: {e}")
            return None

print("🤖 ColabAudioTranscriber class loaded successfully!")

## 📁 Upload Audio Files

Upload your audio files to start transcribing. You can upload multiple files at once.
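At the time of writing, Google's Gemini API documentation limits any request that carries inline data to 20 MB, and that total includes the prompt. This notebook always sends audio inline, so checking files up front can save failed calls. The sketch below is a hypothetical pre-check. Treating 20 MB as 20 × 1024 × 1024 bytes is an assumption, so leave some headroom.

```python
import os
from pathlib import Path

SUPPORTED_EXTENSIONS = {'.mp3', '.wav', '.m4a', '.flac', '.ogg', '.webm', '.mp4', '.aac'}
# Assumed inline request limit (documented as 20 MB, prompt included)
INLINE_LIMIT_BYTES = 20 * 1024 * 1024

def check_audio_files(paths, limit=INLINE_LIMIT_BYTES):
    """Split paths into (ok, too_large, unsupported) before calling the API."""
    ok, too_large, unsupported = [], [], []
    for p in paths:
        if Path(p).suffix.lower() not in SUPPORTED_EXTENSIONS:
            unsupported.append(p)
        elif os.path.getsize(p) > limit:
            too_large.append(p)  # would need the Files API instead of inline bytes
        else:
            ok.append(p)
    return ok, too_large, unsupported
```

For files that land in `too_large`, the google-genai SDK documents `client.files.upload(file=path)`. Its result can be passed in `contents` in place of the inline `audio_part`. This notebook does not do that, so treat it as a pointer rather than a drop-in change.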

In [None]:
# Create upload directory
upload_dir = "uploaded_audio"
os.makedirs(upload_dir, exist_ok=True)

print("📁 Ready to upload your audio files in Google Colab:")
print("Supported formats: MP3, WAV, M4A, FLAC, OGG, WebM, MP4, AAC")
print()

# For Google Colab - Use files.upload() which will show a file picker
try:
    from google.colab import files
    print("🔄 Click 'Choose Files' button below to upload your audio files...")
    uploaded = files.upload()
    
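    # Move uploaded files to the upload directory, skipping unsupported formats
    supported = ('.mp3', '.wav', '.m4a', '.flac', '.ogg', '.webm', '.mp4', '.aac')
    audio_files = []
    for filename in uploaded.keys():
        if not filename.lower().endswith(supported):
            print(f"⚠️ Skipped unsupported file: {filename}")
            continue
        file_path = os.path.join(upload_dir, filename)
        os.replace(filename, file_path)  # overwrites an earlier upload with the same name
        audio_files.append(file_path)
        print(f"✅ Uploaded: {filename}")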
    
    print(f"\n📊 Total files uploaded: {len(audio_files)}")
    
except ImportError:
    # Fallback for non-Colab environments
    print("⚠️ This cell is designed for Google Colab.")
    print("For local use, place your audio files in the 'Audio' folder.")
    
    # Check for existing audio files in Audio folder
    audio_folder = "Audio"
    if os.path.exists(audio_folder):
        audio_files = []
        for file in os.listdir(audio_folder):
            if file.lower().endswith(('.mp3', '.wav', '.m4a', '.flac', '.ogg', '.webm', '.mp4', '.aac')):
                audio_files.append(os.path.join(audio_folder, file))
        
        if audio_files:
            print(f"📁 Found {len(audio_files)} audio file(s) in Audio folder:")
            for file in audio_files:
                print(f"  📄 {os.path.basename(file)}")
        else:
            print("❌ No audio files found in Audio folder.")
            audio_files = []
    else:
        print("❌ No Audio folder found.")
        audio_files = []

## 🎯 Select Transcription Style

Choose the transcription style that best matches your audio content.
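The numbered list printed below also makes it easy to pick a style by number, for example with `input()` when widgets are not available. The hypothetical `resolve_style` helper below maps a number or key to a prompt key. Like `transcribe_audio`, it falls back to `"general"` for anything it does not recognise.

```python
def resolve_style(choice, prompts):
    """Map '2', 2, or 'Meeting' to a prompt key; fall back to 'general'."""
    keys = list(prompts)  # dict order matches the numbered list
    text = str(choice).strip().lower()
    if text.isdigit() and 1 <= int(text) <= len(keys):
        return keys[int(text) - 1]
    if text in prompts:
        return text
    return "general"

styles = {"general": {}, "meeting": {}, "interview": {}, "lecture": {}}
print(resolve_style("2", styles))        # meeting
print(resolve_style("Lecture", styles))  # lecture
print(resolve_style("9", styles))        # general
```

In the notebook, `resolve_style(input("Style number: "), TRANSCRIPTION_PROMPTS)` would give a key that you can pass as `prompt_type`.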

In [None]:
def display_prompt_options():
    """Display available transcription prompts"""
    print("🎯 Available Transcription Styles:")
    print("=" * 50)
    
    for i, (key, prompt_info) in enumerate(TRANSCRIPTION_PROMPTS.items(), 1):
        print(f"{i}. {prompt_info['name']} - {prompt_info['description']}")
    
    print("\n📝 Select a style from the dropdown below and click 'Preview Prompt' to read its full text.")

def preview_prompt(prompt_key):
    """Preview a specific prompt"""
    if prompt_key in TRANSCRIPTION_PROMPTS:
        prompt_info = TRANSCRIPTION_PROMPTS[prompt_key]
        print(f"\n📖 Preview: {prompt_info['name']} Prompt")
        print("=" * 40)
        print(prompt_info['prompt'])
        print("=" * 40)

# Display options
display_prompt_options()

# Create dropdown for prompt selection
prompt_options = [(f"{info['name']} - {info['description']}", key) 
                 for key, info in TRANSCRIPTION_PROMPTS.items()]

prompt_dropdown = widgets.Dropdown(
    options=prompt_options,
    value='general',
    description='Style:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='70%')
)

preview_button = widgets.Button(
    description='Preview Prompt',
    button_style='info',
    layout=widgets.Layout(width='150px')
)

preview_output = widgets.Output()

def on_preview_click(b):
    with preview_output:
        clear_output()
        preview_prompt(prompt_dropdown.value)

preview_button.on_click(on_preview_click)

display(widgets.HBox([prompt_dropdown, preview_button]))
display(preview_output)

print("\n✅ Prompt selection ready! Use the dropdown above to choose your transcription style.")

## 🚀 Start Transcription

Run this cell to transcribe all uploaded audio files using your selected style.
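Long recordings sometimes fail because of transient server errors or rate limits. `transcribe_audio` catches exceptions and returns `None`, so a retry wrapper around it has to treat `None` as a failure as well. The hypothetical `with_retries` helper below does this with exponential backoff. The attempt count and delays are arbitrary assumptions. Keep in mind that each retry sends the whole file again and counts against your quota.

```python
import random
import time

def with_retries(fn, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fn() until it returns non-None, backing off 2**n between tries."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            result = fn()
            if result is not None:
                return result
            last_error = None  # returned None without raising
        except Exception as exc:
            last_error = exc
        if attempt < attempts:
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            print(f"⚠️ Attempt {attempt} failed; retrying in {delay:.1f}s")
            sleep(delay)
    if last_error is not None:
        raise last_error  # out of attempts: surface the last exception
    return None
```

Inside the loop below, `transcription = with_retries(lambda: transcriber.transcribe_audio(audio_file, prompt_type=prompt_dropdown.value))` would replace the direct call.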

In [None]:
# Initialize transcriber
try:
    transcriber = ColabAudioTranscriber()
    print("🤖 Audio transcriber initialized successfully!")
except Exception as e:
    print(f"❌ Error initializing transcriber: {e}")
    print("Please make sure you've configured your API key in the previous cells.")
    raise

# Check if we have files to transcribe (audio_files is set by the upload cell)
if not globals().get("audio_files"):
    print("⚠️ No audio files found. Please run the upload cell first.")
else:
    print(f"\n🎵 Starting transcription of {len(audio_files)} file(s)...")
    print(f"📝 Using prompt style: {TRANSCRIPTION_PROMPTS[prompt_dropdown.value]['name']}")
    print("="*60)
    
    # Create output directory
    output_dir = "transcriptions"
    os.makedirs(output_dir, exist_ok=True)
    
    successful_transcriptions = 0
    failed_transcriptions = 0
    transcribed_files = []
    
    # Transcribe each file
    for i, audio_file in enumerate(audio_files, 1):
        print(f"\n📂 Processing file {i}/{len(audio_files)}")
        
        try:
            # Transcribe the audio
            transcription = transcriber.transcribe_audio(
                audio_file, 
                prompt_type=prompt_dropdown.value
            )
            
            if transcription:
                # Save the transcription
                output_file = transcriber.save_transcription(
                    transcription, 
                    audio_file, 
                    output_dir
                )
                if output_file:
                    successful_transcriptions += 1
                    transcribed_files.append(output_file)
                else:
                    failed_transcriptions += 1
            else:
                print(f"❌ Failed to transcribe {Path(audio_file).name}")
                failed_transcriptions += 1
                
        except Exception as e:
            print(f"❌ Error processing {Path(audio_file).name}: {e}")
            failed_transcriptions += 1
    
    # Summary
    print("\n" + "="*60)
    print("📊 TRANSCRIPTION SUMMARY")
    print("="*60)
    print(f"📁 Total files processed: {len(audio_files)}")
    print(f"✅ Successful transcriptions: {successful_transcriptions}")
    print(f"❌ Failed transcriptions: {failed_transcriptions}")
    
    if successful_transcriptions > 0:
        print(f"\n💾 Transcription files saved in '{output_dir}' folder")
        print("📥 Use the next cell to download your transcriptions!")
    
    print("\n🎉 Transcription process completed!")

## 👀 View Transcription Results

Preview your transcription results before downloading.
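Each file written by `save_transcription` begins with a two-line header, followed by a line of 50 `=` characters and a blank line. If you only want the transcript text, for example to count words, the hypothetical `read_transcript_body` helper below splits at that separator. It returns the whole file unchanged if no separator is found.

```python
import os

# Separator written by save_transcription after its metadata header
SEPARATOR = "=" * 50 + "\n\n"

def read_transcript_body(path):
    """Return only the transcript text, without the metadata header."""
    with open(path, encoding="utf-8") as f:
        content = f.read()
    _header, sep, body = content.partition(SEPARATOR)
    return body if sep else content

# Example: rough word counts for every saved transcript
if os.path.isdir("transcriptions"):
    for name in sorted(os.listdir("transcriptions")):
        if name.endswith(".txt"):
            body = read_transcript_body(os.path.join("transcriptions", name))
            print(f"{name}: {len(body.split()):,} words")
```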

In [None]:
# List transcription files
output_dir = "transcriptions"
transcription_files = []

if os.path.exists(output_dir):
    transcription_files = [f for f in os.listdir(output_dir) if f.endswith('.txt')]

if transcription_files:
    print(f"📄 Found {len(transcription_files)} transcription file(s):")
    
    # Create dropdown to select file for preview
    file_dropdown = widgets.Dropdown(
        options=transcription_files,
        description='File:',
        style={'description_width': 'initial'},
        layout=widgets.Layout(width='70%')
    )
    
    preview_btn = widgets.Button(
        description='Preview File',
        button_style='success',
        layout=widgets.Layout(width='150px')
    )
    
    file_output = widgets.Output()
    
    def on_preview_file_click(b):
        with file_output:
            clear_output()
            file_path = os.path.join(output_dir, file_dropdown.value)
            try:
                with open(file_path, 'r', encoding='utf-8') as f:
                    content = f.read()
                print(f"📄 Preview of {file_dropdown.value}:")
                print("="*60)
                print(content[:2000] + ("...\n\n[Content truncated for preview]" if len(content) > 2000 else ""))
                print("="*60)
                print(f"📊 File size: {len(content):,} characters")
            except Exception as e:
                print(f"❌ Error reading file: {e}")
    
    preview_btn.on_click(on_preview_file_click)
    
    display(widgets.HBox([file_dropdown, preview_btn]))
    display(file_output)
    
else:
    print("⚠️ No transcription files found. Please run the transcription process first.")

## 📥 Download Transcriptions

Download your transcription files or create a ZIP archive with all results.
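The cell below downloads a single file directly and zips multiple files. If you always want a ZIP, even for one file, that branch can be factored into a function. `zip_transcriptions` is a hypothetical helper that mirrors the cell's `zipfile` usage. It stores each file at the root of the archive.

```python
import os
import zipfile

def zip_transcriptions(output_dir="transcriptions", zip_path="audio_transcriptions.zip"):
    """Bundle every .txt file in output_dir into one ZIP; return (path, names)."""
    names = sorted(f for f in os.listdir(output_dir) if f.endswith(".txt"))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            # arcname=name drops the folder prefix inside the archive
            zf.write(os.path.join(output_dir, name), arcname=name)
    return zip_path, names

# In Colab: zip_path, names = zip_transcriptions(); files.download(zip_path)
```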

In [None]:
output_dir = "transcriptions"
transcription_files = []
if os.path.exists(output_dir):
    transcription_files = sorted(f for f in os.listdir(output_dir) if f.endswith('.txt'))

if transcription_files:
    print(f"📦 Preparing {len(transcription_files)} transcription file(s) for download...")
    
    if len(transcription_files) == 1:
        # Single file - download directly
        file_path = os.path.join(output_dir, transcription_files[0])
        print(f"📄 Downloading: {transcription_files[0]}")
        files.download(file_path)
        
    else:
        # Multiple files - create ZIP
        zip_filename = "audio_transcriptions.zip"
        
        with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
            for filename in transcription_files:
                file_path = os.path.join(output_dir, filename)
                zipf.write(file_path, filename)
                print(f"📄 Added to ZIP: {filename}")
        
        print(f"\n📦 Downloading ZIP file: {zip_filename}")
        files.download(zip_filename)
    
    print("\n✅ Download started! Check your browser's downloads folder.")
    
    # Optional: Download individual files
    if len(transcription_files) > 1:
        print("\n📁 Individual file downloads:")
        download_buttons = []
        
        for filename in transcription_files:
            button = widgets.Button(
                description=f"📄 {filename}",
                button_style='info',
                layout=widgets.Layout(width='400px', margin='2px')
            )
            
            def make_download_handler(fname):
                def handler(b):
                    files.download(os.path.join(output_dir, fname))
                return handler
            
            button.on_click(make_download_handler(filename))
            download_buttons.append(button)
        
        for button in download_buttons:
            display(button)
            
else:
    print("⚠️ No transcription files found to download.")
    print("Please run the transcription process first.")

## 🧹 Cleanup (Optional)

Clean up uploaded files and temporary data to free up space.
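To see how much space cleanup would free, you can total the file sizes in each folder first. `dir_size_bytes` is a hypothetical helper. It counts regular files only, skips symlinks, and returns 0 for a folder that does not exist.

```python
import os

def dir_size_bytes(path):
    """Total size of regular files under path (0 if path doesn't exist)."""
    total = 0
    for root, _dirs, filenames in os.walk(path):
        for name in filenames:
            fp = os.path.join(root, name)
            if os.path.isfile(fp) and not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total

for folder in ("uploaded_audio", "transcriptions"):
    print(f"{folder}: {dir_size_bytes(folder) / 1_000_000:.1f} MB")
```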

In [None]:
import shutil

print("🧹 Cleanup Options:")
print("This removes files from the current Colab session only; files you have already downloaded are not affected.")

cleanup_audio = widgets.Button(
    description='🗑️ Remove Uploaded Audio',
    button_style='warning',
    layout=widgets.Layout(width='200px', margin='5px')
)

cleanup_transcriptions = widgets.Button(
    description='🗑️ Remove Transcriptions',
    button_style='warning',
    layout=widgets.Layout(width='200px', margin='5px')
)

cleanup_all = widgets.Button(
    description='🗑️ Remove All Files',
    button_style='danger',
    layout=widgets.Layout(width='200px', margin='5px')
)

cleanup_output = widgets.Output()

def cleanup_audio_files(b):
    with cleanup_output:
        if os.path.exists("uploaded_audio"):
            shutil.rmtree("uploaded_audio")
            print("✅ Uploaded audio files removed")
        else:
            print("ℹ️ No uploaded audio files to remove")

def cleanup_transcription_files(b):
    with cleanup_output:
        if os.path.exists("transcriptions"):
            shutil.rmtree("transcriptions")
            print("✅ Transcription files removed")
        else:
            print("ℹ️ No transcription files to remove")

def cleanup_all_files(b):
    with cleanup_output:
        removed = []
        if os.path.exists("uploaded_audio"):
            shutil.rmtree("uploaded_audio")
            removed.append("uploaded audio")
        if os.path.exists("transcriptions"):
            shutil.rmtree("transcriptions")
            removed.append("transcriptions")
        if os.path.exists("audio_transcriptions.zip"):
            os.remove("audio_transcriptions.zip")
            removed.append("ZIP file")
        
        if removed:
            print(f"✅ Removed: {', '.join(removed)}")
        else:
            print("ℹ️ No files to remove")

cleanup_audio.on_click(cleanup_audio_files)
cleanup_transcriptions.on_click(cleanup_transcription_files)
cleanup_all.on_click(cleanup_all_files)

display(widgets.HBox([cleanup_audio, cleanup_transcriptions, cleanup_all]))
display(cleanup_output)

---

## 🎉 Congratulations!

You've successfully used the Audio Transcription notebook! Here's what you accomplished:

✅ **Uploaded audio files** in various formats  
✅ **Selected specialized prompts** for your content type  
✅ **Transcribed audio** using Google Gemini 2.5 Pro  
✅ **Downloaded results** as formatted text files  

### 🔄 Want to transcribe more files?
Simply run the upload cell again with new files, select your preferred prompt style, and run the transcription process.

### 🛠️ Need help or have suggestions?
Check out the [GitHub repository](https://github.com/fmadore/Audio-transcription).

### 🔐 Security Reminder
Your API key is held only in the runtime's environment variables. It is never written to the notebook file and is discarded when the Colab runtime is disconnected or reset.

---
*Powered by Google Gemini 2.5 Pro • Made with ❤️ for Google Colab*