# Meeting Minutes Generator with Gradio Interface

This notebook creates a user-friendly Gradio interface for uploading audio files and generating meeting minutes automatically. It uses:
- OpenAI's Whisper model (`whisper-1`, called through the OpenAI API) for audio transcription
- A local, 4-bit quantized LLM (Phi-3 Mini) for generating structured meeting minutes
- Gradio for the web interface

In [None]:
# Install required packages
!pip install -q --upgrade torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
!pip install -q requests bitsandbytes==0.46.0 transformers==4.48.3 accelerate==1.3.0 openai gradio

In [None]:
# Import required libraries (only those used in this notebook)
import os
import tempfile
import warnings

import gradio as gr
import torch
from openai import OpenAI
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

warnings.filterwarnings("ignore")

In [None]:
# Constants
AUDIO_MODEL = "whisper-1"
LLAMA = "meta-llama/Meta-Llama-3.1-8B-Instruct"
PHI3 = "microsoft/Phi-3-mini-4k-instruct"

In [None]:
# Setup authentication - replace with your own API keys
# For Colab, use userdata.get('KEY_NAME')
# For local Jupyter, use os.getenv('KEY_NAME') or input directly

try:
    # Try Colab first
    from google.colab import userdata
    from huggingface_hub import login
    
    hf_token = userdata.get('HF_TOKEN')
    openai_api_key = userdata.get('OPENAI_API_KEY')
    
    login(hf_token, add_to_git_credential=True)
    print("Using Colab authentication")
except ImportError:
    # Local environment
    hf_token = os.getenv('HF_TOKEN')
    openai_api_key = os.getenv('OPENAI_API_KEY')
    
    if hf_token:
        from huggingface_hub import login
        login(hf_token, add_to_git_credential=True)
    
    print("Using local environment authentication")

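# Initialize OpenAI client; default to None so later `if not openai_client`
# checks work instead of raising NameError when no key is available
openai_client = None
if openai_api_key:
    openai_client = OpenAI(api_key=openai_api_key)
    print("OpenAI client initialized successfully")
else:
    print("Warning: OpenAI API key not found. Audio transcription will not work.")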

In [None]:
# Setup quantization configuration for efficient model loading
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4"
)

In [None]:
# Load the language model and tokenizer
print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(PHI3)
tokenizer.pad_token = tokenizer.eos_token

print("Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    PHI3, 
    device_map="auto", 
    quantization_config=quant_config
)

print("Model loaded successfully!")

In [None]:
def transcribe_audio(audio_file):
    """
    Transcribe audio file using OpenAI's Whisper model
    
    Args:
        audio_file: File path or file-like object
    
    Returns:
        str: Transcribed text
    """
    try:
        if not openai_client:
            return "Error: OpenAI API key not configured"
        
        # Gradio may pass a file path (str) or a temp-file wrapper exposing .name;
        # resolve either to a path and open it once
        audio_path = audio_file if isinstance(audio_file, str) else audio_file.name
        with open(audio_path, "rb") as f:
            transcription = openai_client.audio.transcriptions.create(
                model=AUDIO_MODEL,
                file=f,
                response_format="text"
            )
        
        return transcription
    
    except Exception as e:
        return f"Error transcribing audio: {str(e)}"

In [None]:
def generate_meeting_minutes(transcription):
    """
    Generate structured meeting minutes from transcription using local LLM
    
    Args:
        transcription (str): Audio transcription text
    
    Returns:
        str: Generated meeting minutes in markdown format
    """
    try:
        system_message = "You are an assistant that produces minutes of meetings from transcripts, with summary, key discussion points, takeaways and action items with owners, in markdown."
        
        user_prompt = f"""Below is a transcript of a meeting. Please write professional meeting minutes in markdown format, including:
        
        1. **Meeting Summary** - Date, attendees (if mentioned), location/platform
        2. **Key Discussion Points** - Main topics covered
        3. **Decisions Made** - Important decisions reached
        4. **Action Items** - Tasks assigned with owners (if mentioned)
        5. **Next Steps** - Follow-up actions or next meeting
        
        Transcript:
        {transcription}"""
        
        messages = [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_prompt}
        ]
        
        # Prepare input for the model. add_generation_prompt=True appends the
        # assistant turn marker so the model answers rather than continuing the
        # user message; model.device avoids hard-coding "cuda".
        inputs = tokenizer.apply_chat_template(
            messages,
            add_generation_prompt=True,
            return_tensors="pt"
        ).to(model.device)
        
        # Generate response
        with torch.no_grad():
            outputs = model.generate(
                inputs, 
                max_new_tokens=2000,
                do_sample=True,
                temperature=0.7,
                pad_token_id=tokenizer.eos_token_id
            )
        
        # Decode only the newly generated tokens. Slicing off the prompt is more
        # reliable than splitting on "<|assistant|>", which skip_special_tokens
        # strips from the decoded text.
        new_tokens = outputs[0][inputs.shape[1]:]
        response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
        
        return response
    
    except Exception as e:
        return f"Error generating meeting minutes: {str(e)}"

In [None]:
def process_audio_file(audio_file, progress=gr.Progress()):
    """
    Main function to process audio file and generate meeting minutes
    
    Args:
        audio_file: Uploaded audio file from Gradio
        progress: Gradio progress indicator
    
    Returns:
        tuple: (transcription, meeting_minutes)
    """
    if audio_file is None:
        return "No audio file uploaded.", "Please upload an audio file first."
    
    try:
        progress(0.1, desc="Processing audio file...")
        
        # Step 1: Transcribe audio
        progress(0.3, desc="Transcribing audio...")
        transcription = transcribe_audio(audio_file)
        
        if transcription.startswith("Error"):
            return transcription, "Transcription failed - cannot generate minutes."
        
        progress(0.6, desc="Generating meeting minutes...")
        
        # Step 2: Generate meeting minutes
        meeting_minutes = generate_meeting_minutes(transcription)
        
        progress(1.0, desc="Complete!")
        
        return transcription, meeting_minutes
    
    except Exception as e:
        error_msg = f"Error processing audio: {str(e)}"
        return error_msg, error_msg

In [None]:
# Create Gradio interface
def create_interface():
    """
    Create and configure the Gradio interface
    """
    
    # Custom CSS for better styling
    custom_css = """
    .gradio-container {
        max-width: 1200px !important;
        margin: auto !important;
    }
    .tab-nav button {
        font-size: 16px !important;
        padding: 12px 24px !important;
    }
    .markdown-text {
        font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif !important;
    }
    """
    
    with gr.Blocks(css=custom_css, title="Meeting Minutes Generator") as interface:
        gr.Markdown(
            """
            # 🎤 Meeting Minutes Generator
            
            Upload an audio file from your meeting, and this tool will:
            1. **Transcribe** the audio using OpenAI's Whisper model
            2. **Generate** structured meeting minutes using AI
            
            **Supported formats**: MP3, WAV, M4A, MP4, WEBM, and more
            """
        )
        
        with gr.Row():
            with gr.Column(scale=1):
                # Audio upload section
                gr.Markdown("### 📁 Upload Audio File")
                audio_input = gr.File(
                    label="Select your meeting audio file",
                    file_types=["audio"],
                    file_count="single"
                )
                
                process_btn = gr.Button(
                    "🚀 Process Audio & Generate Minutes",
                    variant="primary",
                    size="lg"
                )
                
                # Status indicator
                status = gr.Textbox(
                    label="Status",
                    value="Ready to process audio file",
                    interactive=False
                )
        
        # Results section
        gr.Markdown("### 📄 Results")
        
        with gr.Tabs():
            with gr.Tab("📝 Meeting Minutes"):
                minutes_output = gr.Markdown(
                    label="Generated Meeting Minutes",
                    value="Meeting minutes will appear here after processing...",
                    elem_classes=["markdown-text"]
                )
                
                # Download button for minutes
                download_minutes = gr.File(
                    label="Download Meeting Minutes",
                    visible=False
                )
            
            with gr.Tab("📄 Full Transcription"):
                transcription_output = gr.Textbox(
                    label="Audio Transcription",
                    value="Transcription will appear here after processing...",
                    lines=15,
                    max_lines=20,
                    interactive=False
                )
        
        # Event handlers
        def update_status(audio_file):
            if audio_file is None:
                return "Please select an audio file"
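            # audio_file may be a path string or an object exposing .name
            name = audio_file if isinstance(audio_file, str) else getattr(audio_file, "name", "audio file")
            return f"Audio file loaded: {os.path.basename(name)} - Ready to process"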
        
        def process_and_save(audio_file, progress=gr.Progress()):
            """Process audio and return results with download file"""
            transcription, minutes = process_audio_file(audio_file, progress)
            
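            # Create a download file only if minutes were generated successfully;
            # the prefixes below match the failure messages from process_audio_file
            download_update = gr.update(value=None, visible=False)
            failure_prefixes = ("Error", "Please upload", "Transcription failed")
            if minutes and not minutes.startswith(failure_prefixes):
                with tempfile.NamedTemporaryFile(
                    mode='w', suffix='.md', delete=False, encoding='utf-8'
                ) as temp_file:
                    temp_file.write(minutes)
                # The download component is created hidden, so reveal it here
                download_update = gr.update(value=temp_file.name, visible=True)
            
            return transcription, minutes, download_update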
        
        # Wire up the interface
        audio_input.change(
            fn=update_status,
            inputs=[audio_input],
            outputs=[status]
        )
        
        process_btn.click(
            fn=process_and_save,
            inputs=[audio_input],
            outputs=[transcription_output, minutes_output, download_minutes],
            show_progress=True
        )
        
        # Add examples section
        gr.Markdown(
            """
            ### 💡 Tips for Best Results
            
            - **Audio Quality**: Use clear, high-quality audio recordings
            - **File Size**: Large files take longer to process, and the OpenAI transcription API rejects uploads over 25 MB
            - **Content**: Works best with structured meetings and clear speech
            - **Languages**: Optimized for English, but supports multiple languages
            
            ### 🔧 Technical Details
            - **Transcription**: OpenAI Whisper model
            - **Minutes Generation**: Microsoft Phi-3 local model
            - **Privacy**: Audio is sent to the OpenAI API for transcription; minutes are generated by the local model on this server
            """
        )
    
    return interface

In [None]:
# Create and launch the Gradio interface
print("Creating Gradio interface...")
demo = create_interface()

print("Launching interface...")
# Launch the interface
demo.launch(
    share=True,  # Creates a public link for sharing
    debug=True,  # Enable debug mode for development
    server_name="0.0.0.0",  # Allow external access
    server_port=7860,  # Default Gradio port
    show_error=True  # Show detailed error messages
)

## 📋 Usage Instructions

1. **Upload Audio**: Click on the file upload area and select your meeting audio file
2. **Process**: Click the "Process Audio & Generate Minutes" button
3. **View Results**: Switch between the "Meeting Minutes" and "Full Transcription" tabs
4. **Download**: Use the download button to save your meeting minutes as a markdown file
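
Step 4 amounts to writing the generated markdown to a file. Outside the Gradio UI, the same thing could be done with a small helper like the sketch below. The name `save_minutes` and its parameters are hypothetical and not part of the notebook; it assumes the minutes are already available as a string.

```python
import tempfile
from pathlib import Path


def save_minutes(minutes, directory=None, filename="meeting_minutes.md"):
    """Write minutes to a UTF-8 markdown file and return its path.

    Hypothetical helper: if no directory is given, a fresh temporary
    directory is created to hold the file.
    """
    target_dir = Path(directory) if directory else Path(tempfile.mkdtemp())
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / filename
    # Explicit encoding keeps emoji and accented names intact on any platform
    path.write_text(minutes, encoding="utf-8")
    return path
```

For example, `save_minutes(generate_meeting_minutes(transcription), directory="minutes")` would place the file in a local `minutes` folder.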

## 🚀 Features

- **Easy Upload**: Drag and drop or browse for audio files
- **Progress Tracking**: Real-time progress updates during processing
- **Professional Output**: Structured meeting minutes with key sections
- **Download Option**: Save results as markdown files
- **Multiple Formats**: Supports various audio formats (MP3, WAV, M4A, etc.)

## 🔧 Customization

You can customize this notebook by:

- **Changing Models**: Replace PHI3 with other language models
- **Modifying Prompts**: Edit the system message and user prompts for different output styles
- **Adding Features**: Include speaker identification, sentiment analysis, etc.
- **Styling**: Modify the CSS for different visual themes
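
As one way to make prompt changes easier, the section list and tone could be passed in as parameters instead of being hard-coded. The sketch below is illustrative only: `build_messages`, `DEFAULT_SECTIONS`, and the `tone` parameter are hypothetical names, and the wording simply mirrors the prompt used earlier in the notebook.

```python
# Section titles and hints mirroring the notebook's prompt (editable)
DEFAULT_SECTIONS = [
    ("Meeting Summary", "Date, attendees (if mentioned), location/platform"),
    ("Key Discussion Points", "Main topics covered"),
    ("Decisions Made", "Important decisions reached"),
    ("Action Items", "Tasks assigned with owners (if mentioned)"),
    ("Next Steps", "Follow-up actions or next meeting"),
]


def build_messages(transcription, sections=DEFAULT_SECTIONS, tone="professional"):
    """Return chat messages for the minutes prompt (hypothetical helper)."""
    numbered = "\n".join(
        f"{i}. **{title}** - {hint}"
        for i, (title, hint) in enumerate(sections, start=1)
    )
    system_message = (
        "You are an assistant that produces minutes of meetings "
        "from transcripts, in markdown."
    )
    user_prompt = (
        f"Below is a transcript of a meeting. Please write {tone} meeting "
        f"minutes in markdown format, including:\n\n{numbered}\n\n"
        f"Transcript:\n{transcription}"
    )
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_prompt},
    ]
```

The returned list has the same shape as the `messages` variable in `generate_meeting_minutes`, so it could be passed to `tokenizer.apply_chat_template` in its place.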

## 🛠️ Troubleshooting

### Common Issues:

1. **"OpenAI API key not configured"**
   - Make sure to set your OpenAI API key in the environment variables
   - For Colab: Add it to Secrets as 'OPENAI_API_KEY'
   - For local: Set environment variable or modify the authentication cell

2. **"CUDA out of memory"**
   - Try a smaller model, a shorter transcript, or a lower `max_new_tokens` value
   - Restart the runtime and run cells again

3. **"File format not supported"**
   - Convert your audio to MP3, WAV, or M4A format
   - Use online converters or audio editing software

4. **"Model loading failed"**
   - Check your HuggingFace token permissions
   - Ensure you have enough disk space
   - Try restarting the runtime
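
Several of the issues above can be caught before any processing starts. The following is a minimal sketch of such a check, not part of the notebook: `preflight_issues` and `SUPPORTED_EXTENSIONS` are hypothetical names, and the extension set is an assumption based on the formats listed in the interface.

```python
import os
from pathlib import Path

# Assumed set of accepted extensions, taken from the formats named in the UI
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".mp4", ".webm"}


def preflight_issues(audio_path, env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    issues = []
    if not env.get("OPENAI_API_KEY"):
        issues.append("OPENAI_API_KEY is not set; transcription will fail.")
    if not env.get("HF_TOKEN"):
        issues.append("HF_TOKEN is not set; gated models may fail to download.")
    suffix = Path(audio_path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        issues.append(f"Unsupported audio extension: {suffix or '(none)'}")
    return issues
```

Calling `preflight_issues("meeting.mp3")` before launching the interface gives a quick list of what to fix. It only inspects environment variables, so Colab Secrets read through `userdata` would need a separate check.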

### Performance Tips:
- Use shorter audio files (< 1 hour) for faster processing
- Ensure good audio quality for better transcription
- Close other applications to free up GPU memory
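
Long recordings also produce long transcripts, which may not fit in the 4k-token context window of `Phi-3-mini-4k-instruct` once room for the generated minutes is reserved. One possible workaround is to split the transcript into smaller pieces first. The sketch below uses a word count as a rough stand-in for tokens; `split_transcript` and the default of 1200 words are assumptions, not values from the notebook.

```python
def split_transcript(text, max_words=1200):
    """Split text into chunks of at most max_words whitespace-separated words.

    Hypothetical helper: word count is only a rough proxy for token count,
    so max_words should be chosen conservatively.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Each chunk could then be passed to `generate_meeting_minutes` separately and the partial minutes combined afterwards, at the cost of losing some context across chunk boundaries.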