## AI Bot: Transcription → Notes → Summary (Gradio + Jupyter)

This notebook builds a simple, modular AI bot that:
- Records or uploads audio
- Transcribes the audio to text
- Generates detailed notes and topic-wise summaries using an LLM

We use open-source tools (`faster-whisper` for transcription, Hugging Face `transformers` for summarization) and, optionally, the OpenAI API for stronger summaries. Each stage lives in its own cell, so it can be modified or rerun independently.
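
The modular layout comes down to two swappable stages: audio to transcript, then transcript to notes and topics. The following is a minimal sketch of how they compose. `run_pipeline`, `stub_transcribe`, and `stub_summarize` are hypothetical names, and the stubs stand in for Whisper and the LLM so the example runs without any models. The notebook's actual glue is `gradio_process`, defined in the Gradio cell.

```python
from typing import Callable, Tuple

# Hypothetical stage signatures: audio path -> transcript, transcript -> (notes, topics).
Transcriber = Callable[[str], str]
Summarizer = Callable[[str], Tuple[str, str]]


def run_pipeline(audio_path: str, transcribe: Transcriber,
                 summarize: Summarizer) -> Tuple[str, str, str]:
    """Run transcription, then summarization; return (transcript, notes, topics)."""
    transcript = transcribe(audio_path)
    if not transcript.strip():
        # Nothing was recognized: skip the (possibly slow or billed) summarization call.
        return transcript, '', ''
    notes, topics = summarize(transcript)
    return transcript, notes, topics


# Stub stages stand in for Whisper and the LLM, so this runs without any models.
def stub_transcribe(path: str) -> str:
    return 'Alice will send the report. Bob owns the budget.'


def stub_summarize(text: str) -> Tuple[str, str]:
    return '- ' + text, '### Topic 1\n' + text


result = run_pipeline('meeting.wav', stub_transcribe, stub_summarize)
print(result)
```

With this shape, switching from the local summarizer to the OpenAI one changes only the `summarize` argument.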

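**Requirements**: Install FFmpeg on your system. `faster-whisper` decodes audio through PyAV, which bundles the FFmpeg libraries. Having the `ffmpeg` binary on your `PATH` is still a safeguard in case other tools in the stack shell out to it for audio format conversion. On Debian/Ubuntu: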

```bash
sudo apt-get update && sudo apt-get install -y ffmpeg
```
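
To confirm that the binary is visible from Python before launching anything, a quick check with the standard library's `shutil.which` is enough. This is a sketch, and `find_ffmpeg` is a hypothetical helper name. It only searches `PATH`; it does not see FFmpeg libraries bundled inside Python packages such as PyAV.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the path of the `ffmpeg` executable found on PATH, or None."""
    return shutil.which('ffmpeg')


ffmpeg_path = find_ffmpeg()
if ffmpeg_path is None:
    print('ffmpeg not found on PATH')
else:
    print('ffmpeg found at', ffmpeg_path)
```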


### Setup / Imports
Install the required packages and import libraries. If the `openai` package is installed and `OPENAI_API_KEY` is set, the summarizer uses the OpenAI API; otherwise it falls back to local models.

If running for the first time, uncomment the `%pip install` lines below to install dependencies.
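
The fallback rule is just two conditions. The sketch below restates it as a pure function, `select_backend`, which is a hypothetical helper not used elsewhere in the notebook. Writing it this way makes the rule easy to test. One assumption differs from the cell below: it uses `importlib.util.find_spec` to check whether `openai` is installed without importing it, while the cell actually attempts the import.

```python
import importlib.util
import os
from typing import Mapping, Optional


def select_backend(env: Mapping[str, str], openai_installed: Optional[bool] = None) -> str:
    """Return 'openai' if the SDK is installed and OPENAI_API_KEY is non-empty, else 'local'."""
    if openai_installed is None:
        # find_spec locates the package without importing it.
        openai_installed = importlib.util.find_spec('openai') is not None
    return 'openai' if openai_installed and env.get('OPENAI_API_KEY') else 'local'


print('Summarization backend:', select_backend(os.environ))
```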

In [None]:
# If running first time, uncomment to install
# %pip install -q gradio==4.* faster-whisper==1.* transformers==4.* torch --extra-index-url https://download.pytorch.org/whl/cpu
# Optional: OpenAI for summarization via API
# %pip install -q openai==1.* tiktoken==0.*

import os
from typing import Optional, Tuple, Dict, Union, List

import gradio as gr

# Transcription
from faster_whisper import WhisperModel

# Summarization (local)
from transformers import pipeline

# Optional: OpenAI
try:
    from openai import OpenAI  # requires openai>=1.0
    OPENAI_AVAILABLE = True
except Exception:
    OPENAI_AVAILABLE = False

OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')
USE_OPENAI = OPENAI_AVAILABLE and bool(OPENAI_API_KEY)
print('OpenAI enabled:', USE_OPENAI)


### Audio Recording / Upload
We use Gradio's `Audio` component to record audio in the browser or to upload a file. With `type='filepath'`, Gradio passes the handler the path of a temporary file that holds the audio. The handler then forwards that path to the transcription function.
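
To exercise the upload path without a microphone, you can generate a small WAV file with Python's standard `wave` module and upload it. This is a minimal sketch, and both `write_test_tone` and the output filename are hypothetical. A pure tone contains no speech, so with `vad_filter=True` the transcript may well come back empty. The file only checks that a path flows through decoding without errors.

```python
import math
import os
import struct
import tempfile
import wave


def write_test_tone(path: str, seconds: float = 1.0, freq_hz: float = 440.0,
                    sample_rate: int = 16000) -> str:
    """Write a mono 16-bit PCM sine tone to `path` and return the path."""
    n_frames = int(seconds * sample_rate)
    amplitude = 0.3 * 32767  # stay well below int16 full scale
    frames = b''.join(
        struct.pack('<h', int(amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_frames)
    )
    with wave.open(path, 'wb') as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return path


tone_path = write_test_tone(os.path.join(tempfile.gettempdir(), 'test_tone.wav'))
print('Wrote', tone_path)
```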

In [None]:
# Gradio Audio configuration
AUDIO_SOURCES = ['microphone', 'upload']
AUDIO_TYPE = 'filepath'  # Gradio will pass a file path for the audio


### Transcription Function (Whisper)
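We use `faster-whisper` for CPU/GPU-friendly inference. Select a model size such as `small` (the default), `medium`, or `large-v3` with the `WHISPER_MODEL` environment variable. Select the device with `WHISPER_DEVICE`: `cpu` (the default) or `cuda`. The model loads once, on first use, with the `int8` compute type on CPU and `float16` on GPU. The function accepts a file path (from Gradio) or raw bytes and returns the transcribed text.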
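
The transcript assembly step can be tested without a model. In this sketch, `FakeSegment` is a hypothetical stand-in that exposes the `start`, `end`, and `text` attributes of faster-whisper's segments. `join_segments` mirrors the joining logic in `transcribe_audio`. faster-whisper returns segments as a generator, which can be iterated only once. The usage below demonstrates that with a plain Python generator.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FakeSegment:
    """Hypothetical stand-in for a faster-whisper segment (times in seconds)."""
    start: float
    end: float
    text: str


def join_segments(segments: Iterable[FakeSegment]) -> str:
    """Join stripped segment texts with single spaces, skipping blank segments."""
    return ' '.join(seg.text.strip() for seg in segments if seg.text.strip())


segs = [FakeSegment(0.0, 1.8, ' Welcome everyone.'), FakeSegment(1.8, 2.5, '  '),
        FakeSegment(2.5, 4.0, ' First item: the budget. ')]
print(join_segments(segs))  # Welcome everyone. First item: the budget.

# A generator is exhausted after one pass, so a second join sees nothing.
gen = (s for s in segs)
first, second = join_segments(gen), join_segments(gen)
```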

In [None]:
# Initialize a global Whisper model lazily to avoid repeated loads.
_WHISPER_MODEL_NAME = os.environ.get('WHISPER_MODEL', 'small')
_WHISPER_DEVICE = os.environ.get('WHISPER_DEVICE', 'cpu')  # 'cpu' or 'cuda'
_whisper_model: Optional[WhisperModel] = None

def get_whisper_model() -> WhisperModel:
    global _whisper_model
    if _whisper_model is None:
        compute_type = 'int8' if _WHISPER_DEVICE == 'cpu' else 'float16'
        _whisper_model = WhisperModel(_WHISPER_MODEL_NAME, device=_WHISPER_DEVICE, compute_type=compute_type)
    return _whisper_model

def transcribe_audio(audio_file_or_bytes: Union[bytes, bytearray, str]) -> str:
    """Transcribe audio using faster-whisper. Accepts a file path or raw bytes.
    Returns the full transcript as a string.
    """
    import tempfile  # only needed for the raw-bytes path

    model = get_whisper_model()

    tmp_path: Optional[str] = None
    if isinstance(audio_file_or_bytes, (bytes, bytearray)):
        # Write raw bytes to a temporary file so the decoder can open it by path.
        with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as tmp:
            tmp.write(audio_file_or_bytes)
            tmp_path = tmp.name
        audio_path = tmp_path
    else:
        audio_path = audio_file_or_bytes

    try:
        # `segments` is a lazy generator: the actual transcription runs while we iterate over it.
        segments, _info = model.transcribe(audio_path, beam_size=5, vad_filter=True)
        transcript = ' '.join(seg.text.strip() for seg in segments if seg.text.strip())
    finally:
        # Clean up the temp file (if we created one), even when transcription fails.
        if tmp_path and os.path.exists(tmp_path):
            try:
                os.remove(tmp_path)
            except OSError:
                pass
    return transcript


### LLM Summary and Notes Function
Two backends are provided:
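- Local: a Hugging Face `transformers` summarization pipeline. The default model is `facebook/bart-large-cnn`, overridable with the `SUMMARY_MODEL` environment variable.
- OpenAI (optional): if `OPENAI_API_KEY` is set, a chat model produces stronger summaries and topic breakdowns. The default is `gpt-4o-mini`, overridable with `OPENAI_MODEL`.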
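
The local backend's chunking is a greedy character budget. Here it is as a standalone function, `chunk_by_chars` (a hypothetical name), so the packing rule can be checked in isolation. Characters are only a rough proxy for tokens. At very roughly four characters per English token, 3,500 characters usually stays under BART's 1,024-token input limit, and the pipeline's `truncation=True` catches the rest. Unlike the notebook's loop, this version counts only the spaces actually inserted between words.

```python
from typing import List


def chunk_by_chars(text: str, max_chars: int = 3500) -> List[str]:
    """Greedily pack whitespace-separated words into chunks of at most max_chars characters.

    A single word longer than max_chars becomes its own (oversized) chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0  # length of ' '.join(current)
    for word in text.split():
        added = len(word) + 1 if current else len(word)  # +1 for the joining space
        if current and current_len + added > max_chars:
            chunks.append(' '.join(current))
            current, current_len = [word], len(word)
        else:
            current.append(word)
            current_len += added
    if current:
        chunks.append(' '.join(current))
    return chunks


sample = ' '.join(['word'] * 1000)
chunks = chunk_by_chars(sample, max_chars=20)
print(len(chunks), repr(chunks[0]))  # 250 'word word word word'
```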

`generate_notes_and_topics` returns three strings:
- Raw transcription text
- Clean summarized notes
- Topic-wise breakdown (Markdown headings)
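
The OpenAI backend relies on the model echoing the exact `NOTES:` and `TOPICS:` headers. If the model wraps them in Markdown (`## NOTES:` or `**TOPICS:**`), the regex in `summarize_with_openai` misses them, and both outputs fall back to the full response. A more tolerant parser is sketched below. `split_notes_topics` is a hypothetical helper, and the header decorations it accepts are assumptions about typical model output. When it finds no headers, it returns the whole response as notes and leaves the topics empty instead of duplicating the response.

```python
import re
from typing import Tuple

# Header lines may carry Markdown decoration: '## NOTES:', '**NOTES:**', '**NOTES**:'.
_SECTIONS_RE = re.compile(
    r'^[#\s]*\**NOTES\**[ \t]*:\**[ \t]*\n?(.*?)'     # NOTES header, then the notes body (lazy)
    r'^[#\s]*\**TOPICS\**[ \t]*:\**[ \t]*\n?(.*)\Z',  # TOPICS header, then everything after it
    re.IGNORECASE | re.DOTALL | re.MULTILINE,
)


def split_notes_topics(content: str) -> Tuple[str, str]:
    """Split an LLM response into (notes, topics); with no match, return (content, '')."""
    m = _SECTIONS_RE.search(content)
    if m is None:
        return content.strip(), ''
    return m.group(1).strip(), m.group(2).strip()


print(split_notes_topics('## NOTES:\n- a\n- b\n\n## TOPICS:\n### Budget\nDiscussed.'))
```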

In [None]:
import re

_LOCAL_SUMMARY_MODEL = os.environ.get('SUMMARY_MODEL', 'facebook/bart-large-cnn')
_local_summarizer = None

def get_local_summarizer():
    global _local_summarizer
    if _local_summarizer is None:
        _local_summarizer = pipeline(
            'summarization',
            model=_LOCAL_SUMMARY_MODEL,
            tokenizer=_LOCAL_SUMMARY_MODEL,
            truncation=True
        )
    return _local_summarizer

_SYSTEM_PROMPT = (
    'You are an expert note-taker. Given a meeting transcript, produce '
    '1) concise bullet notes, including explicit action items and decisions if present, '
    'and 2) a topic-wise breakdown with headings. Keep it factual and clear.'
)

def summarize_with_openai(transcript: str) -> Tuple[str, str]:
    client = OpenAI()
    user_prompt = (
        'Transcript:\n\n' + transcript + '\n\n'
        'Return exactly two sections in Markdown with these exact headers:\n'
        'NOTES:\n- bullet points...\n\n'
        'TOPICS:\n### Title... with brief summaries.'
    )
    completion = client.chat.completions.create(
        model=os.environ.get('OPENAI_MODEL', 'gpt-4o-mini'),
        messages=[
            {'role': 'system', 'content': _SYSTEM_PROMPT},
            {'role': 'user', 'content': user_prompt},
        ],
        temperature=0.2,
    )
    content = completion.choices[0].message.content or ''
    m = re.search(r'(?is)NOTES\s*:\s*(.*?)\n\s*TOPICS\s*:\s*(.*)$', content, re.DOTALL)
    if m:
        notes = m.group(1).strip()
        topics = m.group(2).strip()
    else:
        notes = content.strip()
        topics = content.strip()
    return notes, topics

def summarize_with_local_model(transcript: str) -> Tuple[str, str]:
    summarizer = get_local_summarizer()
    # Chunk long transcripts to fit model limits (simple heuristic by characters)
    chunks: List[str] = []
    curr: List[str] = []
    curr_len = 0
    for token in transcript.split():
        tlen = len(token) + 1
        if curr_len + tlen > 3500:
            chunks.append(' '.join(curr))
            curr = [token]
            curr_len = tlen
        else:
            curr.append(token)
            curr_len += tlen
    if curr:
        chunks.append(' '.join(curr))

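    summaries: List[str] = []
    for ch in chunks:
        # Scale generation lengths to the chunk so short (often final) chunks are not padded
        # out to a fixed min_length; word count is a rough proxy for token count.
        n_words = len(ch.split())
        max_len = min(256, max(32, n_words // 2))
        min_len = min(64, max_len // 2)
        out = summarizer(ch, max_length=max_len, min_length=min_len, do_sample=False)
        summaries.append(out[0]['summary_text'])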

    combined = '\n\n'.join(summaries)

    # Build bullet notes by splitting the combined summary into sentences.
    # Splitting on '. ' alone would merge sentences across the '\n\n' chunk separators.
    sentences = re.split(r'(?<=[.!?])\s+', combined)
    notes = '\n'.join(f'- {s.strip()}' for s in sentences if s.strip())

    # Naive topic split; real systems may use topic modeling
    topics = '\n\n'.join(
        [f'### Topic {i+1}\n{para.strip()}' for i, para in enumerate(combined.split('\n\n')) if para.strip()]
    )
    return notes, topics

def generate_notes_and_topics(transcript: str) -> Tuple[str, str, str]:
    if not transcript or not transcript.strip():
        return '', '', ''
    if USE_OPENAI:
        notes, topics = summarize_with_openai(transcript)
    else:
        notes, topics = summarize_with_local_model(transcript)
    return transcript, notes, topics


### Gradio Interface
A simple UI allowing you to record or upload audio, view the transcription, and generate notes plus a topic-wise breakdown.
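
As written, any exception in transcription or summarization (an undecodable file, a missing model, an API error) propagates to Gradio. One option is to catch the exception and show its message in the notes pane, sketched below with a hypothetical `with_error_message` wrapper. Alternatively, a handler can `raise gr.Error(...)` to display an error to the user.

```python
from typing import Callable, Optional, Tuple

Outputs = Tuple[str, str, str]  # (transcript, notes, topics)


def with_error_message(process: Callable[[Optional[str]], Outputs]) -> Callable[[Optional[str]], Outputs]:
    """Wrap a handler so exceptions are reported in the notes pane instead of raised."""
    def wrapped(audio: Optional[str]) -> Outputs:
        try:
            return process(audio)
        except Exception as exc:  # decoding, model-loading, and API errors alike
            return '', f'**Error:** {type(exc).__name__}: {exc}', ''
    return wrapped


def failing_process(audio: Optional[str]) -> Outputs:
    raise ValueError('could not decode audio')


print(with_error_message(failing_process)('bad.mp3'))
```

To use it, pass `fn=with_error_message(gradio_process)` to `btn.click`. Catching a broad `Exception` keeps the UI responsive but hides stack traces, so log the exception as well if you need to debug.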

In [None]:
def gradio_process(audio: Union[str, None]) -> Tuple[str, str, str]:
    """audio is a filepath from the upload/recorder or None.
    Returns (transcript, notes, topics).
    """
    if not audio:
        return '', '', ''
    transcript = transcribe_audio(audio)
    t, notes, topics = generate_notes_and_topics(transcript)
    return t, notes, topics

with gr.Blocks(title='Transcribe → Notes → Topics') as demo:
    gr.Markdown(
        '## Transcription → Notes → Topics\n'
        'Record or upload audio. The app will transcribe it and produce notes and a topic-wise breakdown.'
    )

    with gr.Row():
        audio_in = gr.Audio(sources=AUDIO_SOURCES, type=AUDIO_TYPE, label='Record or Upload Audio')
    btn = gr.Button('Transcribe and Summarize')

    with gr.Row():
        out_transcript = gr.Textbox(label='Transcript', lines=12)
    with gr.Row():
        out_notes = gr.Markdown()
    with gr.Row():
        out_topics = gr.Markdown()

    btn.click(fn=gradio_process, inputs=[audio_in], outputs=[
        out_transcript, out_notes, out_topics
    ])


### Demo: Launch the App
Run the cell below to launch the Gradio app. Set `share=True` to get a public URL.
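
`demo.launch` also accepts `server_name` and `server_port`, which matter when the notebook runs on a remote machine or inside a container. The following is a sketch that reads these settings from environment variables. `launch_kwargs` and the variable names `APP_SHARE`, `APP_HOST`, and `APP_PORT` are hypothetical and chosen only for this example.

```python
import os
from typing import Any, Dict, Mapping

_TRUTHY = {'1', 'true', 'yes'}


def launch_kwargs(env: Mapping[str, str]) -> Dict[str, Any]:
    """Build keyword arguments for demo.launch() from hypothetical APP_* variables."""
    kwargs: Dict[str, Any] = {'share': env.get('APP_SHARE', '').strip().lower() in _TRUTHY}
    if env.get('APP_HOST'):
        # e.g. '0.0.0.0' to listen on all interfaces instead of localhost only
        kwargs['server_name'] = env['APP_HOST']
    if env.get('APP_PORT'):
        kwargs['server_port'] = int(env['APP_PORT'])
    return kwargs


print(launch_kwargs(os.environ))
```

You would then call `demo.launch(**launch_kwargs(os.environ))` in place of the fixed `share=False`.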

In [None]:
# If running remotely, you can share via `share=True`.
demo.launch(share=False)
