# Meeting Minutes Generator (STT with LLMs)
---

- üåç Task: Generate structured meeting minutes from audio recordings using Speech-to-Text (STT) and Large Language Models
- üß† Models:
    - AUDIO_MODEL: whisper1
    - LLM_MODEL: meta-llama/Meta-Llama-3.1-8B-Instruct
- üöÄ Tools: Python, Gradio UI, OpenAI / HuggingFace APIs
- üì§ Output: Structured meeting minutes in Markdown format with real-time streaming
- üßë‚Äçüíª Skill Level: Intermediate

üéØ How It Works
- 1Ô∏è‚É£ Upload a .mp3 meeting recording
- 2Ô∏è‚É£ Submit the audio to generate meeting minutes in text format

You can download some meetings from this link to test the code:
[https://www.rmofspringfield.ca/p/meeting-audio-files](https://www.rmofspringfield.ca/p/meeting-audio-files)


üõ†Ô∏è Requirements
- ‚öôÔ∏è Hardware: ‚úÖ GPU required (model download); Google Colab recommended (T4)
- üîë OpenAI API Key (used for whisper1 transcription)
- üîë Hugging Face Token (for the LLM model)

‚öôÔ∏è Customizable by user
- ü§ñ Selected model: AUDIO_MODEL / LLM_MODEL
- üìú system_prompt: Controls model behavior (concise, accurate, structured output)
- üí¨ user_prompt

---
üì¢ Find more LLM notebooks on my [GitHub repository](https://github.com/ramboorgadda/llm-engineering)

In [1]:
%pip install -q requests torch bitsandbytes transformers sentencepiece accelerate openai httpx==0.28.1 gradio

%pip install -U bitsandbytes

[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m59.1/59.1 MB[0m [31m14.3 MB/s[0m eta [36m0:00:00[0m


In [2]:
!pip install -U bitsandbytes



In [1]:
import torch
import threading
from openai import OpenAI
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline,BitsAndBytesConfig,TextIteratorStreamer
from huggingface_hub import login
from google.colab import userdata
import requests
import gradio as gr

In [2]:
AUDIO_MODEL="whisper-1"
LLM_MODEL="meta-llama/Meta-Llama-3.1-8B-Instruct"

In [3]:
hf_token=userdata.get("HF_TOKEN")
openai=OpenAI(api_key=userdata.get("OPENAI_API_KEY"))
login(hf_token,add_to_git_credential=True)

In [4]:
import torch
import threading
from openai import OpenAI
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline,BitsAndBytesConfig,TextIteratorStreamer
from huggingface_hub import login
from google.colab import userdata
import requests
import gradio as gr

class MeetingAssistant:

  def __init__(self,model_name=LLM_MODEL,audio_model=AUDIO_MODEL):

      quant_config=BitsAndBytesConfig(load_in_4bit=True, # Fixed: Load_in_4bit -> load_in_4bit
                                  bnb_4bit_use_double_quant=True,
                                  bnb_4bit_quant_type="nf4",
                                  bnb_4bit_compute_dtype=torch.bfloat16)

      self.tokenizer=AutoTokenizer.from_pretrained(model_name)
      self.model=AutoModelForCausalLM.from_pretrained(model_name,
                                                      quantization_config=quant_config,
                                                      device_map="auto")


  def transcribe_audio(self,audio_path,progress=None):
    """Transcribe the uploaded audi file using OpenI whisper"""
    if progress is None:
        progress = lambda x, desc: None # Dummy progress function
    print(f"[DEBUG] transcribe_audio called with path: {audio_path}")

    progress(0.3,"Transcribing Audio")
    try:
      with open(audio_path,"rb") as audio_file:
         transcript=openai.audio.transcriptions.create(model=AUDIO_MODEL,
                                            file=audio_file,
                                            response_format="text")
         print(f"[DEBUG] Transcription successful. Transcript length: {len(transcript) if transcript else 0}")
         print(f"[DEBUG] Transcript preview: {transcript[:200]}...") # Print first 200 chars
         return transcript
    except Exception as e:
      print(f"[ERROR] Error during transcription: {str(e)}")
      return f"Error during transcription: {str(e)}"

  def generate_minutes(self,transcription,progress=None):
    """Generate meeting minutes using LLM"""
    if progress is None:
        progress = lambda x, desc: None # Dummy progress function
    print(f"[DEBUG] generate_minutes called. Transcription length: {len(transcription) if transcription else 0}")
    progress(0.6,desc="Generating Minutes")
    system_message="You are an assistant that produces minutes of meetings from transcripts, with summary, key discussion points, takeaways and action items with owners, in markdown."
    user_prompt=f"Below is an extract transcript of a meeting. Please write minutes in markdown, including a summary with attendees, location and date; discussion points; takeaways; and action items with owners.\n{transcription}"

    messages=[{"role":"system","content":system_message},
              {"role":"user","content":user_prompt}]
    print(f"[DEBUG] Messages sent to LLM: {messages}")

    inputs = self.tokenizer.apply_chat_template(messages,return_tensors="pt").to("cuda")
    streamer = TextIteratorStreamer(self.tokenizer)
    thread = threading.Thread(target=self.model.generate,kwargs={
                    "input_ids": inputs,
                    "max_new_tokens": 2000,
                    "streamer": streamer
    })
    thread.start()

    started = False
    full_response = ""
    for new_text in streamer:
      print(f"[DEBUG] Streamer received text chunk: {new_text}")
      if not started:
        if "<|start_header_id|>assistant<|end_header_id|>" in new_text:
          started = True
          new_text = new_text.split("<|start_header_id|>assistant<|end_header_id|>")[-1].strip()
      if started:
        if "<|eot_id|>" in new_text:
          new_text = new_text.replace("<|eot_id|>","")

        if new_text.strip():
          full_response += new_text
          yield full_response # Yield the accumulated text for Gradio streaming


  def process_meeting(self,audio_file_path,progress=None):
    """Handles the complete file of transcribing the file and generating the minutes."""
    if progress is None:
        progress = lambda x, desc: None # Dummy progress function
    print(f"[DEBUG] process_meeting called with audio_file_path: {audio_file_path}")
    progress(0.1,desc="Processing the audio file")

    if audio_file_path is None:
      print("[DEBUG] No audio file uploaded.")
      return "Please upload an audio file."

    try:
      # audio_file_path is now expected to be a string (the path)
      if audio_file_path.split(".")[-1] != "mp3":
        print(f"[DEBUG] Invalid file type: {audio_file_path.split('.')[-1]}")
        return "Please upload an mp3 file"

      transcription = self.transcribe_audio(audio_file_path,progress) # Pass the string path
      if "Error during transcription" in transcription:
          print(f"[ERROR] Transcription error encountered: {transcription}")
          yield transcription # Yield error message
          return
      print(f"[DEBUG] Transcription received in process_meeting. Length: {len(transcription) if transcription else 0}")

      minutes_generator = self.generate_minutes(transcription,progress)
      accumulated_text = ""
      try:
        for chunk in minutes_generator:
          accumulated_text = chunk # The generator now yields accumulated text
          print(f"[DEBUG] Yielding accumulated chunk to GradioInterface. Current length: {len(accumulated_text)}")
          yield accumulated_text
      except Exception as llm_e:
          print(f"[ERROR] Error during LLM generation streaming: {str(llm_e)}")
          yield f"Error during LLM generation streaming: {str(llm_e)}"

    except Exception as e:
      print(f"[ERROR] Error during overall processing: {str(e)}")
      return f"Error during overall processing: {str(e)}"

In [5]:
#Gradio Interface

class GradioInterface():
  def __init__(self):
    """ Initialize Gradio Interface to process audio files """

    self.assistant=MeetingAssistant()
    self.iface=gr.Interface(fn=self.process_audio,
                            inputs=gr.Audio(type="filepath", label="Upload MP3 File", format="mp3"),
                            outputs=gr.Markdown(label="Meeting Minutes",min_height=60),
                            title="AI Meeting Assistant",
                            description="Upload an audio file to transcribe and generate meeting minutes.",
                            flagging_mode="never")



  def process_audio(self,audio_file,progress=None):
    """Handle the userInput from Gradio Process the file and return the minutes of meeting"""
    print(f"[DEBUG] process_audio function called by Gradio.")
    if progress is None:
        progress = gr.Progress() # Instantiate gr.Progress() if not provided by Gradio
    minutes_generator = self.assistant.process_meeting(audio_file,progress)
    print(f"[DEBUG] After the process_meeting call, starting to iterate minutes_generator")
    try:
        for chunk in minutes_generator:
          print(f"[DEBUG] GradioInterface yielding chunk to UI. Length: {len(chunk)}")
          yield chunk
    except Exception as e:
        print(f"[ERROR] Error in process_audio yielding: {str(e)}")
        yield f"Error in Gradio interface: {str(e)}"

  def launch(self):
        """Launches the Gradio interface."""
        self.iface.launch(debug=True)


In [7]:
if __name__ == "__main__":
    app = GradioInterface()
    app.launch()

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://f7cc830a29bd4e5c68.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/fastapi/applications.py", line 1139, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.12/dist-packages/starlette/applications.py", line 107, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/errors.py",

[DEBUG] process_audio function called by Gradio.
[DEBUG] After the process_meeting call, starting to iterate minutes_generator
[DEBUG] process_meeting called with audio_file_path: /tmp/gradio/12785fa57f15a30a4344511967a197a0c9e00cd07df575d151d0c092824effc4/denver_extract.mp3
[DEBUG] transcribe_audio called with path: /tmp/gradio/12785fa57f15a30a4344511967a197a0c9e00cd07df575d151d0c092824effc4/denver_extract.mp3


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


[DEBUG] Transcription successful. Transcript length: 12515
[DEBUG] Transcript preview: and kind of the confluence of this whole idea of the confluence week, the merging of two rivers and as we've kind of seen recently in politics and in the world, there's a lot of situations where water...
[DEBUG] Transcription received in process_meeting. Length: 12515
[DEBUG] generate_minutes called. Transcription length: 12515
[DEBUG] Messages sent to LLM: [{'role': 'system', 'content': 'You are an assistant that produces minutes of meetings from transcripts, with summary, key discussion points, takeaways and action items with owners, in markdown.'}, {'role': 'user', 'content': "Below is an extract transcript of a meeting. Please write minutes in markdown, including a summary with attendees, location and date; discussion points; takeaways; and action items with owners.\nand kind of the confluence of this whole idea of the confluence week, the merging of two rivers and as we've kind of seen recently 