<a href="https://colab.research.google.com/github/srinath-19/practice/blob/main/Week_3_Day_5_Meeting_Minutes_product.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Create meeting minutes from an Audio file

I downloaded some Denver City Council meeting minutes and selected a portion of the meeting for us to transcribe. You can download it here:  
https://drive.google.com/file/d/1N_kpSojRR5RYzupz6nqM8hMSoEF_R7pU/view?usp=sharing

If you'd rather work with the original data, the HuggingFace dataset is [here](https://huggingface.co/datasets/huuuyeah/meetingbank) and the audio can be downloaded [here](https://huggingface.co/datasets/huuuyeah/MeetingBank_Audio/tree/main).

The goal of this product is to use the Audio to generate meeting minutes, including actions.

For this project, you can either use the Denver meeting minutes, or you can record something of your own!


## Again - please note: pro-tip for using Colab:

**Pro-tip:**

In the middle of running a Colab, you might get an error like this:

> Runtime error: CUDA is required but not available for bitsandbytes. Please consider installing [...]

This is a super-misleading error message! Please don't try changing versions of packages...

This actually happens because Google has switched out your Colab runtime, perhaps because Google Colab was too busy. The solution is:

1. Kernel menu >> Disconnect and delete runtime
2. Reload the colab from fresh and Edit menu >> Clear All Outputs
3. Connect to a new T4 using the button at the top right
4. Select "View resources" from the menu on the top right to confirm you have a GPU
5. Rerun the cells in the colab, from the top down, starting with the pip installs

And all should work great - otherwise, ask me!

In [5]:
!pip install -q --upgrade bitsandbytes accelerate

In [6]:
# imports

import os
import requests
from IPython.display import Markdown, display, update_display
from openai import OpenAI
from google.colab import drive
from huggingface_hub import login
from google.colab import userdata
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig
import torch

In [7]:
# Constants

LLAMA = "meta-llama/Llama-3.2-3B-Instruct"

In [None]:
# New capability - connect this Colab to your Google Drive
# See immediately below this for instructions to obtain denver_extract.mp3
# Place the file on your drive in a folder called llms, and call it denver_extract.mp3

drive.mount("/content/drive")
audio_filename = "/content/drive/MyDrive/llms/denver_extract.mp3"

# Download denver_extract.mp3

You can either use the same file as me, the extract from Denver city council minutes, or you can try your own..

If you want to use the same as me, then please download my extract here, and put this on your Google Drive:  
https://drive.google.com/file/d/1N_kpSojRR5RYzupz6nqM8hMSoEF_R7pU/view?usp=sharing


In [9]:
# Sign in to HuggingFace Hub

hf_token = userdata.get('HF_TOKEN')
login(hf_token, add_to_git_credential=True)

# Open the file

# audio_file = open(audio_filename, "rb")

# STEP 1: Transcribe Audio

## Option 1: Use Open Source for Transcription - Hugging Face Pipelines

In [None]:
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-medium.en",
    dtype=torch.float16,
    device='cuda',
    return_timestamps=True
)

result = pipe(audio_filename)
transcription = result["text"]
print(transcription)

In [None]:
open_source_transcription = transcription

## Option 2: Use OpenAI for Transcription

In [None]:
# Sign in to OpenAI using Secrets in Colab

AUDIO_MODEL = "gpt-4o-mini-transcribe"

openai_api_key = userdata.get('OPENAI_API_KEY')
openai = OpenAI(api_key=openai_api_key)
transcription = openai.audio.transcriptions.create(model=AUDIO_MODEL, file=audio_file, response_format="text")
print(transcription)

In [None]:
display(Markdown(open_source_transcription))
print("\n\n")
display(Markdown(transcription))

# STEP 2: Analyze & Report

In [13]:
system_message = """
You produce minutes of meetings from transcripts, with summary, key discussion points,
takeaways and action items with owners, in markdown format without code blocks.
"""

user_prompt = f"""
Below is an extract transcript of a Denver council meeting.
Please write minutes in markdown without code blocks, including:
- a summary with attendees, location and date
- discussion points
- takeaways
- action items with owners

Transcription:

"""

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt}
  ]


In [10]:
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4"
)

In [14]:
tokenizer = AutoTokenizer.from_pretrained(LLAMA)
tokenizer.pad_token = tokenizer.eos_token
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
streamer = TextStreamer(tokenizer)
model = AutoModelForCausalLM.from_pretrained(LLAMA, device_map="auto", quantization_config=quant_config)
outputs = model.generate(inputs, max_new_tokens=2000, streamer=streamer)

config.json:   0%|          | 0.00/878 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/20.9k [00:00<?, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/1.46G [00:00<?, ?B/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/189 [00:00<?, ?B/s]

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 23 Jan 2026

You produce minutes of meetings from transcripts, with summary, key discussion points,
takeaways and action items with owners, in markdown format without code blocks.<|eot_id|><|start_header_id|>user<|end_header_id|>

Below is an extract transcript of a Denver council meeting.
Please write minutes in markdown without code blocks, including:
- a summary with attendees, location and date
- discussion points
- takeaways
- action items with owners

Transcription:<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Please provide the transcript extract of the Denver council meeting. I will then create the minutes in markdown format.<|eot_id|>


In [None]:
response = tokenizer.decode(outputs[0])

In [None]:
display(Markdown(response))

# Student contribution

Student Emad S. has made this powerful variation that uses `TextIteratorStreamer` to stream back results into a Gradio UI, and takes advantage of background threads for performance! I'm sharing it here if you'd like to take a look at some very interesting work. Thank you, Emad!

https://colab.research.google.com/drive/1Ja5zyniyJo5y8s1LKeCTSkB2xyDPOt6D

In [15]:
# imports
import gradio as gr, requests, json, time, os, torch
from transformers import pipeline, set_seed
from functools import partial
from openai import OpenAI, APIError, AuthenticationError
from google.colab import drive, userdata
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Sample user_prompt = "a list of student profiles with full name, email, course studied, and GPA for each of 6 semesters, and a CGPA for the 6 semesters"

# Sign in to HuggingFace Hub
hf_token = userdata.get('HF_TOKEN')
login(hf_token, add_to_git_credential=True)

# Sign in to OpenAI using Secrets in Colab
openai_api_key = userdata.get('OPENAI_API_KEY')

# Initialize client
try:
    openai = OpenAI(api_key=openai_api_key)
except Exception as e:
    openai = None
    print(f"OpenAI client not initialized: {e}")

# Constants
GPT_MODEL = "gpt-3.5-turbo"

# Local Llama Model Setup
# Loads a Llama model from Hugging Face for local inference.
# Note: This requires a powerful GPU and specific library installations (e.g., bitsandbytes, accelerate).
LLAMA_MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"

try:
    # Set up quantization config for efficient memory usage.
    # This loads the model in 4-bit precision, significantly reducing VRAM requirements.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_quant_type="nf4"
    )

    # Load the tokenizer and model.
    tokenizer = AutoTokenizer.from_pretrained(LLAMA_MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        LLAMA_MODEL,
        device_map="auto",
        quantization_config=quant_config,
        trust_remote_code=True
    )

    # Set the model to evaluation mode for inference.
    model.eval()

except Exception as e:
    model = None
    tokenizer = None
    print(f"Failed to load local Llama model: {e}")


def generate_with_llama(user_prompt: str, num_samples: int = 5):
    """
    Generates synthetic data using a local Llama model.
    Return a JSON string.
    """
    if not model or not tokenizer:
        return json.dumps({"error": "Llama model not loaded. Check model paths and hardware compatibility."}, indent=2)

    # Llama 3.1 uses a specific chat template for conversation formatting.
    messages = [
        {"role": "system", "content": f"You are a data generation assistant. Generate a JSON array of exactly {num_samples} objects based on the user's request. The output must be valid JSON only, without any other text or formatting."},
        {"role": "user", "content": user_prompt}
    ]

    try:
        inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")

        outputs = model.generate(inputs, max_new_tokens=2000, do_sample=True, top_p=0.9, temperature=0.7)

        # Decode the generated tokens.
        response_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Extract only the assistant's part from the complete chat history.
        assistant_start = "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
        if assistant_start in response_text:
            response_text = response_text.split(assistant_start)[-1]

        # Parse the JSON and return it.
        parsed_json = json.loads(response_text)
        return json.dumps(parsed_json, indent=2)

    except Exception as e:
        return json.dumps({"error": f"An error occurred during local model generation: {e}"}, indent=2)



def generate_with_gpt(user_prompt: str, num_samples: int = 5):
    """
    Generates synthetic data using OpenAI's GPT.
    Return a JSON string.
    """
    if not openai:
        return json.dumps({"error": "OpenAI client not initialized. Please check your API key."}, indent=2)

    try:
        response = openai.chat.completions.create(
            model=GPT_MODEL,
            messages=[
                {"role": "system", "content": f"You are a data generation assistant. Generate a JSON array of exactly {num_samples} objects based on the user's request. The output must be valid JSON only, without any other text or formatting."},
                {"role": "user", "content": user_prompt}
            ],
            response_format={"type": "json_object"}
        )

        json_text = response.choices[0].message.content
        return json_text
    except APIError as e:
        return json.dumps({"error": f"Error from OpenAI API: {e.body}"}, indent=2)
    except Exception as e:
        return json.dumps({"error": f"An unexpected error occurred: {e}"}, indent=2)


def generate_data(user_prompt, model_choice):
    """
    Wrapper function that calls the appropriate generation function based on model choice.
    """
    if not user_prompt:
        return json.dumps({"error": "Please provide a description for the data."}, indent=2)

    if model_choice == f"Hugging Face ({LLAMA_MODEL})":
        return generate_with_llama(user_prompt)
    elif model_choice == f"OpenAI ({GPT_MODEL})":
        return generate_with_gpt(user_prompt)
    else:
        return json.dumps({"error": "Invalid model choice."}, indent=2)

# Gradio UI
with gr.Blocks(theme=gr.themes.Soft(), title="Synthetic Data Generator") as ui:
    gr.Markdown("# Synthetic Data Generator")
    gr.Markdown("Describe the type of data you need, select a model, and click 'Generate'.")

    with gr.Row():
        with gr.Column(scale=3):
            data_prompt = gr.Textbox(
                lines=5,
                label="Data Prompt",
                placeholder="e.g., a list of customer profiles with name, email, and a favorite product"
            )

        with gr.Column(scale=1):
            model_choice = gr.Radio(
                [f"Hugging Face ({LLAMA_MODEL})", f"OpenAI ({GPT_MODEL})"],
                label="Choose a Model",
                value=f"Hugging Face ({LLAMA_MODEL})"
            )

            generate_btn = gr.Button("Generate Data")

    with gr.Row():
        output_json = gr.JSON(label="Generated Data")

    # Click trigger
    generate_btn.click(
        fn=generate_data,
        inputs=[data_prompt, model_choice],
        outputs=output_json
    )

ui.launch(inbrowser=True, debug=True)


tokenizer_config.json:   0%|          | 0.00/55.4k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/296 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/855 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Fetching 4 files:   0%|          | 0/4 [00:00<?, ?it/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/184 [00:00<?, ?B/s]

  with gr.Blocks(theme=gr.themes.Soft(), title="Synthetic Data Generator") as ui:


It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://8520ab05a8782aadac.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://8520ab05a8782aadac.gradio.live


