# Create meeting minutes from an audio file

I downloaded some Denver City Council meeting minutes and selected a portion of the meeting for us to transcribe. You can download it here:  
https://drive.google.com/file/d/1N_kpSojRR5RYzupz6nqM8hMSoEF_R7pU/view?usp=sharing

If you'd rather work with the original data, the HuggingFace dataset is [here](https://huggingface.co/datasets/huuuyeah/meetingbank) and the audio can be downloaded [here](https://huggingface.co/datasets/huuuyeah/MeetingBank_Audio/tree/main).

The goal of this product is to use the audio to generate meeting minutes, including action items.

For this project, you can either use the Denver meeting minutes, or you can record something of your own!
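
To make the shape of the product concrete, here is a minimal sketch of the two stages: audio to transcript, then transcript to minutes. The names `make_minutes`, `fake_transcribe` and `fake_summarize` are hypothetical placeholders of mine, not part of any library; in the notebook itself the two stages are a speech model and an LLM.

```python
from typing import Callable


def make_minutes(audio_path: str,
                 transcribe: Callable[[str], str],
                 summarize: Callable[[str], str]) -> str:
    """Run the two stages in order: audio -> transcript -> minutes."""
    transcript = transcribe(audio_path)
    return summarize(transcript)


# Stand-in stages so the sketch runs without any model or API key.
def fake_transcribe(path: str) -> str:
    return f"(transcript of {path})"


def fake_summarize(transcript: str) -> str:
    return "## Minutes\n\n" + transcript


minutes = make_minutes("denver_extract.mp3", fake_transcribe, fake_summarize)
print(minutes)
```

Keeping the stages as plain callables means you can swap the transcription model or the LLM without touching the rest of the flow.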


## Again - please note: 2 important pro-tips for using Colab:

**Pro-tip 1:**

The top of every colab has some pip installs. You may receive errors from pip when you run this, such as:

> gcsfs 2025.3.2 requires fsspec==2025.3.2, but you have fsspec 2025.3.0 which is incompatible.

These pip compatibility errors can be safely ignored. While it's tempting to try to fix them by changing version numbers, that will actually introduce real problems!

**Pro-tip 2:**

In the middle of running a Colab, you might get an error like this:

> Runtime error: CUDA is required but not available for bitsandbytes. Please consider installing [...]

This is a super-misleading error message! Please don't try changing versions of packages...

This actually happens because Google has switched out your Colab runtime, perhaps because Google Colab was too busy. The solution is:

1. Runtime menu >> Disconnect and delete runtime
2. Reload the colab from fresh and Edit menu >> Clear All Outputs
3. Connect to a new T4 using the button at the top right
4. Select "View resources" from the menu on the top right to confirm you have a GPU
5. Rerun the cells in the colab, from the top down, starting with the pip installs

And all should work great - otherwise, ask me!
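
If you'd like to confirm the GPU from code as well as from "View resources", something like this sketch should do it. It assumes the `nvidia-smi` tool is on the path whenever a GPU runtime is attached (I believe that holds on Colab GPU runtimes, but treat it as an assumption); `gpu_visible` is a hypothetical helper name.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi exists and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per GPU, e.g. "GPU 0: ..."
        result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    except OSError:
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False` partway through a session, that is a strong hint the runtime was swapped and the steps above are the fix.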

In [4]:
!pip install -q --upgrade torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
!pip install -q requests bitsandbytes==0.46.0 transformers==4.48.3 accelerate==1.3.0 openai


In [1]:
# imports
from langchain_google_genai import ChatGoogleGenerativeAI
import os
import requests
from IPython.display import Markdown, display, update_display
from openai import OpenAI
from google.colab import drive
from huggingface_hub import login
from google.colab import userdata
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig
import torch

In [2]:
# Constants

AUDIO_MODEL = "whisper-1"
LLAMA = "meta-llama/Meta-Llama-3.1-8B-Instruct"

In [3]:
drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [30]:
from google.colab import userdata
api_key=userdata.get('GOOGLE_API_KEY')

In [6]:
!ls "/content/drive/MyDrive/Colab Notebooks"

 00_pytorch_fundamentals_vide.ipynb
'Copy of Copy of 😡🤢😱😊😐😔😲 Emotion Detection'
'Copy of 😡🤢😱😊😐😔😲 Emotion Detection'
'Copy of tool-calling-in-langchain.ipynb'
'Copy of tools-in-langchain.ipynb'
'Copy of Week 3 Day 3 - tokenizers.ipynb'
'Copy of Week 3 Day 4 - models.ipynb'
'Copy of Week 3 Day 5 - Meeting Minutes product.ipynb'
 denver_extract.mp3
 langchain-aam-zindagi.ipynb
 langchain-retrievers.ipynb
'next word predictor.ipynb'
 rag-using-langchain.ipynb
 tensor_ops.ipynb
 Tensors.ipynb
 Untitled
 Untitled0.ipynb
 Untitled1.ipynb
 Untitled2.ipynb
 Untitled3.ipynb
 vector-stores.ipynb
'week 3 day 2 - pipelines.ipynb'


In [13]:
# New capability - connect this Colab to my Google Drive
# See immediately below this for instructions to obtain denver_extract.mp3

drive.mount("/content/drive")
audio_filename = "/content/drive/MyDrive/Colab Notebooks/denver_extract.mp3"

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [14]:
audio_filename

'/content/drive/MyDrive/Colab Notebooks/denver_extract.mp3'

# Download denver_extract.mp3

You can either use the same file as me, the extract from Denver city council minutes, or you can try your own.

If you want to use the same as me, then please download my extract here, and put this on your Google Drive:  
https://drive.google.com/file/d/1N_kpSojRR5RYzupz6nqM8hMSoEF_R7pU/view?usp=sharing
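
Before spending time on transcription, it can help to check that the file really landed where the notebook expects it. This is a small sketch; `check_audio_file` is a hypothetical helper, and the path below is simply the one used in this notebook, so adjust it if you saved the file somewhere else.

```python
from pathlib import Path


def check_audio_file(path: str) -> bool:
    """Return True if the path points at a non-empty regular file."""
    p = Path(path)
    return p.is_file() and p.stat().st_size > 0


audio_path = "/content/drive/MyDrive/Colab Notebooks/denver_extract.mp3"
if check_audio_file(audio_path):
    print("Found the audio file.")
else:
    print("Audio file not found - check that Drive is mounted and the path is right.")
```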


In [21]:
# Sign in to HuggingFace Hub

hf_token = userdata.get('HF_TOKEN')
login(hf_token, add_to_git_credential=True)

In [None]:
# Sign in to OpenAI using Secrets in Colab

openai_api_key = userdata.get('OPENAI_API_KEY')
openai = OpenAI(api_key=openai_api_key)

In [23]:
# Use the Whisper OpenAI model to convert the Audio to Text
# If you'd prefer to use an Open Source model, class student Youssef has contributed an open source version
# which I've added to the bottom of this colab

# Open the file in a with-block so it is closed once the upload finishes
with open(audio_filename, "rb") as audio_file:
    transcription = openai.audio.transcriptions.create(model=AUDIO_MODEL, file=audio_file, response_format="text")
print(transcription)

NameError: name 'openai' is not defined

In [37]:
system_message = "You are an assistant that produces minutes of meetings from transcripts, with summary, key discussion points, takeaways and action items with owners, in markdown."
user_prompt = f"Below is an extract transcript of a Denver council meeting. Please write minutes in markdown, including a summary with attendees, location and date; discussion points; takeaways; and action items with owners.\n{transcription}"

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt}
  ]


In [36]:
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4"
)

In [None]:
tokenizer = AutoTokenizer.from_pretrained(LLAMA)
tokenizer.pad_token = tokenizer.eos_token
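# add_generation_prompt=True appends the assistant header so the model starts its reply
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to("cuda")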
streamer = TextStreamer(tokenizer)
model = AutoModelForCausalLM.from_pretrained(LLAMA, device_map="auto", quantization_config=quant_config)
outputs = model.generate(inputs, max_new_tokens=2000, streamer=streamer)

In [None]:
response = tokenizer.decode(outputs[0])

In [None]:
display(Markdown(response))

# Student contribution

Student Emad S. has made this powerful variation that uses `TextIteratorStreamer` to stream back results into a Gradio UI, and takes advantage of background threads for performance! I'm sharing it here if you'd like to take a look at some very interesting work. Thank you, Emad!

https://colab.research.google.com/drive/1Ja5zyniyJo5y8s1LKeCTSkB2xyDPOt6D
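
To give a feel for the pattern (this is my own minimal sketch, not Emad's code), the idea is that generation runs on a background thread and pushes tokens into a queue, while the main thread iterates over them as they arrive. Here a plain function stands in for `model.generate()`, and all names are hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def generate_tokens(words, out_queue):
    """Stand-in for model.generate(): pushes tokens as they are produced."""
    for word in words:
        out_queue.put(word + " ")
    out_queue.put(_DONE)


def stream_tokens(words):
    """Yield tokens while the producer runs on a background thread."""
    out_queue = queue.Queue()
    worker = threading.Thread(target=generate_tokens, args=(words, out_queue))
    worker.start()
    while True:
        token = out_queue.get()
        if token is _DONE:
            break
        yield token
    worker.join()


partial = ""
for token in stream_tokens(["Minutes", "of", "the", "meeting"]):
    partial += token  # a UI handler would yield `partial` here to refresh the display
print(partial)
```

With a real model, `TextIteratorStreamer` plays the role of the queue and the iterator, and `model.generate` is the function you start on the thread.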

## Alternative implementation

Class student Youssef has contributed this variation in which we use an open-source model to transcribe the meeting audio.

Thank you Youssef!

In [None]:
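# These classes are not in the imports cell at the top, so import them here
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

AUDIO_MODEL = "openai/whisper-medium"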
speech_model = AutoModelForSpeechSeq2Seq.from_pretrained(AUDIO_MODEL, torch_dtype=torch.float16, low_cpu_mem_usage=True, use_safetensors=True)
speech_model.to('cuda')
processor = AutoProcessor.from_pretrained(AUDIO_MODEL)

pipe = pipeline(
    "automatic-speech-recognition",
    model=speech_model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch.float16,
    device='cuda',
)

In [None]:
# Use the open-source Whisper pipeline to convert the audio to text
result = pipe(audio_filename)

In [None]:
transcription = result["text"]
print(transcription)

Here is the equivalent code using the latest LangChain libraries.

The main changes involve replacing the manual `transformers` model loading and generation loop with LangChain's `HuggingFacePipeline` and using LangChain Expression Language (LCEL) to create a clear, composable chain.

### Key LangChain Concepts Used:

*   **`langchain-huggingface`**: Provides the `HuggingFacePipeline` class, a seamless way to integrate quantized Hugging Face models into LangChain.
*   **`langchain-openai`**: (Implicitly used) LangChain integrations for OpenAI models. The original script's direct use of the `openai` SDK for transcription is already efficient and clear, so we'll keep it.
*   **Message Objects**: Using `SystemMessage` and `HumanMessage` for cleaner and more structured prompt management.
*   **LCEL (LangChain Expression Language)**: The `|` (pipe) operator is used to chain components together. The flow `prompt | model | parser` is a standard and powerful LangChain pattern.
*   **`.stream()`**: The chain's `.stream()` method is the direct equivalent of using a `TextStreamer`, providing an iterable of output chunks for a real-time effect.
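
If the `|` operator looks like magic, this toy sketch shows the idea behind it. It is deliberately NOT LangChain's implementation: `Step` is a hypothetical class of mine that overloads `__or__` so that `a | b` means "feed the output of `a` into `b`", and the "model" is just a lambda.

```python
class Step:
    """Toy stand-in for a chain component; not LangChain's Runnable."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


prompt = Step(lambda transcript: f"Write minutes for: {transcript}")
model = Step(lambda text: text.upper())  # pretend LLM
parser = Step(lambda text: text.strip())

chain = prompt | model | parser
result = chain.invoke("council discussed the budget ")
print(result)
```

The real library adds streaming, batching and async on top, but the composition idea is the same: each component only needs to agree on what it takes in and what it hands on.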


In [None]:
!pip install langchain_google_genai langchain_huggingface

In [3]:
# imports
import sys
import os
import json
from dotenv import load_dotenv
from openai import OpenAI
import gradio as gr
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage, ToolMessage
from langchain_core.tools import tool

In [16]:


### Updated LangChain Code
# 1. INSTALLS
# Make sure to install the updated langchain packages
# !pip install -q --upgrade torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
# !pip install -q requests bitsandbytes==0.46.0 transformers==4.48.3 accelerate==1.3.0
# !pip install -q openai langchain langchain-huggingface

# 2. IMPORTS
import os
import torch
from openai import OpenAI
from google.colab import drive, userdata
from huggingface_hub import login
from IPython.display import Markdown, display
from transformers import BitsAndBytesConfig

# LangChain specific imports
from langchain_huggingface import HuggingFacePipeline
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser


# 3. CONSTANTS AND SETUP
AUDIO_MODEL = "whisper-1"
LLAMA = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# 4. AUTHENTICATION
# Sign in to HuggingFace Hub
hf_token = userdata.get('HF_TOKEN')
login(hf_token, add_to_git_credential=True)


print("Transcription complete.")
print("-" * 20)
# print(transcription) # Uncomment to see the full transcript


# ==============================================================================
# 6. LANGCHAIN IMPLEMENTATION
# ==============================================================================

# Define the prompt using LangChain message objects
system_message = "You are an assistant that produces minutes of meetings from transcripts, with summary, key discussion points, takeaways and action items with owners, in markdown."
user_prompt = f"Below is an extract transcript of a Denver council meeting. Please write minutes in markdown, including a summary with attendees, location and date; discussion points; takeaways; and action items with owners.\n{transcription}"

messages = [
    SystemMessage(content=system_message),
    HumanMessage(content=user_prompt)
]

# Define the quantization configuration
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4"
)

# Create the LangChain HuggingFacePipeline
# This class handles loading the model and tokenizer, and setting up the text-generation pipeline
llm = HuggingFacePipeline.from_model_id(
    model_id=LLAMA,
    task="text-generation",
    model_kwargs={"quantization_config": quant_config},
    device_map="auto",
    pipeline_kwargs={"max_new_tokens": 2000},
)

# Create the LCEL chain by piping the components together
# The StrOutputParser ensures the output is a clean string
chain = llm | StrOutputParser()

# Stream the response and collect the full text
print("Generating meeting minutes...\n")
full_response = ""
for chunk in chain.stream(messages):
    print(chunk, end="", flush=True)
    full_response += chunk

print("\n\n" + "="*50 + "\nGeneration complete.\n" + "="*50)

# Display the final, complete response as formatted Markdown
display(Markdown(full_response))


# ==============================================================================
# The alternative open-source transcription part can remain the same
# as it uses the `transformers` library directly for a specific task.
# ==============================================================================

SecretNotFoundError: Secret OPENAI_API_KEY does not exist.

In [17]:
# @title
import base64
from io import BytesIO
from pydub import AudioSegment
from pydub.playback import play
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage
from langchain_google_genai import ChatGoogleGenerativeAI
from google.api_core.exceptions import ResourceExhausted, GoogleAPICallError
from IPython.display import Image, display, Audio, Markdown
import mimetypes

# Assuming key_utils.py exists and works
# from key_utils import get_next_key

import langchain_google_genai.chat_models as chat_mod

class LLMHandler:
    # --- Model Constants ---
    # Using modern, flexible models that support conversation history is recommended
    TEXT_MODEL = "gemini-2.0-flash"
    AUDIO_MODEL = "gemini-2.5-flash-preview-tts"
    IMAGE_MODEL = "gemini-2.0-flash-preview-image-generation"
    def __init__(self, system_message="You are a helpful assistant."):
        self.conversation_history = []
        self.system_message = system_message
        self._original_chat_with_retry = chat_mod._chat_with_retry

    # --- Patch retry ---
    def _patch_retry(self, enable_retry: bool = False):
        """Enable or disable internal retry inside LangChain/Google API"""
        if not enable_retry:
            def no_retry_chat_with_retry(**kwargs):
                generation_method = kwargs.pop("generation_method")
                metadata = kwargs.pop("metadata", None)
                return generation_method(
                    request=kwargs.get("request"),
                    retry=None,
                    timeout=None,
                    metadata=metadata
                )
            chat_mod._chat_with_retry = no_retry_chat_with_retry
        else:
            chat_mod._chat_with_retry = self._original_chat_with_retry

    # --- Helper: extract base64 image ---
    def _get_image_base64(self, response):
        for block in response.content:
            if isinstance(block, dict) and "image_url" in block:
                return block["image_url"]["url"].split(",")[-1]
        return None



    # --- REFACTORED AND FIXED: Message Building ---
    def _build_messages(self, user_input, task_type="text"):
        """
        Correctly builds the message list for the API, preserving history for all task types.
        """
        messages = []

        # FIX: Only add system message for conversational tasks that support it.
        if task_type in ["text", "tool"]:
            messages.append(SystemMessage(content=self.system_message))

        # FIX: Convert the entire history correctly for the model.
        for h in self.conversation_history:
            role = h.get("role")
            content = h.get("content")
            if role == "user":
                messages.append(HumanMessage(content=content))
            elif role == "assistant":
                messages.append(AIMessage(content=content))

        # Always add the latest user input.
        messages.append(HumanMessage(content=user_input))
        return messages

    # --- REFACTORED AND FIXED: History Updating ---
    def _update_history(self, user_input, assistant_response_content):
        """
        Correctly updates the internal history with the latest turn.
        """
        self.conversation_history.append({"role": "user", "content": user_input})
        # FIX: Always store the actual content from the assistant.
        self.conversation_history.append({"role": "assistant", "content": assistant_response_content})

    def clear_history(self):
        """Helper to reset the conversation."""
        self.conversation_history = []
        print("Conversation history cleared.")


    # --- THE UNIFIED RUN METHOD ---
    def run(self, user_input, task_type="text", tools=None, max_retries=11, enable_retry=False, width=400, height=None):
        """
        A single, unified method to handle text, audio, image, and tool generation.
        This method RETURNS data instead of displaying it.
        """
        self._patch_retry(enable_retry)

        # 1. Select model
        model_name = {
            "text": self.TEXT_MODEL,
            "tool": self.TEXT_MODEL,
            "audio": self.AUDIO_MODEL,
            "image": self.IMAGE_MODEL,
            "audio_transcription": self.TEXT_MODEL, # Use the multimodal model for transcription
        }.get(task_type, self.TEXT_MODEL)

        # 2. Build messages (with a special case for transcription)
        if task_type == "audio_transcription":
            # For transcription, the user_input is already a fully formed HumanMessage
            messages = [user_input]
        else:
            messages = self._build_messages(user_input, task_type=task_type)

        # 3. Centralized Retry Loop
        for attempt in range(max_retries):
            # api_key, user_name = get_next_key()
            # print(f"➡️ Attempt {attempt + 1}/{max_retries} using key from '{user_name}'...")

            try:
                llm = ChatGoogleGenerativeAI(model=model_name, google_api_key=api_key)

                # 4. Handle different task types
                if task_type == "audio":
                    tts_message = [HumanMessage(content=user_input)] # Use only the direct input for TTS
                    response = llm.invoke(tts_message, generation_config={"response_modalities": ["AUDIO"]})
                    audio_bytes = response.additional_kwargs.get("audio")
                    self._update_history(user_input, "[Generated audio in response to prompt]")
                    return {"type": "audio", "data": audio_bytes, "text": response.content}

                elif task_type == "image":
                    # For images, we just need the user prompt
                    response = llm.invoke(messages, generation_config={"response_modalities": ["TEXT","IMAGE"]})
                    image_base64 = self._get_image_base64(response)
                    if image_base64:
                        display(Image(data=base64.b64decode(image_base64), width=width, height=height))
                        # Save only prompt for reference
                        self._update_history(user_input, "[Generated image in response to prompt]")
                        print( "image_generated")
                        return {"type": "image", "data": image_base64, "text": "Image generated successfully."}
                    else:
                        print("No image returned")

                else: # Handles "text" and "tool" and "audio_transcription"
                    model_to_invoke = llm
                    if tools:
                        model_to_invoke = llm.bind_tools(tools, tool_choice="any")

                    response = model_to_invoke.invoke(messages)

                    # Tool Execution Logic
                    if response.tool_calls:
                        messages.append(response)
                        for tool_call in response.tool_calls:
                            tool_name = tool_call["name"]
                            tool_args = tool_call["args"]
                            # Find the callable tool function
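                            # Match LangChain tools by .name and plain functions by __name__
                            matched_tool_func = next((t for t in tools if getattr(t, 'name', getattr(t, '__name__', None)) == tool_name), None)
                            if matched_tool_func:
                                # LangChain tools expose .invoke(); plain functions are called directly
                                result = matched_tool_func.invoke(tool_args) if hasattr(matched_tool_func, 'invoke') else matched_tool_func(**tool_args)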
                                print(f"✅ [TOOL] Called '{tool_name}' with {tool_args}. Result: {result}")
                            else:
                                result = f"Error: Tool '{tool_name}' not found."
                                print(f"❌ [TOOL] {result}")
                            messages.append(ToolMessage(content=str(result), tool_call_id=tool_call["id"]))

                        # Call the model again with the tool results
                        response = llm.invoke(messages)
                    # Update history differently for transcription vs. other tasks
                    if task_type == "audio_transcription":
                        # History entry for the audio file itself
                        prompt_summary = "[Transcription requested for an audio file]"
                        self._update_history(prompt_summary, response.content)
                    else:
                        self._update_history(user_input, response.content)


                    return {"type": "text", "data": response.content, "text": response.content}

            except (ResourceExhausted, GoogleAPICallError, ValueError) as e:
                print(f"⚠️ Attempt {attempt + 1} failed: {e.__class__.__name__} - {e}")
                if attempt + 1 >= max_retries:
                    raise RuntimeError("All API keys failed or quota exceeded.") from e
                continue # Try the next key

        raise RuntimeError("All API keys failed or quota exceeded after all attempts.")

    # --- NEW CONVENIENCE METHOD FOR AUDIO ---
    def generate_audio(self, user_input, play_audio=True):
        """
        A convenience method that generates audio and optionally plays it.
        This method acts as a wrapper around the main `run()` method.
        """
        print(f"--- Generating Audio for: '{user_input}' ---")
        response_dict = self.run(user_input, task_type="audio")

        if play_audio and response_dict and response_dict.get('type') == 'audio' and response_dict.get('data'):
            print("Audio generation successful. Playing audio...")
            try:
                audio_bytes = response_dict['data']
                audio_segment = AudioSegment.from_file(BytesIO(audio_bytes), format="wav")
                play(audio_segment)
                print("Playback finished.")
            except Exception as e:
                print(f"❌ Error playing audio: {e}")
        elif not response_dict or not response_dict.get('data'):
            print("❌ Audio generation failed or no audio data was returned.")
            return None

        return response_dict

    # --- NEW: GENERATE TEXT FROM AUDIO METHOD ---
    def generate_text_from_audio(self, file_name, enable_retry=False):
        """
        Transcribes an audio file into text using a multimodal model.

        Args:
            file_name (str): The path to the audio file (e.g., .mp3, .wav, .flac).
            enable_retry (bool): Whether to enable the built-in LangChain retry mechanism.

        Returns:
            str: The transcribed text from the audio, or None if an error occurred.
        """
        print(f"--- Transcribing Audio from: '{file_name}' ---")

        if not os.path.exists(file_name):
            print(f"❌ Error: Audio file not found at '{file_name}'")
            return None

        # 1. Automatically detect MIME type
        mime_type, _ = mimetypes.guess_type(file_name)
        if not mime_type or not mime_type.startswith("audio"):
            print(f"❌ Error: Could not determine a valid audio MIME type for '{file_name}'")
            return None

        print(f"Detected MIME type: {mime_type}")

        # 2. Read file and encode in base64
        with open(file_name, "rb") as audio_file:
            encoded_audio = base64.b64encode(audio_file.read()).decode("utf-8")

        # 3. Construct the special multimodal message
        message = HumanMessage(
            content=[
                {
                    "type": "text",
                    "text": "Transcribe the audio and provide the full text.",
                },
                {
                    "type": "media",
                    "data": encoded_audio,
                    "mime_type": mime_type,
                },
            ]
        )

        # 4. Call the main run method with the new task type
        response_dict = self.run(
            user_input=message,
            task_type="audio_transcription",
            enable_retry=enable_retry
        )

        if response_dict and response_dict.get('type') == 'text':
            return response_dict['text']
        else:
            print("❌ Audio transcription failed or no text was returned.")
            return None
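
The transcription method above guesses the MIME type from the filename and base64-encodes the bytes before building the multimodal message. That one step can be sketched in isolation like this; `build_media_block` is a hypothetical helper of mine, and the dictionary keys simply mirror the ones used in the class above rather than any documented schema I am vouching for.

```python
import base64
import mimetypes


def build_media_block(file_name: str, raw_bytes: bytes) -> dict:
    """Guess the audio MIME type from the filename and base64-encode the bytes."""
    mime_type, _ = mimetypes.guess_type(file_name)
    if not mime_type or not mime_type.startswith("audio"):
        raise ValueError(f"Not a recognised audio file: {file_name}")
    return {
        "type": "media",
        "data": base64.b64encode(raw_bytes).decode("utf-8"),
        "mime_type": mime_type,
    }


# Fake bytes keep the sketch runnable without the real mp3
block = build_media_block("denver_extract.mp3", b"fake audio bytes")
print(block["mime_type"])
```

Base64 makes the payload roughly a third larger than the raw file, so this approach suits short extracts better than full-length recordings.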


In [None]:
# Create the handler that will be used for transcription
handler = LLMHandler()

In [22]:
# Now, use the handler to transcribe it

# Transcribe the audio file
transcribed_text = handler.generate_text_from_audio(audio_filename)

if transcribed_text:
    print("\n--- Transcription Result ---")
    print(transcribed_text)

    # Define the prompts for the minutes request
    system_message = "You are an assistant that produces minutes of meetings from transcripts, with summary, key discussion points, takeaways and action items with owners, in markdown."

    # Use the transcript returned by the handler above
    user_prompt = f"Below is an extract transcript of a Denver council meeting. Please write minutes in markdown, including a summary with attendees, location and date; discussion points; takeaways; and action items with owners.\n{transcribed_text}"

    # A fresh handler carries the minutes system prompt
    print("\n--- Minutes of the Meeting ---")
    texthandler = LLMHandler(system_message=system_message)

    response = texthandler.run(user_prompt)
    print(response['text'])

# You can check the conversation history
# print(handler.conversation_history)

--- Transcribing Audio from: '/content/drive/MyDrive/Colab Notebooks/denver_extract.mp3' ---
Detected MIME type: audio/mpeg

--- Transcription Result ---
and kind of the confluence of this whole idea of confluence week, the merging of two rivers. And, uh, as we've kind of seen recently in politics and in the world, there's a lot of, um, situations where water is very important right now, and it's a very big issue. So that is the reason that the back of the logo is considered water. So let you see the creation of the logo here. And yeah. So that basically, um, kind of sums up the reason behind the logo and the, um, all the meanings behind the symbolism. And, uh, I'll, you'll hear a little bit more about our Confluence Week is basically highlighting all of these indigenous events and things that are happening around Denver so that we can kind of bring more people together and kind of share this whole idea of Indigenous People's Day. So, thank you. Thank you so much, and thanks for your l

In [23]:
display(Markdown(response['text']))

## Denver City Council Meeting Minutes - October 9, 2017

**Summary:**

The Denver City Council convened on October 9, 2017, for a regular meeting. Key items included the adoption of Proclamation 1127, observing Indigenous People's Day in the City and County of Denver. Council members discussed the importance of recognizing the contributions of indigenous peoples, celebrating inclusivity, and addressing ongoing challenges faced by the Native American community. The meeting included announcements and presentations.

**Attendees:**

*   Councilman Black
*   Councilman Clark
*   Councilman Espinosa
*   Councilman Flynn
*   Councilman Gilmore
*   Councilman Cashman
*   Councilman Kenech
*   Councilman Lopez
*   Councilman New
*   Councilwoman Ortega
*   Councilman Susman
*   Mr. President

**Location:** Denver City Council Chambers

**Date:** October 9, 2017

### Discussion Points

*   **Indigenous People's Day Proclamation:**
    *   Councilman Lopez read Proclamation 1127, series of 2017, which officially designates the second Monday of October as Indigenous People's Day in Denver.
    *   The proclamation recognizes the historical and contemporary contributions of indigenous people to the city.
    *   Council members emphasized the importance of celebrating indigenous culture out of pride and respect, not out of spite or contempt for other cultures.
    *   Councilman Lopez highlighted the need to address the challenges faced by Native Americans in Denver, including poverty, lack of access to services, and homelessness.
    *   Councilwoman Ortega emphasized the importance of cultural preservation and language maintenance within the Native American community.
    *   Councilwoman Kenech expressed appreciation for the role of Native American peoples in defending public lands and highlighted the intersection of cultural preservation and environmental protection.
    *   Mr. President emphasized that Indigenous People's Day promotes inclusivity and respects those whose history has been silenced.
*   **Confluence Week:**
    *   Confluence week is highlighting all of these indigenous events and things that are happening around Denver so that we can kind of bring more people together and kind of share this whole idea of indigenous people's day.
*   **Halloween Parade Announcement:**
    *   Councilman Clark announced the first-ever Halloween parade on Broadway in Lucky District 7, scheduled for October 21st at 6:00 p.m.

### Takeaways

*   Denver City Council is committed to recognizing and honoring the contributions of indigenous people to the city's history, culture, and future.
*   Indigenous People's Day is an opportunity to celebrate inclusivity, promote education, and address the challenges faced by the Native American community.
*   Cultural preservation, language maintenance, and environmental protection are important aspects of supporting indigenous communities.
*   The Council acknowledged the importance of art and symbolism in conveying the power of culture.

### Action Items

*   **Council Members:** Support the initiatives and programs aimed at addressing the needs of the Native American community in Denver. (Ongoing)
    *   Owner: All Council Members
*   **Council Members:** Attend and promote the Broadway Halloween Parade on October 21st.
    *   Owner: Councilman Clark, All Council Members
*   **Clerk of the City and County of Denver:** Transmit a copy of Proclamation 1127 to the Denver American Indian Commission, the City and County of Denver School District number one, and the Colorado Commission on Indian Affairs. (Completed)
    *   Owner: Clerk of the City and County of Denver

In [19]:
audio_filename

'/content/drive/MyDrive/Colab Notebooks/denver_extract.mp3'

In [None]:


### Updated LangChain Code
# 1. INSTALLS
# Make sure to install the updated langchain packages
# !pip install -q --upgrade torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
# !pip install -q requests bitsandbytes==0.46.0 transformers==4.48.3 accelerate==1.3.0
# !pip install -q openai langchain langchain-huggingface

# 2. IMPORTS
import os
import torch
from openai import OpenAI
from google.colab import drive, userdata
from huggingface_hub import login
from IPython.display import Markdown, display
from transformers import BitsAndBytesConfig

# LangChain specific imports
from langchain_huggingface import HuggingFacePipeline
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser


# 3. CONSTANTS AND SETUP
AUDIO_MODEL = "whisper-1"
LLAMA = "meta-llama/Meta-Llama-3.1-8B-Instruct"



# 4. AUTHENTICATION
# Sign in to HuggingFace Hub
hf_token = userdata.get('HF_TOKEN')
login(hf_token, add_to_git_credential=True)



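# 5. TRANSCRIPTION
# Transcribe the audio file with OpenAI's Whisper API.
# Assumes an OPENAI_API_KEY secret is stored in Colab.
openai = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))
with open(audio_filename, "rb") as audio_file:
    transcription = openai.audio.transcriptions.create(
        model=AUDIO_MODEL, file=audio_file, response_format="text"
    )
print("Transcription complete.")
print("-" * 20)
print(transcription)  # Comment out to hide the full transcript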


# ==============================================================================
# 6. LANGCHAIN IMPLEMENTATION
# ==============================================================================

# Define the prompt using LangChain message objects
system_message = "You are an assistant that produces minutes of meetings from transcripts, with summary, key discussion points, takeaways and action items with owners, in markdown."
user_prompt = f"Below is an extract transcript of a Denver council meeting. Please write minutes in markdown, including a summary with attendees, location and date; discussion points; takeaways; and action items with owners.\n{transcription}"

messages = [
    SystemMessage(content=system_message),
    HumanMessage(content=user_prompt)
]

# Define the quantization configuration
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4"
)

# Create the LangChain HuggingFacePipeline
# This class handles loading the model and tokenizer, and setting up the text-generation pipeline
llm = HuggingFacePipeline.from_model_id(
    model_id=LLAMA,
    task="text-generation",
    model_kwargs={"quantization_config": quant_config},
    device_map="auto",
    pipeline_kwargs={"max_new_tokens": 2000},
)

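# Create the LCEL chain by piping the components together
# The StrOutputParser ensures the output is a clean string
# Note: HuggingFacePipeline is a plain text-completion LLM, so the message list passed
# to it below is flattened into a single "System: ... / Human: ..." string rather than
# formatted with Llama's chat template. Wrapping `llm` in `ChatHuggingFace`
# (from langchain_huggingface) applies the chat template instead.
chain = llm | StrOutputParser()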

# Stream the response and collect the full text
print("Generating meeting minutes...\n")
full_response = ""
for chunk in chain.stream(messages):
    print(chunk, end="", flush=True)
    full_response += chunk

print("\n\n" + "="*50 + "\nGeneration complete.\n" + "="*50)

# Display the final, complete response as formatted Markdown
display(Markdown(full_response))


# ==============================================================================
# The alternative open-source transcription part can remain the same
# as it uses the `transformers` library directly for a specific task.
# ==============================================================================

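The 4-bit `BitsAndBytesConfig` in the cell above is what lets an 8B-parameter model fit on a Colab T4. As a rough back-of-the-envelope sketch, the snippet below estimates the memory needed for the weights alone. The parameter count of 8 billion is a rounded assumption, and the estimate ignores activations, the KV cache, and the small overhead of the quantization constants.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8e9  # rounded parameter count for an "8B" model (assumption)

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # unquantized bfloat16 weights
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)    # 4-bit quantized weights

print(f"bf16 weights:  ~{bf16_gb:.1f} GB")
print(f"4-bit weights: ~{nf4_gb:.1f} GB")
```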
Since Google's Veo video generation is a long-running, asynchronous task that requires polling, it doesn't fit the standard synchronous `invoke` or `stream` pattern of most LangChain models.

The best and most "LangChain-idiomatic" way to handle this is to create a **custom LangChain Runnable**. This encapsulates the entire asynchronous workflow (initiation, polling, and result extraction) into a single, chainable component. This makes your complex workflow clean, reusable, and compatible with the LangChain Expression Language (LCEL).

Here is the equivalent LangChain code, structured within a custom runnable class.

### Key LangChain Concepts Used:

*   **`langchain_core.runnables.RunnableSerializable`**: The base class for creating custom, chainable components in LCEL. Our new `GoogleVeoVideoGenerator` will inherit from this.
*   **`.invoke()` method**: The core method of our runnable. It will contain the logic to start the video generation and poll for its completion.
*   **Encapsulation**: We hide the complex `while` loop and API calls inside our runnable, exposing a simple interface that just takes a prompt and returns a result.

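To make the encapsulation idea concrete before the full Veo version, here is a minimal, library-free sketch of the same start/poll/return pattern. `FakeOperation`, `FakeClient`, and `PollingRunner` are hypothetical names invented for this illustration and are not part of LangChain or any Google SDK. The sketch also adds a timeout, which is an assumption on top of the workflow described above.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeOperation:
    """Hypothetical stand-in for a long-running API operation."""
    name: str
    done: bool = False
    result: Optional[str] = None


class FakeClient:
    """Hypothetical client whose operation finishes after a fixed number of polls."""

    def __init__(self, polls_needed: int = 3):
        self.polls_needed = polls_needed
        self.polls = 0

    def start(self, prompt: str) -> FakeOperation:
        return FakeOperation(name="operations/demo")

    def get(self, operation: FakeOperation) -> FakeOperation:
        self.polls += 1
        if self.polls >= self.polls_needed:
            return FakeOperation(operation.name, done=True, result="video-bytes")
        return operation


class PollingRunner:
    """Hides the start/poll/return loop behind a single invoke() call."""

    def __init__(self, client: FakeClient, interval_seconds: float = 0.01,
                 timeout_seconds: float = 5.0):
        self.client = client
        self.interval_seconds = interval_seconds
        self.timeout_seconds = timeout_seconds

    def invoke(self, input: dict) -> Optional[str]:
        operation = self.client.start(input["prompt"])
        deadline = time.monotonic() + self.timeout_seconds
        while not operation.done:
            if time.monotonic() > deadline:
                raise TimeoutError(f"{operation.name} did not finish in time")
            time.sleep(self.interval_seconds)
            operation = self.client.get(operation)
        return operation.result


client = FakeClient(polls_needed=3)
runner = PollingRunner(client)
result = runner.invoke({"prompt": "a short demo"})
print(result, "after", client.polls, "polls")
```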
---

### LangChain Equivalent Code with Custom Runnable

```python
import os
import time
from typing import Any, Dict, Optional

# Make sure to install the necessary libraries.
# Veo is exposed through the `google-genai` SDK (`from google import genai`),
# not the older `google-generativeai` package, which has no `genai.Client`.
# !pip install -q langchain google-genai

from google import genai
from google.genai.types import GeneratedVideo
from langchain_core.runnables import RunnableSerializable, RunnableConfig
# langchain-core 0.3+ is built on Pydantic v2, so import PrivateAttr from `pydantic`
from pydantic import PrivateAttr

# --- Step 1: Configure your API Key ---
# `genai.Client()` picks up the key from the GOOGLE_API_KEY environment variable.
# In Colab, copy it over from your secrets first:
# from google.colab import userdata
# os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')

if "GOOGLE_API_KEY" not in os.environ:
    raise EnvironmentError("GOOGLE_API_KEY environment variable not set.")


# --- Step 2: Create a Custom LangChain Runnable for Veo ---

class GoogleVeoVideoGenerator(RunnableSerializable):
    """
    A custom LangChain Runnable to generate videos using Google's Veo model.
    
    This runnable encapsulates the asynchronous workflow of:
    1. Starting the video generation task.
    2. Polling the operation until the video is ready.
    3. Returning the final generated video object.
    """
    model: str = "veo-3.0-generate-001"
    polling_interval_seconds: int = 10
    
    # Use PrivateAttr for the client to avoid serialization issues
    _client: Any = PrivateAttr()

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._client = genai.Client()

    def invoke(self, input: Dict, config: Optional[RunnableConfig] = None) -> Optional[GeneratedVideo]:
        """
        Takes a dictionary with a 'prompt' key and returns a GeneratedVideo object.
        """
        prompt = input.get("prompt")
        if not prompt:
            raise ValueError("Input dictionary must contain a 'prompt' key.")

        print(f"▶️ Starting video generation for prompt: '{prompt[:50]}...'")
        
        # 1. Start the asynchronous generation operation
        operation = self._client.models.generate_videos(
            model=self.model,
            prompt=prompt,
        )

        print(f"⏳ Operation initiated. Polling every {self.polling_interval_seconds} seconds...")

        # 2. Poll the operation status until the video is ready
        while not operation.done:
            print("   - Waiting for video to complete...")
            time.sleep(self.polling_interval_seconds)
            # The google-genai SDK expects the operation object itself here, not its name
            operation = self._client.operations.get(operation)

        # 3. Check for errors and get the result
        if operation.error:
            # `operation.error` may be a plain dict, so print it whole rather than `.message`
            print(f"❌ Video generation failed with error: {operation.error}")
            return None
            
        if not operation.response or not operation.response.generated_videos:
            print("❌ Operation completed but no video was returned.")
            return None

        print("✅ Video generation complete!")
        
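        # 4. Download the video bytes, then return the generated video object.
        # The Gemini API returns a file reference; downloading it first lets `.save()` write the bytes.
        generated_video = operation.response.generated_videos[0]
        self._client.files.download(file=generated_video.video)
        return generated_video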


# --- Step 3: Create a Helper Function to Save the Video ---
# This separates the generation logic from the saving logic.

def save_video_from_response(video_object: GeneratedVideo, output_filename: str):
    """Saves the generated video to a local file."""
    if not video_object:
        print("Cannot save video, the generation result is empty.")
        return
        
    print(f"💾 Downloading and saving generated video to '{output_filename}'...")
    try:
        # The video object has a 'video' attribute which is a File object
        video_file = video_object.video
        video_file.save(output_filename)
        print(f"✅ Video successfully saved to '{output_filename}'")
    except Exception as e:
        print(f"❌ An error occurred while saving the video: {e}")


# --- Step 4: Execute the Workflow ---

# The prompt from your example
prompt = """A close up of two people staring at a cryptic drawing on a wall, torchlight flickering.
A man murmurs, 'This must be it. That's the secret code.' The woman looks at him and whispering excitedly, 'What did you find?'"""

# Instantiate our custom LangChain runnable
video_generator_chain = GoogleVeoVideoGenerator()

# Invoke the chain. This single call will handle the start, polling, and completion.
# The input is a dictionary, which is standard for LCEL.
generated_video_result = video_generator_chain.invoke({"prompt": prompt})

# Use our helper to save the result
if generated_video_result:
    save_video_from_response(generated_video_result, "dialogue_example_langchain.mp4")

```

### How This LangChain Version Improves on the Original:

1.  **Reusability**: The `GoogleVeoVideoGenerator` is now a reusable component. You can plug it into any LangChain chain with the `|` operator, just like any other LLM.
2.  **Abstraction**: It hides the messy details of the `while` loop. The end-user only needs to call `.invoke()` with a prompt, making the main script much cleaner and easier to read.
3.  **LCEL Compatibility**: As a `Runnable`, you can now chain it with other components. For example, you could have a `ChatGoogleGenerativeAI` model generate a creative prompt first, convert its text output into the `{"prompt": ...}` dictionary that `invoke()` expects, and then pipe that into `video_generator_chain`.
4.  **Separation of Concerns**: The logic to *generate* the video is cleanly separated from the logic to *save* the video, which is a better software design practice.
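
The chaining point above depends on each step's output matching the next step's input. The following library-free sketch illustrates that with a small adapter step that turns a text prompt into the `{"prompt": ...}` dictionary the generator expects. `Step` is a hypothetical stand-in for a chainable component, not LangChain's `Runnable`, and the two `fake_*` functions are placeholders for a prompt-writing model and the video generator.

```python
from typing import Any, Callable


class Step:
    """Tiny stand-in for a chainable component (NOT LangChain's Runnable)."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` runs a first, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


def fake_prompt_writer(topic: str) -> str:
    """Placeholder for a model that writes a creative video prompt."""
    return f"A cinematic close-up shot of {topic}"


def to_generator_input(text: str) -> dict:
    """Adapter: wrap plain text in the dictionary shape the generator expects."""
    return {"prompt": text}


def fake_video_generator(input: dict) -> str:
    """Placeholder for the video generator; validates its input like the real one."""
    prompt = input.get("prompt")
    if not prompt:
        raise ValueError("Input dictionary must contain a 'prompt' key.")
    return f"<video for: {prompt}>"


chain = Step(fake_prompt_writer) | Step(to_generator_input) | Step(fake_video_generator)
output = chain.invoke("torchlight on a stone wall")
print(output)
```

In real LCEL code the adapter would typically be a `RunnableLambda` from `langchain_core.runnables`, placed between the prompt-writing model and the generator. Without it, the generator receives a string or message object instead of a dictionary and fails on `input.get("prompt")`.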