<a href="https://colab.research.google.com/github/michaelawe01/Automation_LPS/blob/main/LMS_VBU.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **LMS-VBU prototype**

In [1]:
!pip install transformers torch librosa


In [2]:
import torch
import librosa
import numpy as np
import pandas as pd
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

# Load model and processor
model_id = "openai/whisper-large-v3-turbo"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=dtype)
model.to(device)

# Audio config
audio_path = "audio.wav"
target_sr = 16000
chunk_duration_sec = 30
chunk_samples = chunk_duration_sec * target_sr

# Load and resample audio
audio, sr = librosa.load(audio_path, sr=target_sr)
chunks = [audio[i:i + chunk_samples] for i in range(0, len(audio), chunk_samples)]

# Store results
data = []

for i, chunk in enumerate(chunks):
    inputs = processor(chunk, sampling_rate=target_sr, return_tensors="pt")
    input_features = inputs["input_features"].to(device=device, dtype=dtype)

    with torch.no_grad():
        generated_ids = model.generate(input_features, max_new_tokens=444)
        transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

    # Add to dataframe with 30 char limit
    start_time = i * chunk_duration_sec
    end_time = min((i + 1) * chunk_duration_sec, len(audio) / target_sr)
    data.append({
        "Start Time (s)": round(start_time, 2),
        "End Time (s)": round(end_time, 2),
        "Transcript": transcription[:30]  # Limit to 30 characters
    })

# Create DataFrame
df = pd.DataFrame(data)
df


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


| | Start Time (s) | End Time (s) | Transcript |
|---|---|---|---|
| 0 | 0 | 30.0 | So in a way, we have to disti |
| 1 | 30 | 60.0 | she said that technical elect |
| 2 | 60 | 90.0 | If I go to the system, I will |
| 3 | 90 | 120.0 | it should be two |
| 4 | 120 | 150.0 | This is another concern which |
| 5 | 150 | 180.0 | So when they said that I shou |
| 6 | 180 | 210.0 | They put BIS in the first pla |
| 7 | 210 | 240.0 | yeah so you can put one or is |
| 8 | 240 | 270.0 | Okay, so let's move ahead. I |
| 9 | 270 | 300.0 | I've created two workers but |
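
The start and end times in the table follow directly from the fixed 30-second chunking in the cell above. A minimal sketch of that arithmetic, using a hypothetical helper `chunk_bounds` (not part of the notebook) and a made-up 75-second recording:

```python
def chunk_bounds(total_samples, sample_rate=16000, chunk_sec=30):
    """Return (start_s, end_s) pairs for fixed-length chunks; the last chunk may be shorter."""
    chunk_samples = chunk_sec * sample_rate
    duration = total_samples / sample_rate
    bounds = []
    for i, _ in enumerate(range(0, total_samples, chunk_samples)):
        start = i * chunk_sec
        # Clamp the final chunk to the real length of the recording
        end = min((i + 1) * chunk_sec, duration)
        bounds.append((round(start, 2), round(end, 2)))
    return bounds

# Hypothetical 75-second recording at 16 kHz
bounds = chunk_bounds(75 * 16000)
print(bounds)  # [(0, 30), (30, 60), (60, 75.0)]
```

Because the chunks do not overlap, a sentence that straddles a 30-second boundary is split between two rows.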


In [3]:
import torch
import librosa
import numpy as np
import pandas as pd
from datasets import Dataset
from transformers import pipeline, AutoProcessor, AutoModelForSpeechSeq2Seq

# Whisper setup
model_id = "openai/whisper-large-v3-turbo"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=dtype)
model.to(device)

# QA pipeline
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad", device=0 if device == "cuda" else -1)

# Load and chunk audio
audio_path = "audio.wav"
target_sr = 16000
chunk_duration_sec = 30
chunk_samples = chunk_duration_sec * target_sr

audio, sr = librosa.load(audio_path, sr=target_sr)
chunks = [audio[i:i + chunk_samples] for i in range(0, len(audio), chunk_samples)]

# Transcribe chunks
transcripts = []
for i, chunk in enumerate(chunks):
    inputs = processor(chunk, sampling_rate=target_sr, return_tensors="pt")
    input_features = inputs["input_features"].to(device=device, dtype=dtype)

    with torch.no_grad():
        generated_ids = model.generate(input_features, max_new_tokens=444)
        transcript = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

    transcripts.append({
        "start": round(i * chunk_duration_sec, 2),
        "end": round(min((i + 1) * chunk_duration_sec, len(audio) / target_sr), 2),
        "transcript": transcript
    })

# Dataset conversion
dataset = Dataset.from_list(transcripts)

# Extract info with QA
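def extract_info(example):
    def ask(question):
        # The QA pipeline can raise (for example on an empty transcript); fall back to an empty answer
        try:
            return qa(question=question, context=example["transcript"])["answer"].strip()
        except Exception:
            return ""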

    actor = ask("Who is speaking or mentioned?")
    project = ask("What is the project name?")
    issue = ask("What issue is being discussed?")

    return {
        "Actor": actor[:30],
        "Project": project[:30],
        "Issue": issue[:30]
    }

dataset = dataset.map(extract_info)

# Use surrounding context to assess resolution
def assess_issue_status(example_idx, dataset):
    # Get previous, current, next context
    prev = dataset[example_idx - 1]["transcript"] if example_idx > 0 else ""
    curr = dataset[example_idx]["transcript"]
    next_ = dataset[example_idx + 1]["transcript"] if example_idx < len(dataset) - 1 else ""

    full_context = f"{prev} {curr} {next_}"
    try:
        result = qa(question="Was the issue resolved?", context=full_context)
        # Compare whole words so that "unresolved" or "unsolved" does not count as a match
        words = set(result["answer"].lower().replace(".", " ").replace(",", " ").split())
        return "resolved" if words & {"yes", "resolved", "fixed", "done", "completed", "solved"} else "unresolved"
    except Exception:
        return "unresolved"

# Compute issue status for each row
issue_statuses = [assess_issue_status(i, dataset) for i in range(len(dataset))]
dataset = dataset.add_column("Issue Status", issue_statuses)

# Limit transcript too
dataset = dataset.map(lambda e: {"Transcript": e["transcript"][:30]})

# Final DataFrame
df = dataset.to_pandas()
df = df[["start", "end", "Transcript", "Actor", "Project", "Issue", "Issue Status"]]
df.columns = ["Start Time (s)", "End Time (s)", "Transcript", "Actor", "Project", "Issue", "Issue Status"]

df


Device set to use cuda:0


You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset

| | Start Time (s) | End Time (s) | Transcript | Actor | Project | Issue | Issue Status |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 30.0 | So in a way, we have to disti | all of them | we have to distinguish all of | we have to distinguish all of | unresolved |
| 1 | 30 | 60.0 | she said that technical elect | she said that technical electr | technical electrical | technical electrical | unresolved |
| 2 | 60 | 90.0 | If I go to the system, I will | somebody else is not here | CCE | we have a different way of tre | unresolved |
| 3 | 90 | 120.0 | it should be two | it should be two | it should be two | it should be two | unresolved |
| 4 | 120 | 150.0 | This is another concern which | Joshua | Joshua | Joshua should know about | unresolved |
| 5 | 150 | 180.0 | So when they said that I shou | people that are different | TG2000 That's why this is TG20 | people that are different | unresolved |
| 6 | 180 | 210.0 | They put BIS in the first pla | BIS | BIS | BIS in the first place | unresolved |
| 7 | 210 | 240.0 | yeah so you can put one or is | one or is that what you're say | yeah so you can put one | one | unresolved |
| 8 | 240 | 270.0 | Okay, so let's move ahead. I | Joshua | Bramble Energy Resources | We have Bramble Energy Resourc | unresolved |
| 9 | 270 | 300.0 | I've created two workers but | two workers | new the new | energy | unresolved |
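
Several rows above repeat the same span for Actor, Project, and Issue, which suggests the extractive QA model is returning low-confidence guesses. The `question-answering` pipeline returns a `score` alongside each `answer`, so one option is to blank out answers that fall below a threshold. A minimal sketch, where the helper name `confident_answer` and the 0.3 cutoff are assumptions to tune rather than values from the notebook, and the result dicts are made-up examples:

```python
def confident_answer(result, min_score=0.3, max_chars=30):
    """Return the truncated answer, or "" when the QA score is below min_score."""
    if result.get("score", 0.0) < min_score:
        return ""
    return result["answer"].strip()[:max_chars]

# Hypothetical results shaped like the question-answering pipeline output
strong = {"score": 0.82, "start": 0, "end": 6, "answer": "Joshua"}
weak = {"score": 0.04, "start": 0, "end": 16, "answer": "it should be two"}

print(repr(confident_answer(strong)))  # 'Joshua'
print(repr(confident_answer(weak)))    # ''
```

In `extract_info`, a helper like this would take the whole dict returned by `qa(...)` instead of reading `["answer"]` directly.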


# **Streamlit prototype**

In [4]:
!pip install streamlit pyngrok streamlit-webrtc transformers datasets librosa soundfile av --quiet

In [5]:
%%writefile app.py
import streamlit as st
import torch
import librosa
import pandas as pd
import tempfile
from datasets import Dataset
from transformers import pipeline, AutoProcessor, AutoModelForSpeechSeq2Seq
import soundfile as sf

# Title
st.title("🎙️ Meeting Transcript Analyzer")

uploaded_file = st.file_uploader("Upload a .wav file", type=["wav"])

if uploaded_file:
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_file:
        tmp_file.write(uploaded_file.read())
        audio_path = tmp_file.name

    st.info("Processing audio...")

    # Load models once; st.cache_resource keeps them in memory across Streamlit reruns
    model_id = "openai/whisper-large-v3-turbo"
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    @st.cache_resource
    def load_models():
        processor = AutoProcessor.from_pretrained(model_id)
        model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=dtype)
        model.to(device)
        qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad", device=0 if device == "cuda" else -1)
        return processor, model, qa

    processor, model, qa = load_models()

    # Audio config
    target_sr = 16000
    chunk_duration_sec = 30
    chunk_samples = chunk_duration_sec * target_sr
    audio, sr = librosa.load(audio_path, sr=target_sr)
    chunks = [audio[i:i + chunk_samples] for i in range(0, len(audio), chunk_samples)]

    transcripts = []
    for i, chunk in enumerate(chunks):
        inputs = processor(chunk, sampling_rate=target_sr, return_tensors="pt")
        input_features = inputs["input_features"].to(device=device, dtype=dtype)
        with torch.no_grad():
            generated_ids = model.generate(input_features, max_new_tokens=444)
            transcript = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
        transcripts.append({
            "start": round(i * chunk_duration_sec, 2),
            "end": round(min((i + 1) * chunk_duration_sec, len(audio) / target_sr), 2),
            "transcript": transcript
        })

    dataset = Dataset.from_list(transcripts)

    def extract_info(example):
        def ask(question):
            try:
                return qa(question=question, context=example["transcript"])["answer"].strip()
            except Exception:
                return ""
        return {
            "Actor": ask("Who is speaking or mentioned?")[:30],
            "Project": ask("What is the project name?")[:30],
            "Issue": ask("What issue is being discussed?")[:30]
        }

    dataset = dataset.map(extract_info)

    def assess_issue_status(example_idx, dataset):
        prev = dataset[example_idx - 1]["transcript"] if example_idx > 0 else ""
        curr = dataset[example_idx]["transcript"]
        next_ = dataset[example_idx + 1]["transcript"] if example_idx < len(dataset) - 1 else ""
        full_context = f"{prev} {curr} {next_}"
        try:
            result = qa(question="Was the issue resolved?", context=full_context)
            # Compare whole words so that "unresolved" or "unsolved" does not count as a match
            words = set(result["answer"].lower().replace(".", " ").replace(",", " ").split())
            return "resolved" if words & {"yes", "resolved", "fixed", "done", "completed", "solved"} else "unresolved"
        except Exception:
            return "unresolved"

    statuses = [assess_issue_status(i, dataset) for i in range(len(dataset))]
    dataset = dataset.add_column("Issue Status", statuses)
    dataset = dataset.map(lambda e: {"Transcript": e["transcript"][:30]})

    df = dataset.to_pandas()
    df = df[["start", "end", "Transcript", "Actor", "Project", "Issue", "Issue Status"]]
    df.columns = ["Start Time (s)", "End Time (s)", "Transcript", "Actor", "Project", "Issue", "Issue Status"]

    st.success("Done!")
    st.dataframe(df)
    st.download_button("Download CSV", df.to_csv(index=False), file_name="meeting_summary.csv", mime="text/csv")


Writing app.py


In [6]:
# Read the ngrok token from the NGROK_AUTHTOKEN environment variable (an assumed name) instead of hard-coding it
!ngrok authtoken "$NGROK_AUTHTOKEN"

from pyngrok import ngrok

# Kill any previous tunnels
ngrok.kill()

public_url = ngrok.connect(addr='8501')
print(f"🔗 Public URL: {public_url}")

# Run Streamlit app
!streamlit run app.py &




Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml
🔗 Public URL: NgrokTunnel: "https://622b-34-143-205-104.ngrok-free.app" -> "http://localhost:8501"

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://172.28.0.12:8501
  External URL: http://34.143.205.104:8501





2025-06-10 14:36:18.779203: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1749566178.803804    7832 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1749566178.811526    7832 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-06-10 14:36:23.523 Examining the path of torch.classes raised:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/streamlit/web/bootstrap.py", line 347, in run
    if asyncio.get_running_loop().is_running():
       ^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: no running event loop

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/loca

# **Gradio**

In [7]:
!pip install -q gradio transformers datasets librosa torchaudio accelerate soundfile

import gradio as gr
import torch
import librosa
import pandas as pd
import json
from transformers import pipeline, AutoProcessor, WhisperForConditionalGeneration

# Load models
whisper_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3-turbo")
whisper_processor = AutoProcessor.from_pretrained("openai/whisper-large-v3-turbo")
qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

device = "cuda" if torch.cuda.is_available() else "cpu"
whisper_model.to(device)

# DataFrame holder
output_df = pd.DataFrame()

def transcribe_and_analyze(audio_path):
    global output_df
    # Gradio passes None when the audio input is cleared
    if audio_path is None:
        return output_df
    audio, sr = librosa.load(audio_path, sr=16000)
    chunk_duration = 30
    step = int(chunk_duration * sr)
    data = []
    prev_context = ""  # full transcript of the previous chunk

    def ask(question, context):
        result = qa_pipeline(question=question, context=context)
        return result['answer'][:30]

    for i in range(0, len(audio), step):
        chunk = audio[i:i + step]
        # Skip trailing fragments shorter than 10,000 samples (about 0.6 s at 16 kHz)
        if len(chunk) < 10000:
            continue
        input_features = whisper_processor(chunk, sampling_rate=sr, return_tensors="pt").input_features.to(device)
        with torch.no_grad():
            predicted_ids = whisper_model.generate(input_features, max_new_tokens=100)
            transcript = whisper_processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

        start_time = i / sr
        end_time = min((i + step), len(audio)) / sr
        timestamp = f"{int(start_time)}s - {int(end_time)}s"
        short_transcript = transcript[:30]

        context = transcript

        actor = ask("Who is speaking?", context)
        project = ask("What project is being discussed?", context)
        issue = ask("What issue is mentioned?", context)

        # Use the previous chunk's full transcript, not its 30-character preview, as extra context
        issue_context = f"{prev_context} {context}".strip()
        issue_status_answer = ask("Is the issue resolved?", issue_context).lower()
        # Compare whole words so that "unresolved" does not count as "resolved"
        answer_words = set(issue_status_answer.replace(".", " ").replace(",", " ").split())
        issue_status = "resolved" if answer_words & {"yes", "resolved"} else "unresolved"

        data.append({
            "Timestamp": timestamp[:30],
            "Transcript": short_transcript,
            "Actor": actor[:30],
            "Project": project[:30],
            "Issue": issue[:30],
            "Issue Status": issue_status
        })
        prev_context = transcript

    output_df = pd.DataFrame(data)
    return output_df

# Save functions
def save_json():
    global output_df
    file_path = "data.json"
    with open(file_path, "w") as f:
        json.dump(output_df.to_dict(orient="records"), f, indent=2)
    return file_path

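def save_xml():
    global output_df
    from xml.sax.saxutils import escape  # escapes &, < and > in cell values
    file_path = "data.xml"
    xml_str = "<Meeting>\n"
    for _, row in output_df.iterrows():
        xml_str += "  <Chunk>\n"
        for key, value in row.items():
            # XML element names cannot contain spaces ("Issue Status" -> "Issue_Status")
            tag = str(key).replace(" ", "_")
            xml_str += f"    <{tag}>{escape(str(value))}</{tag}>\n"
        xml_str += "  </Chunk>\n"
    xml_str += "</Meeting>"
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(xml_str)
    return file_path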

def save_csv():
    global output_df
    file_path = "data.csv"
    output_df.to_csv(file_path, index=False)
    return file_path

# Gradio app
with gr.Blocks(css="""
.gradio-container {
    max-width: 900px;
    margin: auto;
}
""") as app:
    gr.Markdown("## 🎙️ LMS-VBU Meeting Summarisation", elem_id="title")
    gr.Markdown("Upload or record a meeting. Transcription and structured summary will appear below.")

    with gr.Column():
        audio_input = gr.Audio(type="filepath", label="🎤 Record or Upload Audio")
        output_table = gr.Dataframe(label="📊 Meeting Summary Table", wrap=True)

        # Save buttons
        with gr.Row():
            save_json_btn = gr.Button("💾 Save as JSON")
            save_xml_btn = gr.Button("💾 Save as XML")
            save_csv_btn = gr.Button("💾 Save as CSV")

        # Output files
        with gr.Row():
            json_file = gr.File(label="Download JSON")
            xml_file = gr.File(label="Download XML")
            csv_file = gr.File(label="Download CSV")

    # Interactions
    audio_input.change(fn=transcribe_and_analyze, inputs=audio_input, outputs=output_table)
    save_json_btn.click(fn=save_json, outputs=json_file)
    save_xml_btn.click(fn=save_xml, outputs=xml_file)
    save_csv_btn.click(fn=save_csv, outputs=csv_file)

app.launch()


Device set to use cuda:0


It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://4b605e964ba9cc0324.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


