## Convo Summary

In this notebook, we look at different types of LLMs and how they summarize audio transcripts. Specifically, we are interested in:
- How concisely each model summarizes the transcripts
- How understandable the summaries are, measured by how quickly a recipient can pick out some measure of "urgency"
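
One possible way to put a number on the first criterion is a compression ratio: the summary's word count divided by the transcript's word count. The sketch below is an assumption about how conciseness could be measured, not a metric this notebook computes. The helper name `compression_ratio` and the toy texts are hypothetical.

```python
def compression_ratio(transcript: str, summary: str) -> float:
    """Return summary length divided by transcript length, in whitespace-separated words."""
    transcript_words = len(transcript.split())
    summary_words = len(summary.split())
    if transcript_words == 0:
        raise ValueError("transcript is empty")
    return summary_words / transcript_words


# Hypothetical toy texts, not taken from the call transcripts
toy_transcript = "caller reports a gas smell in the kitchen and asks what to do next " * 5
toy_summary = "Caller reports a gas smell and asks for guidance."
print(f"Compression ratio: {compression_ratio(toy_transcript, toy_summary):.2f}")
```

A lower ratio means a more compressed summary. Word counts are only a rough proxy, since a short summary can still omit the main issue.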

In [None]:
#imports
import os
import openai
from transformers import pipeline

We store a list of prompts to reuse for every model.

In [None]:
concise_prompt = "Summarize the following audio transcript in 3–4 sentences. Focus on the main issue discussed, the speaker’s emotional state, and any signs of urgency or distress"
analytical_prompt = "Read this conversation transcript and provide a concise summary that identifies the key concern, emotional tone, and any notable escalation or risk indicators. Use professional and neutral language."
structured_prompt = "Summarize the transcript with the following format:\n- Main issue: \n- Emotional state: \n- Urgency level (low/medium/high):"

list_of_prompts = [concise_prompt, analytical_prompt, structured_prompt]

## ChatGPT

We start with summaries generated by GPT-4o, accessed through the OpenAI API.

In [None]:
# Set up the API key, which is stored in a separate environment file
openai.api_key = os.getenv("OPENAI_API_KEY")

In [None]:
# Function to prompt ChatGPT
def get_chatgpt_response(prompt):
    completion = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ]
    )
    return completion.choices[0].message.content


def store_chatgpt_responses(list_of_prompts):
    # Loop through each prompt
    for prompt_num, prompt in enumerate(list_of_prompts, start=1):
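        # Define output path for this prompt's results
        # (the folder is named "gpt-4", although the model called is gpt-4o)
        output_filename = f"Summaries/gpt-4/prompt{prompt_num}_summary.txt"
        os.makedirs(os.path.dirname(output_filename), exist_ok=True)  # create the folder if missing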
        
        # Create/open the file for writing all call summaries under this prompt
        with open(output_filename, "w", encoding="utf-8") as out_file:
            # Loop through all transcripts (call01–call05)
            for i in range(1, 6):
                filename = f"call{i:02d}.txt"
                filepath = f"Transcripts/{filename}"  # based on your folder structure

                with open(filepath, "r", encoding="utf-8") as f:
                    transcript_text = f.read()

                # Combine prompt with transcript
                full_prompt = f"{prompt}\n\nTranscript:\n{transcript_text}"
                response = get_chatgpt_response(full_prompt)

                # Write formatted summary to the output file
                out_file.write(f"=== Summary for {filename} ===\n")
                out_file.write(response.strip() + "\n\n")  # add space between summaries

            print(f"Saved all summaries for prompt {prompt_num} to {output_filename}")


We call the function to generate and store the summaries:

In [None]:
store_chatgpt_responses(list_of_prompts)

Saved all summaries for prompt 1 to Summaries/gpt-4/prompt1_summary.txt
Saved all summaries for prompt 2 to Summaries/gpt-4/prompt2_summary.txt
Saved all summaries for prompt 3 to Summaries/gpt-4/prompt3_summary.txt


Each text file shows how one prompt summarizes the five audio transcripts. We observe that the **structured_prompt** produces the **most organized and most easily understandable** summaries.
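
Because the structured prompt asks for a fixed `Urgency level (low/medium/high):` line, the urgency can in principle be read out of the saved files automatically. The following is a minimal sketch under that assumption: it relies on the `=== Summary for ... ===` headers written above and on the model actually following the requested format, which is not guaranteed. The helper names `split_summaries` and `extract_urgency` and the sample text are hypothetical.

```python
import re
from typing import Dict, Optional


def split_summaries(file_text: str) -> Dict[str, str]:
    """Map each transcript filename to its summary, using the '=== Summary for ... ===' headers."""
    parts = re.split(r"^=== Summary for (.+?) ===[ \t]*$", file_text, flags=re.MULTILINE)
    # With one capture group, re.split yields [preamble, name1, body1, name2, body2, ...]
    return {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}


def extract_urgency(summary: str) -> Optional[str]:
    """Return 'low', 'medium', or 'high' if an 'Urgency level' line is present, else None."""
    match = re.search(
        r"urgency level[^:\n]*:\s*\**\s*(low|medium|high)\b",
        summary,
        flags=re.IGNORECASE,
    )
    return match.group(1).lower() if match else None


# Hypothetical file contents for illustration, not real model output
sample = (
    "=== Summary for call01.txt ===\n"
    "- Main issue: gas smell in the kitchen\n"
    "- Emotional state: anxious\n"
    "- Urgency level (low/medium/high): High\n\n"
    "=== Summary for call02.txt ===\n"
    "The caller asks about a billing error.\n\n"
)

for name, body in split_summaries(sample).items():
    print(name, "->", extract_urgency(body))
```

A summary without the expected line yields `None`, which makes it easy to spot responses that ignored the requested format.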

## Hugging Face

In [None]:
# Define the models you want to use
HUGGINGFACE_MODELS = {
    "bart": "facebook/bart-large-cnn",
    "t5": "t5-base"
}
# Loop over each model
for model_name, model_id in HUGGINGFACE_MODELS.items():
    summarizer = pipeline("summarization", model=model_id)

    for prompt_num, prompt in enumerate(list_of_prompts, start=1):
        output_path = f"Summaries/{model_name}/prompt{prompt_num}_summary.txt"
        os.makedirs(f"Summaries/{model_name}", exist_ok=True)

        with open(output_path, "w", encoding="utf-8") as out_file:
            for i in range(1, 6):  # call01 to call05
                filename = f"call{i:02d}.txt"
                filepath = f"Transcripts/{filename}"

                with open(filepath, "r", encoding="utf-8") as f:
                    transcript_text = f.read()

                # Create the input to the summarization model
                input_text = f"{prompt}\n\nTranscript:\n{transcript_text}"

                # Let the tokenizer truncate the input to each model's own token limit.
                # Slicing the string (input_text[:1024]) would cut by characters, not tokens.
                summary = summarizer(input_text, truncation=True, max_length=150, min_length=30, do_sample=False)[0]["summary_text"]

                out_file.write(f"=== Summary for {filename} ===\n")
                out_file.write(summary.strip() + "\n\n")

            print(f"[{model_name.upper()}] Saved summaries for prompt {prompt_num} to {output_path}")


RuntimeError: At least one of TensorFlow 2.0 or PyTorch should be installed. To install TensorFlow 2.0, read the instructions at https://www.tensorflow.org/install/ To install PyTorch, read the instructions at https://pytorch.org/.
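
The error above indicates that neither PyTorch nor TensorFlow could be found in the environment, so the pipeline could not be created. As a rough sketch, a check like the following could run before the Hugging Face cell to confirm that a backend is importable. The helper name `available_backends` is hypothetical, and which backend to install is left to the reader.

```python
import importlib.util
from typing import List


def available_backends() -> List[str]:
    """Return the names of the deep learning backends importable in this environment."""
    candidates = {"PyTorch": "torch", "TensorFlow": "tensorflow"}
    return [
        name
        for name, module in candidates.items()
        if importlib.util.find_spec(module) is not None
    ]


backends = available_backends()
if backends:
    print("Available backends:", ", ".join(backends))
else:
    print("No backend found; install PyTorch or TensorFlow before creating the pipeline.")
```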