In [None]:
---
title: "Going to Have to Stop You Right There"
description: "Probabilistic rudeness for better discussions" 
author: "Eric Zou"
date: "9/15/2025"
categories:
  - LLMs
  - Conversations
---

## Randomness as Interjections
We're going to keep working on the authenticity and naturalness of the conversations we're generating. One issue with LLM conversations is how structured they are, even without incredibly formal language or long responses. Because each model gets to (and needs to) completely finish before the next one can speak, things that often happen in real conversations (interruptions, affirmations, etc.) never occur. The paradigm we explored [last time](https://ezou626.github.io/comm4190_F25_Using_LLMs_Blog/posts/003_approaching_a_true_discussion/Spontaneity.html), with random speaker ordering and potentially skipped speakers, did not change this.

We're going to try to add these things to hopefully make these conversations even more real.

In [7]:
# first, some boilerplate
from openai import OpenAI
import os
import base64
import requests
from tqdm import tqdm
from IPython.display import FileLink, display
from dotenv import load_dotenv
from random import shuffle, randint, choice, random
from math import floor
# Load API key
_ = load_dotenv("../../../comm4190_F25/01_Introduction_and_setup/.env")
client = OpenAI()

# changing the topic to make it a bit more conversational too and less of a debate
TOPIC = """Code, testing, and infra as a source of truth versus comprehensive documentation."""

# prompt to analyze conversations
EVALUATION_PROMPT = """
Your objective is to analyze this conversation between speakers.
Your response should follow this organization:
- A Brief Summary
- Final Outputs/Artifacts/Takeaways
- Characteristics/Dynamic (Competitive/Collaborative/etc.)
"""

def analyze_conversation(conversation: str):
    input_chat = [
        {
            "role": "system",
            "content": EVALUATION_PROMPT
        },
        {
            "role": "user",
            "content": "Here is the transcript\n" + conversation
        }
    ]
    response = client.chat.completions.create(
        model = "gpt-4o",
        messages = input_chat,
        store = False
    )
    print(response.choices[0].message.content)

# code to save the conversation
def save_conversation(
    filename: str,
    conversation_history: list[dict]
) -> str:

    messages = []

    for record in conversation_history:

        if record["role"] == "user":
            messages.append("mediator:\n" + record["content"])
        
        if record["role"] == "assistant":
            # single quotes inside the f-string keep this valid on Python versions before 3.12
            messages.append(f"{record['name']}:\n{record['content']}")
    
    conversation_transcript = "\n\n".join(messages)
    
    with open(filename, "w", encoding="utf-8") as f:
        f.write(conversation_transcript)
    
    display(FileLink(filename))

    return conversation_transcript

### Experiment 1: Cutoffs

In [21]:
NEW_SYSTEM_PROMPT = "You are a participant in a conversation between experienced software engineers. Keep questions minimal and only use them when necessary. Please greet the other participants when you join."
interruption_prompt = lambda x: f" You are interrupting speaker {x}." if x != -1 else ""

def run_conversation(
    iterations: int, 
    openai_model_id: str,
    participant_count: int,
    topic: str,
    system_prompt: str,
    dropout_chance: float,
    cutoff_chance: float
) -> list[dict]:
    # model 1 is the first speaker
    conversation_history = [
        {"role": "system", "content": system_prompt + " The topic is: " + topic}
    ]

    ordering = list(range(1, participant_count + 1))
    last_speaker = -1
    speaker_interrupted = -1

    for i in tqdm(range(iterations)):
        if i > 0:
            first = choice(ordering[:-1])
            remaining = [i for i in ordering if i != first]
            shuffle(remaining)
            ordering = [first] + remaining
        for model in ordering: # RANDOM ORDERING
            if random() < dropout_chance:
                continue # SKIP
            if last_speaker == model:
                continue
            speaker_id = f"speaker_{model}"
            response = client.chat.completions.create(
                model = openai_model_id,
                messages = conversation_history + [
                    {
                        "role": "user", 
                        "content": f"{speaker_id}, please share your perspective with the others and engage with the responses of the other participants." + interruption_prompt(speaker_interrupted)
                    }
                ],
                store = False
            )
            speaker_interrupted = -1
            message = response.choices[0].message.content

            # cutoff message
            if random() < cutoff_chance:
                chunks = message.split(" ")
                message = " ".join(chunks[:floor(len(chunks) * random())])
                speaker_interrupted = model
            
            conversation_history.append({"role": "assistant", "name": speaker_id, "content": message})
            last_speaker = model

    return conversation_history

conversation = run_conversation(8, 'gpt-4o', 3, TOPIC, NEW_SYSTEM_PROMPT, 0.5, 0.2)

100%|██████████| 8/8 [00:28<00:00,  3.59s/it]


In [22]:
conversation_1 = save_conversation("conversation_1.txt", conversation)

#### Analysis
We can now use AI to take a look at this conversation.

In [23]:
analyze_conversation(conversation_1)

- **A Brief Summary**: The conversation focuses on the balance between treating code and infrastructure as the source of truth versus relying on extensive documentation in a development environment. All speakers advocate for leaning towards the code-centric approach, highlighting automated tests and infrastructure definitions. However, they also acknowledge the necessity of certain high-level documentation for onboarding and providing context, especially about architectural decisions. Mentorship and interactive documentation platforms are mentioned as effective onboarding strategies.

- **Final Outputs/Artifacts/Takeaways**: The takeaways include the consensus on using code as the primary source of truth while maintaining some documentation for context. Key artifacts include the use of automated tools like Swagger, practices like Architecture Decision Records (ADRs), and platforms such as Confluence for maintaining "living" documentation. Also, integrating documentation updates into de

I think the interruptions aren't helping so much. For example, let's look at this section in the conversation:
> (speaker_3)
> Speaker_1's point about onboarding emphasizes that while self-documenting code is ideal, some introductory material will always be necessary to bring new team members up to speed efficiently. It’s about finding the right balance where both the code and key supporting documents align to provide clarity and understanding across the board. 
> How do others in the group
>
> speaker_1:
> Thanks for the reminder! To jump into the responses—speaker_3, your point about using tools like Swagger for APIs is spot on. It helps keep documentation from going stale by being inherently tied to the codebase. I've also seen success with ReadMe documentation that's auto-generated from comments embedded within code, though it requires developers to maintain those comments diligently.

Speaker 1 doesn't really acknowledge that they have cut someone off, despite the prompt indicating that this was the case. 

There are some cases where it is recognized though, and the speaker is able to continue off the other person's text.
> speaker_1:
> Apologies for the interruption earlier. Thanks for catching that!

In these cases, it definitely improves (from the subjective perspective of me as a reader) the spontaneity of the conversation.
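
One possible reason a cutoff goes unacknowledged is that the truncated message looks like any other message in the history. Below is a minimal sketch of a truncation helper that makes the cutoff visible. The `cut_off` name and the `--` marker are hypothetical and were not used in the runs above. It also keeps at least one word, so a cutoff never produces an empty message.

```python
from math import floor
from random import Random


def cut_off(message: str, rng: Random, marker: str = "--") -> str:
    """Keep a random non-empty prefix of the message and append a cutoff marker."""
    words = message.split(" ")
    if len(words) < 2:
        return message  # too short to cut anywhere
    # keep between 1 and len(words) - 1 words: never empty, always shorter
    keep = 1 + floor((len(words) - 1) * rng.random())
    return " ".join(words[:keep]) + marker


rng = Random(0)  # seeded only so the example is repeatable
print(cut_off("How do others in the group handle onboarding docs?", rng))
```

Whether a visible marker actually makes the next speaker acknowledge the interruption would need to be tested; this only changes what the model sees in the history.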

### Experiment 2: Affirmations
We can also see if affirmations do anything for the conversation. Our model of an affirmation is a short message from one speaker embedded in the middle of another speaker's message. We'll randomly select from a list of affirmations generated by ChatGPT (GPT-5 on the chatgpt.com website as of 9/20).
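
To make the embedding concrete, here is a small standalone sketch of the splitting step on its own, with no API calls. The `embed_interjection` helper is a hypothetical name used only for illustration; it produces history records in the same shape as the ones used in this post.

```python
from math import floor
from random import Random


def embed_interjection(
    speaker: str,
    interjector: str,
    message: str,
    interjection: str,
    rng: Random,
) -> list[dict]:
    """Split a message at a random word and place an interjection at the split."""
    words = message.split(" ")
    index = floor(len(words) * rng.random())  # 0 means the interjection comes first
    before = " ".join(words[:index])
    after = " ".join(words[index:])

    records = []
    if before:
        records.append({"role": "assistant", "name": speaker, "content": before})
    records.append({"role": "assistant", "name": interjector, "content": interjection})
    if after:
        records.append({"role": "assistant", "name": speaker, "content": after})
    return records


rng = Random(0)  # seeded only so the example is repeatable
for record in embed_interjection(
    "speaker_1", "speaker_3", "Peer reviews keep our docs consistent.", "Okay", rng
):
    print(f"{record['name']}: {record['content']}")
```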

In [31]:
affirmations = [
    "Okay",
    "Sure",
    "Yep",
    "Mm",
    "Absolutely",
    "Definitely",
    "For sure",
    "Sounds good",
    "Of course",
    "Right on"
]

def run_conversation_interjections(
    iterations: int, 
    openai_model_id: str,
    participant_count: int,
    topic: str,
    system_prompt: str,
    dropout_chance: float,
    interjection_chance: float,
    interjections: list[str]
) -> list[dict]:
    # model 1 is the first speaker
    conversation_history = [
        {"role": "system", "content": system_prompt + " The topic is: " + topic}
    ]

    ordering = list(range(1, participant_count + 1))
    last_speaker = -1
    speaker_interrupted = -1

    for i in tqdm(range(iterations)):
        if i > 0:
            first = choice(ordering[:-1])
            remaining = [i for i in ordering if i != first]
            shuffle(remaining)
            ordering = [first] + remaining
        for model in ordering: # RANDOM ORDERING
            if random() < dropout_chance:
                continue # SKIP
            if last_speaker == model:
                continue
            speaker_id = f"speaker_{model}"
            response = client.chat.completions.create(
                model = openai_model_id,
                messages = conversation_history + [
                    {
                        "role": "user", 
                        "content": f"{speaker_id}, please share your perspective with the others and engage with the responses of the other participants."
                    }
                ],
                store = False
            )
            message = response.choices[0].message.content

            # affirm message
            if random() < interjection_chance:
                chunks = message.split(" ")
                interjection_index = floor(len(chunks) * random())
                message_1 = " ".join(chunks[:interjection_index])
                message_2 = " ".join(chunks[interjection_index:])
                interjection = choice(interjections)
                # randomly pick an interjector
                interjection_model = f"speaker_{choice([i for i in range(1, participant_count + 1) if i != model])}"
                if message_1:
                    conversation_history.append({"role": "assistant", "name": speaker_id, "content": message_1})
                conversation_history.append({"role": "assistant", "name": interjection_model, "content": interjection})
                if message_2:
                    conversation_history.append({"role": "assistant", "name": speaker_id, "content": message_2})
            else:
                conversation_history.append({"role": "assistant", "name": speaker_id, "content": message})
            
            last_speaker = model

    return conversation_history

In [26]:
conversation = run_conversation_interjections(8, 'gpt-4o', 3, TOPIC, NEW_SYSTEM_PROMPT, 0.5, 0.2, affirmations)
conversation_2 = save_conversation("conversation_2.txt", conversation)

100%|██████████| 8/8 [00:44<00:00,  5.60s/it]


#### Analysis
Let's first generate an AI summary.

In [27]:
analyze_conversation(conversation_2)

- **Brief Summary**: The conversation involves three speakers discussing the role of code, tests, infrastructure as code, and documentation in development workflows. They explore balancing code as the source of truth with the need for human-readable documentation, especially for onboarding and communication with non-technical stakeholders. They also discuss strategies to integrate documentation effectively into workflows and the possibility of using documentation sprints or hackathons.

- **Final Outputs/Artifacts/Takeaways**: Key takeaways include the acknowledgment that while code and tests should be the primary sources of truth due to their inherent reliability, documentation plays a crucial role in explaining the rationale behind code, particularly for onboarding and broader communication. There is agreement on the importance of integrating documentation efforts into standard development practices using tools that automate document generation and conducting regular documentation re

It seems adding the interjections doesn't do much for the conversation itself, but it certainly "looks" more like a regular conversation now at a glance because these kinds of things are normal to see in a spoken setting. 
> One tactic I've found effective is peer reviews for documentation updates, much like code reviews. This ensures any significant changes in the codebase get reflected in the accompanying documentation with a second pair of eyes checking for
>
> speaker_3:
> Okay
>
> speaker_1:
> consistency. Implementing this as a standard part of our workflow has significantly improved the reliability and relevance of our documentation.

I think one limitation of this approach is that, since both the interjection and its position are chosen completely at random, the interjector may end up affirming a point that contradicts their own earlier position. To control for this, we can use neutral interjections.

### Experiment 3: Other Interruptions
I had ChatGPT (GPT-5, on 9/20) generate a list of 10 neutral interjections that we can select from randomly. These should blend better with the average speaker position.

In [32]:
discussion_interjections = [
    "I see",
    "Fair enough",
    "Makes sense",
    "Right",
    "Got it",
    "Noted",
    "True",
    "All right",
    "That’s something",
    "Point taken"
]

In [35]:
conversation = run_conversation_interjections(8, 'gpt-4o', 3, TOPIC, NEW_SYSTEM_PROMPT, 0.5, 0.2, discussion_interjections)
conversation_3 = save_conversation("conversation_3.txt", conversation)

100%|██████████| 8/8 [01:45<00:00, 13.23s/it]


#### Analysis
Let's see how this conversation fares.

In [36]:
analyze_conversation(conversation_3)

- **Brief Summary**: The conversation revolves around the balance between using code and tests as primary sources of truth in software projects versus maintaining documentation for broader understanding and context. The speakers agree on the importance of both, emphasizing integrated practices for documentation updates and the use of tools such as ADRs, JSDoc, and Sphinx for maintaining relevant documentation.

- **Final Outputs/Artifacts/Takeaways**: 
  - Code and tests are considered the most accurate reflection of the system, given their evolving nature.
  - Documentation should focus on context, rationale, and high-level architecture.
  - Establishing documentation as part of the development workflow can help keep it relevant.
  - ADRs are valuable for capturing architectural decisions.
  - Prioritizing documentation based on complexity and impact is a common strategy.
  - The use of structured documentation tools aids in seamless documentation upkeep.
  - Regular checkpoints and r

I think the interjections are sometimes a little misplaced, even considering that these would be simultaneous in a real conversation with the other person's words. 

> (speaker_2)
> How do you all handle the challenge of ensuring that documentation remains accessible and easily digestible, especially when integrated closely with the code? Do you find structured documentation tools
>
> speaker_3:
> Point taken
>
> speaker_2:
> helpful, or do you rely more on traditional documentation practices?

I think some of the issues with the affirmation approach are still present. I think a next step in a future blog may be to take a look at how we can place these interjections a little more smartly, given what the speaker themselves would know about the conversation.

### Experiment 4: Combining Cutoffs and Interruptions
The last thing we'll take a look at is the combination of both of these approaches. For every message, there is a chance it will be cut off, interjected, or both. I don't expect anything super surprising from this conversation. 
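
The event for each message is picked from a single random draw compared against cumulative thresholds, so the outcomes are mutually exclusive and "both" is its own slice of probability. A minimal sketch of just that selection step (the `sample_event` helper and its string labels are hypothetical names for illustration):

```python
from collections import Counter
from random import Random


def sample_event(
    rng: Random,
    interjection_chance: float,
    cutoff_chance: float,
    both_chance: float,
) -> str:
    """Pick one event per message; the three chances should sum to at most 1."""
    value = rng.random()
    if value < interjection_chance:
        return "interject"
    if value < interjection_chance + cutoff_chance:
        return "cutoff"
    if value < interjection_chance + cutoff_chance + both_chance:
        return "both"
    return "none"


rng = Random(0)  # seeded only so the example is repeatable
counts = Counter(sample_event(rng, 0.2, 0.2, 0.1) for _ in range(10_000))
print(counts)  # roughly 20% interject, 20% cutoff, 10% both, 50% none
```

With chances of 0.2, 0.2, and 0.1, about half of the messages are left untouched.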

In [39]:
all_affirmations = affirmations + discussion_interjections

def run_conversation_interrupt_interject(
    iterations: int, 
    openai_model_id: str,
    participant_count: int,
    topic: str,
    system_prompt: str,
    dropout_chance: float,
    cutoff_chance: float,
    interjection_chance: float,
    both_chance: float,
    interjections: list[str]
) -> list[dict]:
    # model 1 is the first speaker
    conversation_history = [
        {"role": "system", "content": system_prompt + " The topic is: " + topic}
    ]

    ordering = list(range(1, participant_count + 1))
    last_speaker = -1
    speaker_interrupted = -1

    for i in tqdm(range(iterations)):
        if i > 0:
            first = choice(ordering[:-1])
            remaining = [i for i in ordering if i != first]
            shuffle(remaining)
            ordering = [first] + remaining
        for model in ordering: # RANDOM ORDERING
            if random() < dropout_chance:
                continue # SKIP
            if last_speaker == model:
                continue
            speaker_id = f"speaker_{model}"
            response = client.chat.completions.create(
                model = openai_model_id,
                messages = conversation_history + [
                    {
                        "role": "user", 
                        "content": f"{speaker_id}, please share your perspective with the others and engage with the responses of the other participants." + interruption_prompt(speaker_interrupted)
                    }
                ],
                store = False
            )
            message = response.choices[0].message.content
            speaker_interrupted = -1

            # affirm message
            value = random()
            if value < interjection_chance:
                chunks = message.split(" ")
                interjection_index = floor(len(chunks) * random())
                message_1 = " ".join(chunks[:interjection_index])
                message_2 = " ".join(chunks[interjection_index:])
                interjection = choice(interjections)
                # randomly pick an interjector
                interjection_model = f"speaker_{choice([i for i in range(1, participant_count + 1) if i != model])}"
                if message_1:
                    conversation_history.append({"role": "assistant", "name": speaker_id, "content": message_1})
                conversation_history.append({"role": "assistant", "name": interjection_model, "content": interjection})
                if message_2:
                    conversation_history.append({"role": "assistant", "name": speaker_id, "content": message_2})
            elif value < interjection_chance + cutoff_chance:
                chunks = message.split(" ")
                message = " ".join(chunks[:floor(len(chunks) * random())])
                conversation_history.append({"role": "assistant", "name": speaker_id, "content": message})
                speaker_interrupted = model
            elif value < interjection_chance + cutoff_chance + both_chance:
                chunks = message.split(" ")
                interjection_index = floor(len(chunks) * random())
                message_1 = " ".join(chunks[:interjection_index])
                interjection = choice(interjections)
                # randomly pick an interjector
                interjection_model = f"speaker_{choice([i for i in range(1, participant_count + 1) if i != model])}"
                if message_1:
                    conversation_history.append({"role": "assistant", "name": speaker_id, "content": message_1})
                conversation_history.append({"role": "assistant", "name": interjection_model, "content": interjection})
                speaker_interrupted = model
            else:
                conversation_history.append({"role": "assistant", "name": speaker_id, "content": message})
            
            last_speaker = model

    return conversation_history

In [42]:
conversation = run_conversation_interrupt_interject(8, 'gpt-4o', 3, TOPIC, NEW_SYSTEM_PROMPT, 0.5, 0.2, 0.2, 0.1, discussion_interjections)
conversation_4 = save_conversation("conversation_4.txt", conversation)

100%|██████████| 8/8 [00:40<00:00,  5.10s/it]


#### Analysis
I'll skip the AI analysis for this one. It feels like more of the same as what we've been seeing before. I think for word-granularity interruptions driven by randomness, this might be close to the limit.

## Closing Remarks
In general, while these seem to make the conversation appear more natural from a bird's eye view, there are evidently issues with the approach. 

First, because LLMs operate on text, the interjections introduced here don't really model the way an interjection appears in real-life conversations: here the interjection is a separate message that splits the speaker's text, whereas in speech it overlaps with the speaker's words. This could make interjections less effective for their intended purpose, or worse, interrupt the conversation flow. A potential remedy would be to reframe the problem as an online chatroom and only allow interjections at punctuation marks like periods or at paragraph breaks. In that way, it's much more like a chat conversation where messages are delivered in whole pieces without a chance of interruption in the middle.
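
As a rough sketch of the chatroom idea, the split point could be restricted to sentence or paragraph boundaries instead of arbitrary words. The `split_at_boundary` helper and its regular expression are assumptions for illustration and have not been run against the conversations in this post.

```python
import re
from random import Random

# whitespace that follows sentence-ending punctuation, or a blank-line paragraph break
BOUNDARY = re.compile(r"(?<=[.!?])\s+|\n{2,}")


def split_at_boundary(message: str, rng: Random) -> tuple[str, str]:
    """Split a message at a randomly chosen sentence or paragraph boundary."""
    cuts = [m.end() for m in BOUNDARY.finditer(message) if m.end() < len(message)]
    if not cuts:
        return message, ""  # no safe place to interject
    cut = rng.choice(cuts)
    return message[:cut].rstrip(), message[cut:]


rng = Random(0)  # seeded only so the example is repeatable
before, after = split_at_boundary(
    "Docs go stale quickly. Tests do not. How do you all handle that?", rng
)
print(before)
print("speaker_3: Right")
print(after)
```

A pattern this simple will also treat abbreviations such as "e.g. " as sentence ends, so a real implementation would likely need a more careful sentence splitter.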

Second, I think there's also an opportunity to develop speaker personas to enforce consistent actions across multiple messages. When speakers are prompted, there is a chance they may not remember what they are trying to work on, and there's no way in the current simulation environment to differentiate one speaker from the other except in the order in which they speak. Maintaining a consistent state for each speaker could go a long way in ensuring better defined speakers, and as a result, more diverse conversations. Additionally, we may be able to use these personas to inform actions like interruptions or cutoffs.
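
One way to picture per-speaker state is a small record that travels with each speaker and gets folded into that speaker's prompt. This is only a sketch under assumptions: the `Persona` class, its fields, and the prompt wording are all hypothetical and are not part of the simulation above.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    """Hypothetical per-speaker state carried across turns."""
    name: str
    stance: str
    notes: list[str] = field(default_factory=list)

    def remember(self, note: str, limit: int = 3) -> None:
        """Store a note and keep only the most recent `limit` notes (limit >= 1)."""
        self.notes.append(note)
        self.notes = self.notes[-limit:]

    def prompt(self) -> str:
        """Build the user prompt for this speaker's next turn."""
        lines = [
            f"{self.name}, please share your perspective with the others.",
            f"Your position so far: {self.stance}",
        ]
        if self.notes:
            lines.append("Things you said earlier: " + "; ".join(self.notes))
        return "\n".join(lines)


persona = Persona("speaker_2", "code and tests are the primary source of truth")
persona.remember("suggested ADRs for architectural decisions")
print(persona.prompt())
```

The same state could later inform whether a speaker interrupts or affirms, instead of leaving that entirely to chance.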

> **Future Work:**
> - There's definitely an opportunity to develop speaker-level context that can be delivered in the message generation prompt. This would allow a richer representation of the speaker's internal thoughts, and this could even be managed by an LLM.
> - There's also an opportunity to make some of these random events motivated by this context or the conversation, asking the AI if it wants to take a specific action given its current position.
> - It might be cool to have a general framework that considers all of these possible scenarios and allows for the simulation of these kinds of conversations with different models, parameters, etc. 