# Binary Classification
Takes in all transcript text and organizes each call into one of two categories

### E.x. Questions
Engagement / Qualification
- Did the prospect express interest in learning more?
- Did the prospect agree to a follow-up meeting/demo?
- Did the prospect mention having purchasing authority?
- Did the call uncover a clear business pain point?

Objections / Barriers
- Did the prospect raise an objection?
- Was pricing specifically discussed?
- Did the prospect explicitly reject the offer?

Sales Process Steps
- Was a specific product/service mentioned?
- Did the rep attempt to close the deal (e.g., ask for commitment)?
- Was a next step explicitly scheduled (follow-up call, send contract, etc.)?

Customer Sentiment
- Did the customer sound satisfied/positive about the offering?
- Did the prospect express dissatisfaction or frustration?

In [27]:
import pandas as pd  
from typing import List
from pydantic import BaseModel
from openai import OpenAI
from dotenv import load_dotenv
import os
from concurrent.futures import ThreadPoolExecutor

# Load environment
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

client = OpenAI(api_key=api_key)

# Define schema for structured output
class BinaryOutcome(BaseModel):
    call_id: str
    binary_explanation: str
    binary_label: str

# -------------------------------
# Helper function for one row
# -------------------------------
def process_row_binary(row,
                       context_prompt,
                       input_data,
                       positive_label,
                       negative_label,
                       client):
    call_id = row["call_id"]
    transcript = row[input_data]

    messages = [
        {
            "role": "system",
            "content": f"""
            You are a strict binary classifier. You must answer the user's binary question using evidence from the transcript. 
            A 'yes' requires evidence that satisfies the question. 
            A 'no' means the evidence is missing, unclear, or contradictory. 
            Don't make large assumptions. 
            Always return valid JSON matching the schema.""",
        },
        {
            "role": "user",
            "content": f"""

            Question: {context_prompt}
            Transcript: {transcript}
            
            Respond ONLY with:
            - binary_explanation: reason why it was labeled yes or no, referencing evidence from the transcript
            - binary_label: "{positive_label}" or "{negative_label}"
            """,
        },
    ]

    try:
        response = client.responses.parse(
            model="gpt-4o-mini",
            input=messages,
            text_format=BinaryOutcome,
            temperature=0,
            max_output_tokens=300,
        )

        parsed: BinaryOutcome = response.output_parsed
        parsed.call_id = call_id

    except Exception as e:
        print(f"Failed to parse {call_id}: {e}")
        parsed = BinaryOutcome(
            call_id=call_id,
            binary_explanation="No explanation found",
            binary_label=negative_label,
        )

    return parsed.model_dump()

# -------------------------------
# Parallel classifier
# -------------------------------
def binary_classifier_parallel(df,
                               context_prompt,
                               input_data="call_text",
                               positive_label="true",
                               negative_label="false",
                               explanation_col="binary_explanation",
                               label_col="binary_label",
                               max_workers=8) -> pd.DataFrame:
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(
                process_row_binary,
                row,
                context_prompt,
                input_data,
                positive_label,
                negative_label,
                client
            )
            for _, row in df.iterrows()
        ]
        results = [f.result() for f in futures]

    # Convert list of dicts into DataFrame and merge
    results_df = pd.DataFrame(results)
    results_df = results_df.rename(columns={
        "binary_explanation": explanation_col,
        "binary_label": label_col,
    })

    df = df.merge(results_df, on="call_id", how="left")
    return df


In [28]:
import pandas as pd

df = pd.read_csv("5TestCalls.csv")
df.head(10) 

Unnamed: 0,call_id,call_text
0,Call1,"[Agent] ""Thank you for choosing Optimum Busine..."
1,Call10,"[Agent] ""Thank you for choosing Optimum Busine..."
2,Call100,"[Agent] ""Good morning. Thank you for calling O..."
3,Call101,"[Agent] ""Good morning. Thank you for calling O..."
4,Call102,"[Agent] ""Hold on one second, hold on, do not d..."


In [32]:
context_prompt = "Was pricing mentioned?"

results = binary_classifier_parallel(
    df,
    context_prompt,
    positive_label="yes",
    negative_label="no",
    explanation_col="explanation",
    label_col="label",
    max_workers=5  # tune this for API limits
)

display(results)

Unnamed: 0,call_id,call_text,explanation,label
0,Call1,"[Agent] ""Thank you for choosing Optimum Busine...",Pricing was explicitly mentioned when the agen...,yes
1,Call10,"[Agent] ""Thank you for choosing Optimum Busine...",Pricing was explicitly mentioned multiple time...,yes
2,Call100,"[Agent] ""Good morning. Thank you for calling O...",Pricing was explicitly mentioned multiple time...,yes
3,Call101,"[Agent] ""Good morning. Thank you for calling O...",Pricing was explicitly mentioned when the agen...,yes
4,Call102,"[Agent] ""Hold on one second, hold on, do not d...",Pricing was explicitly mentioned in the transc...,yes


In [33]:
results.to_csv("pricing.csv", index=False)

# Open Ended Classification
Ask a question on all rows separately and add distinct responses to a new column


In [4]:
import pandas as pd
from pydantic import BaseModel
from openai import OpenAI
from dotenv import load_dotenv
import os
from concurrent.futures import ThreadPoolExecutor

# Load Environment
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

client = OpenAI(api_key=api_key)


class OpenEnded(BaseModel):
    call_id: str
    open_response: str


def process_row_open_ended(row, context_prompt, input_data, response_col, client):
    call_id = row["call_id"]
    transcript = row[input_data]
    messages = [
        {
            "role": "system",
            "content": "You are an assistant answering open-ended questions about sales calls. Always return valid JSON that matches the schema. Do not include extra text.",
        },
        {
            "role": "user",
            "content": f"""Question: {context_prompt}
Transcript: {transcript}

Respond with:
- open_response: answer to the question, using evidence from the transcript (1-2 sentences, escape quotes)
""",
        },
    ]
    try:
        response = client.responses.parse(
            model="gpt-4o-mini",
            input=messages,
            text_format=OpenEnded,
            temperature=0,
            max_output_tokens=300
        )
        parsed = response.output_parsed
        parsed.call_id = call_id
    except Exception as e:
        print(f"Failed to parse {call_id}: {e}")
        parsed = OpenEnded(call_id=call_id, open_response="No answer found")
    return parsed.model_dump()



def open_ended_parallel(df,
                        context_prompt: str,
                        input_data="call_text",
                        response_col="open_response",
                        max_workers=8) -> pd.DataFrame:
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(process_row_open_ended, row, context_prompt, input_data, response_col, client)
            for _, row in df.iterrows()
        ]
        results = [future.result() for future in futures]
    results_df = pd.DataFrame(results)
    results_df = results_df.rename(columns={"open_response": response_col})
    df = df.merge(results_df, left_on="call_id", right_on="call_id", how="left")
    return df


In [5]:
context_prompt = "How could the sales technician improve the result of the call?"

df = open_ended_parallel(
    df,
    context_prompt,
    response_col="sales_technician_improvement",
    max_workers=5  # adjust for API limits
)


display(df)

Unnamed: 0,call_id,call_text,sales_technician_improvement
0,Call1,"[Agent] ""Thank you for choosing Optimum Busine...",The sales technician could improve the result ...
1,Call10,"[Agent] ""Thank you for choosing Optimum Busine...",The sales technician could improve the result ...
2,Call100,"[Agent] ""Good morning. Thank you for calling O...",The sales technician could improve the result ...
3,Call101,"[Agent] ""Good morning. Thank you for calling O...",The sales technician could improve the result ...
4,Call102,"[Agent] ""Hold on one second, hold on, do not d...",The sales technician could improve the result ...


In [8]:
df.to_csv("open_ended_results.csv", index=False)

# Summarizer
Use prompt to iterate over a column and summarize all text from those columns

In [33]:
from pydantic import BaseModel


class SummarizationOutput(BaseModel):
    summary: str
    explanation: str

def summarize_column(df, 
                   context_prompt, 
                   target_col,
                   id_col) -> str:
    all_text = "\n\n".join([f"{row[id_col]}: {row[target_col]}" for _, row in df.iterrows()])
    messages = [
        {
            "role": "system",
            "content": "You are an assistant that summarizes structured outputs from previous analysis. "
            "Always return valid JSON matching the schema. Be concise.",
        },
        {
            "role": "user",
            "content": f"""
            Context: {context_prompt}
            Data:
            {all_text}
            Respond with:
            - summary: summary text of all entries. Be concise.
            - explanation: brief explanation of how you summarized
            """,
        },
    ]

    try:
        response = client.responses.parse(
            model="gpt-4o-mini",
            input=messages,
            text_format=SummarizationOutput,
            temperature=0,
            max_output_tokens=500,
        )
        output = response.output_parsed
    except Exception as e:
        print(f"Failed to parse summary: {e}")
        output = SummarizationOutput(
            input_text=all_text,
            summary="",
            explanation="Parsing failed"
        )
    return output.model_dump()

In [34]:
import json

prompt = "Summarize sales technician improvement areas. Make logical claims about best practices and adjustments agents should do."
column = "sales_technician_improvement"

result = summarize_column(df, prompt, target_col=column, id_col="call_id")

print(json.dumps(result, indent=2))

{
  "summary": "Sales technicians should focus on improving their communication by actively summarizing key points, confirming customer needs, and providing clear explanations about services and pricing. They should listen attentively to customer concerns, emphasize the benefits of proposed upgrades, and clarify installation timelines to reduce confusion. Specific examples and reassurances about service reliability can enhance customer confidence.",
  "explanation": "The summary consolidates the main improvement areas identified across the calls, highlighting the importance of active listening, clear communication, and customer reassurance. Each call's suggestions were distilled into broader best practices for sales technicians."
}


# Table Size Splitting
Iterate through a column and add the number of tokens in that row to a buffer, continue until we reach max_token size and then add buffer contents to return list. Reset and continue until end of dataframe

In [19]:
from typing import List
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    """
    Count tokens in a string using OpenAI's tiktoken for a specific model.
    """
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def table_size_splitting(
    df,
    target_col: str,
    max_tokens: int,
    buffer_size: int
) -> List[int]:
    """
    Returns a list of indices (row numbers) where the DataFrame should be split
    so that each chunk does not exceed max_tokens (including buffer per row).
    """
    split_indices = []
    current_tokens = 0

    for idx, row in df.iterrows():
        row_tokens = count_tokens(row[target_col]) + buffer_size
        current_tokens += row_tokens

        if current_tokens > max_tokens:
            split_indices.append(idx - 1 if idx > 0 else 0)
            current_tokens = row_tokens  # Start new chunk with current row

    return split_indices

In [32]:
splits = table_size_splitting(df, target_col="customer_receptive_response", max_tokens=250, buffer_size=10)
print(splits)  # e.g., [12, 25, 39]

[3]


In [31]:
splits = table_size_splitting(df, target_col="customer_receptive_response", max_tokens=250, buffer_size=10)

start = 0
for i, end in enumerate(splits):
    chunk = df.iloc[start:end+1]
    # Calculate total tokens for this chunk
    total_tokens = sum(count_tokens(row["customer_receptive_response"]) + 10 for _, row in chunk.iterrows())
    print(f"\n--- Chunk {i+1} (rows {start} to {end}) | Total tokens (with buffer): {total_tokens} ---")
    display(chunk)  # Use display() in Jupyter, or print(chunk) in scripts
    start = end + 1

# Print any remaining rows after the last split
if start < len(df):
    chunk = df.iloc[start:]
    total_tokens = sum(count_tokens(row["customer_receptive_response"]) + 10 for _, row in chunk.iterrows())
    print(f"\n--- Chunk {len(splits)+1} (rows {start} to {len(df)-1}) | Total tokens (with buffer): {total_tokens} ---")
    display(chunk)


--- Chunk 1 (rows 0 to 3) | Total tokens (with buffer): 213 ---


Unnamed: 0,call_id,call_text,customer_receptive_question,customer_receptive_response
0,Call1,"[Agent] ""Thank you for choosing Optimum Busine...",Did the customer seem receptive to the sales t...,"Yes, the customer appeared receptive to the sa..."
1,Call10,"[Agent] ""Thank you for choosing Optimum Busine...",Did the customer seem receptive to the sales t...,The customer appeared receptive to the sales t...
2,Call100,"[Agent] ""Good morning. Thank you for calling O...",Did the customer seem receptive to the sales t...,The customer appeared somewhat receptive but u...
3,Call101,"[Agent] ""Good morning. Thank you for calling O...",Did the customer seem receptive to the sales t...,"Yes, the customer seemed receptive to the sale..."



--- Chunk 2 (rows 4 to 4) | Total tokens (with buffer): 49 ---


Unnamed: 0,call_id,call_text,customer_receptive_question,customer_receptive_response
4,Call102,"[Agent] ""Hold on one second, hold on, do not d...",Did the customer seem receptive to the sales t...,"Yes, the customer appeared receptive to the sa..."


# Comparator
Given multiple dataframes (usually post splitter) analyze and compare shared columns to find similarities, differences, and key findings

In [43]:
import pandas as pd
from typing import List, Dict
from pydantic import BaseModel
from openai import OpenAI
from dotenv import load_dotenv
import os

# Load environment
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

client = OpenAI(api_key=api_key)

# -------------------------------
# Schema for structured comparison
# -------------------------------
class GroupSummary(BaseModel):
    group_name: str
    summary: str

class ComparisonOutput(BaseModel):
    introduction: str
    key_findings: List[str]
    similarities: List[str]
    differences: List[str]
    group_summaries: List[GroupSummary]

# -------------------------------
# Comparison function
# -------------------------------
def comparison_function(grouped_dfs: Dict[str, pd.DataFrame],
                        columns_to_analyze: List[str],
                        context_prompt: str) -> Dict:
    """
    Compare multiple groups of dataframes directly against each other.

    Args:
        grouped_dfs: dictionary of {group_name: dataframe}
        columns_to_analyze: list of column names to analyze
        context_prompt: guiding analysis question (e.g. "What makes sales calls effective?")
    
    Returns:
        Dict with overall comparison, similarities, differences, and group summaries
    """
    # Build compact summaries for each group
    group_texts = {}
    for group_name, df in grouped_dfs.items():
        sample_text = df[columns_to_analyze].astype(str).apply(lambda row: " | ".join(row), axis=1)
        combined_text = "\n".join(sample_text.tolist()[:150])  # cap records per group
        group_texts[group_name] = combined_text

    # Prepare prompt with all groups
    group_descriptions = "\n\n".join(
        [f"### {name}:\n{txt}" for name, txt in group_texts.items()]
    )

    messages = [
        {
            "role": "system",
            "content": """
            You are an expert data analyst. Your job is to compare groups of records,
            identify similarities and differences, and explain what variables contribute to outcomes.
            Always return valid JSON matching the schema.
            """,
        },
        {
            "role": "user",
            "content": f"""
            Context: {context_prompt}

            Here are the grouped records (truncated samples shown for each):

            {group_descriptions}

            Please provide:
            - overall_comparison: a narrative comparing all groups directly
            - similarities: what traits or variables appear across groups
            - differences: what distinguishes successful vs unsuccessful outcomes
            - group_summaries: return as a JSON list of objects, each with keys "group_name" and "summary"
            """,
        },
    ]

    try:
        response = client.responses.parse(
            model="gpt-4o-mini",
            input=messages,
            text_format=ComparisonOutput,
            temperature=0,
            max_output_tokens=800,
        )

        parsed: ComparisonOutput = response.output_parsed
        return parsed.model_dump()

    except Exception as e:
        print(f"Comparison failed: {e}")
        return {
            "overall_comparison": "No comparison generated",
            "similarities": "N/A",
            "differences": "N/A",
            "group_summaries": [{"group_name": name, "summary": "No summary"} for name in grouped_dfs.keys()]
        }
 

In [45]:
grouped_dfs = {
    "Definitive Sale": pd.read_csv("split_results_100calls_definitivesale.csv"),
    "Potential Interest": pd.read_csv("split_results_100calls_interest.csv"),
}

results = comparison_function(
    grouped_dfs=grouped_dfs,
    columns_to_analyze=["Sales_Technique_Used"],
    context_prompt="Compare sales call outcomes. What makes sales calls effective vs ineffective?"
)

# Print clean structured output
print("\n=== INTRODUCTION ===")
print(results["introduction"])

print("\n=== KEY FINDINGS ===")
for finding in results["key_findings"]:
    print(f"- {finding}")

print("\n=== SIMILARITIES ===")
for sim in results["similarities"]:
    print(f"- {sim}")

print("\n=== DIFFERENCES ===")
for diff in results["differences"]:
    print(f"- {diff}")

print("\n=== GROUP SUMMARIES ===")
for summary in results["group_summaries"]:
    print(f"[{summary['group_name']}] {summary['summary']}")



=== INTRODUCTION ===
This analysis compares the outcomes of sales calls categorized into two groups: 'Definitive Sale' and 'Potential Interest'. The goal is to identify the traits that contribute to effective versus ineffective sales calls.

=== KEY FINDINGS ===
- Effective sales calls heavily utilize account setup and verification processes.
- Upselling and cross-selling strategies are common in both successful and potential interest calls, but their execution differs.
- Customer relationship building is more prevalent in successful calls, indicating its importance in closing sales.

=== SIMILARITIES ===
- Both groups frequently employ account setup and verification processes.
- Upselling and cross-selling strategies are present in both groups.
- Value proposition and competitive comparison are utilized in both groups.

=== DIFFERENCES ===
- Definitive Sale calls place a stronger emphasis on customer relationship building and rapport establishment compared to Potential Interest calls

# --------------------Notes for Edits-------------------
- the column names should be an option for variables, allows for good modularity and rerunning, input column should be a variable, and string name of the output column should be a variable
- output should be a new df with new column