# Conversation Generation with Persuasion Strategies

This notebook generates realistic insurance sales conversations in which each agent utterance is annotated with exactly one of five persuasion strategies:
- **logical**: Facts, data, comparisons, cost-benefit analysis
- **emotional**: Fears, peace of mind, security, past trauma
- **credibility**: Expert opinions, certifications, industry standards
- **personalization**: Advice tailored to the customer's specific profile
- **social_proof**: What similar customers do, popular choices, testimonials

In [1]:
def conversation_prompt(persona_json: str) -> str:
    """
    Generate the prompt for conversation creation with persuasion strategy annotations.
    
    Args:
        persona_json: JSON string representation of the customer persona
        
    Returns:
        Formatted prompt string for the LLM
    """
    conversation_prompt = f"""
You are an expert conversational AI agent for insurance sales and advisory.

You will be provided with a CUSTOMER PERSONA in JSON format.
Your task is to generate a realistic, long-horizon conversation between a user and an insurance agent based strictly on that persona.

The goal is to simulate how an agent would gradually guide the customer toward clarity and trust using different persuasion strategies.

----------------
GOAL OF THE CONVERSATION
----------------
Based on the information available in the persona:
- Reflect the customer's concerns, motivations, and past experiences
- Adapt the agent's responses to the customer's preferences and psychology
- Progress naturally over multiple turns
- Gradually converge toward an appropriate insurance recommendation

----------------
PERSUASION STRATEGIES
----------------
Each AGENT utterance must be annotated with EXACTLY ONE persuasion strategy from the list below:

- logical: Use facts, data, comparisons, cost-benefit analysis
- emotional: Address fears, peace of mind, security, past trauma
- credibility: Cite expert opinions, certifications, industry standards, trusted sources
- personalization: Tailor advice to specific customer profile, preferences, and situation
- social_proof: Reference what similar customers do, popular choices, testimonials

User utterances must NOT be annotated.

Do not repeat the same persuasion strategy in consecutive agent turns.

----------------
CONVERSATION RULES
----------------
- The conversation must be fully grounded in the given persona
- Do not invent traits that contradict the persona
- User messages should reflect the persona's mindset and concerns
- Agent messages should respect the customer's communication and decision style
- The tone should be natural, calm, and professional
- The conversation should feel realistic and human, not scripted

----------------
OUTPUT FORMAT
----------------
The output must be STRICTLY valid JSON and follow this exact structure:

{{
  "conversation_id": "C_017_2022",
  "turns": [
    {{
      "speaker": "user",
      "utterance": "..."
    }},
    {{
      "speaker": "agent",
      "utterance": "...",
      "persuasion_strategy": "logical"
    }}
  ]
}}

----------------
EXAMPLE CONVERSATION (FORMAT REFERENCE ONLY)
----------------
{{
  "conversation_id": "C_017_2022",
  "turns": [
    {{
      "speaker": "user",
      "utterance": "My renewal premium went up again this year. I'm thinking of switching."
    }},
    {{
      "speaker": "agent",
      "utterance": "That makes sense. Since cost is an important factor for you, I'll focus on plans that reduce your premium without removing essential coverage.",
      "persuasion_strategy": "logical"
    }},
    {{
      "speaker": "user",
      "utterance": "I just don't want another headache if I have to file a claim."
    }},
    {{
      "speaker": "agent",
      "utterance": "I understand that concern, especially given your past experience. This option is known for smoother claim processing and faster approvals.",
      "persuasion_strategy": "emotional"
    }},
    {{
      "speaker": "user",
      "utterance": "Okay, but I don't want a lot of extras."
    }},
    {{
      "speaker": "agent",
      "utterance": "Based on your city driving needs, I'll recommend a single plan with only roadside assistance and windshield cover to keep things simple and cost-effective.",
      "persuasion_strategy": "personalization"
    }}
  ]
}}

----------------
STRUCTURE REQUIREMENTS
----------------
- Generate 8 to 14 total turns
- conversation_id must be unique
- Maintain a logical progression across turns
- Ensure internal consistency with the persona
- Do not include markdown, explanations, or extra text

----------------
TASK
----------------
Using the CUSTOMER PERSONA provided below, generate ONE complete conversation in the exact JSON format shown.

CUSTOMER PERSONA:
{persona_json}

Now generate the conversation.
"""
    return conversation_prompt
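
The doubled braces (`{{` and `}}`) in the template above are f-string escapes: they render as literal `{` and `}` in the final prompt, while `{persona_json}` is replaced by the persona. A minimal sketch of that behavior follows; `mini_prompt` and the persona fields are illustrative stand-ins, not part of the notebook.

```python
import json


def mini_prompt(persona_json: str) -> str:
    # Hypothetical, shortened stand-in for conversation_prompt().
    # "{{" and "}}" become literal braces; "{persona_json}" is interpolated.
    return f"""Format: {{"speaker": "user", "utterance": "..."}}
CUSTOMER PERSONA:
{persona_json}"""


# Illustrative persona; real personas are loaded from the P_*.json files.
persona = {"age": 34, "priority": "low premium"}
rendered = mini_prompt(json.dumps(persona))
print(rendered)
```

Braces inside the interpolated persona need no escaping, because substituted values are not parsed again as format fields.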


In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch


model_name = "mistralai/Mistral-7B-Instruct-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)



In [3]:
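def llm_generate(tokenizer, model, prompt: str, max_tokens: int = 2048) -> str:
    """Generate text using the LLM and return only the newly generated part."""
    # Mistral instruct format. The tokenizer prepends the BOS token (<s>) itself,
    # so it is not written into the string; doing both would duplicate it.
    messages = f"[INST] {prompt} [/INST]"

    inputs = tokenizer(messages, return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=max_tokens,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

    # Drop the prompt tokens so that only the completion is decoded
    generated_ids = outputs[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(generated_ids, skip_special_tokens=True)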


In [4]:
import json
import re
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def extract_json_from_response(raw_output: str) -> str:
    """
    Extract JSON from LLM response that may contain markdown or extra text.
    
    Args:
        raw_output: Raw LLM response
        
    Returns:
        Extracted JSON string
    """
    # Try to find JSON between code blocks
    json_match = re.search(r'```(?:json)?\s*({.*?})\s*```', raw_output, re.DOTALL)
    if json_match:
        return json_match.group(1)
    
    # Try to find JSON directly
    json_match = re.search(r'({\s*"conversation_id".*})', raw_output, re.DOTALL)
    if json_match:
        return json_match.group(1)
    
    # Return original if no pattern matches
    return raw_output.strip()


def validate_conversation(conversation: Dict) -> Tuple[bool, Optional[str]]:
    """
    Validate the conversation structure and persuasion strategy annotations.
    
    Args:
        conversation: Parsed conversation JSON
        
    Returns:
        Tuple of (is_valid, error_message)
    """
    valid_strategies = {"logical", "emotional", "credibility", "personalization", "social_proof"}
    
    # Check required fields
    if "conversation_id" not in conversation:
        return False, "Missing 'conversation_id'"
    
    if "turns" not in conversation:
        return False, "Missing 'turns'"
    
    turns = conversation["turns"]
    if not isinstance(turns, list):
        return False, "'turns' must be a list"
    
    if len(turns) < 8 or len(turns) > 14:
        return False, f"Conversation must have 8-14 turns, got {len(turns)}"
    
    # Validate each turn
    prev_strategy = None
    for i, turn in enumerate(turns):
        if "speaker" not in turn:
            return False, f"Turn {i} missing 'speaker'"
        
        if "utterance" not in turn:
            return False, f"Turn {i} missing 'utterance'"
        
        speaker = turn["speaker"]
        if speaker not in ["user", "agent"]:
            return False, f"Turn {i} has invalid speaker: {speaker}"
        
        # Agent turns must have persuasion_strategy
        if speaker == "agent":
            if "persuasion_strategy" not in turn:
                return False, f"Agent turn {i} missing 'persuasion_strategy'"
            
            strategy = turn["persuasion_strategy"]
            if strategy not in valid_strategies:
                return False, f"Turn {i} has invalid strategy: {strategy}"
            
            # Check for a repeated strategy across successive agent turns
            # (a user turn in between does not reset the comparison)
            if strategy == prev_strategy:
                return False, f"Agent turn {i} repeats the previous agent strategy: {strategy}"
            
            prev_strategy = strategy
        else:
            # User turns should not have persuasion_strategy
            if "persuasion_strategy" in turn:
                return False, f"User turn {i} should not have 'persuasion_strategy'"
    
    return True, None


def get_next_conversation_id(output_dir: str = "conversations") -> str:
    """
    Generate the next conversation ID based on existing files.
    
    Args:
        output_dir: Directory containing conversation files
        
    Returns:
        Next conversation ID (e.g., "C_001")
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    
    max_id = 0
    for file_path in output_dir.glob("C_*.json"):
        match = re.search(r"C_(\d+)", file_path.name)
        if match:
            num = int(match.group(1))
            if num > max_id:
                max_id = num
    
    return f"C_{max_id + 1:03d}"


def save_conversation_json(raw_output: str, output_dir: str = "conversations") -> Tuple[Path, Dict]:
    """
    Parse, validate, and save a conversation JSON.
    
    Args:
        raw_output: Raw LLM output
        output_dir: Directory to save conversations
        
    Returns:
        Tuple of (file_path, conversation_dict)
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    
    # Extract and parse JSON
    json_str = extract_json_from_response(raw_output)
    conversation = json.loads(json_str)
    
    # Validate conversation
    is_valid, error_msg = validate_conversation(conversation)
    if not is_valid:
        raise ValueError(f"Invalid conversation: {error_msg}")
    
    conversation_id = conversation["conversation_id"]
    file_path = output_dir / f"{conversation_id}.json"
    
    if file_path.exists():
        raise FileExistsError(
            f"Conversation {conversation_id} already exists. Use a different ID."
        )
    
    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(conversation, f, indent=2)
    
    return file_path, conversation
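
The greedy fallback pattern in `extract_json_from_response` runs to the last `}` in the output, so trailing text that happens to contain a brace would end up inside the extracted string and break parsing. One alternative, sketched here as an assumption rather than the notebook's method, is the standard library's `json.JSONDecoder.raw_decode`, which parses a single JSON value and ignores whatever follows it. `extract_first_json_object` is a hypothetical helper name.

```python
import json
from typing import Any, Dict


def extract_first_json_object(raw_output: str) -> Dict[str, Any]:
    """Parse the first JSON object found in raw_output, ignoring trailing text."""
    decoder = json.JSONDecoder()
    start = raw_output.find("{")
    while start != -1:
        try:
            # raw_decode returns (value, end_index) and stops where the value ends.
            value, _end = decoder.raw_decode(raw_output, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # Nothing parseable at this brace; try the next opening brace.
        start = raw_output.find("{", start + 1)
    raise ValueError("No JSON object found in model output")


sample = 'Sure! ```json\n{"conversation_id": "C_demo", "turns": []}\n``` Hope this helps {smile}'
parsed = extract_first_json_object(sample)
print(parsed["conversation_id"])
```

Because the helper raises `ValueError` when nothing parses, it would fit the existing retry loop, which already catches that exception type.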


In [5]:
def generate_conversation_from_persona(
    persona_path: str,
    tokenizer,
    model,
    output_dir: str = "conversations",
    max_retries: int = 3
) -> Tuple[Path, Dict]:
    """
    Generate a conversation from a persona file.
    
    Args:
        persona_path: Path to persona JSON file
        tokenizer: HuggingFace tokenizer
        model: HuggingFace model
        output_dir: Directory to save conversations
        max_retries: Maximum number of retries on validation failure
        
    Returns:
        Tuple of (file_path, conversation_dict)
    """
    # Load persona
    with open(persona_path, "r", encoding="utf-8") as f:
        persona = json.load(f)
    
    persona_json = json.dumps(persona, indent=2)
    prompt = conversation_prompt(persona_json)
    
    # Generate with retries
    raw_output = ""  # Defined up front so the failure report below never hits an unbound name
    for attempt in range(max_retries):
        try:
            print(f"Generating conversation (attempt {attempt + 1}/{max_retries})...")
            raw_output = llm_generate(tokenizer, model, prompt, max_tokens=2048)
            
            file_path, conversation = save_conversation_json(raw_output, output_dir)
            print(f"✓ Conversation saved: {file_path}")
            return file_path, conversation
            
        except (json.JSONDecodeError, ValueError, KeyError) as e:
            print(f"✗ Attempt {attempt + 1} failed: {e}")
            if attempt == max_retries - 1:
                print(f"\nFailed output:\n{raw_output}\n")
                raise
    
    raise RuntimeError("Failed to generate valid conversation after all retries")


def analyze_conversation_stats(conversation: Dict) -> Dict:
    """
    Analyze statistics of a conversation.
    
    Args:
        conversation: Conversation dictionary
        
    Returns:
        Dictionary of statistics
    """
    turns = conversation["turns"]
    
    strategy_counts = {}
    user_turns = 0
    agent_turns = 0
    
    for turn in turns:
        if turn["speaker"] == "user":
            user_turns += 1
        else:
            agent_turns += 1
            strategy = turn.get("persuasion_strategy")
            strategy_counts[strategy] = strategy_counts.get(strategy, 0) + 1
    
    return {
        "total_turns": len(turns),
        "user_turns": user_turns,
        "agent_turns": agent_turns,
        "strategy_distribution": strategy_counts
    }


In [6]:
def batch_generate_conversations(
    persona_dir: str = "personas",
    output_dir: str = "conversations",
    max_personas: Optional[int] = None
):
    """
    Generate conversations for all personas in a directory.
    
    Args:
        persona_dir: Directory containing persona JSON files
        output_dir: Directory to save conversations
        max_personas: Maximum number of personas to process (None for all)
    """
    persona_dir = Path(persona_dir)
    persona_files = sorted(persona_dir.glob("P_*.json"))
    
    if max_personas is not None:
        persona_files = persona_files[:max_personas]
    
    print(f"Found {len(persona_files)} persona(s) to process\n")
    
    results = []
    errors = []
    
    for i, persona_path in enumerate(persona_files, 1):
        print(f"\n[{i}/{len(persona_files)}] Processing: {persona_path.name}")
        print("=" * 60)
        
        try:
            file_path, conversation = generate_conversation_from_persona(
                str(persona_path),
                tokenizer,
                model,
                output_dir
            )
            
            stats = analyze_conversation_stats(conversation)
            print("\nStats:")
            print(f"  Total turns: {stats['total_turns']}")
            print(f"  Strategy distribution: {stats['strategy_distribution']}")
            
            results.append({
                "persona": persona_path.name,
                "conversation": file_path.name,
                "stats": stats
            })
            
        except Exception as e:
            print(f"\n✗ ERROR: {e}")
            errors.append({
                "persona": persona_path.name,
                "error": str(e)
            })
    
    # Summary
    print("\n" + "=" * 60)
    print("SUMMARY")
    print("=" * 60)
    print(f"Successfully generated: {len(results)}")
    print(f"Failed: {len(errors)}")
    
    if errors:
        print("\nErrors:")
        for error in errors:
            print(f"  - {error['persona']}: {error['error']}")
    
    return results, errors
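
The `results` list returned by `batch_generate_conversations` carries one `stats` dictionary per conversation, so the per-conversation strategy counts can be summed into a corpus-level distribution. A minimal sketch using `collections.Counter`; `aggregate_strategy_counts` is a hypothetical helper, and the sample `results` entries are made up for illustration rather than taken from a real run.

```python
from collections import Counter
from typing import Dict, List


def aggregate_strategy_counts(results: List[Dict]) -> Counter:
    """Sum the per-conversation strategy_distribution dicts in results."""
    total = Counter()
    for item in results:
        total.update(item["stats"]["strategy_distribution"])
    return total


# Made-up entries in the shape produced by batch_generate_conversations().
results = [
    {"persona": "P_a.json", "conversation": "C_a.json",
     "stats": {"strategy_distribution": {"logical": 2, "emotional": 1}}},
    {"persona": "P_b.json", "conversation": "C_b.json",
     "stats": {"strategy_distribution": {"logical": 1, "social_proof": 3}}},
]
totals = aggregate_strategy_counts(results)
for strategy, count in totals.most_common():
    print(f"{strategy}: {count}")
```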


## Usage Examples

### Example 1: Generate a single conversation

In [None]:
# Generate conversation for a single persona
persona_path = "personas/P_001.json"

file_path, conversation = generate_conversation_from_persona(
    persona_path,
    tokenizer,
    model,
    output_dir="conversations"
)

print(f"\nGenerated conversation: {file_path}")
print(f"Conversation ID: {conversation['conversation_id']}")
print(f"Number of turns: {len(conversation['turns'])}")

# Display conversation
print("\n" + "=" * 60)
print("CONVERSATION")
print("=" * 60)
for i, turn in enumerate(conversation['turns'], 1):
    speaker = turn['speaker'].upper()
    utterance = turn['utterance']
    strategy = turn.get('persuasion_strategy', '')
    
    print(f"\n[Turn {i}] {speaker}")
    if strategy:
        print(f"Strategy: {strategy}")
    print(f"{utterance}")

# Display statistics
stats = analyze_conversation_stats(conversation)
print("\n" + "=" * 60)
print("STATISTICS")
print("=" * 60)
print(json.dumps(stats, indent=2))


### Example 2: Batch generate conversations for all personas

In [8]:
# Generate conversations for all personas
results, errors = batch_generate_conversations(
    persona_dir="personas",
    output_dir="conversations",
    max_personas=7  # Process at most 7 personas; use None to process all
)


Found 7 persona(s) to process


[1/7] Processing: P_001.json
Generating conversation (attempt 1/3)...



✗ ERROR: Conversation C_018_2022 already exists. Use a different ID.

[2/7] Processing: P_002.json
Generating conversation (attempt 1/3)...
✗ Attempt 1 failed: Invalid conversation: Turn 5 has invalid strategy: personalization, emotional
Generating conversation (attempt 2/3)...
✗ Attempt 2 failed: Invalid conversation: Turn 7 has invalid strategy: information_format
Generating conversation (attempt 3/3)...
✗ Attempt 3 failed: Invalid conversation: Turn 11 has invalid strategy: multiple_options

Failed output:
{
  "conversation_id": "C_022_2022",
  "turns": [
    {
      "speaker": "user",
      "utterance": "I'm looking for car insurance for the first time."
    },
    {
      "speaker": "agent",
      "utterance": "Great! As a first-time buyer, I'll guide you through the process while keeping your budget in consideration.",
      "persuasion_strategy": "personalization"
    },
    {
      "speaker": "user",
      "utterance": "What coverage should I get? I don't want to pay for thing

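The `already exists` failure for `P_001.json` above stems from the model choosing the `conversation_id` itself: it has no way to know which IDs are already on disk. One possible remedy, sketched here as an assumption rather than the notebook's behavior, is to overwrite the model's ID with one derived from the files in the output directory, much as `get_next_conversation_id` does. `assign_local_id` is a hypothetical helper.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict


def assign_local_id(conversation: Dict, output_dir: Path) -> Dict:
    """Return a copy of conversation whose ID is the next unused C_NNN in output_dir."""
    used = [
        int(m.group(1))
        for p in output_dir.glob("C_*.json")
        if (m := re.match(r"C_(\d+)", p.name))
    ]
    next_num = max(used, default=0) + 1
    return {**conversation, "conversation_id": f"C_{next_num:03d}"}


# Demo in a temporary directory so no real conversation files are touched.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "C_018_2022.json").write_text("{}", encoding="utf-8")
    fixed = assign_local_id({"conversation_id": "C_018_2022", "turns": []}, out)
    print(fixed["conversation_id"])
```

Applied before the existence check in `save_conversation_json`, a step like this would make the `FileExistsError` branch a safeguard rather than a common failure path.
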
### Example 3: Validate an existing conversation

In [None]:
# Load and validate an existing conversation
conversation_path = "conversations/C_001.json"

with open(conversation_path, "r", encoding="utf-8") as f:
    conversation = json.load(f)

is_valid, error_msg = validate_conversation(conversation)

if is_valid:
    print("✓ Conversation is valid!")
    stats = analyze_conversation_stats(conversation)
    print("\nStatistics:")
    print(json.dumps(stats, indent=2))
else:
    print(f"✗ Validation failed: {error_msg}")
