# Conversation Generation with Persuasion Strategies

This notebook generates realistic insurance sales conversations in which each agent utterance is annotated with exactly one of five persuasion strategies:
- **logical**: Facts, data, comparisons, cost-benefit analysis
- **emotional**: Fears, peace of mind, security, past trauma
- **credibility**: Expert opinions, certifications, industry standards
- **personalization**: Tailored advice to specific customer profile
- **social_proof**: What similar customers do, popular choices, testimonials
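The five labels form a closed set, so downstream code can treat them as an enumeration rather than free text. The sketch below is one possible encoding (`PersuasionStrategy` and `parse_strategy` are hypothetical names, not part of this notebook). An unknown label such as `multiple_options`, which the model emitted in one of the batch runs below, raises `ValueError`.

```python
from enum import Enum


class PersuasionStrategy(str, Enum):
    """Hypothetical enum of the five allowed persuasion-strategy labels."""
    LOGICAL = "logical"
    EMOTIONAL = "emotional"
    CREDIBILITY = "credibility"
    PERSONALIZATION = "personalization"
    SOCIAL_PROOF = "social_proof"


def parse_strategy(label: str) -> PersuasionStrategy:
    """Normalize a raw label and map it to the enum.

    Raises ValueError for anything outside the five allowed labels.
    """
    return PersuasionStrategy(label.strip().lower())
```

Because the enum mixes in `str`, members compare equal to their plain string values (`PersuasionStrategy.LOGICAL == "logical"`), so they can be written straight into the JSON annotations.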

In [11]:
def conversation_prompt(persona_json: str) -> str:
    """
    Generate the prompt for conversation creation with persuasion strategy annotations.
    
    Args:
        persona_json: JSON string representation of the customer persona
        
    Returns:
        Formatted prompt string for the LLM
    """
    conversation_prompt = f"""
You are an expert conversational AI agent for insurance sales and advisory.

You will be provided with a CUSTOMER PERSONA in JSON format.
Your task is to generate a realistic, long-horizon conversation between a user and an insurance agent based strictly on that persona.

The goal is to simulate how an agent would gradually guide the customer toward clarity and trust using different persuasion strategies.

----------------
GOAL OF THE CONVERSATION
----------------
Based on the information available in the persona:
- Reflect the customer's concerns, motivations, and past experiences
- Adapt the agent's responses to the customer's preferences and psychology
- Progress naturally over multiple turns
- Gradually converge toward an appropriate insurance recommendation

----------------
PERSUASION STRATEGIES
----------------
Each AGENT utterance must be annotated with EXACTLY ONE persuasion strategy from the list below:

- logical: Use facts, data, comparisons, cost-benefit analysis
- emotional: Address fears, peace of mind, security, past trauma
- credibility: Cite expert opinions, certifications, industry standards, trusted sources
- personalization: Tailor advice to specific customer profile, preferences, and situation
- social_proof: Reference what similar customers do, popular choices, testimonials

User utterances must NOT be annotated.

Do not repeat the same persuasion strategy in consecutive agent turns.

----------------
CONVERSATION RULES
----------------
- The conversation must be fully grounded in the given persona
- Do not invent traits that contradict the persona
- User messages should reflect the persona's mindset and concerns
- Agent messages should respect the customer's communication and decision style
- The tone should be natural, calm, and professional
- The conversation should feel realistic and human, not scripted

----------------
OUTPUT FORMAT
----------------
The output must be STRICTLY valid JSON and follow this exact structure:

{{
  "conversation_id": "C_017_2022",
  "turns": [
    {{
      "speaker": "user",
      "utterance": "..."
    }},
    {{
      "speaker": "agent",
      "utterance": "...",
      "persuasion_strategy": "logical"
    }}
  ]
}}

----------------
EXAMPLE CONVERSATION (FORMAT REFERENCE ONLY)
----------------
{{
  "conversation_id": "C_017_2022",
  "turns": [
    {{
      "speaker": "user",
      "utterance": "My renewal premium went up again this year. I'm thinking of switching."
    }},
    {{
      "speaker": "agent",
      "utterance": "That makes sense. Since cost is an important factor for you, I'll focus on plans that reduce your premium without removing essential coverage.",
      "persuasion_strategy": "logical"
    }},
    {{
      "speaker": "user",
      "utterance": "I just don't want another headache if I have to file a claim."
    }},
    {{
      "speaker": "agent",
      "utterance": "I understand that concern, especially given how stressful claims can feel. This option is known for smoother claim processing and faster approvals.",
      "persuasion_strategy": "emotional"
    }},
    {{
      "speaker": "user",
      "utterance": "Okay, but I don't want a lot of extras."
    }},
    {{
      "speaker": "agent",
      "utterance": "Based on your city driving needs, I'll recommend a single plan with only roadside assistance and windshield cover to keep things simple and cost-effective.",
      "persuasion_strategy": "personalization"
    }},
    {{
      "speaker": "user",
      "utterance": "How do I know this insurer is reliable?"
    }},
    {{
      "speaker": "agent",
      "utterance": "That's a fair question. This insurer has consistently maintained a high claim settlement ratio over the past several years.",
      "persuasion_strategy": "credibility"
    }},
    {{
      "speaker": "user",
      "utterance": "I’ve heard some companies delay payouts."
    }},
    {{
      "speaker": "agent",
      "utterance": "Many customers with similar policies have shared positive experiences, especially appreciating the quick turnaround during claims.",
      "persuasion_strategy": "social_proof"
    }},
    {{
      "speaker": "user",
      "utterance": "I drive mostly short distances. Does that really matter?"
    }},
    {{
      "speaker": "agent",
      "utterance": "Yes, it does. Since your annual mileage is low, choosing a lower-risk profile helps reduce unnecessary premium costs.",
      "persuasion_strategy": "logical"
    }},
    {{
      "speaker": "user",
      "utterance": "What if something unexpected happens?"
    }},
    {{
      "speaker": "agent",
      "utterance": "That worry is completely natural. This plan still protects you from major financial shocks while avoiding over-insurance.",
      "persuasion_strategy": "emotional"
    }},
    {{
      "speaker": "user",
      "utterance": "I don't want to keep changing policies every year."
    }},
    {{
      "speaker": "agent",
      "utterance": "Given your preference for stability, this plan is designed with long-term renewability benefits and minimal premium jumps.",
      "persuasion_strategy": "personalization"
    }},
    {{
      "speaker": "user",
      "utterance": "Are these benefits actually documented?"
    }},
    {{
      "speaker": "agent",
      "utterance": "Yes, they are clearly stated in the policy wording and approved by regulatory authorities, ensuring transparency.",
      "persuasion_strategy": "credibility"
    }},
    {{
      "speaker": "user",
      "utterance": "Have other people like me chosen this plan?"
    }},
    {{
      "speaker": "agent",
      "utterance": "Absolutely. Many urban drivers with similar usage patterns have opted for this plan and renewed it without issues.",
      "persuasion_strategy": "social_proof"
    }},
    {{
      "speaker": "user",
      "utterance": "What happens if I sell my car next year?"
    }},
    {{
      "speaker": "agent",
      "utterance": "In that case, the policy allows easy transfer or cancellation, helping you avoid unnecessary financial loss.",
      "persuasion_strategy": "logical"
    }},
    {{
      "speaker": "user",
      "utterance": "I just want peace of mind."
    }},
    {{
      "speaker": "agent",
      "utterance": "That’s completely understandable. The goal here is to give you confidence that you’re protected without constant worry.",
      "persuasion_strategy": "emotional"
    }},
    {{
      "speaker": "user",
      "utterance": "Alright, this does sound reasonable."
    }},
    {{
      "speaker": "agent",
      "utterance": "Based on everything you've shared, this option aligns well with your budget, driving habits, and desire for simplicity.",
      "persuasion_strategy": "personalization"
    }}
  ]
}}


----------------
STRUCTURE REQUIREMENTS
----------------
- Generate 20 to 30 total turns
- conversation_id must be unique
- Maintain a logical progression across turns
- Ensure internal consistency with the persona
- Do not include markdown, explanations, or extra text

----------------
TASK
----------------
Using the CUSTOMER PERSONA provided below, generate ONE complete conversation in the exact JSON format shown.

CUSTOMER PERSONA:
{persona_json}

Now generate the conversation.
"""
    return conversation_prompt
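Because the prompt is an f-string, every literal brace in the JSON schema and example must be doubled (`{{`, `}}`), while `{persona_json}` keeps single braces so it is substituted. A single brace in the schema would be read as a replacement field and typically fail with a `SyntaxError` or a formatting error. The toy template below (hypothetical `render_prompt`, a cut-down stand-in for `conversation_prompt`) shows the effect and checks that the rendered schema is still parseable JSON.

```python
import json


def render_prompt(persona_json: str) -> str:
    # Doubled braces render as literal braces; {persona_json} is interpolated.
    return f"""OUTPUT FORMAT:
{{
  "conversation_id": "C_017_2022",
  "turns": []
}}

CUSTOMER PERSONA:
{persona_json}
"""


toy_persona = json.dumps({"persona_id": "P_001", "age": 34})
toy_prompt = render_prompt(toy_persona)

# Cut out the schema section and confirm it renders with single braces.
schema = toy_prompt.split("OUTPUT FORMAT:\n", 1)[1].split("\n\nCUSTOMER PERSONA:", 1)[0]
print(json.loads(schema))  # {'conversation_id': 'C_017_2022', 'turns': []}
```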


In [2]:
import json
from pathlib import Path
from typing import Dict, Optional, Tuple

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


model_name = "mistralai/Mistral-7B-Instruct-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
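Loading in `torch.float16` stores each weight in 2 bytes instead of 4. Assuming Mistral-7B has roughly 7.24 billion parameters (an approximate figure), the weights alone need about 14.5 GB. Activations and the KV cache for a long prompt plus up to 3,072 new tokens come on top of that, and `device_map="auto"` lets Accelerate place layers on the available GPUs, offloading to CPU if they do not fit. A back-of-the-envelope estimate:

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate weight-only memory in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


MISTRAL_7B_PARAMS = 7.24e9  # approximate parameter count (assumption)

for dtype_name, nbytes in [("float32", 4), ("float16", 2)]:
    print(f"{dtype_name}: ~{weight_memory_gb(MISTRAL_7B_PARAMS, nbytes):.1f} GB")
# float32: ~29.0 GB
# float16: ~14.5 GB
```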



In [12]:
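def llm_generate(tokenizer, model, prompt: str, max_tokens: int = 2048) -> str:
    """Generate a sampled completion for a single-turn instruction prompt."""
    # Mistral-Instruct prompt format. The "<s>" BOS token is written explicitly,
    # so the tokenizer must not add its own special tokens (avoids a double BOS).
    formatted_prompt = f"<s>[INST] {prompt} [/INST]"

    inputs = tokenizer(
        formatted_prompt, return_tensors="pt", add_special_tokens=False
    ).to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=max_tokens,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,  # the tokenizer defines no pad token
        eos_token_id=tokenizer.eos_token_id
    )

    # Drop the prompt tokens and decode only the newly generated text.
    generated_ids = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated_ids, skip_special_tokens=True)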
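The raw text returned by `llm_generate` still has to be turned into JSON. `save_conversation_json` (next cell) does this through an `extract_json_from_response` helper that is not defined in this section. The version below is a hypothetical stand-in, not the notebook's own code: it tries each `{` in turn and lets `json.JSONDecoder.raw_decode` find where a complete object ends, which tolerates chatter before or after the JSON and braces inside string values.

```python
import json


def extract_json_from_response(raw_output: str) -> str:
    """Return the first decodable JSON object embedded in raw LLM output.

    Hypothetical sketch. Raises ValueError (caught by the retry loop) if
    no complete JSON object can be found.
    """
    decoder = json.JSONDecoder()
    start = raw_output.find("{")
    while start != -1:
        try:
            # raw_decode parses one value starting at `start` and returns
            # the absolute index just past its end.
            _, end = decoder.raw_decode(raw_output, start)
            return raw_output[start:end]
        except json.JSONDecodeError:
            start = raw_output.find("{", start + 1)
    raise ValueError("No JSON object found in model output")


raw = 'Sure! Here it is:\n{"conversation_id": "C_1", "turns": []}\nHope this helps.'
print(json.loads(extract_json_from_response(raw)))  # {'conversation_id': 'C_1', 'turns': []}
```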


In [13]:
def save_conversation_json(raw_output: str, persona_id: str, output_dir: str = "conversations") -> Tuple[Path, Dict]:
    """
    Parse, validate, and save a conversation JSON.
    Generates unique ID in format: C_{persona_id}_{counter}
    
    Args:
        raw_output: Raw LLM output
        persona_id: Persona ID (e.g., 'P_001')
        output_dir: Directory to save conversations
        
    Returns:
        Tuple of (file_path, conversation_dict)
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    
    # Extract and parse JSON
    json_str = extract_json_from_response(raw_output)
    conversation = json.loads(json_str)
    
    # Validate conversation
    is_valid, error_msg = validate_conversation(conversation)
    if not is_valid:
        raise ValueError(f"Invalid conversation: {error_msg}")
    
    # Clean persona_id (remove underscores: P_001 -> P001)
    clean_persona_id = persona_id.replace("_", "")
    
    # Generate unique conversation ID based on persona and counter
    # Format: C_{clean_persona_id}_{counter:03d} (e.g., C_P001_001, C_P001_002)
    counter = 1
    while True:
        conversation_id = f"C_{clean_persona_id}_{counter:03d}"
        file_path = output_dir / f"{conversation_id}.json"
        if not file_path.exists():
            break
        counter += 1
    
    # Update conversation with generated ID
    conversation["conversation_id"] = conversation_id
    
    # Save the conversation
    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(conversation, f, indent=2)
    
    return file_path, conversation
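`validate_conversation` is likewise defined outside this section and returns an `(is_valid, error_msg)` tuple. The sketch below is a hypothetical version that enforces the rules stated in the prompt: speakers limited to `user` and `agent`, no strategy on user turns, exactly one allowed strategy on every agent turn, no repeated strategy in consecutive agent turns, and 20 to 30 turns in total. Its error wording imitates the message in the batch log ("Turn 21 has invalid strategy: multiple_options"), assuming 1-based turn numbers; the real function may check more or less.

```python
from typing import Dict, Tuple

ALLOWED_STRATEGIES = {"logical", "emotional", "credibility", "personalization", "social_proof"}


def validate_conversation(conversation: Dict) -> Tuple[bool, str]:
    """Hypothetical validator mirroring the prompt's structural rules."""
    turns = conversation.get("turns")
    if not isinstance(turns, list):
        return False, "Missing or non-list 'turns'"
    if not 20 <= len(turns) <= 30:
        return False, f"Expected 20-30 turns, got {len(turns)}"

    previous_strategy = None  # strategy of the most recent agent turn
    for i, turn in enumerate(turns, 1):
        if not isinstance(turn, dict):
            return False, f"Turn {i} is not an object"
        speaker = turn.get("speaker")
        if speaker not in ("user", "agent"):
            return False, f"Turn {i} has invalid speaker: {speaker}"
        if not str(turn.get("utterance", "")).strip():
            return False, f"Turn {i} has an empty utterance"
        strategy = turn.get("persuasion_strategy")
        if speaker == "user":
            if strategy is not None:
                return False, f"Turn {i} is a user turn but has a strategy"
            continue
        if strategy not in ALLOWED_STRATEGIES:
            return False, f"Turn {i} has invalid strategy: {strategy}"
        if strategy == previous_strategy:
            return False, f"Turn {i} repeats strategy of previous agent turn: {strategy}"
        previous_strategy = strategy
    return True, ""
```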


In [14]:
def generate_conversation_from_persona(
    persona_path: str,
    tokenizer,
    model,
    output_dir: str = "conversations",
    max_retries: int = 3
) -> Tuple[Path, Dict]:
    """
    Generate a conversation from a persona file.
    
    Args:
        persona_path: Path to persona JSON file
        tokenizer: HuggingFace tokenizer
        model: HuggingFace model
        output_dir: Directory to save conversations
        max_retries: Maximum number of retries on validation failure
        
    Returns:
        Tuple of (file_path, conversation_dict)
    """
    # Load persona
    with open(persona_path, "r", encoding="utf-8") as f:
        persona = json.load(f)
    
    # Extract persona_id
    persona_id = persona.get("persona_id", Path(persona_path).stem)
    
    persona_json = json.dumps(persona, indent=2)
    prompt = conversation_prompt(persona_json)
    
    # Generate with retries
    for attempt in range(max_retries):
        try:
            print(f"Generating conversation (attempt {attempt + 1}/{max_retries})...")
            raw_output = llm_generate(tokenizer, model, prompt, max_tokens=3072)
            
            # Pass persona_id to save function
            file_path, conversation = save_conversation_json(raw_output, persona_id, output_dir)
            print(f"✓ Conversation saved: {file_path}")
            return file_path, conversation
            
        except (json.JSONDecodeError, ValueError, KeyError) as e:
            print(f"✗ Attempt {attempt + 1} failed: {e}")
            if attempt == max_retries - 1:
                print(f"\nFailed output:\n{raw_output}\n")
                raise
    
    raise RuntimeError("Failed to generate valid conversation after all retries")


def analyze_conversation_stats(conversation: Dict) -> Dict:
    """
    Analyze statistics of a conversation.
    
    Args:
        conversation: Conversation dictionary
        
    Returns:
        Dictionary of statistics
    """
    turns = conversation["turns"]
    
    strategy_counts = {}
    user_turns = 0
    agent_turns = 0
    
    for turn in turns:
        if turn["speaker"] == "user":
            user_turns += 1
        else:
            agent_turns += 1
            strategy = turn.get("persuasion_strategy")
            strategy_counts[strategy] = strategy_counts.get(strategy, 0) + 1
    
    return {
        "total_turns": len(turns),
        "user_turns": user_turns,
        "agent_turns": agent_turns,
        "strategy_distribution": strategy_counts
    }
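The manual counting in `analyze_conversation_stats` is equivalent to a `collections.Counter` over agent-turn labels. A compact variant that reports each strategy's share of agent turns (hypothetical name `strategy_distribution`; unlike the original, it counts only turns whose speaker is exactly `"agent"`) makes skew easier to spot, for example the P_003 run in the batch log below, where personalization accounts for 7 of 13 agent turns.

```python
from collections import Counter
from typing import Dict, List


def strategy_distribution(turns: List[Dict]) -> Dict[str, float]:
    """Share of agent turns using each persuasion strategy (sketch)."""
    counts = Counter(t.get("persuasion_strategy") for t in turns if t["speaker"] == "agent")
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()} if total else {}


toy_turns = [
    {"speaker": "user", "utterance": "..."},
    {"speaker": "agent", "utterance": "...", "persuasion_strategy": "logical"},
    {"speaker": "user", "utterance": "..."},
    {"speaker": "agent", "utterance": "...", "persuasion_strategy": "emotional"},
    {"speaker": "user", "utterance": "..."},
    {"speaker": "agent", "utterance": "...", "persuasion_strategy": "logical"},
    {"speaker": "user", "utterance": "..."},
    {"speaker": "agent", "utterance": "...", "persuasion_strategy": "credibility"},
]
print(strategy_distribution(toy_turns))
# {'logical': 0.5, 'emotional': 0.25, 'credibility': 0.25}
```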


In [15]:
def batch_generate_conversations(
    persona_dir: str = "personas",
    output_dir: str = "conversations",
    max_personas: Optional[int] = None
) -> Tuple[list, list]:
    """
    Generate conversations for all personas in a directory.

    Uses the globally loaded `tokenizer` and `model`.

    Args:
        persona_dir: Directory containing persona JSON files (matched by P_*.json)
        output_dir: Directory to save conversations
        max_personas: Maximum number of personas to process (None for all)

    Returns:
        Tuple of (results, errors), each a list with one dict per persona
    """
    persona_dir = Path(persona_dir)
    persona_files = sorted(persona_dir.glob("P_*.json"))

    if max_personas is not None:
        persona_files = persona_files[:max_personas]
    
    print(f"Found {len(persona_files)} persona(s) to process\n")
    
    results = []
    errors = []
    
    for i, persona_path in enumerate(persona_files, 1):
        print(f"\n[{i}/{len(persona_files)}] Processing: {persona_path.name}")
        print("=" * 60)
        
        try:
            file_path, conversation = generate_conversation_from_persona(
                str(persona_path),
                tokenizer,
                model,
                output_dir
            )
            
            stats = analyze_conversation_stats(conversation)
            print(f"\nStats:")
            print(f"  Total turns: {stats['total_turns']}")
            print(f"  Strategy distribution: {stats['strategy_distribution']}")
            
            results.append({
                "persona": persona_path.name,
                "conversation": file_path.name,
                "stats": stats
            })
            
        except Exception as e:
            print(f"\n✗ ERROR: {e}")
            errors.append({
                "persona": persona_path.name,
                "error": str(e)
            })
    
    # Summary
    print("\n" + "=" * 60)
    print("SUMMARY")
    print("=" * 60)
    print(f"Successfully generated: {len(results)}")
    print(f"Failed: {len(errors)}")
    
    if errors:
        print("\nErrors:")
        for error in errors:
            print(f"  - {error['persona']}: {error['error']}")
    
    return results, errors
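`batch_generate_conversations` returns per-persona stats, so corpus-level label balance can be obtained by summing the `strategy_distribution` dicts. A sketch (the helper `aggregate_strategy_counts` is a hypothetical name), fed with the three completed runs from the batch log below:

```python
from collections import Counter
from typing import Dict, List


def aggregate_strategy_counts(results: List[Dict]) -> Counter:
    """Sum per-conversation strategy counts from batch results (sketch)."""
    total = Counter()
    for result in results:
        total.update(result["stats"]["strategy_distribution"])
    return total


# Same shape as the `results` list built by batch_generate_conversations;
# the counts are copied from the three completed runs in the batch log.
logged_results = [
    {"stats": {"strategy_distribution": {"emotional": 2, "logical": 5, "personalization": 3, "credibility": 1}}},
    {"stats": {"strategy_distribution": {"personalization": 4, "logical": 4, "credibility": 1, "emotional": 2}}},
    {"stats": {"strategy_distribution": {"personalization": 7, "logical": 3, "emotional": 2, "credibility": 1}}},
]
print(aggregate_strategy_counts(logged_results).most_common())
# [('personalization', 14), ('logical', 12), ('emotional', 6), ('credibility', 3)]
```

In these three runs `social_proof` never appears, even though the prompt lists it as an allowed strategy; an aggregate like this makes such gaps visible before the data is used.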


## Usage Examples

### Example 1: Generate a single conversation

In [None]:
# Generate conversation for a single persona
persona_path = "personas/P_001.json"

file_path, conversation = generate_conversation_from_persona(
    persona_path,
    tokenizer,
    model,
    output_dir="conversations"
)

print(f"\nGenerated conversation: {file_path}")
print(f"Conversation ID: {conversation['conversation_id']}")
print(f"Number of turns: {len(conversation['turns'])}")

# Display conversation
print("\n" + "=" * 60)
print("CONVERSATION")
print("=" * 60)
for i, turn in enumerate(conversation['turns'], 1):
    speaker = turn['speaker'].upper()
    utterance = turn['utterance']
    strategy = turn.get('persuasion_strategy', '')
    
    print(f"\n[Turn {i}] {speaker}")
    if strategy:
        print(f"Strategy: {strategy}")
    print(f"{utterance}")

# Display statistics
stats = analyze_conversation_stats(conversation)
print("\n" + "=" * 60)
print("STATISTICS")
print("=" * 60)
print(json.dumps(stats, indent=2))


### Example 2: Batch generate conversations for all personas

In [16]:
# Generate conversations for all personas
results, errors = batch_generate_conversations(
    persona_dir="personas",
    output_dir="conversations",
    max_personas=5  # Limit for testing; use None to process all personas
)


Found 5 persona(s) to process


[1/5] Processing: P_001.json
Generating conversation (attempt 1/3)...
✓ Conversation saved: conversations/C_P001_001.json

Stats:
  Total turns: 22
  Strategy distribution: {'emotional': 2, 'logical': 5, 'personalization': 3, 'credibility': 1}

[2/5] Processing: P_002.json
Generating conversation (attempt 1/3)...
✓ Conversation saved: conversations/C_P002_001.json

Stats:
  Total turns: 22
  Strategy distribution: {'personalization': 4, 'logical': 4, 'credibility': 1, 'emotional': 2}

[3/5] Processing: P_003.json
Generating conversation (attempt 1/3)...
✓ Conversation saved: conversations/C_P003_001.json

Stats:
  Total turns: 26
  Strategy distribution: {'personalization': 7, 'logical': 3, 'emotional': 2, 'credibility': 1}

[4/5] Processing: P_004.json
Generating conversation (attempt 1/3)...
✗ Attempt 1 failed: Invalid conversation: Turn 21 has invalid strategy: multiple_options
Generating conversation (attempt 2/3)...
✓ Conversation saved: conversatio

### Example 3: Validate an existing conversation

In [None]:
# Load and validate an existing conversation
conversation_path = "conversations/C_P001_001.json"  # IDs follow C_{persona}_{counter}

with open(conversation_path, "r", encoding="utf-8") as f:
    conversation = json.load(f)

is_valid, error_msg = validate_conversation(conversation)

if is_valid:
    print(f"✓ Conversation is valid!")
    stats = analyze_conversation_stats(conversation)
    print(f"\nStatistics:")
    print(json.dumps(stats, indent=2))
else:
    print(f"✗ Validation failed: {error_msg}")
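To check every saved file rather than one, the same pattern can run over the output directory. This sketch (hypothetical name `validate_directory`) takes the validator as a parameter, so it works with `validate_conversation` or any function returning `(is_valid, error_msg)`, and it reports unreadable JSON as a failure instead of stopping.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Tuple


def validate_directory(
    directory: str,
    validator: Callable[[Dict], Tuple[bool, str]],
    pattern: str = "C_*.json",
) -> Dict[str, str]:
    """Return {file_name: error_msg} for every conversation file that fails."""
    failures = {}
    for path in sorted(Path(directory).glob(pattern)):
        try:
            with open(path, "r", encoding="utf-8") as f:
                conversation = json.load(f)
        except json.JSONDecodeError as e:
            failures[path.name] = f"Unreadable JSON: {e}"
            continue
        is_valid, error_msg = validator(conversation)
        if not is_valid:
            failures[path.name] = error_msg
    return failures
```

In the notebook it would be called as `validate_directory("conversations", validate_conversation)`; an empty dict means every file passed.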
