## Loading Conversations from DataFrames

Conversation data is sometimes stored in pandas DataFrames rather than JSON.
For example, you might have one DataFrame for messages and another for outcomes.
This section shows how to build `Conversation` objects from these separate DataFrames.

## Setup and Imports

In [1]:
# Import necessary libraries
import pandas as pd

# Import Agentune simulate components
from agentune.simulate.models import Conversation, Message, Outcome, ParticipantRole

## Create Sample Conversation Data

First, let's create a fabricated sample dataset that mimics the structure of real conversation data.

In [2]:
# Create sample DataFrames that might come from a database or CSV files
# First, let's create a DataFrame for messages
messages_df = pd.DataFrame([
    {'conversation_id': 'conv_001', 'sender': 'customer', 'content': 'I received a damaged product and need a replacement', 'timestamp': '2024-05-10T09:15:00.000000'},
    {'conversation_id': 'conv_001', 'sender': 'agent', 'content': 'I apologize for the inconvenience. We can arrange a replacement right away.', 'timestamp': '2024-05-10T09:17:30.000000'},
    {'conversation_id': 'conv_001', 'sender': 'customer', 'content': 'Please do, and I expect a refund on the delivery fee as well.', 'timestamp': '2024-05-10T09:21:05.000000'},
    {'conversation_id': 'conv_002', 'sender': 'customer', 'content': 'Is your warranty transferable if I sell the product?', 'timestamp': '2024-05-15T14:35:22.000000'},
    {'conversation_id': 'conv_002', 'sender': 'agent', 'content': 'Yes, our warranty stays with the product for the full term regardless of ownership changes.', 'timestamp': '2024-05-15T14:38:45.000000'},
    {'conversation_id': 'conv_002', 'sender': 'customer', 'content': 'No, that\'s all. Thanks again!', 'timestamp': '2024-05-15T14:42:20.000000'}
])

# Now, let's create a DataFrame for outcomes
outcomes_df = pd.DataFrame([
    {'conversation_id': 'conv_001', 'name': 'resolved', 'description': 'Issue was successfully resolved'},
    {'conversation_id': 'conv_002', 'name': 'unresolved', 'description': 'Issue was not resolved'}
])

### Display the DataFrames

In [3]:
# Display the first rows of the messages DataFrame
print("Messages DataFrame:")
messages_df.head()

Messages DataFrame:


|   | conversation_id | sender | content | timestamp |
|---|---|---|---|---|
| 0 | conv_001 | customer | I received a damaged product and need a replac... | 2024-05-10T09:15:00.000000 |
| 1 | conv_001 | agent | I apologize for the inconvenience. We can arra... | 2024-05-10T09:17:30.000000 |
| 2 | conv_001 | customer | Please do, and I expect a refund on the delive... | 2024-05-10T09:21:05.000000 |
| 3 | conv_002 | customer | Is your warranty transferable if I sell the pr... | 2024-05-15T14:35:22.000000 |
| 4 | conv_002 | agent | Yes, our warranty stays with the product for t... | 2024-05-15T14:38:45.000000 |


In [4]:
print("Outcomes DataFrame:")
outcomes_df.head()

Outcomes DataFrame:


|   | conversation_id | name | description |
|---|---|---|---|
| 0 | conv_001 | resolved | Issue was successfully resolved |
| 1 | conv_002 | unresolved | Issue was not resolved |
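
Before converting, it can be useful to confirm that the two DataFrames refer to the same set of conversations. The following is a minimal sketch of such a check; `find_id_mismatches` is a hypothetical helper written for this example (it is not part of Agentune), and the tiny demo frames are made up purely for illustration.

```python
import pandas as pd


def find_id_mismatches(
    messages_df: pd.DataFrame,
    outcomes_df: pd.DataFrame,
    key: str = 'conversation_id',
) -> dict[str, list[str]]:
    """Report conversation IDs that appear in only one of the two DataFrames."""
    message_ids = set(messages_df[key])
    outcome_ids = set(outcomes_df[key])
    return {
        # Conversations that have messages but no outcome row
        'messages_without_outcome': sorted(message_ids - outcome_ids),
        # Outcome rows whose conversation has no messages at all
        'outcomes_without_messages': sorted(outcome_ids - message_ids),
    }


# Made-up frames for illustration only (not the sample data above)
demo_messages = pd.DataFrame({'conversation_id': ['conv_001', 'conv_001', 'conv_003']})
demo_outcomes = pd.DataFrame({'conversation_id': ['conv_001', 'conv_002']})

mismatches = find_id_mismatches(demo_messages, demo_outcomes)
print(mismatches)
```

Running the same check on `messages_df` and `outcomes_df` from this section should report no mismatches, since both contain exactly `conv_001` and `conv_002`.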


### Functions for Loading and Processing Conversation Data

In [5]:
def create_conversations_from_dataframes(
    messages_df: pd.DataFrame,
    outcomes_df: pd.DataFrame
) -> list[Conversation]:
    """
    Convert message and outcome DataFrames into Conversation objects.
    Simplified version using patterns from utils.py
    """
    conversations = []

    # Group by conversation_id, similar to load_conversations_from_csv
    for conv_id, group in messages_df.groupby('conversation_id'):
        # Sort by timestamp to ensure message order (uniformly formatted ISO 8601 strings sort chronologically)
        group = group.sort_values('timestamp')

        # Create messages using the same logic as utils.py
        messages = []
        for _, row in group.iterrows():
            # Reuse the sender conversion logic from utils.py: any value other than 'customer' maps to AGENT
            sender = ParticipantRole.CUSTOMER if row['sender'].lower() == 'customer' else ParticipantRole.AGENT

            message = Message(
                sender=sender,
                content=str(row['content']),
                timestamp=pd.to_datetime(row['timestamp']).to_pydatetime()
            )
            messages.append(message)

        # Get outcome for this conversation
        outcome_row = outcomes_df[outcomes_df['conversation_id'] == conv_id]
        outcome = None
        if not outcome_row.empty:
            first_outcome = outcome_row.iloc[0]
            outcome = Outcome(
                name=str(first_outcome['name']),
                description=str(first_outcome['description'])
            )

        # Create conversation
        conversation = Conversation(
            messages=tuple(messages),
            outcome=outcome
        )
        conversations.append(conversation)
    
    return conversations
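
The sender conversion above treats every value other than `'customer'` as the agent, so a typo or an unexpected label (for example `'bot'`) would be silently recorded as an agent message. If that is a concern for your data, a stricter mapping can raise an error instead. The sketch below is one possible approach, not Agentune's own logic: `Role` is a hypothetical stand-in for `ParticipantRole` so that the example runs on its own, and `parse_sender` is a helper name invented for this illustration.

```python
from enum import Enum


class Role(Enum):
    """Hypothetical stand-in for agentune's ParticipantRole (assumption for this sketch)."""
    CUSTOMER = 'customer'
    AGENT = 'agent'


def parse_sender(raw: str) -> Role:
    """Map a raw sender label to a Role, raising on unrecognized values."""
    normalized = str(raw).strip().lower()
    try:
        # Enum lookup by value: 'customer' -> Role.CUSTOMER, 'agent' -> Role.AGENT
        return Role(normalized)
    except ValueError:
        raise ValueError(f"Unknown sender value: {raw!r}") from None


customer_role = parse_sender(' Customer ')
agent_role = parse_sender('agent')
print(customer_role, agent_role)
```

Inside `create_conversations_from_dataframes`, the same idea would replace the one-line conditional with an explicit check of `row['sender']` against the values you expect.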

## Convert the DataFrames to Conversation Objects

In [6]:
# Convert the DataFrames to Conversation objects
conversations = create_conversations_from_dataframes(messages_df, outcomes_df)

# Report how many Conversation objects were created
print(f"Created {len(conversations)} conversations from DataFrames")

Created 2 conversations from DataFrames


In [None]:
# Now with the conversations in the right format, we can load them into a vector store and run simulations