# Simulating interviews with prior survey respondents
This notebook provides sample [EDSL](https://docs.expectedparrot.com/) code that creates AI agents to represent the respondents in an existing survey dataset and automates simulated follow-on interviews with them.

EDSL is an open-source library for simulating surveys and experiments with AI agents and language models. Please see our [documentation page](https://docs.expectedparrot.com/) for tips and tutorials on getting started.

## Importing prior survey data
We start by importing a dataset of responses from an existing survey. For purposes of demonstration, we use a set of mock responses to a survey about a popular product management newsletter that consisted of the following questions:

1. *Multiple choice:* **How often do you read the newsletter?** Options: Daily, Weekly, Monthly, Rarely, Never

2. *Free text:* **What topics would you like to see covered in future issues?**

3. *Linear scale:* **On a scale of 1 to 10, how would you rate the overall quality of the newsletter?** Options: 1-10

4. *Multiple choice:* **Which section of the newsletter do you find most valuable?** Options: Product Updates, Industry News, Case Studies, Tips and Strategies

5. *Free text:* **What improvements would you suggest for the newsletter?**

Reading in the CSV:

In [1]:
import pandas as pd

df = pd.read_csv("product_manager_responses.csv")
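
If you do not have `product_manager_responses.csv` at hand, a small stand-in can be built in memory. The sketch below assumes the long format implied by the code in this notebook: one row per respondent per question, with the columns `respondent_id`, `question_id`, `question_type`, `question_text`, `question_options` and `response`. The rows are made up for illustration, and `mock_rows` and `mock_df` are hypothetical names.

```python
import pandas as pd

# Hypothetical stand-in rows; the real CSV holds every respondent's answers.
mock_rows = [
    {"respondent_id": 1, "question_id": 1, "question_type": "Multiple Choice",
     "question_text": "How often do you read the newsletter?",
     "question_options": "Daily, Weekly, Monthly, Rarely, Never",
     "response": "Weekly"},
    {"respondent_id": 1, "question_id": 2, "question_type": "Free Text",
     "question_text": "What topics would you like to see covered in future issues?",
     "question_options": None,  # free-text questions have no options
     "response": "More case studies."},
    {"respondent_id": 2, "question_id": 1, "question_type": "Multiple Choice",
     "question_text": "How often do you read the newsletter?",
     "question_options": "Daily, Weekly, Monthly, Rarely, Never",
     "response": "Monthly"},
]

mock_df = pd.DataFrame(mock_rows)

# Count how many questions each respondent answered
print(mock_df.groupby("respondent_id")["question_id"].count())
```

Assigning `mock_df` to `df` would let the remaining cells run against the stand-in data.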

Checking the questions:

In [2]:
df[["question_type", "question_text", "question_options"]].drop_duplicates()

| | question_type | question_text | question_options |
|---|---|---|---|
| 0 | Multiple Choice | How often do you read the newsletter? | Daily, Weekly, Monthly, Rarely, Never |
| 5 | Free Text | What topics would you like to see covered in f... | |
| 10 | Linear Scale | On a scale of 1 to 10, how would you rate the ... | 1-10 |
| 15 | Multiple Choice | Which section of the newsletter do you find mo... | Product Updates, Industry News, Case Studies, ... |
| 20 | Free Text | What improvements would you suggest for the ne... | |


Inspecting a sample of the responses:

In [3]:
df[["question_text", "response"]].sample(5)

| | question_text | response |
|---|---|---|
| 21 | What improvements would you suggest for the ne... | The visual design of the newsletter could be i... |
| 11 | On a scale of 1 to 10, how would you rate the ... | 7 |
| 17 | Which section of the newsletter do you find mo... | Case Studies |
| 18 | Which section of the newsletter do you find mo... | Tips and Strategies |
| 15 | Which section of the newsletter do you find mo... | Tips and Strategies |


## Create agents for the interview subjects
Here we create an `Agent` for each `respondent_id` and give it information about its responses to the original survey. We do this by reformatting each set of responses as a dictionary of `traits` that we pass to an agent:

In [4]:
# Convert the DataFrame into a list of row dictionaries
all_responses = df.to_dict(orient="records")

# Initialize a dictionary to store responses by respondent_id
respondents_dict = {}

# Iterate over each row in the responses
for row in all_responses:
    respondent_id = row['respondent_id']
    question_id = row['question_id']
    question_text = row['question_text']
    question_options = row['question_options']
    response = row['response']
    
    # Format each response as background information for the relevant respondent
    formatted_string = (
        f"You were asked: '{question_text}' "
        f"The answer options were: '{question_options}'. "
        f"You responded: '{response}'"
    )
    
    # Add the information to the respondent's dictionary
    if respondent_id not in respondents_dict:
        respondents_dict[respondent_id] = {}
    
    respondents_dict[respondent_id][question_id] = formatted_string

# Print the new dictionary 
print(respondents_dict)

{1: {1: "You were asked: 'How often do you read the newsletter?' The answer options were: 'Daily, Weekly, Monthly, Rarely, Never'. You responded: 'Weekly'", 2: "You were asked: 'What topics would you like to see covered in future issues?' The answer options were: 'nan'. You responded: 'I would love to see more in-depth analyses on emerging trends in product management. It would be beneficial to include case studies from various industries, as it helps to see how different strategies are implemented in real-world scenarios. Additionally, insights from seasoned product managers on how they tackle common challenges would be greatly appreciated.'", 3: "You were asked: 'On a scale of 1 to 10, how would you rate the overall quality of the newsletter?' The answer options were: '1-10'. You responded: '8'", 4: "You were asked: 'Which section of the newsletter do you find most valuable?' The answer options were: 'Product Updates, Industry News, Case Studies, Tips and Strategies'. You responded: 
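
In the output above, free-text questions appear with `The answer options were: 'nan'`, because pandas reads an empty CSV cell as a float NaN. If you would rather leave the options sentence out in that case, a formatting helper along the following lines could be used. This is a sketch rather than part of the original notebook, and `format_response` is a hypothetical name.

```python
import math

def format_response(question_text, question_options, response):
    """Describe one survey answer, omitting the options when there are none."""
    # An empty CSV cell arrives from pandas as float NaN
    has_options = not (
        question_options is None
        or (isinstance(question_options, float) and math.isnan(question_options))
    )
    parts = [f"You were asked: '{question_text}'"]
    if has_options:
        parts.append(f"The answer options were: '{question_options}'.")
    parts.append(f"You responded: '{response}'")
    return " ".join(parts)

# Free-text question: no options sentence
print(format_response("What improvements would you suggest?", float("nan"), "Shorter issues."))

# Multiple-choice question: same wording as the loop above
print(format_response("How often do you read the newsletter?",
                      "Daily, Weekly, Monthly, Rarely, Never", "Weekly"))
```

Inside the loop, `formatted_string = format_response(question_text, question_options, response)` would take the place of the inline f-string.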

Next we pass the dictionaries to `Agent` objects, together with information and instructions about the follow-on interviews that we want to conduct:

In [5]:
from edsl import Agent

import textwrap
from rich import print

# Interview topic
interview_topic = "Product Management Newsletter Follow-up Interview"

# Persona for the interview subjects
interview_subject_persona = textwrap.dedent(f"""\
You are a professional with a keen interest in product management.
""")

# Instructions for the interview subject agents
interview_subject_instructions = textwrap.dedent(f"""\
You recently completed a reader survey about the newsletter of a 
product management expert who also produces a popular blog and podcast. 
Now they are asking you some follow-on questions.
""")

# Persona for the interviewer agent
interviewer_persona = textwrap.dedent(f"""\
You are a well-known expert on product management who produces
a popular newsletter, blog and podcast on product management.
""")

# Instructions for the interviewer agent
interviewer_instructions = textwrap.dedent(f"""\
You recently conducted a reader survey about your newsletter. 
Now you are asking respondents some follow-on questions.
""")

# Total number of questions to ask in the interview
total_questions = 5 

In [6]:
# Initialize a list to store the agents
interview_subjects = []

# Iterate over the respondents' data
for respondent_id, questions in respondents_dict.items():
    
    # Initialize the traits for each agent
    traits = {}
    
    # Iterate over the questions
    for question_id, formatted_string in questions.items():
        traits[f"question_id_{question_id}"] = formatted_string
    
    # Create the agent and add it to the agents list
    agent = Agent(name = f"Respondent {respondent_id}", 
                  traits = traits, 
                  instruction = interview_subject_instructions)
    interview_subjects.append(agent)

interview_subjects

[Agent(name = 'Respondent 1', traits = {'question_id_1': "You were asked: 'How often do you read the newsletter?' The answer options were: 'Daily, Weekly, Monthly, Rarely, Never'. You responded: 'Weekly'", 'question_id_2': "You were asked: 'What topics would you like to see covered in future issues?' The answer options were: 'nan'. You responded: 'I would love to see more in-depth analyses on emerging trends in product management. It would be beneficial to include case studies from various industries, as it helps to see how different strategies are implemented in real-world scenarios. Additionally, insights from seasoned product managers on how they tackle common challenges would be greatly appreciated.'", 'question_id_3': "You were asked: 'On a scale of 1 to 10, how would you rate the overall quality of the newsletter?' The answer options were: '1-10'. You responded: '8'", 'question_id_4': "You were asked: 'Which section of the newsletter do you find most valuable?' The answer options
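
Note that `interview_subject_persona` is defined earlier but is not passed to the agents in this cell. If you want the interview subjects to carry it, one option is to add it to each traits dictionary under a `persona` key, as the interviewer agent in the next section does. This is an assumption on our part, not something the notebook prescribes. `build_traits` below is a hypothetical helper written in plain Python, so it can be tried without calling EDSL.

```python
def build_traits(questions, persona=None):
    """Map {question_id: text} to agent traits, optionally adding a persona."""
    traits = {f"question_id_{qid}": text for qid, text in questions.items()}
    if persona:
        # Strip the trailing newline left by the triple-quoted string
        traits["persona"] = persona.strip()
    return traits

# Made-up input in the same shape as one entry of respondents_dict
example = build_traits(
    {1: "You were asked: 'How often do you read the newsletter?' You responded: 'Weekly'"},
    persona="You are a professional with a keen interest in product management.\n",
)
print(list(example))
```

The returned dictionary could then be passed as `traits` when constructing each `Agent` in the loop above.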

## Create an interviewer agent

In [7]:
# Create an agent for the interviewer 
interviewer = Agent(traits = {"persona":interviewer_persona},
                    instruction = interviewer_instructions)

## Methods for automating the interviews

In [8]:
from edsl.questions import QuestionFreeText
from edsl import Scenario, Model, Survey

# Selecting a language model to use
model = Model('gpt-4')

def get_next_question(subject, researcher, dialog_so_far):
    scenario = Scenario({'subject': str(subject.traits), 'dialog_so_far': dialog_so_far})
    meta_q = QuestionFreeText(
        question_name="next_question",
        question_text="""
        This is the background information for the interview subject: {{ subject }}
        This is your current dialog with the interview subject: {{ dialog_so_far }}
        What question would you ask the interview subject next?
        """
    )
    question_text = meta_q.by(model).by(researcher).by(scenario).run().select("next_question").first()
    return question_text

def get_response_to_question(question_text, subject, dialog_so_far):
    q_to_subject = QuestionFreeText(
        question_name="question",
        question_text=f"""
        This is your current dialog with the interviewer: {dialog_so_far}
        You are now being asked: """ + question_text
    )
    response = q_to_subject.by(model).by(subject).run().select("question").first()
    return response

def ask_question(subject, researcher, dialog_so_far):
    question_text = get_next_question(subject, researcher, dialog_so_far)
    response = get_response_to_question(question_text, subject, dialog_so_far)

    print(" \nQuestion: \n\n" + question_text + "\n\nResponse: \n\n" + response)
    
    return {"question": question_text, "response": response}

def dialog_to_string(dialog):
    """Render a list of question/response dictionaries as plain text."""
    return "\n".join(
        f"Question: {turn['question']}\nResponse: {turn['response']}" for turn in dialog
    )

def clean_dict(d):
    """Convert dictionary to string and remove braces."""
    return str(d).replace('{', '').replace('}', '')

def summarize_interview(subject, interview_topic, dialog_so_far, researcher):
    interview_subject_name = subject["name"]
    interview_subject_traits = subject["traits"]
    summary_q = QuestionFreeText(
        question_name = "summary",
        question_text = (
        f"You have just conducted the following interview of {interview_subject_name} "
        f"who has these traits: {clean_dict(interview_subject_traits)} "
        f"The topic of the interview was {interview_topic}. "
        f"Please draft a summary of the interview: {clean_dict(dialog_so_far)}")
    )
    themes_q = QuestionFreeText(
        question_name = "themes",
        question_text = "List the major themes of the interview."
    )
    survey = Survey([summary_q, themes_q]).set_full_memory_mode()
    results = survey.by(model).by(researcher).run()
    summary = results.select("summary").first()
    themes = results.select("themes").first()
    print("\n\nSummary:\n\n" + summary + "\n\nThemes:\n\n" + themes)

def conduct_interview(subject, researcher, interview_topic):

    print("\n\nInterview subject: " + subject["name"] + "\n\nInterview topic: " + interview_topic)
    
    dialog_so_far = []  
    
    for i in range(total_questions):
        result = ask_question(subject, researcher, dialog_to_string(dialog_so_far))
        dialog_so_far.append(result)
    
    summarize_interview(subject, interview_topic, dialog_so_far, researcher)
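
To see what the two string helpers produce, here they are applied to a made-up two-turn dialog. The definitions are repeated so the snippet runs on its own. Our reading, which the notebook does not state, is that `clean_dict` strips braces so that interpolated text is not mistaken for template placeholders such as `{{ subject }}`.

```python
def dialog_to_string(dialog):
    """Render a list of question/response dictionaries as plain text."""
    return "\n".join(
        f"Question: {turn['question']}\nResponse: {turn['response']}" for turn in dialog
    )

def clean_dict(d):
    """Convert an object to a string and remove braces."""
    return str(d).replace("{", "").replace("}", "")

# Made-up dialog for illustration
dialog = [
    {"question": "What did you like most?", "response": "The case studies."},
    {"question": "What was missing?", "response": "Interviews with PMs."},
]

print(dialog_to_string(dialog))  # one "Question:" / "Response:" pair per turn
print(clean_dict(dialog))        # same content as str(dialog), minus the braces
```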

In [9]:
for interview_subject in interview_subjects:
    conduct_interview(interview_subject, interviewer, interview_topic)
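
Each interview makes several model calls, so it can be useful to check the control flow offline first. The sketch below mirrors the loop in `conduct_interview` but takes the question and response steps as plain functions, so canned stand-ins can replace the model-backed ones. `run_interview`, `fake_next_question` and `fake_response` are hypothetical names, and unlike the original this version returns the transcript instead of printing it.

```python
def run_interview(next_question, respond, total_questions=5):
    """Alternate interviewer questions and subject responses, keeping a transcript."""
    dialog_so_far = []
    for _ in range(total_questions):
        # Both sides see the dialog so far rendered as text
        history = "\n".join(
            f"Question: {t['question']}\nResponse: {t['response']}" for t in dialog_so_far
        )
        question = next_question(history)
        response = respond(question, history)
        dialog_so_far.append({"question": question, "response": response})
    return dialog_so_far

# Canned stand-ins for the model-backed functions
def fake_next_question(history):
    turn = history.count("Question:") + 1
    return f"Follow-up question {turn}?"

def fake_response(question, history):
    return f"Answer to: {question}"

transcript = run_interview(fake_next_question, fake_response, total_questions=3)
for turn in transcript:
    print(turn["question"], "->", turn["response"])
```

Returning the transcript also makes it easy to store the interviews, for example in a list keyed by respondent, rather than relying on printed output.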