# Simulating interviews with prior survey respondents
This notebook provides sample [EDSL](https://docs.expectedparrot.com/) code for creating AI agents that represent the respondents in your existing survey data, and for automating simulated follow-on interviews with them.

EDSL is an open-source library for simulating surveys and experiments with AI agents and language models. Please see our [documentation page](https://docs.expectedparrot.com/) for tips and tutorials on getting started.

## Importing data
We start by importing a dataset of responses from an existing survey. For purposes of demonstration, we use a set of mock responses to a survey about a home construction newsletter that included the following questions:

In [1]:
import pandas as pd

df = pd.read_csv("newsletter_survey_responses.csv")

In [2]:
df[["question_type", "question_text", "question_options"]].drop_duplicates()

| | question_type | question_text | question_options |
|---|---|---|---|
| 0 | Multiple Choice | How often do you read the newsletter? | Daily, Weekly, Monthly, Rarely, Never |
| 5 | Free Text | What topics would you like to see covered in f... | |
| 10 | Linear Scale | On a scale of 1 to 10, how would you rate the ... | 1-10 |
| 15 | Multiple Choice | Which section of the newsletter do you find mo... | Construction Tips, Product Reviews, Case Studi... |
| 20 | Free Text | What improvements would you suggest for the ne... | |


A sample of the responses:

In [3]:
df[["question_text", "response"]].sample(8)

| | question_text | response |
|---|---|---|
| 12 | On a scale of 1 to 10, how would you rate the ... | 9 |
| 7 | What topics would you like to see covered in f... | Case studies on innovative construction techni... |
| 20 | What improvements would you suggest for the ne... | Adding more video content, such as tutorials a... |
| 0 | How often do you read the newsletter? | Weekly |
| 13 | On a scale of 1 to 10, how would you rate the ... | 8 |
| 11 | On a scale of 1 to 10, how would you rate the ... | 7 |
| 14 | On a scale of 1 to 10, how would you rate the ... | 7 |
| 18 | Which section of the newsletter do you find mo... | Construction Tips |
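
The CSV file itself is not included here. Judging from the columns that the cells in this notebook read, each row holds one respondent's answer to one question (a "long" layout). The sketch below parses a small in-memory stand-in with the standard `csv` module rather than pandas, so that it runs without the file. The rows, the ID values and the placeholder rating question are invented for illustration and are not taken from the real dataset.

```python
import csv
import io

# Hypothetical stand-in for newsletter_survey_responses.csv (values invented).
# The column names are the ones this notebook reads from the DataFrame.
mock_csv = """\
respondent_id,question_id,question_type,question_text,question_options,response
101,1,Multiple Choice,How often do you read the newsletter?,"Daily, Weekly, Monthly, Rarely, Never",Weekly
101,2,Linear Scale,Placeholder rating question,1-10,8
102,1,Multiple Choice,How often do you read the newsletter?,"Daily, Weekly, Monthly, Rarely, Never",Monthly
102,2,Linear Scale,Placeholder rating question,1-10,7
"""

# Each parsed row is a dict keyed by column name, similar in shape to the
# records produced by df.to_dict(orient="records").
rows = list(csv.DictReader(io.StringIO(mock_csv)))

columns = list(rows[0].keys())
respondent_ids = sorted({row["respondent_id"] for row in rows})

print(columns)
print(respondent_ids)
```

Note that `csv.DictReader` returns every value as a string, whereas `pd.read_csv` infers numeric types and represents empty cells as `NaN`.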


## Creating agents for interview subjects
Next we create an `Agent` for each of the respondents and give it information about its responses to the original survey. We do this by turning each respondent's responses into a dictionary of `traits` that we pass to its agent, together with some background information and instructions about the follow-on interview task.

### Designing agent traits

In [4]:
# Convert the DataFrame into a list of dictionaries, one per response row
all_responses = df.to_dict(orient="records")

# Initialize a dictionary to store responses by respondent_id
respondents_dict = {}

# Iterate over each row in the responses
for row in all_responses:
    respondent_id = row["respondent_id"]
    question_id = row["question_id"]
    question_text = row["question_text"]
    question_options = row["question_options"]
    response = row["response"]

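    # Free-text questions have no answer options (NaN in the DataFrame),
    # so only mention the options when they exist
    options_clause = (
        ""
        if pd.isna(question_options)
        else f"The answer options were: '{question_options}'. "
    )

    # Format each response as background information for the relevant respondent
    formatted_string = (
        f"You were asked: '{question_text}' "
        f"{options_clause}"
        f"You responded: '{response}'"
    )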

    # Add the information to the respondent's dictionary
    if respondent_id not in respondents_dict:
        respondents_dict[respondent_id] = {}

    respondents_dict[respondent_id][question_id] = formatted_string

# Print the new dictionary
# print(respondents_dict)
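
To make the resulting structure concrete, here is a minimal, self-contained sketch of the grouping step run on two invented records (the IDs and answers are hypothetical). It uses `dict.setdefault` as a compact alternative to an explicit membership check; the result has the same shape, a nested dictionary keyed first by respondent and then by question.

```python
# Hypothetical records in the shape returned by df.to_dict(orient="records").
mock_records = [
    {
        "respondent_id": 101,
        "question_id": 1,
        "question_text": "How often do you read the newsletter?",
        "question_options": "Daily, Weekly, Monthly, Rarely, Never",
        "response": "Weekly",
    },
    {
        "respondent_id": 102,
        "question_id": 1,
        "question_text": "How often do you read the newsletter?",
        "question_options": "Daily, Weekly, Monthly, Rarely, Never",
        "response": "Monthly",
    },
]


def group_by_respondent(records):
    """Return {respondent_id: {question_id: formatted background string}}."""
    grouped = {}
    for row in records:
        formatted = (
            f"You were asked: '{row['question_text']}' "
            f"The answer options were: '{row['question_options']}'. "
            f"You responded: '{row['response']}'"
        )
        # setdefault creates the inner dict the first time a respondent appears
        grouped.setdefault(row["respondent_id"], {})[row["question_id"]] = formatted
    return grouped


grouped = group_by_respondent(mock_records)
print(grouped[101][1])
```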

### Adding instructions

In [5]:
# Importing the EDSL class for constructing AI agents
from edsl import Agent

import textwrap
from rich import print

# Interview topic
interview_topic = "home construction"
interview_title = "Home Construction Newsletter Follow-up Interview"

# Persona for the interview subjects
interview_subject_persona = textwrap.dedent(
    f"""\
You are a professional with a keen interest in {interview_topic}.
"""
)

# Instructions for the interview subject agents
interview_subject_instructions = textwrap.dedent(
    f"""\
You recently completed a reader survey about the newsletter of a 
{interview_topic} expert who also produces a popular blog and podcast. 
Now they are asking you some follow-on questions.
"""
)

# Persona for the interviewer agent
interviewer_persona = textwrap.dedent(
    f"""\
You are a well-known expert on {interview_topic} who produces
a popular newsletter, blog and podcast on {interview_topic}.
"""
)

# Instructions for the interviewer agent
interviewer_instructions = textwrap.dedent(
    f"""\
You recently conducted a reader survey about your newsletter on
{interview_topic}. Now you are asking respondents some follow-on questions.
"""
)

# Total number of questions to ask in the interview
total_questions = 5
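
The prompt strings in this cell combine an f-string with `textwrap.dedent`, and the backslash after the opening quotes suppresses the leading newline. Because the text in those strings already starts at column 0, `dedent` leaves it unchanged. It becomes useful when the literal is indented to match the surrounding code, as in this small sketch (the `build_persona` helper is hypothetical and not part of the notebook).

```python
import textwrap


def build_persona(topic):
    # The literal is indented along with the function body; dedent strips the
    # common leading whitespace so the prompt text starts at column 0.
    return textwrap.dedent(
        f"""\
        You are a professional with a keen interest in {topic}.
        """
    )


persona = build_persona("home construction")
print(repr(persona))
```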

### Constructing the agents

In [6]:
# Initialize a list to store the agents
interview_subjects = []

# Iterate over the respondents' data
for respondent_id, questions in respondents_dict.items():

    # Initialize the traits for each agent
    traits = {}

    # Iterate over the questions
    for question_id, formatted_string in questions.items():
        traits[f"question_id_{question_id}"] = formatted_string

    # Create the agent and add it to the agents list
    agent = Agent(
        name=f"Respondent {respondent_id}",
        traits=traits,
        instruction=interview_subject_instructions,
    )
    interview_subjects.append(agent)

# Inspecting the first one
interview_subjects[0]
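
The inner loop that builds `traits` can also be written as a dictionary comprehension. The sketch below uses invented data and plain dictionaries only (no `Agent` is constructed), simply to show the trait keys that each agent ends up with.

```python
# Hypothetical per-respondent data in the shape of respondents_dict[respondent_id].
mock_questions = {
    1: "You were asked: 'How often do you read the newsletter?' You responded: 'Weekly'",
    2: "You were asked: 'Placeholder rating question' You responded: '8'",
}

# One trait per original survey question, keyed by a prefixed question ID.
traits = {
    f"question_id_{question_id}": formatted_string
    for question_id, formatted_string in mock_questions.items()
}

print(list(traits.keys()))
```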

## Creating an interviewer agent
We also create an agent for the interviewer using the relevant information and instructions that we've specified above:

In [7]:
# Create an agent for the interviewer
interviewer = Agent(
    traits={"persona": interviewer_persona}, instruction=interviewer_instructions
)

## Conducting the interviews
Here we use EDSL to create some functions for automating an interview between a respondent agent and the interviewer agent. The interview is designed as a series of questions presented to the agents, each accompanied by information about what has already transpired. We optionally specify that GPT-4 should be used to generate the content. (Learn more about [selecting language models](https://docs.expectedparrot.com/en/latest/language_models.html) to use with EDSL surveys.)

In [8]:
# Importing the EDSL tools that we will use to administer the questions
from edsl.questions import QuestionFreeText
from edsl import Scenario, Model, Survey

# Selecting a language model to use
# Model.available()  # To see a list of all available models
model = Model("gpt-4")


def get_next_question(subject, researcher, dialog_so_far):
    scenario = Scenario(
        {"subject": str(subject.traits), "dialog_so_far": dialog_so_far}
    )
    meta_q = QuestionFreeText(
        question_name="next_question",
        question_text="""
        This is the background information for the interview subject: {{ subject }}
        This is your current dialog with the interview subject: {{ dialog_so_far }}
        What question would you ask the interview subject next?
        """,
    )
    question_text = (
        meta_q.by(model)
        .by(researcher)
        .by(scenario)
        .run()
        .select("next_question")
        .first()
    )
    return question_text


def get_response_to_question(question_text, subject, dialog_so_far):
    q_to_subject = QuestionFreeText(
        question_name="question",
        question_text=f"""
        This is your current dialog with the interviewer: {dialog_so_far}
        You are now being asked: """
        + question_text,
    )
    response = q_to_subject.by(model).by(subject).run().select("question").first()
    return response


def ask_question(subject, researcher, dialog_so_far):
    question_text = get_next_question(subject, researcher, dialog_so_far)
    response = get_response_to_question(question_text, subject, dialog_so_far)

    print(" \nQuestion: \n\n" + question_text + "\n\nResponse: \n\n" + response)

    return {"question": question_text, "response": response}


def dialog_to_string(dialog):
    """Render a list of question/response dicts as a plain-text transcript."""
    return "\n".join(
        f"Question: {turn['question']}\nResponse: {turn['response']}"
        for turn in dialog
    )


def clean_dict(d):
    """Convert dictionary to string and remove braces."""
    return str(d).replace("{", "").replace("}", "")


def summarize_interview(subject, interview_topic, dialog_so_far, researcher):
    interview_subject_name = subject.name
    interview_subject_traits = subject.traits
    summary_q = QuestionFreeText(
        question_name="summary",
        question_text=(
            f"You have just conducted the following interview of {interview_subject_name} "
            f"who has these traits: {clean_dict(interview_subject_traits)} "
            f"The topic of the interview was {interview_topic}. "
            f"Please draft a summary of the interview: {clean_dict(dialog_so_far)}"
        ),
    )
    themes_q = QuestionFreeText(
        question_name="themes", question_text="List the major themes of the interview."
    )
    survey = Survey([summary_q, themes_q]).set_full_memory_mode()
    results = survey.by(model).by(researcher).run()
    summary = results.select("summary").first()
    themes = results.select("themes").first()
    print("\n\nSummary:\n\n" + summary + "\n\nThemes:\n\n" + themes)


def conduct_interview(subject, researcher, interview_topic):

    print(
        "\n\nInterview subject: "
        + subject.name
        + "\n\nInterview topic: "
        + interview_topic
    )

    dialog_so_far = []

    for i in range(total_questions):
        result = ask_question(subject, researcher, dialog_to_string(dialog_so_far))
        dialog_so_far.append(result)

    summarize_interview(subject, interview_topic, dialog_so_far, researcher)
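
The two string helpers can be checked in isolation, without calling a model. The sketch below re-declares them so that it is self-contained and runs them on an invented two-turn dialog. One plausible reason for stripping braces in `clean_dict` is to keep literal `{` and `}` characters from being mistaken for template placeholders such as `{{ subject }}`; that motivation is an assumption, not something this notebook states.

```python
def dialog_to_string(dialog):
    """Render question/response dicts as a plain-text transcript."""
    return "\n".join(
        f"Question: {turn['question']}\nResponse: {turn['response']}"
        for turn in dialog
    )


def clean_dict(d):
    """Convert an object to a string and remove curly braces."""
    return str(d).replace("{", "").replace("}", "")


# Hypothetical dialog (content invented for illustration).
mock_dialog = [
    {"question": "What do you build most often?", "response": "Decks."},
    {"question": "Which tools do you rely on?", "response": "A circular saw."},
]

transcript = dialog_to_string(mock_dialog)
cleaned = clean_dict({"question_id_1": "You responded: 'Weekly'"})

print(transcript)
print(cleaned)
```

Before the first question is asked the dialog is empty, so `dialog_to_string([])` returns an empty string and the interviewer sees no prior context.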

In [9]:
for interview_subject in interview_subjects:
    conduct_interview(interview_subject, interviewer, interview_topic)
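
Each interview makes several model calls, so it can be worth checking the control flow on its own first. The following is a minimal sketch of the same accumulate-and-ask loop with the model calls replaced by a stub. `run_interview` and `stub_ask` are hypothetical names introduced for this illustration and are not part of EDSL or of the notebook code.

```python
def run_interview(ask, total_questions):
    """Run a fixed-length interview; `ask` maps a transcript to one turn."""
    dialog = []
    for _ in range(total_questions):
        # Rebuild the transcript from all previous turns before each question
        transcript = "\n".join(
            f"Question: {turn['question']}\nResponse: {turn['response']}"
            for turn in dialog
        )
        dialog.append(ask(transcript))
    return dialog


def stub_ask(transcript):
    # Hypothetical stand-in for ask_question: derives the turn number from the
    # transcript instead of calling a language model.
    turn_number = transcript.count("Question:") + 1
    return {
        "question": f"Stub question {turn_number}",
        "response": f"Stub response {turn_number}",
    }


dialog = run_interview(stub_ask, total_questions=3)
print([turn["question"] for turn in dialog])
```

Because every turn receives the full transcript so far, the prompt grows with each question; that is worth keeping in mind when choosing `total_questions`.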