# Using `edsl` to scale data labeling tasks

This notebook shows how to use `edsl` tools for simulating surveys with AI to perform complex data labeling tasks. This is accomplished with the following generalized steps: <br><br>

<blockquote>
1. We identify data to be labeled. <br>
2. We construct the data labeling tasks as a question or series of questions about the data, e.g., <i>Rate the clarity of the following text on a scale from 0 to 10: {{ text }}.</i> The questions can be qualitative or quantitative, and will be typical types of survey questions (multiple choice, free text, linear scale, etc.). <br>
3. We draft personas for AI agents to reference in responding to the questions, e.g., <i>You are an expert in ...</i> <br>
4. We administer the survey to the agents with the data as inputs to the questions. <br>
</blockquote>
<br>
<img src="general_survey.png">
<br><br>

## Scaling individualized data labeling
We can add a layer of complexity to this generalized flow by administering the survey to each agent with only data that is relevant to the agent's persona, e.g., if we want an agent with a particular background to evaluate only the data that pertains to that background. This can be useful if our data is already sorted in some way that is important to our task. We can also use the tools to sort the data as needed.

We can visualize this modified flow as follows:
<img src="agent_specific_survey.png">
<br><br>

## An example case: Evaluating job posts 
Using a dataset of job categories and job posts as an example, we show how to create AI agents with relevant backgrounds and prompt them to evaluate the job posts in a variety of ways. This exercise consists of the following steps:

<blockquote>
1. We use the tools to create a mock dataset, and show how to import a real dataset to use instead. <br>
2. We construct questions we will ask about each of the job posts and combine them into a survey. <br>
3. We create an AI agent with category expertise for each of the job categories. <br>
4. We administer the survey to each agent with (only) the job posts for the relevant category. <br>
5. We show how to access the results using built-in print, SQL, dataframes and visualization methods. <br>
</blockquote>

Skip to any section:
<blockquote>
<a href="#Technical-setup" style="color:#4e4089">Technical setup</a><br>
<a href="#Constructing-data-labeling-tasks-as-questions" style="color:#4e4089">Constructing data labeling tasks as questions</a><br>
<a href="#Combining-questions-into-Surveys" style="color:#4e4089">Combining questions into Surveys</a><br>
<a href="#Creating-personas-for-Agents" style="color:#4e4089">Creating personas for Agents</a><br>
<a href="#Parameterizing-questions-with-Scenarios" style="color:#4e4089">Parameterizing questions with Scenarios</a><br>
<a href="#Running-the-survey" style="color:#4e4089">Running the survey</a><br>
<a href="#Accessing-Results" style="color:#4e4089">Accessing results</a><br>
</blockquote>

Please see our Getting Started page for more details on these methods and setting up the `edsl` tools:
<a href="https://www.goemeritus.com/getting-started">https://www.goemeritus.com/getting-started</a>

## Technical setup

Here we import the `edsl` tools that we'll use and select LLMs. We will be prompted to enter an API key. Press return to skip entering a key.

In [1]:
# ! pip install edsl

In [2]:
from edsl.questions import QuestionMultipleChoice, QuestionFreeText, QuestionLinearScale, QuestionList
from edsl import Scenario, Survey, Agent, Model
from edsl.results import Results

In [3]:
Model.available()

['claude-3-haiku-20240307',
 'claude-3-opus-20240229',
 'claude-3-sonnet-20240229',
 'dbrx-instruct',
 'gemini_pro',
 'gpt-3.5-turbo',
 'gpt-4-1106-preview',
 'llama-2-13b-chat-hf',
 'llama-2-70b-chat-hf',
 'mixtral-8x7B-instruct-v0.1']

The default model is now GPT-4 (previously GPT-3.5-turbo):

In [4]:
model = Model()

No model name provided, using default model: gpt-4-1106-preview


Next we import a dataset. The commented-out cell below shows how a real dataset could be read from a CSV file. For purposes of this demo we instead use `edsl` to create a mock dataset.

In [5]:
# import csv
# data = []
# with open("data.csv", "r") as f: 
#     reader = csv.reader(f)
#     header = next(reader)
#     for row in reader: 
#         data.append(row)
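
The commented-out cell above reads each row into a plain list. As a minimal sketch, assuming the real file has `job_category` and `job_post` header columns (the names used in the rest of this notebook), `csv.DictReader` keeps the column names attached to each row. The `load_rows` helper and the in-memory sample are hypothetical and exist only so the example runs without a file on disk.

```python
import csv
import io

REQUIRED_COLUMNS = {"job_category", "job_post"}  # column names used later in this notebook

def load_rows(file_obj):
    """Read CSV rows as dicts and check that the expected columns are present."""
    reader = csv.DictReader(file_obj)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    return [row for row in reader]

# Hypothetical sample standing in for open("data.csv", newline="")
sample = io.StringIO(
    "job_category,job_post\n"
    "Graphic Design,Design a logo for a small bakery\n"
    "Web Development,Build a landing page\n"
)
rows = load_rows(sample)
print(rows[0]["job_category"])
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` to obtain a dataframe shaped like the `df` used in later cells.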

Here we use the tools to create a dataset consisting of a column of job categories (3 different types) and a column of job posts for those categories (3 posts for each type). We'll go into more detail on the `edsl` methods that we use to do this in later steps.

In [6]:
# Skip this step and upload your real dataset, modifying columns as needed.

import pandas as pd 

def create_job_categories(num_categories, model):
    # Create a list of job categories
    q_job_categories = QuestionList(
        question_name = "job_categories",
        question_text = f"""{ num_categories } categories of jobs commonly posted at an 
        online labor marketplace (e.g., 'Graphic Design'). Return each category as an item of the list."""
    )
    job_categories_list = q_job_categories.by(model).run().select("job_categories").to_list()[0]
    return job_categories_list

def create_job_posts(num_posts, job_category, model):
    # Create job posts for a category
    q_job_posts = QuestionList(
        question_name = "job_posts",
        question_text = f"""Draft descriptions for { num_posts } job posts in the following 
        category of an online labor marketplace: { job_category }."""
    )
    job_posts_list = q_job_posts.by(model).run().select("job_posts").to_list()[0]
    return job_posts_list

def create_data(num_categories, num_posts, model):
    rows = []
    job_categories_list = create_job_categories(num_categories, model)

    for job_category in job_categories_list:
        # Because of how job posts are typically structured, we expect this to return a list with a
        # dict for each job post. We store each job post in our dataset as it was returned.
        job_posts_list = create_job_posts(num_posts, job_category, model)

        for job_post in job_posts_list:
            rows.append({"job_category": job_category, "job_post": job_post})

    # Build the dataframe once instead of concatenating one row at a time
    return pd.DataFrame(rows, columns=["job_category", "job_post"])

In [7]:
df = create_data(num_categories=3, num_posts=3, model=Model('gpt-4-1106-preview'))
print(df)

      job_category                                           job_post
0   Graphic Design  {'job_title': 'Freelance Graphic Designer', 'd...
1   Graphic Design  {'job_title': 'Senior Graphic Designer', 'desc...
2   Graphic Design  {'job_title': 'Graphic Design Intern', 'descri...
3  Web Development  {'title': 'Front-End Developer', 'description'...
4  Web Development  {'title': 'Back-End Developer', 'description':...
5  Web Development  {'title': 'Full Stack Developer', 'description...
6  Content Writing  {'job_title': 'Freelance Lifestyle Blogger', '...
7  Content Writing  {'job_title': 'Technical Content Writer', 'job...
8  Content Writing  {'job_title': 'SEO Content Writer', 'job_descr...
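
The generated job posts are dicts whose keys vary from post to post (for example, `job_title` in some rows and `title` in others). If plain-text posts are preferred as question inputs, one option is to flatten each dict into a string. This is only a sketch: `post_to_text` is a hypothetical helper and the sample post is made up.

```python
def post_to_text(job_post):
    """Flatten a job post into one string; dicts become 'key: value' lines."""
    if isinstance(job_post, dict):
        return "\n".join(f"{key}: {value}" for key, value in job_post.items())
    return str(job_post)

# Made-up post for illustration
example_post = {"title": "Front-End Developer", "description": "Build responsive pages."}
print(post_to_text(example_post))
```

Applied to the dataset, this could look like `df["job_post"] = df["job_post"].apply(post_to_text)`.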


## Constructing data labeling tasks as `Questions`

Next we draft our data labeling tasks in the form of questions about the job posts. We choose relevant question types (here, linear scale and multiple choice; free text, numerical and other types are also available) and construct the questions with job categories and job posts as inputs.

In [8]:
q_specific_ls = QuestionLinearScale(
    question_name = "specific_ls",
    question_text = """
        Consider the following job category at an online labor marketplace: {{ job_category }}.
        Consider the following job post: {{ job_post }}.
        On a scale from 0 to 10, rate how specific the job post is compared with other job posts in the same category
        (0 = Very generic, 10 = Very specific).""",
    question_options = [0,1,2,3,4,5,6,7,8,9,10]
)

q_generic_ls = QuestionLinearScale(
    question_name = "generic_ls",
    question_text = """
        Consider the following job category at an online labor marketplace: {{ job_category }}.
        Consider the following job post: {{ job_post }}.
        On a scale from 0 to 10, rate how generic the job post is compared to other job posts in the same category
        (0 = Very specific, 10 = Very generic).""",
    question_options = [0,1,2,3,4,5,6,7,8,9,10]
)

q_specific_mc = QuestionMultipleChoice(
    question_name = "specific_mc",
    question_text = """
        Consider the following job category at an online labor marketplace: {{ job_category }}.
        Consider the following job post: {{ job_post }}.
        How generic or specific is the job post compared with other job posts in the same category?""",
    question_options = [
        "Highly generic", 
        "Somewhat generic", 
        "Neither generic nor specific",
        "Somewhat specific",
        "Highly specific"]
)
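
The question texts above reference `{{ job_category }}` and `{{ job_post }}`, which are filled in from scenario data later on. One way to catch a mismatched name before running a survey is to compare the placeholders in a question text against the keys of the data. The sketch below uses a simple regular expression rather than `edsl`'s own template handling; `placeholders` and `missing_keys` are hypothetical helpers.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def placeholders(question_text):
    """Return the set of {{ name }} placeholders found in a question text."""
    return set(PLACEHOLDER.findall(question_text))

def missing_keys(question_text, scenario_data):
    """Return placeholders that the scenario data does not supply."""
    return placeholders(question_text) - set(scenario_data)

text = """
    Consider the following job category at an online labor marketplace: {{ job_category }}.
    Consider the following job post: {{ job_post }}."""

print(sorted(placeholders(text)))  # ['job_category', 'job_post']
print(missing_keys(text, {"job_category": "Web Development"}))  # {'job_post'}
```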

## Combining questions into `Surveys`

Next we combine our questions into a survey that will be administered to the AI agents.

In [9]:
jobs_survey = Survey(questions = [q_specific_ls, q_generic_ls, q_specific_mc])

## Creating personas for `Agents`

Next we create descriptions for personas that we will assign to AI agents. For each job category we will construct an AI agent that is an expert in the category. 

We can use the `.example()` method to see how an `Agent` is constructed:

In [10]:
Agent.example()

Agent(traits = {'age': 22, 'hair': 'brown', 'height': 5.5})

An agent can also take an optional name and parameterized traits. For example:

In [11]:
job_category = "Web design"
base_persona = "You are an experienced freelancer on online labor marketplaces."
expertise = f"You regularly perform jobs in the following category: { job_category }."

In [12]:
# The traits use the `base_persona` and `expertise` strings built in the previous cell
example_agent = Agent(name = "Example agent", traits = {"base_persona": base_persona, "expertise": expertise})
example_agent.print()
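
Note that an f-string is evaluated when it is defined, so changing `job_category` afterwards does not change an `expertise` string that has already been built. One way to avoid stale values is to build the traits inside a function. In this sketch `build_traits` is a hypothetical helper; it returns a plain dict that could be passed as `traits` to `Agent`.

```python
BASE_PERSONA = "You are an experienced freelancer on online labor marketplaces."

def build_traits(job_category):
    """Build the traits dict for one job category at call time."""
    return {
        "base_persona": BASE_PERSONA,
        "expertise": f"You regularly perform jobs in the following category: {job_category}.",
    }

traits_by_category = {c: build_traits(c) for c in ["Web design", "Graphic design"]}
print(traits_by_category["Graphic design"]["expertise"])
```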

## Parameterizing questions with `Scenarios`

Each agent will answer the survey for the set of job posts that is relevant to the agent's expertise. We do this by creating a "scenario" for each job post, which supplies the values for the `{{ job_category }}` and `{{ job_post }}` placeholders in the questions. We can use the `.example()` method again to see how a `Scenario` is constructed:

In [13]:
Scenario.example()

{'persona': 'A reseacher studying whether LLMs can be used to generate surveys.'}

Here we show how to create a `Scenario` for each job category/job post pair in our dataset. (Note, however, that we will do this individually for each agent when we put it all together below, as we want each agent to only evaluate job posts in their category):

In [14]:
scenarios = [Scenario({"job_category": row["job_category"], "job_post": row["job_post"]}) for _, row in df.iterrows()]
scenarios[0].print()
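
Because each agent should see only the posts in its own category, the scenario data needs to be grouped by category at some point. A minimal sketch of that grouping, using plain dicts and made-up rows in place of the dataframe (`group_by_category` is a hypothetical helper):

```python
from collections import defaultdict

# Made-up rows standing in for df.to_dict("records")
rows = [
    {"job_category": "Graphic Design", "job_post": "Post A"},
    {"job_category": "Graphic Design", "job_post": "Post B"},
    {"job_category": "Web Development", "job_post": "Post C"},
]

def group_by_category(rows):
    """Map each job category to the scenario dicts for its job posts."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["job_category"]].append(
            {"job_category": row["job_category"], "job_post": row["job_post"]}
        )
    return dict(grouped)

scenario_data = group_by_category(rows)
print({category: len(items) for category, items in scenario_data.items()})
```

Each inner dict has the same shape as the argument passed to `Scenario(...)` in the cell above.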

## Running the survey

We administer our survey by chaining the components with the `.by()` method and then calling the `.run()` method. In the simplest case where we want a single agent or list of agents to answer all questions with the same scenarios, this takes the following form to generate a single `Results` object for the survey:

`results = survey.by(scenarios).by(agents).by(models).run()`

We modify this form as needed to have individual agents answer the questions for category-specific job posts. Here we create a dict of `Results` objects, one for each agent, keyed by job category:

In [15]:
def data_labeling(df, survey):
    results = {}
    job_categories = df["job_category"].unique()
    for job_category in job_categories:
        # print(job_category)
        
        # We create an agent with expertise in the job category
        base_persona = "You are an experienced freelancer on online labor marketplaces."
        expertise = f"You regularly perform jobs in the following category: { job_category }."
        agent = Agent(name = job_category, traits = {"base_persona":base_persona, "expertise":expertise})
        # agent.print()
    
        # We take the job posts in the job category as scenarios for the survey
        df_category = df[df["job_category"] == job_category]
        scenarios = [Scenario({"job_category": row["job_category"], "job_post": row["job_post"]}) for _, row in df_category.iterrows()]
        # print(scenarios)
        
        # We administer the survey to the agent with our selected LLM
        job_category_results = survey.by(scenarios).by(agent).by(model).run()
        # job_category_results.print()
        
        results[job_category] = job_category_results
        
    return results

In [16]:
results = data_labeling(df, jobs_survey)
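
To see why the per-category loop matters, it helps to count the agent/post pairs involved. The sketch below assumes that chaining `.by()` pairs every scenario with every agent (an assumption about `edsl`'s behavior that is worth confirming in its documentation) and uses made-up category and post labels.

```python
from itertools import product

categories = ["Graphic Design", "Web Development", "Content Writing"]
# Three made-up posts per category, as (category, post) pairs
posts = [(c, f"{c} post {i}") for c in categories for i in range(1, 4)]
agents = categories  # one expert agent per category

# Every agent paired with every post
all_pairs = list(product(agents, posts))

# Only pairs where the agent's category matches the post's category
matched_pairs = [(agent, post) for agent, post in all_pairs if agent == post[0]]

print(len(all_pairs), len(matched_pairs))  # 27 9
```

With three categories of three posts each, a full cross product would cover 27 pairs, while the per-category loop covers only the 9 matched ones.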

## Accessing `Results`

In the previous step we created independent `Results` objects for our individual agents' survey results and stored them as a dict with keys = job categories for easy reference. (We also could have just created them separately, or as a list or some other convenient type.) In the next steps we show how to access results with built-in print and analytical methods.

In [17]:
len(results)

3

In [18]:
results.keys()

dict_keys(['Graphic Design', 'Web Development', 'Content Writing'])

In [19]:
type(results["Graphic Design"])

edsl.results.Results.Results

Here we inspect the full results for "Graphic Design" job posts:

In [20]:
results["Graphic Design"]

Result 0
                                                      Result                                                       
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Attribute              ┃ Value                                                                                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ agent                  │                                    Agent Attributes                                    │
│                        │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│                        │ ┃ Attribute               ┃ Value                                                    ┃ │
│                        │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│                        │ │ _name                   │ 'Graphic

We can identify the column names to select the fields that we want to inspect:

In [21]:
results["Graphic Design"].to_pandas().columns

Index(['agent.agent_name', 'agent.base_persona', 'agent.expertise',
       'answer.generic_ls', 'answer.generic_ls_comment', 'answer.specific_ls',
       'answer.specific_ls_comment', 'answer.specific_mc',
       'answer.specific_mc_comment', 'iteration.iteration',
       'model.frequency_penalty', 'model.logprobs', 'model.max_tokens',
       'model.model', 'model.presence_penalty', 'model.temperature',
       'model.top_logprobs', 'model.top_p', 'prompt.generic_ls_system_prompt',
       'prompt.generic_ls_user_prompt', 'prompt.specific_ls_system_prompt',
       'prompt.specific_ls_user_prompt', 'prompt.specific_mc_system_prompt',
       'prompt.specific_mc_user_prompt',
       'raw_model_response.generic_ls_raw_model_response',
       'raw_model_response.specific_ls_raw_model_response',
       'raw_model_response.specific_mc_raw_model_response',
       'scenario.job_category', 'scenario.job_post'],
      dtype='object')
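
The column names follow a `prefix.field` pattern (`agent`, `answer`, `model`, `prompt`, `scenario`, and so on). As a small illustration of how that pattern can be used to find fields, the sketch below groups a subset of the names above by prefix; `group_by_prefix` is a hypothetical helper.

```python
from collections import defaultdict

# A subset of the column names shown above
columns = [
    "agent.agent_name", "agent.base_persona", "agent.expertise",
    "answer.generic_ls", "answer.generic_ls_comment", "answer.specific_ls",
    "model.model", "model.temperature",
    "scenario.job_category", "scenario.job_post",
]

def group_by_prefix(columns):
    """Group 'prefix.field' column names by their prefix."""
    groups = defaultdict(list)
    for column in columns:
        prefix, _, field = column.partition(".")
        groups[prefix].append(field)
    return dict(groups)

groups = group_by_prefix(columns)
print(groups["answer"])  # ['generic_ls', 'generic_ls_comment', 'specific_ls']
```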

We can select individual fields in a variety of ways:

In [22]:
(results["Graphic Design"]
 .select("job_post", "specific_ls", "generic_ls", "specific_mc")
 .print()
)

We can apply some labels to our table for readability. Note that each question field also automatically includes a `<question>_comment` field for any commentary by the LLM on the question:

In [23]:
(results["Graphic Design"]
 .select("job_post", "specific_mc", "specific_mc_comment")
 .print(pretty_labels = {
     "scenario.job_post":"Job post description",
     "answer.specific_mc":"How generic or specific? (Multiple choice)",
     "answer.specific_mc_comment":"Comment"})
)

We can also access results as a SQL table (called `self`) with the `.sql()` method, choosing between a "wide" horizontal view of all fields and a "long" vertical view, and optionally removing the column name prefixes 'agent', 'model', 'prompt', etc.:

In [24]:
results["Graphic Design"].sql("select * from self", shape="long")

     id   data_type           key                             value
0    0    agent               base_persona                    You are an experienced freelancer on online la...
1    0    agent               expertise                       You regularly perform jobs in the following ca...
2    0    agent               agent_name                      Graphic Design
3    0    scenario            job_category                    Graphic Design
4    0    scenario            job_post                        {'job_title': 'Freelance Graphic Designer', 'd...
...  ...  ...                 ...                             ...
91   2    raw_model_response  specific_mc_raw_model_response  {'id': 'chatcmpl-9CavpG9QlCMmD8ZHVlmkr6HVqu6lZ...
92   2    iteration           iteration                       0
93   2    question_text       specific_ls_question_text       \n Consider the following job category ...
94   2    question_text       generic_ls_question_text        \n Consider the following job category ...

<br>

In [25]:
results["Graphic Design"].word_cloud_plot("specific_mc_comment")

In [26]:
results["Graphic Design"].bar_chart("specific_ls")
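
A bar chart of a linear scale field presumably summarizes how often each score was given. As a rough, library-free illustration of that kind of summary (not how `edsl` draws its chart), the counts can be tallied by hand from made-up answers:

```python
from collections import Counter

# Made-up linear scale answers (0-10) standing in for a results column
specific_ls_answers = [7, 8, 7, 4, 8, 7]

counts = Counter(specific_ls_answers)
for score in sorted(counts):
    # One '#' per answer with this score
    print(f"{score:>2} | {'#' * counts[score]}")
```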