# Data labeling with LLMs, validating with humans
This notebook provides example [EDSL](https://github.com/expectedparrot/edsl) code for conducting a data labeling task with large language models and validating responses with humans.
The example below consists of the following steps, which can be conducted entirely in EDSL code or interactively at your [Coop account](https://www.expectedparrot.com):

* Construct questions about a dataset, using a placeholder in each question for the individual piece of data to be labeled (each answer is a "label" for a piece of data)
* Combine the questions in a survey to administer them together
* *Optionally* create AI agent personas to answer the questions (e.g., if there is relevant expertise or background for the task)
* Select language models to generate the answers (for the agents, or without referencing any AI personas)
* Run the survey with the data, agents and models to generate a formatted dataset of results
* Select questions and data that you want to validate with humans to create a subset of your survey (or leave it unchanged to run the entire survey with humans)
* Send a web-based version of the survey to human respondents
* Compare LLM and human answers, and iterate on the data labeling survey as needed!

Before running the code below, please see the instructions on [getting started](https://www.expectedparrot.com/en/latest/getting-started) with Expected Parrot tools for AI research.

## Construct questions about a dataset
We start by creating questions about a dataset, where each answer provides a "label" for one piece of data.
EDSL comes with many [common question types](https://docs.expectedparrot.com/en/latest/questions.html) that we can choose from based on the form of the response that we want to get back from a model (multiple choice, linear scale, matrix, etc.).

We use a "scenario" placeholder in each question text for data that we want to add to it.
This method allows us to efficiently readminister a question for each piece of data.
[Scenarios](https://docs.expectedparrot.com/en/latest/scenarios.html) can be created from many types of data, including PNG, PDF, CSV, docs, lists, tables, videos, and other types.

We combine the questions in a [survey](https://docs.expectedparrot.com/en/latest/surveys.html) in order to administer them together, asynchronously by default, or else according to any [logic or rules](https://docs.expectedparrot.com/en/latest/surveys.html#survey-rules-logic) that we want to add (e.g., skip/stop rules).

In [5]:
from edsl import ScenarioList, QuestionList, QuestionNumerical, Survey

q1 = QuestionList(
    question_name = "characters",
    question_text = "Name all of the characters in this show: {{ scenario.show }}"
)

q2 = QuestionNumerical(
    question_name = "years",
    question_text = "Identify the year this show first aired: {{ scenario.show }}"
)

scenarios = ScenarioList.from_source("list", "show", ["The Simpsons", "South Park", "I Love Lucy"])

questions = q1.loop(scenarios) + q2.loop(scenarios)

survey = Survey(questions)
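To make the placeholder mechanics concrete, here is a minimal plain-Python sketch of what looping a question over scenarios amounts to conceptually: one rendered copy of the question per scenario, with an index appended to the question name. The helpers `render_placeholder` and `loop_question` are hypothetical names used only for illustration; this is not EDSL's implementation, and it assumes only the simple `{{ scenario.<field> }}` form used above.

```python
import re


def render_placeholder(template: str, scenario: dict) -> str:
    """Replace each `{{ scenario.<field> }}` placeholder with the scenario's value."""
    return re.sub(
        r"\{\{\s*scenario\.(\w+)\s*\}\}",
        lambda match: str(scenario[match.group(1)]),
        template,
    )


def loop_question(name: str, template: str, scenarios: list) -> list:
    """Return one rendered question per scenario, suffixing the name with its index."""
    return [
        {
            "question_name": f"{name}_{index}",
            "question_text": render_placeholder(template, scenario),
        }
        for index, scenario in enumerate(scenarios)
    ]


# Hypothetical stand-in for the ScenarioList: one dict per show.
shows = [{"show": s} for s in ["The Simpsons", "South Park", "I Love Lucy"]]

looped = loop_question(
    "years",
    "Identify the year this show first aired: {{ scenario.show }}",
    shows,
)

for question in looped:
    print(question["question_name"], "->", question["question_text"])
```

The rendered names and texts in this sketch follow the same pattern as the `years_0`, `years_1`, `years_2` questions in the survey above.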

## Generate data "labels" using LLMs
EDSL allows us to [specify the models](https://docs.expectedparrot.com/en/latest/language_models.html) that we want to use to answer the questions, and optionally [design AI agent personas](https://docs.expectedparrot.com/en/latest/agents.html) for the models to reference in answering the questions.
This can be useful if you want to reference specific expertise that is relevant to the labeling task.

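We administer the questions by adding the agents and models to the survey with the `by()` method and then calling the `run()` method.
Because the scenarios were already inserted into the question texts with `loop()` above, they do not need to be added to the survey again at this step.
This generates a formatted dataset of `Results` that we can analyze with [built-in methods for working with results](https://docs.expectedparrot.com/en/latest/results.html).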

In [6]:
survey

question_name,question_text,question_type
characters_0,Name all of the characters in this show: The Simpsons,list
characters_1,Name all of the characters in this show: South Park,list
characters_2,Name all of the characters in this show: I Love Lucy,list
years_0,Identify the year this show first aired: The Simpsons,numerical
years_1,Identify the year this show first aired: South Park,numerical
years_2,Identify the year this show first aired: I Love Lucy,numerical


In [3]:
len(survey)

6

In [4]:
from edsl import Agent, AgentList, Model, ModelList

agents = AgentList([
    Agent(traits = {"persona":"You watch a lot of TV."})
])

models = ModelList([
    Model("gemini-2.5-flash", service_name = "google"),
    Model("gpt-4o", service_name = "openai")
])

# The scenarios were already inserted into the question texts with `loop()` above,
# so the survey is run below with the agents and models only. Adding `.by(scenarios)`
# here raises a JobsCompatibilityError, because no question still references `show`.


After the survey is run, the results are accessible at your Coop account and in your workspace.
We can then inspect a list of all the components of the results:

In [7]:
results = survey.by(agents).by(models).run()

Service,Model,Input Tokens,Input Cost,Output Tokens,Output Cost,Total Cost,Total Credits
google,gemini-2.5-flash,683,$0.0003,892,$0.0023,$0.0026,0.26
openai,gpt-4o,704,$0.0018,494,$0.0050,$0.0068,0.68
Totals,Totals,1387,$0.0021,1386,$0.0073,$0.0094,0.94
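As a quick sanity check on the cost summary, the sketch below recomputes the totals from the per-model rows shown above. The dictionary layout is a hypothetical convenience, and the dollars-per-credit ratio is only inferred from this table, not taken from any pricing documentation.

```python
# Per-model figures copied from the cost table above.
rows = [
    {"model": "gemini-2.5-flash", "input_cost": 0.0003, "output_cost": 0.0023, "credits": 0.26},
    {"model": "gpt-4o", "input_cost": 0.0018, "output_cost": 0.0050, "credits": 0.68},
]

# Sum input and output costs across models; round to the table's precision.
total_cost = round(sum(r["input_cost"] + r["output_cost"] for r in rows), 4)
total_credits = round(sum(r["credits"] for r in rows), 2)

# Ratio implied by this table only (an inference, not a documented rate).
usd_per_credit = round(total_cost / total_credits, 4)

print(total_cost, total_credits, usd_per_credit)
```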


In [8]:
results.columns

0
agent.agent_index
agent.agent_instruction
agent.agent_name
agent.persona
answer.characters_0
answer.characters_1
answer.characters_2
answer.years_0
answer.years_1
answer.years_2


Here we select components to display in a table:

In [9]:
results.select("model", "persona", "characters_0", "years_0", "characters_1", "years_1", "characters_2", "years_2")

model.model,agent.persona,answer.characters_0,answer.years_0,answer.characters_1,answer.years_1,answer.characters_2,answer.years_2
gemini-2.5-flash,You watch a lot of TV.,"['Homer Simpson', 'Marge Simpson', 'Bart Simpson', 'Lisa Simpson', 'Maggie Simpson', 'Abraham', 'Grampa', 'Simpson', ""Santa's Little Helper"", 'Snowball II', 'Ned Flanders', 'Rod Flanders', 'Todd Flanders', 'Moe Szyslak', 'Barney Gumble', 'Apu Nahasapeemapetilon', 'Manjula Nahasapeemapetilon', 'Principal Seymour Skinner', 'Edna Krabappel', 'Milhouse Van Houten', 'Kirk Van Houten', 'Luann Van Houten', 'Nelson Muntz', 'Ralph Wiggum', 'Chief Wiggum', 'Sarah Wiggum', 'Mr. Burns', 'Waylon Smithers', 'Krusty the Clown', 'Sideshow Bob', 'Sideshow Mel', 'Comic Book Guy (Jeff Albertson)', 'Groundskeeper Willie', 'Otto Mann', 'Patty Bouvier', 'Selma Bouvier', 'Lenny Leonard', 'Carl Carlson', 'Professor Frink', 'Dr. Hibbert', 'Mayor Quimby', 'Superintendent Chalmers', 'Cletus Spuckler', 'Brandine Spuckler', 'Kang', 'Kodos', 'Reverend Lovejoy', 'Helen Lovejoy', 'Fat Tony', 'Luigi Risotto', 'Bumblebee Man', 'Disco Stu', 'Squeaky-Voiced Teen', 'Agnes Skinner', 'Duffman', 'Hans Moleman', 'Jasper Beardly', 'Kearney Zzyzwicz', 'Dolph Starbeam', 'Jimbo Jones', 'Ms. Hoover', 'Wendell Borton', 'Sherri and Terri', 'Martin Prince', 'Rich Texan', 'Captain Horatio McCallister', 'Cookie Kwan', 'Akira', 'Dr. Nick Riviera', 'Lionel Hutz', 'Troy McClure', 'Bleeding Gums Murphy', 'Frank Grimes', 'Artie Ziff', 'Gil Gunderson', 'Mr. Teeny']",1989,"['Stan Marsh', 'Kyle Broflovski', 'Eric Cartman', 'Kenny McCormick', 'Butters Stotch', 'Randy Marsh', 'Sharon Marsh', 'Shelly Marsh', 'Gerald Broflovski', 'Sheila Broflovski', 'Ike Broflovski', 'Liane Cartman', 'Mr. Garrison', 'Mr. Mackey', 'Chef', 'Wendy Testaburger', 'Jimmy Valmer', 'Timmy Burch', 'Craig Tucker', 'Tweek Tweak', 'Token Black', 'Clyde Donovan', 'Bebe Stevens', 'PC Principal', 'Mr. Slave', 'Towelie', 'Jesus', 'Satan', 'ManBearPig', 'Officer Barbrady']",1997,"['Lucy Ricardo', 'Ricky Ricardo', 'Ethel Mertz', 'Fred Mertz', 'Little Ricky', 'Mrs. Trumbull']",1951
gpt-4o,You watch a lot of TV.,"['Homer Simpson', 'Marge Simpson', 'Bart Simpson', 'Lisa Simpson', 'Maggie Simpson', 'Abe Simpson', 'Ned Flanders', 'Milhouse Van Houten', 'Mr. Burns', 'Waylon Smithers', 'Krusty the Clown', 'Principal Skinner', 'Moe Szyslak', 'Barney Gumble', 'Chief Wiggum', 'Ralph Wiggum', 'Apu Nahasapeemapetilon', 'Sideshow Bob', 'Edna Krabappel', 'Patty Bouvier', 'Selma Bouvier', 'Comic Book Guy', 'Groundskeeper Willie', 'Nelson Muntz']",1989,"['Eric Cartman', 'Stan Marsh', 'Kyle Broflovski', 'Kenny McCormick', 'Butters Stotch', 'Randy Marsh', 'Mr. Garrison', 'Chef', 'Wendy Testaburger', 'Mr. Mackey', 'Towelie', 'Terrance', 'Phillip', 'Jimmy Valmer', 'Timmy Burch', 'Token Black', 'Craig Tucker', 'Clyde Donovan', 'Bebe Stevens', 'Ike Broflovski', 'Shelley Marsh', 'Gerald Broflovski', 'Sheila Broflovski', 'Sharon Marsh', 'Liane Cartman', 'Scott Malkinson', 'PC Principal', 'Mr. Hankey', 'Officer Barbrady', 'Mayor McDaniels']",1997,"['Lucy Ricardo', 'Ricky Ricardo', 'Ethel Mertz', 'Fred Mertz']",1951
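Before involving humans, it can help to see where the two models already disagree. The following is a minimal plain-Python sketch, using the "I Love Lucy" answers and the year answers copied from the table above, that measures list overlap with a Jaccard score. The `jaccard` helper is a hypothetical name, and exact string matching is an assumption that would miss variants such as "Abe Simpson" versus "Grampa".

```python
def jaccard(a, b):
    """Share of distinct items that appear in both lists (1.0 means identical sets)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Answers to `characters_2` copied from the results table above.
gemini_lucy = ["Lucy Ricardo", "Ricky Ricardo", "Ethel Mertz", "Fred Mertz",
               "Little Ricky", "Mrs. Trumbull"]
gpt4o_lucy = ["Lucy Ricardo", "Ricky Ricardo", "Ethel Mertz", "Fred Mertz"]

# Answers to `years_0`, `years_1`, `years_2` copied from the results table above.
gemini_years = [1989, 1997, 1951]
gpt4o_years = [1989, 1997, 1951]

overlap = jaccard(gemini_lucy, gpt4o_lucy)
only_gemini = sorted(set(gemini_lucy) - set(gpt4o_lucy))
years_agree = gemini_years == gpt4o_years

print(round(overlap, 3))  # 0.667
print(only_gemini)        # ['Little Ricky', 'Mrs. Trumbull']
print(years_agree)        # True
```

Questions where the models agree exactly (here, the years) are weaker candidates for human validation than questions where the overlap is partial (here, the character lists).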


## Run the survey with human respondents
We can validate some or all of the responses with human respondents by calling the `humanize()` method on the version of the survey that we want to validate with humans.
This method generates a shareable URL for a web-based version of the survey that you can distribute, together with a URL for tracking the responses at your Coop account.

Here we create a new version of the survey that adds some screening and background questions for the human respondents:

In [10]:
from edsl import QuestionLinearScale

q3 = QuestionLinearScale(
    question_name = "tv_viewing",
    question_text = "On a scale from 1 to 5, how much tv would you say that you've watched in your life?",
    question_options = [1,2,3,4,5],
    option_labels = {
        1:"None at all",
        5:"A ton"
    }
)

q4 = QuestionNumerical(
    question_name = "age",
    question_text = "How old are you (in years)?"
)

new_questions = [q3, q4]

human_survey = Survey(questions + new_questions)

In [11]:
human_survey.humanize()

key,value
name,New survey
uuid,34b544e4-6b7f-4ce9-8497-108952e680fb
admin_url,https://www.expectedparrot.com/home/human-surveys/34b544e4-6b7f-4ce9-8497-108952e680fb
respondent_url,https://www.expectedparrot.com/respond/human-surveys/34b544e4-6b7f-4ce9-8497-108952e680fb
n_responses,0
survey_uuid,ad133a49-6325-40b5-a17e-5a4817b1fe6a
scenario_list_uuid,


Responses automatically appear at your Coop account, and you can import them into your workspace using `Coop` methods:

In [None]:
from edsl import Coop

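# Pass the uuid of your own human survey project here
# (assumed to be the `uuid` value in the `humanize()` output for your survey).
human_results = Coop().get_project_human_responses("bbb84776-3364-4bc9-b028-0119cd84d480")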
human_results

In [None]:
human_results.select("age", "tv_viewing", "characters_0", "years_0", "characters_1", "years_1", "characters_2", "years_2")
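Once human responses are available, one simple way to compare them with the LLM labels is a per-question agreement rate. The sketch below is plain Python under stated assumptions: the LLM years are copied from the results table above, while `human_responses` is entirely hypothetical data standing in for answers exported from `human_results`, and `agreement_rates` is a hypothetical helper name.

```python
# LLM labels copied from the results table above (both models gave the same years).
llm_years = {"years_0": 1989, "years_1": 1997, "years_2": 1951}

# HYPOTHETICAL human answers, one dict per respondent, for illustration only.
human_responses = [
    {"years_0": 1989, "years_1": 1997, "years_2": 1951},
    {"years_0": 1989, "years_1": 1999, "years_2": 1951},
    {"years_0": 1990, "years_1": 1997, "years_2": 1951},
]


def agreement_rates(llm_labels, humans):
    """For each question, the fraction of human respondents whose answer equals the LLM label."""
    return {
        question: sum(h.get(question) == label for h in humans) / len(humans)
        for question, label in llm_labels.items()
    }


rates = agreement_rates(llm_years, human_responses)

# Questions where at least one respondent disagrees are candidates for review.
flagged = [question for question, rate in rates.items() if rate < 1.0]

print(rates)
print(flagged)
```

Exact equality suits numerical labels like these; list-valued answers such as the character names would need a softer measure (for example, set overlap) before the same kind of comparison is meaningful.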