# Evaluating job posts
This notebook provides sample code for conducting a text analysis using [EDSL](https://docs.expectedparrot.com), an open-source library for simulating surveys, experiments and other research with AI agents and large language models. 

Using a dataset of job posts as an example, we demonstrate how to: 

1. Import data into EDSL 
2. Create questions about the data 
3. Design an AI agent to answer the questions
4. Select a language model to generate responses
5. Analyze results as a formatted dataset


## Technical setup
Before running the code below, please make sure you have completed the following setup steps:

* [Install EDSL](https://docs.expectedparrot.com/en/latest/installation.html).
* Create a [Coop account](https://www.expectedparrot.com/login) and activate [remote inference](https://docs.expectedparrot.com/en/latest/remote_inference.html) OR store your own [API Keys](https://docs.expectedparrot.com/en/latest/api_keys.html) for language models that you want to use.

Our [Starter Tutorial](https://docs.expectedparrot.com/en/latest/starter_tutorial.html) also provides examples of EDSL's basic components.

## Selecting data for review
First, we identify some data for review. Data can be created using EDSL tools or [imported from other sources](https://docs.expectedparrot.com/en/latest/scenarios.html). For the purposes of this demo, we use a list of job posts:

In [1]:
job_posts = [
    "Oversee daily operations, manage staff, and ensure customer satisfaction in a fast-paced retail environment.",
    "Craft engaging and informative blog posts on health and wellness topics to boost website traffic and engage readers.",
    "Analyze sales data using statistical tools to identify trends and provide actionable insights to the marketing team.",
    "Prepare gourmet dishes that comply with restaurant standards and delight customers with unique flavor combinations.",
    "Design creative visual content for marketing materials, including brochures, banners, and digital ads, using Adobe Creative Suite.",
    "Develop, test, and maintain robust software solutions to improve business processes using Python and Java.",
    "Craft coffee drinks and manage the coffee station while providing excellent customer service in a busy café.",
    "Manage recruitment processes, conduct interviews, and oversee employee benefit programs to ensure a motivated workforce.",
    "Assist veterinarians by preparing animals for surgery, administering injections, and providing post-operative care.",
    "Design aesthetic and practical outdoor spaces for clients, from residential gardens to public parks.",
    "Install and repair residential plumbing systems, including water heaters, pipes, and fixtures to ensure proper functionality.",
    "Develop comprehensive marketing strategies that align with company goals, including digital campaigns and branding efforts.",
    "Install, maintain, and repair electrical wiring, equipment, and fixtures to ensure safe and effective operation.",
    "Provide personalized fitness programs and conduct group fitness classes to help clients achieve their health goals.",
    "Diagnose and repair automotive issues, perform routine maintenance, and ensure vehicles meet safety standards.",
    "Lead creative campaigns, from concept through execution, coordinating with graphic designers and content creators.",
    "Educate students in mathematics using innovative teaching strategies to enhance understanding and interest in the subject.",
    "Drive sales through engaging customer interactions, understanding client needs, and providing product solutions.",
    "Fold dough into pretzel shapes ensuring each is uniformly twisted and perfectly salted before baking.",
    "Address customer inquiries and issues via phone and email, ensuring high levels of satisfaction and timely resolution.",
]

## Constructing questions about the data
Next we create some questions about the data. EDSL provides a variety of question types that we can choose from based on the form of the response that we want to get back from the model (multiple choice, free text, checkbox, linear scale, etc.). [Learn more about question types](https://docs.expectedparrot.com/en/latest/questions.html). 

Note that each question text includes a `{{ job_post }}` placeholder, so that the questions can be parameterized with the individual job posts when we add scenarios below:

In [4]:
from edsl import (
    QuestionList,
    QuestionLinearScale,
    QuestionMultipleChoice,
    QuestionYesNo,
    QuestionFreeText,
)

q1 = QuestionList(
    question_name="category_list",
    question_text="Draft a list of increasingly specific categories for the following job post: {{ job_post }}",
    max_list_items=3,  # optional
)

q2 = QuestionLinearScale(
    question_name="specific_scale",
    question_text="How specific is this job post: {{ job_post }}",
    question_options=[0, 1, 2, 3, 4, 5],
    option_labels={0: "Unclear", 1: "Not at all specific", 5: "Highly specific"},
)

q3 = QuestionMultipleChoice(
    question_name="skill_choice",
    question_text="What is the skill level required for this job: {{ job_post }}",
    question_options=["Entry level", "Intermediate", "Advanced", "Expert"],
)

q4 = QuestionYesNo(
    question_name="technical_yn",
    question_text="Is this a technical job? Job post: {{ job_post }}",
)

q5 = QuestionFreeText(
    question_name="rewrite_text",
    question_text="""Draft an improved version of the following job post: {{ job_post }}""",
)
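EDSL fills each `{{ job_post }}` placeholder itself when scenarios are added. To make concrete what a scenario-specific question text looks like, the sketch below uses a small regex-based helper, `render`, which is hypothetical and not part of EDSL (it only mimics the substitution, not EDSL's templating engine):

```python
import re


def render(template, scenario):
    """Replace each {{ key }} in template with scenario[key] (hypothetical helper, not EDSL's)."""

    def substitute(match):
        key = match.group(1)
        # Raises KeyError if the scenario has no value for this placeholder.
        return str(scenario[key])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


template = "Is this a technical job? Job post: {{ job_post }}"
scenario = {
    "job_post": "Develop, test, and maintain robust software solutions "
    "to improve business processes using Python and Java."
}
print(render(template, scenario))
```

The placeholder name in the question text has to match the key each scenario supplies; a mismatch surfaces here as a `KeyError`.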

## Building a survey
We combine the questions into a survey in order to administer them together:

In [5]:
from edsl import Survey

questions = [q1, q2, q3, q4, q5]

survey = Survey(questions)

If we want the agent/model to have information about prior questions in the survey, we can add targeted or full memories ([learn more about adding survey rules/logic](https://docs.expectedparrot.com/en/latest/surveys.html)):

In [6]:
# Memory of a specific question is presented with another question:
# survey = survey.add_targeted_memory(q2, q1)

# Full memory of all prior questions is presented with each question (token-intensive):
# survey = survey.set_full_memory_mode()
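Why full memory is token-intensive can be seen by counting how many earlier question/answer pairs get repeated. The toy model below assumes each prior question and its answer is included once in a later question's prompt; the helper names are hypothetical and EDSL's actual prompt layout may differ:

```python
# Toy model of the two memory options (hypothetical helpers, not EDSL's API).
question_names = ["category_list", "specific_scale", "skill_choice", "technical_yn", "rewrite_text"]


def targeted_memory(focal, prior):
    """Show only `prior` (and its answer) alongside `focal`, like add_targeted_memory(q2, q1)."""
    return {focal: [prior]}


def full_memory(names):
    """Show every earlier question (and its answer) alongside each question."""
    return {name: names[:i] for i, name in enumerate(names)}


def prior_pairs(memory):
    """Total number of earlier question/answer pairs repeated across all prompts."""
    return sum(len(prior) for prior in memory.values())


print(prior_pairs(targeted_memory("specific_scale", "category_list")))
print(prior_pairs(full_memory(question_names)))
```

With n questions, full memory repeats n(n-1)/2 earlier pairs per interview (10 for this five-question survey), and each repeated pair includes the model's earlier answer, so long free-text answers such as `rewrite_text` make the cost grow quickly.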

## Adding data to the questions
We add each job post to the questions as an independent "scenario" for review. This creates a version of each question for every job post, and all of them are delivered to the model in a single run. [EDSL provides many methods for generating scenarios](https://docs.expectedparrot.com/en/latest/scenarios.html) from different data sources (PDFs, CSVs, docs, images, tables, dicts, etc.). Here we create scenarios from the list above:

In [7]:
from edsl import ScenarioList, Scenario

scenarios = ScenarioList.from_list("job_post", job_posts)
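As a rough mental model (an assumption about the structure, not EDSL's implementation), `ScenarioList.from_list("job_post", job_posts)` yields one scenario per list item, each holding the post under the key `job_post`, the same name the `{{ job_post }}` placeholders use. The plain-Python sketch below builds that structure for two of the posts and checks that every scenario supplies each placeholder in a question text:

```python
import re

# Two of the job posts from the list above, enough to show the structure.
job_posts = [
    "Fold dough into pretzel shapes ensuring each is uniformly twisted and perfectly salted before baking.",
    "Address customer inquiries and issues via phone and email, ensuring high levels of satisfaction and timely resolution.",
]

# Plain-Python analogue of ScenarioList.from_list("job_post", job_posts):
# one dict per post, keyed by the name used in the {{ job_post }} placeholders.
scenario_dicts = [{"job_post": post} for post in job_posts]

# Collect the placeholder names used in a question text...
template = "How specific is this job post: {{ job_post }}"
placeholders = set(re.findall(r"\{\{\s*(\w+)\s*\}\}", template))

# ...and list any scenario that lacks a value for one of them.
missing = [s for s in scenario_dicts if not placeholders.issubset(s)]
print(placeholders, len(scenario_dicts), len(missing))
```

If the key were named differently (say `post` instead of `job_post`), every scenario would land in `missing`, which is the kind of mismatch worth catching before running a survey.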

## Designing AI agents
A key feature of EDSL is the ability to create personas for AI agents, which the language models are prompted to adopt when answering the questions. This is done by passing a dictionary of traits to an `Agent` object:

In [8]:
from edsl import AgentList, Agent

agent = Agent(traits={"persona":"You are a labor economist."})

## Selecting language models
EDSL allows us to select the language models to use in generating results. To see all available models:

In [9]:
from edsl import ModelList, Model

# Model.available()

Here we select GPT-4o (if no model is specified, GPT-4 preview is used by default):

In [10]:
model = Model("gpt-4o")

## Running the survey
We run the survey by adding the scenarios, agent and model with the `by()` method and then calling the `run()` method:

In [11]:
results = survey.by(scenarios).by(agent).by(model).run()

This generates a dataset of `Results` that we can analyze with built-in methods for data tables, dataframes, SQL, etc. We can see a list of all the components that can be analyzed:

In [12]:
# results.columns
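The `Results` dataset contains one result per combination of scenario, agent and model passed to `by()`. The sketch below counts those combinations for this survey using stand-in strings (not EDSL objects), under the simplifying assumption that, without memory or skip logic, each combination answers every question once:

```python
from itertools import product

# Stand-ins for the objects passed to by(); only the counts matter here.
scenarios = [f"job_post_{i}" for i in range(20)]  # the 20 job posts
agents = ["labor economist"]                      # one agent
models = ["gpt-4o"]                               # one model
question_names = ["category_list", "specific_scale", "skill_choice", "technical_yn", "rewrite_text"]

# One result per (scenario, agent, model) combination.
combinations = list(product(scenarios, agents, models))
# Each combination answers every question in the survey once.
answers = len(combinations) * len(question_names)
print(len(combinations), answers)  # 20 100
```

Because the combinations multiply, adding a second agent or a second model would double both numbers, which is worth keeping in mind when estimating cost.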

Results can also be filtered, sorted, selected, limited, shuffled and sampled. For example, here we filter, sort and select some components of the results and print them in a table:

In [13]:
(
    results
    .filter("specific_scale in [3,4,5]")
    .sort_by("skill_choice")
    .select(
        "model",
        "persona",
        "job_post",
        "category_list",
        "specific_scale",
        "skill_choice",
        "technical_yn",
    )
    .print(pretty_labels={}, format="rich", max_rows=5)
)
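The chain above can be mirrored in plain Python to show what each step does to the rows. The rows below are toy values made up for illustration, not output from this survey. The sketch also assumes `sort_by("skill_choice")` compares the answer strings as a plain sort would, which puts "Advanced" before "Intermediate"; ranking by position in `question_options` is shown as an alternative:

```python
# Toy rows (made up for illustration; not output from the survey above).
rows = [
    {"job_post": "post A", "specific_scale": 4, "skill_choice": "Intermediate", "technical_yn": "No"},
    {"job_post": "post B", "specific_scale": 2, "skill_choice": "Entry level", "technical_yn": "No"},
    {"job_post": "post C", "specific_scale": 5, "skill_choice": "Advanced", "technical_yn": "Yes"},
]

# filter("specific_scale in [3,4,5]")
filtered = [r for r in rows if r["specific_scale"] in [3, 4, 5]]

# sort_by("skill_choice"), assumed here to be an alphabetical sort of the answer text
ordered = sorted(filtered, key=lambda r: r["skill_choice"])

# Alternative: order by skill level, using the question_options order from q3
skill_levels = ["Entry level", "Intermediate", "Advanced", "Expert"]
by_rank = sorted(filtered, key=lambda r: skill_levels.index(r["skill_choice"]))

# select(...) keeps only the named columns; max_rows=5 caps the printed rows
selected = [{k: r[k] for k in ("job_post", "specific_scale", "skill_choice")} for r in ordered]
for row in selected[:5]:
    print(row)
```

Here `ordered` lists post C before post A, while `by_rank` lists post A (Intermediate) before post C (Advanced), so an alphabetical sort of ordinal answers may not give the order a reader expects.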

In [14]:
results.select("rewrite_text").print(format="rich")

## Posting content to the Coop
We can post any object to the Coop, including this notebook. Objects can be updated or modified from your Coop account, and shared with others or stored privately (default visibility is *unlisted*):

In [15]:
survey.push(description="Example survey: Job posts analysis", visibility="public")

{'description': 'Example survey: Job posts analysis',
 'object_type': 'survey',
 'url': 'https://www.expectedparrot.com/content/d53dce63-0a02-4383-88f6-7ae8c5a8534b',
 'uuid': 'd53dce63-0a02-4383-88f6-7ae8c5a8534b',
 'version': '0.1.33.dev1',
 'visibility': 'public'}

In [16]:
results.push(description="Example results: Job posts analysis", visibility="public")

{'description': 'Example results: Job posts analysis',
 'object_type': 'results',
 'url': 'https://www.expectedparrot.com/content/4730e6db-bd98-4ccc-a663-b4cf84af0313',
 'uuid': '4730e6db-bd98-4ccc-a663-b4cf84af0313',
 'version': '0.1.33.dev1',
 'visibility': 'public'}

In [17]:
from edsl import Notebook

n = Notebook(path="evaluating_job_posts.ipynb")

In [18]:
n.push(description="Example text analysis: evaluating job posts", visibility="public")

{'description': 'Example text analysis: evaluating job posts',
 'object_type': 'notebook',
 'url': 'https://www.expectedparrot.com/content/f0158719-6a94-461f-92e3-8ca045cc1a9d',
 'uuid': 'f0158719-6a94-461f-92e3-8ca045cc1a9d',
 'version': '0.1.33.dev1',
 'visibility': 'public'}

To update an object at the Coop:

In [19]:
# Reload the saved notebook file into a new Notebook object
n = Notebook(path="evaluating_job_posts.ipynb")

# Replace the version stored at the Coop under this uuid
n.patch(uuid="f0158719-6a94-461f-92e3-8ca045cc1a9d", value=n)

{'status': 'success'}