# Using EDSL to sense check data
This notebook provides example code for sense checking survey data using [EDSL](https://docs.expectedparrot.com), an open-source library for simulating surveys, experiments and market research with AI agents and large language models. 

## Contents
Using a set of responses to a survey about online marketplaces as an example, we demonstrate EDSL methods for: 

1. Evaluating survey questions (e.g., for clarity and improvements)
2. Analyzing each respondent's set of answers (e.g., to summarize or identify sentiment, themes, etc.)
3. Reviewing each answer individually (e.g., to evaluate its relevance or usefulness)

## Coop
We also show how to post EDSL questions, surveys, results and notebooks (like this one) to the [Coop](https://www.expectedparrot.com/explore), a new platform for creating and sharing LLM-based research.

## How EDSL works
EDSL is a flexible library that can be used to perform a broad variety of research tasks. A typical workflow consists of the following steps:

* Construct questions in EDSL 
* Add data to the questions (e.g., for data labeling tasks)
* Use an AI agent to answer the questions
* Select a language model to generate the answers
* Analyze results in a formatted dataset

## Technical setup
Before running the code below, please ensure that you have completed the setup:

* [Install](https://docs.expectedparrot.com/en/latest/installation.html) the EDSL library.
* Create a [Coop account](https://www.expectedparrot.com/login) and activate [remote inference](https://docs.expectedparrot.com/en/latest/remote_inference.html) OR store your own [API Keys](https://docs.expectedparrot.com/en/latest/api_keys.html) for language models that you want to use.

Our [Starter Tutorial](https://docs.expectedparrot.com/en/latest/starter_tutorial.html) provides examples of EDSL basic components. 

## Example data
Our example data is a CSV file in which each column header is a survey question (plus a `Respondent ID` column) and each row holds one respondent's answers.
We have stored it at the Coop and can re-import it:

In [1]:
from edsl.scenarios.FileStore import CSVFileStore

In [2]:
csv_file = CSVFileStore.pull('2e3c4292-8fdb-4d17-9eea-858dadf1e42d', expected_parrot_url='https://www.expectedparrot.com')

In [3]:
# Code for uploading a CSV to the Coop:

# refresh = False
# if refresh:
#     from edsl.scenarios.FileStore import CSVFileStore
#     fs = CSVFileStore("marketplace_survey_results.csv")
#     info = fs.push()
#     print(info)

## Creating questions about the data
There are many questions we might want to ask about the data, such as:

* Does this survey question have any logical or syntactical problems? {{ *question* }}
* What is the overall sentiment of this respondent's answers? {{ *responses* }}
* Is this answer responsive to the question that was asked? {{ *question* }} {{ *answer* }}

## Question types
EDSL comes with many common question types that we can select from based on the form of the response that we want to get back from the model: multiple choice, checkbox, linear scale, free text, etc. [Learn more about EDSL question types](https://docs.expectedparrot.com/en/latest/questions.html).

Here we construct `Question` objects for the questions that we want to ask about the data, using `{{ placeholders }}` for the information that we will add to the questions in the steps that follow:

In [4]:
from edsl import QuestionFreeText, QuestionMultipleChoice, QuestionYesNo

In [5]:
q_logic = QuestionFreeText(
    question_name = "logic",
    question_text = "Describe any logical or syntactical problems in the following survey question: {{ question }}"
)

In [6]:
q_sentiment = QuestionMultipleChoice(
    question_name = "sentiment",
    question_text = "What is the overall sentiment of this respondent's survey answers? {{ responses }}",
    question_options = ["Very unsatisfied", "Somewhat unsatisfied", "Somewhat satisfied", "Very satisfied"]
)

In [7]:
q_responsive = QuestionYesNo(
    question_name = "responsive",
    question_text = "Is this answer responsive to the question that was asked? Question: {{ question }} Answer: {{ answer }}"
)
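
To make the role of the placeholders concrete, here is a minimal plain-Python sketch of the substitution idea. It is not EDSL's implementation (EDSL fills the placeholders itself when scenarios are added to a question). The helper name `fill_placeholders` and the sample answer are hypothetical and used only for illustration:

```python
import re

def fill_placeholders(template: str, scenario: dict) -> str:
    """Replace each {{ key }} in the template with the matching scenario value."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Leave unknown placeholders untouched so that missing data is easy to spot.
        return str(scenario.get(key, match.group(0)))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

template = (
    "Is this answer responsive to the question that was asked? "
    "Question: {{ question }} Answer: {{ answer }}"
)
# Hypothetical scenario: the question is from the survey, the answer is made up.
scenario = {
    "question": "What do you like most about using our online marketplace?",
    "answer": "Fast shipping",
}
prompt = fill_placeholders(template, scenario)
print(prompt)
```

Each scenario therefore produces one concrete version of the question, which is why the scenario keys need to match the placeholder names exactly.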

## Adding survey data to the questions
Next we'll add our data to our questions. This can be done efficiently by creating a `ScenarioList` representing the data. The individual `Scenario` objects in the list can be constructed in a variety of ways depending on the information that we want to include in a particular question.

We start by calling the `from_csv()` method to create a `ScenarioList` for the data in its original form. We can see that this generates a `Scenario` dictionary for each respondent's set of answers with key/value pairs for the individual questions and answers: 

In [8]:
from edsl import ScenarioList

In [9]:
sl = ScenarioList.from_csv(csv_file.to_tempfile()) # replace with CSV file name if importing a local file
sl
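
As a rough illustration of the shape of this data, the sketch below uses only Python's standard `csv` module rather than EDSL. It assumes a CSV laid out like ours, with questions as column headers. The two sample rows are made up, and the resulting dictionaries are only analogous to `Scenario` objects:

```python
import csv
import io

# Hypothetical two-row sample: the headers come from the survey, the answers are made up.
sample_csv = io.StringIO(
    "Respondent ID,What do you like most about using our online marketplace?\n"
    "101,The wide selection of products\n"
    "102,Fast checkout\n"
)

# Each row becomes one dictionary keyed by the column headers.
rows = list(csv.DictReader(sample_csv))
for row in rows:
    print(row)
```

Every dictionary holds one respondent's full set of answers, keyed by question text.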

## Evaluating the questions
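For our first question we want to create a `Scenario` for each survey question. We also add a follow-up question that asks for an improved version of the survey question, and call `add_targeted_memory()` so that the model is shown its answer to `q_logic` when it answers `q_improved`: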

In [10]:
from edsl import QuestionFreeText, Survey

q_logic = QuestionFreeText(
    question_name = "logic",
    question_text = "Describe any logical or syntactical problems in the following survey question: {{ question }}"
)

q_improved = QuestionFreeText(
    question_name = "improved",
    question_text = "Please draft an improved version of the survey question. Return only the revised question text."
)

survey = Survey([q_logic, q_improved]).add_targeted_memory(q_improved, q_logic)

The survey questions are the `parameters` of the `ScenarioList` created above:

In [11]:
questions = list(sl.parameters)
questions

['How do you feel about the current product search and filtering options?',
 'What do you like most about using our online marketplace?',
 'Can you describe a recent experience where you were dissatisfied with our service?',
 'Is there anything else you would like to share about your experience with us?',
 'What is one feature you would like to see added to improve your shopping experience?',
 'Respondent ID']

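We can pass them to the `from_list()` method to create a new `ScenarioList`, specifying that the key for each `Scenario` will be `question` in order to match the parameter of our logic question. Note that the list also contains `Respondent ID`, which is an identifier column rather than a survey question; it can be removed from the list first if we do not want it evaluated: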

In [12]:
sl_questions = ScenarioList.from_list("question", questions)
sl_questions
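
Conceptually, the result is a list of one-key dictionaries. The plain-Python sketch below is an analogy, not EDSL code: it builds the same shape from a shortened copy of the list above and shows one way to drop the identifier column first. The variable names are hypothetical:

```python
# A shortened copy of the column names shown above.
columns = [
    "How do you feel about the current product search and filtering options?",
    "What do you like most about using our online marketplace?",
    "Respondent ID",
]

# Drop the identifier column so that only actual survey questions remain.
survey_questions = [c for c in columns if c != "Respondent ID"]

# One dictionary per question, each keyed by "question" to match {{ question }}.
question_scenarios = [{"question": q} for q in survey_questions]
for scenario in question_scenarios:
    print(scenario)
```

The filtered list could then be passed to `ScenarioList.from_list("question", ...)` in the same way as the unfiltered one.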

We add the scenarios to the survey when we run it:

In [13]:
results = survey.by(sl_questions).run()

This generates a dataset of `Results` that we can access with built-in methods for analysis:

In [14]:
results.select("question", "logic", "improved").print(format="rich")

[Learn more about working with results](https://docs.expectedparrot.com/en/latest/results.html).

## Evaluating respondents' collective answers
Next we can create a `ScenarioList` containing a `Scenario` for each respondent's full set of answers to use with our question about sentiment:

In [15]:
sl_responses = ScenarioList.from_list("responses", sl['scenarios'])
sl_responses

Next we combine our sentiment question with a second question about how likely the respondent is to recommend the company, add the scenarios to the resulting survey, and run it:

In [16]:
from edsl import QuestionMultipleChoice, QuestionLinearScale, Survey

q_sentiment = QuestionMultipleChoice(
    question_name = "sentiment",
    question_text = "What is the overall sentiment of this respondent's survey answers? {{ responses }}",
    question_options = ["Very unsatisfied", "Somewhat unsatisfied", "Somewhat satisfied", "Very satisfied"]
)

q_recommend = QuestionLinearScale(
    question_name = "recommend",
    question_text = "On a scale from 1 to 5, how likely do you think this respondent is to recommend the company to a friend? {{ responses }}",
    question_options = [1, 2, 3, 4, 5],
    option_labels = {1:"Not at all likely", 5:"Very likely"}
)

survey = Survey([q_sentiment, q_recommend])

In [17]:
results = survey.by(sl_responses).run()

In [18]:
results.select("responses", "sentiment", "recommend").print(format="rich")

## Evaluating individual answers
Next we create a `ScenarioList` containing a `Scenario` for each individual question-and-answer pair, to use with our question about the responsiveness of each answer. We can use the `unpivot()` method to reshape the scenarios from one per respondent to one per answer, keeping the columns listed in `id_vars` (here, the respondent ID) as identifiers:

In [19]:
sl_qa = sl.unpivot(id_vars = ["Respondent ID"])
sl_qa

We can call the `rename()` method to rename the keys so that they match the `{{ question }}` and `{{ answer }}` placeholders in our question:

In [20]:
sl_qa = sl_qa.rename({"Respondent ID": "id", "variable": "question", "value": "answer"})
sl_qa
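
For readers unfamiliar with this wide-to-long reshaping, the sketch below reproduces the idea on plain dictionaries. It is an illustration under stated assumptions rather than EDSL's implementation: the functions `unpivot` and `rename` here are hypothetical stand-ins, the sample answers are made up, and the `variable` and `value` key names follow the ones used in the `rename()` call above:

```python
def unpivot(rows, id_vars):
    """Turn one dict per respondent into one dict per (respondent, question) pair."""
    long_rows = []
    for row in rows:
        ids = {key: row[key] for key in id_vars}
        for key, value in row.items():
            if key not in id_vars:
                long_rows.append({**ids, "variable": key, "value": value})
    return long_rows

def rename(rows, mapping):
    """Rename keys according to mapping, leaving unmapped keys unchanged."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]

# Hypothetical respondents: the questions are from the survey, the answers are made up.
q1 = "What do you like most about using our online marketplace?"
q2 = "What is one feature you would like to see added to improve your shopping experience?"
wide = [
    {"Respondent ID": "101", q1: "The wide selection", q2: "A wish list"},
    {"Respondent ID": "102", q1: "Fast checkout", q2: "Better filters"},
]

long_rows = unpivot(wide, id_vars=["Respondent ID"])
qa_rows = rename(long_rows, {"Respondent ID": "id", "variable": "question", "value": "answer"})
for row in qa_rows:
    print(row)
```

Two respondents with two answers each yield four rows, each carrying the `id`, `question` and `answer` keys that the responsiveness question expects.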

In [21]:
from edsl import QuestionYesNo

q_responsive = QuestionYesNo(
    question_name = "responsive",
    question_text = "Is this answer responsive to the question that was asked? Question: {{ question }} Answer: {{ answer }}"
)

In [22]:
results = q_responsive.by(sl_qa).run()

In [23]:
(results
 .filter("responsive == 'No'")
 .select("id", "question", "answer")
 .print(format="rich")
)

## Uploading content to the Coop
[Coop](https://www.expectedparrot.com/explore) is a new platform for creating, storing and sharing LLM-based research. It is fully integrated with EDSL, and a convenient place to post and access surveys, agents, results and notebooks. [Learn more about using the Coop](https://docs.expectedparrot.com/en/latest/coop.html).

Here we post the contents of this notebook:

In [24]:
from edsl import Notebook

In [25]:
n = Notebook(path = "scenariolist_unpivot.ipynb")

In [26]:
n.push(description = "ScenarioList methods for sense checking survey data", visibility = "public")

{'description': 'ScenarioList methods for sense checking survey data',
 'object_type': 'notebook',
 'url': 'https://www.expectedparrot.com/content/c4a75b92-1835-4345-b171-42450b876278',
 'uuid': 'c4a75b92-1835-4345-b171-42450b876278',
 'version': '0.1.33.dev1',
 'visibility': 'public'}

To update an object at the Coop:

In [27]:
n = Notebook(path = "scenariolist_unpivot.ipynb") # resave

In [28]:
n.patch(uuid = "c4a75b92-1835-4345-b171-42450b876278", value = n)

{'status': 'success'}