# Adding metadata to survey results
This notebook provides sample [EDSL](https://docs.expectedparrot.com/) code for adding metadata to survey [results](https://docs.expectedparrot.com/en/latest/results.html). This is useful when you use EDSL for [data labeling](https://docs.expectedparrot.com/en/latest/notebooks/data_labeling_example.html) or similar tasks and want to keep information about the data or content used with a survey (e.g., the data source or date) alongside the responses, without a separate post-survey step to match the responses back to the original data.

In EDSL this can be done by including fields for metadata in [scenarios](https://docs.expectedparrot.com/en/latest/scenarios.html) that you create for the data/content you are using with a survey. When the scenarios are added to the survey and it is run, columns for the metadata fields are automatically included in the results that are generated.

## Example
In the steps below we create and run a simple EDSL survey that uses scenarios to add metadata to the results. The steps consist of:

* Constructing a survey of questions about some data (mock news stories)
* Creating a scenario (dictionary) for each news story
* Adding the scenarios to the survey and running it
* Inspecting the results

## Technical setup
Before running the code below, please ensure that you have [installed the EDSL library](https://docs.expectedparrot.com/en/latest/installation.html) and either [activated remote inference](https://docs.expectedparrot.com/en/latest/remote_inference.html) from your [Coop account](https://docs.expectedparrot.com/en/latest/coop.html) or [stored API keys](https://docs.expectedparrot.com/en/latest/api_keys.html) for the language models that you want to use with EDSL. Please also see our [documentation page](https://docs.expectedparrot.com/) for tips and tutorials on getting started using EDSL.

## Constructing questions
We start by constructing some questions with a `{{ placeholder }}` for data that we will add to the question texts. 
EDSL comes with a variety of [question types](https://docs.expectedparrot.com/en/latest/questions.html) that we can choose from based on the form of the response that we want to get back from the model:

In [1]:
from edsl import QuestionFreeText, QuestionMultipleChoice

In [2]:
q_reference = QuestionFreeText(
    question_name = "reference",
    question_text = "What is this headline referring to: {{ headline }}",
)

q_section = QuestionMultipleChoice(
    question_name = "section",
    question_text = "Which section of the paper is most likely to include this story: {{ headline }}",
    question_options = [
        "Front page",
        "Health",
        "Politics",
        "Entertainment",
        "Local",
        "Opinion",
        "Sports",
        "Culture",
        "Housing"
    ]
)
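To make the role of the `{{ placeholder }}` concrete, here is a minimal plain-Python sketch of how a scenario value could be substituted into a question text. It is an illustration only and is not EDSL's rendering code; the `render` helper and the sample `scenario` dictionary are hypothetical names introduced for this example. Note that the metadata keys are simply ignored by the template because the question text never references them.

```python
import re

def render(template, scenario):
    """Fill each {{ key }} in the template with scenario[key] (illustrative helper)."""
    def substitute(match):
        key = match.group(1)
        # Leave unknown placeholders untouched instead of raising an error.
        return str(scenario.get(key, match.group(0)))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

# Hypothetical scenario: one data field plus two metadata fields.
scenario = {
    "headline": "Subway Expansion Project Approved by City Council",
    "date": "1918-08-18",         # metadata: not referenced by the question text
    "author": "Barbara Wilson",   # metadata: not referenced by the question text
}

prompt = render("What is this headline referring to: {{ headline }}", scenario)
print(prompt)
```

Only the `headline` value appears in the rendered text; the `date` and `author` values stay attached to the scenario without affecting the prompt.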

## Creating a survey
Next we pass the questions to a survey in order to administer them together:

In [3]:
from edsl import Survey

In [4]:
survey = Survey(questions = [q_reference, q_section])

## Parameterizing questions with scenarios
Next we create a `ScenarioList` containing one `Scenario` per news story. Each scenario has a key/value pair for the data that we want to insert into the questions at the `{{ placeholder }}`, plus additional key/value pairs for the metadata that we want to keep with the results that are generated when the survey is run. 
EDSL comes with a variety of [methods for generating scenarios from different data sources](https://docs.expectedparrot.com/en/latest/scenarios.html) (PDFs, CSVs, images, tables, lists, etc.); here we generate scenarios from a dictionary:

In [5]:
from edsl import ScenarioList, Scenario

In [6]:
data = {
    "headline": [
        "Armistice Signed, War Over: Celebrations Erupt Across City",
        "Spanish Flu Pandemic: Hospitals Overwhelmed as Cases Surge",
        "Women Gain Right to Vote: Historic Amendment Passed",
        "Broadway Theaters Reopen After Flu Shutdown",
        "City Welcomes Returning Soldiers with Parade",
        "Prohibition Debate Heats Up: Public Opinion Divided",
        "New York Yankees Win First Pennant in Franchise History",
        "Subway Expansion Project Approved by City Council",
        "Harlem Renaissance: New Wave of Cultural Expression",
        "Mayor Announces New Housing Initiative for Veterans",
    ],
    "date": [
        "1918-11-11",
        "1918-10-15",
        "1918-06-05",
        "1918-12-01",
        "1918-11-12",
        "1918-07-20",
        "1918-09-30",
        "1918-08-18",
        "1918-04-25",
        "1918-11-20",
    ],
    "author": [
        "John Doe",
        "Jane Smith",
        "Robert Johnson",
        "Mary Lee",
        "James Brown",
        "Patricia Green",
        "William Davis",
        "Barbara Wilson",
        "Charles Miller",
        "Elizabeth Taylor",
    ]
}

In [7]:
scenarios = ScenarioList.from_nested_dict(data)

We can inspect the scenarios that have been created:

In [8]:
scenarios
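Because the data is stored as parallel lists, the headline, date and author at the same index must belong to the same story. The following plain-Python sketch shows one way to check that alignment and to view the data as one dictionary per story before building scenarios. It is not how `ScenarioList.from_nested_dict` is implemented; `dict_of_lists_to_rows` and `sample` are hypothetical names used only for illustration.

```python
def dict_of_lists_to_rows(data):
    """Turn {"field": [v1, v2, ...]} into [{"field": v1, ...}, {"field": v2, ...}]."""
    lengths = {key: len(values) for key, values in data.items()}
    # Every field needs the same number of entries, or rows would be misaligned.
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Fields have unequal lengths: {lengths}")
    keys = list(data)
    return [dict(zip(keys, values)) for values in zip(*data.values())]

# A two-story subset of the data above, in the same dict-of-lists shape.
sample = {
    "headline": [
        "Armistice Signed, War Over: Celebrations Erupt Across City",
        "Spanish Flu Pandemic: Hospitals Overwhelmed as Cases Surge",
    ],
    "date": ["1918-11-11", "1918-10-15"],
    "author": ["John Doe", "Jane Smith"],
}

rows = dict_of_lists_to_rows(sample)
for row in rows:
    print(row)
```

Each printed dictionary has the same shape as a single scenario: one field for the question text and two metadata fields.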

## Running a survey
To run the survey, we add the scenarios with the `by()` method and then call the `run()` method:

In [9]:
results = survey.by(scenarios).run()

This generates a dataset of `Results` that we can access with [built-in methods for analysis](https://docs.expectedparrot.com/en/latest/results.html). 
To see a list of all the components of results:

In [10]:
results.columns

For example, we can filter, sort, select and print components of results in a table:

In [11]:
(results
 .filter("section in ['Sports', 'Health', 'Politics']")
 .sort_by("section", "date")
 .select("headline", "date", "author", "section", "reference")
 .print(format="rich")
)
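For readers who want to see what the chained call is doing, here is a rough plain-Python equivalent that operates on a list of dictionaries. It assumes that `sort_by` orders by the first key and breaks ties with the second. The `section` and `reference` values below are made-up placeholders, not output from running the survey, and the variable names are hypothetical.

```python
# Hypothetical result rows: metadata (date, author) sits next to the answers.
rows = [
    {"headline": "New York Yankees Win First Pennant in Franchise History",
     "date": "1918-09-30", "author": "William Davis",
     "section": "Sports", "reference": "(free-text answer)"},
    {"headline": "Prohibition Debate Heats Up: Public Opinion Divided",
     "date": "1918-07-20", "author": "Patricia Green",
     "section": "Politics", "reference": "(free-text answer)"},
    {"headline": "Broadway Theaters Reopen After Flu Shutdown",
     "date": "1918-12-01", "author": "Mary Lee",
     "section": "Entertainment", "reference": "(free-text answer)"},
    {"headline": "Women Gain Right to Vote: Historic Amendment Passed",
     "date": "1918-06-05", "author": "Robert Johnson",
     "section": "Politics", "reference": "(free-text answer)"},
    {"headline": "Spanish Flu Pandemic: Hospitals Overwhelmed as Cases Surge",
     "date": "1918-10-15", "author": "Jane Smith",
     "section": "Health", "reference": "(free-text answer)"},
]

# filter: keep only rows whose section is one of the three of interest.
wanted = {"Sports", "Health", "Politics"}
filtered = [row for row in rows if row["section"] in wanted]

# sort: order by section first, then by date within each section.
ordered = sorted(filtered, key=lambda row: (row["section"], row["date"]))

# select: keep the chosen columns, in the chosen order.
columns = ["headline", "date", "author", "section", "reference"]
table = [{column: row[column] for column in columns} for row in ordered]

for row in table:
    print(row["section"], row["date"], row["author"], row["headline"], sep=" | ")
```

Because the metadata travels with each row, filtering and sorting on `date` or `author` needs no join against the original data.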

## Posting to the Coop
The [Coop](https://www.expectedparrot.com/explore) is a platform for creating, storing and sharing LLM-based research.
It is fully integrated with EDSL and accessible from your workspace or Coop account page.
Learn more about [creating an account](https://www.expectedparrot.com/login) and [using the Coop](https://docs.expectedparrot.com/en/latest/coop.html).

Here we post the scenarios, survey and results from above, along with this notebook, to the Coop:

In [12]:
scenarios.push(description = "Scenarios for example survey using metadata", visibility = "public")

{'description': 'Scenarios for example survey using metadata',
 'object_type': 'scenario_list',
 'url': 'https://www.expectedparrot.com/content/711d3d8d-3e60-4b9b-9b64-9c5c1a5f749d',
 'uuid': '711d3d8d-3e60-4b9b-9b64-9c5c1a5f749d',
 'version': '0.1.33.dev1',
 'visibility': 'public'}

In [13]:
survey.push(description = "Example survey using scenarios to add metadata to results", visibility = "public")

{'description': 'Example survey using scenarios to add metadata to results',
 'object_type': 'survey',
 'url': 'https://www.expectedparrot.com/content/333395bb-bfe1-4795-a17f-93cc67da88a9',
 'uuid': '333395bb-bfe1-4795-a17f-93cc67da88a9',
 'version': '0.1.33.dev1',
 'visibility': 'public'}

In [14]:
results.push(description = "Results for example survey using scenarios to add metadata", visibility = "public")

{'description': 'Results for example survey using scenarios to add metadata',
 'object_type': 'results',
 'url': 'https://www.expectedparrot.com/content/5cdf086d-45cb-4bc4-896a-a48f45621919',
 'uuid': '5cdf086d-45cb-4bc4-896a-a48f45621919',
 'version': '0.1.33.dev1',
 'visibility': 'public'}

In [15]:
from edsl import Notebook

In [16]:
n = Notebook(path = "adding_metadata.ipynb")

In [17]:
n.push(description = "Adding metadata to survey results", visibility = "public")

{'description': 'Adding metadata to survey results',
 'object_type': 'notebook',
 'url': 'https://www.expectedparrot.com/content/0837130c-5983-482b-ae1b-a6ba2bbef07e',
 'uuid': '0837130c-5983-482b-ae1b-a6ba2bbef07e',
 'version': '0.1.33.dev1',
 'visibility': 'public'}

To update an object that has already been posted to the Coop, pass its `uuid` and the new value to the `patch()` method:

In [18]:
n = Notebook(path = "adding_metadata.ipynb")

In [19]:
n.patch(uuid = "0837130c-5983-482b-ae1b-a6ba2bbef07e", value = n)

{'status': 'success'}