# Using data with surveys: FileStore
This notebook provides example [EDSL](https://github.com/expectedparrot/edsl) code for using data files with an EDSL survey.
In the steps below we show how to use the [*FileStore*](https://docs.expectedparrot.com/en/latest/filestore.html) module to upload, share and retrieve data files at the [*Coop*](https://docs.expectedparrot.com/en/latest/coop.html), and then create [*Scenario*](https://docs.expectedparrot.com/en/latest/scenarios.html) objects from the data to use with a survey.

[EDSL is an open-source library](https://github.com/expectedparrot/edsl) for simulating surveys, experiments and other research with AI agents and large language models. 
Before running the code below, please ensure that you have [installed the EDSL library](https://docs.expectedparrot.com/en/latest/installation.html) and either [activated remote inference](https://docs.expectedparrot.com/en/latest/remote_inference.html) from your [Coop account](https://docs.expectedparrot.com/en/latest/coop.html) or [stored API keys](https://docs.expectedparrot.com/en/latest/api_keys.html) for the language models that you want to use with EDSL. Please also see our [documentation page](https://docs.expectedparrot.com/) for tips and tutorials on getting started using EDSL.

## What is a *Scenario*?
A *Scenario* is a dictionary of one or more key/value pairs representing data or content to be added to questions; a *ScenarioList* is a list of *Scenario* objects. 
Scenario keys are used as question parameters that get replaced with the values when the scenarios are added to the questions, allowing you to create variants of questions efficiently. Learn more about creating and working with scenarios [here](https://docs.expectedparrot.com/en/latest/scenarios.html) and [here](https://docs.expectedparrot.com/en/latest/notebooks/question_loop_scenarios.html).
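
To make the idea of question parameters concrete, here is a minimal plain-Python sketch of placeholder substitution. It is only an illustration of the concept, not how EDSL renders questions internally; the `render` helper and the sample song and artist values are hypothetical.

```python
import re

def render(question_text: str, scenario: dict) -> str:
    """Replace each {{ key }} placeholder with the matching scenario value."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Leave unknown placeholders untouched rather than guessing a value.
        return str(scenario.get(key, match.group(0)))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, question_text)

# Hypothetical scenarios: each one is a dictionary of key/value pairs.
scenarios = [
    {"song": "Song A", "artists": "Artist A"},
    {"song": "Song B", "artists": "Artist B"},
]

template = "What is the topic of the song {{ song }} by {{ artists }}?"

# One question text yields one variant per scenario.
variants = [render(template, scenario) for scenario in scenarios]
for variant in variants:
    print(variant)
```

Each scenario produces its own version of the question, which is the behavior the survey example later in this notebook relies on.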

## What is the *Coop*?
[Coop](https://docs.expectedparrot.com/en/latest/coop.html) is a platform for creating, storing and sharing LLM-based research. 
It is fully integrated with EDSL, allowing you to post, download and update objects directly from your workspace and at the [Coop web app](https://www.expectedparrot.com/login). 
The Coop also provides access to features for working with EDSL remotely at the Expected Parrot server. 
Learn more about these features in the [remote inference](https://docs.expectedparrot.com/en/latest/remote_inference.html) and [remote caching](https://docs.expectedparrot.com/en/latest/remote_caching.html) sections of the [documentation page](https://docs.expectedparrot.com/).

## What is *FileStore*?
*FileStore* is a [module for storing and sharing data files at the Coop](https://docs.expectedparrot.com/en/latest/filestore.html) to use in EDSL projects, such as survey data, PDFs, CSVs or images. 
In particular, it is designed for storing files to be used as scenarios, and lets you include code for retrieving and processing the files in your EDSL project, as we do in the example below.

## Example
In the example below we create scenarios for some data (a table on a Wikipedia page) and inspect them. We then save the scenarios as a CSV file and post it to the Coop using the file store. Finally, we retrieve the file, recreate the scenarios and use them in a survey. We also post this notebook to the Coop for reference.

We start by importing the tools that we will use:

In [1]:
from edsl import ScenarioList, Scenario
from edsl.scenarios.FileStore import CSVFileStore

### Creating a scenario list for a Wikipedia table
EDSL comes with many methods for automatically [generating scenarios for various data sources](https://docs.expectedparrot.com/en/latest/scenarios.html), such as PDFs, CSVs, docs, images, lists, dicts, etc.
Here we use a method to automatically [create a scenario list for a Wikipedia table](https://docs.expectedparrot.com/en/latest/notebooks/scenario_list_wikipedia.html), passing the URL and the number of the table on the page:

In [2]:
s = ScenarioList.from_wikipedia("https://en.wikipedia.org/wiki/List_of_Billboard_Hot_100_number-one_singles_of_the_1980s", 5)

We can inspect the scenario list that has been created:

In [3]:
s.print(format="rich")

We can check the keys (parameters) of the scenarios and rename them for convenience:

In [4]:
s.parameters

{'Artist(s)', 'Song', 'Weeks at number one'}

In [5]:
s = s.rename({'Artist(s)':"artists", 'Song':"song", 'Weeks at number one':"weeks"})

In [6]:
s.print(format="rich")

We can save the scenarios to a CSV:

In [7]:
s.to_csv("billboard_100_1980s.csv")

### Storing data at the Coop using the file store
Here we use the CSV file store to store the file that we just created:

In [8]:
fs = CSVFileStore("billboard_100_1980s.csv")

We can post a `FileStore` object to the Coop by calling the `push()` method on it.
We can optionally pass a `description` and a `visibility` setting: *public*, *unlisted* (the default) or *private*:

In [9]:
info = fs.push(description = "Wikipedia: List of Billboard Hot 100 number-one singles of the 1980s")

We can print the details of the posted object, including the URL and Coop uuid that we will need to retrieve it later:

In [10]:
info

{'description': 'Wikipedia: List of Billboard Hot 100 number-one singles of the 1980s',
 'object_type': 'scenario',
 'url': 'https://www.expectedparrot.com/content/add0b8ee-b127-4b5e-82ad-cd00ddaf2552',
 'uuid': 'add0b8ee-b127-4b5e-82ad-cd00ddaf2552',
 'version': '0.1.33',
 'visibility': 'unlisted'}
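
Rather than copying the uuid by hand, we can read it from the returned dictionary. The sketch below uses a literal copy of the output shown above so that it runs without a Coop account; in the notebook itself, `info` is already defined by the `push()` call. The variable name `file_uuid` is our own.

```python
# Literal copy of the dictionary printed above (in the notebook, use `info` directly).
info = {
    "description": "Wikipedia: List of Billboard Hot 100 number-one singles of the 1980s",
    "object_type": "scenario",
    "url": "https://www.expectedparrot.com/content/add0b8ee-b127-4b5e-82ad-cd00ddaf2552",
    "uuid": "add0b8ee-b127-4b5e-82ad-cd00ddaf2552",
    "version": "0.1.33",
    "visibility": "unlisted",
}

# Keep the uuid in a variable so it can be reused when retrieving the file.
file_uuid = info["uuid"]

# In this output the content URL ends with the same uuid.
print(file_uuid)
print(info["url"].endswith(file_uuid))
```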

### Retrieving a file and recreating scenarios
Here we retrieve the file from the Coop using its uuid and recreate the scenarios:

In [11]:
csv_file = CSVFileStore.pull("add0b8ee-b127-4b5e-82ad-cd00ddaf2552", expected_parrot_url="https://www.expectedparrot.com")

In [12]:
s = ScenarioList.from_csv(csv_file.to_tempfile())

In [13]:
s.print(format="rich")
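
A CSV file stores plain text, so it is worth checking the types of the values after a round trip. This notebook does not show whether `ScenarioList.from_csv` converts numeric columns, so the sketch below makes no claim about EDSL; it uses only the standard library `csv` module, which returns every field as a string, and hypothetical sample rows.

```python
import csv
import os
import tempfile

# Hypothetical rows with a numeric "weeks" column.
rows = [
    {"song": "Song A", "artists": "Artist A", "weeks": 3},
    {"song": "Song B", "artists": "Artist B", "weeks": 10},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")

    # Write the rows to a CSV file with a header line.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["song", "artists", "weeks"])
        writer.writeheader()
        writer.writerows(rows)

    # Read the rows back; csv.DictReader yields string values only.
    with open(path, newline="", encoding="utf-8") as f:
        restored = list(csv.DictReader(f))

print(type(restored[0]["weeks"]))

# Sorting the strings compares them character by character ...
as_text = sorted(restored, key=lambda r: r["weeks"], reverse=True)
# ... while converting to int first sorts by numeric value.
as_numbers = sorted(restored, key=lambda r: int(r["weeks"]), reverse=True)

print([r["weeks"] for r in as_text])
print([r["weeks"] for r in as_numbers])
```

If a column such as `weeks` comes back as text, sorting on it may order the values lexically rather than numerically, so converting it before sorting can be a sensible precaution.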

### Using scenarios in a survey
We can use the scenarios with a survey by creating placeholders in the questions for the scenario keys, and adding the scenarios to the survey when we run it:

In [14]:
from edsl import QuestionFreeText, QuestionMultipleChoice, QuestionCheckBox, QuestionList, Survey

q1 = QuestionFreeText(
    question_name = "topic",
    question_text = "What is the topic of the song {{ song }} by {{ artists }}?"
)

q2 = QuestionMultipleChoice(
    question_name = "sentiment",
    question_text = "What is the sentiment of the song {{ song }} by {{ artists }}?",
    question_options = [
        "Happy",
        "Sad",
        "Angry",
        "Romantic",
        "Nostalgic",
        "Empowering",
        "Melancholic",
        "Hopeful"
    ]
)

q3 = QuestionCheckBox(
    question_name = "themes",
    question_text = "What themes are present in the song {{ song }} by {{ artists }}?",
    question_options = [
        "Love",
        "Loss",
        "Struggle",
        "Celebration",
        "Social issues",
        "Other"
    ]
)

q4 = QuestionList(
    question_name = "other_themes",
    question_text = "What other themes are present?"
)

survey = (
    Survey(questions = [q1, q2, q3, q4])
    .add_targeted_memory(q4, q3)
    .add_stop_rule(q3, "'Other' not in themes")
)

results = survey.by(s).run()
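
The survey above uses two rules: a stop rule that ends the survey after the `themes` question when "Other" was not selected, and a targeted memory that gives the `other_themes` question the context of the `themes` question and answer. The plain-Python sketch below mirrors our reading of the stop rule only; it is an illustration, not EDSL's implementation, and the `questions_to_ask` helper is hypothetical.

```python
def questions_to_ask(themes_answer: list[str]) -> list[str]:
    """Return the question names that would be asked for a given 'themes' answer."""
    asked = ["topic", "sentiment", "themes"]

    # Stop rule from the survey: "'Other' not in themes".
    stop_after_themes = "Other" not in themes_answer

    if not stop_after_themes:
        # Only reached when "Other" was selected in the checkbox question.
        asked.append("other_themes")
    return asked

print(questions_to_ask(["Love", "Loss"]))
print(questions_to_ask(["Love", "Other"]))
```

Under this reading, `other_themes` is only answered for songs where the model selected "Other", so that column can be empty for the remaining rows of the results.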

We can filter, sort, select and print any components of the results that are generated. 
Note that the results include columns for all scenario keys, whether used in question texts or not:

In [15]:
results.sort_by("song").select("song", "artists", "topic").print(format="rich")

In [16]:
results.sort_by("weeks", reverse=True).select("weeks", "song", "artists", "sentiment", "themes", "other_themes").print(format="rich")

### Posting a notebook to the Coop
Here we post the contents of this notebook to the Coop for anyone to access:

In [17]:
from edsl import Notebook

In [18]:
n = Notebook(path = "scenarios_filestore_example.ipynb")

In [19]:
n.push(description = "Example code for using data files for scenarios via file store and Coop", visibility = "public")

{'description': 'Example code for using data files for scenarios via file store and Coop',
 'object_type': 'notebook',
 'url': 'https://www.expectedparrot.com/content/0b0c86b4-7629-428c-8346-03d69a6a76f9',
 'uuid': '0b0c86b4-7629-428c-8346-03d69a6a76f9',
 'version': '0.1.33',
 'visibility': 'public'}

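To update an object that has already been posted, we recreate it from the saved file and call the `patch()` method with its uuid: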

In [20]:
n = Notebook(path = "scenarios_filestore_example.ipynb") # reload the notebook after saving changes

In [21]:
n.patch(uuid = "0b0c86b4-7629-428c-8346-03d69a6a76f9", value = n)

{'status': 'success'}