# Using PDFs in a survey
This notebook provides sample [EDSL](https://docs.expectedparrot.com/) code demonstrating the `from_pdf()` method, which imports a PDF and automatically creates a `Scenario` object for each page to use as a parameter of survey questions. This is helpful when using EDSL to extract qualitative information from a long text efficiently.

EDSL is an open-source library for simulating surveys and experiments with AI agents and large language models. Please see our [documentation page](https://docs.expectedparrot.com/) for tips and tutorials on getting started.

## How it works
EDSL comes with a [variety of question types](https://docs.expectedparrot.com/en/latest/questions.html) that we can select from based on the desired form of the response (multiple choice, free text, etc.). We can also parameterize questions with textual content in order to ask questions about it. We do this by creating a `{{ placeholder }}` in a question text, e.g., *What are the key themes of this text: {{ text }}*, and then creating `Scenario` objects for the content to be inserted in the placeholder when we run the survey. This allows us to administer multiple versions of a question with different inputs all at once. A common use case for this is performing [data labeling tasks](https://docs.expectedparrot.com/en/latest/notebooks/data_labeling_example.html) designed as questions about one or more pieces of textual data that can be inserted into the survey question texts. [Learn more about using scenarios](https://docs.expectedparrot.com/en/latest/scenarios.html).
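To make the placeholder idea concrete, here is a minimal sketch in plain Python that fills a `{{ placeholder }}` in a question text once per scenario. It only illustrates the concept and is not how EDSL implements it: the `render` helper and the sample passages are hypothetical, and scenarios are modeled as ordinary dictionaries.

```python
import re


def render(question_text: str, scenario: dict) -> str:
    """Fill each {{ name }} placeholder with the matching scenario value."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Leave unknown placeholders untouched instead of raising an error.
        return str(scenario.get(key, match.group(0)))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, question_text)


question_text = "What are the key themes of this text: {{ text }}"

# Hypothetical scenarios: one dictionary per piece of content.
scenarios = [
    {"text": "First passage."},
    {"text": "Second passage."},
]

# One version of the question per scenario.
prompts = [render(question_text, scenario) for scenario in scenarios]
for prompt in prompts:
    print(prompt)
```

Each scenario yields its own version of the question, which is what lets a single question be administered over many inputs at once.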

## Example
For demonstration purposes, we use a PDF copy of the first page of the recent paper [Automated Social Science: Language Models as Scientist and Subjects](https://arxiv.org/pdf/2404.11794) and conduct a survey consisting of several questions about its contents:

<img src="automated_social_science_paper.png" width="300px">

We have stored the PDF at the Coop and can retrieve it by its identifier:

In [1]:
from edsl.scenarios.FileStore import PDFFileStore

In [2]:
ass_pdf = PDFFileStore.pull('65c1ca0c-35d8-4c57-9186-787522806a1f', expected_parrot_url='https://www.expectedparrot.com')

In [3]:
# Code for posting a PDF to Coop file store:
# 
# ass_pdf = PDFFileStore("automated_social_scientist.pdf")
# info = ass_pdf.push()
# print(info)

Here we create a survey of questions that we will administer for each page of the PDF. Note that the `from_pdf()` method requires that the scenario placeholders be `{{ text }}` (for regular scenario objects, you can use any placeholder word that you like):

In [4]:
from edsl import QuestionFreeText, QuestionList, ScenarioList, Survey

In [5]:
q_summary = QuestionFreeText(
    question_name="summary",
    question_text="Briefly summarize the abstract of this paper: {{ text }}",
)

q_authors = QuestionList(
    question_name="authors",
    question_text="List the names of all the authors of the following paper: {{ text }}",
)

q_thanks = QuestionList(
    question_name="thanks",
    question_text="List the names of the people thanked in the following paper: {{ text }}",
)

survey = Survey([q_summary, q_authors, q_thanks])

Next we create a `ScenarioList` for the PDF using the `from_pdf()` method. It automatically creates one `Scenario` object per page of the PDF, and the text of each page is inserted into our questions (in our example, this is just the first page of the paper):

In [6]:
automated_social_scientist = ScenarioList.from_pdf(ass_pdf.to_tempfile())

Alternatively, we can import a local file directly:

In [7]:
# automated_social_scientist = ScenarioList.from_pdf("automated_social_scientist.pdf")

We can inspect the scenarios:

In [8]:
automated_social_scientist[0:2]

If we do not want to use every page, we can select a subset. For example, here we keep just the first page to use with our survey:

In [9]:
automated_social_scientist = automated_social_scientist.filter("page == 1")
automated_social_scientist
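As a rough mental model of the per-page scenarios and the filter above, the sketch below uses plain Python dictionaries. It assumes, based only on the `{{ text }}` placeholder and the `page == 1` filter used in this notebook, that each scenario carries at least a `text` field and a 1-based `page` field. The page texts are hypothetical stand-ins, and actual EDSL scenarios may hold additional fields.

```python
# Hypothetical page texts standing in for text extracted from a PDF.
page_texts = [
    "Title, authors, and abstract of the paper.",
    "Introduction of the paper.",
]

# One dictionary per page, numbered from 1 (assumed from the "page == 1" filter).
scenarios = [
    {"page": number, "text": text}
    for number, text in enumerate(page_texts, start=1)
]

# Plain-Python counterpart of filtering on "page == 1".
first_page_only = [scenario for scenario in scenarios if scenario["page"] == 1]

print(len(scenarios), "scenarios before filtering")
print(len(first_page_only), "scenario after filtering")
```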

Now we can add the list of scenarios to the survey and run it:

In [10]:
results = survey.by(automated_social_scientist).run()
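Conceptually, running the survey with a list of scenarios asks every question once per scenario. The sketch below mimics that pairing in plain Python using the three question texts from this notebook. It is an illustration under simplifying assumptions, not EDSL's execution logic: no model is called, the page text is a hypothetical stand-in, and the `runs` structure is invented for the example.

```python
# Question texts from this notebook, keyed by question name.
questions = {
    "summary": "Briefly summarize the abstract of this paper: {{ text }}",
    "authors": "List the names of all the authors of the following paper: {{ text }}",
    "thanks": "List the names of the people thanked in the following paper: {{ text }}",
}

# Hypothetical scenario list holding a single page.
scenarios = [{"page": 1, "text": "<text of page 1>"}]

# Build one set of filled-in prompts per scenario.
runs = []
for scenario in scenarios:
    prompts = {
        name: text.replace("{{ text }}", scenario["text"])
        for name, text in questions.items()
    }
    runs.append({"page": scenario["page"], "prompts": prompts})

for run in runs:
    for name, prompt in run["prompts"].items():
        print(f"page {run['page']} / {name}: {prompt}")
```

With one scenario and three questions this produces three prompts; adding pages to the scenario list would multiply the number of prompts accordingly.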

We can see a list of all the components of results that are directly accessible:

In [11]:
results.columns

['agent.agent_instruction',
 'agent.agent_name',
 'answer.authors',
 'answer.summary',
 'answer.thanks',
 'comment.authors_comment',
 'comment.summary_comment',
 'comment.thanks_comment',
 'generated_tokens.authors_generated_tokens',
 'generated_tokens.summary_generated_tokens',
 'generated_tokens.thanks_generated_tokens',
 'iteration.iteration',
 'model.frequency_penalty',
 'model.logprobs',
 'model.max_tokens',
 'model.model',
 'model.presence_penalty',
 'model.temperature',
 'model.top_logprobs',
 'model.top_p',
 'prompt.authors_system_prompt',
 'prompt.authors_user_prompt',
 'prompt.summary_system_prompt',
 'prompt.summary_user_prompt',
 'prompt.thanks_system_prompt',
 'prompt.thanks_user_prompt',
 'question_options.authors_question_options',
 'question_options.summary_question_options',
 'question_options.thanks_question_options',
 'question_text.authors_question_text',
 'question_text.summary_question_text',
 'question_text.thanks_question_text',
 'question_type.authors_question_
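The column names in the listing follow a `prefix.field` pattern (for example `answer.summary` or `prompt.summary_user_prompt`). As a small aid for reading such a listing, the following sketch groups a handful of the names shown above by their prefix. It works on a hard-coded sample list in plain Python and does not call any EDSL method.

```python
from collections import defaultdict

# A sample of the column names shown in the listing above.
columns = [
    "agent.agent_name",
    "answer.authors",
    "answer.summary",
    "answer.thanks",
    "comment.summary_comment",
    "model.temperature",
    "prompt.summary_user_prompt",
]

# Group the part after the first dot under the part before it.
grouped = defaultdict(list)
for column in columns:
    prefix, _, field = column.partition(".")
    grouped[prefix].append(field)

for prefix, fields in grouped.items():
    print(f"{prefix}: {fields}")
```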

We can select components of the results to inspect and print:

In [12]:
results.select("summary", "authors", "thanks").print(format="rich")

## Posting to the Coop
The [Coop](https://www.expectedparrot.com/explore) is a platform for creating, storing and sharing LLM-based research.
It is fully integrated with EDSL and accessible from your workspace or Coop account page.
Learn more about [creating an account](https://www.expectedparrot.com/login) and [using the Coop](https://docs.expectedparrot.com/en/latest/coop.html).

Here we demonstrate how to post this notebook:

In [13]:
from edsl import Notebook

In [14]:
n = Notebook(path="scenario_from_pdf.ipynb")

In [15]:
n.push(description="Example code for generating scenarios from PDFs", visibility="public")

{'description': 'Example code for generating scenarios from PDFs',
 'object_type': 'notebook',
 'url': 'https://www.expectedparrot.com/content/b9cb2a90-c3e3-4d80-8bb1-0e19b75b535d',
 'uuid': 'b9cb2a90-c3e3-4d80-8bb1-0e19b75b535d',
 'version': '0.1.33.dev1',
 'visibility': 'public'}