# Extracting text
EDSL comes with a variety of question types that can be selected based on the form of response that you want.
This notebook demonstrates how to use the `QuestionExtract` question type to return information extracted (or extrapolated) from a given text as a Python dictionary. The required parameters are <i>question_name</i>, <i>question_text</i> and <i>answer_template</i>, a dictionary of example responses that the model is prompted to use as a reference (the prompts are shown at the end of this notebook).

Please see the [Questions page](https://docs.expectedparrot.com/en/latest/questions.html) of the docs for details on other question types.

## Question template
We start by importing the question type, and then use the `.example()` method to inspect the format of an example object:

In [1]:
# ! pip install edsl

In [2]:
from edsl import QuestionExtract

In [3]:
QuestionExtract.example()

| key | value |
|---|---|
| question_name | extract_name |
| question_text | My name is Moby Dick. I have a PhD in astrology, but I'm actually a truck driver |
| answer_template:name | John Doe |
| answer_template:profession | Carpenter |
| question_type | extract |


We can then run the example question and check that the response mirrors the <i>answer_template</i> that was provided:

In [4]:
results = QuestionExtract.example().run()
results.select("extract_name")

| Field | Value |
|---|---|
| Job UUID | aeb247ce-fedc-479f-b686-6cc50046146e |
| Progress Bar URL | https://www.expectedparrot.com/home/remote-job-progress/aeb247ce-fedc-479f-b686-6cc50046146e |
| Exceptions Report URL | |
| Results UUID | 29c0967b-f2b5-4c21-bd18-5887b28d5d7a |
| Results URL | https://www.expectedparrot.com/content/29c0967b-f2b5-4c21-bd18-5887b28d5d7a |


| answer.extract_name |
|---|
| {'name': 'Moby Dick', 'profession': 'Truck Driver'} |
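The check described above can also be written down in plain Python. The following is a minimal sketch, not part of EDSL: the `mirrors_template` helper is a hypothetical name introduced here, and the two dictionaries are copied from the example question and its output.

```python
def mirrors_template(answer: dict, template: dict) -> bool:
    """Return True when the answer has exactly the same keys as the template."""
    return set(answer) == set(template)


# Copied from the example question and the response shown above.
answer_template = {"name": "John Doe", "profession": "Carpenter"}
answer = {"name": "Moby Dick", "profession": "Truck Driver"}

print(mirrors_template(answer, answer_template))
```

Only the keys are compared, since the values in the template are placeholders that the response is expected to replace.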


## Creating a question
Here we create a new question that prompts the model to review a longer text and return information about it. Note that we use a <b>{{placeholder}}</b> in the question text so that we can parameterize it with different texts. This is useful for data labeling tasks, where we want to ask the same questions about many different pieces of data at once. We do this by creating `Scenario` objects for the inputs to the question.

Note also that our instructions are quite short; in practice, we could substitute a more detailed question text with context about the task we want performed.

Learn more about using [Scenario](https://docs.expectedparrot.com/en/latest/scenarios.html) objects in the docs.
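To make the placeholder mechanism concrete, here is a minimal sketch in plain Python of how a placeholder such as `{{ scenario.content }}` could be filled in for several texts. It is an illustration only: the `render` helper and the two sample texts are hypothetical, and EDSL performs its own template rendering when a question is combined with `Scenario` objects.

```python
import re

# Matches placeholders of the form {{ scenario.key }} (illustrative only).
PLACEHOLDER = re.compile(r"\{\{\s*scenario\.(\w+)\s*\}\}")


def render(template: str, scenario: dict) -> str:
    """Replace each {{ scenario.key }} placeholder with the scenario's value."""
    return PLACEHOLDER.sub(lambda m: str(scenario[m.group(1)]), template)


question_text = "Review the following text: {{ scenario.content }}"

# Hypothetical inputs standing in for the pieces of data to be labeled.
texts = ["First text to label.", "Second text to label."]

for text in texts:
    print(render(question_text, {"content": text}))
```

Each scenario produces its own version of the question text, which is why a single question can be reused across many inputs.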

In [5]:
simpsons = """
"The Simpsons" is an iconic American animated sitcom created by Matt Groening that debuted in 1989 on the Fox network. 
The show is set in the fictional town of Springfield and centers on the Simpsons family, consisting of the bumbling but well-intentioned father Homer, the caring and patient mother Marge, and their three children: mischievous Bart, intelligent Lisa, and baby Maggie. 
Renowned for its satirical take on the typical American family and society, the series delves into themes of politics, religion, and pop culture with a distinct blend of humor and wit. 
Its longevity, marked by over thirty seasons, makes it one of the longest-running television series in history, influencing many other sitcoms and becoming deeply ingrained in popular culture.
"""

In [6]:
from edsl.questions import QuestionExtract
from edsl import Scenario

q = QuestionExtract(
    question_name="example",
    question_text="Review the following text: {{ scenario.content }}",
    answer_template={
        "main_characters_list": ["name", "name"],
        "location": "location",
        "genre": "genre",
    },
)

scenario = Scenario({"content": simpsons})
results = q.by(scenario).run()

| Field | Value |
|---|---|
| Job UUID | 6808df8c-9c28-4950-8a0d-8bfff8eae08f |
| Progress Bar URL | https://www.expectedparrot.com/home/remote-job-progress/6808df8c-9c28-4950-8a0d-8bfff8eae08f |
| Exceptions Report URL | |
| Results UUID | 0140f7e7-6d77-4019-b92f-3424ce233d96 |
| Results URL | https://www.expectedparrot.com/content/0140f7e7-6d77-4019-b92f-3424ce233d96 |


In [7]:
results.select("example")

| answer.example |
|---|
| {'main_characters_list': ['Homer', 'Marge', 'Bart', 'Lisa', 'Maggie'], 'location': 'Springfield', 'genre': 'animated sitcom'} |


## Show prompts
We can inspect the prompts that were used to generate the response (note that there is no system prompt because we did not use an agent):

In [8]:
results.select("prompt.*")

| prompt.example_system_prompt | prompt.example_user_prompt |
|---|---|
| | Review the following text: "The Simpsons" is an iconic American animated sitcom created by Matt Groening that debuted in 1989 on the Fox network. The show is set in the fictional town of Springfield and centers on the Simpsons family, consisting of the bumbling but well-intentioned father Homer, the caring and patient mother Marge, and their three children: mischievous Bart, intelligent Lisa, and baby Maggie. Renowned for its satirical take on the typical American family and society, the series delves into themes of politics, religion, and pop culture with a distinct blend of humor and wit. Its longevity, marked by over thirty seasons, makes it one of the longest-running television series in history, influencing many other sitcoms and becoming deeply ingrained in popular culture. An ANSWER should be formatted like this: {'main_characters_list': ['name', 'name'], 'location': 'location', 'genre': 'genre'} It should have the same keys but values extracted from the input. If the value of a key is not present in the input, fill with "null". Put any comments in the next line after the answer. |
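Because the prompt asks the model to fill in "null" when a value is not present in the input, it can be useful to flag such keys before using an answer. The sketch below is one way to do that in plain Python. The `missing_keys` helper and the `partial` answer are hypothetical, and it assumes a missing value arrives either as the string "null" or as `None`, which may depend on the model and on how the response is parsed.

```python
def missing_keys(answer: dict) -> list:
    """List the keys whose value is the "null" filler (or None)."""
    return [key for key, value in answer.items() if value in ("null", None)]


# The answer returned for the text above.
complete = {
    "main_characters_list": ["Homer", "Marge", "Bart", "Lisa", "Maggie"],
    "location": "Springfield",
    "genre": "animated sitcom",
}

# Hypothetical answer for a text that never mentions a genre.
partial = {
    "main_characters_list": ["Homer", "Marge"],
    "location": "Springfield",
    "genre": "null",
}

print(missing_keys(complete))
print(missing_keys(partial))
```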
