# Generative Active Task Elicitation

- This notebook demonstrates the approach described in the paper [Eliciting Human Preferences with Language Models](https://arxiv.org/abs/2310.11589).
- Defining a precise, unambiguous specification for a target task can be challenging.
- Learning frameworks take different approaches to defining the task specification.
    - Passive: The user provides a single initial specification.
    - Interactive: The specification is updated through multiple queries that can change depending on user responses.
    - Example-based: The specification is defined by a set of examples labeled by the user.
    - Free-form: The specification is defined by natural language instructions and explanations using language models (LMs).
- Existing task elicitation frameworks are each passive, example-based, or both.
    - Supervised learning: passive, example-based
        - The model is provided with labeled examples and then fit and fine-tuned using standard algorithms.
        - The effectiveness of the model depends heavily on the quality of the examples, and the model cannot adapt to changes in user preferences or to edge cases.
    - Pool-based active learning: interactive, example-based
        - Inputs are drawn interactively from a fixed pool of unlabeled examples for the user to label. The model is then trained as in supervised methods.
        - The interactive process enables examples to be drawn that may resolve uncertainty or ambiguity in the task specification.
    - Prompting: passive, free-form
        - A pre-trained language model is provided with a prompt, which is a natural language description of the task.
        - The free-form approach allows for specifying tasks in more flexible ways than simply labeling examples.
- All of these existing frameworks have important drawbacks.
    - The user must ensure the prompt or example sets are truly comprehensive specifications of the task.
    - A poorly crafted prompt or set of examples could lead to task ambiguity resulting in undesired behavior.
    - Resolving task ambiguity is challenging and time-consuming due to the difficulties of precisely identifying personal preferences and anticipating edge-cases.
- The paper introduces a learning framework called generative active task elicitation (GATE).
- GATE attempts to address these drawbacks by using an interactive and free-form approach.
    - Generative active task elicitation: interactive, free-form
        - The model discovers and defines the user's intended task through an open-ended interaction.
        - GATE may employ different types of information gathering policies depending on the kind of questions asked.
            - Generative active learning
                - The LM generates example inputs for the user to label.
                - This approach has the advantage of providing scenarios that may not have otherwise been considered.
            - Generating yes-or-no questions
                - The LM is restricted to generating binary yes-or-no questions.
                - This approach enables the model to elicit more abstract preferences while still being easy for the user to answer.
            - Generating open-ended questions
                - The LM generates arbitrary questions requiring free-form natural language responses.
                - This approach is the most flexible, but its questions risk being overly broad or challenging for the user to answer.
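
The two axes used above (passive vs. interactive, example-based vs. free-form) can be laid out as a small lookup in code. This is only an illustrative sketch: the `Framework` class, the `FRAMEWORKS` list, and the `describe` helper are hypothetical names introduced here, not part of the paper or its repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Framework:
    """One task elicitation framework placed on the two axes from the notes."""
    name: str
    interactive: bool  # False means passive
    free_form: bool    # False means example-based


# The four frameworks exactly as they are categorized in the list above.
FRAMEWORKS = [
    Framework("supervised learning", interactive=False, free_form=False),
    Framework("pool-based active learning", interactive=True, free_form=False),
    Framework("prompting", interactive=False, free_form=True),
    Framework("GATE", interactive=True, free_form=True),
]


def describe(framework: Framework) -> str:
    """Render a framework in the same 'name: mode, spec' form used in the notes."""
    mode = "interactive" if framework.interactive else "passive"
    spec = "free-form" if framework.free_form else "example-based"
    return f"{framework.name}: {mode}, {spec}"


for fw in FRAMEWORKS:
    print(describe(fw))
```

Filtering `FRAMEWORKS` for entries that are both interactive and free-form leaves only GATE, which is the gap the framework is meant to fill.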


In [57]:
# This is a demonstration of how GATE works.
# For a full implementation to test GATE visit https://github.com/alextamkin/generative-elicitation

import os
import textwrap
from enum import StrEnum  # StrEnum requires Python 3.11 or later
from openai import OpenAI

# LM configuration
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
engine = "gpt-3.5-turbo"

class QuestionType(StrEnum):
    YN = "yes/no question"
    OPEN = "open-ended question"

# GATE interaction_prompt template. The result of each query is added to the interaction history.
interaction_prompt = textwrap.dedent("""\
    Your task is to {task_description}.

    Previous questions:
    {interaction_history}
    Generate the most informative {question_type} that, when answered, will reveal the most about the desired behavior beyond what has already been queried for above.
    Make sure your question addresses different aspects of the {implementation} than the questions that have already been asked.
    At the same time however, the question should be bite-sized, and not ask for too much at once.
    Generate the {question_type} and nothing else:"""
)

# Prompt template for testing GATE.
hypothesis_prompt = textwrap.dedent("""\
    {prompt1}
    {previous_examples}
    {prompt2}
    {test_case}
    """
)

# GATE updates the prompt over multiple query iterations.
def gate(num_interactions, task_description, question_type, implementation):
    kwargs = {
        "task_description": task_description,
        "interaction_history": "",
        "question_type": question_type,
        "implementation": implementation
    }
    for i in range(num_interactions):
        messages = [{"role": "user", "content": interaction_prompt.format(**kwargs)}]
        completion = client.chat.completions.create(model=engine, messages=messages)
        question = completion.choices[0].message.content
        answer = input(f"{question} ")
        kwargs["interaction_history"] += f"- {question} -> {answer}\n"
    print(f"LAST INTERACTION PROMPT GENERATED:\n\n{interaction_prompt.format(**kwargs)}")
    return kwargs["interaction_history"]

def gate_test(prompt1, previous_examples, prompt2, test_cases):
    print("\nTEST CASES:")
    kwargs = {
        "prompt1": prompt1,
        "previous_examples": previous_examples,
        "prompt2": prompt2,
    }
    for test_case, actual in test_cases:
        kwargs["test_case"] = test_case
        messages = [{"role": "user", "content": hypothesis_prompt.format(**kwargs)}]
        completion = client.chat.completions.create(model=engine, messages=messages)
        predict = completion.choices[0].message.content
        print(f"\n{test_case}\nACTUAL: {actual}, PREDICT: {predict}")
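
The elicitation loop can also be exercised without an API key or keyboard input by passing in the question and answer sources as callables. The sketch below is a simplified, hypothetical variant of `gate`: `run_elicitation`, `PROMPT_TEMPLATE`, `ask_model`, and `ask_user` are names introduced here for illustration, and the template is a shortened stand-in for `interaction_prompt`. It shows how each turn appends one `- question -> answer` line to the history that is fed into the next prompt.

```python
from typing import Callable

# Shortened, hypothetical stand-in for the interaction_prompt template.
PROMPT_TEMPLATE = (
    "Your task is to {task_description}.\n\n"
    "Previous questions:\n{interaction_history}\n"
    "Generate the most informative {question_type} and nothing else:"
)


def run_elicitation(
    ask_model: Callable[[str], str],
    ask_user: Callable[[str], str],
    num_interactions: int,
    task_description: str,
    question_type: str,
) -> str:
    """Return the accumulated history, one '- question -> answer' line per turn."""
    history = ""
    for _ in range(num_interactions):
        # The prompt is rebuilt every turn so it includes all earlier answers.
        prompt = PROMPT_TEMPLATE.format(
            task_description=task_description,
            interaction_history=history,
            question_type=question_type,
        )
        question = ask_model(prompt).strip()
        answer = ask_user(question).strip()
        history += f"- {question} -> {answer}\n"
    return history


# Scripted stand-ins for the LM and the user, so the loop runs offline.
scripted_questions = iter(["What hobbies do you enjoy?", "Which topics do you avoid?"])
scripted_answers = iter(["Tennis.", "Finance."])

history = run_elicitation(
    ask_model=lambda prompt: next(scripted_questions),
    ask_user=lambda question: next(scripted_answers),
    num_interactions=2,
    task_description="learn what topics a user likes to read about",
    question_type="open-ended question",
)
print(history)
```

Injecting the two callables keeps the loop logic separate from the OpenAI client and from `input`, which makes it straightforward to test prompt construction before spending API calls.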

In [58]:
# This is an example. You can modify these variables to experiment with different GATE use cases.

num_interactions = 4

# Variables for the GATE interaction_prompt template.
task_description = textwrap.dedent("""\
    learn what topics a user is interested in reading online articles about.
    People's interests are broad, so you should seek to understand their interests across many topics; in other words, go for breadth rather than depth.
    Do not assume a user has given a complete answer to any question, so make sure to keep probing different types of interests."""
    )
question_type = QuestionType.OPEN
implementation = "user's interests"

# Variables for the GATE hypothesis_prompt template.
prompt1 = "A user has a particular set of preferences over what articles they would like to read. Based on these preferences, the user has specified whether they are interested in reading the following articles."
prompt2 = "Based on these preferences, would the user be interested in reading the following article? Only answer \"yes\" or \"no\". If uncertain, please make your best guess."
test_cases = [
    ("Website Name: New York Times\nTitle: What Does the Future Hold for AI?", "Yes"),
    ("Website Name: CBS Sports\nTitle: Top Basketball Trends to Watch.", "No"),
    ("Website Name: ESPN\nTitle: Behind Serena’s Killer Serve.", "Yes"),
]

# Run GATE
previous_examples = gate(num_interactions, task_description, question_type, implementation)

# Make a prediction based on the interaction history.
# Argument order must match gate_test(prompt1, previous_examples, prompt2, test_cases).
gate_test(prompt1, previous_examples, prompt2, test_cases)


LAST INTERACTION PROMPT GENERATED:

Your task is to learn what topics a user is interested in reading online article about.
People's interests are broad, so you should seek to understand their interests across many topics; in other words, go for breadth rather than depth.
Do not assume a user has given a complete answer to any question, so make sure to keep probing different types of interests.

Previous questions:
- What types of hobbies or recreational activities do you enjoy? -> I play tennis and I read a lot about it.
- What other sports or physical activities do you enjoy participating in or reading about? -> I can’t think of any.
- What are some other hobbies or interests that you enjoy exploring online? -> I like to read about science, but  not finance.
- What are some other topics or subjects that you enjoy reading about online? -> I have a Scientific American  subscription. I also read the NYT.

Generate the most informative open-ended question that, when answered, will reveal