In [8]:
import json
from together import Together
client = Together()

In [88]:
TEMPERATURE = 0.2
MODEL="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"

DEFINITIONS = """
I am going to define some concepts/procedures, and then ask you to perform some of these procedures.

CONVENTIONS:
- Trees/DAGs are oriented from the leaves toward the root.
END CONVENTIONS

DEFINITION 1 (decomposing a question into a tree of questions).
Given a question, we can decompose it into a number of sub-questions that need to be answered before answering the full question.

For example, suppose I have the question:

Q: Who is older, Michael Jordan or Kyle Richardson?

I want you to break this into a tree of questions:

Q1: How old is Michael Jordan? [A1]
Q2: How old is Kyle Richardson? [A2]
Q3: Which is bigger, A1 or A2?

Note that there is an underlying DAG structure: Q1, Q2, and Q3 are the nodes, and there are arrows Q1->Q3 and Q2->Q3.
END DEFINITION 1

Given a tree of questions, we can "fuse" them into a single question. This is essentially the inverse operation of the decomposition process in definition 1.

DEFINITION 2 (fusion of a tree of questions). If T is a tree of questions, we can fuse them into a single question.

For example, say that our tree of questions is:

Q1: What tribes are considered uncontacted? [A1]
Q2: What are the populations of the tribes [A1]? [A2]
Q3: Which of the numbers [A2] is the largest?

Then the fusion would be:

Q: What is the population of the largest uncontacted tribe?

Note that there could be several possible legitimate fusions.
END DEFINITION 2

Given a tree of questions (or just ToQ), we can talk about "partial collapses" thereof.

DEFINITION 3 (partial collapses of a ToQ). Fix a ToQ and choose, for every arrow in the DAG, whether or not to collapse that arrow. This choice divides the ToQ into sub-ToQs: within each sub-ToQ every arrow is collapsed, and the arrows running between different sub-ToQs are not.

For instance, in the ToQ example from Definition 1, suppose that we decide to collapse the Q1->Q3 arrow but not the Q2->Q3 arrow. Our ToQ is then divided into two sub-ToQs, where one has the Q1, Q3 nodes, and the other has the Q2 node.

Then, we form the fusion (as in definition 2) of each of these sub-ToQs. The final result -- the so-called "partial collapse" associated to the original ToQ and the choice of which arrows to collapse -- will again be a ToQ (a smaller one).

For instance, in the example from earlier in this definition, the partial collapse would be as follows:

Q1': How old is Kyle Richardson? [A1']
Q2': Which is bigger, Michael Jordan's age or [A1']?

For completeness, here is the list of all possible partial collapses, in the example above:

-(no arrows collapsed)
Q1: How old is Michael Jordan? [A1]
Q2: How old is Kyle Richardson? [A2]
Q3: Which is bigger, A1 or A2?

-(only the Q1->Q3 arrow collapsed)
Q1': How old is Kyle Richardson? [A1']
Q2': Which is bigger, Michael Jordan's age or [A1']?

-(only the Q2->Q3 arrow collapsed)
Q1': How old is Michael Jordan? [A1']
Q2': Which is bigger, [A1'] or Kyle Richardson's age?

-(both arrows collapsed)
Q1': Who is older, Michael Jordan or Kyle Richardson?
END DEFINITION 3

Given a ToQ, we can "follow the ToQ".

DEFINITION 4 (producing an answer via a ToQ). Suppose we are given a ToQ. We can answer the questions, starting at the leaves and moving toward the root. When we answer all the questions that feed into another question, we can fill in the blanks in that next question, and then answer it.

For instance, take the following ToQ:

Q1: How old is Michael Jordan? [A1]
Q2: How old is Kyle Richardson? [A2]
Q3: Which is bigger, A1 or A2?

We first answer Q1 and Q2 (the leaves). Suppose that A1 = 62 and A2 = 35. We then substitute these into Q3, which becomes "Which is bigger, 62 or 35?". The final answer is 62.
END DEFINITION 4
"""

GET_SUBQS_INSTRUCTIONS = """
INSTRUCTIONS:
- When I input a question, I want you to decompose it into a tree of questions as in Definition 1.
- There should be exactly one question whose answer is not used by any other question. This is the "root" question, and it should appear last in the list.
- Any given answer (e.g. A1) should be used by at most one other question.
- Please put your response into a JSON with a field "Questions" as below:
{
   "Questions" : ["Q1: How old is Michael Jordan? [A1]" , "Q2: How old is Kyle Richardson? [A2]", ...]
}
- There should be at most 7 total questions in the tree of questions you produce.
- Try to avoid questions that involve long lists of answers, e.g. questions like 'Given A1, A2, A3, ..., answer the question [...]'.
- For any given question, once its blanks are filled in, it should be answerable strictly on its own. No additional context/info should be required in order for it to be a fully-formed and answerable question.
"""

MAKE_COLLAPSES_INSTRUCTIONS = """
INSTRUCTIONS:
- When I input a tree of questions, I want you to produce all possible partial collapses, as in Definition 3.
- VERY IMPORTANT: Please put your response into a JSON with a field "Collapses". This response should be directly parsable via json.loads. If I run json.loads(...)["Collapses"] on your response, I shouldn't get an error. DO NOT have any text that precedes or follows the JSON.
- Make sure that you're producing every possible partial collapse of the given tree. If there are n questions in this tree, there should be 2^{n-1} partial collapses. (The reason is that a tree with n nodes has n-1 edges, and for every edge we have the choice of whether or not to collapse it.)
"""

ANSWER_VIA_TREE_INSTRUCTIONS = """
INSTRUCTIONS:
- When I input a tree of questions, I want you to produce a single answer by the method described in Definition 4.
- ONLY produce the final answer. Do not explain your reasoning.
"""



def call_llm(system_prompt, query, num_retries=2) -> dict:
    """Query the model and parse its reply as JSON; return {} if every attempt fails."""
    # Last raw reply, kept so it can be printed even if the API call itself raised.
    text = None

    for attempt in range(num_retries):
        try:
            response = client.chat.completions.create(
                model=MODEL,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": query},
                ],
                temperature=TEMPERATURE,
            )

            # Chat Completions API: the reply text is in the first choice's message.
            text = response.choices[0].message.content

            # Raises if the reply is not valid JSON, which triggers a retry.
            return json.loads(text)

        except Exception as e:
            print(f"[Attempt {attempt + 1}/{num_retries}] ERROR: {e}")

    print("All retries failed. Returning empty dict.")
    print(text)
    return {}

def get_subquestions(question: str):
    llm_output = call_llm(DEFINITIONS + GET_SUBQS_INSTRUCTIONS, question)
    return llm_output["Questions"]

def get_collapses(subquestions: list):
    ### direct version
    squestions = '\n'.join(subquestions)
    llm_output = call_llm(DEFINITIONS + MAKE_COLLAPSES_INSTRUCTIONS, squestions)
    return llm_output["Collapses"]
   
def answer_question(toq):
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {'role': 'system', 'content': DEFINITIONS + ANSWER_VIA_TREE_INSTRUCTIONS},
            {'role': 'user', 'content': toq},
        ],
        temperature=TEMPERATURE,
    )
    return response.choices[0].message.content
   
def consistency_check(question):
    subquestions: list = get_subquestions(question)
    collapses: list = get_collapses(subquestions)
    collapsed_answers = [answer_question(str(collapse)) for collapse in collapses]

    return collapsed_answers
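
Definition 3 describes partial collapses as a choice, per arrow, of whether to collapse it. The sketch below enumerates those choices without calling the model, to show how the sub-ToQs arise as groups of merged nodes. It is an illustration under assumptions, not part of the notebook's pipeline: `enumerate_collapses`, `demo_nodes`, and `demo_edges` are hypothetical names, and only the grouping is computed (writing the fused question for each group is still left to the model).

```python
from itertools import combinations

def enumerate_collapses(nodes, edges):
    """Return a list of (collapsed_edges, groups), one entry per subset of edges.

    groups is a list of sets: the sub-ToQs obtained by merging the two
    endpoints of every collapsed edge.
    """
    results = []
    for k in range(len(edges) + 1):
        for chosen in combinations(edges, k):
            # Tiny union-find: every node starts in its own group.
            parent = {n: n for n in nodes}

            def find(n):
                while parent[n] != n:
                    n = parent[n]
                return n

            # Merge the endpoints of each collapsed edge.
            for a, b in chosen:
                parent[find(a)] = find(b)

            groups = {}
            for n in nodes:
                groups.setdefault(find(n), set()).add(n)
            results.append((set(chosen), list(groups.values())))
    return results

# The Michael Jordan / Kyle Richardson tree from Definition 1.
demo_nodes = ["Q1", "Q2", "Q3"]
demo_edges = [("Q1", "Q3"), ("Q2", "Q3")]
demo_collapses = enumerate_collapses(demo_nodes, demo_edges)

for chosen, groups in demo_collapses:
    print(sorted(chosen), sorted(sorted(g) for g in groups))
```

For this two-edge tree the loop prints four lines, one per partial collapse listed in Definition 3; collapsing only Q1->Q3 leaves the groups {Q1, Q3} and {Q2}.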

In [66]:
subqs = get_subquestions("What is the best breed of dog for an older adult?")
print(subqs)

['Q1: What are the characteristics of a dog breed that make it suitable for an older adult? [A1]', 'Q2: What are the most common dog breeds that have the characteristics [A1]? [A2]', 'Q3: Which of the dog breeds [A2] is generally considered the best for an older adult? [A3]']
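
The returned strings encode the DAG only implicitly, through the `[A1]`, `[A2]` placeholders. A minimal sketch of recovering the arrows from them might look like the following. It assumes that question `Qk` owns the label `[Ak]` and that every other bracketed `[Aj]` in its text is a dependency; `parse_edges` and `example_subqs` are hypothetical names, and unbracketed references such as the "A1 or A2" in Definition 1 would not be detected.

```python
import re

def parse_edges(questions):
    """Return arrows (Qj, Qk) for every question Qk that mentions [Aj], j != k."""
    edges = []
    for q in questions:
        header = re.match(r"\s*Q(\d+)'?:", q)
        if not header:
            continue  # not in the expected "Qk: ..." form
        k = int(header.group(1))
        for ref in re.findall(r"\[A(\d+)'?\]", q):
            j = int(ref)
            edge = (f"Q{j}", f"Q{k}")
            # Skip the question's own answer label and repeated mentions.
            if j != k and edge not in edges:
                edges.append(edge)
    return edges

# The sub-questions printed above.
example_subqs = [
    'Q1: What are the characteristics of a dog breed that make it suitable for an older adult? [A1]',
    'Q2: What are the most common dog breeds that have the characteristics [A1]? [A2]',
    'Q3: Which of the dog breeds [A2] is generally considered the best for an older adult? [A3]',
]
print(parse_edges(example_subqs))  # [('Q1', 'Q2'), ('Q2', 'Q3')]
```

Here the tree is a chain, Q1->Q2->Q3, so it has two arrows.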


In [77]:
collapses = get_collapses(subqs)
print(collapses)

[{'Q1': 'What are the characteristics of a dog breed that make it suitable for an older adult? [A1]', 'Q2': 'What are the most common dog breeds that have the characteristics [A1]? [A2]', 'Q3': 'Which of the dog breeds [A2] is generally considered the best for an older adult? [A3]'}, {'Q1': 'What are the characteristics of a dog breed that make it suitable for an older adult? [A1]', "Q2'": 'Which of the dog breeds that have the characteristics [A1] is generally considered the best for an older adult? [A2]'}, {"Q1'": 'What are the most common dog breeds that have the characteristics that make them suitable for an older adult? [A1]', 'Q2': 'Which of the dog breeds [A1] is generally considered the best for an older adult? [A2]'}, {"Q1'": 'Which dog breed is generally considered the best for an older adult? [A1]'}]
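
MAKE_COLLAPSES_INSTRUCTIONS states that a tree of n questions should yield 2^{n-1} partial collapses. That count can be checked on the model's output with a few lines; the helper names below are hypothetical and the check only counts entries, it does not verify that each collapse is a correct fusion.

```python
def expected_num_collapses(num_questions):
    """Number of partial collapses of a tree with num_questions nodes (n-1 edges)."""
    return 2 ** (num_questions - 1)

def has_expected_count(subquestions, collapses):
    """True if the model returned as many collapses as the 2^(n-1) formula predicts."""
    return len(collapses) == expected_num_collapses(len(subquestions))

print(expected_num_collapses(3))  # 4
print(expected_num_collapses(7))  # 64
```

With the three sub-questions above, the formula gives 4, which matches the four entries in the printed list. At the 7-question cap set in GET_SUBQS_INSTRUCTIONS, the same formula asks for 64 collapses in a single reply.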


In [87]:
answer_question(str(collapses[0]))

'The Cavalier King Charles Spaniel'
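
`consistency_check` returns one such answer per partial collapse, so the natural next step is to see how often they agree. The following is a rough sketch of one way to summarize that, not something the notebook does: `normalize` and `agreement` are hypothetical helpers, the normalization is deliberately crude (case, a leading "the", a trailing period), and the answers in the demo list other than the first are made-up placeholders.

```python
from collections import Counter

def normalize(answer):
    """Crude normalization so trivially different phrasings compare equal."""
    text = answer.strip().lower().rstrip(".")
    if text.startswith("the "):
        text = text[4:]
    return text

def agreement(answers):
    """Return (most common normalized answer, fraction of answers that match it)."""
    if not answers:
        return None, 0.0
    counts = Counter(normalize(a) for a in answers)
    top, n = counts.most_common(1)[0]
    return top, n / len(answers)

# First entry is the answer shown above; the others are illustrative only.
demo_answers = [
    "The Cavalier King Charles Spaniel",
    "Cavalier King Charles Spaniel.",
    "Poodle",
]
top_answer, share = agreement(demo_answers)
print(top_answer, share)
```

Exact string matching after normalization will still treat paraphrases as disagreements, so a low score here is a prompt to inspect the answers rather than a verdict.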

In [91]:
consistency_check('What is the best breed of dog for an older adult?')

[Attempt 1/2] ERROR: Unterminated string starting at: line 98 column 20 (char 8331)
[Attempt 2/2] ERROR: Expecting value: line 103 column 13 (char 7228)
All retries failed. Returning empty dict.
{
  "Collapses": [
    {
      "Q1": "What are the characteristics of a dog breed that are suitable for older adults? [A1]",
      "Q2": "What are the most common dog breeds that have the characteristics [A1]? [A2]",
      "Q3": "Which of the dog breeds [A2] requires the least amount of exercise? [A3]",
      "Q4": "Which of the dog breeds [A2] is the most gentle? [A4]",
      "Q5": "Which of the dog breeds [A2] is the smallest in size? [A5]",
      "Q6": "Which of the dog breeds [A2] has the lowest grooming needs? [A6]",
      "Q7": "Which dog breed among [A3], [A4], [A5], and [A6] is the best overall fit for an older adult?"
    },
    {
      "Q1": "What are the characteristics of a dog breed that are suitable for older adults? [A1]",
      "Q2": "What are the most common dog breeds that hav

KeyError: 'Collapses'
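
The `KeyError` follows from the fallback path: `call_llm` returns `{}` once both attempts fail to parse, and `get_collapses` then indexes `["Collapses"]` on that empty dict. The printed reply stops mid-string, so the JSON was incomplete when it was parsed. A caller could detect this case explicitly instead of raising; the sketch below shows one possible shape, where `parse_field` and the `truncated` sample are hypothetical and the sample is shortened from the reply printed above.

```python
import json

def parse_field(text, field):
    """Return parsed[field] if text is a valid JSON object containing field, else None."""
    try:
        parsed = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        # Malformed or truncated JSON, or text is None.
        return None
    if not isinstance(parsed, dict):
        return None
    return parsed.get(field)

# A reply that was cut off mid-string, like the one printed above.
truncated = '{"Collapses": [{"Q1": "What are the characteristics'
print(parse_field(truncated, "Collapses"))            # None
print(parse_field('{"Collapses": []}', "Collapses"))  # []
```

Returning `None` lets the caller tell "no usable reply" apart from a valid but empty list of collapses.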