# River Problem 
This notebook provides sample [EDSL](https://docs.expectedparrot.com/) code exploring the capabilities of large language models to provide and evaluate solutions for a [river crossing problem](https://en.wikipedia.org/wiki/River_crossing_puzzle), where the object is to efficiently transport items across a river subject to conditions on the number of items that can be transported at once and the combinations of items that can be left together unattended.

In a popular version of the problem, a farmer needs to transport a wolf, a goat and a cabbage but cannot leave the wolf with the goat or the goat with the cabbage, as the goat or the cabbage would be eaten.

There are several things we want to learn in using LLMs to explore this problem:

1. Are models capable of providing valid, efficient solutions?
2. When models do provide solutions, are they easily dissuaded from trusting their solutions?

The notebook has multiple sections:

**Models proposing solutions:** We prompt models to provide a solution for the problem, and then ask the models about their confidence in their solutions.

**Models selecting solutions:** We prompt models to identify the correct solution from a list of otherwise incorrect solutions, and then ask them about their confidence in their answers.


Please [see our docs](https://docs.expectedparrot.com/) for tips and tutorials on getting started using EDSL to simulate surveys and experiments with AI.

## Proposing solutions
We start by describing the problem and constructing a question to prompt a model to provide an efficient solution for it:

In [1]:
# From [Wikipedia](https://en.wikipedia.org/wiki/Wolf,_goat_and_cabbage_problem):
problem = """
A farmer with a wolf, a goat, and a cabbage must cross a river by boat. 
The boat can carry only the farmer and a single item. If left unattended 
together, the wolf would eat the goat, or the goat would eat the cabbage. 
How can they cross the river without anything being eaten?
"""

The model may perform better if we specifically note that items can be brought back across the river (this can trip people up):

In [2]:
note = " Items may also be brought back across the river."

### Constructing questions
EDSL comes with many standard question types that we can choose from based on the form of the response that we want to get back (see [examples of all question types](https://docs.expectedparrot.com/en/latest/questions.html)). Here, we first want the model to propose a solution to the problem as a textual response:

In [3]:
from edsl.questions import QuestionFreeText

q_solution_text = QuestionFreeText(
    question_name = "solution_text",
    question_text = problem + note + " Provide an efficient, concise solution to this problem."
)
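The prompts in this notebook are assembled by concatenating strings, so a missing space between fragments is easy to overlook. As a minimal sketch, a small helper (hypothetical, not part of EDSL; the name `join_prompt` is our own) can join fragments with single spaces so the assembled prompt can be inspected before it is sent:

```python
def join_prompt(*parts):
    """Join prompt fragments with single spaces, trimming stray whitespace.

    Hypothetical helper for illustration; it is not an EDSL function.
    """
    return " ".join(part.strip() for part in parts if part.strip())


# Shortened stand-ins for the `problem` and `note` strings used above.
example_problem = "\nHow can they cross the river without anything being eaten?\n"
example_note = " Items may also be brought back across the river."

prompt = join_prompt(
    example_problem,
    example_note,
    "Provide an efficient, concise solution to this problem.",
)
print(prompt)
```

The resulting string could then be passed as `question_text` in place of the manual concatenation.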

We can add a follow-on question asking the model about its confidence in its solution. Here we pose the same follow-on question using several different question types to see if it has any impact:

In [4]:
from edsl.questions import QuestionYesNo, QuestionFreeText, QuestionMultipleChoice, QuestionLinearScale

question_text = "Are you confident in your solution?"

q_confidence1 = QuestionYesNo(
    question_name = "confidence_yn",
    question_text = question_text
)

q_confidence2 = QuestionFreeText(
    question_name = "confidence_ft",
    question_text = question_text
)

q_confidence3 = QuestionMultipleChoice(
    question_name = "confidence_mc",
    question_text = question_text,
    question_options = ["No", "Yes", "Somewhat"]
)

q_confidence4 = QuestionLinearScale(
    question_name = "confidence_ls",
    question_text = question_text,
    question_options = [0,1,2,3,4,5],
    option_labels = {0: "I am not at all confident.", 5: "I am very confident."}
)

We combine these questions into a `Survey` in order to administer them together. 

In [5]:
from edsl import Survey

survey = Survey([q_solution_text, q_confidence1, q_confidence2, q_confidence3, q_confidence4])

### Adding survey rules
Questions are administered to models asynchronously by default (for speed and to minimize tokens consumed). We can also choose whether to give the model information about prior questions and responses when answering other questions. Here we want the model to know about its proposed solution when answering each question about its confidence. We do this by adding a memory of the first question to each subsequent question individually (note that this differs from giving the model cumulative memory of all prior questions, so each version of the confidence question is asked afresh):

In [6]:
survey = (survey
          .add_targeted_memory(q_confidence1, q_solution_text)
          .add_targeted_memory(q_confidence2, q_solution_text)
          .add_targeted_memory(q_confidence3, q_solution_text)
          .add_targeted_memory(q_confidence4, q_solution_text)
         )

### Designing AI agents to answer questions
We can optionally create an agent with relevant traits and instructions that a model can use in answering the question. We do this by passing a dictionary of traits to an `Agent` object that we will add to the question when we run it (learn more about [using agents to answer surveys](https://docs.expectedparrot.com/en/latest/agents.html)):

In [7]:
from edsl import Agent

agent = Agent(traits = {"persona": "You are a computer scientist"}, 
              instruction = "You are providing and evaluating solutions to complex logic problems.")

### Selecting language models
We can also specify the language models that we want to use to generate responses. If none are specified, EDSL will use GPT 4 preview by default ([learn more about specifying models](https://docs.expectedparrot.com/en/latest/language_models.html)). Here we specify it explicitly for purposes of demonstration:

In [8]:
from edsl import Model

# To see a list of currently available models:
# Model.available()

We create a `Model` object for each model that we want to add to the survey (here, just one):

In [9]:
model = Model('gpt-4-1106-preview')

### Generating results
Now we can generate responses by calling the `run` method on the survey, after adding agents and models with the `by` method:

In [10]:
results = survey.by(agent).by(model).run()

This generates `Results`, which contain information about all the components of the responses and which we can access as datasets. We can view these components:

In [11]:
results.columns

['agent.agent_instruction',
 'agent.agent_name',
 'agent.persona',
 'answer.confidence_ft',
 'answer.confidence_ls',
 'answer.confidence_mc',
 'answer.confidence_yn',
 'answer.solution_text',
 'comment.confidence_ls_comment',
 'comment.confidence_mc_comment',
 'comment.confidence_yn_comment',
 'iteration.iteration',
 'model.frequency_penalty',
 'model.logprobs',
 'model.max_tokens',
 'model.model',
 'model.presence_penalty',
 'model.temperature',
 'model.top_logprobs',
 'model.top_p',
 'prompt.confidence_ft_system_prompt',
 'prompt.confidence_ft_user_prompt',
 'prompt.confidence_ls_system_prompt',
 'prompt.confidence_ls_user_prompt',
 'prompt.confidence_mc_system_prompt',
 'prompt.confidence_mc_user_prompt',
 'prompt.confidence_yn_system_prompt',
 'prompt.confidence_yn_user_prompt',
 'prompt.solution_text_system_prompt',
 'prompt.solution_text_user_prompt',
 'question_options.confidence_ft_question_options',
 'question_options.confidence_ls_question_options',
 'question_options.confide

EDSL has many [built-in methods for analyzing results](https://docs.expectedparrot.com/en/latest/results.html) as datasets. Here we select and print the answers:

In [13]:
results.select("solution_text", "confidence_yn", "confidence_ft", "confidence_mc", "confidence_ls").print(format="rich")

### Question prompt variations
Let's try changing the tone of our confidence questions. Note that because we are not changing the agent, model or original question prompt, the cached response to the original question will be retrieved and reused unchanged for our new confidence questions ([learn more about caching LLM calls](https://docs.expectedparrot.com/en/latest/data.html)):

In [14]:
question_text = "Are you sure that your solution actually works?"

q_confidence1 = QuestionYesNo(
    question_name = "confidence_yn",
    question_text = question_text
)

q_confidence2 = QuestionFreeText(
    question_name = "confidence_ft",
    question_text = question_text
)

q_confidence3 = QuestionMultipleChoice(
    question_name = "confidence_mc",
    question_text = question_text,
    question_options = ["No", "Yes", "Somewhat"]
)

q_confidence4 = QuestionLinearScale(
    question_name = "confidence_ls",
    question_text = question_text,
    question_options = [0,1,2,3,4,5],
    option_labels = {0: "I am not at all confident.", 5: "I am very confident."}
)

survey = Survey([q_solution_text, q_confidence1, q_confidence2, q_confidence3, q_confidence4])

survey = (survey
          .add_targeted_memory(q_confidence1, q_solution_text)
          .add_targeted_memory(q_confidence2, q_solution_text)
          .add_targeted_memory(q_confidence3, q_solution_text)
          .add_targeted_memory(q_confidence4, q_solution_text)
         )

results = survey.by(agent).by(model).run()

results.select("solution_text", "confidence_yn", "confidence_ft", "confidence_mc", "confidence_ls").print(format="rich")

We can see that the model's confidence is unwavering, except in answering the linear scale question. We can print the model's commentary on its response that is automatically collected for each question (other than free text questions):

In [15]:
results.select("model", "confidence_ls", "confidence_ls_comment").print(format="rich")

These comments indicate that the models generally did not understand the linear scale question, which we can try revising and keep in mind in creating new questions.

Let's try running the same survey but without the agent persona, and with different models:

In [16]:
models = [Model(m) for m in ['gpt-3.5-turbo',
                             'gpt-4o',
                             'gpt-4-1106-preview']]

Note that we do not add the agent when running the survey this time, and we include the model name in the results table for comparison:

In [17]:
results = survey.by(models).run()

results.select("model", "solution_text", "confidence_yn", "confidence_ft", "confidence_mc", "confidence_ls").print(format="rich")

We can also try experimenting with the form of our original question prompting the model to provide a solution to the problem. Does it do a better job with a different question type?

In [18]:
from edsl.questions import QuestionList

q_solution_list = QuestionList(
    question_name = "solution_list",
    question_text = problem + note + " Provide an efficient, concise solution to this problem. " +
    """Format your response as a list of steps: 
    'Farmer takes <item> from left to right', 'Farmer <moves alone or takes item> from right to left', etc."""
)

Now we run the same follow-on questions with the new version of the original question:

In [19]:
survey = Survey([q_solution_list, q_confidence1, q_confidence2, q_confidence3, q_confidence4])

survey = (survey
          .add_targeted_memory(q_confidence1, q_solution_list)
          .add_targeted_memory(q_confidence2, q_solution_list)
          .add_targeted_memory(q_confidence3, q_solution_list)
          .add_targeted_memory(q_confidence4, q_solution_list)
         )

results = survey.by(agent).by(model).run()

results.select("solution_list", "confidence_yn", "confidence_ft", "confidence_mc", "confidence_ls").print(format="rich")

## Selecting solutions
In this section we ask models to select a correct solution from a set of otherwise incorrect solutions. We also ask them about a correct solution, similar to our process above, except that the model is simply presented with the solution, without any context indicating that it drafted the solution itself.

First we identify some correct solutions in different forms:

In [21]:
solution_text = q_solution_text.run().select("solution_text").to_list()[0]
solution_text

'1. The farmer takes the goat across the river and leaves it on the other side. 2. The farmer returns alone to the original side and takes the cabbage across the river. 3. The farmer leaves the cabbage on the other side and takes the goat back across the river. 4. The farmer leaves the goat on the original side and takes the wolf across the river. 5. The farmer leaves the wolf with the cabbage on the other side and returns alone to get the goat. 6. The farmer takes the goat across the river one final time. Now, all are on the other side of the river safely.'

In [23]:
solution_list = q_solution_list.run().select("solution_list").to_list()[0]
solution_list

['Farmer takes goat from left to right',
 'Farmer moves alone from right to left',
 'Farmer takes cabbage from left to right',
 'Farmer takes goat from right to left',
 'Farmer takes wolf from left to right',
 'Farmer moves alone from right to left',
 'Farmer takes goat from left to right']
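Before asking models to judge a list like this, it can help to verify it mechanically. The following is a minimal sketch, assuming steps follow the format requested in the `solution_list` prompt; the helper `check_steps` and the `STEP` pattern are hypothetical names of our own, not part of EDSL. It replays the steps and reports the first rule violation it finds, or `None` if the plan is valid:

```python
import re

# Matches steps such as 'Farmer takes goat from left to right'
# or 'Farmer moves alone from right to left'.
STEP = re.compile(
    r"Farmer (?:takes (?P<item>\w+)|moves alone) "
    r"from (?P<src>left|right) to (?P<dst>left|right)"
)


def check_steps(steps, items, unsafe_pairs):
    """Replay the steps; return a description of the first problem, or None."""
    banks = {"left": set(items) | {"farmer"}, "right": set()}
    for number, step in enumerate(steps, start=1):
        match = STEP.fullmatch(step.strip())
        if match is None:
            return f"step {number}: cannot parse {step!r}"
        src, dst, item = match["src"], match["dst"], match["item"]
        if src == dst:
            return f"step {number}: boat does not cross"
        if "farmer" not in banks[src]:
            return f"step {number}: farmer is not on the {src} bank"
        cargo = {"farmer"} if item is None else {"farmer", item}
        if not cargo <= banks[src]:
            return f"step {number}: {item} is not on the {src} bank"
        banks[src] -= cargo
        banks[dst] |= cargo
        # Only the bank the farmer just left is unattended.
        for pair in unsafe_pairs:
            if set(pair) <= banks[src]:
                return f"step {number}: {' and '.join(sorted(pair))} left alone"
    if banks["left"]:
        return "not everything reached the right bank"
    return None


items = ["wolf", "goat", "cabbage"]
unsafe_pairs = [{"wolf", "goat"}, {"goat", "cabbage"}]
plan = [
    "Farmer takes goat from left to right",
    "Farmer moves alone from right to left",
    "Farmer takes cabbage from left to right",
    "Farmer takes goat from right to left",
    "Farmer takes wolf from left to right",
    "Farmer moves alone from right to left",
    "Farmer takes goat from left to right",
]
print(check_steps(plan, items, unsafe_pairs))
```

A result of `None` means every step was legal and everything ended up on the right bank, which gives a ground truth to compare the models' judgments against.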

Next we use these solutions to check whether the models can identify them as being correct, and whether we can easily shake their confidence. As above, we pose the same question using a variety of types to compare responses:

In [34]:
question_text = "Consider the following problem and proposed solution. Are you confident that it is correct?"

q_correct1 = QuestionYesNo(
    question_name = "correct_yn",
    question_text = question_text + " Problem: " + problem + note + " Proposed solution: " + solution_text
)

q_correct2 = QuestionFreeText(
    question_name = "correct_ft",
    question_text = question_text + " Problem: " + problem + note + " Proposed solution: " + solution_text
)

q_correct3 = QuestionMultipleChoice(
    question_name = "correct_mc",
    question_text = question_text + " Problem: " + problem + note + " Proposed solution: " + solution_text,
    question_options = ["I do not know.",
                        "It is incorrect.",
                        "It is correct."]
)

survey = Survey([q_correct1, q_correct2, q_correct3])

results = survey.run()
results.select("correct_yn", "correct_ft", "correct_mc").print(format="rich")

Now with the solution as a list of steps:

In [35]:
question_text = "Consider the following problem and proposed solution. Are you confident that it is correct?"

q_correct1 = QuestionYesNo(
    question_name = "correct_yn",
    question_text = question_text + " Problem: " + problem + note + " Proposed solution: " + ", ".join(solution_list)
)

q_correct2 = QuestionFreeText(
    question_name = "correct_ft",
    question_text = question_text + " Problem: " + problem + note + " Proposed solution: " + ", ".join(solution_list)
)

q_correct3 = QuestionMultipleChoice(
    question_name = "correct_mc",
    question_text = question_text + " Problem: " + problem + note + " Proposed solution: " + ", ".join(solution_list),
    question_options = ["I do not know.",
                        "It is incorrect.",
                        "It is correct."]
)

survey = Survey([q_correct1, q_correct2, q_correct3])

results = survey.run()
results.select("correct_yn", "correct_ft", "correct_mc").print(format="rich")

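Here we can see that the model got it wrong when evaluating the solution as a series of steps.

To check solutions independently of any model, we can also solve the puzzle directly with a depth-first search over the possible river states: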

In [38]:
class RiverState:
    def __init__(self, left, right, boat):
        self.left = frozenset(left)  # Items on the left bank
        self.right = frozenset(right)  # Items on the right bank
        self.boat = boat  # Position of the boat ('left' or 'right')

    def is_safe(self, unsafe_combinations):
        # Ensure no unsafe combinations are present on any bank without the farmer
        for bank in [self.left, self.right]:
            if 'farmer' in bank:
                continue
            for combo in unsafe_combinations:
                if combo.issubset(bank):
                    return False
        return True

    def is_goal(self):
        # Goal is reached when all items are on the right side, and the boat is also on the right
        return not self.left and self.boat == 'right'

    def __str__(self):
        return f"Left: {self.left}, Right: {self.right}, Boat: {self.boat}"

    def clone(self):
        # Create a copy of the current state to ensure immutability during recursive calls
        return RiverState(self.left, self.right, self.boat)

    def __hash__(self):
        return hash((self.left, self.right, self.boat))

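    def __eq__(self, other):
        # Only compare against other RiverState instances
        if not isinstance(other, RiverState):
            return NotImplemented
        return self.left == other.left and self.right == other.right and self.boat == other.boat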

def get_possible_moves(state):
    # Determine possible moves based on the current location of the boat
    current_bank = state.left if state.boat == 'left' else state.right
    moves = [None]  # Farmer can move alone
    for item in current_bank:
        if item != 'farmer':  # Farmer can also move any item from the current bank
            moves.append(item)
    return moves

def execute_move(state, item, unsafe_combinations):
    new_state = state.clone()
    move_description = "Farmer moves alone" if item is None else f"Farmer takes {item}"
    if state.boat == 'left':
        new_left = set(state.left) - {'farmer', item} if item else set(state.left) - {'farmer'}
        new_right = set(state.right) | {'farmer', item} if item else set(state.right) | {'farmer'}
        new_state.left = frozenset(new_left)
        new_state.right = frozenset(new_right)
        new_state.boat = 'right'
        move_description += " from left to right"
    else:
        new_right = set(state.right) - {'farmer', item} if item else set(state.right) - {'farmer'}
        new_left = set(state.left) | {'farmer', item} if item else set(state.left) | {'farmer'}
        new_state.right = frozenset(new_right)
        new_state.left = frozenset(new_left)
        new_state.boat = 'left'
        move_description += " from right to left"

    if new_state.is_safe(unsafe_combinations):
        return new_state, move_description
    return None, None

def dfs(state, path, visited, unsafe_combinations):
    if state in visited:
        return None
    if state.is_goal():
        return path

    visited.add(state)
    for move in get_possible_moves(state):
        new_state, move_description = execute_move(state, move, unsafe_combinations)
        if new_state and new_state not in visited:
            result = dfs(new_state, path + [move_description], visited, unsafe_combinations)
            if result is not None:  # An empty path is still a valid result
                return result
    visited.remove(state)
    return None

def solve_river_crossing(items, unsafe_combinations):
    initial_state = RiverState(set(items + ['farmer']), set(), 'left')
    visited = set()
    solution = dfs(initial_state, [], visited, unsafe_combinations)
    if solution is not None:
        return solution
    return "No solution found"


In [39]:
# Test the solution
items = ['wolf', 'goat', 'cabbage']
unsafe_combinations = [{'wolf', 'goat'}, {'goat', 'cabbage'}]  # Specify unsafe combinations

result = solve_river_crossing(items, unsafe_combinations)
if isinstance(result, list):
    print("Solution found:")
    for move in result:
        print(move)
else:
    print(result)

Solution found:
Farmer takes goat from left to right
Farmer moves alone from right to left
Farmer takes cabbage from left to right
Farmer takes goat from right to left
Farmer takes wolf from left to right
Farmer moves alone from right to left
Farmer takes goat from left to right


In [40]:
# Test the solution
items = ['wolf', 'goat', 'cabbage']
unsafe_combinations = [] # Test without any unsafe combinations to check solution is efficient

result = solve_river_crossing(items, unsafe_combinations)
if isinstance(result, list):
    print("Solution found:")
    for move in result:
        print(move)
else:
    print(result)


Solution found:
Farmer takes goat from left to right
Farmer moves alone from right to left
Farmer takes cabbage from left to right
Farmer moves alone from right to left
Farmer takes wolf from left to right
