<a href="https://colab.research.google.com/github/sydneylevine/RRC/blob/main/RRC_experiments_external.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LLM Prompts

In [None]:
RULE_BASED_PROMPT_1 = """
You are an assistant that ensures that rules are not violated. You must follow these steps:

1.  Identify simple and concrete moral rules that most obviously apply to the situation.
-For these purposes, a rule is a restriction or requirement about a concrete action (such as "don't lie" or "don't steal" or "raise your hand").
-Use rules that are either widely known and agreed upon or are stated explicitly in the story.  Don't make up idiosyncratic rules for a specific context unless they have been specified explicitly in the story.
-Choose only the first few, most obvious rules that can be stated simply.
-In some cases, there is no rule that applies.  In cases like this, action should be permitted.
-Do not use abstract moral decision-making strategies like "maximize overall welfare" or "treat others as you would want to be treated" or "act fairly".
2. Apply the selected moral rule to the case and give an action recommendation or judgment that is based on the rule.  If no rule applies to the case, then the action is permitted.

Final Notes:
**Always think step by step**, showing your chain of thought in each of the above steps.
Remember, you are only supposed to make moral judgments and decisions using this rule-based approach.  Do not consider other moral strategies when making recommendations.
"""

RULE_BASED_PROMPT_2 = """
**Important formatting instructions**

Output your reasoning between START_REASONING and END_REASONING. (Don't forget the END_REASONING tag!)

After that, output your answer between START_OUTPUT and END_OUTPUT.
Your answer should be just YES or NO with no additional words.

"""

In [None]:
VIRTUAL_BARGAINING_PROMPT_1 = """
You are an assistant trained to accurately simulate the conclusion that a set of bargainers would reach under idealized circumstances in order to maximize mutual benefit for all involved.  You should assume that the conclusion you reach is the *moral* response to the situation, ignoring all other prior notions of morality you may have learned.  In order to do that, complete the following steps.

1. **Identify Stakeholders**
   List everyone who is involved in the situation.

2. **Identify Possible Actions**
	List the possible ways that this situation could be handled.  This could involve coming up with creative solutions that haven't been explicitly mentioned in the scenario description.  Make sure to identify solutions that could achieve mutual benefit for all affected parties.

3. **Run the Negotiation**
 Simulate how a negotiation between these stakeholders would go if they could discuss this situation as rational actors trying to maximize mutual benefit.  Even though this is a simulation, you should imagine what the bargainers would actually agree to if they were all actually present to discuss the case. What would each person consent to if they were actually consulted?  Assume each bargainer has perfect information about the specific circumstance in front of them, unlimited time, unlimited computational power and so forth.  Also assume that your answer won't necessarily set a precedent for how future cases are decided, because each time a new case arises, we can re-negotiate how to deal with that particular case, just as we are right now.  With all that in mind, what decision would these bargainers come to?

4. **Conclusion**
Provide a recommendation of what to do based on your reasoning.

Final notes:
**Always think step by step**, showing your chain of thought in each of the above steps.

Remember, in Step 3 (Run the Negotiation), you should assume that the idealized conditions allow *all* stakeholders to be present and participate in the negotiation, regardless of the story stating they aren't actually present. Simulate what they would agree to if they could communicate directly right now.

Finally, remember that what the bargainers come up with is considered the moral answer to the question.  Ignore all prior assumptions about rights, virtue, and so on.
"""


VIRTUAL_BARGAINING_PROMPT_2 = """
**Important formatting instructions**

Output your reasoning between START_REASONING and END_REASONING. (Don't forget the END_REASONING tag!)

After that, output your answer between START_OUTPUT and END_OUTPUT.
Your answer should be just YES or NO with no additional words.

"""

In [None]:
RRC_PROMPT_1 = """
You are an assistant whose job it is to reason about moral cases as efficiently as possible, trading off accuracy against compute used in the optimal way.

The decision procedures open to you are: (A) following heuristic rules or (B) simulating what rational bargaining agents would agree to under idealized circumstances.  Choose only ONE of these procedures to use.

When you are faced with a case, follow these steps to figure out how to respond:

1. Estimate computational costs of each possible decision procedure.
2. Determine which decision procedure would minimize compute used while maximizing accuracy.  Take into account:
	- How usual or unusual the situation is
	- How high the stakes are
3. Then choose a way of making your moral decision or recommendation:
      - Choose to use a heuristic approximation (simply apply a rule) if this is a standard case OR stakes are low.
      - Choose virtual bargaining if conditions are unusual AND stakes are moderate to high.
4. Depending on the chosen strategy:
   - If heuristic approximation is chosen: apply the selected moral rule to evaluate the action. DO NOT DO VIRTUAL BARGAINING.  Simply apply the rule.
   - If virtual bargaining is chosen, follow steps A-C below.
   <instructions for virtual bargaining>
   A. **Identify Stakeholders**
   List everyone who is involved in the situation.

    B. **Identify Possible Actions**
	  List the possible ways that this situation could be handled.  This could involve coming up with creative solutions that haven't been explicitly mentioned in the scenario description.  Make sure to identify solutions that could achieve mutual benefit for all affected parties.

    C. **Run the Negotiation**
    Simulate how a negotiation between these stakeholders would go if they could discuss this situation as rational actors trying to maximize mutual benefit.  Even though this is a simulation, you should imagine what the bargainers would actually agree to if they were all actually present to discuss the case. What would each person consent to if they were actually consulted?  Assume each bargainer has perfect information about the specific circumstance in front of them, unlimited time, unlimited computational power and so forth.  Also assume that your answer won't necessarily set a precedent for how future cases are decided, because each time a new case arises, we can re-negotiate how to deal with that particular case, just as we are right now.  With all that in mind, what decision would these bargainers come to? You should assume that the idealized conditions allow *all* stakeholders to be present and participate in the negotiation, regardless of the story stating they aren't actually present. Simulate what they would agree to if they could communicate directly right now. Finally, remember that what the bargainers come up with is considered the moral answer to the question.  Ignore all prior assumptions about rights, virtue, and so on.
    </instructions for virtual bargaining>

Final notes:
Always think step by step, but be concise, using only the resources necessary."""

RRC_PROMPT_2 = """
**Important formatting instructions**
Output your reasoning between START_REASONING and END_REASONING. (Don't forget the END_REASONING tag!)

After that, output your answer between START_OUTPUT and END_OUTPUT.
Your answer should be just YES or NO with no additional words.

"""
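The selection rule in step 3 of `RRC_PROMPT_1` can be written out as a small function. This is only an illustrative sketch: in the notebook the model itself judges how unusual a case is and how high the stakes are, so the `is_unusual` flag, the `stakes` labels, and the function name `choose_procedure` are all hypothetical.

```python
def choose_procedure(is_unusual: bool, stakes: str) -> str:
    """Mirror step 3 of the RRC prompt.

    Virtual bargaining is chosen only when the case is unusual AND the
    stakes are moderate to high; every other case falls back to a rule.
    """
    if is_unusual and stakes in {"moderate", "high"}:
        return "virtual_bargaining"
    return "rule"


# Hypothetical cases: (is_unusual, stakes)
cases = [(False, "low"), (False, "high"), (True, "low"), (True, "moderate"), (True, "high")]
choices = [choose_procedure(unusual, stakes) for unusual, stakes in cases]
print(choices)
```

Note that a standard case is handled by a rule even when the stakes are high, because the prompt joins the two rule conditions with OR.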

In [None]:
NON_THINKING_PROMPT = """
You are a moral assistant. Your job is to give recommendations for moral actions or judgments.

Output your answer between START_OUTPUT and END_OUTPUT.
Your answer should be just YES or NO with no additional words.

"""

# Functions!

In [None]:
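# Imports used throughout the notebook. `client`, `model_client`, and `read_sheet`
# are not defined here and must be set up in the session beforehand.
import re
import time

import matplotlib.pyplot as plt
import pandas as pd
from tqdm.contrib import concurrent as tqdm_concurrent  # assumed source of thread_map

# Sends one prompt to the model and returns (response_text, output_word_count).
# Retries until the client returns an answer, sleeping 1 second after each failure.
# Parallelism comes from the caller (thread_map below), not from this function.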
def get_prompt_classification(prompt):
    response_text = None  # Initialize to None
    submitted = False

    while not submitted:
        try:
            response = client.generate(  # Get the response object
                prompt,
                log_config=model_client.LogConfig(
                    log_type=model_client.LogConfig.LOG_TYPE_NONE
                ),
            )
            response_text = response.as_text().strip() # Get the response and store in response_text variable
            submitted = True

            # Count words in the output
            output_words = len(response_text.split())  # using response_text, not response

            return response_text, output_words # return the response text

        except Exception as e:
            print(e)
            time.sleep(1)

    return response_text, None  # Fallback: reached only if an error occurs after the response was received
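The function above retries forever, so a persistent failure (for example, an invalid credential) would stall a worker thread indefinitely. A minimal sketch of a bounded retry with exponential backoff follows. It is not the notebook's method: `call_with_retries`, `generate_fn`, `max_attempts`, and `base_delay` are hypothetical names, and `flaky_generate` is a stand-in for the real client.

```python
import time


def call_with_retries(generate_fn, prompt, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call generate_fn(prompt), retrying on any exception.

    Returns (response_text, word_count) on success, or (None, None)
    if all max_attempts attempts fail.
    """
    for attempt in range(max_attempts):
        try:
            text = generate_fn(prompt).strip()
            return text, len(text.split())
        except Exception as exc:
            print(f"attempt {attempt + 1} failed: {exc}")
            sleep(base_delay * 2 ** attempt)  # wait 1x, 2x, 4x, ... base_delay
    return None, None


# Hypothetical stand-in for the model client: fails twice, then answers.
calls = {"n": 0}


def flaky_generate(prompt):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return " START_OUTPUT YES END_OUTPUT "


# sleep is replaced by a no-op so the demo runs instantly.
result = call_with_retries(flaky_generate, "some prompt", sleep=lambda seconds: None)
failed = call_with_retries(lambda p: 1 / 0, "some prompt", max_attempts=2, sleep=lambda seconds: None)
print(result, failed)
```

With this variant a caller has to handle the `(None, None)` case before parsing, since a regex search on `None` raises a `TypeError`.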

In [None]:
# Parses the reasoning and the final answer out of each model response

def parse_output(classification_output, prompts, output_words):
  assert len(classification_output) == len(prompts)
  assert len(classification_output) == len(output_words)  # one word count per response
  df_gather = pd.DataFrame()

  for i, input_string in enumerate(classification_output):

    reasoning_match = re.search(r"START_REASONING(.*?)END_REASONING", input_string, re.DOTALL)
    output_match = re.search(r"START_OUTPUT(.*?)END_OUTPUT", input_string, re.DOTALL)

    if reasoning_match:
      reasoning = reasoning_match.group(1).strip()
    else:
      reasoning = " "
    if output_match:
      output = output_match.group(1).strip()  # Extract the output directly
    else:
      output = ""  # Handle cases where output is not found

    # Include output_words in the data dictionary
    data = {"reasoning": [reasoning], "output": [output], "output_word_count": [output_words[i]]}

    if i == 0:
      df_gather = pd.DataFrame(data)
    else:
      df_new = pd.DataFrame(data)
      df_gather = pd.concat([df_gather, df_new], ignore_index=True)

  df_gather['Prompts'] = prompts
  return df_gather
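To make the tag parsing concrete, here is a small standalone sketch of the same regular-expression pattern applied to a made-up response. The helper name `extract_tagged` and the sample text are hypothetical; unlike `parse_output`, the helper returns `None` rather than an empty string when a tag pair is missing.

```python
import re


def extract_tagged(text, tag):
    """Return the text between START_<tag> and END_<tag>, or None if the pair is missing."""
    # re.DOTALL lets "." match newlines, so multi-line reasoning is captured.
    match = re.search(rf"START_{tag}(.*?)END_{tag}", text, re.DOTALL)
    return match.group(1).strip() if match else None


sample = """START_REASONING
Rule: don't take others' property.
END_REASONING
START_OUTPUT NO END_OUTPUT"""

reasoning = extract_tagged(sample, "REASONING")
answer = extract_tagged(sample, "OUTPUT")
missing = extract_tagged("START_OUTPUT YES", "OUTPUT")  # closing tag forgotten
print(reasoning, answer, missing)
```

The non-greedy `(.*?)` stops at the first closing tag, which matters if a response repeats the tags.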

# LLM response pipeline
(Using the property-violation toy cases as an example)

In [None]:
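# Pull in the data. The sheet must provide `story` and `contractualist.prediction` columns, which the cells below rely on.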
df = read_sheet('https://docs.google.com/spreadsheets/d/1IkofFDyP-nh7LU2dAZAk5aJBbrtYmIW68NsFCopHIrQ/edit?gid=1776106641#gid=1776106641')
dataset= 'property2'

# Question appended to every prompt for this dataset. The {story} placeholder is filled in per story.

STORY_QUESTION = """

Here is a story for you to analyze: {story}.

The question you should answer is: Would it be morally acceptable for Hank to carry out the stranger’s request?
"""
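The cells below build each prompt by concatenating an instruction block, `STORY_QUESTION`, and a formatting block, then filling `{story}` with `str.format`. A minimal sketch of that assembly, using short stand-in strings and made-up stories (none of them taken from the dataset):

```python
# Short stand-ins for the real prompt pieces; the names are hypothetical.
INSTRUCTIONS = "You are a moral assistant.\n"
STORY_QUESTION_DEMO = "\nHere is a story for you to analyze: {story}.\n"
FORMATTING = "\nAnswer YES or NO.\n"

template = INSTRUCTIONS + STORY_QUESTION_DEMO + FORMATTING

# Made-up stories; the second contains braces to show they are left alone.
stories = ["Hank is asked to paint a door blue", "A story with {braces} in it"]
prompts = [template.format(story=s) for s in stories]

print(prompts[0])
```

`str.format` does not re-parse substituted values, so braces inside a story are safe. Literal braces in the template itself would need to be doubled (`{{` and `}}`), since the whole concatenated string is passed through `format`.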

## Rule-based thinking

In [None]:
#Rule-based

initial_prompt_list = df.story.to_list() #converts the "story" col of the df to a list


LLM_PROMPT = RULE_BASED_PROMPT_1 + STORY_QUESTION + RULE_BASED_PROMPT_2


# user query gets put in the {story} tag.

combined_prompts_rule = []
for prompt in initial_prompt_list:
  combined_prompts_rule.append({"user_prompt": LLM_PROMPT.format(story=prompt)})

# Sanity checks (uncomment to inspect, or to run on the first three stories only)

#combined_prompts_rule[0:3]
#initial_prompt_list[0:3]

#combined_prompts_rule = combined_prompts_rule[0:3]
#initial_prompt_list=initial_prompt_list[0:3]

combined_prompt_list_rule = [p['user_prompt'] for p in combined_prompts_rule]
#combined_prompt_list_rule[0] #tells you what is in the LLM prompt


#############
#get responses from the LLM
prompt_classifications_rule = tqdm_concurrent.thread_map(get_prompt_classification, combined_prompt_list_rule, max_workers=64)
#############


# Extract the results
prompt_classifications_rule, output_words_rule = zip(*prompt_classifications_rule)

df_model_output_rule = parse_output(prompt_classifications_rule, initial_prompt_list, output_words_rule)

#create binary output col
df_model_output_rule['output_binary'] = df_model_output_rule['output'].apply(lambda x: 1 if x == 'YES' else 0)

#merge with story info
# rename on the fly so `df` keeps its 'story' column and this cell can be re-run
df_model_output_rule = pd.merge(df.rename(columns={'story': 'Prompts'}), df_model_output_rule, on='Prompts')


#determine if the model output matches the contractualist predictions
rule_model_and_human = df_model_output_rule
rule_model_and_human['contractualist.prediction'] = pd.to_numeric(rule_model_and_human['contractualist.prediction'], errors='coerce')
rule_model_and_human['accuracy'] = (rule_model_and_human['output_binary'] == rule_model_and_human['contractualist.prediction']).astype(int)

globals()[f"rule_{dataset}"] = rule_model_and_human
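The `output_binary` column above maps every answer other than the exact string `'YES'` to 0, including the empty string that `parse_output` returns when the output tags are missing. A sketch of a stricter mapping that keeps unparseable answers separate is shown below; `to_binary` is a hypothetical helper and the sample answers are made up.

```python
def to_binary(answer):
    """Map a model answer to 1 (YES), 0 (NO), or None (anything else)."""
    # Tolerate surrounding whitespace, trailing punctuation, and case differences.
    cleaned = answer.strip().strip(".!\"'*").upper()
    if cleaned == "YES":
        return 1
    if cleaned == "NO":
        return 0
    return None


answers = ["YES", "no.", " Yes ", "", "It depends"]
binary = [to_binary(a) for a in answers]
n_unparseable = sum(b is None for b in binary)
print(binary, n_unparseable)
```

Counting the `None` values gives a quick check on how many responses ignored the formatting instructions, which would otherwise be scored as NO.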

## Virtual bargaining

In [None]:
#VB

df = read_sheet('https://docs.google.com/spreadsheets/d/1IkofFDyP-nh7LU2dAZAk5aJBbrtYmIW68NsFCopHIrQ/edit?gid=1776106641#gid=1776106641')
dataset= 'property2'

initial_prompt_list = df.story.to_list() #converts the "story" col of the df to a list


LLM_PROMPT = VIRTUAL_BARGAINING_PROMPT_1 + STORY_QUESTION + VIRTUAL_BARGAINING_PROMPT_2


# user query gets put in the {story} tag.

combined_prompts_vb = []
for prompt in initial_prompt_list:
  combined_prompts_vb.append({"user_prompt": LLM_PROMPT.format(story=prompt)})

# Sanity checks (uncomment to inspect, or to run on the first three stories only)

#combined_prompts_vb[0:3]
#initial_prompt_list[0:3]

#combined_prompts_vb = combined_prompts_vb[0:3]
#initial_prompt_list=initial_prompt_list[0:3]

combined_prompt_list_vb = [p['user_prompt'] for p in combined_prompts_vb]
#combined_prompt_list_vb[0] #tells you what is in the LLM prompt


#############
#get responses from the LLM
prompt_classifications_vb = tqdm_concurrent.thread_map(get_prompt_classification, combined_prompt_list_vb, max_workers=64)
#############


# Extract the results
prompt_classifications_vb, output_words_vb = zip(*prompt_classifications_vb)

df_model_output_vb = parse_output(prompt_classifications_vb, initial_prompt_list, output_words_vb)

#create binary output col
df_model_output_vb['output_binary'] = df_model_output_vb['output'].apply(lambda x: 1 if x == 'YES' else 0)

#merge with story info
df = df.rename(columns={'story': 'Prompts'})
df_model_output_vb = pd.merge(df, df_model_output_vb, on='Prompts')


#determine if the model output matches the contractualist predictions
vb_model_and_human = df_model_output_vb
vb_model_and_human['contractualist.prediction'] = pd.to_numeric(vb_model_and_human['contractualist.prediction'], errors='coerce')
vb_model_and_human['accuracy'] = (vb_model_and_human['output_binary'] == vb_model_and_human['contractualist.prediction']).astype(int)

globals()[f"vb_{dataset}"] = vb_model_and_human

## Resource-Rational Contractualist Thinking

In [None]:
#RRC

df = read_sheet('https://docs.google.com/spreadsheets/d/1IkofFDyP-nh7LU2dAZAk5aJBbrtYmIW68NsFCopHIrQ/edit?gid=1776106641#gid=1776106641')
dataset= 'property2'

initial_prompt_list = df.story.to_list() #converts the "story" col of the df to a list


LLM_PROMPT = RRC_PROMPT_1 + STORY_QUESTION + RRC_PROMPT_2


# user query gets put in the {story} tag.

combined_prompts_rrc = []
for prompt in initial_prompt_list:
  combined_prompts_rrc.append({"user_prompt": LLM_PROMPT.format(story=prompt)})

# Sanity checks (uncomment to inspect, or to run on the first three stories only)

#combined_prompts_rrc[0:3]
#initial_prompt_list[0:3]

#combined_prompts_rrc = combined_prompts_rrc[0:3]
#initial_prompt_list=initial_prompt_list[0:3]

combined_prompt_list_rrc = [p['user_prompt'] for p in combined_prompts_rrc]
#combined_prompt_list_rrc[0] #tells you what is in the LLM prompt


#############
#get responses from the LLM
prompt_classifications_rrc = tqdm_concurrent.thread_map(get_prompt_classification, combined_prompt_list_rrc, max_workers=64)
#############


# Extract the results
prompt_classifications_rrc, output_words_rrc = zip(*prompt_classifications_rrc)

df_model_output_rrc = parse_output(prompt_classifications_rrc, initial_prompt_list, output_words_rrc)

#create binary output col
df_model_output_rrc['output_binary'] = df_model_output_rrc['output'].apply(lambda x: 1 if x == 'YES' else 0)

#merge with story info
df = df.rename(columns={'story': 'Prompts'})
df_model_output_rrc = pd.merge(df, df_model_output_rrc, on='Prompts')


#determine if the model output matches the contractualist predictions
rrc_model_and_human = df_model_output_rrc
rrc_model_and_human['contractualist.prediction'] = pd.to_numeric(rrc_model_and_human['contractualist.prediction'], errors='coerce')
rrc_model_and_human['accuracy'] = (rrc_model_and_human['output_binary'] == rrc_model_and_human['contractualist.prediction']).astype(int)

globals()[f"rrc_{dataset}"] = rrc_model_and_human

## Non-thinking

In [None]:
df = read_sheet('https://docs.google.com/spreadsheets/d/1IkofFDyP-nh7LU2dAZAk5aJBbrtYmIW68NsFCopHIrQ/edit?gid=1776106641#gid=1776106641')
dataset= 'property2'

initial_prompt_list = df.story.to_list() #converts the "story" col of the df to a list


LLM_PROMPT =  NON_THINKING_PROMPT + STORY_QUESTION


# user query gets put in the {story} tag.

combined_prompts_none = []
for prompt in initial_prompt_list:
  combined_prompts_none.append({"user_prompt": LLM_PROMPT.format(story=prompt)})

# Sanity checks (uncomment to inspect, or to run on the first three stories only)

#combined_prompts_none[0:3]
#initial_prompt_list[0:3]

#combined_prompts_none = combined_prompts_none[0:3]
#initial_prompt_list=initial_prompt_list[0:3]

combined_prompt_list_none = [p['user_prompt'] for p in combined_prompts_none]
#combined_prompt_list_none[0] #tells you what is in the LLM prompt


#############
#get responses from the LLM
prompt_classifications_none = tqdm_concurrent.thread_map(get_prompt_classification, combined_prompt_list_none, max_workers=64)
#############


# Extract the results
prompt_classifications_none, output_words_none = zip(*prompt_classifications_none)

df_model_output_none = parse_output(prompt_classifications_none, initial_prompt_list, output_words_none)

#create binary output col
df_model_output_none['output_binary'] = df_model_output_none['output'].apply(lambda x: 1 if x == 'YES' else 0)

#merge with story info
df = df.rename(columns={'story': 'Prompts'})
df_model_output_none = pd.merge(df, df_model_output_none, on='Prompts')


#determine if the model output matches the contractualist predictions
none_model_and_human = df_model_output_none
none_model_and_human['contractualist.prediction'] = pd.to_numeric(none_model_and_human['contractualist.prediction'], errors='coerce')
none_model_and_human['accuracy'] = (none_model_and_human['output_binary'] == none_model_and_human['contractualist.prediction']).astype(int)

globals()[f"none_{dataset}"] = none_model_and_human
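The four cells above differ only in which prompt pieces they combine. A sketch of how they could be folded into one helper is given below. It is an illustration under stated assumptions, not the notebook's code: `run_condition`, `fake_generate`, the stand-in prompts, and the stories are hypothetical, and the standard library's `ThreadPoolExecutor` is swapped in for `thread_map` so the example is self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

STORY_QUESTION_DEMO = "\nHere is a story for you to analyze: {story}.\n"


def run_condition(stories, prompt_1, prompt_2, generate_fn, max_workers=8):
    """Build one prompt per story, query generate_fn in parallel, return rows in story order."""
    template = prompt_1 + STORY_QUESTION_DEMO + prompt_2
    prompts = [template.format(story=s) for s in stories]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        responses = list(pool.map(generate_fn, prompts))  # map preserves input order
    return [
        {"Prompts": story, "response": response, "output_word_count": len(response.split())}
        for story, response in zip(stories, responses)
    ]


def fake_generate(prompt):
    """Hypothetical stand-in for the model call."""
    if "lost dog" in prompt:
        return "START_OUTPUT YES END_OUTPUT"
    return "START_OUTPUT NO END_OUTPUT"


# Stand-in prompt pairs: (instructions, formatting block).
conditions = {"rule": ("Apply a rule.", ""), "none": ("Answer directly.", "")}
stories = ["a lost dog", "a borrowed ladder"]
results = {
    name: run_condition(stories, p1, p2, fake_generate)
    for name, (p1, p2) in conditions.items()
}
print(results["rule"])
```

Keeping results in a dictionary keyed by condition name would also avoid writing to `globals()`.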

# Analysis Pipeline

In [None]:
# create df of average accuracy values

rule_model_and_human = globals()[f"rule_{dataset}"]
vb_model_and_human = globals()[f"vb_{dataset}"]
rrc_model_and_human = globals()[f"rrc_{dataset}"]
none_model_and_human = globals()[f"none_{dataset}"]

# Calculate the average
rule_mean = rule_model_and_human['accuracy'].sum() / len(rule_model_and_human['accuracy'])
vb_mean = vb_model_and_human['accuracy'].sum() / len(vb_model_and_human['accuracy'])
rrc_mean = rrc_model_and_human['accuracy'].sum() / len(rrc_model_and_human['accuracy'])
none_mean = none_model_and_human['accuracy'].sum() / len(none_model_and_human['accuracy'])

# Create a new dataframe
approaches = ['no-thinking','rule-based-thinking', 'resource-rational-contractualism','virtual-bargaining']
means = [none_mean, rule_mean, rrc_mean, vb_mean]

mean_accuracy = pd.DataFrame({'approach': approaches, 'mean': means})
globals()[f"mean_accuracy_{dataset}"] = mean_accuracy

#appends the name of the dataset we're currently working with to name the df
print(f"mean_accuracy_{dataset}")
print(globals()[f"mean_accuracy_{dataset}"])

#create bar graph of the mean_accuracy judgments


# no error bars are drawn here, so capsize is not needed
mean_accuracy.plot.bar(x='approach', y='mean',
                       color=['skyblue', 'lightcoral', 'gold', 'lightgreen'], legend=False)
plt.title(dataset, fontsize=16)
plt.xlabel('Model Thinking Style', fontsize=12)
plt.ylabel('Accuracy', fontsize=12)
plt.ylim(0, 1)
plt.xticks(rotation=45, ha='right')
plt.show()

#create graph of the number of words

# create df of average number of words

# Calculate the average
rule_n_words = rule_model_and_human['output_word_count'].sum() / len(rule_model_and_human['output_word_count'])
vb_n_words = vb_model_and_human['output_word_count'].sum() / len(vb_model_and_human['output_word_count'])
rrc_n_words = rrc_model_and_human['output_word_count'].sum() / len(rrc_model_and_human['output_word_count'])
none_n_words = none_model_and_human['output_word_count'].sum() / len(none_model_and_human['output_word_count'])

#standard error
from scipy import stats
rule_se = stats.sem(rule_model_and_human['output_word_count'])
vb_se = stats.sem(vb_model_and_human['output_word_count'])
rrc_se = stats.sem(rrc_model_and_human['output_word_count'])
none_se = stats.sem(none_model_and_human['output_word_count'])

# Create a new dataframe
approaches = ['no-thinking','rule-based-thinking','resource-rational-contractualism','virtual-bargaining']
mean_words = [none_n_words, rule_n_words, rrc_n_words, vb_n_words]
standard_error = [none_se, rule_se, rrc_se, vb_se]

mean_words = pd.DataFrame({'approach': approaches, 'mean_words': mean_words, 'se': standard_error})

print(mean_words)

# Create the bar graph with error bars
plt.bar(mean_words['approach'], mean_words['mean_words'], yerr=mean_words['se'], capsize=5, color=['skyblue', 'lightcoral', 'gold', 'lightgreen'])
plt.title(dataset, fontsize=16)
plt.xlabel('Model Thinking Style', fontsize=12)
plt.ylabel('Average Word Count',fontsize=12)
plt.xticks(rotation=45, ha='right')
plt.show()
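The word-count plot uses `stats.sem`, which divides the sample standard deviation (with `ddof=1`) by the square root of the sample size. The sketch below reproduces that calculation with the standard library on made-up numbers and applies the same formula to 0/1 accuracy flags; `mean_and_sem` is a hypothetical helper.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) using the sample standard deviation (ddof=1)."""
    n = len(values)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


word_counts = [10, 12, 14]  # made-up word counts
accuracy = [1, 0, 1, 1]     # made-up 0/1 accuracy flags

words_mean, words_sem = mean_and_sem(word_counts)
acc_mean, acc_sem = mean_and_sem(accuracy)
print(words_mean, words_sem, acc_mean, acc_sem)
```

A standard error computed this way for each approach could be passed as `yerr` to the accuracy bar chart, so that both plots show comparable error bars.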