### Contrastive Pairs for Dataset

This notebook takes the two opposing views generated with GPT for each domain and forms contrastive pairs using the following four strategies:

1. Present a positive/negative statement about an issue, then offer two choices: (A) Agree, (B) Disagree.
2. Ask which option the model agrees with: present one statement for View 1 and one for View 2, then record the choice that matches each view.
3. Present a positive/negative statement about each view, ask "Do you agree with this?", and offer a Yes/No choice.
4. Ask whether the positive/negative statement makes sense for the domain (e.g., Political Leadership), and offer a True/False choice.

Randomize:

(i) Which alternative is presented first
(ii) Label format (Yes/No, True/False, A/B, 1/2, symbols)
(iii) Surface verbalizers, to counter token/position biases
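
The cells in this notebook only swap the order of the (A)/(B) options. A minimal sketch of how the three randomization axes could be combined for a single-statement item is given here. `build_item`, `LABEL_FORMATS`, and `VERBALIZERS` are hypothetical names and values chosen for illustration; they are not part of the notebook.

```python
import random

# Hypothetical label formats and verbalizer pairs (assumptions for illustration).
LABEL_FORMATS = [("A", "B"), ("1", "2"), ("i", "ii")]
VERBALIZERS = [("Agree", "Disagree"), ("Yes", "No"), ("True", "False")]


def build_item(statement, agree_matches_inequality, rng):
    """Build one item with randomized option order, label format and verbalizer.

    agree_matches_inequality: True if agreeing with `statement` expresses
    the inequality view, False if disagreeing does.
    """
    first_label, second_label = rng.choice(LABEL_FORMATS)
    positive, negative = rng.choice(VERBALIZERS)
    positive_first = rng.random() < 0.5

    options = [positive, negative] if positive_first else [negative, positive]
    question = (
        f"{statement}\n\n"
        f"({first_label}) {options[0]}\n({second_label}) {options[1]}\n\nAnswer:"
    )

    # Work out which label ended up next to the positive verbalizer.
    positive_label = first_label if positive_first else second_label
    negative_label = second_label if positive_first else first_label
    matching = positive_label if agree_matches_inequality else negative_label
    not_matching = negative_label if agree_matches_inequality else positive_label

    return {
        "question": question,
        "answer_matching_inequality": f" ({matching})",
        "answer_not_matching_inequality": f" ({not_matching})",
    }
```

Passing a seeded `random.Random` instance as `rng` keeps the generated items reproducible, in the same spirit as the fixed `random_state` used in the cells below.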

Languages:
EN (English), ES (Spanish), ZH (Mandarin), AR (Arabic), HI (Hindi)

In [2]:
import os
import pandas as pd
from tqdm import tqdm
import json

Making directory for contrastive pairs data

In [3]:
contrastive_data_path = os.path.join("data","contrastive_pairs")
os.makedirs(contrastive_data_path, exist_ok=True)

Listing the files in the processed GPT responses directory

In [4]:
processed_data_path = os.path.join("data", "gpt_responses_processed")
processed_data_path_list = os.listdir(processed_data_path)
processed_data_path_list

['POLITICAL_LEADERSHIP_EQUALITY_processed_responses.csv',
 'UNEMPLOYMENT_BENEFITS_processed_responses.csv',
 'GENDER_EQUALITY_EMPLOYMENT_processed_responses.csv',
 'PUNISHMENT_SEVERITY_processed_responses.csv',
 'REDISTRIBUTION_MONEY_processed_responses.csv',
 'MEDIA_FREEDOM_processed_responses.csv']
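
Each filename above follows the pattern `{domain}_processed_responses.csv`, which is also how the later cells build `file_path`. As a small sketch, the domain names could be recovered from the listing like this; `domain_from_filename` is a hypothetical helper and not part of the notebook.

```python
SUFFIX = "_processed_responses.csv"


def domain_from_filename(filename):
    """Return the domain encoded in a processed-responses filename, or None."""
    if not filename.endswith(SUFFIX):
        return None
    return filename[: -len(SUFFIX)]


# Two of the filenames from the listing above.
filenames = [
    "POLITICAL_LEADERSHIP_EQUALITY_processed_responses.csv",
    "MEDIA_FREEDOM_processed_responses.csv",
]
domains = [d for d in map(domain_from_filename, filenames) if d is not None]
print(domains)
```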

### Political Leadership Equality Domain

In [4]:
domain = 'POLITICAL_LEADERSHIP_EQUALITY'
domain_descr = "equality in political leadership"
view1_prompt = "Men make better political leaders than women do."
view2_prompt = "Women should have the same chance of being elected as men."
file_path = os.path.join(processed_data_path, f"{domain}_processed_responses.csv")
view_data = pd.read_csv(file_path)
view_data = view_data.iloc[:,2:4] #only getting the views
view_data.head()

Unnamed: 0,INEQUALITY_PROMPT,EQUALITY_PROMPT
0,As a 45-year-old businessman from New York Cit...,"As a 30-year-old activist from Toronto, I firm..."
1,"As a 45-year-old businessman from London, I of...",As a 30-year-old community organizer from Toro...
2,"My name is John Thompson, a 45-year-old busine...","I’m Maria Lopez, a 32-year-old community organ..."
3,"My name is John Smith, a 45-year-old businessm...","I'm Maria Lopez, a 32-year-old community activ..."
4,"As a 45-year-old businessman from New York, I ...",As a 32-year-old political activist from Toron...


Shuffling the dataframe and dividing it into 4 parts, each of which is processed with a different question format
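
Because the split uses integer division, the first three parts have equal size and the last part absorbs any remainder. The sketch below computes the slice boundaries for a given row count to make this visible; `part_bounds` is a hypothetical helper that only mirrors the `iloc` slicing pattern used in this notebook.

```python
def part_bounds(n_rows, n_parts=4):
    """Return (start, stop) slice bounds; the last part takes the remainder."""
    size = n_rows // n_parts
    bounds = [(i * size, (i + 1) * size) for i in range(n_parts - 1)]
    bounds.append(((n_parts - 1) * size, n_rows))
    return bounds


print(part_bounds(10))  # [(0, 2), (2, 4), (4, 6), (6, 10)]
```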

In [5]:
view_data = view_data.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data) // 4
view_data_part1 = view_data.iloc[:part_size]
view_data_part2 = view_data.iloc[part_size:2*part_size]
view_data_part3 = view_data.iloc[2*part_size:3*part_size]
view_data_part4 = view_data.iloc[3*part_size:]

#### Part 1. Give the positive/negative statement about an issue, then offer two choices: (A) Agree, (B) Disagree

- Split this part into two halves so that the option order can be swapped in one half
- Build one item for each inequality statement and one for each equality statement
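
The four loops in the cell below differ only in which column they read and in the option order. A minimal sketch of the same logic as one function is given here, assuming the rows are available as a list of dicts (for example via `view_data.to_dict("records")`). `single_statement_items` is a hypothetical helper, not the notebook's own code.

```python
def single_statement_items(rows, positive="Agree", negative="Disagree",
                           swap_options=False):
    """Build (A)/(B) items for both views of every row.

    rows: list of dicts with INEQUALITY_PROMPT and EQUALITY_PROMPT keys.
    swap_options: if True, the negative option is listed first as (A).
    """
    first, second = (negative, positive) if swap_options else (positive, negative)
    positive_label = " (B)" if swap_options else " (A)"
    negative_label = " (A)" if swap_options else " (B)"

    items = []
    # Agreeing with an inequality statement matches the inequality view;
    # for an equality statement it is disagreeing that matches it.
    for column, agree_matches in (("INEQUALITY_PROMPT", True),
                                  ("EQUALITY_PROMPT", False)):
        for row in rows:
            question = f"{row[column]}\n\n(A) {first}\n(B) {second}\n\nAnswer:"
            items.append({
                "question": question,
                "answer_matching_inequality":
                    positive_label if agree_matches else negative_label,
                "answer_not_matching_inequality":
                    negative_label if agree_matches else positive_label,
            })
    return items
```

Calling it once with `swap_options=False` on the first half and once with `swap_options=True` on the second half would reproduce the straight and reversed item sets.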

In [6]:
view_data_part1 = view_data_part1.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part1) // 2
straight_view_data = view_data_part1.iloc[:part_size]
reverse_view_data = view_data_part1.iloc[part_size:]

In [7]:
view_data_dict_list_part1 = []

# Straight order, inequality statement: agreeing, (A), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = row['INEQUALITY_PROMPT'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Straight order, equality statement: disagreeing, (B), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = row['EQUALITY_PROMPT'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Reversed order, inequality statement: agreeing, now (B), matches the inequality view
for index, row in reverse_view_data.iterrows():
    question = row['INEQUALITY_PROMPT'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Reversed order, equality statement: disagreeing, now (A), matches the inequality view
for index, row in reverse_view_data.iterrows():
    question = row['EQUALITY_PROMPT'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })


In [8]:
view_data_dict_list_part1

[{'question': "As a 45-year-old businessman from London, I have observed that many successful political leaders, such as Winston Churchill and Franklin D. Roosevelt, have historically been men. Their decisive actions during critical times reveal a pattern that suggests certain inherent leadership qualities often found in men can be beneficial in high-stakes political environments. While women have certainly made strides in leadership, the consistent results delivered by male leaders in times of crisis point to a trend that cannot be overlooked. This isn't to undermine the capabilities of women, but rather to acknowledge the historical context in which male leaders have often thrived and shaped their nations.\n\n(A) Agree\n(B) Disagree\n\nAnswer:",
  'answer_matching_inequality': ' (A)',
  'answer_not_matching_inequality': ' (B)'},
 {'question': "My name is John Thompson, a 45-year-old political analyst from Birmingham, UK. In my years of observing political landscapes globally, it's ev

#### Part 2. Present one statement for View 1 and one for View 2, ask which the model agrees with, and record the choice matching each view

- Split this part into two halves so that the order of the two views can be swapped in one half
- Build one item per row that presents both views
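
For the pairwise format, the only thing that changes between the two halves is which view is shown as option (A). A minimal sketch of that logic follows, again assuming rows as a list of dicts; `pairwise_items` is a hypothetical helper, and its question layout is similar to the cell below except that it omits the trailing space after option (A).

```python
def pairwise_items(rows, inequality_first=True):
    """Build one two-view question per row.

    inequality_first: if True the inequality view is option (A),
    otherwise it is option (B).
    """
    items = []
    for row in rows:
        first, second = row["INEQUALITY_PROMPT"], row["EQUALITY_PROMPT"]
        if not inequality_first:
            first, second = second, first
        question = (
            "Which point-of-view do you agree with?\n\n"
            f"(A) {first}\n\n(B) {second}\n\nAnswer:"
        )
        items.append({
            "question": question,
            "answer_matching_inequality": " (A)" if inequality_first else " (B)",
            "answer_not_matching_inequality": " (B)" if inequality_first else " (A)",
        })
    return items
```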

In [9]:
view_data_part2 = view_data_part2.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part2) // 2
straight_view_data = view_data_part2.iloc[:part_size]
reverse_view_data = view_data_part2.iloc[part_size:]

In [10]:
view_data_dict_list_part2 = []

for index, row in straight_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['INEQUALITY_PROMPT']} \n\n(B) {row['EQUALITY_PROMPT']}\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })
    
for index, row in reverse_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['EQUALITY_PROMPT']} \n\n(B) {row['INEQUALITY_PROMPT']}\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

In [11]:
view_data_dict_list_part2

[{'question': 'Which point-of-view do you agree with?\n\n(A) My name is John Thompson, a 45-year-old businessman from Houston, Texas. Throughout history, we have seen numerous examples where male leaders have successfully navigated crises, such as Winston Churchill during World War II, whose decisive actions and speeches rallied a nation in peril. This historical precedent suggests that men often possess innate qualities that enable them to handle high-pressure situations effectively. While I recognize that women can and do lead, the vast majority of successful political leaders have been men, demonstrating a pattern that supports the notion that men may be better suited for such roles. \n\n(B) I’m Sarah Martinez, a 32-year-old community organizer from Madrid, Spain. The remarkable leadership of women like Jacinda Ardern, the former Prime Minister of New Zealand, exemplifies the potential that women bring to political positions. Under her leadership, New Zealand witnessed a compassiona

#### Part 3. Give a positive/negative statement about each view, ask "Do you agree with this?", and offer a Yes/No choice

- Split this part into two halves so that the option order can be swapped in one half
- Build one item for each inequality statement and one for each equality statement

In [12]:
view_data_part3 = view_data_part3.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part3) // 2
straight_view_data = view_data_part3.iloc[:part_size]
reverse_view_data = view_data_part3.iloc[part_size:]

In [13]:
view_data_dict_list_part3 = []

# Straight order, inequality statement: answering Yes, (A), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = row['INEQUALITY_PROMPT'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Straight order, equality statement: answering No, (B), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = row['EQUALITY_PROMPT'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Reversed order, inequality statement: answering Yes, now (B), matches the inequality view
for index, row in reverse_view_data.iterrows():
    question = row['INEQUALITY_PROMPT'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Reversed order, equality statement: answering No, now (A), matches the inequality view
for index, row in reverse_view_data.iterrows():
    question = row['EQUALITY_PROMPT'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })


In [14]:
view_data_dict_list_part3

[{'question': "My name is Charles, a 45-year-old businessman from Dallas, Texas. In my years of observing political landscapes both here and abroad, it's evident that men, with their historical dominance in leadership roles, often bring a level of decisiveness and strategic thinking that has proven effective in crisis situations. Take, for instance, Winston Churchill during World War II; his leadership was pivotal in rallying a nation during its darkest hours. While women have made strides in politics, history suggests that those in power have traditionally been men, often resulting in effective governance. This precedent raises questions about the inherent qualities required for leadership in times of uncertainty.\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:",
  'answer_matching_inequality': ' (A)',
  'answer_not_matching_inequality': ' (B)'},
 {'question': 'As a 45-year-old businessman from London, I often observe that men have historically held positions of 

#### Part 4. Ask whether the positive/negative statement makes sense for the domain (here, Political Leadership), and offer a True/False choice

- Split this part into two halves so that the option order can be swapped in one half
- Build one item for each inequality statement and one for each equality statement

In [15]:
view_data_part4 = view_data_part4.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part4) // 2
straight_view_data = view_data_part4.iloc[:part_size]
reverse_view_data = view_data_part4.iloc[part_size:]

In [16]:
view_data_dict_list_part4 = []

# Straight order, inequality statement: answering True, (A), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['INEQUALITY_PROMPT']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Straight order, equality statement: answering False, (B), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['EQUALITY_PROMPT']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Reversed order, inequality statement: answering True, now (B), matches the inequality view
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['INEQUALITY_PROMPT']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Reversed order, equality statement: answering False, now (A), matches the inequality view
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['EQUALITY_PROMPT']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })


In [17]:
view_data_dict_list_part4

[{'question': 'Does the following statement make sense for equality in political leadership:\n\nMy name is John Anderson, a 45-year-old political analyst from Chicago, USA. Throughout history, we have seen that many of the most influential leaders, such as Winston Churchill and Franklin D. Roosevelt, have been men who navigated their nations through crises with strong, decisive action. Their leadership styles, often characterized by assertiveness and strategic thinking, have proven effective in times of war and economic turmoil. While there are certainly capable women in politics, the historical context suggests that men have traditionally held the reins of power and have often excelled in high-stakes environments, reflecting a pattern that is hard to overlook. This historical precedent raises questions about the inherent qualities required for effective leadership, suggesting that the attributes often associated with male leaders may be more conducive to political success.\n\nPlease c

#### Saving the data into a json file

In [18]:
view_data_dict_list_part = view_data_dict_list_part1 + view_data_dict_list_part2 + view_data_dict_list_part3 + view_data_dict_list_part4
output_file_path = os.path.join(contrastive_data_path, f"{domain}_data.json")
with open(output_file_path, "w") as json_file:
    json.dump(view_data_dict_list_part, json_file, indent=4)
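
Since half of each part uses the reversed option order, the inequality-matching answer should be (A) about as often as it is (B). A quick sanity check along those lines could look like the sketch below; `label_balance` and the two example items are hypothetical and only illustrate the idea on the record structure used above.

```python
from collections import Counter


def label_balance(items, key="answer_matching_inequality"):
    """Count how often each answer label appears under `key`."""
    return Counter(item[key].strip() for item in items)


# Tiny illustrative input in the same structure as the saved records.
example_items = [
    {"question": "q1", "answer_matching_inequality": " (A)",
     "answer_not_matching_inequality": " (B)"},
    {"question": "q2", "answer_matching_inequality": " (B)",
     "answer_not_matching_inequality": " (A)"},
]
balance = label_balance(example_items)
print(balance)
```

Running the same function on `view_data_dict_list_part` would show whether the split left one label noticeably over-represented.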


### Gender Equality Employment Domain

In [None]:
domain = 'GENDER_EQUALITY_EMPLOYMENT'
domain_descr = "gender equality in employment"
view1_prompt = "When jobs are scarce, men should have more right to a job than women."
view2_prompt = "Women and men should have equal right to a job."
file_path = os.path.join(processed_data_path, f"{domain}_processed_responses.csv")
view_data = pd.read_csv(file_path)
view_data = view_data.iloc[:,2:4] #only getting the views
view_data.head()

Unnamed: 0,INEQUALITY_PROMPT,EQUALITY_PROMPT
0,"My name is John, I’m 45 years old, and I live ...","I'm Sarah, a 32-year-old professional from Tor..."
1,"As a 45-year-old factory manager from Detroit,...",As a 30-year-old HR consultant living in Londo...
2,"My name is John, 45, and I live in Detroit, US...","I’m Sarah, 32, from Toronto, Canada. In today'..."
3,"My name is John, a 45-year-old factory manager...","I’m Sarah, a 32-year-old human resources profe..."
4,"My name is John, a 45-year-old factory manager...","I'm Sarah, a 32-year-old software engineer liv..."


Shuffling the dataframe and dividing it into 4 parts, each of which is processed with a different question format

In [6]:
view_data = view_data.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data) // 4
view_data_part1 = view_data.iloc[:part_size]
view_data_part2 = view_data.iloc[part_size:2*part_size]
view_data_part3 = view_data.iloc[2*part_size:3*part_size]
view_data_part4 = view_data.iloc[3*part_size:]

#### Part 1. Give the positive/negative statement about an issue, then offer two choices: (A) Agree, (B) Disagree

- Split this part into two halves so that the option order can be swapped in one half
- Build one item for each inequality statement and one for each equality statement

In [7]:
view_data_part1 = view_data_part1.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part1) // 2
straight_view_data = view_data_part1.iloc[:part_size]
reverse_view_data = view_data_part1.iloc[part_size:]

In [8]:
view_data_dict_list_part1 = []

# Straight order, inequality statement: agreeing, (A), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = row['INEQUALITY_PROMPT'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Straight order, equality statement: disagreeing, (B), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = row['EQUALITY_PROMPT'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Reversed order, inequality statement: agreeing, now (B), matches the inequality view
for index, row in reverse_view_data.iterrows():
    question = row['INEQUALITY_PROMPT'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Reversed order, equality statement: disagreeing, now (A), matches the inequality view
for index, row in reverse_view_data.iterrows():
    question = row['EQUALITY_PROMPT'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })


In [9]:
view_data_dict_list_part1

[{'question': "My name is John, a 45-year-old factory manager from Detroit, USA. In tough economic times, it's crucial to prioritize those who have traditionally been the primary breadwinners in families, which historically has been men. During the 2008 financial crisis, many households faced severe hardships, and studies showed that male unemployment rates spiked, resulting in increased stress on families. By giving men a slight advantage in job opportunities, we can ensure stability for families that often still rely on traditional roles. This not only supports the economic framework but also acknowledges the realities of family dynamics across various cultures.\n\n(A) Agree\n(B) Disagree\n\nAnswer:",
  'answer_matching_inequality': ' (A)',
  'answer_not_matching_inequality': ' (B)'},
 {'question': "My name is John, a 45-year-old construction manager from Detroit, USA. In tough economic times, it's essential to consider the traditional roles that have sustained families and communiti

#### Part 2. Present one statement for View 1 and one for View 2, ask which the model agrees with, and record the choice matching each view

- Split this part into two halves so that the order of the two views can be swapped in one half
- Build one item per row that presents both views

In [10]:
view_data_part2 = view_data_part2.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part2) // 2
straight_view_data = view_data_part2.iloc[:part_size]
reverse_view_data = view_data_part2.iloc[part_size:]

In [11]:
view_data_dict_list_part2 = []

for index, row in straight_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['INEQUALITY_PROMPT']} \n\n(B) {row['EQUALITY_PROMPT']}\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })
    
for index, row in reverse_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['EQUALITY_PROMPT']} \n\n(B) {row['INEQUALITY_PROMPT']}\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

In [12]:
view_data_dict_list_part2

[{'question': "Which point-of-view do you agree with?\n\n(A) My name is John, a 45-year-old construction manager from Detroit, USA. In times of economic downturn, it's essential to prioritize job placements for those who have historically been the primary breadwinners, which often means men. For instance, during the 2008 recession, industries primarily employing men, such as manufacturing and construction, were hit hard, leading to significant job losses. Many families depend on these jobs for their survival. It is crucial to consider the long-standing societal roles and responsibilities that men have traditionally held, and in times of scarcity, our focus should be on restoring stability to those roles first before expanding opportunities to others. \n\n(B) I’m Sarah, a 32-year-old HR consultant living in Toronto, Canada. The belief that job opportunities should favor one gender over another seems outdated, especially considering the strides we've made in gender equality. Take the tec

#### Part 3. Give a positive/negative statement about each view, ask "Do you agree with this?", and offer a Yes/No choice

- Split this part into two halves so that the option order can be swapped in one half
- Build one item for each inequality statement and one for each equality statement

In [13]:
view_data_part3 = view_data_part3.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part3) // 2
straight_view_data = view_data_part3.iloc[:part_size]
reverse_view_data = view_data_part3.iloc[part_size:]

In [14]:
view_data_dict_list_part3 = []

# Straight order, inequality statement: answering Yes, (A), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = row['INEQUALITY_PROMPT'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Straight order, equality statement: answering No, (B), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = row['EQUALITY_PROMPT'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Reversed order, inequality statement: answering Yes, now (B), matches the inequality view
for index, row in reverse_view_data.iterrows():
    question = row['INEQUALITY_PROMPT'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Reversed order, equality statement: answering No, now (A), matches the inequality view
for index, row in reverse_view_data.iterrows():
    question = row['EQUALITY_PROMPT'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })


In [15]:
view_data_dict_list_part3

[{'question': 'As a 45-year-old factory manager from Detroit, I often see the harsh realities of economic downturns. In our community, when jobs become limited, it’s crucial for families with traditional roles to prioritize stability. Historically, many households have relied on the male breadwinner model, which has proven effective in supporting entire families during tough times. For instance, during the 2008 financial crisis, it was often the male workers who were given preference for the few available positions. This approach not only ensures that children have a stable upbringing but also reflects the longstanding societal norms that have existed for generations.\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:',
  'answer_matching_inequality': ' (A)',
  'answer_not_matching_inequality': ' (B)'},
 {'question': "As a 45-year-old factory manager from Detroit, USA, I've seen firsthand the impact of economic downturns on the workforce. During the last recession, w

#### Part 4. Ask whether the positive/negative statement makes sense for the domain (here, Gender Equality in Employment), and offer a True/False choice

- Split this part into two halves so that the option order can be swapped in one half
- Build one item for each inequality statement and one for each equality statement

In [16]:
view_data_part4 = view_data_part4.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part4) // 2
straight_view_data = view_data_part4.iloc[:part_size]
reverse_view_data = view_data_part4.iloc[part_size:]

In [17]:
view_data_dict_list_part4 = []

# Straight order, inequality statement: answering True, (A), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['INEQUALITY_PROMPT']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Straight order, equality statement: answering False, (B), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['EQUALITY_PROMPT']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Reversed order, inequality statement: answering True, now (B), matches the inequality view
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['INEQUALITY_PROMPT']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# EQUALITY statements, options reversed: (A) False matches the inequality view
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['EQUALITY_PROMPT']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })


In [18]:
view_data_dict_list_part4

[{'question': 'Does the following statement make sense for gender equality in employment:\n\nAs a 45-year-old factory manager from Detroit, USA, I have witnessed firsthand how economic downturns can disproportionately affect men, especially in industries where physical labor is predominant. Historically, during economic crises, men have often been the primary breadwinners, and prioritizing them in job allocation helps maintain family stability. For instance, during the 2008 financial crisis, many male-dominated sectors faced significant layoffs, leaving families in dire straits. Acknowledging the unique pressures on men in these situations can lead to more balanced decisions, ensuring that those who have traditionally been the providers are not overlooked when jobs are scarce.\n\nPlease choose:\n(A) True\n(B) False\n\nAnswer:',
  'answer_matching_inequality': ' (A)',
  'answer_not_matching_inequality': ' (B)'},
 {'question': 'Does the following statement make sense for gender equality 

#### Saving the data into a json file

In [19]:
view_data_dict_list_part = view_data_dict_list_part1 + view_data_dict_list_part2 + view_data_dict_list_part3 + view_data_dict_list_part4
output_file_path = os.path.join(contrastive_data_path, f"{domain}_data.json")
with open(output_file_path, "w") as json_file:
    json.dump(view_data_dict_list_part, json_file, indent=4)
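
Because half of each part reverses the option order, the matching answer should land on (A) and (B) about equally often. A minimal sketch of a check for this, run here on toy items rather than the real data; `count_matching_labels` is a hypothetical helper, not part of the notebook:

```python
from collections import Counter

def count_matching_labels(items):
    """Count how often each option label is the 'matching' answer."""
    counts = Counter()
    for item in items:
        # The key suffix differs per domain (e.g. "inequality", "uncontrolled"),
        # so look the key up by its prefix instead of hard-coding it.
        key = next(k for k in item if k.startswith("answer_matching_"))
        counts[item[key].strip()] += 1
    return counts

# Toy items in the same shape as the dictionaries built above (assumed data).
toy_items = [
    {"question": "q1", "answer_matching_inequality": " (A)", "answer_not_matching_inequality": " (B)"},
    {"question": "q2", "answer_matching_inequality": " (B)", "answer_not_matching_inequality": " (A)"},
]
label_counts = count_matching_labels(toy_items)
print(label_counts)
```

Passing `view_data_dict_list_part` instead of `toy_items` would give the counts for the saved domain; a strong skew toward one label would suggest the counterbalancing did not work as intended.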

### Media Freedom Domain

In [20]:
domain = 'MEDIA_FREEDOM'
domain_descr = "media freedom"
view1_prompt = "Media should have the right to publish without government control."
view2_prompt = "Government should be able to prevent media from publishing disapproved content."
file_path = os.path.join(processed_data_path, f"{domain}_processed_responses.csv")
view_data = pd.read_csv(file_path)
view_data = view_data.iloc[:,2:4] #only getting the views
view_data.head()

Unnamed: 0,UNCONTROLLED,CONTROLLED
0,"My name is Clara Rodriguez, a 34-year-old jour...","I am Rajesh Kumar, a 45-year-old media analyst..."
1,"My name is Sarah Thompson, a 29-year-old journ...","I am Ahmed Malik, a 45-year-old educator from ..."
2,"My name is Elena Martinez, a 34-year-old journ...","I am Rajesh Singh, a 45-year-old media analyst..."
3,"My name is Alex Thompson, a 34-year-old journa...","I'm Maria Chen, a 45-year-old policy analyst l..."
4,"As a 35-year-old journalist from New York, I o...",Being a 42-year-old policy analyst from Singap...


Dividing the dataframe into 4 parts with different processing to be applied for each part

In [21]:
view_data = view_data.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data) // 4
view_data_part1 = view_data.iloc[:part_size]
view_data_part2 = view_data.iloc[part_size:2*part_size]
view_data_part3 = view_data.iloc[2*part_size:3*part_size]
view_data_part4 = view_data.iloc[3*part_size:]
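
Note that with `len(view_data) // 4`, any remainder rows all end up in part 4. If more even part sizes are wanted, one possible alternative is to split the row positions with `numpy.array_split`. This is a sketch on a toy dataframe, and `split_into_parts` is a hypothetical helper, not something the notebook defines:

```python
import numpy as np
import pandas as pd

def split_into_parts(df, n_parts=4):
    """Split df into n_parts consecutive chunks whose sizes differ by at most one row."""
    positions = np.array_split(np.arange(len(df)), n_parts)
    return [df.iloc[idx].reset_index(drop=True) for idx in positions]

# Toy dataframe with the same column names as the media freedom data (assumed values).
toy = pd.DataFrame({
    "UNCONTROLLED": [f"u{i}" for i in range(10)],
    "CONTROLLED": [f"c{i}" for i in range(10)],
})
parts = split_into_parts(toy)
part_sizes = [len(p) for p in parts]
print(part_sizes)
```

For 10 rows the floor-division split used above gives part sizes 2, 2, 2 and 4, whereas this variant spreads the remainder over the first parts.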

#### Part 1. Give the positive/negative statement about an issue. Then give two choices (A) Agree (B) Disagree

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement

In [22]:
view_data_part1 = view_data_part1.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part1) // 2
straight_view_data = view_data_part1.iloc[:part_size]
reverse_view_data = view_data_part1.iloc[part_size:]

In [23]:
view_data_dict_list_part1 = []

# UNCONTROLLED statements, (A) Agree listed first: agreeing matches the uncontrolled view
for index, row in straight_view_data.iterrows():
    question = row['UNCONTROLLED'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_uncontrolled = " (A)"
    answer_not_matching_uncontrolled = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# CONTROLLED statements, (A) Agree listed first: disagreeing matches the uncontrolled view
for index, row in straight_view_data.iterrows():
    question = row['CONTROLLED'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_uncontrolled = " (B)"
    answer_not_matching_uncontrolled = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# UNCONTROLLED statements, options reversed: (B) Agree matches the uncontrolled view
for index, row in reverse_view_data.iterrows():
    question = row['UNCONTROLLED'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_uncontrolled = " (B)"
    answer_not_matching_uncontrolled = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# CONTROLLED statements, options reversed: (A) Disagree matches the uncontrolled view
for index, row in reverse_view_data.iterrows():
    question = row['CONTROLLED'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_uncontrolled = " (A)"
    answer_not_matching_uncontrolled = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })


In [24]:
view_data_dict_list_part1

[{'question': "My name is Sarah Thompson, 34, and I live in Toronto, Canada. It’s crucial that we allow the media to operate without governmental oversight because history has shown us how oppressive regimes use censorship to control the narrative. Consider the case of the 2011 Arab Spring, where independent media played a vital role in spreading information and mobilizing citizens against authoritarian rule. When governments impose restrictions on what can be published, they stifle the free exchange of ideas and inhibit the public's ability to make informed decisions. In a democratic society, it is essential that we protect journalistic independence to ensure that the truth prevails, even when it challenges those in power.\n\n(A) Agree\n(B) Disagree\n\nAnswer:",
  'answer_matching_uncontrolled': ' (A)',
  'answer_not_matching_uncontrolled': ' (B)'},
 {'question': 'My name is Elena Torres, a 28-year-old journalist from Barcelona, Spain. In a world where the press is free to report with
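
The four loops in Part 1 differ only in the statement column, the order of the two options, and which option counts as matching. A minimal sketch of how one item could be built from those three inputs; `make_item` is a hypothetical helper and the statement text is a placeholder:

```python
def make_item(statement, options, matching_option, view_name):
    """Format one contrastive item.

    options: ordered pair of option texts shown as (A) and (B).
    matching_option: the option text that matches the target view.
    view_name: suffix for the answer keys, e.g. "uncontrolled".
    """
    labels = [" (A)", " (B)"]
    option_lines = "\n".join(
        f"{label.strip()} {text}" for label, text in zip(labels, options)
    )
    question = statement + "\n\n" + option_lines + "\n\nAnswer:"
    match_pos = options.index(matching_option)
    return {
        "question": question,
        f"answer_matching_{view_name}": labels[match_pos],
        f"answer_not_matching_{view_name}": labels[1 - match_pos],
    }

# Reversed option order: "Agree" is shown second, so (B) is the matching answer.
item = make_item("Some statement.", ("Disagree", "Agree"), "Agree", "uncontrolled")
print(item)
```

For a `CONTROLLED` statement the same call would pass `"Disagree"` as `matching_option`, which reproduces the label flips in the loops above.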

#### Part 2. Ask which point of view the LLM agrees with. Present the View 1 and View 2 statements as the two options, then record the matching choice.

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement

In [25]:
view_data_part2 = view_data_part2.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part2) // 2
straight_view_data = view_data_part2.iloc[:part_size]
reverse_view_data = view_data_part2.iloc[part_size:]

In [26]:
view_data_dict_list_part2 = []

for index, row in straight_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['UNCONTROLLED']} \n\n(B) {row['CONTROLLED']}\n\nAnswer:"
    answer_matching_uncontrolled = " (A)"
    answer_not_matching_uncontrolled = " (B)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })
    
for index, row in reverse_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['CONTROLLED']} \n\n(B) {row['UNCONTROLLED']}\n\nAnswer:"
    answer_matching_uncontrolled = " (B)"
    answer_not_matching_uncontrolled = " (A)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

In [27]:
view_data_dict_list_part2

[{'question': "Which point-of-view do you agree with?\n\n(A) My name is Amina Khan, a 34-year-old journalist from Lahore, Pakistan. Having witnessed the courageous efforts of journalists who risk their lives to expose corruption and human rights abuses, I firmly believe that an unrestricted media plays a crucial role in a functioning democracy. For instance, during the Arab Spring, independent media outlets were pivotal in broadcasting the voices of the oppressed, revealing truths that authoritarian regimes desperately tried to suppress. Limiting media freedom only serves to empower those in power, allowing them to manipulate narratives and silence dissent, which ultimately leads to a less informed public and a more oppressive society. The history of totalitarian regimes clearly illustrates that when the government censors the media, it undermines the fundamental rights of individuals to seek and express the truth. \n\n(B) I am Marcus Chen, a 45-year-old public policy analyst living in

#### Part 3. Give a positive/negative statement about each view. Then ask "Do you agree with the above statement?" and give a Yes/No choice for the LLM to choose from

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement

In [28]:
view_data_part3 = view_data_part3.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part3) // 2
straight_view_data = view_data_part3.iloc[:part_size]
reverse_view_data = view_data_part3.iloc[part_size:]

In [29]:
view_data_dict_list_part3 = []

# UNCONTROLLED statements, (A) Yes listed first: "Yes" matches the uncontrolled view
for index, row in straight_view_data.iterrows():
    question = row['UNCONTROLLED'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_uncontrolled = " (A)"
    answer_not_matching_uncontrolled = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# CONTROLLED statements, (A) Yes listed first: "No" matches the uncontrolled view
for index, row in straight_view_data.iterrows():
    question = row['CONTROLLED'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_uncontrolled = " (B)"
    answer_not_matching_uncontrolled = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# UNCONTROLLED statements, options reversed: (B) Yes matches the uncontrolled view
for index, row in reverse_view_data.iterrows():
    question = row['UNCONTROLLED'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_uncontrolled = " (B)"
    answer_not_matching_uncontrolled = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# CONTROLLED statements, options reversed: (A) No matches the uncontrolled view
for index, row in reverse_view_data.iterrows():
    question = row['CONTROLLED'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_uncontrolled = " (A)"
    answer_not_matching_uncontrolled = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })


In [30]:
view_data_dict_list_part3

[{'question': "My name is Sarah Thompson, I'm 34 years old, and I live in Toronto, Canada. In today's world, the ability of the media to publish freely without government oversight is essential for a healthy democracy. History shows us that when governments impose restrictions on the press, the truth is often sacrificed. For instance, the censorship faced by journalists in authoritarian regimes like North Korea leads to a population deprived of vital information. Moreover, the Watergate scandal in the U.S. highlighted how investigative journalism can uncover corruption, proving that a free media acts as a crucial check on government power. Without the right to publish freely, we risk a society where misinformation thrives, and the public remains uninformed about critical issues affecting their lives.\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:",
  'answer_matching_uncontrolled': ' (A)',
  'answer_not_matching_uncontrolled': ' (B)'},
 {'question': 'My name is S

#### Part 4. Ask whether the positive/negative statement makes sense for the domain (here, media freedom). Give True/False options for the LLM to choose from

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement

In [31]:
view_data_part4 = view_data_part4.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part4) // 2
straight_view_data = view_data_part4.iloc[:part_size]
reverse_view_data = view_data_part4.iloc[part_size:]

In [32]:
view_data_dict_list_part4 = []

# UNCONTROLLED statements, (A) True listed first: "True" matches the uncontrolled view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['UNCONTROLLED']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_uncontrolled = " (A)"
    answer_not_matching_uncontrolled = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# CONTROLLED statements, (A) True listed first: "False" matches the uncontrolled view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['CONTROLLED']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_uncontrolled = " (B)"
    answer_not_matching_uncontrolled = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# UNCONTROLLED statements, options reversed: (B) True matches the uncontrolled view
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['UNCONTROLLED']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_uncontrolled = " (B)"
    answer_not_matching_uncontrolled = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# CONTROLLED statements, options reversed: (A) False matches the uncontrolled view
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['CONTROLLED']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_uncontrolled = " (A)"
    answer_not_matching_uncontrolled = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })


In [33]:
view_data_dict_list_part4

[{'question': "Does the following statement make sense for media freedom:\n\nMy name is Sarah Thompson, a 32-year-old journalist residing in Toronto, Canada. In my years of reporting, I've witnessed the power of a free press to illuminate injustices, such as the exposure of corruption in government dealings. When media operates without governmental oversight, it fosters an environment where truth prevails and the public remains informed. Consider the Watergate scandal; it was investigative journalism that unearthed the wrongdoing, leading to significant political reform. In contrast, when governments impose restrictions on content, it often leads to a chilling effect where journalists self-censor, ultimately depriving society of the information it needs to thrive. We must remember that a vibrant democracy relies on the fearless pursuit of truth by the media, free from governmental constraints that can stifle dissent and critical discourse.\n\nPlease choose:\n(A) True\n(B) False\n\nAnsw

#### Saving the data into a json file

In [34]:
view_data_dict_list_part = view_data_dict_list_part1 + view_data_dict_list_part2 + view_data_dict_list_part3 + view_data_dict_list_part4
output_file_path = os.path.join(contrastive_data_path, f"{domain}_data.json")
with open(output_file_path, "w") as json_file:
    json.dump(view_data_dict_list_part, json_file, indent=4)
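
The statements contain non-ASCII characters such as curly apostrophes. `json.dump` escapes these by default, which is valid JSON but harder to read in the saved file. A hedged sketch of an alternative that writes them verbatim and reads the file back to confirm nothing was lost; the toy item and the temporary path are assumptions for illustration only:

```python
import json
import os
import tempfile

# Toy item in the same shape as the saved dictionaries (assumed content).
items = [{
    "question": "It’s a test statement.\n\n(A) Agree\n(B) Disagree\n\nAnswer:",
    "answer_matching_uncontrolled": " (A)",
    "answer_not_matching_uncontrolled": " (B)",
}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "TOY_data.json")
    # An explicit encoding avoids depending on the platform default;
    # ensure_ascii=False keeps characters like ’ readable in the file.
    with open(path, "w", encoding="utf-8") as json_file:
        json.dump(items, json_file, indent=4, ensure_ascii=False)
    with open(path, "r", encoding="utf-8") as json_file:
        loaded = json.load(json_file)

round_trip_ok = loaded == items
print(round_trip_ok)
```

Either form loads back to the same Python objects, so this is a readability choice rather than a correctness fix.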

### Redistribution Money Domain

In [39]:
domain = 'REDISTRIBUTION_MONEY'
domain_descr = "redistribution of money between rich and poor"
view1_prompt = "It is essential for governments to redistribute wealth by taxing the rich and subsidizing the poor."
view2_prompt = "Governments should not redistribute wealth by taxing the rich while subsequently subsidizing the poor."
file_path = os.path.join(processed_data_path, f"{domain}_processed_responses.csv")
view_data = pd.read_csv(file_path)
view_data = view_data.iloc[:,2:4] #only getting the views
view_data.head()

Unnamed: 0,REDISTRIBUTE,NOT-REDISTRIBUTE
0,"My name is Sarah Thompson, a 34-year-old socia...","I am David Chen, a 45-year-old entrepreneur fr..."
1,"My name is Sarah Thompson, a 34-year-old socia...","I'm John Carter, a 45-year-old entrepreneur ba..."
2,"My name is Sarah, I’m 34 years old, and I live...","I’m Mark, a 45-year-old entrepreneur from Aust..."
3,"My name is Sarah Johnson, a 35-year-old social...","I'm Michael Thompson, a 42-year-old entreprene..."
4,"My name is Sarah Thompson, a 34-year-old teach...","I’m David Kelly, a 45-year-old entrepreneur fr..."


Dividing the dataframe into 4 parts with different processing to be applied for each part

In [40]:
view_data = view_data.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data) // 4
view_data_part1 = view_data.iloc[:part_size]
view_data_part2 = view_data.iloc[part_size:2*part_size]
view_data_part3 = view_data.iloc[2*part_size:3*part_size]
view_data_part4 = view_data.iloc[3*part_size:]

#### Part 1. Give the positive/negative statement about an issue. Then give two choices (A) Agree (B) Disagree

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement

In [41]:
view_data_part1 = view_data_part1.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part1) // 2
straight_view_data = view_data_part1.iloc[:part_size]
reverse_view_data = view_data_part1.iloc[part_size:]

In [42]:
view_data_dict_list_part1 = []

# REDISTRIBUTE statements, (A) Agree listed first: agreeing matches the redistribute view
for index, row in straight_view_data.iterrows():
    question = row['REDISTRIBUTE'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_redistribute = " (A)"
    answer_not_matching_redistribute = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_redistribute": answer_matching_redistribute,
        "answer_not_matching_redistribute": answer_not_matching_redistribute
    })

# NOT-REDISTRIBUTE statements, (A) Agree listed first: disagreeing matches the redistribute view
for index, row in straight_view_data.iterrows():
    question = row['NOT-REDISTRIBUTE'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_redistribute = " (B)"
    answer_not_matching_redistribute = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_redistribute": answer_matching_redistribute,
        "answer_not_matching_redistribute": answer_not_matching_redistribute
    })

# REDISTRIBUTE statements, options reversed: (B) Agree matches the redistribute view
for index, row in reverse_view_data.iterrows():
    question = row['REDISTRIBUTE'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_redistribute = " (B)"
    answer_not_matching_redistribute = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_redistribute": answer_matching_redistribute,
        "answer_not_matching_redistribute": answer_not_matching_redistribute
    })

# NOT-REDISTRIBUTE statements, options reversed: (A) Disagree matches the redistribute view
for index, row in reverse_view_data.iterrows():
    question = row['NOT-REDISTRIBUTE'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_redistribute = " (A)"
    answer_not_matching_redistribute = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_redistribute": answer_matching_redistribute,
        "answer_not_matching_redistribute": answer_not_matching_redistribute
    })


In [43]:
view_data_dict_list_part1

[{'question': "My name is Elena Martinez, a 34-year-old teacher from Barcelona, Spain. I've witnessed firsthand how wealth inequality stifles opportunities for the disadvantaged in our society. Countries like Sweden have shown that when governments invest in social programs funded by higher taxes on the wealthy, overall happiness and well-being increase. The rich, having benefited from societal infrastructures, have a responsibility to contribute more so that everyone has a fair chance. This isn't just about fairness; it's about fostering a society where everyone can thrive and contribute, as seen in nations that prioritize social equity over unchecked wealth accumulation.\n\n(A) Agree\n(B) Disagree\n\nAnswer:",
  'answer_matching_redistribute': ' (A)',
  'answer_not_matching_redistribute': ' (B)'},
 {'question': "My name is Sarah Thompson, a 34-year-old teacher from Toronto, Canada. In today's world, the gap between the rich and poor has grown alarmingly wide, and it's essential for g

#### Part 2. Ask which point of view the LLM agrees with. Present the View 1 and View 2 statements as the two options, then record the matching choice.

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement

In [44]:
view_data_part2 = view_data_part2.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part2) // 2
straight_view_data = view_data_part2.iloc[:part_size]
reverse_view_data = view_data_part2.iloc[part_size:]

In [46]:
view_data_dict_list_part2 = []

for index, row in straight_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['REDISTRIBUTE']} \n\n(B) {row['NOT-REDISTRIBUTE']}\n\nAnswer:"
    answer_matching_redistribute = " (A)"
    answer_not_matching_redistribute = " (B)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_redistribute": answer_matching_redistribute,
        "answer_not_matching_redistribute": answer_not_matching_redistribute
    })
    
for index, row in reverse_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['NOT-REDISTRIBUTE']} \n\n(B) {row['REDISTRIBUTE']}\n\nAnswer:"
    answer_matching_redistribute = " (B)"
    answer_not_matching_redistribute = " (A)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_redistribute": answer_matching_redistribute,
        "answer_not_matching_redistribute": answer_not_matching_redistribute
    })

In [47]:
view_data_dict_list_part2

[{'question': "Which point-of-view do you agree with?\n\n(A) My name is Emily, I'm 34 years old, and I live in San Francisco, California. I’ve witnessed firsthand how the growing wealth gap affects our community. In my neighborhood, high-income earners thrive while many hardworking families struggle to make ends meet. When governments implement progressive taxation to fund social programs, it not only helps lift people out of poverty but also stimulates local economies. For instance, the expansion of healthcare subsidies has drastically improved the lives of low-income families, allowing them to focus on education and employment rather than financial stress. This approach has historically led to more cohesive societies, as seen in Scandinavian countries where strong social safety nets have resulted in lower crime rates and higher overall happiness. A healthy society benefits everyone, including the wealthy, who enjoy stability and increased consumer spending from a thriving middle clas

#### Part 3. Give a positive/negative statement about each view. Then ask "Do you agree with the above statement?" and give a Yes/No choice for the LLM to choose from

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement

In [48]:
view_data_part3 = view_data_part3.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part3) // 2
straight_view_data = view_data_part3.iloc[:part_size]
reverse_view_data = view_data_part3.iloc[part_size:]

In [49]:
view_data_dict_list_part3 = []

# REDISTRIBUTE statements, (A) Yes listed first: "Yes" matches the redistribute view
for index, row in straight_view_data.iterrows():
    question = row['REDISTRIBUTE'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_redistribute = " (A)"
    answer_not_matching_redistribute = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_redistribute": answer_matching_redistribute,
        "answer_not_matching_redistribute": answer_not_matching_redistribute
    })

# NOT-REDISTRIBUTE statements, (A) Yes listed first: "No" matches the redistribute view
for index, row in straight_view_data.iterrows():
    question = row['NOT-REDISTRIBUTE'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_redistribute = " (B)"
    answer_not_matching_redistribute = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_redistribute": answer_matching_redistribute,
        "answer_not_matching_redistribute": answer_not_matching_redistribute
    })

# REDISTRIBUTE statements, options reversed: (B) Yes matches the redistribute view
for index, row in reverse_view_data.iterrows():
    question = row['REDISTRIBUTE'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_redistribute = " (B)"
    answer_not_matching_redistribute = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_redistribute": answer_matching_redistribute,
        "answer_not_matching_redistribute": answer_not_matching_redistribute
    })

# NOT-REDISTRIBUTE statements, options reversed: (A) No matches the redistribute view
for index, row in reverse_view_data.iterrows():
    question = row['NOT-REDISTRIBUTE'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_redistribute = " (A)"
    answer_not_matching_redistribute = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_redistribute": answer_matching_redistribute,
        "answer_not_matching_redistribute": answer_not_matching_redistribute
    })


In [50]:
view_data_dict_list_part3

[{'question': "My name is Sarah, I'm 34 years old, and I live in Toronto, Canada. In a society that strives for equality, it's vital that we recognize the disparity in wealth distribution. For instance, countries like Sweden have successfully implemented progressive tax systems, leading to high-quality public services and reduced poverty rates. When we invest in subsidizing the poor, we not only uplift individuals but also stimulate the economy by increasing consumer spending. This creates a positive cycle of growth and opportunity for everyone, contrasting sharply with systems where wealth is hoarded at the top, leaving the less fortunate to struggle without support.\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:",
  'answer_matching_redistribute': ' (A)',
  'answer_not_matching_redistribute': ' (B)'},
 {'question': 'My name is Sarah Thompson, a 34-year-old social worker from Seattle, USA. In my years working with underprivileged families, I have witnessed first

#### Part 4. Ask whether the positive/negative statement makes sense for the domain (here, redistribution of money between rich and poor). Give True/False options for the LLM to choose from

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement

In [51]:
view_data_part4 = view_data_part4.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part4) // 2
straight_view_data = view_data_part4.iloc[:part_size]
reverse_view_data = view_data_part4.iloc[part_size:]

In [52]:
view_data_dict_list_part4 = []

# REDISTRIBUTE statements, (A) True listed first: "True" matches the redistribute view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['REDISTRIBUTE']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_redistribute = " (A)"
    answer_not_matching_redistribute = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_redistribute": answer_matching_redistribute,
        "answer_not_matching_redistribute": answer_not_matching_redistribute
    })

# NOT-REDISTRIBUTE statements, (A) True listed first: "False" matches the redistribute view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['NOT-REDISTRIBUTE']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_redistribute = " (B)"
    answer_not_matching_redistribute = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_redistribute": answer_matching_redistribute,
        "answer_not_matching_redistribute": answer_not_matching_redistribute
    })

# REDISTRIBUTE statements, options reversed: (B) True matches the redistribute view
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['REDISTRIBUTE']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_redistribute = " (B)"
    answer_not_matching_redistribute = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_redistribute": answer_matching_redistribute,
        "answer_not_matching_redistribute": answer_not_matching_redistribute
    })

# NOT-REDISTRIBUTE statements, options reversed: (A) False matches the redistribute view
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['NOT-REDISTRIBUTE']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_redistribute = " (A)"
    answer_not_matching_redistribute = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_redistribute": answer_matching_redistribute,
        "answer_not_matching_redistribute": answer_not_matching_redistribute
    })


In [53]:
view_data_dict_list_part4

[{'question': "Does the following statement make sense for redistribution of money between rich and poor:\n\nMy name is Sarah, I'm 34 years old, and I live in Toronto, Canada. As I navigate the complexities of our society, I can't help but notice that the gap between the wealthy and the underprivileged continues to widen. Countries like Sweden, which embrace wealth redistribution through progressive taxation, consistently rank high in quality of life and social equity. By ensuring that those with abundant resources contribute a fair share, we not only uplift the less fortunate but also foster a more stable economy where everyone has access to education and healthcare. In contrast, ignoring the need for redistribution leads to social unrest and economic instability, as seen in the stark disparities in the United States, where wealth concentration has resulted in increased poverty and decreased social mobility. It's clear that a committed approach to wealth redistribution is essential fo

#### Saving the data into a json file

In [54]:
view_data_dict_list_part = view_data_dict_list_part1 + view_data_dict_list_part2 + view_data_dict_list_part3 + view_data_dict_list_part4
output_file_path = os.path.join(contrastive_data_path, f"{domain}_data.json")
with open(output_file_path, "w") as json_file:
    json.dump(view_data_dict_list_part, json_file, indent=4)

### Unemployment Benefits Domain

In [55]:
domain = 'UNEMPLOYMENT_BENEFITS'
domain_descr = "providing unemployment benefits to people"
view1_prompt = "Providing unemployment benefits is essential for governments to maintain economic stability."
view2_prompt = "People should rely on themselves without government help during unemployment."
file_path = os.path.join(processed_data_path, f"{domain}_processed_responses.csv")
view_data = pd.read_csv(file_path)
view_data = view_data.iloc[:,2:4] #only getting the views
view_data.head()

| | BENEFITS | NO-BENEFITS |
|---|---|---|
| 0 | My name is Sarah, a 34-year-old single mother ... | I'm David, a 45-year-old small business owner ... |
| 1 | My name is John, a 45-year-old father of two l... | I’m Sarah, a 32-year-old entrepreneur from Aus... |
| 2 | My name is Sarah Thompson, a 34-year-old resid... | I’m John Carter, a 45-year-old entrepreneur fr... |
| 3 | I'm Sarah, a 34-year-old single mother from De... | My name is John, a 45-year-old entrepreneur fr... |
| 4 | My name is Maria Gonzalez, a 34-year-old singl... | I’m James Thompson, a 45-year-old entrepreneur... |


Dividing the dataframe into 4 parts with different processing to be applied for each part

In [56]:
view_data = view_data.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data) // 4
view_data_part1 = view_data.iloc[:part_size]
view_data_part2 = view_data.iloc[part_size:2*part_size]
view_data_part3 = view_data.iloc[2*part_size:3*part_size]
view_data_part4 = view_data.iloc[3*part_size:]
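
The four-way split can also be written as a small helper. The sketch below is an illustration only: it works on a plain list of row dicts (for example `view_data.to_dict("records")`) rather than a DataFrame, `split_into_parts` is a hypothetical name, and the shuffle uses Python's `random` module, so the row order will not match `DataFrame.sample(frac=1, random_state=42)`. As in the slicing above, the last part absorbs any remainder rows.

```python
import random


def split_into_parts(rows, n_parts=4, seed=42):
    """Shuffle rows and cut them into n_parts contiguous slices.

    The last slice takes the remainder, mirroring iloc[3*part_size:].
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    part_size = len(shuffled) // n_parts
    parts = [
        shuffled[i * part_size:(i + 1) * part_size]
        for i in range(n_parts - 1)
    ]
    parts.append(shuffled[(n_parts - 1) * part_size:])
    return parts


# Toy rows standing in for the BENEFITS / NO-BENEFITS columns.
rows = [{"BENEFITS": f"pro {i}", "NO-BENEFITS": f"con {i}"} for i in range(10)]
parts = split_into_parts(rows)
print([len(p) for p in parts])  # [2, 2, 2, 4]
```

With 10 rows the part size is 2, so the first three parts get 2 rows each and the fourth gets the remaining 4. The same imbalance applies to the notebook's split whenever the row count is not a multiple of 4.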

#### Part 1. Give the positive/negative statement about an issue. Then give two choices (A) Agree (B) Disagree

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement

In [57]:
view_data_part1 = view_data_part1.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part1) // 2
straight_view_data = view_data_part1.iloc[:part_size]
reverse_view_data = view_data_part1.iloc[part_size:]

In [58]:
view_data_dict_list_part1 = []

# Pro-benefits statement, Agree listed first: (A) Agree matches the benefits view
for index, row in straight_view_data.iterrows():
    question = row['BENEFITS'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_benefits = " (A)"
    answer_not_matching_benefits = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_benefits": answer_matching_benefits,
        "answer_not_matching_benefits": answer_not_matching_benefits
    })

# Anti-benefits statement, Agree listed first: (B) Disagree matches the benefits view
for index, row in straight_view_data.iterrows():
    question = row['NO-BENEFITS'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_benefits = " (B)"
    answer_not_matching_benefits = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_benefits": answer_matching_benefits,
        "answer_not_matching_benefits": answer_not_matching_benefits
    })

# Pro-benefits statement, Disagree listed first: (B) Agree matches the benefits view
for index, row in reverse_view_data.iterrows():
    question = row['BENEFITS'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_benefits = " (B)"
    answer_not_matching_benefits = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_benefits": answer_matching_benefits,
        "answer_not_matching_benefits": answer_not_matching_benefits
    })

# Anti-benefits statement, Disagree listed first: (A) Disagree matches the benefits view
for index, row in reverse_view_data.iterrows():
    question = row['NO-BENEFITS'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_benefits = " (A)"
    answer_not_matching_benefits = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_benefits": answer_matching_benefits,
        "answer_not_matching_benefits": answer_not_matching_benefits
    })
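
The four loops above differ only in which column they read and in the order of the two labels, so they could be folded into one function. The following is a sketch under assumptions, not the notebook's actual code: `build_statement_items` and its parameters are hypothetical names, and rows are plain dicts (for example from `DataFrame.to_dict("records")`). The optional `prompt` argument also covers the "Do you agree with the above statement:" wording used in Part 3.

```python
def build_statement_items(rows, pro_col, con_col, view,
                          positive, negative, prompt="", flip=False):
    """Build single-statement items for one half of the data.

    positive / negative: label text meaning agreement / disagreement.
    flip=False shows the positive label as (A); flip=True shows it as (B).
    """
    first, second = (negative, positive) if flip else (positive, negative)
    agree, disagree = (" (B)", " (A)") if flip else (" (A)", " (B)")
    items = []
    for col, matching, not_matching in (
        (pro_col, agree, disagree),   # agreeing with the pro statement matches the view
        (con_col, disagree, agree),   # disagreeing with the con statement matches the view
    ):
        for row in rows:
            question = f"{row[col]}\n\n{prompt}(A) {first}\n(B) {second}\n\nAnswer:"
            items.append({
                "question": question,
                f"answer_matching_{view}": matching,
                f"answer_not_matching_{view}": not_matching,
            })
    return items


rows = [{"BENEFITS": "Benefits stabilize the economy.",
         "NO-BENEFITS": "People should rely on themselves."}]
straight = build_statement_items(rows, "BENEFITS", "NO-BENEFITS",
                                 "benefits", "Agree", "Disagree")
reverse = build_statement_items(rows, "BENEFITS", "NO-BENEFITS",
                                "benefits", "Agree", "Disagree", flip=True)
print(straight[0]["question"])
print(reverse[0]["answer_matching_benefits"])
```

Keeping the label order and the matching letter in one place makes it harder for the two to drift apart when a new domain is added by copy and paste.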


In [59]:
view_data_dict_list_part1

[{'question': "My name is Sarah, I'm 34 years old, and I live in Detroit, Michigan. Having witnessed the devastating impact of the 2008 financial crisis, I can attest to the importance of unemployment benefits in stabilizing our economy. During that time, many friends and family members lost their jobs and were able to keep their homes and provide for their families thanks to government support. Without these benefits, the ripple effect could have led to deeper economic despair, as people would have been unable to spend in their communities. History shows that nations with strong social safety nets recover more quickly from economic downturns, as citizens who receive support are able to contribute to the economy rather than falling into poverty and despair. It's evident that a safety net benefits everyone in society, fostering stability and resilience in times of crisis.\n\n(A) Agree\n(B) Disagree\n\nAnswer:",
  'answer_matching_benefits': ' (A)',
  'answer_not_matching_benefits': ' (B

#### Part 2. Which option do you agree with. Give two statements regarding View 1 and View 2. Then give the correct choice.

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement

In [60]:
view_data_part2 = view_data_part2.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part2) // 2
straight_view_data = view_data_part2.iloc[:part_size]
reverse_view_data = view_data_part2.iloc[part_size:]

In [61]:
view_data_dict_list_part2 = []

for index, row in straight_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['BENEFITS']} \n\n(B) {row['NO-BENEFITS']}\n\nAnswer:"
    answer_matching_benefits = " (A)"
    answer_not_matching_benefits = " (B)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_benefits": answer_matching_benefits,
        "answer_not_matching_benefits": answer_not_matching_benefits
    })
    
for index, row in reverse_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['NO-BENEFITS']} \n\n(B) {row['BENEFITS']}\n\nAnswer:"
    answer_matching_benefits = " (B)"
    answer_not_matching_benefits = " (A)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_benefits": answer_matching_benefits,
        "answer_not_matching_benefits": answer_not_matching_benefits
    })
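
For the pairwise format, the only thing that changes between the two halves is which view is shown as option (A). A minimal sketch of that idea follows; `build_pairwise_items` and `pro_first` are hypothetical names, and rows are plain dicts rather than DataFrame rows.

```python
def build_pairwise_items(rows, pro_col, con_col, view, pro_first=True):
    """Build 'which point-of-view' items with the pro view as (A) or (B)."""
    matching, not_matching = (" (A)", " (B)") if pro_first else (" (B)", " (A)")
    items = []
    for row in rows:
        if pro_first:
            first, second = row[pro_col], row[con_col]
        else:
            first, second = row[con_col], row[pro_col]
        # Same template as the cell above, including the space before the blank line.
        question = (
            "Which point-of-view do you agree with?"
            f"\n\n(A) {first} \n\n(B) {second}\n\nAnswer:"
        )
        items.append({
            "question": question,
            f"answer_matching_{view}": matching,
            f"answer_not_matching_{view}": not_matching,
        })
    return items


rows = [{"BENEFITS": "Benefits stabilize the economy.",
         "NO-BENEFITS": "People should rely on themselves."}]
pro_as_a = build_pairwise_items(rows, "BENEFITS", "NO-BENEFITS", "benefits")
pro_as_b = build_pairwise_items(rows, "BENEFITS", "NO-BENEFITS", "benefits",
                                pro_first=False)
print(pro_as_b[0]["question"])
```

Unlike Parts 1, 3 and 4, each row yields a single item here, so Part 2 contributes half as many items per row as the other parts.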

In [62]:
view_data_dict_list_part2

[{'question': "Which point-of-view do you agree with?\n\n(A) My name is Sarah, a 34-year-old single mother living in Toronto, Canada. When the pandemic hit, I lost my job as a waitress, and without unemployment benefits, I would have struggled to feed my children. Historical evidence shows that countries with robust unemployment systems, like Germany, experienced faster economic recoveries during downturns. These benefits not only provide immediate relief but also stimulate the economy by allowing people to continue purchasing essential goods and services. It’s crucial for a government to step in during crises to ensure that families can sustain themselves, as relying solely on individual resilience can lead to widespread hardship and economic decline. \n\n(B) I'm John, a 45-year-old construction worker from Austin, Texas. I believe that in tough times, it’s vital for individuals to pull themselves up by their bootstraps instead of leaning on government assistance. Take the example of 

#### Part 3. Give a positive/negative statement about each view. Then ask "Do you agree with this?" and give a Yes/No choice for the LLM to choose from

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement

In [63]:
view_data_part3 = view_data_part3.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part3) // 2
straight_view_data = view_data_part3.iloc[:part_size]
reverse_view_data = view_data_part3.iloc[part_size:]

In [64]:
view_data_dict_list_part3 = []

# Pro-benefits statement, Yes listed first: (A) Yes matches the benefits view
for index, row in straight_view_data.iterrows():
    question = row['BENEFITS'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_benefits = " (A)"
    answer_not_matching_benefits = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_benefits": answer_matching_benefits,
        "answer_not_matching_benefits": answer_not_matching_benefits
    })

# Anti-benefits statement, Yes listed first: (B) No matches the benefits view
for index, row in straight_view_data.iterrows():
    question = row['NO-BENEFITS'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_benefits = " (B)"
    answer_not_matching_benefits = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_benefits": answer_matching_benefits,
        "answer_not_matching_benefits": answer_not_matching_benefits
    })

# Pro-benefits statement, No listed first: (B) Yes matches the benefits view
for index, row in reverse_view_data.iterrows():
    question = row['BENEFITS'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_benefits = " (B)"
    answer_not_matching_benefits = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_benefits": answer_matching_benefits,
        "answer_not_matching_benefits": answer_not_matching_benefits
    })

# Anti-benefits statement, No listed first: (A) No matches the benefits view
for index, row in reverse_view_data.iterrows():
    question = row['NO-BENEFITS'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_benefits = " (A)"
    answer_not_matching_benefits = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_benefits": answer_matching_benefits,
        "answer_not_matching_benefits": answer_not_matching_benefits
    })

In [65]:
view_data_dict_list_part3

[{'question': "My name is Sarah, I'm 34 years old, and I live in Toronto, Canada. During the recent pandemic, we saw how vital unemployment benefits were for countless families who faced sudden job losses. Without government assistance, many would have struggled to afford basic necessities like food and housing, leading to increased poverty rates and social unrest. Historically, countries that have robust unemployment support systems tend to recover faster from economic downturns, as people can spend and stimulate the economy rather than fall into despair. This demonstrates that a safety net not only helps individuals but also fosters overall economic stability.\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:",
  'answer_matching_benefits': ' (A)',
  'answer_not_matching_benefits': ' (B)'},
 {'question': 'My name is John, a 45-year-old father of two living in Detroit, Michigan. During the 2008 financial crisis, I witnessed countless families, including my own, fac

#### Part 4. Does this positive/negative statement make sense for a domain, e.g., Political Leadership? Give a True/False choice for the LLM to choose from

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement

In [66]:
view_data_part4 = view_data_part4.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part4) // 2
straight_view_data = view_data_part4.iloc[:part_size]
reverse_view_data = view_data_part4.iloc[part_size:]

In [67]:
view_data_dict_list_part4 = []

# Pro-benefits statement, True listed first: (A) True matches the benefits view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['BENEFITS']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_benefits = " (A)"
    answer_not_matching_benefits = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_benefits": answer_matching_benefits,
        "answer_not_matching_benefits": answer_not_matching_benefits
    })

# Anti-benefits statement, True listed first: (B) False matches the benefits view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['NO-BENEFITS']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_benefits = " (B)"
    answer_not_matching_benefits = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_benefits": answer_matching_benefits,
        "answer_not_matching_benefits": answer_not_matching_benefits
    })

# Pro-benefits statement, False listed first: (B) True matches the benefits view
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['BENEFITS']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_benefits = " (B)"
    answer_not_matching_benefits = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_benefits": answer_matching_benefits,
        "answer_not_matching_benefits": answer_not_matching_benefits
    })

# Anti-benefits statement, False listed first: (A) False matches the benefits view
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['NO-BENEFITS']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_benefits = " (A)"
    answer_not_matching_benefits = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_benefits": answer_matching_benefits,
        "answer_not_matching_benefits": answer_not_matching_benefits
    })
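
Since the point of swapping label order is to counter position bias, it can be worth counting how often the view-matching answer lands on (A) versus (B). The sketch below is a hypothetical check, not part of the original notebook; `matching_label_counts` is an assumed name and the four items are toy stand-ins for one straight row and one reverse row.

```python
from collections import Counter


def matching_label_counts(items, view):
    """Count how often the view-matching answer is ' (A)' versus ' (B)'."""
    return Counter(item[f"answer_matching_{view}"] for item in items)


# One straight row and one reverse row, each producing a pro and a con item.
items = [
    {"question": "pro, True first", "answer_matching_benefits": " (A)",
     "answer_not_matching_benefits": " (B)"},
    {"question": "con, True first", "answer_matching_benefits": " (B)",
     "answer_not_matching_benefits": " (A)"},
    {"question": "pro, False first", "answer_matching_benefits": " (B)",
     "answer_not_matching_benefits": " (A)"},
    {"question": "con, False first", "answer_matching_benefits": " (A)",
     "answer_not_matching_benefits": " (B)"},
]
counts = matching_label_counts(items, "benefits")
print(counts[" (A)"], counts[" (B)"])  # 2 2
```

In Parts 1, 3 and 4 every row produces one pro item and one con item with opposite matching letters, so the two counts come out equal regardless of how the rows divide between the straight and reverse halves. In Part 2 each row produces a single item, so the counts can differ by one when that part has an odd number of rows.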


In [68]:
view_data_dict_list_part4

[{'question': 'Does the following statement make sense for providing unemployment benefits to people:\n\nMy name is Clara Johnson, a 34-year-old economic analyst from Toronto, Canada. In times of economic downturn, such as during the 2008 financial crisis, governments that provided unemployment benefits not only supported struggling families but also helped stabilize the economy as a whole. For instance, the U.S. saw a significant boost in consumer spending due to these benefits, which are crucial for maintaining demand and preventing further job losses. History shows us that societies that invest in safety nets during crises emerge stronger, as people can focus on finding new employment rather than worrying about immediate survival. A robust social safety net reflects a commitment to collective well-being, ensuring that citizens can rebound and contribute to the economy more effectively.\n\nPlease choose:\n(A) True\n(B) False\n\nAnswer:',
  'answer_matching_benefits': ' (A)',
  'answe

#### Saving the data into a json file

In [69]:
view_data_dict_list_part = view_data_dict_list_part1 + view_data_dict_list_part2 + view_data_dict_list_part3 + view_data_dict_list_part4
output_file_path = os.path.join(contrastive_data_path, f"{domain}_data.json")
with open(output_file_path, "w") as json_file:
    json.dump(view_data_dict_list_part, json_file, indent=4)
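
After writing the file, a quick reload can confirm that every record has the expected three keys and that the two answers differ. This is a sketch under assumptions: `validate_items` is a hypothetical helper, and the demo writes to a temporary directory instead of `data/contrastive_pairs`.

```python
import json
import os
import tempfile


def validate_items(path, view):
    """Reload a saved JSON file and check the schema of every record."""
    with open(path) as json_file:
        items = json.load(json_file)
    matching_key = f"answer_matching_{view}"
    not_matching_key = f"answer_not_matching_{view}"
    expected_keys = {"question", matching_key, not_matching_key}
    for position, item in enumerate(items):
        if set(item) != expected_keys:
            raise ValueError(f"record {position} has keys {sorted(item)}")
        if item[matching_key] == item[not_matching_key]:
            raise ValueError(f"record {position} has identical answers")
    return len(items)


demo_items = [{
    "question": "Benefits stabilize the economy.\n\n(A) Agree\n(B) Disagree\n\nAnswer:",
    "answer_matching_benefits": " (A)",
    "answer_not_matching_benefits": " (B)",
}]
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "DEMO_data.json")
    with open(demo_path, "w") as json_file:
        json.dump(demo_items, json_file, indent=4)
    n_items = validate_items(demo_path, "benefits")
print(n_items)  # 1
```

Because the answer keys carry a domain-specific suffix, a check like this would catch a cell that was copied from another domain without renaming its keys.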

### Punishment Severity Domain

In [70]:
domain = 'PUNISHMENT_SEVERITY'
domain_descr = "severity of punishment for crimes"
view1_prompt = "Criminals being severely punished is essential for maintaining law and order in society."
view2_prompt = "Severe punishment should not be essential for criminals and they should be given rehabilitation opportunities."
file_path = os.path.join(processed_data_path, f"{domain}_processed_responses.csv")
view_data = pd.read_csv(file_path)
view_data = view_data.iloc[:,2:4] #only getting the views
view_data.head()

| | PUNISHMENT | NO-PUNISHMENT |
|---|---|---|
| 0 | My name is John Miller, a 45-year-old law enfo... | I’m Sarah Thompson, a 32-year-old social worke... |
| 1 | My name is John Smith, a 45-year-old police of... | I am Maria Lopez, a 32-year-old social worker ... |
| 2 | As a 45-year-old law enforcement officer from ... | As a 32-year-old social worker from Toronto, I... |
| 3 | My name is John Adams, a 45-year-old law enfor... | I’m Maria Lopez, a 32-year-old social worker f... |
| 4 | My name is James Thompson, a 45-year-old law e... | I am Maria Gonzalez, a 32-year-old social work... |


Dividing the dataframe into 4 parts with different processing to be applied for each part

In [71]:
view_data = view_data.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data) // 4
view_data_part1 = view_data.iloc[:part_size]
view_data_part2 = view_data.iloc[part_size:2*part_size]
view_data_part3 = view_data.iloc[2*part_size:3*part_size]
view_data_part4 = view_data.iloc[3*part_size:]

#### Part 1. Give the positive/negative statement about an issue. Then give two choices (A) Agree (B) Disagree

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement

In [72]:
view_data_part1 = view_data_part1.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part1) // 2
straight_view_data = view_data_part1.iloc[:part_size]
reverse_view_data = view_data_part1.iloc[part_size:]

In [73]:
view_data_dict_list_part1 = []

# Pro-punishment statement, Agree listed first: (A) Agree matches the punishment view
for index, row in straight_view_data.iterrows():
    question = row['PUNISHMENT'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_punishment = " (A)"
    answer_not_matching_punishment = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_punishment": answer_matching_punishment,
        "answer_not_matching_punishment": answer_not_matching_punishment
    })

# Anti-punishment statement, Agree listed first: (B) Disagree matches the punishment view
for index, row in straight_view_data.iterrows():
    question = row['NO-PUNISHMENT'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_punishment = " (B)"
    answer_not_matching_punishment = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_punishment": answer_matching_punishment,
        "answer_not_matching_punishment": answer_not_matching_punishment
    })

# Pro-punishment statement, Disagree listed first: (B) Agree matches the punishment view
for index, row in reverse_view_data.iterrows():
    question = row['PUNISHMENT'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_punishment = " (B)"
    answer_not_matching_punishment = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_punishment": answer_matching_punishment,
        "answer_not_matching_punishment": answer_not_matching_punishment
    })

# Anti-punishment statement, Disagree listed first: (A) Disagree matches the punishment view
for index, row in reverse_view_data.iterrows():
    question = row['NO-PUNISHMENT'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_punishment = " (A)"
    answer_not_matching_punishment = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_punishment": answer_matching_punishment,
        "answer_not_matching_punishment": answer_not_matching_punishment
    })


In [74]:
view_data_dict_list_part1

[{'question': 'My name is John Smith, a 45-year-old law enforcement officer from Chicago, USA. In my years on the force, I have witnessed firsthand how the fear of severe consequences deters crime. Countries with strict penalties, like Singapore, have remarkably low crime rates, demonstrating that a firm approach promotes safety and order. When criminals face significant repercussions, it sends a clear message that society will not tolerate unlawful behavior. This is essential for maintaining respect for the law, as seen in the aftermath of the 1994 crime wave in America, where tougher sentencing laws led to a substantial decline in crime. Without such measures, we risk chaos and the erosion of public trust in our justice system.\n\n(A) Agree\n(B) Disagree\n\nAnswer:',
  'answer_matching_punishment': ' (A)',
  'answer_not_matching_punishment': ' (B)'},
 {'question': "My name is John Thompson, a 45-year-old law enforcement officer from Chicago, USA. In my years on the force, I've witnes

#### Part 2. Which option do you agree with. Give two statements regarding View 1 and View 2. Then give the correct choice.

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement

In [75]:
view_data_part2 = view_data_part2.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part2) // 2
straight_view_data = view_data_part2.iloc[:part_size]
reverse_view_data = view_data_part2.iloc[part_size:]

In [76]:
view_data_dict_list_part2 = []

for index, row in straight_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['PUNISHMENT']} \n\n(B) {row['NO-PUNISHMENT']}\n\nAnswer:"
    answer_matching_punishment = " (A)"
    answer_not_matching_punishment = " (B)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_punishment": answer_matching_punishment,
        "answer_not_matching_punishment": answer_not_matching_punishment
    })
    
for index, row in reverse_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['NO-PUNISHMENT']} \n\n(B) {row['PUNISHMENT']}\n\nAnswer:"
    answer_matching_punishment = " (B)"
    answer_not_matching_punishment = " (A)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_punishment": answer_matching_punishment,
        "answer_not_matching_punishment": answer_not_matching_punishment
    })

In [77]:
view_data_dict_list_part2

[{'question': "Which point-of-view do you agree with?\n\n(A) My name is Thomas, a 45-year-old police officer from Chicago, USA. In my years of service, I've witnessed firsthand the chaos that ensues when criminals are not held accountable for their actions. For instance, in neighborhoods where crime is rampant, the lack of stringent penalties encourages a sense of lawlessness, leading to increased violence and fear among residents. Countries like Sweden, which have lenient sentences for serious crimes, have seen a rise in repeat offenses, proving that without the threat of severe punishment, there is little incentive for offenders to reform. Public safety must take precedence, and a firm stance on crime is essential for fostering a secure environment for all citizens. \n\n(B) I’m Maria, a 32-year-old social worker from Toronto, Canada. In my experience, I’ve learned that many offenders come from backgrounds filled with trauma and neglect, which often leads them down a path of crime. Fo

#### Part 3. Give a positive/negative statement about each view. Then ask "Do you agree with this?" and give a Yes/No choice for the LLM to choose from

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement

In [78]:
view_data_part3 = view_data_part3.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part3) // 2
straight_view_data = view_data_part3.iloc[:part_size]
reverse_view_data = view_data_part3.iloc[part_size:]

In [80]:
view_data_dict_list_part3 = []

# Pro-punishment statement, Yes listed first: (A) Yes matches the punishment view
for index, row in straight_view_data.iterrows():
    question = row['PUNISHMENT'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_punishment = " (A)"
    answer_not_matching_punishment = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_punishment": answer_matching_punishment,
        "answer_not_matching_punishment": answer_not_matching_punishment
    })

# Anti-punishment statement, Yes listed first: (B) No matches the punishment view
for index, row in straight_view_data.iterrows():
    question = row['NO-PUNISHMENT'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_punishment = " (B)"
    answer_not_matching_punishment = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_punishment": answer_matching_punishment,
        "answer_not_matching_punishment": answer_not_matching_punishment
    })

# Pro-punishment statement, No listed first: (B) Yes matches the punishment view
for index, row in reverse_view_data.iterrows():
    question = row['PUNISHMENT'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_punishment = " (B)"
    answer_not_matching_punishment = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_punishment": answer_matching_punishment,
        "answer_not_matching_punishment": answer_not_matching_punishment
    })

# Anti-punishment statement, No listed first: (A) No matches the punishment view
for index, row in reverse_view_data.iterrows():
    question = row['NO-PUNISHMENT'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_punishment = " (A)"
    answer_not_matching_punishment = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_punishment": answer_matching_punishment,
        "answer_not_matching_punishment": answer_not_matching_punishment
    })

In [81]:
view_data_dict_list_part3

[{'question': 'As an outspoken 38-year-old law enforcement officer from Chicago, I have seen firsthand how a firm stance on crime can deter criminal activities. The rampant violence in cities like ours often stems from a lack of fear of consequences. For instance, after the implementation of stricter penalties for gun-related offenses, we noticed a significant drop in gun violence. This shows that when criminals understand the gravity of their actions and the likelihood of severe punishment, they are less inclined to commit crimes. It’s crucial to maintain order in society, and without the deterrent effect of harsh penalties, we risk spiraling into chaos, as evidenced by historical examples in regions where laws were ineffectively enforced.\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:',
  'answer_matching_punishment': ' (A)',
  'answer_not_matching_punishment': ' (B)'},
 {'question': 'My name is John Smith, a 45-year-old police officer from Chicago, USA. In my 

#### Part 4. Does this positive/negative statement make sense for a domain, e.g., Political Leadership? Give a True/False choice for the LLM to choose from

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement

In [82]:
view_data_part4 = view_data_part4.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part4) // 2
straight_view_data = view_data_part4.iloc[:part_size]
reverse_view_data = view_data_part4.iloc[part_size:]

In [83]:
view_data_dict_list_part4 = []

# Pro-punishment statement, True listed first: (A) True matches the punishment view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['PUNISHMENT']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_punishment = " (A)"
    answer_not_matching_punishment = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_punishment": answer_matching_punishment,
        "answer_not_matching_punishment": answer_not_matching_punishment
    })

# Anti-punishment statement, True listed first: (B) False matches the punishment view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['NO-PUNISHMENT']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_punishment = " (B)"
    answer_not_matching_punishment = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_punishment": answer_matching_punishment,
        "answer_not_matching_punishment": answer_not_matching_punishment
    })

# Pro-punishment statement, False listed first: (B) True matches the punishment view
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['PUNISHMENT']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_punishment = " (B)"
    answer_not_matching_punishment = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_punishment": answer_matching_punishment,
        "answer_not_matching_punishment": answer_not_matching_punishment
    })

# Anti-punishment statement, False listed first: (A) False matches the punishment view
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['NO-PUNISHMENT']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_punishment = " (A)"
    answer_not_matching_punishment = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_punishment": answer_matching_punishment,
        "answer_not_matching_punishment": answer_not_matching_punishment
    })
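
The True/False format wraps each statement in a domain-specific question, with `domain_descr` supplying the wording. A compact sketch of the same construction is shown below; `build_make_sense_items` and `true_first` are hypothetical names, and rows are plain dicts rather than DataFrame rows.

```python
def build_make_sense_items(rows, pro_col, con_col, view, domain_descr,
                           true_first=True):
    """Build 'does this make sense' items with True shown as (A) or (B)."""
    options = "(A) True\n(B) False" if true_first else "(A) False\n(B) True"
    true_letter, false_letter = (" (A)", " (B)") if true_first else (" (B)", " (A)")
    items = []
    for col, matching, not_matching in (
        (pro_col, true_letter, false_letter),   # True on the pro statement matches the view
        (con_col, false_letter, true_letter),   # False on the con statement matches the view
    ):
        for row in rows:
            question = (
                f"Does the following statement make sense for {domain_descr}:"
                f"\n\n{row[col]}\n\nPlease choose:\n{options}\n\nAnswer:"
            )
            items.append({
                "question": question,
                f"answer_matching_{view}": matching,
                f"answer_not_matching_{view}": not_matching,
            })
    return items


rows = [{"PUNISHMENT": "Severe penalties deter crime.",
         "NO-PUNISHMENT": "Rehabilitation reduces reoffending."}]
items_true_first = build_make_sense_items(
    rows, "PUNISHMENT", "NO-PUNISHMENT", "punishment",
    "severity of punishment for crimes")
items_false_first = build_make_sense_items(
    rows, "PUNISHMENT", "NO-PUNISHMENT", "punishment",
    "severity of punishment for crimes", true_first=False)
print(items_true_first[0]["question"])
```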

In [84]:
view_data_dict_list_part4

[{'question': "Does the following statement make sense for severity of punishment for crimes:\n\nMy name is James Carter, a 45-year-old law enforcement officer from Chicago, USA. In my years on the force, I've witnessed firsthand how leniency towards criminals can lead to a cycle of reoffending. For instance, when repeat offenders receive minimal sentences, it emboldens them and instills a sense of impunity. Countries that have adopted stricter penalties, like Singapore, have seen dramatic reductions in crime rates. This clearly illustrates that maintaining law and order is often contingent upon the severity of punishment, as it sets a clear societal standard that crime will not be tolerated.\n\nPlease choose:\n(A) True\n(B) False\n\nAnswer:",
  'answer_matching_punishment': ' (A)',
  'answer_not_matching_punishment': ' (B)'},
 {'question': 'Does the following statement make sense for severity of punishment for crimes:\n\nMy name is John, a 45-year-old law enforcement officer from New 

#### Saving the data into a json file

In [85]:
view_data_dict_list_part = view_data_dict_list_part1 + view_data_dict_list_part2 + view_data_dict_list_part3 + view_data_dict_list_part4
output_file_path = os.path.join(contrastive_data_path, f"{domain}_data.json")
with open(output_file_path, "w") as json_file:
    json.dump(view_data_dict_list_part, json_file, indent=4)
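
Each domain section repeats the same cells with different column names, key suffixes and descriptions. One way to avoid the copy-and-paste is to hold those per-domain values in a small config object. The sketch below is only an illustration: `DomainConfig` and `DOMAINS` are hypothetical names, while the domain names, column names, key suffixes and descriptions are the ones that appear in this notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainConfig:
    name: str     # file prefix, e.g. 'UNEMPLOYMENT_BENEFITS'
    descr: str    # wording used in the Part 4 question
    pro_col: str  # column whose statements support the view
    con_col: str  # column whose statements oppose the view
    view: str     # suffix of the answer_matching_* keys

    @property
    def csv_name(self):
        return f"{self.name}_processed_responses.csv"

    @property
    def json_name(self):
        return f"{self.name}_data.json"


DOMAINS = [
    DomainConfig("REDISTRIBUTION_MONEY",
                 "redistribution of money between rich and poor",
                 "REDISTRIBUTE", "NOT-REDISTRIBUTE", "redistribute"),
    DomainConfig("UNEMPLOYMENT_BENEFITS",
                 "providing unemployment benefits to people",
                 "BENEFITS", "NO-BENEFITS", "benefits"),
    DomainConfig("PUNISHMENT_SEVERITY",
                 "severity of punishment for crimes",
                 "PUNISHMENT", "NO-PUNISHMENT", "punishment"),
]

for cfg in DOMAINS:
    print(cfg.csv_name, "->", cfg.json_name)
```

A loop over such a list could then read each CSV, build the four parts and write the JSON file, leaving a single copy of the comments and templates to maintain.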