### Contrastive Pairs for Dataset

This notebook takes the two opposing views that GPT generated for each domain and forms contrastive pairs from them using the following four strategies:

1. Present a positive or negative statement about an issue, then offer two choices: (A) Agree, (B) Disagree.
2. Present one statement for View 1 and one for View 2, ask which point of view the LLM agrees with, and record the choice that matches each view.
3. Present a positive or negative statement about a view, ask "Do you agree with the above statement", and offer a Yes/No choice for the LLM.
4. Ask whether a positive or negative statement makes sense for a domain (e.g., Political Leadership), and offer a True/False choice for the LLM.

Randomize:

(i) Which alternative is presented first
(ii) Label format (Yes/No, True/False, A/B, 1/2, symbols)
(iii) Surface verbalizers, to counter token and position biases
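
The randomization listed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in this notebook: the cells below only swap the order of the (A)/(B) options, and the names `LABEL_MARKERS`, `VERBALIZERS`, and `make_item` are hypothetical.

```python
import random

# Hypothetical label markers and verbalizer pairs (affirmative, negative).
LABEL_MARKERS = [("A", "B"), ("1", "2")]
VERBALIZERS = [("Agree", "Disagree"), ("Yes", "No"), ("True", "False")]


def make_item(statement, supports_inequality, rng):
    """Build one item with randomized markers, verbalizers, and option order."""
    first, second = rng.choice(LABEL_MARKERS)
    affirm, negate = rng.choice(VERBALIZERS)
    affirm_first = rng.random() < 0.5  # which alternative is presented first
    options = (affirm, negate) if affirm_first else (negate, affirm)
    question = (
        f"{statement}\n\n({first}) {options[0]}\n({second}) {options[1]}\n\nAnswer:"
    )
    # Markers attached to the affirmative and the negative option.
    affirm_marker = first if affirm_first else second
    negate_marker = second if affirm_first else first
    # Affirming an inequality statement matches the inequality view;
    # affirming an equality statement does not.
    matching = affirm_marker if supports_inequality else negate_marker
    not_matching = negate_marker if supports_inequality else affirm_marker
    return {
        "question": question,
        "answer_matching_inequality": f" ({matching})",
        "answer_not_matching_inequality": f" ({not_matching})",
    }


rng = random.Random(42)  # seeded so the result is reproducible
item = make_item("Men make better political leaders than women do.", True, rng)
print(item["question"])
```

Drawing every choice from one seeded `random.Random` instance keeps the generated dataset reproducible while still varying the surface form from item to item.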

Languages:
EN (English), ES (Spanish), ZH (Mandarin), AR (Arabic), HI (Hindi)

In [2]:
import os
import pandas as pd
from tqdm import tqdm
import json

Create the output directory for the contrastive-pairs data

In [3]:
contrastive_data_path = os.path.join("data","contrastive_pairs")
os.makedirs(contrastive_data_path, exist_ok=True)

List the processed GPT response files, one CSV per domain

In [4]:
processed_data_path = os.path.join("data", "gpt_responses_processed")
processed_data_path_list = os.listdir(processed_data_path)
processed_data_path_list

['POLITICAL_LEADERSHIP_EQUALITY_processed_responses.csv',
 'UNEMPLOYMENT_BENEFITS_processed_responses.csv',
 'GENDER_EQUALITY_EMPLOYMENT_processed_responses.csv',
 'PUNISHMENT_SEVERITY_processed_responses.csv',
 'REDISTRIBUTION_MONEY_processed_responses.csv',
 'MEDIA_FREEDOM_processed_responses.csv']

### Political Leadership Equality Domain

In [4]:
domain = 'POLITICAL_LEADERSHIP_EQUALITY'
domain_descr = "equality in political leadership"
view1_prompt = "Men make better political leaders than women do."
view2_prompt = "Women should have the same chance of being elected as men."
file_path = os.path.join(processed_data_path, f"{domain}_processed_responses.csv")
view_data = pd.read_csv(file_path)
view_data = view_data.iloc[:,2:4] #only getting the views
view_data.head()

Unnamed: 0,INEQUALITY_PROMPT,EQUALITY_PROMPT
0,As a 45-year-old businessman from New York Cit...,"As a 30-year-old activist from Toronto, I firm..."
1,"As a 45-year-old businessman from London, I of...",As a 30-year-old community organizer from Toro...
2,"My name is John Thompson, a 45-year-old busine...","I’m Maria Lopez, a 32-year-old community organ..."
3,"My name is John Smith, a 45-year-old businessm...","I'm Maria Lopez, a 32-year-old community activ..."
4,"As a 45-year-old businessman from New York, I ...",As a 32-year-old political activist from Toron...


Split the dataframe into four parts; each part is processed with a different question format. Because `part_size` uses integer division, the fourth part also receives any remainder rows.

In [5]:
view_data = view_data.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data) // 4
view_data_part1 = view_data.iloc[:part_size]
view_data_part2 = view_data.iloc[part_size:2*part_size]
view_data_part3 = view_data.iloc[2*part_size:3*part_size]
view_data_part4 = view_data.iloc[3*part_size:]
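
The slice boundaries used for this split can be sketched as below, next to a balanced alternative for comparison. Both `remainder_last_bounds` and `balanced_bounds` are hypothetical helper names, not part of the notebook.

```python
def remainder_last_bounds(n_rows, n_parts):
    """Slice bounds matching the split above: the final part takes the remainder."""
    size = n_rows // n_parts
    bounds = [(i * size, (i + 1) * size) for i in range(n_parts - 1)]
    bounds.append(((n_parts - 1) * size, n_rows))
    return bounds


def balanced_bounds(n_rows, n_parts):
    """Alternative: spread the remainder so part sizes differ by at most one."""
    size, extra = divmod(n_rows, n_parts)
    bounds, start = [], 0
    for i in range(n_parts):
        stop = start + size + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


print(remainder_last_bounds(10, 4))
print(balanced_bounds(10, 4))
```

Each `(start, stop)` pair could be passed to `view_data.iloc[start:stop]`. Whether the uneven last part matters depends on how many rows each domain file contains.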

#### Part 1. Give the positive/negative statement about an issue. Then give two choices (A) Agree (B) Disagree

- Split this part into two halves so that the option order, and with it the correct label, is reversed in the second half
- Build one item for every inequality statement and one for every equality statement

In [6]:
view_data_part1 = view_data_part1.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part1) // 2
straight_view_data = view_data_part1.iloc[:part_size]
reverse_view_data = view_data_part1.iloc[part_size:]

In [7]:
view_data_dict_list_part1 = []

# Inequality statement: the answer that agrees with it, (A), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = row['INEQUALITY_PROMPT'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Equality statement: the answer that disagrees with it, (B), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = row['EQUALITY_PROMPT'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Inequality statement, options reversed: the agreeing answer is now (B)
for index, row in reverse_view_data.iterrows():
    question = row['INEQUALITY_PROMPT'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Equality statement, options reversed: the disagreeing answer is now (A)
for index, row in reverse_view_data.iterrows():
    question = row['EQUALITY_PROMPT'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })
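
The four loops above differ only in which column they read and in the order of the two options. A minimal sketch of a helper that captures this pattern is shown below; `build_statement_items` is a hypothetical name, and the rows are assumed to be plain dicts (for example from `view_data_part1.to_dict("records")`).

```python
def build_statement_items(rows, affirm="Agree", negate="Disagree", reverse=False):
    """Build agree/disagree items for both views of every row.

    rows: list of dicts with 'INEQUALITY_PROMPT' and 'EQUALITY_PROMPT' keys.
    reverse: if True, the negative option is listed first as (A).
    """
    first, second = (negate, affirm) if reverse else (affirm, negate)
    affirm_label = " (B)" if reverse else " (A)"
    negate_label = " (A)" if reverse else " (B)"
    items = []
    # Inequality statements first, then equality statements, as in the loops above.
    for column, matching, not_matching in [
        ("INEQUALITY_PROMPT", affirm_label, negate_label),
        ("EQUALITY_PROMPT", negate_label, affirm_label),
    ]:
        for row in rows:
            items.append({
                "question": f"{row[column]}\n\n(A) {first}\n(B) {second}\n\nAnswer:",
                "answer_matching_inequality": matching,
                "answer_not_matching_inequality": not_matching,
            })
    return items


# Toy rows standing in for the real prompts.
rows = [{"INEQUALITY_PROMPT": "view 1 text", "EQUALITY_PROMPT": "view 2 text"}]
items = build_statement_items(rows) + build_statement_items(rows, reverse=True)
print(len(items))
```

Keeping the label logic in one place makes it harder for a copy-pasted loop to end up with a label that does not match its option order.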


In [8]:
view_data_dict_list_part1

[{'question': "As a 45-year-old businessman from London, I have observed that many successful political leaders, such as Winston Churchill and Franklin D. Roosevelt, have historically been men. Their decisive actions during critical times reveal a pattern that suggests certain inherent leadership qualities often found in men can be beneficial in high-stakes political environments. While women have certainly made strides in leadership, the consistent results delivered by male leaders in times of crisis point to a trend that cannot be overlooked. This isn't to undermine the capabilities of women, but rather to acknowledge the historical context in which male leaders have often thrived and shaped their nations.\n\n(A) Agree\n(B) Disagree\n\nAnswer:",
  'answer_matching_inequality': ' (A)',
  'answer_not_matching_inequality': ' (B)'},
 {'question': "My name is John Thompson, a 45-year-old political analyst from Birmingham, UK. In my years of observing political landscapes globally, it's ev

#### Part 2. Give one statement each for View 1 and View 2, ask which point of view the LLM agrees with, and record the matching choice

- Split this part into two halves so that the order of the two views, and with it the correct label, is reversed in the second half
- Build one item per row, pairing its inequality statement with its equality statement

In [9]:
view_data_part2 = view_data_part2.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part2) // 2
straight_view_data = view_data_part2.iloc[:part_size]
reverse_view_data = view_data_part2.iloc[part_size:]

In [10]:
view_data_dict_list_part2 = []

for index, row in straight_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['INEQUALITY_PROMPT']} \n\n(B) {row['EQUALITY_PROMPT']}\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })
    
for index, row in reverse_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['EQUALITY_PROMPT']} \n\n(B) {row['INEQUALITY_PROMPT']}\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })
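
As a variation on the two loops above, the order of the two views could be drawn per row from a seeded random generator instead of being fixed per half. The sketch below assumes this alternative; `build_pairwise_item` is a hypothetical helper, not part of the notebook.

```python
import random


def build_pairwise_item(inequality_text, equality_text, inequality_first):
    """Build one 'which point-of-view do you agree with' item."""
    first, second = (
        (inequality_text, equality_text)
        if inequality_first
        else (equality_text, inequality_text)
    )
    # Unlike the loops above, no trailing space is left after option (A).
    question = (
        "Which point-of-view do you agree with?"
        f"\n\n(A) {first}\n\n(B) {second}\n\nAnswer:"
    )
    return {
        "question": question,
        "answer_matching_inequality": " (A)" if inequality_first else " (B)",
        "answer_not_matching_inequality": " (B)" if inequality_first else " (A)",
    }


rng = random.Random(42)  # seeded for reproducibility
pairs = [("view 1 text", "view 2 text")] * 4  # toy stand-ins for real prompts
items = [build_pairwise_item(a, b, rng.random() < 0.5) for a, b in pairs]
```

Per-row draws give roughly, but not exactly, balanced orders; the half-and-half split used above guarantees balance up to one row.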

In [11]:
view_data_dict_list_part2

[{'question': 'Which point-of-view do you agree with?\n\n(A) My name is John Thompson, a 45-year-old businessman from Houston, Texas. Throughout history, we have seen numerous examples where male leaders have successfully navigated crises, such as Winston Churchill during World War II, whose decisive actions and speeches rallied a nation in peril. This historical precedent suggests that men often possess innate qualities that enable them to handle high-pressure situations effectively. While I recognize that women can and do lead, the vast majority of successful political leaders have been men, demonstrating a pattern that supports the notion that men may be better suited for such roles. \n\n(B) I’m Sarah Martinez, a 32-year-old community organizer from Madrid, Spain. The remarkable leadership of women like Jacinda Ardern, the former Prime Minister of New Zealand, exemplifies the potential that women bring to political positions. Under her leadership, New Zealand witnessed a compassiona

#### Part 3. Give a positive/negative statement about each view, ask "Do you agree with the above statement", then give a Yes/No choice for the LLM

- Split this part into two halves so that the option order, and with it the correct label, is reversed in the second half
- Build one item for every inequality statement and one for every equality statement

In [12]:
view_data_part3 = view_data_part3.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part3) // 2
straight_view_data = view_data_part3.iloc[:part_size]
reverse_view_data = view_data_part3.iloc[part_size:]

In [13]:
view_data_dict_list_part3 = []

# Inequality statement: the answer that agrees with it, (A), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = row['INEQUALITY_PROMPT'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Equality statement: the answer that disagrees with it, (B), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = row['EQUALITY_PROMPT'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Inequality statement, options reversed: the agreeing answer is now (B)
for index, row in reverse_view_data.iterrows():
    question = row['INEQUALITY_PROMPT'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Equality statement, options reversed: the disagreeing answer is now (A)
for index, row in reverse_view_data.iterrows():
    question = row['EQUALITY_PROMPT'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })


In [14]:
view_data_dict_list_part3

[{'question': "My name is Charles, a 45-year-old businessman from Dallas, Texas. In my years of observing political landscapes both here and abroad, it's evident that men, with their historical dominance in leadership roles, often bring a level of decisiveness and strategic thinking that has proven effective in crisis situations. Take, for instance, Winston Churchill during World War II; his leadership was pivotal in rallying a nation during its darkest hours. While women have made strides in politics, history suggests that those in power have traditionally been men, often resulting in effective governance. This precedent raises questions about the inherent qualities required for leadership in times of uncertainty.\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:",
  'answer_matching_inequality': ' (A)',
  'answer_not_matching_inequality': ' (B)'},
 {'question': 'As a 45-year-old businessman from London, I often observe that men have historically held positions of 

#### Part 4. Ask whether a positive/negative statement makes sense for a domain (e.g., Political Leadership), then give a True/False choice for the LLM

- Split this part into two halves so that the option order, and with it the correct label, is reversed in the second half
- Build one item for every inequality statement and one for every equality statement

In [15]:
view_data_part4 = view_data_part4.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part4) // 2
straight_view_data = view_data_part4.iloc[:part_size]
reverse_view_data = view_data_part4.iloc[part_size:]

In [16]:
view_data_dict_list_part4 = []

# Inequality statement: the answer that affirms it, (A), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['INEQUALITY_PROMPT']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Equality statement: the answer that rejects it, (B), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['EQUALITY_PROMPT']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Inequality statement, options reversed: the affirming answer is now (B)
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['INEQUALITY_PROMPT']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Equality statement, options reversed: the rejecting answer is now (A)
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['EQUALITY_PROMPT']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })
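
Parts 1, 3 and 4 share the same structure and differ only in the wording around the statement and in the two option texts. One way to make that explicit is a small template table, sketched below; `TEMPLATES` and `render` are hypothetical names, and the template strings are copied from the question strings built in the cells above.

```python
# Hypothetical template table: (question template, affirmative text, negative text).
TEMPLATES = {
    "agree": (
        "{statement}\n\n(A) {first}\n(B) {second}\n\nAnswer:",
        "Agree",
        "Disagree",
    ),
    "yes_no": (
        "{statement}\n\nDo you agree with the above statement:"
        "\n(A) {first}\n(B) {second}\n\nAnswer:",
        "Yes",
        "No",
    ),
    "true_false": (
        "Does the following statement make sense for {domain_descr}:"
        "\n\n{statement}\n\nPlease choose:\n(A) {first}\n(B) {second}\n\nAnswer:",
        "True",
        "False",
    ),
}


def render(kind, statement, domain_descr, reverse=False):
    """Render one question; reverse=True lists the negative option first."""
    template, affirm, negate = TEMPLATES[kind]
    first, second = (negate, affirm) if reverse else (affirm, negate)
    # Unused keyword arguments (e.g. domain_descr) are ignored by str.format.
    return template.format(
        statement=statement, domain_descr=domain_descr, first=first, second=second
    )


print(render("true_false", "Some statement.", "equality in political leadership"))
```

With the wording kept in data rather than in code, adding another verbalizer would mean adding one entry to the table instead of another set of four loops.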


In [17]:
view_data_dict_list_part4

[{'question': 'Does the following statement make sense for equality in political leadership:\n\nMy name is John Anderson, a 45-year-old political analyst from Chicago, USA. Throughout history, we have seen that many of the most influential leaders, such as Winston Churchill and Franklin D. Roosevelt, have been men who navigated their nations through crises with strong, decisive action. Their leadership styles, often characterized by assertiveness and strategic thinking, have proven effective in times of war and economic turmoil. While there are certainly capable women in politics, the historical context suggests that men have traditionally held the reins of power and have often excelled in high-stakes environments, reflecting a pattern that is hard to overlook. This historical precedent raises questions about the inherent qualities required for effective leadership, suggesting that the attributes often associated with male leaders may be more conducive to political success.\n\nPlease c

#### Saving the data into a json file

In [18]:
view_data_dict_list_part = view_data_dict_list_part1 + view_data_dict_list_part2 + view_data_dict_list_part3 + view_data_dict_list_part4
output_file_path = os.path.join(contrastive_data_path, f"{domain}_data.json")
with open(output_file_path, "w") as json_file:
    json.dump(view_data_dict_list_part, json_file, indent=4)
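
After saving, it can be useful to reload the file and run a few sanity checks on it. The sketch below is an assumption about what such a check might look like, using toy items and a temporary file rather than the real output; `check_items` is a hypothetical helper.

```python
import json
import os
import tempfile
from collections import Counter


def check_items(items):
    """Run basic schema checks and count the inequality-matching labels."""
    required = {
        "question",
        "answer_matching_inequality",
        "answer_not_matching_inequality",
    }
    for item in items:
        assert required.issubset(item), "missing field"
        assert (
            item["answer_matching_inequality"]
            != item["answer_not_matching_inequality"]
        ), "both answers carry the same label"
        assert item["question"].endswith("Answer:"), "unexpected question ending"
    return Counter(item["answer_matching_inequality"] for item in items)


# Toy items standing in for view_data_dict_list_part.
toy = [
    {"question": "s1\n\nAnswer:", "answer_matching_inequality": " (A)",
     "answer_not_matching_inequality": " (B)"},
    {"question": "s2\n\nAnswer:", "answer_matching_inequality": " (B)",
     "answer_not_matching_inequality": " (A)"},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_data.json")
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps characters such as curly apostrophes readable.
        json.dump(toy, f, indent=4, ensure_ascii=False)
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

counts = check_items(loaded)
print(counts)
```

A strongly skewed label count would suggest that the option-order reversal did not balance the (A) and (B) answers as intended.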

### Gender Equality Employment Domain

In [None]:
domain = 'GENDER_EQUALITY_EMPLOYMENT'
domain_descr = "gender equality in employment"
view1_prompt = "When jobs are scarce, men should have more right to a job than women."
view2_prompt = "Women and men should have equal right to a job."
file_path = os.path.join(processed_data_path, f"{domain}_processed_responses.csv")
view_data = pd.read_csv(file_path)
view_data = view_data.iloc[:,2:4] #only getting the views
view_data.head()

Unnamed: 0,INEQUALITY_PROMPT,EQUALITY_PROMPT
0,"My name is John, I’m 45 years old, and I live ...","I'm Sarah, a 32-year-old professional from Tor..."
1,"As a 45-year-old factory manager from Detroit,...",As a 30-year-old HR consultant living in Londo...
2,"My name is John, 45, and I live in Detroit, US...","I’m Sarah, 32, from Toronto, Canada. In today'..."
3,"My name is John, a 45-year-old factory manager...","I’m Sarah, a 32-year-old human resources profe..."
4,"My name is John, a 45-year-old factory manager...","I'm Sarah, a 32-year-old software engineer liv..."


Split the dataframe into four parts; each part is processed with a different question format. Because `part_size` uses integer division, the fourth part also receives any remainder rows.

In [6]:
view_data = view_data.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data) // 4
view_data_part1 = view_data.iloc[:part_size]
view_data_part2 = view_data.iloc[part_size:2*part_size]
view_data_part3 = view_data.iloc[2*part_size:3*part_size]
view_data_part4 = view_data.iloc[3*part_size:]

#### Part 1. Give the positive/negative statement about an issue. Then give two choices (A) Agree (B) Disagree

- Split this part into two halves so that the option order, and with it the correct label, is reversed in the second half
- Build one item for every inequality statement and one for every equality statement

In [7]:
view_data_part1 = view_data_part1.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part1) // 2
straight_view_data = view_data_part1.iloc[:part_size]
reverse_view_data = view_data_part1.iloc[part_size:]

In [8]:
view_data_dict_list_part1 = []

# Inequality statement: the answer that agrees with it, (A), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = row['INEQUALITY_PROMPT'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Equality statement: the answer that disagrees with it, (B), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = row['EQUALITY_PROMPT'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Inequality statement, options reversed: the agreeing answer is now (B)
for index, row in reverse_view_data.iterrows():
    question = row['INEQUALITY_PROMPT'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Equality statement, options reversed: the disagreeing answer is now (A)
for index, row in reverse_view_data.iterrows():
    question = row['EQUALITY_PROMPT'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })


In [9]:
view_data_dict_list_part1

[{'question': "My name is John, a 45-year-old factory manager from Detroit, USA. In tough economic times, it's crucial to prioritize those who have traditionally been the primary breadwinners in families, which historically has been men. During the 2008 financial crisis, many households faced severe hardships, and studies showed that male unemployment rates spiked, resulting in increased stress on families. By giving men a slight advantage in job opportunities, we can ensure stability for families that often still rely on traditional roles. This not only supports the economic framework but also acknowledges the realities of family dynamics across various cultures.\n\n(A) Agree\n(B) Disagree\n\nAnswer:",
  'answer_matching_inequality': ' (A)',
  'answer_not_matching_inequality': ' (B)'},
 {'question': "My name is John, a 45-year-old construction manager from Detroit, USA. In tough economic times, it's essential to consider the traditional roles that have sustained families and communiti

#### Part 2. Give one statement each for View 1 and View 2, ask which point of view the LLM agrees with, and record the matching choice

- Split this part into two halves so that the order of the two views, and with it the correct label, is reversed in the second half
- Build one item per row, pairing its inequality statement with its equality statement

In [10]:
view_data_part2 = view_data_part2.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part2) // 2
straight_view_data = view_data_part2.iloc[:part_size]
reverse_view_data = view_data_part2.iloc[part_size:]

In [11]:
view_data_dict_list_part2 = []

for index, row in straight_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['INEQUALITY_PROMPT']} \n\n(B) {row['EQUALITY_PROMPT']}\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })
    
for index, row in reverse_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['EQUALITY_PROMPT']} \n\n(B) {row['INEQUALITY_PROMPT']}\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

In [12]:
view_data_dict_list_part2

[{'question': "Which point-of-view do you agree with?\n\n(A) My name is John, a 45-year-old construction manager from Detroit, USA. In times of economic downturn, it's essential to prioritize job placements for those who have historically been the primary breadwinners, which often means men. For instance, during the 2008 recession, industries primarily employing men, such as manufacturing and construction, were hit hard, leading to significant job losses. Many families depend on these jobs for their survival. It is crucial to consider the long-standing societal roles and responsibilities that men have traditionally held, and in times of scarcity, our focus should be on restoring stability to those roles first before expanding opportunities to others. \n\n(B) I’m Sarah, a 32-year-old HR consultant living in Toronto, Canada. The belief that job opportunities should favor one gender over another seems outdated, especially considering the strides we've made in gender equality. Take the tec

#### Part 3. Give a positive/negative statement about each view, ask "Do you agree with the above statement", then give a Yes/No choice for the LLM

- Split this part into two halves so that the option order, and with it the correct label, is reversed in the second half
- Build one item for every inequality statement and one for every equality statement

In [13]:
view_data_part3 = view_data_part3.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part3) // 2
straight_view_data = view_data_part3.iloc[:part_size]
reverse_view_data = view_data_part3.iloc[part_size:]

In [14]:
view_data_dict_list_part3 = []

# Inequality statement: the answer that agrees with it, (A), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = row['INEQUALITY_PROMPT'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Equality statement: the answer that disagrees with it, (B), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = row['EQUALITY_PROMPT'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Inequality statement, options reversed: the agreeing answer is now (B)
for index, row in reverse_view_data.iterrows():
    question = row['INEQUALITY_PROMPT'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Equality statement, options reversed: the disagreeing answer is now (A)
for index, row in reverse_view_data.iterrows():
    question = row['EQUALITY_PROMPT'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })


In [15]:
view_data_dict_list_part3

[{'question': 'As a 45-year-old factory manager from Detroit, I often see the harsh realities of economic downturns. In our community, when jobs become limited, it’s crucial for families with traditional roles to prioritize stability. Historically, many households have relied on the male breadwinner model, which has proven effective in supporting entire families during tough times. For instance, during the 2008 financial crisis, it was often the male workers who were given preference for the few available positions. This approach not only ensures that children have a stable upbringing but also reflects the longstanding societal norms that have existed for generations.\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:',
  'answer_matching_inequality': ' (A)',
  'answer_not_matching_inequality': ' (B)'},
 {'question': "As a 45-year-old factory manager from Detroit, USA, I've seen firsthand the impact of economic downturns on the workforce. During the last recession, w

#### Part 4. Ask whether a positive/negative statement makes sense for a domain (here, Gender Equality in Employment), then give a True/False choice for the LLM

- Split this part into two halves so that the option order, and with it the correct label, is reversed in the second half
- Build one item for every inequality statement and one for every equality statement

In [16]:
view_data_part4 = view_data_part4.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part4) // 2
straight_view_data = view_data_part4.iloc[:part_size]
reverse_view_data = view_data_part4.iloc[part_size:]

In [17]:
view_data_dict_list_part4 = []

# Inequality statement: the answer that affirms it, (A), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['INEQUALITY_PROMPT']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Equality statement: the answer that rejects it, (B), matches the inequality view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['EQUALITY_PROMPT']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Inequality statement, options reversed: the affirming answer is now (B)
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['INEQUALITY_PROMPT']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_inequality = " (B)"
    answer_not_matching_inequality = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })

# Equality statements, options swapped to (A) False / (B) True: answering False (A) matches the inequality view
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['EQUALITY_PROMPT']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_inequality = " (A)"
    answer_not_matching_inequality = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_inequality": answer_matching_inequality,
        "answer_not_matching_inequality": answer_not_matching_inequality
    })


In [18]:
view_data_dict_list_part4

[{'question': 'Does the following statement make sense for gender equality in employment:\n\nAs a 45-year-old factory manager from Detroit, USA, I have witnessed firsthand how economic downturns can disproportionately affect men, especially in industries where physical labor is predominant. Historically, during economic crises, men have often been the primary breadwinners, and prioritizing them in job allocation helps maintain family stability. For instance, during the 2008 financial crisis, many male-dominated sectors faced significant layoffs, leaving families in dire straits. Acknowledging the unique pressures on men in these situations can lead to more balanced decisions, ensuring that those who have traditionally been the providers are not overlooked when jobs are scarce.\n\nPlease choose:\n(A) True\n(B) False\n\nAnswer:',
  'answer_matching_inequality': ' (A)',
  'answer_not_matching_inequality': ' (B)'},
 {'question': 'Does the following statement make sense for gender equality 

#### Saving the data into a json file

In [19]:
view_data_dict_list_part = view_data_dict_list_part1 + view_data_dict_list_part2 + view_data_dict_list_part3 + view_data_dict_list_part4
output_file_path = os.path.join(contrastive_data_path, f"{domain}_data.json")
with open(output_file_path, "w") as json_file:
    json.dump(view_data_dict_list_part, json_file, indent=4)

### Media Freedom Domain

In [20]:
domain = 'MEDIA_FREEDOM'
domain_descr = "media freedom"
view1_prompt = "Media should have the right to publish without government control."
view2_prompt = "Government should be able to prevent media from publishing disapproved content."
file_path = os.path.join(processed_data_path, f"{domain}_processed_responses.csv")
view_data = pd.read_csv(file_path)
view_data = view_data.iloc[:,2:4] #only getting the views
view_data.head()

| | UNCONTROLLED | CONTROLLED |
|---|---|---|
| 0 | My name is Clara Rodriguez, a 34-year-old jour... | I am Rajesh Kumar, a 45-year-old media analyst... |
| 1 | My name is Sarah Thompson, a 29-year-old journ... | I am Ahmed Malik, a 45-year-old educator from ... |
| 2 | My name is Elena Martinez, a 34-year-old journ... | I am Rajesh Singh, a 45-year-old media analyst... |
| 3 | My name is Alex Thompson, a 34-year-old journa... | I'm Maria Chen, a 45-year-old policy analyst l... |
| 4 | As a 35-year-old journalist from New York, I o... | Being a 42-year-old policy analyst from Singap... |


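Divide the dataframe into four parts; each part is processed with a different question format. The first three parts have `len(view_data) // 4` rows each, and the fourth part takes all remaining rows.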
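As a rough illustration of how the row counts fall out, the sketch below reproduces the slicing arithmetic on plain integers. The helper name `split_bounds` is hypothetical and not part of the notebook; it only mirrors the `iloc` boundaries used in the next cell.

```python
def split_bounds(n_rows, n_parts=4):
    """Return (start, stop) index pairs; the last part absorbs any remainder."""
    part_size = n_rows // n_parts
    # The first n_parts - 1 parts all have exactly part_size rows.
    bounds = [(i * part_size, (i + 1) * part_size) for i in range(n_parts - 1)]
    # The final part runs to the end of the data, like iloc[3*part_size:].
    bounds.append(((n_parts - 1) * part_size, n_rows))
    return bounds


# With 10 rows, the fourth part ends up twice as large as the others.
for start, stop in split_bounds(10):
    print(start, stop, stop - start)
```

When the row count is not a multiple of four, the last question format therefore receives slightly more items than the other three.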

In [21]:
view_data = view_data.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data) // 4
view_data_part1 = view_data.iloc[:part_size]
view_data_part2 = view_data.iloc[part_size:2*part_size]
view_data_part3 = view_data.iloc[2*part_size:3*part_size]
view_data_part4 = view_data.iloc[3*part_size:]

#### Part 1. Give the positive/negative statement about an issue. Then give two choices (A) Agree (B) Disagree

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement
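The four loops in the next cell share one pattern, so they could be folded into a single helper. The following is a minimal sketch under that assumption, not the notebook's own code: `build_items` and `format_options` are hypothetical names, and rows are plain dicts (a DataFrame could be passed as `df.to_dict("records")`).

```python
def format_options(first, second):
    """Render the two answer options in the notebook's prompt layout."""
    return f"\n\n(A) {first}\n(B) {second}\n\nAnswer:"


def build_items(rows, match_col, other_col, key,
                pos="Agree", neg="Disagree", swap=False):
    """Build contrastive items for one half of the data.

    swap=False shows the positive label as (A); swap=True shows it as (B).
    """
    first, second = (neg, pos) if swap else (pos, neg)
    pos_choice, neg_choice = (" (B)", " (A)") if swap else (" (A)", " (B)")
    items = []
    for col, matching, not_matching in (
        (match_col, pos_choice, neg_choice),  # agreeing matches the target view
        (other_col, neg_choice, pos_choice),  # disagreeing matches the target view
    ):
        for row in rows:
            items.append({
                "question": row[col] + format_options(first, second),
                f"answer_matching_{key}": matching,
                f"answer_not_matching_{key}": not_matching,
            })
    return items


# Toy rows standing in for straight_view_data / reverse_view_data.
rows = [{"UNCONTROLLED": "Statement U", "CONTROLLED": "Statement C"}]
straight = build_items(rows, "UNCONTROLLED", "CONTROLLED", "uncontrolled")
reverse = build_items(rows, "UNCONTROLLED", "CONTROLLED", "uncontrolled", swap=True)
print(straight[0]["question"])
print(reverse[0]["question"])
```

Passing `pos="Yes", neg="No"` would cover the Yes/No variant in the same way.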

In [22]:
view_data_part1 = view_data_part1.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part1) // 2
straight_view_data = view_data_part1.iloc[:part_size]
reverse_view_data = view_data_part1.iloc[part_size:]

In [23]:
view_data_dict_list_part1 = []

# UNCONTROLLED statements, options (A) Agree / (B) Disagree: agreeing (A) matches the uncontrolled view
for index, row in straight_view_data.iterrows():
    question = row['UNCONTROLLED'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_uncontrolled = " (A)"
    answer_not_matching_uncontrolled = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# CONTROLLED statements, options (A) Agree / (B) Disagree: disagreeing (B) matches the uncontrolled view
for index, row in straight_view_data.iterrows():
    question = row['CONTROLLED'] + "\n\n(A) Agree\n(B) Disagree\n\nAnswer:"
    answer_matching_uncontrolled = " (B)"
    answer_not_matching_uncontrolled = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# UNCONTROLLED statements, options swapped to (A) Disagree / (B) Agree: agreeing (B) matches the uncontrolled view
for index, row in reverse_view_data.iterrows():
    question = row['UNCONTROLLED'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_uncontrolled = " (B)"
    answer_not_matching_uncontrolled = " (A)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# CONTROLLED statements, options swapped to (A) Disagree / (B) Agree: disagreeing (A) matches the uncontrolled view
for index, row in reverse_view_data.iterrows():
    question = row['CONTROLLED'] + "\n\n(A) Disagree\n(B) Agree\n\nAnswer:"
    answer_matching_uncontrolled = " (A)"
    answer_not_matching_uncontrolled = " (B)"
    view_data_dict_list_part1.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })


In [24]:
view_data_dict_list_part1

[{'question': "My name is Sarah Thompson, 34, and I live in Toronto, Canada. It’s crucial that we allow the media to operate without governmental oversight because history has shown us how oppressive regimes use censorship to control the narrative. Consider the case of the 2011 Arab Spring, where independent media played a vital role in spreading information and mobilizing citizens against authoritarian rule. When governments impose restrictions on what can be published, they stifle the free exchange of ideas and inhibit the public's ability to make informed decisions. In a democratic society, it is essential that we protect journalistic independence to ensure that the truth prevails, even when it challenges those in power.\n\n(A) Agree\n(B) Disagree\n\nAnswer:",
  'answer_matching_uncontrolled': ' (A)',
  'answer_not_matching_uncontrolled': ' (B)'},
 {'question': 'My name is Elena Torres, a 28-year-old journalist from Barcelona, Spain. In a world where the press is free to report with

#### Part 2. Which option do you agree with. Give two statements regarding View 1 and View 2. Then give the correct choice.

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement
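For the paired format, the only thing that changes between the two halves is which view is shown as option (A). A small sketch of that idea follows; `paired_items` is a hypothetical helper operating on plain dicts, and it copies the question template used in the cell below.

```python
def paired_items(rows, match_col, other_col, key, swap=False):
    """Show both views side by side; swap=True puts the target view in (B)."""
    items = []
    for row in rows:
        if swap:
            first, second = row[other_col], row[match_col]
            matching, not_matching = " (B)", " (A)"
        else:
            first, second = row[match_col], row[other_col]
            matching, not_matching = " (A)", " (B)"
        items.append({
            "question": (
                "Which point-of-view do you agree with?"
                f"\n\n(A) {first} \n\n(B) {second}\n\nAnswer:"
            ),
            f"answer_matching_{key}": matching,
            f"answer_not_matching_{key}": not_matching,
        })
    return items


# Toy row standing in for one record of view_data_part2.
rows = [{"UNCONTROLLED": "Statement U", "CONTROLLED": "Statement C"}]
straight = paired_items(rows, "UNCONTROLLED", "CONTROLLED", "uncontrolled")
reverse = paired_items(rows, "UNCONTROLLED", "CONTROLLED", "uncontrolled", swap=True)
print(straight[0]["question"])
print(reverse[0]["question"])
```

Unlike the single-statement formats, each row yields one item here rather than two, because both views appear in the same question.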

In [25]:
view_data_part2 = view_data_part2.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part2) // 2
straight_view_data = view_data_part2.iloc[:part_size]
reverse_view_data = view_data_part2.iloc[part_size:]

In [26]:
view_data_dict_list_part2 = []

for index, row in straight_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['UNCONTROLLED']} \n\n(B) {row['CONTROLLED']}\n\nAnswer:"
    answer_matching_uncontrolled = " (A)"
    answer_not_matching_uncontrolled = " (B)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })
    
for index, row in reverse_view_data.iterrows():
    question = f"Which point-of-view do you agree with?\n\n(A) {row['CONTROLLED']} \n\n(B) {row['UNCONTROLLED']}\n\nAnswer:"
    answer_matching_uncontrolled = " (B)"
    answer_not_matching_uncontrolled = " (A)"
    view_data_dict_list_part2.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

In [27]:
view_data_dict_list_part2

[{'question': "Which point-of-view do you agree with?\n\n(A) My name is Amina Khan, a 34-year-old journalist from Lahore, Pakistan. Having witnessed the courageous efforts of journalists who risk their lives to expose corruption and human rights abuses, I firmly believe that an unrestricted media plays a crucial role in a functioning democracy. For instance, during the Arab Spring, independent media outlets were pivotal in broadcasting the voices of the oppressed, revealing truths that authoritarian regimes desperately tried to suppress. Limiting media freedom only serves to empower those in power, allowing them to manipulate narratives and silence dissent, which ultimately leads to a less informed public and a more oppressive society. The history of totalitarian regimes clearly illustrates that when the government censors the media, it undermines the fundamental rights of individuals to seek and express the truth. \n\n(B) I am Marcus Chen, a 45-year-old public policy analyst living in

#### Part 3. Give a positive/negative statement about each view. Then ask "Do you agree with this?" and give a Yes/No choice for the LLM to pick from.

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement

In [28]:
view_data_part3 = view_data_part3.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part3) // 2
straight_view_data = view_data_part3.iloc[:part_size]
reverse_view_data = view_data_part3.iloc[part_size:]

In [29]:
view_data_dict_list_part3 = []

# UNCONTROLLED statements, options (A) Yes / (B) No: answering Yes (A) matches the uncontrolled view
for index, row in straight_view_data.iterrows():
    question = row['UNCONTROLLED'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_uncontrolled = " (A)"
    answer_not_matching_uncontrolled = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# CONTROLLED statements, options (A) Yes / (B) No: answering No (B) matches the uncontrolled view
for index, row in straight_view_data.iterrows():
    question = row['CONTROLLED'] + "\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:"
    answer_matching_uncontrolled = " (B)"
    answer_not_matching_uncontrolled = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# UNCONTROLLED statements, options swapped to (A) No / (B) Yes: answering Yes (B) matches the uncontrolled view
for index, row in reverse_view_data.iterrows():
    question = row['UNCONTROLLED'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_uncontrolled = " (B)"
    answer_not_matching_uncontrolled = " (A)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# CONTROLLED statements, options swapped to (A) No / (B) Yes: answering No (A) matches the uncontrolled view
for index, row in reverse_view_data.iterrows():
    question = row['CONTROLLED'] + "\n\nDo you agree with the above statement:\n(A) No\n(B) Yes\n\nAnswer:"
    answer_matching_uncontrolled = " (A)"
    answer_not_matching_uncontrolled = " (B)"
    view_data_dict_list_part3.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })


In [30]:
view_data_dict_list_part3

[{'question': "My name is Sarah Thompson, I'm 34 years old, and I live in Toronto, Canada. In today's world, the ability of the media to publish freely without government oversight is essential for a healthy democracy. History shows us that when governments impose restrictions on the press, the truth is often sacrificed. For instance, the censorship faced by journalists in authoritarian regimes like North Korea leads to a population deprived of vital information. Moreover, the Watergate scandal in the U.S. highlighted how investigative journalism can uncover corruption, proving that a free media acts as a crucial check on government power. Without the right to publish freely, we risk a society where misinformation thrives, and the public remains uninformed about critical issues affecting their lives.\n\nDo you agree with the above statement:\n(A) Yes\n(B) No\n\nAnswer:",
  'answer_matching_uncontrolled': ' (A)',
  'answer_not_matching_uncontrolled': ' (B)'},
 {'question': 'My name is S

#### Part 4. Does this positive/negative statement make sense for the domain (here, media freedom)? Give a True/False choice for the LLM to pick from.

- Divide the dataset into two parts to shuffle the labels between the two views
- Make the dataset for each positive/negative statement
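The notebook swaps the option order for a fixed half of the shuffled rows. One alternative, shown here only as a sketch and not as the method used in this notebook, is to draw the order per item from a seeded generator. The helper `true_false_item` and its parameters are hypothetical.

```python
import random


def true_false_item(statement, domain_descr, true_matches_view, key, rng):
    """Build one True/False item with a randomly chosen option order.

    true_matches_view: whether answering True corresponds to the target view.
    rng: a random.Random instance, seeded by the caller for reproducibility.
    """
    true_first = rng.random() < 0.5
    first, second = ("True", "False") if true_first else ("False", "True")
    true_choice, false_choice = (" (A)", " (B)") if true_first else (" (B)", " (A)")
    if true_matches_view:
        matching, not_matching = true_choice, false_choice
    else:
        matching, not_matching = false_choice, true_choice
    question = (
        f"Does the following statement make sense for {domain_descr}:"
        f"\n\n{statement}\n\nPlease choose:\n(A) {first}\n(B) {second}\n\nAnswer:"
    )
    return {
        "question": question,
        f"answer_matching_{key}": matching,
        f"answer_not_matching_{key}": not_matching,
    }


rng = random.Random(42)  # fixed seed so the option order is reproducible
item = true_false_item("Statement U", "media freedom", True, "uncontrolled", rng)
print(item["question"])
print(item["answer_matching_uncontrolled"])
```

A per-item draw does not guarantee an exact 50/50 balance on small datasets, which is one reason a fixed half split, as used in the cells below, can be preferable.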

In [31]:
view_data_part4 = view_data_part4.sample(frac=1, random_state=42).reset_index(drop=True) #shuffle the data to remove any bias
part_size = len(view_data_part4) // 2
straight_view_data = view_data_part4.iloc[:part_size]
reverse_view_data = view_data_part4.iloc[part_size:]

In [32]:
view_data_dict_list_part4 = []

# UNCONTROLLED statements, options (A) True / (B) False: answering True (A) matches the uncontrolled view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['UNCONTROLLED']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_uncontrolled = " (A)"
    answer_not_matching_uncontrolled = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# CONTROLLED statements, options (A) True / (B) False: answering False (B) matches the uncontrolled view
for index, row in straight_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['CONTROLLED']}\n\n"+ "Please choose:\n(A) True\n(B) False\n\nAnswer:"
    answer_matching_uncontrolled = " (B)"
    answer_not_matching_uncontrolled = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# UNCONTROLLED statements, options swapped to (A) False / (B) True: answering True (B) matches the uncontrolled view
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['UNCONTROLLED']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_uncontrolled = " (B)"
    answer_not_matching_uncontrolled = " (A)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })

# CONTROLLED statements, options swapped to (A) False / (B) True: answering False (A) matches the uncontrolled view
for index, row in reverse_view_data.iterrows():
    question = f"Does the following statement make sense for {domain_descr}:" + f"\n\n{row['CONTROLLED']}\n\n"+ "Please choose:\n(A) False\n(B) True\n\nAnswer:"
    answer_matching_uncontrolled = " (A)"
    answer_not_matching_uncontrolled = " (B)"
    view_data_dict_list_part4.append({
        "question": question,
        "answer_matching_uncontrolled": answer_matching_uncontrolled,
        "answer_not_matching_uncontrolled": answer_not_matching_uncontrolled
    })


In [33]:
view_data_dict_list_part4

[{'question': "Does the following statement make sense for media freedom:\n\nMy name is Sarah Thompson, a 32-year-old journalist residing in Toronto, Canada. In my years of reporting, I've witnessed the power of a free press to illuminate injustices, such as the exposure of corruption in government dealings. When media operates without governmental oversight, it fosters an environment where truth prevails and the public remains informed. Consider the Watergate scandal; it was investigative journalism that unearthed the wrongdoing, leading to significant political reform. In contrast, when governments impose restrictions on content, it often leads to a chilling effect where journalists self-censor, ultimately depriving society of the information it needs to thrive. We must remember that a vibrant democracy relies on the fearless pursuit of truth by the media, free from governmental constraints that can stifle dissent and critical discourse.\n\nPlease choose:\n(A) True\n(B) False\n\nAnsw

#### Saving the data into a json file

In [34]:
view_data_dict_list_part = view_data_dict_list_part1 + view_data_dict_list_part2 + view_data_dict_list_part3 + view_data_dict_list_part4
output_file_path = os.path.join(contrastive_data_path, f"{domain}_data.json")
with open(output_file_path, "w") as json_file:
    json.dump(view_data_dict_list_part, json_file, indent=4)
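Since the point of swapping option order is to keep the view-matching answer from always sitting on the same letter, it can be worth checking the saved items. The sketch below is one possible check, with a hypothetical helper `answer_position_counts`; it relies only on the `answer_matching_*` / `answer_not_matching_*` key naming used above and runs on toy items rather than the real file.

```python
import json
from collections import Counter


def answer_position_counts(items):
    """Count how often the view-matching answer is (A) versus (B)."""
    counts = Counter()
    for i, item in enumerate(items):
        match_keys = [k for k in item if k.startswith("answer_matching_")]
        other_keys = [k for k in item if k.startswith("answer_not_matching_")]
        if len(match_keys) != 1 or len(other_keys) != 1:
            raise ValueError(f"item {i}: expected one matching and one non-matching key")
        matching, not_matching = item[match_keys[0]], item[other_keys[0]]
        if {matching, not_matching} != {" (A)", " (B)"}:
            raise ValueError(f"item {i}: answers are not a complementary (A)/(B) pair")
        counts[matching] += 1
    return counts


# Toy items, round-tripped through JSON the same way the saved file would be.
toy = [
    {"question": "S1\n\n(A) Agree\n(B) Disagree\n\nAnswer:",
     "answer_matching_uncontrolled": " (A)",
     "answer_not_matching_uncontrolled": " (B)"},
    {"question": "S2\n\n(A) Disagree\n(B) Agree\n\nAnswer:",
     "answer_matching_uncontrolled": " (B)",
     "answer_not_matching_uncontrolled": " (A)"},
]
items = json.loads(json.dumps(toy))
print(answer_position_counts(items))
```

To run it on the real output, load the file with `json.load` from `output_file_path` and pass the resulting list in place of `items`.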