# Recreating the RW17 Dataset from Scratch

This notebook first explains the data structure used in Rehder and Waldmann (2017), henceforth RW17,
and then demonstrates how to **construct the RW17 domain components** using helper functions
from `dataset_creation`, and how to **convert them into a structured DataFrame**.

We will:
1. **Create a domain dictionary** using `create_domain_dict`
2. **Expand it into a DataFrame** using `expand_domain_to_dataframe`
3. **Add inference tasks** to extend the dataset
4. **Generate verbalized prompts** for human evaluation


This step will:

- Load and examine `rw_17_domain_components` (the predefined dataset components).
- Break down its structure to understand:
  - the domain dictionary (specification of causal variables),
  - the graph structure (causal relationships), and
  - the inference tasks (reasoning scenarios).
- Explain how these elements combine to generate prompts for LLMs.


In [1]:
import os
import sys
import pprint
import pandas as pd

# Ensure Python finds the `src` directory
sys.path.append(os.path.abspath("../../src"))

# Import everything defined in `__all__`
from causalalign.dataset_creation import (
    rw_17_domain_components,
    graph_structures,
    inference_tasks_rw17,
    generate_prompt_dataframe,
    expand_domain_to_dataframe,
    expand_df_by_task_queries,
    create_domain_dict,
    verbalize_domain_intro,
    verbalize_causal_mechanism,
    verbalize_inference_task,
    append_dfs,
)

print("Dataset creation module imported successfully!")


Dataset creation module imported successfully!


# 1. Understanding the RW17 Dataset Structure

Before generating new datasets, we need to understand the structure of **RW17 domain components**.

## 🔹 What Is RW17?
RW17 presented human participants with causal inference tasks and asked for their likelihood judgements. Each causal inference task was presented on four successive screens.
In the following, I describe how I translated the experimental materials used by RW17 into nested dictionaries. These dictionaries serve as the backbone for algorithmically generating the RW17 materials in *textual form*, so that we can prompt LLMs and compare their causal judgements. The notebook explains how to algorithmically re-create the textual form used by RW17, and how to easily create new prompts.

RW17 used three knowledge domains (economy, sociology, and weather) in which the inference tasks were thematically embedded.
Each domain specifies:
- **Variables (`C1`, `C2`, `E`)**: The causal variables / graph nodes, e.g., `C1`: interest rates.
- **Variable senses for the binary values 0 and 1 of (`C1`, `C2`, `E`)**: e.g., for `C1=1`: *low* (or, counterbalanced, *high*) interest rates.
- **Counterbalance-dependent sense assignments (`p`/`m`)**: Which sense is attached to each value of a variable (optional, but used in RW17). Essentially, switching between `p` and `m` flips what it means for the variable to be on (1) or off (0).
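A minimal sketch of the counterbalancing idea, assuming the convention visible in the `create_domain_dict` output further down: with `enforce_zero_label=True`, interest rates get `p_value = {'1': 'low', '0': 'normal'}` and `m_value = {'1': 'high', '0': 'normal'}`. The helper `make_counterbalanced_senses` is hypothetical and not part of `causalalign`; the `zero_label=None` branch, which simply swaps the two senses, is an assumption about the non-enforced case.

```python
def make_counterbalanced_senses(values, zero_label="normal"):
    """Hypothetical helper: derive p/m sense mappings from {"1": ..., "0": ...}.

    Under 'p', the on-state (1) keeps the sense listed for "1"; under 'm', it
    takes the sense listed for "0". With a zero label, the off-state (0) is
    verbalized with that neutral label in both conditions.
    """
    if zero_label is None:
        # Assumption: without a neutral label, 'm' simply swaps the two senses.
        return {
            "p_value": {"1": values["1"], "0": values["0"]},
            "m_value": {"1": values["0"], "0": values["1"]},
        }
    return {
        "p_value": {"1": values["1"], "0": zero_label},
        "m_value": {"1": values["0"], "0": zero_label},
    }


# Interest rates, as in the economy domain: "1" -> low, "0" -> high
senses = make_counterbalanced_senses({"1": "low", "0": "high"})
print(senses)
# {'p_value': {'1': 'low', '0': 'normal'}, 'm_value': {'1': 'high', '0': 'normal'}}
```

Keeping the off-state at a neutral label means only the on-state sense changes between `p` and `m`, which is what the eight-row dataframes later in this notebook vary over.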

The verbalization of the prompt depends on the domain and on:
- **Causal Mechanisms**: How the variables influence each other (i.e., whether the graph is a collider, a fork, or a chain).
- **Inference Tasks**: The reasoning problems we ask an LLM to solve, each specified by a query node, the observed variable values, and the resulting conditional-probability query.

By combining these components, we **generate structured natural-language prompts** that can be used to probe causal reasoning in LLMs.


##  Understanding the RW17 Domain Dictionary

Each domain in `rw_17_domain_components` contains:
1. **Domain Name & Introduction**:  
   - Explains the overall knowledge structure / cover story the inference task is embedded in.
   
2. **Causal Variables (`C1`, `C2`, `E`)**:  
   - `C1` and `C2` are **causes**, and `E` is the **effect**.
   - Each variable has:
     - A **name** and a **detailed description**.
     - `p_value` and `m_value`: the senses of the values 1 and 0 under the two counterbalance conditions.
     - **Explanation mappings**: how specific conditions lead to outcomes (optional and graph-dependent).

3. **Example: Economy Domain**
   - `C1`: **Interest Rates** (low vs. high)
   - `C2`: **Trade Deficit** (small vs. large)
   - `E`: **Retirement Savings** (high vs. low)
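To make the expected layout explicit, here is a hypothetical validator, `check_domain_dict`, that checks the keys visible in the printouts of this notebook (`domain_name`, `introduction`, `variables`, and per variable `<var>_name`, `<var>_detailed`, `p_value`, `m_value`). It is a sketch, not part of `causalalign`, and it deliberately ignores the optional `explanations` mapping. The example dictionary uses placeholder descriptions.

```python
REQUIRED_TOP_LEVEL = ("domain_name", "introduction", "variables")
REQUIRED_VARIABLES = ("C1", "C2", "E")


def check_domain_dict(domain):
    """Hypothetical validator: return a list of problems (empty if the layout fits)."""
    problems = [f"missing top-level key '{k}'" for k in REQUIRED_TOP_LEVEL if k not in domain]
    variables = domain.get("variables", {})
    for var in REQUIRED_VARIABLES:
        spec = variables.get(var)
        if spec is None:
            problems.append(f"missing variable '{var}'")
            continue
        # Name and description keys are prefixed with the variable, e.g. 'C1_name'.
        for key in (f"{var}_name", f"{var}_detailed", "p_value", "m_value"):
            if key not in spec:
                problems.append(f"{var}: missing '{key}'")
        # Each counterbalance mapping should cover exactly the values '0' and '1'.
        for cntbl in ("p_value", "m_value"):
            if cntbl in spec and set(spec[cntbl]) != {"0", "1"}:
                problems.append(f"{var}: '{cntbl}' must map '0' and '1'")
    return problems


def variable(var, name, p1, m1, zero="normal"):
    """Build one variable entry in the layout printed by create_domain_dict."""
    return {
        f"{var}_name": name,
        f"{var}_detailed": f"Description of {name}.",  # placeholder text
        "p_value": {"1": p1, "0": zero},
        "m_value": {"1": m1, "0": zero},
    }


economy = {
    "domain_name": "economy",
    "introduction": "Economists seek to describe and predict the regular patterns of economic fluctuation.",
    "variables": {
        "C1": variable("C1", "interest rates", "low", "high"),
        "C2": variable("C2", "trade deficits", "small", "large"),
        "E": variable("E", "retirement savings", "high", "low"),
    },
}
print(check_domain_dict(economy))  # []
```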


#### Make sure you understand the building blocks of the dictionary. 

In ``src/causalalign/dataset_creation/constants.py``, there are dictionaries that define the domain building blocks, the causal mechanisms for three graph topologies (collider, fork, and chain), and the inference tasks. Before re-creating the prompts used in RW17, let's load them and get a feel for the prompt structure:




In [2]:
# load domain components
rw_17 = rw_17_domain_components


# list the domains in the dataset
print("Domains in the dataset:")
print(rw_17.keys())


# Pretty-print RW17 dataset structure

pprint.pprint(rw_17)

Domains in the dataset:
dict_keys(['economy', 'sociology', 'weather'])
{'economy': {'domain_name': 'economy',
             'introduction': 'Economists seek to describe and predict the '
                             'regular patterns of economic fluctuation. To do '
                             'this, they study some important variables or '
                             'attributes of economies. They also study how '
                             'these attributes are responsible for producing '
                             'or causing one another.',
             'variables': {'C1': {'C1_detailed': 'Interest rates are the rates '
                                                 'banks charge to loan money.',
                                  'C1_name': 'interest rates',
                                  'explanations': {'m_m': 'A lot of people are '
                                                          'making large '
                                                          'monthly

##  How Graph Structures Define Causal Mechanisms

A **graph structure** specifies how causal variables (`C1`, `C2`, `E`) relate to each other.

### Example Graph Structures:
1. **Collider** (`C1 → E ← C2`)
   - `C1` and `C2` both cause `E`.

2. **Fork** (`C2 ← C1 → E`)
   - `C1` causes both `C2` and `E`.

3. **Chain** (`C1 → C2 → E`)
   - `C1` causes `C2`, which then affects `E`.

Let's look at the graph-structures dictionary, which is predefined in ``src/causalalign/dataset_creation/constants.py``:


In [3]:
# Pretty-print available graph structures
pprint.pprint(graph_structures)


{'chain': {'causal_template': '{c1_sense} {c1_name} causes {c2_sense} '
                              '{c2_name}. And {c2_sense} {c2_name} causes '
                              '{e_sense} {e_name}.',
           'description': 'C1→C2→E'},
 'collider': {'causal_template': '{c1_sense} {c1_name} causes {e_sense} '
                                 '{e_name}. Also, {c2_sense} {c2_name} causes '
                                 '{e_sense} {e_name}.',
              'description': 'C1→E←C2'},
 'fork': {'causal_template': '{c1_sense} {c1_name} causes {c2_sense} '
                             '{c2_name}. Also, {c1_sense} {c1_name} causes '
                             '{e_sense} {e_name}.',
          'description': 'C2←C1→E'}}
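The templates are ordinary Python format strings, so filling them takes a single `str.format` call. The sketch below copies the three templates printed above and fills them from a row-like dict whose keys mirror the dataframe columns used later (`C1`, `C1_sense`, ...). `fill_causal_template` is a hypothetical helper; the library's own `verbalize_causal_mechanism` may differ, e.g., in capitalization.

```python
GRAPH_TEMPLATES = {
    "collider": "{c1_sense} {c1_name} causes {e_sense} {e_name}. "
                "Also, {c2_sense} {c2_name} causes {e_sense} {e_name}.",
    "fork": "{c1_sense} {c1_name} causes {c2_sense} {c2_name}. "
            "Also, {c1_sense} {c1_name} causes {e_sense} {e_name}.",
    "chain": "{c1_sense} {c1_name} causes {c2_sense} {c2_name}. "
             "And {c2_sense} {c2_name} causes {e_sense} {e_name}.",
}


def fill_causal_template(template, row):
    """Hypothetical helper: insert each variable's name and (value-1) sense."""
    return template.format(
        c1_sense=row["C1_sense"], c1_name=row["C1"],
        c2_sense=row["C2_sense"], c2_name=row["C2"],
        e_sense=row["E_sense"], e_name=row["E"],
    )


# Economy domain, counterbalance condition 'ppp'
row_ppp = {
    "C1": "interest rates", "C1_sense": "low",
    "C2": "trade deficits", "C2_sense": "small",
    "E": "retirement savings", "E_sense": "high",
}
for graph, template in GRAPH_TEMPLATES.items():
    print(f"{graph}: {fill_causal_template(template, row_ppp)}")
```

Because each template only names the placeholders it needs, the same row can be passed to all three graphs; unused keyword arguments are ignored by `str.format`.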


##  How Inference Tasks Define the Final Prompt

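Inference tasks specify **what the LLM needs to predict** given certain observations. `Ci` and `Cj` are placeholders for `C1` and `C2`: `Ci` is the queried variable and `Cj` is the other one (the task dataframes further down show both assignments, e.g., `p(C1=1|E=1, C2=1)` and `p(C2=1|E=1, C1=1)` for task `a`).

For example:
- `"a": {"query_node": "Ci=1", "observation": "E=1, Cj=1", "query": "p(Ci=1|E=1, Cj=1)"}`
  - **Ask:** Given that `E=1` and `Cj=1`, how likely is it that `Ci=1`?
  - In the collider graph, this is a **diagnostic** question: reasoning from an observed effect back to one of its causes.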

Inference tasks work together with **domain dictionaries** and **graph structures** to create **verbalized prompts**.
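The task dataframes further down show how the `Ci`/`Cj` placeholders are resolved: task `a` yields one row with `Ci=C1, Cj=C2` and one with `Ci=C2, Cj=C1`, while task `i` appears only once per condition and task `j` twice. The sketch below reproduces that behavior under an assumption: an instantiation is dropped when it has the same query node and the same set of observations as the other one (as for the symmetric tasks `i` and `k`). `instantiate_tasks` is hypothetical, and task `k`'s `query_node`, which is cut off in the printout of the next cell, is assumed to be `E=1`. With these eleven templates it yields 20 rows, consistent with the 20 inference tasks per counterbalance condition mentioned later.

```python
import re

# The eleven RW17 task templates printed by pprint.pprint(inference_tasks_rw17):
# (query_node, observation, query). Task k's query_node is assumed to be "E=1".
TASKS_RW17 = {
    "a": ("Ci=1", "E=1, Cj=1", "p(Ci=1|E=1, Cj=1)"),
    "b": ("Ci=1", "E=1", "p(Ci=1|E=1)"),
    "c": ("Ci=1", "E=1, Cj=0", "p(Ci=1|E=1, Cj=0)"),
    "d": ("Ci=1", "Cj=1", "p(Ci=1|Cj=1)"),
    "e": ("Ci=1", "Cj=0", "p(Ci=1|Cj=0)"),
    "f": ("Ci=1", "E=0, Cj=1", "p(Ci=1|E=0, Cj=1)"),
    "g": ("Ci=1", "E=0", "p(Ci=1|E=0)"),
    "h": ("Ci=1", "E=0, Cj=0", "p(Ci=1|E=0, Cj=0)"),
    "i": ("E=1", "Ci=0, Cj=0", "p(E=1|Ci=0, Cj=0)"),
    "j": ("E=1", "Ci=0, Cj=1", "p(E=1|Ci=0, Cj=1)"),
    "k": ("E=1", "Ci=1, Cj=1", "p(E=1|Ci=1, Cj=1)"),
}


def substitute(text, ci, cj):
    """Replace the Ci/Cj placeholders simultaneously."""
    return re.sub(r"C[ij]", lambda m: ci if m.group(0) == "Ci" else cj, text)


def instantiate_tasks(tasks):
    """Hypothetical sketch: resolve Ci/Cj as (C1, C2) and (C2, C1) for every task.

    Assumption: the second instantiation is dropped when it has the same query
    node and the same set of observations as the first (symmetric tasks).
    """
    rows = []
    for task, (query_node, observation, query) in tasks.items():
        seen = set()
        for ci, cj in (("C1", "C2"), ("C2", "C1")):
            obs = substitute(observation, ci, cj)
            qnode = substitute(query_node, ci, cj)
            key = (qnode, frozenset(obs.split(", ")))
            if key in seen:
                continue
            seen.add(key)
            rows.append({"task": task, "query_node": qnode,
                         "observation": obs, "query": substitute(query, ci, cj)})
    return rows


rows = instantiate_tasks(TASKS_RW17)
print(len(rows))  # 20
print(rows[0])
```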

### 🔗 How It All Connects:
1. **Domain Dictionary** → Specifies **variables and values**.
2. **Graph Structure** → Defines **causal relationships**.
3. **Inference Tasks** → Frame **the reasoning problem**.
4. **Prompt Verbalization** → Converts this into **natural language for LLMs**.
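A minimal sketch of the last step, assuming the prompt is simply the domain introduction, the causal mechanism, and the task sentence joined by spaces. The task wording is copied from the `verbalize_inference_task` output printed at the end of this notebook; the ordering and the helper `assemble_prompt` are assumptions, not the behavior of `generate_prompt_dataframe`.

```python
# Task wording copied from the printed verbalize_inference_task output
TASK_TEMPLATE = (
    "You are currently observing: {observations}. "
    "Your task is to estimate how likely it is that {query} are present "
    "on a scale from 0 to 100, given the observations and causal relationships "
    "described. 0 means completely unlikely and 100 means completely certain."
)


def assemble_prompt(introduction, mechanism, observations, query):
    """Hypothetical sketch: domain intro + causal mechanism + inference task."""
    task = TASK_TEMPLATE.format(observations=observations, query=query)
    return " ".join([introduction, mechanism, task])


prompt = assemble_prompt(
    introduction="Economists seek to describe and predict the regular patterns of economic fluctuation.",
    mechanism="Low interest rates causes high retirement savings. "
              "Also, small trade deficits causes high retirement savings.",
    observations="high retirement savings, small trade deficits",
    query="low interest rates",
)
print(prompt)
```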



In [4]:
pprint.pprint(inference_tasks_rw17)

{'a': {'observation': 'E=1, Cj=1',
       'query': 'p(Ci=1|E=1, Cj=1)',
       'query_node': 'Ci=1'},
 'b': {'observation': 'E=1', 'query': 'p(Ci=1|E=1)', 'query_node': 'Ci=1'},
 'c': {'observation': 'E=1, Cj=0',
       'query': 'p(Ci=1|E=1, Cj=0)',
       'query_node': 'Ci=1'},
 'd': {'observation': 'Cj=1', 'query': 'p(Ci=1|Cj=1)', 'query_node': 'Ci=1'},
 'e': {'observation': 'Cj=0', 'query': 'p(Ci=1|Cj=0)', 'query_node': 'Ci=1'},
 'f': {'observation': 'E=0, Cj=1',
       'query': 'p(Ci=1|E=0, Cj=1)',
       'query_node': 'Ci=1'},
 'g': {'observation': 'E=0', 'query': 'p(Ci=1|E=0)', 'query_node': 'Ci=1'},
 'h': {'observation': 'E=0, Cj=0',
       'query': 'p(Ci=1|E=0, Cj=0)',
       'query_node': 'Ci=1'},
 'i': {'observation': 'Ci=0, Cj=0',
       'query': 'p(E=1|Ci=0, Cj=0)',
       'query_node': 'E=1'},
 'j': {'observation': 'Ci=0, Cj=1',
       'query': 'p(E=1|Ci=0, Cj=1)',
       'query_node': 'E=1'},
 'k': {'observation': 'Ci=1, Cj=1',
       'query': 'p(E=1|Ci=1, Cj=1)',
       

Now, let's explore how these components are combined by re-creating the prompts used in RW17, starting with the dictionaries stored in ``constants.py``!


## Step 1: Create the Domain Dictionaries


In [5]:
# Example usage with enforcement of "normal" for zero values
try:
    economy_test_dict = create_domain_dict(
        domain="economy",
        introduction="Economists seek to describe and predict economic fluctuations...",
        C1_name="interest rates",
        C1_detailed="Interest rates are the rates banks charge to loan money.",
        C1_values={"1": "low", "0": "high"},
        C2_name="trade deficits",
        C2_detailed="A country's trade deficit...",
        C2_values={"1": "small", "0": "large"},
        E_name="retirement savings",
        E_detailed="Retirement savings is the money people save for their retirement.",
        E_values={"1": "high", "0": "low"},
        counterbalance_enabled=True,
        enforce_zero_label=True,  # Ensures '0' is verbalized as 'normal'
        zero_label="normal",
    )

    # Convert dictionary to a readable JSON format for display
    import json

    formatted_output = json.dumps(economy_test_dict, indent=4)
    print(formatted_output)

except Exception as e:
    print(f"Error: {e}")

{
    "domain_name": "economy",
    "introduction": "Economists seek to describe and predict economic fluctuations...",
    "variables": {
        "C1": {
            "C1_name": "interest rates",
            "C1_detailed": "Interest rates are the rates banks charge to loan money.",
            "p_value": {
                "1": "low",
                "0": "normal"
            },
            "m_value": {
                "1": "high",
                "0": "normal"
            }
        },
        "C2": {
            "C2_name": "trade deficits",
            "C2_detailed": "A country's trade deficit...",
            "p_value": {
                "1": "small",
                "0": "normal"
            },
            "m_value": {
                "1": "large",
                "0": "normal"
            }
        },
        "E": {
            "E_name": "retirement savings",
            "E_detailed": "Retirement savings is the money people save for their retirement.",
            "p_value": {
     

In [6]:
# Create individual domains
economy_domain_dict = create_domain_dict(
    domain="economy",
    introduction="Economists seek to describe and predict the regular patterns of economic fluctuation. To do this, they study some important variables or attributes of economies. They also study how these attributes are responsible for producing or causing one another.",
    C1_name="interest rates",
    C1_detailed="Interest rates are the rates banks charge to loan money.",
    C1_values={"1": "low", "0": "high"},
    C2_name="trade deficits",
    C2_detailed="A country's trade deficit is the difference between the value of the goods that a country imports and the value of the goods that a country exports.",
    C2_values={"1": "small", "0": "large"},
    E_name="retirement savings",
    E_detailed="Retirement savings is the money people save for their retirement.",
    E_values={"1": "high", "0": "low"},
    counterbalance_enabled=True,
    enforce_zero_label=True,  # Ensures '0' is verbalized as 'normal'
    zero_label="normal",
)

sociology_domain_dict = create_domain_dict(
    domain="sociology",
    introduction="Sociologists seek to describe and predict the regular patterns of societal interactions. To do this, they study some important variables or attributes of societies. They also study how these attributes are responsible for producing or causing one another.",
    C1_name="urbanization",
    C1_detailed="Urbanization is the degree to which the members of a society live in urban environments (i.e., cities) versus rural environments.",
    C1_values={"1": "high", "0": "low"},
    C2_name="interest in religion",
    C2_detailed="Interest in religion is the degree to which the members of a society show a curiosity in religion issues or participate in organized religions.",
    C2_values={"1": "low", "0": "high"},
    E_name="socio-economic mobility",
    E_detailed="Socioeconomic mobility is the degree to which the members of a society are able to improve their social and economic status.",
    E_values={"1": "high", "0": "low"},
    counterbalance_enabled=True,
    enforce_zero_label=True,  # Ensures '0' is verbalized as 'normal'
    zero_label="normal",
)

weather_domain_dict = create_domain_dict(
    domain="weather",
    introduction="Meteorologists seek to describe and predict the regular patterns that govern weather systems. To do this, they study some important variables or attributes of weather systems. They also study how these attributes are responsible for producing or causing one another.",
    C1_name="ozone levels",
    C1_detailed="Ozone is a gaseous allotrope of oxygen (O3) and is formed by exposure to UV radiation.",
    C1_values={"1": "high", "0": "low"},
    C2_name="air pressure",
    C2_detailed="Air pressure is force exerted due to concentrations of air molecules.",
    C2_values={"1": "low", "0": "high"},
    E_name="humidity",
    E_detailed="Humidity is the degree to which the atmosphere contains water molecules.",
    E_values={"1": "high", "0": "low"},
    counterbalance_enabled=True,
    enforce_zero_label=True,  # Ensures '0' is verbalized as 'normal'
    zero_label="normal",
)


This should re-create our RW17 dictionary. Let's print the economy domain and compare it with the corresponding entry of `rw_17` printed above.

In [7]:
pprint.pprint(economy_domain_dict)

{'domain_name': 'economy',
 'introduction': 'Economists seek to describe and predict the regular patterns '
                 'of economic fluctuation. To do this, they study some '
                 'important variables or attributes of economies. They also '
                 'study how these attributes are responsible for producing or '
                 'causing one another.',
 'variables': {'C1': {'C1_detailed': 'Interest rates are the rates banks '
                                     'charge to loan money.',
                      'C1_name': 'interest rates',
                      'm_value': {'0': 'normal', '1': 'high'},
                      'p_value': {'0': 'normal', '1': 'low'}},
               'C2': {'C2_detailed': "A country's trade deficit is the "
                                     'difference between the value of the '
                                     'goods that a country imports and the '
                                     'value of the goods that a country '
      

## Step 2: Expand the Dictionaries into Prompt Components in a DataFrame

In [8]:
# create one dataframe per domain (one row per counterbalance condition)
economy_df = expand_domain_to_dataframe(economy_domain_dict)
sociology_df = expand_domain_to_dataframe(sociology_domain_dict)
weather_df = expand_domain_to_dataframe(weather_domain_dict)
sociology_df

|  | domain | C1 | C1_values | C1_cntbl | C1_sense | C1_detailed | C2 | C2_values | C2_cntbl | C2_sense | C2_detailed | E | E_values | E_cntbl | E_sense | E_detailed | cntbl_cond |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sociology | urbanization | 1 | p | high | Urbanization is the degree to which the member... | interest in religion | 1 | p | low | Interest in religion is the degree to which th... | socio-economic mobility | 1 | p | high | Socioeconomic mobility is the degree to which ... | ppp |
| 1 | sociology | urbanization | 1 | p | high | Urbanization is the degree to which the member... | interest in religion | 1 | p | low | Interest in religion is the degree to which th... | socio-economic mobility | 1 | m | low | Socioeconomic mobility is the degree to which ... | ppm |
| 2 | sociology | urbanization | 1 | p | high | Urbanization is the degree to which the member... | interest in religion | 1 | m | high | Interest in religion is the degree to which th... | socio-economic mobility | 1 | p | high | Socioeconomic mobility is the degree to which ... | pmp |
| 3 | sociology | urbanization | 1 | p | high | Urbanization is the degree to which the member... | interest in religion | 1 | m | high | Interest in religion is the degree to which th... | socio-economic mobility | 1 | m | low | Socioeconomic mobility is the degree to which ... | pmm |
| 4 | sociology | urbanization | 1 | m | low | Urbanization is the degree to which the member... | interest in religion | 1 | p | low | Interest in religion is the degree to which th... | socio-economic mobility | 1 | p | high | Socioeconomic mobility is the degree to which ... | mpp |
| 5 | sociology | urbanization | 1 | m | low | Urbanization is the degree to which the member... | interest in religion | 1 | p | low | Interest in religion is the degree to which th... | socio-economic mobility | 1 | m | low | Socioeconomic mobility is the degree to which ... | mpm |
| 6 | sociology | urbanization | 1 | m | low | Urbanization is the degree to which the member... | interest in religion | 1 | m | high | Interest in religion is the degree to which th... | socio-economic mobility | 1 | p | high | Socioeconomic mobility is the degree to which ... | mmp |
| 7 | sociology | urbanization | 1 | m | low | Urbanization is the degree to which the member... | interest in religion | 1 | m | high | Interest in religion is the degree to which th... | socio-economic mobility | 1 | m | low | Socioeconomic mobility is the degree to which ... | mmm |


### Let's look at the dataframe:

In [9]:
print(
    f"Each domain dataframe now has {len(weather_df)} rows  \n and the following columns: \n {weather_df.columns}."
)
print(f"Unique counterbalance conditions: {weather_df['cntbl_cond'].unique()}")
weather_df

Each domain dataframe now has 8 rows  
 and the following columns: 
 Index(['domain', 'C1', 'C1_values', 'C1_cntbl', 'C1_sense', 'C1_detailed',
       'C2', 'C2_values', 'C2_cntbl', 'C2_sense', 'C2_detailed', 'E',
       'E_values', 'E_cntbl', 'E_sense', 'E_detailed', 'cntbl_cond'],
      dtype='object').
Unique counterbalance conditions: ['ppp' 'ppm' 'pmp' 'pmm' 'mpp' 'mpm' 'mmp' 'mmm']


|  | domain | C1 | C1_values | C1_cntbl | C1_sense | C1_detailed | C2 | C2_values | C2_cntbl | C2_sense | C2_detailed | E | E_values | E_cntbl | E_sense | E_detailed | cntbl_cond |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | weather | ozone levels | 1 | p | high | Ozone is a gaseous allotrope of oxygen (O3) an... | air pressure | 1 | p | low | Air pressure is force exerted due to concentra... | humidity | 1 | p | high | Humidity is the degree to which the atmosphere... | ppp |
| 1 | weather | ozone levels | 1 | p | high | Ozone is a gaseous allotrope of oxygen (O3) an... | air pressure | 1 | p | low | Air pressure is force exerted due to concentra... | humidity | 1 | m | low | Humidity is the degree to which the atmosphere... | ppm |
| 2 | weather | ozone levels | 1 | p | high | Ozone is a gaseous allotrope of oxygen (O3) an... | air pressure | 1 | m | high | Air pressure is force exerted due to concentra... | humidity | 1 | p | high | Humidity is the degree to which the atmosphere... | pmp |
| 3 | weather | ozone levels | 1 | p | high | Ozone is a gaseous allotrope of oxygen (O3) an... | air pressure | 1 | m | high | Air pressure is force exerted due to concentra... | humidity | 1 | m | low | Humidity is the degree to which the atmosphere... | pmm |
| 4 | weather | ozone levels | 1 | m | low | Ozone is a gaseous allotrope of oxygen (O3) an... | air pressure | 1 | p | low | Air pressure is force exerted due to concentra... | humidity | 1 | p | high | Humidity is the degree to which the atmosphere... | mpp |
| 5 | weather | ozone levels | 1 | m | low | Ozone is a gaseous allotrope of oxygen (O3) an... | air pressure | 1 | p | low | Air pressure is force exerted due to concentra... | humidity | 1 | m | low | Humidity is the degree to which the atmosphere... | mpm |
| 6 | weather | ozone levels | 1 | m | low | Ozone is a gaseous allotrope of oxygen (O3) an... | air pressure | 1 | m | high | Air pressure is force exerted due to concentra... | humidity | 1 | p | high | Humidity is the degree to which the atmosphere... | mmp |
| 7 | weather | ozone levels | 1 | m | low | Ozone is a gaseous allotrope of oxygen (O3) an... | air pressure | 1 | m | high | Air pressure is force exerted due to concentra... | humidity | 1 | m | low | Humidity is the degree to which the atmosphere... | mmm |


In [10]:
economy_df

|  | domain | C1 | C1_values | C1_cntbl | C1_sense | C1_detailed | C2 | C2_values | C2_cntbl | C2_sense | C2_detailed | E | E_values | E_cntbl | E_sense | E_detailed | cntbl_cond |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | economy | interest rates | 1 | p | low | Interest rates are the rates banks charge to l... | trade deficits | 1 | p | small | A country's trade deficit is the difference be... | retirement savings | 1 | p | high | Retirement savings is the money people save fo... | ppp |
| 1 | economy | interest rates | 1 | p | low | Interest rates are the rates banks charge to l... | trade deficits | 1 | p | small | A country's trade deficit is the difference be... | retirement savings | 1 | m | low | Retirement savings is the money people save fo... | ppm |
| 2 | economy | interest rates | 1 | p | low | Interest rates are the rates banks charge to l... | trade deficits | 1 | m | large | A country's trade deficit is the difference be... | retirement savings | 1 | p | high | Retirement savings is the money people save fo... | pmp |
| 3 | economy | interest rates | 1 | p | low | Interest rates are the rates banks charge to l... | trade deficits | 1 | m | large | A country's trade deficit is the difference be... | retirement savings | 1 | m | low | Retirement savings is the money people save fo... | pmm |
| 4 | economy | interest rates | 1 | m | high | Interest rates are the rates banks charge to l... | trade deficits | 1 | p | small | A country's trade deficit is the difference be... | retirement savings | 1 | p | high | Retirement savings is the money people save fo... | mpp |
| 5 | economy | interest rates | 1 | m | high | Interest rates are the rates banks charge to l... | trade deficits | 1 | p | small | A country's trade deficit is the difference be... | retirement savings | 1 | m | low | Retirement savings is the money people save fo... | mpm |
| 6 | economy | interest rates | 1 | m | high | Interest rates are the rates banks charge to l... | trade deficits | 1 | m | large | A country's trade deficit is the difference be... | retirement savings | 1 | p | high | Retirement savings is the money people save fo... | mmp |
| 7 | economy | interest rates | 1 | m | high | Interest rates are the rates banks charge to l... | trade deficits | 1 | m | large | A country's trade deficit is the difference be... | retirement savings | 1 | m | low | Retirement savings is the money people save fo... | mmm |
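The eight rows per domain follow the order of `itertools.product("pm", repeat=3)`: the first letter of `cntbl_cond` is the counterbalance setting of `C1`, the second that of `C2`, and the third that of `E`, and each `*_sense` column holds the sense of value 1 under that setting. The sketch below, `expand_counterbalance`, is a hypothetical reimplementation that reproduces this layout for the economy domain; it omits the `*_detailed` columns for brevity and is not the code of `expand_domain_to_dataframe`.

```python
from itertools import product

import pandas as pd

VARIABLES = ("C1", "C2", "E")


def expand_counterbalance(domain):
    """Hypothetical sketch: one row per p/m assignment to (C1, C2, E), value fixed at 1."""
    rows = []
    for cntbl in product("pm", repeat=len(VARIABLES)):
        row = {"domain": domain["domain_name"]}
        for var, c in zip(VARIABLES, cntbl):
            spec = domain["variables"][var]
            row[var] = spec[f"{var}_name"]
            row[f"{var}_values"] = 1
            row[f"{var}_cntbl"] = c
            # Sense of the on-state (1) under this variable's counterbalance setting
            row[f"{var}_sense"] = spec[f"{c}_value"]["1"]
        row["cntbl_cond"] = "".join(cntbl)
        rows.append(row)
    return pd.DataFrame(rows)


def variable(var, name, p1, m1):
    """Minimal variable entry with the p/m layout printed by create_domain_dict."""
    return {f"{var}_name": name, "p_value": {"1": p1, "0": "normal"},
            "m_value": {"1": m1, "0": "normal"}}


economy = {
    "domain_name": "economy",
    "variables": {
        "C1": variable("C1", "interest rates", "low", "high"),
        "C2": variable("C2", "trade deficits", "small", "large"),
        "E": variable("E", "retirement savings", "high", "low"),
    },
}
df = expand_counterbalance(economy)
print(df[["cntbl_cond", "C1_sense", "C2_sense", "E_sense"]])
```

The senses produced this way match the `economy_df` table above (e.g., `ppm` gives *low* retirement savings, `mpp` gives *high* interest rates).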



### How to derive the off-sense of a variable:

Note that the function `expand_df_by_task_queries()` returns the building blocks needed to verbalize the entire prompt for each of the 8 possible counterbalance conditions and each of the 20 inference tasks. Hence, the returned dataframe has $n_{\text{cntbl}} \cdot n_{\text{tasks}}$ rows, i.e., here $8\cdot 20 = 160$ rows. Importantly, the value columns for `C1`, `C2`, and `E` only contain *1*: the dataframe holds the variable senses (high, low, small, etc.) for *1*, but *not* those for *0*. The off-senses (0) have to be looked up in the nested dictionary created by `create_domain_dict()`. The function `verbalize_inference_task()` takes care of this whenever an inference task contains an off-value.

Below, you can verify for yourself that `verbalize_inference_task` correctly looks up off-values in the nested dictionary returned by `create_domain_dict()`. We first create the building-blocks dataframe for inference-task verbalization with `expand_df_by_task_queries`.
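The lookup itself is small. The hypothetical helper `verbalize_observation` below parses an observation string such as `"E=0, C1=0"`, takes each variable's letter from `cntbl_cond` (order `C1`, `C2`, `E`), and reads the sense for the observed value from `p_value` or `m_value`. It is a sketch under these assumptions, not the code of `verbalize_inference_task`; for the first row of `econ_df_tasks_ex` (`ppp`, `"E=1, C2=1"`) it yields the same phrase as the printed prompt.

```python
VARIABLE_ORDER = ("C1", "C2", "E")  # order of the letters in cntbl_cond


def verbalize_observation(observation, domain, cntbl_cond):
    """Hypothetical sketch: turn e.g. 'E=0, C1=0' into '<sense> <name>' phrases."""
    phrases = []
    for item in observation.split(", "):
        var, value = item.split("=")
        cntbl = cntbl_cond[VARIABLE_ORDER.index(var)]  # 'p' or 'm' for this variable
        spec = domain["variables"][var]
        sense = spec[f"{cntbl}_value"][value]  # value is the string '0' or '1'
        phrases.append(f"{sense} {spec[var + '_name']}")
    return ", ".join(phrases)


# Economy domain in the layout printed by create_domain_dict (descriptions omitted)
economy = {
    "variables": {
        "C1": {"C1_name": "interest rates",
               "p_value": {"1": "low", "0": "normal"}, "m_value": {"1": "high", "0": "normal"}},
        "C2": {"C2_name": "trade deficits",
               "p_value": {"1": "small", "0": "normal"}, "m_value": {"1": "large", "0": "normal"}},
        "E": {"E_name": "retirement savings",
              "p_value": {"1": "high", "0": "normal"}, "m_value": {"1": "low", "0": "normal"}},
    }
}
print(verbalize_observation("E=1, C2=1", economy, "ppp"))
# high retirement savings, small trade deficits
print(verbalize_observation("E=0, C1=0", economy, "mmm"))
# normal retirement savings, normal interest rates
```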

In [11]:
# create building blocks for inference task verbalization

econ_df_tasks_ex = expand_df_by_task_queries(economy_df, inference_tasks_rw17)
print(f"unique counterbalance values: {econ_df_tasks_ex['cntbl_cond'].unique()}")
econ_df_tasks_ex

unique counterbalance values: ['ppp' 'ppm' 'pmp' 'pmm' 'mpp' 'mpm' 'mmp' 'mmm']


|  | domain | C1 | C1_values | C1_cntbl | C1_sense | C1_detailed | C2 | C2_values | C2_cntbl | C2_sense | ... | E | E_values | E_cntbl | E_sense | E_detailed | cntbl_cond | task | query_node | observation | query |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | economy | interest rates | 1 | p | low | Interest rates are the rates banks charge to l... | trade deficits | 1 | p | small | ... | retirement savings | 1 | p | high | Retirement savings is the money people save fo... | ppp | a | C1=1 | E=1, C2=1 | p(C1=1\|E=1, C2=1) |
| 0 | economy | interest rates | 1 | p | low | Interest rates are the rates banks charge to l... | trade deficits | 1 | p | small | ... | retirement savings | 1 | p | high | Retirement savings is the money people save fo... | ppp | a | C2=1 | E=1, C1=1 | p(C2=1\|E=1, C1=1) |
| 0 | economy | interest rates | 1 | p | low | Interest rates are the rates banks charge to l... | trade deficits | 1 | p | small | ... | retirement savings | 1 | p | high | Retirement savings is the money people save fo... | ppp | b | C1=1 | E=1 | p(C1=1\|E=1) |
| 0 | economy | interest rates | 1 | p | low | Interest rates are the rates banks charge to l... | trade deficits | 1 | p | small | ... | retirement savings | 1 | p | high | Retirement savings is the money people save fo... | ppp | b | C2=1 | E=1 | p(C2=1\|E=1) |
| 0 | economy | interest rates | 1 | p | low | Interest rates are the rates banks charge to l... | trade deficits | 1 | p | small | ... | retirement savings | 1 | p | high | Retirement savings is the money people save fo... | ppp | c | C1=1 | E=1, C2=0 | p(C1=1\|E=1, C2=0) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7 | economy | interest rates | 1 | m | high | Interest rates are the rates banks charge to l... | trade deficits | 1 | m | large | ... | retirement savings | 1 | m | low | Retirement savings is the money people save fo... | mmm | h | C2=1 | E=0, C1=0 | p(C2=1\|E=0, C1=0) |
| 7 | economy | interest rates | 1 | m | high | Interest rates are the rates banks charge to l... | trade deficits | 1 | m | large | ... | retirement savings | 1 | m | low | Retirement savings is the money people save fo... | mmm | i | E=1 | C1=0, C2=0 | p(E=1\|C1=0, C2=0) |
| 7 | economy | interest rates | 1 | m | high | Interest rates are the rates banks charge to l... | trade deficits | 1 | m | large | ... | retirement savings | 1 | m | low | Retirement savings is the money people save fo... | mmm | j | E=1 | C1=0, C2=1 | p(E=1\|C1=0, C2=1) |
| 7 | economy | interest rates | 1 | m | high | Interest rates are the rates banks charge to l... | trade deficits | 1 | m | large | ... | retirement savings | 1 | m | low | Retirement savings is the money people save fo... | mmm | j | E=1 | C2=0, C1=1 | p(E=1\|C2=0, C1=1) |


In [12]:
weather_df_tasks_ex = expand_df_by_task_queries(weather_df, inference_tasks_rw17)
print(f"unique counterbalance values: {weather_df_tasks_ex['cntbl_cond'].unique()}")
weather_df_tasks_ex

unique counterbalance values: ['ppp' 'ppm' 'pmp' 'pmm' 'mpp' 'mpm' 'mmp' 'mmm']


|  | domain | C1 | C1_values | C1_cntbl | C1_sense | C1_detailed | C2 | C2_values | C2_cntbl | C2_sense | ... | E | E_values | E_cntbl | E_sense | E_detailed | cntbl_cond | task | query_node | observation | query |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | weather | ozone levels | 1 | p | high | Ozone is a gaseous allotrope of oxygen (O3) an... | air pressure | 1 | p | low | ... | humidity | 1 | p | high | Humidity is the degree to which the atmosphere... | ppp | a | C1=1 | E=1, C2=1 | p(C1=1\|E=1, C2=1) |
| 0 | weather | ozone levels | 1 | p | high | Ozone is a gaseous allotrope of oxygen (O3) an... | air pressure | 1 | p | low | ... | humidity | 1 | p | high | Humidity is the degree to which the atmosphere... | ppp | a | C2=1 | E=1, C1=1 | p(C2=1\|E=1, C1=1) |
| 0 | weather | ozone levels | 1 | p | high | Ozone is a gaseous allotrope of oxygen (O3) an... | air pressure | 1 | p | low | ... | humidity | 1 | p | high | Humidity is the degree to which the atmosphere... | ppp | b | C1=1 | E=1 | p(C1=1\|E=1) |
| 0 | weather | ozone levels | 1 | p | high | Ozone is a gaseous allotrope of oxygen (O3) an... | air pressure | 1 | p | low | ... | humidity | 1 | p | high | Humidity is the degree to which the atmosphere... | ppp | b | C2=1 | E=1 | p(C2=1\|E=1) |
| 0 | weather | ozone levels | 1 | p | high | Ozone is a gaseous allotrope of oxygen (O3) an... | air pressure | 1 | p | low | ... | humidity | 1 | p | high | Humidity is the degree to which the atmosphere... | ppp | c | C1=1 | E=1, C2=0 | p(C1=1\|E=1, C2=0) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7 | weather | ozone levels | 1 | m | low | Ozone is a gaseous allotrope of oxygen (O3) an... | air pressure | 1 | m | high | ... | humidity | 1 | m | low | Humidity is the degree to which the atmosphere... | mmm | h | C2=1 | E=0, C1=0 | p(C2=1\|E=0, C1=0) |
| 7 | weather | ozone levels | 1 | m | low | Ozone is a gaseous allotrope of oxygen (O3) an... | air pressure | 1 | m | high | ... | humidity | 1 | m | low | Humidity is the degree to which the atmosphere... | mmm | i | E=1 | C1=0, C2=0 | p(E=1\|C1=0, C2=0) |
| 7 | weather | ozone levels | 1 | m | low | Ozone is a gaseous allotrope of oxygen (O3) an... | air pressure | 1 | m | high | ... | humidity | 1 | m | low | Humidity is the degree to which the atmosphere... | mmm | j | E=1 | C1=0, C2=1 | p(E=1\|C1=0, C2=1) |
| 7 | weather | ozone levels | 1 | m | low | Ozone is a gaseous allotrope of oxygen (O3) an... | air pressure | 1 | m | high | ... | humidity | 1 | m | low | Humidity is the degree to which the atmosphere... | mmm | j | E=1 | C2=0, C1=1 | p(E=1\|C2=0, C1=1) |


In [13]:
sociology_df_tasks_ex = expand_df_by_task_queries(sociology_df, inference_tasks_rw17)
print(f"unique counterbalance values: {sociology_df_tasks_ex['cntbl_cond'].unique()}")
sociology_df_tasks_ex

unique counterbalance values: ['ppp' 'ppm' 'pmp' 'pmm' 'mpp' 'mpm' 'mmp' 'mmm']


|  | domain | C1 | C1_values | C1_cntbl | C1_sense | C1_detailed | C2 | C2_values | C2_cntbl | C2_sense | ... | E | E_values | E_cntbl | E_sense | E_detailed | cntbl_cond | task | query_node | observation | query |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sociology | urbanization | 1 | p | high | Urbanization is the degree to which the member... | interest in religion | 1 | p | low | ... | socio-economic mobility | 1 | p | high | Socioeconomic mobility is the degree to which ... | ppp | a | C1=1 | E=1, C2=1 | p(C1=1\|E=1, C2=1) |
| 0 | sociology | urbanization | 1 | p | high | Urbanization is the degree to which the member... | interest in religion | 1 | p | low | ... | socio-economic mobility | 1 | p | high | Socioeconomic mobility is the degree to which ... | ppp | a | C2=1 | E=1, C1=1 | p(C2=1\|E=1, C1=1) |
| 0 | sociology | urbanization | 1 | p | high | Urbanization is the degree to which the member... | interest in religion | 1 | p | low | ... | socio-economic mobility | 1 | p | high | Socioeconomic mobility is the degree to which ... | ppp | b | C1=1 | E=1 | p(C1=1\|E=1) |
| 0 | sociology | urbanization | 1 | p | high | Urbanization is the degree to which the member... | interest in religion | 1 | p | low | ... | socio-economic mobility | 1 | p | high | Socioeconomic mobility is the degree to which ... | ppp | b | C2=1 | E=1 | p(C2=1\|E=1) |
| 0 | sociology | urbanization | 1 | p | high | Urbanization is the degree to which the member... | interest in religion | 1 | p | low | ... | socio-economic mobility | 1 | p | high | Socioeconomic mobility is the degree to which ... | ppp | c | C1=1 | E=1, C2=0 | p(C1=1\|E=1, C2=0) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7 | sociology | urbanization | 1 | m | low | Urbanization is the degree to which the member... | interest in religion | 1 | m | high | ... | socio-economic mobility | 1 | m | low | Socioeconomic mobility is the degree to which ... | mmm | h | C2=1 | E=0, C1=0 | p(C2=1\|E=0, C1=0) |
| 7 | sociology | urbanization | 1 | m | low | Urbanization is the degree to which the member... | interest in religion | 1 | m | high | ... | socio-economic mobility | 1 | m | low | Socioeconomic mobility is the degree to which ... | mmm | i | E=1 | C1=0, C2=0 | p(E=1\|C1=0, C2=0) |
| 7 | sociology | urbanization | 1 | m | low | Urbanization is the degree to which the member... | interest in religion | 1 | m | high | ... | socio-economic mobility | 1 | m | low | Socioeconomic mobility is the degree to which ... | mmm | j | E=1 | C1=0, C2=1 | p(E=1\|C1=0, C2=1) |
| 7 | sociology | urbanization | 1 | m | low | Urbanization is the degree to which the member... | interest in religion | 1 | m | high | ... | socio-economic mobility | 1 | m | low | Socioeconomic mobility is the degree to which ... | mmm | j | E=1 | C2=0, C1=1 | p(E=1\|C2=0, C1=1) |


Next, we verbalize the first row of `econ_df_tasks_ex` (counterbalance condition `ppp`, observation "E=1, C2=1") and then every row of the dataframe. To verify that `verbalize_inference_task` correctly looks up off-values in `economy_domain_dict`, inspect the `prompt` of any row whose observation column contains a 0, e.g., "E=0, C1=0".



In [14]:
# verbalize inference task for a single row
print(
    verbalize_inference_task(
        econ_df_tasks_ex.iloc[0], economy_domain_dict, prompt_type="prompt"
    )
)

# for each row in the dataframe, verbalize the inference task
econ_df_tasks_ex["prompt"] = econ_df_tasks_ex.apply(
    lambda row: verbalize_inference_task(
        row, economy_domain_dict, prompt_type="prompt"
    ),
    axis=1,
)

# check the unique counterbalance conditions: We'd expect 8 unique conditions at this point
print(econ_df_tasks_ex["cntbl_cond"].unique())
econ_df_tasks_ex

 You are currently observing: high retirement savings, small trade deficits. Your task is to estimate how likely it is that low interest rates are present on a scale from 0 to 100, given the observations and causal relationships described. 0 means completely unlikely and 100 means completely certain. prompt
['ppp' 'ppm' 'pmp' 'pmm' 'mpp' 'mpm' 'mmp' 'mmm']


Unnamed: 0,domain,C1,C1_values,C1_cntbl,C1_sense,C1_detailed,C2,C2_values,C2_cntbl,C2_sense,...,E_values,E_cntbl,E_sense,E_detailed,cntbl_cond,task,query_node,observation,query,prompt
0,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,1,p,high,Retirement savings is the money people save fo...,ppp,a,C1=1,"E=1, C2=1","p(C1=1|E=1, C2=1)",You are currently observing: high retirement ...
0,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,1,p,high,Retirement savings is the money people save fo...,ppp,a,C2=1,"E=1, C1=1","p(C2=1|E=1, C1=1)",You are currently observing: high retirement ...
0,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,1,p,high,Retirement savings is the money people save fo...,ppp,b,C1=1,E=1,p(C1=1|E=1),You are currently observing: high retirement ...
0,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,1,p,high,Retirement savings is the money people save fo...,ppp,b,C2=1,E=1,p(C2=1|E=1),You are currently observing: high retirement ...
0,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,1,p,high,Retirement savings is the money people save fo...,ppp,c,C1=1,"E=1, C2=0","p(C1=1|E=1, C2=0)",You are currently observing: high retirement ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7,economy,interest rates,1,m,high,Interest rates are the rates banks charge to l...,trade deficits,1,m,large,...,1,m,low,Retirement savings is the money people save fo...,mmm,h,C2=1,"E=0, C1=0","p(C2=1|E=0, C1=0)",You are currently observing: normal retiremen...
7,economy,interest rates,1,m,high,Interest rates are the rates banks charge to l...,trade deficits,1,m,large,...,1,m,low,Retirement savings is the money people save fo...,mmm,i,E=1,"C1=0, C2=0","p(E=1|C1=0, C2=0)",You are currently observing: normal interest ...
7,economy,interest rates,1,m,high,Interest rates are the rates banks charge to l...,trade deficits,1,m,large,...,1,m,low,Retirement savings is the money people save fo...,mmm,j,E=1,"C1=0, C2=1","p(E=1|C1=0, C2=1)",You are currently observing: normal interest ...
7,economy,interest rates,1,m,high,Interest rates are the rates banks charge to l...,trade deficits,1,m,large,...,1,m,low,Retirement savings is the money people save fo...,mmm,j,E=1,"C2=0, C1=1","p(E=1|C2=0, C1=1)",You are currently observing: normal trade def...


**Customizing the prompting strategy:**
`verbalize_inference_task` also accepts a `prompt_type` argument that lets you customize the prompting strategy: its value is appended verbatim to the end of the verbalized task. The default is `prompt_type="Please provide only a numeric response and no additional information."`, but you may want to try Chain-of-Thought (CoT) prompting or ask the LLM to place its response within certain characters such as `< >`.
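If you ask the LLM to wrap its final answer in angle brackets, you also need a way to pull the number back out of its reply. The following is a minimal sketch under that assumption; the instruction string `COT_PROMPT_TYPE` and the helper `extract_bracketed_number` are hypothetical and not part of `causalalign`. It takes the *last* bracketed number so that numbers bracketed earlier in a chain-of-thought answer do not override the final estimate.

```python
import re
from typing import Optional

# Hypothetical instruction that could be passed as `prompt_type`
COT_PROMPT_TYPE = (
    "Think step by step. Then give your final estimate as a single number "
    "between 0 and 100 enclosed in angle brackets, for example <50>."
)

# a number (optionally with decimals) between < and >, allowing inner spaces
_BRACKETED_NUMBER = re.compile(r"<\s*(\d+(?:\.\d+)?)\s*>")


def extract_bracketed_number(response: str) -> Optional[float]:
    """Return the last <number> in an LLM response, or None if there is none."""
    matches = _BRACKETED_NUMBER.findall(response)
    if not matches:
        return None
    value = float(matches[-1])
    # discard values outside the 0-100 rating scale
    return value if 0 <= value <= 100 else None


print(extract_bracketed_number("C1 alone explains E, so my estimate is <70>."))
# 70.0
```

Returning `None` instead of raising makes it easy to count unparseable replies afterwards.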


In [15]:
# Example of custom verbalization
prompt = "THIS IS A CUSTOM PROMPT"
verbalize_inference_task(
    econ_df_tasks_ex.iloc[0], economy_domain_dict, prompt_type=prompt
)

' You are currently observing: high retirement savings, small trade deficits. Your task is to estimate how likely it is that low interest rates are present on a scale from 0 to 100, given the observations and causal relationships described. 0 means completely unlikely and 100 means completely certain. THIS IS A CUSTOM PROMPT'


### A Note on the Counterbalance Conditions p and m
They can admittedly be a bit confusing, and you may think you don't need them. They are created automatically for you: each of the three variables C1, C2, and E is counterbalanced as either p or m, giving $2^3 = 8$ combinations. Simply subset for the ones you want to use. RW17 used only 4 of the 8 possible combinations (ppp, pmm, mpm, mmp).
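The 8 condition labels are simply the Cartesian product of {p, m} over the three variables. A short standard-library sketch (the names `all_conditions`, `rw17_conditions`, and `even_m` are illustrative only) reproduces the order printed by the notebook and shows that RW17's four conditions happen to be exactly those with an even number of m's; this is a property of the strings, not a claim about RW17's design rationale.

```python
from itertools import product

# each of C1, C2 and E is either "p" or "m" -> 2**3 = 8 conditions
all_conditions = ["".join(combo) for combo in product("pm", repeat=3)]
print(all_conditions)
# ['ppp', 'ppm', 'pmp', 'pmm', 'mpp', 'mpm', 'mmp', 'mmm']

# the four conditions used in RW17 (select_contbl_cond_xs in this notebook)
rw17_conditions = ["ppp", "pmm", "mmp", "mpm"]

# conditions whose label contains an even number of "m"s
even_m = [cond for cond in all_conditions if cond.count("m") % 2 == 0]
print(even_m)
# ['ppp', 'pmm', 'mpm', 'mmp']
print(set(even_m) == set(rw17_conditions))
# True
```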



In [16]:
# Let's look at an individual counterbalance conditions:
print(economy_df["cntbl_cond"].unique())
print(economy_df["cntbl_cond"])

print(econ_df_tasks_ex["cntbl_cond"].unique())

['ppp' 'ppm' 'pmp' 'pmm' 'mpp' 'mpm' 'mmp' 'mmm']
0    ppp
1    ppm
2    pmp
3    pmm
4    mpp
5    mpm
6    mmp
7    mmm
Name: cntbl_cond, dtype: object
['ppp' 'ppm' 'pmp' 'pmm' 'mpp' 'mpm' 'mmp' 'mmm']


## Creating the entire prompt for each condition and domain:

- Step 1: For each domain dictionary, create the domain dataframe and then the verbalizations.
- Step 2: Append all complete dataframes.
- Step 3: _Optionally_, subset for the counterbalance conditions of interest.
- Step 4: _Optionally_, merge with human data.



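## Custom response-format instructions

Instead of asking for a bare number, we can ask the LLM to report both a likelihood and a confidence rating in a strict XML format, for example with an instruction like this:

Please provide your response in the following strict XML format without any additional text or explanation: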
```
<response>
    <likelihood>YOUR_NUMERIC_RESPONSE_HERE</likelihood>
    <confidence>YOUR_CONFIDENCE_SCORE_HERE</confidence>
</response>

```

Only replace YOUR_NUMERIC_RESPONSE_HERE with a number between 0 (unlikely) and 100 (very likely), and YOUR_CONFIDENCE_SCORE_HERE with a number between 0 (very uncertain) and 100 (very certain). Do not include any explanations or text outside the XML format.


In [17]:
xml_format_numeric_certainty = "<response><likelihood>YOUR_NUMERIC_RESPONSE_HERE</likelihood><confidence>YOUR_CONFIDENCE_SCORE_HERE</confidence></response>"
xml_explanation_numeric_certainty = (
    "Replace YOUR_NUMERIC_RESPONSE_HERE with your likelihood estimate between 0 (very unlikely) and 100 (very likely). "
    "Replace YOUR_CONFIDENCE_SCORE_HERE with a number between 0 (very uncertain) and 100 (very certain), indicating how confident you are in your likelihood estimate. "
    "DO NOT include any other information, explanation, or formatting in your response. "
    "DO NOT use Markdown, code blocks, quotation marks, or special characters."
    # "Only return the XML as raw text in a single line."
)

prompt_type_xml_numeric_certainty = (
    "Return your response as raw text in one single line using this exact XML format: "
    + xml_format_numeric_certainty
    + " "
    # + "DO NOT use Markdown, code blocks, or any additional formatting."
    # "Return your response in a single line, without line breaks within the following XML format. Do NOT use Markdown, code blocks, or any additional formatting. DO NOT add quotation marks around the response. DO NOT provide any additional information. Output must be in the exact XML format below, inline "
    # "Please provide your response in the following strict XML format. Provide your response in a single line. DO NOT provide any additional information outside the  formatting like line breaks and ensure the XML is returned as raw text without any code markers e.g., ```xml or quotation marks: "
    # + xml_format_numeric_certainty
    + " "
    + xml_explanation_numeric_certainty
)
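Once the LLMs answer in this XML format, the two numbers have to be recovered from their replies. Below is a minimal parsing sketch using the standard library; the helper name `parse_xml_response` is hypothetical. It assumes the reply contains one `<response>` element and tolerates surrounding text such as a Markdown code fence, which models sometimes add despite the instructions. Any invalid XML, missing field, unreplaced placeholder, or value outside 0-100 yields `None`.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# first <response>...</response> block, possibly spanning several lines
_RESPONSE_BLOCK = re.compile(r"<response>.*?</response>", re.DOTALL)


def parse_xml_response(raw: str) -> Optional[Tuple[float, float]]:
    """Return (likelihood, confidence) from an LLM reply, or None if invalid."""
    match = _RESPONSE_BLOCK.search(raw)
    if match is None:
        return None
    try:
        root = ET.fromstring(match.group(0))
    except ET.ParseError:
        return None
    values = []
    for tag in ("likelihood", "confidence"):
        node = root.find(tag)
        if node is None or node.text is None:
            return None
        try:
            value = float(node.text.strip())
        except ValueError:  # e.g. the placeholder was not replaced
            return None
        if not 0 <= value <= 100:
            return None
        values.append(value)
    return values[0], values[1]


reply = "<response><likelihood>75</likelihood><confidence>80</confidence></response>"
print(parse_xml_response(reply))
# (75.0, 80.0)
```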


##### Next, we'll call `generate_prompt_dataframe()` for each domain dictionary. 
We'll start with one domain dictionary and after that, we'll loop over the remaining domain dictionaries
and append the resulting dataframes.

In [18]:
# generate the complete prompt dataframe for the economy domain
econ_complete_df = generate_prompt_dataframe(
    domain_dict=economy_domain_dict,
    inference_tasks=inference_tasks_rw17,
    graph_type="collider",
    graph_structures=graph_structures,
    counterbalance_enabled=True,
    prompt_category="numeric-certainty",
    prompt_type=prompt_type_xml_numeric_certainty,
)

# list of the 2 remaining domain dictionaries
domain_dicts = [sociology_domain_dict, weather_domain_dict]


# we'll start with the completed economy dataframe and append the other domain dataframes to it
rw_17_over_complete_df = (
    econ_complete_df.copy()  # over complete means, it has all 8 possible counterbalance conditions.
    # we'll subset later for the 4 counterbalance conditions that were used
)


##### Now loop over the remaining domain dictionaries and append their dataframes to the one created above

In [19]:
# use `domain_dict` rather than `dict` to avoid shadowing the built-in
for domain_dict in domain_dicts:
    # for domain_dict in rw_17_domain_components.values():
    df = generate_prompt_dataframe(
        domain_dict=domain_dict,
        inference_tasks=inference_tasks_rw17,
        graph_type="collider",
        graph_structures=graph_structures,
        counterbalance_enabled=True,
        prompt_category="numeric-certainty",
        prompt_type=prompt_type_xml_numeric_certainty,
    )
    # append the new dataframe to the complete dataframe
    rw_17_over_complete_df = append_dfs(rw_17_over_complete_df, df)

### Subsetting for the counterbalance conditions used in RW17 and adding a unique id

In [20]:
# add a unique id to each row
rw_17_over_complete_df["id"] = range(1, len(rw_17_over_complete_df) + 1)

# subset for the counterbalance conditions used in the RW17 study
select_contbl_cond_xs = ["ppp", "pmm", "mmp", "mpm"]
# note: this prints the conditions before subsetting, so all 8 are still listed
print(
    f"unique counterbalance conditions used: {rw_17_over_complete_df['cntbl_cond'].unique()}"
)

unique counterbalance conditions used: ['ppp' 'ppm' 'pmp' 'pmm' 'mpp' 'mpm' 'mmp' 'mmm']


In [21]:
rw_17_complete_df = rw_17_over_complete_df[
    rw_17_over_complete_df["cntbl_cond"].isin(select_contbl_cond_xs)
]
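Because `id` is assigned before subsetting, every prompt keeps the same id in the over-complete and in the RW17 dataframe (which is why id 275 can be looked up in either), and the ids in the subset are not contiguous. The toy sketch below illustrates this; the dataframe is hypothetical and only has the columns needed for the demonstration.

```python
import pandas as pd

# Hypothetical toy "over-complete" table: 2 domains x all 8 counterbalance conditions
conditions = ["ppp", "ppm", "pmp", "pmm", "mpp", "mpm", "mmp", "mmm"]
over_complete = pd.DataFrame(
    {"domain": ["economy"] * 8 + ["weather"] * 8, "cntbl_cond": conditions * 2}
)
# assign ids BEFORE subsetting, as in the cell above
over_complete["id"] = range(1, len(over_complete) + 1)

# keep only the RW17 conditions; .copy() avoids SettingWithCopyWarning on later edits
rw17_subset = over_complete[
    over_complete["cntbl_cond"].isin(["ppp", "pmm", "mmp", "mpm"])
].copy()
print(rw17_subset["id"].tolist())
# [1, 4, 6, 7, 9, 12, 14, 15]
```

Assigning ids after subsetting would instead give contiguous ids, but they would then no longer match the ids in the over-complete dataframe.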

In [22]:
rw_17_over_complete_df[rw_17_over_complete_df["id"] == 275]

Unnamed: 0,domain,C1,C1_values,C1_cntbl,C1_sense,C1_detailed,C2,C2_values,C2_cntbl,C2_sense,...,E_detailed,cntbl_cond,task,query_node,observation,query,graph,prompt,prompt_category,id
274,sociology,urbanization,1,m,low,Urbanization is the degree to which the member...,interest in religion,1,p,low,...,Socioeconomic mobility is the degree to which ...,mpm,h,C1=1,"E=0, C2=0","p(C1=1|E=0, C2=0)",collider,Sociologists seek to describe and predict the ...,numeric-certainty,275


In [23]:
print(len(rw_17_complete_df))
print(rw_17_complete_df.columns)
print(rw_17_complete_df["task"].unique())
print(rw_17_complete_df["cntbl_cond"].unique())
print(rw_17_complete_df["domain"].unique())


240
Index(['domain', 'C1', 'C1_values', 'C1_cntbl', 'C1_sense', 'C1_detailed',
       'C2', 'C2_values', 'C2_cntbl', 'C2_sense', 'C2_detailed', 'E',
       'E_values', 'E_cntbl', 'E_sense', 'E_detailed', 'cntbl_cond', 'task',
       'query_node', 'observation', 'query', 'graph', 'prompt',
       'prompt_category', 'id'],
      dtype='object')
['a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k']
['ppp' 'pmm' 'mpm' 'mmp']
['economy' 'sociology' 'weather']


## Double-checking with Bob's sampled prompts:

In [24]:
# # filter for the following


# # Example DataFrame (replace this with your actual dataset)
# data = {
#     "domain": ["economy", "sociology", "weather", "economy", "society"],
#     "graph": ["collider", "non-collider", "collider", "non-collider", "collider"],
#     "cotbl_cond": ["mmp", "pmm", "ppp", "ppp", "mmp"],
# }
# df = pd.DataFrame(data)

# List of file names
file_names = [
    "economy.CC.mmp.txt",
    "economy.CC.mpm.txt",
    "economy.CC.ppp.txt",
    "economy.CE.ppp.txt",
    "sociology.CC.mmp.txt",
    "sociology.CC.pmm.txt",
    "sociology.CE.mmp.txt",
    "sociology.CE.mpm.txt",
    "sociology.CE.pmm.txt",
    "weather.CC.ppp.txt",
    "weather.CE.mmp.txt",
    "weather.CE.ppp.txt",
]

# extract domain, graph and counterbalance condition from each file name
# (CC = common cause, i.e. fork; CE = common effect, i.e. collider)
subset_conditions = []
for file in file_names:
    parts = file.split(".")
    domain = parts[0]
    graph = "collider" if parts[1] == "CE" else "fork"
    cntbl_cond = parts[2]

    subset_conditions.append((domain, graph, cntbl_cond))

# Convert conditions into a DataFrame for merging
bob_test_conditions_df = pd.DataFrame(
    subset_conditions, columns=["domain", "graph", "cntbl_cond"]
)

# # Subset the original DataFrame
# subset_df = rw_17_complete_df.merge(
#     bob_test_conditions_df, on=["domain", "graph", "cntbl_cond"], how="inner"
# )
bob_test_conditions_df

Unnamed: 0,domain,graph,cntbl_cond
0,economy,fork,mmp
1,economy,fork,mpm
2,economy,fork,ppp
3,economy,collider,ppp
4,sociology,fork,mmp
5,sociology,fork,pmm
6,sociology,collider,mmp
7,sociology,collider,mpm
8,sociology,collider,pmm
9,weather,fork,ppp


In [25]:
# Filter the rw_17_over_complete_df for rows that match those in bob_test_conditions_df
filtered_df = rw_17_over_complete_df.merge(
    bob_test_conditions_df, on=["domain", "graph", "cntbl_cond"], how="inner"
)[["domain", "graph", "cntbl_cond", "prompt", "query"]]

# note: only the last expression of a cell is displayed automatically,
# so the bare expression below produces no visible output
filtered_df["cntbl_cond"].unique()
print(rw_17_over_complete_df["cntbl_cond"].unique())

['ppp' 'ppm' 'pmp' 'pmm' 'mpp' 'mpm' 'mmp' 'mmm']


In [26]:
# uncomment below for full display of the dataframe to verify condition-dependent prompt verbalizations.

# # display all rows and columns
# pd.set_option("display.max_rows", None)
# pd.set_option("display.max_columns",None)
# # display the entire prompt column
# pd.set_option("display.max_colwidth", None)

# filtered_df

#### End of double-checking with Bob's data
Result:
- The causal mechanism text is identical for both the collider and the fork.
- This code does not generate the explanation that follows each edge.

In [27]:
# # display the entire dataframe cell content
# pd.set_option("display.max_colwidth", None)
# pd.set_option("display.max_rows", None)

# subset_df[["domain", "graph", "cntbl_cond", "prompt"]]

## Merging with Human Data (from RW17, Collider graph, Experiment 1)

### Step 1: Loading Human Data and Making Sense of it

There are a number of columns that we don't need or that we will rename in the following:
- Note the mapping of the column `label` (a-k) to the conditional probabilities that define the inference tasks in the dictionary `inference_tasks_rw17`.
- We will rename `label` to `task` (after dropping the original `task` column below).
- The column `ppp` refers to the counterbalance condition, named `cntbl_cond` in our dataframe.

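### Step 2: Merging Human Data with Prompts


#### The most important thing is ...
- knowing the *key that lets us map each human data row to its prompt(s)*.
- This key is the combination of domain, counterbalance condition (originally `ppp`), task (a-k, renamed from `label` to `task`), and graph.
    - Note that graph is not strictly necessary here, but including it doesn't hurt in case we ever handle human data that contains recordings from more than one graph topology.
- The key is not unique on the prompt side: a task with two symmetric queries (e.g. task a: p(C1=1|E=1, C2=1) and p(C2=1|E=1, C1=1)) has two prompt rows, so each human response for that task is matched to both prompts. In the merged output below, for example, ids 1 and 2 both receive subject 0's response of 82.5.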
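The toy sketch below illustrates the merge on this composite key, including the case where a task has two symmetric query prompts. All data are hypothetical. Because the key can repeat on both sides (two queries per task, several subjects per task), the merge is many-to-many; an outer merge with `indicator=True` is one way to check that no prompt or human row was left without a partner.

```python
import pandas as pd

keys = ["domain", "cntbl_cond", "task", "graph"]

# Hypothetical toy prompts: task "a" has two symmetric queries (ids 1 and 2)
prompts = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "domain": ["economy"] * 3,
        "cntbl_cond": ["ppp"] * 3,
        "task": ["a", "a", "i"],
        "graph": ["collider"] * 3,
        "query": ["p(C1=1|E=1, C2=1)", "p(C2=1|E=1, C1=1)", "p(E=1|C1=0, C2=0)"],
    }
)
# Hypothetical toy human data: one response per subject and task
humans = pd.DataFrame(
    {
        "human_subj_id": [0, 25, 0],
        "domain": ["economy"] * 3,
        "cntbl_cond": ["ppp"] * 3,
        "task": ["a", "a", "i"],
        "graph": ["collider"] * 3,
        "response": [82.5, 72.5, 10.0],
    }
)

# every prompt receives every human response recorded for its task
merged = pd.merge(prompts, humans, on=keys, how="inner")
print(len(merged))
# 5

# an outer merge with indicator=True flags rows that found no partner
check = pd.merge(prompts, humans, on=keys, how="outer", indicator=True)
print((check["_merge"] != "both").sum())
# 0
```

Here ids 1 and 2 each get both subjects' responses and id 3 gets one, so the merged table has 2 + 2 + 1 = 5 rows.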

In [28]:
# load the human data (semicolon-separated)
human_data = pd.read_csv("../data/17_rw/human_data/rw16_ce.csv", sep=";")
# print the columns
print(human_data.columns)
print(human_data["label"].unique())

Index(['s', 'domain', 'model', 'diag', 'ppp', 'task', 'type', 'wt', 'y',
       'y_length', 'label', 'y.hat', 'y.hat.pre', 'diff', 'y.hat.prescaled'],
      dtype='object')
['a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k']


In [29]:
## clean up the human data
# harmonize the domain label with our dataframe ("society" -> "sociology")
human_data["domain"] = human_data["domain"].replace({"society": "sociology"})
# drop the original task column (replaced by `label` below) and the columns we don't need
human_data = human_data.drop(
    columns=[
        "task",
        "y.hat",
        "y.hat.pre",
        "diff",
        "y.hat.prescaled",
        "wt",
        "type",
        "diag",
        "model",
    ]
)
# rename the remaining columns to match the prompt dataframe
human_data = human_data.rename(
    columns={
        "ppp": "cntbl_cond",  # counterbalance condition
        "s": "human_subj_id",  # subject id in RW17
        "label": "task",  # task label a-k
        "y": "response",
        "y_length": "num_responses_agg",
    }
)
# mark every row as a human response from the collider experiment
human_data["subject"] = "human"
human_data["graph"] = "collider"


In [30]:
# save the cleaned human data
human_data.to_csv("../datasets/17_rw/human_data/human_cleaned_coll.csv", index=False)
print(human_data.columns)

Index(['human_subj_id', 'domain', 'cntbl_cond', 'response',
       'num_responses_agg', 'task', 'subject', 'graph'],
      dtype='object')


In [31]:
# merge the RW17 prompts with the human data on domain, task, counterbalance condition and graph
merged_df = pd.merge(
    rw_17_complete_df, human_data, on=["domain", "task", "cntbl_cond", "graph"]
)
# responses are stored as strings with a decimal comma; convert them to floats
merged_df["response"] = merged_df["response"].str.replace(",", ".").astype(float)
# print the columns
print(merged_df.columns)

Index(['domain', 'C1', 'C1_values', 'C1_cntbl', 'C1_sense', 'C1_detailed',
       'C2', 'C2_values', 'C2_cntbl', 'C2_sense', 'C2_detailed', 'E',
       'E_values', 'E_cntbl', 'E_sense', 'E_detailed', 'cntbl_cond', 'task',
       'query_node', 'observation', 'query', 'graph', 'prompt',
       'prompt_category', 'id', 'human_subj_id', 'response',
       'num_responses_agg', 'subject'],
      dtype='object')


### Let's have a look at the merged dataframe, which now reveals which prompt corresponds to each human response

- Since we have 4 human responses for each of the 240 unique prompts (3 domains $\times$ 4 counterbalance conditions $\times$ 20 queries derived from the 11 tasks a-k), we should expect the merged dataframe to have $240\cdot 4 = 960$ rows.
- Let's double-check that too.

In [32]:
print(
    f"the original rw_17_complete_df has {len(rw_17_complete_df)} rows and the merged_df has {len(merged_df)} rows."
)

merged_df

the original rw_17_complete_df has 240 rows and the merged_df has 960 rows.


Unnamed: 0,domain,C1,C1_values,C1_cntbl,C1_sense,C1_detailed,C2,C2_values,C2_cntbl,C2_sense,...,observation,query,graph,prompt,prompt_category,id,human_subj_id,response,num_responses_agg,subject
0,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,"E=1, C2=1","p(C1=1|E=1, C2=1)",collider,Economists seek to describe and predict the re...,numeric-certainty,1,0,82.5,2,human
1,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,"E=1, C2=1","p(C1=1|E=1, C2=1)",collider,Economists seek to describe and predict the re...,numeric-certainty,1,25,72.5,2,human
2,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,"E=1, C2=1","p(C1=1|E=1, C2=1)",collider,Economists seek to describe and predict the re...,numeric-certainty,1,58,65.0,2,human
3,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,"E=1, C2=1","p(C1=1|E=1, C2=1)",collider,Economists seek to describe and predict the re...,numeric-certainty,1,69,50.0,2,human
4,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,"E=1, C1=1","p(C2=1|E=1, C1=1)",collider,Economists seek to describe and predict the re...,numeric-certainty,2,0,82.5,2,human
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
955,weather,ozone levels,1,m,low,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,m,high,...,"C1=1, C2=1","p(E=1|C1=1, C2=1)",collider,Meteorologists seek to describe and predict th...,numeric-certainty,460,33,100.0,1,human
956,weather,ozone levels,1,m,low,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,m,high,...,"C1=1, C2=1","p(E=1|C1=1, C2=1)",collider,Meteorologists seek to describe and predict th...,numeric-certainty,460,57,100.0,1,human
957,weather,ozone levels,1,m,low,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,m,high,...,"C1=1, C2=1","p(E=1|C1=1, C2=1)",collider,Meteorologists seek to describe and predict th...,numeric-certainty,460,72,90.0,1,human
958,weather,ozone levels,1,m,low,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,m,high,...,"C1=1, C2=1","p(E=1|C1=1, C2=1)",collider,Meteorologists seek to describe and predict th...,numeric-certainty,460,77,75.0,1,human


In [33]:
# show all columns and the full cell content (e.g. the complete prompt)
pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", None)
# return all rows for the prompt with id = 275
merged_df[merged_df["id"] == 275]


Unnamed: 0,domain,C1,C1_values,C1_cntbl,C1_sense,C1_detailed,C2,C2_values,C2_cntbl,C2_sense,C2_detailed,E,E_values,E_cntbl,E_sense,E_detailed,cntbl_cond,task,query_node,observation,query,graph,prompt,prompt_category,id,human_subj_id,response,num_responses_agg,subject
548,sociology,urbanization,1,m,low,"Urbanization is the degree to which the members of a society live in urban environments (i.e., cities) versus rural environments.",interest in religion,1,p,low,Interest in religion is the degree to which the members of a society show a curiosity in religion issues or participate in organized religions.,socio-economic mobility,1,m,low,Socioeconomic mobility is the degree to which the members of a society are able to improve their social and economic status.,mpm,h,C1=1,"E=0, C2=0","p(C1=1|E=0, C2=0)",collider,"Sociologists seek to describe and predict the regular patterns of societal interactions. To do this, they study some important variables or attributes of societies. They also study how these attributes are responsible for producing or causing one another.low urbanization causes low socio-economic mobility. Also, low interest in religion causes low socio-economic mobility. You are currently observing: normal socio-economic mobility, normal interest in religion. Your task is to estimate how likely it is that low urbanization are present on a scale from 0 to 100, given the observations and causal relationships described. 0 means completely unlikely and 100 means completely certain. Return your response as raw text in one single line using this exact XML format: <response><likelihood>YOUR_NUMERIC_RESPONSE_HERE</likelihood><confidence>YOUR_CONFIDENCE_SCORE_HERE</confidence></response> Replace YOUR_NUMERIC_RESPONSE_HERE with your likelihood estimate between 0 (very unlikely) and 100 (very likely). Replace YOUR_CONFIDENCE_SCORE_HERE with a number between 0 (very uncertain) and 100 (very certain), indicating how confident you are in your likelihood estimate. DO NOT include any other information, explanation, or formatting in your response. DO NOT use Markdown, code blocks, quotation marks, or special characters.",numeric-certainty,275,71,25.0,2,human
549,sociology,urbanization,1,m,low,"Urbanization is the degree to which the members of a society live in urban environments (i.e., cities) versus rural environments.",interest in religion,1,p,low,Interest in religion is the degree to which the members of a society show a curiosity in religion issues or participate in organized religions.,socio-economic mobility,1,m,low,Socioeconomic mobility is the degree to which the members of a society are able to improve their social and economic status.,mpm,h,C1=1,"E=0, C2=0","p(C1=1|E=0, C2=0)",collider,"Sociologists seek to describe and predict the regular patterns of societal interactions. To do this, they study some important variables or attributes of societies. They also study how these attributes are responsible for producing or causing one another.low urbanization causes low socio-economic mobility. Also, low interest in religion causes low socio-economic mobility. You are currently observing: normal socio-economic mobility, normal interest in religion. Your task is to estimate how likely it is that low urbanization are present on a scale from 0 to 100, given the observations and causal relationships described. 0 means completely unlikely and 100 means completely certain. Return your response as raw text in one single line using this exact XML format: <response><likelihood>YOUR_NUMERIC_RESPONSE_HERE</likelihood><confidence>YOUR_CONFIDENCE_SCORE_HERE</confidence></response> Replace YOUR_NUMERIC_RESPONSE_HERE with your likelihood estimate between 0 (very unlikely) and 100 (very likely). Replace YOUR_CONFIDENCE_SCORE_HERE with a number between 0 (very uncertain) and 100 (very certain), indicating how confident you are in your likelihood estimate. DO NOT include any other information, explanation, or formatting in your response. DO NOT use Markdown, code blocks, quotation marks, or special characters.",numeric-certainty,275,99,25.0,2,human


## Saving DataFrames to CSV

### To prompt the LLMs, we only need the id and the prompt.
- When interacting with the LLMs through their websites, we should group the prompts by counterbalance condition so as not to confuse the LLMs, since different counterbalance conditions describe contradicting causal relationships.
- However, when interacting with them through their respective APIs, requests are stateless by default, meaning the model retains nothing from previous requests.
    - This is why we don't have to worry about different counterbalance conditions contradicting each other and hence confusing the LLMs.
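For the website route, prompts can be batched per counterbalance condition with a simple `groupby`. The sketch below uses a hypothetical toy table with the same column names as `LLM_prompting_df` and builds one list of `{"id", "prompt"}` records per condition, e.g. one chat session each.

```python
import pandas as pd

# Hypothetical toy prompt table (same column names as LLM_prompting_df)
prompts = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "prompt": ["prompt 1", "prompt 2", "prompt 3", "prompt 4"],
        "cntbl_cond": ["ppp", "pmm", "ppp", "pmm"],
    }
)

# one batch per counterbalance condition (groupby sorts the condition labels)
batches = {
    cond: group[["id", "prompt"]].to_dict("records")
    for cond, group in prompts.groupby("cntbl_cond")
}
for cond, rows in batches.items():
    print(cond, [int(row["id"]) for row in rows])
# pmm [2, 4]
# ppp [1, 3]
```

With a stateless API, no batching is needed: each row's prompt can simply be sent as its own request.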


In [34]:
# save merged data
# merged_df.to_csv("../datasets/17_rw/merged_data/humans_prompts_coll.csv", index=False)
# print(merged_df.columns)

# drop text verbalization columns from merged df
merged_df = merged_df.drop(
    columns=["C1_detailed", "C2_detailed", "E_detailed", "prompt"]
)
# save merged data
merged_df.to_csv(
    "../../../results/17_rw/humans/humans_responses_w_prompt_id_coll.csv",
    index=False,
    sep=";",
)
print(merged_df.columns)


Index(['domain', 'C1', 'C1_values', 'C1_cntbl', 'C1_sense', 'C2', 'C2_values',
       'C2_cntbl', 'C2_sense', 'E', 'E_values', 'E_cntbl', 'E_sense',
       'cntbl_cond', 'task', 'query_node', 'observation', 'query', 'graph',
       'prompt_category', 'id', 'human_subj_id', 'response',
       'num_responses_agg', 'subject'],
      dtype='object')


In [35]:
# # save for prompting LLMs
# LLM_prompting_df = rw_17_complete_df[
#     ["id", "prompt", "prompt_category", "graph", "domain", "cntbl_cond", "task"]
# ]
# prompt_category = LLM_prompting_df["prompt_category"].unique()[0]
# graph_type = LLM_prompting_df["graph"].unique()[0]
# print(prompt_category, graph_type)
# LLM_prompting_df.to_csv(
#     f"../datasets/17_rw/prompts_for_LLM_api/{prompt_category}_LLM_prompting_{graph_type}.csv",
#     index=False,
# )

# print("Prompts for LLMs saved successfully!")
# print("prompt data file has the following columns:")
# print(LLM_prompting_df.columns)


version = "4"
# save for prompting LLMs
LLM_prompting_df = rw_17_complete_df[
    ["id", "prompt", "prompt_category", "graph", "domain", "cntbl_cond", "task"]
]
prompt_category = LLM_prompting_df["prompt_category"].unique()[0]
graph_type = LLM_prompting_df["graph"].unique()[0]
print(prompt_category, graph_type)
LLM_prompting_df.to_csv(
    f"../datasets/17_rw/prompts_for_LLM_api/{version}_v_{prompt_category}_LLM_prompting_{graph_type}.csv",
    index=False,
)

print(
    f"Prompts for LLMs for version {version}, graph = {graph_type}, and prompt type = {prompt_category} saved successfully!"
)
print("prompt data file has the following columns:")
print(LLM_prompting_df.columns)

numeric-certainty collider
Prompts for LLMs for version 4, graph = collider, and prompt type = numeric-certainty saved successfully!
prompt data file has the following columns:
Index(['id', 'prompt', 'prompt_category', 'graph', 'domain', 'cntbl_cond',
       'task'],
      dtype='object')
