# Recreating the RW17 Dataset from Scratch

This notebook first explains the data structure used in Rehder and Waldmann (2017), hereafter RW17,
and then demonstrates how to **construct the RW17 domain components** using helper functions
from `dataset_creation` and **convert them into a structured DataFrame**.

We will:
1. **Create a domain dictionary** using `create_domain_dict`
2. **Expand it into a DataFrame** using `expand_domain_to_dataframe`
3. **Add inference tasks** to extend the dataset
4. **Generate verbalized prompts** for human evaluation


This step will:

- Load and examine `rw_17_domain_components` (the predefined dataset components).
- Break down its structure to understand:
  - Domain dictionary (specification of causal variables).
  - Graph structure (causal relationships).
  - Inference tasks (reasoning scenarios).
- Explain how these elements combine to generate prompts for LLMs.


In [1]:
import os
import sys
import pprint
import pandas as pd

# Ensure Python finds the `src` directory
sys.path.append(os.path.abspath("../../src"))

# Import everything defined in `__all__`
from causalalign.dataset_creation import (
    rw_17_domain_components,
    graph_structures,
    inference_tasks_rw17,
    generate_prompt_dataframe,
    expand_domain_to_dataframe,
    expand_df_by_task_queries,
    create_domain_dict,
    verbalize_domain_intro,
    verbalize_causal_mechanism,
    verbalize_inference_task,
    append_dfs,
)

print("Dataset creation module imported successfully!")


Dataset creation module imported successfully!


# 1. Understanding the RW17 Dataset Structure

Before generating new datasets, we need to understand the structure of **RW17 domain components**.

## 🔹 What Is RW17?
RW17 presented humans with causal inference tasks and asked for their likelihood judgements. Each causal inference task was presented on four successive screens.
In the following, I describe how I translated the experimental materials used by RW17 into nested dictionaries, which serve as the backbone for algorithmically generating the RW17 materials in *textual form*, so that we can prompt LLMs and compare their causal judgements. This notebook explains how to algorithmically re-create the textual form used by RW17 and how to easily create new prompts.

RW17 used 3 different knowledge domains, in particular economy, sociology, and weather, in which the inference tasks were thematically embedded.
Each domain specifies:
- **Variables (`C1`, `C2`, `E`)**: The causal variables / graph nodes, e.g., C1: interest rates
- **Variable Sense depending on binary values 0 or 1 for (`C1`, `C2`, `E`)**: e.g. for C1=1: *high* interest rates
- **Counterbalance-dependent Sense Assignments (`p/m`)**: How we represent conditions in counterbalanced ways (optional, but used in RW17). Essentially, this flips the senses of what it means for the variable to be on (1) or off (0).

The verbalization of the prompt depends on the domain and:
- **Causal Mechanisms**: How the variables influence each other (e.g., specified by a collider, fork, or chain graph)
- **Inference Tasks**: The reasoning problems we ask an LLM to solve

By combining these components, we **generate structured natural language prompts** that can be used for causal reasoning tasks with LLMs.


##  Understanding the RW17 Domain Dictionary

Each domain in `rw_17_domain_components` contains:
1. **Domain Name & Introduction**:  
   - Explains the overall knowledge structure / cover story the inference task is embedded in.
   
2. **Causal Variables (`C1`, `C2`, `E`)**:  
   - `C1` and `C2` are **causes**, and `E` is the **effect**.
   - Each variable has:
     - A **name** and **detailed description**.
     - `p_value` and `m_value`: Counterbalanced values.
     - **Explanation mappings**: How specific conditions lead to outcomes (optional, and graph-dependent).

3. **Example: Economy Domain**
   - `C1`: **Interest Rates** (low vs. high)
   - `C2`: **Trade Deficit** (small vs. large)
   - `E`: **Retirement Savings** (high vs. low)


#### Make sure you understand the building blocks of the dictionary. 

In ``src/causalalign/dataset_creation/constants.py``, there are dictionaries that define the domain building blocks, the causal mechanisms for 3 different graph topologies (collider, fork, and chain), and the inference tasks. Before re-creating the prompts used in RW17, let's first load them and get a feeling for the prompt structure:




In [2]:
# load domain components
rw_17 = rw_17_domain_components


# list the domains in the dataset
print("Domains in the dataset:")
print(rw_17.keys())


# Pretty-print RW17 dataset structure

pprint.pprint(rw_17)

Domains in the dataset:
dict_keys(['economy', 'sociology', 'weather'])
{'economy': {'domain_name': 'economy',
             'introduction': 'Economists seek to describe and predict the '
                             'regular patterns of economic fluctuation. To do '
                             'this, they study some important variables or '
                             'attributes of economies. They also study how '
                             'these attributes are responsible for producing '
                             'or causing one another.',
             'variables': {'C': {'C_detailed': 'Retirement savings is the '
                                               'money people save for their '
                                               'retirement.',
                                 'C_name': 'retirement savings',
                                 'm_value': {'0': 'normal', '1': 'low'},
                                 'p_value': {'0': 'normal', '1': 'high'}},
                

##  How Graph Structures Define Causal Mechanisms

A **graph structure** specifies how causal variables (`C1`, `C2`, `E`) relate to each other.

### Example Graph Structures:
1. **Collider** (`C1 → E ← C2`)
   - `C1` and `C2` both cause `E`.

2. **Fork** (`C1 ← E → C2`)
   - `E` causes both `C1` and `C2`.

3. **Chain** (`C1 → C2 → E`)
   - `C1` causes `C2`, which then affects `E`.

Let's look at the graph structures dictionary that is pre-defined in ``src/causalalign/dataset_creation/constants.py``:


In [3]:
# Pretty-print available graph structures
pprint.pprint(graph_structures)


{'chain': {'causal_template': '{x_sense} {x_name} causes {y_sense} {y_name}. '
                              'And {y_sense} {y_name} causes {z_sense} '
                              '{z_name}.',
           'description': 'A→B→C'},
 'collider': {'causal_template': '{x_sense} {x_name} causes {z_sense} '
                                 '{z_name}. Also, {y_sense} {y_name} causes '
                                 '{z_sense} {z_name}.',
              'description': 'X→Z←Y'},
 'fork': {'causal_template': '{x_sense} {x_name} causes {y_sense} {y_name}. '
                             'Also, {x_sense} {x_name} causes {z_sense} '
                             '{z_name}.',
          'description': 'B←A→C'}}
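
Each `causal_template` is an ordinary Python format string, so a causal mechanism can be verbalized with `str.format`. The following is a minimal sketch and not the package's `verbalize_causal_mechanism` implementation: the collider template is copied from the output above, while the sense and name values are filled in by hand from the economy domain (all variables in their `p` sense) purely for illustration.

```python
# Collider template copied from the `graph_structures` output above.
collider_template = (
    "{x_sense} {x_name} causes {z_sense} {z_name}. "
    "Also, {y_sense} {y_name} causes {z_sense} {z_name}."
)

# Hand-picked values (assumption): economy domain, every variable in its "p" sense.
fields = {
    "x_sense": "low", "x_name": "interest rates",
    "y_sense": "small", "y_name": "trade deficits",
    "z_sense": "high", "z_name": "retirement savings",
}

sentence = collider_template.format(**fields)
# Upper-case only the first character; str.capitalize() would lower-case the rest.
sentence = sentence[0].upper() + sentence[1:]
print(sentence)
```

Because the same placeholder names (`x_*`, `y_*`, `z_*`) appear in all three templates, swapping in the chain or fork template changes the stated mechanism without touching the domain values.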


##  How Inference Tasks Define the Final Prompt

Inference tasks specify **what the LLM needs to predict** given certain observations.

For example:
- `"a": {"query_node": "Ci", "observation": "Cj=1", "query": "Ci=?"}`
  - **Ask:** Given that `Cj=1`, what is the likely value of `Ci`?
  - This corresponds to **a causal reasoning question**.

Inference tasks work together with **domain dictionaries** and **graph structures** to create **verbalized prompts**.

### 🔗 How It All Connects:
1. **Domain Dictionary** → Specifies **variables and values**.
2. **Graph Structure** → Defines **causal relationships**.
3. **Inference Tasks** → Frame **the reasoning problem**.
4. **Prompt Verbalization** → Converts this into **natural language for LLMs**.
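
The task dictionary that ships with the package (printed in the next cell) names its nodes with index placeholders such as `Xi` and `Yj`, whereas the expanded dataframes further below contain concrete nodes (`X`, `Y`). One way such an expansion could work is sketched here. `expand_placeholders` is a hypothetical helper written for this illustration; it is not the implementation of `expand_df_by_task_queries`, which may, for instance, handle duplicate rows differently.

```python
def expand_placeholders(task):
    """Return one concrete task per assignment of (Xi, Yj) to the nodes X and Y."""
    expanded = []
    for xi, yj in (("X", "Y"), ("Y", "X")):
        concrete = {
            key: text.replace("Xi", xi).replace("Yj", yj)
            for key, text in task.items()
        }
        expanded.append(concrete)
    return expanded


# Task "a" as it appears in `inference_tasks_rw17`.
task_a = {
    "observation": "Z=1, Yj=1",
    "query": "p(Xi=1|Z=1, Yj=1)",
    "query_node": "Xi=1",
}

concrete_tasks = expand_placeholders(task_a)
for concrete in concrete_tasks:
    print(concrete["query_node"], "|", concrete["observation"], "|", concrete["query"])
```

For task `a`, this yields the two queries `p(X=1|Z=1, Y=1)` and `p(Y=1|Z=1, X=1)`, which matches the first two rows of the expanded economy dataframe shown later in this notebook.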



In [4]:
pprint.pprint(inference_tasks_rw17)

{'a': {'observation': 'Z=1, Yj=1',
       'query': 'p(Xi=1|Z=1, Yj=1)',
       'query_node': 'Xi=1'},
 'b': {'observation': 'Z=1', 'query': 'p(Xi=1|Z=1)', 'query_node': 'Xi=1'},
 'c': {'observation': 'Z=1, Yj=0',
       'query': 'p(Xi=1|Z=1, Yj=0)',
       'query_node': 'Xi=1'},
 'd': {'observation': 'Yj=1', 'query': 'p(Xi=1|Yj=1)', 'query_node': 'Xi=1'},
 'e': {'observation': 'Yj=0', 'query': 'p(Xi=1|Yj=0)', 'query_node': 'Xi=1'},
 'f': {'observation': 'Z=0, Yj=1',
       'query': 'p(Xi=1|Z=0, Yj=1)',
       'query_node': 'Xi=1'},
 'g': {'observation': 'Z=0', 'query': 'p(Xi=1|Z=0)', 'query_node': 'Xi=1'},
 'h': {'observation': 'Z=0, Yj=0',
       'query': 'p(Xi=1|Z=0, Yj=0)',
       'query_node': 'Xi=1'},
 'i': {'observation': 'Xi=0, Yj=0',
       'query': 'p(Z=1|Xi=0, Yj=0)',
       'query_node': 'Z=1'},
 'j': {'observation': 'Xi=0, Yj=1',
       'query': 'p(Z=1|Xi=0, Yj=1)',
       'query_node': 'Z=1'},
 'k': {'observation': 'Xi=1, Yj=1',
       'query': 'p(Z=1|Xi=1, Yj=1)',
       

Now, let's explore how these components are combined by re-creating the prompts used in RW17 starting with re-creating the dictionaries that are stored in ``constants.py``!


## Step 1: Create the Domain Dictionary


In [None]:
# Example usage with enforcement of "normal" for zero values
try:
    economy_test_dict = create_domain_dict(
        domain="economy",
        introduction="Economists seek to describe and predict the regular patterns of economic fluctuation. To do this, they study some important variables or attributes of economies. They also study how these variables are responsible for producing or causing one another.",
        X_name="interest rates",
        X_detailed="Interest rates are the rates banks charge to loan money.",
        X_range_intro="Some economies have <tag> interest rates. Others have <tag> interest rates.",
        X_values={"1": "low", "0": "high"},
        Y_name="trade deficits",
        Y_detailed="A country's trade deficit...",
        Y_values={"1": "small", "0": "large"},
        Z_name="retirement savings",
        Z_detailed="Retirement savings is the money people save for their retirement.",
        Z_values={"1": "high", "0": "low"},
        counterbalance_enabled=True,
        enforce_zero_label=True,  # Ensures '0' is verbalized as 'normal'
        zero_label="normal",
    )

    # Convert dictionary to a readable JSON format for display
    import json

    formatted_output = json.dumps(economy_test_dict, indent=4)
    print(formatted_output)

except Exception as e:
    print(f"Error: {e}")

{
    "domain_name": "economy",
    "introduction": "Economists seek to describe and predict economic fluctuations...",
    "variables": {
        "X": {
            "X_name": "interest rates",
            "X_detailed": "Interest rates are the rates banks charge to loan money.",
            "p_value": {
                "1": "low",
                "0": "normal"
            },
            "m_value": {
                "1": "high",
                "0": "normal"
            }
        },
        "Y": {
            "Y_name": "trade deficits",
            "Y_detailed": "A country's trade deficit...",
            "p_value": {
                "1": "small",
                "0": "normal"
            },
            "m_value": {
                "1": "large",
                "0": "normal"
            }
        },
        "Z": {
            "Z_name": "retirement savings",
            "Z_detailed": "Retirement savings is the money people save for their retirement.",
            "p_value": {
           

In [6]:
# Create individual domains
economy_domain_dict = create_domain_dict(
    domain="economy",
    introduction="Economists seek to describe and predict the regular patterns of economic fluctuation. To do this, they study some important variables or attributes of economies. They also study how these attributes are responsible for producing or causing one another.",
    X_name="interest rates",
    X_detailed="Interest rates are the rates banks charge to loan money.",
    X_values={"1": "low", "0": "high"},
    Y_name="trade deficits",
    Y_detailed="A country's trade deficit is the difference between the value of the goods that a country imports and the value of the goods that a country exports.",
    Y_values={"1": "small", "0": "large"},
    Z_name="retirement savings",
    Z_detailed="Retirement savings is the money people save for their retirement.",
    Z_values={"1": "high", "0": "low"},
    counterbalance_enabled=True,
    enforce_zero_label=True,  # Ensures '0' is verbalized as 'normal'
    zero_label="normal",
)

sociology_domain_dict = create_domain_dict(
    domain="sociology",
    introduction="Sociologists seek to describe and predict the regular patterns of societal interactions. To do this, they study some important variables or attributes of societies. They also study how these attributes are responsible for producing or causing one another.",
    X_name="urbanization",
    X_detailed="Urbanization is the degree to which the members of a society live in urban environments (i.e., cities) versus rural environments.",
    X_values={"1": "high", "0": "low"},
    Y_name="interest in religion",
    Y_detailed="Interest in religion is the degree to which the members of a society show a curiosity in religion issues or participate in organized religions.",
    Y_values={"1": "low", "0": "high"},
    Z_name="socio-economic mobility",
    Z_detailed="Socioeconomic mobility is the degree to which the members of a society are able to improve their social and economic status.",
    Z_values={"1": "high", "0": "low"},
    counterbalance_enabled=True,
    enforce_zero_label=True,  # Ensures '0' is verbalized as 'normal'
    zero_label="normal",
)

weather_domain_dict = create_domain_dict(
    domain="weather",
    introduction="Meteorologists seek to describe and predict the regular patterns that govern weather systems. To do this, they study some important variables or attributes of weather systems. They also study how these attributes are responsible for producing or causing one another.",
    X_name="ozone levels",
    X_detailed="Ozone is a gaseous allotrope of oxygen (O3) and is formed by exposure to UV radiation.",
    X_values={"1": "high", "0": "low"},
    Y_name="air pressure",
    Y_detailed="Air pressure is force exerted due to concentrations of air molecules.",
    Y_values={"1": "low", "0": "high"},
    Z_name="humidity",
    Z_detailed="Humidity is the degree to which the atmosphere contains water molecules.",
    Z_values={"1": "high", "0": "low"},
    counterbalance_enabled=True,
    enforce_zero_label=True,  # Ensures '0' is verbalized as 'normal'
    zero_label="normal",
)
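
Judging from the dictionaries printed in this notebook, each `*_values` argument ends up as a `p_value` and an `m_value` mapping: the `m` mapping flips the on-sense, and with `enforce_zero_label=True` the off-state is verbalized as `zero_label` in both. The sketch below reproduces only that observable behavior. `counterbalanced_values` is a hypothetical helper, not the code inside `create_domain_dict`, and the branch without enforcement is an assumption.

```python
def counterbalanced_values(values, enforce_zero_label=True, zero_label="normal"):
    """Hypothetical helper: derive p/m sense mappings from a {"1": ..., "0": ...} dict."""
    p_value = {"1": values["1"], "0": values["0"]}
    # The "m" condition flips the senses of on (1) and off (0).
    m_value = {"1": values["0"], "0": values["1"]}
    if enforce_zero_label:
        # With enforcement, the off-state always gets the same neutral label.
        p_value["0"] = zero_label
        m_value["0"] = zero_label
    return {"p_value": p_value, "m_value": m_value}


# Same values as passed for interest rates (X) in the economy domain.
result = counterbalanced_values({"1": "low", "0": "high"})
print(result)
```

For interest rates this gives `low`/`normal` under `p` and `high`/`normal` under `m`, the same senses that appear for `X` in the printed `economy_domain_dict`.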


This should re-create our rw17 dictionary. Let's verify this.

In [7]:
pprint.pprint(economy_domain_dict)

{'domain_name': 'economy',
 'introduction': 'Economists seek to describe and predict the regular patterns '
                 'of economic fluctuation. To do this, they study some '
                 'important variables or attributes of economies. They also '
                 'study how these attributes are responsible for producing or '
                 'causing one another.',
 'variables': {'X': {'X_detailed': 'Interest rates are the rates banks charge '
                                   'to loan money.',
                     'X_name': 'interest rates',
                     'm_value': {'0': 'normal', '1': 'high'},
                     'p_value': {'0': 'normal', '1': 'low'}},
               'Y': {'Y_detailed': "A country's trade deficit is the "
                                   'difference between the value of the goods '
                                   'that a country imports and the value of '
                                   'the goods that a country exports.',
             

## Step 2: Expand the Dictionaries into Prompt Components in a DataFrame

In [8]:
# create a dataframe for each domain and then append them together
economy_df = expand_domain_to_dataframe(
    economy_domain_dict,
)
sociology_df = expand_domain_to_dataframe(
    sociology_domain_dict,
)
weather_df = expand_domain_to_dataframe(
    weather_domain_dict,
)
sociology_df

Unnamed: 0,domain,X,X_values,X_cntbl,X_sense,X_detailed,Y,Y_values,Y_cntbl,Y_sense,Y_detailed,Z,Z_values,Z_cntbl,Z_sense,Z_detailed,cntbl_cond
0,sociology,urbanization,1,p,high,Urbanization is the degree to which the member...,interest in religion,1,p,low,Interest in religion is the degree to which th...,socio-economic mobility,1,p,high,Socioeconomic mobility is the degree to which ...,ppp
1,sociology,urbanization,1,p,high,Urbanization is the degree to which the member...,interest in religion,1,p,low,Interest in religion is the degree to which th...,socio-economic mobility,1,m,high,Socioeconomic mobility is the degree to which ...,ppm
2,sociology,urbanization,1,p,high,Urbanization is the degree to which the member...,interest in religion,1,m,high,Interest in religion is the degree to which th...,socio-economic mobility,1,p,high,Socioeconomic mobility is the degree to which ...,pmp
3,sociology,urbanization,1,p,high,Urbanization is the degree to which the member...,interest in religion,1,m,high,Interest in religion is the degree to which th...,socio-economic mobility,1,m,high,Socioeconomic mobility is the degree to which ...,pmm
4,sociology,urbanization,1,m,low,Urbanization is the degree to which the member...,interest in religion,1,p,low,Interest in religion is the degree to which th...,socio-economic mobility,1,p,high,Socioeconomic mobility is the degree to which ...,mpp
5,sociology,urbanization,1,m,low,Urbanization is the degree to which the member...,interest in religion,1,p,low,Interest in religion is the degree to which th...,socio-economic mobility,1,m,high,Socioeconomic mobility is the degree to which ...,mpm
6,sociology,urbanization,1,m,low,Urbanization is the degree to which the member...,interest in religion,1,m,high,Interest in religion is the degree to which th...,socio-economic mobility,1,p,high,Socioeconomic mobility is the degree to which ...,mmp
7,sociology,urbanization,1,m,low,Urbanization is the degree to which the member...,interest in religion,1,m,high,Interest in religion is the degree to which th...,socio-economic mobility,1,m,high,Socioeconomic mobility is the degree to which ...,mmm


### Let's look at the dataframe:

In [9]:
print(
    f"Each domain dataframe now has {len(weather_df)} rows  \n and the following columns: \n {weather_df.columns}."
)
print(f"Unique counterbalance conditions: {weather_df['cntbl_cond'].unique()}")
weather_df

Each domain dataframe now has 8 rows  
 and the following columns: 
 Index(['domain', 'X', 'X_values', 'X_cntbl', 'X_sense', 'X_detailed', 'Y',
       'Y_values', 'Y_cntbl', 'Y_sense', 'Y_detailed', 'Z', 'Z_values',
       'Z_cntbl', 'Z_sense', 'Z_detailed', 'cntbl_cond'],
      dtype='object').
Unique counterbalance conditions: ['ppp' 'ppm' 'pmp' 'pmm' 'mpp' 'mpm' 'mmp' 'mmm']


Unnamed: 0,domain,X,X_values,X_cntbl,X_sense,X_detailed,Y,Y_values,Y_cntbl,Y_sense,Y_detailed,Z,Z_values,Z_cntbl,Z_sense,Z_detailed,cntbl_cond
0,weather,ozone levels,1,p,high,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,p,low,Air pressure is force exerted due to concentra...,humidity,1,p,high,Humidity is the degree to which the atmosphere...,ppp
1,weather,ozone levels,1,p,high,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,p,low,Air pressure is force exerted due to concentra...,humidity,1,m,high,Humidity is the degree to which the atmosphere...,ppm
2,weather,ozone levels,1,p,high,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,m,high,Air pressure is force exerted due to concentra...,humidity,1,p,high,Humidity is the degree to which the atmosphere...,pmp
3,weather,ozone levels,1,p,high,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,m,high,Air pressure is force exerted due to concentra...,humidity,1,m,high,Humidity is the degree to which the atmosphere...,pmm
4,weather,ozone levels,1,m,low,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,p,low,Air pressure is force exerted due to concentra...,humidity,1,p,high,Humidity is the degree to which the atmosphere...,mpp
5,weather,ozone levels,1,m,low,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,p,low,Air pressure is force exerted due to concentra...,humidity,1,m,high,Humidity is the degree to which the atmosphere...,mpm
6,weather,ozone levels,1,m,low,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,m,high,Air pressure is force exerted due to concentra...,humidity,1,p,high,Humidity is the degree to which the atmosphere...,mmp
7,weather,ozone levels,1,m,low,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,m,high,Air pressure is force exerted due to concentra...,humidity,1,m,high,Humidity is the degree to which the atmosphere...,mmm
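
The eight values of `cntbl_cond` are simply all combinations of `p` and `m` across the three variables, i.e., $2^3 = 8$. As a small sanity check that is independent of the package, they can be enumerated with `itertools.product`:

```python
from itertools import product

# One letter per variable (X, Y, Z), each either "p" or "m".
conditions = ["".join(combo) for combo in product("pm", repeat=3)]
print(conditions)
print(len(conditions))
```

The resulting order (`ppp`, `ppm`, `pmp`, ...) is the same as the row order in the dataframes above.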


In [10]:
economy_df

Unnamed: 0,domain,X,X_values,X_cntbl,X_sense,X_detailed,Y,Y_values,Y_cntbl,Y_sense,Y_detailed,Z,Z_values,Z_cntbl,Z_sense,Z_detailed,cntbl_cond
0,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,A country's trade deficit is the difference be...,retirement savings,1,p,high,Retirement savings is the money people save fo...,ppp
1,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,A country's trade deficit is the difference be...,retirement savings,1,m,large,Retirement savings is the money people save fo...,ppm
2,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,m,large,A country's trade deficit is the difference be...,retirement savings,1,p,high,Retirement savings is the money people save fo...,pmp
3,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,m,large,A country's trade deficit is the difference be...,retirement savings,1,m,large,Retirement savings is the money people save fo...,pmm
4,economy,interest rates,1,m,high,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,A country's trade deficit is the difference be...,retirement savings,1,p,high,Retirement savings is the money people save fo...,mpp
5,economy,interest rates,1,m,high,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,A country's trade deficit is the difference be...,retirement savings,1,m,large,Retirement savings is the money people save fo...,mpm
6,economy,interest rates,1,m,high,Interest rates are the rates banks charge to l...,trade deficits,1,m,large,A country's trade deficit is the difference be...,retirement savings,1,p,high,Retirement savings is the money people save fo...,mmp
7,economy,interest rates,1,m,high,Interest rates are the rates banks charge to l...,trade deficits,1,m,large,A country's trade deficit is the difference be...,retirement savings,1,m,large,Retirement savings is the money people save fo...,mmm



### How to derive the off-sense of a variable:

Note that `expand_df_by_task_queries()` returns the building blocks needed to verbalize the entire prompt for each of the 8 possible counterbalance conditions and each of the 20 inference tasks. Hence, it returns a dataframe with num_cntbl_conds * num_tasks rows, i.e., here $8\cdot 20 = 160$ rows. Importantly, the value stored for each variable node C1, C2, and E is always *1*; the variable senses (high, low, small, etc.) for *0* are *not* contained in the dataframe. We have to look up the off-values (0) in the nested dictionary created by `create_domain_dict()`. The function `verbalize_inference_task()` takes care of this when we verbalize an inference task that contains an off-value.

Below, you can verify for yourself that `verbalize_inference_task` correctly looks up off-values in the nested dictionary returned by `create_domain_dict()`. We first create the "building blocks" dataframe for the inference task verbalization with `expand_df_by_task_queries`.

In [11]:
# create building blocks for inference task verbalization

econ_df_tasks_ex = expand_df_by_task_queries(economy_df, inference_tasks_rw17)
print(f"unique counterbalance values: {econ_df_tasks_ex['cntbl_cond'].unique()}")
econ_df_tasks_ex

unique counterbalance values: ['ppp' 'ppm' 'pmp' 'pmm' 'mpp' 'mpm' 'mmp' 'mmm']


Unnamed: 0,domain,X,X_values,X_cntbl,X_sense,X_detailed,Y,Y_values,Y_cntbl,Y_sense,...,Z,Z_values,Z_cntbl,Z_sense,Z_detailed,cntbl_cond,task,query_node,observation,query
0,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,retirement savings,1,p,high,Retirement savings is the money people save fo...,ppp,a,X=1,"Z=1, Y=1","p(X=1|Z=1, Y=1)"
0,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,retirement savings,1,p,high,Retirement savings is the money people save fo...,ppp,a,Y=1,"Z=1, X=1","p(Y=1|Z=1, X=1)"
0,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,retirement savings,1,p,high,Retirement savings is the money people save fo...,ppp,b,X=1,Z=1,p(X=1|Z=1)
0,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,retirement savings,1,p,high,Retirement savings is the money people save fo...,ppp,b,Y=1,Z=1,p(Y=1|Z=1)
0,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,retirement savings,1,p,high,Retirement savings is the money people save fo...,ppp,c,X=1,"Z=1, Y=0","p(X=1|Z=1, Y=0)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7,economy,interest rates,1,m,high,Interest rates are the rates banks charge to l...,trade deficits,1,m,large,...,retirement savings,1,m,large,Retirement savings is the money people save fo...,mmm,h,Y=1,"Z=0, X=0","p(Y=1|Z=0, X=0)"
7,economy,interest rates,1,m,high,Interest rates are the rates banks charge to l...,trade deficits,1,m,large,...,retirement savings,1,m,large,Retirement savings is the money people save fo...,mmm,i,Z=1,"X=0, Y=0","p(Z=1|X=0, Y=0)"
7,economy,interest rates,1,m,high,Interest rates are the rates banks charge to l...,trade deficits,1,m,large,...,retirement savings,1,m,large,Retirement savings is the money people save fo...,mmm,j,Z=1,"X=0, Y=1","p(Z=1|X=0, Y=1)"
7,economy,interest rates,1,m,high,Interest rates are the rates banks charge to l...,trade deficits,1,m,large,...,retirement savings,1,m,large,Retirement savings is the money people save fo...,mmm,j,Z=1,"Y=0, X=1","p(Z=1|Y=0, X=1)"


In [12]:
weather_df_tasks_ex = expand_df_by_task_queries(weather_df, inference_tasks_rw17)
print(f"unique counterbalance values: {weather_df_tasks_ex['cntbl_cond'].unique()}")
weather_df_tasks_ex

unique counterbalance values: ['ppp' 'ppm' 'pmp' 'pmm' 'mpp' 'mpm' 'mmp' 'mmm']


Unnamed: 0,domain,X,X_values,X_cntbl,X_sense,X_detailed,Y,Y_values,Y_cntbl,Y_sense,...,Z,Z_values,Z_cntbl,Z_sense,Z_detailed,cntbl_cond,task,query_node,observation,query
0,weather,ozone levels,1,p,high,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,p,low,...,humidity,1,p,high,Humidity is the degree to which the atmosphere...,ppp,a,X=1,"Z=1, Y=1","p(X=1|Z=1, Y=1)"
0,weather,ozone levels,1,p,high,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,p,low,...,humidity,1,p,high,Humidity is the degree to which the atmosphere...,ppp,a,Y=1,"Z=1, X=1","p(Y=1|Z=1, X=1)"
0,weather,ozone levels,1,p,high,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,p,low,...,humidity,1,p,high,Humidity is the degree to which the atmosphere...,ppp,b,X=1,Z=1,p(X=1|Z=1)
0,weather,ozone levels,1,p,high,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,p,low,...,humidity,1,p,high,Humidity is the degree to which the atmosphere...,ppp,b,Y=1,Z=1,p(Y=1|Z=1)
0,weather,ozone levels,1,p,high,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,p,low,...,humidity,1,p,high,Humidity is the degree to which the atmosphere...,ppp,c,X=1,"Z=1, Y=0","p(X=1|Z=1, Y=0)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7,weather,ozone levels,1,m,low,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,m,high,...,humidity,1,m,high,Humidity is the degree to which the atmosphere...,mmm,h,Y=1,"Z=0, X=0","p(Y=1|Z=0, X=0)"
7,weather,ozone levels,1,m,low,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,m,high,...,humidity,1,m,high,Humidity is the degree to which the atmosphere...,mmm,i,Z=1,"X=0, Y=0","p(Z=1|X=0, Y=0)"
7,weather,ozone levels,1,m,low,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,m,high,...,humidity,1,m,high,Humidity is the degree to which the atmosphere...,mmm,j,Z=1,"X=0, Y=1","p(Z=1|X=0, Y=1)"
7,weather,ozone levels,1,m,low,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,m,high,...,humidity,1,m,high,Humidity is the degree to which the atmosphere...,mmm,j,Z=1,"Y=0, X=1","p(Z=1|Y=0, X=1)"


In [13]:
sociology_df_tasks_ex = expand_df_by_task_queries(sociology_df, inference_tasks_rw17)
print(f"unique counterbalance values: {sociology_df_tasks_ex['cntbl_cond'].unique()}")
sociology_df_tasks_ex

unique counterbalance values: ['ppp' 'ppm' 'pmp' 'pmm' 'mpp' 'mpm' 'mmp' 'mmm']


Unnamed: 0,domain,X,X_values,X_cntbl,X_sense,X_detailed,Y,Y_values,Y_cntbl,Y_sense,...,Z,Z_values,Z_cntbl,Z_sense,Z_detailed,cntbl_cond,task,query_node,observation,query
0,sociology,urbanization,1,p,high,Urbanization is the degree to which the member...,interest in religion,1,p,low,...,socio-economic mobility,1,p,high,Socioeconomic mobility is the degree to which ...,ppp,a,X=1,"Z=1, Y=1","p(X=1|Z=1, Y=1)"
0,sociology,urbanization,1,p,high,Urbanization is the degree to which the member...,interest in religion,1,p,low,...,socio-economic mobility,1,p,high,Socioeconomic mobility is the degree to which ...,ppp,a,Y=1,"Z=1, X=1","p(Y=1|Z=1, X=1)"
0,sociology,urbanization,1,p,high,Urbanization is the degree to which the member...,interest in religion,1,p,low,...,socio-economic mobility,1,p,high,Socioeconomic mobility is the degree to which ...,ppp,b,X=1,Z=1,p(X=1|Z=1)
0,sociology,urbanization,1,p,high,Urbanization is the degree to which the member...,interest in religion,1,p,low,...,socio-economic mobility,1,p,high,Socioeconomic mobility is the degree to which ...,ppp,b,Y=1,Z=1,p(Y=1|Z=1)
0,sociology,urbanization,1,p,high,Urbanization is the degree to which the member...,interest in religion,1,p,low,...,socio-economic mobility,1,p,high,Socioeconomic mobility is the degree to which ...,ppp,c,X=1,"Z=1, Y=0","p(X=1|Z=1, Y=0)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7,sociology,urbanization,1,m,low,Urbanization is the degree to which the member...,interest in religion,1,m,high,...,socio-economic mobility,1,m,high,Socioeconomic mobility is the degree to which ...,mmm,h,Y=1,"Z=0, X=0","p(Y=1|Z=0, X=0)"
7,sociology,urbanization,1,m,low,Urbanization is the degree to which the member...,interest in religion,1,m,high,...,socio-economic mobility,1,m,high,Socioeconomic mobility is the degree to which ...,mmm,i,Z=1,"X=0, Y=0","p(Z=1|X=0, Y=0)"
7,sociology,urbanization,1,m,low,Urbanization is the degree to which the member...,interest in religion,1,m,high,...,socio-economic mobility,1,m,high,Socioeconomic mobility is the degree to which ...,mmm,j,Z=1,"X=0, Y=1","p(Z=1|X=0, Y=1)"
7,sociology,urbanization,1,m,low,Urbanization is the degree to which the member...,interest in religion,1,m,high,...,socio-economic mobility,1,m,high,Socioeconomic mobility is the degree to which ...,mmm,j,Z=1,"Y=0, X=1","p(Z=1|Y=0, X=1)"


Next, we verbalize a row from the above dataframe `econ_df_tasks_ex` whose observation column contains "Z=0, X=0" (i.e., E=0, C1=0). You can verify for yourself that the function `verbalize_inference_task` correctly looks up off-values in the dictionary `economy_domain_dict`.
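
The lookup itself can be illustrated without calling the library. The sketch below uses an excerpt of the nested dictionary copied from the printed `economy_domain_dict`; `lookup_sense` is a hypothetical helper for this illustration and makes no claim about how `verbalize_inference_task` is implemented or what arguments it takes.

```python
# Excerpt copied from the printed `economy_domain_dict` (variable X only).
economy_excerpt = {
    "variables": {
        "X": {
            "X_name": "interest rates",
            "p_value": {"0": "normal", "1": "low"},
            "m_value": {"0": "normal", "1": "high"},
        }
    }
}


def lookup_sense(domain_dict, node, cntbl, value):
    """Hypothetical helper: verbalize a node state, e.g. X=0 -> 'normal interest rates'."""
    variable = domain_dict["variables"][node]
    sense = variable[cntbl + "_value"][str(value)]
    name = variable[node + "_name"]
    return sense + " " + name


off_text = lookup_sense(economy_excerpt, "X", "m", 0)  # off-value: only in the dictionary
on_text = lookup_sense(economy_excerpt, "X", "m", 1)   # on-value: also stored in X_sense
print(off_text)
print(on_text)
```

The on-value sense is what the dataframe already carries in `X_sense`; the off-value sense exists only in the dictionary, which is why the dictionary has to be available at verbalization time.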



## Custom Response prompt instructions:


Please provide your response in the following strict XML format without any additional text or explanation:
```
<response>
    <likelihood>YOUR_NUMERIC_RESPONSE_HERE</likelihood>
    <confidence>YOUR_CONFIDENCE_SCORE_HERE</confidence>
</response>

```

Only replace YOUR_NUMERIC_RESPONSE_HERE with a number between 0 (unlikely) and 100 (very likely), and YOUR_CONFIDENCE_SCORE_HERE with a number between 0 (very uncertain) and 100 (very certain). Do not include any explanations or text outside the XML format.
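
Replies in this format can be parsed programmatically. The following is a minimal sketch, not part of `causalalign`: the function name `parse_likelihood_response` is hypothetical, and it assumes the reply contains exactly one `<response>` element, possibly surrounded by extra text.

```python
import re
import xml.etree.ElementTree as ET


def parse_likelihood_response(raw: str) -> dict:
    """Extract likelihood (and confidence, if present) from an XML reply.

    Hypothetical helper for illustration; not part of the dataset_creation module.
    """
    # Keep only the <response>...</response> span in case extra text surrounds it.
    match = re.search(r"<response>.*?</response>", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no <response> element found")
    root = ET.fromstring(match.group(0))

    parsed = {}
    for tag in ("likelihood", "confidence"):
        node = root.find(tag)
        if node is not None and node.text is not None:
            value = float(node.text.strip())
            # Both scales run from 0 to 100 in the instructions above.
            if not 0 <= value <= 100:
                raise ValueError(f"{tag} out of range: {value}")
            parsed[tag] = value

    if "likelihood" not in parsed:
        raise ValueError("no <likelihood> value found")
    return parsed


example = "<response><likelihood>75</likelihood><confidence>60</confidence></response>"
print(parse_likelihood_response(example))
```

The same function also handles the numeric-only variant, since a missing `<confidence>` element is simply skipped.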


In [14]:
xml_format_numeric_only = (
    "<response><likelihood>YOUR_NUMERIC_RESPONSE_HERE</likelihood></response>"
)
xml_explanation_numeric_only = (
    "Replace YOUR_NUMERIC_RESPONSE_HERE with your likelihood estimate between 0 (very unlikely) and 100 (very likely). "
    # "Replace YOUR_CONFIDENCE_SCORE_HERE with a number between 0 (very uncertain) and 100 (very certain), indicating how confident you are in your likelihood estimate. "
    "DO NOT include any other information, explanation, or formatting in your response. "
    "DO NOT use Markdown, code blocks, quotation marks, or special characters."
    # "Only return the XML as raw text in a single line."
)
prompt_type_xml_numeric_only = (
    "Return your response as raw text in one single line using this exact XML format: "
    + xml_format_numeric_only
    + " "
    # + "DO NOT use Markdown, code blocks, or any additional formatting."
    # "Return your response in a single line, without line breaks within the following XML format. Do NOT use Markdown, code blocks, or any additional formatting. DO NOT add quotation marks around the response. DO NOT provide any additional information. Output must be in the exact XML format below, inline "
    # "Please provide your response in the following strict XML format. Provide your response in a single line. DO NOT provide any additional information outside the  formatting like line breaks and ensure the XML is returned as raw text without any code markers e.g., ```xml or quotation marks: "
    # + xml_format_numeric_only
    + " "
    + xml_explanation_numeric_only
)


##### Next, we'll call `generate_prompt_dataframe()` for each domain dictionary. 
We'll start with one domain dictionary and after that, we'll loop over the remaining domain dictionaries
and append the resulting dataframes.

In [15]:
# generate the complete prompt dataframe for the economy domain;
# the dataframes of the other domains are appended to it below
econ_complete_df = generate_prompt_dataframe(
    domain_dict=economy_domain_dict,
    inference_tasks=inference_tasks_rw17,
    graph_type="collider",
    graph_structures=graph_structures,
    counterbalance_enabled=True,
    prompt_category="numeric-only",
    prompt_type=prompt_type_xml_numeric_only,
)

# list of the remaining 2 domain dictionaries
domain_dicts = [sociology_domain_dict, weather_domain_dict]


# start from the completed economy dataframe and append the other domain dataframes to it
rw_17_over_complete_df = (
    econ_complete_df.copy()  # "over-complete" means it has all 8 possible counterbalance conditions;
    # we'll subset later for the 4 counterbalance conditions that were used in RW17
)


##### now loop over remaining domain dicts and append them to the one created above

In [16]:
for domain_dict in domain_dicts:  # avoid shadowing the built-in `dict`
    # for domain_dict in rw_17_domain_components.values():
    df = generate_prompt_dataframe(
        domain_dict=domain_dict,
        inference_tasks=inference_tasks_rw17,
        graph_type="collider",
        graph_structures=graph_structures,
        counterbalance_enabled=True,
        # NOTE: this label differs from the "numeric-only" label used for the
        # economy rows above, although the same prompt template is passed
        prompt_category="numeric-certainty",
        prompt_type=prompt_type_xml_numeric_only,
    )
    # append the new dataframe to the complete dataframe
    rw_17_over_complete_df = append_dfs(rw_17_over_complete_df, df)

### Subsetting for the counterbalance conditions used in RW17
- and adding a unique id.

In [17]:
# add a unique id to each row
rw_17_over_complete_df["id"] = range(1, len(rw_17_over_complete_df) + 1)

# subset for the counterbalance conditions used in the RW17 study
select_contbl_cond_xs = ["ppp", "pmm", "mmp", "mpm"]
print(
    f"unique counterbalance conditions used: {rw_17_over_complete_df['cntbl_cond'].unique()}"
)

unique counterbalance conditions used: ['ppp' 'ppm' 'pmp' 'pmm' 'mpp' 'mpm' 'mmp' 'mmm']


In [18]:
rw_17_complete_df = rw_17_over_complete_df[
    rw_17_over_complete_df["cntbl_cond"].isin(select_contbl_cond_xs)
]

In [19]:
rw_17_over_complete_df[rw_17_over_complete_df["id"] == 275]

Unnamed: 0,domain,X,X_values,X_cntbl,X_sense,X_detailed,Y,Y_values,Y_cntbl,Y_sense,...,Z_detailed,cntbl_cond,task,query_node,observation,query,graph,prompt,prompt_category,id
274,sociology,urbanization,1,m,low,Urbanization is the degree to which the member...,interest in religion,1,p,low,...,Socioeconomic mobility is the degree to which ...,mpm,h,X=1,"Z=0, Y=0","p(X=1|Z=0, Y=0)",collider,Sociologists seek to describe and predict the ...,numeric-certainty,275


In [20]:
print(len(rw_17_complete_df))
print(rw_17_complete_df.columns)
print(rw_17_complete_df["task"].unique())
print(rw_17_complete_df["cntbl_cond"].unique())
print(rw_17_complete_df["domain"].unique())


240
Index(['domain', 'X', 'X_values', 'X_cntbl', 'X_sense', 'X_detailed', 'Y',
       'Y_values', 'Y_cntbl', 'Y_sense', 'Y_detailed', 'Z', 'Z_values',
       'Z_cntbl', 'Z_sense', 'Z_detailed', 'cntbl_cond', 'task', 'query_node',
       'observation', 'query', 'graph', 'prompt', 'prompt_category', 'id'],
      dtype='object')
['a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k']
['ppp' 'pmm' 'mpm' 'mmp']
['economy' 'sociology' 'weather']


In [21]:
# # display the entire dataframe cell content
# pd.set_option("display.max_colwidth", None)
# pd.set_option("display.max_rows", None)

# subset_df[["domain", "graph", "cntbl_cond", "prompt"]]

In [22]:
# load the human data
human_data = pd.read_csv("../data/17_rw/human_data/rw16_ce.csv", sep=";")
# print the columns
print(human_data.columns)
print(human_data["label"].unique())

Index(['s', 'domain', 'model', 'diag', 'ppp', 'task', 'type', 'wt', 'y',
       'y_length', 'label', 'y.hat', 'y.hat.pre', 'diff', 'y.hat.prescaled'],
      dtype='object')
['a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k']


In [23]:
# harmonize the domain label with the prompt dataframe
human_data["domain"] = human_data["domain"].replace({"society": "sociology"})
# drop the original task column (the label column takes its place below)
# together with the model-related columns that are not needed here
human_data = human_data.drop(
    columns=[
        "task",
        "y.hat",
        "y.hat.pre",
        "diff",
        "y.hat.prescaled",
        "wt",
        "type",
        "diag",
        "model",
    ]
)
# rename the remaining columns to match the prompt dataframe
human_data = human_data.rename(
    columns={
        "ppp": "cntbl_cond",
        "s": "human_subj_id",  # subject id in rw17
        "label": "task",
        "y": "response",
        "y_length": "num_responses_agg",
    }
)
# add a column subject that contains "human" everywhere
human_data["subject"] = "human"
# add a column graph that contains "collider" everywhere
human_data["graph"] = "collider"


### Save cleaned, raw  human Data 

In [24]:
# save human data
human_data.to_csv("../datasets/17_rw/human_data/human_cleaned_coll.csv", index=False)
print(human_data.columns)

Index(['human_subj_id', 'domain', 'cntbl_cond', 'response',
       'num_responses_agg', 'task', 'subject', 'graph'],
      dtype='object')


In [25]:
# merge the prompt dataframe (rw_17_complete_df) with the human data
# on the columns: domain, task, cntbl_cond, graph
merged_df = pd.merge(
    rw_17_complete_df, human_data, on=["domain", "task", "cntbl_cond", "graph"]
)
# responses are stored as strings with a decimal comma; convert them to floats
merged_df["response"] = merged_df["response"].str.replace(",", ".").astype(float)
# print the columns
print(merged_df.columns)

Index(['domain', 'X', 'X_values', 'X_cntbl', 'X_sense', 'X_detailed', 'Y',
       'Y_values', 'Y_cntbl', 'Y_sense', 'Y_detailed', 'Z', 'Z_values',
       'Z_cntbl', 'Z_sense', 'Z_detailed', 'cntbl_cond', 'task', 'query_node',
       'observation', 'query', 'graph', 'prompt', 'prompt_category', 'id',
       'human_subj_id', 'response', 'num_responses_agg', 'subject'],
      dtype='object')


### Let's have a look at the merged dataframe that now reveals which prompt corresponds to each human response

- Since we have 4 subjects for each of the 240 unique prompts emerging from the 3 domains, 4 counterbalance conditions and 20 tasks, we should expect the merged data frame to have $240\cdot 4 = 960$ rows. 
- Let's double check that too.
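
The row-count argument can be illustrated on toy data. This is only a sketch: the frames `prompts` and `humans` and all values in them are made up for illustration, and the real merge additionally uses the `graph` column as a key.

```python
import pandas as pd

# Toy stand-ins: 2 prompts sharing one merge key, 4 human rows for that key.
prompts = pd.DataFrame(
    {
        "id": [1, 2],
        "domain": ["economy", "economy"],
        "cntbl_cond": ["ppp", "ppp"],
        "task": ["j", "j"],
    }
)
humans = pd.DataFrame(
    {
        "domain": ["economy"] * 4,
        "cntbl_cond": ["ppp"] * 4,
        "task": ["j"] * 4,
        "human_subj_id": [0, 25, 58, 69],
        "response": [80.0, 70.0, 65.0, 50.0],
    }
)

merged = pd.merge(prompts, humans, on=["domain", "task", "cntbl_cond"])

# Each prompt is paired with every human row that shares its key,
# so the merged length is (number of prompts) x (human rows per key).
rows_per_prompt = merged.groupby("id").size().to_dict()
print(len(merged))       # 8
print(rows_per_prompt)   # {1: 4, 2: 4}
```

With 240 prompts and 4 matching human rows per prompt, the same reasoning gives the 960 rows expected above.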

In [26]:
print(
    f"the original rw_17_complete_df has {len(rw_17_complete_df)} rows and the merged_df has {len(merged_df)} rows."
)

merged_df

the original rw_17_complete_df has 240 rows and the merged_df has 960 rows.


Unnamed: 0,domain,X,X_values,X_cntbl,X_sense,X_detailed,Y,Y_values,Y_cntbl,Y_sense,...,observation,query,graph,prompt,prompt_category,id,human_subj_id,response,num_responses_agg,subject
0,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,"Z=1, Y=1","p(X=1|Z=1, Y=1)",collider,Economists seek to describe and predict the re...,numeric-only,1,0,82.5,2,human
1,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,"Z=1, Y=1","p(X=1|Z=1, Y=1)",collider,Economists seek to describe and predict the re...,numeric-only,1,25,72.5,2,human
2,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,"Z=1, Y=1","p(X=1|Z=1, Y=1)",collider,Economists seek to describe and predict the re...,numeric-only,1,58,65.0,2,human
3,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,"Z=1, Y=1","p(X=1|Z=1, Y=1)",collider,Economists seek to describe and predict the re...,numeric-only,1,69,50.0,2,human
4,economy,interest rates,1,p,low,Interest rates are the rates banks charge to l...,trade deficits,1,p,small,...,"Z=1, X=1","p(Y=1|Z=1, X=1)",collider,Economists seek to describe and predict the re...,numeric-only,2,0,82.5,2,human
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
955,weather,ozone levels,1,m,low,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,m,high,...,"X=1, Y=1","p(Z=1|X=1, Y=1)",collider,Meteorologists seek to describe and predict th...,numeric-certainty,460,33,100.0,1,human
956,weather,ozone levels,1,m,low,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,m,high,...,"X=1, Y=1","p(Z=1|X=1, Y=1)",collider,Meteorologists seek to describe and predict th...,numeric-certainty,460,57,100.0,1,human
957,weather,ozone levels,1,m,low,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,m,high,...,"X=1, Y=1","p(Z=1|X=1, Y=1)",collider,Meteorologists seek to describe and predict th...,numeric-certainty,460,72,90.0,1,human
958,weather,ozone levels,1,m,low,Ozone is a gaseous allotrope of oxygen (O3) an...,air pressure,1,m,high,...,"X=1, Y=1","p(Z=1|X=1, Y=1)",collider,Meteorologists seek to describe and predict th...,numeric-certainty,460,77,75.0,1,human


In [27]:
# show all columns and the full content of every cell (e.g. the entire prompt)
# pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", None)

prompt_category = merged_df["prompt_category"].unique()[0]
print(f"prompt category: {prompt_category}")

graph_type = merged_df["graph"].unique()[0]
print(f"graph type: {graph_type}")
# # return the entire row for id = 275
merged_df[merged_df["id"] == 275]

prompt category: numeric-only
graph type: collider


Unnamed: 0,domain,X,X_values,X_cntbl,X_sense,X_detailed,Y,Y_values,Y_cntbl,Y_sense,Y_detailed,Z,Z_values,Z_cntbl,Z_sense,Z_detailed,cntbl_cond,task,query_node,observation,query,graph,prompt,prompt_category,id,human_subj_id,response,num_responses_agg,subject
548,sociology,urbanization,1,m,low,"Urbanization is the degree to which the members of a society live in urban environments (i.e., cities) versus rural environments.",interest in religion,1,p,low,Interest in religion is the degree to which the members of a society show a curiosity in religion issues or participate in organized religions.,socio-economic mobility,1,m,high,Socioeconomic mobility is the degree to which the members of a society are able to improve their social and economic status.,mpm,h,X=1,"Z=0, Y=0","p(X=1|Z=0, Y=0)",collider,"Sociologists seek to describe and predict the regular patterns of societal interactions. To do this, they study some important variables or attributes of societies. They also study how these attributes are responsible for producing or causing one another.low urbanization causes high socio-economic mobility. Also, low interest in religion causes high socio-economic mobility. You are currently observing: normal socio-economic mobility, normal interest in religion. Your task is to estimate how likely it is that low urbanization are present on a scale from 0 to 100, given the observations and causal relationships described. 0 means completely unlikely and 100 means completely certain. Return your response as raw text in one single line using this exact XML format: <response><likelihood>YOUR_NUMERIC_RESPONSE_HERE</likelihood></response> Replace YOUR_NUMERIC_RESPONSE_HERE with your likelihood estimate between 0 (very unlikely) and 100 (very likely). DO NOT include any other information, explanation, or formatting in your response. DO NOT use Markdown, code blocks, quotation marks, or special characters.",numeric-certainty,275,71,25.0,2,human
549,sociology,urbanization,1,m,low,"Urbanization is the degree to which the members of a society live in urban environments (i.e., cities) versus rural environments.",interest in religion,1,p,low,Interest in religion is the degree to which the members of a society show a curiosity in religion issues or participate in organized religions.,socio-economic mobility,1,m,high,Socioeconomic mobility is the degree to which the members of a society are able to improve their social and economic status.,mpm,h,X=1,"Z=0, Y=0","p(X=1|Z=0, Y=0)",collider,"Sociologists seek to describe and predict the regular patterns of societal interactions. To do this, they study some important variables or attributes of societies. They also study how these attributes are responsible for producing or causing one another.low urbanization causes high socio-economic mobility. Also, low interest in religion causes high socio-economic mobility. You are currently observing: normal socio-economic mobility, normal interest in religion. Your task is to estimate how likely it is that low urbanization are present on a scale from 0 to 100, given the observations and causal relationships described. 0 means completely unlikely and 100 means completely certain. Return your response as raw text in one single line using this exact XML format: <response><likelihood>YOUR_NUMERIC_RESPONSE_HERE</likelihood></response> Replace YOUR_NUMERIC_RESPONSE_HERE with your likelihood estimate between 0 (very unlikely) and 100 (very likely). DO NOT include any other information, explanation, or formatting in your response. DO NOT use Markdown, code blocks, quotation marks, or special characters.",numeric-certainty,275,99,25.0,2,human


## Saving Dataframes to csv

### To prompt the LLMs, we only need the id and prompt.
- To avoid confusing the LLMs, we should group the prompts by counterbalance condition when interacting with them through the website.
- However, when interacting with them through their respective APIs, the default is that they are stateless, meaning they don't retain anything from previous requests.
    - This is why we don't have to worry about the different counterbalance conditions contradicting each other and thereby confusing the LLMs.
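
A minimal sketch of what stateless prompting looks like in code, assuming a hypothetical callable `send_prompt` that wraps whichever API client is used (no real client or endpoint is implied). Each prompt is sent in its own call, and nothing from earlier rows is passed along.

```python
import pandas as pd


def collect_responses(prompt_df: pd.DataFrame, send_prompt) -> pd.DataFrame:
    """Send every prompt in its own request and collect the raw replies.

    `send_prompt` is a hypothetical callable (str -> str); it must not reuse
    any earlier conversation history.
    """
    records = []
    for row in prompt_df.itertuples(index=False):
        # One fresh call per prompt: no context from previous rows is included.
        records.append({"id": row.id, "raw_response": send_prompt(row.prompt)})
    return pd.DataFrame(records)


# Dummy client for illustration only; it always returns the same XML reply.
def fake_send_prompt(prompt: str) -> str:
    return "<response><likelihood>50</likelihood></response>"


toy_prompts = pd.DataFrame({"id": [1, 2], "prompt": ["first prompt", "second prompt"]})
responses = collect_responses(toy_prompts, fake_send_prompt)
print(responses)
```

Keeping the `id` next to each raw reply makes it possible to join the model responses back to the prompt dataframe later.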


In [28]:
# this has already been saved in the collider numeric-certainty notebook
# drop text verbalization columns from merged df
# merged_df = merged_df.drop(
#     columns=["C1_detailed", "C2_detailed", "E_detailed", "prompt"]
# )
# # save merged data
# merged_df.to_csv(
#     f"../../../results/17_rw/humans/humans_responses_w_prompt_id_{graph_type}_{prompt_type}.csv",
#     index=False,
#     sep=";",
# )
# print(merged_df.columns)


### Saving the Prompts for LLMs

Numeric-only version

In [29]:
version = "4"
# save for prompting LLMs
LLM_prompting_df = rw_17_complete_df[
    ["id", "prompt", "prompt_category", "graph", "domain", "cntbl_cond", "task"]
]
prompt_category = LLM_prompting_df["prompt_category"].unique()[0]
graph_type = LLM_prompting_df["graph"].unique()[0]
print(prompt_category, graph_type)
# NOTE: the save call is commented out, so running this cell does not write the file
# LLM_prompting_df.to_csv(
#     f"../datasets/17_rw/prompts_for_LLM_api/{version}_v_{prompt_category}_LLM_prompting_{graph_type}.csv",
#     index=False,
# )

print(
    f"Prompts for LLMs for version {version}, graph = {graph_type}, and prompt type = {prompt_category} saved successfully!"
)
print("prompt data file has the following columns:")
print(LLM_prompting_df.columns)

numeric-only collider
Prompts for LLMs for version 4, graph = collider, and prompt type = numeric-only saved successfully!
prompt data file has the following columns:
Index(['id', 'prompt', 'prompt_category', 'graph', 'domain', 'cntbl_cond',
       'task'],
      dtype='object')


### New Human Collider Data (non-aggregated)

In [30]:
human_data_new_ce = pd.read_csv(
    "../data/17_rw/human_data/rw17_collider_ce.csv", sep=";"
)
# rename column letter.type to label
human_data_new_ce.rename(columns={"letter.type": "label"}, inplace=True)
# turn label into lower case
human_data_new_ce["label"] = human_data_new_ce["label"].str.lower()
# rename attr.polarity to ppp
human_data_new_ce.rename(columns={"attr.polarity": "ppp"}, inplace=True)
# drop task
human_data_new_ce.drop(columns=["task"], inplace=True)
# print the columns
print(human_data_new_ce.columns)

print(human_data_new_ce["label"].unique())
human_data_new_ce.head()


Index(['s', 'domain', 'diagram', 'ppp', 'label', 'study.type', 'trial', 'rt',
       'y', 'type', 'aggr.type', 'betw.factors'],
      dtype='object')
['d' 'j' 'b' 'f' 'c' 'k' 'h' 'e' 'a' 'g' 'i']


Unnamed: 0,s,domain,diagram,ppp,label,study.type,trial,rt,y,type,aggr.type,betw.factors
0,0,economy,a,ppp,d,Y|X=1,0,38788,65,CB|CA=1,Ci|Cj=1,"domain, attr.polarity, diagram, letter.type"
1,0,economy,a,ppp,j,"Z|X=0,Y=1",1,16532,80,"E|CA=0,CB=1","E|Ci=0,Cj=1","domain, attr.polarity, diagram, letter.type"
2,0,economy,a,ppp,b,X|Z=1,2,10979,90,CA|E=1,Ci|E=1,"domain, attr.polarity, diagram, letter.type"
3,0,economy,a,ppp,f,"X|Y=1,Z=0",3,17956,55,"CA|E=0,CB=1","Ci|E=0,Cj=1","domain, attr.polarity, diagram, letter.type"
4,0,economy,a,ppp,c,"Y|X=0,Z=1",4,954,60,"CB|E=1,CA=0","Ci|E=1,Cj=0","domain, attr.polarity, diagram, letter.type"


### We need to get the human data into the following format (mostly column renaming):

(['human_subj_id', 'domain', 'cntbl_cond', 'response',
       'num_responses_agg', 'task', 'subject', 'graph'],
      dtype='object')
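
One way to track progress towards this format is to list the target columns that are still missing. The sketch below is illustrative only: `missing_columns` and the empty `toy` frame are hypothetical, and the column list is copied from the cleaned human data saved earlier.

```python
import pandas as pd

# Target columns, copied from the cleaned human data saved earlier.
TARGET_COLUMNS = [
    "human_subj_id", "domain", "cntbl_cond", "response",
    "num_responses_agg", "task", "subject", "graph",
]


def missing_columns(df: pd.DataFrame, required=TARGET_COLUMNS) -> list:
    """Return the required column names that `df` does not contain yet."""
    return [col for col in required if col not in df.columns]


# Empty toy frame with a few of the raw column names.
toy = pd.DataFrame(columns=["s", "domain", "ppp", "label", "y"])
renamed = toy.rename(
    columns={"s": "human_subj_id", "ppp": "cntbl_cond", "label": "task", "y": "response"}
)
print(missing_columns(renamed))  # ['num_responses_agg', 'subject', 'graph']
```

For the non-aggregated data, `num_responses_agg` is expected to stay missing, since each row is a single response.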

In [None]:
# rename the columns
human_data_new_ce.rename(
    columns={
        "y": "response",
        "label": "task",
        "s": "human_subj_id",
        "ppp": "cntbl_cond",
    },
    inplace=True,
)

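# and drop all other columns; .copy() makes the selection an independent dataframe,
# which avoids pandas' SettingWithCopyWarning when columns are assigned below

human_data_new_ce = human_data_new_ce[
    [
        "response",
        "task",
        "human_subj_id",
        "cntbl_cond",
        # "type",
        "domain",
    ]
].copy()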


# renaming and new columns
human_data_new_ce["domain"] = human_data_new_ce["domain"].replace(
    {"society": "sociology"}
)

human_data_new_ce["subject"] = "human"
# add a column graph that contains collider everywhere
human_data_new_ce["graph"] = "collider"


# turn response into floats

human_data_new_ce["response"] = (
    human_data_new_ce["response"].replace(",", ".", regex=True).astype(float)
)

In [None]:
print(f"unique domain values: {human_data_new_ce['domain'].unique()}")
print(f"unique cntbl_cond values: {human_data_new_ce['cntbl_cond'].unique()}")
print(f"unique task values: {human_data_new_ce['task'].unique()}")
# print(f"unique type values: {human_data_new_ce['type'].unique()}")
print(f" unique response values: {human_data_new_ce['response'].unique()}")
# preview the cleaned human data
human_data_new_ce.head()

unique domain values: ['economy' 'sociology' 'weather']
unique cntbl_cond values: ['ppp' 'pmm' 'mmp' 'mpm']
unique task values: ['d' 'j' 'b' 'f' 'c' 'k' 'h' 'e' 'a' 'g' 'i']
 unique response values: [ 65.  80.  90.  55.  60.  75.  85.  45.  50.  95.  40.  30.  20. 100.
  70.  15.   0.  10.  35.  25.   5.]


Unnamed: 0,response,task,human_subj_id,cntbl_cond,domain,subject,graph
0,65.0,d,0,ppp,economy,human,collider
1,80.0,j,0,ppp,economy,human,collider
2,90.0,b,0,ppp,economy,human,collider
3,55.0,f,0,ppp,economy,human,collider
4,60.0,c,0,ppp,economy,human,collider


In [33]:
print(f"NEW: human data shape: {human_data_new_ce.shape}")
print(f"Old human data shape: {human_data.shape}")

NEW: human data shape: (960, 7)
Old human data shape: (528, 8)


### Merge new human data (non-aggregated) with the prompts

In [None]:
# merge the prompt dataframe (rw_17_complete_df) with the non-aggregated human data
# on the columns: domain, task, cntbl_cond, graph
merged_df_new = pd.merge(
    rw_17_complete_df, human_data_new_ce, on=["domain", "task", "cntbl_cond", "graph"]
)
# merged_df_new["response"] = merged_df_new["response"].str.replace(",", ".").astype(float)
# print the columns
print(merged_df_new.columns)

Index(['domain', 'X', 'X_values', 'X_cntbl', 'X_sense', 'X_detailed', 'Y',
       'Y_values', 'Y_cntbl', 'Y_sense', 'Y_detailed', 'Z', 'Z_values',
       'Z_cntbl', 'Z_sense', 'Z_detailed', 'cntbl_cond', 'task', 'query_node',
       'observation', 'query', 'graph', 'prompt', 'prompt_category', 'id',
       'response', 'human_subj_id', 'subject'],
      dtype='object')


In [45]:
# drop duplicate rows; note that this also removes rows in which a subject
# gave the same response more than once for the same prompt
merged_df_new = merged_df_new.drop_duplicates()

In [46]:
merged_df_new[merged_df_new["id"] == 66]

Unnamed: 0,domain,X,X_values,X_cntbl,X_sense,X_detailed,Y,Y_values,Y_cntbl,Y_sense,Y_detailed,Z,Z_values,Z_cntbl,Z_sense,Z_detailed,cntbl_cond,task,query_node,observation,query,graph,prompt,prompt_category,id,response,human_subj_id,subject
202,economy,interest rates,1,p,low,Interest rates are the rates banks charge to loan money.,trade deficits,1,m,large,A country's trade deficit is the difference between the value of the goods that a country imports and the value of the goods that a country exports.,retirement savings,1,m,large,Retirement savings is the money people save for their retirement.,pmm,c,Y=1,"Z=1, X=0","p(Y=1|Z=1, X=0)",collider,"Economists seek to describe and predict the regular patterns of economic fluctuation. To do this, they study some important variables or attributes of economies. They also study how these attributes are responsible for producing or causing one another.low interest rates causes large retirement savings. Also, large trade deficits causes large retirement savings. You are currently observing: large retirement savings, normal interest rates. Your task is to estimate how likely it is that large trade deficits are present on a scale from 0 to 100, given the observations and causal relationships described. 0 means completely unlikely and 100 means completely certain. Return your response as raw text in one single line using this exact XML format: <response><likelihood>YOUR_NUMERIC_RESPONSE_HERE</likelihood></response> Replace YOUR_NUMERIC_RESPONSE_HERE with your likelihood estimate between 0 (very unlikely) and 100 (very likely). DO NOT include any other information, explanation, or formatting in your response. DO NOT use Markdown, code blocks, quotation marks, or special characters.",numeric-only,66,100.0,13,human
204,economy,interest rates,1,p,low,Interest rates are the rates banks charge to loan money.,trade deficits,1,m,large,A country's trade deficit is the difference between the value of the goods that a country imports and the value of the goods that a country exports.,retirement savings,1,m,large,Retirement savings is the money people save for their retirement.,pmm,c,Y=1,"Z=1, X=0","p(Y=1|Z=1, X=0)",collider,"Economists seek to describe and predict the regular patterns of economic fluctuation. To do this, they study some important variables or attributes of economies. They also study how these attributes are responsible for producing or causing one another.low interest rates causes large retirement savings. Also, large trade deficits causes large retirement savings. You are currently observing: large retirement savings, normal interest rates. Your task is to estimate how likely it is that large trade deficits are present on a scale from 0 to 100, given the observations and causal relationships described. 0 means completely unlikely and 100 means completely certain. Return your response as raw text in one single line using this exact XML format: <response><likelihood>YOUR_NUMERIC_RESPONSE_HERE</likelihood></response> Replace YOUR_NUMERIC_RESPONSE_HERE with your likelihood estimate between 0 (very unlikely) and 100 (very likely). DO NOT include any other information, explanation, or formatting in your response. DO NOT use Markdown, code blocks, quotation marks, or special characters.",numeric-only,66,90.0,40,human
205,economy,interest rates,1,p,low,Interest rates are the rates banks charge to loan money.,trade deficits,1,m,large,A country's trade deficit is the difference between the value of the goods that a country imports and the value of the goods that a country exports.,retirement savings,1,m,large,Retirement savings is the money people save for their retirement.,pmm,c,Y=1,"Z=1, X=0","p(Y=1|Z=1, X=0)",collider,"Economists seek to describe and predict the regular patterns of economic fluctuation. To do this, they study some important variables or attributes of economies. They also study how these attributes are responsible for producing or causing one another.low interest rates causes large retirement savings. Also, large trade deficits causes large retirement savings. You are currently observing: large retirement savings, normal interest rates. Your task is to estimate how likely it is that large trade deficits are present on a scale from 0 to 100, given the observations and causal relationships described. 0 means completely unlikely and 100 means completely certain. Return your response as raw text in one single line using this exact XML format: <response><likelihood>YOUR_NUMERIC_RESPONSE_HERE</likelihood></response> Replace YOUR_NUMERIC_RESPONSE_HERE with your likelihood estimate between 0 (very unlikely) and 100 (very likely). DO NOT include any other information, explanation, or formatting in your response. DO NOT use Markdown, code blocks, quotation marks, or special characters.",numeric-only,66,60.0,40,human
206,economy,interest rates,1,p,low,Interest rates are the rates banks charge to loan money.,trade deficits,1,m,large,A country's trade deficit is the difference between the value of the goods that a country imports and the value of the goods that a country exports.,retirement savings,1,m,large,Retirement savings is the money people save for their retirement.,pmm,c,Y=1,"Z=1, X=0","p(Y=1|Z=1, X=0)",collider,"Economists seek to describe and predict the regular patterns of economic fluctuation. To do this, they study some important variables or attributes of economies. They also study how these attributes are responsible for producing or causing one another.low interest rates causes large retirement savings. Also, large trade deficits causes large retirement savings. You are currently observing: large retirement savings, normal interest rates. Your task is to estimate how likely it is that large trade deficits are present on a scale from 0 to 100, given the observations and causal relationships described. 0 means completely unlikely and 100 means completely certain. Return your response as raw text in one single line using this exact XML format: <response><likelihood>YOUR_NUMERIC_RESPONSE_HERE</likelihood></response> Replace YOUR_NUMERIC_RESPONSE_HERE with your likelihood estimate between 0 (very unlikely) and 100 (very likely). DO NOT include any other information, explanation, or formatting in your response. DO NOT use Markdown, code blocks, quotation marks, or special characters.",numeric-only,66,20.0,112,human
207,economy,interest rates,1,p,low,Interest rates are the rates banks charge to loan money.,trade deficits,1,m,large,A country's trade deficit is the difference between the value of the goods that a country imports and the value of the goods that a country exports.,retirement savings,1,m,large,Retirement savings is the money people save for their retirement.,pmm,c,Y=1,"Z=1, X=0","p(Y=1|Z=1, X=0)",collider,"Economists seek to describe and predict the regular patterns of economic fluctuation. To do this, they study some important variables or attributes of economies. They also study how these attributes are responsible for producing or causing one another.low interest rates causes large retirement savings. Also, large trade deficits causes large retirement savings. You are currently observing: large retirement savings, normal interest rates. Your task is to estimate how likely it is that large trade deficits are present on a scale from 0 to 100, given the observations and causal relationships described. 0 means completely unlikely and 100 means completely certain. Return your response as raw text in one single line using this exact XML format: <response><likelihood>YOUR_NUMERIC_RESPONSE_HERE</likelihood></response> Replace YOUR_NUMERIC_RESPONSE_HERE with your likelihood estimate between 0 (very unlikely) and 100 (very likely). DO NOT include any other information, explanation, or formatting in your response. DO NOT use Markdown, code blocks, quotation marks, or special characters.",numeric-only,66,70.0,112,human
208,economy,interest rates,1,p,low,Interest rates are the rates banks charge to loan money.,trade deficits,1,m,large,A country's trade deficit is the difference between the value of the goods that a country imports and the value of the goods that a country exports.,retirement savings,1,m,large,Retirement savings is the money people save for their retirement.,pmm,c,Y=1,"Z=1, X=0","p(Y=1|Z=1, X=0)",collider,"Economists seek to describe and predict the regular patterns of economic fluctuation. To do this, they study some important variables or attributes of economies. They also study how these attributes are responsible for producing or causing one another.low interest rates causes large retirement savings. Also, large trade deficits causes large retirement savings. You are currently observing: large retirement savings, normal interest rates. Your task is to estimate how likely it is that large trade deficits are present on a scale from 0 to 100, given the observations and causal relationships described. 0 means completely unlikely and 100 means completely certain. Return your response as raw text in one single line using this exact XML format: <response><likelihood>YOUR_NUMERIC_RESPONSE_HERE</likelihood></response> Replace YOUR_NUMERIC_RESPONSE_HERE with your likelihood estimate between 0 (very unlikely) and 100 (very likely). DO NOT include any other information, explanation, or formatting in your response. DO NOT use Markdown, code blocks, quotation marks, or special characters.",numeric-only,66,40.0,119,human
209,economy,interest rates,1,p,low,Interest rates are the rates banks charge to loan money.,trade deficits,1,m,large,A country's trade deficit is the difference between the value of the goods that a country imports and the value of the goods that a country exports.,retirement savings,1,m,large,Retirement savings is the money people save for their retirement.,pmm,c,Y=1,"Z=1, X=0","p(Y=1|Z=1, X=0)",collider,"Economists seek to describe and predict the regular patterns of economic fluctuation. To do this, they study some important variables or attributes of economies. They also study how these attributes are responsible for producing or causing one another.low interest rates causes large retirement savings. Also, large trade deficits causes large retirement savings. You are currently observing: large retirement savings, normal interest rates. Your task is to estimate how likely it is that large trade deficits are present on a scale from 0 to 100, given the observations and causal relationships described. 0 means completely unlikely and 100 means completely certain. Return your response as raw text in one single line using this exact XML format: <response><likelihood>YOUR_NUMERIC_RESPONSE_HERE</likelihood></response> Replace YOUR_NUMERIC_RESPONSE_HERE with your likelihood estimate between 0 (very unlikely) and 100 (very likely). DO NOT include any other information, explanation, or formatting in your response. DO NOT use Markdown, code blocks, quotation marks, or special characters.",numeric-only,66,45.0,119,human
210,economy,interest rates,1,p,low,Interest rates are the rates banks charge to loan money.,trade deficits,1,m,large,A country's trade deficit is the difference between the value of the goods that a country imports and the value of the goods that a country exports.,retirement savings,1,m,large,Retirement savings is the money people save for their retirement.,pmm,c,Y=1,"Z=1, X=0","p(Y=1|Z=1, X=0)",collider,"Economists seek to describe and predict the regular patterns of economic fluctuation. To do this, they study some important variables or attributes of economies. They also study how these attributes are responsible for producing or causing one another.low interest rates causes large retirement savings. Also, large trade deficits causes large retirement savings. You are currently observing: large retirement savings, normal interest rates. Your task is to estimate how likely it is that large trade deficits are present on a scale from 0 to 100, given the observations and causal relationships described. 0 means completely unlikely and 100 means completely certain. Return your response as raw text in one single line using this exact XML format: <response><likelihood>YOUR_NUMERIC_RESPONSE_HERE</likelihood></response> Replace YOUR_NUMERIC_RESPONSE_HERE with your likelihood estimate between 0 (very unlikely) and 100 (very likely). DO NOT include any other information, explanation, or formatting in your response. DO NOT use Markdown, code blocks, quotation marks, or special characters.",numeric-only,66,85.0,133,human


In [None]:
# Check whether merged_df_new contains any fully duplicated rows
duplicates = merged_df_new[merged_df_new.duplicated()]
# print(f"duplicates: {duplicates}")
# Print the number of duplicate rows
print(f"number of duplicates: {len(duplicates)}")
# Print the number of rows in merged_df
print(f"number of rows in the merged_df: {len(merged_df)}")
# Print the number of rows in merged_df_new
print(f"number of rows in the merged_df_new: {len(merged_df_new)}")


number of duplicates: 0
number of rows in the merged_df: 960
number of rows in the merged_df_new: 1540


In [51]:
duplicates

Unnamed: 0,domain,X,X_values,X_cntbl,X_sense,X_detailed,Y,Y_values,Y_cntbl,Y_sense,Y_detailed,Z,Z_values,Z_cntbl,Z_sense,Z_detailed,cntbl_cond,task,query_node,observation,query,graph,prompt,prompt_category,id,response,human_subj_id,subject
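
The check above compares entire rows, so a column that is unique per row (such as `Unnamed: 0`) will always yield zero duplicates. The sketch below shows how a check restricted to selected columns could look. The toy values and the choice of `prompt` and `human_subj_id` as key columns are assumptions for illustration only; which columns should identify a unique observation in `merged_df_new` is not specified in this notebook.

```python
import pandas as pd

# Toy frame with hypothetical values, loosely shaped like the merged data.
toy = pd.DataFrame(
    {
        "Unnamed: 0": [208, 209, 210, 211],
        "prompt": ["p1", "p1", "p1", "p2"],
        "human_subj_id": [119, 119, 133, 119],
        "response": [40.0, 45.0, 85.0, 40.0],
    }
)

# Full-row check: the unique "Unnamed: 0" column makes every row distinct.
full_row_dupes = toy[toy.duplicated()]

# Key-based check (assumed key columns): flag every row whose
# (prompt, human_subj_id) pair occurs more than once.
key_cols = ["prompt", "human_subj_id"]
key_dupes = toy[toy.duplicated(subset=key_cols, keep=False)]

print(f"full-row duplicates: {len(full_row_dupes)}")
print(f"duplicates on {key_cols}: {len(key_dupes)}")
```

Passing `keep=False` marks all members of a duplicated group rather than only the later occurrences, which makes it easier to inspect the repeated rows side by side.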
