#**Introduction**

(worked example for data of the year 2023)

#World Happiness Report 2023

![picture](https://drive.google.com/uc?export=view&id=1ah4A0NvLCpJ1bjYwvJwk2mVKTEwAvSLj)

In [2]:
from functools import partial

import pandas as pd
from tabulate import tabulate


def flatten(xss):
    "Flatten a list of lists into a single list"
    return [x for xs in xss for x in xs]

In [3]:
from google.colab import files
uploaded = files.upload()

Saving WHR2023.csv to WHR2023.csv


In [5]:
import io
df = pd.read_csv(io.BytesIO(uploaded['WHR2023.csv']))

In [6]:
df.drop(list(df.filter(regex='Explained')), axis=1, inplace=True)
df.head()

Unnamed: 0,Country name,Ladder score,Standard error of ladder score,upperwhisker,lowerwhisker,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Ladder score in Dystopia,Dystopia + residual
0,Finland,7.804,0.036,7.875,7.733,10.792,0.969,71.15,0.961,-0.019,0.182,1.778,2.363
1,Denmark,7.586,0.041,7.667,7.506,10.962,0.954,71.25,0.934,0.134,0.196,1.778,2.084
2,Iceland,7.53,0.049,7.625,7.434,10.896,0.983,72.05,0.936,0.211,0.668,1.778,2.25
3,Israel,7.473,0.032,7.535,7.411,10.639,0.943,72.697,0.809,-0.023,0.708,1.778,2.691
4,Netherlands,7.403,0.029,7.46,7.346,10.942,0.93,71.55,0.887,0.213,0.379,1.778,2.11


Let's rename the columns to match our graphs:

Ladder score: Y
Logged (natural) GDP per capita: S
Social support: J
Healthy life expectancy: X
Freedom to make life choices: W

In [7]:
df = df[["Ladder score",
         "Logged GDP per capita",
         "Social support",
         "Healthy life expectancy",
         "Freedom to make life choices"]
].copy()
df.rename(columns={
    "Ladder score": "Y",
    "Logged GDP per capita": "S",
    "Social support": "J",
    "Healthy life expectancy": "X",
    "Freedom to make life choices": "W"
}, inplace=True)
df.head(5)

Unnamed: 0,Y,S,J,X,W
0,7.804,10.792,0.969,71.15,0.961
1,7.586,10.962,0.954,71.25,0.934
2,7.53,10.896,0.983,72.05,0.936
3,7.473,10.639,0.943,72.697,0.809
4,7.403,10.942,0.93,71.55,0.887


In [8]:
df.describe()

Unnamed: 0,Y,S,J,X,W
count,137.0,137.0,137.0,136.0,137.0
mean,5.539796,9.449796,0.799073,64.967632,0.787394
std,1.139929,1.207302,0.129222,5.75039,0.112371
min,1.859,5.527,0.341,51.53,0.382
25%,4.724,8.591,0.722,60.6485,0.724
50%,5.684,9.567,0.827,65.8375,0.801
75%,6.334,10.54,0.896,69.4125,0.874
max,7.804,11.66,0.983,77.28,0.961


In [9]:
!pip install causal-learn git+https://github.com/py-why/dowhy.git

Collecting git+https://github.com/py-why/dowhy.git
  Cloning https://github.com/py-why/dowhy.git to /tmp/pip-req-build-50v21m0g
  Running command git clone --filter=blob:none --quiet https://github.com/py-why/dowhy.git /tmp/pip-req-build-50v21m0g
  Resolved https://github.com/py-why/dowhy.git to commit 36fd02c57a152175b66ee09f14d5119602c0e309
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting causal-learn
  Downloading causal_learn-0.1.3.7-py3-none-any.whl (174 kB)
     174.4/174.4 kB 1.9 MB/s eta 0:00:00
Building wheels for collected packages: dowhy
  Building wheel for dowhy (pyproject.toml) ... done
  Created wheel for dowhy: filename=dowhy-0.0.0-py3-none-any.whl size=380924 sha256=833c3b173bc0583f29e3907d3f15fe977993180dc1153d9d87367a29532cd9ba
  Stored in direc

In [10]:
from dowhy import gcm
import networkx as nx

scm0 = gcm.StructuralCausalModel(
    nx.DiGraph([('S', 'Y')])
)
# we draw the mechanism for the root node S by using "a model that uniformly samples from data samples"
scm0.set_causal_mechanism(
    'S', gcm.EmpiricalDistribution())  ## alternative BayesianGaussianMixtureDistribution
scm0.set_causal_mechanism(
    'Y', gcm.AdditiveNoiseModel(gcm.ml.create_linear_regressor()))

In [11]:
# we defined a statistical mechanism for each node according to a probabilistic distribution
scm0.causal_mechanism("Y")

<dowhy.gcm.causal_mechanisms.AdditiveNoiseModel at 0x7ad72705eb60>

In [20]:
gcm.fit(scm0, df[["S", "Y"]])

Fitting causal mechanism of node Y: 100%|██████████| 2/2 [00:00<00:00, 73.92it/s]


Now we are ready for some exploratory activities. One of the most important tools for climbing the Ladder of Causation is intervention. With a single covariate S that determines an effect Y via linear regression, all we can expect is to spot a stronger or weaker association between the two. To step up our understanding we need to intervene on the covariate (do(S) in causal-analysis terms) and see what happens. With this basic one-covariate model, what we get is a non-causal relation, as there is no conditionality involved: S just "transmits" its value to Y according to a mechanism defined statistically on a probability distribution.

Let's see what happens when we apply an intervention:

In [21]:
# scm0 = gcm.CausalModel(df, "S", "Y")
# scm0.fit()
def do_intervention_atomic(model, covariate, value):
    "Make an intervention by setting a covariate to a given VALUE"
    return gcm.interventional_samples(
        model,
        {covariate: lambda x: value},
        num_samples_to_draw=149
    )



print("Atomic Intervention: Set value S=n")
# take a random sample of 3 countries
sample_df01 = df[["S", "Y"]].sample(
    n=3, random_state=10101, ignore_index=False).copy()
sample_indices01 = sample_df01.index.copy()

table01 = []
for i in sample_indices01:
    row = [[i]]
    for v in (-1, 0, 5, 9.06, 12):  ## some fixed values to compare
        if v != -1: row.append (
            do_intervention_atomic(
                scm0,
                "S",
                v).iloc[i].to_list()  # <--
        )
        else: row.append(df[["S", "Y"]].iloc[i].to_list())
    table01.append(flatten(row))

print(tabulate(
    table01,
    headers=["i", "S_orig","Y_orig","S=0", "Y_0", "S=5",
             "Y_5", "S=9.06", "Y_9.06", "S=12", "Y_12"]))

Atomic Intervention: Set value S=n
  i    S_orig    Y_orig    S=0        Y_0    S=5      Y_5    S=9.06    Y_9.06    S=12     Y_12
---  --------  --------  -----  ---------  -----  -------  --------  --------  ------  -------
 75     8.979     5.523      0  -0.123453      5  1.69859      9.06   4.97663      12  7.21188
 87     5.527     5.211      0  -1.46787       5  2.15751      9.06   4.1228       12  7.57605
 85     8.095     5.267      0  -0.700597      5  2.10749      9.06   4.1228       12  7.76034


This is an "atomic intervention": the covariate is set to a fixed value.

🔭 At first glance the sampled countries seem to differ in how sensitive they are to having their income reduced to $1 (S=0, since S is the natural log of GDP per capita): the simulated Y ranges from about -0.12 to about -1.47. This is just a first impression, though: the interventional samples are drawn at random, and the causal graph was designed to be limited and flawed according to common sense.
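
To make the mechanics of an atomic intervention concrete, here is a minimal, dependency-free sketch of what a linear additive noise model does under do(S=v): it ignores the observed S, plugs v into the fitted line and adds a noise term. The data points are the first five rows of the dataset shown earlier; the helper names (`fit_line`, `do_atomic`), the seed and the choice of resampling the noise from the fitted residuals are assumptions made for illustration, not dowhy's implementation.

```python
import random

# First five (S, Y) rows of the dataset shown earlier (illustrative subset only)
S = [10.792, 10.962, 10.896, 10.639, 10.942]
Y = [7.804, 7.586, 7.530, 7.473, 7.403]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def do_atomic(slope, intercept, residuals, value, n_samples, rng):
    """Sample Y under do(S=value): fitted line plus resampled residual noise."""
    return [slope * value + intercept + rng.choice(residuals)
            for _ in range(n_samples)]


slope, intercept = fit_line(S, Y)
residuals = [y - (slope * s + intercept) for s, y in zip(S, Y)]

rng = random.Random(0)  # hypothetical seed, only for reproducibility
samples = do_atomic(slope, intercept, residuals, value=9.06, n_samples=1000, rng=rng)
mean_y = sum(samples) / len(samples)
print(f"slope={slope:.3f} intercept={intercept:.3f} mean Y under do(S=9.06)={mean_y:.3f}")
```

In this sketch every sampled row gets the same S and a freshly drawn noise term, so row i of the result is not "country i": the spread across rows reflects only the noise. That suggests reading per-index rows of interventional samples with caution.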

We can also perform "shift interventions" where we apply a function instead of a fixed value.

In [22]:
def do_intervention_shift(model, covariate, func):
    "Make an intervention by setting a covariate by a given FUNCTION"
    return gcm.interventional_samples(
        model,
        {covariate: lambda x: func(log_value=x)},
        num_samples_to_draw=1000
    )

def multiply_by(log_value, multiplier):
    from math import log, e
    return log((e ** log_value) * multiplier)

print("Shifting Intervention [POSSIBLY WRONG]: increase by a percentage")
table02 = []
for i in sample_indices01:
    row = [[i]]
    for v in (-1, 1.1, 1.2, 1.3):
        if v != -1: row.append(
            do_intervention_shift(
                scm0,
                "S",
                partial(multiply_by, multiplier=v)).iloc[i].to_list() ## <---
        )
        else: row.append(df[["S", "Y"]].iloc[i].to_list())
    table02.append(flatten(row))

print(tabulate(
    table02,
    headers=["i", "S_orig","Y_orig", "S_10%", "Y_10%",
             "S_20%", "Y_20%", "S_30%", "Y_30%"]))

Shifting Intervention [POSSIBLY WRONG]: increase by a percentage
  i    S_orig    Y_orig    S_10%    Y_10%     S_20%    Y_20%     S_30%    Y_30%
---  --------  --------  -------  -------  --------  -------  --------  -------
 75     8.979     5.523  10.9783  6.36933   9.71932  5.82821  10.6814   6.36511
 87     5.527     5.211  11.2593  4.77347  11.3273   7.22402  10.5674   5.85795
 85     8.095     5.267  10.6833  6.36642   9.93332  6.60601   9.39236  5.576


We tried to increase GDP in 10% steps for the three randomly sampled countries.
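
Because S is the natural log of GDP per capita, multiplying GDP by a factor m is the same as adding ln(m) to S. As a yardstick for the table above, here is a quick check of what a per-country shift would look like for the three sampled S values (plain Python, no dowhy involved; the variable names are illustrative):

```python
from math import exp, log, isclose

# Original logged GDP values of the three sampled countries (from the table above)
s_orig = {75: 8.979, 87: 5.527, 85: 8.095}

for multiplier in (1.1, 1.2, 1.3):
    shift = log(multiplier)  # constant shift in log space
    for idx, s in s_orig.items():
        shifted = log(exp(s) * multiplier)  # same formula as multiply_by
        # log(exp(s) * m) == s + log(m)
        assert isclose(shifted, s + shift)
        print(f"i={idx} x{multiplier}: S {s:.3f} -> {shifted:.3f} (shift {shift:+.4f})")
```

A 10% increase moves S by only about 0.095, so a shifted S of roughly 9.07 would be expected for row 75. Values such as 10.98 suggest that the rows returned by the sampler are not aligned with the original country indices.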

🛑 These look like inconsistent outcomes. Maybe this is another clue that the "pauperistic" causal graph (Hypothesis 0, with only S considered as a cause) cannot help us understand why the effect looks the way it does? Or perhaps the model simply cannot work with this one-covariate setup? As with any other statistical process, it takes many clues to reach a useful result. Maybe we can disprove this hypothesis by showing that the others are better at explaining the effect (Y) that we observe.

In [23]:
!pip install causalinference

Collecting causalinference
  Downloading CausalInference-0.1.3-py3-none-any.whl (51 kB)
     51.1/51.1 kB 834.7 kB/s eta 0:00:00
Installing collected packages: causalinference
Successfully installed causalinference-0.1.3


In [26]:
from causalinference import CausalModel


#assumptions
These prerequisites must hold:

- Randomized experiment ("strong" prerequisite): the assignment of treatment must be random:

  (Y(0), Y(1)) ⟂ D

- Unconfoundedness assumption ("weak" prerequisite): there is no unobserved confounder once the covariates (X) are taken into account:

  (Y(0), Y(1)) ⟂ D | X

  That is, the potential outcomes are independent of the treatment assignment, conditional on the covariates.

Spotting confounders is not the subject of this post; see Causal Discovery and Causal Graphs for how to avoid confounding.
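
The difference between the two prerequisites can be illustrated with a small simulation. This is a sketch on synthetic data (all numbers, names and the seed are made up for illustration): the true treatment effect is fixed at 1.0, and a covariate x drives the outcome and, in the confounded case, also the chance of being treated.

```python
import random

rng = random.Random(42)  # hypothetical seed for reproducibility
TRUE_EFFECT = 1.0
N = 20_000


def simulate(confounded):
    """Return the naive difference in mean outcomes between treated and controls."""
    treated, controls = [], []
    for _ in range(N):
        x = rng.gauss(0, 1)                      # covariate
        if confounded:
            p_treat = 0.8 if x > 0 else 0.2      # assignment depends on x
        else:
            p_treat = 0.6                        # random assignment, independent of x
        d = 1 if rng.random() < p_treat else 0
        y = 2.0 * x + TRUE_EFFECT * d + rng.gauss(0, 1)
        (treated if d else controls).append(y)
    return sum(treated) / len(treated) - sum(controls) / len(controls)


diff_random = simulate(confounded=False)
diff_confounded = simulate(confounded=True)
print(f"randomized assignment: {diff_random:.2f}")
print(f"confounded assignment: {diff_confounded:.2f}")
```

Under random assignment the naive difference in means stays close to the true effect, whereas under confounded assignment it is inflated. This is why adjusting for the covariates is needed whenever the treatment is not randomized.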

#initialisation
Let's assign each record to a "treated group" (Y(1)) or to a "control group" (Y(0)):

In [31]:
# randomise treatment in the dataset
import numpy as np
df["D"] = np.random.choice(a=[0,1], size=df["Y"].count(), p=[0.4, 0.6])
print(df["D"].to_numpy())

[0 1 0 0 1 0 1 1 1 1 0 1 1 0 0 0 0 1 0 0 0 0 1 0 1 0 1 0 1 1 1 0 1 0 0 1 1
 0 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 0 0 1 1 1 0 0 1 1 0 0 1
 0 1 0 1 0 0 0 1 1 0 0 1 0 1 1 1 0 1 0 1 1 1 1 0 0 1 0 0 1 0 1 1 1 0 1 0 1
 1 1 1 1 0 1 1 0 0 1 1 0 1 1 1 0 1 0 1 0 1 0 1 1 0 0]


In [32]:
## causalinference uses the convention of calling covariates as `Xn` so we rename for convenience
df.rename(columns={
    "S": "X0",
    "J": "X1",
    "X": "X2",
    "W": "X3"
}, inplace=True)

#intervention 1
Now we simulate the treatment: we increase the "freedom of choice" index (W or X3, depending on which convention we are using) by a given amount, only for the treated samples.

First we need to find a way to do that without introducing errors. Let's see what W looks like:

In [33]:
df["X3"].describe()

count    137.000000
mean       0.787394
std        0.112371
min        0.382000
25%        0.724000
50%        0.801000
75%        0.874000
max        0.961000
Name: X3, dtype: float64

The standard deviation is 0.112371, so we will go for 1/10 of that, just to be sure we don't stir the water too much in the beginning. So our treatment looks like:

Y(1) => X3 = X3 + (std(X3) / 10)

This is the starting scenario:

In [34]:
# let's set aside the initial data and its causal model for comparison
df_start = df.copy()
df_start.head(5)

Unnamed: 0,Y,X0,X1,X2,X3,D
0,7.804,10.792,0.969,71.15,0.961,0
1,7.586,10.962,0.954,71.25,0.934,1
2,7.53,10.896,0.983,72.05,0.936,0
3,7.473,10.639,0.943,72.697,0.809,0
4,7.403,10.942,0.93,71.55,0.887,1


This is after we apply the treatment ("intervention 1": slight increase of freedom of choice):

In [35]:
std_dev_X3 = df_start["X3"].std()
print(std_dev_X3 / 10)

mask = df_start["D"] == 1

df_intervention1 = df_start.copy()
# apply intervention
df_intervention1.loc[mask, 'X3'] = df_intervention1.loc[mask, "X3"].apply(lambda x: x + (std_dev_X3 / 10))
df_intervention1.head()

0.011237112603178136


Unnamed: 0,Y,X0,X1,X2,X3,D
0,7.804,10.792,0.969,71.15,0.961,0
1,7.586,10.962,0.954,71.25,0.945237,1
2,7.53,10.896,0.983,72.05,0.936,0
3,7.473,10.639,0.943,72.697,0.809,0
4,7.403,10.942,0.93,71.55,0.898237,1


We can see that the treated samples (D=1) now have their X3 value increased by a minimal ~0.01.

Let's generate the model for intervention 1:

In [36]:
# we simplify the model considering only some covariates
causal_interv1 = CausalModel(
    Y=df_intervention1["Y"].to_numpy(),
    D=df_intervention1["D"].to_numpy(),
    X=df_intervention1[["X0", "X1", "X2", "X3"]].to_numpy()
)



Let's see some statistics from the observations.

In [37]:
print(causal_interv1.summary_stats)


Summary Statistics

                        Controls (N_c=62)          Treated (N_t=75)             
       Variable         Mean         S.d.         Mean         S.d.     Raw-diff
--------------------------------------------------------------------------------
              Y        5.597        1.217        5.492        1.078       -0.105

                        Controls (N_c=62)          Treated (N_t=75)             
       Variable         Mean         S.d.         Mean         S.d.     Nor-diff
--------------------------------------------------------------------------------
             X0        9.518        1.152        9.394        1.256       -0.103
             X1        0.800        0.140        0.798        0.120       -0.014
             X2          nan          nan       64.540        5.835          nan
             X3        0.778        0.116        0.806        0.109        0.248
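
The `nan` entries for X2 in the controls columns are most likely caused by the single missing value in healthy life expectancy (the `describe()` output reports a count of 136 rather than 137 for that column): any mean computed over a group containing a NaN is itself NaN. Below is a minimal sketch of the effect and of one possible remedy, keeping only complete observations before building the model. The toy values are made up.

```python
import math

# Toy control group with one missing healthy-life-expectancy value (made-up numbers)
controls_x2 = [71.15, 72.05, math.nan, 65.80]

naive_mean = sum(controls_x2) / len(controls_x2)
print(naive_mean)  # NaN propagates through the arithmetic

# One possible remedy: keep only complete observations
complete = [v for v in controls_x2 if not math.isnan(v)]
clean_mean = sum(complete) / len(complete)
print(round(clean_mean, 3))
```

With pandas, the same filtering could be done with `df.dropna(subset=["X2"])` before passing the arrays to `CausalModel`.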



#intervention 2
We now apply a larger treatment, 1/3 of the standard deviation of the freedom of choice index:

In [38]:
std_dev_X3 = df_start["X3"].std()
print(std_dev_X3 / 3)

mask = df_start["D"] == 1

df_intervention2 = df_start.copy()
df_intervention2.loc[mask, 'X3'] = df_intervention2.loc[mask, "X3"].apply(lambda x: x + (std_dev_X3 / 3))
df_intervention2.head()

0.03745704201059379


Unnamed: 0,Y,X0,X1,X2,X3,D
0,7.804,10.792,0.969,71.15,0.961,0
1,7.586,10.962,0.954,71.25,0.971457,1
2,7.53,10.896,0.983,72.05,0.936,0
3,7.473,10.639,0.943,72.697,0.809,0
4,7.403,10.942,0.93,71.55,0.924457,1


In [39]:
# we simplify the model considering only some covariates
causal_interv2 = CausalModel(
    Y=df_intervention2["Y"].to_numpy(),
    D=df_intervention2["D"].to_numpy(),
    X=df_intervention2[["X0", "X1", "X2", "X3"]].to_numpy()
)

print(causal_interv2.summary_stats)


Summary Statistics

                        Controls (N_c=62)          Treated (N_t=75)             
       Variable         Mean         S.d.         Mean         S.d.     Raw-diff
--------------------------------------------------------------------------------
              Y        5.597        1.217        5.492        1.078       -0.105

                        Controls (N_c=62)          Treated (N_t=75)             
       Variable         Mean         S.d.         Mean         S.d.     Nor-diff
--------------------------------------------------------------------------------
             X0        9.518        1.152        9.394        1.256       -0.103
             X1        0.800        0.140        0.798        0.120       -0.014
             X2          nan          nan       64.540        5.835          nan
             X3        0.778        0.116        0.832        0.109        0.480
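
The `Nor-diff` column is a normalized difference in covariate means. Assuming it follows the usual definition, (mean_t - mean_c) / sqrt((sd_c² + sd_t²) / 2), the reported values for X3 can be reproduced from the rounded means and standard deviations in the two summary tables (the function name is illustrative):

```python
from math import sqrt


def normalized_difference(mean_c, sd_c, mean_t, sd_t):
    """Difference in means scaled by the pooled standard deviation."""
    return (mean_t - mean_c) / sqrt((sd_c ** 2 + sd_t ** 2) / 2)


# X3 (freedom to make life choices), values read from the summary tables
nd_interv1 = normalized_difference(0.778, 0.116, 0.806, 0.109)
nd_interv2 = normalized_difference(0.778, 0.116, 0.832, 0.109)
print(f"intervention 1: {nd_interv1:.3f}")
print(f"intervention 2: {nd_interv2:.3f}")
```

Both results agree with the reported 0.248 and 0.480 up to the rounding of the inputs. Y and the other covariates are identical in the two tables because the treatment only edits X3: Y is never recomputed from the shifted covariate.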



#LICENSE
MIT License
Copyright (c) 3022 Farheen Zubair
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE
