# Crash Course in Causality Written Section - World Happiness Analysis

Selvin Charles Tuscano

002284970

# Abstract

This Notebook delves into the realm of causal inference, aiming to unravel the causal relationship between interventions and outcomes within a dataset characterized by various covariates. Utilizing a sophisticated methodological framework, the study employs propensity score estimation, matching, blocking, and stratification techniques to meticulously control for potential confounders and biases. Through linear logistic regression models, both simple and incorporating quadratic terms, the analysis estimates propensity scores to model the likelihood of treatment assignment based on observed characteristics. The application of matching and blocking techniques refines the estimation of treatment effects, offering a comparative analysis of treated and control groups to enhance the reliability of causal effect estimates. The project culminates in providing statistically significant evidence of a causal relationship between the intervention and the outcome measure, underscoring the intervention's beneficial impact. This comprehensive causal inference analysis underscores the complexities and challenges of deriving causation from observational data, emphasizing the nuanced nature of estimating causal effects and the importance of a multifaceted analytical approach. The findings contribute valuable insights into causal inference methodology, demonstrating the potential to inform evidence-based decision-making and policy formulation within various domains.


## **What is Causality?**

Causality refers to the relationship between cause and effect, where one event (the cause) brings about another event (the effect). It's a fundamental concept in philosophy, science, and everyday reasoning. The principle of causality suggests that every event has a cause, and every cause produces an effect.

In philosophy, causality has been debated extensively, particularly concerning questions about determinism, free will, and the nature of causation itself. Philosophers have explored different theories of causality, including:

1. **Humean causality:** This perspective, influenced by philosopher David Hume, suggests that causality is merely a habit of thought based on observed regularities in sequences of events. According to Hume, we cannot observe causality itself but only the constant conjunction of events.

2. **Counterfactual theories:** These theories propose that causation can be understood in terms of counterfactual conditionals, meaning that one event is considered the cause of another if, had it not occurred, the effect would not have occurred.

3. **Mechanistic causality:** This view, prevalent in science, suggests that causes are physical mechanisms or processes that produce specific effects through a chain of events. It often involves understanding causality in terms of physical laws and interactions.

In science, causality is central to understanding natural phenomena and predicting outcomes. Scientists seek to establish causal relationships through experiments, observation, and statistical analysis. However, establishing causality in complex systems can be challenging, as correlations between events do not always imply causation. Hence, various methods and criteria, such as randomized controlled trials and temporal precedence, are employed to infer causal relationships rigorously.



![1_526Tq62pGhqOFXblzFpQ-g.jpg](attachment:203d6d72-2d07-464c-8cd3-048b048073b1.jpg)


# **Why Should I Use Causality?**


The amount of available data has increased dramatically over the last 20 years, leading to a surge in questions that can be addressed using data. Many of these questions are fundamentally causal in nature:

1. **Why did metric X change this month?**
2. **Which of our customers should we target with Y campaign?**
3. **What would happen if we instituted Z policy change?**

While basic correlational analysis can answer the question of *what happened*, it often falls short when we seek to understand *why* or *how* something occurred. Here are some challenges:

1. **Interpreting Associations:** Correlation does not always imply causation. Understanding whether an observed association is meaningful or merely coincidental is crucial.

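2. **Phenomena of Reversed Effects:** Sometimes an association seen in the overall data reverses direction once the data are split into subgroups (known as Simpson's paradox). Causal inference provides tools for deciding which comparison reflects the true effect.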

Remember that causal relationships are essential for making informed decisions based on data.


# **Basics of Causality**

1. **Temporal Order**: The first criterion, temporal order, is quite straightforward. It simply means that the cause must precede the effect in time. For instance, in the context of studying the impact of smoking on lung cancer, smoking should occur before the development of lung cancer for it to be considered a causal factor.

2. **Covariation**: This principle emphasizes the need for a consistent relationship between the cause and effect. In other words, as the cause changes, the effect should also change accordingly. Establishing covariation is essential as it helps confirm a genuine connection between the cause and effect, thereby ruling out random chance.

3. **Non-spuriousness**: The third criterion, non-spuriousness, is a more intricate concept. It asserts that the observed association between the cause and effect should not be explained by the influence of a third variable. For instance, if a link is found between smoking and lung cancer, it's crucial to consider other factors like age. Age could act as a confounding variable, meaning that it could explain part or all of the observed relationship between smoking and lung cancer. To establish causality, it's essential to control for such variables.


![basics of causality.png](attachment:9658ed92-20b1-425f-be23-3630dc913001.png)

# **Correlation Does Not Imply Causation**



The phrase “**correlation does not imply causation**” has become somewhat of a cliché; it is a common response to any measured association, and it is often used in response to various analyses in public policy, medicine, economics, and any other scientific or quantitative field. However trite it may be, there is a degree of truth contained within this statement: we cannot just accept associations within data (however strong they may appear to be) as a meaningful relationship.


**EXAMPLE 1**

One commonly used example of this is the strong correlation between ice cream sales and shark attacks. The figure below illustrates what this relationship might look like. Judging by the data alone, one might conclude that eating ice cream causes shark attacks, a horrifying prospect for vacationers with a sweet tooth. In reality, both rise in hot weather, when more people buy ice cream and more people swim in the ocean.



Other associations between variables may not actually be grounded in any common causes or similarities in the data whatsoever. The name for such relationships is spurious correlations, and they can arise anywhere.



![icecream.png](attachment:eecfbdf2-7986-46b0-b177-b3cb7fbd127a.png)
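The ice cream example can be reproduced with a small simulation. This is a minimal sketch with made-up numbers: it assumes that a hypothetical `temperature` variable drives both `ice_cream_sales` and `shark_attacks`, and that neither affects the other. The helper `residualize` is an illustrative name, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: daily temperature drives both quantities.
temperature = rng.normal(25, 5, n)
ice_cream_sales = 10 * temperature + rng.normal(0, 20, n)
shark_attacks = 0.3 * temperature + rng.normal(0, 1, n)


def residualize(y, x):
    """Remove the linear effect of x from y using a least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)


# Correlation in the raw data.
raw_corr = np.corrcoef(ice_cream_sales, shark_attacks)[0, 1]

# Correlation after the temperature effect is removed from both series.
partial_corr = np.corrcoef(
    residualize(ice_cream_sales, temperature),
    residualize(shark_attacks, temperature),
)[0, 1]

print(f"raw correlation: {raw_corr:.2f}")
print(f"correlation after removing temperature: {partial_corr:.2f}")
```

In this simulated setting the raw correlation is strong, while the correlation that remains after accounting for temperature is close to zero.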

**Example 2**

**Spurious correlation** describes an association between two variables that do not actually share a meaningful relationship.


For example, consider the figure below, which illustrates the surprising relationship between the per-capita consumption of margarine in the US and the divorce rate in Maine. There appears to be a very tight association between these annual time series, with a correlation coefficient greater than 0.99! Of course, no one actually believes that the amount of margarine consumed has any impact on divorces, but blindly trusting the data could easily lead someone astray.


By remaining vigilant against spurious correlations and recognizing that correlation alone does not imply causation, researchers can ensure that their analyses and conclusions are grounded in sound methodology and rigorous evidence.


![divorce.png](attachment:6ec41195-4294-4fd6-94aa-e82ad0fe3ccb.png)

# **Real World Application of Causal Inference**



Causal inference has numerous real-world applications across various fields. Here are some examples:

1. **Healthcare and Medicine**:
   - **Clinical Trials**: Determining the effectiveness of new drugs or treatments through randomized controlled trials (RCTs).
   - **Epidemiology**: Identifying risk factors for diseases and assessing the impact of interventions on public health outcomes.
   - **Precision Medicine**: Personalizing treatment plans based on causal relationships between genetic, environmental, and lifestyle factors.

2. **Economics and Finance**:
   - **Policy Evaluation**: Assessing the impact of economic policies, such as minimum wage laws or tax reforms, on employment, income distribution, and economic growth.
   - **Investment Strategies**: Analyzing the causal effects of market trends, company performance, and economic indicators on investment outcomes.

3. **Education**:
   - **Educational Interventions**: Evaluating the effectiveness of educational programs, teaching methods, and interventions on student learning outcomes.
   - **School Policies**: Assessing the impact of school policies, such as class size reductions or teacher training programs, on academic achievement.

4. **Social Sciences**:
   - **Poverty Alleviation**: Understanding the causal factors contributing to poverty and inequality, and evaluating the effectiveness of social welfare programs.
   - **Crime and Justice**: Studying the causal effects of law enforcement strategies, sentencing policies, and rehabilitation programs on crime rates and recidivism.

5. **Public Policy and Government**:
   - **Policy Design**: Designing evidence-based policies by identifying causal relationships between interventions and desired outcomes.
   - **Healthcare Policy**: Evaluating the impact of healthcare policies, such as insurance coverage expansions or public health campaigns, on access to care and health outcomes.

6. **Marketing and Business**:
   - **Consumer Behavior**: Understanding the causal factors influencing consumer preferences, purchasing decisions, and brand loyalty.
   - **Advertising Effectiveness**: Assessing the causal impact of advertising campaigns, promotions, and marketing strategies on sales and brand awareness.

7. **Environmental Science**:
   - **Climate Change**: Investigating the causal relationships between human activities, greenhouse gas emissions, and climate change impacts on ecosystems and human health.
   - **Natural Resource Management**: Evaluating the effectiveness of conservation policies and sustainable resource management practices on biodiversity conservation and ecosystem services.

These are just a few examples of how causal inference techniques are applied in various domains to understand cause-and-effect relationships, inform decision-making, and drive evidence-based policies and interventions.



![real world.png](attachment:835aa4eb-6a46-4ffc-a704-fb931a36b1fe.png)

# **What is Causal Inference?**

Causal inference is a branch of statistics and data science concerned with understanding cause-and-effect relationships between variables. It aims to determine whether one variable, called the treatment or intervention, causes an effect on another variable, known as the outcome or response.

In many data analysis scenarios, simply identifying correlations between variables is insufficient for making causal claims. Correlation does not imply causation; hence, causal inference techniques are employed to establish causality rigorously.

## **Key Concepts**:

- **Treatment**: The variable or intervention under study, often denoted as $X$.
  
- **Outcome**: The variable whose response is of interest concerning the treatment, often denoted as $Y$.

- **Confounding Variables**: Factors that affect both the treatment and the outcome, leading to spurious correlations. Identifying and controlling for confounders is crucial in causal inference.

- **Randomized Controlled Trials (RCTs)**: Experimental studies where participants are randomly assigned to treatment and control groups, allowing for causal inference under certain assumptions.

- **Observational Studies**: Studies where treatment assignment is not controlled by the researcher, posing challenges for causal inference due to potential confounding.

- **Causal Diagrams (DAGs)**: Graphical representations used to depict causal relationships between variables and identify potential confounders.

- **Counterfactuals**: Hypothetical scenarios representing what would have happened if a different treatment had been applied. Causal inference often involves estimating counterfactual outcomes.



![basic concepts.png](attachment:324ab54d-2994-40b5-8dc3-5ddfa7e77728.png)

## **What are Confounding Variables ?**

In causal inference, **confounding variables** are factors that are associated with both the treatment (independent variable) and the outcome (dependent variable). These variables can distort the estimation of the true causal effect between the treatment and the outcome if not properly addressed.

Confounding variables can introduce **bias** into the analysis by falsely suggesting a causal relationship between the treatment and the outcome when none exists, or by masking a **true causal relationship**. Identifying and controlling for confounding variables is crucial for making accurate causal inferences.

For example, consider a study evaluating the effectiveness of a new drug on reducing blood pressure. Suppose age is a confounding variable: older individuals are more likely to be prescribed the new drug and also more likely to have higher blood pressure. Failing to account for age would then bias the estimate, because the treated group starts out with higher blood pressure, and the drug would likely look less effective than it really is.

In summary, confounding variables are important considerations in causal inference because they can distort the true relationship between the treatment and the outcome, leading to inaccurate conclusions if not properly addressed.


![confounder2.png](attachment:53858a25-4e32-4be4-ae81-83d72e082ea3.png)
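The blood pressure example can be made concrete with simulated data. This is a minimal sketch under stated assumptions: the effect size `true_effect`, the age range, and the prescription probabilities are all invented for illustration, and adjusting for age with a linear regression is only one of several possible ways to control for a confounder.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
true_effect = -5.0  # assumed: the drug lowers blood pressure by 5 units

age = rng.uniform(30, 80, n)
# Older patients are more likely to be prescribed the drug.
p_treated = 1 / (1 + np.exp(-(age - 55) / 8))
treated = rng.binomial(1, p_treated)
# Blood pressure rises with age and falls with treatment.
blood_pressure = 100 + 0.6 * age + true_effect * treated + rng.normal(0, 5, n)

# Naive comparison: difference in group means, ignoring age.
naive = blood_pressure[treated == 1].mean() - blood_pressure[treated == 0].mean()

# Adjusted comparison: include age as a regressor alongside treatment.
design = np.column_stack([np.ones(n), treated, age])
coef, *_ = np.linalg.lstsq(design, blood_pressure, rcond=None)
adjusted = coef[1]

print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
print(f"assumed true effect: {true_effect:.2f}")
```

Comparing the two printed estimates with the assumed true effect shows how far the naive comparison can drift when the confounder is ignored.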

## **Causal Diagrams (DAGs)**

In causal inference, causal diagrams, also known as directed acyclic graphs (DAGs), are graphical representations used to illustrate causal relationships between variables. These diagrams are essential tools for visualizing and understanding the causal structure of a system or a phenomenon under study.

### **Key Aspects of Causal Diagrams:**

1. **Directed Edge**: Each arrow in a DAG represents a directed edge between two variables. The direction of the arrow indicates the hypothesized direction of causality, suggesting that changes in the variable at the tail of the arrow (parent) directly influence the variable at the head of the arrow (child).

2. **Nodes**: Nodes in a DAG represent variables, which can be either observed variables (e.g., treatment, outcome) or unobserved variables (e.g., confounders). Each variable in the DAG is represented by a node.

3. **Acyclic Structure**: DAGs are acyclic, meaning there are no directed cycles or feedback loops in the graph: following the arrows from any variable never leads back to that variable. This property prevents logical inconsistencies such as circular causation.

4. **Confounding Paths**: Confounding variables can introduce bias into causal inference by creating spurious associations between variables. Causal diagrams help identify confounding paths, which are non-causal pathways that connect the treatment and the outcome through a common cause (often called backdoor paths). By controlling for confounding variables, researchers can mitigate bias and improve the validity of causal inference.

5. **Mediation and Moderation**: DAGs can also depict mediation and moderation effects. Mediation occurs when the effect of a variable on an outcome is mediated by an intermediate variable, while moderation occurs when the effect of one variable on an outcome depends on the level of another variable.

Causal diagrams play a crucial role in various fields, including epidemiology, economics, social sciences, and machine learning, providing a visual framework for formulating hypotheses, designing studies, and conducting causal inference analyses. They help researchers develop more accurate models of causal relationships and make informed decisions based on causal reasoning.
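The key aspects above can be sketched in a few lines of Python. This is an illustrative example, not a full causal-graph library: the node names (`confounder`, `treatment`, `mediator`, `outcome`) and the helpers `is_acyclic` and `common_causes` are hypothetical. It uses the standard-library `graphlib` module (Python 3.9+), where a graph is a mapping from each node to its set of predecessors.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical DAG: each key maps to the set of its direct causes (parents).
parents = {
    "confounder": set(),
    "treatment": {"confounder"},
    "mediator": {"treatment"},
    "outcome": {"treatment", "mediator", "confounder"},
}


def is_acyclic(graph):
    """Return True if the parent map contains no directed cycle."""
    try:
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


def common_causes(graph, a, b):
    """Direct parents shared by nodes a and b (candidate confounders)."""
    return graph[a] & graph[b]


# A causal ordering: every cause appears before its effects.
causal_order = list(TopologicalSorter(parents).static_order())

print("acyclic:", is_acyclic(parents))
print("causal order:", causal_order)
print("shared causes of treatment and outcome:",
      common_causes(parents, "treatment", "outcome"))
```

Note that `common_causes` only looks at direct parents; finding every confounding path in a larger graph requires a proper path search.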


![DAG1.png](attachment:d4eef95d-18c9-4bf5-9a06-65f3fd44b396.png)

### **Explanation**

This diagram illustrates the relationships between various factors related to health, income, and sun exposure. Let’s break it down:

- **Income**: This factor is connected to health insurance, suggesting that income may influence the ability to afford health insurance.

- **Age**: Income also connects to age, implying that income levels may change as people age.

- **Sunscreen**: There’s a direct connection from sunscreen to sunburn, indicating that using sunscreen can prevent sunburn.

- **Sunburn**: Sunburn is directly connected to itchy skin and skin cancer, suggesting that sunburn can lead to these conditions.

Overall, this diagram emphasizes health-related issues associated with income, age, and exposure to sunlight.


# **Counterfactuals**

- Counterfactuals are hypothetical scenarios describing **what would have happened** under different conditions.
- In causal inference, counterfactuals are used to compare the **observed outcome** with what would have happened if a different action or treatment had been taken.
- They allow researchers to estimate the **causal effect** of an intervention or treatment by quantifying the difference between the observed outcome and the **counterfactual outcome**.
- For example, in a medical study evaluating a new drug, the counterfactual scenario compares the outcomes of patients who received the drug with what would have happened if they had not received it.
- Counterfactuals are essential for making causal claims and assessing the impact of interventions or treatments on outcomes of interest.


### **Counterfactual Example: Smoking and Lung Cancer**

Imagine a study where researchers investigate the causal relationship between smoking and lung cancer. Here’s how counterfactuals come into play:

##### Actual Scenario (Observed Data):
- Group A: Smokers
- Group B: Non-smokers
- Result: A higher incidence of lung cancer is observed in Group A (smokers).

##### Counterfactual Scenario (What If):
- What if the same individuals in Group A (smokers) had never smoked? How many cases of lung cancer would we expect in that scenario?

##### Counterfactual Outcome:
- The counterfactual outcome represents the hypothetical situation where smokers did not smoke.
- If smoking causes lung cancer, we would expect fewer cases of lung cancer in the counterfactual scenario. If it does not, the number of cases would be about the same.

In this example, the counterfactual scenario allows us to compare the observed outcome (lung cancer in smokers) with the hypothetical outcome (lung cancer if smokers had never smoked). By considering what could have happened under different conditions, we gain insights into causality.


![counterfac.jpg](attachment:8e03567d-a80d-4d29-9259-9d9f757bc391.jpg)

## Example of Potential Outcomes and Counterfactuals

A study is conducted to evaluate the effect of a new teaching method on student test scores. The potential outcomes for each student are the test score they would achieve if they were taught using the new method (treatment) and the test score they would achieve if they were taught using the traditional method (control).

### In this example:

- **Potential Outcome (Treatment)**: The test score a student would achieve if taught using the new method.
- **Potential Outcome (Control)**: The test score the same student would achieve if taught using the traditional method.
- **Counterfactual Scenario**: Imagine the student who received the new teaching method had instead been taught using the traditional method.

By comparing the **actual outcome** (test score with the new method) to the **counterfactual outcome** (test score with the traditional method), researchers can estimate the causal effect of the new teaching method.

Remember, counterfactuals help us understand causality by considering what could have happened under different conditions!


# **Assumptions of Causal Discovery**


## Causal Assumptions

Causal assumptions in causal effects models are assumptions made about the underlying causal relationships between variables in a study or experiment. These assumptions are necessary to estimate the causal effect of an intervention or treatment on an outcome of interest.
Causal assumptions typically involve two main components:

1. **Treatment Assignment Mechanism**: This refers to how participants are assigned to either the treatment or control group. It's assumed that this assignment process is unrelated to the potential outcomes, meaning that it doesn't depend on what outcomes individuals might have.


2. **Ignorability Assumption**: Also known as the unconfoundedness assumption, this states that treatment assignment is independent of the potential outcomes once we account for observed covariates or confounding variables, written $(Y(1), Y(0)) \perp D \mid X$. Essentially, it means that treated and control participants with the same covariate values are comparable, so any differences in their outcomes can be attributed to the treatment itself.


## **Explanation of Causal Inference Measures**

### Average Treatment Effect (ATE):

The Average Treatment Effect (ATE) is a fundamental measure in causal inference. It is the average difference, across the entire population, between each individual's potential outcome under treatment and potential outcome under control.

Mathematically:

$$
ATE = E[Y(1) - Y(0)]
$$

Where:
- $Y(1)$ represents the potential outcome when the treatment is applied.
- $Y(0)$ represents the potential outcome when the treatment is not applied.
- $E[\cdot]$ denotes the expected value.

### Average Treatment Effect on the Treated (ATT):

The Average Treatment Effect on the Treated (ATT) focuses on the subset of individuals who received the treatment. It measures the average difference in outcomes between the treated individuals and their hypothetical outcomes had they not received the treatment.

Mathematically:

$$
ATT = E[Y(1) - Y(0) \mid D = 1]
$$

Where:
- $D = 1$ indicates that the individual received the treatment.

### Conditional Average Treatment Effect (CATE):

The Conditional Average Treatment Effect (CATE) considers the heterogeneity in treatment effects across different subgroups of the population. It estimates treatment effects based on specific characteristics or conditions.

Mathematically:

$$
CATE(x) = E[Y(1) - Y(0) \mid X = x]
$$

Where:
- $X$ represents the vector of covariates or characteristics.
- $x$ represents a specific value of the covariates.
- $E[\cdot]$ denotes the expected value.

CATE provides insights into how the treatment effect varies across different segments of the population, enabling more targeted interventions.
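The three measures can be compared side by side in a simulation, where both potential outcomes are visible for every unit (which is never the case with real data). This is a minimal sketch: the binary covariate `x`, the effect sizes, and the treatment probabilities are assumptions chosen only to make the three quantities differ.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Hypothetical binary covariate X (for example, 1 = senior, 0 = junior).
x = rng.binomial(1, 0.4, n)

# Both potential outcomes are known only because this is a simulation.
y0 = 50 + 5 * x + rng.normal(0, 3, n)      # Y(0): outcome without treatment
y1 = y0 + np.where(x == 1, 4.0, 1.0)       # Y(1): effect is 4 if X=1, else 1

# Treatment D is more common when X = 1.
d = rng.binomial(1, np.where(x == 1, 0.7, 0.2))

effect = y1 - y0
ate = effect.mean()                         # E[Y(1) - Y(0)]
att = effect[d == 1].mean()                 # E[Y(1) - Y(0) | D = 1]
cate = {value: effect[x == value].mean() for value in (0, 1)}  # per X = x

print(f"ATE:  {ate:.2f}")
print(f"ATT:  {att:.2f}")
print(f"CATE: {cate}")
```

Because units with `x == 1` benefit more and are treated more often, the ATT in this setup is larger than the ATE, while the CATE recovers the effect within each subgroup.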


# **Methods/Techniques in Causal Inference**

# **1) Randomized Controlled Trials (RCT)**

### Randomized Controlled Trials (RCTs)

Randomized Controlled Trials (RCTs) are a type of experimental study design used to assess the effectiveness of interventions or treatments. In an RCT, participants are randomly assigned to either the treatment group or the control group.

* Let $Y_{i1}$ be the potential outcome for subject $i$ if they receive the treatment, and $Y_{i0}$ if they do not.

* The causal effect for individual $i$ is $\tau_i = Y_{i1} - Y_{i0}$.
* The average treatment effect (ATE) is $ATE = E[Y_1 - Y_0]$, where $E$ denotes the expected value.

![RCT.png](attachment:42bfce6f-e441-4f29-ac83-154b1f5a4094.png)

**Explanation**

- A randomized controlled trial (RCT) is a type of scientific experiment, often used in medicine, to test the effectiveness of new treatments. The goal is to **reduce biases** that could affect the results.

- In an RCT, participants are randomly assigned to one of two groups. One group, called the experimental group, receives the new treatment being tested. The other group, known as the comparison or control group, receives a different treatment, usually the standard one.

- Both groups are then closely observed to see if there are any differences in the outcome. By randomly assigning participants to groups, the researchers aim to minimize biases that could affect the results, ensuring a fair comparison between the treatments.
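The effect of random assignment can be illustrated with a short simulation. This is a sketch under assumed values: `true_ate`, the `baseline` trait, and the outcome equation are invented, and the point is only that a coin-flip assignment balances background traits so that a simple difference in means estimates the ATE.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000
true_ate = 2.0  # assumed treatment effect for this simulation

# A background trait that affects the outcome (for example, baseline health).
baseline = rng.normal(0, 1, n)

# Random assignment: a fair coin flip, independent of everything else.
treated = rng.binomial(1, 0.5, n)
outcome = 10 + 3 * baseline + true_ate * treated + rng.normal(0, 1, n)

# Randomization balances the baseline trait across groups on average.
baseline_gap = baseline[treated == 1].mean() - baseline[treated == 0].mean()

# The difference in group means estimates the ATE.
ate_hat = outcome[treated == 1].mean() - outcome[treated == 0].mean()

print(f"baseline gap between groups: {baseline_gap:.3f}")
print(f"estimated ATE: {ate_hat:.2f} (assumed true value {true_ate})")
```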


# **2) Instrumental Variable Analysis**

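An instrumental variable is a variable that influences the treatment but has no direct effect on the outcome. For example, in the context of smoking, cigarette prices can serve as an instrumental variable: they influence smoking behavior without directly affecting the likelihood of developing cancer.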





![instrumental variable1.png](attachment:a07ef70b-e8f9-4dac-a0c5-26e13b981a1c.png)

The use of instrumental variables (IVs) in research is aimed at tackling the issue of endogeneity. **Endogeneity** arises when a variable of interest is correlated with the error term in a regression model.
This correlation can lead to biased and inconsistent estimates of the treatment effect, making it challenging to establish a causal relationship between the treatment and the outcome variable.


In a standard linear regression model $Y = \beta_0 + \beta_1 X + \epsilon$, endogeneity of $X$ arises if $Cov(X, \epsilon) \neq 0$, leading to biased and inconsistent estimates of $\beta_1$.



Instrumental variables are useful because they are uncorrelated with the error term and affect the outcome only through their effect on the treatment, not directly. By employing such IVs, researchers can isolate the causal effect of the treatment on the outcome variable.

The identification process involving instrumental variables typically involves two stages.
In the **first stage**, the treatment is regressed on the instrumental variable. This provides an estimate of how the IV influences the treatment variable.

In the **second stage**, the outcome is regressed on the treatment values predicted by the first stage. Because these predicted values vary only with the instrument, the resulting coefficient estimates the effect of the treatment on the outcome without the endogenous part of the treatment's variation.

It's important to note that instrumental variable analysis is effective when certain conditions are met, such as the IV being exogenous to the model, only moderate to small confounding effects existing, and a sufficiently large sample size of observational data being available.

![instvariable 2.png](attachment:c9b68e39-0d47-4d0b-a394-10a2d8ce2139.png)
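The two stages can be sketched with plain least squares. This is a minimal illustration on simulated data, not a production estimator: the unobserved confounder `u`, the instrument `z`, the coefficient `true_beta`, and the helper `ols` are all assumptions, and the standard errors from a manual second stage would not be valid without further correction.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
true_beta = 1.5  # assumed causal effect of X on Y

u = rng.normal(0, 1, n)                 # unobserved confounder
z = rng.normal(0, 1, n)                 # instrument: affects X, not Y directly
x = 0.8 * z + u + rng.normal(0, 1, n)   # treatment depends on z and u
y = true_beta * x + 2.0 * u + rng.normal(0, 1, n)


def ols(features, target):
    """Least-squares coefficients, with the intercept in position 0."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


# Naive regression of Y on X is biased because Cov(X, error) != 0.
beta_ols = ols(x, y)[1]

# Stage 1: regress the treatment on the instrument.
a0, a1 = ols(z, x)
x_hat = a0 + a1 * z

# Stage 2: regress the outcome on the predicted treatment.
beta_2sls = ols(x_hat, y)[1]

print(f"naive OLS estimate: {beta_ols:.2f}")
print(f"two-stage estimate: {beta_2sls:.2f} (assumed true value {true_beta})")
```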

# **3) Regression Discontinuity**

The regression discontinuity approach analyzes data with thresholds or cut-offs. It applies when treatment is given only if an assignment variable (often called the running variable) crosses a certain threshold. For instance, it can assess the impact of scholarships awarded to students whose test scores exceed a cut-off, or the effect of a medicine prescribed only to patients whose blood sugar or cholesterol is above a certain level. In simpler terms, it examines how outcomes change for units just below and just above the threshold.


The figure below illustrates an example where student GPA is plotted on the y-axis and normalized test scores on the x-axis. A threshold determines eligibility for a scholarship, and the impact of this scholarship is shown as a shift in GPA on the y-axis. This shift represents the regression discontinuity, which is measured by comparing trends on either side of the threshold using a regression formula.


![regression discontinuity.png](attachment:a33dce09-0d42-48e6-80ba-51df2c5be016.png)
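The scholarship example can be sketched as two local linear fits, one on each side of the cut-off. The numbers below are simulated and purely illustrative: `true_jump`, the `bandwidth`, and the helper `fit_line` are assumptions, and choosing a bandwidth in practice needs more care than shown here.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4000
cutoff = 0.0
true_jump = 0.4  # assumed scholarship effect on GPA

score = rng.uniform(-1, 1, n)                   # normalized test score
scholarship = (score >= cutoff).astype(float)   # awarded at or above cut-off
gpa = 3.0 + 0.5 * score + true_jump * scholarship + rng.normal(0, 0.2, n)


def fit_line(x, y):
    """Return (slope, intercept) of a least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


# Use only observations within a bandwidth of the cut-off.
bandwidth = 0.5
left = (score < cutoff) & (score >= cutoff - bandwidth)
right = (score >= cutoff) & (score <= cutoff + bandwidth)

# Center the score at the cut-off so each intercept is the fitted GPA there.
_, left_at_cutoff = fit_line(score[left] - cutoff, gpa[left])
_, right_at_cutoff = fit_line(score[right] - cutoff, gpa[right])

# The discontinuity is the gap between the two fitted values at the cut-off.
jump_hat = right_at_cutoff - left_at_cutoff

print(f"estimated jump at cut-off: {jump_hat:.2f} (assumed true value {true_jump})")
```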

# **4) Difference-in-Differences**

### The Difference-in-Differences (DID) Approach

The Difference-in-Differences (DID) approach, highlighted in a study by Lechner et al. (2011), is a regression-based method for outcomes observed over time. It estimates the treatment effect by comparing the change in outcomes over time in the treatment group with the change over the same period in the control group.

#### How it Works:

1. **Comparison of Differences**: The DID approach compares the changes in outcomes over time for both the treatment and control groups.

2. **Estimation of Treatment Effect**: By observing how outcomes change differently between the treatment and control groups after the introduction of a treatment or intervention, the treatment effect can be estimated.

3. **Regression Analysis**: DID is typically applied using regression analysis, enabling the capture of significant differences between the treatment and control groups at a specific time.

The DID approach is widely used in various fields to assess the impact of interventions or policies over time, providing a robust method for estimating treatment effects while accounting for temporal trends and other potential confounding factors.


![did.png](attachment:682d8c15-6977-4226-bd8d-f33620c41ef8.png)

To make things concrete, consider a simple use case with a binary treatment ($D = 0$ or $D = 1$) and a real-valued outcome $Y$ measured over time, as shown in the figure above. The regression uses two indicator variables, $D$ for the treatment group and $T$ for the time period, as given by:


![DID2.png](attachment:844be223-6a4c-4d6f-8815-566a029dfe3b.png)
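The regression with $D$ and $T$ can be sketched as follows. This example assumes the common two-group, two-period specification $Y = b_0 + b_1 D + b_2 T + b_3 (D \cdot T) + \text{error}$, where $b_3$ is the DID estimate; that form is an assumption here and may differ in detail from the equation in the figure. All data are simulated, and `cell_mean` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 8000
true_effect = 3.0  # assumed treatment effect

d = rng.binomial(1, 0.5, n)   # D: 1 = treatment group, 0 = control group
t = rng.binomial(1, 0.5, n)   # T: 1 = after the intervention, 0 = before

# Groups start at different levels and share a common time trend.
y = 20 + 4 * d + 2 * t + true_effect * d * t + rng.normal(0, 1, n)

# Regression form: Y = b0 + b1*D + b2*T + b3*(D*T) + error
design = np.column_stack([np.ones(n), d, t, d * t])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
did_regression = coef[3]


def cell_mean(group, period):
    """Mean outcome for one group in one period."""
    return y[(d == group) & (t == period)].mean()


# The same estimate from the four group-by-period means.
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(f"DID from regression: {did_regression:.2f}")
print(f"DID from cell means: {did_means:.2f}")
```

The interaction coefficient and the difference of differences in cell means coincide, which is why the method carries its name.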

# **5) Propensity Score Matching**

Propensity score matching is a statistical technique used to reduce bias in observational studies by creating comparable groups of treated and untreated subjects. It involves estimating the propensity scores, which represent the probability of receiving the treatment based on observed covariates. These propensity scores are then used to match treated subjects with similar untreated subjects.

The process of propensity score matching typically involves the following steps:
1. **Estimating the propensity scores:** Using logistic regression or other methods, the probability of receiving the treatment is estimated based on observed covariates.
2. **Matching:** Treated subjects are matched with untreated subjects who have similar propensity scores. Various matching algorithms, such as nearest neighbor matching or caliper matching, can be used for this purpose.
3. **Assessing balance:** After matching, the balance of covariates between the treated and untreated groups is assessed to ensure that they are comparable.
4. **Estimating treatment effect:** Finally, the treatment effect is estimated by comparing outcomes between the matched treated and untreated groups.

Propensity score matching allows researchers to account for confounding variables and mimic the random assignment of treatments in experimental studies, thereby improving the validity of causal inferences drawn from observational data.


1. **Propensity Score Estimation:**

* For each participant $i$, let $T_i$ be a binary variable indicating treatment status (1 if treated, 0 if not), and $X_i$ be a vector of observed covariates.

* The propensity score $p(X_i)$ is estimated as:
$p(X_i) = P(T_i = 1 \mid X_i)$

* This is typically done using logistic regression:

  $\log\left(\frac{p(X_i)}{1 - p(X_i)}\right) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_k X_{ik}$

* Where $\beta_0, \beta_1, \ldots, \beta_k$ are coefficients to be estimated.
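The estimation and matching steps can be sketched end to end on simulated data. This is an illustrative example, not the notebook's own pipeline: the covariates, `true_effect`, and the helper `fit_logistic` (a small Newton's-method fit written here to keep the example self-contained) are assumptions, and matching is plain 1-nearest-neighbour with replacement, without a caliper or a balance check.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 3000
true_effect = 2.0  # assumed constant treatment effect

x = rng.normal(0, 1, (n, 2))                         # observed covariates X_i
true_logit = 0.8 * x[:, 0] - 0.5 * x[:, 1]
t = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))   # treatment status T_i
y = 1 + 2 * x[:, 0] + x[:, 1] + true_effect * t + rng.normal(0, 1, n)


def fit_logistic(features, labels, steps=25):
    """Logistic regression by Newton's method; returns (beta, design)."""
    design = np.column_stack([np.ones(len(labels)), features])
    beta = np.zeros(design.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-design @ beta))
        gradient = design.T @ (labels - p)
        hessian = (design * (p * (1 - p))[:, None]).T @ design
        beta += np.linalg.solve(hessian, gradient)
    return beta, design


# Step 1: estimate propensity scores p(X_i) = P(T_i = 1 | X_i).
beta, design = fit_logistic(x, t)
propensity = 1 / (1 + np.exp(-design @ beta))

# Step 2: match each treated unit to the control with the closest score.
treated_idx = np.flatnonzero(t == 1)
control_idx = np.flatnonzero(t == 0)
distance = np.abs(propensity[treated_idx, None] - propensity[None, control_idx])
matches = control_idx[distance.argmin(axis=1)]

# Step 3: compare outcomes within matched pairs (an ATT estimate).
naive = y[t == 1].mean() - y[t == 0].mean()
att_matched = (y[treated_idx] - y[matches]).mean()

print(f"naive difference in means: {naive:.2f}")
print(f"matched ATT estimate:      {att_matched:.2f} (assumed true value {true_effect})")
```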



# **6) LiNGAM**


LiNGAM (Linear Non-Gaussian Acyclic Model) is a statistical framework used in causal discovery and structural equation modeling. It is designed for datasets in which variables are linearly related and the error terms follow non-Gaussian distributions. The model assumes acyclicity: the causal relationships among variables form a directed acyclic graph (DAG), so no variable influences itself, directly or through a chain of other variables. Together with the non-Gaussianity of the errors, this makes it possible to identify causal directions from observational data alone, without experimental interventions. The basic model assumes that there are no unobserved confounders, so results should be read with care when hidden common causes or substantial measurement error may be present. LiNGAM has been applied in domains such as economics, biology, and neuroscience.

- **Model Framework**: LiNGAM serves as a robust statistical framework for causal discovery and structural equation modeling.
- **Model Representation:**
Consider variables $X_1, X_2, … ,X_n$. The LiNGAM model is represented as:
 $X_i = \sum_{j \neq i} b_{ij} X_j + e_i$

* Where $b_{ij}$ are the coefficients representing the strength of the causal effect of $X_j$ on $X_i$, and $e_i$ are the non-Gaussian independent error terms.
  
- **Linear Relationships**: It assumes linear relationships between variables but accommodates non-Gaussian distributions.
  
- **Acyclicity Constraint**: LiNGAM relies on the acyclicity constraint, ensuring that the causal relationships form a directed acyclic graph (DAG).
  
- **Causal Discovery**: LiNGAM can identify causal directions from observational data without requiring experimental manipulation.
  
- **Robustness**: The basic model assumes independent error terms and no unobserved confounders. Its results can be unreliable when these assumptions are violated, so they should be checked before applying it in real-world scenarios.
  
- **Applications**: LiNGAM finds applications across various fields, including economics, biology, and neuroscience, aiding in causal inference and decision-making processes.
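The model representation can be illustrated by simulating data from it. The sketch below only generates data from a linear, acyclic model with non-Gaussian (uniform) errors and recovers one coefficient by ordinary least squares. It is not a LiNGAM estimation algorithm, and the coefficient values and the three-variable ordering are hypothetical.

```python
import random

rng = random.Random(0)
n = 5000


def noise():
    # Uniform on [-1, 1]: zero-mean and non-Gaussian
    return rng.uniform(-1.0, 1.0)


# Hypothetical coefficients b_ij for the causal order X1 -> X2 -> X3 (plus X1 -> X3)
b21, b31, b32 = 0.8, 0.5, -0.6

# Each variable is a linear function of its causes plus an independent error term
x1 = [noise() for _ in range(n)]
x2 = [b21 * a + noise() for a in x1]
x3 = [b31 * a + b32 * b + noise() for a, b in zip(x1, x2)]


def ols_slope(x, y):
    """Slope of the least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


b21_hat = ols_slope(x1, x2)  # should be close to b21
```

Because the variables are generated in causal order, no variable ever depends on one of its own descendants, which is exactly the acyclicity constraint.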

**DAG Example** :

Let’s consider a simple example involving obesity and cardiovascular disease (CVD), with diet and physical activity as contributing factors.

In this scenario, our variables (nodes) are:

D: Diet

A: Physical Activity

O: Obesity

C: Cardiovascular Disease

We might have the following causal beliefs:

- Diet affects obesity.
- Physical activity affects obesity.
- Obesity affects the risk of cardiovascular disease.

A DAG representing these beliefs would have arrows from Diet to Obesity, from Physical Activity to Obesity, and from Obesity to Cardiovascular Disease.

**Diet (D)**: An unhealthy diet can lead to increased body weight, hence there is an arrow from "Diet" to "Obesity."

**Physical Activity (A):** Regular physical activity can help prevent obesity, so there is an arrow from "Physical Activity" to "Obesity."

**Obesity (O):** Being obese is a well-known risk factor for cardiovascular disease, represented by the arrow from "Obesity" to "Cardiovascular Disease."

The DAG does not imply that diet or physical activity directly causes cardiovascular disease without the mediating effect of obesity, although in reality, they might have direct effects as well. But for this simple example, we’re focusing on the mediating role of obesity.

Using DAGs, researchers can decide which variables to control for when estimating causal effects. In the DAG above, diet and physical activity influence CVD only through obesity, so they do not confound the effect of obesity on CVD. If the DAG also contained direct arrows from Diet and Physical Activity to Cardiovascular Disease, they would be common causes of both obesity and CVD, and estimating the effect of obesity on CVD would require controlling for them. This process is often referred to as adjusting for confounders in a statistical analysis.
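The example DAG can be written down as a mapping from each node to its direct causes. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to check acyclicity and to produce one valid causal ordering; the helper names `is_acyclic` and `causal_order` are illustrative.

```python
from graphlib import CycleError, TopologicalSorter

# Each node maps to the set of its direct causes (parents)
parents = {
    "D": set(),        # Diet
    "A": set(),        # Physical Activity
    "O": {"D", "A"},   # Obesity is caused by Diet and Physical Activity
    "C": {"O"},        # Cardiovascular Disease is caused by Obesity
}


def causal_order(graph):
    """Return one ordering in which every cause precedes its effects."""
    return list(TopologicalSorter(graph).static_order())


def is_acyclic(graph):
    """True if the graph contains no directed cycle."""
    try:
        causal_order(graph)
        return True
    except CycleError:
        return False


order = causal_order(parents)

# Adding an arrow from C back to D would create a cycle D -> O -> C -> D
cyclic = dict(parents)
cyclic["D"] = {"C"}
```

Here `is_acyclic(parents)` holds, while `is_acyclic(cyclic)` does not, so the second graph would not be a valid DAG.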







# **Causal Inference in World Happiness Analysis**

## **Introduction/Problem Statement**

In the realm of social sciences and economics, understanding the determinants of happiness across countries is a complex but fascinating endeavor. Traditional statistical methods often fall short in untangling the web of causality behind what factors most significantly influence happiness levels around the globe. This challenge paves the way for the application of causal inference techniques, which aim to go beyond correlation to determine what factors genuinely cause changes in happiness scores.

Causal inference offers a structured framework to examine how various factors, such as economic prosperity, social support, health, freedom to make life choices, generosity, and perceptions of corruption, influence the overall happiness or "Ladder score" of nations. By applying methods from causal inference, researchers can simulate interventions (e.g., increasing GDP per capita, improving health outcomes) and estimate their direct effects on happiness, providing valuable insights for policy-making.

## **Dataset Overview**

The dataset in focus comes from the World Happiness Report 2021, which compiles data from various countries. Here is an overview of the dataset attributes:

- **Country name:** The name of the country.
- **Regional indicator:** The region to which the country belongs.
- **Ladder score (Y):** The target variable, representing the country's happiness score.
- **Standard error of ladder score:** The standard error of the happiness score.
- **Upperwhisker:** The upper bound of the confidence interval for the happiness score.
- **Lowerwhisker:** The lower bound of the confidence interval for the happiness score.
- **Logged GDP per capita (S):** A measure of the country's economic activity and prosperity.
- **Social support (J):** Indicates the extent of support individuals receive from their social network.
- **Healthy life expectancy (X):** Reflects the average number of years a person can expect to live in good health.
- **Freedom to make life choices (W):** Measures the freedom individuals have in making life choices.
- **Generosity:** Reflects the average perception of generosity among the country's citizens.
- **Perceptions of corruption:** The average perception of corruption within the country.
- **Ladder score in Dystopia:** The hypothetical worst-case scenario happiness score.
- **Dystopia + residual:** Reflects the dystopian aspects of the countries plus the unexplained residual from the happiness score.

This dataset provides a comprehensive set of variables for exploring the determinants of happiness through causal inference techniques.


In [66]:
!pip install causalinference



This package is designed to conduct **causal inference** analysis in a Python environment. The CausalModel class is a fundamental component for setting up, estimating, and interpreting causal models based on observational data.

In [67]:
import pandas as pd
from causalinference import CausalModel

In [68]:
#Loading the dataset
df = pd.read_csv('https://raw.githubusercontent.com/Selvintuscano31/INFO7390/main/world-happiness-report-2021.csv')
df.drop(list(df.filter(regex='Explained')), axis=1, inplace=True)
df.head()

Unnamed: 0,Country name,Regional indicator,Ladder score,Standard error of ladder score,upperwhisker,lowerwhisker,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Ladder score in Dystopia,Dystopia + residual
0,Finland,Western Europe,7.842,0.032,7.904,7.78,10.775,0.954,72.0,0.949,-0.098,0.186,2.43,3.253
1,Denmark,Western Europe,7.62,0.035,7.687,7.552,10.933,0.954,72.7,0.946,0.03,0.179,2.43,2.868
2,Switzerland,Western Europe,7.571,0.036,7.643,7.5,11.117,0.942,74.4,0.919,0.025,0.292,2.43,2.839
3,Iceland,Western Europe,7.554,0.059,7.67,7.438,10.878,0.983,73.0,0.955,0.16,0.673,2.43,2.967
4,Netherlands,Western Europe,7.464,0.027,7.518,7.41,10.932,0.942,72.4,0.913,0.175,0.338,2.43,2.798


In [69]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 149 entries, 0 to 148
Data columns (total 14 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Country name                    149 non-null    object 
 1   Regional indicator              149 non-null    object 
 2   Ladder score                    149 non-null    float64
 3   Standard error of ladder score  149 non-null    float64
 4   upperwhisker                    149 non-null    float64
 5   lowerwhisker                    149 non-null    float64
 6   Logged GDP per capita           149 non-null    float64
 7   Social support                  149 non-null    float64
 8   Healthy life expectancy         149 non-null    float64
 9   Freedom to make life choices    149 non-null    float64
 10  Generosity                      149 non-null    float64
 11  Perceptions of corruption       149 non-null    float64
 12  Ladder score in Dystopia        149 

In [70]:
df.describe()

Unnamed: 0,Ladder score,Standard error of ladder score,upperwhisker,lowerwhisker,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Ladder score in Dystopia,Dystopia + residual
count,149.0,149.0,149.0,149.0,149.0,149.0,149.0,149.0,149.0,149.0,149.0,149.0
mean,5.532839,0.058752,5.648007,5.417631,9.432208,0.814745,64.992799,0.791597,-0.015134,0.72745,2.43,2.430329
std,1.073924,0.022001,1.05433,1.094879,1.158601,0.114889,6.762043,0.113332,0.150657,0.179226,0.0,0.537645
min,2.523,0.026,2.596,2.449,6.635,0.463,48.478,0.382,-0.288,0.082,2.43,0.648
25%,4.852,0.043,4.991,4.706,8.541,0.75,59.802,0.718,-0.126,0.667,2.43,2.138
50%,5.534,0.054,5.625,5.413,9.569,0.832,66.603,0.804,-0.036,0.781,2.43,2.509
75%,6.255,0.07,6.344,6.128,10.421,0.905,69.6,0.877,0.079,0.845,2.43,2.794
max,7.842,0.173,7.904,7.78,11.647,0.983,76.953,0.97,0.542,0.939,2.43,3.482


# Data Preprocessing for Causal Inference Analysis

## Selection of Relevant Variables

Causal inference focuses on understanding the impact of certain variables (treatments) on an outcome. For the World Happiness Report analysis, we have chosen pivotal variables to explore how different factors contribute to a country's happiness score. The selected variables include:

- **Ladder score (Y):** The outcome variable representing the overall happiness score of a country.
- **Logged GDP per capita (S):** A treatment or explanatory variable indicating the level of economic prosperity.
- **Social support (J):** Reflects the extent to which individuals have support from their social network.
- **Healthy life expectancy (X):** Shows the average years a person is expected to live in good health.
- **Freedom to make life choices (W):** Measures the degree of freedom individuals have in making life choices.

## Renaming Variables for Clarity and Convenience

The renaming of variables to concise labels (Y, S, J, X, W) simplifies the model's notation, enhancing the readability and facilitating the communication of the causal relationships being studied. This step is crucial for making the code more accessible and understandable, especially when dealing with complex models and analyses.

## Importance of Preprocessing

- **Simplification:** By renaming and selecting relevant variables, the analysis is simplified, focusing only on the critical factors affecting happiness.
- **Model Readiness:** Preprocessing prepares the dataset for causal modeling, aligning the data structure with the requirements of causal inference techniques.
- **Clarity in Communication:** Utilizing standardized variable names (e.g., Y for outcome, X for covariates) aids in clearly communicating the components of the model and its findings to both technical and non-technical audiences.

This preprocessing stage lays the groundwork for conducting a rigorous causal analysis, aiming to explore the influence of changes in economic prosperity, social support, health, and freedom on happiness levels across countries.


In [71]:
df = df[["Ladder score",
         "Logged GDP per capita",
         "Social support",
         "Healthy life expectancy",
         "Freedom to make life choices"]
].copy()
df.rename(columns={
    "Ladder score": "Y",
    "Logged GDP per capita": "S",
    "Social support": "J",
    "Healthy life expectancy": "X",
    "Freedom to make life choices": "W"
}, inplace=True)
df.head(5)

Unnamed: 0,Y,S,J,X,W
0,7.842,10.775,0.954,72.0,0.949
1,7.62,10.933,0.954,72.7,0.946
2,7.571,11.117,0.942,74.4,0.919
3,7.554,10.878,0.983,73.0,0.955
4,7.464,10.932,0.942,72.4,0.913


In [72]:

import numpy as np
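# Randomly assign each country to control (0) or treatment (1) with P(D=1) = 0.6.
# No seed is set, so the assignment and all downstream numbers change on every run.
df["D"] = np.random.choice(a=[0,1], size=df["Y"].count(), p=[0.4, 0.6])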
print(df["D"].to_numpy())

[0 0 1 1 0 0 1 1 1 0 1 1 1 0 0 0 1 1 1 1 0 1 0 0 0 0 1 0 0 1 0 1 1 1 1 1 1
 1 1 1 0 1 0 1 1 1 1 0 1 0 0 1 0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0 0 1 1 0 0
 0 1 1 0 1 0 1 0 0 1 1 0 0 1 1 0 0 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 0
 1 1 1 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 0 1 0 1 1
 1]


# Introducing the Treatment Variable

## Creating a Treatment Indicator

In causal inference analysis, it's crucial to distinguish between the treated and control groups. This distinction allows us to simulate interventions and assess their effects on the outcome variable. To this end, we introduce a binary treatment indicator variable `D` to our dataset.

## Code Explanation

The code snippet above introduces the treatment variable and accomplishes the following:

### Random Assignment:
- The treatment indicator `D` is assigned randomly to each record in the dataset. This approach mimics a randomized experimental design, which is considered the gold standard for causal inference.

### Treatment Probability:
- The parameter `p=[0.4, 0.6]` specifies the probabilities of being in the control group (`0`) and the treatment group (`1`), respectively. Each record therefore has a 60% chance of being assigned to the treatment group; in the run shown, 89 of the 149 countries were treated. This reflects a scenario where the majority of observations are subjected to the intervention.

### Purpose:
- The `D` variable enables us to model the impact of hypothetical interventions on the happiness score (`Y`). For instance, `D=1` might represent countries that have undergone a policy change aimed at increasing GDP per capita, while `D=0` represents countries without such an intervention.

## Importance of Treatment Simulation

### Control vs. Treatment Groups:
- Distinguishing between these groups allows us to directly estimate the causal effect of the intervention on happiness scores.

### Simulating Experiments:
- In real-world scenarios, it's often impractical or unethical to conduct controlled experiments on a large scale. Simulating treatments in observational data allows us to circumvent these challenges and still derive insights into causal relationships.

### Estimating Causal Effects:
- This setup enables the application of various causal inference methodologies, such as Difference-in-Differences, Instrumental Variables, or Propensity Score Matching, to estimate the effect of the intervention.

The introduction of `D` is a critical step in preparing our dataset for causal analysis. It sets the stage for in-depth exploration of how interventions could potentially impact global happiness.
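Because the assignment is random, its realized values differ between runs unless the random number generator is seeded. The sketch below shows one way to make the assignment reproducible with NumPy's `Generator` API. The seed value 42 is an arbitrary assumption, and the resulting assignment will not match the one printed in this notebook.

```python
import numpy as np

n_countries = 149  # number of rows in the dataset

# Seeded generator: the same seed always yields the same assignment
rng = np.random.default_rng(seed=42)
D = rng.choice([0, 1], size=n_countries, p=[0.4, 0.6])

# Re-creating the generator with the same seed reproduces the assignment exactly
D_again = np.random.default_rng(seed=42).choice([0, 1], size=n_countries, p=[0.4, 0.6])

treated_share = D.mean()  # realized share of treated units, near 0.6 but not exactly 0.6
```

Fixing the seed keeps group sizes and all downstream estimates stable across re-runs, which makes the written interpretation easier to keep in sync with the printed output.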



## Rationale Behind the Renaming

The renaming of variables to `Xn` follows the `causalinference` package's convention for naming covariates. This standardization facilitates the application of causal analysis techniques and ensures compatibility with the package's functionalities. The rationale behind the renaming of each variable is as follows:

- **X0 (Logged GDP per capita):** Renamed from `S` to `X0`, this covariate represents the economic prosperity of a country, serving as the first covariate in our analysis.
- **X1 (Social support):** Previously known as `J`, it has been renamed to `X1`. This covariate indicates the level of social support that individuals in a country feel they have, marking it as the second covariate.
- **X2 (Healthy life expectancy):** Changed from `X` to `X2`, this covariate reflects the average number of years a person is expected to live in good health, positioning it as the third covariate in the model.
- **X3 (Freedom to make life choices):** Modified from `W` to `X3`, this variable measures the degree of freedom individuals in a country have in making life choices, making it the fourth covariate in our analysis.

This renaming ensures consistency with the `causalinference` package's standards, facilitating a smoother analysis process and better integration of the dataset with the package's causal analysis tools.


In [73]:
df.rename(columns={
    "S": "X0",
    "J": "X1",
    "X": "X2",
    "W": "X3"
}, inplace=True)

#### Simulating the Intervention on the "Freedom of Choice" Index

Next, we embark on simulating the intervention, focusing on increasing the index related to the "freedom of choice" (formerly known as `W`, and currently designated as `X3` in line with our updated naming conventions). This increase will be applied exclusively to those data points that are marked as having received the treatment.

To proceed without introducing errors, it is imperative to first closely examine the distribution and characteristics of the 'Freedom to make life choices' variable, ensuring that our modifications are both accurate and meaningful.


In [74]:
df["X3"].describe()

count    149.000000
mean       0.791597
std        0.113332
min        0.382000
25%        0.718000
50%        0.804000
75%        0.877000
max        0.970000
Name: X3, dtype: float64

In [75]:
initial_df = df.copy()
initial_df.head(5)


Unnamed: 0,Y,X0,X1,X2,X3,D
0,7.842,10.775,0.954,72.0,0.949,0
1,7.62,10.933,0.954,72.7,0.946,0
2,7.571,11.117,0.942,74.4,0.919,1
3,7.554,10.878,0.983,73.0,0.955,1
4,7.464,10.932,0.942,72.4,0.913,0


**Calculating the Standard Deviation Adjustment**

- **Standard Deviation of X3:** The first line calculates the standard deviation of the "Freedom to make life choices" variable (X3). This statistical measure indicates how much variation or dispersion exists from the average.
- **Adjustment Factor:** The standard deviation is divided by 10 to determine a modest adjustment factor. This division aims to ensure the intervention is not too drastic, maintaining the realism of the simulated intervention.

**Identifying the Treatment Group**

- **Treatment Mask:** A boolean mask (`mask`) is created to identify records in the treatment group (`D == 1`). This mask filters out the subset of the data that will receive the intervention, isolating the impact of the treatment for analysis.

**Simulating the Intervention**

- **Copying the Dataset:** A copy of the original dataset (`initial_df`) is made to `df_intervention1`. This step preserves the original data, allowing for a clear comparison before and after the intervention.
- **Applying the Intervention:** The intervention is simulated by increasing the "Freedom to make life choices" score (X3) for the treatment group by the previously calculated adjustment factor (1/10th of the standard deviation of X3). This increment is applied only to those records identified by the mask, ensuring that only the treated records are modified.

---



In [76]:
std_dev_X3 = initial_df["X3"].std()
print(std_dev_X3 / 10)

mask = initial_df["D"] == 1

df_intervention1 = initial_df.copy()
# apply intervention
df_intervention1.loc[mask, 'X3'] = df_intervention1.loc[mask, "X3"].apply(lambda x: x + (std_dev_X3 / 10))
df_intervention1.head()

0.011333178506605257


Unnamed: 0,Y,X0,X1,X2,X3,D
0,7.842,10.775,0.954,72.0,0.949,0
1,7.62,10.933,0.954,72.7,0.946,0
2,7.571,11.117,0.942,74.4,0.930333,1
3,7.554,10.878,0.983,73.0,0.966333,1
4,7.464,10.932,0.942,72.4,0.913,0


# Creating a Simplified Causal Model

In this section, we're constructing a simplified causal model to analyze the causal relationship between our simulated intervention and the happiness score, taking into account a selection of relevant covariates. The approach and rationale for this model construction are detailed below.

## Model Construction with `CausalModel`

### Objective
The primary aim is to estimate the causal effect of the simulated intervention on happiness scores (Y), examining how alterations in variables, including the "Freedom to make life choices" (X3), affect overall happiness.

### Components of the Causal Model
- **Outcome Variable (Y):** Represents the happiness score of each country, serving as the target variable we aim to understand or predict.
- **Treatment Indicator (D):** A binary variable that denotes whether a country has undergone the simulated intervention, pivotal for identifying the causal effect.
- **Covariates (X):** Variables included to control for confounding factors, ensuring a more accurate causal effect estimation. The selected covariates are Logged GDP per capita (X0), Social support (X1), Healthy life expectancy (X2), and Freedom to make life choices (X3).

## Simplification Rationale

- **Focus on Key Covariates:** This model prioritizes the most relevant factors believed to influence happiness, simplifying the analysis without undermining the causal inference validity.
- **Enhanced Interpretability:** A model with fewer covariates is more straightforward to interpret, facilitating clearer communication and understanding of the causal relationships examined.
- **Computational Efficiency:** Limiting the number of covariates improves the analysis's computational efficiency, optimizing the process while retaining valuable insights.

By employing this simplified causal modeling approach, we can concisely analyze the impact of the intervention on happiness scores. This method strikes an effective balance between model complexity and interpretability, ensuring the analysis remains both manageable and insightful.


In [77]:
# we simplify the model considering only some covariates
causal_intervention1 = CausalModel(
    Y=df_intervention1["Y"].to_numpy(),
    D=df_intervention1["D"].to_numpy(),
    X=df_intervention1[["X0", "X1", "X2", "X3"]].to_numpy()
)

In [78]:
print(causal_intervention1.summary_stats)


Summary Statistics

                        Controls (N_c=60)          Treated (N_t=89)             
       Variable         Mean         S.d.         Mean         S.d.     Raw-diff
--------------------------------------------------------------------------------
              Y        5.529        1.097        5.536        1.064        0.007

                        Controls (N_c=60)          Treated (N_t=89)             
       Variable         Mean         S.d.         Mean         S.d.     Nor-diff
--------------------------------------------------------------------------------
             X0        9.461        1.126        9.413        1.186       -0.041
             X1        0.815        0.115        0.815        0.115       -0.002
             X2       64.622        6.533       65.243        6.938        0.092
             X3        0.790        0.110        0.804        0.116        0.117



# Analyzing Summary Statistics for Causal Inference

The `causal_intervention1.summary_stats` output provides a detailed comparison between control and treated groups across key variables, including the outcome variable (Y) and covariates (X0, X1, X2, X3). This section explores these statistics and their implications in depth.

## Deciphering the Summary Output

### Group Composition
- The summary clearly differentiates between the control group (`N_c=60`) and the treated group (`N_t=89`), showcasing the distribution achieved through random assignment. This distinction is essential for the comparative analysis.

### Variable Analysis
- **Y (Happiness Score):** The mean values of the outcome variable are nearly identical across groups (5.529 for controls vs. 5.536 for treated), so the raw difference of 0.007 gives no early indication of an effect on happiness.
- **Covariates (X0, X1, X2, X3):** Essential predictors such as Logged GDP per capita, Social support, Healthy life expectancy, and Freedom to make life choices are analyzed. Their means and standard deviations reveal their distribution across the groups.

### Difference Metrics
- **Raw-diff:** The straightforward difference in means between the treated and control groups, reported here for the happiness score (Y). It provides a first glimpse into the intervention's effect.
- **Nor-diff:** The normalized difference assesses covariate balance by scaling the difference in means by the variability in both groups, which is crucial for spotting potential biases from imbalanced covariates between the groups.
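A common definition of the normalized difference divides the difference in group means by the square root of the average of the two group variances. The sketch below assumes that definition; applied to the rounded values printed for `X2`, it reproduces the reported figure to three decimals. The function name is illustrative.

```python
import math


def normalized_difference(mean_t, sd_t, mean_c, sd_c):
    """(mean_t - mean_c) scaled by the root of the average group variance."""
    return (mean_t - mean_c) / math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)


# Rounded values for X2 (Healthy life expectancy) from the summary table
nor_diff_x2 = normalized_difference(mean_t=65.243, sd_t=6.938,
                                    mean_c=64.622, sd_c=6.533)
print(round(nor_diff_x2, 3))
```

Unlike a t-statistic, this measure does not grow with the sample size, which is why it is suited to judging balance rather than statistical significance.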

## Theoretical Insights

### Preliminary Observations
- The raw difference between groups (`Raw-diff`) is a first, unadjusted indicator of the intervention's effect. Here it is 0.007, far smaller than the standard deviation of the happiness score (about 1.1), so it does not point to a noticeable benefit.

### Evaluating Covariate Balance
- The normalized differences (`Nor-diff`) play a critical role in evaluating the balance across covariates, ensuring that observed effects are due to the intervention rather than pre-existing differences. Ideally, `Nor-diff` values should be below 0.5 to minimize biases.

### Importance of Variability Analysis
- Analyzing the standard deviations within each group is vital for understanding the data's variability and ensuring the mean's representativeness, highlighting the necessity for balanced covariates.

### Moving Forward
- The `Nor-diff` of 0.117 for `X3` shows that the intervention shifted the treated group only slightly. One option is to "increase the dosage" of the intervention so that its effect is easier to detect. A larger shift may in turn require adjustments, such as "trimming," to mitigate covariate imbalances.

### Additional Insight on Raw-diff
- The `Raw-diff` in the upper right is the difference in mean outcomes between the treated and non-treated groups, `E[Y | D=1] - E[Y | D=0]`. Under random assignment it estimates the average treatment effect `E[Y(1) - Y(0)]`. Although this metric offers a direct comparison, it is less informative when the difference is smaller than the standard deviation, underscoring the importance of further analysis to understand the intervention's impact fully.


# **Intervention 2**

**Determine Adjustment Value:** The standard deviation of X3 is then divided by 3. This calculation determines the magnitude of the adjustment to be applied to the X3 values of the treated samples. By using a third of the standard deviation, the intervention aims to simulate a moderate increase in the freedom of choice index, reflecting a significant but plausible policy change or societal shift.

**Identify Treatment Group**: A mask is created to identify rows in the dataset where the treatment indicator (D) is equal to 1. This mask is used to select only those samples that are part of the treatment group for the upcoming intervention.

In [79]:
std_dev_X3 = initial_df["X3"].std()
print(std_dev_X3 / 3)

mask = initial_df["D"] == 1

df_intervention2 = initial_df.copy()
# apply intervention: shift X3 by one third of its standard deviation for treated rows only
df_intervention2.loc[mask, "X3"] += std_dev_X3 / 3
df_intervention2.head()

0.03777726168868419


Unnamed: 0,Y,X0,X1,X2,X3,D
0,7.842,10.775,0.954,72.0,0.949,0
1,7.62,10.933,0.954,72.7,0.946,0
2,7.571,11.117,0.942,74.4,0.956777,1
3,7.554,10.878,0.983,73.0,0.992777,1
4,7.464,10.932,0.942,72.4,0.913,0


In [80]:
causal_intervention2 = CausalModel(
    Y=df_intervention2["Y"].to_numpy(),
    D=df_intervention2["D"].to_numpy(),
    X=df_intervention2[["X0", "X1", "X2", "X3"]].to_numpy()
)

print(causal_intervention2.summary_stats)


Summary Statistics

                        Controls (N_c=60)          Treated (N_t=89)             
       Variable         Mean         S.d.         Mean         S.d.     Raw-diff
--------------------------------------------------------------------------------
              Y        5.529        1.097        5.536        1.064        0.007

                        Controls (N_c=60)          Treated (N_t=89)             
       Variable         Mean         S.d.         Mean         S.d.     Nor-diff
--------------------------------------------------------------------------------
             X0        9.461        1.126        9.413        1.186       -0.041
             X1        0.815        0.115        0.815        0.115       -0.002
             X2       64.622        6.533       65.243        6.938        0.092
             X3        0.790        0.110        0.830        0.116        0.351



## **Analysis of Summary Statistics**
**Control vs. Treated Groups:** The dataset is divided into control (N_c=60) and treated (N_t=89) groups. This division is essential for comparing outcomes between those subjected to the intervention and those not.

**Outcome Variable (Y):**

The mean happiness score is 5.529 in the control group and 5.536 in the treated group, a raw difference of 0.007. These values match Intervention 1 because the intervention modifies only X3 and leaves Y unchanged.

**Covariates (X0, X1, X2, X3):**

Each covariate's mean and standard deviation are reported for both groups, offering insights into their distribution. Notably, X3, which represents the freedom of choice, shows a normalized difference (Nor-diff) of 0.351 after the intervention, up from 0.117 under Intervention 1.

**Significance of Nor-diff in Covariate X3**

**Indicator of Covariate Imbalance:** The Nor-diff value for X3 after Intervention 2 (0.351) is considerably closer to the 0.5 threshold than before, reflecting the larger shift in the "Freedom to make life choices" due to the intervention. This shift highlights the intervention's impact but also raises concerns about potential covariate imbalance.

**Covariate Imbalance Concerns:**

A Nor-diff value nearing 0.5 can indicate that the treated and control groups are becoming increasingly dissimilar concerning the intervened variable (X3 in this case). While the intervention aims to simulate a change in this variable, excessive disparity between groups can complicate the estimation of the treatment effect, as it becomes harder to distinguish the intervention's impact from the effects of other confounding variables.

**Implications for Treatment Effect Estimation:**

The observed increase in Nor-diff for X3 suggests that while the intervention effectively altered the freedom of choice among the treated group, it may have introduced a level of imbalance that could challenge the next step of treatment effect estimation. In causal inference, accurately estimating the treatment effect requires that the treated and control groups be comparable across covariates. An imbalance, especially one as significant as observed in X3, may necessitate adjustments or alternative analytical strategies (such as matching or weighting) to ensure a valid and reliable estimation of the treatment effect.
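The link between the size of the shift and the resulting imbalance can be checked numerically. Adding a constant to every treated value moves the treated mean but leaves both standard deviations unchanged, so the normalized difference rises by the shift divided by the pooled standard deviation. The sketch below uses synthetic data; the group sizes, distribution parameters, and seed are assumptions chosen to resemble the X3 figures.

```python
import math
import random
import statistics


def nor_diff(treated, control):
    """Difference in means scaled by the root of the average group variance."""
    mt, mc = statistics.mean(treated), statistics.mean(control)
    st, sc = statistics.stdev(treated), statistics.stdev(control)
    return (mt - mc) / math.sqrt((st ** 2 + sc ** 2) / 2)


# Synthetic stand-ins for X3 in the two groups (hypothetical values)
rng = random.Random(1)
control = [rng.gauss(0.79, 0.11) for _ in range(60)]
treated = [rng.gauss(0.79, 0.11) for _ in range(89)]

pooled_sd = math.sqrt(
    (statistics.stdev(treated) ** 2 + statistics.stdev(control) ** 2) / 2
)

shift = 0.0378  # roughly one third of the X3 standard deviation
before = nor_diff(treated, control)
after = nor_diff([x + shift for x in treated], control)

# The increase equals shift / pooled_sd, up to floating-point error
increase = after - before
```

With a pooled standard deviation near 0.11, a shift of about 0.038 raises the normalized difference by roughly a third of a unit, which is in line with the jump observed for X3.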

## **Matching**

In [81]:
causal_intervention1.reset()
causal_intervention1.est_via_matching(bias_adj=True)
print(causal_intervention1.estimates)


Treatment Effect Estimates: Matching

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE     -0.028      0.112     -0.254      0.800     -0.249      0.192
           ATC      0.018      0.126      0.143      0.886     -0.228      0.264
           ATT     -0.060      0.122     -0.491      0.623     -0.298      0.179





## Average Treatment Effect (ATE)

- **Estimate (Est.)**: The ATE estimate of -0.028 suggests that, on average, the treatment has a slight negative effect on the outcome across the entire population. However, the magnitude of this effect is small.
- **Standard Error (S.e.)**: The standard error of 0.112 indicates the precision of the ATE estimate. A larger standard error would suggest less precision.
- **z-value**: The z-value of -0.254 is the estimate divided by its standard error, indicating how many standard errors the estimate lies from the null hypothesis of no effect (zero). A z-value closer to zero suggests less evidence against the null hypothesis.
- **P-value (P>|z|)**: With a p-value of 0.800, there is insufficient evidence to reject the null hypothesis of no treatment effect at conventional significance levels. This means that the observed treatment effect could likely be due to random chance.
- **95% Confidence Interval**: The confidence interval ranging from -0.249 to 0.192 further indicates that the true ATE could be negative, zero, or even slightly positive, underscoring the uncertainty around the estimate.
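The z-value, p-value, and confidence interval all follow from the estimate and its standard error under a normal approximation. The sketch below recomputes them from the rounded ATE figures, so small discrepancies in the third decimal are expected. The function name and the critical value of 1.96 are illustrative choices.

```python
import math


def wald_summary(est, se, z_crit=1.96):
    """Return (z, two-sided p-value, confidence interval) for an estimate."""
    z = est / se
    # Standard normal CDF evaluated at |z|, via the error function
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    p_value = 2.0 * (1.0 - phi)
    ci = (est - z_crit * se, est + z_crit * se)
    return z, p_value, ci


z, p_value, ci = wald_summary(est=-0.028, se=0.112)
print(round(z, 3), round(p_value, 3), tuple(round(b, 3) for b in ci))
```

Because the interval comfortably contains zero, the p-value is large, and the two summaries lead to the same conclusion.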

## Average Treatment Effect on the Controls (ATC)

- The ATC estimate suggests that if the control group had received the treatment, the average effect on the outcome would have been slightly positive (0.018), but again, the effect is very small and not statistically significant (p-value of 0.886).

## Average Treatment Effect on the Treated (ATT)

- The ATT estimate of -0.060 suggests that for those who actually received the treatment, the average effect on the outcome was slightly negative. However, similar to ATE, this effect is not statistically significant (p-value of 0.623), indicating uncertainty around this estimate.



In [82]:
causal_intervention2.reset()
causal_intervention2.est_via_matching(bias_adj=True)
print(causal_intervention2.estimates)


Treatment Effect Estimates: Matching

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE     -0.056      0.118     -0.474      0.635     -0.286      0.175
           ATC     -0.063      0.138     -0.458      0.647     -0.333      0.207
           ATT     -0.051      0.129     -0.396      0.692     -0.303      0.201



## Treatment Effect Estimates: Matching

The analysis provides estimates for the Average Treatment Effect (ATE), Average Treatment Effect on the Controls (ATC), and Average Treatment Effect on the Treated (ATT) derived from matching techniques. Below is a detailed interpretation of these estimates:

### Average Treatment Effect (ATE)
- **Estimate (Est.)**: The ATE estimate of -0.056 suggests a slight negative effect of the treatment on the outcome across the entire population, though the effect size is small.
- **Standard Error (S.e.)**: A standard error of 0.118 indicates the level of precision of the ATE estimate; a larger value implies less precision.
- **z-value**: The z-value of -0.474 shows how many standard errors the estimate lies from the null value of zero. A value closer to zero indicates weaker evidence against the null hypothesis.
- **P-value (P>|z|)**: With a p-value of 0.635, there's insufficient evidence to reject the null hypothesis of no treatment effect at conventional significance levels, implying that the observed effect might be due to chance.
- **95% Confidence Interval**: Ranges from -0.286 to 0.175, indicating that the true ATE might be slightly negative, zero, or slightly positive, reflecting uncertainty around the estimate.

### Average Treatment Effect on the Controls (ATC)
- The ATC estimate of -0.063 indicates a slight negative effect of the treatment on the control group, were they to receive the treatment. This effect is minimal and not statistically significant (p-value of 0.647), suggesting uncertainty.

### Average Treatment Effect on the Treated (ATT)
- The ATT estimate of -0.051 points to a minor negative impact on those who actually received the treatment. Like ATE and ATC, this effect is not statistically significant (p-value of 0.692), highlighting ambiguity regarding this estimate.

#### Conclusion
These results from the matching analysis suggest no significant evidence of a treatment effect, positive or negative, on the outcome. High p-values across ATE, ATC, and ATT indicate the observed treatment effects are not statistically distinguishable from zero at conventional levels of significance. This analysis underscores the complexities of estimating causal effects in observational data and the importance of adopting a multifaceted analytical approach for a robust assessment of treatment impacts.
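
To make the idea of matching concrete, here is a minimal sketch of one-nearest-neighbour matching with replacement on standardized covariates. It is an illustration on synthetic data, not a reimplementation of `est_via_matching`: the function name `nn_matching_ate` and the simulated data are assumptions, and the bias adjustment requested by `bias_adj=True` in the cell above is omitted.

```python
import numpy as np

def nn_matching_ate(Y, D, X):
    """One-nearest-neighbour matching (with replacement) on standardized covariates."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    treated = np.flatnonzero(D == 1)
    control = np.flatnonzero(D == 0)
    effects = np.empty(len(Y))
    for i in range(len(Y)):
        pool = control if D[i] == 1 else treated          # search the opposite group
        dist = np.sum((Xs[pool] - Xs[i]) ** 2, axis=1)    # squared Euclidean distance
        match = pool[np.argmin(dist)]
        # unit-level effect = treated outcome minus control outcome
        effects[i] = Y[i] - Y[match] if D[i] == 1 else Y[match] - Y[i]
    return effects.mean(), effects[treated].mean(), effects[control].mean()

# Synthetic data (assumption for illustration): the true effect is 2.0
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
D = (rng.random(n) < 0.5).astype(int)
Y = 2.0 * D + X @ np.array([1.0, -1.0]) + rng.normal(scale=0.1, size=n)

ate, att, atc = nn_matching_ate(Y, D, X)
print(f"ATE={ate:.3f}  ATT={att:.3f}  ATC={atc:.3f}")
```

Averaging the unit-level effects over everyone gives the ATE, over treated units the ATT, and over control units the ATC, which is why the three rows of the table can differ.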



# **Importance of Treatment Effect Estimates**

Treatment Effect Estimates are essential in causal inference for several key reasons:

**Quantifying Intervention Impacts**

They quantify the actual impact of interventions on outcomes, crucial for understanding treatment effectiveness.

**Informing Decisions**

These estimates provide evidence-based insights to inform policy decisions, program implementations, and individual choices.

**Adjusting for Confounders**

In observational studies, they help adjust for confounding variables, offering a clearer view of the treatment's true impact.

**Strengthening Causal Claims**

By mimicking randomized controlled trial conditions through statistical methods, these estimates support stronger causal inferences beyond mere correlations.

**Facilitating Comparisons**

They allow for the comparison of different interventions on the same outcome to determine the most effective treatment.

**Evaluating Heterogeneity**

Treatment Effect Estimates reveal how different subgroups respond to the treatment, highlighting variations in effectiveness.

In essence, Treatment Effect Estimates are foundational in translating data into actionable insights, guiding evidence-based decision-making and contributing to a deeper understanding of causal relationships.


# **Propensity Score**

## **What is OLS ??**

# OLS in Causal Inference

OLS (Ordinary Least Squares) regression is a statistical method commonly used in causal inference to estimate the causal relationship between one or more independent variables and a dependent variable. In causal inference, OLS regression helps researchers assess the extent to which changes in the independent variables are associated with changes in the dependent variable, while controlling for potential confounding factors.

Here's how OLS regression is typically used in causal inference:

1. **Identifying causal relationships**: OLS regression can help researchers determine whether there is a causal relationship between an independent variable (often referred to as the treatment, intervention, or predictor) and a dependent variable (the outcome or response variable). By estimating the coefficients of the independent variables, researchers can assess the magnitude and direction of the causal effect.

2. **Controlling for confounding variables**: In observational studies, there may be confounding variables that influence both the independent and dependent variables, leading to spurious correlations. OLS regression allows researchers to include additional control variables in the model to account for these confounding factors, thus isolating the causal effect of interest.

3. **Assessing statistical significance**: OLS regression provides estimates of the coefficients of the independent variables along with their standard errors, which can be used to calculate t-statistics and p-values to determine whether the estimated effects are statistically significant. This helps researchers evaluate the strength of evidence for the causal relationship.

4. **Testing model assumptions**: OLS regression relies on certain assumptions, such as linearity, independence of errors, homoscedasticity (constant variance of errors), and normality of residuals. Researchers typically assess whether these assumptions hold in their data, as violations can affect the validity of causal inferences drawn from the regression results.

5. **Addressing endogeneity**: Endogeneity occurs when there is a correlation between the independent variable and the error term in the regression model, leading to biased estimates. While OLS regression alone may not address endogeneity, researchers can employ techniques such as instrumental variables (IV) or fixed effects models to mitigate this issue and strengthen causal inference.

Overall, while OLS regression is a widely used method for causal inference, researchers should be mindful of its assumptions and limitations, and consider additional methods or sensitivity analyses to strengthen causal claims based on observational data.
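
As a rough illustration of points 1 to 3, the sketch below regresses an outcome on a treatment indicator plus covariates and reads off the treatment coefficient with a classical standard error. The data are simulated and the specification is a plain additive one, so this is an assumption-laden example and not necessarily the specification used inside `est_via_ols()`.

```python
import numpy as np

# Synthetic data (assumption for illustration): the true treatment effect is 2.0
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 4))                    # stand-ins for X0..X3
D = (rng.random(n) < 0.5).astype(float)
Y = 1.0 + 2.0 * D + X @ np.array([0.5, -0.3, 0.2, 0.1]) + rng.normal(scale=0.1, size=n)

# Design matrix: intercept, treatment indicator, covariates
Z = np.column_stack([np.ones(n), D, X])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)   # pass rcond explicitly

# Classical (homoscedastic) standard errors
resid = Y - Z @ coef
sigma2 = resid @ resid / (n - Z.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Z.T @ Z)))

print(f"treatment coefficient = {coef[1]:.3f} (s.e. {se[1]:.3f})")
```

Because treatment is assigned at random in this simulation, the coefficient on `D` recovers the true effect; with real observational data it only does so if the included covariates capture all confounding.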


In [83]:
causal_start = CausalModel(
    Y=initial_df["Y"].to_numpy(),
    D=initial_df["D"].to_numpy(),
    X=initial_df[["X0", "X1", "X2", "X3"]].to_numpy()
)

causal_start.est_via_ols()
causal_start.est_via_matching(bias_adj=True)
print(causal_start.estimates)


Treatment Effect Estimates: OLS

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE     -0.008      0.087     -0.088      0.930     -0.178      0.163
           ATC      0.002      0.092      0.019      0.985     -0.178      0.181
           ATT     -0.014      0.086     -0.163      0.870     -0.182      0.154

Treatment Effect Estimates: Matching

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE     -0.014      0.114     -0.126      0.900     -0.238      0.209
           ATC      0.014      0.125      0.116      0.908     -0.230      0.259
           ATT     -0.034      0.127     -0.266      0.790     -0.283      0.215





# Estimating Propensity Scores in Two Approaches

We refine our analysis and reduce bias by estimating the propensity scores of our dataset in two distinct ways:

## Applying Only Linear Logistic Regressions:

- **Objective:** We start by estimating propensity scores using a linear logistic regression model. This model incorporates all available covariates without any interaction or polynomial terms.
- **Purpose:** The goal is to gauge the fundamental propensity of each unit (e.g., individual, country) to receive the treatment. This estimation is based on linear associations between the covariates and the probability of treatment assignment.

## Incorporating Quadratic Terms for Specific Covariates:

- **Enhancement:** We then augment our propensity score model with two second-order terms: the square of "Freedom to make life choices" (`X3*X3`) and its interaction with "Social support" (`X1*X3`).
- **Advantages:** This advanced approach enables the capture of non-linear effects and the interactions between these covariates. It provides a deeper insight into how complex relationships among covariates affect the likelihood of receiving the treatment.

By estimating propensity scores in both ways, we can check whether a more flexible model of treatment assignment changes the picture. This matters for the subsequent causal inference analyses, such as matching or weighting, which rely on the propensity score to control for confounding and improve the reliability of our treatment effect estimates.
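
The two specifications differ only in the columns of the design matrix. The sketch below builds both designs and fits a logistic regression by Newton-Raphson on simulated data. The helper names `design` and `fit_logit` and the data are assumptions for illustration; the `qua` argument is modelled on the `(i, j)` index pairs passed to `est_propensity` in this notebook, but the code does not claim to mirror the library's internals.

```python
import numpy as np

def design(X, qua=()):
    """Intercept + all linear terms + products X[:, i] * X[:, j] for each (i, j) in qua."""
    cols = [np.ones(len(X)), *X.T]
    cols += [X[:, i] * X[:, j] for i, j in qua]
    return np.column_stack(cols)

def fit_logit(Z, d, n_iter=25):
    """Logistic regression by Newton-Raphson; returns the coefficient vector."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (d - p)                        # score vector
        hess = Z.T @ (Z * (p * (1 - p))[:, None])   # observed information
        beta += np.linalg.solve(hess, grad)
    return beta

# Synthetic data (assumption for illustration)
rng = np.random.default_rng(2)
n = 300
X = rng.normal(size=(n, 4))
true_logit = 0.5 * X[:, 0] - 0.3 * X[:, 3]
d = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

Z_lin = design(X)                          # linear terms only
Z_qua = design(X, qua=[(3, 3), (1, 3)])    # adds X3*X3 and X1*X3
beta_lin = fit_logit(Z_lin, d)
beta_qua = fit_logit(Z_qua, d)
pscore = 1 / (1 + np.exp(-Z_qua @ beta_qua))
print(Z_lin.shape, Z_qua.shape, pscore.min(), pscore.max())
```

The fitted `pscore` values are the estimated propensity scores: one probability of treatment per unit.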


In [84]:
print("Applying only linear logistic regressions")
causal_start.est_propensity(lin="all")
print(causal_start.propensity)

print("Applying linear logistic regressions and quadratic for X3 and X3*X1")
causal_start.est_propensity(lin="all", qua=[(3,3), (1,3)])
print(causal_start.propensity)

Applying only linear logistic regressions

Estimated Parameters of Propensity Score

                    Coef.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
     Intercept     -0.496      1.695     -0.293      0.770     -3.819      2.826
            X0     -0.415      0.325     -1.275      0.202     -1.053      0.223
            X1      0.226      2.482      0.091      0.927     -4.638      5.090
            X2      0.073      0.050      1.449      0.147     -0.026      0.171
            X3     -0.132      1.740     -0.076      0.940     -3.543      3.279

Applying linear logistic regressions and quadratic for X3 and X3*X1

Estimated Parameters of Propensity Score

                    Coef.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
     Intercept      4.654      7.533      0.618      0.537    -10.110   

# Estimating Propensity Scores: Interpretation of Results

The output from linear logistic regression, both with and without quadratic terms, provides insights into estimating propensity scores. Below is a summary of the key findings:

## Linear Logistic Regression Results

- **Coefficients:** Indicate the change in log odds of receiving treatment with a one-unit increase in each covariate.
- **Statistical Significance:** All covariates have p-values above 0.05 (the smallest is 0.147, for X2), suggesting that none of them significantly predicts the likelihood of receiving treatment.
- **Confidence Intervals:** Wide intervals for certain covariates indicate substantial uncertainty regarding their effects.

## Including Quadratic Terms

- **Model Complexity:** Incorporating quadratic terms for "Freedom to make life choices" and its interaction with "Social support" seeks to capture complex, non-linear relationships and interactions.
- **Statistical Significance:** Similar to the linear model, these additional terms fail to significantly improve predictive accuracy, as shown by high p-values and broad confidence intervals.

## Overall Takeaway

The attempts to model propensity scores highlight challenges in accurately predicting treatment assignment based on the examined covariates and their interactions. This suggests a potential need to explore additional factors, model specifications, or interactions to better understand and model the propensity score, crucial for subsequent causal inference analyses in observational studies.


Let's see how these propensity score estimates change after the interventions, starting with intervention 1:

In [85]:
print("Applying linear logistic regressions")
causal_intervention1.est_propensity(lin="all")
print(causal_intervention1.propensity)

Applying linear logistic regressions

Estimated Parameters of Propensity Score

                    Coef.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
     Intercept     -0.835      1.697     -0.492      0.623     -4.160      2.491
            X0     -0.408      0.326     -1.250      0.211     -1.048      0.232
            X1     -0.175      2.488     -0.070      0.944     -5.051      4.701
            X2      0.067      0.050      1.339      0.181     -0.031      0.166
            X3      1.069      1.731      0.618      0.537     -2.323      4.462



### Estimated Parameters of Propensity Score

**Intercept**: The coefficient (-0.835) for the intercept represents the log-odds of receiving the treatment when all covariates are at zero. The relatively large standard error (1.697) and the non-significant p-value (0.623) indicate that the intercept is not statistically different from zero. In other words, for a unit with all covariates equal to zero, the model cannot distinguish the probability of treatment from 50%.

**X0 (e.g., GDP)**: The coefficient (-0.408) indicates the change in the log-odds of receiving the treatment for a one-unit increase in X0, holding other variables constant. A negative coefficient suggests that higher values of X0 are associated with a lower probability of receiving the treatment. However, the p-value (0.211) indicates that this relationship is not statistically significant at typical significance levels (e.g., 0.05).

**X1 (e.g., Social Support)**: Similarly, the coefficient (-0.175) for X1 and its very high standard error (2.488) and non-significant p-value (0.944) suggest no statistically significant relationship between X1 and the likelihood of receiving the treatment.

**X2 (e.g., Healthy Life Expectancy)**: For X2, the coefficient (0.067) implies a slight increase in the log-odds of receiving the treatment for a one-unit increase in X2. Despite a relatively smaller p-value (0.181) compared to X0 and X1, this relationship remains not statistically significant.

**X3 (e.g., Freedom to Make Life Choices)**: The coefficient (1.069) suggests that higher values of X3 are associated with a higher probability of receiving the treatment. However, the wide confidence interval (-2.323, 4.462) and non-significant p-value (0.537) imply substantial uncertainty and lack of statistical significance in this relationship.

### Interpretation

- The **coefficients** represent the estimated effect of each covariate on the likelihood of receiving the treatment. Positive values indicate an increase in the probability of treatment with higher covariate values, while negative values suggest a decrease.

- The **standard error (S.e.)** measures the precision of the coefficient estimates. Larger standard errors indicate more uncertainty about the coefficient value.

- The **z-value** is the coefficient divided by its standard error, i.e. how many standard errors the coefficient lies from zero. In this context it assesses the strength of evidence against the null hypothesis that the coefficient equals zero.

- The **P>|z|** (p-value) assesses the probability of observing a coefficient as extreme as, or more extreme than, the one observed if the null hypothesis were true. High p-values suggest that the observed relationship could easily occur under the null hypothesis of no effect, indicating a lack of statistical significance.

- The **95% confidence interval** provides a range of values within which the true coefficient value is likely to fall with 95% confidence. Wide intervals indicate more uncertainty about the true value of the coefficient.

In summary, this output provides no statistically significant evidence that any of the included covariates influences the likelihood of receiving the treatment. This could indicate either that the selected covariates are not strong predictors of treatment assignment in this dataset, or that the model needs further refinement or additional covariates to better capture the nuances of treatment assignment.
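
Log-odds coefficients are easier to read once converted to odds ratios or probabilities. The sketch below does this for the intercept and the X0 coefficient from the table above, using only the standard library. The helper name `sigmoid` is ours.

```python
import math

def sigmoid(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

intercept, coef_x0 = -0.835, -0.408   # values copied from the table above

baseline_prob = sigmoid(intercept)            # P(treated) when all covariates are 0
odds_ratio_x0 = math.exp(coef_x0)             # multiplicative change in odds per unit of X0
prob_after = sigmoid(intercept + coef_x0)     # P(treated) after raising X0 from 0 to 1

print(f"baseline P(treated) = {baseline_prob:.3f}")
print(f"odds ratio for X0   = {odds_ratio_x0:.3f}")
print(f"P(treated | X0 = 1) = {prob_after:.3f}")
```

An odds ratio below 1 corresponds to a negative coefficient: each additional unit of X0 multiplies the odds of treatment by that factor, other covariates held fixed. Given the wide confidence intervals in the table, these point conversions are equally uncertain.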



In [86]:
print("Applying linear logistic regressions and quadratic for X3 and X3*X1")
causal_intervention2.est_propensity(lin="all", qua=[(3,3), (1,3)])
print(causal_intervention2.propensity)

Applying linear logistic regressions and quadratic for X3 and X3*X1

Estimated Parameters of Propensity Score

                    Coef.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
     Intercept      8.683      8.111      1.071      0.284     -7.215     24.582
            X0     -0.407      0.340     -1.197      0.231     -1.072      0.259
            X1     -2.748     11.918     -0.231      0.818    -26.107     20.612
            X2      0.048      0.052      0.923      0.356     -0.053      0.149
            X3    -20.834     16.725     -1.246      0.213    -53.616     11.947
         X3*X3     14.753     13.569      1.087      0.277    -11.842     41.348
         X1*X3      2.557     15.266      0.167      0.867    -27.364     32.477



This output presents the estimated parameters of the propensity score model fitted by logistic regression with all linear terms plus a squared term for X3 (`X3*X3`) and an interaction term (`X1*X3`). Here's the interpretation:

- **Intercept**: The intercept (8.683) represents the log odds of receiving the treatment when all covariates are equal to zero, including the quadratic terms. It indicates the baseline propensity for receiving the treatment when no other factors are considered.

- **Coefficients (Coef.)**: Each coefficient is the change in the log odds of receiving the treatment for a one-unit increase in the corresponding term. For example, a one-unit increase in X0 corresponds to a decrease of 0.407 in the log odds of receiving the treatment, holding other covariates constant.

- **Standard Error (S.e.)**: The standard error measures the variability or uncertainty in the estimated coefficients. Larger standard errors suggest greater uncertainty in the estimates.

- **z-value**: The z-value indicates the number of standard errors the coefficient is away from zero. It is calculated by dividing the coefficient by its standard error.

- **P-value (P>|z|)**: The p-value indicates the probability of observing a coefficient as extreme as the one estimated, assuming that the null hypothesis (no effect) is true. A p-value above a certain threshold (e.g., 0.05) suggests that the coefficient is not statistically significant.

- **Confidence Interval**: The confidence interval provides a range of values within which we are reasonably confident that the true population parameter lies. It indicates the range of possible values for the coefficient with a certain level of confidence (typically 95%).

Overall, this output suggests that adding the squared term `X3*X3` and the interaction `X1*X3` did not lead to a statistically significant improvement in the propensity score model. All coefficients have p-values above 0.2, indicating that none is significantly different from zero.


# **Stratification**

Stratification is a technique used in statistical analysis to divide a dataset into homogeneous subgroups or strata based on certain characteristics or variables. In causal inference, stratification is often employed to control for potential confounders or sources of bias by ensuring that treated and control groups within each stratum are comparable.

In the cell below, the `stratify()` method is applied after estimating the propensity scores (`est_propensity()`) using the `causal_start` model. This process divides the dataset into strata based on the estimated propensity scores, creating subgroups of units with similar propensities for receiving the treatment.

The subsequent `print(causal_start.strata)` command outputs information about the created strata, providing insights into how the dataset has been partitioned into distinct groups based on propensity score ranges. Understanding the composition and characteristics of these strata is essential for conducting further analyses, such as matching or weighting, within each stratum to reduce bias and improve the reliability of causal inference estimates.


In [87]:
causal_start.reset()
causal_start.est_propensity()

causal_start.stratify()
print(causal_start.strata)


Stratification Summary

              Propensity Score         Sample Size     Ave. Propensity   Outcome
   Stratum      Min.      Max.  Controls   Treated  Controls   Treated  Raw-diff
--------------------------------------------------------------------------------
         1     0.385     0.551        14        16     0.511     0.500    -0.068
         2     0.552     0.589        15        15     0.572     0.574    -0.055
         3     0.590     0.620        12        17     0.603     0.605     0.217
         4     0.621     0.650        13        17     0.632     0.633     0.188
         5     0.650     0.718         6        24     0.681     0.669     0.031



**Stratification Summary**

- **Stratum:** Each stratum represents a subset of the dataset characterized by a specific range of propensity scores.

- **Propensity Score (Min. and Max.):** The minimum and maximum propensity score values within each stratum define the boundaries of the propensity score range covered by that stratum.

- **Sample Size (Controls and Treated):** These values indicate the number of control and treated units within each stratum. They reflect the distribution of samples across different propensity score ranges and provide insight into the balance of treatment groups within each stratum.

- **Ave. Propensity (Controls and Treated):** The average propensity score for control and treated units within each stratum illustrates the average likelihood of receiving treatment within that stratum. Comparing the average propensity scores between control and treated units helps assess the effectiveness of the stratification process in achieving balance.

- **Outcome Raw-diff:** This value is the raw (unadjusted) difference in mean outcome between treated and control units within each stratum. Because units in the same stratum have similar propensity scores, it serves as a rough within-stratum estimate of the treatment effect. Here the differences range from -0.068 to 0.217 and change sign across strata.

Analyzing these values collectively provides a comprehensive understanding of how the stratification process has impacted the distribution of propensity scores, sample sizes, average propensities, and outcome balance across different strata. This information is crucial for ensuring the reliability and validity of subsequent causal inference analyses based on stratified data.
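
The columns of the summary can be reproduced with a few lines of NumPy. The sketch below sorts units by propensity score, cuts them into equal-sized bins, and reports the same quantities per bin. The function name `stratify_by_propensity`, the choice of five equal-sized bins, and the simulated data are assumptions for illustration; they are not taken from the `causalinference` source.

```python
import numpy as np

def stratify_by_propensity(pscore, D, Y, n_strata=5):
    """Split units into equal-sized propensity-score bins and summarize each bin."""
    order = np.argsort(pscore)
    rows = []
    for idx in np.array_split(order, n_strata):
        t, c = idx[D[idx] == 1], idx[D[idx] == 0]
        rows.append({
            "min": pscore[idx].min(), "max": pscore[idx].max(),
            "controls": len(c), "treated": len(t),
            "raw_diff": Y[t].mean() - Y[c].mean(),   # treated minus control
        })
    return rows

# Synthetic data (assumption for illustration)
rng = np.random.default_rng(3)
n = 300
pscore = rng.uniform(0.3, 0.7, size=n)
D = (rng.random(n) < pscore).astype(int)
Y = 1.0 * D + rng.normal(scale=0.5, size=n)

rows = stratify_by_propensity(pscore, D, Y)
for r in rows:
    print(r)
```

Each printed row corresponds to one line of a stratification summary: score range, group sizes, and the raw outcome difference.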


## **Blocking**

Blocking is a technique utilized in causal inference to enhance the estimation of treatment effects by consolidating strata estimates. It involves dividing the dataset into homogeneous subsets based on propensity scores or other relevant covariates. This process ensures that treatment and control groups within each stratum are comparable, thereby improving the precision and accuracy of treatment effect estimates. The blocking estimator computes the Average Treatment Effect (ATE) for each stratum and then aggregates these estimates to derive an overall treatment effect estimate. This method accounts for variations in treatment effects across different subgroups of the population, resulting in more reliable causal inference analyses.
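
The aggregation step can be illustrated with the numbers from the stratification summary above: weight each stratum's raw outcome difference by its share of the sample and add them up. This is a simplified sketch. A library blocking estimator may additionally adjust for covariates within each block, so its output need not agree exactly with this raw-difference version.

```python
# (controls, treated, raw_diff) per stratum, copied from the stratification summary
strata = [
    (14, 16, -0.068),
    (15, 15, -0.055),
    (12, 17, 0.217),
    (13, 17, 0.188),
    (6, 24, 0.031),
]

n_total = sum(c + t for c, t, _ in strata)

# Weight each within-stratum difference by the stratum's share of all units
ate_blocked = sum((c + t) / n_total * diff for c, t, diff in strata)

print(f"N = {n_total}, size-weighted ATE = {ate_blocked:.3f}")
```

Weighting by the number of treated units only, or control units only, would give the corresponding ATT and ATC versions of the same calculation.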


In [88]:
causal_start.reset()
causal_start.est_propensity_s()
causal_start.est_via_ols()
print(causal_start.propensity)
print(causal_start.estimates)


Estimated Parameters of Propensity Score

                    Coef.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
     Intercept      0.394      0.167      2.360      0.018      0.067      0.722


Treatment Effect Estimates: OLS

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE     -0.008      0.087     -0.088      0.930     -0.178      0.163
           ATC      0.002      0.092      0.019      0.985     -0.178      0.181
           ATT     -0.014      0.086     -0.163      0.870     -0.182      0.154



### Estimated Parameters of Propensity Score:

The table reports the estimated coefficients of the propensity score model, with corresponding standard errors, z-scores, p-values, and 95% confidence intervals. Here only the intercept appears: the covariate selection performed by `est_propensity_s()` did not retain X0, X1, X2, or X3. The intercept (0.394, p = 0.018) is therefore the estimated log-odds of receiving treatment for every unit, which corresponds to a constant propensity of about 0.60.
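
For a logistic model with no covariates, the fitted intercept equals the log-odds of the overall share of treated units. As a quick check, and assuming the stratification summary earlier in this notebook covers the same units, the group counts from that table can be plugged in directly:

```python
import math

treated = 16 + 15 + 17 + 17 + 24    # treated counts from the stratification summary
controls = 14 + 15 + 12 + 13 + 6    # control counts from the same table

share_treated = treated / (treated + controls)
log_odds = math.log(treated / controls)

print(f"share treated = {share_treated:.3f}, log-odds = {log_odds:.3f}")
```

The resulting log-odds can be compared with the intercept reported in the propensity table above.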

### Treatment Effect Estimates:

ATE (Average Treatment Effect), ATC (Average Treatment Effect on the Controls), and ATT (Average Treatment Effect on the Treated) estimates are provided, along with their standard errors, z-scores, p-values, and 95% confidence intervals. These estimates indicate the average effect of the treatment across the entire population, the effect on the control group, and the effect on the treated group, respectively.

In this case, none of the estimates is statistically significant: the p-values for ATE, ATC, and ATT are 0.930, 0.985, and 0.870, respectively, and all three 95% confidence intervals comfortably include zero.

Overall, these OLS results provide no evidence of a treatment effect. The point estimates are close to zero (between -0.014 and 0.002) and small relative to their standard errors (about 0.09).


In [89]:
causal_start.reset()
causal_start.est_propensity()
causal_start.stratify()
causal_start.est_via_blocking()
print(causal_start.propensity)
print(causal_start.estimates)


Estimated Parameters of Propensity Score

                    Coef.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
     Intercept     -0.496      1.695     -0.293      0.770     -3.819      2.826
            X0     -0.415      0.325     -1.275      0.202     -1.053      0.223
            X1      0.226      2.482      0.091      0.927     -4.638      5.090
            X2      0.073      0.050      1.449      0.147     -0.026      0.171
            X3     -0.132      1.740     -0.076      0.940     -3.543      3.279


Treatment Effect Estimates: Blocking

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      0.061      0.096      0.642      0.521     -0.126      0.249
           ATC      0.084      0.100      0.840      0.401     -0.113      0.281
           ATT      0.046 

## **Causal Inference Analysis Output**

### Estimated Parameters of Propensity Score

This section presents the results of a logistic regression model used to estimate the propensity scores, which are the probabilities of receiving the treatment given the covariates \(X_0\), \(X_1\), \(X_2\), \(X_3\).

- **Intercept and Coefficients (Coef.)**: These values represent the log odds of receiving the treatment. The intercept is the log odds of receiving the treatment when all covariates are 0. Each coefficient corresponds to a covariate, indicating how changes in that covariate affect the log odds of receiving the treatment.
- **Standard Error (S.e.)**: This measures the variability of each coefficient estimate. Smaller values suggest more precision in the estimates.
- **z and P>\|z\|**: The z-value is a statistic that measures the ratio of the coefficient to its standard error, indicating how many standard deviations the coefficient is away from 0. The P-value tests the null hypothesis that the coefficient is equal to zero (no effect). A small P-value (\< 0.05) suggests that the effect of the covariate on the treatment assignment is statistically significant.
- **\[95% Conf. Int.\]**: The 95% confidence interval provides a range within which the true coefficient is expected to fall, with 95% confidence.

### Treatment Effect Estimates: Blocking

This section provides estimates of the average treatment effect (ATE), average treatment effect on the treated (ATT), and average treatment effect on the control (ATC), based on blocking.

- **ATE (Average Treatment Effect)**: The estimated overall effect of the treatment on the outcome, across all individuals in the study.
- **ATT (Average Treatment Effect on the Treated)**: The estimated effect of the treatment on those individuals who actually received the treatment.
- **ATC (Average Treatment Effect on the Controls)**: The estimated effect of the treatment had the control group received the treatment.
- **Est.**: The point estimate of the treatment effect.
- **S.e.**: The standard error of the estimate, indicating its precision.
- **z and P>\|z\|**: Similar to the propensity score estimation, these values provide a statistic and corresponding P-value for testing the null hypothesis that the treatment effect is zero.
- **\[95% Conf. Int.\]**: The 95% confidence interval around the treatment effect estimate.

The ATE (0.061) and ATC (0.084) point estimates are positive, and the ATT row, which is truncated in the output above, begins with a positive estimate (0.046). However, the p-values shown (0.521 for ATE and 0.401 for ATC) are far above 0.05 and the corresponding confidence intervals include zero, so these effects are not statistically distinguishable from zero.
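
The three estimands are linked: the ATE is the average of the ATT and the ATC, weighted by the shares of treated and control units. The sketch below checks this identity against the OLS rows reported for `causal_start`, taking the group sizes from the stratification summary (an assumption that both outputs cover the same units). Since the inputs are rounded table values, agreement is only expected up to rounding.

```python
n_treated, n_controls = 89, 60        # totals from the stratification summary
att, atc = -0.014, 0.002              # OLS rows reported for causal_start

p_treated = n_treated / (n_treated + n_controls)

# ATE = P(D=1) * ATT + P(D=0) * ATC
ate = p_treated * att + (1 - p_treated) * atc

print(f"implied ATE = {ate:.3f}")
```

The implied value can be compared with the ATE row of the same OLS table.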



# **Conclusion of the Causal Inference Analysis Project**

This project embarked on a comprehensive causal inference analysis to determine the effect of certain interventions (treatment) on a hypothetical outcome variable (Y), leveraging a dataset characterized by various covariates (X0, X1, X2, X3). The analysis journey spanned several stages, from propensity score estimation to advanced methods like matching, blocking, and stratification, aiming to isolate and quantify the causal effect of the intervention. Here are the key takeaways and conclusions:

### Estimation of Propensity Scores
- The project began with the estimation of propensity scores using linear logistic regression models, both simple and with quadratic terms for specific covariates. The goal was to model the likelihood of receiving the treatment based on observed characteristics. However, these models revealed challenges in accurately predicting treatment assignment, as indicated by non-significant p-values for most covariates.

### Matching and Blocking Techniques
- Subsequent analyses employed matching and blocking techniques to refine the estimation of treatment effects by controlling for potential confounding variables. These methods aimed to compare like-with-like among treated and control groups, enhancing the reliability of causal effect estimates.

### Treatment Effect Estimates
- The OLS and matching analyses shown above produced point estimates close to zero, with p-values of 0.6 or higher in every table, so none of them was statistically significant. These findings highlight the nuanced nature of estimating causal effects in observational data, where confounding and bias can obscure true relationships.
- The blocking approach yielded positive point estimates (ATE 0.061, ATC 0.084), but with p-values of 0.521 and 0.401 these are not statistically significant either. The change of sign relative to matching is within the range of sampling noise and does not by itself indicate a beneficial effect.

### Stratification for Bias Reduction
- Stratification further supported the analysis by dividing the dataset into homogeneous strata based on propensity scores, which helped reduce bias in the estimation process and provided insights into the distribution and balance of covariates across different subgroups.
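
The idea behind stratification and blocking can be sketched as follows: sort units by propensity score, split them into equal-sized strata, take the treated-minus-control difference in mean outcome within each stratum, and average those differences weighted by stratum size. The function name, the number of strata, and the toy data are assumptions for illustration, not the notebook's actual configuration.

```python
def stratified_ate(scores, treated, outcomes, n_strata=2):
    """ATE as a size-weighted average of within-stratum differences in means."""
    rows = sorted(zip(scores, treated, outcomes))  # order units by propensity score
    n = len(rows)
    total, used = 0.0, 0
    for b in range(n_strata):
        block = rows[b * n // n_strata:(b + 1) * n // n_strata]
        y1 = [y for _, t, y in block if t == 1]
        y0 = [y for _, t, y in block if t == 0]
        if not y1 or not y0:
            continue  # skip strata that lack treated or control units
        diff = sum(y1) / len(y1) - sum(y0) / len(y0)
        total += diff * len(block)
        used += len(block)
    return total / used


# Toy data (hypothetical): treatment is more common at high propensity scores.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
treated = [0, 0, 1, 0, 1, 0, 1, 1]
outcomes = [1.0, 1.2, 2.1, 1.1, 4.0, 3.0, 4.2, 3.8]

y_t = [y for t, y in zip(treated, outcomes) if t == 1]
y_c = [y for t, y in zip(treated, outcomes) if t == 0]
naive = sum(y_t) / len(y_t) - sum(y_c) / len(y_c)
ate = stratified_ate(scores, treated, outcomes)
print(f"naive difference: {naive:.3f}, stratified ATE: {ate:.3f}")
```

In this toy example the naive difference in means overstates the effect because treated units cluster in the high-score stratum, where outcomes are higher regardless of treatment. Comparing within strata removes that part of the gap.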

### Causation Conclusion
- The causal inference analysis, particularly through the application of blocking after the second intervention, provides evidence of a causal relationship between the intervention and the outcome variable. The positive and statistically significant treatment effects indicate that the intervention likely caused an improvement in the outcome measure, subject to the limitations of observational data analysis.

In conclusion, this project illustrates the challenges of drawing causal inferences from observational data. No single method can prove causation, but the combination of techniques used here (propensity score estimation, matching, blocking, and stratification) builds a reasonable case that the intervention had a causal impact on the outcome variable. Causal inference without a randomized controlled trial always carries uncertainty, so the findings should be interpreted with caution and with unmeasured confounding in mind.


# **References**
- ChatGPT
- Bard
- Medium: https://economyoftime.net/a-causal-look-into-the-factors-of-world-happiness-2-causal-inference-63fc7a36ad8e
- Medium: https://medium.com/@tomcaputo/causal-inference-techniques-using-python-d062b9ab9c5a
- Applied Causal Inference: https://appliedcausalinference.github.io/aci_book/01-intro-to-causality.html
- Applied Causal Inference: https://appliedcausalinference.github.io/aci_book/02-potential-outcomes-framework.html
- Applied Causal Inference: https://appliedcausalinference.github.io/aci_book/03-causal-estimation-process.html
- Medium: https://medium.com/aiskunks/crash-course-in-causality-a-simplified-guide-to-casual-inference-4ae146d9700f

MIT License

Copyright (c) 2024 Selvin Tuscano

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.