## What is a causal model?

A **causal model** is a mathematical framework for describing and analyzing cause-and-effect relationships between variables. In machine learning and statistics, causal models describe how changing one variable (the cause) changes another variable (the effect), which correlation alone cannot establish. These models are essential for tasks like predicting the outcome of interventions, policy analysis, and scientific discovery.


### Assumptions

Common assumptions in causal modeling:
 
 1. **Causal Sufficiency**: Every common cause of two or more variables in the model is itself measured and included, so no confounder is missing from the causal graph.
 2. **No Unmeasured Confounding** (also called ignorability or exchangeability): Conditional on the measured confounders, treatment assignment is independent of the potential outcomes. Equivalently, no hidden variable affects both the treatment and the outcome.
 3. **Consistency**: The observed outcome under the actual treatment received is the same as the potential outcome under that treatment.
 4. **Positivity (Overlap)**: Every unit has a positive probability of receiving each level of the treatment, given the confounders.
 5. **Correct Model Specification**: When a parametric model is used, the functional form of the relationships between variables is correctly specified.
 6. **Stable Unit Treatment Value Assumption (SUTVA)**: The treatment of one unit does not affect the outcome of another unit (no interference), and each treatment level has a single, well-defined version.
 
 These assumptions are crucial for identifying and estimating causal effects from data.
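
To make the link between these assumptions and identification concrete, here is a short derivation in potential-outcomes notation. The symbols are assumptions introduced for this sketch and do not appear elsewhere in the text: $T$ is the treatment, $Y$ the observed outcome, $Z$ the measured confounders, and $Y(t)$ the potential outcome under treatment level $t$.

```latex
% Three of the assumptions above, stated formally
\begin{align*}
\text{Consistency:} \quad & T = t \;\Rightarrow\; Y = Y(t) \\
\text{No unmeasured confounding:} \quad & Y(t) \perp\!\!\!\perp T \mid Z \\
\text{Positivity:} \quad & P(T = t \mid Z = z) > 0 \ \text{ whenever } P(Z = z) > 0
\end{align*}

% Together they identify the mean potential outcome from observed data
\begin{align*}
E[Y(t)] &= E_Z\big[\, E[Y(t) \mid Z] \,\big]
        && \text{(law of total expectation)} \\
        &= E_Z\big[\, E[Y(t) \mid T = t, Z] \,\big]
        && \text{(no unmeasured confounding; positivity keeps the conditioning well defined)} \\
        &= E_Z\big[\, E[Y \mid T = t, Z] \,\big]
        && \text{(consistency)}
\end{align*}
```

The last line contains only observable quantities, which is why each assumption matters: if any one of them fails, the corresponding step in the chain no longer holds.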




#### What is confounding?
 
**Confounding** occurs when a third variable, known as a confounder, influences both the treatment (or exposure) and the outcome, creating a spurious association between them. In other words, a confounder is a variable that can make it appear as though there is a causal relationship between the treatment and the outcome, when in fact the observed association is (at least partly) due to the confounder.
 
For example, suppose we want to study whether drinking coffee (treatment) causes heart disease (outcome). If age influences both coffee consumption and heart disease risk, then age is a confounder. Failing to account for age could lead to incorrect conclusions about the effect of coffee on heart disease.
 
In causal modeling, it is crucial to identify and adjust for confounders to estimate the true causal effect of a treatment or intervention.
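
The coffee example can be sketched with simulated data. Everything below is a made-up illustration: the variable names (`older`, `coffee`, `disease`) and all probabilities are assumptions chosen so that coffee has no effect on disease at all, while age drives both. The sketch compares the naive difference in disease rates with a stratified (backdoor-adjusted) difference.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process (all probabilities are invented).
older = rng.random(n) < 0.5                              # confounder: age group
coffee = rng.random(n) < np.where(older, 0.7, 0.3)       # older people drink more coffee
disease = rng.random(n) < np.where(older, 0.30, 0.10)    # risk depends on age only

# Naive comparison: ignores age, so it mixes the age effect into the estimate.
naive_diff = disease[coffee].mean() - disease[~coffee].mean()

# Backdoor adjustment by stratification:
# sum over z of P(Z = z) * (E[Y | T=1, Z=z] - E[Y | T=0, Z=z])
adjusted_diff = 0.0
for z in (False, True):
    stratum = older == z
    effect_z = (disease[stratum & coffee].mean()
                - disease[stratum & ~coffee].mean())
    adjusted_diff += stratum.mean() * effect_z

print(f"Naive difference in disease rate:    {naive_diff:.3f}")
print(f"Age-adjusted difference in disease:  {adjusted_diff:.3f}")
```

Under these invented probabilities the naive difference is about 0.08 in expectation even though the true effect is zero, while the age-adjusted difference should be close to zero up to sampling noise.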


### Use cases

Example use cases of causal models in machine learning:

 1. Healthcare: Estimating the effect of a new drug or treatment on patient outcomes, accounting for confounding variables.
 2. Economics: Predicting the impact of policy changes (e.g., tax reforms) on employment or economic growth.
 3. Marketing: Understanding how a marketing campaign causally affects sales, separating true effect from correlation.
 4. Social Sciences: Studying the causal impact of education on income or social mobility.
 5. Recommendation Systems: Determining whether showing a user a particular item causes them to make a purchase, rather than just being correlated.
 6. A/B Testing: Analyzing the causal effect of website changes on user engagement or conversion rates.
 7. Public Policy: Evaluating the effectiveness of interventions (e.g., vaccination programs) on public health outcomes.

 These use cases highlight how causal models go beyond correlation to answer "what if" and "why" questions, enabling better decision-making.
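
As a small illustration of the A/B testing use case, the sketch below simulates a randomized experiment. The names (`engaged_user`, `variant_b`) and all rates are hypothetical. Because assignment is random, it is independent of user traits, so a plain difference in means estimates the causal lift without any adjustment.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

# Hypothetical user trait that affects conversion.
engaged_user = rng.random(n) < 0.4

# Random assignment: independent of every user trait by construction.
variant_b = rng.random(n) < 0.5

# Invented conversion model with a true lift of 0.02 for variant B.
p_convert = 0.05 + 0.10 * engaged_user + 0.02 * variant_b
converted = (rng.random(n) < p_convert).astype(float)

# Difference in means and its standard error.
group_b = converted[variant_b]
group_a = converted[~variant_b]
lift = group_b.mean() - group_a.mean()
std_err = np.sqrt(group_b.var(ddof=1) / group_b.size
                  + group_a.var(ddof=1) / group_a.size)

print(f"Estimated lift: {lift:.4f} (standard error {std_err:.4f})")
```

With observational data the same difference in means would generally be confounded; randomization is what licenses the causal reading here.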


```python
# Example: Estimating the causal effect of a treatment using backdoor adjustment (simulated data)

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulate data for a simple causal model:
#   Z (confounder) -> X (treatment) -> Y (outcome)
#   Z also affects Y directly (confounding)

np.random.seed(42)
n = 1000

# Simulate confounder Z
Z = np.random.normal(0, 1, n)

# Simulate treatment X, affected by Z
X = 0.5 * Z + np.random.normal(0, 1, n)

# Simulate outcome Y, affected by both X (true effect 2.0) and Z (true effect 1.5)
Y = 2.0 * X + 1.5 * Z + np.random.normal(0, 1, n)

# Create a DataFrame
df = pd.DataFrame({'Z': Z, 'X': X, 'Y': Y})

# Naive regression: estimate the effect of X on Y without adjusting for Z (ignores confounding)
naive_model = sm.OLS(df['Y'], sm.add_constant(df['X'])).fit()
print("Naive estimate (ignoring confounder Z):")
print(naive_model.summary())

# Adjusted regression: include the confounder Z (backdoor adjustment)
adjusted_model = sm.OLS(df['Y'], sm.add_constant(df[['X', 'Z']])).fit()
print("\nAdjusted estimate (controlling for confounder Z):")
print(adjusted_model.summary())

# How to interpret the results:
# - Compare the coefficient for 'X' in both models.
# - In the naive model, the coefficient for X is the *association* between X and Y,
#   which is biased here because Z confounds the relationship.
# - In the adjusted model, the coefficient for X estimates the *causal effect* of X on Y,
#   because Z is the only confounder in this simulation and the linear model is correct.
# - The adjusted coefficient should be close to the true simulated value (2.0).
# - The p-value for X tests the null hypothesis that the coefficient is zero.
# - The coefficient for Z in the adjusted model should be close to its true effect (1.5).
# - With real data, read the adjusted X coefficient as an average causal effect only if
#   all confounders are measured and the model is correctly specified.
```


```text
Naive estimate (ignoring confounder Z):
                            OLS Regression Results                            
Dep. Variable:                      Y   R-squared:                       0.734
Model:                            OLS   Adj. R-squared:                  0.733
Method:                 Least Squares   F-statistic:                     2750.
Date:                Wed, 03 Sep 2025   Prob (F-statistic):          5.05e-289
Time:                        15:35:05   Log-Likelihood:                -1937.6
No. Observations:                1000   AIC:                             3879.
Df Residuals:                     998   BIC:                             3889.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
[remaining output truncated]
```
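
For the simulation in the code cell (X = 0.5 Z + noise, Y = 2.0 X + 1.5 Z + noise, with Z and both noise terms independent with unit variance), the size of the confounding bias in the naive regression can be worked out by hand. The derivation below is a sketch of the standard omitted-variable calculation; the noise symbols $\varepsilon_X$ and $\varepsilon_Y$ are notation introduced here.

```latex
\begin{align*}
X &= 0.5\,Z + \varepsilon_X, \qquad Y = 2.0\,X + 1.5\,Z + \varepsilon_Y \\[4pt]
\operatorname{Var}(X) &= 0.5^2 \operatorname{Var}(Z) + \operatorname{Var}(\varepsilon_X) = 0.25 + 1 = 1.25 \\
\operatorname{Cov}(X, Z) &= 0.5 \operatorname{Var}(Z) = 0.5 \\[4pt]
\beta_{\text{naive}} &= \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}
  = 2.0 + 1.5 \cdot \frac{\operatorname{Cov}(X, Z)}{\operatorname{Var}(X)}
  = 2.0 + 1.5 \cdot \frac{0.5}{1.25} = 2.6
\end{align*}
```

So in large samples the naive coefficient on X should land near 2.6 rather than the true 2.0, while the adjusted regression removes the extra 0.6 by including Z.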

### Pros

- Allows for estimation of causal effects rather than just associations.
- Enables answering "what if" (counterfactual) questions.
- Supports better decision-making in policy, medicine, business, etc.
- Can control for confounding variables to reduce bias.
- Provides a framework for understanding mechanisms and pathways.
- Can support generalization of findings to new settings (external validity) when the causal structure carries over.


### Cons

- Requires strong assumptions (e.g., no unmeasured confounding) that may not hold in practice.
- Causal inference methods can be sensitive to model misspecification.
- Data collection for all relevant variables (especially confounders) can be difficult or expensive.
- Results can be biased if important confounders are omitted.
- Interpretation of causal effects may be challenging in complex systems.
- Some methods require large sample sizes for reliable estimation.
- Not all causal questions can be answered from observational data alone.
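
The sensitivity to omitted confounders can be illustrated with a small simulation. This is a sketch under invented assumptions: `U` is a hypothetical unmeasured confounder, `ols_coefs` is a helper defined here for the example, and the coefficients are made up. Adjusting only for the measured confounder `Z` still leaves a biased estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

Z = rng.normal(size=n)   # measured confounder
U = rng.normal(size=n)   # hypothetical unmeasured confounder
X = 0.5 * Z + 0.5 * U + rng.normal(size=n)              # treatment
Y = 2.0 * X + 1.5 * Z + 1.0 * U + rng.normal(size=n)    # outcome, true effect of X is 2.0


def ols_coefs(y, *columns):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    design = np.column_stack([np.ones_like(y), *columns])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


# Index 1 is the coefficient on X (index 0 is the intercept).
effect_adjusting_z_only = ols_coefs(Y, X, Z)[1]
effect_adjusting_z_and_u = ols_coefs(Y, X, Z, U)[1]

print(f"Adjusting for Z only:  {effect_adjusting_z_only:.3f}")
print(f"Adjusting for Z and U: {effect_adjusting_z_and_u:.3f}")
```

In this setup the estimate that adjusts for Z alone converges to about 2.4 rather than 2.0, and nothing in the observed data signals the problem, since U is by assumption never recorded.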
