
**Lasso Regression:** We fit two separate Lasso models:

- Outcome on control variables.
- Treatment on control variables.

This selects the control variables that are predictive of the outcome, of the treatment, or of both.

**Combining Non-Zero Coefficients:** The features with non-zero coefficients in either Lasso regression are combined (as a union) for the final regression model.
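
As a minimal sketch of this combination step, the union of selected indices can be computed with `np.union1d`. The two coefficient vectors below are hypothetical values chosen for illustration, not output from the fitted models:

```python
import numpy as np

# Hypothetical Lasso coefficient vectors, for illustration only
coef_outcome = np.array([0.0, 1.3, 0.0, -0.7, 0.0])
coef_treatment = np.array([0.4, 0.0, 0.0, -0.2, 0.0])

# Indices of the controls kept by each Lasso fit
nonzero_outcome = np.flatnonzero(coef_outcome)
nonzero_treatment = np.flatnonzero(coef_treatment)

# A control is kept if either Lasso selects it (sorted, de-duplicated union)
selected = np.union1d(nonzero_outcome, nonzero_treatment)
print(selected)  # [0 1 3]
```

Taking the union rather than the intersection keeps a control that matters for only one of the two equations.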

**Final Regression (Double Selection):** We regress the outcome on the treatment indicator and the selected features using ordinary least squares. The coefficient on the treatment indicator is the estimated treatment effect.
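
In equation form, the procedure can be sketched as follows. The notation is an assumption introduced here for illustration: $Y$ is the outcome, $D$ the treatment, $X$ the $n \times p$ matrix of controls, $\lambda_Y$ and $\lambda_D$ the Lasso penalties chosen by cross-validation, and intercepts are omitted for brevity.

$$
\hat{\beta} = \arg\min_{\beta} \frac{1}{2n}\lVert Y - X\beta \rVert_2^2 + \lambda_Y \lVert \beta \rVert_1,
\qquad
\hat{S}_Y = \{\, j : \hat{\beta}_j \neq 0 \,\}
$$

$$
\hat{\gamma} = \arg\min_{\gamma} \frac{1}{2n}\lVert D - X\gamma \rVert_2^2 + \lambda_D \lVert \gamma \rVert_1,
\qquad
\hat{S}_D = \{\, j : \hat{\gamma}_j \neq 0 \,\}
$$

$$
Y = \alpha D + X_{\hat{S}}\,\theta + \varepsilon,
\qquad
\hat{S} = \hat{S}_Y \cup \hat{S}_D
$$

The OLS estimate of $\alpha$ in the last equation is the double selection estimate of the treatment effect.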

**Comparing Models:** We compare the results of the naive model (only treatment), the full model (all control variables and treatment), and the double selection model (selected control variables and treatment).
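
The comparison is most informative when the treatment is correlated with the controls. The following is a minimal, self-contained sketch under assumed conditions: a single hypothetical confounder `x`, a true effect of 2, and NumPy least squares standing in for statsmodels. It is not the simulation used in this notebook, only an illustration of why naive and adjusted estimates can differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical confounder: it drives both treatment assignment and the outcome
x = rng.standard_normal(n)
treatment = (x + rng.standard_normal(n) > 0).astype(float)
outcome = 2.0 * treatment + 3.0 * x + rng.standard_normal(n)

def ols_coefs(columns, y):
    """Least-squares coefficients with an intercept prepended."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

# Index 1 is the treatment coefficient (index 0 is the intercept)
naive_effect = ols_coefs([treatment], outcome)[1]        # Outcome ~ Treatment
adjusted_effect = ols_coefs([treatment, x], outcome)[1]  # Outcome ~ Treatment + x

print(f"naive: {naive_effect:.2f}, adjusted: {adjusted_effect:.2f}")
```

Under these assumptions the naive estimate absorbs part of the confounder's effect, while the adjusted estimate should land close to 2. When treatment is assigned independently of the controls, the models are expected to agree on the effect and differ mainly in precision.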



In [3]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
import statsmodels.api as sm

# Step 1: Simulate data (alternatively, load your own dataset)
def simulate_data(n=1000, p=20):
    np.random.seed(0)  # fixed seed for reproducibility
    X = np.random.randn(n, p)  # control variables
    # Binary treatment, assigned at random (independent of the controls)
    treatment = np.random.binomial(1, 0.5, size=n)
    # Outcome: true treatment effect of 2, random linear effects of the controls, plus noise
    outcome = 2 * treatment + np.dot(X, np.random.randn(p)) + np.random.randn(n)
    return (
        pd.DataFrame(X, columns=[f'X{i}' for i in range(p)]),
        pd.Series(treatment, name="Treatment"),
        pd.Series(outcome, name="Outcome"),
    )

X, treatment, outcome = simulate_data()


In [4]:

# Step 2: Fit Lasso Regression (Outcome ~ Control Variables)
lasso_outcome = LassoCV(cv=5).fit(X, outcome)
nonzero_outcome = np.where(lasso_outcome.coef_ != 0)[0]  # non-zero coefficients


In [5]:

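# Step 3: Fit Lasso Regression (Treatment ~ Control Variables)
# The treatment is binary, so this Lasso acts as a linear probability model used only for selection
lasso_treatment = LassoCV(cv=5).fit(X, treatment)
nonzero_treatment = np.where(lasso_treatment.coef_ != 0)[0]  # indices of non-zero coefficients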


In [6]:

# Step 4: Combine non-zero coefficients from both models
selected_features = np.union1d(nonzero_outcome, nonzero_treatment)
X_selected = X.iloc[:, selected_features]


In [7]:

# Step 5: Fit final regression (Double Selection)
X_double_selection = sm.add_constant(pd.concat([X_selected, treatment], axis=1))
double_selection_model = sm.OLS(outcome, X_double_selection).fit()


In [8]:

# Step 6: Fit Naive and Full Models for comparison
# Naive regression: Outcome ~ Treatment
X_naive = sm.add_constant(treatment)
naive_model = sm.OLS(outcome, X_naive).fit()


In [9]:

# Full Model: Outcome ~ All Control Variables + Treatment
X_full = sm.add_constant(pd.concat([X, treatment], axis=1))
full_model = sm.OLS(outcome, X_full).fit()


In [10]:

# Step 7: Compare Models
print("Naive Model:")
print(naive_model.summary())

print("\nFull Model:")
print(full_model.summary())

print("\nDouble Selection Model:")
print(double_selection_model.summary())


Naive Model:
                            OLS Regression Results                            
Dep. Variable:                Outcome   R-squared:                       0.064
Model:                            OLS   Adj. R-squared:                  0.063
Method:                 Least Squares   F-statistic:                     67.89
Date:                Fri, 11 Oct 2024   Prob (F-statistic):           5.40e-16
Time:                        17:09:26   Log-Likelihood:                -2910.9
No. Observations:                1000   AIC:                             5826.
Df Residuals:                     998   BIC:                             5836.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.3314      0.202     -1