Attribute ate_ and method ate() give different results in CausalForestDML #753

Open
bart-vanneste opened this issue Mar 27, 2023 · 2 comments

Comments

@bart-vanneste

Attribute ate_ and method ate() do not give the same point estimate and confidence interval. I suspect the former applies a doubly robust correction, whereas the latter may not.

Code to reproduce the issue:
import numpy as np
import pandas as pd
from econml.dml import CausalForestDML

# create synthetic data
n = 1000
np.random.seed(1)
T = np.random.randint(2, size=n)
X = np.random.normal(size=(n, 10))
W = np.random.normal(size=(n, 10))
Y = X**2 * T.reshape(-1, 1) + X * W
Y = np.sum(Y, axis=1)

# train model
m = CausalForestDML(discrete_treatment=True, max_features='sqrt', random_state=1)
m.tune(Y=Y, T=T, X=X, W=W)
m.fit(Y=Y, T=T, X=X, W=W)
m.summary()

# get ate (same as summary())
print(m.ate_)
print(m.ate_stderr_)

# get ate (not the same as summary())
print(m.ate(X=X, T0=0, T1=1))
print(m.ate_interval(X=X, T0=0, T1=1, alpha=.05))

The output is:
[Image: results]

In particular, the confidence intervals are substantially different.

The use case is to calculate the ATE for data not used for training.

Thanks!

@bart-vanneste bart-vanneste changed the title How to calculate ATE with CausalForestDML? Attribute ate_ and method ate() give different results in CausalForestDML Mar 28, 2023
@kbattocchi
Collaborator

This is expected. As you note, the ate_ attribute applies a double-robustness correction when computing the ATE itself (on the training data). The ate() method lets you compute the ATE for any population by averaging the CATE values computed for each individual, so it will not give exactly the same result. However, if your use case is to compute the ATE for a data set that was not used in training, then only the ate() method can be used.
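
To make the distinction concrete, here is a minimal NumPy sketch on synthetic data. It is not EconML's internal implementation: the residual-based correction term, the variable names (`cate_hat`, `y_res`, `t_res`), the use of the true nuisance functions, and the deliberately biased CATE estimate are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Synthetic data with a known effect (all values are illustrative assumptions).
X = rng.normal(size=n)
T = rng.integers(0, 2, size=n)      # randomized binary treatment, P(T=1) = 0.5
tau = 1.0 + X                       # true CATE; the true ATE is 1.0
Y = tau * T + X + rng.normal(size=n)

# Residualize with the true nuisance functions E[Y|X] and E[T|X].
y_res = Y - (0.5 * tau + X)
t_res = T - 0.5

# A deliberately biased CATE estimate, standing in for a fitted model.
cate_hat = tau + 0.5

# Plug-in ATE: average the CATE predictions over the population.
ate_plug_in = cate_hat.mean()

# Residual-corrected ATE: add a term that averages out the CATE error.
var_t = np.mean(t_res ** 2)
correction = t_res * (y_res - cate_hat * t_res) / var_t
ate_corrected = np.mean(cate_hat + correction)

print(f"plug-in:   {ate_plug_in:.3f}")
print(f"corrected: {ate_corrected:.3f}")
```

In this sketch the plug-in average inherits the bias of `cate_hat`, while the corrected average stays close to the true value of 1.0. The correction term uses the observed outcomes and treatments, which the plug-in average does not need.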

@bart-vanneste
Author

Thanks much @kbattocchi

And how can one calculate the ATE with a double-robustness correction on data not used in training?
