Attribute ate_ and method ate() do not give the same point estimate and confidence interval. I suspect the former applies a doubly robust correction, whereas the latter may not.
Code to reproduce the issue:
import numpy as np
import pandas as pd
from econml.dml import CausalForestDML
# create synthetic data
n = 1000
np.random.seed(1)
T = np.random.randint(2, size=n)
X = np.random.normal(size=(n, 10))
W = np.random.normal(size=(n, 10))
Y = X**2 * T.reshape(-1, 1) + X * W
Y = np.sum(Y, axis=1)
# train model
m = CausalForestDML(discrete_treatment=True, max_features='sqrt', random_state=1)
m.tune(Y=Y, T=T, X=X, W=W)
m.fit(Y=Y, T=T, X=X, W=W)
m.summary()
# get ate (same as summary())
print(m.ate_)
print(m.ate_stderr_)
# get ate (not the same as summary())
print(m.ate(X=X, T0=0, T1=1))
print(m.ate_interval(X=X, T0=0, T1=1, alpha=.05))
The output is:
In particular, the confidence intervals differ substantially.
The use case is to calculate the ATE for data not used for training.
Thanks!
bart-vanneste changed the title from "How to calculate ATE with CausalForestDML?" to "Attribute ate_ and method ate() give different results in CausalForestDML" on Mar 28, 2023
This is expected. As you note, the ate_ attribute applies a doubly robust correction when computing the ATE, and it does so on the training data only. The ate() method instead computes the ATE for any population by averaging the estimated CATE values of its individuals, so the two will not match exactly. If your use case is to compute the ATE for a data set that was not used in training, only the ate() method can do that.
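To make the distinction concrete, here is a minimal standard-library sketch of the two kinds of estimate. It does not call econml and is not meant to reproduce its internals. The simulated data, the deliberately biased `cate_hat` model, the known `propensity` of 0.5, and the textbook AIPW form of the doubly robust correction are all assumptions made for illustration.

```python
import random
from statistics import fmean

random.seed(0)
n = 20_000
propensity = 0.5  # assumed: treatment is randomized with known probability

# Simulate one feature, a randomized binary treatment, and an outcome whose
# true CATE is x**2, so the true ATE is E[x**2] = 1 for standard-normal x.
x = [random.gauss(0.0, 1.0) for _ in range(n)]
t = [1 if random.random() < propensity else 0 for _ in range(n)]
y = [xi**2 * ti + random.gauss(0.0, 1.0) for xi, ti in zip(x, t)]


def cate_hat(xi):
    """Hypothetical, deliberately biased CATE model (the true CATE is xi**2)."""
    return 0.8 * xi**2


# Plug-in ATE: average the predicted CATEs over the population of interest.
plug_in_ate = fmean(cate_hat(xi) for xi in x)


def aipw_score(xi, ti, yi):
    """Per-sample doubly robust (AIPW) score.

    Assumed outcome models: mu0(x) = 0 and mu1(x) = cate_hat(x).
    """
    mu0, mu1 = 0.0, cate_hat(xi)
    correction = (ti * (yi - mu1) / propensity
                  - (1 - ti) * (yi - mu0) / (1 - propensity))
    return (mu1 - mu0) + correction


# Doubly robust ATE: plug-in term plus an inverse-propensity-weighted
# residual correction, which needs the observed outcomes and treatments.
dr_ate = fmean(aipw_score(xi, ti, yi) for xi, ti, yi in zip(x, t, y))

print(f"plug-in ATE:       {plug_in_ate:.3f}")
print(f"doubly robust ATE: {dr_ate:.3f}")
```

The plug-in average inherits whatever bias the CATE model has, while the corrected estimate uses the observed Y and T to pull it back toward the truth. That is also why a correction of this kind can only be computed where outcomes and treatments are available, whereas a plain average of CATE predictions needs nothing but X.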