<table border="0">
    <tr>
        <td>
            <img src="https://ictd2016.files.wordpress.com/2016/04/microsoft-research-logo-copy.jpg" style="width: 30px;" />
             </td>
        <td>
            <img src="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/12/MSR-ALICE-HeaderGraphic-1920x720_1-800x550.jpg" style="width: 100px;"/></td>
        </tr>
</table>

# Orthogonal Random Forest and Causal Forest: Use Cases and Examples

Causal Forests and Generalized Random Forests are flexible methods for estimating treatment effect heterogeneity with random forests. The Orthogonal Random Forest (ORF) combines orthogonalization, a technique that makes the final-stage estimate insensitive to small errors in the first-stage (nuisance) models that capture confounding, with generalized random forests. Because of this orthogonalization step, ORF is designed to perform well in the presence of high-dimensional confounders. For more details, see [this paper](https://arxiv.org/abs/1806.03467) or the [EconML documentation](https://econml.azurewebsites.net/).
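As a rough illustration of the orthogonalization idea, the numpy-only sketch below uses a constant effect θ = 2 and linear confounding; these values and the helper `ols_fit_predict` are hypothetical and chosen for illustration. It residualizes both Y and T on the confounders W and then regresses residual on residual (the partialling-out step used in double machine learning). ORF goes further by fitting these stages locally and estimating a heterogeneous θ(x) with forest-based weights, so treat this as a sketch of the principle, not EconML's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_w, theta = 5_000, 10, 2.0
beta = np.zeros(n_w)
beta[:3] = 1.0   # sparse W -> T coefficients
gamma = np.zeros(n_w)
gamma[:3] = 1.0  # sparse W -> Y coefficients

W = rng.normal(size=(n, n_w))
T = W @ beta + rng.uniform(-1, 1, size=n)
Y = theta * T + W @ gamma + rng.uniform(-1, 1, size=n)

def ols_fit_predict(A, b):
    """Least-squares fit of b on A (plus an intercept); returns in-sample fitted values."""
    A1 = np.column_stack([np.ones(len(A)), A])
    coef, *_ = np.linalg.lstsq(A1, b, rcond=None)
    return A1 @ coef

# Naive: the slope of Y on T alone also picks up the path T <- W -> Y
naive = np.polyfit(T, Y, 1)[0]

# Orthogonalized: residualize Y and T on W, then regress residual on residual
Y_res = Y - ols_fit_predict(W, Y)
T_res = T - ols_fit_predict(W, T)
theta_hat = (T_res @ Y_res) / (T_res @ T_res)

print(f"naive slope {naive:.3f}, residual-on-residual {theta_hat:.3f}, truth {theta}")
```

Because T and Y both load on the same three columns of W, the naive slope drifts to roughly 2.9, while the residual-on-residual estimate should land close to 2. In the notebook the first stages are Lasso models rather than OLS.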

The EconML SDK implements the following forest-based estimators, all used in this notebook:

* `DMLOrthoForest`: suitable for continuous or discrete treatments

* `DROrthoForest`: suitable for discrete treatments

* `CausalForestDML`: suitable for both discrete and continuous treatments

In this notebook, we show the performance of the ORF on synthetic and observational data. 

## Notebook Contents

1. [Example Usage with Continuous Treatment Synthetic Data](#1.-Example-Usage-with-Continuous-Treatment-Synthetic-Data)
2. [Example Usage with Binary Treatment Synthetic Data](#2.-Example-Usage-with-Binary-Treatment-Synthetic-Data)
3. [Example Usage with Multiple Treatment Synthetic Data](#3.-Example-Usage-with-Multiple-Treatment-Synthetic-Data)
4. [Example Usage with Real Continuous Treatment Observational Data](#4.-Example-Usage-with-Real-Continuous-Treatment-Observational-Data)

In [None]:
import econml

In [None]:
# Main imports
from econml.orf import DMLOrthoForest, DROrthoForest
from econml.dml import CausalForestDML
from econml.sklearn_extensions.linear_model import WeightedLassoCVWrapper, WeightedLasso, WeightedLassoCV

# Helper imports
import numpy as np
from itertools import product
from sklearn.linear_model import Lasso, LassoCV, LogisticRegression, LogisticRegressionCV
import matplotlib.pyplot as plt

%load_ext autoreload
%autoreload 2

%matplotlib inline

# 1. Example Usage with Continuous Treatment Synthetic Data

## 1.1. DGP 
We use the data generating process (DGP) from [the ORF paper](https://arxiv.org/abs/1806.03467). The DGP is described by the following equations:

\begin{align}
T =& \langle W, \beta\rangle + \eta, & \;\eta \sim \text{Uniform}(-1, 1)\\
Y =& T\cdot \theta(X) + \langle W, \gamma\rangle + \epsilon, &\; \epsilon \sim \text{Uniform}(-1, 1)\\
W \sim& \text{Normal}(0,\, I_{n_w})\\
X \sim& \text{Uniform}(0,1)^{n_x}
\end{align}

where $W$ is a matrix of high-dimensional confounders and $\beta, \gamma$ are sparse.

For this DGP, 
\begin{align}
\theta(x) = \exp(2\cdot x_1).
\end{align}

In [None]:
# Treatment effect function
def exp_te(x):
    return np.exp(2*x[0])

In [None]:
# DGP constants
np.random.seed(123)
n = 1000
n_w = 30
support_size = 5
n_x = 1
# Outcome support
support_Y = np.random.choice(range(n_w), size=support_size, replace=False)
coefs_Y = np.random.uniform(0, 1, size=support_size)
epsilon_sample = lambda n: np.random.uniform(-1, 1, size=n)
# Treatment support 
support_T = support_Y
coefs_T = np.random.uniform(0, 1, size=support_size)
eta_sample = lambda n: np.random.uniform(-1, 1, size=n) 

# Generate controls, covariates, treatments and outcomes
W = np.random.normal(0, 1, size=(n, n_w))
X = np.random.uniform(0, 1, size=(n, n_x))
# Heterogeneous treatment effects
TE = np.array([exp_te(x_i) for x_i in X])
T = np.dot(W[:, support_T], coefs_T) + eta_sample(n)
Y = TE * T + np.dot(W[:, support_Y], coefs_Y) + epsilon_sample(n)

In [None]:
# DGPGraph comes from a local `datagen` helper module; it is not part of EconML
from datagen import DGPGraph

class CF_DGP():
    """Continuous-treatment DGP from Section 1.1, expressed as a DGPGraph."""

    def __init__(self, n_x=1, n_w=30, support_size=5):
        # Sparse coefficients: only the first `support_size` controls enter T and Y
        self.coefs_T = np.zeros(n_w)
        self.coefs_T[0:support_size] = np.random.normal(1, 1, size=support_size)

        self.coefs_Y = np.zeros(n_w)
        self.coefs_Y[0:support_size] = np.random.uniform(0, 1, size=support_size)

        def fW(n):
            # Confounders: W ~ Normal(0, I_{n_w})
            return np.random.normal(0, 1, size=(n, n_w))

        def fX(n):
            # Effect modifiers: X ~ Uniform(0, 1)^{n_x}, matching the DGP and X_test
            return np.random.uniform(0, 1, size=(n, n_x))

        def fT(W, n):
            # T = <W, beta> + eta, eta ~ Uniform(-1, 1)
            return W @ self.coefs_T + np.random.uniform(-1, 1, size=n)

        def fY(X, W, T, n):
            # Y = T * theta(X) + <W, gamma> + epsilon, with theta(x) = exp(2 x_1)
            TE = np.exp(2 * X[:, 0])
            return TE * T + W @ self.coefs_Y + np.random.uniform(-1, 1, size=n)

        dgp = DGPGraph()
        dgp.add_node('W', fW)
        dgp.add_node('X', fX)
        dgp.add_node('T', fT, parents=['W'])
        dgp.add_node('Y', fY, parents=['X', 'W', 'T'])
        self.dgp = dgp

d1 = CF_DGP()
data = d1.dgp.sample(1000)
X, W, T, Y = data['X'], data['W'], data['T'], data['Y']

## 1.2. Train Estimator

**Note:** The models in the final stage of the estimation (``model_T_final``, ``model_Y_final``) need to support sample weighting. 

If the models of choice do not support sample weights (e.g. ``sklearn.linear_model.LassoCV``), the ``econml`` package provides wrappers that add sample-weight support, such as ``WeightedModelWrapper`` and, for ``LassoCV`` specifically, ``WeightedLassoCVWrapper`` (used in Section 4).
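For least-squares losses, sample weighting can be emulated with an unweighted solver by scaling each row by √w_i. The numpy sketch below checks this on made-up data; the weights `w` loosely stand in for the per-sample weights that the forest supplies to the final-stage models. It illustrates the idea only, and we do not claim this is how EconML's wrappers work internally.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 3
A = rng.normal(size=(n, d))
b = A @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
w = rng.uniform(0.1, 2.0, size=n)  # hypothetical per-sample weights

# Weighted least squares in closed form: solve (A^T diag(w) A) coef = A^T diag(w) b
coef_wls = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * b))

# Same fit from an unweighted solver after scaling each row of A and b by sqrt(w_i)
sw = np.sqrt(w)
coef_rescaled, *_ = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)

print(coef_wls, coef_rescaled)
```

With an intercept, add a column of ones before rescaling so the intercept column is also multiplied by √w_i. For penalized models such as Lasso the equivalence also depends on how the library normalizes the loss by the sample size, so treat this as a starting point rather than a drop-in replacement.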

In [None]:
# ORF parameters and test data

X_test = np.array(list(product(np.arange(0, 1, 0.01), repeat=n_x)))

subsample_ratio = 0.3
lambda_reg = np.sqrt(np.log(n_w) / (10 * subsample_ratio * n))
est = DMLOrthoForest(
    n_trees=1000, min_leaf_size=5,
    max_depth=50, subsample_ratio=subsample_ratio,
    model_T=Lasso(alpha=lambda_reg),
    model_Y=Lasso(alpha=lambda_reg),
    model_T_final=WeightedLasso(alpha=lambda_reg),
    model_Y_final=WeightedLasso(alpha=lambda_reg),
    global_residualization=False,
    random_state=123)

To use the built-in confidence intervals constructed via Bootstrap of Little Bags, we can specify `inference="blb"` at `fit` time or leave the default `inference='auto'` which will automatically use the Bootstrap of Little Bags.
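The interval methods used below (`effect_interval`, `const_marginal_effect_interval`, `ate_interval`) take an `alpha` argument, and the interval has nominal coverage 1 − alpha: the default `alpha=0.05` gives 95% intervals and `alpha=0.01` gives 99% intervals. Assuming the interval is a normal approximation around the point estimate with a bootstrap-based standard error, the stdlib-only sketch below shows how `alpha` sets the width; `normal_interval` is a hypothetical helper, not an EconML function.

```python
from statistics import NormalDist

def normal_interval(point, stderr, alpha=0.05):
    """Hypothetical helper: two-sided (1 - alpha) normal-approximation interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha=0.05
    return point - z * stderr, point + z * stderr

lo95, hi95 = normal_interval(3.0, 0.5)              # 95% interval
lo99, hi99 = normal_interval(3.0, 0.5, alpha=0.01)  # wider 99% interval
print((lo95, hi95), (lo99, hi99))
```

Smaller `alpha` means higher coverage and therefore a wider band, so plot labels should state the level that was actually requested.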

In [None]:
est.fit(Y, T, X=X, W=W, inference="blb")

In [None]:
est.ate_interval(X=X)

In [None]:
est.ate_inference(X=X)

In [None]:
# Calculate treatment effects
treatment_effects = est.effect(X_test)

In [None]:
# Calculate default (95%) confidence intervals for the test data
te_lower, te_upper = est.effect_interval(X_test)

In [None]:
res = est.effect_inference(X_test)

In [None]:
res.summary_frame().head()

In [None]:
res.population_summary()

Similarly, we can estimate effects and obtain confidence intervals and inference results using `CausalForestDML`.

In [None]:
est2 = CausalForestDML(model_t=Lasso(alpha=lambda_reg),
                       model_y=Lasso(alpha=lambda_reg),
                       n_estimators=4000, min_samples_leaf=5,
                       max_depth=50,
                       verbose=0, random_state=123)
est2.tune(Y, T, X=X, W=W)
est2.fit(Y, T, X=X, W=W)
treatment_effects2 = est2.effect(X_test)
te_lower2, te_upper2 = est2.effect_interval(X_test, alpha=0.05)  # 95% intervals, matching the plot labels

## 1.3. Performance Visualization

In [None]:
plt.figure(figsize=(15, 5))
expected_te = np.array([exp_te(x_i) for x_i in X_test])
plt.subplot(1, 2, 1)
plt.title("DMLOrthoForest")
plt.plot(X_test[:, 0], treatment_effects, label='ORF estimate')
plt.plot(X_test[:, 0], expected_te, 'b--', label='True effect')
plt.fill_between(X_test[:, 0], te_lower, te_upper, label="95% BLB CI", alpha=0.3)
plt.ylabel("Treatment Effect")
plt.xlabel("x")
plt.legend()
plt.subplot(1, 2, 2)
plt.title("CausalForestDML")
plt.plot(X_test[:, 0], treatment_effects2, label='CausalForestDML estimate')
plt.plot(X_test[:, 0], expected_te, 'b--', label='True effect')
plt.fill_between(X_test[:, 0], te_lower2, te_upper2, label="95% BLB CI", alpha=0.3)
plt.ylabel("Treatment Effect")
plt.xlabel("x")
plt.legend()
plt.show()

# 2. Example Usage with Binary Treatment Synthetic Data

## 2.1. DGP 
We use the following DGP, implemented by `Binary_CF_DGP` below and instantiated with $\beta_{tx} = \beta_{yx} = 1$:

\begin{align}
T \sim & \text{Bernoulli}\left(f(X, W)\right), &\; f(X, W)=\sigma(\beta_{tx} X + 0.1\,\langle W, \beta\rangle + \eta), \;\eta \sim \text{Uniform}(-1, 1)\\
Y = & T\cdot \theta(X) + \beta_{yx} X + \langle W, \gamma\rangle + \epsilon, & \; \epsilon \sim \text{Uniform}(-1, 1)\\
W \sim & \text{Normal}(0,\, I_{n_w}) & \\
X \sim & \text{Uniform}(0,\, 1)
\end{align}

where $W$ is a matrix of high-dimensional confounders, $\beta, \gamma$ are sparse, and $\sigma$ is the sigmoid function. The scalar $X$ enters both the treatment (through $\beta_{tx}$) and the outcome (through $\beta_{yx}$), so it also acts as a confounder.

For this DGP, 
\begin{align}
\theta(x) = x.
\end{align}
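`DROrthoForest` builds on a doubly robust (augmented inverse propensity weighting, AIPW) score. The numpy sketch below illustrates the double-robustness property on a toy binary-treatment DGP: the average effect estimate stays close to the truth if either the outcome model or the propensity model is correct. For simplicity it drops η from inside the sigmoid so the true propensity is known in closed form, uses θ(x) = exp(2x), plugs in oracle nuisance functions, and targets the average rather than the conditional effect; all of these are assumptions for illustration, not EconML's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dr_ate(Y, T, mu0, mu1, p):
    """AIPW (doubly robust) estimate of E[Y(1) - Y(0)].

    mu0, mu1: predicted outcomes under control / treatment; p: propensity P(T=1 | W).
    """
    score1 = mu1 + T * (Y - mu1) / p
    score0 = mu0 + (1 - T) * (Y - mu0) / (1 - p)
    return np.mean(score1 - score0)

rng = np.random.default_rng(0)
n, n_w = 200_000, 5
beta = np.array([0.5, 0.5, 0.5, 0.0, 0.0])   # sparse treatment coefficients
gamma = np.array([1.0, 1.0, 1.0, 0.0, 0.0])  # sparse outcome coefficients

W = rng.normal(0, 1, size=(n, n_w))
X = rng.uniform(0, 1, size=n)
theta = np.exp(2 * X)             # heterogeneous effect theta(x) = exp(2x)
p_true = sigmoid(W @ beta)        # simplification: no eta inside the sigmoid
T = rng.binomial(1, p_true)
Y = T * theta + W @ gamma + rng.uniform(-1, 1, size=n)

true_ate = (np.exp(2) - 1) / 2    # E[exp(2X)] for X ~ Uniform(0, 1)
naive = Y[T == 1].mean() - Y[T == 0].mean()

# (a) correct propensity, deliberately wrong outcome model (predicts 0 everywhere)
zeros = np.zeros(n)
dr_wrong_outcome = dr_ate(Y, T, zeros, zeros, p_true)

# (b) correct (oracle) outcome model, deliberately wrong propensity (0.5 everywhere)
mu0 = W @ gamma
mu1 = theta + W @ gamma
dr_wrong_propensity = dr_ate(Y, T, mu0, mu1, np.full(n, 0.5))

print(f"true {true_ate:.3f}  naive {naive:.3f}  "
      f"DR(a) {dr_wrong_outcome:.3f}  DR(b) {dr_wrong_propensity:.3f}")
```

The naive difference in means is biased upward because the same columns of W raise both the chance of treatment and the outcome; both DR variants should be close to (e² − 1)/2 ≈ 3.19.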

In [None]:
# Seed for the binary-treatment DGP below (Binary_CF_DGP uses n_w=30, support_size=5 by default)
np.random.seed(1234)
n_x = 1

# ORF parameters and test data
subsample_ratio = 0.4
X_test = np.array(list(product(np.arange(0, 1, 0.01), repeat=n_x)))

In [None]:
class Binary_CF_DGP():
    """Binary-treatment DGP from Section 2.1, expressed as a DGPGraph."""

    def __init__(self, n_w=30, support_size=5, beta_tx=0, beta_yx=0):
        self.n_w = n_w
        self.beta_tx = beta_tx  # direct effect of X on the treatment log-odds
        self.beta_yx = beta_yx  # direct effect of X on the outcome
        # Sparse coefficients: only the first `support_size` controls matter
        self.coefs_T = np.zeros(n_w)
        self.coefs_T[0:support_size] = np.random.uniform(0, 1, size=support_size)

        self.coefs_Y = np.zeros(n_w)
        self.coefs_Y[0:support_size] = np.random.uniform(0, 1, size=support_size)

        def fW(n):
            return np.random.normal(0, 1, size=(n, self.n_w))

        def fX(n):
            return np.random.uniform(0, 1, size=n)

        def fT(X, W, n):
            # Bernoulli treatment with a sigmoid propensity
            log_odds = self.beta_tx * X + 0.1 * W @ self.coefs_T + np.random.uniform(-1, 1, size=n)
            p = 1 / (1 + np.exp(-log_odds))
            return np.random.binomial(1, p)

        def fY(X, W, T, n):
            TE = X  # theta(x) = x
            return TE * T + self.beta_yx * X + W @ self.coefs_Y + np.random.uniform(-1, 1, size=n)

        dgp = DGPGraph()
        dgp.add_node('W', fW)
        dgp.add_node('X', fX)
        dgp.add_node('T', fT, parents=['X', 'W'])
        dgp.add_node('Y', fY, parents=['X', 'W', 'T'])
        self.dgp = dgp

d2 = Binary_CF_DGP(beta_tx=1, beta_yx=1)
data = d2.dgp.sample(2000)
X, W, T, Y = data['X'].reshape(-1, 1), data['W'], data['T'], data['Y']

## 2.2. Train Estimator 

In [None]:
est = DROrthoForest(
    n_trees=200, min_leaf_size=10,
    max_depth=30, subsample_ratio=subsample_ratio,
    propensity_model = LogisticRegression(C=1/(X.shape[0]*lambda_reg), penalty='l1', solver='saga'),
    model_Y = Lasso(alpha=lambda_reg),
    propensity_model_final=LogisticRegression(C=1/(X.shape[0]*lambda_reg), penalty='l1', solver='saga'), 
    model_Y_final=WeightedLasso(alpha=lambda_reg)
)

In [None]:
est.fit(Y, T, X=X, W=W)
dro_ate = est.ate(X=X)

In [None]:
# Calculate treatment effects for the default treatment points T0=0 and T1=1
treatment_effects = est.effect(X_test)

In [None]:
# Calculate default (95%) confidence intervals for the default treatment points T0=0 and T1=1
te_lower, te_upper = est.effect_interval(X_test)

In [None]:
est2 = CausalForestDML(model_y=Lasso(alpha=lambda_reg),
                       model_t=LogisticRegression(C=1/(X.shape[0]*lambda_reg)),
                       n_estimators=200, min_samples_leaf=5,
                       max_depth=50, max_samples=subsample_ratio/2,
                       discrete_treatment=True,
                       random_state=123)
est2.fit(Y, T, X=X, W=W, cache_values=True)
treatment_effects2 = est2.effect(X_test)
te_lower2, te_upper2 = est2.effect_interval(X_test)
cf_ate = est2.ate(X=X)

In [None]:
#est2.summary()

In [None]:
from econml.metalearners import TLearner
from sklearn.ensemble import GradientBoostingRegressor

# T-learner baseline: one outcome model per treatment arm.
# Metalearners take only X, so the confounders W are appended to the features.
XW = np.hstack([X, W])
est3 = TLearner(models=GradientBoostingRegressor())
est3.fit(Y, T, X=XW, inference='bootstrap')  # bootstrap inference enables ate_interval
est3.ate(X=XW), est3.ate_interval(X=XW)


## 2.3. Performance Visualization

In [None]:
estimated_te = d2.dgp.cate(20000,'T','Y','X',X_test[:,0])

In [None]:
plt.figure(figsize=(15, 5))
plt.subplot(1, 2, 1)
plt.title("DROrthoForest")
plt.plot(X_test[:, 0], treatment_effects, label='ORF estimate')
plt.plot(X_test[:, 0], estimated_te, 'r--', label='True effect (simulated)')
plt.fill_between(X_test[:, 0], te_lower, te_upper, label="95% BLB CI", alpha=0.3)
plt.ylabel("Treatment Effect")
plt.xlabel("x")
plt.legend()
plt.subplot(1, 2, 2)
plt.title("CausalForestDML")
plt.plot(X_test[:, 0], treatment_effects2, label='CausalForestDML estimate')
plt.plot(X_test[:, 0], estimated_te, 'r--', label='True effect (simulated)')
plt.fill_between(X_test[:, 0], te_lower2, te_upper2, label="95% BLB CI", alpha=0.3)
plt.ylabel("Treatment Effect")
plt.xlabel("x")
plt.legend()
plt.show()

# 3. Example Usage with Multiple Treatment Synthetic Data

## 3.1. DGP 
We use the following DGP:

\begin{align}
Y = & \sum_{t=1}^{n_{\text{treatments}}} 1\{T=t\}\cdot \theta_{T}(X) + \langle W, \gamma\rangle + \epsilon, \; \epsilon \sim \text{Unif}(-1, 1), \\
\text{Pr}[T=t \mid W] \propto & \exp\{\langle W, \beta_t \rangle\}, \;\;\;\; \forall t\in \{0, 1, \ldots, n_{\text{treatments}}\} 
\end{align}

where $W$ is a matrix of high-dimensional confounders, $\beta_t, \gamma$ are sparse.

For this particular example DGP we used $n_{\text{treatments}}=3$ and 
\begin{align}
\theta_1(x) = & \exp( 2 x_1 ),\\
\theta_2(x) = &  3 \cdot \sigma(100\cdot (x_1 - .5)),\\
\theta_3(x) = & -2 \cdot \sigma(100\cdot (x_1 - .25)),
\end{align}
where $\sigma$ is the sigmoid function.
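The treatment equation above is a multinomial logit, i.e. a softmax over the scores ⟨W, β_t⟩. A minimal numpy sketch of sampling from it follows; the sizes and coefficients are made up, and `softmax` and `sample_categorical` are hypothetical helpers. Unlike `get_test_train_data`, it subtracts each row's maximum score before exponentiating, which leaves the probabilities unchanged but avoids overflow for large scores.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax; subtracting each row's max avoids overflow in exp."""
    z = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sample_categorical(probs, rng):
    """Draw one category per row by inverting each row's cumulative distribution."""
    cdf = np.cumsum(probs, axis=1)
    u = rng.random((probs.shape[0], 1))
    draws = (u > cdf).sum(axis=1)
    return np.minimum(draws, probs.shape[1] - 1)  # guard against rounding in cdf[-1]

rng = np.random.default_rng(0)
n, n_w, n_categories = 100_000, 3, 4             # control T=0 plus three treatments
W = rng.normal(size=(n, n_w))
B = rng.uniform(0, 1, size=(n_w, n_categories))  # column t plays the role of beta_t
P = softmax(W @ B)                               # P[i, t] = Pr[T = t | W_i]
T = sample_categorical(P, rng)
freq = np.bincount(T, minlength=n_categories) / n
print("mean probabilities:", P.mean(axis=0).round(3))
print("sample frequencies:", freq.round(3))
```

The empirical category frequencies should track the average softmax probabilities closely at this sample size.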

In [None]:
def get_test_train_data(n, n_w, support_size, n_x, te_func, n_treatments):
    # Outcome support
    support_Y = np.random.choice(range(n_w), size=support_size, replace=False)
    coefs_Y = np.random.uniform(0, 1, size=support_size)
    epsilon_sample = lambda n: np.random.uniform(-1, 1, size=n)
    # Treatment support 
    support_T = support_Y
    coefs_T = np.random.uniform(0, 1, size=(support_size, n_treatments))
    eta_sample = lambda n: np.random.uniform(-1, 1, size=n) 
    # Generate controls, covariates, treatments and outcomes
    W = np.random.normal(0, 1, size=(n, n_w))
    X = np.random.uniform(0, 1, size=(n, n_x))
    # Heterogeneous treatment effects
    TE = np.array([te_func(x_i, n_treatments) for x_i in X])
    log_odds = np.dot(W[:, support_T], coefs_T)
    T_sigmoid = np.exp(log_odds)
    T_sigmoid = T_sigmoid/np.sum(T_sigmoid, axis=1, keepdims=True)
    T = np.array([np.random.choice(n_treatments, p=p) for p in T_sigmoid])
    TE = np.concatenate((np.zeros((n,1)), TE), axis=1)
    Y = TE[np.arange(n), T] + np.dot(W[:, support_Y], coefs_Y) + epsilon_sample(n)
    X_test = np.array(list(product(np.arange(0, 1, 0.01), repeat=n_x)))

    return (Y, T, X, W), (X_test, np.array([te_func(x, n_treatments) for x in X_test]))

In [None]:
import scipy.special
def te_func(x, n_treatments):
    return [np.exp(2*x[0]), 3*scipy.special.expit(100*(x[0] - .5)) - 1, -2*scipy.special.expit(100*(x[0] - .25))]

np.random.seed(123)
(Y, T, X, W), (X_test, te_test) = get_test_train_data(2000, 3, 3, 1, te_func, 4)

## 3.2. Train Estimator

In [None]:
est = DROrthoForest(n_trees=500, model_Y = WeightedLasso(alpha=lambda_reg))

In [None]:
est.fit(Y, T, X=X, W=W)

In [None]:
# Calculate marginal treatment effects
treatment_effects = est.const_marginal_effect(X_test)

In [None]:
# Calculate default (95%) marginal confidence intervals for the test data
te_lower, te_upper = est.const_marginal_effect_interval(X_test)

In [None]:
res = est.const_marginal_effect_inference(X_test)

In [None]:
res.summary_frame()

In [None]:
est2 = CausalForestDML(model_y=Lasso(alpha=lambda_reg),
                       model_t=LogisticRegression(C=1/(X.shape[0]*lambda_reg)),
                       n_estimators=4000, min_samples_leaf=5,
                       max_depth=50, max_samples=subsample_ratio/2,
                       discrete_treatment=True,
                       random_state=123)
est2.fit(Y, T, X=X, W=W)
treatment_effects2 = est2.const_marginal_effect(X_test)
te_lower2, te_upper2 = est2.const_marginal_effect_interval(X_test, alpha=.01)

## 3.3. Performance Visualization

In [None]:
plt.figure(figsize=(15, 5))
plt.subplot(1, 2, 1)
plt.title("DROrthoForest")
y = treatment_effects
# Column `it` holds the effect of treatment it + 1 relative to the control T = 0
for it in range(y.shape[1]):
    color = 'C{}'.format(it)
    plt.plot(X_test[:, 0], te_test[:, it], '--', label='True effect T={}'.format(it + 1), color=color)
    plt.fill_between(X_test[:, 0], te_lower[:, it], te_upper[:, it], alpha=0.3, color=color)
    plt.plot(X_test[:, 0], y[:, it], label='ORF estimate T={}'.format(it + 1), color=color)
plt.ylabel("Treatment Effect")
plt.xlabel("x")
plt.legend()
plt.subplot(1, 2, 2)
plt.title("CausalForestDML")
y = treatment_effects2
for it in range(y.shape[1]):
    color = 'C{}'.format(it)
    plt.plot(X_test[:, 0], te_test[:, it], '--', label='True effect T={}'.format(it + 1), color=color)
    plt.fill_between(X_test[:, 0], te_lower2[:, it], te_upper2[:, it], alpha=0.3, color=color)
    plt.plot(X_test[:, 0], y[:, it], label='CausalForestDML estimate T={}'.format(it + 1), color=color)
plt.ylabel("Treatment Effect")
plt.xlabel("x")
plt.legend()
plt.show()

# 4. Example Usage with Real Continuous Treatment Observational Data

We applied our technique to Dominick’s dataset, a popular historical dataset of store-level orange juice prices and sales provided by the University of Chicago Booth School of Business. 

The dataset comprises a large number of covariates $W$, but researchers might only be interested in learning the elasticity of demand as a function of a few variables $x$, such as income or education. 

We apply `DMLOrthoForest` (and, for comparison, `CausalForestDML`) to estimate orange juice price elasticity as a function of income. The estimates suggest that lower-income consumers are more price-sensitive.
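Because the outcome is log units sold (`logmove`) and the treatment is log price, the estimated effect is a price elasticity: holding other factors fixed, a 1% price increase changes quantity sold by roughly θ%. For larger price changes, a constant log-log elasticity implies a quantity change of (1 + Δp)^θ − 1 rather than θ·Δp. The sketch below uses a hypothetical elasticity of −3, chosen for illustration rather than taken from the results, and `pct_change_in_quantity` is a hypothetical helper.

```python
def pct_change_in_quantity(elasticity, pct_price_change):
    """Percent change in quantity implied by a constant log-log elasticity."""
    price_ratio = 1 + pct_price_change / 100
    return 100 * (price_ratio ** elasticity - 1)

exact = pct_change_in_quantity(-3.0, 10.0)  # about -24.9%
approx = -3.0 * 10.0                        # first-order approximation: -30%
print(f"exact {exact:.1f}% vs linear approximation {approx:.1f}%")
```

The linear reading θ·Δp is accurate for small price changes and overstates the response as the change grows.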

## 4.1. Data

In [None]:
# A few more imports
import os
import pandas as pd
import urllib.request
from sklearn.preprocessing import StandardScaler

In [None]:
# Import the data
file_name = "oj_large.csv"

if not os.path.isfile(file_name):
    print("Downloading file (this might take a few seconds)...")
    urllib.request.urlretrieve("https://msalicedatapublic.blob.core.windows.net/datasets/OrangeJuice/oj_large.csv", file_name)
oj_data = pd.read_csv(file_name)
oj_data.head()

In [None]:
# Prepare data
Y = oj_data['logmove'].values
T = np.log(oj_data["price"]).values
scaler = StandardScaler()
W1 = scaler.fit_transform(oj_data[[c for c in oj_data.columns if c not in ['price', 'logmove', 'brand', 'week', 'store']]].values)
W2 = pd.get_dummies(oj_data[['brand']]).values
W = np.concatenate([W1, W2], axis=1)
X = oj_data[['INCOME']].values

## 4.2. Train Estimator

In [None]:
# Define some parameters
n_trees = 1000
min_leaf_size = 50
max_depth = 20
subsample_ratio = 0.04

In [None]:
est = DMLOrthoForest(
        n_trees=n_trees, min_leaf_size=min_leaf_size, max_depth=max_depth, 
        subsample_ratio=subsample_ratio,
        model_T=Lasso(alpha=0.1),
        model_Y=Lasso(alpha=0.1),
        model_T_final=WeightedLassoCVWrapper(cv=3), 
        model_Y_final=WeightedLassoCVWrapper(cv=3)
       )

In [None]:
est.fit(Y, T, X=X, W=W)

In [None]:
min_income = 10.0 
max_income = 11.1
delta = (max_income - min_income) / 100
X_test = np.arange(min_income, max_income + delta - 0.001, delta).reshape(-1, 1)

In [None]:
# Calculate marginal treatment effects
treatment_effects = est.const_marginal_effect(X_test)

In [None]:
# Calculate default (95%) marginal confidence intervals for the test data
te_lower, te_upper = est.const_marginal_effect_interval(X_test)

In [None]:
est2 = CausalForestDML(model_y=WeightedLassoCVWrapper(cv=3),
                       model_t=WeightedLassoCVWrapper(cv=3),
                       n_estimators=n_trees, min_samples_leaf=min_leaf_size, max_depth=max_depth,
                       max_samples=subsample_ratio/2,
                       random_state=123)
est2.fit(Y, T, X=X, W=W)
treatment_effects2 = est2.effect(X_test)
te_lower2, te_upper2 = est2.effect_interval(X_test)

## 4.3. Performance Visualization

In [None]:
# Plot Orange Juice elasticity as a function of income
plt.figure(figsize=(15, 5))
plt.subplot(1, 2, 1)
plt.plot(X_test.flatten(), treatment_effects, label="OJ Elasticity")
plt.fill_between(X_test.flatten(), te_lower, te_upper, label="95% BLB CI", alpha=0.3)
plt.xlabel(r'$\log$(Income)')
plt.ylabel('Orange Juice Elasticity')
plt.legend()
plt.title("Orange Juice Elasticity vs Income: DMLOrthoForest")
plt.subplot(1, 2, 2)
plt.plot(X_test.flatten(), treatment_effects2, label="OJ Elasticity")
plt.fill_between(X_test.flatten(), te_lower2, te_upper2, label="95% BLB CI", alpha=0.3)
plt.xlabel(r'$\log$(Income)')
plt.ylabel('Orange Juice Elasticity')
plt.legend()
plt.title("Orange Juice Elasticity vs Income: CausalForestDML")
plt.show()