# (Hainmueller, 2012) Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies

This paper proposes *entropy balancing*, a data preprocessing method to achieve covariate balance in observational studies with binary treatments.

Entropy balancing relies on a maximum entropy reweighting scheme that calibrates unit weights so that the reweighted treatment and control group satisfy a potentially large set of prespecified balance conditions that incorporate information about known sample moments (first, second, and possibly higher moments).

These balance improvements can reduce model dependence for the subsequent estimation of treatment effects. Entropy balancing also obviates the need for continual balance checking and iterative searching over propensity score models that may stochastically balance the covariate moments.

## 1. Introduction

One important concern is that many commonly used preprocessing approaches do not directly focus on the goal of producing covariate balance.

In contrast, entropy balancing involves a reweighting scheme that directly incorporates covariate balance into the weight function that is applied to the sample units.

The author claims that there are 4 advantages to using entropy balancing:
1. Most importantly, it allows a high degree of covariate balance by balancing the first, second, and possibly higher moments of the covariate distributions as well as interactions.
2. Retains valuable information in the preprocessed data by allowing the unit weights to vary smoothly across units, in contrast to methods like NNM where units are discarded.
3. Fairly versatile. The resulting weights can be used for a simple weighted difference in means, a weighted OLS, etc.
4. Computationally attractive since the optimization problem to find the unit weights is well behaved and globally convex.

This paper borrows methods from the survey, moments estimation, empirical likelihood, exponential tilting, and missing data literatures.

## 2. Observational Study with Binary Treatments

### 2.1 Framework

A brief on the potential outcomes framework.

### 2.2 Achieving Balance with Matching and Propensity Score Methods

The cycle of matching, assessing balance, and re-matching is sometimes referred to as the "propensity score tautology" and has been criticized. This iterative process can be tedious and frequently results in low balance levels.

One way to improve the search for a better balancing score is to replace the logistic regression with a better estimation technique for the assignment mechanism such as boosted regression or kernel regression. Entropy balancing takes a different approach and directly focuses on covariate balance.

## 3. Entropy Balancing

A preprocessing procedure that allows researchers to create balanced samples for the subsequent estimation of treatment effects. The balance constraints ensure that the reweighted groups match exactly on the specified moments.

### 3.1 Entropy Balancing Scheme

For convenience, suppose that the researcher's goal is to reweight the control group to match the moments of the treatment group in order to subsequently estimate

$$\text{PATT}=\tau=E[Y(1)|D=1]-E[Y(0)|D=1]$$

using the difference in mean outcomes between the treatment group and the reweighted control group. In this case, the counterfactual mean may be estimated by

$$\hat{E}[Y(0)|D=1]=\frac{\sum_{\{i|D=0\}}Y_iw_i}{\sum_{\{i|D=0\}}w_i}$$

where $w_i$ is a weight chosen for each control unit. The weights are chosen by the following reweighting scheme

$$\min_{w_i}H(w)=\sum_{\{i|D=0\}}h(w_i)$$

subject to balance and normalizing constraints

$$\sum_{\{i|D=0\}}w_ic_{ri}(X_i)=m_r$$ with $r\in 1,...,R$,

$$\sum_{\{i|D=0\}}w_i=1$$ and

$$w_i > 0$$ for all $i$ such that $D=0$,

where $h(\cdot)$ is a distance metric and $c_{ri}(X_i)=m_r$ describes a set of $R$ balance constraints imposed on the covariate moments of the reweighted control group.
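
As a concrete reading of the balance constraints, matching the first and second moments of a single covariate $X_j$ would use two constraints of the following form. The count $n_1$ of treated units and the index $r'$ are notation assumed here for illustration:

$$c_{ri}(X_i)=X_{ij},\qquad m_r=\frac{1}{n_1}\sum_{\{i|D=1\}}X_{ij}$$

$$c_{r'i}(X_i)=X_{ij}^2,\qquad m_{r'}=\frac{1}{n_1}\sum_{\{i|D=1\}}X_{ij}^2$$

Under these two constraints the reweighted control group reproduces the treated mean of $X_j$ and the treated mean of $X_j^2$, and therefore also the treated variance of $X_j$.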

The loss function $h(\cdot)$ is a distance metric chosen from the general class of empirical minimum discrepancy estimators (Read and Cressie 1988). The paper uses Kullback (1959)'s *entropy* divergence, $h(w_i)=w_i\log (w_i/q_i)$ with estimated weight $w_i$ and base weight $q_i$, which is presumably where the term "entropy balancing" comes from.

The entropy balancing scheme can be understood as a generalization of the conventional propensity score weighting approach where the researcher first estimates the unit weights with a logistic regression and then computes balance checks to see if the estimated weights indeed equalize the covariate distributions.
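
For comparison, the conventional propensity score weight for a control unit when targeting the PATT is the estimated odds of treatment. The symbols $d_i$ and $\hat{p}$ are notation assumed here:

$$d_i=\frac{\hat{p}(X_i)}{1-\hat{p}(X_i)}$$

Nothing in this formula forces the weighted control moments to equal the treated moments, which is why balance has to be checked after the fact. Entropy balancing instead imposes the moment conditions as constraints when the weights are computed.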

In many empirical cases, we would expect the bulk of the confounding to depend on the first and second moments.

### 3.2 Implementation

Some math regarding Lagrange multipliers that I did not take the time to understand.
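
For reference, a minimal sketch of where that math leads. With the entropy loss, the optimal weights take the exponential-tilting form $w_i \propto q_i\exp(-\sum_r \lambda_r c_{ri}(X_i))$, so only the $R$ multipliers $\lambda_r$ need to be found, by minimizing a convex dual function. The code below is an illustration under assumptions, not the paper's implementation: it uses plain Newton steps with step halving, uniform base weights by default, made-up toy data, and hypothetical helper names (`entropy_balance`, `tilt_weights`, `dual_loss`, `solve_linear`).

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination (assumes a is non-singular)."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[k][n] / aug[k][k] for k in range(n)]

def scores(c, lam):
    """Exponent -sum_r lam_r * c_ri for every control unit i."""
    return [-sum(l * x for l, x in zip(lam, row)) for row in c]

def tilt_weights(c, lam, q):
    """Weights proportional to q_i * exp(score_i), normalised to sum to one."""
    s = scores(c, lam)
    top = max(s)  # subtract the max before exponentiating, for stability
    raw = [qi * math.exp(si - top) for qi, si in zip(q, s)]
    total = sum(raw)
    return [r / total for r in raw]

def dual_loss(c, lam, q):
    """Dual objective log(sum_i q_i * exp(score_i)) for target-centred moments."""
    s = scores(c, lam)
    top = max(s)
    return top + math.log(sum(qi * math.exp(si - top) for qi, si in zip(q, s)))

def entropy_balance(moments, targets, q=None, tol=1e-8, max_iter=100):
    """Find control weights whose weighted moments equal `targets`."""
    n, r_dim = len(moments), len(targets)
    q = q if q is not None else [1.0 / n] * n
    # Centre every moment function at its target, so balance means a zero gap.
    c = [[x - m for x, m in zip(row, targets)] for row in moments]
    lam = [0.0] * r_dim
    for _ in range(max_iter):
        w = tilt_weights(c, lam, q)
        gap = [sum(wi * row[r] for wi, row in zip(w, c)) for r in range(r_dim)]
        if max(abs(g) for g in gap) < tol:
            break
        # The dual gradient is -gap; its Hessian is the weighted covariance of c.
        hess = [[sum(wi * row[r] * row[s] for wi, row in zip(w, c)) - gap[r] * gap[s]
                 for s in range(r_dim)] for r in range(r_dim)]
        step = solve_linear(hess, gap)  # Newton direction
        base, t = dual_loss(c, lam, q), 1.0
        while True:  # halve the step until the dual objective stops increasing
            trial = [l + t * d for l, d in zip(lam, step)]
            if dual_loss(c, trial, q) <= base + 1e-12 or t < 1e-8:
                break
            t /= 2
        lam = trial
    return tilt_weights(c, lam, q)

# Toy data, made up for illustration: balance the mean and second moment of x.
x_control = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x_treated = [3.0, 4.0, 5.0, 5.0, 6.0]
moments = [[x, x * x] for x in x_control]
targets = [sum(x_treated) / len(x_treated),
           sum(x * x for x in x_treated) / len(x_treated)]
weights = entropy_balance(moments, targets)
```

The returned weights are positive by construction and sum to one, so the positivity and normalization constraints never need to be enforced explicitly; only the balance constraints drive the iteration.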

### 3.3 Alternative Base Weights

Same as 3.2.

### 3.4 Estimation in the Preprocessed Data

The outcome model can be any of the traditionally used models such as outcome regression. Note that the outcome model can further address the correlation between the outcome and covariates in the weighted data and also provide variance estimates for the treatment effects. In addition, such regression models may include covariates or interactions that are not directly included in the reweighting to remove bias that may arise from remaining differences between the treatment and the reweighted control group. The outcome model may also increase precision if the additional variables in the outcome model account for residual variation in the outcome of interest.
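
A small sketch of the two simplest outcome models, assuming weights are already available. The numbers and the weights are placeholders made up for illustration, and the helper names (`weighted_mean`, `weighted_simple_ols`) are hypothetical. It shows that a weighted regression of the outcome on the treatment indicator alone reproduces the weighted difference in means.

```python
def weighted_mean(values, weights):
    """Weighted average sum(w * v) / sum(w)."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def weighted_simple_ols(x, y, w):
    """Return (intercept, slope) minimising sum_i w_i * (y_i - a - b * x_i)^2."""
    x_bar = weighted_mean(x, w)
    y_bar = weighted_mean(y, w)
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Placeholder data: outcomes for treated and control units.
y_treated = [5.0, 7.0]
y_control = [2.0, 4.0, 6.0]
w_control = [0.2, 0.3, 0.5]  # placeholder weights, not the output of a real fit

# Weighted difference in means: treated mean minus reweighted control mean.
diff_in_means = sum(y_treated) / len(y_treated) - weighted_mean(y_control, w_control)

# Weighted regression of Y on the treatment indicator D (treated units get weight 1).
d = [1.0] * len(y_treated) + [0.0] * len(y_control)
y = y_treated + y_control
w = [1.0] * len(y_treated) + w_control
intercept, slope = weighted_simple_ols(d, y, w)
```

Here `slope` equals `diff_in_means` and `intercept` equals the reweighted control mean. Adding covariates to the regression is what lets the outcome model adjust for remaining differences between the groups.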

### 3.5 Entropy Balancing and Other Preprocessing Methods

Entropy balancing shares a similarity with genetic matching as it also directly focuses on covariate balance.

Entropy balancing is also related to coarsened exact matching (CEM) in that covariate balance is specified before the preprocessing adjustment, but the two differ in important ways: CEM discards units while EB does not.

### 3.6 Potential Limitations

Some issues with entropy balancing.

A solution may not exist: with severely limited overlap, for example a treatment group that is 1% male and a control group that is 99% male, there may be no set of positive weights that satisfies the balance constraints. Of course, the challenge of finding good matches under limited overlap is shared by all matching methods.

Even when a solution exists, limited overlap can force an extreme adjustment to the weights of some control units: a few control units receive very large weights because they carry most of the information about the counterfactual of interest. Large weights increase the variance of the subsequent analysis. In this case, the weights can be refined by trimming those considered too large.
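
A toy sketch of both problems for a single binary covariate, under stated assumptions. With one mean constraint, positive weights summing to one can only reach a target that lies strictly between the smallest and largest control value. When the target is reachable, the tilting weights are equal within each covariate group, which the hypothetical helper `binary_balance_weights` exploits. Kish's effective sample size is used as the variance diagnostic; that diagnostic is an addition for illustration and is not taken from the paper.

```python
def mean_is_reachable(control_values, target):
    """True if strictly positive weights summing to one can hit `target`."""
    lo, hi = min(control_values), max(control_values)
    return lo < target < hi

def binary_balance_weights(control_flags, target_share):
    """Weights giving the flag=1 group total weight `target_share`, equal within group."""
    n_one = sum(control_flags)
    n_zero = len(control_flags) - n_one
    return [target_share / n_one if flag else (1 - target_share) / n_zero
            for flag in control_flags]

def effective_sample_size(weights):
    """Kish's approximation: (sum w)^2 / sum w^2."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Hypothetical control group: 99 men and 1 woman; the treated group is 1% male.
control_male = [1] * 99 + [0]
reachable = mean_is_reachable(control_male, 0.01)

# A control group with no women at all cannot be reweighted to 1% male.
unreachable = mean_is_reachable([1] * 100, 0.01)

w = binary_balance_weights(control_male, target_share=0.01)
ess = effective_sample_size(w)
```

In this toy case the single woman receives a weight of 0.99 and the effective sample size is about 1.02 out of 100 controls, which is the kind of extreme solution that motivates trimming.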

### 3.7 Weight Refinements

This may be conceptually similar to stabilized weights in other weighting methods.

## 4. Monte Carlo Simulations

Compares Naive Difference in means, PSM, MD, GM, PSMD, PSW, and EB (entropy balancing).

### 4.1 Design

Hainmueller's simulation study reports that EB had the lowest root MSE across the 3 cases of:
- equal variance
- unequal variance
- irrelevant covariates



### 4.2 Results

The paper reports that EB performed best. Apparently, in larger samples all methods did well.

## 5. Empirical Applications

### 5.1 The LaLonde Data

A data set commonly used as a canonical benchmark in the causal inference literature.

### 5.2 News Media Persuasion

A typical political science survey data set.

## 6. Conclusion

While EB simplifies the search for covariate balance for practitioners, it is important to note that other problems commonly associated with preprocessing methods still apply. In particular, it does not provide any safeguard against bias from unmeasured confounders, a vexing problem in observational studies.