# Oui-Love Plots: Outcome-informed Love plots for covariate balance in causal inference

Ehud Karavani [](https://orcid.org/0000-0002-0187-5437) (IBM Research, Israel)  
April 20, 2024

TODO:

* convert language
* citations
* discussion
* OUI-ASMD score.

## Abstract

## Introduction

Covariate balance is an essential concept for causal inference from observational studies. Generally, balance diagnostics assess the difference in covariate distribution between exposure levels. Intuitively, if the covariate distribution is the same across exposure groups, there will be no systematic bias in exposure, and differences in outcomes will be due only to the exposure status. Therefore, balance is the most common diagnostic for assessing balancing methods, like inverse propensity weighting (IPW) or matching \[cite that review\].

Ideally, covariate balance is assessed over *confounding variables*. These are variables that affect both the exposure and the outcome; failing to adjust for them can introduce bias when examining the influence of the exposure on the outcome. <!-- Usually, we would like to examine how pre-adjustment confounders differ across exposure groups, and whether adjustment (like IPW or Matching) was able to reduce those differences, effectively eliminating their contribution on the outcome, and therefore isolating the effect of exposure. --> Adjusting for these confounders reduces their difference across exposure groups, effectively equalizing their influence on the outcome across exposure groups and thus enabling the model to isolate the effect of the exposure.

Whether a variable is a confounder cannot be determined from the data itself. Therefore, the most common approach to selecting confounders for an adjustment set <!-- (and more generally, how any pair of variables interact)  --> is to let a domain expert hand-pick them. The structure arising from this specification is often formulated and presented as a Directed Acyclic Graph (DAG), where variables are depicted as nodes and two nodes are connected by an arrow if one causally influences the other \[cite DAG intro\]. <!-- Since This formulation of the structure of the problem, often through a Directed Acyclic Graph (DAG),  --> When specifying how any pair of variables interacts, the exclusion of an edge (assuming there is no association whatsoever) is a much stronger assumption than its inclusion (allowing the model to infer zero association from the data). Thus, when determining the structure, modellers may prefer to err on the side of including edges rather than excluding them.

Most importantly, since the structure, and therefore the confounders, are determined a-priori, based on prior knowledge rather than data, we denote them *putative confounders*. However, an a-priori confounder may not necessarily be an a-posteriori confounder. Namely, the prior assumption that a variable is associated with both exposure and outcome may not manifest in the data at hand. This is quite plausible, even in the absence of finite sampling errors, since we often build DAGs one pair at a time, failing to grasp how many factors may interact to determine the exposure or outcome conditioned on all the other factors. For instance, a factor we assumed is considered when prescribing medicine may not be used, or we can fail to understand the mechanism under which a certain factor conditionally explains the outcome. Both will result in the factor not being statistically associated with the exposure or the outcome, respectively (or both).

Covariate balance assessments only capture covariate-exposure associations. This can be insufficient. For example, such a putative confounder may heavily influence differential exposure, leading to high imbalance, but have no impact on the outcome (these may be referred to as instrumental variables). In such cases, it will be unnecessary (or even harmful \[cite z-bias\]) to balance over that putative confounder. However, commonly used diagnostics like the (Absolute) Standardized Mean Difference (ASMD) \[cite\] and the corresponding Love plot \[cite\] will fail to capture that and may mislead the researcher to focus on where they should not.

To overcome that, information about the covariate-outcome association should be incorporated to provide a fuller picture. Fortunately, assessing the statistical importance of a variable for an outcome is a well-known problem in regression modeling and machine learning \[cite one harrell one ML book\]. Assessing covariate-outcome importance provides additional information, orthogonal to the covariate-exposure balance assessment, that allows us to, literally speaking, paint a fuller picture of confounder imbalance.

Our contribution in this manuscript is twofold. First, we will provide visual augmentation to the known Love plot based on both standardized mean difference and covariate-outcome importance. Second, we will combine both measures together to suggest better metrics for covariate/model selection. \[TODO: isn’t high-dimensional propensity score?\].

## Outcome-informed Love plot

``` python
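import matplotlib as mpl
import matplotlib.pyplot  # makes `mpl.pyplot` available

# `ouilove_plot` and `plot_data` are assumed to be defined earlier in the notebook.
# Left: standard Love plot; right: all outcome-informed encodings switched on.
fig, axes = mpl.pyplot.subplot_mosaic("AE", figsize=(7, 3), sharex=True)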
p = ouilove_plot(
    plot_data, 
    plot_range=True,
    order_by_importance=False,
    opacity=False,
    pointsize=False,
    importance_metric="mse",
    legend=False,
    threshold=0.1, ax=axes["A"],
)
axes["A"].set_title("Love plot (standard)")

p = ouilove_plot(
    plot_data, 
    plot_range=True,
    order_by_importance=True,
    opacity=True,
    pointsize=True,
    importance_metric="mse",
    legend=True,
    threshold=0.1, ax=axes["E"],
)
axes["E"].set_title("Outcome-informed Love plot")
fig.tight_layout()
# fig
```

![](attachment:index_files/figure-ipynb/notebooks-ouilove_plot-fig-love-oui-love-output-1.svg)

Data visualization allows us to encode numeric data values as visual elements. When we “visualize data” we essentially map data values to graphical elements. Good, consistent visualization will map different dimensions of the data to different encoding channel types using appropriate graphical marks.

For instance, a classic Love plot \[cite\] (@fig-love-oui-love) maps the covariates to the y-axis, so each covariate gets its own row; the (absolute) standardized mean difference (ASMD) of each covariate to the x-axis; and the type of the model (e.g., weighted/unweighted) to the color (blue/orange) and possibly also to different markers (circle/triangle). All in all, we map three data dimensions (covariates, their ASMD, and adjustment model) to three graphical channels (y-axis, x-axis, and color), possibly with a fourth channel of marker shape for emphasis.

In the outcome-informed Love plot, we calculate an additional data dimension: the impact of each covariate on the outcome (“feature importance”). The covariate-outcome importance is a non-negative score positively associated with importance. Namely, a low score indicates relatively small importance (a change in the covariate levels is associated with only small changes in the outcome), and a high score indicates relatively high importance (a change in the covariate levels is associated with large changes in the outcome). There are several approaches to calculating feature importance, which we describe in more detail in the Methods \[cross ref\].

To augment the traditional Love plot, the additional dimension for the covariate-outcome importance score should be mapped to one or more visual channels. In this work we suggest three candidate channels \[@fig-oui-encoding\], which can be combined arbitrarily:

1.  The opacity channel. Marks corresponding to more important covariates are more opaque, while less important marks are more transparent.
2.  The size channel. Marks corresponding to more important covariates are larger, while less important marks are smaller.
3.  The order of the y-axis. Covariates are ranked by their importance with more important covariates appearing on top.

The common property of options (1) and (2) is that less important covariates appear less prominent, being either smaller or more transparent. The argument is that if they do not influence the outcome, they will not bias the estimation, and are therefore less interesting to examine. If they are less interesting to examine, there is less need for them to stand out, and they can therefore be less salient. This reduces clutter and allows the viewer to focus on the more important (and thus more prominent) covariates. Meanwhile, option (3) clusters more important covariates in a specific region of the plot, but breaks the convention of ordering covariates by the unadjusted ASMD that may be familiar to practitioners. All options achieve a similar objective of directing differential attention to more important covariates, either by differential prominence (transparency and size) or by differential spatial location (order). See Figure \[cross ref\] for details.

``` python
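import matplotlib as mpl
import matplotlib.pyplot  # makes `mpl.pyplot` available

# `ouilove_plot` and `plot_data` are assumed to be defined earlier in the notebook.
# One panel per visual channel: B = opacity, C = size, D = order.
fig, axes = mpl.pyplot.subplot_mosaic("BCD", figsize=(8, 3))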
p = ouilove_plot(
    plot_data, 
    plot_range=True,
    order_by_importance=False,
    opacity=True,
    pointsize=False,
    importance_metric="mse",
    legend=False,
    threshold=0.1, ax=axes["B"],
)
axes["B"].set_title("Opacity")
p = ouilove_plot(
    plot_data, 
    plot_range=True,
    order_by_importance=True,
    opacity=False,
    pointsize=False,
    importance_metric="mse",
    legend=True,
    threshold=0.1, ax=axes["D"],
)
axes["D"].set_title("Order")
p = ouilove_plot(
    plot_data, 
    plot_range=True,
    order_by_importance=False,
    opacity=False,
    pointsize=True,
    importance_metric="mse",
    legend=False,
    threshold=0.1, ax=axes["C"],
)
axes["C"].set_title("Size")
axes["B"].set_xlabel(None)
axes["D"].set_xlabel(None)
# fig.suptitle(
#     "Visual channels to incorporate outcome information",
# )
fig.tight_layout()
# fig
```

![](attachment:index_files/figure-ipynb/notebooks-ouilove_plot-fig-oui-encoding-output-1.svg)
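The three channels described above can be sketched without reference to the internals of `ouilove_plot`, which are not shown here. The following is a minimal illustration only: the helper names `encode_importance` and `sketch_oui_love`, the linear rescaling, and the default opacity and size ranges are all assumptions made for this example, not the implementation used to produce the figures.

```python
import numpy as np


def encode_importance(importance, min_alpha=0.2, min_size=20.0, max_size=200.0):
    """Map non-negative importance scores to opacity, marker size, and row order.

    Hypothetical helper: the linear rescaling and default ranges are assumptions.
    """
    importance = np.asarray(importance, dtype=float)
    top = importance.max()
    # Rescale to [0, 1]; if every score is zero, treat all covariates alike.
    scaled = importance / top if top > 0 else np.zeros_like(importance)
    alpha = min_alpha + (1.0 - min_alpha) * scaled    # channel 1: opacity
    size = min_size + (max_size - min_size) * scaled  # channel 2: marker area
    order = np.argsort(importance)                    # channel 3: least important first
    return alpha, size, order


def sketch_oui_love(ax, names, asmd, importance):
    """Draw one series of ASMD marks on a matplotlib Axes using all three channels."""
    alpha, size, order = encode_importance(importance)
    rows = np.arange(len(order))
    # The y-axis grows upward, so the last (most important) row ends up on top.
    # Per-point `alpha` arrays in `scatter` require matplotlib 3.4 or later.
    ax.scatter(np.asarray(asmd)[order], rows, s=size[order], alpha=alpha[order])
    ax.set_yticks(rows)
    ax.set_yticklabels([names[i] for i in order])
    ax.set_xlabel("ASMD")
    return ax
```

Given an existing matplotlib `Axes` object `ax`, calling `sketch_oui_love(ax, names, asmd, importance)` once per adjustment model (e.g., unweighted and weighted) would overlay the two series. Keeping a minimum opacity and size above zero is a deliberate choice in this sketch, so that unimportant covariates fade without disappearing entirely.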

## Outcome-informed balance metrics

Oftentimes, we use the ASMD for model selection. For instance, the maximum ASMD after adjustment (e.g., weighting or matching) across all covariates can be considered a good summary of the Love plot. The maximum post-adjustment ASMD describes the worst-case scenario for imbalance. If our model can keep the max ASMD, and therefore the ASMD of all covariates, under a reasonable tolerance (a threshold of 0.1 is arbitrary but commonly used), then we can gain further trust in the downstream effect being estimated. Once we have a single numeric metric that diagnoses model performance, we can use it to choose between two (or more) candidate models, choosing the one with the minimal post-adjustment max ASMD.

However, as argued above, ASMD alone can be a poor diagnostic. If the covariate with max ASMD has little conditional association with the outcome, there is little benefit in making the effort to improve its balancing. That covariate should not be part of the model’s objective. In fact, that covariate creates a distorted image of the desired confounder balance.

One possible solution is combining the ASMD with the covariate importance measure into a single-number metric. In this manuscript, we argue for the multiplication of the two. Multiplication can assess the interaction of the two complementary, orthogonal measures. Specifically, it addresses the issue depicted above naturally, by allowing the two measures to cancel each other out. For example, small covariate-outcome importance will lead to a small score overall, regardless of how large or small the corresponding ASMD is, <!-- and similarly for small ASMD.  --> and vice versa. <!-- This answers directly the desiderata --> High-scoring covariates will, therefore, only be those with both large ASMD and large covariate-outcome importance, meaning strong covariate-exposure association and strong covariate-outcome association, which is exactly the definition of a confounder.
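The multiplication argued for above amounts to an element-wise product, which a small numeric sketch can make concrete. The function name `oui_asmd_score` and the four pairs of values below are hypothetical, chosen only to show how the product re-ranks covariates relative to the ASMD alone.

```python
import numpy as np


def oui_asmd_score(asmd, importance):
    """Per-covariate product of imbalance (ASMD) and covariate-outcome importance.

    Hypothetical helper illustrating the multiplication described in the text.
    """
    asmd = np.asarray(asmd, dtype=float)
    importance = np.asarray(importance, dtype=float)
    if asmd.shape != importance.shape:
        raise ValueError("asmd and importance must have one entry per covariate")
    if (asmd < 0).any() or (importance < 0).any():
        raise ValueError("both measures are expected to be non-negative")
    return asmd * importance


# Made-up values for four covariates, in this order:
# no association, exposure only, outcome only, true confounder.
asmd = np.array([0.02, 0.60, 0.03, 0.55])
importance = np.array([0.00, 0.01, 0.90, 1.00])

scores = oui_asmd_score(asmd, importance)
# The confounder (index 3) ranks first, although the exposure-only
# covariate (index 1) has the largest ASMD.
worst = int(np.argmax(scores))
```

Note that the scale of the product depends on the scale of the chosen importance measure, so a tolerance such as 0.1 for the ASMD does not carry over to the combined score without some normalization.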

## Methods

### Covariate balance measures

The task of assessing covariate balance is essentially a two-sample test between the exposed and the unexposed. Since two-sample tests often do not scale to many covariates, making the comparison of two multivariable distributions ill-defined, researchers resort to comparing multiple univariable distributions by examining each covariate separately. For balance assessment in causal inference modeling, the most common metric is the standardized mean difference (SMD). The SMD is the difference in covariate averages divided by a pooled standard deviation. Mathematically, for each covariate $j$ we define: $$
SMD_j = \frac{\bar{x}_j \vert_{A=1}- \bar{x}_j\vert_{A=0}}{\sqrt{\hat{\sigma}_j^2\vert_{A=1} + \hat{\sigma}_j^2\vert_{A=0}}}
$$ where $\bar{x}_j \vert_{A=1}$ is the average of feature $x_j$ among the exposed, and $\hat{\sigma}_j^2\vert_{A=0}$ is the estimated variance of $x_j$ among the unexposed. Furthermore, since the direction of the imbalance is immaterial, we take the absolute value and denote $ASMD_j = \left\vert SMD_j \right\vert$.
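As a concrete reading of the definition above, the following sketch computes the ASMD of every covariate at once, with the denominator taken as the square root of the summed group variances, exactly as the formula is written. The function name `asmd`, the optional `weights` argument (for post-weighting balance), and the use of the plain weighted variance are assumptions of this example.

```python
import numpy as np


def asmd(X, a, weights=None):
    """Absolute standardized mean difference for each column of X.

    X: (n, p) covariate matrix; a: (n,) binary exposure; weights: optional (n,)
    sample weights (e.g., IPW). Uses weighted means and (biased) weighted variances.
    A covariate that is constant in both groups yields 0/0, i.e. NaN.
    """
    X = np.asarray(X, dtype=float)
    a = np.asarray(a).astype(bool)
    w = np.ones(len(a)) if weights is None else np.asarray(weights, dtype=float)

    def moments(rows):
        mean = np.average(X[rows], axis=0, weights=w[rows])
        var = np.average((X[rows] - mean) ** 2, axis=0, weights=w[rows])
        return mean, var

    mean1, var1 = moments(a)
    mean0, var0 = moments(~a)
    return np.abs(mean1 - mean0) / np.sqrt(var1 + var0)
```

Under these assumptions, `asmd(X, a)` gives the unadjusted values and `asmd(X, a, weights=w)` the adjusted ones; taking `.max()` of the latter gives a max post-adjustment ASMD summary.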

### Covariate importance measures

Assessing the influence of covariates on an outcome is a well-established task in statistics, often used for dimensionality reduction (feature selection) or model selection. There are multiple approaches to computing this importance: regression models can use the absolute magnitude of coefficients (assuming the input is standardized), or the non-zero coefficients in L1-penalized (LASSO) regression. More generally, covariate importance can be assessed by how “excluding” each covariate affects some goodness-of-fit metric. This “exclusion” is done either by removing the feature entirely from the model or by shuffling its values across observations. The goodness-of-fit can be evaluated with any metric, such as the loss or the accuracy. The change in goodness-of-fit can be either multiplicative or additive, with the full model (with all covariates) serving as the baseline to compare against. Covariates that are more important for predicting the outcome will cause a larger decrease in performance relative to the full model, meaning they are the ones driving the accuracy of the predictions.

In this work, our goodness-of-fit metric is the natural deviance, which, since our outcome is continuous, is the mean squared error. Additionally, we consider only importance measures that are conditional on all other covariates and the exposure. Namely, we do not consider multiple univariable importance measures, and we always additionally condition on the exposure \[vanderwheele\]. However, the outcome-informed Love plot can work with any non-negative importance measure, where lower scores correspond to low importance and higher scores to high importance.
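One way to obtain such a conditional, MSE-based importance is the shuffling variant of “exclusion” described above. The sketch below is an illustration under stated assumptions rather than the exact procedure used for the figures: it assumes a linear outcome model fitted by ordinary least squares, an additive change in MSE, in-sample evaluation, and clipping at zero to keep the score non-negative. The name `permutation_importance_mse` and its parameters are hypothetical.

```python
import numpy as np


def permutation_importance_mse(X, a, y, n_repeats=20, seed=0):
    """Average increase in MSE of a linear outcome model when a covariate is shuffled.

    The model always includes the exposure `a` and all covariates in `X`, so each
    score is conditional on the exposure and on the remaining covariates.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    a = np.asarray(a, dtype=float)
    y = np.asarray(y, dtype=float)

    def design(covariates):
        # Columns: intercept, exposure, then the covariates.
        return np.column_stack([np.ones(len(y)), a, covariates])

    # Fit the full model once; shuffling only changes the inputs at prediction time.
    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)

    def mse(covariates):
        return float(np.mean((y - design(covariates) @ coef) ** 2))

    baseline = mse(X)
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            shuffled = X.copy()
            shuffled[:, j] = rng.permutation(X[:, j])
            increases.append(mse(shuffled) - baseline)
        # Clip at zero so the score stays non-negative.
        importance[j] = max(np.mean(increases), 0.0)
    return importance
```

Any other model or metric could be substituted, as long as the resulting score is non-negative and increases with importance.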

### Data

We present our augmented Love plot on a minimally sufficient data simulation. The simulation includes four covariates: one ($X_0$) is associated with neither the exposure ($A$) nor the outcome ($Y$), one ($X_A$) is associated only with the exposure, another ($X_Y$) only with the outcome, and one ($X_{AY}$), which is a true confounder, is associated with both. Mathematically, the full generating process is $$
\begin{aligned}
  Y &= A + X_Y + X_{AY} + \epsilon \\  
  A &\sim \text{Bernoulli}(\pi) \\  
  \text{logit}(\pi) &= X_A + X_{AY} \\  
  X_0, X_A, X_Y, X_{AY} &\sim \text{Normal}(0, 1) \\
  \epsilon &\sim \text{Normal}(0, 1)
\end{aligned}
$$ The directed acyclic graph in @fig-dag1 depicts a setting where $X_0$, $X_A$, and $X_Y$ are wrongly assumed a-priori to be confounders (influencing both exposure and outcome), although they are not.

``` python
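import matplotlib.pyplot as plt
import networkx as nx

# Left panel: assumed structure (dashed edges are assumed but absent); right: actual.
fig, axes = plt.subplots(1, 2, figsize=(10, 2.5))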

G = nx.DiGraph([
    ("$A$", "$Y$"), 
    ("$X_A$", "$A$"),
    ("$X_{AY}$", "$A$"), ("$X_{AY}$", "$Y$"), 
    ("$X_Y$", "$Y$"),
])
G.add_node("$X_0$")
pos = {
    "$A$":[0,0], "$Y$":[5,0],
    "$X_0$":[1, 2], "$X_A$":[2,2],
    "$X_{AY}$":[3, 2], "$X_Y$":[4, 2],
}

nx.draw(G, pos=pos, ax=axes[1], with_labels=True, node_color="white", font_size=14)
nx.draw(G, pos=pos, ax=axes[0], with_labels=True, node_color="white", font_size=14)
nx.draw_networkx_edges(
    G, pos=pos,
    edgelist=[
        ("$X_0$", "$A$"), ("$X_0$", "$Y$"),
        ("$X_A$", "$Y$"), ("$X_Y$", "$A$"),
    ],
    style="--",
    edge_color="#9e3434",  # "#d12e2e"
    ax=axes[0],
)
axes[0].set_title("Assumed confounding structure", fontsize=14, pad=12)
axes[1].set_title("Actual confounding structure", fontsize=14, pad=12);
fig.subplots_adjust(wspace=0);
# fig.tight_layout();
```

![](attachment:index_files/figure-ipynb/notebooks-dag_dgp_1-fig-dag1-output-1.svg)
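The generating process above can be simulated in a few lines. This is a minimal sketch: the function name `simulate`, the sample size, the seed, and the column order of the covariate matrix are choices made for this example rather than the settings used for the figures.

```python
import numpy as np


def simulate(n=1000, seed=0):
    """Draw n observations from the generating process described in the text."""
    rng = np.random.default_rng(seed)
    # Columns: X_0 (no association), X_A (exposure only),
    # X_Y (outcome only), X_AY (true confounder).
    X = rng.normal(0.0, 1.0, size=(n, 4))
    x_a, x_y, x_ay = X[:, 1], X[:, 2], X[:, 3]
    pi = 1.0 / (1.0 + np.exp(-(x_a + x_ay)))  # inverse logit of X_A + X_AY
    a = rng.binomial(1, pi)                   # A ~ Bernoulli(pi)
    y = a + x_y + x_ay + rng.normal(0.0, 1.0, size=n)
    return X, a, y
```

In data drawn this way, $X_A$ and $X_{AY}$ should show clear imbalance between exposure groups, while only $X_Y$ and $X_{AY}$ should carry covariate-outcome importance.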

## Discussion

### More covariate balance plots

This encoding is not limited to the standard Love plot; it applies to other covariate-balance plots as well, like scatter plots and slope graphs.

### TODO:

If our method relies only on the importance for the outcome, we can end up balancing many prognostic covariates that are not actually confounders. We can also calculate the covariate-exposure importance and then multiply the two importances. But isn’t the covariate-exposure importance already captured by the ASMD to begin with?