CEM with replacement? #77

yceny · 2021-06-25T23:17:19Z

Is it possible to use CEM with sampling with replacement? I am aware that there is no replace argument when method = "cem" is used. I am also aware that setting k2k = TRUE means that nearest neighbor matching without replacement takes place within each stratum. Is it possible to match with replacement here? Also, what does k2k = FALSE mean?


ngreifer · 2021-06-26T01:25:16Z

Coarsened exact matching is a method of stratification. That means that the entire dataset is carved up based on the coarsened covariates. Any stratum without both a treated and control unit is discarded, leaving strata that have both treated and control units with the same value of the coarsened covariates. No pairing is done. It doesn't make sense to talk about replacement because no units are "used up" and need to be replaced. They are simply assigned to the stratum they fall in. This is the default use of method = "cem". You could implement this yourself by coarsening the covariates, creating an interaction between all the covariates (e.g., using interaction()), and discarding any units that are in strata without both a treated and control unit. Here's an example of how you could do that:

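```r
# assumes a data frame `data` with a 0/1 treatment column `treat`
X1c <- cut(data$X1, 4)  # coarsen X1 into 4 equal-width bins
X2c <- cut(data$X2, 3)  # coarsen X2 into 3 equal-width bins

# one stratum per combination of coarsened values
strata <- interaction(X1c, X2c)

# strata that contain at least one treated and one control unit
strata_with_both <- intersect(strata[data$treat == 1], strata[data$treat == 0])

# units in strata lacking either group are discarded
strata[!strata %in% strata_with_both] <- NA
```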

This is how coarsened exact matching is implemented in MatchIt. (Exact matching is implemented the same way, just without coarsening the covariates.) Seeing it this way may help you understand the method.

An optional second step is to perform matching within the strata, which you can do by setting k2k = TRUE. All this does is discard data. It is not recommended except when you need to discard data (e.g., because it's too expensive to collect outcome data on all units). With k2k = TRUE, units are dropped within each stratum until the number of treated units equals the number of control units. The units to discard are chosen by running nearest neighbor matching without replacement and dropping the units that are not matched; the pairs that are kept are returned as matched pairs.
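The k2k step can be sketched in base R as follows. This is a hypothetical illustration, not MatchIt's implementation: the simulated `data` and the helper `k2k_prune()` are made up for the example, and it assumes squared Euclidean distance on the raw covariates with a greedy pass over treated units in data order, whereas MatchIt may use a different distance and ordering for k2k. It first builds the CEM strata as in the stratification example above, then within each stratum pairs each treated unit with its nearest remaining control, so each stratum keeps min(n_treated, n_control) units of each group.

```r
# Hypothetical sketch of k2k-style pruning; not MatchIt's implementation.
# Assumption: squared Euclidean distance on raw covariates, greedy in data order.
set.seed(1)
n <- 200
data <- data.frame(treat = rbinom(n, 1, 0.4), X1 = rnorm(n), X2 = runif(n))

# CEM step: coarsen, interact, and drop strata lacking either group
strata <- interaction(cut(data$X1, 4), cut(data$X2, 3), drop = TRUE)
both <- intersect(strata[data$treat == 1], strata[data$treat == 0])
strata[!strata %in% both] <- NA

k2k_prune <- function(data, strata, covs = c("X1", "X2")) {
  X <- as.matrix(data[, covs])
  pair <- rep(NA_integer_, nrow(data))  # matched-pair id; NA = discarded
  next_id <- 1L
  for (s in unique(as.character(strata[!is.na(strata)]))) {
    idx <- which(strata == s)
    tr <- idx[data$treat[idx] == 1]
    co <- idx[data$treat[idx] == 0]
    for (i in tr) {
      if (length(co) == 0) break        # surplus treated units are dropped
      # squared Euclidean distance from treated unit i to remaining controls
      d <- rowSums(sweep(X[co, , drop = FALSE], 2, X[i, ])^2)
      j <- co[which.min(d)]
      pair[c(i, j)] <- next_id
      next_id <- next_id + 1L
      co <- co[co != j]                 # without replacement
    }
  }
  pair                                  # surplus controls stay NA
}

pair <- k2k_prune(data, strata)
table(treat = data$treat, kept = !is.na(pair))
```

Units with `NA` in `pair` are the ones that would be discarded; the rest come out as matched pairs with equal treated and control counts in every stratum.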

It doesn't make sense to talk about coarsened exact matching with replacement because the purpose of the second stage matching is to prune units from the strata, not to create optimally matched pairs (which is the purpose of nearest neighbor matching). This is why replace is ignored with method = "cem".

You can do nearest neighbor matching with replacement within strata of the coarsened variables by creating the coarsened versions of the variables yourself and supplying them to the exact argument with method = "nearest". For example, you could run

```r
library(MatchIt)
matchit(treat ~ X1 + X2 + X3, data = data, method = "nearest", replace = TRUE,
        exact = ~cut(X1, 4) + cut(X2, 3))
```

This would run nearest neighbor propensity score matching with replacement within strata of coarsened versions of X1 and X2. With nearest neighbor matching, the pairs are the primary output, and the coarsened exact matching is used to limit which units can be paired with each other. In general, it makes more sense to place a caliper on the variables you want close matches on rather than using exact, e.g., caliper = c(X1 = .05, X2 = .1).
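For intuition, here is a rough base-R sketch of what nearest neighbor matching with replacement within coarsened exact strata does. It is not MatchIt's code; the simulated data and the helper `match_with_replacement()` are hypothetical. Each treated unit is paired with the control in its own stratum whose propensity score (from a logistic regression) is closest. Because controls are never removed from the pool, the same control can serve several treated units.

```r
# Hypothetical sketch; not MatchIt's code. Simulated data for illustration.
set.seed(2)
n <- 300
data <- data.frame(X1 = rnorm(n), X2 = runif(n), X3 = rbinom(n, 1, 0.5))
data$treat <- rbinom(n, 1, plogis(-0.5 + 0.8 * data$X1 + data$X3))

# Propensity score: fitted values from a logistic regression
ps <- fitted(glm(treat ~ X1 + X2 + X3, data = data, family = binomial))

# Exact strata on coarsened X1 and X2, analogous to exact = ~cut(X1, 4) + cut(X2, 3)
strata <- interaction(cut(data$X1, 4), cut(data$X2, 3), drop = TRUE)

match_with_replacement <- function(treat, ps, strata) {
  matched <- rep(NA_integer_, length(treat))  # row of matched control
  for (i in which(treat == 1)) {
    pool <- which(treat == 0 & strata == strata[i])  # same-stratum controls
    if (length(pool) > 0) {
      # closest propensity score; the control stays in the pool afterwards
      matched[i] <- pool[which.min(abs(ps[pool] - ps[i]))]
    }
  }
  matched  # NA for controls and for treated units with no same-stratum control
}

m <- match_with_replacement(data$treat, ps, strata)
sum(duplicated(m[!is.na(m)]))  # number of times a control was reused
```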

yceny · 2021-06-28T00:45:22Z

Thank you so much for the detailed explanation. One more question about 1:1 matching. In CEM, k2k = TRUE means 1:1 matching, right? If I would like to implement one-to-many matching, should I set k2k = FALSE? And how does 1:1 versus one-to-many matching work with the nearest neighbor method?

ngreifer · 2021-06-28T02:35:34Z

With method = "cem", you cannot implement one-to-many matching. As I mentioned, no pairing takes place in CEM with k2k = FALSE. If you don't want to drop many units, you should just use the CEM output as-is; there is no reason to additionally pair units after the stratification. I see almost no reason to set k2k = TRUE.

Using method = "nearest", the ratio argument determines the number of control units paired with each treated unit. This is explained in the ?matchit and ?method_nearest documentation.
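In MatchIt this is requested with, e.g., `matchit(treat ~ X1 + X2, data = data, method = "nearest", ratio = 2)`. The sketch below only illustrates the idea of 1:k matching without replacement on the propensity score; the data and the helper `match_ratio()` are hypothetical, and it visits treated units in data order, whereas MatchIt's ordering is controlled by its own `m.order` argument.

```r
# Hypothetical greedy 1:k matching without replacement; not MatchIt's code.
set.seed(3)
n <- 150
data <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
data$treat <- rbinom(n, 1, plogis(-1 + 0.5 * data$X1))
ps <- fitted(glm(treat ~ X1 + X2, data = data, family = binomial))

match_ratio <- function(treat, ps, k = 2) {
  controls <- which(treat == 0)
  out <- list()
  for (i in which(treat == 1)) {          # assumption: data order
    if (length(controls) == 0) break
    ord <- order(abs(ps[controls] - ps[i]))
    take <- controls[head(ord, k)]        # k closest remaining controls
    out[[as.character(i)]] <- take
    controls <- setdiff(controls, take)   # used controls leave the pool
  }
  out
}

groups <- match_ratio(data$treat, ps, k = 2)
head(groups, 3)  # treated row -> matched control rows
```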

yceny · 2021-06-28T07:45:22Z

Got it. How does method = "nearest" deal with categorical variables?

ngreifer · 2021-06-28T07:57:00Z

The default is to do propensity score matching. The covariates are included in a logistic regression of the treatment on the covariates, and the predicted values are used as the propensity scores. The difference between two units' propensity scores is the distance between the units. So the covariates don't enter nearest neighbor matching directly; only the propensity score is used. The fact that a covariate is categorical has no bearing on how it is used: it is simply a covariate in the logistic regression model for the propensity score, and logistic regression handles categorical covariates as all regression models do (by expanding them into dummy variables). Propensity score matching is agnostic to the covariates used in the propensity score.
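A small base-R illustration of this point, using a made-up factor `region`: `glm()` expands the factor into dummy columns automatically, and the only thing the matching step sees is the fitted propensity score. The data and the helper `ps_dist()` are hypothetical.

```r
# Hypothetical data with a categorical covariate `region`
set.seed(4)
n <- 200
data <- data.frame(
  X1 = rnorm(n),
  region = factor(sample(c("north", "south", "west"), n, replace = TRUE))
)
data$treat <- rbinom(n, 1, plogis(0.5 * data$X1 + (data$region == "west")))

# glm() turns the factor into dummy variables, as any regression model would
fit <- glm(treat ~ X1 + region, data = data, family = binomial)
names(coef(fit))  # "(Intercept)" "X1" "regionsouth" "regionwest"

# Propensity score = predicted probability of treatment
ps <- fitted(fit)

# Distance between units i and j uses only their propensity scores
ps_dist <- function(i, j) unname(abs(ps[i] - ps[j]))
ps_dist(1, 2)
```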

Categorical variables can be supplied to the exact argument to do exact matching on them. They can also feature in the Mahalanobis distance if requested.
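Here is a rough sketch of both options in base R, with hypothetical data. The factor is expanded into k - 1 dummy columns with `model.matrix()`, and `stats::mahalanobis()` (which returns squared distances) is computed with the full-sample covariance matrix; that covariance choice is an assumption and may differ from what MatchIt uses. Restricting the candidate controls to the same `region` mimics exact matching on that variable.

```r
# Hypothetical data; a sketch, not MatchIt's implementation
set.seed(5)
n <- 150
data <- data.frame(
  treat = rbinom(n, 1, 0.4),
  X1 = rnorm(n),
  region = factor(sample(c("north", "south", "west"), n, replace = TRUE))
)

# Expand the factor into k - 1 dummy columns and drop the intercept column
X <- model.matrix(~ X1 + region, data = data)[, -1]
S <- cov(X)  # assumption: full-sample covariance matrix

tr <- which(data$treat == 1)
co <- which(data$treat == 0)
i <- tr[1]

# Squared Mahalanobis distances from treated unit i to every control
d_all <- mahalanobis(X[co, , drop = FALSE], center = X[i, ], cov = S)

# Exact matching on region: only controls in the same region are candidates
same <- co[data$region[co] == data$region[i]]
best <- same[which.min(mahalanobis(X[same, , drop = FALSE], X[i, ], S))]
```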

yceny · 2021-07-01T03:49:47Z

Thanks. Also, in CEM, does the treatment variable (the left-hand side of the matching formula) have to be 0 and 1? Can it take values like 0, 1, 2, 3?

ngreifer · 2021-07-01T04:17:09Z

The cem package can handle non-binary treatments, but MatchIt cannot.

ngreifer closed this as completed Jul 15, 2021

ngreifer mentioned this issue Apr 20, 2023

"Simple" exact 1:1 matchint not working #164

Closed
