CEM with replacement? #77

Closed
yceny opened this issue Jun 25, 2021 · 7 comments

@yceny

yceny commented Jun 25, 2021

Is it possible to use CEM with matching with replacement? I am aware that the replace argument does not apply when method = "cem" is used. I am also aware that setting k2k = TRUE means nearest neighbor matching without replacement takes place within each stratum. Is it possible to match with replacement there? Also, what does k2k = FALSE mean?

@ngreifer
Collaborator

Coarsened exact matching is a method of stratification. The entire dataset is carved up into strata based on the coarsened covariates. Any stratum without both a treated and a control unit is discarded, leaving strata that contain both treated and control units with the same values of the coarsened covariates. No pairing is done. It doesn't make sense to talk about replacement because no units are "used up" and need to be replaced; each unit is simply assigned to the stratum it falls in. This is the default behavior of method = "cem".

You could implement this yourself by coarsening the covariates, crossing them to form strata (e.g., using interaction()), and discarding any units that are in strata without both a treated and a control unit. Here's an example of how you could do that:

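```r
# `data` holds the treatment indicator `treat` and covariates `X1`, `X2`
X1c <- cut(data$X1, 4)  # coarsen X1 into 4 bins
X2c <- cut(data$X2, 3)  # coarsen X2 into 3 bins

# Each combination of coarsened values defines a stratum
strata <- interaction(X1c, X2c)

# Strata that contain at least one treated and one control unit
strata_with_both <- intersect(strata[data$treat == 1], strata[data$treat == 0])

# Units in all other strata are discarded (stratum set to NA)
strata[!strata %in% strata_with_both] <- NA
```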

This is how coarsened exact matching is implemented in MatchIt (exact matching is implemented the same way, just without coarsening the covariates). Seeing it this way may help you understand the method.
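The same steps can be wrapped into a self-contained sketch and run on simulated data, which makes it easy to check that every retained stratum has both treated and control units. The helper name `cem_strata()`, the bin counts, and the simulated variables are hypothetical choices for illustration; this sketches the stratification logic described above and is not MatchIt's source code.

```r
# Hypothetical helper: coarsen each covariate, cross the coarsened versions
# into strata, and set the stratum to NA for units in strata lacking either
# a treated or a control unit.
cem_strata <- function(covs, n_bins, treat) {
  # Equal-width bins for each covariate (cut() with a number of breaks)
  coarsened <- Map(function(x, k) cut(x, k), covs, n_bins)
  # One stratum per observed combination of coarsened values
  strata <- interaction(coarsened, drop = TRUE)
  # Strata present among both treated and control units
  keep <- intersect(as.character(strata[treat == 1]),
                    as.character(strata[treat == 0]))
  strata <- as.character(strata)
  strata[!strata %in% keep] <- NA  # discarded units
  strata
}

# Simulated example data (hypothetical)
set.seed(1)
n <- 200
X1 <- rnorm(n)
X2 <- runif(n)
treat <- rbinom(n, 1, plogis(X1))

strata <- cem_strata(list(X1 = X1, X2 = X2), n_bins = c(4, 3), treat = treat)
table(strata, treat, useNA = "ifany")
```

No pairs are formed anywhere in this sketch: units are only assigned to strata or dropped.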

An optional second step is to perform matching within the strata, which you can do by setting k2k = TRUE. All this does is discard data. It is not recommended except when you need to discard data (e.g., because it's too expensive to collect outcome data on all units). What k2k = TRUE does is drop units within each stratum until the number of treated units equals the number of control units in that stratum. It chooses the units to discard by running nearest neighbor matching without replacement and discarding the units that are not matched. The pairs that are kept are returned as matched pairs.
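The k2k pruning step can be sketched in base R: within each stratum, treated units are greedily paired with their nearest unused control (matching without replacement), and any unit left unpaired is dropped. The helper `k2k_prune()`, the greedy order, and the use of the absolute difference on a single numeric covariate as the distance are assumptions made to keep the sketch short; MatchIt's own k2k step may use a different distance and ordering.

```r
# Hypothetical helper: greedy 1:1 nearest neighbor matching WITHOUT
# replacement inside each stratum, using |x_i - x_j| as the distance.
# Unpaired units are marked keep = FALSE.
k2k_prune <- function(strata, treat, x) {
  keep <- rep(FALSE, length(strata))
  pair <- rep(NA_integer_, length(strata))
  pair_id <- 0L
  for (s in unique(strata[!is.na(strata)])) {
    t_idx <- which(strata == s & treat == 1)
    c_idx <- which(strata == s & treat == 0)
    for (i in t_idx) {
      if (length(c_idx) == 0) break          # controls in this stratum used up
      j <- c_idx[which.min(abs(x[c_idx] - x[i]))]
      pair_id <- pair_id + 1L
      keep[c(i, j)] <- TRUE
      pair[c(i, j)] <- pair_id
      c_idx <- setdiff(c_idx, j)             # a used control is not reused
    }
  }
  data.frame(keep = keep, pair = pair)
}

# Simulated example; cut(x, 3) stands in for CEM strata (hypothetical)
set.seed(2)
n <- 100
x <- rnorm(n)
treat <- rbinom(n, 1, 0.4)
strata <- as.character(cut(x, 3))

res <- k2k_prune(strata, treat, x)
table(strata[res$keep], treat[res$keep])
```

After pruning, each stratum keeps min(number treated, number control) of each group, which is why k2k = TRUE only ever discards data.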

It doesn't make sense to talk about coarsened exact matching with replacement because the purpose of the second-stage matching is to prune units from the strata, not to create optimally matched pairs (which is the purpose of nearest neighbor matching). This is why replace is ignored with method = "cem".

You can do nearest neighbor matching with replacement with strata of the coarsened variables by creating the coarsened version of the variables yourself and supplying them to the exact argument with method = "nearest". For example, you could run

```r
library(MatchIt)

matchit(treat ~ X1 + X2 + X3, data = data, method = "nearest", replace = TRUE,
        exact = ~ cut(X1, 4) + cut(X2, 3))
```

This would run nearest neighbor propensity score matching with replacement within strata of coarsened versions of X1 and X2. With nearest neighbor matching, the pairs are the primary output, and the coarsened exact matching is used to limit which units can be paired with each other. In general, it makes more sense to place a caliper on the variables you want close matches on rather than using exact, e.g., caliper = c(X1 = .05, X2 = .1).
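To see conceptually what replace = TRUE combined with exact does, here is a base-R sketch: each treated unit is paired with the control closest on the propensity score, only controls in the same coarsened stratum are eligible, and a control may be reused across pairs. The helper `nn_with_replacement()` and the simulated data are hypothetical, and the propensity score is estimated with a plain logistic regression via `glm()`; this illustrates the idea and is not MatchIt's implementation.

```r
# Hypothetical helper: 1:1 nearest neighbor matching WITH replacement on a
# distance score, restricted to controls in the same exact stratum.
nn_with_replacement <- function(ps, treat, stratum) {
  treated <- which(treat == 1)
  match_of <- rep(NA_integer_, length(treated))
  for (k in seq_along(treated)) {
    i <- treated[k]
    eligible <- which(treat == 0 & stratum == stratum[i])
    if (length(eligible) == 0) next          # no control in this stratum
    # Controls are never removed from `eligible`, so they can be reused
    match_of[k] <- eligible[which.min(abs(ps[eligible] - ps[i]))]
  }
  data.frame(treated = treated, control = match_of)
}

# Simulated example data (hypothetical)
set.seed(3)
n <- 300
dat <- data.frame(X1 = rnorm(n), X2 = runif(n), X3 = rnorm(n))
dat$treat <- rbinom(n, 1, plogis(-0.5 + dat$X1 + dat$X3))

# Propensity score: fitted probabilities from a logistic regression
ps <- fitted(glm(treat ~ X1 + X2 + X3, data = dat, family = binomial))

# Exact strata on coarsened X1 and X2
stratum <- interaction(cut(dat$X1, 4), cut(dat$X2, 3), drop = TRUE)

pairs <- nn_with_replacement(ps, dat$treat, stratum)
head(pairs)
table(table(pairs$control))  # how many times each matched control was used
```

Here the pairs are the output, and the coarsened strata only restrict which controls a treated unit may be paired with.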

@yceny
Author

yceny commented Jun 28, 2021

Thank you so much for the detailed explanation. One more question about one-to-one matching: in CEM, k2k = TRUE means one-to-one matching, right? If I would like to implement one-to-many matching, should I set k2k = FALSE? And what about one-to-one versus one-to-many matching with the nearest neighbor method?

@ngreifer
Collaborator

With method = "cem", you cannot implement one-to-many matching. As I mentioned, no pairing takes place in CEM with k2k = FALSE. If you don't want to drop many units, you should just use the CEM output as-is; there is no reason to additionally do pairing after the stratification. I see almost no reason to set k2k = TRUE.

Using method = "nearest", the ratio argument determines the number of control units paired with each treated unit. This is explained in the ?matchit and ?method_nearest documentation.
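As a rough illustration of what ratio controls, here is a base-R sketch of greedy k:1 nearest neighbor matching without replacement on a distance score such as the propensity score: each treated unit takes its `ratio` closest controls that have not been used yet. The function name `nn_ratio()`, the greedy processing order, and the simulated scores are assumptions for illustration; ?method_nearest describes how MatchIt actually orders and performs the matching.

```r
# Hypothetical helper: greedy nearest neighbor matching WITHOUT replacement,
# pairing each treated unit with up to `ratio` controls.
nn_ratio <- function(dist_score, treat, ratio = 1) {
  available <- which(treat == 0)
  matches <- list()
  for (i in which(treat == 1)) {
    k <- min(ratio, length(available))
    if (k == 0) break                         # all controls already used
    d <- abs(dist_score[available] - dist_score[i])
    chosen <- available[order(d)[seq_len(k)]] # k closest unused controls
    matches[[as.character(i)]] <- chosen
    available <- setdiff(available, chosen)   # no replacement
  }
  matches
}

# Simulated example: 15 treated and 45 control units (hypothetical)
set.seed(4)
score <- runif(60)
treat <- rep(c(1, 0), times = c(15, 45))

m <- nn_ratio(score, treat, ratio = 2)
lengths(m)  # number of controls matched to each treated unit
```

With ratio = 1 the same function gives one-to-one matching; larger values give one-to-many matching as long as enough controls remain.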

@yceny
Author

yceny commented Jun 28, 2021

Got you. How does method = nearest deal with categorical variables?

@ngreifer
Collaborator

The default is propensity score matching. The covariates are included in a logistic regression of the treatment on the covariates, and the predicted values are used as the propensity scores. The difference between two units' propensity scores is the distance between the units. So the covariates don't enter nearest neighbor matching directly; only the propensity score is used. The fact that a covariate is categorical has no bearing on how it is used: it is simply a covariate in the logistic regression model for the propensity score, and logistic regression handles categorical covariates as all regression models do. Propensity score matching is agnostic to the covariates used in the propensity score.
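A minimal base-R sketch of this default distance: the propensity score is the fitted probability from a logistic regression of treatment on the covariates, and a factor covariate simply enters that model as dummy variables, as in any regression. The simulated data, the variable names `age` and `region`, and the helper `ps_distance()` are hypothetical.

```r
# Simulated data with one numeric and one categorical covariate (hypothetical)
set.seed(5)
n <- 500
dat <- data.frame(
  age    = rnorm(n, 50, 10),
  region = factor(sample(c("north", "south", "west"), n, replace = TRUE))
)
dat$treat <- rbinom(n, 1, plogis(-2 + 0.03 * dat$age + 0.8 * (dat$region == "west")))

# Logistic regression for the propensity score; the factor `region` is
# expanded into dummy variables with "north" as the reference level
ps_model <- glm(treat ~ age + region, data = dat, family = binomial)
colnames(model.matrix(ps_model))
# "(Intercept)" "age" "regionsouth" "regionwest"

dat$ps <- fitted(ps_model)

# Distance between two units: absolute difference in propensity scores
ps_distance <- function(i, j) abs(dat$ps[i] - dat$ps[j])
ps_distance(1, 2)
```

Once the scores are computed, the matching itself only ever sees `dat$ps`, not `age` or `region`.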

Categorical variables can be supplied to the exact argument to do exact matching on them. They can also feature in the Mahalanobis distance if requested.
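For the Mahalanobis option, one common way to let a categorical variable enter the distance is to expand it into dummy columns first. The sketch below does this with `model.matrix()` and then calls base R's `mahalanobis()`, which returns squared distances; the data and the dummy-coding choice are assumptions for illustration, and MatchIt may prepare the covariates differently.

```r
# Simulated data with a numeric and a categorical covariate (hypothetical)
set.seed(6)
n <- 200
dat <- data.frame(
  age    = rnorm(n, 50, 10),
  region = factor(sample(c("north", "south", "west"), n, replace = TRUE))
)

# Dummy-code the factor and drop the intercept column to get a numeric matrix
X <- model.matrix(~ age + region, data = dat)[, -1]
S <- cov(X)

# Squared Mahalanobis distance from unit 1 to every unit
d2 <- mahalanobis(X, center = X[1, ], cov = S)
head(sqrt(d2))
```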

@yceny
Author

yceny commented Jul 1, 2021

Thanks. Also, in cem, does the dependent variable (the treatment) have to be 0 and 1? Can it take values like 0, 1, 2, 3?

@ngreifer
Collaborator

ngreifer commented Jul 1, 2021

The cem package can handle non-binary treatments, but MatchIt cannot.
