Puzzling rbind() behaviour #98

Closed
sconti555 opened this issue Jan 14, 2022 · 4 comments

Comments

@sconti555

sconti555 commented Jan 14, 2022

Dear Noah,

Many thanks for bringing together so many matching-related facilities in your excellent MatchIt R package. I recently started using it because of its versatility, but I have stumbled upon what I suspect may be a bug in the rbind() function.

I've been applying separate optimal matching routines to non-overlapping subsets of a dataset defined by a 'stratum' factor, intending to then stack the resulting matched subsamples into a single matched dataset via rbind(). However, doing so yields a combined dataset with twice as many rows as I'd expect. For instance, running the LaLonde example in the online documentation for rbind.matchdata(), I obtain nrow(md_b) = 174, nrow(md_h) = 22 and nrow(md_w) = 36, but nrow(md_all) = 464 = 2 * (174 + 22 + 36) instead of the expected 232 = 174 + 22 + 36.

I understand from the rbind.matchdata() documentation that the function's main purpose is to disambiguate the 'subclass' factor levels when stacking the matched subsamples; however, is the duplication of rows intended? If so, it is unclear to me how I can then carry out diagnostic checks (e.g., those based on standardized mean differences (SMDs) or variance ratios (VRs)) on the recombined matched dataset. Am I missing something here? For reference, I'm running MatchIt 4.3.2 on R 4.0.3 on Windows.

Thank you in advance for your help and time!

--
Stefano

sconti555 changed the title from "Inconsistent rbind() behaviour" to "Puzzling rbind() behaviour" on Jan 14, 2022
@ngreifer
Collaborator

Hi Stefano,

Thank you so much for pointing out this bug. It was easy to fix, and I have fixed it in the development version of the package. I'll be uploading it to CRAN shortly. I really appreciate you letting me know about this!

Noah

ngreifer added a commit to ngreifer/MatchIt that referenced this issue Jan 16, 2022
@sconti555
Author

Hi Noah,

Sincere thanks for so quickly confirming and fixing the rbind.matchdata() bug I wrote to you about.

Unfortunately, given the sensitive nature of the (individual-level health-care) data I work with, I won't have access to your updated package, as my R environment is offline. In the meantime, I figured that all I need to do is retain only the rows of the stacked matched data frame whose 'subclass' value contains a "_" (underscore); is this correct?

All the best,

--
Stefano
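A base-R sketch of the underscore-filtering workaround follows. It is illustrative only: md_all is a hypothetical toy data frame built to mimic the symptom described in this thread (each matched row appearing twice, once under an unprefixed subclass label and once under a stratum-prefixed label containing "_"). The exact labels produced by the affected rbind.matchdata() release are an assumption, not something confirmed here.

```r
# Hypothetical stand-in for the duplicated stacked output (assumption):
# each matched unit appears once with its original subclass label and
# once with a stratum-prefixed label containing "_".
md_all <- data.frame(
  id       = c(1, 2, 3, 4, 1, 2, 3, 4),
  subclass = factor(c("1", "1", "1", "1", "b_1", "b_1", "h_1", "h_1"))
)

# Keep only rows whose subclass label contains an underscore.
keep <- grepl("_", as.character(md_all$subclass), fixed = TRUE)
md_fixed <- md_all[keep, ]

# Drop the unprefixed levels that no longer have any rows, so that
# subclass-based diagnostics do not see empty subclasses.
md_fixed$subclass <- droplevels(md_fixed$subclass)

nrow(md_fixed)             # 4, half of the 8 stacked rows
levels(md_fixed$subclass)  # "b_1" "h_1"
```

This filter relies on the original, unprefixed subclass labels never containing "_" themselves; if they might, relabelling before stacking is the safer route.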

@ngreifer
Collaborator

That should work. You can also manually do what rbind.matchdata() does.

Change the levels of subclass in each dataset so that they are unique across datasets, then use rbind.data.frame() to bind them together.
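The manual approach just described can be sketched in base R. Assumptions: md_b, md_h and md_w are hypothetical toy data frames standing in for match.data() output (only a treat column and a subclass factor are included), and relabel_subclass() is a hypothetical helper, not part of MatchIt. Prefixing each label with a stratum tag keeps subclass 1 of one stratum distinct from subclass 1 of another.

```r
# Hypothetical toy stand-ins for match.data() output from three
# separate matchit() fits (one per stratum); real output has more columns.
md_b <- data.frame(treat = c(1, 0, 1, 0), subclass = factor(c(1, 1, 2, 2)))
md_h <- data.frame(treat = c(1, 0), subclass = factor(c(1, 1)))
md_w <- data.frame(treat = c(1, 0, 1, 0), subclass = factor(c(1, 1, 2, 2)))

# Hypothetical helper: prefix each subclass label with a stratum tag so
# labels stay unique after stacking (otherwise "1" in md_b and md_h collide).
relabel_subclass <- function(md, stratum) {
  md$subclass <- factor(paste(stratum, md$subclass, sep = "_"))
  md
}

# rbind.data.frame() expands factor levels as needed when stacking.
md_all <- rbind.data.frame(
  relabel_subclass(md_b, "b"),
  relabel_subclass(md_h, "h"),
  relabel_subclass(md_w, "w")
)

nrow(md_all)             # 10 = 4 + 2 + 4, no duplicated rows
levels(md_all$subclass)  # "b_1" "b_2" "h_1" "w_1" "w_2"
```

With real match.data() output, any other columns (weights, covariates) are carried through unchanged, and nrow() of the stacked result should equal the sum of the per-stratum row counts.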

@sconti555
Author

sconti555 commented Jan 16, 2022

Thank you for your confirmation, Noah; much appreciated. Indeed your proposed approach is formally equivalent to the one I've outlined and am adopting.

You'll have noticed that I sent you a separate query at your Gmail address, as it wasn't related to this (or any other) bug. By all means take your time to address it, if at all: I don't mean to impose on your time.

All the best,

--
Stefano

@ngreifer ngreifer closed this as completed Mar 4, 2022