Puzzling rbind() behaviour #98

sconti555 · 2022-01-14T15:21:38Z

Dear Noah,

Many thanks for packaging together so many matching-related facilities in your excellent MatchIt R library. I've recently started using it due to its versatility, but have stumbled on what I suspect may be a bug in the rbind() function.

I've been applying separate Optimal Matching routines to non-overlapping subsets of a data-set identified according to a 'stratum' factor, with the intention of then binding derived matched sub-samples into a full matched data-set via rbind(). However I've been noticing that, by doing so, I end up with a fully matched data-set that has twice the number of rows I'd expect. For instance, running the example around the LaLonde data-set detailed in the on-line documentation to rbind.matchdata() I obtain nrow(md_b) = 174, nrow(md_h) = 22 and nrow(md_w) = 36 but nrow(md_all) = 464 = 2 * (174 + 22 + 36), instead of the expected 232 = 174 + 22 + 36.

I understand from the rbind.matchdata() documentation that the function's main purpose is to disambiguate the 'subclass' factor levels when stacking the matched sub-samples; is the duplication of rows, however, an intended consequence? If so, it remains unclear to me how I can then carry out diagnostic checks (e.g. those based on SMDs or VRs) on the re-combined matched data-set. Am I missing something here? Of note, I'm running MatchIt 4.3.2 on R 4.0.3 on Windows.

Thank you in advance for your help and time!

--
Stefano

ngreifer · 2022-01-16T06:03:59Z

Hi Stefano,

Thank you so much for pointing out this bug. It was easy to fix, and I have fixed it in the development version of the package. I'll be uploading it to CRAN shortly. I really appreciate you letting me know about this!

Noah

sconti555 · 2022-01-16T15:45:04Z

Hi Noah,

Sincere thanks for quickly confirming and fixing the bug in rbind.matchdata() that I wrote to you about.

Unfortunately, given the sensitive nature of the (individual-level health-care) data I work with, I won't have access to your upgraded package as the R environment I work with is off-line. In the meantime I had figured that all I needed to do was to only retain rows in the stacked matched data-frame corresponding to values of the 'subclass' factor featuring a "_" (underscore); is this correct?

All the best,

--
Stefano

ngreifer · 2022-01-16T18:05:09Z

That should work. You can also manually do what rbind.matchdata() does.

Change the levels of subclass in each dataset to ensure they are unique, then use rbind.data.frame() to bind them together.
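
A minimal sketch of that manual approach in base R, for anyone else stuck on an older version. The two toy data frames stand in for per-stratum match.data() output, and the helper name relabel_subclass and the "a"/"b" prefixes are made up for illustration; this is not necessarily how rbind.matchdata() does it internally.

```r
# Toy stand-ins for per-stratum matched data (hypothetical values).
md_a <- data.frame(id = 1:4, treat = c(1, 0, 1, 0),
                   subclass = factor(c(1, 1, 2, 2)))
md_b <- data.frame(id = 5:8, treat = c(1, 0, 1, 0),
                   subclass = factor(c(1, 1, 2, 2)))

# Hypothetical helper: prefix the subclass levels of one stratum
# so that they cannot collide with those of another stratum.
relabel_subclass <- function(md, prefix) {
  levels(md$subclass) <- paste(prefix, levels(md$subclass), sep = "_")
  md
}

# Stack the relabelled pieces; rbind on data frames merges factor levels.
md_all <- rbind.data.frame(relabel_subclass(md_a, "a"),
                           relabel_subclass(md_b, "b"))

nrow(md_all)               # should equal nrow(md_a) + nrow(md_b)
levels(md_all$subclass)    # one distinct level per matched set
```

With real data you would pass each stratum's matched data frame and a prefix that identifies the stratum, then check that nrow() of the result equals the sum of the pieces.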

sconti555 · 2022-01-16T21:31:11Z

Thank you for your confirmation, Noah; much appreciated. Indeed your proposed approach is formally equivalent to the one I've outlined and am adopting.

You'll have noticed I sent you a separate query on your Gmail account, as it wasn't related to this (or any other) bug. Do by all means take your time to address it, if at all: I don't mean to abuse your availability.

All the best,

--
Stefano

sconti555 changed the title ~~Inconsistent rbind() behaviour~~ Puzzling rbind() behaviour Jan 14, 2022

ngreifer added a commit to ngreifer/MatchIt that referenced this issue Jan 16, 2022

Fixed a bug in rbind() (kosukeimai#98)

ba8dcc8

ngreifer closed this as completed Mar 4, 2022
