Mangling of duplicated colnames in data frames resulting from a query breaks Bioconductor packages #259
Comments
Thanks. I did check the Bioconductor packages that import RSQLite directly (though not all of them; some couldn't be installed or checked on my system), and none of them had failing tests. Which packages are failing now? Happy to discuss a compatibility option or perhaps an attribute that describes the renaming.
More than 30 packages are failing. For example: https://master.bioconductor.org/checkResults/3.7/bioc-LATEST/annotate/malbec2-buildsrc.html
You won't necessarily catch the problems by checking only packages that import RSQLite directly. You don't have to do anything about this; we've already worked on a fix. I opened this issue to discuss the possibility of making name mangling optional, or at least easy to reverse. Thanks!
Ideally, queries would be written to return unique column names (via …).
I'm happy to check more Bioconductor packages. Would you like to share a list of packages that are now failing? An attribute that describes the renaming seems like a good solution.
Bioconductor packages that were failing are:
As I said, we already came up with a fix. The fix is actually in AnnotationDbi: Bioconductor/AnnotationDbi@2a4aa6a. We have a sophisticated SQL generator in AnnotationDbi, written a long time ago (more than 10 years), that automatically generates complicated …
So yes, there are use cases where one will use …
Name mangling is one of those features that is only useful when one works in interactive mode. From a programming perspective it has consistently and repeatedly been an annoyance, so it was a really good thing that RSQLite didn't play the name-mangling game so far. Adding an attribute that describes the renaming would kind of work, but note that the first thing the conscientious programmer will do is define a little wrapper to …
Finally, I would argue that the current mangling scheme is broken if some column names have a …
Thanks,
Sorry for the slow reply. I think it's easiest to just revert the change; I'll do it soon.
Let's back out this change and reconsider it. I think we accidentally applied tidyverse renaming ideas (which are more aggressive) to DBI.
I think we no longer mangle column names here.
AFAIK you still do:
sessionInfo()
Thanks for the nudge. The relevant parts of the spec will need to be removed or renamed (r-dbi/DBItest#181).
RSQLite 2.2.0 is on CRAN now.
Excellent. Thanks a lot!
Hi,
We just released BioC 3.7 a few days ago (on May 1st) and several packages broke already because of this change in RSQLite 2.1.1:
Data frames resulting from a query always have unique non-empty column names (r-dbi/DBItest#137).
I see that the issue was discussed in r-dbi/DBItest#137.
It's a little bit disappointing that breaking changes of this kind are made without more consideration for existing packages. If only it were an improvement, but name mangling has always proven to be more of an annoyance than anything else. Furthermore, the fact that RSQLite uses its own mangling (via the internal helper tidy_names) instead of base::make.names() adds another layer of inconsistency and confusion.
Could this at least be made optional, e.g. via a new argument to dbGetQuery and dbFetch (and possibly to a few other functions)? Or is there an easy and robust way to bring the original names back if one wishes to? (Removing the ..n suffix is not robust.)
Thanks!
H.
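To illustrate why simply removing the ..n suffix is not a robust way to recover the original names, here is a minimal R sketch. strip_suffix() is a hypothetical helper (not part of RSQLite or DBI), the column names are made up, and the sketch assumes only that mangled names end in a "..<digits>" suffix, as described in the issue.

```r
# Hypothetical helper (not part of RSQLite or DBI): naively undo the
# mangling by stripping a trailing "..<digits>" suffix from each name.
strip_suffix <- function(nms) sub("\\.\\.[0-9]+$", "", nms)

# Duplicated names that came back mangled: the original names are recovered.
strip_suffix(c("gene_id..1", "gene_id..2"))
#> [1] "gene_id" "gene_id"

# But a column that was genuinely named "total..2" in the database looks
# exactly like a mangled name, so it gets corrupted too.
strip_suffix(c("total..2", "count"))
#> [1] "total" "count"
```

Because the suffix alone cannot tell a renamed column apart from one whose real name already ends in "..<digits>", reversing the mangling reliably would need extra information, such as an attribute recording the original names.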