Specialised data types in R #203
DBI should offer a plugin system that other packages can build upon by providing implementations for some of the more exotic data types in R. Throughout the DBI packages, there are many open issues surrounding this problem. A selection:
#199 enum types
Now if geometry types are implemented for Postgres, this is great. But they are also available in MySQL/MariaDB. It therefore might be useful to consider these issues in a more general fashion. Furthermore, approaching this in a type-by-type fashion might not be sufficient. How could a user map a Postgres composite type, if there is not some inherent extensibility?
Unfortunately, I have no idea how to tackle such an issue. Maybe a pragmatic approach, where things such as composite types are simply not considered, is the best we can do. I was just hoping to get a discussion started on this topic.
I agree that this is the way to go, but a plugin system shouldn't impact performance. Currently, the values obtained from the database driver are coerced to their target type (integer, double, int64, string, logical, raw vector) as they arrive. Do you think the decision about the "right" target type can be made from metadata only, without fetching any rows (or after fetching the first row only)?
We could offer an interface that allows registration of column handlers for a particular DBI result class. Backends would then be expected to call these handlers with column metadata (as R objects), and each handler decides whether it can handle columns of that type. If so, the handler returns an empty container (think
The data format for the column metadata and the raw values fully depends on the backend and should match that of the underlying C library. Registration order would be honored: more recently registered handlers are called first. Handlers could also be provided at the connection or the result level.
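The registration scheme above could be sketched roughly as follows. This is a minimal illustration in base R, not a proposed DBI API: the names `register_column_handler()` and `find_column_handler()`, and the shape of the metadata list, are all made up for the example.

```r
# Hypothetical handler registry: handlers are stored in an environment,
# and the most recently registered handler is consulted first.
.handlers <- new.env(parent = emptyenv())
.handlers$list <- list()

register_column_handler <- function(handler) {
  # Prepend, so that younger handlers take precedence
  .handlers$list <- c(list(handler), .handlers$list)
}

find_column_handler <- function(metadata) {
  for (h in .handlers$list) {
    container <- h(metadata)
    # Convention assumed here: a handler returns NULL to decline,
    # or an empty container (a prototype of the target type) to accept
    if (!is.null(container)) return(container)
  }
  NULL
}

# Example handler: claim enum columns and map them to factors.
# The metadata format (type string plus levels) is invented for
# illustration and would really be backend-specific.
enum_handler <- function(metadata) {
  if (identical(metadata$type, "enum")) {
    factor(character(), levels = metadata$levels)
  }  # implicitly returns NULL for anything else
}
register_column_handler(enum_handler)
```

With this sketch, `find_column_handler(list(type = "enum", levels = c("red", "green")))` yields a zero-length factor the backend could grow as rows arrive, while an unknown type falls through to `NULL` and the backend's default coercion.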
Backends could then also expose "default" built-in handlers for R's data types, and use the same mechanism to decide how to handle integers, blobs, times, etc.
If we don't care that much about performance and permit an extra copy operation and memory allocation, we could also package everything as lists of
Yes, no problem.