Extend data.table instead of data.frame? #7
Comments
Great ideas! However, if I construct
everything else works identically to the above. I would prefer a solution where it doesn't matter whether I'm not convinced by your As for the name of the geometry column, why not always call it
? |
Possibly... but... dplyr fails:
and attributes (including class) disappear on subsetting:
Maybe my version of R is a bit old or something if you are getting different behaviour... The |
None of these issues here: maybe update?; with
I have
|
I'm on 3.2.0 - hard to believe a minor point release would change low-level fundamentals like that, but I'll do something for an hour while 3.2.3 compiles and let you know... |
Just found my 3.2.3. Here's a session with R --vanilla, shows data frame attribute dropping, sessionInfo follows:
Note loss of attributes on polys element when selecting first row of data frame.
Same behaviour if dplyr is loaded, or R started without --vanilla. Nothing else in my workspace (except all the hair I'm pulling out...) |
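For anyone reading without the session transcript, here is a minimal base-R sketch of the behaviour described above. The column contents and the `crs` attribute are made-up stand-ins, not the actual objects from that session:

```r
# Hypothetical stand-in for a geometry list column: a plain list with a
# made-up "crs" attribute (not a real sp/sf object).
d <- data.frame(id = 1:2)
d$polys <- structure(list("poly-1", "poly-2"), crs = "EPSG:4326")

# The attribute survives assignment into the data frame.
attr(d$polys, "crs")

# Selecting the first row subsets each column with `[`, and the default
# `[` drops every attribute except names, dim and dimnames.
first_row <- d[1, ]
attr(first_row$polys, "crs")
```

The first `attr()` call should return the CRS string and the second should return `NULL`, which is the loss of column metadata being discussed.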
I can see that. It's a different point: your first point had attributes on |
It was the attributes on the column that I was really initially concerned about, since that's where the metadata for the geometry (CRS, at least) probably ought to be. Sadly subsetting a list like Which is more useful to us here? |
Who else should be alerted to this discussion? I feel that the gg* and dplyr infrastructure is important at least to track and to try to see how tmap and mapview mesh on the visualization side. |
I have a feeling that the underlying representation could benefit from thinking out-of-workspace - maybe the geom column could be external pointers to objects in an OGR or GEOM abstraction of SF? How does data.table do external pointers (to indices??)? What spatial index data should be in-memory to speed access to stuff outside? |
@barryrowlingson : adding
would be enough. |
@hadley do tibbles solve all the problems here? |
See here for a discussion of list columns using tibbles http://r4ds.had.co.nz/many-models.html#list-columns-1 |
The advantage of using (or extending) tibbles would be to avoid yet another type of data frame. In the linked examples you still have the problem of:
roads = data.frame(widths = c(5, 4.5))
roads$geom = sfc
sf(roads)
Instead of:
roads = data.frame(widths = c(5, 4.5), geom = sfc)
sf(roads) |
This has been solved:
|
Is there a reason you don't want to build on top of tibbles? |
I try to minimize dependencies; I like building on top of code that doesn't change; I don't think that the improvement of tibbles over
But I'd be happy to help make simple features work for |
Would you mind filing an issue on tibble? You should be able to control how tibbles print your objects. (You might be able to already but the docs might need improvement) |
Sure: see tidyverse/tibble#157 |
See also #25 where simple features are used in |
This seems very relevant to me as well from an end-user's perspective. For now my code is littered with costly (and error-prone):
dt <- data.table(sp@data)
dt[, rn := row.names(sp)]
# [...]
sp <- SpatialPolygonsDataFrame(sp, as.data.frame(dt), match.ID="rn")
I'd love for data.table's by-reference operations and indexing to work more seamlessly with spatial features. Also using data.table's Expanding on this idea a bit, I could envision:
dt1 = data.table(sf1=polys1, b1=letters[c(1,2,3,3)], c1=runif(4))
dt2 = data.table(sf2=polys2, b2=letters[c(3,3,2,6)], c2=runif(4))
# with...
sapply(dt1, class)
# sf1 b1 c1
# "sfc" "character" "numeric"
sapply(dt2, class)
# sf2 b2 c2
# "sfc" "character" "numeric"
# create a spatial index on sf columns
setkey(dt1, sf1)
setkey(dt2, sf2)
# Return attributes of dt2 at locations of dt1 (e.g. where the geometries intersect or overlap)
dt2[dt1]
# Set attributes of dt2 to attributes of dt1 at locations of dt1
dt2[dt1, c2 := c1]
# And the usual st_* correlates
dt2[st_touches(dt1)]
dt2[st_covers(dt1)]
dt2[st_contains(dt1)]
dt2[st_within_distance(dt1, 10)]
# etc...
# And union operations on geometry columns within a data.table
dt1[, .(st_union(sf1), sum(c1)), by=b1]
|
I think this issue has now been settled -- we now extend data.frame as well as tibble, depending on where you start. Feel free to reopen if needed. |
I've occasionally tried to extend data.frame classes and always given up. I've never found a satisfactory way to store non-trivial things in a data frame column.
See my poorly documented spong package for example: https://github.com/barryrowlingson/spong
BUT...
data.table provides a more flexible data grid structure that is very happy to store structured data in columns. Example:
d now prints like this:
and the polys column still has its class:
Usefully, data table row selection preserves attributes:
which is something even R's default subsetting doesn't do - it drops as much as it can including the class of the object:
and hence you waste your life writing [.sf methods that do little more than restore the attributes that R took away in the first place (see for example getAnywhere("[.POSIXct")).
The only basic thing I can't figure out at the moment is how to identify a geometry column within a data.table. We could add an attribute to a data table, but that gets lost on selection - perhaps the data.table authors might like to help with this:
or we define a superclass of data.table and write some methods for that.
SpatialPolygonsDataTable anyone?
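To make the complaint about attribute-restoring subset methods concrete, here is a rough sketch of the pattern in base R. The class name `geomlist`, its constructor and the `crs` attribute are all hypothetical, invented for illustration; this is not code from sp, sf or data.table:

```r
# Hypothetical class "geomlist": a list of geometries carrying a made-up
# "crs" attribute. None of these names come from an existing package.
geomlist <- function(x, crs) structure(x, crs = crs, class = "geomlist")

# Subset method: strip the class, let the default `[` do the work, then
# put back the attributes that the default method dropped.
`[.geomlist` <- function(x, i) {
  out <- unclass(x)[i]
  attr(out, "crs") <- attr(x, "crs")
  class(out) <- class(x)
  out
}

g <- geomlist(list("poly-1", "poly-2", "poly-3"), crs = "EPSG:4326")
sub <- g[2:3]
class(sub)
attr(sub, "crs")
```

The method does nothing beyond copying back what `[` removed, which is the kind of boilerplate the post is objecting to; the `[.POSIXct` method mentioned above follows broadly the same shape.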