skim of sf objects #88
Comments
Can you say what list of statistics would make sense? https://github.com/ropenscilabs/skimr/blob/master/R/functions.R We haven't done anything generic for list columns yet; see #10. |
This is a really interesting issue. I added this issue #90 related to the error message. |
Also relates to this issue #75 because the geometry object has class |
@elinw One useful summary stat for simple features would be the count of valid geometries (see This stat basically tells the user if the dataset contains any records that need to be omitted or pre-processed with |
@tiernanmartin good point; my favorite analogy to missing values in non-spatial data is the empty geometry, found by |
@edzer That's a useful distinction: empty geometries and invalid geometries can both cause headaches downstream but in different ways. In both cases, |
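A minimal sketch of the two counts discussed above, assuming sf is installed and that sf::st_is_empty() is the check meant for empty geometries (sf::st_is_valid() is the one used later in this thread). The three toy geometries are made up for illustration.

```r
library(sf)

# A valid square, a self-intersecting "bow-tie" ring, and an empty polygon.
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
bowtie <- st_polygon(list(rbind(c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0))))
empty  <- st_polygon()
geoms  <- st_sfc(square, bowtie, empty)

# Invalid geometries: rings that break the simple features rules.
n_invalid <- sum(!st_is_valid(geoms), na.rm = TRUE)

# Empty geometries: the spatial analogue of a missing value.
n_empty <- sum(st_is_empty(geoms))
```

Keeping the two counts separate lets a summary report "missing" and "needs repair" as different problems.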
I agree, the analogy for both is headaches, really! (the default print method for sfc and sf objects prints the number of empty geometries if larger than zero, btw) |
There is also @mdsumner's idea of decomposition stats, no. parts, no. holes, no. of segments, no. rook/queen neighbours, no. coordinates. Source: https://twitter.com/mdsumner/status/872276792953917440 |
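Some of these decomposition counts can be sketched in base R. The example below does not use sf; it assumes a MULTIPOLYGON-like nested list (polygons, then rings, then a two-column coordinate matrix), and the helper names (ring, n_parts, n_holes, n_coords, n_segments) are hypothetical.

```r
# Build a closed ring from x/y pairs given row by row.
ring <- function(...) matrix(c(...), ncol = 2, byrow = TRUE)

# polygons -> rings -> coordinate matrix
mp <- list(
  list(                                  # polygon 1: outer ring plus one hole
    ring(0, 0, 4, 0, 4, 4, 0, 4, 0, 0),
    ring(1, 1, 2, 1, 2, 2, 1, 2, 1, 1)
  ),
  list(ring(5, 5, 6, 5, 6, 6, 5, 5))     # polygon 2: outer ring only
)

# Number of polygon parts.
n_parts <- function(x) length(x)

# Every ring after the first in a polygon is a hole.
n_holes <- function(x) sum(lengths(x) - 1L)

# Total coordinate rows across all rings.
n_coords <- function(x) {
  sum(vapply(x, function(p) sum(vapply(p, nrow, integer(1))), integer(1)))
}

# A closed ring with k coordinate rows has k - 1 segments.
n_segments <- function(x) n_coords(x) - sum(lengths(x))
```

Neighbour counts (rook/queen) need topology between features, so they are left out of this sketch.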
So the proper name for that class is "sfc_MULTIPOLYGON" right? |
@elinw take a look at http://edzer.github.io/sfr/articles/sf1.html#simple-feature-geometry-types. It could be |
So I'm thinking that to start it might make sense to do something like this
And then, if I'm understanding you could add support for each of these types. |
They all derive from |
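The class structure can be checked directly. This is a small sketch assuming sf is installed and that the shared parent class referred to here is sfc (the class name used in the skim_with(sfc = ...) call later in this thread).

```r
library(sf)

# Two small geometry columns: one single-type, one mixed-type.
pts   <- st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))
mixed <- st_sfc(st_point(c(0, 0)), st_linestring(rbind(c(0, 0), c(1, 1))))

# The type-specific class comes first, followed by the shared "sfc" class.
class(pts)
class(mixed)

# Functions registered for "sfc" would therefore apply to both columns.
inherits(pts, "sfc")
inherits(mixed, "sfc")
```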
Wow, this is a really amazing package! I'm still exploring how the classes (like sf) get registered, but I see now how skim_with helps you get started, and @elinw that list of funs works fine - as @edzer says it could simply be called I'll be working on this a bit so please anyone let me know if you start too. |
@mdsumner Make sure to look at stats.R also for anything that requires more complex handling. I think you have two options architecturally. If you look in skim.R you'll see that it's handling data frames, but another option is to pass sfc right there and then create a separate sfc_funs list. I haven't really thought that through, but I'm just thinking about how many functions are potentially getting pushed into the environment as more specialized data structures get added. I can see from this discussion and from the linked materials that there are going to be a lot. |
Thanks that's helpful! I'm having no problems with processing sfc with a custom sfc_funs list, that I register like this:
library(skimr)
library(sf)
sfp <- st_read(system.file("shape/nc.shp", package="sf"))
sfc_funs <- list(
missing = n_missing,
complete = n_complete,
n = length,
n_unique = purrr::compose(length, n_unique),
valid = purrr::compose(sum, sf::st_is_valid)
)
skim_with(sfc = sfc_funs , append = TRUE)
skim_v(st_geometry(sfp))
I haven't been able to get it to apply to a data frame though, I thought this minimally would work:
library(skimr)
adhoc_funs <- list(
missing = n_missing,
complete = n_complete,
n = length,
n_unique = purrr::compose(length, n_unique),
funny = function(x) length(x) + 1
)
d <- structure(
  list(a = 1:4, b = structure(as.list(letters[1:4]), class = "adhoc")),
  class = "data.frame",
  row.names = letters[1:4]
)
skim_with(adhoc = adhoc_funs , append = TRUE)
## no problems, and with much more complex funs too
skim_v(d$b)
## how do we get this to work?
skim(d)
I think you're telling me what I need to know with "pass sfc right there"; do you mean we need a skim.sfc method? I'm confused about how the custom list-col type gets "registered". |
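To make the "registration" question concrete, here is a base R sketch of one way a class-keyed list of summary functions could be looked up per column. It is not skimr's implementation; registry, register_funs, summarise_column and summarise_frame are hypothetical names used only for illustration.

```r
# Hypothetical registry: class name -> named list of summary functions.
registry <- new.env()

register_funs <- function(class_name, funs) {
  assign(class_name, funs, envir = registry)
}

# Summarise one column using the first of its classes that is registered.
summarise_column <- function(x) {
  hit <- Filter(
    function(cl) exists(cl, envir = registry, inherits = FALSE),
    class(x)
  )
  if (length(hit) == 0) {
    stop("no summary functions registered for class: ", class(x)[1])
  }
  funs <- get(hit[[1]], envir = registry)
  vapply(funs, function(f) as.numeric(f(x)), numeric(1))
}

# Apply the per-column lookup to every column of a data frame.
summarise_frame <- function(df) lapply(df, summarise_column)

register_funs("integer", list(n = length, missing = function(x) sum(is.na(x))))
register_funs("adhoc", list(n = length, funny = function(x) length(x) + 1))

d <- structure(
  list(a = 1:4, b = structure(as.list(letters[1:4]), class = "adhoc")),
  class = "data.frame",
  row.names = letters[1:4]
)
res <- summarise_frame(d)
```

The point of the sketch is that the data frame method only needs to loop over columns; the class of each column, including a custom list-column class, decides which function list runs.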
Can you try updating to current master? There are at least two issues mentioned above that need to be addressed for this to work. I think one has been merged but the other hasn't. |
I just pushed up a PR for handling generic lists. |
This would be my general idea of how to do it
|
Ah thanks, all makes sense. @elinw what are your thoughts on importing sf versus another "sk.sf.imr" package, or perhaps sf including these summary funs so they are available from it? sf is a pretty heavy dependency requiring GDAL and GEOS, so I tend to wrap around it to keep related packages lighter and keep GDAL out of my .travis.yml if possible. I'll have to put this aside for a little while but am hoping to help get it off the ground. (I've used an experimental package, sc, to derive the decomposition metrics for now; sc structures the data in a way that makes that natural, but sf can bust itself into pieces reasonably well: copying out a feature-level ID is one way to do it, another is cunning lapply nrow/length cumulations.) |
Well that’s one reason I would definitely not want to load whatever the functions are for everyone. I’m not sure whether it’s ultimately going to be better for different specialized packages to make their own skim.x methods or if it makes sense to include them in the main package or some combination. I will say that it’s very exciting to have people interested in doing it. There are probably some classes it definitely makes sense to include and some where there is more of a judgement call. At the unconf we were looking at broom as a possible model for the.
|
I was thinking maybe we could use this in a vignette that explains how to extend to specialized types of data. |
@mdsumner @Nowosad @edzer @tiernanmartin I added a vignette using the code here as an example. It's in the develop branch if you want to take a look. |
Closing for now since the vignette is there. Happy to get PRs for improvements. |
Thanks @elinw - that vignette is really great! Just FYI for anyone who might be pursuing this as well - I did some work on decomposing sf (in |
The sf package is the R implementation of Simple Features and is becoming a new standard for working with spatial data in R. More information at https://github.com/edzer/sfr and http://robinlovelace.net/geocompr/spatial-class.html.
The most important element of this package is the sf class. It is a simple data.frame with one additional list-column, which stores the geometry of the data.
I think it would be useful to add the ability to create a summary of sf objects. A summary of the geometry column could return some basic information, such as projection, geometry type, etc.