Consider read_st and write_st? #140
Comments
I think it was you who suggested to have all functions start with a common prefix, so that you could see all functions of the package by typing |
Yeah, I think this is one exception that is useful. (BTW why is the common prefix |
This comes from some standard document; all PostGIS commands follow the same pattern. |
Ah, got it |
Why not |
Finally looked up what the st is, spatial temporal is not what I expected. http://stackoverflow.com/questions/7234679/what-is-st-in-postgis |
While I understand the In fact, my question originates from trying to follow the |
@etiennebr I agree with this, I think auto-factoring is an unhappy legacy we should actively stamp out. The argument for staying compatible with R's defaults is a reasonable one, and I know that @edzer made the choice for it purposefully. I do think it's worth breaking the tradition though. Shapefiles, MapInfo, GeoPackage etc. don't have factors afaik, and there's no common standard across the huge variety of possible drivers and data sources, so I think the auto-factoring is incorrect. (Manifold GIS does have a "Lookup" type, which is a clear analog to factor, but I don't know how common that is across vendors. "factor_alikes_as_factors = TRUE"?) It might be a partial table read from a database, for example, so the set of available values is not all seen in the first pass, which means subsequent read-bind workflows already don't make sense from the strict viewpoint of factors as a pre-set of allowable values. |
By that, we'd try to mirror in R what a relational database does. But relational databases were not designed to model data, and that's why we're using R. There is a fundamental difference between a character variable and a factor: character data identifies records, factors group them. In R we already identify records by their row number, so there is no need: the character rownames of
in their .Renviron or at the top of their reproducible scripts. |
I don't want to beat a dead horse, but my reasoning is a little different. To me, the distinction between factors and character vectors is that factors have a fixed and known set of possible values. It is not possible to know the set of possible values by inspecting the data (i.e. you might have a gender column that only contains males) so by default it's safer to load as character and rely on the user to supply the full set of possible values. (The order of the levels often also has some meaning.) Historically, I think the default simply reflects that character vectors were poorly supported in the early days of R and factors were easier to work with. Unfortunately I don't think it's viable to change the global option for many people because much existing code depends on the default being TRUE (e.g. it also affects every call to |
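To illustrate the point about levels not being recoverable from the data, here is a minimal sketch in base R. The `gender` column and its values are made up for the example; nothing here is specific to this package.

```r
# Hypothetical column that happens to contain only one of the possible values.
gender <- c("male", "male", "male")

# Levels inferred from the data: only what was actually observed.
inferred <- factor(gender)
levels(inferred)   # "male"

# Levels supplied by the user: the full, known set, in a chosen order.
declared <- factor(gender, levels = c("female", "male"))
levels(declared)   # "female" "male"

# Tabulating shows the difference: only the declared factor
# reports the category that has no observations.
table(inferred)
table(declared)
```

Reading the column as character and converting it explicitly, as in `declared`, keeps the choice of levels in the user's hands.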
"To model data" is not why I'm using R. The simple features standard has no metadata system or scaling semantics applied to the fields so I don't understand why these would be added automatically for every user of them. |
Instead of st_read and st_write. The (small) advantages are greater consistency with the tidyverse (e.g. read_csv()) and somewhat more consistency with base R (e.g. read.csv()). I think there's something nice about typing packagename::read_ and easily seeing what formats can be read in.
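As a rough illustration of the discoverability argument, a shared prefix also makes it easy to list a package's readers programmatically. This sketch uses the base `utils` package as a stand-in, since its readers share the `read.` prefix; it is an assumption-free example only in that sense and says nothing about how this package names its functions.

```r
# List the exported functions of a package that start with a given prefix.
# "utils" is used here only because it ships with every R installation.
exports <- getNamespaceExports("utils")
readers <- sort(grep("^read\\.", exports, value = TRUE))
readers
```

Tab completion after `utils::read.` in an interactive session gives much the same list.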