Consider read_st and write_st? #140

hadley · 2017-01-02T21:14:45Z

Instead of st_read and st_write.

The (small) advantages are greater consistency with the tidyverse (e.g. read_csv()) and somewhat more consistency with base R (e.g. read.csv()).

I think there's something nice about typing packagename::read_ and easily seeing what formats can be read in.

edzer · 2017-01-02T21:27:20Z

I think it was you who suggested to have all functions start with a common prefix, so that you could see all functions of the package by typing st_[tab]. Hard to have both...

hadley · 2017-01-02T22:42:57Z

Yeah, I think this is one exception that is useful.

(BTW why is the common prefix st_ and not sf_? I found that slightly confusing)

edzer · 2017-01-02T23:03:16Z

This comes from some standard document; all PostGIS commands follow the same pattern.

hadley · 2017-01-02T23:16:15Z

Ah, got it

etiennebr · 2017-01-03T01:52:43Z

Why not read_sf <- function(...) st_read(...) ? That would make both ways exist.

mdsumner · 2017-01-03T11:53:44Z

Finally looked up what the st is, spatial temporal is not what I expected. http://stackoverflow.com/questions/7234679/what-is-st-in-postgis

etiennebr · 2017-01-19T17:47:01Z

While I understand the stringsAsFactors = TRUE default, I find it generally confusing. The readr package never converts strings to factors (the equivalent of stringsAsFactors = FALSE), which I think would generally be a good default for sf given the diversity of data stored in spatial features. Would you consider changing the default?

In fact, my question originates from trying to follow the readr interface, so I was thinking read_sf() could have a stringsAsFactors = FALSE default, but it would probably be very confusing to have st_read() with a different default than read_sf(). I know users can change that default, but for reproducibility, I think it's right to assume a --vanilla environment.
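A minimal sketch of what such an alias could look like, assuming a reader that accepts a stringsAsFactors argument. st_read_stub() is a hypothetical stand-in built on base R's read.csv() so the example runs without sf installed; it is not sf's actual implementation, and the wrapper names are illustrative only.

```r
# Hypothetical stand-in for st_read(): reads a CSV and mimics the
# factor-converting default discussed in this thread.
st_read_stub <- function(file, ..., stringsAsFactors = TRUE) {
  read.csv(file, ..., stringsAsFactors = stringsAsFactors)
}

# Plain alias: same behaviour under a second name.
read_stub_alias <- function(...) st_read_stub(...)

# Alias with a readr-like default: strings stay character.
read_sf_stub <- function(..., stringsAsFactors = FALSE) {
  st_read_stub(..., stringsAsFactors = stringsAsFactors)
}

# Small temporary file to read back.
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,value", "a,1", "b,2"), tmp)

class(st_read_stub(tmp)$name)   # "factor"
class(read_sf_stub(tmp)$name)   # "character"
```

The second wrapper shows the concern raised above: two names for the same reader returning different column types.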

mdsumner · 2017-01-19T22:54:16Z

@etiennebr I agree with this, I think auto-factoring is an unhappy legacy we should actively stamp out. The argument for staying compatible with R's defaults is a reasonable one though, and I know that @edzer made the choice for it purposefully. I do think it's worth breaking the tradition though. Shapefiles, MapInfo, GeoPackage etc. don't have factors afaik, and there's no common standard across the huge variety of possible drivers and data sources, so I think the auto-factoring is incorrect.

(Manifold GIS does have a "Lookup" type, which is a clear analog to factor, but I don't know how common that is across vendors. "factor_alikes_as_factors = TRUE"?)

It might be a partial table read from a database, for example, so the set of available values is not fully seen in the first pass, which means subsequent read-bind workflows already don't make sense from the strict viewpoint of factors as a preset of allowable values.
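A small base R sketch of the partial-read problem, with made-up values standing in for two chunks of the same column (the variable names and land-cover values are illustrative assumptions):

```r
# Two chunks of the same column, e.g. from successive partial reads.
chunk1 <- factor(c("forest", "water"))
chunk2 <- factor(c("urban", "water"))

# Each chunk only knows the values it happened to contain.
levels(chunk1)
levels(chunk2)

# Neither set of levels describes the full set of values.
identical(levels(chunk1), levels(chunk2))  # FALSE

# Keeping the data as character and converting once, after binding,
# gives a single consistent set of levels.
all_values <- c(as.character(chunk1), as.character(chunk2))
combined <- factor(all_values)
nlevels(combined)  # 3
```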

edzer · 2017-01-20T09:02:36Z

By that, we'd try to mirror in R what a relational database does. But relational databases were not designed to model data, and that's why we're using R. There is a fundamental difference between a character variable and a factor: character data identifies records, factors group them. In R we already identify records by their row number, so there is no need: the character rownames of mtcars are convenient. These rownames are not a grouping variable, they are identifiers. I notice that tidyverse is trying to replace most default functions in base R, but in this case I think they made the wrong decision, although I consider this on the edge of bikeshedding. Users who hate me for this can simply set

options(stringsAsFactors = FALSE)

in their .Rprofile or at the top of their reproducible scripts.
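As an alternative to the global option, the argument can also be passed explicitly on each call, so that a script does not depend on session-wide settings. A brief sketch with base R's data.frame() (the column name is illustrative):

```r
# Explicit per-call setting: the result does not depend on
# options(stringsAsFactors = ...) or on the R version's default.
df_chr <- data.frame(id = c("a", "b"), stringsAsFactors = FALSE)
df_fct <- data.frame(id = c("a", "b"), stringsAsFactors = TRUE)

class(df_chr$id)  # "character"
class(df_fct$id)  # "factor"
```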

hadley · 2017-01-20T13:19:35Z

I don't want to beat a dead horse, but my reasoning is a little different. To me, the distinction between factors and character vectors is that factors have a fixed and known set of possible values. It is not possible to know the set of possible values by inspecting the data (i.e. you might have a gender column that only contains males) so by default it's safer to load as character and rely on the user to supply the full set of possible values. (The order of the levels often also has some meaning.)
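The gender example can be sketched in base R as follows; the values and the level set are illustrative assumptions:

```r
# A column that happens to contain only one of the possible values.
gender <- c("male", "male", "male")

# Automatic conversion infers the levels from the data alone.
auto <- factor(gender)
levels(auto)  # "male"

# Loading as character and declaring the full set explicitly
# records values that are possible but not observed.
declared <- factor(gender, levels = c("female", "male"))
nlevels(declared)  # 2
table(declared)    # the count for "female" is 0
```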

Historically, I think the default simply reflects that character vectors were poorly supported in the early days of R and factors were easier to work with. Unfortunately I don't think it's viable to change the global option for many people because much existing code depends on the default being TRUE (e.g. it also affects every call to data.frame() in every loaded package).

mdsumner · 2017-01-20T13:27:24Z

"To model data" is not why I'm using R.

The simple features standard has no metadata system or scaling semantics applied to the fields so I don't understand why these would be added automatically for every user of them.

edzer added a commit that referenced this issue Jan 19, 2017

add read_sf and write_sf aliases, fixing #140

dfbcca4

edzer closed this as completed Jan 19, 2017

github-actions bot mentioned this issue Feb 8, 2021

[ratnanil] add this somewhere arc2r/book#32

Open
