sf print method is slow for features containing many elements #1602

Closed
charliejhadley opened this issue Feb 10, 2021 · 5 comments

@charliejhadley

The print.sf method abbreviates the geometry column so heavily that the only information it conveys to the user is the geometry type. However, the print method is slow for MULTIPOLYGON features that contain many polygons. Could printing be sped up by abbreviating the column contents before they are fully formatted?

Here's an example:

library("raster")
library("tidyverse")
library("mapview")
library("sf")
library("tictoc")

canada_adm1 <- getData(country = "CAN", level = 1) %>% 
  st_as_sf()

tic()
canada_adm1 %>% 
  slice(11)
toc()
# Simple feature collection with 1 feature and 10 fields
# geometry type:  MULTIPOLYGON
# dimension:      XY
# bbox:           xmin: -80.99115 ymin: 44.9908 xmax: -57.10548 ymax: 63.69792
# CRS:            +proj=longlat +datum=WGS84 +no_defs
# GID_0 NAME_0    GID_1 NAME_1    VARNAME_1 NL_NAME_1   TYPE_1 ENGTYPE_1 CC_1 HASC_1
# 1   CAN Canada CAN.11_1 Québec Lower Canada      <NA> Province  Province   24  CA.QC
# geometry
# 1 MULTIPOLYGON (((-74.64564 4...
# 4.096 sec elapsed
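
To illustrate the kind of abbreviation being asked for, here is a minimal base-R sketch. It is not how print.sf works; `abbreviate_geometry()` is a hypothetical helper, and the geometries are mocked by hand (assuming the usual sfg layout of a nested list with class `c(<dimension>, <type>, "sfg")`) so the snippet runs without sf installed. The point is that a type label and a part count can be produced without turning any coordinates into text.

```r
# Hypothetical helper, not part of sf: summarise each geometry from its
# class and top-level length only, never formatting coordinates.
abbreviate_geometry <- function(geoms) {
  vapply(geoms, function(g) {
    type <- class(g)[2]  # assumes classes of the form c("XY", "<TYPE>", "sfg")
    n_parts <- if (is.list(g)) length(g) else 1L
    sprintf("%s [%d part%s]", type, n_parts, if (n_parts == 1L) "" else "s")
  }, character(1))
}

# Hand-built stand-ins for sfg objects (assumption: same layout as sf uses)
ring <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 0))
mp <- structure(list(list(ring), list(ring)),
                class = c("XY", "MULTIPOLYGON", "sfg"))
pt <- structure(c(1, 2), class = c("XY", "POINT", "sfg"))

abbreviated <- abbreviate_geometry(list(mp, pt))
```

The cost of this summary depends on the number of features, not on the number of vertices, which is what would make it attractive for a feature like row 11 above.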
@edzer
Member

edzer commented Feb 10, 2021

See also #747, #800, #957

@paleolimbot
Contributor

I wonder if the (dev version of) wk::wkt_format_handler() (or the approach it uses) could be helpful here: it's designed to be very lazy! Note that the RStudio Viewer (annoyingly) calls as.character() to display an sfc, which renders the session unusable if you accidentally click on one in the viewer.

library(wk) # remotes::install_github("paleolimbot/wk")
library(sf)
#> Linking to GEOS 3.8.1, GDAL 3.1.2, PROJ 7.1.0

canada <- readRDS(url("https://biogeo.ucdavis.edu/data/gadm3.6/Rsf/gadm36_CAN_1_sf.rds"))

head(wk_format(canada))
#> [1] "MULTIPOLYGON (((-111.9534 49.00005, -111.9608 49.00004, -111.9628 49.00004, -111.9653 49.00004, -111.9697 49.00003, -111.9728 49.00003..."
#> [2] "MULTIPOLYGON (((-123.5406 48.31833, -123.5403 48.31805, -123.5392 48.31805, -123.5389 48.31778, -123.5375 48.31778, -123.5375 48.31667..."
#> [3] "MULTIPOLYGON (((-90.385 57.18528, -90.385 57.185, -90.38389 57.185, -90.3839 57.18534, -90.38277 57.18554, -90.38269 57.18555..."         
#> [4] "MULTIPOLYGON (((-66.84995 44.48388, -66.85 44.48361, -66.85031 44.48367, -66.85056 44.48333, -66.85084 44.48357, -66.85101 44.48361..."   
#> [5] "MULTIPOLYGON (((-53.37361 46.72556, -53.37375 46.72509, -53.37385 46.72454, -53.37394 46.72384, -53.37414 46.72313, -53.37442 46.72247..."
#> [6] "MULTIPOLYGON (((-135.1167 68.47083, -135.1187 68.47083, -135.1203 68.47135, -135.1203 68.4724, -135.1187 68.47292, -135.1167 68.47292..."

bench::mark(
  wk_format(canada),
  format(canada),
  check = FALSE
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 x 6
#>   expression             min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>        <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 wk_format(canada)  139.1µs  150.8µs 5617.            0B     4.00
#> 2 format(canada)       27.5s    27.5s    0.0363    41.7MB     8.50

Created on 2021-02-21 by the reprex package (v0.3.0)
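
For readers wondering what "lazy" buys here, the following is a rough base-R sketch of the general idea: format only the first few coordinates and stop, instead of serialising the whole geometry and truncating afterwards. This is an illustration only, not wk's implementation; `lazy_wkt()` and its `max_coords` argument are hypothetical, the geometry is a hand-built stand-in for an sfg, and empty geometries and GEOMETRYCOLLECTION are not handled.

```r
# Hypothetical sketch: emit a truncated WKT-like preview while touching only
# the first `max_coords` rows of the first ring.
lazy_wkt <- function(g, max_coords = 2L) {
  type <- class(g)[2]  # assumes classes of the form c("XY", "<TYPE>", "sfg")
  depth <- 0L
  node <- unclass(g)
  # Descend through the first element of each nesting level
  while (is.list(node)) {
    node <- node[[1]]
    depth <- depth + 1L
  }
  # A POINT is a bare numeric vector; treat it as a one-row matrix
  if (!is.matrix(node)) node <- matrix(node, nrow = 1)
  n <- min(max_coords, nrow(node))
  coords <- node[seq_len(n), , drop = FALSE]
  pairs <- apply(coords, 1, function(p) {
    paste(formatC(p, digits = 7, format = "g"), collapse = " ")
  })
  paste0(type, " ", strrep("(", depth + 1L),
         paste(pairs, collapse = ", "), "...")
}

ring <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 0))
mp <- structure(list(list(ring), list(ring)),
                class = c("XY", "MULTIPOLYGON", "sfg"))
preview <- lazy_wkt(mp, max_coords = 2L)
```

However many polygons or vertices the feature has, only `max_coords` rows are ever formatted, which is consistent with the large gap in the benchmark above.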

@edzer
Member

edzer commented Feb 21, 2021

Sounds like a good idea, and much faster than lwgeom::st_astext! Any idea about making it sensitive to global R things like

options(digits = 3)

?

@paleolimbot
Contributor

I should make that the default, but in the meantime you can use the precision argument, which refers to the total numeric width rather than the number of decimal places:

wk::wk_format(wk::xy(10 + 1/3,  10 + 2/3), precision = 4)
#> [1] "POINT (10.33 10.67)"
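
One way the global setting could be picked up is to read `getOption("digits")` and pass it along as the precision. The sketch below shows only the base-R half, so it runs without wk; `session_precision()` is a hypothetical helper, and forwarding its value to `wk_format()` (shown in the trailing comment) is an untested assumption that the two notions of "digits" line up.

```r
# Hypothetical helper: read the session-wide digits setting, falling back
# to R's default of 7 if the option is unset.
session_precision <- function() getOption("digits", default = 7L)

old <- options(digits = 3)  # temporarily mimic options(digits = 3)
p <- session_precision()
rounded <- formatC(10 + 1/3, digits = p, format = "g")  # significant digits
options(old)                # restore the previous setting

# With wk installed, the value could be forwarded like this (untested here):
# wk::wk_format(wk::xy(10 + 1/3, 10 + 2/3), precision = session_precision())
```

Reading the option at call time, rather than caching it, means a later `options(digits = ...)` would be respected without any further changes.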

It doesn't handle all of the sf geometry types (circular string, etc.), and native sf support needs the dev version of wk to be on CRAN!

@edzer
Member

edzer commented May 31, 2021

Closing due to lack of activity.
