Writing a large gpkg file taking forever #1409

Closed
Robinlovelace opened this issue May 28, 2020 · 25 comments
Labels
help wanted ❤️ we'd love your help!

Comments

@Robinlovelace
Contributor

Robinlovelace commented May 28, 2020

I'm trying to write a large file (300 ~600 MB as .Rds) to disk. It saved in about 5 minutes in the .Rds format and took around 10 minutes to read in from a set of compressed .gml files using this mini package developed for the job: https://github.com/ITSLeeds/mastermapr

sf::write_sf(mm_highway_uk, "destination.gpkg")

This has been running for over an hour now and I am wondering if it will ever finish! I know this is likely to be an upstream issue with GDAL, but I'm raising it here in case others have had similar problems and in case it's of use. It's related to the wider question of which geographic file format to save data in.

This is my set-up:

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0

Created on 2020-05-28 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.3 (2020-02-29)
#>  os       Ubuntu 18.04.4 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_GB:en                    
#>  collate  en_GB.UTF-8                 
#>  ctype    en_GB.UTF-8                 
#>  tz       Europe/London               
#>  date     2020-05-28                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                             
#>  assertthat    0.2.1      2019-03-21 [2] CRAN (R 3.6.0)                     
#>  backports     1.1.7      2020-05-13 [1] CRAN (R 3.6.3)                     
#>  callr         3.4.3      2020-03-28 [1] CRAN (R 3.6.3)                     
#>  class         7.3-17     2020-04-26 [2] CRAN (R 3.6.3)                     
#>  classInt      0.4-3      2020-04-06 [1] Github (r-spatial/classInt@d024051)
#>  cli           2.0.2      2020-02-28 [1] CRAN (R 3.6.2)                     
#>  crayon        1.3.4      2017-09-16 [2] standard (@1.3.4)                  
#>  DBI           1.1.0      2019-12-15 [2] CRAN (R 3.6.2)                     
#>  desc          1.2.0      2018-05-01 [2] standard (@1.2.0)                  
#>  devtools      2.3.0      2020-04-10 [1] CRAN (R 3.6.3)                     
#>  digest        0.6.25     2020-02-23 [1] CRAN (R 3.6.2)                     
#>  e1071         1.7-3      2019-11-26 [2] CRAN (R 3.6.1)                     
#>  ellipsis      0.3.1      2020-05-15 [3] CRAN (R 3.6.3)                     
#>  evaluate      0.14       2019-05-28 [2] CRAN (R 3.6.0)                     
#>  fansi         0.4.1      2020-01-08 [1] CRAN (R 3.6.2)                     
#>  fs            1.4.1      2020-04-04 [2] CRAN (R 3.6.3)                     
#>  glue          1.4.1      2020-05-13 [2] CRAN (R 3.6.3)                     
#>  highr         0.8        2019-03-20 [3] CRAN (R 3.5.3)                     
#>  htmltools     0.4.0.9003 2020-04-09 [1] Github (rstudio/htmltools@1a7d0dc) 
#>  KernSmooth    2.23-17    2020-04-26 [4] CRAN (R 3.6.3)                     
#>  knitr         1.28       2020-02-06 [1] CRAN (R 3.6.2)                     
#>  magrittr      1.5        2014-11-22 [2] CRAN (R 3.5.2)                     
#>  memoise       1.1.0      2017-04-21 [3] CRAN (R 3.5.0)                     
#>  pkgbuild      1.0.8      2020-05-07 [1] CRAN (R 3.6.3)                     
#>  pkgload       1.0.2      2018-10-29 [3] CRAN (R 3.5.1)                     
#>  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 3.6.2)                     
#>  processx      3.4.2      2020-02-09 [1] CRAN (R 3.6.3)                     
#>  ps            1.3.3      2020-05-08 [1] CRAN (R 3.6.3)                     
#>  R6            2.4.1      2019-11-12 [2] CRAN (R 3.6.1)                     
#>  Rcpp          1.0.4.6    2020-04-09 [1] CRAN (R 3.6.3)                     
#>  remotes       2.1.1      2020-02-15 [1] CRAN (R 3.6.2)                     
#>  rlang         0.4.6.9000 2020-05-05 [1] Github (r-lib/rlang@4bea875)       
#>  rmarkdown     2.1.2      2020-04-09 [1] Github (rstudio/rmarkdown@65dd144) 
#>  rprojroot     1.3-2      2018-01-03 [2] CRAN (R 3.5.3)                     
#>  rstudioapi    0.11       2020-02-07 [2] CRAN (R 3.6.2)                     
#>  sessioninfo   1.1.1      2018-11-05 [3] CRAN (R 3.5.1)                     
#>  sf          * 0.9-3      2020-05-04 [1] CRAN (R 3.6.3)                     
#>  stringi       1.4.6      2020-02-17 [1] CRAN (R 3.6.2)                     
#>  stringr       1.4.0      2019-02-10 [2] standard (@1.4.0)                  
#>  testthat      2.3.2      2020-03-02 [1] CRAN (R 3.6.3)                     
#>  units         0.6-6      2020-03-16 [1] CRAN (R 3.6.3)                     
#>  usethis       1.6.1      2020-04-29 [1] CRAN (R 3.6.3)                     
#>  withr         2.2.0      2020-04-20 [2] CRAN (R 3.6.3)                     
#>  xfun          0.14       2020-05-20 [1] CRAN (R 3.6.3)                     
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 3.6.2)                     
#> 
#> [1] /home/robin/R/x86_64-pc-linux-gnu-library/3.6
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
@edzer
Member

edzer commented May 28, 2020

Have you tried with layer creation option SPATIAL_INDEX set to NO ?

@Robinlovelace
Contributor Author

No. Will try now and aim to put in a PR documenting that feature if it works. Many thanks for fast reply!

@Robinlovelace
Contributor Author

I gave it a go on a smaller dataset (61k vs ~6m rows) and disabling the spatial index seemed to make it a bit faster (8.0 s vs 8.7 s elapsed). Assuming the impact of that option increases with dataset size, that could solve it (I gave up trying the other day):

remotes::install_cran("pct")
#> Skipping install of 'pct' from a cran remote, the SHA1 (0.4.0) has not changed since last install.
#>   Use `force = TRUE` to force installation
remotes::install_github("r-spatial/sf")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'sf' from a github remote, the SHA1 (2ca6483f) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
l = pct::get_pct_routes_fast(region = "london")

# test writing both ways
f = function(x) file.path(tempdir(), paste0(x, ".gpkg"))
f("l1")
#> [1] "/tmp/RtmpTuVzyM/l1.gpkg"
system.time(
  st_write(l, f("l1"))
)
#> Writing layer `l1' to data source `/tmp/RtmpTuVzyM/l1.gpkg' using driver `GPKG'
#> Writing 61051 features with 141 fields and geometry type Line String.
#>    user  system elapsed 
#>   8.268   0.369   8.686
system.time(
  st_write(l, f("l2"), layer_options = "SPATIAL_INDEX=NO")
)
#> Writing layer `l2' to data source `/tmp/RtmpTuVzyM/l2.gpkg' using driver `GPKG'
#> options:        SPATIAL_INDEX=NO 
#> Writing 61051 features with 141 fields and geometry type Line String.
#>    user  system elapsed 
#>   7.722   0.314   8.038

Created on 2020-05-30 by the reprex package (v0.3.0)

@Robinlovelace
Contributor Author

Update: building on the previous example I explored the impact of the layer option on different sized datasets, no clear finding:

bench::press(
  n = c(10, 100, 1000, 10000),
  layer_options = c("", "SPATIAL_INDEX=NO"),
  {
    bench::mark(
      time_unit = "ms",
      sf = st_write(l[1:n, ], f(paste0(n, layer_options, runif(1))), layer_options = layer_options)
      )
  }
)
  expression     n layer_options    min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory time  gc   
  <bch:expr> <dbl> <chr>          <dbl>  <dbl>     <dbl> <bch:byt>    <dbl> <int> <dbl>      <dbl> <list> <list> <lis> <lis>
1 sf            10 ""              16.6   17.6    56.5   1016.09KB     4.35    26     2       460. <df[,… <Rpro… <bch… <tib…
2 sf           100 ""              32.3   35.6    28.3      2.81MB     2.17    13     1       460. <df[,… <Rpro… <bch… <tib…
3 sf          1000 ""             189.   189.      5.29    14.71MB     2.64     2     1       378. <df[,… <Rpro… <bch… <tib…
4 sf         10000 ""            1696.  1696.      0.590  109.41MB     1.77     1     3      1696. <df[,… <Rpro… <bch… <tib…
5 sf            10 "SPATIAL_IND…   15.6   16.6    60.2   1015.21KB     2.08    29     1       482. <df[,… <Rpro… <bch… <tib…
6 sf           100 "SPATIAL_IND…   31.0   32.8    30.6      2.81MB     2.18    14     1       458. <df[,… <Rpro… <bch… <tib…
7 sf          1000 "SPATIAL_IND…  174.   176.      5.68    14.71MB     2.84     2     1       352. <df[,… <Rpro… <bch… <tib…
8 sf         10000 "SPATIAL_IND… 1739.  1739.      0.575  109.41MB     1.73     1     3      1739. <df[,… <Rpro… <bch… <tib…

@Robinlovelace
Contributor Author

Trying on the full dataset, which takes over a minute to load as an .Rds file:

system.time({
  mm_roads_uk = readRDS("mm.Rds")
})
   user  system elapsed 
 68.613   0.758  70.442 
mm_subset = mm_roads_uk[1:100000, ]
bench::press(
  n = c(10, 100, 1000, 100000),
  layer_options = c("", "SPATIAL_INDEX=NO"),
  {
    bench::mark(
      time_unit = "ms",
      sf = write_sf(mm_subset[1:n, ], f(paste0(n, layer_options, runif(1))), layer_options = layer_options)
    )
  }
)

Waiting for results...

@Robinlovelace
Contributor Author

Seems that the relative speed-up associated with SPATIAL_INDEX=NO may increase with dataset size:

  expression      n layer_options    min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory time 
  <bch:expr>  <dbl> <chr>          <dbl>  <dbl>     <dbl> <bch:byt>    <dbl> <int> <dbl>      <dbl> <list> <list> <lis>
1 sf             10 ""            1.37e1 1.40e1   70.1       1.78MB     0       36     0       513. <df[,… <Rpro… <bch…
2 sf            100 ""            1.70e1 1.94e1   53.6       2.01MB     0       27     0       504. <df[,… <Rpro… <bch…
3 sf           1000 ""            4.87e1 4.93e1   20.0      14.37MB     0       10     0       500. <df[,… <Rpro… <bch…
4 sf         100000 ""            5.55e4 5.55e4    0.0180  111.86GB     1.30     1    72     55465. <df[,… <Rpro… <bch…
5 sf             10 "SPATIAL_IND… 1.24e1 1.27e1   77.5       1.78MB     0       39     0       503. <df[,… <Rpro… <bch…
6 sf            100 "SPATIAL_IND… 1.49e1 1.52e1   64.6       2.01MB     0       33     0       511. <df[,… <Rpro… <bch…
7 sf           1000 "SPATIAL_IND… 4.50e1 4.53e1   22.0      14.37MB     0       11     0       500. <df[,… <Rpro… <bch…
8 sf         100000 "SPATIAL_IND… 4.10e4 4.10e4    0.0244  111.86GB     1.07     1    44     41038. <df[,… <Rpro… <bch…

@Robinlovelace
Contributor Author

Final benchmark on 10% sample:

t1 = system.time({
  write_sf(mm_roads_uk[1:500000, ], "/tmp/test1.gpkg")
})


t2 = system.time({
  write_sf(mm_roads_uk[1:500000, ], "/tmp/test2.gpkg", layer_options = "SPATIAL_INDEX=NO")
})

t3 = system.time({
  saveRDS(mm_roads_uk[1:500000, ], "/tmp/test3.Rds")
})

I get:

> t1
    user   system  elapsed 
1094.022  226.910 1321.227 
> t2
    user   system  elapsed 
1002.148    3.357 1005.638 
> t3
   user  system elapsed 
 18.796   0.195  18.999 

So writing to .Rds is about 70 and 50 times faster than writing to .gpkg with and without the spatial index, respectively, from R on my computer. I will try out writing this same 10% sample with QGIS as a test. Tempted to try .shp as an output and upgrade to GDAL 3.1.0 for FlatGeobuf outputs.
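For anyone checking the ratios, they follow directly from the elapsed times printed above. A small sketch of the arithmetic (the vector and its names are only for illustration):

```r
# Elapsed times in seconds, copied from the t1, t2 and t3 output above
elapsed <- c(gpkg_index = 1321.227, gpkg_no_index = 1005.638, rds = 18.999)

# How many times slower each .gpkg write is than saveRDS()
ratios <- elapsed[c("gpkg_index", "gpkg_no_index")] / elapsed[["rds"]]
round(ratios, 1)
#>    gpkg_index gpkg_no_index 
#>          69.5          52.9
```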

@Robinlovelace
Contributor Author

Test results from QGIS: it saved the object as a .gpkg file with a spatial index in 18 seconds, around the same impressive write speed as saving as an .Rds file.

Without the spatial index the same object was written by QGIS in 12s, around 80 times faster than in R.


@Robinlovelace
Contributor Author

Robinlovelace commented Jun 1, 2020

Minor update on this: I left it running over the weekend and 33.5 hours later the file still hasn't finished writing. The output file is still growing in size, currently it is:

ls -al 
# -rw-r--r-- 1 robin robin 1798160384 Jun  1 08:48 destination.gpkg

bytes. A few minutes later it is 1801400320 bytes. I think something strange is going on with the memory allocation with this, fluctuating by several GB every few seconds as shown in the .gif of the system monitor below:

(animated screenshot: Peek 2020-05-30 23-36)

If you'd like any further info on this let me know. I'm not sure if this issue is specific to the dataset I have, which has many variables and XYZ geometry. I can share a sample securely if need be, but my guess is that this isn't dataset specific. Happy to provide further details/tests to support development of R so its I/O capabilities for spatial data are comparable with desktop GIS.

@edzer edzer added the help wanted ❤️ we'd love your help! label Jul 8, 2020
@Jo-Schie

Jo-Schie commented Mar 2, 2022

I can confirm this issue. Other file types are also affected (e.g. GeoJSON). I tried to explore the issue a little and noticed that the problem (in my case) was writing a logical column from an object of class sf and data.frame to disk. A quick fix for me was to convert the logical column to e.g. 1/0 dummy coding (see code below). Not sure if this helps you to further nail down the problem, but here is some code that is hopefully reproducible:

library("sf")
library("dplyr")

nc <-
  st_read(system.file("shapes/sids.shp", package = "spData")[1], quiet =
            TRUE)
st_crs(nc) <- "+proj=longlat +datum=NAD27"
nc <-
  st_transform(nc, crs = 3395)

testgrid <-
  st_make_grid(nc, cellsize = 1000)

starttime <- Sys.time()
st_write(testgrid, "testgrid.gpkg")
endtime <- Sys.time()
endtime - starttime

# add column with dummy
testgrid <-
  testgrid %>%
  st_as_sf() %>% 
  mutate(dummy = 1:length(testgrid))

testgrid$dummy <- ifelse(testgrid$dummy < 100, 1, 0)

starttime <- Sys.time()
st_write(testgrid, dsn = "testgrid2.gpkg", driver = "GPKG")
endtime <- Sys.time()
endtime - starttime

# add a logical column (nrow(), because length() of an sf object counts columns)
testgrid$logical <- seq_len(nrow(testgrid))
testgrid$logical <- ifelse(testgrid$logical < 100, TRUE, FALSE)

starttime <- Sys.time()
st_write(testgrid, "testgrid3.gpkg") # hangs forever
endtime <- Sys.time()
endtime - starttime
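The 1/0 workaround described above can be generalised to every logical column at once. This is only a sketch: the helper name logical_to_integer is made up for illustration, and it is only relevant for sf versions that show the slowdown. NA values stay NA after the conversion.

```r
# Hypothetical helper: recode every logical column as integer (TRUE -> 1L,
# FALSE -> 0L, NA -> NA) before writing. Works on data frames and sf objects;
# the geometry list-column is not logical, so it is left untouched.
logical_to_integer <- function(x) {
  is_lgl <- vapply(x, is.logical, logical(1))
  for (col in names(x)[is_lgl]) {
    x[[col]] <- as.integer(x[[col]])
  }
  x
}

# Usage on a plain data frame
df <- data.frame(flag = c(TRUE, FALSE, NA), id = 1:3)
str(logical_to_integer(df))

# With an sf object one would write, for example:
# st_write(logical_to_integer(testgrid), "testgrid3.gpkg")
```

The cost is that the column comes back as integer rather than logical when the file is read again, so it needs converting back with as.logical() if the type matters.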

@barryrowlingson
Contributor

Whatever is causing this is in the C(++?) code. I just did some R profiling and 98% of the time in my tests was in the CPL_write_ogr function, which is .Call("_sf_CPL_write_ogr",....

Test code attached:

sp.txt

Usage:

times = test1(100*c(100,200,300,400))

returns a data frame of timings, the number of rows, and a logical flag indicating whether the column was written as logical or numeric, e.g.:

  user.self sys.self elapsed user.child sys.child     n logical
1     0.113    0.005   0.117          0         0 10000   FALSE
2     0.233    0.004   0.236          0         0 20000   FALSE
3     0.345    0.004   0.349          0         0 30000   FALSE
4     0.473    0.004   0.477          0         0 40000   FALSE
5     0.360    0.007   0.367          0         0 10000    TRUE
6     1.320    0.400   1.720          0         0 20000    TRUE
7     2.794    0.979   3.774          0         0 30000    TRUE
8     5.060    1.860   6.921          0         0 40000    TRUE

feed into ggplot if you want to plot it and see the difference....

If I knew how to profile C++ code within R I'd go deeper...
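The attached sp.txt is not reproduced in this thread. A timing harness of this kind might look roughly like the sketch below; the function name time_gpkg_write and the column name flag are assumptions for illustration, not the contents of the attachment.

```r
library(sf)

# Hypothetical helper (not the attached test1()): time one GeoPackage write of
# n random points with a single attribute column, stored as logical or numeric.
time_gpkg_write <- function(n, logical) {
  df <- data.frame(x = runif(n), y = runif(n))
  df$flag <- if (logical) rep(TRUE, n) else rep(1, n)
  pts <- st_as_sf(df, coords = c("x", "y"))
  path <- tempfile(fileext = ".gpkg")
  on.exit(unlink(path))
  tm <- system.time(write_sf(pts, path))
  # One row: the five system.time() components plus the two settings
  data.frame(as.list(unclass(tm)), n = n, logical = logical)
}

# One row per combination of size and column type
grid <- expand.grid(n = 100 * c(100, 200), logical = c(FALSE, TRUE))
times <- do.call(rbind, Map(time_gpkg_write, grid$n, grid$logical))
times
```

The result has the same columns as the table above, so it can be passed straight to ggplot.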

@rsbivand
Member

rsbivand commented Feb 7, 2023

These are points, so see #2059 and maybe try the pointx branch? Or #2036 for a different take using GDAL-devel?

@edzer
Member

edzer commented Feb 7, 2023

> times
  user.self sys.self elapsed user.child sys.child     n logical
1     0.756    0.029   0.792          0         0 10000   FALSE
2     0.261    0.001   0.263          0         0 20000   FALSE
3     0.386    0.016   0.401          0         0 30000   FALSE
4     0.524    0.001   0.524          0         0 40000   FALSE
5     0.409    0.268   0.675          0         0 10000    TRUE
6     1.308    1.051   2.360          0         0 20000    TRUE
7     2.706    2.740   5.450          0         0 30000    TRUE
8     4.682    5.092   9.779          0         0 40000    TRUE

with pointx branch:

> times
  user.self sys.self elapsed user.child sys.child     n logical
1     0.735    0.025   0.761          0         0 10000   FALSE
2     0.225    0.008   0.233          0         0 20000   FALSE
3     0.352    0.004   0.356          0         0 30000   FALSE
4     0.483    0.000   0.483          0         0 40000   FALSE
5     0.354    0.288   0.642          0         0 10000    TRUE
6     1.133    1.171   2.315          0         0 20000    TRUE
7     2.601    2.812   5.413          0         0 30000    TRUE
8     4.626    5.257   9.888          0         0 40000    TRUE

@kadyb
Contributor

kadyb commented Feb 7, 2023

Out of curiosity, I also checked {terra} and it seems there is no overhead for the logical type.

library("sf")
library("terra")

n = 50000
df = data.frame(x = runif(n), y = runif(n), z = logical(n))
sf = st_as_sf(df, coords = c("x", "y"))
terra = vect(df, geom = c("x", "y"))

## with logical column
system.time( write_sf(sf, "test.gpkg") ) #> 3.30
system.time( writeVector(terra, "test.gpkg", overwrite = TRUE) ) #> 0.65

## without logical column
system.time( write_sf(sf[, -1], "test.gpkg") ) #> 0.77
system.time( writeVector(terra[, -1], "test.gpkg", overwrite = TRUE) ) #> 0.66

edzer added a commit that referenced this issue Feb 7, 2023
@edzer
Member

edzer commented Feb 7, 2023

> times
  user.self sys.self elapsed user.child sys.child     n logical
1     0.703    0.030   0.733          0         0 10000   FALSE
2     0.226    0.000   0.226          0         0 20000   FALSE
3     0.340    0.000   0.340          0         0 30000   FALSE
4     0.450    0.005   0.454          0         0 40000   FALSE
5     0.116    0.000   0.116          0         0 10000    TRUE
6     0.223    0.000   0.223          0         0 20000    TRUE
7     0.333    0.000   0.333          0         0 30000    TRUE
8     0.445    0.002   0.447          0         0 40000    TRUE

@edzer
Member

edzer commented Feb 7, 2023

@kadyb thanks! @rhijmans terra doesn't write logical NA's:

library("sf")
# Linking to GEOS 3.11.1, GDAL 3.6.2, PROJ 9.1.1; sf_use_s2() is TRUE
library("terra")
# terra 1.7.3

n = 3
df = data.frame(x = runif(n), y = runif(n), z = c(TRUE, FALSE, NA))
sf = st_as_sf(df, coords = c("x", "y"))
terra = vect(df, geom = c("x", "y"))

## with logical column
system.time( write_sf(sf, "test.gpkg") )
# writing: substituting ENGCRS["Undefined Cartesian SRS with unknown unit"] for missing CRS
#    user  system elapsed 
#   0.022   0.001   0.123 
read_sf("test.gpkg")
# Simple feature collection with 3 features and 1 field
# Geometry type: POINT
# Dimension:     XY
# Bounding box:  xmin: 0.659066 ymin: 0.009806918 xmax: 0.8063622 ymax: 0.7639054
# Projected CRS: Undefined Cartesian SRS with unknown unit
# # A tibble: 3 × 2
#   z                        geom
#   <lgl>                 <POINT>
# 1 TRUE  (0.8063622 0.009806918)
# 2 FALSE    (0.659066 0.7639054)
# 3 NA      (0.6713299 0.5175844)
system.time( writeVector(terra, "test.gpkg", overwrite = TRUE) )
#    user  system elapsed 
#   0.011   0.000   0.011 
# Warning messages:
# 1: In x@ptr$write(filename, layer, filetype, insert[1], overwrite[1],  :
#   GDAL Message 6: dataset test.gpkg does not support layer creation option ENCODING
# 2: In x@ptr$write(filename, layer, filetype, insert[1], overwrite[1],  :
#   GDAL Message 1: Only 0 or 1 should be passed for a OFSTBoolean subtype. Considering this non-zero value as 1.
read_sf("test.gpkg")
# Simple feature collection with 3 features and 1 field
# Geometry type: POINT
# Dimension:     XY
# Bounding box:  xmin: 0.659066 ymin: 0.009806918 xmax: 0.8063622 ymax: 0.7639054
# Geodetic CRS:  Undefined geographic SRS
# # A tibble: 3 × 2
#   z                        geom
#   <lgl>             <POINT [°]>
# 1 TRUE  (0.8063622 0.009806918)
# 2 FALSE    (0.659066 0.7639054)
# 3 TRUE    (0.6713299 0.5175844)

@robinlovelace-ate

For the record this still seems to be taking forever compared with saveRDS() with sf_1.0-11.

Context: #2142

@robinlovelace-ate

But did just complete, around 5x slower than RDS but workable:

 user  system elapsed 
 278.30  170.42  450.90 

@RikFerreira

I had a similar problem with a POINT layer coerced from a terra raster. At first I was writing only the geometry column and it was OK.

But when I added a column to be populated in QGIS, writing became much slower.

I created the column like this:

ref_points <- sentinel2 %>%
    as.points() %>%
    st_as_sf() %>%
    select(geometry) %>%
    mutate(distance = NA, .before = "geometry")

I solved it by removing the distance column.

@robinlovelace-ate

What was the class of the distance column and what version of sf were you running @RikFerreira ? That could diagnose any further issues, if there are any.

@Jo-Schie

Jo-Schie commented Apr 5, 2023

@RikFerreira wrote:

> I had a similar problem. Having a POINT layer coerced from a terra raster. At the first moment I was writing only the geometry column and it was ok.
>
> But, when I added a column to be populated in QGIS, the writing time was way more slower.
>
> I created the column like this:
>
> ref_points <- sentinel2 %>%
>     as.points() %>%
>     st_as_sf() %>%
>     select(geometry) %>%
>     mutate(distance = NA, .before = "geometry")
>
> I solved removing the distance column.

I can tell from experience that, besides logical columns (as indicated in my previous comment), NAs also cause issues for me. I guess the problem could be that the driver does not know how to convert, or is slow in converting and writing, R-specific data types into the above-mentioned geospatial formats.

@edzer
Member

edzer commented Apr 5, 2023

To make any progress, we need data points and reprexes, not experience. I can only see this:

library(sf)
# Linking to GEOS 3.11.1, GDAL 3.6.2, PROJ 9.1.1; sf_use_s2() is TRUE
n = 100000
df = data.frame(x = runif(n), y = runif(n), z= rnorm(n))
sf = st_as_sf(df, coords = c("x", "y"), crs = 4326)
system.time(write_sf(sf, "x.gpkg"))
#    user  system elapsed 
#   1.294   0.125   1.625 
sf$z[1] = NA
system.time(write_sf(sf, "x.gpkg"))
#    user  system elapsed 
#   1.294   0.118   1.642 
sf$z = rep(NA_real_, n)
system.time(write_sf(sf, "x.gpkg"))
#    user  system elapsed 
#   1.229   0.156   1.608 

@RikFerreira

RikFerreira commented Apr 5, 2023

@Robinlovelace, the class is logical.

I can't provide the raster here, but it generated an sf object with 2.240.799 features.

sf object:

> ref_points
Simple feature collection with 2240799 features and 0 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 272630 ymin: 9094410 xmax: 300570 ymax: 9126770
Projected CRS: SIRGAS 2000 / UTM zone 25S
First 10 features:
                 geometry
1  POINT (297850 9126770)
2  POINT (297870 9126770)
3  POINT (297890 9126770)
4  POINT (297910 9126770)
5  POINT (297930 9126770)
6  POINT (297950 9126770)
7  POINT (297970 9126770)
8  POINT (297990 9126770)
9  POINT (298010 9126770)
10 POINT (298030 9126770)

---

Rows: 2,240,799
Columns: 2
$ geometry <POINT [m]> POINT (297850 9126770), POINT (297870 9126770), POINT (…
$ distance <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…

Session info:

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=Portuguese_Brazil.utf8  LC_CTYPE=Portuguese_Brazil.utf8
[3] LC_MONETARY=Portuguese_Brazil.utf8 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Brazil.utf8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] sf_1.0-9        terra_1.7-3     forcats_1.0.0   stringr_1.5.0
 [5] dplyr_1.1.0     purrr_1.0.1     readr_2.1.4     tidyr_1.3.0
 [9] tibble_3.1.8    ggplot2_3.4.1   tidyverse_1.3.2

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0    haven_2.5.1         gargle_1.3.0
 [4] colorspace_2.1-0    vctrs_0.5.2         generics_0.1.3     
 [7] utf8_1.2.3          rlang_1.0.6         e1071_1.7-13
[10] pillar_1.8.1        glue_1.6.2          withr_2.5.0
[13] DBI_1.1.3           dbplyr_2.3.0        modelr_0.1.10
[16] readxl_1.4.2        lifecycle_1.0.3     munsell_0.5.0
[19] gtable_0.3.1        cellranger_1.1.0    rvest_1.0.3
[22] codetools_0.2-18    tzdb_0.3.0          class_7.3-20
[25] fansi_1.0.4         broom_1.0.3         Rcpp_1.0.10        
[28] KernSmooth_2.23-20  scales_1.2.1        backports_1.4.1
[31] googlesheets4_1.0.1 classInt_0.4-8      jsonlite_1.8.4
[34] fs_1.6.1            hms_1.1.2           stringi_1.7.12
[37] grid_4.2.2          cli_3.6.0           tools_4.2.2
[40] magrittr_2.0.3      proxy_0.4-27        crayon_1.5.2
[43] pkgconfig_2.0.3     ellipsis_0.3.2      xml2_1.3.3
[46] reprex_2.0.2        lubridate_1.9.2     googledrive_2.0.0
[49] timechange_0.2.0    assertthat_0.2.1    httr_1.4.4
[52] R6_2.5.1            units_0.8-1         compiler_4.2.2

@kadyb
Contributor

kadyb commented Apr 5, 2023

@RikFerreira, this is fixed in version 1.0.10, so the update should fix this problem.
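To check whether an installed sf is at least that version, something like the following should work. It is a minimal sketch; the helper name has_logical_write_fix is made up for illustration, and R treats "1.0-10" and "1.0.10" as the same version.

```r
# Hypothetical helper: TRUE if the given sf version is 1.0-10 or later.
# The default argument is only evaluated when no version is supplied.
has_logical_write_fix <- function(v = utils::packageVersion("sf")) {
  package_version(v) >= package_version("1.0-10")
}

has_logical_write_fix("1.0-9")   # the version in the session info above
#> [1] FALSE
has_logical_write_fix("1.0-11")
#> [1] TRUE
```

Calling has_logical_write_fix() with no argument checks the installed sf; if it returns FALSE, install.packages("sf") fetches the current release.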

@RikFerreira

Thank you for your feedback!


8 participants