using sf::st_as_text in data.frame is very slow #947

harryprince · 2019-01-11T12:09:29Z

I am currently working with district building polygons.
I want to read a .geojson file and convert it to .tsv for Hive.

library(dplyr)
library(data.table)
library(sf)

building <- sf::st_read("~/buildings_wgs84_2016.geojson")

building  %>% 
  data.table::as.data.table() -> d

This step takes almost 30 minutes with 1M rows of data:

d[,geometry:=sf::st_as_text(geometry),]

data.table is usually faster than dplyr in this scenario.

d %>% readr::write_tsv("tmp.tsv")
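
If the end goal is only a tab-separated file with a WKT column, one option that might sidestep the R-level `st_as_text` call is to let GDAL write the text itself. The following is a minimal sketch, not benchmarked here. It assumes the installed GDAL build provides the CSV driver with its `GEOMETRY=AS_WKT` and `SEPARATOR=TAB` layer creation options. The `nc` sample data shipped with sf stands in for the private buildings file, and the output path is a placeholder.

```r
library(sf)

# Sample polygons shipped with sf, standing in for the buildings file (assumption)
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Placeholder output path; the .csv extension makes GDAL create a single file
out <- file.path(tempdir(), "nc_wkt.csv")
if (file.exists(out)) unlink(out)

# Let GDAL's CSV driver serialise the geometry as a WKT column,
# using tabs instead of commas as the field separator
sf::st_write(
  nc, out,
  driver = "CSV",
  layer_options = c("GEOMETRY=AS_WKT", "SEPARATOR=TAB"),
  quiet = TRUE
)

# Inspect the header line of the written file
header <- readLines(out, n = 1)
```

The file is tab-separated despite its `.csv` name, so it can be renamed afterwards if a `.tsv` extension is needed. Whether this is actually faster on 1M rows would need to be measured.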


Robinlovelace · 2019-01-11T13:12:12Z

Do you have a reproducible example? That could help benchmarking, testing and, in the context of this issue, knowing if it is 'solved'.

edzer · 2019-01-11T14:01:01Z

See also #800

harryprince · 2019-01-11T22:09:44Z

@Robinlovelace
I can't share this .geojson because it is private company data. I have tried many .geojson files downloaded from OSM: once the table has more than 1,000,000 rows, the conversion becomes very slow and takes about 30 minutes. Apart from the .geojson file, the rest of the code is fully reproducible.

Robinlovelace · 2019-01-12T08:35:48Z

Reproducible example with a smaller dataset (a 13 MB geojson file):

suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(data.table))
suppressPackageStartupMessages(library(sf))
f2 = "promenade-all.geojson"
u = "https://github.com/spnethack/spnethack/releases/download/0.1/promenade-all.geojson"
download.file(u, destfile = f2)

system.time({b = read_sf(f2)})
#>    user  system elapsed 
#>   3.474   0.102   3.580
system.time({d = b  %>% data.table::as.data.table()})
#>    user  system elapsed 
#>   0.027   0.000   0.027
system.time(d[,geometry:=sf::st_as_text(geometry),])
#>    user  system elapsed 
#>   6.422   0.004   6.427
system.time(d %>% readr::write_tsv("tmp.tsv"))
#>    user  system elapsed 
#>   1.038   0.040   1.077

Created on 2019-01-12 by the reprex package (v0.2.1)

Robinlovelace · 2019-01-12T08:38:34Z

Note, there's a bit on benchmarking here: https://geocompr.github.io/geocompkg/articles/benchmark.html

Hoping that issues/conversations like this will motivate us to add more reproducible benchmarks to that document.
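
A benchmark that does not depend on any downloaded or private file could look like the sketch below. It times `sf::st_as_text` on synthetic unit squares at increasing sizes, which shows how the conversion time scales with the number of rows. The helper `make_squares` and the chosen sizes are illustrative assumptions, not part of sf, and real building footprints with more vertices will take longer per feature.

```r
library(sf)

# Hypothetical helper: build n unit-square polygons as an sfc geometry column
make_squares <- function(n) {
  ring <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))
  sf::st_sfc(lapply(seq_len(n), function(i) sf::st_polygon(list(ring + i))))
}

# Time the WKT conversion at a few sizes (kept small so the sketch runs quickly)
sizes <- c(1e2, 1e3, 1e4)
timings <- vapply(sizes, function(n) {
  g <- make_squares(n)
  system.time(sf::st_as_text(g))[["elapsed"]]
}, numeric(1))

# Rows versus elapsed seconds
data.frame(rows = sizes, elapsed_s = timings)
```

If elapsed time grows faster than the row count, that would point at the conversion itself rather than at reading or writing the file.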

harryprince · 2019-01-13T12:20:48Z

Thanks for your suggestion. My file is around 400 MB, which is much larger than the 13 MB file in your link. When I take only the first 10,000 records it works fine too, but it breaks down at larger scale. My guess is that sf::st_as_text runs in a single thread, so the problem does not show up when the data is small.
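
One way to probe the single-thread guess is to convert the geometry column in chunks and spread the chunks over several processes. This is a rough sketch only: `as_text_chunked`, `chunk_size` and `cores` are hypothetical names introduced here, and it assumes `parallel::mclapply`, which forks and therefore only runs with more than one core on Unix-like systems. It does not establish that threading is the actual bottleneck.

```r
library(sf)
library(parallel)

# Hypothetical helper: convert an sfc column to WKT chunk by chunk.
# cores > 1 relies on forking, so keep cores = 1 on Windows.
as_text_chunked <- function(geom, chunk_size = 10000L, cores = 1L) {
  # Consecutive index blocks; split() keeps the blocks in numeric order
  idx <- split(seq_along(geom), ceiling(seq_along(geom) / chunk_size))
  parts <- parallel::mclapply(
    idx,
    function(i) sf::st_as_text(geom[i]),
    mc.cores = cores
  )
  unlist(parts, use.names = FALSE)
}

# Small synthetic check that the chunked result matches the direct call
ring <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))
g <- sf::st_sfc(lapply(1:10, function(i) sf::st_polygon(list(ring + i))))
wkt <- as_text_chunked(g, chunk_size = 3L)
```

With a data.table as in the original post, this could be used as `d[, geometry := as_text_chunked(geometry, cores = 4L)]`. Comparing timings for `cores = 1L` and a higher value would indicate whether parallelism helps.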


etiennebr added a commit to etiennebr/sf that referenced this issue Jan 19, 2019

accelerate printing

03f2aa6

close r-spatial#800, r-spatial#947, r-spatial#703, r-spatial#747

etiennebr added tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day and removed tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day labels Jan 19, 2019

etiennebr mentioned this issue Jan 19, 2019

accelerate printing #957

Closed

edzer closed this as completed Mar 15, 2019
