Memory issue using st_intersection #394

Closed
acanion opened this issue Jun 20, 2017 · 13 comments

@acanion

acanion commented Jun 20, 2017

I am running into memory issues using st_intersection on two sf objects. The first one is 19.4 MB (16398 obs. of 4 variables) and the second is 17.1 MB (6869 obs. of 2 variables).

int = st_intersection(test,hsg2)

Error in CPL_geos_op2(op, st_geometry(x), st_geometry(y)) :
Evaluation error: std::bad_alloc.

I hope this is enough information to decide whether this is a bug worth investigating or just a problem on my end. I can provide any info or files needed.

@edzer
Member

edzer commented Jun 20, 2017

I'd be happy to try on my machine if you make the data available, but it would also help if you gave your sessionInfo() here.

@acanion
Author

acanion commented Jun 20, 2017

Rdata file:
test_intersection.zip

Session info:

R version 3.4.0 (2017-04-21)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] units_0.4-4 mapview_2.0.1 leaflet_1.1.0 dplyr_0.5.0 sf_0.4-3

loaded via a namespace (and not attached):
[1] rgdal_1.2-7 codetools_0.2-15 digest_0.6.12 htmltools_0.3.6 R6_2.2.0 scales_0.4.1
[7] assertthat_0.2.0 grid_3.4.0 R.utils_2.5.0 munsell_0.4.3 compiler_3.4.0 tibble_1.3.0
[13] gdalUtils_2.0.1.7 httpuv_1.3.3 crosstalk_1.0.0 lattice_0.20-35 viridisLite_0.2.0 mime_0.5
[19] DBI_0.6-1 R.methodsS3_1.7.1 foreach_1.4.3 iterators_1.0.8 shiny_1.0.3 raster_2.5-8
[25] jsonlite_1.4 sp_1.2-4 plyr_1.8.4 base64enc_0.1-3 stats4_3.4.0 magrittr_1.5
[31] png_0.1-7 udunits2_0.13 colorspace_1.3-2 yaml_2.1.14 tools_3.4.0 satellite_0.2.0
[37] webshot_0.4.1 htmlwidgets_0.8 xtable_1.8-2 R.oo_1.21.0 Rcpp_0.12.10

@edzer
Member

edzer commented Jun 20, 2017

I succeeded on my 16 GB laptop, but the R process needed 11 GB of memory by the end. If you have that much, install the 64-bit version of R first; your session info shows a 32-bit build.
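
The session info earlier in the thread reports an `i386` (32-bit) platform, and a 32-bit process cannot address anything close to 11 GB. A minimal base-R sketch for checking which build is running (the variable names `pointer_bytes` and `r_bits` are illustrative only):

```r
# Report whether the running R process is a 32-bit or 64-bit build.
# .Machine$sizeof.pointer is 4 on 32-bit builds and 8 on 64-bit builds.
pointer_bytes <- .Machine$sizeof.pointer
r_bits <- pointer_bytes * 8L

cat("R architecture:", R.version$arch, "\n")
cat("Pointer size:  ", r_bits, "bit\n")

if (r_bits < 64L) {
  message("32-bit R detected: switch to the 64-bit build before retrying.")
}
```

On Windows, both builds can be installed side by side, so it is worth confirming that the session actually in use is the 64-bit one.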

@tcovert

tcovert commented Jun 20, 2017

I had a similar problem that I solved using st_join(): a million or so points intersected with 10k or so polygons.

Note that st_intersection() pre-allocates memory for a full Cartesian product of the two sf objects, while some other st_ tools (st_join(), for example) first check which features actually intersect before allocating memory.

Is there any reason why st_intersection() allocates that much RAM up front? I can see why it would make sense when the number of features is small, or when the user is confident that the resulting intersection will be roughly the size of the Cartesian product, but I would guess that this is an infrequent use case.
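
For the points-in-polygons case described above, a minimal sketch of the `st_join()` approach might look like the following. The toy data and the helper `sq()` are hypothetical and not taken from the thread; only `st_join()`, `st_intersects()` and the sf constructors are real sf functions.

```r
library(sf)

# Hypothetical helper: an axis-aligned square with lower-left corner (x0, y0).
sq <- function(x0, y0, size = 1) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
    c(x0, y0 + size), c(x0, y0)
  )))
}

# Toy data (assumption): two polygons and three points, one point outside both.
polys <- st_sf(zone = c("a", "b"), geometry = st_sfc(sq(0, 0), sq(2, 0)))
pts <- st_sf(
  id = 1:3,
  geometry = st_sfc(
    st_point(c(0.5, 0.5)), st_point(c(2.5, 0.5)), st_point(c(5, 5))
  )
)

# Attach the attributes of the polygon each point falls in.
# left = FALSE drops points that intersect no polygon (an inner join).
joined <- st_join(pts, polys, join = st_intersects, left = FALSE)
print(joined)
```

This returns the point geometries with polygon attributes attached, so it fits when no new clipped geometry is needed; it does not replace `st_intersection()` when the overlap shapes themselves are required.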

@edzer
Member

edzer commented Jun 20, 2017

@tcovert please give me more of such golden tips. Memory usage is now way under a gigabyte for the example above from @acanion.

@tcovert

tcovert commented Jun 20, 2017

I wish it were easy! I am not a C++ programmer, so I don't exactly know what to look for, but when I saw that intersection was allocating a cartesian product I figured it was possible to do better.

@tim-salabim
Member

I had actually encountered these issues numerous times, but because of time pressure at work and corporate data I didn't get around to posting issues. Rest assured that performance improvements are highly appreciated, and I hope to find more time and data to report issues that arise during our standard workflow. In any case, sf has made our lives so much easier!

@acanion
Author

acanion commented Jun 21, 2017

I still need to get the areas of the new polygons from the intersection. My solution was to use aggregate.sf to create multipolygon features for 28 categories in the larger object, then run st_intersection.
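
A rough sketch of that dissolve-then-intersect workflow on toy data follows. To keep it short it dissolves with `split()` plus `st_union()` rather than `aggregate()`, so it is a stand-in for the approach described above, not a copy of it. The layers `parcels` and `zones`, the column names, and the helper `sq()` are all hypothetical.

```r
library(sf)

# Hypothetical helper: an axis-aligned square with lower-left corner (x0, y0).
sq <- function(x0, y0, size) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
    c(x0, y0 + size), c(x0, y0)
  )))
}

# Toy stand-ins (assumption) for the two layers in the thread.
parcels <- st_sf(
  category = c("x", "x", "y"),
  geometry = st_sfc(sq(0, 0, 1), sq(1, 0, 1), sq(0, 1, 2))
)
zones <- st_sf(zone = "z1", geometry = st_sfc(sq(0.5, 0.5, 1)))

# Dissolve the larger layer to one geometry per category, so the
# intersection step sees far fewer features.
groups <- split(st_geometry(parcels), parcels$category)
dissolved <- st_sf(
  category = names(groups),
  geometry = do.call(c, unname(lapply(groups, st_union)))
)

# Intersect, then measure each resulting piece.
pieces <- st_intersection(dissolved, zones)
pieces$area <- st_area(pieces)
print(pieces)
```

The toy layers have no CRS, so `st_area()` returns plain numbers here; with a CRS set, it returns values carrying units.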

@edzer
Member

edzer commented Jun 22, 2017

Also, using a spatial index gives a 20-fold speedup in this case; see here.
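
To illustrate why pruning helps (this is a sketch of the general idea, not of sf's internal implementation), the sparse result of `st_intersects()` can be compared with the size of the full Cartesian product. The grid and query geometry are made up for the example.

```r
library(sf)

# Toy data (assumption): a 10 x 10 grid of unit squares over [0, 10] x [0, 10].
bb <- st_bbox(c(xmin = 0, ymin = 0, xmax = 10, ymax = 10))
grid <- st_make_grid(st_as_sfc(bb), n = c(10, 10))

# One small query square that sits strictly inside a single grid cell.
query <- st_sfc(st_polygon(list(rbind(
  c(2.25, 2.25), c(2.75, 2.25), c(2.75, 2.75), c(2.25, 2.75), c(2.25, 2.25)
))))

# st_intersects() returns a sparse list: for each query feature, the
# indices of the grid cells it intersects.
hits <- st_intersects(query, grid)

candidate_pairs <- sum(lengths(hits))       # pairs that actually intersect
all_pairs <- length(query) * length(grid)   # full Cartesian product

cat("intersecting pairs:", candidate_pairs, "of", all_pairs, "possible\n")
```

Only the intersecting pairs need the expensive geometric overlay, which is why the gap between the two counts matters for both time and memory on large layers.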

@tcovert

tcovert commented Jun 23, 2017

Cool - can't wait for it to be available on CRAN! For some reason the current devtools version of sf gives me the error I described previously:

Assertion failed: (0), function query, file AbstractSTRtree.cpp, line 287.
Abort trap: 6

@edzer
Member

edzer commented Jun 23, 2017

Was that on the example above,

 int = st_intersection(test,hsg2)

? And by devtools, you mean: install from github?

@tcovert

tcovert commented Jun 23, 2017

Yeah, that is what I meant. I ended up solving the problem by getting rid of duplicate geometry libraries from an old QGIS install. Now I think I am working with a pure homebrew setup, and the devtools install worked. By the way, spatial indices are awesome! Thank you for implementing them.

@jebyrnes

jebyrnes commented Feb 6, 2019

Oddly, I had this pop up today as well with st_union, while running on a cluster with 32 GB of memory allocated:

Error in CPL_geos_union(st_geometry(x), by_feature) :
Evaluation error: std::bad_alloc.

But it's a sporadic error: I reran the same line of code on the same object, and it was fine. Any thoughts on what one might do to have a system poised to deal with this?
