Memory issue using st_intersection #394

Closed
acanion opened this Issue Jun 20, 2017 · 12 comments

acanion commented Jun 20, 2017

I am running into memory issues using st_intersection on two sf objects. The first one is 19.4 MB (16398 obs. of 4 variables) and the second is 17.1 MB (6869 obs. of 2 variables).

int = st_intersection(test,hsg2)

Error in CPL_geos_op2(op, st_geometry(x), st_geometry(y)) :
Evaluation error: std::bad_alloc.

I hope this is enough information to decide whether this is a bug worth investigating or just a problem on my end. I can provide any info or files needed.

Member

edzer commented Jun 20, 2017

I'd be happy to try on my machine if you make the data available, but it would also help if you gave your sessionInfo() here.

acanion commented Jun 20, 2017

Rdata file:
test_intersection.zip

Session info:

R version 3.4.0 (2017-04-21)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] units_0.4-4 mapview_2.0.1 leaflet_1.1.0 dplyr_0.5.0 sf_0.4-3

loaded via a namespace (and not attached):
[1] rgdal_1.2-7 codetools_0.2-15 digest_0.6.12 htmltools_0.3.6 R6_2.2.0 scales_0.4.1
[7] assertthat_0.2.0 grid_3.4.0 R.utils_2.5.0 munsell_0.4.3 compiler_3.4.0 tibble_1.3.0
[13] gdalUtils_2.0.1.7 httpuv_1.3.3 crosstalk_1.0.0 lattice_0.20-35 viridisLite_0.2.0 mime_0.5
[19] DBI_0.6-1 R.methodsS3_1.7.1 foreach_1.4.3 iterators_1.0.8 shiny_1.0.3 raster_2.5-8
[25] jsonlite_1.4 sp_1.2-4 plyr_1.8.4 base64enc_0.1-3 stats4_3.4.0 magrittr_1.5
[31] png_0.1-7 udunits2_0.13 colorspace_1.3-2 yaml_2.1.14 tools_3.4.0 satellite_0.2.0
[37] webshot_0.4.1 htmlwidgets_0.8 xtable_1.8-2 R.oo_1.21.0 Rcpp_0.12.10

Member

edzer commented Jun 20, 2017

I succeeded on my 16 GB laptop, but the R process ended up needing 11 GB of memory. If you have that much, do install the 64-bit version of R first.
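The session info above reports a 32-bit build (i386). A quick way to confirm which build a session is running is to look at the pointer size; this is a minimal sketch using base R only, and the variable name `bits` is just for illustration:

```r
# .Machine$sizeof.pointer is 4 on a 32-bit build and 8 on a 64-bit build.
bits <- .Machine$sizeof.pointer * 8

# R.version$arch gives the architecture string, e.g. "i386" or "x86_64".
cat("Architecture:", R.version$arch, "\n")
cat("Pointer width:", bits, "bit\n")
```

A 32-bit process cannot address anything close to 11 GB, whatever the amount of RAM installed.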

tcovert commented Jun 20, 2017

I had a similar problem that I solved using st_join(): a million or so points intersected with 10k or so polygons.
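For illustration, a points-in-polygons lookup with st_join() might look roughly like the sketch below. The data are tiny made-up stand-ins (the names `rect`, `polys`, `pts` and the column names are hypothetical), not the million-point case described above:

```r
library(sf)

# Hypothetical helper: axis-aligned rectangle as an sfg polygon.
rect <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                        c(xmin, ymax), c(xmin, ymin))))
}

polys <- st_sf(poly_id = c("p1", "p2"),
               geometry = st_sfc(rect(0, 0, 1, 1), rect(2, 0, 3, 1)))
pts <- st_sf(pt_id = 1:3,
             geometry = st_sfc(st_point(c(0.5, 0.5)),
                               st_point(c(2.5, 0.5)),
                               st_point(c(9, 9))))

# Left join by default: every point is kept, and points that fall in
# no polygon get NA for the polygon attributes.
joined <- st_join(pts, polys)
print(joined)
```

The result carries the polygon attributes over to the points but keeps the point geometries, so it fits cases where no new clipped geometries are needed.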

Note that st_intersection() pre-allocates memory for a full cartesian product of the two sf objects, while some other st_ tools first check for the existence of intersections before allocating memory (st_join(), for example).

Is there any reason why st_intersection() allocates that much RAM to start? I can see why it would make sense when the number of features is small or the user is confident that the resulting intersection will have roughly the same size as the cartesian product, but I would guess that this is an infrequent use case.
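To make the "check first, then compute" idea concrete, here is a rough sketch at the R level. It is not the code inside sf or the fix that was committed; it only illustrates the principle on made-up data, and `rect`, `a`, `b` are hypothetical names:

```r
library(sf)

# Hypothetical helper: axis-aligned rectangle as an sfg polygon.
rect <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                        c(xmin, ymax), c(xmin, ymin))))
}

a <- st_sfc(rect(0, 0, 1, 1), rect(2, 0, 3, 1), rect(4, 0, 5, 1))
b <- st_sfc(rect(0.5, 0.5, 1.5, 1.5), rect(10, 10, 11, 11))

# Step 1: the predicate returns a sparse list, with one integer vector
# of matching indices in b for each feature of a.
pairs <- st_intersects(a, b)
n_pairs <- sum(lengths(pairs))
n_cartesian <- length(a) * length(b)

# Step 2: compute intersection geometries only for the matching pairs.
idx_a <- rep(seq_along(pairs), lengths(pairs))
idx_b <- unlist(pairs)
pieces <- mapply(function(i, j) st_intersection(a[i], b[j]),
                 idx_a, idx_b, SIMPLIFY = FALSE)
result <- do.call(c, pieces)

cat(n_pairs, "intersecting pair(s) out of", n_cartesian, "possible\n")
```

With real layers the number of intersecting pairs is usually a small fraction of the full product, which is where the memory saving comes from.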

edzer added a commit that referenced this issue Jun 20, 2017

Member

edzer commented Jun 20, 2017

@tcovert please give me more golden tips like this. Memory usage is now well under a gigabyte for the example above from @acanion.

tcovert commented Jun 20, 2017

I wish it were easy! I am not a C++ programmer, so I don't exactly know what to look for, but when I saw that intersection was allocating a cartesian product I figured it was possible to do better.

Member

tim-salabim commented Jun 20, 2017

I had actually encountered these issues numerous times, but because of time pressure at work and the data being corporate, I never got around to posting issues. Rest assured, though, that performance improvements are highly appreciated. I hope to find more time and data to report issues that arise during our standard workflow. In any case, sf has made our lives so much easier!

acanion commented Jun 21, 2017

I still need to get the areas of the new polygons from the intersection. My solution was to use aggregate.sf to create multipolygon features for 28 categories in the larger object, then run st_intersection.
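As a rough illustration of the area step, the sketch below intersects two small made-up layers and sums the areas of the resulting pieces per category. The dissolve step with aggregate is left out, and all names (`rect`, `a`, `b`, `cat`, `zone`) are hypothetical:

```r
library(sf)

# Hypothetical helper: axis-aligned rectangle as an sfg polygon.
rect <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                        c(xmin, ymax), c(xmin, ymin))))
}

# agr = "constant" declares the attributes valid for any sub-area,
# which avoids the "assumed to be spatially constant" warning.
a <- st_sf(cat = c("A", "A", "B"),
           geometry = st_sfc(rect(0, 0, 1, 1), rect(2, 0, 3, 1),
                             rect(4, 0, 5, 1)),
           agr = "constant")
b <- st_sf(zone = "z1",
           geometry = st_sfc(rect(0.5, 0.5, 4.5, 1.5)),
           agr = "constant")

# New polygons: the parts of a that fall inside b.
inter <- st_intersection(a, b)

# Area of each new polygon (no CRS here, so plain coordinate units).
inter$area <- as.numeric(st_area(inter))

# Total intersected area per category.
area_by_cat <- tapply(inter$area, as.character(inter$cat), sum)
print(area_by_cat)
```

With a projected CRS set on the layers, st_area() would return values with units attached rather than bare numbers.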

edzer added a commit that referenced this issue Jun 22, 2017

Member

edzer commented Jun 22, 2017

Also, a factor 20 speedup (in this case) when using a spatial index, see here.

tcovert commented Jun 23, 2017

Cool - can't wait for it to be available on CRAN! For some reason the current devtools version of sf gives me the error I described previously:

Assertion failed: (0), function query, file AbstractSTRtree.cpp, line 287.
Abort trap: 6

Member

edzer commented Jun 23, 2017

Was that on the example above,

 int = st_intersection(test,hsg2)

? And by devtools, do you mean installing from GitHub?

tcovert commented Jun 23, 2017

Yeah - that is what I meant. I ended up solving the problem by getting rid of duplicate geometry libraries from an old QGIS install. Now I think I am working with a pure Homebrew setup, and devtools worked. By the way, spatial indices are awesome! Thank you for implementing them.
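For anyone chasing a similar library mix-up, one thing that may help is checking which external library versions sf reports being linked against. This is only a sketch, assuming a current sf release that provides sf_extSoftVersion(); it reports versions and does not by itself detect duplicate installs:

```r
library(sf)

# Named character vector with the versions of the external libraries
# (GEOS, GDAL, PROJ) that this build of sf is linked against.
versions <- sf_extSoftVersion()
print(versions[c("GEOS", "GDAL")])
```

If the reported versions differ from what the package manager says is installed, an older copy of the libraries is probably being picked up.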
