`st_sample` for very small, scattered multipolygons #1480

edzer · 2020-09-03T14:52:29Z

This issue was brought up to me by email. The naive algorithm used in st_sample to randomly sample n observatoins from a (multi-)polygon was to

compute the area of the polygon A
compute the area of its bounding box B
sample n * B / A points from the bounding box
select those within the polygon
if exact = TRUE continue sampling until we have n

This algorithm fails if B/A is very large, e.g. when sampling a very rare land use class over a large region (out of memory errors). The solution I worked on is this:

compute the areas of sub-polygons
distribute n proportional to these sub-areas
sample each sub-polygon

This worked somewhat, but wouldn't converge because many n values would be rounded to 0. The current implementation (which is not satisfactory) will, for polygons with rounded n of zero, use rbinom to draw a sample with probability n. See:

bbb7337
4a910da
bb81399

I think the better solution would be to

compute the areas of the sub-polygons
divide them by the total
use that as a cutoff values for a uniform distributions
draw n U[0,1] values
classify them according to this distribution, and then draw the target n' values from each subpolygon.

This still needs to be done.

The text was updated successfully, but these errors were encountered:

jfberner · 2021-06-15T20:31:08Z

I think I might have come up with a workaround on this. I had a set of 520 occurrence records, and needed to resample each of them inside a buffer around the original point, to see if coordinate precision will influence my model or not.

For that, I got the original dataset as a dataframe (directly imported from a csv and cleaned with cleancoordinates() )

> head(occd.cl)
  species       lat       lon uncert
1   alces  0.901015 -52.00409   2895
2   alces  3.802764 -61.74121   4181
3   alces  3.789833 -54.31493  65117
4   alces  5.664470 -55.17876  21503
5 armiger -0.269760 -79.10875   9725
6 batesii -9.522227 -66.92051  24918

turned it into a spatial points data frame and assigned it WGS84 projection:

coordinates(occd.cl) <- ~lon + lat
proj4string(occd.cl) <- projection(raster())

generated the buffer around the points and turned them into polygons using the sp package:

radius <- 0.5 #1=100km
occ.buffer <- gBuffer(spgeom = occd.cl, byid = T, 
                      width = radius, quadsegs = 100, 
                      capStyle = 'ROUND' , joinStyle = 'ROUND')

and then converted into sfc:
occb.sf <- as(occ.buffer, "sfc")

Finally, I created a copy of the original points (class=SpatialPointDataFrame) to receive the data from st_sample, and used for to generate a single point for each polygon:

occ.rd.dt.1 <- as(occd.cl,"sfc")  

for(i in 1:length(occ.rd.dt.1)){
 occ.rd.dt.1[i] <- st_sample(x =occb.sf[i,], size = 1, type = "random", exact =T, by_polygon = T)
}

To my understanding, this has generated a random point inside the polygons generated around the original points. This is what I needed to do, but I think the iteration at the end of the code can work with the description of this topic. I'm sorry about all the detail, but I think it's always necessary.

I'll completely understand if I missed the point and this has nothing to do with it, and this comment has to be removed.

edzer · 2021-06-15T21:12:09Z

Thanks! I think your sample is a stratified random sample, and your algorithm works because your polygons (buffers) are of equal size. The issue I tried to address is to randomly sample over a set of very small polygons (or a multi-polygon with sub-polygons) of unequal size.

edzer added a commit that referenced this issue Sep 3, 2020

mention #1480; turn off using s2 by default

042701c

edzer closed this as completed Mar 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`st_sample` for very small, scattered multipolygons #1480

`st_sample` for very small, scattered multipolygons #1480

edzer commented Sep 3, 2020

jfberner commented Jun 15, 2021

edzer commented Jun 15, 2021

st_sample for very small, scattered multipolygons #1480

st_sample for very small, scattered multipolygons #1480

Comments

edzer commented Sep 3, 2020

jfberner commented Jun 15, 2021

edzer commented Jun 15, 2021

`st_sample` for very small, scattered multipolygons #1480

`st_sample` for very small, scattered multipolygons #1480