Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

st_sample for very small, scattered multipolygons #1480

Closed
edzer opened this issue Sep 3, 2020 · 2 comments
Closed

st_sample for very small, scattered multipolygons #1480

edzer opened this issue Sep 3, 2020 · 2 comments

Comments

@edzer
Copy link
Member

edzer commented Sep 3, 2020

This issue was brought up to me by email. The naive algorithm used in st_sample to randomly sample n observatoins from a (multi-)polygon was to

  • compute the area of the polygon A
  • compute the area of its bounding box B
  • sample n * B / A points from the bounding box
  • select those within the polygon
  • if exact = TRUE continue sampling until we have n

This algorithm fails if B/A is very large, e.g. when sampling a very rare land use class over a large region (out of memory errors). The solution I worked on is this:

  • compute the areas of sub-polygons
  • distribute n proportional to these sub-areas
  • sample each sub-polygon

This worked somewhat, but wouldn't converge because many n values would be rounded to 0. The current implementation (which is not satisfactory) will, for polygons with rounded n of zero, use rbinom to draw a sample with probability n. See:

bbb7337
4a910da
bb81399

I think the better solution would be to

  • compute the areas of the sub-polygons
  • divide them by the total
  • use that as a cutoff values for a uniform distributions
  • draw n U[0,1] values
  • classify them according to this distribution, and then draw the target n' values from each subpolygon.

This still needs to be done.

@jfberner
Copy link

I think I might have come up with a workaround on this. I had a set of 520 occurrence records, and needed to resample each of them inside a buffer around the original point, to see if coordinate precision will influence my model or not.

For that, I got the original dataset as a dataframe (directly imported from a csv and cleaned with cleancoordinates() )

> head(occd.cl)
  species       lat       lon uncert
1   alces  0.901015 -52.00409   2895
2   alces  3.802764 -61.74121   4181
3   alces  3.789833 -54.31493  65117
4   alces  5.664470 -55.17876  21503
5 armiger -0.269760 -79.10875   9725
6 batesii -9.522227 -66.92051  24918

turned it into a spatial points data frame and assigned it WGS84 projection:

coordinates(occd.cl) <- ~lon + lat
proj4string(occd.cl) <- projection(raster())

generated the buffer around the points and turned them into polygons using the sp package:

radius <- 0.5 #1=100km
occ.buffer <- gBuffer(spgeom = occd.cl, byid = T, 
                      width = radius, quadsegs = 100, 
                      capStyle = 'ROUND' , joinStyle = 'ROUND')

and then converted into sfc:
occb.sf <- as(occ.buffer, "sfc")

Finally, I created a copy of the original points (class=SpatialPointDataFrame) to receive the data from st_sample, and used for to generate a single point for each polygon:

occ.rd.dt.1 <- as(occd.cl,"sfc")  

for(i in 1:length(occ.rd.dt.1)){
 occ.rd.dt.1[i] <- st_sample(x =occb.sf[i,], size = 1, type = "random", exact =T, by_polygon = T)
}

To my understanding, this has generated a random point inside the polygons generated around the original points. This is what I needed to do, but I think the iteration at the end of the code can work with the description of this topic. I'm sorry about all the detail, but I think it's always necessary.

I'll completely understand if I missed the point and this has nothing to do with it, and this comment has to be removed.

@edzer
Copy link
Member Author

edzer commented Jun 15, 2021

Thanks! I think your sample is a stratified random sample, and your algorithm works because your polygons (buffers) are of equal size. The issue I tried to address is to randomly sample over a set of very small polygons (or a multi-polygon with sub-polygons) of unequal size.

@edzer edzer closed this as completed Mar 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants