The goal of controlledburn is to rasterize without materializing any pixel values.
A very fast rasterization algorithm for polygons in fasterize does the following:
- restructures polygons to edges
- considers only non-horizontal polygon edges and indexes them in y,x order (slope always down)
- determines all affected raster rows and scans these for edge start and end column
- burns polygon value or identity into every pixel between those starts and ends
controlledburn avoids that last step. With these rasterization functions the return value is not a set of pixels or a side-effect of a materialized raster file but simply those row and column start and end indexes.
This is an expression of the cell abstraction wish item fasterize/issues/11.
- move to cpp11
- rasterize lines fasterize/issues/30
- formats for import (wk, geos, grd, rct, triangles etc.)
- streaming with wkb/xy unpack with wk
- provide output options (see next section)
- port back into fasterize, with options for efficiently writing out to a tiled and sparse GeoTIFF
- points is too easy, see vaster::cell_from_xy
- name the package
- copy logic from fasterize, and remove Armadillo array handling
- remove use of raster objects, in favour of input extent and dimension
- remove all trace of the raster package
- implement return of the ‘yline, xpix’ and polygon ID info to user (see below)
- make return of ylin,xpix structure efficient (CollectorList.h ftw)
Lines is working but still has some problems. For polygons it’s pretty good, see fasterize #6 for a remaining issue.
Internal function burn_polygon()
has arguments sf
, extent
,
dimension
. The first is a sf polygons data frame, extent is
c(xmin, xmax, ymin, ymax)
and dimension is c(ncol, nrow)
. In
fasterize you need an actual non-materialized RasterLayer object, but
all that was really used for was the six numbers extent, dimension and
as the shell for the in-memory output.
The output of burn_polygon()
is a list of four-element indexes
start,end,row,poly_id
- these are zero-based atm because they reflect
the underlying C++ code. Examples shown here flatten this to a 4-column
matrix (and add 1).
These options are still in play for what the interface/s could do:
- record presence of polygon (this is all we have currently) OR ID of polygon
- tools to format this meaningfully, and plot lazily (see example for quick plot)
- tools to materialize as actual raster data
I also found this real world example for which this is relevant, discussed in PROJ for very fast lookup for large non-materialized (highly compressed) grids by Thomas Knudsen:
You can install the development version of controlledburn like so:
remotes::install_github("hypertidy/controlledburn")
This is a basic example, this is fast, and shows that it works.
pols <- silicate::inlandwaters
library(vaster)
#>
#> Attaching package: 'vaster'
#> The following object is masked from 'package:stats':
#>
#> ts
## define a raster (xmin, xmax, ymin, ymax), (ncol, nrow)
ext <- unlist(lapply(silicate::sc_vertex(pols), range))
dm <- c(50, 40)
r <- controlledburn:::burn_polygon(pols, extent = ext,
dimension = dm)
## our index is triplets of start,end,line where the polygon edge was detected -
## this essentially an rle by scanline of start,end polygon coverage
index <- matrix(unlist(r, use.names = F), ncol = 4L, byrow = TRUE) + 1 ## plus one because 0-index internally
## plot just the start and ends of each scanline detected
xy0 <- vaster::xy_from_cell(dm, ext, vaster::cell_from_row_col(dm, index[,c(3, 3)], index[,1:2]))
plot(xy0)
plot(silicate::PATH0(pols), add = TRUE)
## expand out to every cell
cr <- do.call(rbind, apply(index, 1,\(.x) cbind(seq(.x[1], .x[2]), .x[3], .x[4])))
xy <- vaster::xy_from_cell(dm, ext, vaster::cell_from_row_col(dm, cr[,2], cr[,1]))
plot(xy, pch = 19, cex = .3, col = palr::d_pal(cr[,3]))
plot(silicate::PATH0(pols), add = TRUE)
It scales to very large tasks, with small output.
dm <- c(500000, 400000)
system.time(r <- controlledburn:::burn_polygon(pols, extent = ext,
dimension = dm))
#> user system elapsed
#> 0.913 0.068 0.982
length(r)
#> [1] 989153
## consider a prod(dm) raster of type double (or even bool) obviously would compress again but why churn
pryr::object_size(r)
#> 71.22 MB
The following is inefficient, but shows that we get the right result.
dm <- c(500, 400)
system.time(r <- controlledburn:::burn_polygon(pols, extent = ext,
dimension = dm))
#> user system elapsed
#> 0.002 0.001 0.003
index <- matrix(unlist(r, use.names = F), ncol = 4L, byrow = TRUE) + 1 ## plus one because 0-index internally
## now go inefficient, this is every column,row index, then converted to cell, converted to xy
cr <- do.call(rbind, apply(index, 1, \(.x) cbind(seq(.x[1], .x[2]), .x[3], .x[4])))
xy <- vaster::xy_from_cell(dm, ext, vaster::cell_from_row_col(dm, cr[,2], cr[,1]))
plot(xy, pch = ".", col = palr::d_pal(cr[,3]))
rr <- terra::rast(cbind(xy, 0), type = "xyz")
rr[terra::cellFromXY(rr, xy)] <- 1
terra::plot(rr ,col = "firebrick", asp = NA)
plot(silicate::SC0(pols), add = TRUE)
Please note that the controlledburn project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.