# Language in Space

## Session 07: Vector data operations

### Gerhard Jäger

December 9, 2021


In [None]:
options(repr.plot.width=12, repr.plot.height=9)


In [None]:
library(tidyverse)
library(sf)
library(spData)


## Spatial data operations

When working with spatial data sets, we want to use the spatial information for various data processing tasks. 

### Examples

- Join a tibble with polygon geometries to a tibble with point geometries such that each polygon is joined with each point within its area.

- When summarizing observations with polygon geometries, assign the union of the geometries to the aggregated observations.

- Find all observations that are within 100 km distance of a given point.

- Form the intersection of polygons (e.g., to crop a map).

- Change the shape and location of geometries, e.g. shifting, scaling, mirroring or rotating them.



### Topological relations

Topological relations are relations between geometric objects that remain constant under continuous transformations like shifting, scaling, rotating or deforming.


**Example**

In [None]:
# create a polygon
a_poly = st_polygon(list(rbind(c(-1, -1), c(1, -1), c(1, 1), c(-1, -1))))
a = st_sfc(a_poly)

# create a second polygon

b_poly = st_polygon(list(rbind(c(0.1, -1), c(0.1, 0), c(.9, 0), c(.9, -1), c(.1, -1))))
b = st_sfc(b_poly)

# create a line
l_line = st_linestring(x = matrix(c(-1, -1, -0.5, 1), ncol = 2))
l = st_sfc(l_line)
# create points
p_matrix = matrix(c(0.5, 1, -1, 0, 0, 1, 0.5, 1), ncol = 2)
p_multi = st_multipoint(x = p_matrix)
p = st_cast(st_sfc(p_multi), "POINT")

exampleGeometries <- st_sf(
    names = c("a", "b", "l", "p1", "p2", "p3", "p4"),
    type = c("polygon", "polygon", "line", "point", "point", "point", "point"),
    c(a, b, l, p)
)
exampleGeometries

In [None]:
exampleGeometries %>%
    ggplot() +
    theme_bw() +
    geom_sf(aes(col=names), alpha=0.5, size=4) +
    geom_sf_label(aes(label=names), size=5, nudge_x=.1)  


Which points intersect with the large polygon?

In [None]:
st_intersects(p, a, sparse=F)

Note that the result for `p2` is `TRUE`, even though the point is at the boundary of the polygon.

The two polygons of course also intersect.

In [None]:
st_intersects(b, a, sparse=F)

The opposite of `st_intersects` is `st_disjoint`.

In [None]:
st_disjoint(p, a, sparse = F)


`st_within` returns `TRUE` only in case of complete inclusion. A point that merely lies on the boundary of a polygon is not within it.

In [None]:
st_within(p, a, sparse=F)

In [None]:
st_within(p, b, sparse=F)

`st_touches` is `TRUE` if the two objects share at least one boundary point but their interiors do not intersect, e.g., if a point lies on the border of a polygon.

In [None]:
st_touches(p, a, sparse=F)

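`st_is_within_distance` returns `TRUE` if two objects are at most `dist` apart. (NB: Because it depends on distances, this is not a topological relation.)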

In [None]:
st_is_within_distance(p, a, dist = 0.9, sparse=F)
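
The invariance of topological relations can be checked directly. The following is a minimal sketch that rebuilds the triangle and the points from the example, scales both by an arbitrarily chosen factor of 3, and compares the predicates before and after. The names `a_big` and `p_big` are introduced here for illustration only.

```r
library(sf)

# Triangle and points as in the example above
a <- st_sfc(st_polygon(list(rbind(c(-1, -1), c(1, -1), c(1, 1), c(-1, -1)))))
p <- st_cast(st_sfc(st_multipoint(
    matrix(c(0.5, 1, -1, 0, 0, 1, 0.5, 1), ncol = 2)
)), "POINT")

# Scale both objects by the factor 3 (a continuous transformation)
a_big <- a * 3
p_big <- p * 3

# Topological relation: unchanged by the scaling
intersects_before <- st_intersects(p, a, sparse = FALSE)
intersects_after <- st_intersects(p_big, a_big, sparse = FALSE)
identical(intersects_before, intersects_after)

# Distance-based relation: changes, because all distances were tripled
near_before <- st_is_within_distance(p, a, dist = 0.9, sparse = FALSE)
near_after <- st_is_within_distance(p_big, a_big, dist = 0.9, sparse = FALSE)
identical(near_before, near_after)
```

The first comparison should yield `TRUE` and the second `FALSE`: `p4` is within 0.9 units of the triangle before scaling, but not afterwards.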

## Geometric operations on vector data

Geometric operations create geometric objects out of geometric objects. They include

- simplification
- finding centroids of areas
- creating buffers around geometries
- affine transformations (shifting, scaling, rotating)
- clipping
- geometric union
- Voronoi tessellation

### Geometric simplification

Linestrings and polygons may consist of many segments, not all of which are needed for a specific purpose.

Consider this representation of the river Seine with tributaries from `spData`:

In [None]:
seine

In [None]:
seine %>%
    st_geometry() %>%
    st_cast("POINT")

In [None]:
seine %>%
    ggplot() +
    geom_sf()

With `st_simplify` we can remove points while preserving the overall shape. The `dTolerance` argument determines how strongly the shape is simplified: the larger the value, the more points are removed.

In [None]:
dt = 2000 # tolerance in metres
seine %>%
    st_simplify(dTolerance=dt) %>%
    st_geometry() %>%
    st_cast("POINT") %>%
    length()

In [None]:
seine %>%
    st_simplify(dTolerance=dt) %>%
    ggplot() +
    geom_sf()
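
The effect of `dTolerance` is easiest to see on a small artificial line. The sketch below uses a made-up zigzag linestring without a CRS (so the tolerance is in plain coordinate units) and a helper `n_vertices`, which is not part of `sf` and is defined here only for illustration.

```r
library(sf)

# Hypothetical zigzag line: 11 vertices that deviate by at most 0.1 from the x-axis
zigzag <- st_sfc(st_linestring(cbind(0:10, rep(c(0, 0.1), length.out = 11))))

# Helper (not part of sf): number of vertices of a geometry
n_vertices <- function(g) nrow(st_coordinates(g))

# Simplify with a tolerance larger than the zigzag amplitude
zigzag_simple <- st_simplify(zigzag, dTolerance = 0.5)

n_vertices(zigzag)         # all 11 vertices
n_vertices(zigzag_simple)  # only the two end points remain
```

Since every vertex is closer than 0.5 units to the straight line between the end points, all intermediate vertices are dropped.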

### Simplification of polygons

`st_simplify` also applies to polygons. There is a catch though.



In [None]:
us_states_wu <- us_states %>%
    mutate(AREA = as.numeric(AREA))

In [None]:
us_states_wu %>%
    ggplot() +
    geom_sf()

In [None]:

us_states_wu %>%
    st_simplify(dTolerance=100000) %>%
    ggplot() +
    geom_sf()

`st_simplify` simplifies each geometry individually, without regard to shared borders. This can leave gaps or overlaps between neighbouring polygons.

An alternative is the function `ms_simplify` from the package `rmapshaper`.

In [None]:
library(rmapshaper)
us_states_wu %>%
    ms_simplify(keep=0.01, keep_shapes=T) %>%
    ggplot() +
    geom_sf()


## Centroids

The *centroid* of an area is its middle point. There are multiple ways to define what "middle" means here. In the simplest case, it is just the center of gravity.

`st_centroid` computes the centroids of geometries.

In [None]:
world %>%
    st_centroid() %>%
    ggplot() +
    geom_sf() +
    geom_sf(data=world, alpha=0)

The operation can also be applied to linestrings.

In [None]:
seine %>%
    st_centroid() %>%
    ggplot() +
    geom_sf(col="red", size=5) +
    geom_sf(data=seine)

Here you see that the centroid of an object need not be included in it. If we need a point that is guaranteed to lie on the object, we can use `st_point_on_surface` instead.

This is also useful for multipolygons with several components, or concavely shaped polygons.

In [None]:
seine %>%
    st_point_on_surface() %>%
    ggplot() +
    geom_sf(col="red", size=5) +
    geom_sf(data=seine)
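
The same effect shows up with concave polygons. Here is a small sketch with a made-up U-shaped polygon (the coordinates are chosen for illustration only): its center of gravity falls into the notch of the U, whereas `st_point_on_surface` returns a point on the polygon.

```r
library(sf)

# Hypothetical U-shaped (concave) polygon
u_shape <- st_sfc(st_polygon(list(rbind(
    c(0, 0), c(3, 0), c(3, 3), c(2, 3), c(2, 1),
    c(1, 1), c(1, 3), c(0, 3), c(0, 0)
))))

cntr <- st_centroid(u_shape)            # center of gravity
inner <- st_point_on_surface(u_shape)   # a point lying on the polygon

centroid_inside <- st_intersects(cntr, u_shape, sparse = FALSE)
surface_point_inside <- st_intersects(inner, u_shape, sparse = FALSE)

centroid_inside        # FALSE: the centroid lies in the notch
surface_point_inside   # TRUE
```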

## Buffers

A spatial *buffer* around a geometry is the area of points within a certain distance of this geometry. It is always a (multi)polygon.

In [None]:
seine

In [None]:
seine %>%
    st_buffer(dist = 5000) %>%
    ggplot() +
    theme_bw() +
    geom_sf(aes(fill=name), alpha=0.5) +
    geom_sf(data=seine)

In [None]:
seine %>%
    st_buffer(dist = 50000) %>%
    ggplot() +
    theme_bw() +
    geom_sf(aes(fill=name), alpha=0.5) +
    geom_sf(data=seine)
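
To see what a buffer is in the simplest case, consider the buffer around a single point. This is a minimal sketch without a CRS, so `dist` is in plain coordinate units. The buffer is a polygon that approximates a disc, and the `nQuadSegs` argument of `st_buffer` controls how many segments are used per quarter circle.

```r
library(sf)

origin <- st_sfc(st_point(c(0, 0)))

# Buffer with radius 1: a polygon approximating the unit disc
disc <- st_buffer(origin, dist = 1)
# A finer approximation with more segments per quarter circle
disc_fine <- st_buffer(origin, dist = 1, nQuadSegs = 100)

st_geometry_type(disc)

# Both areas are slightly below pi; the finer buffer comes closer
area_disc <- as.numeric(st_area(disc))
area_fine <- as.numeric(st_area(disc_fine))
c(area_disc, area_fine, pi)
```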

## Affine transformations

These are all geometric transformations that preserve straight lines and parallels. They are applicable to geometries and geometry columns. The implementation in `sf` is such that we can apply the same arithmetic operations we would apply to vectors.


### Translation

Simply add the translation vector to the geometries.

In [None]:
(exampleGeometries %>%
    st_geometry() + c(1, 1.5)) %>% 
    ggplot() +
    geom_sf(col='red') +
    geom_sf(data=exampleGeometries, alpha=0.5)



### Mirroring

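Multiply the geometries with the matrix
$$
\begin{pmatrix}
-1 & 0\\
0 & 1
\end{pmatrix}
$$

for mirroring along the $x$-axis (the sign of each $x$-coordinate is flipped), and with

$$
\begin{pmatrix}
1 & 0\\
0 & -1
\end{pmatrix}
$$

for mirroring along the $y$-axis (the sign of each $y$-coordinate is flipped).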


In [None]:

(exampleGeometries %>%
    st_geometry() * matrix(c(-1, 0, 0, 1), nrow=2)) %>%
    ggplot() +
    geom_sf(col='red') +
    geom_sf(data=exampleGeometries, alpha=0.5)

In [None]:
(exampleGeometries %>%
    st_geometry() * matrix(c(1, 0, 0, -1), nrow=2)) %>%
    ggplot() +
    geom_sf(col='red') +
    geom_sf(data=exampleGeometries, alpha=0.5)

### Skewing

Multiply the geometries with the matrix 
$$
\begin{pmatrix}
u \\v
\end{pmatrix}
$$

whose rows are $u$, the image of the vector $\begin{pmatrix}1\\0\end{pmatrix}$, and $v$, the image of $\begin{pmatrix}0\\1\end{pmatrix}$.

In [None]:
(skewMatrix <- matrix(
    c(1, 1, 0, 1), nrow=2
))

In [None]:
(exampleGeometries %>%
    st_geometry() * skewMatrix)  %>%
    ggplot() +
    geom_sf(col='red') +
    geom_sf(data=exampleGeometries, alpha=0.5)
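
To check which point is mapped where, one can apply the matrix to the two unit vectors themselves. The sketch below redefines the skew matrix under the name `skew` (a name chosen here for illustration) so that it runs on its own.

```r
library(sf)

# Same entries as skewMatrix above; its rows are (1, 0) and (1, 1)
skew <- matrix(c(1, 1, 0, 1), nrow = 2)

# The two unit vectors as point geometries
unit_vectors <- st_sfc(st_point(c(1, 0)), st_point(c(0, 1)))

# Coordinates after the transformation: one row per point
images <- st_coordinates(unit_vectors * skew)
images
```

The first row of `images` should equal the first row of the matrix and the second row the second, i.e., $(1, 0)$ stays in place and $(0, 1)$ is moved to $(1, 1)$.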

### Rotation

Rotation works like skewing, via matrix multiplication.

#### Matrix for clockwise rotation by 45°:

In [None]:
(rotationMatrix <- matrix(
    c(sqrt(.5), sqrt(.5), -sqrt(.5), sqrt(.5)),
    nrow=2
))

In [None]:
(exampleGeometries %>%
    st_geometry() * rotationMatrix)  %>%
    ggplot() +
    geom_sf(col='red') +
    geom_sf(data=exampleGeometries, alpha=0.5)

### Scaling

Scaling works by the same principle. In a scaling matrix, only the diagonal entries are non-zero. The upper left corner contains the horizontal scale factor, and the lower right corner contains the vertical scale factor.

In [None]:
(scaleMatrix <- matrix(
    c(2, 0, 0, 0.5),
    nrow=2
))

In [None]:
(exampleGeometries %>%
    st_geometry() * scaleMatrix)  %>%
    ggplot() +
    geom_sf(col='red') +
    geom_sf(data=exampleGeometries, alpha=0.5)
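
All transformations above act relative to the origin of the coordinate system. To rotate (or scale) an object around its own centroid, the operations can be combined: shift the centroid to the origin, transform, and shift back. The following sketch does this for the triangle from the first example. The helper `rotation` is not part of `sf`. It is defined here for illustration and builds the same kind of clockwise rotation matrix as above for an arbitrary angle.

```r
library(sf)

# Triangle from the example above
a <- st_sfc(st_polygon(list(rbind(c(-1, -1), c(1, -1), c(1, 1), c(-1, -1)))))

# Hypothetical helper: matrix for a clockwise rotation by `deg` degrees
rotation <- function(deg) {
    r <- deg * pi / 180
    matrix(c(cos(r), sin(r), -sin(r), cos(r)), nrow = 2)
}

cntr <- st_centroid(a)

# Shift the centroid to the origin, rotate, shift back
a_rotated <- (a - cntr) * rotation(90) + cntr

# Centroid and area are unchanged by the rotation
st_coordinates(st_centroid(a_rotated))
st_area(a_rotated)
```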

## Spatial set-theoretic operations

We can apply the standard set-theoretic operations (intersection, union, symmetric difference, difference) to geometries.

In [None]:
p1 = st_point(c(-1, 0))
p2 = st_point(c(1, 0))

circles <- st_sfc(p1, p2) %>%
    st_buffer(dist = 1.5) %>%
    st_sf()

circles[["name"]] <- c("A", "B")

In [None]:
circles

In [None]:
circles %>%
    ggplot() +
    theme_bw() +
    geom_sf(alpha = .5, aes(fill=name))

In [None]:
circles[1,] %>%
    st_intersection(circles[2,]) %>%
    ggplot() +
    theme_bw() +
    geom_sf(fill='red') +
    geom_sf(data=circles, alpha=.4)


In [None]:
circles[1,] %>%
    st_union(circles[2,]) %>%
    ggplot() +
    theme_bw() +
    geom_sf(fill='red') 


In [None]:
circles[1,] %>%
    st_sym_difference(circles[2,]) %>%
    ggplot() +
    theme_bw() +
    geom_sf(fill='red') 


In [None]:
circles[1,] %>%
    st_difference(circles[2,]) %>%
    ggplot() +
    theme_bw() +
    geom_sf(fill='red') 
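
These operations obey the usual identities from set theory, which can be verified via the areas of the results. The sketch below rebuilds the two discs as plain geometry columns `A` and `B`. The helper `area` is defined here for convenience and is not part of `sf`.

```r
library(sf)

# Two overlapping discs, as in the example above
A <- st_buffer(st_sfc(st_point(c(-1, 0))), dist = 1.5)
B <- st_buffer(st_sfc(st_point(c(1, 0))), dist = 1.5)

# Helper (not part of sf): area as a plain number
area <- function(g) as.numeric(st_area(g))

# area of the union = area(A) + area(B) - area of the intersection
all.equal(area(st_union(A, B)),
          area(A) + area(B) - area(st_intersection(A, B)))

# symmetric difference = union without the intersection
all.equal(area(st_sym_difference(A, B)),
          area(st_union(A, B)) - area(st_intersection(A, B)))

# difference = A without the intersection
all.equal(area(st_difference(A, B)),
          area(A) - area(st_intersection(A, B)))
```

`all.equal` is used instead of `==` because the areas are floating-point numbers.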

## Spatial union and data aggregation

When we aggregate data via `group_by` and `summarize`, the geometries of the observations involved are combined via spatial union.

**Example**

In [None]:
world %>%
    group_by(continent) %>%
    summarize(pop = sum(pop, na.rm=T))

In [None]:
world %>%
    group_by(continent) %>%
    summarize(pop = sum(pop, na.rm=T)) %>%
    ggplot() +
    geom_sf(aes(fill=continent))
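
The union of geometries during aggregation is easier to inspect on toy data. In this sketch, the helper `unit_square`, the column names, and the values are all made up for illustration. Two adjacent squares form the group `"west"` and should be merged into a single polygon.

```r
library(sf)
library(dplyr)

# Hypothetical helper: unit square with lower left corner (x0, 0)
unit_square <- function(x0) {
    st_polygon(list(rbind(
        c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
    )))
}

squares <- st_sf(
    group = c("west", "west", "east"),
    value = c(1, 2, 3),
    geometry = st_sfc(unit_square(0), unit_square(1), unit_square(5))
)

aggregated <- squares %>%
    group_by(group) %>%
    summarize(value = sum(value))

aggregated
# the geometry of "west" is the union of two unit squares
st_area(aggregated)
```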

## Spatial subsetting

Spatial subsetting is the operation of selecting a subset of observations from some dataset on the basis of their spatial relation to some object (or collection of objects).

It can be seen as an extension of the `filter` operation from `tidyverse` with spatial filter criteria.

**Example: High points in New Zealand**

Consider the following two datasets from `spData`:

In [None]:
nz %>%
    slice_sample(n=10)

In [None]:
nz_height %>%
    slice_sample(n=10)

In [None]:
nz %>%
    ggplot() +
    geom_sf(aes(fill=Name), alpha=0.2) +
    geom_sf(data=nz_height, col='red', pch=3)

We want to find all high elevation points within the region *Canterbury*.

In [None]:
canterbury <- nz %>%
    filter(Name == "Canterbury")
canterbury

In [None]:
st_intersects(nz_height, canterbury, sparse=F)

In [None]:
canterbury_height <- nz_height %>%
    # the dense result is a one-column matrix; `[, 1]` turns it into a logical vector
    filter(st_intersects(x=., y=canterbury, sparse=F)[, 1])

In [None]:
nz %>% 
    ggplot() +
    geom_sf() +
    geom_sf(data=canterbury_height, col='red', pch=3)

The same method can be applied with other geometric relations as well.
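
As a sketch of what this looks like with other relations, the following toy example (a unit square and three made-up points) filters with `st_disjoint` and `st_is_within_distance`. It also shows the base R alternative: square-bracket subsetting of an `sf` object with a geometry, where the `op` argument selects the relation.

```r
library(sf)
library(dplyr)

# Hypothetical toy data: a unit square and three points
square <- st_sfc(st_polygon(list(rbind(
    c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)
))))
pts <- st_sf(
    id = c("inside", "near", "far"),
    geometry = st_sfc(
        st_point(c(0.5, 0.5)), st_point(c(1.2, 0.5)), st_point(c(5, 5))
    )
)

# Points outside the square
outside <- pts %>%
    filter(st_disjoint(., square, sparse = FALSE)[, 1])

# Points at most 0.5 units away from the square
nearby <- pts %>%
    filter(st_is_within_distance(., square, dist = 0.5, sparse = FALSE)[, 1])

# Base R alternative: square brackets with the `op` argument
outside_base <- pts[square, , op = st_disjoint]

outside$id
nearby$id
```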

## Interactive tasks

Canterbury is the region of New Zealand containing most of the 100 highest points in the country. How many of these high points does the Canterbury region contain?


Which region contains the second highest number of `nz_height` points, and how many does it contain?

Find the geographic centroid of New Zealand. How far is it from the geographic centroid of Canterbury?


Most world maps have a north-up orientation. A world map with a south-up orientation could be created by a reflection (one of the affine transformations not mentioned in this chapter) of the world object’s geometry. Write code to do so. Hint: you need to use a two-element vector for this transformation. Bonus: create an upside-down map of your country.


## Spatial joining

In a regular join operation, two observations from the two tibbles involved are merged if they have the same value for the `by` column.

In *spatial joining*, we combine observations from `sf` objects based on their geometries. The `join` argument of `st_join` specifies which geometric relation must hold between two observations for them to be merged. The default is `st_intersects`.

### Example: Random points on a world map

In [None]:
# bounding box of 'world'

(bb_world = st_bbox(world))

In [None]:
random_df = tibble(
  x = runif(n = 10, min = bb_world[1], max = bb_world[3]),
  y = runif(n = 10, min = bb_world[2], max = bb_world[4])
)
random_df

In [None]:
random_points = random_df %>% 
  st_as_sf(coords = c("x", "y")) %>% # set coordinates
  st_set_crs(4326) # set geographic CRS

random_points
 

In [None]:
world %>% 
    ggplot() +
    geom_sf() +
    geom_sf(data=random_points, color='red', size=3)


In [None]:
random_points %>%
    st_join(world) 


In [None]:
countries_with_point <- random_points %>%
    st_join(world, left=F) %>%
    pull(iso_a2)

countries_with_point

In [None]:
world %>% 
    mutate(has_point = iso_a2 %in% countries_with_point) %>%
    ggplot() +
    geom_sf(aes(fill=has_point)) +
    geom_sf(data=random_points, size=4)
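
The difference between the default left join, the inner join (`left = FALSE`), and a join with another relation can be seen on a small made-up example. All object and column names below are hypothetical.

```r
library(sf)

# Hypothetical helper: unit square with lower left corner (x0, 0)
unit_square <- function(x0) {
    st_polygon(list(rbind(
        c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
    )))
}

regions <- st_sf(
    region = c("R1", "R2"),
    geometry = st_sfc(unit_square(0), unit_square(2))
)
pts <- st_sf(
    pt = c("p1", "p2", "p3"),
    geometry = st_sfc(
        st_point(c(0.5, 0.5)), st_point(c(2.5, 0.5)), st_point(c(9, 9))
    )
)

# Left join (default): p3 is kept, with NA for the region
left_joined <- st_join(pts, regions)

# Inner join: p3 is dropped because it lies in no region
inner_joined <- st_join(pts, regions, left = FALSE)

# Same inner join, but with `st_within` as the geometric relation
within_joined <- st_join(pts, regions, join = st_within, left = FALSE)

left_joined
inner_joined
```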


## Spatial joining with non-overlapping data

Sometimes one wants to spatially join data whose geometries do not coincide exactly but lie close to each other (e.g., due to measurement errors or different precision levels).

Consider the following two datasets from `spData`:

In [None]:
cycle_hire %>% 
    slice_sample(n=10)

In [None]:
cycle_hire_osm %>%
    slice_sample(n=10)

Check compatibility:

In [None]:
cycle_hire %>%
    st_crs()

In [None]:
cycle_hire_osm %>%
    st_crs()

In [None]:
t1 <- cycle_hire %>%
    select(geometry) %>%
    mutate(source="cycle_hire")

t2 <- cycle_hire_osm %>%
    select(geometry) %>%
    mutate(source="cycle_hire_osm")

t1 %>%
    rbind(t2) %>%
    ggplot() +
    geom_sf(aes(col=source))

In [None]:
t1 %>% 
    st_join(t2, left=F)

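The points from the two datasets are disjoint: no point in one dataset coincides exactly with a point in the other, so a join based on `st_intersects` finds no matches.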

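Suppose we want to add the `capacity` information from `cycle_hire_osm` to `cycle_hire`. We can do so by defining a threshold distance and then applying a spatial join with `st_is_within_distance`. To measure the threshold in metres, we first transform both datasets to a projected CRS (EPSG:27700, the British National Grid).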



In [None]:
cycle_hire_P = st_transform(cycle_hire, 27700)
cycle_hire_osm_P = st_transform(cycle_hire_osm, 27700)


cycle_hire_P %>%
    st_crs()

In [None]:
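cycle_hire_P %>%
    # match every station with all OSM stations within 20 m
    st_join(cycle_hire_osm_P, join = st_is_within_distance, dist = 20) %>%
    # a station may have several matches: average their capacities
    group_by(id) %>%
    summarize(capacity = mean(capacity, na.rm=T)) %>%
    st_drop_geometry() %>%
    inner_join(cycle_hire_P, by = "id") %>%
    slice_sample(n=20)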
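
Why the averaging step is needed becomes clear on a toy example. The sketch below uses made-up stations with coordinates in EPSG:27700. All names and numbers are hypothetical. One station has two matches within 20 m, the other two have none.

```r
library(sf)
library(dplyr)

# Hypothetical stations in the British National Grid (EPSG:27700, units: metres)
official <- st_sf(
    id = 1:3,
    geometry = st_sfc(
        st_point(c(530000, 180000)),
        st_point(c(531000, 180000)),
        st_point(c(532000, 180000)),
        crs = 27700
    )
)
osm <- st_sf(
    capacity = c(10, 20, 30),
    geometry = st_sfc(
        st_point(c(530005, 180000)),  # 5 m from station 1
        st_point(c(530000, 180010)),  # 10 m from station 1
        st_point(c(531500, 180000)),  # 500 m from stations 2 and 3
        crs = 27700
    )
)

# Station 1 appears twice, stations 2 and 3 get NA
joined <- st_join(official, osm, join = st_is_within_distance, dist = 20)
joined

# One row per station; stations without a match end up with NaN
averaged <- joined %>%
    st_drop_geometry() %>%
    group_by(id) %>%
    summarize(capacity = mean(capacity, na.rm = TRUE))
averaged
```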