repipes the book #794

Merged 9 commits on Jun 7, 2022
Changes from 8 commits
8 changes: 4 additions & 4 deletions 01-introduction.Rmd
@@ -139,8 +139,8 @@ rowMeans(b)
```{r, eval=FALSE}
library(leaflet)
popup = c("Robin", "Jakub", "Jannes")
leaflet() %>%
addProviderTiles("NASAGIBS.ViirsEarthAtNight2012") %>%
leaflet() |>
addProviderTiles("NASAGIBS.ViirsEarthAtNight2012") |>
addMarkers(lng = c(-3, 23, 11),
lat = c(52, 53, 49),
popup = popup)
@@ -152,8 +152,8 @@ if(knitr::is_latex_output()){
} else if(knitr::is_html_output()){
# library(leaflet)
# popup = c("Robin", "Jakub", "Jannes")
# interactive = leaflet() %>%
# addProviderTiles("NASAGIBS.ViirsEarthAtNight2012") %>%
# interactive = leaflet() |>
# addProviderTiles("NASAGIBS.ViirsEarthAtNight2012") |>
# addMarkers(lng = c(-3, 23, 11),
# lat = c(52, 53, 49),
# popup = popup)
4 changes: 2 additions & 2 deletions 02-spatial-data.Rmd
@@ -267,7 +267,7 @@ There are many reasons (linked to the advantages of the simple features model):
- Enhanced plotting performance
- **sf** objects can be treated as data frames in most operations
- **sf** function names are relatively consistent and intuitive (all begin with `st_`)
- **sf** functions can be combined using `%>%` operator and works well with the [tidyverse](http://tidyverse.org/) collection of R packages\index{tidyverse}.
Collaborator

Just noticed a typo. That should be:

- **sf** functions can be combined with the `|>` operator and works well with the [tidyverse](http://tidyverse.org/) collection of R packages\index{tidyverse}.

Member Author

Fixed

- **sf** functions can be combined with the `|>` operator and work well with the [tidyverse](http://tidyverse.org/) collection of R packages\index{tidyverse}.
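
A minimal base-R sketch of what combining functions with `|>` means (not taken from the book; the vector `x` is made up for illustration). The native pipe, available from R 4.1.0, passes the left-hand side as the first argument of the call on the right, so a piped chain and the equivalent nested call should give the same result.

```r
# Hypothetical input vector, for illustration only
x = c(1, 4, 9)

# Piped: read left to right
piped = x |> sqrt() |> sum()

# Nested: read inside out
nested = sum(sqrt(x))

# Both forms compute the same value
identical(piped, nested)

# The parser rewrites the pipe into an ordinary nested call
identical(quote(x |> sqrt()), quote(sqrt(x)))
```

Because the rewrite happens at parse time, `|>` needs no extra package, which is one reason to prefer it over `%>%` in code that otherwise does not depend on **magrittr**.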

**sf**'s support for **tidyverse** packages is exemplified by the provision of the `read_sf()` function for reading geographic vector datasets.
Unlike the function `st_read()`, which returns attributes stored in a base R `data.frame` (and which provides more verbose messages, not shown in the code chunk below), `read_sf()` returns data as a **tidyverse** `tibble`.
@@ -298,7 +298,7 @@ world_sf = st_as_sf(world_sp) # from sp to sf
### Basic map making {#basic-map}

Basic maps are created in **sf** with `plot()`.
By default this creates a multi-panel plot (like **sp**'s `spplot()`), one sub-plot for each variable of the object, as illustrated in the left-hand panel in Figure \@ref(fig:sfplot).
By default this creates a multi-panel plot, one sub-plot for each variable of the object, as illustrated in the left-hand panel in Figure \@ref(fig:sfplot).
A legend or 'key' with a continuous color is produced if the object to be plotted has a single variable (see the right-hand panel).
Colors can also be set with `col = `, although this will not create a continuous palette or a legend.
\index{map making!basic}
60 changes: 30 additions & 30 deletions 03-attribute-operations.Rmd
@@ -29,8 +29,8 @@ library(osmdata)
london_coords = c(-0.1, 51.5)
london_bb = c(-0.11, 51.49, -0.09, 51.51)
bb = tmaptools::bb(london_bb)
osm_data = opq(bbox = london_bb) %>%
add_osm_feature(key = "highway", value = "bus_stop") %>%
osm_data = opq(bbox = london_bb) |>
add_osm_feature(key = "highway", value = "bus_stop") |>
osmdata_sf()
osm_data_points = osm_data$osm_points
osm_data_points[4, ]
@@ -78,7 +78,7 @@ methods(class = "sf") # methods for sf objects, first 12 shown

```{r 03-attribute-operations-5, eval=FALSE, echo=FALSE}
# Another way to show sf methods:
attributes(methods(class = "sf"))$info %>%
attributes(methods(class = "sf"))$info |>
dplyr::filter(!visible)
```

@@ -193,15 +193,15 @@ Key functions for subsetting data frames (including `sf` data frames) with **dpl
i = sample(nrow(world), size = 10)
benchmark_subset = bench::mark(
world[i, ],
world %>% slice(i)
world |> slice(i)
)
benchmark_subset[c("expression", "itr/sec", "mem_alloc")]
# # October 2021 on laptop with CRAN version of dplyr:
# # A tibble: 2 × 3
# expression `itr/sec` mem_alloc
# <bch:expr> <dbl> <bch:byt>
# 1 world[i, ] 1744. 5.55KB
# 2 world %>% slice(i) 671. 4.45KB
# 2 world |> slice(i) 671. 4.45KB
```

`select()` selects columns by name or position.
@@ -317,15 +317,15 @@ Pipes enable expressive code: the output of a previous function becomes the firs
This is illustrated below: only countries from Asia are filtered from the `world` dataset, then the object is subset by columns (`name_long` and `continent`) and the first five rows (result not shown).

```{r 03-attribute-operations-24}
world7 = world %>%
filter(continent == "Asia") %>%
dplyr::select(name_long, continent) %>%
world7 = world |>
filter(continent == "Asia") |>
dplyr::select(name_long, continent) |>
slice(1:5)
```

The above chunk shows how the pipe operator allows commands to be written in a clear order:
the commands run from top to bottom (line-by-line) and left to right.
The alternative to `%>%` is nested function calls, which is harder to read:
The alternative to `|>` is nested function calls, which are harder to read:

```{r 03-attribute-operations-25}
world8 = slice(
@@ -364,20 +364,20 @@ nrow(world_agg2)
```

The resulting `world_agg2` object is a spatial object containing 8 features representing the continents of the world (and the open ocean).
`group_by() %>% summarize()` is the **dplyr** equivalent of `aggregate()`, with the variable name provided in the `group_by()` function specifying the grouping variable and information on what is to be summarized passed to the `summarize()` function, as shown below:
`group_by() |> summarize()` is the **dplyr** equivalent of `aggregate()`, with the variable name provided in the `group_by()` function specifying the grouping variable and information on what is to be summarized passed to the `summarize()` function, as shown below:

```{r 03-attribute-operations-28}
world_agg3 = world %>%
group_by(continent) %>%
world_agg3 = world |>
group_by(continent) |>
summarize(pop = sum(pop, na.rm = TRUE))
```

The approach may seem more complex but it has benefits: flexibility, readability, and control over the new column names.
This flexibility is illustrated in the command below, which calculates not only the population but also the area and number of countries in each continent:

```{r 03-attribute-operations-29}
world_agg4 = world %>%
group_by(continent) %>%
world_agg4 = world |>
group_by(continent) |>
summarize(pop = sum(pop, na.rm = TRUE), `area (sqkm)` = sum(area_km2), n = n())
```
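
The correspondence between `aggregate()` and `group_by() |> summarize()` can be checked on a small non-spatial table. The following is a minimal sketch, assuming **dplyr** is installed; the data frame `toy` and its values are made up for illustration and are not part of the book's datasets.

```r
library(dplyr)

# Hypothetical data, for illustration only
toy = data.frame(
  continent = c("Asia", "Asia", "Europe", "Europe", "Europe"),
  pop = c(10, NA, 5, 7, 3),
  area_km2 = c(100, 50, 20, 30, 10)
)

# Base R: one summary column, named after the input variable
agg_base = aggregate(pop ~ continent, data = toy, FUN = sum, na.rm = TRUE)

# dplyr: several summaries at once, with freely chosen column names
agg_dplyr = toy |>
  group_by(continent) |>
  summarize(pop = sum(pop, na.rm = TRUE), area = sum(area_km2), n = n())

# The population totals agree between the two approaches
all.equal(agg_base$pop, agg_dplyr$pop)
```

The **dplyr** version is slightly longer, but extra summaries such as `area` and `n` only require extra arguments to `summarize()`.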

@@ -388,14 +388,14 @@ Let's combine what we have learned so far about **dplyr** functions, by chaining
The following command calculates population density (with `mutate()`), arranges continents by the number of countries they contain (with `dplyr::arrange()`), and keeps only the 3 most populous continents (with `top_n()`), the result of which is presented in Table \@ref(tab:continents):

```{r 03-attribute-operations-30}
world_agg5 = world %>%
st_drop_geometry() %>% # drop the geometry for speed
dplyr::select(pop, continent, area_km2) %>% # subset the columns of interest
group_by(continent) %>% # group by continent and summarize:
summarize(Pop = sum(pop, na.rm = TRUE), Area = sum(area_km2), N = n()) %>%
mutate(Density = round(Pop / Area)) %>% # calculate population density
top_n(n = 3, wt = Pop) %>% # keep only the top 3
arrange(desc(N)) # arrange in order of n. countries
world_agg5 = world |>
st_drop_geometry() |> # drop the geometry for speed
dplyr::select(pop, continent, area_km2) |> # subset the columns of interest
group_by(continent) |> # group by continent and summarize:
summarize(Pop = sum(pop, na.rm = TRUE), Area = sum(area_km2), N = n()) |>
mutate(Density = round(Pop / Area)) |> # calculate population density
top_n(n = 3, wt = Pop) |> # keep only the top 3
arrange(desc(N)) # arrange in order of n. countries
```

```{r continents, echo=FALSE}
@@ -551,14 +551,14 @@ Alternatively, we can use one of **dplyr** functions - `mutate()` or `transmute(
`mutate()` adds new columns at the penultimate position in the `sf` object (the last one is reserved for the geometry):

```{r 03-attribute-operations-43, eval=FALSE}
world %>%
world |>
mutate(pop_dens = pop / area_km2)
```

The difference between `mutate()` and `transmute()` is that the latter drops all other existing columns (except for the sticky geometry column):

```{r 03-attribute-operations-44, eval=FALSE}
world %>%
world |>
transmute(pop_dens = pop / area_km2)
```

@@ -567,15 +567,15 @@ For example, we want to combine the `continent` and `region_un` columns into a n
Additionally, we can define a separator (here: a colon `:`) which defines how the values of the input columns should be joined, and if the original columns should be removed (here: `TRUE`):

```{r 03-attribute-operations-45, eval=FALSE}
world_unite = world %>%
world_unite = world |>
unite("con_reg", continent:region_un, sep = ":", remove = TRUE)
```

The `separate()` function does the opposite of `unite()`: it splits one column into multiple columns using either a regular expression or character positions.
This function also comes from the **tidyr** package.

```{r 03-attribute-operations-46, eval=FALSE}
world_separate = world_unite %>%
world_separate = world_unite |>
separate(con_reg, c("continent", "region_un"), sep = ":")
```
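
That `separate()` reverses `unite()` can be seen on a small plain data frame. This is a minimal sketch, assuming **tidyr** is installed; the data frame `toy` is hypothetical and stands in for the attribute table of `world`.

```r
library(tidyr)

# Hypothetical data, for illustration only
toy = data.frame(
  name = c("A", "B"),
  continent = c("Asia", "Europe"),
  region_un = c("Asia", "Europe")
)

# Paste the two columns together with ":" and drop the originals
united = toy |>
  unite("con_reg", continent:region_un, sep = ":", remove = TRUE)
names(united)

# Split the combined column again at the ":" separator
separated = united |>
  separate(con_reg, c("continent", "region_un"), sep = ":")

# Compare the round trip with the input
all.equal(separated, toy)
```

Note that `sep` in `separate()` is interpreted as a regular expression, so separators with a special meaning (such as `.` or `|`) would need escaping.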

@@ -588,20 +588,20 @@ The first replaces an old name with a new one.
The following command, for example, renames the lengthy `name_long` column to simply `name`:

```{r 03-attribute-operations-48, eval=FALSE}
world %>%
world |>
rename(name = name_long)
```

`setNames()` changes all column names at once, and requires a character vector with a name matching each column.
This is illustrated below, which outputs the same `world` object, but with very short names:

```{r 03-attribute-operations-49, eval=FALSE, echo=FALSE}
abbreviate(names(world), minlength = 1) %>% dput()
abbreviate(names(world), minlength = 1) |> dput()
```

```{r 03-attribute-operations-50, eval=FALSE}
new_names = c("i", "n", "c", "r", "s", "t", "a", "p", "l", "gP", "geom")
world %>%
world |>
setNames(new_names)
```
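
The difference between renaming one column and replacing all names can be sketched with base R only. The two-column data frame `df` is hypothetical, and the single-column rename uses base R indexing rather than `rename()`, so that no packages are needed.

```r
# Hypothetical two-column data frame, for illustration only
df = data.frame(name_long = c("A", "B"), pop = c(1, 2))

# setNames() replaces every name: one new name per column, in column order
df_short = df |> setNames(c("n", "p"))
names(df_short)

# Renaming a single column leaves the other names untouched
df_renamed = df
names(df_renamed)[names(df_renamed) == "name_long"] = "name"
names(df_renamed)
```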

@@ -613,7 +613,7 @@ Hence, an approach such as `select(world, -geom)` will be unsuccessful and you s
]

```{r 03-attribute-operations-51}
world_data = world %>% st_drop_geometry()
world_data = world |> st_drop_geometry()
class(world_data)
```

39 changes: 21 additions & 18 deletions 04-spatial-operations.Rmd
@@ -60,7 +60,7 @@ To demonstrate spatial subsetting, we will use the `nz` and `nz_height` datasets
The following code chunk creates an object representing Canterbury, then uses spatial subsetting to return all high points in the region:

```{r 04-spatial-operations-3}
canterbury = nz %>% filter(Name == "Canterbury")
canterbury = nz |> filter(Name == "Canterbury")
canterbury_height = nz_height[canterbury, ]
```

@@ -125,16 +125,18 @@ Note: the solution involving `sgbp` objects is more generalisable though, as it
The same result can be achieved with the **sf** function `st_filter()` which was [created](https://github.com/r-spatial/sf/issues/1148) to increase compatibility between `sf` objects and **dplyr** data manipulation code:

```{r}
canterbury_height3 = nz_height %>%
canterbury_height3 = nz_height |>
st_filter(y = canterbury, .predicate = st_intersects)
```
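
Pipelines like the one above pass the left-hand side as the first argument of the next call. When the piped object is needed elsewhere, the native pipe offers the `_` placeholder (from R 4.2.0), which is stricter than the `.` of `%>%`: it must be given as a named argument of the right-hand call and cannot appear inside a nested call. A minimal base-R sketch, using a made-up data frame `df`:

```r
# Hypothetical data, for illustration only (requires R >= 4.2.0)
df = data.frame(x = c(1, 2, 3, 4), y = c(2.1, 3.9, 6.2, 7.8))

# Allowed: `_` is a named argument of the call on the right-hand side
fit = df |> lm(y ~ x, data = _)

# Not allowed: `_` inside a nested call, e.g.
# df |> subset(nrow(_) > 2)  # fails: invalid use of pipe placeholder

# Workaround: wrap the step in an anonymous function and call it
n_large = df |> (\(d) d[d$y > mean(d$y), ])() |> nrow()
```

The anonymous-function form is more verbose than `%>%` with `.`, but it keeps the pipeline free of extra dependencies.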

<!--toDo:jn-->
<!-- fix pipes -->

```{r 04-spatial-operations-7b-old, eval=FALSE, echo=FALSE}
# Additional tests of subsetting
canterbury_height4 = nz_height %>%
filter(st_intersects(x = ., y = canterbury, sparse = FALSE))
canterbury_height5 = nz_height %>%
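canterbury_height4 = nz_height |>
# `_` cannot be used inside a nested call, so wrap the step in an anonymous function
(\(x) filter(x, st_intersects(x = x, y = canterbury, sparse = FALSE)[, 1]))()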
canterbury_height5 = nz_height |>
filter(sel_logical)
identical(canterbury_height3, canterbury_height4)
identical(canterbury_height3, canterbury_height5)
@@ -437,7 +439,7 @@ b9sf$domain_b = rep(rep(domains, each = 3), each = 2)
b9sf = rbind(b9sf, ii, bi, ei, ib, bb, eb, ie, be, ee)
b9sf$domain_a = ordered(b9sf$domain_a, levels = c("Interior", "Boundary", "Exterior"))
b9sf$domain_b = ordered(b9sf$domain_b, levels = c("Interior", "Boundary", "Exterior"))
b9sf = b9sf %>%
b9sf = b9sf |>
mutate(alpha = case_when(
Object == "x" ~ 0.1,
Object == "y" ~ 0.1,
@@ -597,8 +599,8 @@ random_df = data.frame(
x = runif(n = 10, min = bb[1], max = bb[3]),
y = runif(n = 10, min = bb[2], max = bb[4])
)
random_points = random_df %>%
st_as_sf(coords = c("x", "y")) %>% # set coordinates
random_points = random_df |>
st_as_sf(coords = c("x", "y")) |> # set coordinates
st_set_crs("EPSG:4326") # set geographic CRS
```

@@ -663,9 +665,9 @@ if (knitr::is_latex_output()){
# tm_bubbles(col = "red", alpha = 0.5, size = 0.2) +
# tm_scale_bar()
library(leaflet)
leaflet() %>%
# addProviderTiles(providers$OpenStreetMap.BlackAndWhite) %>%
addCircles(data = cycle_hire) %>%
leaflet() |>
# addProviderTiles(providers$OpenStreetMap.BlackAndWhite) |>
addCircles(data = cycle_hire) |>
addCircles(data = cycle_hire_osm, col = "red")
}
```
@@ -712,8 +714,8 @@ This is because some cycle hire stations in `cycle_hire` have multiple matches i
To aggregate the values for the overlapping points and return the mean, we can use the aggregation methods learned in Chapter \@ref(attr), resulting in an object with the same number of rows as the target:

```{r 04-spatial-operations-26}
z = z %>%
group_by(id) %>%
z = z |>
group_by(id) |>
summarize(capacity = mean(capacity))
nrow(z) == nrow(cycle_hire)
```
@@ -731,7 +733,7 @@ The result of this join has used a spatial operation to change the attribute dat

As with attribute data aggregation, spatial data aggregation *condenses* data: aggregated outputs have fewer rows than non-aggregated inputs.
Statistical *aggregating functions*, such as mean average or sum, summarise multiple values \index{statistics} of a variable, and return a single value per *grouping variable*.
Section \@ref(vector-attribute-aggregation) demonstrated how `aggregate()` and `group_by() %>% summarize()` condense data based on attribute variables, this section shows how the same functions work with spatial objects.
Section \@ref(vector-attribute-aggregation) demonstrated how `aggregate()` and `group_by() |> summarize()` condense data based on attribute variables; this section shows how the same functions work with spatial objects.
\index{aggregation!spatial}

Returning to the example of New Zealand, imagine you want to find out the average height of high points in each region: it is the geometry of the source (`y` or `nz` in this case) that defines how values in the target object (`x` or `nz_height`) are grouped.
@@ -753,8 +755,8 @@ tm_shape(nz_agg) +
```

```{r 04-spatial-operations-29}
nz_agg2 = st_join(x = nz, y = nz_height) %>%
group_by(Name) %>%
nz_agg2 = st_join(x = nz, y = nz_height) |>
group_by(Name) |>
summarize(elevation = mean(elevation, na.rm = TRUE))
```

@@ -766,10 +768,11 @@ plot(nz_agg2)

The resulting `nz_agg` objects have the same geometry as the aggregating object `nz` but with a new column summarising the values of `x` in each region using the function `mean()`.
Other functions could be used instead of `mean()` here, including `median()`, `sd()` and other functions that return a single value per group.
Note: one difference between the `aggregate()` and `group_by() %>% summarize()` approaches is that the former results in `NA` values for unmatching region names while the latter preserves region names.
Note: one difference between the `aggregate()` and `group_by() |> summarize()` approaches is that the former results in `NA` values for unmatched region names while the latter preserves region names.
The 'tidy' approach is thus more flexible in terms of aggregating functions and the column names of the results.
Aggregating operations that also create new geometries are covered in Section \@ref(geometry-unions).


### Joining incongruent layers {#incongruent}

Spatial congruence\index{spatial congruence} is an important concept related to spatial aggregation.
@@ -809,7 +812,7 @@ This is illustrated in the code chunk below, which finds the distance between th
\index{sf!distance relations}

```{r 04-spatial-operations-31, warning=FALSE}
nz_heighest = nz_height %>% top_n(n = 1, wt = elevation)
nz_heighest = nz_height |> top_n(n = 1, wt = elevation)
canterbury_centroid = st_centroid(canterbury)
st_distance(nz_heighest, canterbury_centroid)
```