forked from r-spatial/stars
-
Notifications
You must be signed in to change notification settings - Fork 0
/
stars1.Rmd
329 lines (273 loc) · 12 KB
/
stars1.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
---
title: "1. introduction"
author: "Edzer Pebesma"
output:
html_document:
toc: true
toc_float:
collapsed: false
smooth_scroll: false
toc_depth: 2
vignette: >
%\VignetteIndexEntry{1. introduction}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(collapse = TRUE)
ev = suppressWarnings(require(starsdata, quietly = TRUE))
```
Package `stars` provides infrastructure for _data cubes_, array
data with labeled dimensions, with emphasis on arrays where some
of the dimensions relate to time and/or space.
Spatial data cubes are arrays with one or more spatial dimensions.
Raster data cubes have at least two spatial dimensions that form the
raster tesselation. Vector data cubes have at least one spatial
dimension that may for instance reflect a polygon tesselation, or
a set of point locations. Conversions between the two (rasterization,
polygonization) are provided. Vector data are represented by simple
feature geometries (packages `sf`). Tidyverse methods are provided.
The `stars` package is loaded by
```{r}
library(stars)
```
Spatiotemporal arrays are stored in objects of class `stars`;
methods for class `stars` currently available are
``` {r}
methods(class = "stars")
```
(tidyverse methods are only visible after loading package `tidyverse`).
# Reading a satellite image
We can read a satellite image through GDAL, e.g. from a GeoTIFF file in the package:
```{r}
tif = system.file("tif/L7_ETMs.tif", package = "stars")
x = read_stars(tif)
plot(x, axes = TRUE)
```
We see that the image is geographically referenced (has coordinate values along axes), and that the object returned (`x`) has three dimensions called `x`, `y` and `band`, and has one attribute:
```{r}
x
```
Each dimension has a name; the meaning of the fields of a single dimension are:
|*field* |*meaning* |
|--------|------------------------------------------------------------|
| from | the origin index (1) |
| to | the final index (dim(x)[i]) |
| offset | the start value for this dimension (pixel boundary), if regular |
| delta | the step (pixel, cell) size for this dimension, if regular |
| refsys | the reference system, or proj4string |
| point | logical; whether cells refer to points, or intervals |
| values | the sequence of values for this dimension (e.g., geometries), if irregular |
This means that for an index i (starting at $i=1$) along a certain dimension, the corresponding dimension value (coordinate, time) is $\mbox{offset} + (i-1) \times \mbox{delta}$. This value then refers to the start (edge) of the cell or interval; in order to get the interval middle or cell centre, one needs to add half an offset.
Dimension `band` is a simple sequence from 1 to 6. Since bands refer to colors, one could put their wavelength values in the `values` field.
For this particular dataset (and most other raster datasets), we see that offset for dimension `y` is negative: this means that consecutive array values have decreasing $y$ values: cell indexes increase from top to bottom, in the direction opposite to the $y$ axis.
`read_stars` reads all bands from a raster dataset, or optionally a subset of raster datasets, into a single `stars` array structure. While doing so, raster values (often UINT8 or UINT16) are converted to double (numeric) values, and scaled back to their original values if needed if the file encodes the scaling parameters.
The data structure `stars` is a generalisation of the `tbl_cube` found in `cubelyr`; we can convert to that by
```{r eval=ev}
library(cubelyr)
as.tbl_cube(x)
```
but this will cause a loss of certain properties (cell size,
reference system, vector geometries)
## Switching attributes to dimensions and back
```{r}
(x.spl = split(x, "band"))
merge(x.spl)
```
We see that the newly created dimension lost its name, and the single attribute got a default name. We can set attribute names with `setNames`, and dimension names and values with `st_set_dimensions`:
```{r}
merge(x.spl) %>%
setNames(names(x)) %>%
st_set_dimensions(3, values = paste0("band", 1:6)) %>%
st_set_dimensions(names = c("x", "y", "band"))
```
## Subsetting
Besides the `tidyverse` subsetting and selection operators explained
in [this vignette](stars3.html), we can also use `[` and `[[`.
Since `stars` objects are a list of `array`s with a metadata table
describing dimensions, list extraction (and assignment) works as expected:
```{r}
class(x[[1]])
dim(x[[1]])
x$two = 2 * x[[1]]
x
```
At this level, we can work with `array` objects directly.
The `stars` subset operator `[` works a bit different: its
* first argument selects attributes
* second argument selects the first dimension
* third argument selects the second dimension, etc
Thus,
```{r}
x["two", 1:10, , 2:4]
```
selects the second attribute, the first 10 columns (x-coordinate),
all rows, and bands 2-4.
Alternatively, when `[` is given a single argument of class `sf`,
`sfc` or `bbox`, `[` will work as a crop operator:
```{r}
circle = st_sfc(st_buffer(st_point(c(293749.5, 9115745)), 400), crs = st_crs(x))
plot(x[circle][, , , 1], reset = FALSE)
plot(circle, col = NA, border = 'red', add = TRUE, lwd = 2)
```
## Overviews
We can read rasters at a lower resolution when they contain so-called overviews. For this
GeoTIFF file, they were created with the `gdaladdo` utility, in particular
```
gdaladdo -r average L7_ETMs.tif 2 4 8 16
```
which adds coarse resolution versions by using the _average_
resampling method to compute values based on blocks of pixels.
These can be read by
```{r eval=FALSE}
x1 = read_stars(tif, options = c("OVERVIEW_LEVEL=1"))
x2 = read_stars(tif, options = c("OVERVIEW_LEVEL=2"))
x3 = read_stars(tif, options = c("OVERVIEW_LEVEL=3"))
dim(x1)
dim(x2)
dim(x3)
par(mfrow = c(1, 3), mar = rep(0.2, 4))
image(x1[,,,1])
image(x2[,,,1])
image(x3[,,,1])
```
# Reading a raster time series: NetCDF
Another example is when we read raster time series model outputs in a NetCDF file, e.g. by
```{r eval=ev}
system.file("nc/bcsd_obs_1999.nc", package = "stars") %>%
read_stars() -> w
```
We see that
```{r eval=ev}
w
```
For this dataset we can see that
* variables have units associated (and a wrong unit, `C` is assigned to temperature)
* time is now a dimension, with proper units and time steps
Alternatively, this dataset can be read using `read_ncdf`, as in
```{r}
system.file("nc/bcsd_obs_1999.nc", package = "stars") %>%
read_ncdf()
```
The difference between `read_ncdf` and `read_stars` for NetCDF files
is that the former uses package RNetCDF to directly read the NetCDF
file, where the latter uses the GDAL driver for NetCDF files.
## Reading datasets from multiple files
Model data are often spread across many files. An example of a 0.25
degree grid, global daily sea surface temperature product is found
[here](https://psl.noaa.gov/data/gridded/data.noaa.oisst.v2.highres.html);
the subset from 1981 used below was downloaded from a NOAA ftp
site that is no longer available in this form. (ftp site used to
be eclipse.ncdc.noaa.gov/pub/OI-daily-v2/NetCDF/1981/AVHRR/).
We read the data by giving `read_stars` a vector with character names:
```{r eval=ev}
x = c(
"avhrr-only-v2.19810901.nc",
"avhrr-only-v2.19810902.nc",
"avhrr-only-v2.19810903.nc",
"avhrr-only-v2.19810904.nc",
"avhrr-only-v2.19810905.nc",
"avhrr-only-v2.19810906.nc",
"avhrr-only-v2.19810907.nc",
"avhrr-only-v2.19810908.nc",
"avhrr-only-v2.19810909.nc"
)
# see the second vignette:
# install.packages("starsdata", repos = "http://pebesma.staff.ifgi.de", type = "source")
file_list = system.file(paste0("netcdf/", x), package = "starsdata")
(y = read_stars(file_list, quiet = TRUE))
```
Next, we select sea surface temperature (`sst`), and drop the singular `zlev` (depth) dimension using `adrop`:
```{r eval=ev}
library(dplyr)
library(abind)
z <- y %>% select(sst) %>% adrop()
```
We can now graph the sea surface temperature (SST) using `ggplot`, which needs data in a long table form, and without units:
```{r eval=ev}
# convert POSIXct time to character, to please ggplot's facet_wrap()
z1 = st_set_dimensions(z, 3, values = as.character(st_get_dimension_values(z, 3)))
library(ggplot2)
library(viridis)
library(ggthemes)
ggplot() +
geom_stars(data = z1[1], alpha = 0.8, downsample = c(10, 10, 1)) +
facet_wrap("time") +
scale_fill_viridis() +
coord_equal() +
theme_map() +
theme(legend.position = "bottom") +
theme(legend.key.width = unit(2, "cm"))
```
# Writing stars objects to disk
We can write a stars object to disk by using `write_stars`; this used the GDAL write engine. Writing NetCDF files without going through the GDAL interface is currently not supported.
`write_stars` currently writes only a single attribute:
```{r eval=ev}
write_stars(adrop(y[1]), "sst.tif")
```
See the explanation of `merge` above to see how multiple attributes
can be merged (folded) into a dimension.
# Cropping a raster's extent
Using a curvilinear grid, taken from the example of `read_ncdf`:
```{r}
prec_file = system.file("nc/test_stageiv_xyt.nc", package = "stars")
prec = read_ncdf(prec_file, curvilinear = c("lon", "lat"))
##plot(prec) ## gives error about unique breaks
## remove NAs, zeros, and give a large number
## of breaks (used for validating in detail)
qu_0_omit = function(x, ..., n = 22) {
if (inherits(x, "units"))
x = units::drop_units(na.omit(x))
c(0, quantile(x[x > 0], seq(0, 1, length.out = n)))
}
library(dplyr) # loads slice generic
prec_slice = slice(prec, index = 17, along = "time")
plot(prec_slice, border = NA, breaks = qu_0_omit(prec_slice[[1]]), reset = FALSE)
nc = sf::read_sf(system.file("gpkg/nc.gpkg", package = "sf"), "nc.gpkg")
plot(st_geometry(nc), add = TRUE, reset = FALSE, col = NA, border = 'red')
```
We can now crop the grid to those cells falling in
```{r}
nc = st_transform(nc, st_crs(prec_slice)) # datum transformation
plot(prec_slice[nc], border = NA, breaks = qu_0_omit(prec_slice[[1]]), reset = FALSE)
plot(st_geometry(nc), add = TRUE, reset = FALSE, col = NA, border = 'red')
```
The selection `prec_slice[nc]` essentially calls `st_crop(prec_slice, nc)` to get a cropped selection. What happened here is that all
cells not intersecting with North Carolina (sea) are set to `NA`
values. For regular grids, the extent of the resulting `stars`
object is also be reduced (cropped) by default; this can be
controlled with the `crop` parameter to `st_crop` and `[.stars`.
# Vector data cube example
Like `tbl_cube`, `stars` arrays have no limits to the number of dimensions they handle. An example is the origin-destination (OD) matrix, by time and travel mode.
## OD: space x space x travel mode x time x time
We create a 5-dimensional matrix of traffic between regions, by day, by time of day, and by travel mode. Having day and time of day each as dimension is an advantage when we want to compute patterns over the day, for a certain period.
```{r}
nc = st_read(system.file("gpkg/nc.gpkg", package="sf"))
to = from = st_geometry(nc) # 100 polygons: O and D regions
mode = c("car", "bike", "foot") # travel mode
day = 1:100 # arbitrary
library(units)
units(day) = as_units("days since 2015-01-01")
hour = set_units(0:23, h) # hour of day
dims = st_dimensions(origin = from, destination = to, mode = mode, day = day, hour = hour)
(n = dim(dims))
traffic = array(rpois(prod(n), 10), dim = n) # simulated traffic counts
(st = st_as_stars(list(traffic = traffic), dimensions = dims))
```
This array contains the simple feature geometries of origin and destination so that we can directly plot every slice without additional table joins. If we want to represent such an array as a `tbl_cube`, the simple feature geometry dimensions need to be replaced by indexes:
```{r eval=FALSE}
st %>% as.tbl_cube()
```
The following demonstrates how `dplyr` can filter bike travel, and compute mean bike traffic by hour of day:
```{r eval=FALSE}
b <- st %>%
as.tbl_cube() %>%
filter(mode == "bike") %>%
group_by(hour) %>%
summarise(traffic = mean(traffic)) %>%
as.data.frame()
require(ggforce) # for plotting a units variable
ggplot() +
geom_line(data = b, aes(x = hour, y = traffic))
```