---
title: "ingestr example"
author: "Beni Stocker"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{ingestr example}
%\VignetteEngine{knitr::rmarkdown}
%\usepackage[utf8]{inputenc}
---
```{r setup, include=FALSE, eval = TRUE}
library(ingestr)
library(dplyr)
library(tidyr)
library(readr)
library(stringr)
```
# Overview
The package `ingestr` provides functions to extract (ingest) environmental point data (given longitude, latitude, and required dates) from large global files or remote data servers, and to create time series at user-specified temporal resolution (which varies between data sets). It also provides:
- Temporal downscaling from monthly to daily resolution
- Quality filtering, temporal interpolation, and smoothing of remote sensing data
- Handling of different APIs and file formats, returning ingested data in *tidy* format
This is to make your life simpler when downloading and reading site-scale data: a common interface with one function each for single-site and multi-site ingest, and a common, tidy format of ingested data across a variety of data *sources* and original file formats. *Sources* refers both to data sets hosted remotely and accessed through an API, and to local data sets. ingestr is particularly suited for preparing model forcing and offers a set of functionalities to transform original data into common standardized formats and units. This includes interpolation methods for converting monthly climate data (currently CRU TS) to daily time steps.
The key functions are `ingest_bysite()` and `ingest()` for a single-site data ingest and a multi-site data ingest, respectively. For the multi-site data ingest, site meta information is provided through the argument `siteinfo` which takes a data frame with columns `lon` for longitude, `lat` for latitude, and (for time series downloads) `year_start` and `year_end`, specifying required dates (including all days of respective years). Sites are organised along rows. An example site meta info data frame is provided as part of this package for sites included in the FLUXNET2015 Tier 1 data set (additional columns are not required by `ingest_bysite()` and `ingest()`):
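A minimal `siteinfo` data frame can be constructed by hand. In this sketch, the site names and coordinates are illustrative values, not taken from the FLUXNET2015 meta data:

```{r}
# Construct a minimal site meta info data frame as expected by ingest():
# one row per site, with longitude, latitude, and the required year range.
siteinfo <- data.frame(
  sitename   = c("FR-Pue", "CH-Lae"),
  lon        = c(3.5958, 8.3644),
  lat        = c(43.7414, 47.4783),
  year_start = c(2007, 2007),
  year_end   = c(2014, 2014)
)
siteinfo
```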
```{r eval = FALSE}
siteinfo_fluxnet2015 %>%
slice(1:5) %>%
knitr::kable()
```
The following *sources* can be handled currently:
+-----------------------------------------------------------------------------------------------------------+------------------------+----------+---------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Data source | Data type | Coverage | Source ID | Reading from | Remark |
+===========================================================================================================+========================+==========+===============+===============+============================================================================================================================================================================+
| [FLUXNET](https://fluxnet.fluxdata.org/data/fluxnet2015-dataset/) | time series by site | site | `fluxnet` | local files | |
+-----------------------------------------------------------------------------------------------------------+------------------------+----------+---------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [WATCH-WFDEI](http://www.eu-watch.org/data_availability) | time series raster map | global | `watch_wfdei` | local files | |
+-----------------------------------------------------------------------------------------------------------+------------------------+----------+---------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [CRU](https://crudata.uea.ac.uk/cru/data/hrg/) | time series raster map | global | `cru` | local files | |
+-----------------------------------------------------------------------------------------------------------+------------------------+----------+---------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| MODIS LP DAAC | time series raster map | global | `modis` | remote server | using [MODISTools](https://docs.ropensci.org/MODISTools/) |
+-----------------------------------------------------------------------------------------------------------+------------------------+----------+---------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Google Earth Engine                                                                                       | time series raster map | global   | `gee`         | remote server | using Koen Hufkens' [gee_subset](https://khufkens.github.io/gee_subset/) library                                                                             |
+-----------------------------------------------------------------------------------------------------------+------------------------+----------+---------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [ETOPO1](https://www.ngdc.noaa.gov/mgg/global/) | raster map | global | `etopo1` | local files | |
+-----------------------------------------------------------------------------------------------------------+------------------------+----------+---------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [Mauna Loa CO2](https://www.esrl.noaa.gov/gmd/ccgg/trends/data.html) | time series | site | `co2_mlo` | remote server | using the [climate](https://github.com/bczernecki/climate) R package |
+-----------------------------------------------------------------------------------------------------------+------------------------+----------+---------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [HWSD](https://webarchive.iiasa.ac.at/Research/LUC/External-World-soil-database/HTML/)                    | raster map, database   | global   | `hwsd`        | local files   | using an adaptation of David LeBauer's [rhwsd](https://github.com/dlebauer/rhwsd) R package                                                                  |
+-----------------------------------------------------------------------------------------------------------+------------------------+----------+---------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [WWF Ecoregions](https://databasin.org/datasets/68635d7c77f1475f9b6c1d1dbe0a4c4c)                         | shapefile map          | global   | `wwf`         | local files   | Olson et al. (2001)                                                                                                                                          |
+-----------------------------------------------------------------------------------------------------------+------------------------+----------+---------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [N deposition](https://link.springer.com/article/10.1007%2Fs10584-011-0155-0) | time series raster map | global | `ndep` | local files | Lamarque et al. (2011) |
+-----------------------------------------------------------------------------------------------------------+------------------------+----------+---------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [SoilGrids](https://www.isric.org/explore/soilgrids) | raster map | global | `soilgrids` | remote server | Hengl et al. (2017) |
+-----------------------------------------------------------------------------------------------------------+------------------------+----------+---------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [ISRIC WISE30sec](https://data.isric.org/geonetwork/srv/api/records/dc7b283a-8f19-45e1-aaed-e9bd515119bc) | raster map | global | `wise` | local files | [Batjes (2016)](http://dx.doi.org/10.1016/j.geoderma.2016.01.034) |
+-----------------------------------------------------------------------------------------------------------+------------------------+----------+---------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [GSDE Soil](http://globalchange.bnu.edu.cn/research/soilwd.jsp) | raster map | global | `gsde` | local files | [Shangguan et al. 2014](https://doi.org/10.1002/2013MS000293) |
+-----------------------------------------------------------------------------------------------------------+------------------------+----------+---------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [WorldClim](https://www.worldclim.org/data/worldclim21.html)                                              | raster map             | global   | `worldclim`   | local files   | [Fick & Hijmans, 2017](https://doi.org/10.1002/joc.5086)                                                                                                     |
+-----------------------------------------------------------------------------------------------------------+------------------------+----------+---------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Examples to read data for a single site for each data type are given in Section 'Examples for a single site'. Handling ingestion for multiple sites is described in Section 'Example for a set of sites'. **Note** that this package does not provide the original data. Please follow links to data sources above where data is read from local files, and always cite original references.
## Variable names and units
All ingested data follows standardized variable naming and (optionally) units.
+------------------------------------+---------------+---------------------------------+
| Variable | Variable name | Units |
+====================================+===============+=================================+
| Gross primary production           | `gpp`         | g CO$_2$ m$^{-2}$               |
+------------------------------------+---------------+---------------------------------+
| Air temperature | `temp` | $^\circ$C |
+------------------------------------+---------------+---------------------------------+
| Daily minimum air temperature | `tmin` | $^\circ$C |
+------------------------------------+---------------+---------------------------------+
| Daily maximum air temperature | `tmax` | $^\circ$C |
+------------------------------------+---------------+---------------------------------+
| Precipitation | `prec` | mm s$^{-1}$ |
+------------------------------------+---------------+---------------------------------+
| Vapour pressure deficit | `vpd` | Pa |
+------------------------------------+---------------+---------------------------------+
| Atmospheric pressure | `patm` | Pa |
+------------------------------------+---------------+---------------------------------+
| Net radiation | `netrad` | J m$^{-2}$ s$^{-1}=$ W m$^{-2}$ |
+------------------------------------+---------------+---------------------------------+
| Photosynthetic photon flux density | `ppfd` | mol m$^{-2}$ s$^{-1}$ |
+------------------------------------+---------------+---------------------------------+
| Elevation (altitude) | `elv` | m a.s.l. |
+------------------------------------+---------------+---------------------------------+
Use these standard variable names to specify which variables in the original data source they correspond to (see argument `getvars` to functions `ingest()` and `ingest_bysite()`). `gpp` is cumulative, corresponding to the time scale of the data. For example, if daily data is read, `gpp` is the total gross primary production per day (g CO$_2$ m$^{-2}$ d$^{-1}$).
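As a sketch of the cumulative convention (the flux value below is made up): a constant per-second flux integrates over the 86,400 seconds of a day to the daily total.

```{r}
# gpp is cumulative over the time step of the data: a mean flux held
# constant over one day integrates to the daily total.
flux_per_s <- 5e-6                # g CO2 m-2 s-1 (made-up value)
gpp_daily  <- flux_per_s * 86400  # g CO2 m-2 d-1
gpp_daily  # 0.432
```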
# Examples for a single site
The function `ingest_bysite()` can be used to ingest data for a single site. The argument `source` specifies which data type (*source*) is to be read from and triggers the use of specific wrapper functions that are designed to read from original files with formats that differ between sources. Source-specific settings for data processing can be provided by argument `settings` (described for each data source below). More info about other, source-independent arguments are available through the man page (see `?ingest_bysite`).
## FLUXNET
### Meteo data
Reading from FLUXNET files offers multiple settings that can be specified by the user. Here, we specify that no soil water content data is read (`getswc = FALSE` in `settings_fluxnet`, passed to `ingest_bysite()` through the argument `settings`).
```{r message=FALSE, eval = FALSE}
settings_fluxnet <- list(getswc = FALSE)
df_fluxnet <- ingest_bysite(
sitename = "FR-Pue",
source = "fluxnet",
getvars = list(temp = "TA_F",
prec = "P_F",
vpd = "VPD_F",
ppfd = "SW_IN_F",
netrad = "NETRAD",
patm = "PA_F"),
dir = paste0(path.package("ingestr"), "/extdata/"), # example file delivered through package and located here
settings = settings_fluxnet,
timescale = "d",
year_start = 2007,
year_end = 2007,
verbose = FALSE
)
df_fluxnet
```
`getvars` defines the variable names in the original files corresponding to the respective variables with ingestr-standard naming (see table above). The example above triggers the ingestion of the six variables `"TA_F", "P_F", "VPD_F", "SW_IN_F", "NETRAD", "PA_F"` for `"temp", "prec", "vpd", "ppfd", "netrad", "patm"`, respectively.
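The renaming implied by `getvars` can be sketched in base R on a toy data frame (column names follow FLUXNET conventions; the values are made up). This is an illustration of the mapping, not the ingestr implementation:

```{r}
# Toy daily data frame with FLUXNET-style column names (values made up)
df_orig <- data.frame(
  date = as.Date("2007-01-01") + 0:2,
  TA_F = c(5.1, 6.0, 4.2),  # air temperature (deg C)
  P_F  = c(0.0, 2.4, 0.1)   # precipitation (mm)
)

# getvars maps ingestr-standard names (list names) to original names (values)
getvars <- list(temp = "TA_F", prec = "P_F")

# Rename the original columns to the standard names
idx <- match(unlist(getvars), names(df_orig))
names(df_orig)[idx] <- names(getvars)
names(df_orig)  # "date" "temp" "prec"
```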
### Flux data
The same function can also be used to read in other FLUXNET variables (e.g., CO2 flux data) and conduct data filtering steps. Here, we're reading daily GPP and its uncertainty (standard error), based on the nighttime flux decomposition method (`"GPP_NT_VUT_REF"` and `"GPP_NT_VUT_SE"` in argument `getvars`). The `settings` argument can be used again to specify settings that are specific to the `"fluxnet"` data source. Here, we keep only data where at least 80% is based on non-gapfilled half-hourly data (`threshold_GPP = 0.8`), and where the daytime and nighttime-based estimates are consistent, that is, where their difference is below the 97.5% and above the 2.5% quantile (`filter_ntdt = TRUE`). Negative GPP values are not removed (`remove_neg = FALSE`). We read data for just one year (2007).
```{r warning=FALSE, message=FALSE, eval = FALSE}
settings_fluxnet <- list(
getswc = FALSE,
filter_ntdt = TRUE,
threshold_GPP= 0.8,
remove_neg = FALSE
)
ddf_fluxnet <- ingest_bysite(
sitename = "FR-Pue",
source = "fluxnet",
getvars = list( gpp = "GPP_NT_VUT_REF",
gpp_unc = "GPP_NT_VUT_SE"),
dir = paste0(path.package("ingestr"), "/extdata/"),
settings = settings_fluxnet,
timescale = "d",
year_start = 2007,
year_end = 2007
)
```
### Settings
The argument `settings` in functions `ingest_bysite()` and `ingest()` is used to pass settings that are specific to the data source (argument `source`) with which the functions are used. Default settings are specified for each data source. For `source = "fluxnet"`, defaults are returned by a function call of `get_settings_fluxnet()` and are described in the function's man page (see `?get_settings_fluxnet`). Defaults are used for settings elements that are not specified by the user.
## WATCH-WFDEI
Let's extract data for the location corresponding to FLUXNET site 'FR-Pue' (lon = 3.5958, lat = 43.7414). This extracts from original WATCH-WFDEI files, provided as NetCDF (global, 0.5 degree resolution) in monthly files containing all days of each month. The data directory specified here (`dir = "~/data/watch_wfdei/"`) contains sub-directories whose names contain the variable names. The argument `getvars` works differently than for `"fluxnet"`: here, `getvars` is a vector of ingestr-standard variable names to be read, and ingestr automatically reads from the respective files with WATCH-WFDEI variable names. Available variables are: `"temp", "ppfd", "vpd", "patm", "prec"`. The latter is the sum of snow and rain. Below, we read data for just one year (2007).
A bias correction may be applied by specifying the settings as in the example below. By specifying `correct_bias = "worldclim"` (the only option currently available), this uses a high-resolution (30'') monthly climatology based on years 1970-2000 and corrects the WATCH-WFDEI data by month, based on the difference (ratio for variables other than temperature) of its monthly means, averaged across 1979-2000.
WATCH-WFDEI data is available from 1979 onward. If `year_start` is earlier, the mean seasonal cycle averaged across 1979-1988 is returned for all years before 1979.
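The monthly bias correction described above can be sketched as follows. This is a simplified illustration, not the ingestr implementation, and all numbers are made up: daily temperatures are shifted per month so that the monthly mean matches a reference climatology (for variables other than temperature, a ratio would be applied instead).

```{r}
# Simplified monthly bias correction: shift daily temperatures so that
# each month's mean matches a reference (e.g. WorldClim) climatology.
daily_temp   <- c(rep(2.0, 31), rep(3.5, 28))           # Jan + Feb (deg C)
month_of_day <- c(rep(1, 31), rep(2, 28))
clim_ref     <- c(1.2, 4.0)                             # reference monthly means
clim_model   <- tapply(daily_temp, month_of_day, mean)  # model monthly means

# Additive correction per month, applied to each day of that month
corrected <- daily_temp + (clim_ref - clim_model)[month_of_day]
tapply(corrected, month_of_day, mean)  # now matches clim_ref
```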
```{r message=FALSE, echo = T, results = 'hide', eval = FALSE}
df_watch <- ingest_bysite(
sitename = "FR-Pue",
source = "watch_wfdei",
getvars = c("temp"),
dir = "~/data/watch_wfdei/",
timescale = "d",
year_start = 2007,
year_end = 2007,
lon = 3.5958,
lat = 43.7414,
verbose = TRUE,
settings = list(correct_bias = "worldclim", dir_bias = "~/data/worldclim")
)
df_watch
```
## CRU TS
As above, let's extract CRU data for the location corresponding to FLUXNET site 'FR-Pue' (lon = 3.5958, lat = 43.7414). The argument `getvars` works the same way as for WATCH-WFDEI: it is a vector of ingestr-standard variable names to be read. ingestr automatically reads from the respective files with CRU variable names. Available variables are: `"tmin", "tmax", "temp", "vpd", "ccov", "wetd"`.
Note that we're reading `tmin` and `tmax` (the daily minimum and maximum temperature). This extracts monthly data from the CRU TS data. Interpolation to daily values is done using a weather generator for daily precipitation (given monthly total precipitation and the number of wet days in each month), and a polynomial that conserves monthly means for all other variables.
```{r message=FALSE, eval = FALSE}
df_cru <- ingest_bysite(
sitename = "FR-Pue",
source = "cru",
getvars = c("tmin", "tmax"),
dir = "~/data/cru/ts_4.05/",
timescale = "d",
year_start = 2007,
year_end = 2007,
lon = 3.5958,
lat = 43.7414,
verbose = FALSE
)
df_cru
```
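The idea of a monthly-to-daily interpolation that conserves monthly means can be sketched as below. This is a simplified stand-in (linear interpolation with a mean-conserving shift), not the polynomial used by ingestr, and the monthly values are made up:

```{r}
# Sketch of monthly-to-daily interpolation that conserves monthly means.
monthly_means <- c(2.0, 4.0, 8.0)  # three months (deg C), made up
ndays         <- c(31, 28, 31)
month_of_day  <- rep(seq_along(monthly_means), ndays)
midpoints     <- cumsum(ndays) - ndays / 2

# Linear interpolation between month midpoints...
daily <- approx(x = midpoints, y = monthly_means,
                xout = seq_len(sum(ndays)), rule = 2)$y

# ...then shift each month so its mean is conserved exactly
daily <- daily + (monthly_means - tapply(daily, month_of_day, mean))[month_of_day]
tapply(daily, month_of_day, mean)  # ~ 2, 4, 8
```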
We can compare the temperature recorded at the site and the temperature data extracted from WATCH-WFDEI and CRU.
```{r eval = FALSE}
df <- df_fluxnet %>%
rename(temp_fluxnet = temp) %>%
left_join(rename(df_watch, temp_watch = temp), by = c("sitename", "date")) %>%
left_join(rename(df_cru, temp_min_cru = tmin, temp_max_cru = tmax), by = c("sitename", "date")) %>%
pivot_longer(cols = c(temp_fluxnet, temp_watch, temp_min_cru, temp_max_cru), names_to = "source", values_to = "temp", names_prefix = "temp_")
library(ggplot2)
df %>%
ggplot(aes(x = date, y = temp, color = source)) +
geom_line()
```
Looks sweet.
## WFDE5
Let's have a look at the hourly climate data from the [WFDE5 dataset](https://doi.org/10.5194/essd-12-2097-2020). Again, let's extract meteorological data for the location corresponding to FLUXNET site 'FR-Pue' (lon = 3.5958, lat = 43.7414). All input arguments work the same way as described above for WATCH-WFDEI and CRU TS. Note that we are setting `timescale = "h"` here to obtain an hourly data frame. Available variables are: `"temp", "ppfd", "vpd", "patm", "prec", "wind", "swin", "lwin"`.
```{r message=FALSE, echo = T, results = 'hide', eval = FALSE}
df_wfde5 <- ingest_bysite(
sitename = "FR-Pue",
source = "wfde5",
getvars = c("temp"),
dir = "~/data/wfde5/",
timescale = "h",
year_start = 2007,
year_end = 2007,
lon = 3.5958,
lat = 43.7414,
verbose = TRUE,
settings = list(correct_bias = "worldclim", dir_bias = "~/data/worldclim")
)
df_wfde5
```
## MODIS LP DAAC
This uses the [MODISTools](https://docs.ropensci.org/MODISTools/) R package, making its interface consistent with ingestr. Settings can be specified and passed on using the `settings` argument. To facilitate the selection of data products and bands to be downloaded, you may use the function `get_settings_modis()`, which defines defaults for the following data bundles:
- `"modis_fpar"`: MODIS collection 6, MCD15A3H, band `Fpar_500m`
- `"modis_lai"`: MODIS collection 6, MCD15A3H, band `Lai_500m`
- `"modis_evi"`: MODIS collection 6, MOD13Q1, band `250m_16_days_EVI`
- `"modis_ndvi"`: MODIS collection 6, MOD13Q1, band `250m_16_days_NDVI`
- `"modis_refl"`: MODIS/Terra+Aqua Nadir BRDF-Adjusted Reflectance (NBAR) Daily L3 Global 500 m SIN Grid, all bands
- `"modis_lst"`: MODIS/Terra Land Surface Temperature/Emissivity 8-Day L3 Global 1 km SIN Grid (MOD11A2 v006)
The filtering criteria are hard-coded specifically for each product, using its respective quality control information (see function `gapfill_interpol()` in `R/ingest_modis_bysite.R`). For more information on the settings do `?get_settings_modis`.
The following example is for downloading MODIS FPAR MCD15A3H data. Note the specification of the argument `network = "FLUXNET"`. This triggers the download of prepared subsets aligning with site locations for different networks (see [here](https://modis.ornl.gov/sites/)) which is much faster than the download of data for arbitrary locations. This also makes the specification of longitude and latitude values in the call to `ingest_bysite()` obsolete and downloads a scene of 17 x 17 pixels. Using `n_focal` in `get_settings_modis()` subsets the scene to central pixels where the value provided for `n_focal` is the distance in number of pixels away from the center pixel to be taken for averaging. This is done the same way for the network and non-network ingest options.
```{r eval = FALSE}
settings_modis <- get_settings_modis(
bundle = "modis_fpar",
data_path = "~/data/modis_subsets/",
method_interpol = "loess",
keep = TRUE,
overwrite_raw = FALSE,
overwrite_interpol= TRUE,
n_focal = 0,
network = "FLUXNET"
)
```
This can now be used to download the data to the directory specified by the argument `data_path` of function `get_settings_modis()`. The data downloaded through MODISTools is then stored in `<data_path>/raw/`. When calling the functions `ingest()` or `ingest_bysite()` with the setting `overwrite_raw = FALSE`, the raw data file is read and not re-downloaded if available locally. Raw data contains information only for dates where MODIS data is provided. `ingest()` and `ingest_bysite()` interpolate to daily values following the setting `method_interpol`.
Note also that downloaded raw data are cutouts including pixels within 1 km of the focal point indicated by the site longitude and latitude (using arguments `km_lr = 1.0` and `km_ab = 1.0` in the `MODISTools::mt_subset()` call). This is hard-coded in ingestr. To select a smaller set of pixels around the focal point for taking the mean, set `n_focal` to an integer N (0, 1, 2, ...): 0 selects only the single centre pixel in which the site is located, N = 1 includes one ring of pixels around the centre (nine in total), N = 2 includes 25 in total, etc.
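The focal-window selection can be sketched with plain matrix indexing (a toy scene, not actual MODIS data; ingestr's internal code may differ):

```{r}
# Sketch of how n_focal subsets a downloaded scene: keep the (2*N+1)^2
# pixels centred on the focal pixel and average them.
scene   <- matrix(runif(17 * 17), nrow = 17, ncol = 17)  # toy 17 x 17 scene
n_focal <- 1
centre  <- ceiling(17 / 2)  # pixel 9
sel     <- (centre - n_focal):(centre + n_focal)
mean(scene[sel, sel])  # mean over the 3 x 3 focal window
```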
```{r warning=FALSE, eval = FALSE}
df_modis_fpar <- ingest_bysite(
sitename = "CH-Lae",
source = "modis",
year_start= 2018,
year_end = 2019,
# lon = 8.36439, # not needed when network = "FLUXNET"
# lat = 47.47833, # not needed when network = "FLUXNET"
settings = settings_modis,
verbose = FALSE
)
```
Plot this data.
```{r warning=FALSE, eval = FALSE}
plot_fapar_ingestr_bysite(
df_modis_fpar,
settings_modis)
```
## Google Earth Engine
The library `gee_subset` by Koen Hufkens can be downloaded from this [link](https://khufkens.github.io/gee_subset/) and used to extract data directly from Google Earth Engine. Note that this requires the following programs to be available:
- git: You can use [Homebrew](https://brew.sh/) to install git by entering in your terminal: `brew install git`.
- [python](https://www.python.org/)
- The Python Pandas library
Then, carry out the following steps:
- In your terminal, change to where you want to have the repository. In this example, we're cloning it into our home directory:
```{bash, eval = FALSE}
cd ~
git clone https://github.com/khufkens/google_earth_engine_subsets.git
```
To get access to using the Google Earth Engine API (required to use the `gee_subset` library), carry out the following steps in your terminal. This follows steps described [here](https://github.com/google/earthengine-api/issues/27).
1. Install google API Python client
```{bash, eval = FALSE}
sudo pip install --upgrade google-api-python-client
```
I had an error and first had to run the following, as suggested in [this link](https://github.com/pypa/pip/issues/3165):
```{bash, eval = FALSE}
sudo pip install --ignore-installed six
```
2. Install pyCrypto
```{bash, eval = FALSE}
sudo pip install pyCrypto --upgrade
```
3. Install Python GEE API
```{bash, eval = FALSE}
sudo pip install earthengine-api
```
4. Run authentication for GEE
```{bash, eval = FALSE}
earthengine authenticate
```
5. Finally, test whether it works. This shouldn't return an error:
```{bash, eval = FALSE}
python -c "import ee; ee.Initialize()"
```
### MODIS FPAR
To facilitate the selection of data products and bands to be downloaded, you may use the function `get_settings_gee()` which defines defaults for different data bundles (`c("modis_fpar", "modis_evi", "modis_lai", "modis_gpp")` are available).
- `"modis_fpar"`: MODIS/006/MCD15A3H, band Fpar
- `"modis_evi"`: MODIS/006/MOD13Q1, band EVI
- `"modis_lai"`: MOD15A2, band `Lai_1km`
- `"modis_gpp"`: MODIS/006/MOD17A2H, band Gpp
The following example is for downloading MODIS FPAR data.
```{r warning=FALSE, eval = FALSE}
settings_gee <- get_settings_gee(
bundle = "modis_fpar",
python_path = system("which python", intern = TRUE),
gee_path = "~/google_earth_engine_subsets/gee_subset/",
data_path = "~/data/gee_subsets/",
method_interpol = "linear",
keep = TRUE,
overwrite_raw = FALSE,
overwrite_interpol= TRUE
)
```
This can now be used to download the data to the directory specified by argument `data_path` of function `get_settings_gee()`.
```{r warning=FALSE, eval = FALSE}
df_gee_modis_fpar <- ingest_bysite(
sitename = "CH-Lae",
source = "gee",
year_start= 2009,
year_end = 2010,
lon = 8.36439,
lat = 47.47833,
settings = settings_gee,
verbose = FALSE
)
```
## CO2
Ingesting CO2 data is particularly simple. We can safely assume it's well mixed in the atmosphere (independent of site location), use an annual mean value for all days in the respective years, and use the same value for all sites. Using the R package [climate](https://github.com/bczernecki/climate), we can load CO2 data from Mauna Loa directly into R. This downloads data from <ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt>. Here, `ingest_bysite()` is a wrapper for the function `climate::meteo_noaa_co2()`.
```{r warning=FALSE, eval = FALSE}
df_co2 <- ingest_bysite(
sitename = "CH-Lae",
source = "co2_mlo",
year_start= 2007,
year_end = 2014,
verbose = FALSE
)
```
The argument `dir` can be provided here, too. In that case, CO2 data is written to and read from a file located at `<dir>/df_co2_mlo.csv` (downloading only if it's not yet available locally).
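The cache-to-file behaviour can be sketched generically: read the local copy if it exists, otherwise fetch and write it. `read_or_fetch()` and `fetch_co2()` below are hypothetical stand-ins, not ingestr functions, and the CO2 values are made up:

```{r}
# Generic sketch of cache-to-file: read the local copy if present,
# otherwise fetch (here: a fake fetcher) and write it for next time.
read_or_fetch <- function(path, fetch) {
  if (file.exists(path)) {
    read.csv(path)
  } else {
    df <- fetch()
    write.csv(df, path, row.names = FALSE)
    df
  }
}

fetch_co2 <- function() data.frame(year = 2007:2009, co2 = c(383.8, 385.6, 387.4))

path   <- file.path(tempdir(), "df_co2_mlo.csv")
df_co2 <- read_or_fetch(path, fetch_co2)  # fetches and caches
df_co2 <- read_or_fetch(path, fetch_co2)  # second call reads the file
```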
More info about the **climate** package and the data can be obtained [here](https://www.esrl.noaa.gov/gmd/ccgg/trends/data.html) and by:
```{r warning=FALSE, eval = FALSE}
?climate::meteo_noaa_co2
```
**THE FOLLOWING IS UNDER CONSTRUCTION. MAKE READABLE FOR FILE AVAILABLE HERE: <http://www.pik-potsdam.de/~mmalte/rcps/>**
Mauna Loa CO2 is not available for years before 1958. An alternative CO2 dataset comes from the CMIP standard forcing, with merged time series from atmospheric measurements and ice core reconstructions. This can be selected with `source = "co2_cmip"`.
```{r warning=FALSE, eval = FALSE}
df_co2 <- ingest_bysite(
sitename = "CH-Lae",
source = "co2_cmip",
year_start= 2007,
year_end = 2014,
verbose = FALSE,
dir = "~/data/co2"
)
```
## HWSD
Four steps are required before you can use `ingest_bysite()` to get [HWSD](https://webarchive.iiasa.ac.at/Research/LUC/External-World-soil-database/HTML/) data:
1. Install the modified version of David LeBauer's [rhwsd](https://github.com/dlebauer/rhwsd) R package:
```{r warning=FALSE, eval = FALSE}
if(!require(devtools)){install.packages("devtools")}
devtools::install_github("stineb/rhwsd")
```
2. Install additionally required packages: DBI and RSQLite.
```{r warning=FALSE, eval = FALSE}
list.of.packages <- c("DBI", "RSQLite")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
```
3. Download the HWSD data file [HWSD_RASTER.zip](http://webarchive.iiasa.ac.at/Research/LUC/External-World-soil-database/HWSD_Data/HWSD_RASTER.zip) and extract.
4. Move the extracted files to a local directory and adjust the file path in the `settings` argument accordingly (in this example: `"~/data/hwsd/HWSD_RASTER/hwsd.bil"`).
Then, use similarly to above, with providing the path to the downloaded file with the `settings` argument:
```{r warning=FALSE, eval = FALSE}
df_hwsd <- ingest_bysite(
sitename = "CH-Lae",
source = "hwsd",
lon = 8.36439,
lat = 47.47833,
settings = list(fil = "~/data/hwsd/HWSD_RASTER/hwsd.bil"),
verbose = FALSE
)
```
## Nitrogen deposition
This reads nitrogen deposition from global annual maps by Lamarque et al. (2011). This provides annual data separately for NHx and NOy in gN m$^{-2}$ yr$^{-1}$ from a global map provided at half-degree resolution and covering years 1860-2009.
```{r warning=FALSE, eval = FALSE}
df_ndep <- ingest_bysite(
sitename = "CH-Lae",
source = "ndep",
lon = 8.36439,
lat = 47.47833,
year_start= 2000,
year_end = 2009,
timescale = "y",
dir = "~/data/ndep_lamarque/",
verbose = FALSE
)
```
## SoilGrids
Point extractions from [SoilGrids](https://soilgrids.org/) layers are implemented following [this](https://git.wur.nl/isric/soilgrids/soilgrids.notebooks/-/blob/master/markdown/xy_info_from_R.md) and are provided through [ISRIC](https://www.isric.org/).
Available layers and variable naming conventions are described [here](https://www.isric.org/explore/soilgrids/faq-soilgrids#What_do_the_filename_codes_mean). Which variable is to be extracted and for which soil depth layer can be specified in the settings, a list returned by the function call `get_settings_soilgrids()`.
Available variables are described in the table below. Conversion factors are applied by ingestr. Hence, the returned data is in the units described in the table below as "Conventional units".
+----------+------------------------------------------------------------------------------------+----------------+-------------------+--------------------+
| Name | Description | Mapped units | Conversion factor | Conventional units |
+==========+====================================================================================+================+===================+====================+
| bdod | Bulk density of the fine earth fraction | cg/cm³ | 100 | kg/dm³ |
+----------+------------------------------------------------------------------------------------+----------------+-------------------+--------------------+
| cec | Cation Exchange Capacity of the soil | mmol(c)/kg | 10 | cmol(c)/kg |
+----------+------------------------------------------------------------------------------------+----------------+-------------------+--------------------+
| cfvo | Volumetric fraction of coarse fragments (\> 2 mm) | cm3/dm3 (vol‰) | 10 | cm3/100cm3 (vol%) |
+----------+------------------------------------------------------------------------------------+----------------+-------------------+--------------------+
| clay | Proportion of clay particles (\< 0.002 mm) in the fine earth fraction | g/kg | 10 | g/100g (%) |
+----------+------------------------------------------------------------------------------------+----------------+-------------------+--------------------+
| nitrogen | Total nitrogen (N) | cg/kg | 100 | g/kg |
+----------+------------------------------------------------------------------------------------+----------------+-------------------+--------------------+
| phh2o | Soil pH | pHx10 | 10 | pH |
+----------+------------------------------------------------------------------------------------+----------------+-------------------+--------------------+
| sand | Proportion of sand particles (\> 0.05 mm) in the fine earth fraction | g/kg | 10 | g/100g (%) |
+----------+------------------------------------------------------------------------------------+----------------+-------------------+--------------------+
| silt | Proportion of silt particles (≥ 0.002 mm and ≤ 0.05 mm) in the fine earth fraction | g/kg | 10 | g/100g (%) |
+----------+------------------------------------------------------------------------------------+----------------+-------------------+--------------------+
| soc | Soil organic carbon content in the fine earth fraction | dg/kg | 10 | g/kg |
+----------+------------------------------------------------------------------------------------+----------------+-------------------+--------------------+
| ocd | Organic carbon density | hg/dm³ | 10 | kg/dm³ |
+----------+------------------------------------------------------------------------------------+----------------+-------------------+--------------------+
| ocs | Organic carbon stocks | t/ha | 10 | kg/m² |
+----------+------------------------------------------------------------------------------------+----------------+-------------------+--------------------+
Data is available for the following six layers.
| Layer | 1 | 2 | 3 | 4 | 5 | 6 |
|-------------------|-----|-----|-----|-----|-----|-----|
| Top depth (cm) | 0 | 5 | 15 | 30 | 60 | 100 |
| Bottom depth (cm) | 5 | 15 | 30 | 60 | 100 | 200 |
To specify which data is to be ingested, define the settings using the function `get_settings_soilgrids()`, providing standard variable names as a vector of character strings for argument `varnam`, and layers as a vector of integers for argument `layer`. For example:
```{r warning=FALSE, eval = FALSE}
settings_soilgrids <- get_settings_soilgrids(varnam = c("nitrogen", "cec"), layer = 1:3)
```
The ingested data is then averaged across the specified layers, weighted by the respective layer depths.
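This layer-weighting can be sketched in base R. The nitrogen values below are made up for illustration; the layer thicknesses follow the layer table above (layers 1-3 span 0-5, 5-15, and 15-30 cm):

```r
# Depth-weighted average across SoilGrids layers 1:3
# (0-5, 5-15, 15-30 cm, per the layer table above).
thickness <- c(5, 10, 15)               # layer thicknesses in cm
nitrogen  <- c(2.4, 1.8, 1.1)           # hypothetical values per layer (g/kg)
weighted.mean(nitrogen, w = thickness)  # -> 1.55
```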
Now, the data can be ingested.
```{r warning=FALSE, eval = FALSE}
df_soilgrids <- ingest_bysite(
sitename = "CH-Lae",
source = "soilgrids",
lon = 8.36439,
lat = 47.47833,
settings = settings_soilgrids
)
```
This returns a data frame with a nested column `data`, which here contains just a 1 x 1 tibble. This is for consistency with other ingest options. If you prefer a flat data frame, just do:
```{r warning=FALSE, eval = FALSE}
df_soilgrids %>%
unnest(data)
```
## ISRIC WISE30sec
This reads from local files. Download them from ISRIC [here](https://data.isric.org/geonetwork/srv/eng/catalog.search#/metadata/dc7b283a-8f19-45e1-aaed-e9bd515119bc).
Point extraction from the global gridded WISE30sec data product ([Batjes et al., 2016](http://dx.doi.org/10.1016/j.geoderma.2016.01.034)) can be done for a set of variables and soil layers by specifying the ingest demand in the settings. ingestr returns data as a mean across available map units for selected location (pixel), weighted by the fractional coverage of map units for this pixel. The table below describes available variables (info based on [ISRIC Report 2015/01](https://www.isric.org/documents/document-type/isric-report-201501-world-soil-property-estimates-broad-scale-modelling)).
+-----------+----------------------------------------------------------------------------------------------------------+
| Name | Description |
+===========+==========================================================================================================+
| CFRAG | Coarse fragments (vol. % \> 2mm), mean |
+-----------+----------------------------------------------------------------------------------------------------------+
| SDTO | Sand (mass %), mean |
+-----------+----------------------------------------------------------------------------------------------------------+
| STPC | Silt (mass %) |
+-----------+----------------------------------------------------------------------------------------------------------+
| CLPC | Clay (mass %) |
+-----------+----------------------------------------------------------------------------------------------------------+
| PSCL | Texture class (SOTER conventions) |
+-----------+----------------------------------------------------------------------------------------------------------+
| BULK | Bulk density (kg dm-3, g cm-3) |
+-----------+----------------------------------------------------------------------------------------------------------+
| TAWC | Available water capacity (cm m-1, -33 to -1500 kPa, conform USDA standards) Standard deviation for above |
+-----------+----------------------------------------------------------------------------------------------------------+
| CECS | Cation exchange capacity (cmol kg-1) of fine earth fraction |
+-----------+----------------------------------------------------------------------------------------------------------+
| BSAT | Base saturation as percentage of CECsoil |
+-----------+----------------------------------------------------------------------------------------------------------+
| ESP | Exchangeable sodium percentage |
+-----------+----------------------------------------------------------------------------------------------------------+
| CECc | CECclay, corrected for contribution of organic matter (cmol kg-1) |
+-----------+----------------------------------------------------------------------------------------------------------+
| PHAQ | pH measured in water |
+-----------+----------------------------------------------------------------------------------------------------------+
| TCEQ | Total carbonate equivalent (g C kg-1) |
+-----------+----------------------------------------------------------------------------------------------------------+
| GYPS | Gypsum content (g kg-1) |
+-----------+----------------------------------------------------------------------------------------------------------+
| ELCO | Electrical conductivity (dS m-1) |
+-----------+----------------------------------------------------------------------------------------------------------+
| ORGC | Organic carbon content (g kg-1) |
+-----------+----------------------------------------------------------------------------------------------------------+
| TOTN | Total nitrogen (g kg-1) |
+-----------+----------------------------------------------------------------------------------------------------------+
| CNrt | C/N ratio |
+-----------+----------------------------------------------------------------------------------------------------------+
| ECEC | Effective CEC (cmol kg-1) |
+-----------+----------------------------------------------------------------------------------------------------------+
| ALSA | Aluminum saturation (as % of ECEC) |
+-----------+----------------------------------------------------------------------------------------------------------+
By default, data is extracted for the top layer only. Data is provided for the following seven layers (depths in cm).
| Layer | Top depth | Bottom depth |
|-------|-----------|--------------|
| 1 | 0 | 20 |
| 2 | 20 | 40 |
| 3 | 40 | 60 |
| 4 | 60 | 80 |
| 5     | 80        | 100          |
| 6 | 100 | 150 |
| 7 | 150 | 200 |
The following settings specify data extraction for the C:N ratio of the top three layers. The returned value is the mean across the selected soil layers, weighted by the respective layer's depth. `dir` specifies the path to the downloaded data bundle; do not change its internal directory structure. ingestr reads from two files: `<dir>/GISfiles/wise30sec_fin` and `<dir>/Interchangeable_format/HW30s_FULL.txt`.
```{r warning=FALSE, eval = FALSE}
settings_wise <- get_settings_wise(varnam = c("CNrt"), layer = 1:3)
```
Now, the data can be ingested.
```{r warning=FALSE, eval = FALSE}
df_wise <- ingest_bysite(
sitename = "CH-Lae",
source = "wise",
lon = 8.36439,
lat = 47.47833,
settings = settings_wise,
dir = "~/data/soil/wise"
)
```
## GSDE Soil
Global Soil Dataset for use in Earth System Models (GSDE) by [Shangguan et al. 2014](https://doi.org/10.1002/2013MS000293), obtained from [here](http://globalchange.bnu.edu.cn/research/soilwd.jsp). Available variables are given in the table below.
+------------+----------------------------------------+---------------+---------------+
| No. | Attribute | units | variable name |
+============+========================================+===============+===============+
| 1 | total carbon | %of weight | TC |
+------------+----------------------------------------+---------------+---------------+
| 2 | organic carbon | %of weight | OC |
+------------+----------------------------------------+---------------+---------------+
| 3 | total N | %of weight | TN |
+------------+----------------------------------------+---------------+---------------+
| 7 | pH(H2O) | | PHH2O |
+------------+----------------------------------------+---------------+---------------+
| 8 | pH(KCl) | | PHK |
+------------+----------------------------------------+---------------+---------------+
| 9 | pH(CaCl2) | | PHCA |
+------------+----------------------------------------+---------------+---------------+
| 15 | Exchangeable aluminum | cmol/kg | EXA |
+------------+----------------------------------------+---------------+---------------+
| 27 | The amount of P using the Bray1 method | ppm of weight | PBR |
+------------+----------------------------------------+---------------+---------------+
| 28 | The amount of P by Olsen method | ppm of weight | POL |
+------------+----------------------------------------+---------------+---------------+
| 29 | P retention by New Zealand method | \% of weight | PNZ |
+------------+----------------------------------------+---------------+---------------+
| 30 | The amount of water soluble P | ppm of weight | PHO |
+------------+----------------------------------------+---------------+---------------+
| 31 | The amount of P by Mehlich method | ppm of weight | PMEH |
+------------+----------------------------------------+---------------+---------------+
| 33 | Total P | \% of weight | TP |
+------------+----------------------------------------+---------------+---------------+
| 34 | Total potassium | \% of weight | TK |
+------------+----------------------------------------+---------------+---------------+
The 8 layers are:
```{r warning=FALSE, eval = FALSE}
df_layers <- tibble(layer = 1:8, bottom = c(4.5, 9.1, 16.6, 28.9, 49.3, 82.9, 138.3, 229.6)) %>%
mutate(top = lag(bottom)) %>%
mutate(top = ifelse(is.na(top), 0, top))
df_layers
```
Specify the settings directly as a list with elements `varnam` (a vector of character strings specifying the variables as defined in the table above), and `layer` (a vector of integers specifying the layers over which a depth-weighted average is taken).
```{r warning=FALSE, eval = FALSE}
settings_gsde <- list(varnam = c("TN", "PBR", "PHH2O"), layer = 1:3)
```
Now, the data can be ingested.
```{r warning=FALSE, eval = FALSE}
df_gsde <- ingest_bysite(
sitename = "CH-Lae",
source = "gsde",
lon = 8.36439,
lat = 47.47833,
settings = settings_gsde,
dir = "~/data/soil/shangguan"
)
```
Data is returned with variables along columns inside a nested column `data`, and sites along rows (as for all ingestr functions). Make it flat by:
```{r warning=FALSE, eval = FALSE}
df_gsde %>%
unnest(data)
```
## WorldClim
This ingests Worldclim monthly climatology (averaged over 1970-2000) at 30 seconds spatial resolution by [Fick & Hijmans, 2017](https://doi.org/10.1002/joc.5086), obtained [here](https://www.worldclim.org/data/worldclim21.html). Available variables are:
+---------------+-----------------------------------------------------------------------------------------+--------------+
| Variable name | Description | Units |
+===============+=========================================================================================+==============+
| bio | Bioclimatic variables (description [here](https://www.worldclim.org/data/bioclim.html)) | |
+---------------+-----------------------------------------------------------------------------------------+--------------+
| tmin | Minimum temperature | °C |
+---------------+-----------------------------------------------------------------------------------------+--------------+
| tmax | Maximum temperature | °C |
+---------------+-----------------------------------------------------------------------------------------+--------------+
| tavg | Average temperature | °C |
+---------------+-----------------------------------------------------------------------------------------+--------------+
| prec | Precipitation | mm |
+---------------+-----------------------------------------------------------------------------------------+--------------+
| srad | Solar radiation | kJ m-2 day-1 |
+---------------+-----------------------------------------------------------------------------------------+--------------+
| wind | Wind speed | m s-1 |
+---------------+-----------------------------------------------------------------------------------------+--------------+
| vapr | Water vapour pressure | kPa |
+---------------+-----------------------------------------------------------------------------------------+--------------+
Specify the settings directly as a list with element `varnam` (a vector of character strings specifying the variables as defined in the table above).
```{r warning=FALSE, eval = FALSE}
settings_worldclim <- list(varnam = c("bio"))
```
Now, the data can be ingested.
```{r warning=FALSE, eval = FALSE}
df_worldclim <- ingest_bysite(
sitename = "CH-Lae",
source = "worldclim",
lon = 8.36439,
lat = 47.47833,
settings = settings_worldclim,
dir = "~/data/worldclim"
)
```
And for Flat-Earthers:
```{r warning=FALSE, eval = FALSE}
df_worldclim %>%
unnest(data)
```
# Examples for a site ensemble
To collect data from an ensemble of sites, we have to define a meta data frame, here called `siteinfo`, with rows for each site and columns `lon` for longitude, `lat` for latitude, `date_start` and `date_end` for required dates (Dates are objects returned by a `lubridate::ymd()` function call - this stands for year-month-day). The function `ingest()` can then be used to collect all site-level data as a nested data frame corresponding to the metadata `siteinfo` with an added column named `data` where the time series of ingested data is nested inside.
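For sites without an entry in the exported metadata, such a `siteinfo` data frame can be assembled by hand. A minimal sketch (the site names, coordinates, and years below are placeholders):

```r
library(dplyr)
library(lubridate)

# Hand-constructed site metadata with the required columns
siteinfo <- tibble(
  sitename   = c("site_a", "site_b"),   # placeholder site names
  lon        = c(8.36, -105.55),        # placeholder longitudes
  lat        = c(47.48, 40.03),         # placeholder latitudes
  year_start = c(2005, 2010),
  year_end   = c(2007, 2012)
) %>%
  mutate(
    date_start = ymd(paste0(year_start, "-01-01")),
    date_end   = ymd(paste0(year_end,   "-12-31"))
  )
```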
Note that extracting for an ensemble of sites at once is more efficient for data types that are global files (WATCH-WFDEI and CRU). In this case, the `raster` package can be used to efficiently ingest data.
First, define a list of sites and get site meta information. The required meta information is provided by the exported data frame `siteinfo` (it comes as part of the ingestr package). This file is created as described in (and using code from) [metainfo_fluxnet2015](https://github.com/stineb/metainfo_fluxnet2015).
```{r warning=FALSE, eval = FALSE}
mysites <- c("BE-Vie", "DE-Tha", "DK-Sor", "FI-Hyy", "IT-Col", "NL-Loo", "US-MMS", "US-WCr", "US-UMB", "US-Syv", "DE-Hai")
siteinfo <- ingestr::siteinfo_fluxnet2015 %>%
dplyr::filter(sitename %in% mysites) %>%
dplyr::mutate(date_start = lubridate::ymd(paste0(year_start, "-01-01"))) %>%
dplyr::mutate(date_end = lubridate::ymd(paste0(year_end, "-12-31")))
```
This file looks like this:
```{r warning=FALSE, eval = FALSE}
print(siteinfo)
```
Next, the data can be ingested for all sites at once. Let's do it for different data types again.
## FLUXNET
### Meteo data
This ingests meteorological data from the FLUXNET files for variables temperature, precipitation, VPD, shortwave incoming radiation, net radiation, and atmospheric pressure. Arguments that are specific for this data source are provided in the `settings` list.
```{r warning=FALSE, eval = FALSE}
ddf_fluxnet <- ingest(
siteinfo = siteinfo %>% slice(1:3),
source = "fluxnet",
getvars = list(temp = "TA_F",
prec = "P_F",
vpd = "VPD_F",
ppfd = "SW_IN_F",
netrad = "NETRAD",
patm = "PA_F"
),
dir = "~/data/FLUXNET-2015_Tier1/20191024/DD/", # adjust this with your local path
settings = list(
dir_hh = "~/data/FLUXNET-2015_Tier1/20191024/HH/", # adjust this with your local path
getswc = FALSE),
timescale = "d",
verbose = TRUE
)
```
Additional variables defined at a daily time scale can be derived from half-hourly data. For example daily minimum temperature can be obtained as follows:
```{r eval=FALSE}
ddf_tmin <- ingest(
siteinfo = siteinfo %>% slice(1:3),
source = "fluxnet",
getvars = list(tmin = "TMIN_F"),
dir = "~/data/FLUXNET-2015_Tier1/20191024/DD/", # adjust this with your local path
settings = list(
dir_hh = "~/data/FLUXNET-2015_Tier1/20191024/HH/", # adjust this with your local path
getswc = FALSE),
timescale = "d",
verbose = TRUE
)
```
### Flux data
As described above for a single site, the same function can also be used to read in other FLUXNET variables (e.g., CO2 flux data) and conduct data filtering steps. Here, we're reading daily GPP and uncertainty (standard error), based on the nighttime flux decomposition method (`"GPP_NT_VUT_REF"`), keep only data where at least 80% is based on non-gapfilled half-hourly data (`threshold_GPP = 0.8`), and where the daytime and nighttime-based estimates are consistent, that is, where their difference is below the 97.5% and above the 2.5% quantile (`filter_ntdt = TRUE`, see also `?get_obs_bysite_fluxnet2015`).
```{r warning=FALSE, eval = FALSE}
settings_fluxnet <- list(
getswc = FALSE,
filter_ntdt = TRUE,
threshold_GPP= 0.8,
remove_neg = FALSE
)
ddf_fluxnet_gpp <- ingest(
siteinfo = siteinfo %>% slice(1:3),
source = "fluxnet",
getvars = list(gpp = "GPP_NT_VUT_REF",
                 gpp_unc = "GPP_NT_VUT_SE"),
dir = "~/data/FLUXNET-2015_Tier1/20191024/DD/", # adjust this with your local path
settings = settings_fluxnet,
timescale= "d"
)
```
## WATCH-WFDEI
This extracts from the original WATCH-WFDEI files, provided as global NetCDF files (0.5 degree resolution), one per month, each containing all days of that month. The data directory specified here (`dir = "~/data/watch_wfdei/"`) contains subdirectories whose names contain the variable names (corresponding to the ones specified by the argument `getvars = list(temp = "Tair")`).
A bias correction may be applied by specifying the settings as in the example below. By specifying `correct_bias = "worldclim"` (the only option currently available), this uses a high-resolution (30'') monthly climatology based on years 1970-2000 and corrects the WATCH-WFDEI data by month, based on the difference (ratio for variables other than temperature) of its monthly means, averaged across 1979-2000.
WATCH-WFDEI data is available from 1979 onwards. If `year_start` is before that, the mean seasonal cycle, averaged across 1979-1988, is returned for all years before 1979.
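The correction arithmetic can be sketched as follows (all monthly means below are made up): temperature is shifted additively by the difference of the monthly climatological means, while other variables are scaled by their ratio.

```r
# Sketch of the monthly bias correction (values are made up).
# Temperature: additive correction by the difference of monthly means.
temp_watch_daily <- c(2.1, 3.4, 1.8)   # WATCH-WFDEI daily values within one month
temp_watch_clim  <- 2.0                # WATCH-WFDEI monthly mean climatology
temp_wc_clim     <- 2.6                # WorldClim monthly climatology
temp_corrected   <- temp_watch_daily + (temp_wc_clim - temp_watch_clim)

# Precipitation (and other non-temperature variables): multiplicative scaling.
prec_watch_daily <- c(0.0, 5.0, 2.5)
prec_watch_clim  <- 2.5
prec_wc_clim     <- 3.0
prec_corrected   <- prec_watch_daily * (prec_wc_clim / prec_watch_clim)
```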
```{r warning=FALSE, eval = FALSE}
ddf_watch <- ingest(
siteinfo = siteinfo %>% slice(1:2),
source = "watch_wfdei",
getvars = c("temp", "prec"),
dir = "~/data/watch_wfdei/", # adjust this with your local path
settings = list(correct_bias = "worldclim", dir_bias = "~/data/worldclim")
)
```
## CRU TS
This extracts monthly data from the CRU TS dataset. Interpolation to daily values is done using a weather generator for daily precipitation (given monthly total precipitation and the number of wet days in each month), and a polynomial that conserves monthly means for all other variables.
```{r warning=FALSE, eval = FALSE}
ddf_cru <- ingest(
siteinfo = siteinfo %>% slice(1:2),
source = "cru",
getvars = c("tmax"),
dir = "~/data/cru/ts_4.01/" # adjust this with your local path
)
```
Check it out for the first site (BE-Vie).
```{r warning=FALSE, eval = FALSE}
ggplot() +
geom_line(data = ddf_fluxnet$data[[1]], aes(x = date, y = temp)) +
geom_line(data = ddf_watch$data[[1]], aes(x = date, y = temp), col = "royalblue") +
geom_line(data = ddf_cru$data[[1]], aes(x = date, y = tmax), col = "red") +
xlim(ymd("2000-01-01"), ymd("2005-12-31"))
```
## MODIS LP DAAC
This uses the [MODISTools](https://docs.ropensci.org/MODISTools/) R package, making its interface consistent with ingestr. Settings can be specified and passed on using the `settings` argument. To facilitate the selection of data products and bands to be downloaded, you may use the function `get_settings_modis()`, which defines defaults for different data bundles (`c("modis_fpar", "modis_lai", "modis_evi", "modis_ndvi")` are available).
- `"modis_fpar"`: MODIS collection 6, MCD15A3H, band `Fpar_500m`
- `"modis_lai"`: MODIS collection 6, MCD15A3H, band `Lai_500m`
- `"modis_evi"`: MODIS collection 6, MOD13Q1, band `250m_16_days_EVI`
- `"modis_ndvi"`: MODIS collection 6, MOD13Q1, band `250m_16_days_NDVI`
The filtering criteria are hard-coded specifically for each product, using its respective quality control information (see function `gapfill_interpol()` in `R/ingest_modis_bysite.R`).
Downloading with parallel jobs is available for the `"modis"` data ingest, using the package [multidplyr](https://github.com/tidyverse/multidplyr). This is not (yet) available on CRAN, but can be installed with `devtools::install_github("tidyverse/multidplyr")`. To do parallel downloading, set the following arguments in the function `ingest()`: `parallel = TRUE, ncores = <number_of_parallel_jobs>`.
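For example, assuming settings have been defined with `get_settings_modis()` as in this section, a download spread across four parallel jobs (the core count here is arbitrary) could look like:

```r
# Parallel MODIS download via multidplyr; requires
# devtools::install_github("tidyverse/multidplyr") beforehand.
df_modis <- ingest(
  siteinfo = siteinfo,
  source   = "modis",
  settings = settings_modis,
  parallel = TRUE,
  ncores   = 4   # number of parallel download jobs
)
```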
The following example is for downloading MODIS NDVI data.
```{r warning=FALSE, eval = FALSE}
settings_modis <- get_settings_modis(
bundle = "modis_ndvi",
data_path = "~/data/modis_subsets/",
method_interpol = "loess",
keep = TRUE,
overwrite_raw = FALSE,
overwrite_interpol= TRUE,
network = "FLUXNET"
)
```
This can now be used to download the data to the directory specified by argument `data_path` of function `get_settings_modis()`.
```{r warning=FALSE, eval = FALSE}
df_modis_fpar <- ingest(
siteinfo_fluxnet2015 %>% slice(1:3),
source = "modis",
settings = settings_modis,
parallel = FALSE
)
```
The data downloaded through MODISTools is stored in `<data_path>/raw/`. When calling `ingest()` or `ingest_bysite()` with the setting `overwrite_raw = FALSE`, the raw data file is read and not re-downloaded if it is available locally. Raw data contains information only for dates where MODIS data is provided; `ingest()` and `ingest_bysite()` interpolate to daily values following the setting `method_interpol`.
Note also that downloaded raw data are cutouts including all pixels within 1 km of the focal point given by the site longitude and latitude (using arguments `km_lr = 1.0` and `km_ab = 1.0` in the `MODISTools::mt_subset()` call). This is hard-coded in ingestr. To restrict which pixels around the focal point are included when taking the mean, set `n_focal` to an integer (0 to N), with 0 selecting only the single centre pixel in which the site is located, 1 including one ring of pixels around the centre (nine in total), 2 including 25 in total, etc.
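To illustrate, a settings call restricting the extraction to the centre pixel might look as follows. Whether `n_focal` is accepted as an argument to `get_settings_modis()` or set directly in the settings list should be checked against the package documentation; this sketch assumes the former:

```r
# Sketch (assumption): n_focal passed via get_settings_modis().
# n_focal = 0 averages only over the single centre pixel containing the site.
settings_modis_centre <- get_settings_modis(
  bundle    = "modis_ndvi",
  data_path = "~/data/modis_subsets/",
  n_focal   = 0
)
```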
Plot the ingested data.
```{r warning=FALSE, eval = FALSE}
plot_fapar_ingestr_bysite(
df_modis_fpar$data[[1]] %>%
dplyr::filter(year(date) %in% 2010:2015),
settings_modis)
plot_fapar_ingestr_bysite(
df_modis_fpar$data[[2]] %>%
dplyr::filter(year(date) %in% 2010:2015),
settings_modis)
plot_fapar_ingestr_bysite(
df_modis_fpar$data[[3]] %>%
dplyr::filter(year(date) %in% 2010:2015),
settings_modis)
```
## Google Earth Engine
Using the same settings as specified above, we can download MODIS FPAR data for multiple sites at once from GEE:
```{r warning=FALSE, eval = FALSE}
settings_gee <- get_settings_gee(
bundle = "modis_fpar",
python_path = system("which python", intern = TRUE),
gee_path = "~/google_earth_engine_subsets/gee_subset/", # adjust this with your local path
data_path = "~/data/gee_subsets/", # adjust this with your local path
method_interpol = "linear",
keep = TRUE,
overwrite_raw = FALSE,
overwrite_interpol= TRUE
)
df_gee_modis_fpar <- ingest(
siteinfo= siteinfo,
source = "gee",
settings= settings_gee,
verbose = FALSE
)
```
Collect all plots.
```{r warning=FALSE, eval = FALSE}
list_gg <- plot_fapar_ingestr(df_gee_modis_fpar, settings_gee)
#purrr::map(list_gg, ~print(.))
```
## CO2
Ingesting CO2 data is particularly simple. We can safely assume that it is well mixed in the atmosphere (independent of site location), use an annual mean value for all days in the respective year, and use the same value for all sites. Using the R package [climate](https://github.com/bczernecki/climate), we can load CO2 data from Mauna Loa directly into R. This downloads data from <ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt>. Here, `ingest()` is a wrapper for the function `climate::meteo_noaa_co2()`.
```{r warning=FALSE, eval = FALSE}
df_co2 <- ingest(
siteinfo,
source = "co2_mlo",
verbose = FALSE
)
```
Argument `dir` can be provided here, too. In that case, CO2 data is written (after download if it's not yet available) and read to/from a file located at `<dir>/df_co2_mlo.csv`.
More info about the **climate** package and the data can be obtained [here](https://www.esrl.noaa.gov/gmd/ccgg/trends/data.html) and by:
```{r warning=FALSE, eval = FALSE}
?climate::meteo_noaa_co2
```