/
about.Rmd
executable file
·118 lines (78 loc) · 5.59 KB
/
about.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
---
title: "About"
output:
workflowr::wflow_html:
toc: false
editor_options:
chunk_output_type: console
---
**Modeling defoliation as a proxy for tree health: Comparison of feature-selection methods across multiple feature sets derived from hyperspectral data**
# Authors
**Patrick Schratz** (patrick.schratz@gmail.com) [![](https://orcid.org/sites/default/files/images/orcid_16x16.png)](http://orcid.org/0000-0003-0748-6624)
Jannes Muenchow [![](https://orcid.org/sites/default/files/images/orcid_16x16.png)](http://orcid.org/0000-0001-7834-4717)
Eugenia Iturritxa [![](https://orcid.org/sites/default/files/images/orcid_16x16.png)](https://orcid.org/0000-0002-0577-3315)
José Cortés [![](https://orcid.org/sites/default/files/images/orcid_16x16.png)](http://orcid.org/0000-0003-2567-8689)
Bernd Bischl [![](https://orcid.org/sites/default/files/images/orcid_16x16.png)](http://orcid.org/0000-0001-6002-6980)
Alexander Brenning [![](https://orcid.org/sites/default/files/images/orcid_16x16.png)](http://orcid.org/0000-0001-6640-679X)
# Contents
## Paper
This repository contains the research compendium of our work on comparing algorithms across multiple feature sets and filtering methods (including ensemble filter methods).
- keywords
- hyperspectral imagery
- forest health monitoring
- machine learning
- feature selection
- feature effects
- model comparison
- filter
- imaging spectroscopy
- Using machine-learning algorithms to model defoliation of _Pinus Radiata_ trees.
- Compare filtering methods (ensemble filter methods) across various algorithms and datasets
- Predict defoliation to all available plots (24) and the whole Basque Country (at 200 m resolution)
The following directories belong to this project
- `code/01-download.R`
- `code/02-hyperspectral-processing.R`
- `code/04-data-processing.R`
- `code/05-modeling/`
- `code/06-benchmark-matrix/`
- `code/07-reports/`
## Other Content
In addition, this repo contains the workflow for an analysis related to the [LIFE 14 ENV/ES/000179 LIFE HEALTHY FOREST](http://www.lifehealthyforest.com/) project: Predicting defoliation at trees for the Basque Country (for the years 2017 and 2018) using Sentinel-2 data.
Target `defoliation_maps_wfr` builds subsequent argets which are necessary for the final results [report](https://pat-s.github.io/2019-feature-selection/report-defoliation.html).
# How to use
## Reading the code, accessing the data
See the [`code`](https://github.com/pat-s/paper_hyperspectral/tree/master/analysis) directory on GitHub for the source code that generated the figures and statistical results contained in the manuscript.
See the [`data`](https://github.com/pat-s/paper_hyperspectral/tree/master/analysis/data) directory for instructions how to access the raw data used in the manuscript.
## Installing the R package
This repository is organized as an R package, providing functions and raw data to reproduce and extend the analysis reported in the publication.
Note that this package has been written explicitly for this project and is not suited a for more general use.
This project is setup with a [drake workflow](https://github.com/ropensci/drake), ensuring reproducibility.
Intermediate targets/objects will be stored in a hidden `.drake` directory.
The R library of this project is managed by [renv](https://rstudio.github.io/renv/index.html).
This makes sure that the exact same package versions are used when recreating the project.
When calling `renv::restore()`, all required packages will be installed with their specific version.
Please note that this project was built with R version 4.0.3 on a CentOS 7 operating system.
Some packages from this project **are not compatible with R versions prior version 3.6.0.**
To clone the project, a working installation of `git` is required.
Open a terminal in the directory of your choice and execute:
```sh
git clone https://github.com/pat-s/2019-feature-selection.git
```
Then start R in this directory and run
```{r README-1, eval = FALSE}
renv::restore()
r_make()
```
## Creating targets with {drake}
Calling `r_make()` will create targets specified in `drake_config(targets = <target>)` in `_drake.R` with the additional drake settings specified.
Out of the 400+ targets in this project, the following targets are important:
- `bm_aggregated_new_buffer2`: Aggregated benchmark results of all models using a 2 meter buffer for hyperspectral data extraction.
- `eda_wfr`: Creates the [report which shows Exploratory Data Analysis (EDA) plots and tables](https://pat-s.github.io/2019-feature-selection/eda.html).
- `eval_performance_wfr`: Creates the [report which evaluates the model performances](https://pat-s.github.io/2019-feature-selection/eval-performance.html).
- `spectral_signatures_wfr`: Creates the [report which inspects the spectral signatures of the hyperspectral data](https://pat-s.github.io/2019-feature-selection/spectral-signatures.html).
- `feature_importance_wfr`: Creates the [report which inspects the feature importance of variables](https://pat-s.github.io/2019-feature-selection/feature-importance.html).
- `filter_correlations_wfr`: Creates the [report which inspects correlations among filter methods](https://pat-s.github.io/2019-feature-selection/feature-importance.html).
Note that most reports require some/all fitted models.
Creating these (e.g. target `benchmark_no_models_new_buffer2`) is a costly process and takes several days on a HPC and way longer on a single machine.
# Notes and resources
* The organisation of this compendium was inspired by the works of [Carl Boettiger](http://www.carlboettiger.info/) and [Ben Marwick](https://github.com/benmarwick).