---
title: "Getting started"
site: workflowr::wflow_site
output:
  workflowr::wflow_html:
    toc: false
    theme: flatly
    highlight: espresso
editor_options:
  chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
**To reproduce this analysis, please contact mathis.messager@mail.mcgill.ca to obtain the required raw and pre-processed data, which amount to between 60 and 500 GB depending on how comprehensive a reproduction is desired.**
This analysis relies as much as possible on [good enough practices in scientific computing](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510), which users are encouraged to read.
**Structure**: the overall project directory is structured with the following sub-directories:

* bin/ (compiled code/external packages)
* data/ (raw data, not to be altered)
* results/ (results of the analysis, mostly reproducible through code execution but also including manually modified results)
* src/ (code written for the project)
    * globalIRmap (source code for analysis in R)
    * globalIRmap_HydroATLAS_py (source code for formatting of global river network environmental attributes)
    * globalIRmap_py (source code for formatting of spatial data in Python)

All scripts rely on this structure.
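
As a rough illustration, the top-level layout listed above could be created from Git Bash as follows. This is only a sketch: the `PROJECT_ROOT` variable and its default value are assumptions made for the example, not names used by the project scripts.

```bash
# Hypothetical project root (an assumption for this example);
# export PROJECT_ROOT beforehand to use another location.
PROJECT_ROOT="${PROJECT_ROOT:-./test_globalIRmap}"

# Create the four top-level sub-directories described above.
# mkdir -p also creates the root itself and is harmless if a directory already exists.
for dir in bin data results src; do
  mkdir -p "$PROJECT_ROOT/$dir"
done

# List the resulting top-level directories.
ls -1 "$PROJECT_ROOT"
```

The repositories under src/ are not created here; they are obtained by cloning, as described in the sections that follow.
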
The overall workflow of the project is detailed in the [Workflow tab](https://messamat.github.io/globalIRmap/methods_workflow.html) of this website.
# Python analysis
The documentation that follows is specifically for the portions of the analysis conducted in Python, which encompass all spatial formatting and analyses.
**Prerequisites**: all GIS analyses in this study require an ESRI ArcGIS license including the Spatial Analyst extension, which itself requires a Windows OS. We used the ArcPy Python module associated with ArcGIS 10.7, in Python 2.7 with 64-bit background processing.
### Download the repository for Python
In [Git Bash](https://git-scm.com/), the following commands illustrate how to make a local copy (i.e., clone) of the GitHub repository in a newly created directory at `C:/test_globalIRmap/src`:
```{r, engine = 'bash', eval = FALSE}
Mathis@DESKTOP MINGW64 ~
$ cd /c/
Mathis@DESKTOP MINGW64 /c
$ mkdir test_globalIRmap
Mathis@DESKTOP MINGW64 /c
$ mkdir /c/test_globalIRmap/src
Mathis@DESKTOP MINGW64 /c
$ cd /c/test_globalIRmap/src
Mathis@DESKTOP MINGW64 /c/test_globalIRmap/src
$ git clone https://github.com/messamat/globalIRmap_py.git
Cloning into 'globalIRmap_py'...
remote: Enumerating objects: 164, done.
remote: Counting objects: 100% (164/164), done.
remote: Compressing objects: 100% (108/108), done.
remote: Total 164 (delta 97), reused 118 (delta 53), pack-reused 0
Receiving objects: 100% (164/164), 116.62 KiB | 1003.00 KiB/s, done.
Resolving deltas: 100% (97/97), done.
```
### Notes and resources
* The [issues tracker](https://github.com/messamat/globalIRmap_py/issues) is the place to report problems or ask questions.
* See the repository [history](https://github.com/messamat/globalIRmap_py/commits/master) for a fine-grained view of progress and changes.
# R analysis
The documentation that follows is specifically for the portions of the analysis conducted in R, which encompass all statistical analyses and figure-making (aside from maps).
**Documentation**: this project is organized as an R package, providing documented functions to reproduce and extend the analysis reported in the publication. Note that this package has been written explicitly for this project and may not be suitable for general use. See guidelines below to install the package.
**R Workflow**: this project is set up with a [drake workflow](https://github.com/ropensci/drake), ensuring reproducibility.
In the `drake` philosophy, every action is a function, and every R object is a "target" with dependencies.
Intermediate targets/objects are stored in a `.drake` directory.
**Dependency management**: the R library of this project is managed by [renv](https://rstudio.github.io/renv/articles/renv.html).
This makes sure that the exact same package versions are used when recreating the project.
Calling `renv::restore()` installs all required packages at their recorded versions.
Please note that this project was built with R version 4.0.3 on a Windows 10 operating system.
The renv packages from this project **are not compatible with R versions prior to version 3.6.0.**
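
Because the renv packages are reported above as incompatible with R versions prior to 3.6.0, it can help to check the installed R version before calling `renv::restore()`. The following Git Bash sketch is one possible way to do so. The helper names `version_ge` and `check_r_version` and the `MIN_R_VERSION` variable are hypothetical and not part of the project, and the comparison assumes a `sort` that supports the `-V` option (as GNU coreutils does).

```bash
# Minimum R version stated in this documentation.
MIN_R_VERSION="3.6.0"

# Succeed (exit status 0) if version $1 is greater than or equal to version $2.
# Relies on version-aware sorting: the smaller of the two versions sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print a short message saying whether the given R version is recent enough.
check_r_version() {
  if version_ge "$1" "$MIN_R_VERSION"; then
    echo "R $1: OK"
  else
    echo "R $1: too old, need >= $MIN_R_VERSION"
  fi
}

# Query the installed R version only if Rscript is available on the PATH.
if command -v Rscript >/dev/null 2>&1; then
  check_r_version "$(Rscript -e 'cat(as.character(getRversion()))')"
fi
```

This only checks the version number; it does not guarantee that every package restores successfully on a given machine.
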
**Syntax**: this analysis relies on the [data.table](https://rdatatable.gitlab.io/data.table/) package, which provides a high-performance version of data.frame. Its syntax is concise, and it is typically faster and more memory-efficient than conventional data.frames and the tidyverse.
**Machine learning model development**: for random forest model development, this project relies on the [mlr3](https://mlr3.mlr-org.com/) package and ecosystem (see the [mlr3 book](https://mlr3book.mlr-org.com/introduction.html) for learning its usage), which provides an object-oriented framework for machine learning.
### Download the repository for R
In Git Bash, continuing from the previous example for the Python analysis, the following commands illustrate how to make a local copy of the GitHub repository for the R analysis in the same directory, `C:/test_globalIRmap/src`:
```{r, engine = 'bash', eval = FALSE}
Mathis@DESKTOP MINGW64 /c/test_globalIRmap/src
$ git clone https://github.com/messamat/globalIRmap.git
Cloning into 'globalIRmap'...
remote: Enumerating objects: 116, done.
remote: Counting objects: 100% (116/116), done.
remote: Compressing objects: 100% (89/89), done.
remote: Total 7363 (delta 48), reused 75 (delta 19), pack-reused 7247
Receiving objects: 100% (7363/7363), 1.91 GiB | 3.78 MiB/s, done.
Resolving deltas: 100% (925/925), done.
```
Alternatively, in RStudio for Windows, the following procedure can be used:

* Click on “File” in the menu ribbon.
* Select “New project…”.
* Choose the “Version control” option in the New Project Wizard window.
* Then, select “Git” in the next window.
* In the next window, fill the fields as follows:
    * Repository URL: https://github.com/messamat/globalIRmap
    * Project directory name: [will autofill as “globalIRmap”]
    * Create project as subdirectory of: [choose the directory, e.g., `C:/test_globalIRmap/src`]
* Tick “Open in new session” and then click “Create project”.

### Notes and resources
* The [issues tracker](https://github.com/messamat/globalIRmap/issues) is the place to report problems or ask questions.
* See the repository [history](https://github.com/messamat/globalIRmap/commits/master) for a fine-grained view of progress and changes.