Skip to content

hrdawson/Durin_data_project

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

99 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This is the git repository for the DURIN project. The goal of this repo is to organize and streamline the data management in the project and beyond.

DATA MANAGEMENT

Location of data, metadata and code

The overview of all the datasets is here.

The raw and clean datasets from are stored and will be made available after the end of the project on OSF. For now the data is only available to the project partners.

All R code for the cleaning the raw data is available on the Durin GitHub.

The data documentation, including a draft for the data paper is available here (only available for authors), and the data dictionaries are in this below in this readme file.

Naming conventions for the datasets and files

We use snake_case style for names and coding. Snake_case means that we use lower case letters separated by an underscore, for example biomass_g or age_class. The one exceptionare siteID, blockID, plotID (legacy from previous projects).

File and variable names should be meaningful. Do not use my_data.csv or var1. Follow the naming convention for the main variables (see full description of the main variables in the table below).

We use redundancy in variable names to avoid mistakes. PlotID should contain the higher hierarchical levels such as siteID, blockID, treatment, habitat, species etc. For example plotID contains siteID, habitat, species and plot number: LY_O_VV_1, LY_F_VM_3

Files or variable Naming convention Example
File name Project_Status_Experiment_Response_Year(s).Extension DURIN_clean_gradient_cflux_2023-2025.csv
Project DURIN
Experiment 4 corner = 4 main Durin sites, drought = drought experiment at Lygra and Tjotta, gradient = gradient study, nutrient = nutrient experiment
4 corner study
date Date of data collection yyyy-mm-dd; do not split year, month and day into several columns
year Year of data collection yyyy; sometimes there is no specific date, then year can be used
site_ID_name Unique full site name Lygra, Sogndal, Senja, Kautokeino and Tjotta
siteID Unique siteID, first 2 letters of site_ID_name. LY, SO, SE, and KA, TJ
habitat Open versus forested habitat O, F
species Vascular plant taxon names follow for Norway Lid & Lid (Lid J & Lid, 2010). We use full species names. For field sheets the names can be abbreviated (see speciesID), but the clean data should contain the full species name Vaccinium myrtillus
speciesID 2 letter abbreviation of species VM, VV, CV, EN, BN
plot_nr Unique plot number, numeric value from 1-5. 1-5
plotID Unique plot ID as a combination of siteID, habitat, speciesID and plot number LY_O_VV_1, LY_F_VM_1
variable Response variable(s) cover, biomass, Reco
value Value of response variable(s) numeric value
unit Unit for response variable(s) %, µmol m−2 s−1
other variables Other important variables in the dataset plant nr, remark, data collector, weather, flag
DroughtNet experiment
site_ID_name Site name Lygra, Tjotta
siteID Unique siteID, first 2 letters of site_ID_name. LY, TJ
age_class Vegetation representing post-fire successional stages. pioneer, building, mature
drought_treatment Drought treatment using rain-out shelters that reduce precipitation by 0 = ambient, 60 = moderate, or 90% = extreme ambient, moderate, extreme
plotID Unique plot ID as a combination of siteID, age_class, drought_treatment and plot number LY_P_PIO_1, LY_M_EXT_3
droughtNet_plot Unique plotID from the DroughtNet frames in the field. Correspond with Landpress naming. 1.1,1.2,1.3 - 9.1,9.2,9.3
species Vascular plant taxon names follow for Norway Lid & Lid (Lid J & Lid, 2010). We use full species names. For field sheets the names can be abbreviated (see speciesID), but the clean data should contain the full species name Vaccinium myrtillus
speciesID 2 letter abbreviation of species VM, VV, CV, EN
Nutrient experiment
date Date of data collection yyyy-mm-dd; do not split year, month and day into several columns
site_ID_name Site name Lygra
siteID Unique siteID, first 2 letters of site_ID_name. LY
habitat Open versus forested habitat O, F
age_class Vegetation representing post-fire successional stages. pioneer, building, mature
nitrogen_addition Added level of nitrogen 5, 10, 25, 30
plot_nr Unique plot number 1-5
plotID Combination of siteID, habitat, age_class, nitrogen addition and plot number. LY_O_BUI_10_3
Gradient study
date ?? ?
siteID ?? ?
habitat Open versus forested habitat O, F
plot_nr Unique plot number 1-5
plotID Combination of siteID, habitat, and plot number. ??

Organize data sets

Each dataset should contain only one response variable, or several if they are closely related. E.g. biomass and carbon flux should be two separate datasets. But the functional trait dataset can contain several traits, and the carbon flux dataset can contain GPP, NEE and Reco.

Datasets should be in a long format. If you have several response variables (e.g. traits, flux measurements), then use pivot_long(cols = ..., names_to = "variable", values_to = "value") to conver the data to a long format. Using the names variable and value for the columns is a standard. If you have very different response variables, it can be useful to have a column called unit.

When dealing with many different datasets it can be useful to structure them in a similar way. Arrange the dataset so that the important variables come first, preferable use this order:

  • Year and date
  • siteID
  • plotID
  • habitat
  • treatment
  • plantID
  • leafID
  • variable (diversity, biomass, flux)
  • value
  • (unit)
  • predictor_variables (temperature level, oceanity)
  • other_variables (remark, data collector, weather)

Data dictionary

How to make a data dictionary?

The R package dataDocumentation that will help you to make the data dictionary. You can install and load the package as follows:

# if needed install the remotes package
install.packages("remotes")

# then install the dataDocumentation package
remotes::install_github("audhalbritter/dataDocumentation")

# and load it
library(dataDocumentation)

Make data description table

Find the file R/data_dic/data_description.xlsx. Enter all the variables into that table, including variable name, description, unit/treatment level and how measured. If the variables are global for all of Funder, leave TableID blank (e.g. siteID). If the variable is unique for a specific dataset, create a TableID and use it consistently for one specific dataset. Make sure you have described all variables.

Make data dictionary

Then run the function make_data_dic().

data_dic <- make_data_dictionary(data = biomass,
                                 description_table = description_table,
                                 table_ID = "biomass",
                                 keep_table_ID = FALSE)

Check that the function produces the correct data dictionary.

Add data dictionary to readme file

Finally, add the data dictionary below to be displayed in this readme file. Add a title, and a code chunk using kable() to display the data dictionary.

For more details go to the dataDocumentation readme file.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • R 100.0%