ETL functions for data integration into crop simulation modeling applications

fairagro/uc6_csmTools

csmTools

ETL functions for semi-automated data integration into crop simulation modeling

Problem description

Crop experiment datasets are generally published as collections of tables describing the experimental design, management events, and measured variables. These tables are linked to one another by identifier variables, as in a relational database (primary/foreign keys). Many crop simulation model frameworks, such as DSSAT and APSIM, have standard annotation conventions (i.e., variable names, units, relationships) that can be mapped with automated workflows to a single data dictionary, the ICASA dictionary, in order to facilitate model intercomparison. The ICASA dictionary covers an extensive range of crop experiment variables for a variety of cropping systems. It follows a category structure analogous to common data structures of crop experiment datasets. However, because the relationships between tables can be defined in multiple ways, crop experiment datasets must first be reshaped to the standard input format before they can be used as model inputs. This reshaping is an essential preliminary step to variable mapping, which can be carried out using existing tools such as the ARDN vMapper (for unstructured datasets) or with pre-defined data maps.
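The key-based linkage described above can be illustrated with a minimal sketch. The table names, column names, and values below are hypothetical and are not taken from any real dataset or from csmTools itself; the sketch only shows how tables sharing identifier columns can be flattened into one record per observation.

```python
# Hypothetical tables: names, columns and values are illustrative only.
plots = [
    {"plot_id": 1, "treatment_id": "T1", "replicate": 1},
    {"plot_id": 2, "treatment_id": "T2", "replicate": 1},
]
treatments = [
    {"treatment_id": "T1", "fertilizer_kg_ha": 0},
    {"treatment_id": "T2", "fertilizer_kg_ha": 120},
]
yields = [
    {"plot_id": 1, "year": 2020, "yield_t_ha": 4.1},
    {"plot_id": 2, "year": 2020, "yield_t_ha": 6.3},
]


def join(left, right, key):
    """Left-join two lists of row dicts on a shared key column.

    Assumes `key` is unique in `right` (i.e. it is a primary key there).
    """
    index = {row[key]: row for row in right}
    return [{**row, **index.get(row[key], {})} for row in left]


# Follow the foreign keys: yields -> plots -> treatments.
flat = join(join(yields, plots, "plot_id"), treatments, "treatment_id")

for record in flat:
    print(record)
```

Each resulting record carries the measurement together with its design and management attributes, which is the kind of flat structure that a later mapping step can work on.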

Tool purpose

csmTools aims to facilitate the ETL process for crop modelers by offering functions to identify and reshape datasets, map variables, perform simulations, and evaluate the quality of model inputs and outputs with graphical tools.

Current version

The currently available functions were developed on a prototype dataset, the Seehausen Long-term Fertilization Experiment, published on the BonaRes Repository. The ETL process can be divided into four main steps: (1) data identification and reshaping, (2) variable mapping, (3) data transformation into model input, and (4) simulation and visualization.

Test script

A test script runs the entire pipeline, from raw data to simulation output, on the example data.

Data sources

All raw data, model inputs, and model outputs can be found in the external data folder. The one exception is the weather data, which can be downloaded from the DWD Open Data Server using dedicated functions that leverage the rdwd package (https://github.com/cran/rdwd). Additional data sources will be progressively added. The soil profile data used in the test script is currently a generic soil profile that ships with the DSSAT software. Functions to find, access, and format soil profile data from other sources will be progressively added to the package. Similarly, the crop genotype data currently used are generic DSSAT cultivars. A set of functions to estimate cultivar-specific genetic parameters based on expert knowledge will be developed.

Data identification and reshaping

The reshaping function uses metadata documented with the BonaRes metadata schema for long-term field experiments to identify the structure of the dataset and rearrange it according to the ICASA standard. This type of domain-specific metadata is crucial to the workflows because it provides specific information (e.g., spatial coverage, design structure, number of replicates) that allows the machine to reconstruct the dataset and identify its components.

Variable mapping

Variable mapping involves renaming column headers, converting the units of numeric variables, recoding categorical variables, and more complex operations such as creating new variables from existing ones. Variable mapping is currently performed using pre-defined lookup tables ("maps") that are provided in the internal data folder. Complex mapping operations are not yet implemented in these maps and are instead performed in a dataset-specific manner using data-wrangling libraries.
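As a rough illustration of lookup-table mapping, the sketch below applies a small map that renames columns, rescales a numeric variable, and recodes a categorical one. The map layout, the source and target names, and the codes are assumptions made for this example; they do not reproduce the maps shipped with csmTools or the actual ICASA variable names.

```python
# Hypothetical map: source names, target names, codes and factors are
# illustrative placeholders, not the maps provided with the package.
VARIABLE_MAP = {
    # Numeric conversion: decitonnes per hectare to kilograms per hectare.
    "Ertrag_dt_ha": {"target": "yield_kg_ha", "factor": 100.0},
    # Categorical recoding: free-text crop names to short codes.
    "Kultur": {
        "target": "crop_code",
        "recode": {"Winterweizen": "WHT", "Mais": "MAZ"},
    },
}


def apply_map(rows, variable_map):
    """Rename, rescale and recode columns according to a lookup table."""
    mapped = []
    for row in rows:
        out = {}
        for name, value in row.items():
            rule = variable_map.get(name)
            if rule is None:
                out[name] = value  # unmapped columns pass through unchanged
                continue
            if "factor" in rule:
                value = value * rule["factor"]
            if "recode" in rule:
                # Unknown categories are kept as-is rather than dropped.
                value = rule["recode"].get(value, value)
            out[rule["target"]] = value
        mapped.append(out)
    return mapped


raw = [{"Ertrag_dt_ha": 62.5, "Kultur": "Winterweizen", "plot_id": 1}]
mapped = apply_map(raw, VARIABLE_MAP)
print(mapped)
```

Keeping the rules in a table rather than in code means a new dataset only needs a new map. Derived variables, which depend on several columns at once, do not fit this one-column-per-rule layout and would need separate handling.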

DSSAT Crop Simulation Model

Simulations are performed with the DSSAT R package, which allows running the DSSAT CSM executable from within an R session. For this to work, DSSAT must be installed on the local machine. The program and all the associated documentation can be downloaded from the official DSSAT website. It is recommended to use the default installation path (C:/DSSAT48/), as custom paths may cause issues when running DSSAT from within R.
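Because the simulation step depends on a local installation, a quick pre-flight check of the installation directory can save a failed run. The helper below is a hypothetical sketch, not part of csmTools or of the DSSAT R package; it only tests whether a candidate directory exists and assumes the default path mentioned above.

```python
from pathlib import Path

# Default installation path named in this README; other paths are assumptions.
DEFAULT_DSSAT_DIR = Path("C:/DSSAT48")


def find_dssat_dir(candidates=(DEFAULT_DSSAT_DIR,)):
    """Return the first candidate directory that exists, or None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    return None


dssat_dir = find_dssat_dir()
if dssat_dir is None:
    print("DSSAT installation not found at the default path.")
else:
    print(f"DSSAT installation found at {dssat_dir}")
```

The check does not verify that the executable inside the directory works; it only confirms that the expected folder is present.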
