From 91f8adf545e095ba2b2c1b409a3dfcbd39ebee05 Mon Sep 17 00:00:00 2001 From: Philip Hackstock <20710924+phackstock@users.noreply.github.com> Date: Thu, 23 Jan 2025 17:12:21 +0100 Subject: [PATCH 1/7] Add section 'requirements for processing modules' --- scenario-databases.rst | 67 ++++++++++++++++++++++++++++++++++++++---- 1 file changed, 61 insertions(+), 6 deletions(-) diff --git a/scenario-databases.rst b/scenario-databases.rst index f288091..046a73c 100644 --- a/scenario-databases.rst +++ b/scenario-databases.rst @@ -75,18 +75,73 @@ Scenario processing ------------------- When submitting a scenario (a.k.a. "run") to an IIASA database instance, the server -executes a scenario-processing workflow including *region-aggregation* and -*scenario validation* prior to saving the scenario to the database. The processing uses -the **nomenclature** package (`read the docs `_). +executes a scenario-processing workflow including *region-aggregation* and *scenario +validation* prior to saving the scenario to the database. The processing uses the +**nomenclature** package (`read the docs `_). The region-aggregation and validation is configured via a project-specific GitHub_ repository, usually named `https://github.com/iiasa/-workflow`_. Please contact the respective project managers or the Scenario Services team if you need access. -You can also run the project workflow locally (on your computer) before submission to -an IIASA database instance, to make sure that the validation and processing works. -See :ref:`local-processing` for more information! +The workflow for processing files uploaded via the IIASA Scenario Explorer is +implemented in a modular fashion. This makes it straightforward to execute programs, +code and tools developed by (non-IIASA) research partners as part of the processing +workflow. + +Requirements for processing modules +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Any module (a.k.a. 
program, code or tool) must adhere to the following standards of +best-practice software development. The aim of these guidelines is to ensure reliability +of our services, minimize maintenance requirements, and guarantee reproducibility of +results across platforms. + +General requirements +```````````````````` + +- The program, code or tool must be implemented in Python (≥3.7) or R; compiled + executables are not acceptable for security reasons +- Distribution of the source code + - via an online version-controlled repository + (preferably GitHub) to which the IIASA admin team has access; or + - installation via a package manager (pip, conda, CRAN). +- The program must run on Debian (preferably Ubuntu) +- The dependencies must be clearly stated, + e.g. as Dockerfile (describing execution environment, library dependencies etc.) + Python package dependencies according to packaging user guide (e.g. as environment.yml, requirements.txt etc.) + R dependencies +- The license must be clearly stated. +- The documentation of the program, code or tool must include: + - Purpose of the program and individual top-level functions + - Instructions how to run the program + - Expected input (variables, region mappings) and standard output + - Explanation of any settings and optional parameters + +Application programming interface +````````````````````````````````` + +**Option 1**: + +The module is called via a command-line interface (CLI) +and take the following arguments: + +- :code:`input`: path to an IAMC-formatted file (:code:`xlsx` or :code:`csv`) +- :code:`output`: path where to write an output file + (usually derived timeseries data) in the same format +- Any relevant settings and optional parameters must also be specified + via the CLI + +e.g. 
:code:`"python process.py --input path-to-input-file.xlsx --output path-to-output-file.xlsx"` + +**Option 2** (applicable for packages/functions written in Python): + +Importable Python functions that take and return :class:`pandas.DataFrame` (with columns +folllowing the IAMC format) or :class:`pyam.IamDataFrame` objects can be called as part +of the processing workflow. Any settings or optional parameters must be given as keyword +arguments to the top-level function, preferably with the option to set them via a +settings or configuration file. .. _GitHub: https://www.github.com .. _`https://github.com/iiasa/-workflow`: https://github.com/iiasa + From de172cbb30b0e355d97476ee285005a1c4e3896f Mon Sep 17 00:00:00 2001 From: Philip Hackstock <20710924+phackstock@users.noreply.github.com> Date: Thu, 23 Jan 2025 17:14:39 +0100 Subject: [PATCH 2/7] Set minimum python requirement to 3.10 --- scenario-databases.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scenario-databases.rst b/scenario-databases.rst index 046a73c..1219f2d 100644 --- a/scenario-databases.rst +++ b/scenario-databases.rst @@ -99,7 +99,7 @@ results across platforms. 
General requirements ```````````````````` -- The program, code or tool must be implemented in Python (≥3.7) or R; compiled +- The program, code or tool must be implemented in Python (≥3.10) or R; compiled executables are not acceptable for security reasons - Distribution of the source code - via an online version-controlled repository From ccaddaae81024764203f3db6a324da1ae2b16b7a Mon Sep 17 00:00:00 2001 From: Philip Hackstock <20710924+phackstock@users.noreply.github.com> Date: Tue, 18 Feb 2025 11:21:50 +0100 Subject: [PATCH 3/7] Move processing requirements into user guide --- user-guide/processing-requirements.rst | 54 ++++++++++++++++++++++++++ 1 file changed, 54 insertions(+) create mode 100644 user-guide/processing-requirements.rst diff --git a/user-guide/processing-requirements.rst b/user-guide/processing-requirements.rst new file mode 100644 index 0000000..b4e7bbf --- /dev/null +++ b/user-guide/processing-requirements.rst @@ -0,0 +1,54 @@ +.. _processing-requirements: + +Requirements for processing modules +=================================== + +Any module (a.k.a. program, code or tool) must adhere to the following standards of +best-practice software development. The aim of these guidelines is to ensure reliability +of our services, minimize maintenance requirements, and guarantee reproducibility of +results across platforms. + +General requirements +-------------------- + +- The program, code or tool must be implemented in Python (≥3.10) or R; compiled + executables are not acceptable for security reasons +- Distribution of the source code + - via an online version-controlled repository + (preferably GitHub) to which the IIASA admin team has access; or + - installation via a package manager (pip, conda, CRAN). +- The program must run on Debian (preferably Ubuntu) +- The dependencies must be clearly stated, + e.g. as Dockerfile (describing execution environment, library dependencies etc.) 
+ Python package dependencies according to packaging user guide (e.g. as environment.yml, requirements.txt etc.) + R dependencies +- The license must be clearly stated. +- The documentation of the program, code or tool must include: + - Purpose of the program and individual top-level functions + - Instructions how to run the program + - Expected input (variables, region mappings) and standard output + - Explanation of any settings and optional parameters + +Application programming interface +--------------------------------- + +**Option 1**: + +The module is called via a command-line interface (CLI) +and take the following arguments: + +- :code:`input`: path to an IAMC-formatted file (:code:`xlsx` or :code:`csv`) +- :code:`output`: path where to write an output file + (usually derived timeseries data) in the same format +- Any relevant settings and optional parameters must also be specified + via the CLI + +e.g. :code:`"python process.py --input path-to-input-file.xlsx --output path-to-output-file.xlsx"` + +**Option 2** (applicable for packages/functions written in Python): + +Importable Python functions that take and return :class:`pandas.DataFrame` (with columns +folllowing the IAMC format) or :class:`pyam.IamDataFrame` objects can be called as part +of the processing workflow. Any settings or optional parameters must be given as keyword +arguments to the top-level function, preferably with the option to set them via a +settings or configuration file. From faa03a7c9cd6f685e64bda390c472e66ee1ff87b Mon Sep 17 00:00:00 2001 From: Philip Hackstock <20710924+phackstock@users.noreply.github.com> Date: Tue, 18 Feb 2025 11:22:06 +0100 Subject: [PATCH 4/7] Limit heading depth to 1 for links --- user-guide.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/user-guide.rst b/user-guide.rst index 9bd57c3..661bd7e 100644 --- a/user-guide.rst +++ b/user-guide.rst @@ -33,8 +33,9 @@ Detailed User Guides -------------------- .. 
toctree:: - :maxdepth: 2 + :maxdepth: 1 user-guide/local-processing + user-guide/processing-requirements .. _common-definitions: https://github.com/iamconsortium/common-definitions From 82df17e145434e901589b9eec0607e5088687cb6 Mon Sep 17 00:00:00 2001 From: Philip Hackstock <20710924+phackstock@users.noreply.github.com> Date: Tue, 18 Feb 2025 11:32:44 +0100 Subject: [PATCH 5/7] Remove requirements for processing modules after moving it in its own page --- scenario-databases.rst | 53 ------------------------------------------ 1 file changed, 53 deletions(-) diff --git a/scenario-databases.rst b/scenario-databases.rst index 1219f2d..fa32517 100644 --- a/scenario-databases.rst +++ b/scenario-databases.rst @@ -88,59 +88,6 @@ implemented in a modular fashion. This makes it straightforward to execute progr code and tools developed by (non-IIASA) research partners as part of the processing workflow. -Requirements for processing modules -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Any module (a.k.a. program, code or tool) must adhere to the following standards of -best-practice software development. The aim of these guidelines is to ensure reliability -of our services, minimize maintenance requirements, and guarantee reproducibility of -results across platforms. - -General requirements -```````````````````` - -- The program, code or tool must be implemented in Python (≥3.10) or R; compiled - executables are not acceptable for security reasons -- Distribution of the source code - - via an online version-controlled repository - (preferably GitHub) to which the IIASA admin team has access; or - - installation via a package manager (pip, conda, CRAN). -- The program must run on Debian (preferably Ubuntu) -- The dependencies must be clearly stated, - e.g. as Dockerfile (describing execution environment, library dependencies etc.) - Python package dependencies according to packaging user guide (e.g. as environment.yml, requirements.txt etc.) 
- R dependencies -- The license must be clearly stated. -- The documentation of the program, code or tool must include: - - Purpose of the program and individual top-level functions - - Instructions how to run the program - - Expected input (variables, region mappings) and standard output - - Explanation of any settings and optional parameters - -Application programming interface -````````````````````````````````` - -**Option 1**: - -The module is called via a command-line interface (CLI) -and take the following arguments: - -- :code:`input`: path to an IAMC-formatted file (:code:`xlsx` or :code:`csv`) -- :code:`output`: path where to write an output file - (usually derived timeseries data) in the same format -- Any relevant settings and optional parameters must also be specified - via the CLI - -e.g. :code:`"python process.py --input path-to-input-file.xlsx --output path-to-output-file.xlsx"` - -**Option 2** (applicable for packages/functions written in Python): - -Importable Python functions that take and return :class:`pandas.DataFrame` (with columns -folllowing the IAMC format) or :class:`pyam.IamDataFrame` objects can be called as part -of the processing workflow. Any settings or optional parameters must be given as keyword -arguments to the top-level function, preferably with the option to set them via a -settings or configuration file. - .. _GitHub: https://www.github.com .. 
_`https://github.com/iiasa/-workflow`: https://github.com/iiasa From 630f3ded1f2d4ec46fe1ae5f05d824621e7cd440 Mon Sep 17 00:00:00 2001 From: Philip Hackstock <20710924+phackstock@users.noreply.github.com> Date: Tue, 18 Feb 2025 16:44:41 +0100 Subject: [PATCH 6/7] Apply suggestions from code review Co-authored-by: Daniel Huppmann --- scenario-databases.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/scenario-databases.rst b/scenario-databases.rst index fa32517..02e4f19 100644 --- a/scenario-databases.rst +++ b/scenario-databases.rst @@ -84,9 +84,9 @@ repository, usually named `https://github.com/iiasa/-workflow`_. Please the respective project managers or the Scenario Services team if you need access. The workflow for processing files uploaded via the IIASA Scenario Explorer is -implemented in a modular fashion. This makes it straightforward to execute programs, -code and tools developed by (non-IIASA) research partners as part of the processing -workflow. +implemented in a modular fashion. It is possible to execute programs, code and tools +developed by (non-IIASA) research partners as part of the processing workflow +if the tool follows the :ref:`processing-requirements`. .. _GitHub: https://www.github.com From 3c7069ac9eb2db52903b104c93e2139843fbe8da Mon Sep 17 00:00:00 2001 From: Daniel Huppmann Date: Thu, 31 Jul 2025 06:53:57 +0200 Subject: [PATCH 7/7] Reinsert section on local processing of workflows --- scenario-databases.rst | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/scenario-databases.rst b/scenario-databases.rst index 52208eb..61617cc 100644 --- a/scenario-databases.rst +++ b/scenario-databases.rst @@ -84,6 +84,10 @@ The region-aggregation and validation is configured via a project-specific GitHu repository, usually named `https://github.com/iiasa/-workflow`_. Please contact the respective project managers or the Scenario Services team if you need access. 
+You can also run the project workflow locally (on your computer) before submission to +an IIASA database instance, to make sure that the validation and processing works. +See :ref:`local-processing` for more information. + The workflow for processing files uploaded via the IIASA Scenario Explorer is implemented in a modular fashion. It is possible to execute programs, code and tools developed by (non-IIASA) research partners as part of the processing workflow
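
Note on the "Option 1" interface described in the new processing-requirements page: a module of that kind could look roughly like the sketch below. This is a minimal illustration, not part of the patch series and not the actual IIASA workflow code. It assumes a csv input in the IAMC wide format (index columns ``model``, ``scenario``, ``region``, ``variable``, ``unit``, followed by year columns). The ``--scale`` option and the ``process`` function are hypothetical and only show how an optional setting can be passed via the CLI. Reading ``xlsx`` files would need an additional library and is left out here.

```python
import argparse
import csv

# Index columns of the IAMC wide format; all other columns are assumed to be years.
IAMC_INDEX = ["model", "scenario", "region", "variable", "unit"]


def process(rows, scale=1.0):
    """Hypothetical processing step: multiply every timeseries value by `scale`."""
    result = []
    for row in rows:
        new_row = dict(row)
        for column, value in row.items():
            # Leave index columns and empty cells untouched.
            if column.lower() not in IAMC_INDEX and value not in ("", None):
                new_row[column] = str(float(value) * scale)
        result.append(new_row)
    return result


def main(argv=None):
    parser = argparse.ArgumentParser(description="Example processing module")
    parser.add_argument("--input", required=True,
                        help="path to an IAMC-formatted csv file")
    parser.add_argument("--output", required=True,
                        help="path where the output file is written")
    parser.add_argument("--scale", type=float, default=1.0,
                        help="hypothetical optional setting")
    args = parser.parse_args(argv)

    # Read the input file, keeping the original column order for the output.
    with open(args.input, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    # Write the processed timeseries data in the same format.
    with open(args.output, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(process(rows, scale=args.scale))
```

To use this as a script in the form ``python process.py --input path-to-input-file.csv --output path-to-output-file.csv``, the file would end with a call to ``main()`` under the usual ``if __name__ == "__main__":`` guard. Keeping the processing step in a separate function also leaves room for exposing it as an importable function in the spirit of "Option 2".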