Rework/Add CWL Part #111
Draft: caroott wants to merge 6 commits into nfdi4plants:main from caroott:CWL
Commits (6):
96c65c2 correct allowed paths for tool description #110 (caroott)
e538379 fix clw user guide links (caroott)
fc95c79 add metadata section to run and workflow #110 (caroott)
205c0c2 update arc example structure (cwl files) (caroott)
929a950 link to example arc structure for file locations (caroott)
b6f6e1b replace root arc.cwl with run.cwl (caroott)
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,7 +26,7 @@ Licensed under the Creative Commons License CC BY, Version 4.0; you may not use | |
- [Additional Payload](#additional-payload) | ||
- [Top-level Metadata and Workflow Description](#top-level-metadata-and-workflow-description) | ||
- [Investigation and Study Metadata](#investigation-and-study-metadata) | ||
- [Top-Level Run Description](#top-level-run-description) | ||
- [Individual Run Description](#individual-run-description) | ||
- [Data Path Annotation](#data-path-annotation) | ||
- [Examples](#examples) | ||
- [General Pattern](#general-pattern) | ||
|
@@ -84,7 +84,7 @@ Each ARC is a directory containing the following elements: | |
|
||
- *Runs* capture data products (i.e., outputs of computational analyses) derived from assays, other runs, or study materials using workflows (located in the aforementioned *workflows* subdirectory). Each run is a collection of files, stored in the top-level `runs` subdirectory. It MUST be accompanied by a per-run CWL workflow description, stored in `<run_name>.cwl` as further described [below](#run-description). | ||
|
||
- *Top-level metadata and workflow description* tie together the elements of an ARC in the contexts of investigation and associated studies (in the ISA definition), captured in the file `isa.investigation.xlsx` in [ISA-XLSX format](#isa-xlsx-format), which MUST be present. Furthermore, top-level reproducibility information SHOULD be provided in the CWL `arc.cwl`. | ||
- *Top-level metadata and workflow description* tie together the elements of an ARC in the contexts of investigation and associated studies (in the ISA definition), captured in the file `isa.investigation.xlsx` in [ISA-XLSX format](#isa-xlsx-format), which MUST be present. | ||
|
||
All other files contained in an ARC (e.g., a `README.txt`, pre-print PDFs, additional annotation files) are referred to as *additional payload*, and MAY be located anywhere within the ARC structure. However, an ARC MUST be [reproducible](#reproducible-arcs) and [publishable](#shareable-and-publishable-arcs) even if these files are deleted. Further considerations on additional payload are described [below](#additional-payload). | ||
|
||
|
@@ -96,9 +96,7 @@ Note: | |
|
||
``` | ||
<top-level directory> | ||
| isa.investigation.xlsx | ||
| arc.cwl [optional] | ||
| arc.yml [optional] | ||
| isa.investigation.xlsx | ||
\--- studies | ||
\--- <study_name> | ||
| isa.study.xlsx | ||
|
@@ -118,8 +116,8 @@ Note: | |
\--- runs | ||
\--- <run_name> | ||
| [files;...] (different output files) | ||
| run.cwl | ||
| run.yml [optional] | ||
| run.cwl | ||
| run.yml | ||
``` | ||
|
||
## ARC Representation | ||
|
@@ -186,21 +184,31 @@ Notes: | |
|
||
Workflow execution and metadata MUST be described using the [Common Workflow Language](https://www.commonwl.org/) (CWL), [v1.2](https://www.commonwl.org/v1.2/) or higher, in a file `workflow.cwl`, which MUST be placed in the subdirectory containing all files specific to this workflow under the top-level `workflows` subdirectory. This file MUST contain either of: | ||
|
||
- A CWL [tool description](https://www.commonwl.org/v1.2/CommandLineTool.html). Tool descriptions must be self-contained and not refer to any files outside the workflow subdirectory. All paths used within the tool description MUST be relative to itself. | ||
- A CWL [tool description](https://www.commonwl.org/v1.2/CommandLineTool.html). Tool descriptions must be self-contained and not refer to any files outside the ARC root directory. All paths used within the tool description MUST be relative to itself. | ||
|
||
- A CWL [workflow description](https://www.commonwl.org/v1.2/Workflow.html). Such descriptions MAY utilize other ARC workflows as [nested workflows](https://www.commonwl.org/user_guide/22-nested-workflows/index.html), but MUST use relative paths in this case. Files outside the ARC root directory MUST NOT be referenced. | ||
- A CWL [workflow description](https://www.commonwl.org/v1.2/Workflow.html). Such descriptions MAY utilize other ARC workflows as [nested workflows](https://www.commonwl.org/user_guide/topics/workflows.html#nested-workflows), but MUST use relative paths in this case. Files outside the ARC root directory MUST NOT be referenced. | ||
|
||
The file locations can be seen in the [Example ARC structure](#example-arc-structure). | ||
|
||
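To illustrate the relative-path rule for nested workflows, a minimal sketch of a `workflow.cwl` could look like the following. The directory names `combined` and `analysis`, the input `raw_data`, and the output `report` are hypothetical and not part of the specification; the nested workflow is assumed to expose an input and an output with those names.

```yaml
# Hypothetical location: workflows/combined/workflow.cwl
cwlVersion: v1.2
class: Workflow

requirements:
  # Required because a step below runs another workflow.
  SubworkflowFeatureRequirement: {}

inputs:
  raw_data: File

outputs:
  report:
    type: File
    outputSource: analysis/report

steps:
  analysis:
    # Relative to this file; resolves to workflows/analysis/workflow.cwl,
    # which stays inside the ARC root directory.
    run: ../analysis/workflow.cwl
    in:
      raw_data: raw_data
    out: [report]
```

An absolute path, or a relative path that climbs above the ARC root, in `run:` would violate the rule above.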
Notes: | ||
Review comment: This section and the next deviate from the usual MUST/SHOULD/MAY usage |
||
|
||
- There are no requirements on the structure or granularity of workflows. An ARC may contain no workflows at all if it contains no [run results](#run-description), or MAY utilize a single workflow to generate a single run result containing all computational output. | ||
|
||
- While workflows typically are (and should be) *generic*, i.e. a single workflow can be applied to different data of the same type, this is not a requirement. It is allowed to hard-code assay file paths and other parameters if workflow reusability is not a priority. | ||
|
||
- It is highly recommended that tool descriptions contain a reproducible execution environment description in the form of a [Docker](https://www.commonwl.org/user_guide/07-containers/index.html) container description. | ||
- It is highly recommended that tool descriptions contain a reproducible execution environment description in the form of a [Docker](https://www.commonwl.org/user_guide/topics/using-containers.html) container description. | ||
|
||
- It is expected that workflow and tool descriptions are authored semi-automatically, e.g. using the [arcCommander](https://github.com/nfdi4plants/arcCommander) tool. | ||
|
||
- It is strongly encouraged to include author and contributor metadata in tool descriptions and workflow descriptions as [CWL metadata](https://www.commonwl.org/user_guide/17-metadata/index.html). | ||
### Workflow Metadata | ||
|
||
- For metadata annotation, it is encouraged to reference namespaces and schemas, as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html). | ||
|
||
- It is strongly encouraged to include author and contributor metadata in tool descriptions and workflow descriptions as CWL metadata. | ||
|
||
- The referenced authors and contributors must be the ones involved in the creation of the tool description or workflow description, not the person executing the [processing unit](https://www.commonwl.org/user_guide/introduction/basic-concepts.html#processes-and-requirements). | ||
|
||
- It is encouraged to add metadata relevant to the tool description or workflow description. This metadata must be limited to metadata that directly describes the processing unit. Metadata describing the run parameters must be added to the `run.yml` parameter file. | ||
|
||
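As a rough illustration of the recommendations above (a container description plus author and contributor metadata with referenced namespaces and schemas), a tool description might be sketched as follows. The tool itself (`wc -l`), the container image, and the names and email address are placeholders chosen for this example and are assumptions, not part of the specification; the `s:` namespace and schema URL follow the pattern used in the CWL metadata user guide.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [wc, -l]

hints:
  # Execution environment description; the image name is a placeholder.
  DockerRequirement:
    dockerPull: debian:stable-slim

inputs:
  input_file:
    type: File
    inputBinding:
      position: 1

outputs:
  line_count:
    type: stdout

stdout: line_count.txt

# People who wrote this tool description, not the person executing it.
s:author:
  - class: s:Person
    s:name: Jane Doe
    s:email: mailto:jane.doe@example.org

s:contributor:
  - class: s:Person
    s:name: John Doe

$namespaces:
  s: https://schema.org/

$schemas:
  - https://schema.org/version/latest/schemaorg-current-https.rdf
```

Metadata about a particular execution, such as who ran the tool and with which parameters, would go into the `run.yml` parameter file instead.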
## Run Description | ||
|
||
|
@@ -218,7 +226,15 @@ Notes: | |
|
||
- It is expected that run descriptions are authored semi-automatically, e.g. using the [arcCommander](https://github.com/nfdi4plants/arcCommander) tool. | ||
|
||
- It is strongly encouraged to include author and contributor metadata in run descriptions as [CWL metadata](https://www.commonwl.org/user_guide/17-metadata/index.html). | ||
### Run Metadata | ||
|
||
- For metadata annotation, it is encouraged to reference namespaces and schemas, as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html). | ||
|
||
- It is strongly encouraged to include author and contributor metadata in `run.yml` parameter files as CWL metadata. | ||
|
||
- The referenced authors and contributors must be the ones executing the [processing unit](https://www.commonwl.org/user_guide/introduction/basic-concepts.html#processes-and-requirements), not the person that created the processing unit. | ||
|
||
- It is encouraged to add metadata relevant to the `run.yml` parameter file. This metadata must be limited to metadata that directly describes the run parameters. Metadata describing the processing unit must be added to the corresponding `.cwl` file. | ||
|
||
## Additional Payload | ||
|
||
|
@@ -235,7 +251,7 @@ Note: | |
|
||
The `investigation` file MUST follow the [ISA-XLSX investigation file specification](ISA-XLSX.md#investigation-file). | ||
|
||
Furthermore, top-level reproducibility information SHOULD be provided in the CWL `arc.cwl`. | ||
Furthermore, run-level reproducibility information SHOULD be provided in the CWL `run.cwl` ([Individual Run Description](#individual-run-description)). | ||
|
||
### Investigation and Study Metadata | ||
|
||
|
@@ -244,11 +260,11 @@ The ARC root directory is identifiable by the presence of the `isa.investigation | |
Multiple studies MUST be stored using one worksheet per study in `isa.studies.xlsx` in the root directory of the ARC. | ||
The study-level SHOULD define [ISA factors](https://isa-specs.readthedocs.io/en/latest/isamodel.html#study) of a study and MAY contain overlapping information also to be found in all assays grouped by the study. --> | ||
|
||
### Top-Level Run Description | ||
### Individual Run Description | ||
|
||
The file `arc.cwl` SHOULD exist at the root directory of each ARC. It describes which runs are executed (and specifically, their order) to (re)produce the computational outputs contained within the ARC. | ||
The file `run.cwl` MUST exist in the directory of each run. It describes the run's execution to (re)produce the computational outputs contained within the ARC. | ||
|
||
`arc.cwl` MUST be a CWL v1.2 workflow description and adhere to the same requirements as [run descriptions](#run-description). In particular, references to study or assay data files, nested workflows MUST use relative paths. An optional file `arc.yml` MAY be provided to specify input parameters. | ||
`run.cwl` MUST be a CWL v1.2 workflow description and adhere to the same requirements as [run descriptions](#run-description). In particular, references to study or assay data files and to nested workflows MUST use relative paths. | ||
|
||
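A per-run description along these lines could be sketched as below. The run name `my_run`, the workflow directory `analysis`, the assay path, and the input and output names are hypothetical; the sketch only assumes the ARC layout with top-level `runs`, `workflows`, and `assays` directories, and that the referenced workflow exposes an input `raw_data` and an output `report`.

```yaml
# Hypothetical location: runs/my_run/run.cwl
cwlVersion: v1.2
class: Workflow

requirements:
  # Needed if the referenced file is itself a CWL workflow.
  SubworkflowFeatureRequirement: {}

inputs:
  raw_data: File

outputs:
  report:
    type: File
    outputSource: analysis/report

steps:
  analysis:
    # Relative path from runs/my_run/ to the workflow inside the same ARC.
    run: ../../workflows/analysis/workflow.cwl
    in:
      raw_data: raw_data
    out: [report]

# A matching runs/my_run/run.yml (hypothetical) would supply the input,
# again as a path relative to the parameter file:
#
# raw_data:
#   class: File
#   path: ../../assays/my_assay/dataset/measurements.csv
```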
## Data Path Annotation | ||
|
||
|
add a file tree here as example
I updated the example file structure and linked it there