
Include Changes for specification V2.0.0 #105

Merged
merged 49 commits into from
Jun 11, 2024

HLWeil commented Jun 6, 2024

HLWeil and others added 30 commits November 17, 2023 12:03
fix typo in STUDY CONTACTS table section
Add data path annotation section to ARC specification
Add comment to annotation table

Each ARC is a directory containing the following elements:

- *Studies* are collections of material and resources used within the investigation.
Metadata that describe the characteristics of material and resources follow the ISA study model. Study-level metadata is stored in [ISA-XLSX](#isa-xlsx-format) format in a file `isa.study.xlsx`, which MUST exist to specify the input material or data resources. Resources MAY include biological materials (e.g. plant samples, analytical standards) created during the current investigation. Resources MAY further include external data (e.g., knowledge files, results files) that need to be included and cannot be referenced due to external limitations. Resources described in a study file can be the input for one or multiple assays. Further details on `isa.study.xlsx` are specified [below](#study-and-resources). Resource (descriptor) files MUST be placed in a `resources` subdirectory.
Metadata that describe the characteristics of material and resources follow the ISA study model. Study-level metadata is stored in [ISA-XLSX](#isa-xlsx-format) format in an `isa.study.xlsx` file, which MUST exist to specify the input material or data resources. Resources MAY include biological materials (e.g., plant samples, analytical standards) created during the current investigation. Resources MAY further include external data (e.g., knowledge files, results files) that need to be included and cannot be referenced due to external limitations. Resources described in a study file can be the input for one or multiple assays. Further details on `isa.study.xlsx` are specified [below](#study-and-resources). Resource (descriptor) files MUST be placed in a `resources` subdirectory. Further explications about data entities defined in the study MAY be stored in [ISA-XLSX](#isa-xlsx-format) format in an `isa.datamap.xlsx` file, which SHOULD exist for studies containing data. Further details on `isa.datamap.xlsx` are specified [in the isa-xlsx specification](ISA-XLSX.md#datamap-file).


> Metadata that describe the characteristics of material and resources follow the ISA study model.

For me, this sentence reads strangely.

Member Author

Yeah, I think it was replaced but not cut out. Will throw it out now.


- *Assays* correspond to outcomes of experimental assays or analytical measurements (in the interpretation of the ISA model) and are treated as immutable data. Each assay is a collection of files, together with a corresponding metadata file, stored in a subdirectory of the top-level subdirectory `assays`. Assay-level metadata is stored in [ISA-XLSX](#isa-xlsx-format) format in a file `isa.assay.xlsx`, which MUST exist for each assay. Further details on `isa.assay.xlsx` are specified [below](#assay-data-and-metadata). Assay data files MUST be placed in a `dataset` subdirectory.
- *Assays* correspond to outcomes of experimental assays or analytical measurements (in the interpretation of the ISA model) and are treated as immutable data. Each assay is a collection of files, together with a corresponding metadata file, stored in a subdirectory of the top-level subdirectory `assays`. Assay-level metadata is stored in [ISA-XLSX](#isa-xlsx-format) format in an `isa.assay.xlsx` file, which MUST exist for each assay. Further details on `isa.assay.xlsx` are specified [below](#assay-data-and-metadata). Assay data files MUST be placed in a `dataset` subdirectory. Further explications about data entities defined in the assay MAY be stored in [ISA-XLSX](#isa-xlsx-format) format in an `isa.datamap.xlsx` file, which SHOULD exist for each assay. Further details on `isa.datamap.xlsx` are specified [in the isa-xlsx specification](ISA-XLSX.md#datamap-file).
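
The MUST and SHOULD rules in the study and assay bullets lend themselves to a mechanical check. The following is a minimal Python sketch, not part of the specification or of any existing tool: `check_arc_layout` is a hypothetical helper, and it assumes that studies live in a top-level `studies` folder (mirroring `assays`) and that a study "contains data" when its `resources` folder is non-empty.

```python
from pathlib import Path


def check_arc_layout(arc_root: Path) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the study and assay folders of an ARC.

    Errors flag violated MUST rules, warnings flag unmet SHOULD rules.
    """
    errors: list[str] = []
    warnings: list[str] = []
    # (top-level folder, required metadata file, data sub-folder)
    # Assumption: studies are stored under a top-level `studies` folder.
    layout = (
        ("studies", "isa.study.xlsx", "resources"),
        ("assays", "isa.assay.xlsx", "dataset"),
    )
    for top_name, metadata_file, data_folder in layout:
        top = arc_root / top_name
        if not top.is_dir():
            continue
        for entry in sorted(p for p in top.iterdir() if p.is_dir()):
            rel = entry.relative_to(arc_root).as_posix()
            if not (entry / metadata_file).is_file():
                errors.append(f"{rel}: missing {metadata_file}")
            data_dir = entry / data_folder
            has_data = data_dir.is_dir() and any(data_dir.iterdir())
            # A datamap SHOULD exist for every assay, and for studies containing data.
            datamap_expected = top_name == "assays" or has_data
            if datamap_expected and not (entry / "isa.datamap.xlsx").is_file():
                warnings.append(f"{rel}: no isa.datamap.xlsx")
    return errors, warnings
```

Reporting a missing datamap as a warning rather than an error keeps the distinction between the MUST on the metadata workbook and the SHOULD on the datamap file.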


> Further explications about data entities defined in the assay MAY be stored in ISA-XLSX format in an `isa.datamap.xlsx` file, which SHOULD exist for each assay. Further details on `isa.datamap.xlsx` are specified in the isa-xlsx specification

Mixing MAY and SHOULD? Maybe unify?


I think the way it is currently written raises two questions (and, imho, in the wrong order).

  1. Should such metadata be stored?
  2. If so, where should it be stored?

Suggestion changing the order and using only one keyword (SHOULD):

> Further explications about data entities defined in the assay SHOULD exist for each assay, in ISA-XLSX format in an `isa.datamap.xlsx` file. Further details ...

Member Author

It was meant the following way:

  • In general, each assay or study SHOULD contain a datamap file
  • For every data entity you MAY decide to add additional information to this datamap file

Member Author

HLWeil Jun 10, 2024

But I agree it reads like there are options for how to store additional information.

\--- resources
\--- protocol [optional / add. payload]
\--- assays
\--- <assay_name>
| isa.assay.xlsx
| isa.datamap.xlsx


[optional]?

@@ -142,12 +156,16 @@ The `study` file MUST follow the [ISA-XLSX study file specification](ISA-XLSX.md

Protocols that are necessary to describe the sample or material creation process can be placed under the protocols directory.

### Assay Data and Metadata
Further explications about data entities defined in the assay MAY be stored in [ISA-XLSX](#isa-xlsx-format) format in an `isa.datamap.xlsx` file, which SHOULD exist for each assay. Further details on `isa.datamap.xlsx` are specified [in the isa-xlsx specification](ISA-XLSX.md#datamap-file).


Same as above (switch between MAY and SHOULD).


All measurement data sets are considered as assays and are considered immutable input data. Assay data MUST be placed into a unique subdirectory of the top-level `assays` subdirectory. All ISA metadata specific to a single assay MUST be annotated in the file `isa.assay.xlsx` at the root of the assay's subdirectory. This workbook MUST contain a single assay that can be organized in one or multiple worksheets.

The `assay` file MUST follow the [ISA-XLSX assay file specification](ISA-XLSX.md#assay-file).

Further explications about data entities defined in the assay MAY be stored in [ISA-XLSX](#isa-xlsx-format) format in an `isa.datamap.xlsx` file, which SHOULD exist for each assay. Further details on `isa.datamap.xlsx` are specified [in the isa-xlsx specification](ISA-XLSX.md#datamap-file).


Same as above (switch between MAY and SHOULD).


`assays/Assay2/isa.assay.xlsx`:

| Input [Data] | Parameter[script file] | Output [Data] |
|--------------|------------------------|---------------|


Component [script file]? 😅

- Data nodes in `isa.assay.xlsx` files: The path MAY be specified relative to the `dataset` sub-folder of the assay
- Data nodes in `isa.study.xlsx` files: The path MAY be specified relative to the `resources` sub-folder of the study
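
Because both rules use MAY, a tool reading a Data node path may have to try more than one base folder. The following is a minimal Python sketch, assuming (this is not stated above) that a path which does not resolve relative to `dataset` or `resources` is read relative to the ARC root; `resolve_data_path`, `DATA_FOLDERS`, and the file names in the example are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Sub-folder that Data node paths MAY be relative to, keyed by workbook name.
DATA_FOLDERS = {"isa.assay.xlsx": "dataset", "isa.study.xlsx": "resources"}


def resolve_data_path(workbook: str, node_path: str, existing: set[str]) -> Optional[str]:
    """Resolve a Data node path to an ARC-root-relative POSIX path.

    workbook: ARC-root-relative path of the annotating file,
              e.g. "assays/Assay2/isa.assay.xlsx".
    existing: ARC-root-relative paths of all files in the ARC.
    """
    wb = PurePosixPath(workbook)
    candidates = []
    folder = DATA_FOLDERS.get(wb.name)
    if folder is not None:
        # First try the path relative to the assay's `dataset` or the study's `resources`.
        candidates.append(wb.parent / folder / node_path)
    # Assumption: otherwise the path is read as relative to the ARC root.
    candidates.append(PurePosixPath(node_path))
    for candidate in candidates:
        if str(candidate) in existing:
            return str(candidate)
    return None
```

Trying the sub-folder first is a design choice of this sketch: if the same relative path exists both under `dataset` (or `resources`) and at the ARC root, the sub-folder wins.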

### Examples


Can we also get an example for the folder-specific pattern?

| Comment [Answer to everything] |
|--------------------------------|
| forty-two |

## Others

Columns whose headers do not follow any of the formats described above are considered additional payload and are out of the scope of this specification.
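
For implementers, "out of scope" suggests a reader that sets unrecognized columns aside instead of rejecting the table (see also the discussion below). The following is a minimal Python sketch; `KNOWN_KEYWORDS` is a hypothetical, incomplete subset of header keywords chosen for illustration, only headers of the form `Keyword [term]` are recognized, and `split_headers` is not an existing API.

```python
import re

# Hypothetical, incomplete subset of header keywords, for illustration only.
KNOWN_KEYWORDS = {"Input", "Output", "Parameter", "Component", "Characteristic", "Factor", "Comment"}
# "Keyword [term]" with an optional space before the bracket.
HEADER_RE = re.compile(r"^(?P<keyword>[A-Za-z]+)\s*\[(?P<term>[^\]]*)\]$")


def split_headers(headers: list[str]) -> tuple[list[str], list[str]]:
    """Split headers into (recognized, additional payload) without failing on unknown ones."""
    recognized, payload = [], []
    for header in headers:
        match = HEADER_RE.match(header.strip())
        if match and match.group("keyword") in KNOWN_KEYWORDS:
            recognized.append(header)
        else:
            # Out of scope for the specification: keep the column, do not raise.
            payload.append(header)
    return recognized, payload
```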


Does this mean we now officially support free text columns?

Member

I'd read that as "the tool implementing the standard is free to decide what to do with free text columns"

Member Author

Yes, just a heads-up that a hard fail on unknown columns is not necessary.

ISA-XLSX.md (outdated)
In the `Datamap Table sheets`, column headers MUST have the first letter of each word in upper case, with the exception of the referencing label (REF).
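
The capitalization rule can be sketched as a small predicate. This minimal Python sketch reads "first letter of each word in upper case" as `str.istitle()`, which also expects the remaining letters of each word in lower case (a stricter reading than the text), and treats `REF` as the only all-caps exception; the function name and example headers are hypothetical.

```python
def is_valid_datamap_header(header: str) -> bool:
    """Check the capitalization rule for datamap column headers.

    Interpretation (assumption): each word must be title-cased ("Object Type"),
    except the referencing label, which is written in capitals as "REF".
    """
    words = header.split()
    # An empty header has no words and is rejected.
    return bool(words) and all(word == "REF" or word.istitle() for word in words)
```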

The content of the datamap table MUST be placed in an `xlsx table` whose name starts with `datamapTable`. Each sheet MUST contain at most one such annotation table. Only cells inside this table are considered as part of the formatted metadata.
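
The review thread on this sentence moves from a prefix match to an exact name. The following minimal Python sketch supports both readings and enforces "at most one such table per sheet"; `find_datamap_tables` is hypothetical, and it takes the table names per sheet as plain input (with openpyxl, for instance, they could be collected from each worksheet's `tables` mapping).

```python
def find_datamap_tables(sheet_tables: dict[str, list[str]], exact: bool = False) -> dict[str, str]:
    """Return the datamap table name per sheet, enforcing at most one per sheet.

    sheet_tables maps sheet name -> names of the xlsx tables defined on it.
    exact=False follows the text as written ("starts with datamapTable");
    exact=True follows the exact-name reading agreed in the review.
    """
    found: dict[str, str] = {}
    for sheet, tables in sheet_tables.items():
        matches = [
            name
            for name in tables
            if (name == "datamapTable" if exact else name.startswith("datamapTable"))
        ]
        if len(matches) > 1:
            raise ValueError(f"sheet {sheet!r} has {len(matches)} datamap tables: {matches}")
        if matches:
            found[sheet] = matches[0]
    return found
```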


> placed in an `xlsx table` whose name starts with

Starts with, or equals?

Member Author

Hmm, wanted to keep it in line with the AnnotationTable. But I agree it's not necessary as there can only be one table. Will change to equals.

## Comments

A `Comment` can be used to provide some additional information. Columns headed with `Comment[<comment name>]` MAY appear anywhere in the Annotation Table. The comment always refers to the Annotation Table. The value MUST be free text.
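
Since a comment refers to the whole table, a reader can collect comment columns without recording where they sit, which is the point raised in the thread below. The following is a minimal Python sketch, assuming the table arrives as a header list plus rows of strings; the optional space before `[` accepts both `Comment[...]` and the `Comment [...]` form used in the example table, and `extract_table_comments` is hypothetical.

```python
import re

COMMENT_RE = re.compile(r"^Comment\s*\[(?P<name>[^\]]+)\]$")


def extract_table_comments(headers: list[str], rows: list[list[str]]) -> dict[str, list[str]]:
    """Collect Comment[<name>] columns as table-level comments.

    Column position is ignored on purpose: a comment refers to the whole
    table, not to a neighbouring column.
    """
    comments: dict[str, list[str]] = {}
    for index, header in enumerate(headers):
        match = COMMENT_RE.match(header.strip())
        if match:
            # Values are free text; empty or missing cells are skipped.
            values = [row[index] for row in rows if index < len(row) and row[index] != ""]
            comments[match.group("name").strip()] = values
    return comments
```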


So we must remember the location of the Comment column in ARCtrl?

Member Author

Not necessarily, as the comment column does not refer to a specific other column but always to the table as a whole.

Member Author

HLWeil commented Jun 10, 2024

@Freymaurer @kappe-c @kMutagene
Thanks a lot for your input!

I made some changes according to your comments. Please check again if your remarks are resolved now!

HLWeil merged commit 4ce4fb8 into main Jun 11, 2024
5 participants