diff --git a/ARC specification.md b/ARC specification.md index 66342ff..8991509 100644 --- a/ARC specification.md +++ b/ARC specification.md @@ -1,4 +1,4 @@ -# Annotated Research Context Specification, v1.1-rfc +# Annotated Research Context Specification, v1.2 Please provide feedback via GitHub issues or a pull request. @@ -10,29 +10,30 @@ Licensed under the Creative Commons License CC BY, Version 4.0; you may not use ## Table of Contents -- [Introduction](#introduction) - - [Extensions](#extensions) -- [ARC Structure and Content](#arc-structure-and-content) - - [High-Level Schema](#high-level-schema) - - [Example ARC structure](#example-arc-structure) - - [ARC Representation](#arc-representation) - - [ISA-XLSX Format](#isa-xlsx-format) - - [Study and Resources](#study-and-resources) - - [Assay Data and Metadata](#assay-data-and-metadata) - - [Workflow Description](#workflow-description) - - [Run Description](#run-description) - - [Additional Payload](#additional-payload) - - [Top-level Metadata and Workflow Description](#top-level-metadata-and-workflow-description) - - [Investigation and Study Metadata](#investigation-and-study-metadata) - - [Top-Level Run Description](#top-level-run-description) -- [Shareable and Publishable ARCs](#shareable-and-publishable-arcs) - - [Reproducible ARCs](#reproducible-arcs) -- [Mechanism for Quality Control of ARCs](#mechanism-for-quality-control-of-arcs) -- [Best Practices](#best-practices) - - [Community Specific Data Formats](#community-specific-data-formats) - - [Compression and Encryption](#compression-and-encryption) - - [Directory and File Naming Conventions](#directory-and-file-naming-conventions) -- [Appendix: Conversion of ARCs to RO Crates](#appendix-conversion-of-arcs-to-ro-crates) +- [Annotated Research Context Specification, v1.2](#annotated-research-context-specification-v12) + - [Introduction](#introduction) + - [Extensions](#extensions) + - [ARC Structure and Content](#arc-structure-and-content) + - [High-Level 
Schema](#high-level-schema) + - [Example ARC structure](#example-arc-structure) + - [ARC Representation](#arc-representation) + - [ISA-XLSX Format](#isa-xlsx-format) + - [Study and Resources](#study-and-resources) + - [Assay Data and Metadata](#assay-data-and-metadata) + - [Workflow Description](#workflow-description) + - [Run Description](#run-description) + - [Additional Payload](#additional-payload) + - [Top-level Metadata and Workflow Description](#top-level-metadata-and-workflow-description) + - [Investigation and Study Metadata](#investigation-and-study-metadata) + - [Top-Level Run Description](#top-level-run-description) + - [Shareable and Publishable ARCs](#shareable-and-publishable-arcs) + - [Reproducible ARCs](#reproducible-arcs) + - [Mechanism for Quality Control of ARCs](#mechanism-for-quality-control-of-arcs) + - [Best Practices](#best-practices) + - [Community Specific Data Formats](#community-specific-data-formats) + - [Compression and Encryption](#compression-and-encryption) + - [Directory and File Naming Conventions](#directory-and-file-naming-conventions) + - [Appendix: Conversion of ARCs to RO Crates](#appendix-conversion-of-arcs-to-ro-crates) ## Introduction @@ -127,53 +128,25 @@ Notes: ### ISA-XLSX Format -ISA-XLSX follows the ISA model specification (v1.0) saved in a XLSX format. The XLSX format uses the SpreadsheetML markup language and schema to represent a spreadsheet document. Conceptually, using the terminology of the Spreadsheet ML specification [ISO/IEC 29500-1](https://www.loc.gov/preservation/digital/formats/fdd/fdd000398.shtml#:~:text=The%20XLSX%20format%20uses%20the,a%20rectangular%20grid%20of%20cells.), the document comprises one or more worksheets in a workbook. Every worksheet MUST contain one table object storing the metadata. Comments or auxillary information MAY be stored alongside with table objects in a worksheet. +The ISA-XLSX specification is currently part of the ARC specification. 
Its version therefore follows the version of the ARC specification. -### Study and Resources - -The characteristics of all material and resources used within the investigation must be specified in a study. Studies must be placed into a unique subdirectory of the top-level `studies` subdirectory. All ISA metadata specific to a single study MUST be annotated in the file `isa.study.xlsx` at the root of the study's subdirectory. This workbook MUST contain a single resources description that can be organized in one or multiple worksheets. Material or experimental samples can be stored in the form of virtual sample files (containing unique identifiers) in the resources directory. Each external data file can be interpreted as a virtual sample and stored accordingly under resources. External data refers to data that is neither originating within the investigation scope of the ARC nor can be referenced externally, but is required to ensure reproducibility. - -Protocols that are necessary to describe the sample or material creating process can be placed under the protocols directory. - -### Assay Data and Metadata +https://github.com/nfdi4plants/ARC-specfication/blob/main/ISA-XLSX.md -All measurement data sets are considered as assays and are considered immutable input data. Assay data MUST be placed into a unique subdirectory of the top-level `assays` subdirectory. All ISA metadata specific to a single assay MUST be annotated in the file `isa.assay.xlsx` at the root of the assay's subdirectory. This workbook MUST contain a single assay that can be organized in one or multiple worksheets. Worksheets MUST be named uniquely within the same workbook. A worksheet named `Assay` MUST store the STUDY ASSAYS section defined on investigation-level of the ISA model and is not required in the `isa.investigation.xlsx`. 
These include the terms `Study Assay Measurement Type`, `Study Assay Measurement Type Term Accession Number`, `Study Assay Measurement Type Term Source REF`, `Study Assay Technology Type`, `Study Assay Technology Type Term Accession Number`, `Study Assay Technology Type Term Source REF`, and `Study Assay Technology Platform`. -Additional worksheets MUST contain a table object with fields organized on a per-row basis. The first row of the table object MUST be used for column headers. A `Source` MUST be indicated with the column heading `Source Name`. Every table object MUST define one source per row and MUST contain at least one source. A `Sample` MUST be indicated with the column heading `Sample Name`. The source-sample-relation MUST follow a unique path in a directed acyclic graph, but MAY be distributed across different worksheets. - - +### Study and Resources - - - - - - -
+The characteristics of all material and resources used within the investigation must be specified in a study. Studies must be placed into a unique subdirectory of the top-level `studies` subdirectory. All ISA metadata specific to a single study MUST be annotated in the file `isa.study.xlsx` at the root of the study's subdirectory. This workbook MUST contain a single resources description that can be organized in one or multiple worksheets. -| | | -|-|-| -| Study Assay Measurement Type | "value" | -| Study Assay Measurement Type Term Accession Number | "value" | -| Study Assay Measurement Type Term Source REF | "value" | -| ... | ... | +The `study` file MUST follow the [ISA-XLSX study file specification](ISA-XLSX.md#study-file). - + Material or experimental samples can be stored in the form of virtual sample files (containing unique identifiers) in the resources directory. Each external data file can be interpreted as a virtual sample and stored accordingly under resources. External data refers to data that is neither originating within the investigation scope of the ARC nor can be referenced externally, but is required to ensure reproducibility. -| Source Name | building block* | Sample Name | -|-|-|-| -mrv1 | descriptorA | s1 | -mrv2 | descriptorB | s2 | -_ +Protocols that are necessary to describe the sample or material creating process can be placed under the protocols directory. - +### Assay Data and Metadata -| Source Name | building block* | Sample Name | -|-|-|-| -s1 | descriptorC | n1 | -s1 | descriptorD | n2 | -s2 | descriptorD | n3 | +All measurement data sets are considered as assays and are considered immutable input data. Assay data MUST be placed into a unique subdirectory of the top-level `assays` subdirectory. All ISA metadata specific to a single assay MUST be annotated in the file `isa.assay.xlsx` at the root of the assay's subdirectory. This workbook MUST contain a single assay that can be organized in one or multiple worksheets. -
[assay][worksheet1][worksheet2]
+The `assay` file MUST follow the [ISA-XLSX assay file specification](ISA-XLSX.md#assay-file). Notes: @@ -181,7 +154,7 @@ Notes: - To ensure reusability of assays, it is strongly RECOMMENDED to include necessary metadata mandated by typical metadata schemes necessary for reproduction. This process is facilitated by the use of templates that can be found [here](https://github.com/nfdi4plants/SWATE_templates). -- It is RECOMMENDED to order worksheets according to the source-sample-relation for readability. +- It is RECOMMENDED to order worksheets according to the input-output-relation for readability. - It is RECOMMENDED to adopt the structure outlined [below](#best-practices) to organize assay data files and other supporting information. @@ -240,7 +213,11 @@ Note: ### Top-level Metadata and Workflow Description -*Top-level metadata and workflow description* tie together the elements of an ARC in the contexts of investigation and associated studies (in the ISA definition), captured in the files `isa.investigation.xlsx` in [ISA-XLSX format](#isa-xlsx-format), which MUST be present. Furthermore, top-level reproducibility information MUST be provided in the CWL `arc.cwl`, which also MUST exist. +*Top-level metadata and workflow description* tie together the elements of an ARC in the context of an investigation, captured in the `isa.investigation.xlsx` file, which MUST be present. + +The `investigation` file MUST follow the [ISA-XLSX investigation file specification](ISA-XLSX.md#investigation-file). + +Furthermore, top-level reproducibility information MUST be provided in the CWL `arc.cwl`, which also MUST exist. #### Investigation and Study Metadata diff --git a/ISA-XLSX.md b/ISA-XLSX.md new file mode 100644 index 0000000..0f3afde --- /dev/null +++ b/ISA-XLSX.md @@ -0,0 +1,738 @@ +# ISA-XLSX format + +For details on ISA framework terminology, please read the [ISA Abstract Model specification](https://isa-specs.readthedocs.io/en/latest/isamodel.html).
+ +This document describes the ISA Abstract Model reference implementation specified in the ISA-XLSX format. The XLSX format uses the SpreadsheetML markup language and schema to represent a spreadsheet document. Conceptually, using the terminology of the Spreadsheet ML specification [ISO/IEC 29500-1](https://www.loc.gov/preservation/digital/formats/fdd/fdd000398.shtml#:~:text=The%20XLSX%20format%20uses%20the,a%20rectangular%20grid%20of%20cells.), the document comprises one or more worksheets in a workbook. + +**Table of contents** + +- [Investigation File](#investigation-file) +- [Study File](#study-file) +- [Assay File](#assay-file) +- [Top-level metadata sheets](#top-level-metadata-sheets) + - [Ontology Source Reference section](#ontology-source-reference-section) + - [INVESTIGATION section](#investigation-section) + - [STUDY section](#study-section) + - [ASSAY section](#assay-section) +- [Annotation Table sheets](#annotation-table-sheets) + - [Inputs and Outputs](#inputs-and-outputs) + - [Protocol Columns](#protocol-columns) + - [Ontology Annotations](#ontology-annotations) + - [Unit](#unit) + - [Characteristics](#characteristics) + - [Factors](#factors) + - [Components](#components) + - [Parameters](#parameters) + - [Examples](#examples) + +Below we provide the schemas and the content rules for valid ISA-XLSX documents. + +ISA-XLSX uses three types of files to capture the experimental metadata: + - Investigation file + - Study file + - Assay file + +The Investigation file contains all the information needed to understand the overall goals and means used in an experiment; experimental steps (or sequences of events) are described in the Study and in the Assay file(s). For each Investigation file there may be one or more Studies defined with a corresponding Study file; for each Study there may be one or more Assays defined with corresponding Assay files; one assay file may be registered in different studies. 
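As an illustration of the file relationships described above, the following minimal Python sketch models an investigation in which the same assay file is registered in two studies. The class names, the `studies_registering` helper, the `ControlExperiment` study and the assay path are hypothetical and are not defined by this specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AssayFile:
    # Path of an isa.assay.xlsx file (illustrative value only).
    path: str


@dataclass
class StudyFile:
    identifier: str
    # A study may register one or more assay files.
    assays: list[AssayFile] = field(default_factory=list)


@dataclass
class InvestigationFile:
    identifier: str
    # An investigation may define one or more studies.
    studies: list[StudyFile] = field(default_factory=list)

    def studies_registering(self, assay: AssayFile) -> list[str]:
        """Return the identifiers of all studies that register the given assay file."""
        return [s.identifier for s in self.studies if assay in s.assays]


# Hypothetical example: one assay file registered in two studies.
proteomics = AssayFile("assays/Proteomics/isa.assay.xlsx")
investigation = InvestigationFile(
    "ChlamyHeatstress",
    studies=[
        StudyFile("HeatstressExperiment", [proteomics]),
        StudyFile("ControlExperiment", [proteomics]),
    ],
)
print(investigation.studies_registering(proteomics))
```

Because `AssayFile` is a frozen dataclass, two references to the same path compare equal, which is what lets a single assay file be shared between studies in this sketch.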
+ +In order to facilitate identification of ISA-XLSX component files, specific naming patterns MUST be followed: + +- `isa.investigation.xlsx` for identifying the [Investigation file](#investigation-file) +- `isa.study.xlsx` for identifying [Study file(s)](#study-file) +- `isa.assay.xlsx` for identifying [Assay file(s)](#assay-file) + +Sheets described in this specification MUST follow one of the two given formats: + +- [`Top-level metadata sheets`](#top-level-metadata-sheets) for listing top-level metadata +- [`Annotation Table sheets`](#annotation-table-sheets) for describing experimental workflows + +Sheets which do not follow either of these two formats are considered additional payload and are ignored in this specification. + +All labels are case-sensitive. + +Dates SHOULD be supplied in the [ISO8601](http://www.iso.org/iso/home/standards/iso8601.htm) format. + +For maximal portability, file names SHOULD contain only the ASCII characters +`A-Za-z0-9._!#$%&+,;=@^(){}'[]`; space is excluded because many utilities +do not accept spaces in file paths, and non-English alphabetic characters cannot be guaranteed +to be supported in all locales. It is recommended to avoid the shell metacharacters +`(){}'[]$."`. + +# Investigation File + +The `Investigation file` fulfils four needs: + +1. to declare key entities, such as factors and protocols, which may be referenced in the other files +2. to track provenance of the used terminologies (controlled vocabularies or ontologies), where applicable +3. to relate Assay files to Studies +4. to select those Studies that are considered part of the investigation. + +The `Investigation File` MUST contain one [`Top-Level Metadata sheet`](#top-level-metadata-sheets).
This sheet MUST be named `isa_investigation` and MUST contain the following sections: + +- [`ONTOLOGY SOURCE REFERENCE`](#ontology-source-reference) +- [`INVESTIGATION`](#investigation) +- [`INVESTIGATION PUBLICATIONS`](#investigation-publications) +- [`INVESTIGATION CONTACTS`](#investigation-contacts) +- [`STUDY`](#study-section) +- [`STUDY DESIGN DESCRIPTORS`](#study-design-descriptors) +- [`STUDY PUBLICATIONS`](#study-publications) +- [`STUDY FACTORS`](#study-factors) +- [`STUDY ASSAYS`](#study-assays) +- [`STUDY PROTOCOLS`](#study-protocols) +- [`STUDY CONTACTS`](#study-contacts) + +The `Investigation File` implements the [`Investigation`](https://isa-specs.readthedocs.io/en/latest/isamodel.html#investigation) graph from the ISA Abstract Model. + +# Study File + +The `Study` represents a set of logically connected experiments. A `Study File` contains contextualising information for one or more `Assays`: metadata about the study design, the study factors and the study protocols used; information similar to that of the Investigation, such as the title and description of the study and related people and scholarly publications; and details of the sample collection process needed to perform the connected `Assays`. + +The `Study File` MUST contain one [`Top-Level Metadata sheet`](#top-level-metadata-sheets). This sheet MUST be named `isa_study` and MUST contain the following sections: + +- [`STUDY`](#study-section) +- [`STUDY DESIGN DESCRIPTORS`](#study-design-descriptors) +- [`STUDY PUBLICATIONS`](#study-publications) +- [`STUDY FACTORS`](#study-factors) +- [`STUDY ASSAYS`](#study-assays) +- [`STUDY PROTOCOLS`](#study-protocols) +- [`STUDY CONTACTS`](#study-contacts) + +Additionally, the `Study File` SHOULD contain one or more [`Annotation Table sheet(s)`](#annotation-table-sheets), which MAY record provenance of biological samples, from source material through a collection process to sample material.
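A minimal Python sketch of a conformance check for the required top-level metadata sheets listed above. It assumes the workbook has already been read into a mapping from sheet names to rows of cell strings (reading the XLSX file itself, for example with a library such as openpyxl, is not shown); `REQUIRED_SECTIONS` and `missing_sections` are hypothetical names, and the check only looks for section headers in the first column.

```python
# Required sections per top-level metadata sheet, as listed in this specification.
REQUIRED_SECTIONS: dict[str, list[str]] = {
    "isa_investigation": [
        "ONTOLOGY SOURCE REFERENCE",
        "INVESTIGATION",
        "INVESTIGATION PUBLICATIONS",
        "INVESTIGATION CONTACTS",
        "STUDY",
        "STUDY DESIGN DESCRIPTORS",
        "STUDY PUBLICATIONS",
        "STUDY FACTORS",
        "STUDY ASSAYS",
        "STUDY PROTOCOLS",
        "STUDY CONTACTS",
    ],
    "isa_study": [
        "STUDY",
        "STUDY DESIGN DESCRIPTORS",
        "STUDY PUBLICATIONS",
        "STUDY FACTORS",
        "STUDY ASSAYS",
        "STUDY PROTOCOLS",
        "STUDY CONTACTS",
    ],
}


def missing_sections(sheets: dict[str, list[list[str]]], sheet_name: str) -> list[str]:
    """Return the required section headers absent from the named sheet.

    `sheets` maps sheet names to rows of cell strings. If the sheet itself is
    missing, every required section is reported as missing.
    """
    rows = sheets.get(sheet_name, [])
    # Section headers are expected as the first cell of a row.
    first_cells = {row[0].strip() for row in rows if row and row[0]}
    return [name for name in REQUIRED_SECTIONS[sheet_name] if name not in first_cells]


# Hypothetical study sheet that lacks most of the required sections.
study_sheet = {
    "isa_study": [
        ["STUDY"],
        ["Study Identifier", "HeatstressExperiment"],
        ["STUDY CONTACTS"],
    ]
}
print(missing_sections(study_sheet, "isa_study"))
# ['STUDY DESIGN DESCRIPTORS', 'STUDY PUBLICATIONS', 'STUDY FACTORS', 'STUDY ASSAYS', 'STUDY PROTOCOLS']
```

Reporting every section as missing when the sheet is absent keeps a single code path for both "wrong sheet name" and "incomplete sheet".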
+ +Therefore, the main entities of the `Study File` should be `Sources` and `Samples`. + +The `Study File` implements the [`Study`](https://isa-specs.readthedocs.io/en/latest/isamodel.html#study) graph from the ISA Abstract Model. + +# Assay File + +The `Assay` represents one experimental measurement. An `Assay File` contains metadata about the assay design, information about the people performing the experiment, and, most importantly, details about the preparation and/or execution of the experimental measurement. + +The `Assay File` MUST contain one [`Top-Level Metadata sheet`](#top-level-metadata-sheets). This sheet MUST be named `isa_assay` and MUST contain the following sections: + +- [`ASSAY`](#assay-section) +- [`ASSAY PERFORMERS`](#assay-performers) + +Additionally, the `Assay File` SHOULD contain one or more [`Annotation Table sheet(s)`](#annotation-table-sheets), which MAY record preparation of biological samples, measurement of these samples and basic computations performed on the resulting data. + +Therefore, the main entities of the `Assay File` should be `Samples` and `Data`. + +The `Assay File` implements the [`Assay`](https://isa-specs.readthedocs.io/en/latest/isamodel.html#assay) graph from the ISA Abstract Model. + +# Top-level metadata sheets + +The purpose of top-level metadata sheets is to aggregate and list top-level metadata. Each sheet is divided into sections, each consisting of a section header and key-value fields. Section headers MUST be written entirely in upper case (e.g. STUDY); field headers MUST have the first letter of each word in upper case (e.g. Study Identifier), with the exception of the referencing label (REF), which is written entirely in upper case. + +In the following sections, an example of each section block is given alongside the specification of that section.
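The casing rules above are enough to tell section headers from field labels. The following Python sketch assumes that each row is a list of cell strings with the label in the first column; it classifies labels and groups the key-value fields under their section header. `classify_label` and `group_sections` are hypothetical helpers, not part of any reference implementation. Repeated blocks (such as several `STUDY` blocks) are kept as separate entries, and rows that fit neither rule raise `ValueError`.

```python
def classify_label(label: str) -> str:
    """Classify a first-column label of a top-level metadata sheet.

    Returns "section" for labels written entirely in upper case (e.g. "STUDY"),
    "field" for labels whose words all start with an upper-case letter
    (e.g. "Study Identifier" or "Study Design Type Term Source REF"),
    and "invalid" otherwise.
    """
    if label.isupper():
        return "section"
    words = label.split()
    if words and all(word[0].isupper() for word in words):
        return "field"
    return "invalid"


def group_sections(rows: list[list[str]]) -> list[tuple[str, dict[str, list[str]]]]:
    """Group key-value rows under the most recent section header.

    Each section becomes a (header, fields) pair, so repeated headers such as
    several STUDY blocks are kept as separate entries.
    """
    sections: list[tuple[str, dict[str, list[str]]]] = []
    for row in rows:
        if not row or not row[0]:
            continue  # skip empty rows
        label, values = row[0].strip(), list(row[1:])
        kind = classify_label(label)
        if kind == "section":
            sections.append((label, {}))
        elif kind == "field" and sections:
            # Attach the field's values to the most recently opened section.
            sections[-1][1][label] = values
        else:
            raise ValueError(f"label {label!r} breaks the casing rules or precedes any section")
    return sections


rows = [
    ["STUDY"],
    ["Study Identifier", "HeatstressExperiment"],
    ["STUDY DESIGN DESCRIPTORS"],
    ["Study Design Type", "time series design", "heat exposure"],
    ["Study Design Type Term Source REF", "OBI", ""],
]
for header, fields in group_sections(rows):
    print(header, fields)
```

Keeping the value cells as a list per label mirrors the layout of the examples in this document, where each additional column holds one more value for the same field.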
+ +> ### ATTENTION +> Rows in which the first character in the first column is Unicode +> [U+0023](http://www.fileformat.info/info/unicode/char/0023/index.htm) (the `#` character) MUST be interpreted as +> comments, and reference implementation parsers SHOULD ignore those lines entirely. + +> Rows labelled `Comment[]` can appear within any of the section blocks. Where these appear, the comment name must be unique within the context of a single block (e.g. you cannot have multiple occurrences of `Comment[external DB REF]` within `STUDY ASSAYS`). Also, the number of value cells MUST match the number of values indicated by the rest of the section in context. + +## Ontology Source Reference section + +The Ontology Source section of the Investigation file is used to declare Ontology Sources used elsewhere in the ISA-XLSX +files within the context of an Investigation. + +Where a row label in a [`Top-level metadata sheet`](#top-level-metadata-sheets) is suffixed with `Term Source REF`, the value of the cell SHOULD match one of the `Term Source Name` values declared in this section. + +Where a column in an [`Annotation table sheet`](#annotation-table-sheets) is labelled `Term Source REF`, the value +of the cell SHOULD match one of the `Term Source Name` values declared in this section. + +This section implements a list of `Ontology Source` from the ISA Abstract Model. + +This section MUST contain zero or more values. + +### ONTOLOGY SOURCE REFERENCE + +This section MUST contain the following labels, with the specified datatypes for values supported: + +| Label | Datatype | Description | +|-------------------------|---------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Term Source Name | String | The name of the source of a term; i.e. the source controlled vocabulary or ontology.
These names will be used in all corresponding Term Source REF fields that occur elsewhere. | +| Term Source File | String (file name or URI) | A file name or a URI of an official resource. | +| Term Source Version | String | The version number of the Term Source to support terms tracking. | +| Term Source Description | String | Use for disambiguating resources when homologous prefixes have been used. | + +**Example** + +For example, the `ONTOLOGY SOURCE REFERENCE` section of an ISA-XLSX `isa.investigation.xlsx` file may look as follows: + +| | | | | | | +|--------------------|-------|-------|-------|-------------|------| +| ONTOLOGY SOURCE REFERENCE | +| Term Source Name | CHEBI | EFO | OBI | NCBITAXON | PATO | +| Term Source File | [http://data.bioontology.org/ontologies/CHEBI](http://data.bioontology.org/ontologies/CHEBI) | [http://data.bioontology.org/ontologies/EFO](http://data.bioontology.org/ontologies/EFO) | [http://data.bioontology.org/ontologies/OBI](http://data.bioontology.org/ontologies/OBI) | [http://data.bioontology.org/ontologies/NCBITAXON](http://data.bioontology.org/ontologies/NCBITAXON) | [http://data.bioontology.org/ontologies/PATO](http://data.bioontology.org/ontologies/PATO) | +| Term Source Version | 78 | 111 | 21 | 2 | 160 | +| Term Source Description | Chemical Entities of Biological Interest Ontology | Experimental Factor Ontology | Ontology for Biomedical Investigations | National Center for Biotechnology Information (NCBI) Organismal Classification | Phenotypic Quality Ontology | + + +## INVESTIGATION section + +This section is organized in several subsections, described in detail below. + +This section implements an `Investigation` from the ISA Abstract Model. + +### INVESTIGATION + +This section MUST contain zero or one values. 
+ +This section MUST contain the following labels, with the specified datatypes for values supported: + +| Label | Datatype | Description | +|-----------------------------------|---------------------------------------------|----------------------------------------------------------------------------------------------| +| Investigation Identifier | String | An identifier or an accession number provided by a repository. This SHOULD be locally unique. | +| Investigation Title | String | A concise name given to the investigation. | +| Investigation Description | String | A textual description of the investigation. | +| Investigation Submission Date | String formatted as ISO8601 date YYYY-MM-DD | The date on which the investigation was reported to the repository. | +| Investigation Public Release Date | String formatted as ISO8601 date YYYY-MM-DD | The date on which the investigation was released publicly. | + +**Example** + +For example, the `INVESTIGATION` section of an ISA-XLSX `isa.investigation.xlsx` file may look as follows: + + + +| | | +|------------------------------|-------------------------| +| INVESTIGATION | +| Investigation Identifier | ChlamyHeatstress | +| Investigation Title | Systems-wide investigation of responses to moderate and acute high temperatures in the green alga Chlamydomonas reinhardtii. | +| Investigation Description | Algae cultures were grown mixotrophically (TAP). After 24h of 35°C/40°C the cells were shifted back to room temperature for 48h. 'omics samples were taken. | +| Investigation Submission Date | 2022-05-13 | +| Investigation Public Release Date | | + + +### INVESTIGATION PUBLICATIONS + +This section MUST contain zero or more values.
+ +This section MUST contain the following labels, with the specified datatypes for values supported: + +| Label | Datatype | Description | +|--------------------------------------------------------|----------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Investigation Publication PubMed ID | String formatted as valid PubMed ID | The PubMed IDs of the described publication(s) associated with this investigation. | +| Investigation Publication DOI | String formatted as valid DOI | A Digital Object Identifier (DOI) for that publication (where available). | +| Investigation Publication Author List | String | The list of authors associated with that publication. | +| Investigation Publication Title | String | The title of the publication associated with the investigation. | +| Investigation Publication Status | String, or Ontology Annotation by providing accompanying Term Accession Number and Term Source REF | A term describing the status of that publication (e.g. submitted, in preparation, published). | +| Investigation Publication Status Term Accession Number | String or URI | The accession number from the Term Source associated with the selected term. | +| Investigation Publication Status Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.
| + +**Example** + +For example, the `INVESTIGATION PUBLICATIONS` section of an ISA-XLSX `isa.investigation.xlsx` file may look as follows: + + +| | | +|----------------------------------------|------------------| +| INVESTIGATION PUBLICATIONS | +| Investigation Publication PubMed ID | PMC9106746 | +| Investigation Publication DOI | 10.1038/s42003-022-03359-z | +| Investigation Publication Author List | Ningning Zhang, Erin M. Mattoon, Will McHargue, Benedikt Venn, David Zimmer, Kresti Pecani, Jooyeon Jeong, Cheyenne M. Anderson, Chen Chen, Jeffrey C. Berry, Ming Xia, Shin-Cheng Tzeng, Eric Becker, Leila Pazouki, Bradley Evans, Fred Cross, Jianlin Cheng, Kirk J. Czymmek, Michael Schroda, Timo Mühlhaus & Ru Zhang | +| Investigation Publication Title | Systems-wide analysis revealed shared and unique responses to moderate and acute high temperatures in the green alga Chlamydomonas reinhardtii | +| Investigation Publication Status | published | +| Investigation Publication Status Term Accession Number | http://purl.org/spar/pso/published | +| Investigation Publication Status Term Source REF | PSO | + +### INVESTIGATION CONTACTS + +This section MUST contain zero or more values. 
+ +This section MUST contain the following labels, with the specified datatypes for values supported: + +| Label | Datatype | Description | +|--------------------------------------------------|---------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Investigation Person Last Name | String | The last name of a person associated with the investigation. | +| Investigation Person First Name | String | Investigation Person Name | +| Investigation Person Mid Initials | String | The middle initials of a person associated with the investigation. | +| Investigation Person Email | String formatted as email | The email address of a person associated with the investigation. | +| Investigation Person Phone | String | The telephone number of a person associated with the investigation. | +| Investigation Person Fax | String | The fax number of a person associated with the investigation. | +| Investigation Person Address | String | The address of a person associated with the investigation. | +| Investigation Person Affiliation | String | The organization affiliation for a person associated with the investigation. 
| +| Investigation Person Roles | String or Ontology Annotation if accompanied by Term Accession Numbers and Term Source REFs | Term to classify the role(s) performed by this person in the context of the investigation, which means that the roles reported here need not correspond to roles held within their affiliated organization. Multiple annotations or values attached to one person can be provided by using a semicolon (“;”, Unicode U+003B) as a separator (e.g. submitter;funder;sponsor). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter is used, the Term Accession Number and Term Source REF fields below are required. | +| Investigation Person Roles Term Accession Number | String | The accession number from the Term Source associated with the selected term. | +| Investigation Person Roles Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.
| + +**Example** + +For example, the `INVESTIGATION CONTACTS` section of an ISA-XLSX `isa.investigation.xlsx` file may look as follows: + +| | | | | +|--------------------------------|----------|----------|-------| +| INVESTIGATION CONTACTS | +| Investigation Person Last Name | Venn | Zimmer | Mühlhaus | +| Investigation Person First Name | Benedikt | David | Timo | +| Investigation Person Mid Initials | | | | +| Investigation Person Email | venn@rptu.de | d_zimmer@rptu.de | timo.muehlhaus@rptu.de | +| Investigation Person Phone | | | | +| Investigation Person Fax | | | | +| Investigation Person Address | TU Kaiserslautern, Kaiserslautern, 67663, Germany | TU Kaiserslautern, Kaiserslautern, 67663, Germany | TU Kaiserslautern, Kaiserslautern, 67663, Germany | +| Investigation Person Affiliation | Computational Systems Biology | Computational Systems Biology | Computational Systems Biology | +| Investigation Person Roles | author | author | corresponding author | +| Investigation Person Roles Term Accession Number | | | | +| Investigation Person Roles Term Source REF | | | | + +## STUDY section + +This section is organized in several subsections, described in detail below. This section also represents a +**repeatable block**, which is replicated according to the number of Studies to report (e.g. for two Studies, two Study +blocks are represented in the Investigation file). The subsections in the block are arranged vertically; the intent +being to enhance readability and presentation, and possibly to help with parsing. These subsections MUST remain within +this repeatable block, although their order MAY vary; the fields MUST remain within their subsection. + +These sections implement the metadata for a `Study` from the ISA Abstract Model and a list of `Assay` (i.e. `Study` and +`Assay` **without** graphs; graphs are implemented in ISA-XLSX as `Annotation Table sheets`). + +### STUDY + +This section MUST contain zero or one values.
+ +This section MUST contain the following labels, with the specified datatypes for values supported: + +| Label | Datatype | Description | +|---------------------------|--------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Study Identifier | String | A unique identifier, either a temporary identifier supplied by users or one generated by a repository or other database. For example, it could be an identifier complying with the LSID specification. | +| Study Title | String | A concise phrase used to encapsulate the purpose and goal of the study. | +| Study Description | String | A textual description of the study, with components such as objective or goals. | +| Study Submission Date | String formatted as ISO8601 date | The date on which the study is submitted to an archive. | +| Study Public Release Date | String formatted as ISO8601 date | The date on which the study SHOULD be released publicly. | +| Study File Name | String formatted as file name or URI | A field to specify the name of the Study Table file corresponding to the definition of that Study. There can be only one file per cell. | + +**Example** + +For example, the `STUDY` section of an ISA-XLSX `isa.investigation.xlsx` file may look as follows: + +| | | +|------------------------|----------| +| STUDY | +| Study Identifier | HeatstressExperiment | +| Study Title | Systems-wide investigation of responses to moderate and acute high temperatures in the green alga Chlamydomonas reinhardtii. | +| Study Description | Algae cultures were grown mixotrophically (TAP). After 24h of 35°C/40°C the cells were shifted back to room temperature for 48h. 'omics samples were taken.
| +| Study Submission Date | 2022-05-13 | +| Study Public Release Date | | +| Study File Name | studies/HeatstressExperiment/isa.study.xlsx | + + +### STUDY DESIGN DESCRIPTORS + +This section MUST contain zero or more values. + +This section MUST contain the following labels, with the specified datatypes for values supported: + +| Label | Datatype | Description | +|-----------------------------------------|------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Study Design Type | String | A term allowing the classification of the study based on the overall experimental design, e.g cross-over design or parallel group design. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. | +| Study Design Type Term Accession Number | String | The accession number from the Term Source associated with the selected term. | +| Study Design Type Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Study Design Term Source REF has to match one the Term Source Name declared in the Ontology Source Reference section. 
| + +**Example** + +For example, the `STUDY DESIGN DESCRIPTORS` section of an ISA-XLSX `isa.investigation.xlsx` file may look as follows: + +| | | | +|--------------------------------|-------------------|-| +| STUDY DESIGN DESCRIPTORS | +| Study Design Type | time series design | heat exposure | +| Study Design Type Term Accession Number | http://purl.obolibrary.org/obo/OBI_0500020 | http://purl.obolibrary.org/obo/XCO_0000308 | +| Study Design Type Term Source REF | OBI | | + + +### STUDY PUBLICATIONS + +This section MUST contain zero or more values. + +This section MUST contain the following labels, with the specified datatypes for values supported: + +| Label | Datatype | Description | +|------------------------------------------------|----------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Study PubMed ID | String formatted as valid PubMed ID | The PubMed IDs of the described publication(s) associated with this study. | +| Study Publication DOI | String formatted as valid DOI | A Digital Object Identifier (DOI) for that publication (where available). | +| Study Publication Author List | String | The list of authors associated with that publication. | +| Study Publication Title | String | The title of publication associated with the investigation. | +| Study Publication Status | String, or Ontology Annotation by providing accompanying Term Accession Number and Term Source REF | A term describing the status of that publication (i.e. submitted, in preparation, published). | +| Study Publication Status Term Accession Number | String or URI | The accession number from the Term Source associated with the selected term. 
| +| Study Publication Status Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one the Term Source Name declared in the in the Ontology Source Reference section. | + +**Example** + +For example, the `STUDY PUBLICATIONS` section of an ISA-XLSX `isa.investigation.xlsx` file may look as follows: + +| | | +|----------------------------------------|------------------| +| STUDY PUBLICATIONS | +| Study Publication PubMed ID | PMC9106746 | +| Study Publication DOI | 10.1038/s42003-022-03359-z | +| Study Publication Author List | Ningning Zhang, Erin M. Mattoon, Will McHargue, Benedikt Venn, David Zimmer, Kresti Pecani, Jooyeon Jeong, Cheyenne M. Anderson, Chen Chen, Jeffrey C. Berry, Ming Xia, Shin-Cheng Tzeng, Eric Becker, Leila Pazouki, Bradley Evans, Fred Cross, Jianlin Cheng, Kirk J. Czymmek, Michael Schroda, Timo Mühlhaus & Ru Zhang | +| Study Publication Title | Systems-wide analysis revealed shared and unique responses to moderate and acute high temperatures in the green alga Chlamydomonas reinhardtii | +| Study Publication Status | published | +| Study Publication Status Term Accession Number | http://purl.org/spar/pso/published | +| Study Publication Status Term Source REF | PSO | + + +### STUDY FACTORS + +This section MUST contain zero or more values. 
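Many label descriptions in this section require that a `Term Source REF` match a Term Source Name declared in the Ontology Source Reference section. A validator can collect every `... Term Source REF` row and report references that are not declared. The sketch below assumes label rows in the same list-of-strings form as the section cells, takes the set of declared names as an input (parsing the Ontology Source Reference section is not shown), and splits cells on `;` so that multi-value fields are checked entry by entry; the function name is hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def undeclared_term_sources(
    rows: Iterable[List[str]], declared: Set[str]
) -> List[Tuple[str, str]]:
    """Return (label, ref) pairs whose Term Source REF is not in `declared`.

    Only rows whose label ends with "Term Source REF" are checked. Cells are split
    on ";" so that multi-value fields are checked entry by entry, and empty
    entries are skipped because the ontology annotation is optional.
    """
    problems: List[Tuple[str, str]] = []
    for label, *values in rows:
        if not label.endswith("Term Source REF"):
            continue
        for cell in values:
            for ref in (part.strip() for part in cell.split(";")):
                if ref and ref not in declared:
                    problems.append((label, ref))
    return problems


rows = [
    ["Study Design Type Term Source REF", "OBI", ""],
    ["Study Publication Status", "published"],
    ["Study Publication Status Term Source REF", "PSO"],
    ["Study Protocol Parameters Term Source REF", "NCIT;AFO", "PATO;PATO"],
]
# Hypothetical declared Term Source Names; in a real file they would come from
# the Ontology Source Reference section.
declared = {"OBI", "NCIT", "PATO"}
print(undeclared_term_sources(rows, declared))
```

With these inputs the sketch flags `PSO` and `AFO`, because neither appears in the declared set.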
+
+This section MUST contain the following labels, with the specified datatypes for values supported:
+
+| Label | Datatype | Description |
+|---|---|---|
+| Study Factor Name | String | The name of one factor used in the Study and/or Assay files. A factor corresponds to an independent variable manipulated by the experimentalist with the intention to affect biological systems in a way that can be measured by an assay. The value of a factor is given in the Study or Assay file, accordingly. If both Study and Assay have a Factor Value, these must be different. |
+| Study Factor Type | String | A term allowing the classification of this factor into categories. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
+| Study Factor Type Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
+| Study Factor Type Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
+
+**Example**
+
+For example, the `STUDY FACTORS` section of an ISA-XLSX `isa.investigation.xlsx` file may look as follows:
+
+| | | |
+|---|---|---|
+| STUDY FACTORS |
+| Study Factor Name | temperature | collection time |
+| Study Factor Type | temperature | time |
+| Study Factor Type Term Accession Number | http://purl.obolibrary.org/obo/PATO_0000146 | http://purl.obolibrary.org/obo/PATO_0000165 |
+| Study Factor Type Term Source REF | PATO | PATO |
+
+
+### STUDY ASSAYS
+
+This section MUST contain zero or more values.
+
+This section MUST contain the following labels, with the specified datatypes for values supported:
+
+| Label | Datatype | Description |
+|---|---|---|
+| Study Assay Measurement Type | String | A term to qualify the endpoint, or what is being measured (e.g. gene expression profiling or protein identification). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
+| Study Assay Measurement Type Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
+| Study Assay Measurement Type Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
+| Study Assay Technology Type | String | Term to identify the technology used to perform the measurement, e.g. DNA microarray, mass spectrometry. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
+| Study Assay Technology Type Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
+| Study Assay Technology Type Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
+| Study Assay Technology Platform | String | Manufacturer and platform name, e.g. Bruker AVANCE. |
+| Study Assay File Name | String | A field to specify the name of the Assay Table file corresponding to the definition of that assay. There can be only one file per cell. |
+
+**Example**
+
+For example, the `STUDY ASSAYS` section of an ISA-XLSX `isa.investigation.xlsx` file may look as follows:
+
+| | | |
+|---|---|---|
+| STUDY ASSAYS |
+| Study Assay File Name | assays/Proteomics/isa.assay.xlsx | assays/Transcriptomics/isa.assay.xlsx |
+| Study Assay Measurement Type | Proteomics | transcription profiling |
+| Study Assay Measurement Type Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C20085 | http://purl.obolibrary.org/obo/OBI_0000424 |
+| Study Assay Measurement Type Term Source REF | NCIT | OBI |
+| Study Assay Technology Type | Mass Spectrometry | nucleotide sequencing |
+| Study Assay Technology Type Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C17156 | http://purl.obolibrary.org/obo/OBI_0000626 |
+| Study Assay Technology Type Term Source REF | NCIT | OBI |
+| Study Assay Technology Platform | Orbitrap Fusion Lumos | Illumina HiSeq 2000 Rapid Run |
+
+
+### STUDY PROTOCOLS
+
+This section MUST contain zero or more values.
+
+This section MUST contain the following labels, with the specified datatypes for values supported:
+
+| Label | Datatype | Description |
+|---|---|---|
+| Study Protocol Name | String | The name of the protocol used within the ISA-XLSX document. The names are used as identifiers within the ISA-XLSX document and will be referenced in the Study and Assay files in the Protocol REF columns. Names can be either local identifiers, unique within the ISA Archive which contains them, or fully qualified external accession numbers. |
+| Study Protocol Type | String | Term to classify the protocol. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
+| Study Protocol Type Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
+| Study Protocol Type Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
+| Study Protocol Description | String | A free-text description of the protocol. |
+| Study Protocol URI | String | Pointer to protocol resources external to the ISA-XLSX document that can be accessed by their Uniform Resource Identifier (URI). |
+| Study Protocol Version | String | An identifier for the version to ensure protocol tracking. |
+| Study Protocol Parameters Name | String | A semicolon-delimited (“;”) list of parameter names, used as an identifier within the ISA-XLSX document. These names are used in the Study and Assay files (in the “Parameter Value []” column heading) to list the values used for each protocol parameter. Refer to section Multiple values fields in the Investigation File on how to encode multiple values in one field and match term sources. |
+| Study Protocol Parameters Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
+| Study Protocol Parameters Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
+| Study Protocol Components Name | String | A semicolon-delimited (“;”) list of a protocol’s components, e.g. instrument names, software names, and reagent names. Refer to section Multiple values fields in the Investigation File on how to encode multiple components in one field and match term sources. |
+| Study Protocol Components Type | String | Term to classify the protocol components listed, for example instrument, software, detector or reagent. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
+| Study Protocol Components Type Term Accession Number | String | The accession number from the Term Source associated with the selected terms. |
+| Study Protocol Components Type Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
+
+**Example**
+
+For example, the `STUDY PROTOCOLS` section of an ISA-XLSX `isa.investigation.xlsx` file may look as follows:
+
+| | | | |
+|---|---|---|---|
+| STUDY PROTOCOLS |
+| Study Protocol Name | Harvesting | Protein extraction | Measurement |
+| Study Protocol Type | Biospecimen Collection | nucleic acid extraction | nucleic acid extraction |
+| Study Protocol Type Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C70945 | | |
+| Study Protocol Type Term Source REF | NCIT | | |
+| Study Protocol Description | Extraction and storage of algae cells from photo-bio reactor. Extracted and centrifuged cell pellets were frozen in liquid nitrogen. | Proteins were extracted from cells using a combination of chemical (lysis buffer) and physical (sonicator) methods. Digested peptides were purified and resuspended in LC loading buffer. | Peptides were separated by a nanoHPLC (C18 column) and detected using an Orbitrap mass spectrometry device. |
+| Study Protocol URI | | | |
+| Study Protocol Version | | | |
+| Study Protocol Parameters Name | Centrifugation Time;sample volume setting | frequency; duration | duration;flow rate |
+| Study Protocol Parameters Name Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C178881;http://purl.allotrope.org/ontologies/result#AFR_0002492 | http://purl.obolibrary.org/obo/PATO_0000044;http://purl.obolibrary.org/obo/PATO_0001309 | http://purl.obolibrary.org/obo/PATO_0001309;http://purl.obolibrary.org/obo/PATO_0001574 |
+| Study Protocol Parameters Name Term Source REF | NCIT;AFO | PATO;PATO | PATO;PATO |
+| Study Protocol Components Name | liquid nitrogen | Sonicator; Extraction Kit | HPLC; Column; MS |
+| Study Protocol Components Type | Liquid Nitrogen | VWR Aquasonic 250D; IST sample preparation kit (PreOmics GmbH, Germany) | U3000 RSLCnano HPLC; C18 column (Fritted Glass Column, 25 cm × 75 μm); Orbitrap Fusion Lumos |
+| Study Protocol Components Type Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C68796 | | ;;http://purl.obolibrary.org/obo/MS_1002732 |
+| Study Protocol Components Type Term Source REF | NCIT | | ;;MS |
+
+
+### STUDY CONTACTS
+
+This section MUST contain zero or more values.
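The `STUDY PROTOCOLS` example packs several terms into one cell (for instance `Centrifugation Time;sample volume setting` with `NCIT;AFO`) and leaves positions empty where a term has no annotation (`;;MS`). One way to read such cells is to split each on `;` and pair entries by position. That positional pairing is an assumption of this sketch, since the detailed rules are deferred to the Multiple values fields section, which is not part of this file; the function and type names are hypothetical.

```python
from itertools import zip_longest
from typing import List, NamedTuple


class Term(NamedTuple):
    name: str
    accession: str
    source_ref: str


def _parts(cell: str) -> List[str]:
    """Split a semicolon-delimited cell; an empty cell yields no entries."""
    return [part.strip() for part in cell.split(";")] if cell.strip() else []


def align_terms(names: str, accessions: str, refs: str) -> List[Term]:
    """Pair the i-th name with the i-th accession and the i-th Term Source REF.

    Missing trailing annotations become empty strings; more annotations than
    names is treated as an error.
    """
    n, a, r = _parts(names), _parts(accessions), _parts(refs)
    if len(a) > len(n) or len(r) > len(n):
        raise ValueError("more annotations than names")
    return [Term(*entry) for entry in zip_longest(n, a, r, fillvalue="")]


params = align_terms(
    "Centrifugation Time;sample volume setting",
    "http://purl.obolibrary.org/obo/NCIT_C178881;http://purl.allotrope.org/ontologies/result#AFR_0002492",
    "NCIT;AFO",
)
print(params[1].source_ref)  # AFO
```

Raising on surplus annotations, rather than silently truncating them, is a deliberate choice here so that shifted columns are noticed.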
+
+This section MUST contain the following labels, with the specified datatypes for values supported:
+
+| Label | Datatype | Description |
+|---|---|---|
+| Study Person Last Name | String | The last name of a person associated with the study. |
+| Study Person First Name | String | The first name of a person associated with the study. |
+| Study Person Mid Initials | String | The middle initials of a person associated with the study. |
+| Study Person Email | String formatted as email | The email address of a person associated with the study. |
+| Study Person Phone | String | The telephone number of a person associated with the study. |
+| Study Person Fax | String | The fax number of a person associated with the study. |
+| Study Person Address | String | The address of a person associated with the study. |
+| Study Person Affiliation | String | The organization affiliation for a person associated with the study. |
+| Study Person Roles | String or Ontology Annotation if accompanied by Term Accession Numbers and Term Source REFs | Term to classify the role(s) performed by this person in the context of the study, which means that the roles reported here need not correspond to roles held within their affiliated organization. Multiple annotations or values attached to one person can be provided by using a semicolon (“;”, Unicode U+003B) as a separator (e.g. submitter;funder;sponsor). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
+| Study Person Roles Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
+| Study Person Roles Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
+
+**Example**
+
+For example, the `STUDY CONTACTS` section of an ISA-XLSX `isa.investigation.xlsx` file may look as follows:
+
+| | | | |
+|---|---|---|---|
+| STUDY CONTACTS |
+| Study Person Last Name | Venn | Zimmer | Mühlhaus |
+| Study Person First Name | Benedikt | David | Timo |
+| Study Person Mid Initials | | | |
+| Study Person Email | venn@bio.rptu.de | d_zimmer@rptu.de | timo.muehlhaus@rptu.de |
+| Study Person Phone | | | |
+| Study Person Fax | | | |
+| Study Person Address | TU Kaiserslautern, Kaiserslautern, 67663, Germany | TU Kaiserslautern, Kaiserslautern, 67663, Germany | TU Kaiserslautern, Kaiserslautern, 67663, Germany |
+| Study Person Affiliation | Computational Systems Biology | Computational Systems Biology | Computational Systems Biology |
+| Study Person Roles | author | author | corresponding author |
+| Study Person Roles Term Accession Number | | | |
+| Study Person Roles Term Source REF | | | |
+
+
+## ASSAY section
+
+This section is organized in several subsections, described in detail below. The subsections in the block are arranged vertically; the intent being to enhance readability and presentation, and possibly to help with parsing. These subsections MUST remain within
+this block; the fields MUST remain within their subsection.
+
+These sections implement the metadata for an `Assay` from the ISA Abstract Model.
+
+### ASSAY
+
+This section MUST contain zero or one values.
+
+This section MUST contain the following labels, with the specified datatypes for values supported:
+
+| Label | Datatype | Description |
+|---|---|---|
+| Assay Measurement Type | String | A term to qualify the endpoint, or what is being measured (e.g. gene expression profiling or protein identification). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
+| Assay Measurement Type Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
+| Assay Measurement Type Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
+| Assay Technology Type | String | Term to identify the technology used to perform the measurement, e.g. DNA microarray, mass spectrometry. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
+| Assay Technology Type Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
+| Assay Technology Type Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
+| Assay Technology Platform | String | Manufacturer and platform name, e.g. Bruker AVANCE. |
+| Assay File Name | String | A field to specify the name of the Assay Table file corresponding to the definition of that assay. There can be only one file per cell. |
+
+**Example**
+
+For example, the `ASSAY` section of an ISA-XLSX `isa.assay.xlsx` file may look as follows:
+
+| | |
+|---|---|
+| ASSAY |
+| Assay File Name | assays/Proteomics/isa.assay.xlsx |
+| Assay Measurement Type | Proteomics |
+| Assay Measurement Type Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C20085 |
+| Assay Measurement Type Term Source REF | NCIT |
+| Assay Technology Type | Mass Spectrometry |
+| Assay Technology Type Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C17156 |
+| Assay Technology Type Term Source REF | NCIT |
+| Assay Technology Platform | Orbitrap Fusion Lumos |
+
+
+### ASSAY PERFORMERS
+
+This section MUST contain zero or more values.
+
+This section MUST contain the following labels, with the specified datatypes for values supported:
+
+| Label | Datatype | Description |
+|---|---|---|
+| Assay Person Last Name | String | The last name of a person associated with the assay. |
+| Assay Person First Name | String | The first name of a person associated with the assay. |
+| Assay Person Mid Initials | String | The middle initials of a person associated with the assay. |
+| Assay Person Email | String formatted as email | The email address of a person associated with the assay. |
+| Assay Person Phone | String | The telephone number of a person associated with the assay. |
+| Assay Person Fax | String | The fax number of a person associated with the assay. |
+| Assay Person Address | String | The address of a person associated with the assay. |
+| Assay Person Affiliation | String | The organization affiliation for a person associated with the assay. |
+| Assay Person Roles | String or Ontology Annotation if accompanied by Term Accession Numbers and Term Source REFs | Term to classify the role(s) performed by this person in the context of the assay, which means that the roles reported here need not correspond to roles held within their affiliated organization. Multiple annotations or values attached to one person can be provided by using a semicolon (“;”, Unicode U+003B) as a separator (e.g. submitter;funder;sponsor). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
+| Assay Person Roles Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
+| Assay Person Roles Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
+
+**Example**
+
+For example, the `ASSAY PERFORMERS` section of an ISA-XLSX `isa.assay.xlsx` file may look as follows:
+
+| | | | |
+|---|---|---|---|
+| ASSAY PERFORMERS |
+| Assay Person Last Name | Zhang | Tzeng | Evans |
+| Assay Person First Name | Ningning | Shin-Cheng | Bradley |
+| Assay Person Mid Initials | | | |
+| Assay Person Email | | | |
+| Assay Person Phone | | | |
+| Assay Person Fax | | | |
+| Assay Person Address | St. Louis, Missouri 63132, USA | St. Louis, Missouri 63132, USA | St. Louis, Missouri 63132, USA |
+| Assay Person Affiliation | Donald Danforth Plant Science Center | Donald Danforth Plant Science Center | Donald Danforth Plant Science Center |
+| Assay Person Roles | Investigator | Laboratory Technologist | Laboratory Technologist |
+| Assay Person Roles Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C25936 | http://purl.obolibrary.org/obo/NCIT_C51830 | http://purl.obolibrary.org/obo/NCIT_C51830 |
+| Assay Person Roles Term Source REF | NCIT | NCIT | NCIT |
+
+
+# Annotation Table sheets
+
+In the `Annotation Table sheets`, column headers MUST have the first letter of each word in upper case, with the exception of the referencing label (REF).
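The capitalization rule for column headers can be checked mechanically. The sketch below makes two assumptions that the text does not state outright: the rule applies only to the static words of a header, not to the free-text name inside `[...]` or the CURIE inside `(...)` (the examples in this file use lower-case names such as `Characteristic [organ part]`), and the exception means `REF` is written entirely in capitals. The function name is hypothetical.

```python
import re

# Bracketed "[...]" and parenthesised "(...)" parts hold free-text names or CURIEs,
# so only the static words outside them are checked.
_QUALIFIER = re.compile(r"\s*[\[(][^\])]*[\])]")


def header_is_well_cased(header: str) -> bool:
    """Check that every static word starts upper-case and that REF is all capitals."""
    words = _QUALIFIER.sub("", header).split()
    if not words:
        return False
    for word in words:
        if word.upper() == "REF":
            if word != "REF":
                return False
        elif not word[0].isupper():
            return False
    return True


print(header_is_well_cased("Term Source REF (UBERON:0000064)"))  # True
print(header_is_well_cased("Protocol Ref"))  # False
```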
+
+The content of the annotation table MUST be placed in an `xlsx table` whose name starts with `annotationTable`. Each sheet MUST contain at most one such annotation table. Only cells inside this table are considered as part of the formatted metadata.
+
+`Annotation Table sheets` are structured with fields organized on a per-row basis. The first row MUST be used for column headers. Each body row is an implementation of a `Process` node.
+
+## Inputs and Outputs
+
+Each annotation table sheet MUST contain an `Input` and an `Output` column, which denote the Input and Output node of the `Process` node respectively. They MUST be formatted in the pattern `Input []` and `Output []`.
+
+`NodeTypes` MUST be one of the following:
+
+- A `Source` MUST be indicated with the node type `Source Name`. `Sources` MUST NOT be used as `Output` nodes.
+
+- A `Sample` MUST be indicated with the node type `Sample Name`.
+
+- An `Extract Material` MUST be indicated with the node type `Extract Name`.
+
+- A `Labeled Extract Material` MUST be indicated with the node type `Labeled Extract Name`.
+
+- An `Image File` MUST be indicated with the node type `Image File`.
+
+- A `Raw Data File` MUST be indicated with the node type `Raw Data File`.
+
+- A `Derived Data File` MUST be indicated with the node type `Derived Data File`.
+
+`Source Names`, `Sample Names`, `Extract Names` and `Labeled Extract Names` MUST be unique across an ARC: if two of these entities in the same ARC have the same name, they are considered the same entity.
+
+`Image File`, `Raw Data File` or `Derived Data File` node types MUST correspond to a relevant file location.
+
+## Protocol Columns
+
+`Protocol REF` columns MAY be used to specify the name of the `Protocol` node implemented by the `Process` node. Per Annotation Table sheet there MUST be at most one `Protocol REF` column. The value MUST be free text.
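The header rules for `Input`, `Output` and `Protocol REF` columns stated above can be checked on the first row of an annotation table alone. The sketch below assumes the header row is available as a list of strings (reading the `annotationTable` object from the workbook is not shown) and that an I/O header is exactly the keyword, one space, and the node type in square brackets, as in the example tables; the names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Node types listed in the Inputs and Outputs section.
NODE_TYPES = {
    "Source Name", "Sample Name", "Extract Name", "Labeled Extract Name",
    "Image File", "Raw Data File", "Derived Data File",
}
_IO_HEADER = re.compile(r"^(Input|Output) \[(.+)\]$")


def parse_io_header(header: str) -> Optional[Tuple[str, str]]:
    """Return ("Input" or "Output", node type), or None for any other column."""
    match = _IO_HEADER.match(header)
    return (match.group(1), match.group(2)) if match else None


def check_headers(headers: List[str]) -> List[str]:
    """Collect violations of the Input/Output and Protocol REF rules for one table."""
    errors: List[str] = []
    io = [p for p in map(parse_io_header, headers) if p is not None]
    for direction in ("Input", "Output"):
        if not any(d == direction for d, _ in io):
            errors.append(f"missing {direction} column")
    for direction, node_type in io:
        if node_type not in NODE_TYPES:
            errors.append(f"unknown node type: {node_type}")
        elif direction == "Output" and node_type == "Source Name":
            errors.append("Source Name must not be used as Output")
    if headers.count("Protocol REF") > 1:
        errors.append("more than one Protocol REF column")
    return errors


print(check_headers(["Input [Source Name]", "Protocol REF", "Output [Sample Name]"]))  # []
```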
+
+`Protocol Version` columns MAY be used to specify the version of the `Protocol` node implemented by the `Process` node. Per Annotation Table sheet there MUST be at most one `Protocol Version` column. The value MUST be free text.
+
+`Protocol Description` columns MAY be used to specify the description of the `Protocol` node implemented by the `Process` node. Per Annotation Table sheet there MUST be at most one `Protocol Description` column. The value MUST be free text.
+
+`Protocol Uri` columns MAY be used to specify the URI of the `Protocol` node implemented by the `Process` node. Per Annotation Table sheet there MUST be at most one `Protocol Uri` column. The value MUST be either a URI or a file path corresponding to a relevant protocol file location.
+
+`Protocol Type` columns MAY be used to specify the type of the `Protocol` node implemented by the `Process` node. Per Annotation Table sheet there MUST be at most one `Protocol Type` column. The value MUST be free text, or an [`Ontology Annotation`](#ontology-annotations).
+
+
+## Ontology Annotations
+
+Where a value is an `Ontology Annotation` in a table file, `Term Accession Number` and `Term Source REF` columns MUST follow the column in which the value is entered. These two columns SHOULD contain further ontological information about the header. In this case, following the static header string, separated by a single space, there MUST be a short ontology term identifier formatted as a CURIE (prefixed identifier) of the form `:` (specified [here](http://obofoundry.org/id-policy)) inside `()` brackets.
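The header convention for annotation columns (static header, one space, then a CURIE in parentheses) lends itself to a small parser that also confirms that two adjacent columns form a matching pair. In the sketch below, the character sets accepted for the CURIE prefix and local identifier are assumptions, chosen loosely rather than taken from the OBO identifier policy, and the pair may appear in either order because the text does not fix one; the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# "<static header> (<prefix>:<local id>)" for the two annotation column kinds.
_ANNOTATION = re.compile(
    r"^(Term Source REF|Term Accession Number) \(([A-Za-z][A-Za-z0-9_.-]*):([^\s()]+)\)$"
)


def parse_annotation_header(header: str) -> Optional[Tuple[str, str, str]]:
    """Split an annotation header into (static header, CURIE prefix, local id)."""
    match = _ANNOTATION.match(header)
    return match.groups() if match else None


def annotation_curie(headers: List[str], start: int) -> Optional[Tuple[str, str]]:
    """Return the shared CURIE if headers[start] and headers[start + 1] form an annotation pair."""
    if start < 0 or start + 1 >= len(headers):
        return None
    first = parse_annotation_header(headers[start])
    second = parse_annotation_header(headers[start + 1])
    if first is None or second is None:
        return None
    if {first[0], second[0]} != {"Term Source REF", "Term Accession Number"}:
        return None
    return first[1:] if first[1:] == second[1:] else None


headers = ["Parameter [temperature]", "Unit",
           "Term Source REF (PATO:0000146)", "Term Accession Number (PATO:0000146)"]
print(annotation_curie(headers, 2))  # ('PATO', '0000146')
```

A header such as `Term Source REF (NCIT_C81182)`, which uses an underscore instead of a colon, is rejected by this parser because it is not a CURIE.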
+For example, a characteristic type `organism` with a value of `Homo sapiens` can be qualified with an `Ontology Annotation` of a term from NCBI Taxonomy as follows: + +| Characteristics [organism] | Term Source REF (OBI:0100026) | Term Accession Number (OBI:0100026) | +|-----------------------------|-------------------|------------------------------------------------------| +| Homo sapiens | NCBITaxon | [http://…/NCBITAXON/9606](http://.../NCBITAXON/9606) | + +An `Ontology Annotation` MAY be applied to any appropriate `Characteristic`, `Parameter`, `Factor`, `Component` or `Protocol Type`. + +This implements `Ontology Annotation` from the ISA Abstract Model. + +## Unit + +Where a value is numeric, a `Unit` MAY be used to qualify the quantity. In this case, following the column in which a `Unit` +is used, a `Unit` heading MUST be present, and MAY be further annotated as an [`Ontology Annotation`](#ontology-annotations). + +For example, to qualify the value `300` with a `Unit` `Kelvin` qualified as an [`Ontology Annotation`](#ontology-annotations) from the Units Ontology declared +in the Ontology Sources with `UO`: + +| Parameter [temperature] | Unit | Term Source REF (PATO:0000146) | Term Accession Number (PATO:0000146) | +|--------------------------------|--------|-------------------|------------------------------------------------------| +| 300 | Kelvin | UO | [http://…/obo/UO_0000012](http://.../obo/UO_0000012) | + + + +## Characteristics + +A `Characteristic` is used as an attribute column following [`Sources`](#inputs-and-outputs) and [`Samples`](#inputs-and-outputs). This column contains terms describing each material according to the characteristics category indicated in the column header in the pattern `Characteristic []`. +For example, a column header `Characteristic [organ part]` would contain terms describing an organ part. `Characteristic` SHOULD be used as an attribute column following `Input [Source Name]`, or `Input [Sample Name]`. 
+The value MUST be free text, numeric, or an [`Ontology Annotation`](#ontology-annotations).
+
+For example, a characteristic type `organ part` with a value of `Liver` can be qualified with an [`Ontology Annotation`](#ontology-annotations) of a term from MeSH as follows:
+
+| Characteristic [organ part] | Term Source REF (UBERON:0000064) | Term Accession Number (UBERON:0000064) |
+|-----------------------------|----------------------------------|----------------------------------------|
+| Liver                       | MeSH                             | D008099                                |
+
+## Factors
+
+A `Factor` is an independent variable manipulated by an experimentalist with the intention of affecting biological systems in a way that can be measured by an assay. The column holds the actual values for the `Factor` named between the square brackets, for example `Factor [compound]`; this name MUST match a factor declared in the `Study Factors` section of a top-level metadata sheet. The value MUST be free text, numeric, or an [`Ontology Annotation`](#ontology-annotations).
+
+| Factor [Gender] | Term Source REF (NCIT:C17357) | Term Accession Number (NCIT:C17357) |
+|-----------------|-------------------------------|-------------------------------------|
+| Male            | MeSH                          | D008297                             |
+
+## Components
+
+A `Component` is a consumable or reusable physical entity used in the experimental workflow. It follows the pattern `Component [<component type>]`. The value MUST be free text, numeric, or an [`Ontology Annotation`](#ontology-annotations).
+
+| Component [Measurement Device] | Term Source REF (NCIT:C81182) | Term Accession Number (NCIT:C81182) |
+|--------------------------------|-------------------------------|------------------------------------------------------------------------|
+| Illumina MiniSeq               | OBI                           | [http://…/obo/OBI_0003114](http://purl.obolibrary.org/obo/OBI_0003114) |
+
+## Parameters
+
+A `Parameter` can be used to specify any additional information about the experimental setup that does not fall under the three categories above (characteristics, factors and components). It follows the pattern `Parameter [<parameter type>]`.
+The value MUST be free text, numeric, or an [`Ontology Annotation`](#ontology-annotations).
+
+| Parameter [time] | Unit | Term Source REF (PATO:0000165) | Term Accession Number (PATO:0000165) |
+|------------------|------|--------------------------------|----------------------------------------------------------------------|
+| 300              | hour | UO                             | [http://…/obo/UO_0000032](http://purl.obolibrary.org/obo/UO_0000032) |
+
+## Others
+
+Columns whose headers do not follow any of the formats described above are considered additional payload and are out of the scope of this specification.
+
+## Examples
+
+A simple [source](#inputs-and-outputs)-to-[sample](#inputs-and-outputs) process may be represented as:
+
+| Input [Source Name] | Protocol REF      | Output [Sample Name] |
+|---------------------|-------------------|----------------------|
+| source1             | sample collection | sample1              |
+
+Where a graph splits or pools, the same node name is repeated across rows in the [Input](#inputs-and-outputs) or [Output](#inputs-and-outputs) column.
+
+For example, if we split a source into two samples, we might represent this as:
+
+| Input [Source Name] | Protocol REF      | Output [Sample Name] |
+|---------------------|-------------------|----------------------|
+| source1             | sample collection | sample1              |
+| source1             | sample collection | sample2              |
+
+If we pool two sources into a single sample, we might represent this as:
+
+| Input [Source Name] | Protocol REF      | Output [Sample Name] |
+|---------------------|-------------------|----------------------|
+| source1             | sample collection | sample1              |
+| source2             | sample collection | sample1              |
\ No newline at end of file
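Because splits and pools are expressed only by repeating node names, a reader reconstructs the graph by treating equal names as the same node. The following is a minimal Python sketch, assuming the rows have already been read into `(input, protocol, output)` tuples. The names `build_graph` and `inputs_of` are hypothetical and not part of any ARC tooling. A split appears as one input with several outputs, and a pool as one output with several inputs.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# One annotation table row: (Input [Source Name], Protocol REF, Output [Sample Name]).
Row = Tuple[str, str, str]


def build_graph(rows: List[Row]) -> Dict[str, Set[str]]:
    """Collect directed input -> output edges; repeated names denote the same node."""
    edges: Dict[str, Set[str]] = defaultdict(set)
    for source, _protocol, sample in rows:
        edges[source].add(sample)
    return dict(edges)


def inputs_of(graph: Dict[str, Set[str]], node: str) -> Set[str]:
    """Return every input with an edge into `node`; more than one means pooling."""
    return {src for src, outputs in graph.items() if node in outputs}


if __name__ == "__main__":
    split = build_graph([
        ("source1", "sample collection", "sample1"),
        ("source1", "sample collection", "sample2"),
    ])
    pool = build_graph([
        ("source1", "sample collection", "sample1"),
        ("source2", "sample collection", "sample1"),
    ])
    print(sorted(split["source1"]))          # ['sample1', 'sample2']
    print(sorted(inputs_of(pool, "sample1")))  # ['source1', 'source2']
```

The sketch deliberately ignores the `Protocol REF` value when building edges. A fuller reader would also attach it to each edge as the `Process` that connects the two nodes.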