Clarify plate and well specifications for sparse plates #24

Merged
merged 6 commits into from
Feb 2, 2022
172 changes: 155 additions & 17 deletions latest/index.bs
@@ -156,7 +156,7 @@ High-content screening {#hcs-layout}
------------------------------------

The following specification defines the hierarchy for a high-content screening
dataset. Three groups must be defined above the images:
dataset. Three groups MUST be defined above the images:

- the group above the images defines the well and MUST implement the
[well specification](#well-md). All images contained in a well are fields
@@ -166,6 +166,9 @@ dataset. Three groups must be defined above the images:
collection of wells organized in rows and columns. It MUST implement the
[plate specification](#plate-md)

A well row group MUST NOT be present if there are no images in the well row.
A well group MUST NOT be present if there are no images in the well.


```
. # Root folder, potentially in S3,
@@ -365,8 +368,9 @@ custom attributes of the plate group under the `plate` key.
<dt><strong>columns</strong></dt>
<dd>A list of JSON objects defining the columns of the plate. Each column
object defines the properties of the column at the index of the object
in the list. If not empty, it MUST contain a `name` key specifying the
column name.</dd>
in the list. Each column in the physical plate MUST be defined, even
if no wells in the column are defined. Each defined column MUST contain
a `name` key specifying the column name.</dd>
<dt><strong>field_count</strong></dt>
<dd>An integer defining the maximum number of fields per view across all
wells.</dd>
@@ -375,17 +379,24 @@ custom attributes of the plate group under the `plate` key.
<dt><strong>rows</strong></dt>
<dd>A list of JSON objects defining the rows of the plate. Each row object
defines the properties of the row at the index of the object in the
list. If not empty, it MUST contain a `name` key specifying the row
name.</dd>
list. Each row in the physical plate MUST be defined, even if no wells
in the row are defined. Each defined row MUST contain a `name` key
specifying the row name.</dd>
<dt><strong>version</strong></dt>
<dd>A string defining the version of the specification.</dd>
<dt><strong>wells</strong></dt>
<dd>A list of JSON objects defining the wells of the plate. Each well object
MUST contain a `path` key identifying the path to the well subgroup.</dd>
MUST contain a `path` key identifying the path to the well subgroup.
The `path` MUST consist of a `name` in the `rows` list, a file separator (`/`),
and a `name` from the `columns` list, in that order. The `path` MUST NOT contain
additional leading or trailing directories.
sbesson marked this conversation as resolved.
Each well object MUST contain both a `row_index` key identifying the index into
Member

Regarding #24 (comment), are there cases where it is not possible to recompute these indexes based on the knowledge of the individual wells path as well as the rows and names dictionaries? If recomputing is always possible (but at the cost of the consumer), my primary consideration is whether the recommendation for these new fields should be SHOULD rather than MUST.

For real-world examples, I can definitely see how row_index/column_index makes sense in terms of optimizing some of the queries. In addition to testing this with sparse plates, it will be useful to also generate a representative plate with many wells (384 at least) to check there is no performance impact with the extra JSON metadata.

Member

In order for these indexes to be forward or reverse computable, path would need to be much more explicitly defined than it is now:

A list of JSON objects defining the fields of views for a given well. Each object MUST contain a path key identifying the path to the field of view. If multiple acquisitions were performed in the plate, it SHOULD contain an acquisition key identifying the id of the acquisition which must match one of acquisition JSON objects defined in the plate metadata.

Furthermore, the wells array would need to have null or similar padding in order for those indexes to make sense.

Neither of these things are ideal obviously. I don't think there's a way to not have these things be MUST if we want to guarantee that lookups can happen based on physical plate characteristics.

the `rows` list and a `column_index` key identifying the index into
the `columns` list. `row_index` and `column_index` MUST be 0-based.</dd>
Member

I realise that #70 has been added after this PR was opened, but the decision there means these new attributes should now be named rowIndex and columnIndex.

Member Author

Should be fixed in 7c2536a.
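
As a rough illustration of the recomputation discussed in this thread, the sketch below derives the two indexes from a well `path`. It assumes the path is exactly `<row name>/<column name>`, as the proposed wording requires, and that row and column names are unique within their lists. The function name `well_indexes` is hypothetical and not part of the specification.

```python
def well_indexes(path, rows, columns):
    """Return (row_index, column_index) for a well path such as "C/5".

    Assumption: `path` is "<row name>/<column name>" with no extra
    directories, and `rows`/`columns` are the lists from the plate metadata.
    """
    parts = path.split("/")
    if len(parts) != 2:
        raise ValueError(f"unexpected well path: {path!r}")
    row_name, column_name = parts
    row_names = [row["name"] for row in rows]
    column_names = [column["name"] for column in columns]
    # list.index raises ValueError if the name is not defined in the plate.
    return row_names.index(row_name), column_names.index(column_name)


# A 96-well layout: rows A-H, columns 1-12.
rows = [{"name": name} for name in "ABCDEFGH"]
columns = [{"name": str(number)} for number in range(1, 13)]
print(well_indexes("C/5", rows, columns))  # (2, 4)
```

The lookup is linear in the number of rows and columns, which is the consumer-side cost mentioned above; storing the indexes in each well object avoids it.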

</dl>

For example the following JSON object defines a plate with two acquisition and
6 wells (2 rows and 3 columns), containing up 2 fields of view per acquistion.
For example the following JSON object defines a plate with two acquisitions and
6 wells (2 rows and 3 columns), containing up to 2 fields of view per acquisition.

```json
"plate": {
@@ -427,22 +438,129 @@ For example the following JSON object defines a plate with two acquisition and
"version": "0.1",
"wells": [
{
"path": "2020-10-10/A/1"
"path": "A/1",
"row_index": 0,
"column_index": 0
},
{
"path": "A/2",
"row_index": 0,
"column_index": 1
},
{
"path": "A/3",
"row_index": 0,
"column_index": 2
},
{
"path": "B/1",
"row_index": 1,
"column_index": 0
},
{
"path": "B/2",
"row_index": 1,
"column_index": 1
},
{
"path": "B/3",
"row_index": 1,
"column_index": 2
}
]
}
```
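
The constraints on `wells` entries can be checked mechanically. The following is a minimal sketch, assuming the plate metadata has already been loaded into a Python dict and uses the `row_index`/`column_index` key names shown in this diff. The function name `check_wells` is hypothetical.

```python
def check_wells(plate):
    """Return a list of problems found in the `wells` entries of a plate dict."""
    row_names = [row["name"] for row in plate["rows"]]
    column_names = [column["name"] for column in plate["columns"]]
    problems = []
    for well in plate["wells"]:
        path = well.get("path")
        row_index = well.get("row_index")
        column_index = well.get("column_index")
        if path is None or row_index is None or column_index is None:
            problems.append(f"{well!r}: missing path, row_index or column_index")
            continue
        # Indexes are 0-based positions in the `rows` and `columns` lists.
        in_range = (
            isinstance(row_index, int)
            and isinstance(column_index, int)
            and 0 <= row_index < len(row_names)
            and 0 <= column_index < len(column_names)
        )
        if not in_range:
            problems.append(f"{path}: index out of range")
            continue
        # The path must be "<row name>/<column name>" with nothing else.
        expected = f"{row_names[row_index]}/{column_names[column_index]}"
        if path != expected:
            problems.append(f"{path}: expected {expected}")
    return problems


plate = {
    "rows": [{"name": "A"}, {"name": "B"}],
    "columns": [{"name": "1"}, {"name": "2"}, {"name": "3"}],
    "wells": [
        {"path": "A/1", "row_index": 0, "column_index": 0},
        {"path": "2020-10-10/B/3", "row_index": 1, "column_index": 2},
    ],
}
print(check_wells(plate))  # ['2020-10-10/B/3: expected B/3']
```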

The following JSON object defines a sparse plate with one acquisition and
2 wells in a 96 well plate, containing one field of view per acquisition.

```json
"plate": {
"acquisitions": [
{
"id": 1,
"maximumfieldcount": 1,
"name": "single acquisition",
"starttime": 1343731272000
}
],
"columns": [
{
"name": "1"
},
{
"name": "2"
},
{
"name": "3"
},
{
"name": "4"
},
{
"name": "5"
},
{
"name": "6"
},
{
"name": "7"
},
{
"path": "2020-10-10/A/2"
"name": "8"
},
{
"path": "2020-10-10/A/3"
"name": "9"
},
{
"path": "2020-10-10/B/1"
"name": "10"
},
{
"path": "2020-10-10/B/2"
"name": "11"
},
{
"path": "2020-10-10/B/3"
"name": "12"
}
],
"field_count": 1,
"name": "sparse test",
"rows": [
{
"name": "A"
},
{
"name": "B"
},
{
"name": "C"
},
{
"name": "D"
},
{
"name": "E"
},
{
"name": "F"
},
{
"name": "G"
},
{
"name": "H"
}
],
"version": "0.1",
"wells": [
{
"path": "C/5",
"row_index": 2,
"column_index": 4
},
{
"path": "D/7",
"row_index": 3,
"column_index": 6
}
]
}
@@ -452,23 +570,23 @@ For example the following JSON object defines a plate with two acquisition and
--------------------------

For high-content screening datasets, the metadata about all fields of views
under a given well can be found under the "well" key in the attributes of the
under a given well can be found under the "well" key in the attributes of the
well group.

<dl>
<dt><strong>images</strong></dt>
<dd>A list of JSON objects defining the fields of views for a given well.
Each object MUST contain a `path` key identifying the path to the
field of view. If multiple acquisitions were performed in the plate, it
SHOULD contain an `acquisition` key identifying the id of the
MUST contain an `acquisition` key identifying the id of the
acquisition which must match one of acquisition JSON objects defined in
the plate metadata.</dd>
<dt><strong>version</strong></dt>
<dd>A string defining the version of the specification.</dd>
</dl>

For example the following JSON object defines a well with four fields of
views. The first two fields of view were part of the first acquisition while
view. The first two fields of view were part of the first acquisition while
the last two fields of view were part of the second acquisition.

```json
@@ -495,6 +613,26 @@ the last two fields of view were part of the second acquisition.
}
```

The following JSON object defines a well with two fields of view in a plate with
four acquisitions. The first field is part of the first acquisition, and the second
field is part of the last acquisition.

```json
"well": {
"images": [
{
"acquisition": 0,
"path": "0"
},
{
"acquisition": 3,
"path": "1"
}
],
"version": "0.1"
}
```
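
The link between a well's `acquisition` values and the plate's `acquisitions` list can be verified with a short check. This is a sketch only, assuming both metadata objects are available as Python dicts. The function name `check_well_acquisitions` and the sample plate below are hypothetical.

```python
def check_well_acquisitions(well, plate):
    """Return a list of problems with the `acquisition` keys of a well's images."""
    acquisition_ids = {acq["id"] for acq in plate.get("acquisitions", [])}
    problems = []
    for image in well["images"]:
        acquisition = image.get("acquisition")
        if acquisition is None:
            # The key is only required when the plate has several acquisitions.
            if len(acquisition_ids) > 1:
                problems.append(f"{image['path']}: missing acquisition")
        elif acquisition not in acquisition_ids:
            problems.append(f"{image['path']}: unknown acquisition {acquisition}")
    return problems


# Hypothetical plate with four acquisitions, ids 0 to 3.
plate = {"acquisitions": [{"id": i} for i in range(4)]}
well = {
    "images": [
        {"acquisition": 0, "path": "0"},
        {"acquisition": 3, "path": "1"},
    ],
    "version": "0.1",
}
print(check_well_acquisitions(well, plate))  # []
```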

Implementations {#implementations}
==================================
