Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define table specifications V1 #529

Closed
3 tasks
tcompa opened this issue Sep 19, 2023 · 3 comments · Fixed by #582
Closed
3 tasks

Define table specifications V1 #529

tcompa opened this issue Sep 19, 2023 · 3 comments · Fixed by #582
Labels
documentation Improvements or additions to documentation High Priority Current Priorities & Blocking Issues Tables AnnData and ROI/feature tables

Comments

@tcompa
Copy link
Collaborator

tcompa commented Sep 19, 2023

  • Add a version to the attributes of all ROI-table Zarr groups. Maybe something like "fractal_roi_table_version": "1"? Note that for the moment we are not planning to do anything with it, but in this way the newly-created Zarrs will include this information.
  • Add a ROI-v1 docs page, with a description of the current status
  • Include a brief list of ongoing discussions or aspects that will likely change in the future. These could include both shorter-term perspectives (e.g. not hard-coding micrometer in column names) and longer-term ones (e.g. re-evaluate the choice of AnnData).
@tcompa tcompa added documentation Improvements or additions to documentation Tables AnnData and ROI/feature tables labels Sep 19, 2023
@tcompa tcompa changed the title Placeholder: Write docs page for fractal-tasks-core ROIs Write docs page for fractal-tasks-core ROIs Oct 12, 2023
@tcompa tcompa added the High Priority Current Priorities & Blocking Issues label Oct 12, 2023
@tcompa tcompa changed the title Write docs page for fractal-tasks-core ROIs Write docs page for fractal-tasks-core ROIs v1 Oct 19, 2023
@tcompa tcompa changed the title Write docs page for fractal-tasks-core ROIs v1 Define ROI-table V1 Oct 19, 2023
@tcompa tcompa linked a pull request Oct 19, 2023 that will close this issue
1 task
@tcompa tcompa changed the title Define ROI-table V1 Define table specifications Oct 27, 2023
@tcompa
Copy link
Collaborator Author

tcompa commented Oct 27, 2023

Given the shape that #582 is taking, I wonder whether we should be more granular in the definition of versions.
That is, should the zarr attributes of a table group be as follows?

// table with no additional specifications
{
 "fractal_table_version": "1",
}
// basic ROI table  (naming is WIP)
{
 "fractal_table_version": "1",
 "type": "basic_roi_table",
}
// advanced ROI table (naming is WIP)
{
 "fractal_table_version": "1",
 "type": "advanced_roi_table",
}
// feature table
{
 "fractal_table_version": "1",
 "type": "feature_table",
}

The benefit here is that based on a zarr attribute we immediately know whether some operations on the table make sense.

For instance we already use the type="ngff:region_table" information in a few places:

  • Masked loading only works for this type of tables, which is checked via the is_ROI_table_valid(table_path, use_masks=True
  • Also _preprocess_input checks that type is "ngff:region_table"
  • When writing a table which has that type, we also check that attributes comply with the specs. For the moment it's a soft check (ref Review use of write_table in registration tasks #594), but we could make it a strict check if we want to only write spec-compliant tables. Note that we'd still be able to write arbitrary tables, but then we would not include any type attribute (while still including fractal_table_version="1").

NOTE: if we start using our own table types (e.g. basic_roi_table, advanced_roi_table, feature_table -- all TBD), then we need to also support the legacy type ngff:region_table, to support tables of existing OME-Zarrs. My current understanding is that we used ngff:region_table simply as a placeholder for what would then become advanced_roi_table, and then we could easily maintain backwards compatibility by treating the two types as synonyms.

@tcompa
Copy link
Collaborator Author

tcompa commented Oct 27, 2023

Note: the proposal in the previous comment concerns the specs definition, but it's still acceptable to postpone a bit the actual implementation of spec-compliant tables.
That is, we can now define (as part of #582) the V1 specs, and then refactor the internal tools and helper functions in the appropriate way in upcoming releases.


For instance we may start to encode part of the specs in the code, e.g. the required columns for a ROI tables could be stored in an upcoming lib_tables.py module.

And we could also take the opportunity to clean up the structure of the helper functions, to keep the ones referred to different parts of the specs separated. We can decide how far to push the modularity, but just to have a random example we could go as far as:

fractal_tasks_core
    tables
        v1
            lib_tables.py (which includes, e.g.
            lib_roi_tables.py
            lib_feature_tables.py
            lib_write_table.py

(or any other arrangement of the same kind of information.. e.g. the v1/v2 do not need to be specific subfolders)

@tcompa tcompa changed the title Define table specifications Define table specifications V1 Oct 30, 2023
@jluethi
Copy link
Collaborator

jluethi commented Oct 30, 2023

On the idea of having specific table types:
I can see the motivation behind this and it does sound helpful. Given the fluid situation of NGFF table specification, I wouldn't want to set too many hard requirements from our side though. But I like that we'll have internal ways of calling (& validating) different table types!

As such, I think we shouldn't be too strict about these types for the time being. I don't mind having them and raising warnings when they aren't set. We may even have some processing that goes easier if the type is set. But we should (at least for quite a while) accept tables with no type set (/wrong type set or ngff:region_table type set) and have ways to check whether their content confirms to the table type we need.

The ngff:region_table type was meant to be our feature_table type. Different naming convention I guess.
The concept of ROI tables hadn't made it into that spec proposal. We only started to adopt ngff:region_table to become closer to the spec in some places.


Regarding names, here is a proposal:

// table with no additional specifications
{
 "fractal_table_version": "1",
}
// basic ROI table
{
 "fractal_table_version": "1",
 "type": "roi_table",
}
// advanced ROI table (naming is WIP)
{
 "fractal_table_version": "1",
 "type": "masking_roi_table",
}
// feature table
{
 "fractal_table_version": "1",
 "type": "feature_table",
}

Are roi_tables just a subset of masking_roi_tables? The masking_roi_tables just contain additional info on labels, metadata on the label image and such, right?
The only thing that makes the advanced ROI tables advanced is their masking info? Or are there other advanced info in there?
I'm happy with feature_table & no-type tables

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation High Priority Current Priorities & Blocking Issues Tables AnnData and ROI/feature tables
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants