Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ROI table on Dataset, for table with ROI ID column #62

Merged
merged 18 commits into from
Mar 21, 2022

Conversation

will-moore
Copy link
Member

@will-moore will-moore commented Nov 26, 2021

This supports work on idr0123 where we are looking to create a Table with ROIs linked to a Dataset.

The table has an roi column so we know the ROI IDs already. These are now validated by populate metadata but we don't add an Roi Name column for a Dataset.

I've added a couple more scenarios to the README (see https://github.com/will-moore/omero-metadata/tree/roi_table_on_dataset), to describe this new feature.
But the handling of invalid IDs isn't included there yet, since we may want to change this.

EDIT: Invalid ROI or Shape IDs will get replaced with -1.

If you don't include an Image column in the csv, populate won't add one. This would be new functionality unlike any existing workflows, so it might be tricky to handle and isn't required by existing use-case. However, if image IDs are included, the Image Name column will be added and populated.

To test:

E.g. with a csv like:

# header roi,l,l,d,l
roi,shape,object,probability,area
225611,416813,1,0.8,250
225612,416814,2,0.9,500
208533,209210,3,0.2,25
$ omero metadata populate Dataset:8 --file test-roi-dataset.csv

Screenshot 2022-02-16 at 14 01 01

And when an Image column is included, an Image Name column should be added:

# header roi,l,image,l,d,l
roi,shape,image,object,probability,area
225611,416813,2567,1,0.8,250
225612,416814,2567,2,0.9,500
208533,209210,2,3,0.2,25

Screenshot 2022-02-16 at 14 03 32

@sbesson
Copy link
Member

sbesson commented Nov 26, 2021

Still brainstorming on this, have you considered including an Image or Image Name column that would need to be resolved as well primarily for consistency? A downside of the structure proposed here is that unlike in the Image-level table. the Image/ROI relationship is lost. I assume that if a downstream application wanted to query a table e.g. find all spots associated with a range of a chromosome across a datasets, this would be an useful information.

@will-moore
Copy link
Member Author

Yes, good point. But I would be reluctant to add the logic for Image lookup to populate.py. You'd need a big look-up to load all the Images for all the ROIs in the Dataset, and the permutations of the code are hard to understand. But it would be trivial for me to add them to my csv when I'm creating it. So this is another case where I'd just want a simple csv -> Table.

@sbesson
Copy link
Member

sbesson commented Nov 29, 2021

For prototyping, I definitely see the value of being able to do CSV -> OMERO.tables and maybe there is a flag that just does so.
But in production, I consider the lookup is one of the features of the omero-metadata plugin in order to validate the containment before linking CSV columns to Image, Dataset, Well... typed columns.

Is the Image/ROI lookup significantly more expesnse than e.g. a Project/Dataset/Image lookup or a Screen/Plate/Well lookup? Or is the issue that the code path is not in the plugin yet?

@will-moore
Copy link
Member Author

@sbesson This now supports look-up of ROIs and Shapes for a DatasetWrapper.
It uses the same behaviour as for ImageWrapper: returns a Skip() if the ROI or Shape isn't found.
This is different than handling of Images within a Dataset, where we return -1 if the Image ID isn't found.
Also added a test for this behaviour.

README.rst Outdated Show resolved Hide resolved
src/omero_metadata/populate.py Outdated Show resolved Hide resolved
else:
data.extend(rv)
if not data:
print("No ROIs on images in target Dataset")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add to log too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can revert to the raise MetadataError() that is used elsewhere. I didn't use that initially as _load_rois() was called even if there were no shape or roi column in the table, but now it is only called when needed so it makes sense to throw.

@jburel
Copy link
Member

jburel commented Jan 12, 2022

Test for omero metadata populate Project:1 --file project.csv
Test 1:
Link to Project using similar example that the one in README. The dataset contains 4 images. Only 2 listed in the CVS and one that is not in the dataset is listed.
The output of the command is a unexpected i.e.

INFO:omero_metadata.populate:Adding dataset:3710 image:49209
INFO:omero_metadata.populate:Adding dataset:3710 image:49208
INFO:omero_metadata.populate:Adding dataset:3710 image:49207
INFO:omero_metadata.populate:Adding dataset:3710 image:49206

From the output, I thought before checking the table that the entries for the 4 images have been added to the table.

The choice of -1 for the image option means that the web display still offers an "hyperlink" option
Screenshot 2022-01-12 at 13 51 46

Test 2:
Valid name for the three images but the 3rd image has an invalid dataset name. => -1 in Image column. Expected
Screenshot 2022-01-12 at 13 57 45

Test 3:
Entries with images in multiple datasets in the same project. This works as expected, The Web could be improved in the sense that if the image is in multiple datasets, the user will click on the link and the dataset not corresponding to the one in the table will be expanded i.e. the context is lost. The context is taken into account to set the ID or not but then it is lost in the display

@jburel
Copy link
Member

jburel commented Jan 12, 2022

Test: omero metadata populate Dataset:3710 --file dataset.csv
No output
Expecting something like INFO:omero_metadata.populate:Adding image:49209 to match the previous command

According to the README If the target is a Dataset instead of a Project, the Dataset Name column is not needed.

  • If the dataset name is specified and is not valid, the image is not considered. There is in that case no entry in the table, even if the image is in the dataset specified via the command line (i.e. 3710). This is inconsistent with the Project command since a -1 was inserted.
  • If there is no dataset name, the image is added to the table

@jburel
Copy link
Member

jburel commented Jan 12, 2022

Test if you already know the Image IDs, these can be specified in an ``image`` column, and an ``Image Name`` column will be added. This is only supported for a Dataset.

  • If all IDs are valid, the image names are set.
  • If an ID is not the ID of an image in the Dataset, the ID in the Image column is set to -1 and no image name is set

@will-moore
Copy link
Member Author

@jburel Apologies for the confusion, but the scenarios you're testing there aren't ones that have been added in this PR.
This PR adds support for tables with an roi column (and shape column) with roi/shape IDs, where the target is a Dataset e.g:

    # header roi,l,l,d,l
    Roi,shape,object,probability,area
    501,1066,1,0.8,250
    502,1067,2,0.9,500
    503,1068,3,0.2,25

@jburel
Copy link
Member

jburel commented Jan 12, 2022

Test omero metadata populate Image:49206 --file rois.csv

  • The ROis have no name. Message WARNING:omero_metadata.populate:Conflicting ROI names is misleading.
  • One roi ID in the CSV is incorrect WARNING:omero_metadata.populate:Image is missing ROI: 1286504 shows up twice in the output. The noise is not really necessary. This results with no entry in the table. This is again inconsistent with the previous methods.

@jburel
Copy link
Member

jburel commented Jan 12, 2022

@will-moore I went with the README changes since it also describes how to use this package. If this is outside the scope of this PR then it should be out of it.

My general feeling is that there is a major inconsistency between methods on how to handle the "invalid" rows.

@jburel
Copy link
Member

jburel commented Jan 12, 2022

'ImageWrapper' object has no attribute 'resolve_shape'

@jburel
Copy link
Member

jburel commented Jan 12, 2022

Test omero metadata populate Dataset:3710 --file rois.csv

  • Shape and ROIs ID are valid. The table is created.
  • Shape invalid and ROI ID valid. No entry in the table

Before adding with more confusing I feel that we need to review how invalid rows get handled. Some commands will insert nothing, some will insert a row with -1.

@will-moore
Copy link
Member Author

@jburel I moved the README changes that were unrelated to this PR to #66.
I'll add support for shape column to ImageWrapper since it is confusing to only have that for Dataset.

@will-moore
Copy link
Member Author

@jburel This now handles invalid ROI and Shape IDs by setting them to -1 instead of skipping the row.
@pwalczysko Tweaked the README to try and make it clearer.

@will-moore
Copy link
Member Author

Used this in it's current form to re-create OMERO.tables on Datasets in idr0123 https://idr-redmine.openmicroscopy.org/issues/270#note-13

@pwalczysko
Copy link
Member

@pwalczysko Tweaked the README to try and make it clearer.

Thank you @will-moore . I still have following questions regarding the ROIs paragraph.

Screenshot 2022-01-19 at 18 18 30
I would still think that it would be better to say what examples are meant. Are those the ROI and Shape examples from the same paragraph ? The "above" is really not reassuring.

Screenshot 2022-01-19 at 18 27 41
Futher in the same sentence, the formatting of the words "Image ID" is confusingly different from the ROI ID formatting just two sentences above. Maybe rather say image column containing IDs, to avoid any doubt ?

  1. Where are the ROI names taken from ? If I have some ROIs in OMERO and somehow retrieve their IDs, create a CSV and then try to populate it with results and import the CSV to OMERO using this metadata plugin, I do not suppose that the ROIs are by default named ? What will the plugin do then ? Error ? How to prevent ?

  2. The fact that shape type is not supported means that I cannot target Shapes directly in my CSV, they always have to be "under" a ROI which is specified in the same line of my CSV ?

@erindiel
Copy link
Member

@erindiel Let me know if keeping a row with -1 for invalid ROI IDs would cause problems, since I think that's what I'd like to go for (to be consistent with handling Image IDs etc). Thanks.

@will-moore we are not relying on skipping the row if the ROI ID is invalid, so replacing with -1 to be consistent with the handling of other object IDs works for us. In our workflow, there is typically a single ROI ID across the entire table, and a table with an invalid ROI ID will be "broken" regardless. 👍🏻

@will-moore
Copy link
Member Author

Great, thanks

@will-moore
Copy link
Member Author

@pwalczysko Tried to address some of your feedback, but some is a bit out of scope for this PR and comes under:
How do we fully document the behaviour of metadata populate?
There are so many options: targets, column types etc. that it feels like we need a way to describe how everything is handled.
Probably needs some kind of docs page since it's getting a bit verbose for the README.

@pwalczysko
Copy link
Member

@pwalczysko Tried to address some of your feedback, but some is a bit out of scope for this PR and comes under: How do we fully document the behaviour of metadata populate? There are so many options: targets, column types etc. that it feels like we need a way to describe how everything is handled. Probably needs some kind of docs page since it's getting a bit verbose for the README.

Thank you.
At the moment, I am simply trying to understand the behaviour myself, so not worried just yet about the user understanding it. Surely we have to go away from the README. I would suggest that we need plenty of examples. Not sure about whether to target guides or docs (and link from guides to the docs ? ) cc @jburel @sbesson
For that, I still have a question

If a column named shape of type l is included, the Shape IDs will be validated (and set to -1 if invalid).
Does that mean that that column named shape is containing Shape IDs ? If yes, I think the sentence should first clearly state that, only then continue about explaining what will be done with these IDs.

@pwalczysko
Copy link
Member

@will-moore thank you, now it indeed leaves little room for doubt, I think I understand what to do.

@jburel
Copy link
Member

jburel commented Feb 16, 2022

I think usage guide will be good. We have section about CLI import for example already in the guide so the same strategy could be followed for that plugin.
Adding it to the README does not scale. I think we agreed to add ref doc in the repository itself a while back.

so not worried just yet about the user understanding it

That does not sound like a good start.

@will-moore
Copy link
Member Author

Updated the description with testing instructions.

@pwalczysko
Copy link
Member

I think usage guide will be good. We have section about CLI import for example already in the guide so the same strategy could be followed for that plugin. Adding it to the README does not scale. I think we agreed to add ref doc in the repository itself a while back.

so not worried just yet about the user understanding it

That does not sound like a good start.

@jburel we already have https://omero-guides.readthedocs.io/en/latest/upload/docs/metadata-ui.html#create-a-metadata-csv as well as https://omero-guides.readthedocs.io/en/latest/upload/docs/metadata.html
I think we need a new chapter to alleviate the confusion the user is feeling when trying to create a workable CSV.

@jburel
Copy link
Member

jburel commented Feb 16, 2022

Thanks @pwalczysko I forgot about it.
We have 2 types of doc

  • the guide
  • the ref doc. For the ref doc putting the generated and not generated doc from this repo will follow what has been done elsewhere. That could be easily published on rtd

@pwalczysko
Copy link
Member

pwalczysko commented Feb 17, 2022

Test 1:

On dataset https://merge-ci.openmicroscopy.org/web/webclient/?show=dataset-25469 created succesfully a table https://merge-ci.openmicroscopy.org/web/webclient/omero_table/880600/
The filtering and the link from the clickable table IDs to single shapes works fine. The link which should get me to the rois is not working, the iviewer is reporting Failed to get image data: 'error'

Is that expected ?

@pwalczysko
Copy link
Member

Test 2:
created a csv with multiple rois per image and an image column and also multiple shapes per roi. Worked fine, see the csv bleow.

# header roi,l,image,l,d,l
roi,shape,image,object,probability,area
437462,526593,140169,1,0.8,250
437462,526594,140169,2,0.11,400
437463,526597,140169,2,0.11,400
437465,526607,140167,3,0.9,500
437469,526621,140168,4,0.2,25

@pwalczysko
Copy link
Member

Test 3
In dataset https://merge-ci.openmicroscopy.org/web/webclient/?show=dataset-25469 renamed one image so that I have 2 images with the same name in this dataset. Then, deleted the previous table from Test2 and rerun Test2 on the same dataset.
This caused an error

omero metadata populate Dataset:25469 --file ~/test-will/imagesroisshapes.csv 
Using session for user-3@merge-ci-devspace.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: read-annotate-1
Traceback (most recent call last):
  File "/Users/pwalczysko/miniconda3/envs/cli/bin/omero", line 11, in <module>
    sys.exit(main())
  File "/Users/pwalczysko/miniconda3/envs/cli/lib/python3.6/site-packages/omero/main.py", line 125, in main
    rv = omero.cli.argv()
  File "/Users/pwalczysko/miniconda3/envs/cli/lib/python3.6/site-packages/omero/cli.py", line 1784, in argv
    cli.invoke(args[1:])
  File "/Users/pwalczysko/miniconda3/envs/cli/lib/python3.6/site-packages/omero/cli.py", line 1222, in invoke
    stop = self.onecmd(line, previous_args)
  File "/Users/pwalczysko/miniconda3/envs/cli/lib/python3.6/site-packages/omero/cli.py", line 1299, in onecmd
    self.execute(line, previous_args)
  File "/Users/pwalczysko/miniconda3/envs/cli/lib/python3.6/site-packages/omero/cli.py", line 1381, in execute
    args.func(args)
  File "/Users/pwalczysko/Work/omero-metadata/src/omero_metadata/cli.py", line 536, in populate
    allow_nan=args.allow_nan)
  File "/Users/pwalczysko/Work/omero-metadata/src/omero_metadata/populate.py", line 1094, in __init__
    allow_nan=allow_nan)
  File "/Users/pwalczysko/Work/omero-metadata/src/omero_metadata/populate.py", line 364, in __init__
    self.wrapper = DatasetWrapper(self)
  File "/Users/pwalczysko/Work/omero-metadata/src/omero_metadata/populate.py", line 765, in __init__
    self._load()
  File "/Users/pwalczysko/Work/omero-metadata/src/omero_metadata/populate.py", line 834, in _load
    iname, self.images_by_name[iname].id.val, iid
Exception: Image named Tcf15_KO_d2.tif(id=140168) present. (id=140167)

I would have thought that this Test3 is testing the "identical image names" case mentioned in the last commit ? Does not seem to work.

@will-moore
Copy link
Member Author

Ah - Test 1 ROI link is actually an iviewer bug. It's not handling the fact that the ROI is in your non-default group.
I just tried changing the default group for user-3 and logging out. On logging in again the links are working (show ROI in iviewer). I'll create an inviewer issue...

@pwalczysko
Copy link
Member

As @will-moore aswered the Test1 problem with ROI link above, in summary the remaining problem is #62 (comment) Test3 (same-named images)

Other than that worked fine.

@will-moore
Copy link
Member Author

Test 3 - actually that is showing the fix is working. The bug reported in #68 was that the "duplicate image" error message was failing to render properly. Now it is showing up properly! This doesn't change the fact that duplicate image names aren't supported.

@will-moore
Copy link
Member Author

iviewer issue: ome/omero-iviewer#400

@sbesson
Copy link
Member

sbesson commented Feb 18, 2022

@jburel anything from your side? Otherwise, should we get this into a 0.10.0 release including the ROI / Dataset support as well as the unification of the invalid IDs?

@jburel
Copy link
Member

jburel commented Mar 21, 2022

invalid ROI/shapes are now recorded as -1. This follows what has been implemented elsewhere
Outside the scope of this PR, the handling of -1 should be improved in the omero-web i.e. do not create an hyperlink.

@jburel
Copy link
Member

jburel commented Mar 21, 2022

I think we can release as 0.10.0 as suggested. @will-moore Do you want to take care of the release?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants