Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hcs keyvals from csv #195

Merged
merged 14 commits into from
Jun 27, 2022
Merged

Hcs keyvals from csv #195

merged 14 commits into from
Jun 27, 2022

Conversation

will-moore
Copy link
Member

@will-moore will-moore commented Jan 23, 2022

This PR contains all commits from mpievolbio-scicomp#1.

To test: a CSV of this format should behave according to the comment after each row:

plate           well    image               Cell line
Index.idx.xml                               S2R+        # add KV to plate
Index.idx.xml   B2                          S2R+        # add KV to well only
Index.idx.xml   B2      Well 1 Field 1      S2R+        # add KV to image only
Index.idx.xml           Well 1 Field 2      S2R+        # add KV to image only

A couple of additional points addressed in this PR:

Screenshot 2022-01-23 at 22 44 19

cc @abhamacher Does this look OK?
If you could give this a final test and 👍 if looking good, that would be great, thanks.

To test (OME)

Using Plate at https://merge-ci.openmicroscopy.org/web/webclient/?show=plate-16105 (user-3).
This has a test .csv file attached, using the format above. Sample of it pasted below. The target column says what should be annotated by each row.
Download, inspect and edit the file, then use to run the annotations_scripts/KeyVal_from_csv.py script.
Rows which have an image name will annotate that Image (Well column is ignored)
Rows without an image will annotate the Well.
NB: Can use the Remove_KeyVal.py script to remove all Well and Image annotations on the Plate if needed.

plate,well,image,target,Time,cell_count,Mitotic fraction,
to-plate_1,A1,A1 Field 1,Image A1 Field 1,10,1,0.1,
to-plate_1,A1,A1 Field 2,Image A1 Field 2,11,2,0.11,
to-plate_1,A2,A2 Field 1,Image A2 Field 1,20,3,0.2,
to-plate_1,A2,,Well A2,21,4,0.21,
to-plate_1,A3,A3 Field 1,Image A3 Field 1,30,5,0.3,
to-plate_1,A4,,Well A4,40,6,0.4,
to-plate_1,A5,,Well A5,50,7,0.5,
to-plate_1,A6,A6 Field 1,A6 Field 1,60,8,0.6,
to-plate_1,B10,B1 Field 1,Image B1 Field 1 (Well ignored),110,9,0.1,
to-plate_1,B20,B2 Field 1,Image B2 Field 1 (Well ignored),120,10,0.2,
to-plate_1,,,Plate (No Well or Image),130,11,0.3,
invalid,B4,B4 Field 1,NONE (plate name invalid),140,12,0.4,
to-plate_1,B5,B5 Field 1,Image B5 Field 1,160,13,0.5,
to-plate_1,B6,B6 Field 1,Image B6 Field 1,160,14,0.6,

@abhamacher
Copy link
Contributor

Hi Will,

sorry, that my response time is so slow at the moment. This is still an important topic for me, it's just the lab work, that is keeping me busy at the moment.

I did a test right now and also really, really like the CSV delimiter sniffing. Thanks @JensWendt for the idea. Worked well for my data, which saves some effort and helps to avoid errors due to the commas in the default image names.

I only encountered one issue, when creating an Excel file with the annotations and then saving it to csv, which is the most likely scenario for our biologists. When I tried to import this csv file, the script struggled with the BOM at the beginning. The info message is green, but the log stucks at the column headers:

kv-script

script params
File_Annotation 1603
IDs [1]
Data_Type Plate
set ann id 1603
Original File 22636 Copy of plate_test.csv
Using delimiter: ;
header ['\ufeffPlate', 'Well', 'Image', 'Test1', 'Compound', 'Description']

Maybe this is something you could capture and remove the BOM before processing?

Thank you very much for your effort! Looking forward to enroll the script on our production system once it's finally released.

Anna

@JensWendt
Copy link
Contributor

Hi Anna,

I also encountered this problem.
My VERY basic solution to this was to add another line after header = data[0]
--> header[0]=str(header[0]).replace("\ufeff","")
From my understanding this is the BOM for UTF-8 encoding.
There are other possible BOMs but I only encountered this one for now.
I am sure there are other more elegant versions out there. But for now this little workaround is helping.

@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/multiple-key-value-pair-map-annoation-questions/62976/5

@will-moore
Copy link
Member Author

Hi @JensWendt @abhamacher,
I've tried handling the BOM in the same way that we do for Populate_Metadata script. I've tested this with a sample BOM script I have and it works for that - Hopefully it'll work for yours too?

@JensWendt
Copy link
Contributor

Hi Will,

I tried it out with two different .csv files and it worked.
Seems good to me.
Thanks!

@will-moore
Copy link
Member Author

@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/uploading-key-value-pairs-from-csv-files-into-omero-web-for-plates/60202/58

@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/uploading-key-value-pairs-from-csv-files-into-omero-web-for-plates/60202/61

@JensWendt
Copy link
Contributor

Hello @will-moore ,

I checked with two plates in one screen. Both with multiple wells and multiple images per well. There was one combined .csv.
I have three remarks:

  1. I had to declare
header=[]
nimg_updated = 0
missing_names = 0
images_by_name = {}

in line 137+ at the beginning of def keyval_from_csv(), because otherwise the script would complain that I referenced local variables without declaring them first when the return message was created in the end of this method.

  1. The mouseover tooltip for the IDs script parameter is described as "Plate or Screen ID.". That is misleading,
    as I can only select Dataset or Plate as data type. I tried to put the Screen ID in that my two plates were under and of course it did not work. We either change it to also allowing Screen as data type and then getting the objects in the code via listChildren() - assuming I want to annotate all plates in it - or we have to change the tooltip to accurately represent the wanted input. Maybe something like "ID(s) of your dataset(s) or plate(s)."

  2. The actual out message in the end told me "Added 5 kv pairs to 97/64 files. 396 image names not found." although the script did its job perfectly.
    There are multiple things to unpack here. First the number of kv pairs is determined by len(header)-1 this might be better represented by:

num_kv_pairs=len(header)-1
if well_index > -1:
     num_kv_pairs-=1
if plate_index > -1:
     num_kv_pairs-=1

Second the value after "/". Determined by len(images_by_name). Might be more accurate with an image_counter variable comprised by len(images_by_name) + len(wells_by_name) for each obj iteration plus len(ids) in the end.

Third, we iterate over the objects (plates) first and then over all the rows. With multiple plates worth of image and well names of course we only match with a fraction of the correct names. Every other name is labeled "not found". This leads to an irritating amount of false negatives with "396 image names not found".
To be fair I have no good solution for this from the top of my head. Maybe just skip this error message altogether, but this also is a very sub-par solution.

Well, that concludes my inputs. If I have a super-smart idea how to solve the "images not found" issue I will let you know.
If point 1, 2, 3.1 and 3.2 are addressed I feel we are green.

@will-moore
Copy link
Member Author

@JensWendt Thanks for that - I've pushed a fix (but not had time to test it yet)...
I tried to improve the output message when you select multiple Plates/Datasets. I removed the report on number of columns (since this could be different for each Object) and simply summed up the number of images updated/total.

@pwalczysko
Copy link
Member

Worked fine for me. Tested also a file with BOM. Ready to merge fmpov.

@sbesson sbesson merged commit 0c581af into ome:develop Jun 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants