Hcs keyvals from csv #195

will-moore · 2022-01-23T22:50:23Z

This PR contains all commits from mpievolbio-scicomp#1.

To test: a CSV of this format should behave according to the comment after each row:

plate           well    image               Cell line
Index.idx.xml                               S2R+        # add KV to plate
Index.idx.xml   B2                          S2R+        # add KV to well only
Index.idx.xml   B2      Well 1 Field 1      S2R+        # add KV to image only
Index.idx.xml           Well 1 Field 2      S2R+        # add KV to image only

A couple of additional points addressed in this PR:

Improved warning if any Well or Image names don't match (see Hcs keyvals from csv mpievolbio-scicomp/omero-scripts#1 (review)). This is now shown in the response message.

Handling of various CSV delimiters by sniffing as suggested by @JensWendt (see Hcs keyvals from csv mpievolbio-scicomp/omero-scripts#1 (comment))
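
For readers unfamiliar with delimiter sniffing, here is a minimal sketch using Python's standard csv.Sniffer. The helper name detect_delimiter, the list of candidate delimiters and the fallback to a comma are assumptions made for this illustration; they are not taken from the script in this PR.

```python
import csv


def detect_delimiter(sample, candidates=",;\t", default=","):
    """Guess the delimiter of a CSV text sample (hypothetical helper)."""
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    except csv.Error:
        # The sniffer could not decide: fall back to the assumed default
        return default
    return dialect.delimiter


# Semicolon-separated sample whose image names contain commas
sample = (
    "Plate;Well;Image\n"
    "plate_1;A1;Well 1, Field 1\n"
    "plate_1;A2;Well 2, Field 1\n"
)
delimiter = detect_delimiter(sample)
rows = list(csv.reader(sample.splitlines(), delimiter=delimiter))
```

Restricting the sniffer to a short list of candidate delimiters keeps it from picking a character that merely happens to repeat regularly in the data.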

cc @abhamacher Does this look OK?
If you could give this a final test and 👍 if looking good, that would be great, thanks.

To test (OME)

Using Plate at https://merge-ci.openmicroscopy.org/web/webclient/?show=plate-16105 (user-3).
This has a test .csv file attached, using the format above. Sample of it pasted below. The target column says what should be annotated by each row.
Download, inspect and edit the file, then use it to run the annotation_scripts/KeyVal_from_csv.py script.
Rows which have an image name will annotate that Image (the Well column is ignored).
Rows without an image will annotate the Well.
NB: Can use the Remove_KeyVal.py script to remove all Well and Image annotations on the Plate if needed.

plate,well,image,target,Time,cell_count,Mitotic fraction,
to-plate_1,A1,A1 Field 1,Image A1 Field 1,10,1,0.1,
to-plate_1,A1,A1 Field 2,Image A1 Field 2,11,2,0.11,
to-plate_1,A2,A2 Field 1,Image A2 Field 1,20,3,0.2,
to-plate_1,A2,,Well A2,21,4,0.21,
to-plate_1,A3,A3 Field 1,Image A3 Field 1,30,5,0.3,
to-plate_1,A4,,Well A4,40,6,0.4,
to-plate_1,A5,,Well A5,50,7,0.5,
to-plate_1,A6,A6 Field 1,A6 Field 1,60,8,0.6,
to-plate_1,B10,B1 Field 1,Image B1 Field 1 (Well ignored),110,9,0.1,
to-plate_1,B20,B2 Field 1,Image B2 Field 1 (Well ignored),120,10,0.2,
to-plate_1,,,Plate (No Well or Image),130,11,0.3,
invalid,B4,B4 Field 1,NONE (plate name invalid),140,12,0.4,
to-plate_1,B5,B5 Field 1,Image B5 Field 1,160,13,0.5,
to-plate_1,B6,B6 Field 1,Image B6 Field 1,160,14,0.6,
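
The row rules described above can be sketched as a small function. This is only an illustration of the stated behaviour, not the script's code: the name resolve_target is hypothetical, the Plate is assumed to be named to-plate_1, and treating an empty plate column as a match is an assumption.

```python
import csv
import io


def resolve_target(row, plate_name):
    """Return (object_type, name) for a CSV row, or None if the row is skipped."""
    plate = (row.get("plate") or "").strip()
    if plate and plate != plate_name:
        return None  # plate name given but it does not match: annotate nothing
    image = (row.get("image") or "").strip()
    well = (row.get("well") or "").strip()
    if image:
        return ("Image", image)  # the well column is ignored
    if well:
        return ("Well", well)
    return ("Plate", plate_name)  # neither well nor image given


csv_text = """plate,well,image,target,Time
to-plate_1,A1,A1 Field 1,Image A1 Field 1,10
to-plate_1,A2,,Well A2,21
to-plate_1,B10,B1 Field 1,Image B1 Field 1 (Well ignored),110
to-plate_1,,,Plate (No Well or Image),130
invalid,B4,B4 Field 1,NONE (plate name invalid),140
"""
reader = csv.DictReader(io.StringIO(csv_text))
targets = [resolve_target(row, "to-plate_1") for row in reader]
```

Each entry in targets corresponds to the target column of the matching row in the sample.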


abhamacher · 2022-02-13T20:05:36Z

Hi Will,

Sorry that my response time is so slow at the moment. This is still an important topic for me; it's just the lab work that is keeping me busy.

I did a test right now and also really, really like the CSV delimiter sniffing. Thanks @JensWendt for the idea. Worked well for my data, which saves some effort and helps to avoid errors due to the commas in the default image names.

I only encountered one issue, when creating an Excel file with the annotations and then saving it as CSV, which is the most likely scenario for our biologists. When I tried to import this CSV file, the script struggled with the BOM at the beginning. The info message is green, but the log gets stuck at the column headers:

script params
File_Annotation 1603
IDs [1]
Data_Type Plate
set ann id 1603
Original File 22636 Copy of plate_test.csv
Using delimiter: ;
header ['\ufeffPlate', 'Well', 'Image', 'Test1', 'Compound', 'Description']

Maybe this is something you could capture and remove the BOM before processing?

Thank you very much for your effort! Looking forward to rolling out the script on our production system once it's finally released.

Anna

JensWendt · 2022-02-14T10:37:26Z

Hi Anna,

I also encountered this problem.
My VERY basic solution to this was to add another line after header = data[0]
--> header[0]=str(header[0]).replace("\ufeff","")
From my understanding this is the BOM for UTF-8 encoding.
There are other possible BOMs but I only encountered this one for now.
I am sure there are other more elegant versions out there. But for now this little workaround is helping.

imagesc-bot · 2022-02-18T11:16:08Z

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/multiple-key-value-pair-map-annoation-questions/62976/5

will-moore · 2022-02-19T22:49:38Z

Hi @JensWendt @abhamacher,
I've tried handling the BOM in the same way that we do for the Populate_Metadata script. I've tested this with a sample BOM script I have and it works for that. Hopefully it'll work for yours too?
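
The commit list for this PR describes the fix as reading the CSV with encoding='utf-8-sig'. The sketch below shows what that codec does to a header like the one in the log above. The byte string is made-up sample data, and decoding an in-memory bytes object is an assumption for the illustration; the script itself reads the file from OMERO.

```python
import csv
import io

# Bytes as Excel might save them: a UTF-8 BOM followed by semicolon-separated text
raw = b"\xef\xbb\xbfPlate;Well;Image\nplate_1;A1;A1 Field 1\n"

# Plain 'utf-8' keeps the BOM as '\ufeff' attached to the first column name
naive_header = next(csv.reader(io.StringIO(raw.decode("utf-8")), delimiter=";"))

# 'utf-8-sig' drops a leading BOM if present and is harmless if there is none
clean_header = next(csv.reader(io.StringIO(raw.decode("utf-8-sig")), delimiter=";"))
```

Unlike replacing '\ufeff' in the first header cell by hand, the codec handles the BOM before any CSV parsing happens, so delimiter sniffing also sees clean text.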

JensWendt · 2022-02-21T11:35:16Z

Hi Will,

I tried it out with two different .csv files and it worked.
Seems good to me.
Thanks!

will-moore · 2022-04-06T08:25:16Z

See https://forum.image.sc/t/uploading-key-value-pairs-from-csv-files-into-omero-web-for-plates/60202/37 for issues with delimiter sniffing...


imagesc-bot · 2022-05-02T13:46:07Z

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/uploading-key-value-pairs-from-csv-files-into-omero-web-for-plates/60202/58

imagesc-bot · 2022-05-10T09:14:07Z

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/uploading-key-value-pairs-from-csv-files-into-omero-web-for-plates/60202/61

JensWendt · 2022-05-12T19:31:03Z

Hello @will-moore ,

I checked with two plates in one screen. Both with multiple wells and multiple images per well. There was one combined .csv.
I have three remarks:

I had to declare

header=[]
nimg_updated = 0
missing_names = 0
images_by_name = {}

in line 137+ at the beginning of def keyval_from_csv(), because otherwise the script would complain that local variables were referenced before assignment when the return message is created at the end of this function.

The mouseover tooltip for the IDs script parameter reads "Plate or Screen ID.". That is misleading, as I can only select Dataset or Plate as the data type. I tried entering the ID of the Screen that contains my two plates and, of course, it did not work. We should either allow Screen as a data type and then fetch the objects in the code via listChildren() (assuming I want to annotate all plates in it), or change the tooltip to accurately describe the expected input. Maybe something like "ID(s) of your dataset(s) or plate(s)."

The final output message told me "Added 5 kv pairs to 97/64 files. 396 image names not found." although the script did its job perfectly.
There are multiple things to unpack here. First, the number of kv pairs is determined by len(header)-1. This might be better represented by:

num_kv_pairs = len(header) - 1
if well_index > -1:
    num_kv_pairs -= 1
if plate_index > -1:
    num_kv_pairs -= 1

Second, the value after the "/" is determined by len(images_by_name). It might be more accurate to use an image_counter variable that adds len(images_by_name) + len(wells_by_name) for each obj iteration, plus len(ids) at the end.

Third, we iterate over the objects (plates) first and then over all the rows. With multiple plates' worth of image and well names, each plate of course only matches a fraction of the names. Every other name is labeled "not found". This leads to a confusing number of false negatives, such as "396 image names not found".
To be fair, I have no good solution for this off the top of my head. Maybe just skip this error message altogether, but that is also a very sub-par solution.
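
The first of these points, counting only the columns that actually become key-value pairs, can also be expressed without index bookkeeping. This is a sketch under assumptions: the helper name count_kv_columns is hypothetical, and matching the reserved column names case-insensitively is an assumption rather than the script's confirmed behaviour.

```python
def count_kv_columns(header, reserved=("plate", "well", "image")):
    """Count header columns that become key-value pairs (hypothetical helper)."""
    return sum(1 for name in header if name.strip().lower() not in reserved)


# Header from the log earlier in this thread, after BOM removal
header = ["Plate", "Well", "Image", "Test1", "Compound", "Description"]
num_kv_pairs = count_kv_columns(header)
```

With this approach, a CSV that has no plate or well column needs no special case.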

Well, that concludes my inputs. If I have a super-smart idea how to solve the "images not found" issue I will let you know.
If points 1, 2, 3.1 and 3.2 are addressed, I think we are good to go.

will-moore · 2022-05-17T12:14:03Z

@JensWendt Thanks for that - I've pushed a fix (but not had time to test it yet)...
I tried to improve the output message when you select multiple Plates/Datasets. I removed the report on number of columns (since this could be different for each Object) and simply summed up the number of images updated/total.
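
As a rough sketch of the summing described here, per-object counts can be collected and added up once at the end. The function name summarize, the (updated, total) tuple layout and the message wording are all assumptions for illustration; the actual message text in the script may differ.

```python
def summarize(results):
    """Build one message from per-object (updated, total) image counts."""
    updated = sum(u for u, _ in results)
    total = sum(t for _, t in results)
    return "Added kv pairs to %d/%d images" % (updated, total)


# Hypothetical counts for two plates: all 32 images updated, then 30 of 32
message = summarize([(32, 32), (30, 32)])
```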

pwalczysko · 2022-06-24T13:34:41Z

Worked fine for me. Tested also a file with BOM. Ready to merge fmpov.

will-moore added 3 commits January 23, 2022 22:21

Apply all changes from previous PR

c998de9

mpievolbio-scicomp#1

Sniff for csv delimiter

9307de3

Warn if any csv image or well names not found

ca07f57

Read csv with encoding='utf-8-sig' to handle BOM

3d21e42

will-moore mentioned this pull request Apr 29, 2022

Add HCS support to Remove_KeyVal.py #199

Merged

will-moore and others added 5 commits May 2, 2022 13:43

Don't fail if no 'plate' or 'well' or 'image' column

2ef6558

Always check that plate name is correct if provided

a0eea57

Update KeyVal_from_csv.py

d7dda17

Code extension from Jens Wendt for a more robust delimiter sniffing. Slightly modified and tested by myself, see discussion at https://forum.image.sc/t/uploading-key-value-pairs-from-csv-files-into-omero-web-for-plates/60202/51

Removed "pass"

a6a3969

Removed unnecessary "pass" commands, as suggested by Will Moore.

Print warning instead of 'assert' to allow script to continue

11e0256

will-moore mentioned this pull request May 2, 2022

Hcs keyvals from csv, improve delimiter sniffing #200

Closed

will-moore added 4 commits May 3, 2022 23:15

Merge remote-tracking branch 'origin/develop' into hcs_keyvals_from_csv

fb70626

flake8 fixes for omero/annotation_scripts/KeyVal_to_csv.py

a175de7

More flake8 fixes to KeyVal_from_csv.py and Images_From_ROIs.py

5e5cec5

flake8 rename dataType -> data_type

3e42c0b

Fix IDs parameter description and image counts in message

e7321a3

will-moore mentioned this pull request May 17, 2022

Add Plate support to KeyVal_to_csv.py #202

Open

will-moore mentioned this pull request Jun 14, 2022

Adding Support for different CSV Encodings in Import_Scripts/Populate_Metadata.py #198

Open

will-moore mentioned this pull request Jun 24, 2022

Addding support for other encodings than utf-8 in DownloadingOriginalFileProvider ome/omero-py#325

Open

sbesson merged commit 0c581af into ome:develop Jun 27, 2022

JensWendt mentioned this pull request Aug 7, 2023

Improved delimiter sniffing for .csv files #210

Open
