fix #10703: download pre-FS original metadata in FS #1717

Merged
merged 8 commits into from Nov 14, 2013

Conversation

mtbc
Member

@mtbc mtbc commented Nov 6, 2013

Fixes http://trac.openmicroscopy.org.uk/ome/ticket/10703. To test,

  1. run a local 4.4 server
  2. import files with metadata, e.g., from ome/data_repo/test_images_metadata/
  3. upgrade to 5.0, using sql/psql/OMERO5.0DEV__6/OMERO4.4__0.sql
  4. import other files with metadata
  5. check that client "download original metadata" works okay for both pre-FS and FS images, both global and series metadata
  6. check that OriginalMetadataRequestTest.testMetadataParsing() passes.

(Lines may be reordered but should still appear under the correct sections.)

I first tried using org.apache.commons.configuration.HierarchicalINIConfiguration to parse the file but, for our kind of INI-format, it turned out to be more trouble than it was worth.

--no-rebase as 5.0-specific

if (rsp.fileAnnotationId != null) {
    final IQuery iQuery = helper.getServiceFactory().getQueryService();
    final FileAnnotation fileAnnotation = iQuery.get(FileAnnotation.class, rsp.fileAnnotationId.getValue());
    final String filePath = pixelsService.getFilesPath(fileAnnotation.getFile().getId());
Member

Is this not looking under /OMERO/Pixels rather than /OMERO/Files/?

Member Author

That's getPixelsPath.

Member

Of course! Cheers.

@joshmoore
Member

Using mt1_R3D_D3D.dv before and after upgrade and the following script:

import omero
import omero.all
from omero.cmd import OriginalMetadataRequest
from omero.gateway import BlitzGateway
from omero.rtypes import unwrap
from path import path


c = omero.client("localhost")
c.createSession(...)
g = BlitzGateway(client_obj=c)

rsps=[]
for i in (1,51):
    req = OriginalMetadataRequest()
    req.imageId = i
    handle = c.sf.submit(req)
    cb = g._waitOnCmd(handle)
    rsps.append(cb.getResponse())

a,b = rsps
print a.filesetId, b.filesetId
print a.fileAnnotationId, b.fileAnnotationId
a_keys = set(a.globalMetadata.keys())
b_keys = set(b.globalMetadata.keys())

print a_keys - b_keys
print b_keys - a_keys

# Compare values only for keys present in both responses, to avoid a KeyError
for key in sorted(a_keys & b_keys):
    print key, unwrap(a.globalMetadata[key]), unwrap(b.globalMetadata[key])

I get the following differences in keys:

set(['Z axis reduction quotient', 'Title 4', 'Title 7', 'Wavelength 4 (in nm)', 'Title 3', 'Title 2', 'Min, Max, Mean (w', 'Wavelength 1 (in nm)', 'Wavelength 5 (in nm)', 'Wavelength 2 (in nm)', 'Wavelength 3 (in nm)', 'Offset to first plane', 'Number of Sub-resolution sets'])

and

set(['Z position for position #2', 'Y position for position #3', 'X position for position #2', 'Z position for position #4', 'Z position for position #3', 'Title #07', 'Title #08', 'Min, Max, Mean (w=617.0 nm)', 'Y position for position #2', 'Title #02', 'Title #03', 'Title #04', 'Title #05', 'Title #06', 'Y position for position #4', 'X position for position #3', 'X position for position #4', 'Title #10', 'Title #01', 'Title #09', 'Min, Max, Mean (w=470.0 nm)'])

which looks as if the parsing of = within () is not handled.

@mtbc
Member Author

mtbc commented Nov 7, 2013

It isn't. The Apache Commons library I tried didn't seem to quite fit the INI-like format that we use, and your testing reveals that going too simple doesn't either. Is there a spec somewhere of our version of it so I know what to handle how? (I am thinking that '=' within '()' might not be the only problem, for instance perhaps we do something with quoting or escaping or other kinds of bracket or brace too?) Or, avoiding parenthesis balancing, perhaps it is always okay to split only at the last '=', ignoring any earlier -- we could give that a try?
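To make the "split only at the last '='" idea concrete, here is a minimal Python sketch. It is not the code in this PR; the function names and the numeric value in the sample line are made up for illustration, and only the key comes from the output reported above.

```python
def split_first(line):
    """Split a 'key=value' line at the FIRST '=' (the too-simple approach)."""
    key, _, value = line.partition("=")
    return key.strip(), value.strip()


def split_last(line):
    """Split a 'key=value' line at the LAST '=', ignoring any earlier ones."""
    key, _, value = line.rpartition("=")
    return key.strip(), value.strip()


# The key is taken from the reported differences; the value is hypothetical.
line = "Min, Max, Mean (w=617.0 nm)=0.0, 4095.0, 231.5"

first = split_first(line)  # ('Min, Max, Mean (w', '617.0 nm)=0.0, 4095.0, 231.5')
last = split_last(line)    # ('Min, Max, Mean (w=617.0 nm)', '0.0, 4095.0, 231.5')
```

The trade-off is that splitting at the last '=' mis-splits any line whose value itself contains '=', so it only helps if values are known never to contain one.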

@joshmoore
Member

/cc @melissalinkert @will-moore @jburel

@melissalinkert
Member

@mtbc, have you tried the loci.common.IniParser class? Not sure if that will work, but it (and the other Ini* classes) are what Bio-Formats uses internally for reading and writing INI files.

@mtbc
Member Author

mtbc commented Nov 7, 2013

It might well, thank you; I shall give it a try.

@mtbc
Member Author

mtbc commented Nov 8, 2013

Hmm, it appears to delegate to ome.scifio.common.IniParser, whose parseINI does nothing special with parentheses and splits on the first '='.
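For comparison, a parser that did treat parentheses specially might look roughly like the following Python sketch. This is an illustration only: `split_outside_parens` is a hypothetical name, it is not how IniParser or this PR behaves, and the sample values are invented.

```python
def split_outside_parens(line):
    """Split at the first '=' that is not enclosed in parentheses.

    Returns (key, value), or None if no such '=' exists.
    """
    depth = 0
    for i, ch in enumerate(line):
        if ch == "(":
            depth += 1
        elif ch == ")":
            # Tolerate stray closing parentheses rather than going negative.
            depth = max(depth - 1, 0)
        elif ch == "=" and depth == 0:
            return line[:i].strip(), line[i + 1:].strip()
    return None


# Key from the reported output; value hypothetical.
pair = split_outside_parens("Min, Max, Mean (w=470.0 nm)=1, 2, 3")
# pair == ('Min, Max, Mean (w=470.0 nm)', '1, 2, 3')
```

Unlike splitting at the last '=', this keeps any '=' in the value intact, but it assumes parentheses in keys are balanced; a key with an unclosed '(' would yield no split at all.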

@jburel
Member

jburel commented Nov 10, 2013

Agree that download and downgrade (currently in Insight) should be done in Bio-Formats, but not in this PR.

@mtbc
Member Author

mtbc commented Nov 10, 2013

I can certainly see adding unit testing for parseOriginalMetadataTxt.

Once this PR is agreed to work, if deemed desirable for Bio-Formats I could try to introduce splitOnEquals into IniParser and then use that class in OriginalMetadataRequestI, perhaps after beta2.

@joshmoore
Member

@mtbc : sounds like a plan.

omero.cmd.fs.OriginalMetadataRequestTest.testMetadataParsing
which presumably someday gets moved with other Blitz tests to OmeroJava
@bpindelski

  1. Pre-upgrade images have metadata default file names as "original_metadata.txt". Post-upgrade files have random file names...
  2. leica-lif has differences between original metadata files (images imported after the upgrade seem to have less information in the original metadata file downloaded from Insight)
  3. For metamorph/sample_3x3.stk imported after upgrade, I couldn't download the metadata (Metadata could not be retrieved)
  4. For zeiss-lsm/colocsample1b.lsm that was imported post-upgrade, the MIME type for the original metadata text file is wrong (file foo.txt returns foo.txt: data)
  5. For post-upgrade imported zeiss-lsm/sample files.mdb, the right hand side Acquisition panel has some strange entries (Recordings ...):
    screen shot 2013-11-12 at 10 58 38

In general - the metadata files are present, but with minor (major?) differences. It's hard for me to judge the implications, so maybe someone with more metadata-fu could have a look? /cc @pwalczysko

The unit test runs fine. If the issues mentioned by me are to be handled in a different PR, this is OK to merge.

@mtbc
Member Author

mtbc commented Nov 12, 2013

How do the downloaded files differ from the original_metadata.txt files that end up in Files/? (Probably it's easy to find the right ones, but let me know if you'd like me to dig up some SQL.)

I'd guess that the MIME type issue is unrelated to this PR but very likely worth at least ticketing. I wonder if the Leica issue is some Bio-Formats change or something. I'll cc @rleigh-dundee as he may have some familiarity with all this too.

@bpindelski

@mtbc The differences were either related to the # symbol being used in series numbers or to missing or added metadata fields (e.g. the pre-upgrade imported image had fewer fields in the metadata file than the post-upgrade imported image). I didn't do a full image-by-image comparison of all the metadata attached to files in test_images_metadata; that's a full day's job unless we can automate it, as there are 381 image files (yes, some of them are part of a MIF).

@mtbc
Member Author

mtbc commented Nov 12, 2013

Thank you, I will investigate tomorrow.

@mtbc
Member Author

mtbc commented Nov 13, 2013

So, the files I'll look at are,

omero=> select i.name, of.name, a.file from image i, imageannotationlink ial, annotation a, originalfile of where i.id = ial.parent and ial.child = a.id and a.file = of.id;
                                                 name                                                  | file 
-------------------------------------------------------------------------------------------------------+------
 /Volumes/ome/data_repo/test_images_metadata/leica-lif/01_4C1Z.lif                                     |   19
 /Volumes/ome/data_repo/test_images_metadata/metamorph/sample_3x3.stk                                  |   20
 /Volumes/ome/data_repo/test_images_metadata/zeiss-lsm/colocsample1b.lsm                               |   21
 /Volumes/ome/data_repo/test_images_metadata/zeiss-lsm/sample files.mdb/sample files.mdb [XY-ch-02]    |   23
 /Volumes/ome/data_repo/test_images_metadata/zeiss-lsm/sample files.mdb/sample files.mdb [XY-ch-03]    |   24
 /Volumes/ome/data_repo/test_images_metadata/zeiss-lsm/sample files.mdb/sample files.mdb [XY-ch]       |   25
 /Volumes/ome/data_repo/test_images_metadata/zeiss-lsm/sample files.mdb/sample files.mdb [XYT]         |   26
 /Volumes/ome/data_repo/test_images_metadata/zeiss-lsm/sample files.mdb/sample files.mdb [XYZ-ch-20x]  |   27
 /Volumes/ome/data_repo/test_images_metadata/zeiss-lsm/sample files.mdb/sample files.mdb [XYZ-ch-zoom] |   28
 /Volumes/ome/data_repo/test_images_metadata/zeiss-lsm/sample files.mdb/sample files.mdb [XYZ-ch]      |   29
 /Volumes/ome/data_repo/test_images_metadata/zeiss-lsm/sample files.mdb/sample files.mdb [XYZ-ch0]     |   30
(11 rows)

@will-moore
Member

FYI: I found a bug in download of pre-FS original files while reviewing another PR: #1738. Don't know if that's something related to this?

@mtbc
Member Author

mtbc commented Nov 13, 2013

I am comparing the original_metadata.txt from Files/ with the original metadata files I download from Insight. For most of the above images, I find them the same, except that lines within a section are reordered. So, if @bpindelski is viewing them in ways that make most of them seem very different, then that is probably unrelated to this PR and worth a ticket. There are, however, three files that exhibit differences, on which I shall comment separately.
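A comparison that ignores line order within each section could be automated along these lines. This is a rough Python sketch, not the procedure actually used here: the function names are hypothetical, the `[SeriesMetadata]` header and the sample keys are assumptions, and dropping empty sections before comparing is a design choice of the sketch.

```python
from collections import Counter


def sections(text):
    """Map each '[Section]' header to a multiset of its non-blank lines."""
    result = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            result.setdefault(current, Counter())
        elif current is not None:
            # Lines before the first header are ignored in this sketch.
            result[current][line] += 1
    return result


def same_modulo_order(text_a, text_b):
    """True if both texts hold the same lines per section, in any order.

    Empty sections are dropped before comparing (an assumption).
    """
    a = {name: lines for name, lines in sections(text_a).items() if lines}
    b = {name: lines for name, lines in sections(text_b).items() if lines}
    return a == b


# Hypothetical file contents for illustration.
stored = "[GlobalMetadata]\nA=1\nB=2\n[SeriesMetadata]\nC=3\n"
downloaded = "[GlobalMetadata]\nB=2\nA=1\n\n[SeriesMetadata]\nC=3\n"
moved = "[GlobalMetadata]\nA=1\nB=2\nC=3\n"

reordered_ok = same_modulo_order(stored, downloaded)  # True
wrong_section = same_modulo_order(stored, moved)      # False
```

Using a multiset rather than a set means duplicated lines still have to match in count, while a line that moves to a different section is reported as a difference.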

@mtbc
Member Author

mtbc commented Nov 13, 2013

leica-lif/01_4C1Z.lif has an empty [GlobalMetadata] section that quite reasonably is omitted on download.

@mtbc
Member Author

mtbc commented Nov 13, 2013

zeiss-lsm/sample files.mdb/sample files.mdb [XY-ch] has a "line",

Recording #1 Notes=IHC 15.07.08,  Part I Sequenza for comparison of the intensity of CFTR signal between CF and Non CF and between Sequenza  AR 























1/2 AW non CF/CK+,  1/ JTDF  CF  slide 14.11.07 hnb
IHC done by Heather 1/2 cells are non CF labelled with CK and G449 and 1/2 CF labelled only with G449
Ref pict from 18.09.07 stack 1, modified to 12 bit

which got parsed as just the first line and the other lines were ignored.

I think I'd probably argue here that this was the right thing to do, unless we really should be appending following (even blank) lines onto values without any explicit quoting or line continuation hint at all? I don't know if anyone watching this PR has a strong opinion on this or if it's worth an RFE ticket.
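The two policies being weighed here can be sketched as follows. This is a simplified Python illustration with hypothetical names and shortened sample lines; it splits at the first '=' for brevity, skips blank lines when appending, and does not reflect the actual parser in this PR.

```python
def parse_lines(lines, append_continuations=False):
    """Parse 'key=value' lines into a list of (key, value) tuples.

    Lines without '=' are ignored by default. If append_continuations is
    True, non-blank lines without '=' are appended to the previous value.
    """
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        if "=" in line:
            key, _, value = line.partition("=")
            entries.append([key.strip(), value.strip()])
        elif append_continuations and entries and line.strip():
            entries[-1][1] += "\n" + line.strip()
    return [tuple(entry) for entry in entries]


# Shortened, hypothetical stand-in for the multi-line "Notes" entry.
sample = ["Recording #1 Notes=first line", "", "second line", "Other=1"]

ignored = parse_lines(sample)
# [('Recording #1 Notes', 'first line'), ('Other', '1')]
appended = parse_lines(sample, append_continuations=True)
# [('Recording #1 Notes', 'first line\nsecond line'), ('Other', '1')]
```

Even the appending variant stays ambiguous: a continuation line that happens to contain '=' would be read as a new entry, which is the underlying problem with having no explicit quoting or continuation marker.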

@mtbc
Member Author

mtbc commented Nov 13, 2013

zeiss-lsm/sample files.mdb/sample files.mdb [XYZ-ch] has the same issue as above with exactly the same "line".

@mtbc
Member Author

mtbc commented Nov 13, 2013

So, at this point I think I'm still happy with this PR as at least being a substantial improvement unless that awful many-line Recording #1 Notes must be handled differently.

Given that @bpindelski mentions an issue with #, I would guess that he's using a code path that uses ome.scifio.common.IniParser, which does,

  private String commentDelimiter = "#";
      String line = in.readLine();
      if (line == null) break;
      no++;

      // strip comments
      if (commentDelimiter != null) {
        int comment = line.indexOf(commentDelimiter);
        if (comment >= 0) line = line.substring(0, comment);
      }

which, given the nature of our actual input data, may well be worth a ticket.
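The effect of that comment stripping on keys containing '#' can be reproduced with a small Python re-creation of the same logic. This is only a sketch mirroring the excerpt above: `strip_comment` is a hypothetical name, the keys come from the output reported earlier in this thread, and the values are invented.

```python
def strip_comment(line, comment_delimiter="#"):
    """Drop everything from the first comment delimiter onwards."""
    if comment_delimiter is not None:
        idx = line.find(comment_delimiter)
        if idx >= 0:
            line = line[:idx]
    return line


# Keys as reported earlier in the thread; values are hypothetical.
lines = [
    "Title #01=first",
    "Title #02=second",
    "X position for position #2=10.5",
]

stripped = [strip_comment(line) for line in lines]
# ['Title ', 'Title ', 'X position for position ']
```

Each line loses its '=' and value along with the rest of the key, and distinct keys such as 'Title #01' and 'Title #02' collapse to the same prefix.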

@mtbc
Member Author

mtbc commented Nov 13, 2013

@will-moore: I think, probably unrelated, but thank you!

@mtbc
Member Author

mtbc commented Nov 13, 2013

It should be noted that the same images imported in 4.4 and upgraded, and imported in 5.0, may not have identical metadata due to various other code differences, especially in Bio-Formats.

@mtbc
Member Author

mtbc commented Nov 14, 2013

Filed http://trac.openmicroscopy.org.uk/ome/ticket/11684 and http://trac.openmicroscopy.org.uk/ome/ticket/11685 about the INI format issues. @bpindelski please could you file one about how to reproduce the MIME type issue you discovered? This PR at least, I think, is a significant improvement over the current state of affairs (wherein one couldn't download any metadata in OMERO 5 for images imported prior to upgrade from OMERO 4).

@bpindelski

@mtbc Ticket for MIME type difference opened: https://trac.openmicroscopy.org.uk/ome/ticket/11688

I also agree that this PR improves the state of the original metadata aspect. I don't have anything against merging, and massaging out the subtler bugs later on.

@joshmoore
Member

Thanks, guys!

joshmoore added a commit that referenced this pull request Nov 14, 2013
fix #10703: download pre-FS original metadata in FS
@joshmoore joshmoore merged commit a6df36d into ome:develop Nov 14, 2013
@mtbc mtbc deleted the trac-10703-original-metadata branch November 14, 2013 11:41

6 participants