Micromanager: parse JSON tag in each TIFF #2213

Merged
merged 4 commits into from
Feb 1, 2016

melissalinkert
Member

See https://trello.com/c/499UGA5U/72-micromanager-stage-positions-and-json

This reads the JSON data from each TIFF and stores the key/value pairs
in the original metadata table. Stage position values ("*PositionUm")
will also be stored in OME-XML, if they are present.

As each TIFF tag contains a single JSON object, simple parsing of the
key/value pairs without an external library was the easiest solution for now.
We'll likely want to revisit this once we have a better plan for dealing
with JSON data in general though.

To test, use the dataset in curated/micromanager/nico/Untitled_4_multi_file (config PR forthcoming). Without this change, showinf -omexml should not show PositionX, PositionY, or PositionZ values for Plane in the OME-XML. With this change, all 3 values should be set for every Plane, and the original metadata table should show many entries beginning with Plane.
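
As an illustration of the extraction step described above, here is a minimal Python sketch. It is not the Java code in this PR: it uses Python's standard `json` module rather than hand-rolled parsing, the function name `stage_positions` is hypothetical, and the sample keys are assumptions chosen only to match the "*PositionUm" pattern.

```python
import json

def stage_positions(json_text):
    """Return {key: float} for every top-level key ending in 'PositionUm'."""
    metadata = json.loads(json_text)
    positions = {}
    for key, value in metadata.items():
        # Values may arrive as numbers or as numeric strings; float() accepts both.
        if key.endswith("PositionUm"):
            positions[key] = float(value)
    return positions

# Hypothetical sample object, not taken from a real dataset.
sample = '{"XPositionUm": 12.5, "YPositionUm": "-3.0", "Exposure-ms": 10}'
positions = stage_positions(sample)
```

Keys that do not match the suffix are ignored here; in the PR they would still be stored in the original metadata table.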

@melissalinkert
Member Author

See configuration for new dataset: https://github.com/openmicroscopy/data_repo_config/pull/73

@julou

julou commented Jan 26, 2016

As per Melissa's request, I attach an example dataset acquired with the demo config of MM 1.4.22. tiffutil reports that it has all the required tags:

50838 (0xc696) LONG (4) 5<28 5496 32 768 768>
50839 (0xc697) BYTE (1) 7092<0x4a 0x49 ...>
51123 (0xc7b3) ASCII (2) 3329<{"Objective-Name":"DObje ...>

@joshmoore
Member

NB: general question which will also become an issue for OMERO -- at what stage do we refactor json parsing out to helpers and/or introduce a dependency?

@melissalinkert
Member Author

Thanks, @julou. We'll fix that dataset in a separate pull request, as it is a proper OME-TIFF dataset and not the original Micromanager format (i.e. no metadata.txt file). Noted here so we don't lose track: https://trello.com/c/0FWF75pF/88-parse-extra-ome-tiff-tags

@joshmoore: probably would be good to do that in the not-very-distant future. I just didn't want to make a unilateral decision (given the number of library choices), and assumed we wouldn't have bandwidth to discuss/trial/decide in the current milestone.

@julou

julou commented Jan 28, 2016

@melissalinkert I'm afraid there might be some misunderstanding here and I really don't understand why you want to make this a separate issue. As explained by M. Tsuchida:

There are two copies of metadata in the .ome.tif files saved by Micro-Manager. One is Micro-Manager's native metadata, which is identical to what is (optionally) saved to metadata.txt. The other is the OME-XML metadata.

So the dataset I sent is nothing other than a proper sample dataset of micromanager (with both OME metadata and JSON config-specific metadata, although the latter is not duplicated in a metadata.txt file). As far as I understand, parsing metadata should not rely on the .txt file, all the more so as these files are not imported by omero (at least I haven't managed to so far). From your answer, I'm afraid that the current reader relies at some point on the metadata.txt file; is this the case or not?

@julou

julou commented Jan 28, 2016

btw I just looked at your modified code and realised you indeed get the JSON MM Metadata from metadata.txt.
The general approach would be to parse the exact same string from tiff tag 51123 (which you currently don't use, as far as I can tell). This should produce the same result and work with all datasets, whether or not a metadata.txt file is saved (saving a metadata.txt file is optional in MM, and again this file is not imported by the omero importer).

Thank you very much for your work on this!

@bramalingam
Member

Works as expected. Please merge.

@julou

julou commented Jan 29, 2016

@bramalingam I'm surprised that you simply ignored my last comments. Relying on the metadata.txt files rather than the internal tiff tag makes the patch useless for most users and as far as I can tell for omero import…

@sbesson
Member

sbesson commented Jan 29, 2016

Hi @julou and apologies for the delay in answering your comment.

From the OME point of view, Micro-Manager currently saves its acquired data under two types of file formats as described in the dataset structure table:

  • the MicroManager format, i.e. a metadata.txt with a set of TIFF files
  • the OME-TIFF format, i.e. .ome.tiff file(s) following the OME Model specification

We certainly understand that this distinction might feel irrelevant for most consumers of Micro-Manager. However, at the Bio-Formats level, each file format is handled by a separate image reader. Having two formats imposes some constraints in terms of code management, bug fixes, testing and documentation. This is also why we tend to open pull requests for individual readers.

The current PR opened by @melissalinkert targets the parsing of the JSON metadata for the MicroManager format only and this is why it was functionally tested by @bramalingam within this scope.

For files saved as .ome.tif, the situation is more complex, as Melissa alluded to above. Although the OME-TIFF files generated by Micro-Manager are fully valid from an OME-XML point of view, the usage of a custom TIFF tag to store extra metadata is not part of the specification. As pointed out by Mark in the related Micro-Manager thread, supporting this custom metadata is effectively equivalent to adding support for a new file format. We will follow up on the threads initiated on both the ome-users and the Micro-Manager mailing lists and start a conversation on how to store this extra metadata in the existing OME-TIFF format without creating a new specification.

Best,
Sebastien

@ChrisWeisiger

Hi guys, one of the µManager devs here. There seems to have been some miscommunication going on here, so I'd just like to work on figuring out exactly what is going on and what is needed.

First off, looking at the diff for this PR, it seems like it's actually parsing JSON from tag 50839. I don't see any reference to the metadata.txt file, but I also don't see how accessing tag 50839 gets you the JSON metadata we generate. So I'm a bit confused about what this change actually does.

Second, regarding file formats: µManager's file format does not inherently include the metadata.txt file. As Thomas notes, generation of this file is optional. It is intended primarily for users who want to be able to access image metadata without going through µManager. When µManager itself reads a µManager dataset, the file is ignored in favor of TIFF tag 51123. So I think it's incorrect to characterize the µManager file format as "a metadata.txt with a set of TIFF files". I'm not familiar with BioFormats' implementation or what is and is not easy to accomplish, but if BioFormats is relying on the metadata.txt, then it is imperative that the file loader gracefully handle the absence of a metadata.txt file, presumably by simply not populating the X/Y/Z positions.

@sbesson I admit to being a bit confused by your comment. There's already a MicromanagerReader class; what technical hurdle prevents that class from accessing tag 51123, reading the JSON, and extracting any relevant tags from it for insertion into the OME? Of course ideally we (that is, µManager) would be storing this information in the OME ourselves when we write the file; our handling of OME metadata is far from perfect. We want to do a serious revamp of how µManager stores data, for many reasons, but that's a fairly long ways off.

On a side note, manually parsing JSON makes me uneasy.

@joshmoore
Member

Hi @ChrisWeisiger -- was just writing up a response to the ome-users & micro-manager-general mailing lists about this. Any chance of discussing briefly on IRC (freenode's #ome) or on https://gitter.im/openmicroscopy/bioformats?

@joshmoore
Member

In the meantime, a few comments:

So I think it's incorrect to characterize the µManager file format as "a metadata.txt with a set of TIFF files".

Sorry for any misrepresentation. A better way of saying this would be: for Bio-Formats, the µManager file format is "a metadata.txt with a set of TIFF files", because this is exactly what the reader scans for at present.

if BioFormats is relying on the metadata.txt, then it is imperative that the file loader gracefully handle the absence of a metadata.txt file, presumably by simply not populating the X/Y/Z positions.

Assuming the metadata is present in the OME-TIFF IFD, then that'll happen. But from the Bio-Formats point-of-view, this will be an OME-TIFF, and not a µManager file.

what technical hurdle prevents that class from accessing tag 51123, reading the JSON, and extracting any relevant tags from it for insertion into the OME?

At the moment, the primary hurdle is that this looks to be a new format from the Bio-Formats perspective and one that could potentially be confused (internally) with OME-TIFF itself. Those are the kinds of corner cases it would be good to work through.

Of course ideally we (that is, µManager) would be storing this information in the OME ourselves when we write the file...

On that point, it might be good to pass some examples back and forth so we're clear on expectations.

Slowly sinking into weekend mode. If I/we don't see you on IRC, etc. looking forward to further discussions. ~Josh

@ChrisWeisiger

Okay, now that I've looked at the entire file instead of just what was modified in this diff, I see that you are indeed loading the metadata.txt file as Thomas said. I'm on IRC now and available to talk about our plans and how best to handle this kind of thing in future.

@joshmoore
Member

Hi @julou. A quick follow-up before the weekend having chatted on IRC with @ChrisWeisiger: our best suggestion for the moment is to continue generating the separate metadata.txt in µManager.

Without it, the MicromanagerReader (modified here) is not used at all, and instead the OMETiffReader is used, meaning the metadata is not accessible to you.

@ChrisWeisiger and I discussed possible solutions for down the road, but changes in either project are not going to be immediately useful for you (or others). As soon as I can, I'll follow up on your ome-users email thread and CC the micro-manager-general mailing list with general thoughts on the way forward.

@julou

julou commented Jan 31, 2016

Hi @joshmoore. Thanks a lot for looking more deeply into this and taking the time to coordinate with MM guys (thanks @ChrisWeisiger btw).

As far as I understand, when looking at a dataset, BF will only use the MM reader if it's made of a metadata.txt file + a bunch of tifs, otherwise it'll fall back into using the default OME-TIFF reader… correct?

About relying on metadata.txt files at the moment, I come back to an earlier comment: the whole point of asking for this parsing, from my perspective, is to get metadata displayed in OMERO. My user experience so far is that I couldn't get omero to import anything but this… Is there a way to force the java importer to look for specific files (e.g. based on name patterns)?

As a side question, is it possible to customise the OME-TIF reader. If yes, what would be more relevant than detecting MM datasets based on metadata.txt files would be to detect them based on name patterns (either a bunch of tif stacks with MMStack_Pos[0-9]+ or tif files organised inside several Pos[0-9]+ directories) and then to use the current default reader with an extra step of tags parsing (in particular the JSON string in tiff tag 51123). Is this nonsense? (I attach a bunch of demo datasets created using either save as separate files or save as stack files).

@julou

julou commented Feb 1, 2016

btw I did some more tests today to try to force omero to import the metadata.txt files but I was not successful so far. @joshmoore your hints on how to achieve this would be more than appreciated.

@joshmoore
Member

As far as I understand, when looking at a dataset, BF will only use the MM reader if it's made of a metadata.txt file + a bunch of tifs, otherwise it'll fall back into using the default OME-TIFF reader… correct?

Correct, though there looks to be one difficulty with your latest dataset in that there's a prefix before _metadata.txt which doesn't match what the MicromanagerReader expects. We'll investigate handling that change with a separate PR.

About relying on metadata.txt files at the moment, I come back to an earlier comment: the whole point of asking for this parsing, from my perspective, is to get metadata displayed in OMERO. My user experience so far is that I couldn't get omero to import anything but this… Is there a way to force the java importer to look for specific files (e.g. based on name patterns)?

Understood, but there's some difficulty here in that we're dealing with at least 2 different file formats (from our perspective). With this PR, we'd assume that the dataset containing a metadata.txt will do as you expect. For datasets with the new naming convention, give us a chance to investigate a fix and we'll get back to you. Our hope would be that the file detection will work as you expect, even in the java importer.

As a side question, is it possible to customise the OME-TIF reader?

Not really, and by design. We'll try to clarify why in the spec documentation itself, but the burden of detecting custom uses of OME-TIFFs is too high at the moment. Instead, we'd like to investigate making the extension points explicit.

If yes, what would be more relevant than detecting MM datasets based on metadata.txt files would be to detect them based on name patterns (either a bunch of tif stacks with MMStack_Pos[0-9]+ or tif files organised inside several Pos[0-9]+ directories) and then to use the current default reader with an extra step of tags parsing (in particular the JSON string in tiff tag 51123). Is this nonsense? (I attach a bunch of demo datasets created using either save as separate files or save as stack files).

This is conceptually possible, though it's a part of the MicromanagerReader rather than the OMETiffReader. However, from our point of view, this is equivalent to defining a new file format, and at the moment, that won't be possible.

btw I did some more tests today to try to force omero to import the metadata.txt files but I was not successful so far. @joshmoore your hints on how to achieve this would be more than appreciated.

Which fileset particularly? For example, test_sep/Pos0 imports with this PR as:

[screenshot: with-2213]

@julou

julou commented Feb 1, 2016

Which fileset particularly? For example, test_sep/Pos0 imports with this PR as:

True! even in my hands and with our omero server!!!

Correct, though there looks to be one difficulty with your latest dataset in that there's a prefix before _metadata.txt which doesn't match what the MicromanagerReader expects. We'll investigate handling that change with a separate PR.

Yes, I now realise that this might be the root of the misunderstanding. It looks like you're currently not taking into account that MM datasets can be saved as stacks or separate files (one per frame) depending on the user's taste. Apparently your reader expects a file named metadata.txt, and I assumed up to now that it could be any metadata file ending with this pattern (didn't even notice the difference to tell the truth)… The reason for that is that file management is easier on our side with a few large files than with many small ones, hence we always save datasets as stacks (and changing this is not really an option).

I realise this requires another ticket. Shall I send another email to the ome-users mailing list? Also, if you extend the MMreader so that it takes all metadata files into account, will this automatically propagate to omero? Or is it hardcoded somewhere in omero to import metadata.txt files (but not *_metadata.txt files)?

Thanks a lot for your patient support!

@sbesson
Member

sbesson commented Feb 1, 2016

Yes, I now realise that this might be the root of the misunderstanding. It looks like you're currently not taking into account that MM datasets can be saved as stacks or separate files (one per frame) depending on the user's taste. Apparently your reader expects a file named metadata.txt, and I assumed up to now that it could be any metadata file ending with this pattern (didn't even notice the difference to tell the truth)…

Absolutely, at the moment the reader is looking for a file exactly named metadata.txt. This assumption is derived from the datasets which had been made available to us via the community so far.

The reason for that is that file management is easier on our side with a few large files than with many small ones, hence we always save datasets as stacks (and changing this is not really an option).

Understood. Generating fewer files of larger but manageable size is completely sensible.

I realise this requires another ticket. Shall I send another email to the ome-users mailing list? Also, if you extend the MMreader so that it takes all metadata files into account, will this automatically propagate to omero? Or is it hardcoded somewhere in omero to import metadata.txt files (but not *_metadata.txt files)?

OMERO will automatically import the set of files which have been detected and grouped together by Bio-Formats. This means that once the MicromanagerReader is extended to support *_metadata.txt, any version of OMERO which uses the new release of Bio-Formats should automatically be able to include such files at import and parse their metadata.
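
The detection rule discussed in this thread can be sketched as follows. This is a simplified Python illustration, not the actual Bio-Formats logic: `is_metadata_file` and `pick_reader` are hypothetical names, the `allow_prefix` switch stands in for the proposed `*_metadata.txt` support, and the prefixed metadata filename is an assumed example.

```python
def is_metadata_file(name, allow_prefix=False):
    """True if name looks like a Micro-Manager metadata file."""
    if name == "metadata.txt":  # exact match, as the reader expects today
        return True
    # Proposed extension: also accept any prefix before _metadata.txt.
    return allow_prefix and name.endswith("_metadata.txt")

def pick_reader(names, allow_prefix=False):
    """Name the reader that would handle a directory containing these files."""
    if any(is_metadata_file(n, allow_prefix) for n in names):
        return "MicromanagerReader"
    return "OMETiffReader"

separate_files = ["MMStack_Pos0.ome.tif", "metadata.txt"]
# Hypothetical stack dataset with a prefixed metadata file.
stack_files = ["test_1_MMStack_Pos0.ome.tif", "test_1_MMStack_Pos0_metadata.txt"]

current = pick_reader(stack_files)                      # exact-name rule
proposed = pick_reader(stack_files, allow_prefix=True)  # relaxed rule
```

Because OMERO imports whatever file group Bio-Formats detects, only the detection rule needs to change for prefixed metadata files to be picked up.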

Thanks a lot for your patient support!

Thanks for your input and uploading new datasets.

As this PR is starting to turn into an epic (but hopefully fruitful) discussion, I will now merge its improvements, which have been reviewed both at the Bio-Formats and OMERO level. As mentioned above, we will deal with the other issues like the recognition of *_metadata.txt files in follow-up PRs.

The goal is to have these changes reviewed and included by the end of this week so that the upcoming releases of Bio-Formats 5.1.8 and OMERO 5.2.2 can digest the four filesets you uploaded in #2213 (comment) and parse their metadata.

sbesson added a commit that referenced this pull request Feb 1, 2016
Micromanager: parse JSON tag in each TIFF
@sbesson sbesson merged commit a642d0b into ome:develop Feb 1, 2016
@sbesson
Member

sbesson commented Feb 1, 2016

--rebased-to #2224

@julou

julou commented Feb 1, 2016

The goal is to have these changes reviewed and included by the end of this week so that the upcoming releases of Bio-Formats 5.1.8 and OMERO 5.2.2 can digest the four filesets you uploaded in #2213 (comment) and parse their metadata.

Wow! Really excited about that :)
I didn't intend to provide reference datasets with the previous ones, and I'm a bit surprised that files in test2_stack have no prefix (I attach a more canonical one). In fact I can think of 4 types of dataset naming that you might want to take into account: (separate files or stacks) * (MDA saving / manual saving), where MDA stands for multidimensional acquisition.
@ChrisWeisiger is this correct? I'm not exactly sure of what it means in terms of prefix prepending anyway…

For this week, focusing on supporting the stack format of the current stable release (i.e. test_stack) would already be great!

@ChrisWeisiger

@julou In principle it should not make a difference if you save images as they are acquired vs. if you save them after accumulating a complete dataset in RAM; however, it would not surprise me if there were minor differences. Both should be valid OME datasets (and of course, valid MM datasets).

As far as filenames are concerned, images saved as they are acquired will have the name of the directory as a prefix (e.g. files saved in /tmp/test will be named like "test_1_MMStack_Pos0.ome.tif") while images saved after accumulating the complete set in RAM will have names like "MMStack_Pos0.ome.tif" regardless of where they are saved. However, this is not considered meaningful by µManager.
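
A pattern covering both naming variants described above might look like the sketch below. The regular expression is an assumption inferred only from the two example names in this comment, and `parse_stack_name` is a hypothetical helper; real datasets may use other variants.

```python
import re

# Optional "<prefix>_" followed by MMStack_Pos<N>.ome.tif
STACK_NAME = re.compile(
    r"^(?:(?P<prefix>.+)_)?MMStack_Pos(?P<position>\d+)\.ome\.tif$"
)

def parse_stack_name(filename):
    """Return (prefix or None, position index), or None if the name does not match."""
    match = STACK_NAME.match(filename)
    if match is None:
        return None
    return match.group("prefix"), int(match.group("position"))

saved_during_acquisition = parse_stack_name("test_1_MMStack_Pos0.ome.tif")
saved_from_ram = parse_stack_name("MMStack_Pos0.ome.tif")
```

Since the prefix is not considered meaningful by µManager, a reader would probably group files by position index and ignore the prefix.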
