[bug] Micromanager: reader crashes with stack format and large datasets #2308

Closed
julou opened this issue Mar 22, 2016 · 11 comments · Fixed by #2475


@julou

julou commented Mar 22, 2016

Hello,

Following your fixes to parse config-specific metadata in MM datasets, I realised that I focused on providing small datasets (for obvious practical reasons). Hence the current MM reader (5.1.8) crashes when the dataset is split over several tiff files.

In more detail:

  • when MM datasets are stored in stack format and the dataset size exceeds 4.7gb, it's split over several files (e.g. prefix_1_MMStack.ome.tif, prefix_1_MMStack_1.ome.tif, etc)
  • because MM can save one tif file (or set of tif files if > 4.7gb) per position, it is sometimes useful to use BF to open PosXX (in fact it would be my primary use case!)
  • with BF 5.1.7, the following readers were used depending on the input path:
    • first tif file of the dataset: OME_TIF reader
    • any other tif file: TIF reader
  • with current BF 5.1.8, the following readers are used depending on the input path:
    • first tif file of the dataset: OME_TIF reader
    • any other tif file with associated *_metadata.txt file: MM reader (OK provided the position dataset isn't >4.7gb)
    • any other tif file (without *_metadata.txt file): TIF reader
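
To make the file layout above concrete, here is a minimal sketch that groups the files of a stack-format dataset by position and orders each group by its continuation index. It is not Bio-Formats or Micro-Manager code: the class name `MMStackGrouping` and the regular expression are assumptions, and the pattern only covers the default `PosN` position labels plus the `_k` continuation suffix seen in the file names quoted in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MMStackGrouping {
    // Assumed naming: <prefix>_MMStack[_PosN][_k].ome.tif, where _k marks the
    // k-th continuation file of a position (absent for the first file).
    private static final Pattern STACK_FILE =
        Pattern.compile("^(.+_MMStack(?:_Pos\\d+)?)(?:_(\\d+))?\\.ome\\.tif$");

    /** Groups file names by position stem, each group ordered by continuation index. */
    static Map<String, List<String>> groupByPosition(List<String> fileNames) {
        Map<String, TreeMap<Integer, String>> byStem = new TreeMap<>();
        for (String name : fileNames) {
            Matcher m = STACK_FILE.matcher(name);
            if (!m.matches()) {
                continue; // e.g. *_metadata.txt companions are ignored here
            }
            int part = (m.group(2) == null) ? 0 : Integer.parseInt(m.group(2));
            byStem.computeIfAbsent(m.group(1), k -> new TreeMap<>()).put(part, name);
        }
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, TreeMap<Integer, String>> e : byStem.entrySet()) {
            grouped.put(e.getKey(), new ArrayList<>(e.getValue().values()));
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "prefix_1_MMStack_1.ome.tif",
            "prefix_1_MMStack.ome.tif");
        System.out.println(groupByPosition(names));
    }
}
```

A reader that handles split positions would need some grouping of this kind before concatenating planes; custom position labels would require a different pattern.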

I would be happy to provide you with a demo large dataset, or you can easily create one using MM's demo config and the Multi Dimensional Acquisition window (select e.g. enough time points so that the dataset size is >4.7gb).

Also, as far as I understand, this means there is no easy way to open only the first position of a dataset: when its path is given as input, the OME-TIFF reader is triggered and loads the full dataset. I don't fully understand how reader selection works, and I assume you want to keep a way to trigger the OME-TIFF reader for such datasets, but I find this very cumbersome! What other designs would be possible?

Thank you in advance for your attention. Best,
Thomas

@julou julou changed the title [bug] Micromanager: issue with stack format and large datasets [bug] Micromanager: reader crashes with stack format and large datasets Mar 22, 2016
@sbesson
Member

sbesson commented Mar 24, 2016

Hi @julou ,

many thanks for reporting this issue. We are currently investigating it together with @bramalingam following the workflow you recommended above i.e. creating a large multi-dimensional acquisition using the demo config of MM 1.4.22.

This initial investigation raised a couple of questions regarding your issue:

  • we assume you are using a recent version of Micro-Manager, i.e. 1.4.22 or later, is this correct?

  • which tool are you using to open the images with Bio-Formats: ImageJ/Fiji or something else?

  • how do you determine the reader used for each file? Using the 5.1.8 command-line tools against a multi-dimensional acquisition fileset with the following listing:

    $ ls -alh
    total 5.0G
    drwxrws--- 2 bramalingam omedev  16K Mar 23 14:03 .
    drwxrws--- 4 bramalingam omedev  16K Mar 23 14:20 ..
    -rwxrwx--- 1 bramalingam omedev 989M Mar 23 14:04 Test_Dataset_1_MMStack_Pos0_1.ome.tif
    -rwxrwx--- 1 bramalingam omedev 4.0G Mar 23 14:04 Test_Dataset_1_MMStack_Pos0.ome.tif
    

    the OME-TIFF reader is selected independently of the input file for us.

As always, we'll keep you posted of our progress and will let you know if we need sample files to reproduce your issue.

Best,
Sebastien & Balaji

@julou
Author

julou commented Mar 24, 2016

Hi Sebastien & Balaji,

Thanks a lot for taking this into consideration!

we assume you are using a recent version of Micro-Manager, i.e. 1.4.22 or later, is this correct?

yes

which tool are you using to open the images with Bio-Formats: ImageJ/Fiji or something else?

either the fiji plugin or showinf of the CL tools

how do you determine the reader used for each file? Using the 5.1.8 command-line tools against a multi-dimensional acquisition fileset the OME-TIFF reader is selected independently of the input file for us.

Using the CL tools. My report was imprecise, though: it is not the first file but the first position of the dataset that is recognised by the OME-TIF reader (and all of its files carry the metadata if the dataset is split).

So you need to create a large dataset with multiple positions if you want to see the MM reader being called…

Talking with MM devs, I realised that it's not always the first position:

In point of fact, the OME metadata is stored in the first file that µManager can find that has enough room to store it in; the other files are then pointed to that file. This may always be the first file in practice; I haven't investigated. We actually have fallback logic for if no file has enough room for the OME metadata, in which case it is stored in a newly-created file and all other files are told to point to that file.
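
The selection rule described in that quote can be modelled in a few lines. This is only a sketch of the rule as paraphrased here, not µManager's code: the class name, the `NEW_FILE` sentinel and the per-file "free bytes" list are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class MetadataHolderSketch {
    /** Returned when no existing file has room and a new file would be created. */
    static final int NEW_FILE = -1;

    /**
     * Picks the first file whose remaining room can hold the OME metadata.
     * freeBytes.get(i) is the (hypothetical) room left in file i.
     */
    static int chooseMetadataHolder(List<Long> freeBytes, long metadataBytes) {
        for (int i = 0; i < freeBytes.size(); i++) {
            if (freeBytes.get(i) >= metadataBytes) {
                return i; // all other files would point to this one
            }
        }
        return NEW_FILE; // fallback: store the metadata in a newly created file
    }

    public static void main(String[] args) {
        List<Long> room = Arrays.asList(10L, 500L, 900L);
        System.out.println(chooseMetadataHolder(room, 400L));
    }
}
```

The point for this issue is that the file holding the full OME metadata is not guaranteed to be the first file of the first position.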

Best,
Thomas

@sbesson
Member

sbesson commented Mar 24, 2016

Hi @julou, thanks for the quick answer and the additional information. We will regenerate multi-position large datasets, test them and report back.

@sbesson
Member

sbesson commented Mar 29, 2016

Hi Thomas,

after some more investigation, I was able to reproduce part of your issue. Using the demo config, I generated the following output

sbesson@c001025:Test_Dataset_MultiPosition_Stack_1 $ ls -alh
total 18217040
drwxr-xr-x  6 sbesson  1133848969   204B 25 Mar 15:46 .
drwxr-xr-x  6 sbesson  1133848969   204B 25 Mar 15:46 ..
-rw-r--r--  1 sbesson  1133848969   4.0G 25 Mar 15:43 Test_Dataset_MultiPosition_Stack_1_MMStack_Pos1.ome.tif
-rw-r--r--  1 sbesson  1133848969   357M 25 Mar 15:43 Test_Dataset_MultiPosition_Stack_1_MMStack_Pos1_1.ome.tif
-rw-r--r--  1 sbesson  1133848969   4.0G 25 Mar 15:43 Test_Dataset_MultiPosition_Stack_1_MMStack_Pos2.ome.tif
-rw-r--r--  1 sbesson  1133848969   350M 25 Mar 15:43 Test_Dataset_MultiPosition_Stack_1_MMStack_Pos2_1.ome.tif

I was able to reproduce the inconsistency in reader selection depending on the selected file. It turns out there is nothing wrong with the fileset and the way the metadata file is referred to from the point of view of the OME Data model:

sbesson@c001025:Test_Dataset_MultiPosition_Stack_1 $ tiffcomment Test_Dataset_MultiPosition_Stack_1_MMStack_Pos2.ome.tif 
<?xml version="1.0" encoding="UTF-8" standalone="no"?><OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2015-01" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openmicroscopy.org/Schemas/OME/2015-01 http://www.openmicroscopy.org/Schemas/OME/2015-01/ome.xsd"><BinaryOnly MetadataFile="Test_Dataset_MultiPosition_Stack_1_MMStack_Pos1_1.ome.tif" UUID="urn:uuid:df0e950b-7d31-4e0f-9e0a-47a767461b22"/></OME> 

Instead, this points to a regression in the OMETiffReader which was introduced in #2220 while trying to fix another scenario where the metadata OME-TIFF does not exist. Going back to Bio-Formats 5.1.6, the OME-TIFF reader is indeed selected independently of the chosen OME-TIFF. We will work on getting this fixed for 5.1.9 so that all .ome.tiff files are opened unequivocally using the same reader.
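
For anyone wanting to check which file a given `.ome.tif` points to without going through a reader, the `BinaryOnly` reference shown in the `tiffcomment` output can be extracted with the JDK's XML parser alone. This is a minimal sketch; the class and method names are made up for illustration, and it assumes the OME-XML comment has already been obtained as a string (for example from `tiffcomment`).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class BinaryOnlyLookup {
    /** Returns the MetadataFile attribute of the first BinaryOnly element, or null if absent. */
    static String metadataFile(String omeXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // OME elements live in a versioned namespace
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(omeXml.getBytes(StandardCharsets.UTF_8)));
            // "*" matches any namespace, so the schema version does not matter here
            NodeList nodes = doc.getElementsByTagNameNS("*", "BinaryOnly");
            if (nodes.getLength() == 0) {
                return null; // this file carries the full metadata itself
            }
            Element binaryOnly = (Element) nodes.item(0);
            return binaryOnly.hasAttribute("MetadataFile")
                ? binaryOnly.getAttribute("MetadataFile")
                : null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse OME-XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<OME xmlns=\"http://www.openmicroscopy.org/Schemas/OME/2015-01\">"
            + "<BinaryOnly MetadataFile=\"master.ome.tif\" UUID=\"urn:uuid:example\"/></OME>";
        System.out.println(metadataFile(xml));
    }
}
```

A `null` result means the comment has no `BinaryOnly` element, i.e. the file is expected to hold the full metadata block.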

Answering one of your earlier questions, it is possible to use the Bio-Formats command line tools and impose the reader. For instance to read any position using the TiffReader, you can do something along the lines of:

showinf -format Tiff Test_Dataset_MultiPosition_Stack_1_MMStack_Pos1.ome.tif

Best,
Sebastien

@julou
Author

julou commented Apr 1, 2016

Hi Sebastien,

Thanks for your efforts and sorry for the delayed answer. I was off for a few days.

after some more investigation, I was able to reproduce part of your issue.

Great. I assume that you were able to reproduce that the current MMreader does not concatenate the different files of one position. Any progress on this?

I was able to reproduce the inconsistency in reader selection depending on the selected file. It turns out there is nothing wrong with the fileset and the way the metadata file is referred to from the point of view of the OME Data model (…)
Instead, this points to a regression in the OMETiffReader which was introduced in #2220 while trying to fix another scenario where the metadata OME-TIFF does not exist. Going back to Bio-Formats 5.1.6, the OME-TIFF reader is indeed selected independently of the chosen OME-TIFF. We will work on getting this fixed for 5.1.9 so that all .ome.tiff files are opened unequivocally using the same reader.

OK. If I get it right, it means that an MM file will be opened using the MM reader as long as the *_metadata.txt file is found along with it, and using the OME-TIF reader otherwise. Correct?

Answering one of your earlier questions, it is possible to use the Bio-Formats command line tools and impose the reader.

This is great! :)
As far as I understand, this means that I could force using the MM reader for any position (and hence open this position only) as soon as its "large dataset" problem is fixed! Does this correspond to the setStackFormat method of the java API? Where can I get a list of all proper format strings?

Thanks,
Thomas

@sbesson
Member

sbesson commented Apr 4, 2016

Hi Thomas,

Great. I assume that you were able to reproduce that the current MMreader does not concatenate the different files of one position. Any progress on this?

Yes, generating a set of large datasets using the demo config was enough to reproduce the ArrayIndexOutOfBoundsException issue in MicromanagerReader, which is now captured in https://trello.com/c/2yX1hRwH/120-micromanager-large-datasets.

OK. If I get it right, it means that an MM file will be opened using the MM reader as long as the *_metadata.txt file is found along with it, and using the OME-TIF reader otherwise. Correct?

This PR should let the reader implementation match the description made in https://www.openmicroscopy.org/site/support/bio-formats5.1/formats/micro-manager.html i.e. use only OMETiffReader or MicromanagerReader.

This is great! :)
As far as I understand, this means that I could force using the MM reader for any position (and hence open this position only) as soon as its "large dataset" problem is fixed! Does this correspond to the setStackFormat method of the java API? Where can I get a list of all proper format strings?

The relevant line is https://github.com/openmicroscopy/bioformats/blob/v5.1.8/components/bio-formats-tools/src/loci/formats/tools/ImageInfo.java#L356. Minimally, what you need to pass is the string prefix of the reader e.g.

sbesson@ls30630:Test_Dataset_MultiPosition_Stack_2 $ ls
Test_Dataset_MultiPosition_Stack_2_MMStack_Pos0.ome.tif     Test_Dataset_MultiPosition_Stack_2_MMStack_Pos1.ome.tif
Test_Dataset_MultiPosition_Stack_2_MMStack_Pos0_metadata.txt    Test_Dataset_MultiPosition_Stack_2_MMStack_Pos1_metadata.txt
sbesson@ls30630:Test_Dataset_MultiPosition_Stack_2 $ showinf -nopix -nometa  -format Micromanager Test_Dataset_MultiPosition_Stack_2_MMStack_Pos0.ome.tif 
Checking Micro-Manager format [yes]
Initializing reader
Reading metadata file
Populating metadata
Finding image file names
Building list of TIFFs
Initialization took 0.486s

Reading core metadata
filename = Test_Dataset_MultiPosition_Stack_2_MMStack_Pos0.ome.tif
Used files:
    /Users/sbesson/Desktop/MM1_4_22/Test_Dataset_MultiPosition_Stack_2/Test_Dataset_MultiPosition_Stack_2_MMStack_Pos0_metadata.txt
    /Users/sbesson/Desktop/MM1_4_22/Test_Dataset_MultiPosition_Stack_2/Test_Dataset_MultiPosition_Stack_2_MMStack_Pos0.ome.tif
Series count = 1
Series #0 :
    Image count = 126
    RGB = false (1) 
    Interleaved = false
    Indexed = false (true color)
    Width = 512
    Height = 512
    SizeZ = 21
    SizeT = 2
    SizeC = 3
    Thumbnail size = 128 x 128
    Endianness = intel (little)
    Dimension order = XYZCT (uncertain)
    Pixel type = uint16
    Valid bits per pixel = 16
    Metadata complete = true
    Thumbnail series = false

Best,
Sebastien

@julou
Author

julou commented Apr 5, 2016

Looks like there's progress coming ahead! :)

I tried to wrap my head around the expected behaviour you are currently working on:

  • using any .ome.tif as input path should use the OMETiffReader and open the entire dataset (independent of whether there is a companion *_metadata.txt file)
  • using any *_metadata.txt as input path should use the MMReader.
    Will this open the entire dataset or only one position? does this depend on whether it's the first position?

Is it realistic to expect the MMreader's bug with datasets split over several files to be addressed soon?

Also, I put together a beanshell script to open a given position only from one MM dataset. However, when I try to force using the OMETiffReader (hoping to get it to work with the master file including OME metadata or any other file with a binary only tag), I get an error. I assume that this comes from the issue fixed in #2320, correct?

Below is the script, any feedback welcome. Thanks,
Thomas

path = "/scicore/home/nimwegen/GROUP/MM_Data/Matthias/20150331/20150331raw_stacks/20150331_lac_glu_1_MMStack_Pos0.ome.tif";
posStr="Pos2";

import loci.formats.ImageReader;
import loci.formats.ome.OMEXMLMetadata;
import loci.formats.MetadataTools;
import loci.formats.in.OMETiffReader;
import loci.formats.*;

ClassList enabledClasses = new ClassList(IFormatReader.class);
enabledClasses.addClass(loci.formats.in.OMETiffReader.class);
ImageReader reader = new ImageReader(enabledClasses);

//ImageReader reader = new ImageReader();
OMEXMLMetadata omeMeta = MetadataTools.createOMEXMLMetadata();
reader.setMetadataStore(omeMeta);
reader.setId(path);
seriesCount = reader.getSeriesCount();
reader.close();

// find the index of the position of interest
String[] posNames = new String[seriesCount];
idx = -1;
for (int i=0; i<seriesCount; i++) {
    posNames[i] = omeMeta.getStageLabelName(i);
    if (posStr.equals(posNames[i])) idx = i;
}

// open the position as virtual hyperstack
import ij.*;
import loci.plugins.*;
import loci.plugins.in.ImporterOptions;
ImporterOptions options = new ImporterOptions();
options.setVirtual(true);
options.setId(path);
options.setStackFormat("OMETiff");
options.clearSeries();
options.setSeriesOn(idx, true);
ImagePlus[] imps = BF.openImagePlus(options);
imps[0].show();

@julou
Author

julou commented Apr 5, 2016

Will this open the entire dataset or only one position? does this depend on whether it's the first position?

Well, I guess ideally it should open the series corresponding to the file unless the "open all series" option has been enabled…

@sbesson
Member

sbesson commented Apr 6, 2016

Hi,

answering some questions unrelated to the discussion opened in #2309.

It looks like the MicromanagerReader initializes the files of one position only when there are multiple positions, independently of the input file:

$  showinf -nopix -nometa Test_Dataset_MultiPosition_Stack_2_MMStack_Pos0_metadata.txt 
Checking file format [Micro-Manager]
Initializing reader
MicromanagerReader initializing Test_Dataset_MultiPosition_Stack_2_MMStack_Pos0_metadata.txt
Reading metadata file
Populating metadata
Finding image file names
Building list of TIFFs
Initialization took 0.394s

Reading core metadata
filename = Test_Dataset_MultiPosition_Stack_2_MMStack_Pos0_metadata.txt
Used files:
    /Users/sbesson/Desktop/MM1_4_22/Test_Dataset_MultiPosition_Stack_2/Test_Dataset_MultiPosition_Stack_2_MMStack_Pos0_metadata.txt
    /Users/sbesson/Desktop/MM1_4_22/Test_Dataset_MultiPosition_Stack_2/Test_Dataset_MultiPosition_Stack_2_MMStack_Pos0.ome.tif
Series count = 1
Series #0 :
    Image count = 126
    RGB = false (1) 
    Interleaved = false
    Indexed = false (true color)
    Width = 512
    Height = 512
    SizeZ = 21
    SizeT = 2
    SizeC = 3
    Thumbnail size = 128 x 128
    Endianness = intel (little)
    Dimension order = XYZCT (uncertain)
    Pixel type = uint16
    Valid bits per pixel = 16
    Metadata complete = true
    Thumbnail series = false
sbesson@ls30630:Test_Dataset_MultiPosition_Stack_2 $  showinf -nopix -nometa Test_Dataset_MultiPosition_Stack_2_MMStack_Pos1_metadata.txt 
Checking file format [Micro-Manager]
Initializing reader
MicromanagerReader initializing Test_Dataset_MultiPosition_Stack_2_MMStack_Pos1_metadata.txt
Reading metadata file
Populating metadata
Finding image file names
Building list of TIFFs
Initialization took 0.387s

Reading core metadata
filename = Test_Dataset_MultiPosition_Stack_2_MMStack_Pos1_metadata.txt
Used files:
    /Users/sbesson/Desktop/MM1_4_22/Test_Dataset_MultiPosition_Stack_2/Test_Dataset_MultiPosition_Stack_2_MMStack_Pos1_metadata.txt
    /Users/sbesson/Desktop/MM1_4_22/Test_Dataset_MultiPosition_Stack_2/Test_Dataset_MultiPosition_Stack_2_MMStack_Pos1.ome.tif
Series count = 1
Series #0 :
    Image count = 126
    RGB = false (1) 
    Interleaved = false
    Indexed = false (true color)
    Width = 512
    Height = 512
    SizeZ = 21
    SizeT = 2
    SizeC = 3
    Thumbnail size = 128 x 128
    Endianness = intel (little)
    Dimension order = XYZCT (uncertain)
    Pixel type = uint16
    Valid bits per pixel = 16
    Metadata complete = true
    Thumbnail series = false
sbesson@ls30630:Test_Dataset_MultiPosition_Stack_2 $ showinf -nopix -nometa  -format Micromanager Test_Dataset_MultiPosition_Stack_2_MMStack_Pos1.ome.tif 
Checking Micro-Manager format [yes]
Initializing reader
Reading metadata file
Populating metadata
Finding image file names
Building list of TIFFs
Initialization took 0.402s

Reading core metadata
filename = Test_Dataset_MultiPosition_Stack_2_MMStack_Pos1.ome.tif
Used files:
    /Users/sbesson/Desktop/MM1_4_22/Test_Dataset_MultiPosition_Stack_2/Test_Dataset_MultiPosition_Stack_2_MMStack_Pos1_metadata.txt
    /Users/sbesson/Desktop/MM1_4_22/Test_Dataset_MultiPosition_Stack_2/Test_Dataset_MultiPosition_Stack_2_MMStack_Pos1.ome.tif
Series count = 1
Series #0 :
    Image count = 126
    RGB = false (1) 
    Interleaved = false
    Indexed = false (true color)
    Width = 512
    Height = 512
    SizeZ = 21
    SizeT = 2
    SizeC = 3
    Thumbnail size = 128 x 128
    Endianness = intel (little)
    Dimension order = XYZCT (uncertain)
    Pixel type = uint16
    Valid bits per pixel = 16
    Metadata complete = true
    Thumbnail series = false

As mentioned above, the bug for the large files split into several TIFF files has been captured and is definitely on our roadmap for 5.2.0. As explained in our Bio-Formats status blog post, most of our resources are currently focused on the model changes with bug fixing being currently lower priority.

Regarding your error when forcing the usage of the OMETiffReader, do you have a stacktrace? It might well be fixed by #2320 which will be released as part of the upcoming Bio-Formats 5.1.9.

Best,
Sebastien

@julou
Author

julou commented Jun 23, 2016

Hi Sebastien, hi all,

As mentioned above, the bug for the large files split into several TIFF files has been captured and is definitely on our roadmap for 5.2.0. As explained in our Bio-Formats status blog post, most of our resources are currently focused on the model changes with bug fixing being currently lower priority.

I'd be curious to know if any progress has been made on this.
I understand it might not be high priority for you but, on our side, we can't use omero for the moment… (and no: I don't want to import datasets as OME-Tifs now and reimport everything as MM datasets later, all the more so as the *_metadata.txt files that your MM_reader needs are not kept by the OME-Tif importer). So I'm very much looking forward to the bio-formats support of MM datasets split in multiple files.

Regarding your error when forcing the usage of the OMETiffReader, do you have a stacktrace? It might well be fixed by #2320 which will be released as part of the upcoming Bio-Formats 5.1.9.

Looking more into this taught me that I was mistaken (by the way surprised that nobody here corrected me when I asked the question): options.setStackFormat() is not equivalent to the -format argument of command line tools… It seems to be the equivalent of the view stack with dialog of fiji's bioformats plugin.

So I rephrase my question: how can I force using a specific reader with the java api and BF.openImagePlus? (I guess it should be by specifying an option of ImporterOptions).

Thanks a lot for your support. Best,
Thomas

@sbesson
Member

sbesson commented Jun 27, 2016

Hi Sebastien, hi all,

Hi Thomas

As mentioned above, the bug for the large files split into several TIFF files has been captured and is definitely on our roadmap for 5.2.0. As explained in our Bio-Formats status blog post, most of our resources are currently focused on the model changes with bug fixing being currently lower priority.

I'd be curious to know if any progress has been made on this.

Unfortunately no, we spent the last few months finalizing our breaking model API changes.

I understand it might not be high priority for you but, on our side, we can't use omero for the moment… (and no: I don't want to import datasets as OME-Tifs now and reimport everything as MM datasets later, all the more so as the *_metadata.txt files that your MM_reader needs are not kept by the OME-Tif importer). So I'm very much looking forward to the bio-formats support of MM datasets split in multiple files.

Understood. As you mention OMERO, this bug fix is scheduled for the Bio-Formats 5.2.x series which will require an OMERO 5.3.x installation in order to achieve the workflow you described.
On the prioritization topic, we certainly welcome code contributions from the community to help us deliver bug fixes and maintain our readers.

Regarding your error when forcing the usage of the OMETiffReader, do you have a stacktrace? It might well be fixed by #2320 which will be released as part of the upcoming Bio-Formats 5.1.9.

Looking more into this taught me that I was mistaken (by the way surprised that nobody here corrected me when I asked the question): options.setStackFormat() is not equivalent to the -format argument of command line tools… It seems to be the equivalent of the view stack with dialog of fiji's bioformats plugin.

After double-checking, you are absolutely correct; the list of available options is described here.

So I rephrase my question: how can I force using a specific reader with the java api and BF.openImagePlus? (I guess it should be by specifying an option of ImporterOptions).

As it stands, the ImageJ plugin creates readers using LociPrefs.makeImageReader(), and there is no option to construct a reader of a given type as the command-line tools allow.

That being said, would it be sufficient for you to “disable” one or several readers? The above method uses the ImageJ Prefs API to check for disabled readers - see LociPrefs.isReaderEnabled(). A corresponding call to Prefs.set(LociPrefs.PREF_READER_ENABLED, reader), where ‘reader’ is the fully-qualified reader class name, will disable the reader for all subsequent calls.
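
To illustrate the idea of disabling readers through preferences, here is a self-contained model of the mechanism: a candidate list of readers is filtered by a per-reader "enabled" flag that defaults to true. It deliberately uses `java.util.Properties` as a stand-in for the ImageJ Prefs store and does not call ImageJ or Bio-Formats; the class name, the key prefix and the reader name strings are hypothetical, so the real key layout should be checked in the LociPrefs source before relying on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

class ReaderFilterSketch {
    // Hypothetical key prefix for this sketch only; not the value of any LociPrefs constant.
    static final String KEY_PREFIX = "sketch.reader.enabled.";

    /** A reader counts as enabled unless its preference is explicitly set to false. */
    static boolean isEnabled(Properties prefs, String readerName) {
        return Boolean.parseBoolean(prefs.getProperty(KEY_PREFIX + readerName, "true"));
    }

    /** Keeps only the candidates that are still enabled, preserving their order. */
    static List<String> enabledReaders(Properties prefs, List<String> candidates) {
        List<String> enabled = new ArrayList<>();
        for (String name : candidates) {
            if (isEnabled(prefs, name)) {
                enabled.add(name);
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        Properties prefs = new Properties();
        prefs.setProperty(KEY_PREFIX + "OMETiffReader", "false"); // disable one reader
        List<String> candidates =
            Arrays.asList("OMETiffReader", "MicromanagerReader", "TiffReader");
        System.out.println(enabledReaders(prefs, candidates));
    }
}
```

With the OME-TIFF reader filtered out this way, format detection would fall through to the next reader that accepts the file, which is the effect being suggested for opening a single position.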
