Update Suite2P implementation #150

swkeemink · 2021-03-02T11:51:50Z

Our current implementation of Suite2P is not up-to-date. Our link to their installation instructions also no longer works. Should fix both of these.

swkeemink · 2021-03-02T12:38:30Z

As was reported in #148, this might be a problem with how Suite2p exports the tiffs, since tifffile.imread(image) only reads a single frame!

swkeemink · 2021-03-17T10:37:21Z

To solve this problem, I'm going to make a custom dataloader to deal with Suite2P's bin file, instead of using their extracted tifffile. I don't really like this, as this means in the future if they change they handle the bin files, our implementation would break, but for now it can get the analysis going again at least.

swkeemink · 2021-05-17T15:44:57Z

There's a very easy solution (thanks to some help at Suite2P github: MouseLand/suite2p#647 )

Currently we load tiffs as simply
tifffile.imread('temp.tif')

which worked very well for a while. For some reason, however, with the Suite2p exported tiffs, this only only loads a single frame.

When using

with tifffile.TiffFile('temp.tif') as tif:
     for page in tif.pages:
         image = page.asarray()

this seems to solve the problem (at least with initial testing).

swkeemink · 2021-06-02T13:50:58Z

I also found a likely original tifffile implementation reason for this error: cgohlke/tifffile#30

swkeemink · 2021-06-02T14:20:17Z

The above fix works now, but there's an issue with our previous test for data loading, which makes me hesitant to make this a general change to our API. Basically tifffile has inconsistent loading behavior when using files generated with tifffile.imsave or with TiffWriter (which is odd, because tifffile.imsave should just be a wrapper of TiffWriter):

# generate and save data
data = np.zeros((4, 40,50))
fname = 'test.tif'
tifffile.imsave(fname, data)

# read out data in two ways
data_A = tifffile.imread(fname)
with tifffile.TiffFile(fname) as tif:
     data_B = np.array([page.asarray() for page in tif.pages])

# compare resulting shapes
data.shape, data_A.shape, data_B.shape

This returns
((4, 40, 50), (4, 40, 50), (1, 4, 40, 50))
although they should all be the same shape.

It's also very inconsistent with different data shapes. For example using data = np.zeros((40, 400, 400)) works fine. This is very odd and I'm wondering if I'm misunderstanding something somewhere. Perhaps reading them as a list of arrays is not the best way to do this. Better to read-out the number of frames and image size, pre-allocate an array, and then fill it in.

swkeemink · 2021-06-02T14:27:17Z

The inconsistency is that for some data-shapes (mostly smaller ones) the tiff file gets read out as a single 3-dimensional page, and for other data-shapes it gets read-out as a series of pages.

In the above TiffFile reading way this leads to an 'empty' dimension since it's a three dimensional array inside a list, but this here assumes a list of two-dimensional arrays.

swkeemink · 2021-06-03T11:28:06Z

The above fix works now, but there's an issue with our previous test for data loading, which makes me hesitant to make this a general change to our API. Basically tifffile has inconsistent loading behavior when using files generated with tifffile.imsave or with TiffWriter (which is odd, because tifffile.imsave should just be a wrapper of TiffWriter):
# generate and save data
data = np.zeros((4, 40,50))
fname = 'test.tif'
tifffile.imsave(fname, data)

# read out data in two ways
data_A = tifffile.imread(fname)
with tifffile.TiffFile(fname) as tif:
     data_B = np.array([page.asarray() for page in tif.pages])

# compare resulting shapes
data.shape, data_A.shape, data_B.shape
This returns
((4, 40, 50), (4, 40, 50), (1, 4, 40, 50))
although they should all be the same shape.

It's also very inconsistent with different data shapes. For example using data = np.zeros((40, 400, 400)) works fine. This is very odd and I'm wondering if I'm misunderstanding something somewhere. Perhaps reading them as a list of arrays is not the best way to do this. Better to read-out the number of frames and image size, pre-allocate an array, and then fill it in.

I can use the imsave/imread command or use TiffFile directly, as above, and they have slightly different behaviors sometimes (which I picked up because our standard tiff reading test did not pass =). For this reason I want to leave our back-end tiff reading 'as-is' to avoid existing workflows possibly breaking, and add an option for suite2p tiff reading, even though that is somewhat inelegant (similarly to that we have an alternative 'low-memory' mode).

Longer term more elegant solution would be to identify when a tiff file is non-standard and use a different reading format for it.

@scottclowe any comments?

scottclowe · 2021-06-03T17:06:51Z

It seems the difference are when pages are saved as a series or not.

No need for a different data loader though. We don't care about whether the pages are in different series or the same series.

Let's just ensure all the pages are 3d and then concatenate them, using a custom version of atleast_3d that pads left instead of right (copy source from here and change the cases around slightly).

with tifffile.TiffFile(fname) as tif:
     data = np.concatenate([atleast_3d(page.asarray()) for page in tif.pages], axis=0)

That should work regardless of whether the pages are grouped in series or not, and even work fine if some pages are series of pages and others aren't.

swkeemink · 2021-06-03T17:32:38Z

So actually the Suite2p tiffs work as series, with each series being a frame:. If you do
tifffile.imread(suite2p_tiff, series=0)

and
tiffffile.imread(suite2p_tiff, series=1)

you get the first and second frame, and so on. Which is why were getting bug #148, since we only read out one 'series' with tifffile.imread.

By loading it as

with tifffile.TiffFile(fname) as tif:
     data_B = np.array([page.asarray() for page in tif.pages])

instead, i avoid this. It then sometimes makes a 4D array if the tiffile loaded happens to be a single 3D 'frame' (like in the above case). Does this work with what you suggest?

I feel like tifffile.imread should be able to deal with all this =p.

scottclowe · 2021-06-03T20:04:13Z

Yes, I understood what you were saying was wrong. The solution I suggested will fix it. You sometimes get a 4D output if some of the frames are bundled together. Two dimensions come from the width and height of the image, one from the frames in the series, and another from the paginated series. We can make it by 3D by padding dimensions up to 3D if they are lower and then concatenating instead of stacking the objects. You could alternatively fix it with reshape, but I prefer this method.

@swkeemink Can you write a unit test which finds this issue? If you issue the new unit test as a PR, I'll implement this solution as described above to your PR and then we know for sure it fixes the problem. :D

swkeemink · 2021-06-04T10:04:01Z

Will do, great solution!

swkeemink · 2021-06-07T08:11:32Z

Now done, in #170 , with the addition of a test for how Suite2P saves their tiffs.

swkeemink · 2021-06-16T08:54:26Z

After a whole load of updates, we can now read suite2p output tiffs correctly (as well as most other conceivable outputs tiffs), thanks to #156 and #170.

However, the notebook is still out of date with current Suite2p implementation, and multiprocessing is still broken (#147) (Partial fix exists thanks to #147 (comment)), both of which I am close to implementing.

The main problem with #147 (comment) is that it silences the print statements, which probably needs an option change somewhere (together with #155 ).

swkeemink · 2021-06-16T10:43:17Z

So a bit of background and summary on investigations into fixing multiprocessing:

First, the basics of spawn and fork processes --
https://docs.python.org/3/library/multiprocessing.html
The 'spawn' context starts a fresh python interpreter process, and is slower to initialize. This is default in macOS and Windows.
The 'fork' context inherets from another process, and is faster to initialize. This is default on Unix.

The 'fork' context can get in trouble with improper attribution of Pools (as per here https://pythonspeed.com/articles/python-multiprocessing/). This is normally not a problem, but is a problem if we just ran Suite2p in the same Python instance before. Indeed, by using the spawn context within FISSA we can fix this problem. It does make initializing Pools a lot slower however, and for example as a result on my system running pytest once goes from 4-5 seconds to ~40 seconds.

I would like a different solution to this therefore! (although on large analyses instead of our simple test data it hopefully would not make a big difference).

swkeemink · 2021-06-16T10:57:47Z

By putting the following at the start of the suite2p notebook

from multiprocessing import set_start_method
set_start_method("spawn")

We can avoid having to adjust the FISSA code to always use the spawn method.

swkeemink · 2022-02-11T08:57:14Z

This has now been fixed in the updates Suite2p example, now in a separate repository here: https://github.com/rochefort-lab/fissa-suite2p-example/

swkeemink self-assigned this Mar 2, 2021

This was referenced Mar 2, 2021

ValueError: Wrong ROIs input format: unfamiliar shape. #148

Closed

Stuck on 'Doing signal separation...' #147

Open

swkeemink mentioned this issue Jun 2, 2021

Installation via pip into conda environment on Windows #151

Open

swkeemink mentioned this issue Jun 16, 2021

JNB:BUG: Updated Suite2P example workflow for new Suite2p API and multiprocessing bug #181

Closed

swkeemink closed this as completed Feb 11, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update Suite2P implementation #150

Update Suite2P implementation #150

swkeemink commented Mar 2, 2021

swkeemink commented Mar 2, 2021

swkeemink commented Mar 17, 2021

swkeemink commented May 17, 2021

swkeemink commented Jun 2, 2021

swkeemink commented Jun 2, 2021

swkeemink commented Jun 2, 2021 •

edited

Loading

swkeemink commented Jun 3, 2021

scottclowe commented Jun 3, 2021

swkeemink commented Jun 3, 2021

scottclowe commented Jun 3, 2021

swkeemink commented Jun 4, 2021

swkeemink commented Jun 7, 2021

swkeemink commented Jun 16, 2021

swkeemink commented Jun 16, 2021

swkeemink commented Jun 16, 2021

swkeemink commented Feb 11, 2022

Update Suite2P implementation #150

Update Suite2P implementation #150

Comments

swkeemink commented Mar 2, 2021

swkeemink commented Mar 2, 2021

swkeemink commented Mar 17, 2021

swkeemink commented May 17, 2021

swkeemink commented Jun 2, 2021

swkeemink commented Jun 2, 2021

swkeemink commented Jun 2, 2021 • edited Loading

swkeemink commented Jun 3, 2021

scottclowe commented Jun 3, 2021

swkeemink commented Jun 3, 2021

scottclowe commented Jun 3, 2021

swkeemink commented Jun 4, 2021

swkeemink commented Jun 7, 2021

swkeemink commented Jun 16, 2021

swkeemink commented Jun 16, 2021

swkeemink commented Jun 16, 2021

swkeemink commented Feb 11, 2022

swkeemink commented Jun 2, 2021 •

edited

Loading