Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Suite2P implementation #150

Closed
swkeemink opened this issue Mar 2, 2021 · 16 comments
Closed

Update Suite2P implementation #150

swkeemink opened this issue Mar 2, 2021 · 16 comments
Assignees

Comments

@swkeemink
Copy link
Member

Our current implementation of Suite2P is not up-to-date. Our link to their installation instructions also no longer works. Should fix both of these.

@swkeemink
Copy link
Member Author

As was reported in #148, this might be a problem with how Suite2p exports the tiffs, since tifffile.imread(image) only reads a single frame!

@swkeemink
Copy link
Member Author

To solve this problem, I'm going to make a custom dataloader to deal with Suite2P's bin file, instead of using their extracted tifffile. I don't really like this, as this means in the future if they change they handle the bin files, our implementation would break, but for now it can get the analysis going again at least.

@swkeemink
Copy link
Member Author

There's a very easy solution (thanks to some help at Suite2P github: MouseLand/suite2p#647 )

Currently we load tiffs as simply
tifffile.imread('temp.tif')

which worked very well for a while. For some reason, however, with the Suite2p exported tiffs, this only only loads a single frame.

When using

with tifffile.TiffFile('temp.tif') as tif:
     for page in tif.pages:
         image = page.asarray()

this seems to solve the problem (at least with initial testing).

@swkeemink
Copy link
Member Author

I also found a likely original tifffile implementation reason for this error: cgohlke/tifffile#30

@swkeemink
Copy link
Member Author

The above fix works now, but there's an issue with our previous test for data loading, which makes me hesitant to make this a general change to our API. Basically tifffile has inconsistent loading behavior when using files generated with tifffile.imsave or with TiffWriter (which is odd, because tifffile.imsave should just be a wrapper of TiffWriter):

# generate and save data
data = np.zeros((4, 40,50))
fname = 'test.tif'
tifffile.imsave(fname, data)

# read out data in two ways
data_A = tifffile.imread(fname)
with tifffile.TiffFile(fname) as tif:
     data_B = np.array([page.asarray() for page in tif.pages])

# compare resulting shapes
data.shape, data_A.shape, data_B.shape

This returns
((4, 40, 50), (4, 40, 50), (1, 4, 40, 50))
although they should all be the same shape.

It's also very inconsistent with different data shapes. For example using data = np.zeros((40, 400, 400)) works fine. This is very odd and I'm wondering if I'm misunderstanding something somewhere. Perhaps reading them as a list of arrays is not the best way to do this. Better to read-out the number of frames and image size, pre-allocate an array, and then fill it in.

@swkeemink
Copy link
Member Author

swkeemink commented Jun 2, 2021

The inconsistency is that for some data-shapes (mostly smaller ones) the tiff file gets read out as a single 3-dimensional page, and for other data-shapes it gets read-out as a series of pages.

In the above TiffFile reading way this leads to an 'empty' dimension since it's a three dimensional array inside a list, but this here assumes a list of two-dimensional arrays.

@swkeemink
Copy link
Member Author

The above fix works now, but there's an issue with our previous test for data loading, which makes me hesitant to make this a general change to our API. Basically tifffile has inconsistent loading behavior when using files generated with tifffile.imsave or with TiffWriter (which is odd, because tifffile.imsave should just be a wrapper of TiffWriter):

# generate and save data
data = np.zeros((4, 40,50))
fname = 'test.tif'
tifffile.imsave(fname, data)

# read out data in two ways
data_A = tifffile.imread(fname)
with tifffile.TiffFile(fname) as tif:
     data_B = np.array([page.asarray() for page in tif.pages])

# compare resulting shapes
data.shape, data_A.shape, data_B.shape

This returns
((4, 40, 50), (4, 40, 50), (1, 4, 40, 50))
although they should all be the same shape.

It's also very inconsistent with different data shapes. For example using data = np.zeros((40, 400, 400)) works fine. This is very odd and I'm wondering if I'm misunderstanding something somewhere. Perhaps reading them as a list of arrays is not the best way to do this. Better to read-out the number of frames and image size, pre-allocate an array, and then fill it in.

I can use the imsave/imread command or use TiffFile directly, as above, and they have slightly different behaviors sometimes (which I picked up because our standard tiff reading test did not pass =). For this reason I want to leave our back-end tiff reading 'as-is' to avoid existing workflows possibly breaking, and add an option for suite2p tiff reading, even though that is somewhat inelegant (similarly to that we have an alternative 'low-memory' mode).

Longer term more elegant solution would be to identify when a tiff file is non-standard and use a different reading format for it.

@scottclowe any comments?

@scottclowe
Copy link
Member

It seems the difference are when pages are saved as a series or not.

No need for a different data loader though. We don't care about whether the pages are in different series or the same series.

Let's just ensure all the pages are 3d and then concatenate them, using a custom version of atleast_3d that pads left instead of right (copy source from here and change the cases around slightly).

with tifffile.TiffFile(fname) as tif:
     data = np.concatenate([atleast_3d(page.asarray()) for page in tif.pages], axis=0)

That should work regardless of whether the pages are grouped in series or not, and even work fine if some pages are series of pages and others aren't.

@swkeemink
Copy link
Member Author

So actually the Suite2p tiffs work as series, with each series being a frame:. If you do
tifffile.imread(suite2p_tiff, series=0)

and
tiffffile.imread(suite2p_tiff, series=1)

you get the first and second frame, and so on. Which is why were getting bug #148, since we only read out one 'series' with tifffile.imread.

By loading it as

with tifffile.TiffFile(fname) as tif:
     data_B = np.array([page.asarray() for page in tif.pages])

instead, i avoid this. It then sometimes makes a 4D array if the tiffile loaded happens to be a single 3D 'frame' (like in the above case). Does this work with what you suggest?

I feel like tifffile.imread should be able to deal with all this =p.

@scottclowe
Copy link
Member

Yes, I understood what you were saying was wrong. The solution I suggested will fix it. You sometimes get a 4D output if some of the frames are bundled together. Two dimensions come from the width and height of the image, one from the frames in the series, and another from the paginated series. We can make it by 3D by padding dimensions up to 3D if they are lower and then concatenating instead of stacking the objects. You could alternatively fix it with reshape, but I prefer this method.

@swkeemink Can you write a unit test which finds this issue? If you issue the new unit test as a PR, I'll implement this solution as described above to your PR and then we know for sure it fixes the problem. :D

@swkeemink
Copy link
Member Author

Will do, great solution!

@swkeemink
Copy link
Member Author

Now done, in #170 , with the addition of a test for how Suite2P saves their tiffs.

@swkeemink
Copy link
Member Author

After a whole load of updates, we can now read suite2p output tiffs correctly (as well as most other conceivable outputs tiffs), thanks to #156 and #170.

However, the notebook is still out of date with current Suite2p implementation, and multiprocessing is still broken (#147) (Partial fix exists thanks to #147 (comment)), both of which I am close to implementing.

The main problem with #147 (comment) is that it silences the print statements, which probably needs an option change somewhere (together with #155 ).

@swkeemink
Copy link
Member Author

So a bit of background and summary on investigations into fixing multiprocessing:

First, the basics of spawn and fork processes --
https://docs.python.org/3/library/multiprocessing.html
The 'spawn' context starts a fresh python interpreter process, and is slower to initialize. This is default in macOS and Windows.
The 'fork' context inherets from another process, and is faster to initialize. This is default on Unix.

The 'fork' context can get in trouble with improper attribution of Pools (as per here https://pythonspeed.com/articles/python-multiprocessing/). This is normally not a problem, but is a problem if we just ran Suite2p in the same Python instance before. Indeed, by using the spawn context within FISSA we can fix this problem. It does make initializing Pools a lot slower however, and for example as a result on my system running pytest once goes from 4-5 seconds to ~40 seconds.

I would like a different solution to this therefore! (although on large analyses instead of our simple test data it hopefully would not make a big difference).

@swkeemink
Copy link
Member Author

By putting the following at the start of the suite2p notebook

from multiprocessing import set_start_method
set_start_method("spawn") 

We can avoid having to adjust the FISSA code to always use the spawn method.

@swkeemink
Copy link
Member Author

This has now been fixed in the updates Suite2p example, now in a separate repository here: https://github.com/rochefort-lab/fissa-suite2p-example/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants