Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: improve show_stack() performance #346

Merged
merged 17 commits into from Aug 2, 2018
Merged

Conversation

kevinyamauchi
Copy link
Collaborator

@kevinyamauchi kevinyamauchi commented Jul 27, 2018

edit 7.27.18: updated to include prototypes for show_spots (see 5525d9b, 48ba1e7, dd941e2)

Overview

The show_stack tool is really useful for evaluating the images during processing. However, it is very slow and thus difficult to use. I have prototyped a faster way to update images (see the notebook) and proposed a method for speeding up the show_spots as well (see below). Thoughts?

Approach

Faster scrolling
The current version in master creates a new imshow() plot each time the slider is adjusted. Recreating the entire plot is very slow, so instead we use the imshow().set_data() to update just the image data. We should be able to make show_stack(). If we need to go faster, we may need to look into a different plotting library (e.g., pyqtgraph).

Faster spots
To speed up the spot display, we draw all of the spots on the image (as if they were z projected) before calling interact object. We set all spots to set_visible = False (i.e., make them invisible). Then, when the viewer "scrolls" between slices, toggle the visible property for spots that are members of that frame. We precompute the masks for each slice to speed up the upate function.

Image query cursor tooltop
Additionally, note that I use the 'notebook' (as opposed to 'inline') matplotlib magic, which both allows the draw to work and as a bonus gives an image tooltip. The tooltip is super useful because it reads out the coordinates and intensity of the pixel under the cursor (exactly what we were talking about today)!

Questions

  • Will the proposed show_spots method generalize to other slicing (i.e., not along z)?
  • Is the show_spots responsive enough? We could consider blitting (i.e., only interacting with spots that change state between slices), but it may not be worth the effort.
  • What class will be passed to show_spots as the 'results_df'?
  • Does it matter what backend (e.g., QtAgg or TkAgg) the user has?

@kevinyamauchi kevinyamauchi requested a review from ttung July 27, 2018 06:15
@ambrosejcarr
Copy link
Member

This looks awesome. I'm going to be on bad wifi today so I may not be able to really dig in and test things out until I get home tomorrow.

A random point:

  • PRs will fail unless you use nbencdec encode <name.ipynb> <name.py> to create a notebook that can be tested by our tests.

Copy link
Collaborator

@dganguli dganguli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Love the ability to pan and zoom!

@kevinyamauchi
Copy link
Collaborator Author

Ah sorry about the failed tests - I'll try to figure out the notebook creation later today.

I just added the show_spots functionality and I think it works pretty well (I updated the notebook to include an example)!

@ambrosejcarr, I have a question about the results_df that gets passed into show_spots() argument: under the new IntensityTable/Codebook, what class will be passed to show_spots?

@ambrosejcarr
Copy link
Member

ambrosejcarr commented Jul 27, 2018

Ah sorry about the failed tests - I'll try to figure out the notebook creation later today.

Not a problem and no hurry.

pip3 install nbencdec
cd starfish
nbencdec encode notebooks/Faster_show_stack.ipynb notebooks/Faster_show_stack.py

@ambrosejcarr, I have a question about the results_df that gets passed into show_spots() argument: under the new IntensityTable/Codebook, what class will be passed to show_spots?

That's a great question. I think it would probably be an IntensityTable, but we'd have to update the code to make it work. Referencing #324 Because I think that there isn't any coverage of that method right now, and adding tests will help.

@ttung
Copy link
Collaborator

ttung commented Jul 27, 2018

Generally, you should run pip install -r REQUIREMENTS-DEV.TXT -r REQUIREMENTS-NOTEBOOK.TXT

@ambrosejcarr
Copy link
Member

ambrosejcarr commented Jul 27, 2018

I was able to load up the notebook -- I love the %matplotlib notebook, the tools are super useful; they are indeed exactly what we wanted!

Unfortunately, when I use that tool and try normal matplotlib figure plots, they don't show up anymore. I haven't done any due diligence yet to see if there are any easy solutions, but we should make sure it doesn't disrupt the regular plotting.

I tried replacing it with %matplotlib inline, but the new plot doesn't survive the transition. It also tried using both inline and notebook, but it seems like the notebook only supports one at a time.

Edit: working example:

import numpy as np
import matplotlib.pyplot as plt
plt.plot(np.arange(10), np.arange(10))

Under %matplotlib notebook This just returns the axes object, but nothing displays.

@kevinyamauchi
Copy link
Collaborator Author

Thanks for looking into the plotting, Ambrose. I took a look and we can plot using matplotlib, but when we use the %matplotlib notebook magic, we need to explicitly assign the plot object to a fig/axis (see below. plt.plot() just returns the plot object. I also added some examples to the notebook.

fig = plt.figure();
ax = fig.add_subplot(111);
ax.plot(range(10));

@ambrosejcarr
Copy link
Member

Got it. Thanks for looking into that. I think the requirement that we have to assign to axes is an acceptable price to pay to get pan/zoom and better performance on the show stack.

Interested to see what @dganguli has to say about this. :)

@kevinyamauchi
Copy link
Collaborator Author

I agree. I also think that the specificity of assigning the plot to an axis is useful for when we are generating multiple plots (e.g., QC plots) and want to manipulate a particular one (either statically or dynamically).

@ambrosejcarr
Copy link
Member

I like these changes, here's what I think it would take to merge them:

  1. Since %matplotlib notebook causes plots not to show without explicit figure references, and the proposed code updates breaks show_stack in %matplotlib inline, we'll need to update existing notebooks to use the proposed strategy.
  2. In updated notebooks, verify that new show_spots has logical results
  3. [Optional] Add some kind of tests to the vis methods
  4. Remove example notebook

@kevinyamauchi Do you feel comfortable working through these? We can give you some pointers on how we use nbencdec to generate .py files that pair with the notebooks if you'd like. Alternatively, I'd be happy to pair on this with you (or perhaps @ttung could in person).

@kevinyamauchi
Copy link
Collaborator Author

Sounds good. I can work through those items. I think I will need some help/discussion for creating tests (item 3) and generating the py files for notebooks, but I can take a crack at it myself first.

With regards to the spot data input format for show_spots(), should I think I will integrate the input format in the current implementation just to get it tested quickly with the current notebooks. We can work to add support for IntensityTable soon. How does that sound?

@ambrosejcarr
Copy link
Member

Sounds good. I can work through those items. I think I will need some help/discussion for creating tests (item 3) and generating the py files for notebooks, but I can take a crack at it myself first

Cool, I don't have much experience with vis tests, so I'm more interested than anything else. If it seems too complicated I wouldn't be too upset if you punted on it (that's what we're currently doing, see #324)

With regards to the spot data input format for show_spots(), should I think I will integrate the input format in the current implementation just to get it tested quickly with the current notebooks. We can work to add support for IntensityTable soon. How does that sound?

Sounds like a good plan.

@@ -335,25 +335,58 @@ def show_stack(

show_spot_function = self._show_spots
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is no longer used -- is that intended?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - deleted.

ax.set_xticks([])
ax.set_yticks([])

if show_spots:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why show fake spots? is this for demo purposes only? e.g., not intended to be merged into master?

Copy link
Collaborator

@dganguli dganguli Jul 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively, you could continue to use _show_spots, but pass in a spots table that looks like this:

spots = pd.DataFrame({'y':intensities.x, 'x':intensities.y, 'r':intensities.r})
f, ax = plt.subplots(figsize=(20, 20))
ax.imshow(blobs, cmap=plt.cm.gray)
stack._show_spots(results, ax=plt.gca())

where intensities has the relevant spot attributes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this on slack, but just adding it here to consolidate the record. I just had this in as a placeholder until I figured out the expected format for the spot data. I have no implemented show_spots for the pandas data frame as you suggested here.

Copy link
Collaborator

@dganguli dganguli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an awesome contribution! I think we should move the random spot generation out of the code and let users submit spot attributes for testing. I also think we should verify that the existing notebooks will still plot appropriately. Or at least make sure we fix the existing notebooks to generate the relevant plots

@kevinyamauchi
Copy link
Collaborator Author

Sounds good, Deep! I fixed show_spots to work as a drop in for the current notebooks. I tested the notebooks and they all seem to work with minor tweaks. I can go over this with you when we meet tomorrow. If things look okay to you, I'll take a shot at making the notebook .py files.

@ambrosejcarr, the one thing I can't get to work is pixel spot detector in the MERFISH notebook. I get a "KeyError: 'barcode'". Is this something being addressed in a different PR (or perhaps already merged with master)? However, this doesn't seem related to the show_stack() so I am not sure we need to fix it in this PR.

from starfish.pipeline.features.pixels.pixel_spot_detector import PixelSpotDetector
psd = PixelSpotDetector(
    codebook='https://s3.amazonaws.com/czi.starfish.data.public/MERFISH/codebook.csv',
    distance_threshold=0.5176,
    magnitude_threshold=1,
    area_threshold=2,
    crop_size=40
)

spot_attributes, decoded = psd.find(s)

Copy link
Collaborator

@dganguli dganguli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 from an offline convo @kevinyamauchi and i went through this update. pros: show_stack is much much faster, even when visualizing spots. also we get the ability to pan/zoom and read intensity values via hover. only con is that %matplotlib notebook is now required. this means for inline plots, one needs to call ax = plt.figure(); ax.plot(...) instead of simply plt.plot(...). in the end, i think the pros outweigh the cons.

@@ -0,0 +1,3401 @@
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i say we nuke this notebook.

@dganguli
Copy link
Collaborator

and one last thing -- we should verify the new notebooks work -- which i'm pretty sure they weill.

@ambrosejcarr
Copy link
Member

I like the extra expressiveness of:

f, ax = plt.subplots(figsize=(n, n))

This is just another "pro" for me. :)

@codecov-io
Copy link

Codecov Report

Merging #346 into master will decrease coverage by 0.25%.
The diff coverage is 3.57%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #346      +/-   ##
==========================================
- Coverage   82.97%   82.72%   -0.26%     
==========================================
  Files          67       67              
  Lines        2585     2593       +8     
==========================================
  Hits         2145     2145              
- Misses        440      448       +8
Impacted Files Coverage Δ
starfish/image/_stack.py 73.33% <3.57%> (-1.6%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b142a5...b34a239. Read the comment docs.

@kevinyamauchi
Copy link
Collaborator Author

Okay, I finally got the notebooks updated and encoded as .py files (thanks, @ttung for the help!). The allen notebook still fails due to the experiment.json file needing updates. Since that is a separate issue, if everyone is okay with the changes, can we merge? Thoughts @dganguli and @ambrosejcarr ?

Copy link
Member

@ambrosejcarr ambrosejcarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants