Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorized Markers #3076

Closed

Conversation

CSSFrancis
Copy link
Member

Description of the change

Change described by #3075

  • Allows ragged point Markers initialized
  • Allowed ragged line segment markers initialized.
  • Speeds up plotting many markers by ~100 times for plots with 100 or more active markers.
  • Allows for ragged initialization of points/ markers.

Progress of the PR

  • Change implemented (can be split into several points),
  • update docstring (if appropriate),
  • update user guide (if appropriate),
  • add an changelog entry in the upcoming_changes folder (see upcoming_changes/README.rst),
  • Check formatting changelog entry in the readthedocs doc build of this PR (link in github checks)
  • add tests,
  • ready for review.

Minimal example of the bug fix or the new feature

x= np.empty((10,10), dtype=object)
y= np.empty((10,10), dtype=object)

points = np.random.randint(0, 100, (2,10,10, 500))

for i in np.ndindex(10,10):
    x_ind = (0,)+i
    y_ind = (1,)+i
    x[i]=points[x_ind]
    y[i]=points[y_ind]


s = hs.signals.Signal2D(np.random.random((10,10,100,100)))
markers =hs.plot.markers.Point(x =x,y=y,color="blue",)   # only adds one ragged marker object to the dataset. 
s.add_marker(markers,plot_marker=False, permanent=True, render_figure=False)
s.plot(vmin=0, vmax=1)

@CSSFrancis CSSFrancis changed the base branch from RELEASE_next_minor to RELEASE_next_major December 9, 2022 19:36
@codecov
Copy link

codecov bot commented Dec 9, 2022

Codecov Report

Base: 80.74% // Head: 80.71% // Decreases project coverage by -0.03% ⚠️

Coverage data is based on head (4049554) compared to base (e0b2956).
Patch coverage: 73.33% of modified lines in pull request are covered.

Additional details and impacted files
@@                  Coverage Diff                   @@
##           RELEASE_next_major    #3076      +/-   ##
======================================================
- Coverage               80.74%   80.71%   -0.03%     
======================================================
  Files                     176      176              
  Lines                   24486    24513      +27     
  Branches                 5452     5462      +10     
======================================================
+ Hits                    19770    19786      +16     
- Misses                   3391     3394       +3     
- Partials                 1325     1333       +8     
Impacted Files Coverage Δ
hyperspy/drawing/_markers/line_segment.py 84.00% <66.66%> (-16.00%) ⬇️
hyperspy/drawing/_markers/point.py 100.00% <100.00%> (ø)
hyperspy/drawing/marker.py 85.81% <100.00%> (+0.41%) ⬆️
hyperspy/misc/math_tools.py 77.41% <0.00%> (-4.84%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@CSSFrancis CSSFrancis marked this pull request as draft December 12, 2022 17:14
@CSSFrancis
Copy link
Member Author

@ericpre or @francisco-dlp What are your initial thoughts on the markers class being rewritten as a subclass of a signal. Specifically a ragged signal?

This PR fixes some of the issues with slowness related plotting but there is still a pretty major missing functionality in that markers can't be lazy/ plotted lazily.

For large 4-D STEM datasets this is fairly crippling to the Strain mapping/ diffraction spot finding code as you can't use any of the interactive features without the system grinding to a halt and it reduces your ability to do any iterative discovery.

Making markers a subclass of a Signal is I think a fairly elegant solution and gets ride of some of the major current issues.

1 - It allows for the ability to return a marker signal from a map function
2 - Markers are much more easily edited in similar form as signals (exclude markers near edges etc.)
3 - The navigation dimensions are consistent with the navigation dimensions of the signal (no reversing dimensions)
4 - Markers can be lazy and plotted alongside signals using the same lazy operations (caching etc.)

We can keep some of the old syntax for initializing markers but also have the ability to directly set the .data attribute as well when initializing.

I feel like I've been circling around a couple of these points and wonder if it would be better for me to layout a full plan in a HSEP? My hope was to bring together all of these parts over the winter break when I have some more time to devote to the change.

@ericpre
Copy link
Member

ericpre commented Dec 21, 2022

This is great that you are tackling this issue - indeed the markers are unusable when there are many...

@ericpre or @francisco-dlp What are your initial thoughts on the markers class being rewritten as a subclass of a signal. Specifically a ragged signal?

Conceptually, markers shouldn't be signals, because their purpose is for plotting and there are not designed to do processing on them, for example, what would be the meaning of adding line of markers. That being said, they could be benefit of re-using some of the BaseSignal class to make handling of marker easier - maybe, slicing/indexing?

Making markers a subclass of a Signal is I think a fairly elegant solution and gets ride of some of the major current issues.

1 - It allows for the ability to return a marker signal from a map function

Conceptually, there are two different things. For example, in case of peak indexing: the signal contains the actual peak position and the markers are the representation of this signal. I would keep this distinction.

2 - Markers are much more easily edited in similar form as signals (exclude markers near edges etc.)
3 - The navigation dimensions are consistent with the navigation dimensions of the signal (no reversing dimensions)
4 - Markers can be lazy and plotted alongside signals using the same lazy operations (caching etc.)

I don't think that I follow this point. What is the advantage of having lazy markers? There are most of the time not large?

@CSSFrancis
Copy link
Member Author

CSSFrancis commented Apr 3, 2023

@ericpre I realized I never got back to you. I finally got back around to this problem and have some thoughts etc.

Conceptually, markers shouldn't be signals, because their purpose is for plotting and there are not designed to do processing on them, for example, what would be the meaning of adding line of markers. That being said, they could be benefit of re-using some of the BaseSignal class to make handling of marker easier - maybe, slicing/indexing?

Slicing and indexing is a good thing to add. It would be nice to say slice the navigation axes of a signal and retain the markers. As far as functions to markers go there are cases where things like shifting a marker or a bunch of markers might be very useful.

Conceptually, there are two different things. For example, in case of peak indexing: the signal contains the actual peak position and the markers are the representation of this signal. I would keep this distinction.

There is a distinction here but it is very small. There is no reason that the signal can't have the actual peak position. In many cases due to how hyperspy prioritizes real units it is better to have the real units. This is something that it would be better to keep track of rather than to assume that the user is keeping track of. In pyxem it is a real pain trying to separate real units and pixel units....

I don't think that I follow this point. What is the advantage of having lazy markers? There are most of the time not large?

The major benefit of lazy makers isn't the size in memory rather the processing time. When you are working with lazy signals you don't want to compute things until you have to.

The main workflow this is important for is something like the workflow below where you want to have a set of lazy peaks created from a method which runs lazily. While the same thing could be accomplished by slicing the data you lose a lot of what make lazy processing nice such as the ability to test small parts and then use the same code to process the whole dataset. For plotting this also allows you to just move to a different section of the dataset and see if the same function works well.

If you have a 100 Gb dataset it takes around 10 minutes to process so it is very hard to iterate quickly unless you have a fully lazy workflow. This is part of the reason also why having markers as Signals would be nice as you could use s.set_signal_type("marker") to convert something to a marker.

peaks = s.find_peaks()
print(peaks) # Lazy BaseSignal
lazy_marker = to_marker(peaks) # Lazy Marker
s.add_marker(lazy_marker)
s.plot()

@CSSFrancis
Copy link
Member Author

CSSFrancis commented May 4, 2023

@ericpre So this is a bit more broken than I first anticipated and has been a point of pain and fusteration.

I've been trying to make this consistent but ultimately it seems like there isn't a great way to handle all of the different markers. The big problem is that incorporating text in any reasonable fashion means that things don't work well.

I think we should strip out the Text marker and have that be a separate class otherwise it forces the other markers to take lots of performance hits and stops them from being completely lazy etc.

Then we could completely break the marker functionality and make it better but I don't know if that is the right thing to do.

@CSSFrancis
Copy link
Member Author

CSSFrancis commented May 4, 2023

I think the best way to handle this would be to have the data variable either directly set or through kwargs.

For example:

class MarkerBase(object):
...

    def set_data(self, data=None, **kwargs):
        if data is not None: # Case 1
            self.data = data
        elif any([kwargs[k].dtype == object for k in kwargs]): # Case 2
            shapes = np.array([np.array(kwargs[k]).shape for k in kwargs])
            data = np.empty(shape=shapes[0],
                                        dtype=object)
            for ind in np.ndindex[shapes[0]]:
                data[ind] = np.stack([kwargs[k] for k in kwargs],axis=1)  # Make into block array
            self.data = data
        else:
             self.data = np.stack([kwargs[k] for k in kwargs],axis=1)  # Make into block array

This does stop you from being able to define an iterating marker like this

Point(x= range(3), y=range(3)) but we could add something like: Point(x= range(3), y=range(3), iterating=True)

Either way we would have to have the text marker defined through some other MarkerBase class.

@CSSFrancis CSSFrancis mentioned this pull request May 8, 2023
21 tasks
@CSSFrancis
Copy link
Member Author

Closed in Favor of #3148

@CSSFrancis CSSFrancis closed this May 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants