Vectorized Markers #3076

CSSFrancis · 2022-12-09T19:35:00Z

Description of the change

Change described by #3075

Allows ragged point Markers initialized
Allowed ragged line segment markers initialized.
Speeds up plotting many markers by ~100 times for plots with 100 or more active markers.
Allows for ragged initialization of points/ markers.

Progress of the PR

Change implemented (can be split into several points),
update docstring (if appropriate),
update user guide (if appropriate),
add an changelog entry in the upcoming_changes folder (see upcoming_changes/README.rst),
Check formatting changelog entry in the readthedocs doc build of this PR (link in github checks)
add tests,
ready for review.

Minimal example of the bug fix or the new feature

x= np.empty((10,10), dtype=object)
y= np.empty((10,10), dtype=object)

points = np.random.randint(0, 100, (2,10,10, 500))

for i in np.ndindex(10,10):
    x_ind = (0,)+i
    y_ind = (1,)+i
    x[i]=points[x_ind]
    y[i]=points[y_ind]


s = hs.signals.Signal2D(np.random.random((10,10,100,100)))
markers =hs.plot.markers.Point(x =x,y=y,color="blue",)   # only adds one ragged marker object to the dataset. 
s.add_marker(markers,plot_marker=False, permanent=True, render_figure=False)
s.plot(vmin=0, vmax=1)

codecov · 2022-12-09T19:49:48Z

Codecov Report

Base: 80.74% // Head: 80.71% // Decreases project coverage by -0.03% ⚠️

Coverage data is based on head (4049554) compared to base (e0b2956).
Patch coverage: 73.33% of modified lines in pull request are covered.

Additional details and impacted files

@@                  Coverage Diff                   @@
##           RELEASE_next_major    #3076      +/-   ##
======================================================
- Coverage               80.74%   80.71%   -0.03%     
======================================================
  Files                     176      176              
  Lines                   24486    24513      +27     
  Branches                 5452     5462      +10     
======================================================
+ Hits                    19770    19786      +16     
- Misses                   3391     3394       +3     
- Partials                 1325     1333       +8

Impacted Files	Coverage Δ
hyperspy/drawing/_markers/line_segment.py	`84.00% <66.66%> (-16.00%)`	⬇️
hyperspy/drawing/_markers/point.py	`100.00% <100.00%> (ø)`
hyperspy/drawing/marker.py	`85.81% <100.00%> (+0.41%)`	⬆️
hyperspy/misc/math_tools.py	`77.41% <0.00%> (-4.84%)`	⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

CSSFrancis · 2022-12-17T14:53:54Z

@ericpre or @francisco-dlp What are your initial thoughts on the markers class being rewritten as a subclass of a signal. Specifically a ragged signal?

This PR fixes some of the issues with slowness related plotting but there is still a pretty major missing functionality in that markers can't be lazy/ plotted lazily.

For large 4-D STEM datasets this is fairly crippling to the Strain mapping/ diffraction spot finding code as you can't use any of the interactive features without the system grinding to a halt and it reduces your ability to do any iterative discovery.

Making markers a subclass of a Signal is I think a fairly elegant solution and gets ride of some of the major current issues.

1 - It allows for the ability to return a marker signal from a map function
2 - Markers are much more easily edited in similar form as signals (exclude markers near edges etc.)
3 - The navigation dimensions are consistent with the navigation dimensions of the signal (no reversing dimensions)
4 - Markers can be lazy and plotted alongside signals using the same lazy operations (caching etc.)

We can keep some of the old syntax for initializing markers but also have the ability to directly set the .data attribute as well when initializing.

I feel like I've been circling around a couple of these points and wonder if it would be better for me to layout a full plan in a HSEP? My hope was to bring together all of these parts over the winter break when I have some more time to devote to the change.

ericpre · 2022-12-21T13:29:38Z

This is great that you are tackling this issue - indeed the markers are unusable when there are many...

@ericpre or @francisco-dlp What are your initial thoughts on the markers class being rewritten as a subclass of a signal. Specifically a ragged signal?

Conceptually, markers shouldn't be signals, because their purpose is for plotting and there are not designed to do processing on them, for example, what would be the meaning of adding line of markers. That being said, they could be benefit of re-using some of the BaseSignal class to make handling of marker easier - maybe, slicing/indexing?

Making markers a subclass of a Signal is I think a fairly elegant solution and gets ride of some of the major current issues.

1 - It allows for the ability to return a marker signal from a map function

Conceptually, there are two different things. For example, in case of peak indexing: the signal contains the actual peak position and the markers are the representation of this signal. I would keep this distinction.

2 - Markers are much more easily edited in similar form as signals (exclude markers near edges etc.)
3 - The navigation dimensions are consistent with the navigation dimensions of the signal (no reversing dimensions)
4 - Markers can be lazy and plotted alongside signals using the same lazy operations (caching etc.)

I don't think that I follow this point. What is the advantage of having lazy markers? There are most of the time not large?

… Switched to matplotlib.Collections.LineCollection

…kers

CSSFrancis · 2023-04-03T16:50:05Z

@ericpre I realized I never got back to you. I finally got back around to this problem and have some thoughts etc.

Conceptually, markers shouldn't be signals, because their purpose is for plotting and there are not designed to do processing on them, for example, what would be the meaning of adding line of markers. That being said, they could be benefit of re-using some of the BaseSignal class to make handling of marker easier - maybe, slicing/indexing?

Slicing and indexing is a good thing to add. It would be nice to say slice the navigation axes of a signal and retain the markers. As far as functions to markers go there are cases where things like shifting a marker or a bunch of markers might be very useful.

Conceptually, there are two different things. For example, in case of peak indexing: the signal contains the actual peak position and the markers are the representation of this signal. I would keep this distinction.

There is a distinction here but it is very small. There is no reason that the signal can't have the actual peak position. In many cases due to how hyperspy prioritizes real units it is better to have the real units. This is something that it would be better to keep track of rather than to assume that the user is keeping track of. In pyxem it is a real pain trying to separate real units and pixel units....

I don't think that I follow this point. What is the advantage of having lazy markers? There are most of the time not large?

The major benefit of lazy makers isn't the size in memory rather the processing time. When you are working with lazy signals you don't want to compute things until you have to.

The main workflow this is important for is something like the workflow below where you want to have a set of lazy peaks created from a method which runs lazily. While the same thing could be accomplished by slicing the data you lose a lot of what make lazy processing nice such as the ability to test small parts and then use the same code to process the whole dataset. For plotting this also allows you to just move to a different section of the dataset and see if the same function works well.

If you have a 100 Gb dataset it takes around 10 minutes to process so it is very hard to iterate quickly unless you have a fully lazy workflow. This is part of the reason also why having markers as Signals would be nice as you could use s.set_signal_type("marker") to convert something to a marker.

peaks = s.find_peaks()
print(peaks) # Lazy BaseSignal
lazy_marker = to_marker(peaks) # Lazy Marker
s.add_marker(lazy_marker)
s.plot()

CSSFrancis · 2023-05-04T20:41:38Z

@ericpre So this is a bit more broken than I first anticipated and has been a point of pain and fusteration.

I've been trying to make this consistent but ultimately it seems like there isn't a great way to handle all of the different markers. The big problem is that incorporating text in any reasonable fashion means that things don't work well.

I think we should strip out the Text marker and have that be a separate class otherwise it forces the other markers to take lots of performance hits and stops them from being completely lazy etc.

Then we could completely break the marker functionality and make it better but I don't know if that is the right thing to do.

CSSFrancis · 2023-05-04T20:59:50Z

I think the best way to handle this would be to have the data variable either directly set or through kwargs.

For example:

class MarkerBase(object):
...

    def set_data(self, data=None, **kwargs):
        if data is not None: # Case 1
            self.data = data
        elif any([kwargs[k].dtype == object for k in kwargs]): # Case 2
            shapes = np.array([np.array(kwargs[k]).shape for k in kwargs])
            data = np.empty(shape=shapes[0],
                                        dtype=object)
            for ind in np.ndindex[shapes[0]]:
                data[ind] = np.stack([kwargs[k] for k in kwargs],axis=1)  # Make into block array
            self.data = data
        else:
             self.data = np.stack([kwargs[k] for k in kwargs],axis=1)  # Make into block array

This does stop you from being able to define an iterating marker like this

Point(x= range(3), y=range(3)) but we could add something like: Point(x= range(3), y=range(3), iterating=True)

Either way we would have to have the text marker defined through some other MarkerBase class.

CSSFrancis · 2023-05-08T22:30:42Z

Closed in Favor of #3148

CSSFrancis changed the base branch from RELEASE_next_minor to RELEASE_next_major December 9, 2022 19:36

CSSFrancis marked this pull request as draft December 12, 2022 17:14

CSSFrancis mentioned this pull request Dec 13, 2022

Diffraction Vectors Overhaul pyxem/pyxem#872

Open

21 tasks

CSSFrancis mentioned this pull request Dec 28, 2022

Proposal--HSEP: 4 Adding support for Labeled Columns/Advanced Slicing, Ragged Signal of Signals and Advanced Markers #3078

Open

CSSFrancis added 5 commits February 24, 2023 17:06

New Feature: Using a ragged signal to initialize point markers.

e392259

New Feature: Using a ragged signal to initialize LineSegment markers.…

a4741ce

… Switched to matplotlib.Collections.LineCollection

Bugfix: Cleaned up issue plotting single lines using old methodology

634f02f

Testing: Added tests for plotting ragged Point markers.

1bc9559

Fixed Comparison Tests: Fixed image comparison tests for plotting mar…

4049554

…kers

CSSFrancis force-pushed the VectorizedMarkers branch from 9dd03e5 to 4049554 Compare February 24, 2023 23:07

ericpre force-pushed the RELEASE_next_major branch from 9203c67 to aac2fea Compare May 3, 2023 20:39

CSSFrancis mentioned this pull request May 8, 2023

Marker Collection #3148

Merged

21 tasks

CSSFrancis closed this May 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vectorized Markers #3076

Vectorized Markers #3076

CSSFrancis commented Dec 9, 2022

codecov bot commented Dec 9, 2022 •

edited

CSSFrancis commented Dec 17, 2022

ericpre commented Dec 21, 2022

CSSFrancis commented Apr 3, 2023 •

edited

CSSFrancis commented May 4, 2023 •

edited

CSSFrancis commented May 4, 2023 •

edited

CSSFrancis commented May 8, 2023

Vectorized Markers #3076

Vectorized Markers #3076

Conversation

CSSFrancis commented Dec 9, 2022

Description of the change

Progress of the PR

Minimal example of the bug fix or the new feature

codecov bot commented Dec 9, 2022 • edited

Codecov Report

CSSFrancis commented Dec 17, 2022

ericpre commented Dec 21, 2022

CSSFrancis commented Apr 3, 2023 • edited

CSSFrancis commented May 4, 2023 • edited

CSSFrancis commented May 4, 2023 • edited

CSSFrancis commented May 8, 2023

codecov bot commented Dec 9, 2022 •

edited

CSSFrancis commented Apr 3, 2023 •

edited

CSSFrancis commented May 4, 2023 •

edited

CSSFrancis commented May 4, 2023 •

edited