New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vectorized Markers #3076
Vectorized Markers #3076
Conversation
Codecov ReportBase: 80.74% // Head: 80.71% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## RELEASE_next_major #3076 +/- ##
======================================================
- Coverage 80.74% 80.71% -0.03%
======================================================
Files 176 176
Lines 24486 24513 +27
Branches 5452 5462 +10
======================================================
+ Hits 19770 19786 +16
- Misses 3391 3394 +3
- Partials 1325 1333 +8
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
@ericpre or @francisco-dlp What are your initial thoughts on the markers class being rewritten as a subclass of a signal. Specifically a ragged signal? This PR fixes some of the issues with slowness related plotting but there is still a pretty major missing functionality in that markers can't be lazy/ plotted lazily. For large 4-D STEM datasets this is fairly crippling to the Strain mapping/ diffraction spot finding code as you can't use any of the interactive features without the system grinding to a halt and it reduces your ability to do any iterative discovery. Making markers a subclass of a Signal is I think a fairly elegant solution and gets ride of some of the major current issues. 1 - It allows for the ability to return a marker signal from a map function We can keep some of the old syntax for initializing markers but also have the ability to directly set the I feel like I've been circling around a couple of these points and wonder if it would be better for me to layout a full plan in a HSEP? My hope was to bring together all of these parts over the winter break when I have some more time to devote to the change. |
This is great that you are tackling this issue - indeed the markers are unusable when there are many...
Conceptually, markers shouldn't be signals, because their purpose is for plotting and there are not designed to do processing on them, for example, what would be the meaning of adding line of markers. That being said, they could be benefit of re-using some of the
Conceptually, there are two different things. For example, in case of peak indexing: the signal contains the actual peak position and the markers are the representation of this signal. I would keep this distinction.
I don't think that I follow this point. What is the advantage of having lazy markers? There are most of the time not large? |
… Switched to matplotlib.Collections.LineCollection
9dd03e5
to
4049554
Compare
@ericpre I realized I never got back to you. I finally got back around to this problem and have some thoughts etc.
Slicing and indexing is a good thing to add. It would be nice to say slice the navigation axes of a signal and retain the markers. As far as functions to markers go there are cases where things like shifting a marker or a bunch of markers might be very useful.
There is a distinction here but it is very small. There is no reason that the signal can't have the actual peak position. In many cases due to how hyperspy prioritizes real units it is better to have the real units. This is something that it would be better to keep track of rather than to assume that the user is keeping track of. In
The major benefit of lazy makers isn't the size in memory rather the processing time. When you are working with lazy signals you don't want to compute things until you have to. The main workflow this is important for is something like the workflow below where you want to have a set of lazy peaks created from a method which runs lazily. While the same thing could be accomplished by slicing the data you lose a lot of what make lazy processing nice such as the ability to test small parts and then use the same code to process the whole dataset. For plotting this also allows you to just move to a different section of the dataset and see if the same function works well. If you have a 100 Gb dataset it takes around 10 minutes to process so it is very hard to iterate quickly unless you have a fully lazy workflow. This is part of the reason also why having markers as Signals would be nice as you could use
|
9203c67
to
aac2fea
Compare
@ericpre So this is a bit more broken than I first anticipated and has been a point of pain and fusteration. I've been trying to make this consistent but ultimately it seems like there isn't a great way to handle all of the different markers. The big problem is that incorporating text in any reasonable fashion means that things don't work well. I think we should strip out the Text marker and have that be a separate class otherwise it forces the other markers to take lots of performance hits and stops them from being completely lazy etc. Then we could completely break the marker functionality and make it better but I don't know if that is the right thing to do. |
I think the best way to handle this would be to have the data variable either directly set or through kwargs. For example: class MarkerBase(object):
...
def set_data(self, data=None, **kwargs):
if data is not None: # Case 1
self.data = data
elif any([kwargs[k].dtype == object for k in kwargs]): # Case 2
shapes = np.array([np.array(kwargs[k]).shape for k in kwargs])
data = np.empty(shape=shapes[0],
dtype=object)
for ind in np.ndindex[shapes[0]]:
data[ind] = np.stack([kwargs[k] for k in kwargs],axis=1) # Make into block array
self.data = data
else:
self.data = np.stack([kwargs[k] for k in kwargs],axis=1) # Make into block array This does stop you from being able to define an iterating marker like this
Either way we would have to have the text marker defined through some other MarkerBase class. |
Closed in Favor of #3148 |
Description of the change
Change described by #3075
Progress of the PR
upcoming_changes
folder (seeupcoming_changes/README.rst
),readthedocs
doc build of this PR (link in github checks)Minimal example of the bug fix or the new feature