New signal class and tools for in-situ 4D-STEM data analysis #900
Conversation
@SyHuang19 very cool. Some initial thoughts.

**Drift Correction**

There are potentially three different ways this can be done. In all cases this is hard. I wonder if you have considered adding rechunker as an optional dependency? In that case what should probably happen is rechunking in the navigation dimension so that the navigation dimension is spanned; then drift correction can be easily applied. In that case the data is saved/loaded 4 times as well (once to save a partial result, once to load the rechunking, once to save a partial result and once to load the data chunked back). Maybe something like:

```python
def rechunk_offline(self, nav_chunks=None, sig_chunks=-1, **kwargs):
    from rechunker import rechunk  # fails if rechunker is not installed
    if not isinstance(sig_chunks, tuple):
        sig_chunks = (sig_chunks,) * len(self.axes_manager.signal_shape)
    if not isinstance(nav_chunks, tuple):
        nav_chunks = (nav_chunks,) * len(self.axes_manager.navigation_shape)
    new_chunks = nav_chunks + sig_chunks
    self.data = rechunk(self.data, new_chunks, **kwargs)
```

Then the code could look like this:

```python
def register(self, method="rigid interpolation"):
    self.rechunk_offline(nav_chunks=(1, -1, -1), sig_chunks="auto")
    transposed = self.transpose(signal_axes=(1, 2))
    registered = transposed.map(method_dict[method])
    registered = registered.transpose(signal_axes=(1, 2))
    return registered
```

Part of me really wants to avoid this, though. This stuff is hard because a general solution is hard, and being chunk agnostic is the wrong thing to do in many cases, and yet we kind of have to be in this case. Now that I think of it, maybe we should include a helper along these lines. That would help things out a significant amount and maybe lower the barrier to writing one of these functions that operate on different chunk schemes.
@CSSFrancis Thanks for the input.
What I've implemented at the moment is your option 2: rigid registration with interpolation. I don't think the level of experimental accuracy, in my experiments at least, warrants non-rigid registration (in fact, even registration without interpolation produced similar result quality in my early attempts). But that could be a topic for another day.
I did, but as you've described, using rechunker requires saving/loading several intermediate files on disk. I don't know if that level of autonomy is good for one class method (even if you know what you are doing). For me, as a user, I would rather have the ability to control the chunking of my signal than run one line of code that may or may not use up gigabytes of disk space in the background. I like the idea of adding this, though.
Yes, I can try benchmarking it after we get our computing hardware back online.
rechunker is fast, and one thing I like about it is that the number of tasks only depends on your target chunks. So even if you have a lot of cross rechunking, it doesn't create too much overhead for the scheduler. But as mentioned above, I feel kind of iffy about allowing that by default. Maybe we can write a dedicated helper. I'll benchmark the speed of rechunker vs. da.overlap; I anticipate rechunker to be faster. Then perhaps I can restructure the code accordingly.
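The scheduler-overhead point can be eyeballed by counting tasks in a plain dask rechunk (a rough sketch with arbitrary sizes; rechunker builds its graph differently, through intermediate stores):

```python
import dask.array as da

# Time-chunked source, position-chunked target: every output chunk
# touches every input chunk, so the in-memory graph grows quickly.
a = da.zeros((64, 64, 64), chunks=(64, 64, 1))
b = a.rechunk((1, 1, 64))

# Extra tasks the cross-rechunk alone adds to the graph.
extra_tasks = len(b.dask) - len(a.dask)
print(extra_tasks)
```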
@SyHuang19 Looks pretty good. Just some minor comments and then we can probably merge this pretty soon!
```python
    """
    roi = kwargs.pop("roi", None)
    ref = self.get_time_series(roi=roi, time_axis=time_axis)
```
I was thinking about this today. It might also be useful to be able to pass multiple ROIs, return multiple shifts, and take the average. This is more useful in my case, where I don't really have distinct features but can still align on the glass features.
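A multi-ROI average could look something like this (a self-contained sketch using plain FFT cross-correlation on synthetic frames; the ROI positions and the `_roi_shift` helper are made up for illustration, not pyxem API):

```python
import numpy as np

def _roi_shift(ref, img):
    # Integer-pixel shift of img relative to ref via circular
    # FFT cross-correlation (peak position -> signed shift).
    corr = np.fft.ifft2(np.fft.fft2(img) * np.conj(np.fft.fft2(ref))).real
    idx = np.unravel_index(np.argmax(corr), corr.shape)
    return np.array([i if i <= s // 2 else i - s for i, s in zip(idx, corr.shape)])

rng = np.random.default_rng(0)
frame0 = rng.random((64, 64))
frame1 = np.roll(frame0, (3, -2), axis=(0, 1))  # known drift of (3, -2)

# Two hypothetical ROIs; the reported drift is the mean over ROIs,
# which damps the error from any single weak feature.
rois = [(slice(0, 32), slice(0, 32)), (slice(32, 64), slice(32, 64))]
mean_shift = np.mean([_roi_shift(frame0[r], frame1[r]) for r in rois], axis=0)
print(mean_shift)
```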
```python
    roi = kwargs.pop("roi", None)
    ref = self.get_time_series(roi=roi, time_axis=time_axis)

    shift_reference = kwargs.pop("reference", "cascade")
```
It might be better to just pass these directly rather than as keyword arguments. That at least makes them more visible in the docstring and accomplishes the same thing.
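What that might look like (a standalone sketch; the function names are made up to illustrate the pattern, not pyxem's actual API):

```python
import inspect

# Before: options fished out of **kwargs never appear in the signature,
# so help() and the rendered docstring don't show them.
def get_shifts_implicit(time_axis=2, **kwargs):
    reference = kwargs.pop("reference", "cascade")
    roi = kwargs.pop("roi", None)
    return reference, roi

# After: explicit keywords document themselves and behave identically.
def get_shifts_explicit(time_axis=2, reference="cascade", roi=None):
    return reference, roi

print("reference" in inspect.signature(get_shifts_implicit).parameters)  # False
print("reference" in inspect.signature(get_shifts_explicit).parameters)  # True
```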
```python
    return shift_vectors

def correct_real_space_drift(self, shifts=None, time_axis=2, lazy_result=True):
```
I'm not super comfortable with this function but I think it probably stays in there for the time being. It's useful in a pinch but not really an ideal way to handle things.
We should try to include an example of how to use the rechunker package in the documentation.
```python
    xdrift = shifts.data[:, 0]
    ydrift = shifts.data[:, 1]
    xs = Signal1D(
```
So I think that this is overkill. The `map` function will copy arrays to every single function call, so you should just be able to pass:

```python
s_transposed.map(
    _register_drift_2d,
    shift1=xdrift,
    shift2=ydrift,
    inplace=False,
)
```

without repeating the data. The added benefit is that it will use less memory by not duplicating the shift array.
The `xdrift`/`ydrift` parameters are different for different time coordinates, which is a navigation axis. Therefore I need to pass them as signals that have the same navigation dimension as `s_transposed`.
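The per-time-coordinate pairing can be illustrated without hyperspy (a plain numpy/scipy sketch on a made-up `(N_t, y, x)` stack; `map` iterating a signal achieves the same pairing lazily):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)
stack = rng.random((4, 32, 32))             # hypothetical frames over time
ydrift = np.array([0.0, -1.0, -2.0, -3.0])  # one shift pair per time step
xdrift = np.array([0.0, 1.0, 2.0, 3.0])

# Each frame gets its own (dy, dx): the shifts iterate with the
# navigation (time) axis instead of being broadcast whole.
registered = np.stack([
    ndimage.shift(frame, (dy, dx), order=1)
    for frame, dy, dx in zip(stack, ydrift, xdrift)
])
print(registered.shape)  # (4, 32, 32)
```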
```python
    registered_data_t = registered_data.transpose(navigation_axes=[-2, -1, -3])
    registered_data_t.set_signal_type("insitu_diffraction")
    if self._lazy and not lazy_result:
```
This should be handled by the `map` function (see above).
Suggested change:

```diff
- # Copyright 2016-2022 The pyXem developers
+ # Copyright 2016-2023 The pyXem developers
```
Suggested change:

```diff
- # Copyright 2016-2022 The pyXem developers
+ # Copyright 2016-2023 The pyXem developers
```
pyxem/utils/insitu_utils.py
Suggested change:

```diff
- # Copyright 2016-2022 The pyXem developers
+ # Copyright 2016-2023 The pyXem developers
```
pyxem/utils/insitu_utils.py
```python
import scipy.signal as ss


def _register_drift_5d(data, shifts1, shifts2):
```
It might be worth being able to set your `order` here. I think there are a couple of places where `order=0` would be faster.
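The difference is easy to see with `scipy.ndimage.shift` directly (a small sketch; the values are arbitrary):

```python
import numpy as np
from scipy import ndimage

img = np.arange(16.0).reshape(4, 4)

# order=0 is nearest-neighbour: whole-pixel drift moves rows intact,
# with no blending, and is the cheaper option.
nearest = ndimage.shift(img, (1, 0), order=0)

# order=1 interpolates, which is what you want for sub-pixel shifts
# but smooths the data and costs more per frame.
linear = ndimage.shift(img, (0.5, 0), order=1)

print(nearest[1])  # the original first row, moved down by one
```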
pyxem/utils/insitu_utils.py
```python
    return data_t


def _register_drift_2d(data, shift1, shift2):
```
Same as above...

@SyHuang19 it would also be a good idea to have the insitu_utils.py file completely covered by tests. The code style failures are just because you haven't run the formatter.
name: New signal class and tools for in-situ 4D-STEM data analysis
about: This PR aims to add some support for in-situ data analysis, specifically for 4D-STEM time series.
Checklist
What does this PR do? Please describe and/or link to an open issue.
This PR builds upon my own code to analyze 5-dimensional datasets from my experiments (an early version of the analysis code can be found in 5D_ECM_analysis). I think it would be beneficial for pyxem to have a separate signal class to deal with some general and specific analyses for in-situ 4D/5D data. I started building some general tools such as time series reconstruction and drift correction, and I also included time correlation calculation as a specialty analysis function; the math behind it can be found here.
There is more work to be done, and I'm open to suggestions for general tools and functions that this signal class could use.
Are there any known issues? Do you need help?
I'm not completely settled on the axes structure of this signal class. Currently I'm treating the time axis as the 3rd navigation dimension, so the signal looks like `(N_x, N_y, N_t | N_kx, N_ky)`. This has been working out pretty well for me in the cases I've tried, but I don't know if there is a cleverer way to do this. The current structure handles the functions well, but that might change if more tools and functions are added.
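Why time-as-third-navigation-axis is convenient can be shown on a bare array (a sketch with made-up dimensions; the `(N_t, N_y, N_x, N_ky, N_kx)` memory order is one plausible layout for a `(N_x, N_y, N_t | N_kx, N_ky)` signal, not necessarily the exact hyperspy convention):

```python
import numpy as np

data = np.zeros((5, 4, 3, 8, 8))  # (N_t, N_y, N_x, N_ky, N_kx)

# Frame-wise use: every scan position at a single time step.
frame = data[2]
print(frame.shape)   # (4, 3, 8, 8)

# Time-series use: a single scan position across all time steps.
series = data[:, 1, 2]
print(series.shape)  # (5, 8, 8)
```

Both access patterns are a single contiguous slice, which is what the drift-correction and time-correlation tools need.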
Work in progress?
Potentially new general tools. @CSSFrancis, since you have been working on 5D datasets as well, can you take a look at this and see if you can think of any general operations that might be useful?
Some specific functionalities to be added for time correlation (ECM) analysis:
Other todo items