
New signal class and tools for in-situ 4D-STEM data analysis #900

Merged (18 commits) on Jun 10, 2023

Conversation

@SyHuang19 (Contributor) commented Jan 17, 2023


name: New signal class and tools for in-situ 4D-STEM data analysis
about: This PR adds support for in-situ data analysis, specifically for 4D-STEM time series.


Checklist

  • Add general tools to deal with in-situ data sets
    • Time series reconstruction with arbitrary virtual detectors
    • Specimen drift registration using said reconstruction
  • Specific analysis functionalities
    • General time correlation calculation

What does this PR do? Please describe and/or link to an open issue.
This PR builds upon my own code to analyze five-dimensional datasets from my experiments (an early version of the analysis code can be found in 5D_ECM_analysis). I think it would be beneficial for pyxem to have a separate signal class for some general and specific analyses of in-situ 4D/5D data. I started building general tools such as time series reconstruction and drift correction, and I also included time correlation calculation as a specialty analysis function, the math of which can be found here.

There is more work to be done, and I'm open to suggestions of general tools and functions that this signal class could use.

Are there any known issues? Do you need help?
I'm not completely settled on the axes structure of this signal class. Currently I'm treating the time axis as the third navigation dimension, so the signal looks like this: (N_x, N_y, N_t | N_kx, N_ky). This has been working out pretty well for me in both of the following cases:

  • If a function is mapped over the signal axes, everything works the same as for a Signal2D.
  • If a function is to be mapped over the time axis, the time axis is first transposed into a signal axis before the function is applied. In this case, the navigation shape changes, or a further transpose is needed to retain it, depending on what further analysis you want to do with the result.

But I don't know if there is a cleverer way to do this. The current structure handles these functions well, but that might change as more tools and functions are added.
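As a concrete illustration of the bookkeeping above, here is a minimal numpy-only sketch of moving a time axis in and out of the signal dimensions (shapes only; the actual class uses hyperspy's transpose, and the sizes here are made up):

```python
import numpy as np

# Hypothetical 5D stack with shape (N_x, N_y, N_t, N_kx, N_ky):
# navigation axes (N_x, N_y, N_t), signal axes (N_kx, N_ky).
data = np.zeros((4, 5, 10, 8, 8))

# Mapping over the signal axes leaves the layout untouched,
# exactly as for a plain Signal2D.

# To map over the time axis instead, move it into the signal
# dimensions first (the transpose step described above) ...
transposed = np.moveaxis(data, 2, -1)   # (N_x, N_y, N_kx, N_ky, N_t)
print(transposed.shape)                  # (4, 5, 8, 8, 10)

# ... and move it back afterwards to restore the navigation shape.
restored = np.moveaxis(transposed, -1, 2)
print(restored.shape)                    # (4, 5, 10, 8, 8)
```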

Work in progress?
Potentially new general tools. @CSSFrancis, since you have been working on 5D datasets as well, can you take a look at this and see if you can think of any general operations that might be useful?
Some specific functionalities to be added for time correlation (ECM) analysis:

  • Different k averaging scheme for $g_2(t)$
  • Time resampling for $g_2(t)$
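For context, a minimal numpy sketch of one common form of the intensity autocorrelation at a single k point, g2(t) = ⟨I(t′)I(t′+t)⟩ / ⟨I⟩². The normalization convention and the k-averaging schemes listed above are assumptions on my part and may differ from what the PR implements:

```python
import numpy as np

def g2_single_k(intensity):
    """Intensity autocorrelation g2(dt) for a 1D time trace I(t')
    at a single k point, normalized by <I>**2 (one common convention;
    the PR's exact normalization and k-averaging may differ)."""
    intensity = np.asarray(intensity, dtype=float)
    n = len(intensity)
    norm = intensity.mean() ** 2
    return np.array([
        np.mean(intensity[: n - dt] * intensity[dt:]) / norm
        for dt in range(n)
    ])

# A constant intensity trace is fully correlated at all delays:
print(g2_single_k(np.ones(6)))  # [1. 1. 1. 1. 1. 1.]
```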

Other todo items

  • Add lazy signal class
  • Add tests for $g_2(t)$ calculations

@coveralls commented Jan 17, 2023

Coverage Status

Coverage: 95.986% (-0.01%) from 95.997% when pulling ba0f363 on SyHuang19:InSitu4DSTEM into 348c392 on pyxem:main.

@coveralls

Coverage: 95.651% (-0.4%) from 96.017% when pulling 11689ce on SyHuang19:InSitu4DSTEM into 5456873 on pyxem:main.

@CSSFrancis (Member)

@SyHuang19 very cool. Some initial thoughts.

Drift Correction

There are potentially three different ways this can be done:

  1. Non-rigid registration: This is the most complicated and requires that a displacement is given for each (x, y) real-space point. The image is then reconstructed by interpolating the displacements.
  2. Rigid registration with interpolation: In this case the same registration is applied to every single (x, y) pixel, and the data is reconstructed from the interpolation. This is almost as complicated as 1, as it still requires interpolation.
  3. Rigid registration without interpolation: This is the easiest case, and where the resolution requirements aren't terribly high it is the right thing to do. It can also be done much more efficiently lazily, as you only have to load at most four chunks at a time.
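Option 3 could be sketched roughly like this (hypothetical drift values; nearest-neighbour shift via scipy, no sub-pixel interpolation):

```python
import numpy as np
from scipy import ndimage

# One frame of a hypothetical reconstructed time series.
frame = np.zeros((6, 6))
frame[2, 3] = 1.0

# Rigid registration without interpolation: round the measured
# drift to whole pixels and shift with nearest-neighbour (order=0).
drift = (0.7, -1.2)                       # hypothetical (dy, dx) drift
shift = tuple(-round(d) for d in drift)   # correct by the opposite shift
corrected = ndimage.shift(frame, shift, order=0, cval=0.0)

print(np.argwhere(corrected == 1.0))      # feature moved from (2, 3) to (1, 4)
```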

In all cases this is hard. I wonder if you have considered adding rechunker as an optional dependency?

In that case, what should probably happen is rechunking in the navigation dimension so that the navigation dimension is spanned. Then drift correction can be easily applied. The data is saved/loaded four times in the process (once to save a partial result, once to load it for rechunking, once to save the rechunked result, and once to load the rechunked data).

Maybe something like:

def rechunk_offline(self, nav_chunks=None, sig_chunks=-1, **kwargs):
    from rechunker import rechunk  # raises ImportError if rechunker is not installed
    if not isinstance(sig_chunks, tuple):
        sig_chunks = (sig_chunks,) * len(self.axes_manager.signal_shape)
    if not isinstance(nav_chunks, tuple):
        nav_chunks = (nav_chunks,) * len(self.axes_manager.navigation_shape)
    new_chunks = nav_chunks + sig_chunks

    new_data = rechunk(self.data, new_chunks, **kwargs)
    self.data = new_data

Then the code could look like this:

def register(self, method="rigid interpolation"):
    self.rechunk_offline(nav_chunks=(1, -1, -1), sig_chunks="auto")
    transposed = self.transpose(signal_axes=(1, 2))
    registered = transposed.map(method_dict[method])
    registered = registered.transpose(signal_axes=(1, 2))
    return registered

Part of me really wants to avoid the da.overlap function unless it is completely necessary. It might be worth benchmarking this, as you already have the hard version coded. It is possible that using the rechunker function is just as slow.

This stuff is hard because a general solution is hard: being chunk-agnostic is the wrong thing to do in many cases, and yet we more or less have to be here.

Now that I think of it, maybe we should include a map_axes function which applies a function to arbitrary axes. Then there is the option to rechunk the data using the rechunker package or by rechunking in dask. In our case this might be really helpful, as we can temporarily store the data in a scratch directory. Does that sound reasonable? You've used the rechunker package more than I have, so I'm not sure how fast it is.

That would help things out significantly and maybe lower the barrier to writing one of these functions that operate on different chunk schemes.

@SyHuang19 (Contributor, Author)

@CSSFrancis Thanks for the input.

There are potentially three different ways this can be done

What I've implemented at the moment is your option 2: rigid registration with interpolation. I don't think the level of experimental accuracy, in my experiments at least, warrants non-rigid registration (in fact, even registration without interpolation produced similar result quality in my early attempts). But that could be a topic for another day.

I wonder if you have considered adding rechunker as an optional dependency?

I did, but as you've described, using rechunker requires saving/loading several intermediate files on disk. I don't know if that level of autonomy is good for a single class method (even if you know what you are doing). As a user, I would rather have the ability to control the chunking of my signal than run one line of code that may or may not use up gigabytes of disk space in the background. I like the idea of adding the rechunk_offline you suggest, though. Maybe that can be added as a utility function.

It might be worth bench marking this as you already have the hard version coded. It is possible that using the rechunker function is just as slow.

Yes, I can try benchmarking it after we get our computing hardware back online. I used da.overlap initially because I didn't want the hassle of using rechunker every step of the way, but the latter could very possibly be faster.

Now that I think of it maybe we should include a map_axes function which applies a function to arbitrary axes. Then there is the option to rechunk the data using the rechunker package or rechunking in dask. In our case this might be really helpful as we can temporarily store the data in a scratch directory. Does that sound reasonable? You've used the rechunker package more than I have so I'm not sure how fast it is.

rechunker is fast, and one thing I like about it is that the number of tasks depends only on your target chunks. So even with a lot of cross-rechunking it doesn't create too much overhead for the scheduler. But as mentioned above, I feel somewhat iffy about allowing rechunk_offline to run in the background of any function.

Maybe we can write a map_axes function that only works with the correct chunking? But then all it would do, beyond what map already does, is handle the axes and transposition.

I'll benchmark the speed of rechunker vs. da.overlap; I anticipate rechunker to be faster. Then perhaps I can restructure the correct_real_space_drift function. A rechunk_offline that uses rechunker as a dependency can perhaps go into lazy_tools as a standalone utility function, and we can discuss further whether to use it in signal class methods.

@CSSFrancis CSSFrancis mentioned this pull request Feb 1, 2023
@SyHuang19 SyHuang19 marked this pull request as ready for review March 2, 2023 23:25
@CSSFrancis (Member) left a comment

@SyHuang19 Looks pretty good. Just some minor comments and then we can probably merge this pretty soon!


"""
roi = kwargs.pop("roi", None)
ref = self.get_time_series(roi=roi, time_axis=time_axis)
Member

I was thinking about this today. It might also be useful to be able to pass multiple ROIs, return multiple shifts, and take the average. This is more useful in my case, where I don't really have distinct features but can still align on the glass features.
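Averaging shift estimates from several ROIs could look roughly like this (hypothetical values and array layout, not the PR's actual API):

```python
import numpy as np

# Hypothetical per-ROI shift estimates: one (dy, dx) pair per
# time step, for each of three ROIs -> shape (n_rois, n_t, 2).
shifts_per_roi = np.array([
    [[1.0, 0.0], [2.0, 0.5]],   # ROI 1
    [[1.2, 0.2], [2.2, 0.3]],   # ROI 2
    [[0.8, -0.2], [1.8, 0.7]],  # ROI 3
])

# Average over the ROI axis to get one robust shift per time step.
mean_shifts = shifts_per_roi.mean(axis=0)
print(mean_shifts)  # [[1.  0. ] [2.  0.5]]
```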

roi = kwargs.pop("roi", None)
ref = self.get_time_series(roi=roi, time_axis=time_axis)

shift_reference = kwargs.pop("reference", "cascade")
Member

It might be better to just pass these directly rather than as keyword arguments. That at least makes them more visible in the docstring and accomplishes the same thing.


return shift_vectors

def correct_real_space_drift(self, shifts=None, time_axis=2, lazy_result=True):
Member

I'm not super comfortable with this function but I think it probably stays in there for the time being. It's useful in a pinch but not really an ideal way to handle things.

We should try to include an example of how to use the rechunker package in the documentation.


xdrift = shifts.data[:, 0]
ydrift = shifts.data[:, 1]
xs = Signal1D(
Member

So I think that this is overkill.

The map function will pass the arrays to every function call, so you should just be able to do:

s_transposed.map(
    _register_drift_2d,
    shift1=xdrift,
    shift2=ydrift,
    inplace=False,
)

without repeating the data. The added benefit is that it will use less memory by not duplicating the shift array.

Contributor Author

The xdrift/ydrift parameters are different for different time coordinates, and time is a navigation axis. Therefore I need to pass them as signals with the same navigation dimensions as s_transposed.


registered_data_t = registered_data.transpose(navigation_axes=[-2, -1, -3])
registered_data_t.set_signal_type("insitu_diffraction")
if self._lazy and not lazy_result:
Member

This should be handled by the map function. (see above)

@@ -0,0 +1,144 @@
# -*- coding: utf-8 -*-
# Copyright 2016-2022 The pyXem developers
Member

Suggested change
# Copyright 2016-2022 The pyXem developers
# Copyright 2016-2023 The pyXem developers

@@ -0,0 +1,339 @@
# -*- coding: utf-8 -*-
# Copyright 2016-2022 The pyXem developers
Member

Suggested change
# Copyright 2016-2022 The pyXem developers
# Copyright 2016-2023 The pyXem developers

@@ -0,0 +1,169 @@
# -*- coding: utf-8 -*-
# Copyright 2016-2022 The pyXem developers
Member

Suggested change
# Copyright 2016-2022 The pyXem developers
# Copyright 2016-2023 The pyXem developers

import scipy.signal as ss


def _register_drift_5d(data, shifts1, shifts2):
Member

It might be worth being able to set your order here. I think there are a couple of places where order=0 would be faster.
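Assuming the underlying call is scipy.ndimage.shift, the difference is just the interpolation order: order=0 snaps to the nearest pixel (fast, no new intensity values), while order=1 interpolates linearly:

```python
import numpy as np
from scipy import ndimage

row = np.array([0.0, 1.0, 0.0, 0.0])

# order=0: nearest-neighbour lookup, no new intensity values, fastest.
nearest = ndimage.shift(row, 0.25, order=0)
# order=1: linear interpolation, sub-pixel accurate but smooths the data.
linear = ndimage.shift(row, 0.25, order=1)

print(nearest)  # [0. 1. 0. 0.]
print(linear)   # [0.   0.75 0.25 0.  ]
```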

return data_t


def _register_drift_2d(data, shift1, shift2):
Member

Same as above...

@CSSFrancis CSSFrancis mentioned this pull request Jun 5, 2023
@CSSFrancis
Copy link
Member

@SyHuang19 it would also be a good idea to have the insitu_utils.py file completely covered by tests.

The code style failure is just because you haven't run black on the changes. We can always merge the PR, and then it will automatically make a new PR to run black, so that isn't a big deal.

@CSSFrancis CSSFrancis merged commit 54e1ac7 into pyxem:main Jun 10, 2023
8 of 10 checks passed