Improvements to dictionary indexing #419

hakonanes · 2021-08-23T08:04:01Z

Signed-off-by: Håkon Wiik Ånes hwaanes@gmail.com

Description of the change

This PR aims to improve the dictionary indexing implementation in kikuchipy, and simplify some parts of it.

Drawbacks in the current implementation that this PR addresses:

Any custom metric function passed by the user doesn't have control over chunking: Chunking will be optional. I found that by not rechunking, memory usage is reduced drastically while giving a small cost in computation time. Thus the user can balance these themselves. To give the user more control, I've opted to change the SimilarityMetric class to an abstract one where three methods must be provided: prepare_simulated(), prepare_experimental(), and compare(). Previously, only the compare() method could be provided by the user, where all pattern preparation had to be done. This is called in every iteration, where parts of the dictionary is compared to all experimental patterns. For example in the default normalized cross correlation metric, all experimental patterns were centred and normalized in every iteration by dask. In this new implementation, prepare_experimental() is called once before the iteration loop. The user now has to create a class instead of a function, which is more complex. However, by providing an example in the documentation, I think this should be OK.
Masking the detector isn't supported: An optional boolean mask is applied by always reshaping experimental and dictionary pattern arrays to 2D, (n patterns, n pixels), with a 1D mask equal to n pixels slicing the signal axis. We must use a 1D mask because Dask doesn't allow 2D slicing with a boolean mask. This reshaping did not increase memory usage or computation time on my machine locally when operating on dask arrays. I think we should always operate on dask arrays, since we in our default similarity metrics always casts data to either 32-bit or 64-bit floats.
The n_slices parameter isn't self explanatory. The number of simulated patterns divided by this number decides how many simulated patterns to match to all experimental patterns in each iteration. It would be better to let the user specify how many simulated patterns to match in each iteration. Also, if the dictionary is a dask array, we could use it's chunksize as this number. This will be the default option. This will also get rid of the prompt asking if we really should index when n_slices were deemed too few.
"Static"/"dynamic" dictionary indexing naming is unnecessary, since the only distinction is whether the dictionary is a dask or numpy array. Therefore, all dictionary indexing functionality, apart from the similarity metrics, is moved to one file named _dictionary_indexing.py. Whether we are indexing with a dask or numpy array is checked in an if statement, otherwise the implementation is the same in both cases.
Merging of dictionary indexing results from multiple runs should not be handled by the dictionary_indexing() method: I've decided to remove the possibility to index a dataset with multiple dictionaries "at the same" by passing multiple dictionaries to the method. The user can just produce two maps, and then merge them themselves, with only two extra lines. Benefits? The API is cleaner and it is way simpler to maintain.
Generation of an orientation similarity map should not be handled by the dictionary_indexing() method: I've decided to remove the possibility to generate an orientation similarity map with the method. The user can produce this map with one extra line themselves. See the previous point for benefits.

Extra thoughts for other PRs:

Actually, since we always reshape to 2D arrays, we could try to benefit from similarity metrics in the scipy.spatial.distance module. We would have to think more about memory usage here, though, since both the normalized cross-correlation and normalized dot product metrics currently implemented use dask's einsum for comparing patterns.

This will close #380, #420.

Unless something comes up, I anticipate these changes to be the last ones, apart from any bugfixes, to merge before releasing v0.5.

Progress of the PR

Minimal example of the bug fix or new feature

>>> import kikuchipy as kp
>>> from orix import sampling
>>> s = kp.data.nickel_ebsd_large(allow_download=True, lazy=True)
>>> s
<LazyEBSD, title: patterns Scan 1, dimensions: (75, 55|60, 60)>
>>> s.remove_static_background()
>>> s.remove_dynamic_background()
>>> mp = kp.data.nickel_ebsd_master_pattern_small(projection="lambert")
>>> mp
<EBSDMasterPattern, title: ni_mc_mp_20kv_uint8_gzip_opts9, dimensions: (|401, 401)>
>>> r = sampling.get_sample_fundamental(resolution=3, point_group=mp.phase.point_group)
>>> r
Rotation (35083,)
[[ 1.     -0.     -0.     -0.    ]
 [ 0.9916 -0.1291 -0.     -0.    ]
 [ 0.9832 -0.1826 -0.     -0.    ]
 ...
 [ 0.9474 -0.3162 -0.      0.0497]
 [ 0.9386 -0.3416 -0.      0.0492]
 [ 0.9297 -0.3651 -0.      0.0487]]
>>> signal_shape = s.axes_manager.signal_shape[::-1]
>>> detector = kp.detectors.EBSDDetector(shape=signal_shape, pc=(0.421, 0.7794, 0.5049), sample_tilt=70, convention="tsl")
>>> detector
EBSDDetector (60, 60), px_size 1 um, binning 1, tilt 0, azimuthal 0, pc (0.421, 0.221, 0.505)
>>> sim = mp.get_patterns(r, detector, energy=20, dtype_out=np.float32, compute=False)
>>> sim
<LazyEBSD, title: , dimensions: (35083|60, 60)>
>>> mask = ~kp.filters.Window("circular", shape=signal_shape).astype(bool)
>>> xmap = s.dictionary_indexing(sim, signal_mask=mask)
Dictionary indexing information:
	Phase name: ni
	Matching 4125 experimental pattern(s) to 35083 dictionary pattern(s)
	NormalizedCrossCorrelationMetric: float32, greater is better, rechunk: False, signal mask: True
100%|██████████| 3/3 [00:18<00:00,  6.07s/it]
>>> xmap
Phase   Orientations  Name  Space group  Point group  Proper point group     Color
    0  4125 (100.0%)    ni        Fm-3m         m-3m                 432  tab:blue
Properties: scores, simulation_indices
Scan unit: px
>>> xmap.rotations_per_point
20

For reviewers

The PR title is short, concise, and will make sense 1 year later.
New functions are imported in corresponding __init__.py.
New features, API changes, and deprecations are mentioned in the unreleased
section in doc/changelog.rst.
New contributors are added to .all-contributorsrc and the table is regenerated.

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>

hakonanes · 2021-08-25T20:14:15Z

There are only a few minor updates to the API reference and the pattern matching notebook left now. Hope to finish this tomorrow and then merge. Please feel free to have a look e.g. at EBSD.dictionary_indexing or SimilarityMetric if anyone is interested.

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>

hakonanes · 2021-08-26T13:08:40Z

This PR is now done. Will merge after checks pass.

hakonanes added the enhancement New feature or request label Aug 23, 2021

hakonanes added this to the v0.5.0 milestone Aug 23, 2021

hakonanes marked this pull request as draft August 23, 2021 08:04

hakonanes mentioned this pull request Aug 23, 2021

Unpin dask following dask 2021.8.1 #418

Merged

4 tasks

hakonanes changed the title ~~Start on improvement of dictionary indexing~~ Improvements to dictionary indexing Aug 23, 2021

hakonanes added 3 commits August 23, 2021 11:21

Start on improvement of dictionary indexing

e389646

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>

Further testing

7066ff1

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>

Update badge in README, which I forgot to do in pyxem#419 [skip ci]

5b61d61

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>

hakonanes force-pushed the internal-improvements-to-dictionary-indexing branch from e987569 to 5b61d61 Compare August 23, 2021 09:30

hakonanes added 3 commits August 24, 2021 17:24

Finish improvement of EBSD.dictionary_indexing(), start testing

7e509ff

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>

Remove deprecated methods and functions, and some that aren't

91c5c02

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>

Complete tests of updated sim. metrics and DI, start API updates

491079a

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>

hakonanes marked this pull request as ready for review August 25, 2021 20:11

hakonanes mentioned this pull request Aug 26, 2021

Potential improvements to dictionary indexing #422

Closed

1 task

hakonanes added 4 commits August 26, 2021 14:07

Add version changed/added notes to changed function APIs

a69db72

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>

Update changelog with all additions, changes, and removals

a846c83

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>

Update notebooks

0b34f62

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>

Test OSM computation without normalization

f7c5471

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>

hakonanes merged commit b8a95c6 into pyxem:develop Aug 26, 2021

hakonanes mentioned this pull request Aug 26, 2021

Remove deprecations for v0.5 #420

Closed

hakonanes deleted the internal-improvements-to-dictionary-indexing branch September 1, 2021 12:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improvements to dictionary indexing #419

Improvements to dictionary indexing #419

hakonanes commented Aug 23, 2021 •

edited

Loading

hakonanes commented Aug 25, 2021

hakonanes commented Aug 26, 2021

Improvements to dictionary indexing #419

Improvements to dictionary indexing #419

Conversation

hakonanes commented Aug 23, 2021 • edited Loading

Description of the change

Progress of the PR

Minimal example of the bug fix or new feature

For reviewers

hakonanes commented Aug 25, 2021

hakonanes commented Aug 26, 2021

hakonanes commented Aug 23, 2021 •

edited

Loading