Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to dictionary indexing #419

Merged

Conversation

hakonanes
Copy link
Member

@hakonanes hakonanes commented Aug 23, 2021

Signed-off-by: Håkon Wiik Ånes hwaanes@gmail.com

Description of the change

This PR aims to improve the dictionary indexing implementation in kikuchipy, and simplify some parts of it.

Drawbacks in the current implementation that this PR addresses:

  1. Any custom metric function passed by the user doesn't have control over chunking: Chunking will be optional. I found that by not rechunking, memory usage is reduced drastically while giving a small cost in computation time. Thus the user can balance these themselves. To give the user more control, I've opted to change the SimilarityMetric class to an abstract one where three methods must be provided: prepare_simulated(), prepare_experimental(), and compare(). Previously, only the compare() method could be provided by the user, where all pattern preparation had to be done. This is called in every iteration, where parts of the dictionary is compared to all experimental patterns. For example in the default normalized cross correlation metric, all experimental patterns were centred and normalized in every iteration by dask. In this new implementation, prepare_experimental() is called once before the iteration loop. The user now has to create a class instead of a function, which is more complex. However, by providing an example in the documentation, I think this should be OK.
  2. Masking the detector isn't supported: An optional boolean mask is applied by always reshaping experimental and dictionary pattern arrays to 2D, (n patterns, n pixels), with a 1D mask equal to n pixels slicing the signal axis. We must use a 1D mask because Dask doesn't allow 2D slicing with a boolean mask. This reshaping did not increase memory usage or computation time on my machine locally when operating on dask arrays. I think we should always operate on dask arrays, since we in our default similarity metrics always casts data to either 32-bit or 64-bit floats.
  3. The n_slices parameter isn't self explanatory. The number of simulated patterns divided by this number decides how many simulated patterns to match to all experimental patterns in each iteration. It would be better to let the user specify how many simulated patterns to match in each iteration. Also, if the dictionary is a dask array, we could use it's chunksize as this number. This will be the default option. This will also get rid of the prompt asking if we really should index when n_slices were deemed too few.
  4. "Static"/"dynamic" dictionary indexing naming is unnecessary, since the only distinction is whether the dictionary is a dask or numpy array. Therefore, all dictionary indexing functionality, apart from the similarity metrics, is moved to one file named _dictionary_indexing.py. Whether we are indexing with a dask or numpy array is checked in an if statement, otherwise the implementation is the same in both cases.
  5. Merging of dictionary indexing results from multiple runs should not be handled by the dictionary_indexing() method: I've decided to remove the possibility to index a dataset with multiple dictionaries "at the same" by passing multiple dictionaries to the method. The user can just produce two maps, and then merge them themselves, with only two extra lines. Benefits? The API is cleaner and it is way simpler to maintain.
  6. Generation of an orientation similarity map should not be handled by the dictionary_indexing() method: I've decided to remove the possibility to generate an orientation similarity map with the method. The user can produce this map with one extra line themselves. See the previous point for benefits.

Extra thoughts for other PRs:

  • Actually, since we always reshape to 2D arrays, we could try to benefit from similarity metrics in the scipy.spatial.distance module. We would have to think more about memory usage here, though, since both the normalized cross-correlation and normalized dot product metrics currently implemented use dask's einsum for comparing patterns.

This will close #380, #420.

Unless something comes up, I anticipate these changes to be the last ones, apart from any bugfixes, to merge before releasing v0.5.

Progress of the PR

  • Docstrings for all functions
  • Unit tests with pytest for all lines
  • Clean code style by running black via pre-commit
  • Finish new abstract SimilarityMetric class
    - [ ] Provide an example in the documentation of a very simple SimilarityMetric class to show users how they can make their own Link to the implementation of NormalizedCrossCorrelationMetric instead.
  • Finish new NormalizedCrossCorrelationMetric class using the abstract class
  • Finish new NormalizedDotProductMetric class using the abstract class
  • Finish updates to EBSD.dictionary_indexing() and the indexing logistics in _dictionary_indexing.py
  • Update pattern matching notebook
  • Remove deprecated EBSD.match_patterns(), closing Remove deprecations for v0.5 #420.
  • Remove deprecated correlate module.

Minimal example of the bug fix or new feature

>>> import kikuchipy as kp
>>> from orix import sampling
>>> s = kp.data.nickel_ebsd_large(allow_download=True, lazy=True)
>>> s
<LazyEBSD, title: patterns Scan 1, dimensions: (75, 55|60, 60)>
>>> s.remove_static_background()
>>> s.remove_dynamic_background()
>>> mp = kp.data.nickel_ebsd_master_pattern_small(projection="lambert")
>>> mp
<EBSDMasterPattern, title: ni_mc_mp_20kv_uint8_gzip_opts9, dimensions: (|401, 401)>
>>> r = sampling.get_sample_fundamental(resolution=3, point_group=mp.phase.point_group)
>>> r
Rotation (35083,)
[[ 1.     -0.     -0.     -0.    ]
 [ 0.9916 -0.1291 -0.     -0.    ]
 [ 0.9832 -0.1826 -0.     -0.    ]
 ...
 [ 0.9474 -0.3162 -0.      0.0497]
 [ 0.9386 -0.3416 -0.      0.0492]
 [ 0.9297 -0.3651 -0.      0.0487]]
>>> signal_shape = s.axes_manager.signal_shape[::-1]
>>> detector = kp.detectors.EBSDDetector(shape=signal_shape, pc=(0.421, 0.7794, 0.5049), sample_tilt=70, convention="tsl")
>>> detector
EBSDDetector (60, 60), px_size 1 um, binning 1, tilt 0, azimuthal 0, pc (0.421, 0.221, 0.505)
>>> sim = mp.get_patterns(r, detector, energy=20, dtype_out=np.float32, compute=False)
>>> sim
<LazyEBSD, title: , dimensions: (35083|60, 60)>
>>> mask = ~kp.filters.Window("circular", shape=signal_shape).astype(bool)
>>> xmap = s.dictionary_indexing(sim, signal_mask=mask)
Dictionary indexing information:
	Phase name: ni
	Matching 4125 experimental pattern(s) to 35083 dictionary pattern(s)
	NormalizedCrossCorrelationMetric: float32, greater is better, rechunk: False, signal mask: True
100%|██████████| 3/3 [00:18<00:00,  6.07s/it]
>>> xmap
Phase   Orientations  Name  Space group  Point group  Proper point group     Color
    0  4125 (100.0%)    ni        Fm-3m         m-3m                 432  tab:blue
Properties: scores, simulation_indices
Scan unit: px
>>> xmap.rotations_per_point
20

For reviewers

  • The PR title is short, concise, and will make sense 1 year later.
  • New functions are imported in corresponding __init__.py.
  • New features, API changes, and deprecations are mentioned in the unreleased
    section in doc/changelog.rst.
  • New contributors are added to .all-contributorsrc and the table is regenerated.

@hakonanes hakonanes added the enhancement New feature or request label Aug 23, 2021
@hakonanes hakonanes added this to the v0.5.0 milestone Aug 23, 2021
@hakonanes hakonanes marked this pull request as draft August 23, 2021 08:04
@hakonanes hakonanes mentioned this pull request Aug 23, 2021
4 tasks
@hakonanes hakonanes changed the title Start on improvement of dictionary indexing Improvements to dictionary indexing Aug 23, 2021
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
@hakonanes hakonanes force-pushed the internal-improvements-to-dictionary-indexing branch from e987569 to 5b61d61 Compare August 23, 2021 09:30
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
@hakonanes hakonanes marked this pull request as ready for review August 25, 2021 20:11
@hakonanes
Copy link
Member Author

There are only a few minor updates to the API reference and the pattern matching notebook left now. Hope to finish this tomorrow and then merge. Please feel free to have a look e.g. at EBSD.dictionary_indexing or SimilarityMetric if anyone is interested.

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
@hakonanes
Copy link
Member Author

This PR is now done. Will merge after checks pass.

@hakonanes hakonanes merged commit b8a95c6 into pyxem:develop Aug 26, 2021
@hakonanes hakonanes deleted the internal-improvements-to-dictionary-indexing branch September 1, 2021 12:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow masking when computing pattern similarities
1 participant