Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internal improvements to the EBSD orientation/projection center refinement code #405

Merged
merged 22 commits into from
Aug 18, 2021

Conversation

hakonanes
Copy link
Member

@hakonanes hakonanes commented Aug 4, 2021

Description of the change

  • Simplify testing of refinement functionality and related EBSD tests
  • Update API reference with EBSD refinement methods
  • Simplify checks of whether parameters passed to refinement methods are valid. One of these, the check of a EBSD detector's compatibility to the EBSD signal's navigation and signal shapes, is moved to a separate private function because it can be useful elsewhere.
  • Update order of contributor credits
  • Rename gnomonic and Lambert projections from project/iproject to vector2xy/xy2vector.
  • Greatly improve performance of generation of a dictionary of simulated EBSD patterns by master pattern projection using Numba. These are partly based on work by @friedkitteh in the _refinement.py module. We get a huge speed-up!

See #402 and the closed PR #387 for further details.

Progress of the PR

  • Docstrings for all functions
  • Unit tests with pytest for all lines
  • Clean code style by running black via pre-commit
  • Re-add PC refinement
  • Re-add full refinement
  • Re-add respect chunks in lazy signals during refinement
  • Add refinement to pattern matching notebook
  • Re-add nice print of refinement information

Minimal example of the bug fix or new feature

No updates to the public API are made, hence no new functionality is added in this PR.

For reviewers

  • The PR title is short, concise, and will make sense 1 year later.
  • New functions are imported in corresponding __init__.py.
  • New features, API changes, and deprecations are mentioned in the
    unreleased section in doc/changelog.rst.

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
@hakonanes hakonanes added documentation This relates to the documentation tests This relates to the tests maintenance This relates to package maintenance labels Aug 4, 2021
@hakonanes hakonanes added this to the v0.5.0 milestone Aug 4, 2021
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
@hakonanes
Copy link
Member Author

The ubuntu-latest/py.8/pip tests took 290 s for the #403 PR. With the latest commits, they now take 137 s (link). This is not a fair comparison yet, since the coverage of the _refinement.py module dropped from 100% to 90%, and the ebsd.py coverage dropped from 100% to 99%. Still I hope to keep the test time significantly below 290 s, which was a significant jump from 106 s for a PR before the refinement tests were added.

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
@hakonanes hakonanes added the enhancement New feature or request label Aug 13, 2021
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
@hakonanes
Copy link
Member Author

hakonanes commented Aug 16, 2021

@friedkitteh Do you know if dask.delayed() respects chunks? I was wondering if iterating over a lazy EBSD signal using for idx in np.ndindex(navigation_shape):, which runs through the signal from upper left to lower right along the first axis (rows), and passing patterns[idx] to dask.delayed(), respect the underlying chunks of patterns upon a call to compute()? As in:

https://github.com/pyxem/kikuchipy/pull/405/files#diff-9ac7dbbc84a37f18d62897b66ed2534fc1ff71cabec3e0023d22d12bfe657aacR129

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
@hakonanes
Copy link
Member Author

hakonanes commented Aug 16, 2021

By the way, I got marginally better results when refining using global SHGO compared to local Nelder-Mead for the built in "large" Ni data set by passing these keyword arguments:

    method_kwargs=dict(
        sampling_method="sobol",
        options=dict(f_tol=1e-4),
        minimizer_kwargs=dict(method="Nelder-Mead", options=dict(ftol=1e-4)),
    ),

to EBSD.refine_orientation(). SHGO used 160 s and achieved a mean NCC of 0.4412814 while Nelder-Mead used 59 s and achieved a marginally worse mean NCC of 0.4412797. Note that ftol=1e-4 is default for Nelder-Mead via method="minimize".

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
@friedkitteh
Copy link
Collaborator

@friedkitteh Do you know if dask.delayed() respects chunks? I was wondering if iterating over a lazy EBSD signal using for idx in np.ndindex(navigation_shape):, which runs through the signal from upper left to lower right along the first axis (rows), and passing patterns[idx] to dask.delayed(), respect the underlying chunks of patterns upon a call to compute()?

I think the way you have it now should work as you have intended. If you wanted to work on a chunk-by-chunk basis, you could call my_dask_array.to_delayed() and have an np.ndarray with shape equal to the chunks, some trickery would then probably be needed to ensure the results are built back in the correct order.

Also, I think it would probably be faster to use to_delayed(), as I assume it is very expensive to interact with the Dask array all the time?

@hakonanes
Copy link
Member Author

I think the way you have it now should work as you have intended. If you wanted to work on a chunk-by-chunk basis, you could call my_dask_array.to_delayed() and have an np.ndarray with shape equal to the chunks, some trickery would then probably be needed to ensure the results are built back in the correct order.

I thought about using map_blocks(), but I believe dask has more freedom with dask.delayed() and will lead to less idle time for workers.

Also, I think it would probably be faster to use to_delayed(), as I assume it is very expensive to interact with the Dask array all the time?

You're referring to this part of the "best practices" guide? Yes, we might earn some time on that. Will look into it.

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
@hakonanes
Copy link
Member Author

Only the final touches to the API reference and the changelog remains, will finish this tomorrow!

Signed-off-by: Håkon Wiik Ånes <hwaanes@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation This relates to the documentation enhancement New feature or request maintenance This relates to package maintenance tests This relates to the tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants