Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure correct endian in numba_histogram #2678

Merged
merged 4 commits into from Mar 17, 2021

Conversation

jlaehne
Copy link
Contributor

@jlaehne jlaehne commented Mar 16, 2021

Description of the change

Makes sure that numba_histogram runs for big endian datatypes like >u2 or >f4 by converting data to little endian for execution.

Closes #2668

Progress of the PR

  • Change implemented,
  • extend to other numba functions,
  • add tests,
  • ready for review.

Minimal example of the bug fix or the new feature

import numpy as np
from hyperspy.misc.array_tools import numba_histogram

# works
arr = np.arange(100, dtype='<u2')
numba_histogram(arr, 5, (0, 100))

# works
arr = np.arange(100, dtype='u2')
numba_histogram(arr, 5, (0, 100))

# works
arr = np.arange(100, dtype='>u2')
numba_histogram(arr, 5, (0, 100))

@jlaehne
Copy link
Contributor Author

jlaehne commented Mar 16, 2021

Question is, whether this affects any other functions that run numba? Namely

  • _linear_bin_loop in array_tools.py
  • lowess in lowess_smooth.py
  • _fast_mean and _fast_std in peakfinders2D.py

Probably, the would need a similar wrapper.

@@ -392,6 +391,17 @@ def numba_histogram(data, bins, ranges):
hist : array
The values of the histogram.
"""
if data.dtype.byteorder == '>':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the following instead?

    if data.dtype.byteorder == '>':
       data = data.byteswap().newbyteorder()
    return _numba_histogram(data, bins, ranges)

It is the same, but easier to maintain...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, no problem.

Should I add it for the other 4 numba-functions? I think we cannot rule out that someone calls them with the wrong endian. For the private functions I could just add it to the calling functions, only lowess would need a wrapper.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that numba only support native endianess and this approach will not work on all platform. What's about:

        # Numba only supported native dtype
        # https://github.com/numba/numba/issues/2243
        if not data.dtype.isnative:
            data = data.astype(data.dtype.type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that should do the job, indeed.

@codecov
Copy link

codecov bot commented Mar 16, 2021

Codecov Report

Merging #2678 (9452c96) into RELEASE_next_patch (c7a662c) will increase coverage by 0.04%.
The diff coverage is 100.00%.

Impacted file tree graph

@@                  Coverage Diff                   @@
##           RELEASE_next_patch    #2678      +/-   ##
======================================================
+ Coverage               76.62%   76.66%   +0.04%     
======================================================
  Files                     201      201              
  Lines                   29655    29666      +11     
  Branches                 6491     6495       +4     
======================================================
+ Hits                    22722    22743      +21     
+ Misses                   5174     5168       -6     
+ Partials                 1759     1755       -4     
Impacted Files Coverage Δ
hyperspy/misc/array_tools.py 83.46% <100.00%> (+7.43%) ⬆️
hyperspy/misc/lowess_smooth.py 100.00% <100.00%> (ø)
hyperspy/signal.py 75.87% <0.00%> (+0.10%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c7a662c...9452c96. Read the comment docs.

Copy link
Member

@ericpre ericpre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to take the opportunity to increase the coverage, thanks!

@ericpre ericpre merged commit 19a0a7a into hyperspy:RELEASE_next_patch Mar 17, 2021
@jlaehne
Copy link
Contributor Author

jlaehne commented Mar 17, 2021

Nice to take the opportunity to increase the coverage, thanks!

Not much, but at least a few obvious ones.

In the end, I refrained from covering _fast_mean and _fast_std in peakfinders2D.py, because I could not trigger the error using the 'stat' peakfinder routine.

@jlaehne jlaehne deleted the numba_endian branch March 17, 2021 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants