strange bug that appears for our mock catalogs when upgraded to corrfunc v2 #140

BenWibking · 2017-10-30T22:58:30Z

When I run it on my mocks, the code fails and says to report a bug:

A minimal working example is contained in this tarball: http://www.astronomy.ohio-state.edu/~wibking.1/corrfunc/mwe.tar

The conditions needed to reproduce it are: compute xi(s) from any mock produced by my HOD code with Corrfunc v2.

BenWibking · 2017-10-30T22:58:55Z

lgarrison · 2017-10-31T02:43:20Z

At first glance, the input does appear to all be in the expected range, so this does look like a Corrfunc bug of some kind. I'll try to look at this some more in the next few days.

manodeep · 2017-10-31T05:41:44Z

This is weird. Here are the flags associated with x/y/z

    C_CONTIGUOUS : True
    F_CONTIGUOUS : True
    OWNDATA : False
    WRITEABLE : True
    ALIGNED : True
    UPDATEIFCOPY : False

If the data are C_CONTIGUOUS, then no issue arising from different strides should occur. And I can see that the array values are all in [0.0, 720.0).

I can solve the issue by round-tripping

    x = np.array(x.tolist(), dtype=np.float32)

The only difference I see between the initial x.flags and the x.flags of this new array->list->array is the OWNDATA flag. However, neither x = np.ascontiguousarray(x) nor x = np.require(x, requirements=['C', 'O', 'W']) works.

I do not have a detailed enough understanding of numpy :(

manodeep · 2017-10-31T05:42:38Z

(Also, the requirements I specified spelled 'COW')

BenWibking · 2017-10-31T13:46:45Z

This is very weird. C_CONTIGUOUS should be all that is needed to treat the data like a C array. It's possible the flag is set incorrectly. Should be testable by iterating through x[i] via cython and passing x[0] as a pointer to the cython function. This has to be some kind of bug in h5py or (less likely) numpy...? I don't see how else x=np.array(x.tolist()) would fix the problem.


lgarrison · 2017-10-31T16:44:26Z

Think I figured it out. The position arrays are big-endian:

    >>> x.dtype
    dtype('>f4')

but Corrfunc is probably interpreting them as native-endian, which is little-endian in most cases. Converting the arrays to little-endian with x.byteswap() does the trick, but the correct solution is probably to teach Corrfunc to detect the endianness. This is easy to do in the Python wrappers, which is probably sufficient, since the only way we can ever detect endianness is if the arrays come from Python anyway (i.e. the C interfaces know nothing about endianness).

HDF5 is supposed to take care of some of these endianness issues, but h5py may simply be preserving the endianness of Numpy array inputs. In that case, another quick fix is to make sure the arrays are little-endian before they go into the HDF5 files.
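The diagnosis above can be reproduced without the original mocks. The following is a minimal sketch, assuming a small stand-in array in place of the real positions; `raw_native_view` is a hypothetical helper that only approximates what C code does with a bare pointer, and it is not part of Corrfunc. It also suggests why the `tolist()` round trip helped while `np.ascontiguousarray` did not.

```python
import numpy as np

# Stand-in for the positions read from the HDF5 file: same values,
# stored in the byte order opposite to this machine's ('S' means swap).
x_native = np.array([1.0, 250.5, 719.0], dtype=np.float32)
x = x_native.astype(x_native.dtype.newbyteorder('S'))


def raw_native_view(arr):
    """Hypothetical helper: read the raw buffer as native-endian values,
    roughly what C code does when it is handed a bare pointer."""
    return np.frombuffer(arr.tobytes(), dtype=arr.dtype.newbyteorder('='))


# NumPy itself still reports the right values, so range checks pass.
print(x)
# The raw bytes, read in native order, are not the same numbers.
print(raw_native_view(x))

# np.ascontiguousarray keeps the dtype, so the byte order is unchanged.
same_order = np.ascontiguousarray(x)
print(same_order.dtype.isnative)

# The tolist() round trip rebuilds the array from Python floats
# using a native-endian dtype.
round_trip = np.array(x.tolist(), dtype=np.float32)
print(round_trip.dtype.isnative)

# A cheaper conversion that skips the intermediate Python list.
converted = x.astype(x.dtype.newbyteorder('='))
print(converted.dtype.isnative)
```

Applying the same `astype` conversion before the arrays are written out would be one way to do the "make sure the arrays are little-endian" quick fix on a little-endian machine.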

BenWibking · 2017-10-31T17:43:05Z

Wow, that's amazing. I have no idea how I managed to create big-endian arrays! Thanks, Lehman. This is very helpful.


manodeep · 2017-10-31T20:26:10Z

It's not that Corrfunc is expecting little-endian; Corrfunc is expecting everything to be in machine-endian order. The array from the hdf5 file is big-endian while the computer itself is little-endian. If the computer were big-endian, there would not be any problem.

The fix would be to ensure (in the python wrappers) that all arrays are in machine-endian order.
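A minimal sketch of the detection step, assuming NumPy's `dtype.isnative` attribute is used for the check; `is_machine_endian` is a hypothetical name, not a Corrfunc function. It also shows that NumPy reports `'='` for a native-order dtype even when the dtype was created with an explicit `'<'` or `'>'`, which matters for any check that compares `dtype.byteorder` against a literal character.

```python
import sys
import numpy as np


def is_machine_endian(arr):
    """Return True if `arr` is stored in the byte order of the running machine."""
    return arr.dtype.isnative


# Explicit byte-order codes that are native and foreign on this machine.
native_code = '<' if sys.byteorder == 'little' else '>'
foreign_code = '>' if sys.byteorder == 'little' else '<'

a = np.zeros(4, dtype=native_code + 'f4')
b = np.zeros(4, dtype=foreign_code + 'f4')

# The explicit native code is normalized to '=' by NumPy.
print(a.dtype.byteorder, is_machine_endian(a))
# The foreign code is kept as given.
print(b.dtype.byteorder, is_machine_endian(b))
```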

manodeep · 2017-10-31T20:37:11Z

(The biggest reason I don't worry about this particular endian-ness mis-match bug is that Corrfunc will crash. Fixing the endian-ness handling is part of #101 )

@BenWibking Do you mind if we close this particular issue?

BenWibking · 2017-10-31T21:14:13Z

Fixing it and saving future graduate students from tearing their hair out should take just 10 additional lines of code in the python wrapper:

    def convert_to_native_endian(array):
        import sys
        system_is_little_endian = (sys.byteorder == 'little')
        array_is_little_endian = (array.dtype.byteorder == '<')
        is_native_endian = ((system_is_little_endian and array_is_little_endian)
                            or (not system_is_little_endian and not array_is_little_endian)
                            or (array.dtype.byteorder == '='))
        if not is_native_endian:
            return array.byteswap().newbyteorder()
        else:
            return array

    x,y,z,weights = [convert_to_native_endian(arr) for arr in [x,y,z,weights]]


manodeep · 2017-10-31T21:33:23Z

@BenWibking You know what I am going to say right? ....PR... :)

I think you can make the if condition slightly more concise:

    def convert_to_native_endian(array):
        '''
        Returns the supplied array in native endian byte-order

        Parameters
        ------------

        array: A numpy array

        Returns
        --------

        array: A numpy array in native-endian byte-order

        Example
        ----------

        >>> import numpy as np
        >>> create a big-endian array
        >>> and then convert to little-endian
        >>> and make sure that the output dtype is little-endian
        >>> This serves as a doctest
        >>> create a native-endian array
        >>> run through function
        >>> ensure output is the same
        '''

        import sys
        system_is_little_endian = (sys.byteorder == 'little')
        array_is_little_endian = (array.dtype.byteorder == '<')
        # '=' means the array is already native and '|' means byte order
        # does not apply, so neither case needs a swap
        if (array.dtype.byteorder in ('=', '|')) or (array_is_little_endian == system_is_little_endian):
            return array
        else:
            return array.byteswap().newbyteorder()

The doctests are left for you :)

manodeep · 2017-10-31T21:34:32Z

Plus, some tests to make sure that the input is indeed a numpy array, and that None is returned if the input is None.
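A sketch of what those extra checks might look like, under a few assumptions: `to_native_endian` is a hypothetical name and is not the function that was eventually merged, the choice to raise `TypeError` for non-array input is only one possible behavior, and the byte-order relabeling uses `view` with `dtype.newbyteorder('=')` instead of `ndarray.newbyteorder`, since the latter is not available in NumPy 2.

```python
import numpy as np


def to_native_endian(array):
    """Return `array` in native byte order; pass None through unchanged.

    Hypothetical helper for illustration only.
    """
    if array is None:
        return None
    if not isinstance(array, np.ndarray):
        raise TypeError(
            "expected a numpy array or None, got {}".format(type(array).__name__)
        )
    if array.dtype.isnative:
        # Already in machine order: hand back the same object, no copy.
        return array
    # Swap the bytes in a copy, then relabel the dtype as native so the
    # numerical values are unchanged.
    return array.byteswap().view(array.dtype.newbyteorder('='))


# Small checks in the spirit of the requested tests.
expected = np.arange(5, dtype=np.float64)
for code in ('<f8', '>f8'):
    out = to_native_endian(np.arange(5, dtype=code))
    assert out.dtype.isnative
    assert np.array_equal(out, expected)

assert to_native_endian(None) is None
```

Returning the input object untouched when it is already native keeps the common case free of copies.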

lgarrison · 2017-11-01T15:25:54Z

I can work on this.


lgarrison self-assigned this Nov 1, 2017

lgarrison pushed a commit that referenced this issue Nov 1, 2017

Add checks for Numpy array endianness to the Python wrappers. Closes #…

93d1bf1

…140 and #101.

lgarrison mentioned this issue Nov 1, 2017

Add checks for Numpy array endianness to the Python wrappers. Closes… #142

Merged

manodeep closed this as completed in 088421c Nov 2, 2017

manodeep mentioned this issue Dec 18, 2019

run time error when using theory.xi #206

Closed
