test_mio.py::test_recarray failure due to dtype handling changes in numpy #16399

Closed
rgommers opened this issue Jun 13, 2022 · 5 comments · Fixed by #16393
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.io
Comments

@rgommers
Member

rgommers commented Jun 13, 2022

This test failure started showing up in the prerelease CI within the last 3 days, presumably due to a change in NumPy main to the dtype infrastructure (EDIT: see full log here):

numpy.core._exceptions._UFuncNoLoopError: ufunc 'equal' did not contain a loop with signature matching types (<class 'numpy.dtype[bytes_]'>, <class 'numpy.dtype[str_]'>) -> <class 'numpy.dtype[bool_]'>
The above exception was the direct cause of the following exception:
../testenv/lib/python3.11/site-packages/scipy/io/matlab/tests/test_mio.py:786: in test_recarray
    savemat(stream, {'arr': arr})
        arr        = array([( 0.5, b'python'), (99. , b'not perl')],
      dtype=[('f1', '<f8'), ('f2', 'S10')])
        dt         = [('f1', 'f8'), ('f2', 'S10')]
        stream     = <_io.BytesIO object at 0x7f7dd9f58590>
../testenv/lib/python3.11/site-packages/scipy/io/matlab/_mio.py:298: in savemat
    MW.put_variables(mdict)
        MW         = <scipy.io.matlab._mio5.MatFile5Writer object at 0x7f7dcc15f590>
        appendmat  = True
        do_compression = False
        file_name  = <_io.BytesIO object at 0x7f7dd9f58590>
        file_stream = <_io.BytesIO object at 0x7f7dd9f58590>
        format     = '5'
        long_field_names = False
        mdict      = {'arr': array([( 0.5, b'python'), (99. , b'not perl')],
      dtype=[('f1', '<f8'), ('f2', 'S10')])}
        oned_as    = 'row'
../testenv/lib/python3.11/site-packages/scipy/io/matlab/_mio5.py:892: in put_variables
    self._matrix_writer.write_top(var, name.encode('latin1'), is_global)
        is_global  = False
        mdict      = {'arr': array([( 0.5, b'python'), (99. , b'not perl')],
      dtype=[('f1', '<f8'), ('f2', 'S10')])}
        name       = 'arr'
        self       = <scipy.io.matlab._mio5.MatFile5Writer object at 0x7f7dcc15f590>
        var        = array([( 0.5, b'python'), (99. , b'not perl')],
      dtype=[('f1', '<f8'), ('f2', 'S10')])
        write_header = True
../testenv/lib/python3.11/site-packages/scipy/io/matlab/_mio5.py:633: in write_top
    self.write(arr)
        arr        = array([( 0.5, b'python'), (99. , b'not perl')],
      dtype=[('f1', '<f8'), ('f2', 'S10')])
        is_global  = False
        name       = b'arr'
        self       = <scipy.io.matlab._mio5.VarWriter5 object at 0x7f7dcc15f650>
../testenv/lib/python3.11/site-packages/scipy/io/matlab/_mio5.py:662: in write
    self.write_struct(narr)
        arr        = array([( 0.5, b'python'), (99. , b'not perl')],
      dtype=[('f1', '<f8'), ('f2', 'S10')])
        mat_tag_pos = 128
        narr       = array([( 0.5, b'python'), (99. , b'not perl')],
      dtype=[('f1', '<f8'), ('f2', 'S10')])
        self       = <scipy.io.matlab._mio5.VarWriter5 object at 0x7f7dcc15f650>
../testenv/lib/python3.11/site-packages/scipy/io/matlab/_mio5.py:781: in write_struct
    self._write_items(arr)
        arr        = array([( 0.5, b'python'), (99. , b'not perl')],
      dtype=[('f1', '<f8'), ('f2', 'S10')])
        self       = <scipy.io.matlab._mio5.VarWriter5 object at 0x7f7dcc15f650>
../testenv/lib/python3.11/site-packages/scipy/io/matlab/_mio5.py:798: in _write_items
    self.write(el[f])
        A          = array([( 0.5, b'python'), (99. , b'not perl')],
      dtype=[('f1', '<f8'), ('f2', 'S10')])
        arr        = array([( 0.5, b'python'), (99. , b'not perl')],
      dtype=[('f1', '<f8'), ('f2', 'S10')])
        el         = (0.5, b'python')
        f          = 'f2'
        fieldnames = ['f1', 'f2']
        length     = 3
        max_length = 32
        self       = <scipy.io.matlab._mio5.VarWriter5 object at 0x7f7dcc15f650>
../testenv/lib/python3.11/site-packages/scipy/io/matlab/_mio5.py:670: in write
    self.write_char(narr, codec)
        arr        = b'python'
        codec      = 'UTF8'
        mat_tag_pos = 264
        narr       = array(b'python', dtype='|S6')
        self       = <scipy.io.matlab._mio5.VarWriter5 object at 0x7f7dcc15f650>
../testenv/lib/python3.11/site-packages/scipy/io/matlab/_mio5.py:720: in write_char
    arr = arr_to_chars(arr)
        arr        = array(b'python', dtype='|S6')
        codec      = 'UTF8'
        self       = <scipy.io.matlab._mio5.VarWriter5 object at 0x7f7dcc15f650>
../testenv/lib/python3.11/site-packages/scipy/io/matlab/_miobase.py:424: in arr_to_chars
    empties = [arr == '']
E   FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
        arr        = array([[b'p', b'y', b't', b'h', b'o', b'n']], dtype='|S1')
        dims       = [1, 6]
=========================== short test summary info ============================
FAILED ../testenv/lib/python3.11/site-packages/scipy/io/matlab/tests/test_mio.py::test_recarray

The FutureWarning is clear, but let's run it by @seberg anyway to see if this was intended. And if so, is it going to turn into an error in 2 releases, or only in a major numpy release?
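
The `E   FutureWarning` line at the end of the traceback suggests that the test run escalates warnings to errors, which would explain why a warning alone fails the test. A minimal sketch of that mechanism using only the standard library; `compare_mixed` and `run_strict` are hypothetical names for illustration, not part of NumPy, SciPy, or the test suite:

```python
import warnings


def compare_mixed():
    # Hypothetical stand-in for a comparison that emits a FutureWarning
    # and falls back to a scalar result.
    warnings.warn("elementwise comparison failed", FutureWarning)
    return False


def run_strict(func):
    """Call func with every warning escalated to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        return func()


try:
    run_strict(compare_mixed)
    outcome = "passed"
except FutureWarning as exc:
    # Under the strict filter the warning surfaces as an exception.
    outcome = f"failed: {exc}"

print(outcome)
```

Outside the strict filter the same call would only emit the warning and return `False`, which is why such a problem can go unnoticed until a test configuration treats warnings as errors.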

@rgommers rgommers added defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.io labels Jun 13, 2022
@seberg
Contributor

seberg commented Jun 13, 2022

Yeah, this was the string function change. It basically expanded the existing FutureWarning to string comparisons (as in comparing unicode and byte-strings). EDIT: This is because now the string == uses the normal code path through np.equal().

Since the code seems arguably broken, I would just roll with it? The code should read arr == b"" using a byte-string for comparison, because you cannot compare unicode and byte-strings!

It is a bit short-notice, because numba asked for this to be backported at the last minute (unfortunately, I am not sure it is actually enough to be useful for them).
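
The suggestion above, comparing against an empty literal of the same kind as the array, can be sketched as follows. This is an illustration under assumptions, not the change made in SciPy; `empty_mask` is a hypothetical helper name:

```python
import numpy as np


def empty_mask(arr):
    """Return a boolean mask of empty strings, comparing like with like."""
    arr = np.asarray(arr)
    # Pick an empty literal of the same kind as the array:
    # "S" is a byte-string dtype, anything else is treated as unicode here.
    empty = b"" if arr.dtype.kind == "S" else ""
    return arr == empty


byte_mask = empty_mask(np.array([b"asdf", b"fdsa", b""]))
text_mask = empty_mask(np.array(["asdf", "fdsa", ""]))
print(byte_mask, text_mask)
```

Because both operands have the same string kind, the comparison is elementwise and does not depend on how NumPy handles mixed byte-string and unicode comparisons.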

@seberg
Copy link
Contributor

seberg commented Jun 13, 2022

Ah, this FutureWarning has been around forever on the non-string paths unfortunately. So to be 100% sure that this only happens after 2 releases, we should possibly add a test somewhere (or rely on SciPy to be that "test").

@tupui
Member

tupui commented Jun 13, 2022

I opened #16393 already to mitigate it in our CI

@tupui tupui linked a pull request Jun 13, 2022 that will close this issue
@seberg
Contributor

seberg commented Jun 13, 2022

The weird thing is that it feels like this actually fixes the logic? Since before:

In [4]: np.array(["asdf", "fdsa", ""], dtype="S") == ""
Out[4]: False

would never reach the path that replaces empty fields with a space?
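
To illustrate the path in question, here is a simplified sketch of splitting fixed-width strings into single characters and replacing empty cells with a space. It is not the code in `_miobase.py`; `split_and_pad` is a hypothetical name, and the sketch assumes byte-string or unicode input only:

```python
import numpy as np


def split_and_pad(arr):
    """Split fixed-width strings into single characters; pad empties with a space."""
    arr = np.atleast_1d(np.ascontiguousarray(arr))
    kind = arr.dtype.kind  # "S" for byte strings, "U" for unicode
    if kind not in ("S", "U"):
        raise TypeError("expected a byte-string or unicode array")
    # Reinterpret each fixed-width item as a row of one-character items.
    chars = arr.view(f"{kind}1").reshape(arr.shape + (-1,))
    # Use literals of the same kind so the comparison is elementwise.
    empty, space = (b"", b" ") if kind == "S" else ("", " ")
    mask = chars == empty
    if not mask.any():
        return chars
    chars = chars.copy()  # avoid writing into the caller's buffer
    chars[mask] = space
    return chars


padded = split_and_pad(np.array([b"asdf", b"fd", b""]))
print(padded)
```

If `mask` were the scalar `False`, as in the `In [4]` output above, the early return would always be taken and the padding step would never run.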

@tupui
Member

tupui commented Jun 13, 2022

I also found the output weird when looking at it, since the new code produces an array, as I would expect (from a quick glance around):

np.array(["asdf", "fdsa", ""], dtype="S") == np.array([""], dtype="S")
Out[5]: array([False, False,  True])

@rgommers rgommers added this to the 1.10.0 milestone Jun 14, 2022
3 participants