tm.assert_index_equal broken with pd.NA and np.NaT sentinel #31884

Closed
WillAyd opened this issue Feb 11, 2020 · 5 comments · Fixed by #48351
Labels
good first issue Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate NA - MaskedArrays Related to pd.NA and nullable extension arrays Needs Tests Unit test(s) needed to prevent regressions Testing pandas testing functions or related to the test suite

Comments

@WillAyd
Member

WillAyd commented Feb 11, 2020

Discovered in #31799

I couldn't reproduce this without using a np.* NaT sentinel...

>>> idx1 = pd.Index([pd.NA, np.datetime64("nat")])
>>> idx2 = pd.Index([pd.NA, pd.NaT])
>>> tm.assert_index_equal(idx1, idx2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/williamayd/clones/pandas/pandas/_testing.py", line 689, in assert_index_equal
    diff = np.sum((left.values != right.values).astype(int)) * 100.0 / len(left)
AttributeError: 'bool' object has no attribute 'astype'
@WillAyd WillAyd added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Feb 11, 2020
@jbrockmendel
Member

cc @jorisvandenbossche any thoughts on how to handle this? The left.values != right.values comparison in the OP evaluates to a scalar True instead of a boolean array, and emits

DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
    diff = np.sum((left.values != right.values).astype(int)) * 100.0 / len(left)

@jorisvandenbossche
Member

It's a general problem with how numpy handles the non-boolean result of comparing pd.NA:

In [18]: np.array([pd.NA], dtype=object) == 1
/home/joris/miniconda3/envs/dev/bin/ipython:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
  #!/home/joris/miniconda3/envs/dev/bin/python
Out[18]: False

Basically, storing pd.NA in object-dtype numpy arrays is not really supported yet; see #32931 for an overview of issues related to this topic.

One possible solution for the specific case of the asserts here is to add a check for object dtype, then check specifically for the presence of pd.NA, verify that its presence matches in both left and right, filter those entries out, and compare the remaining values.
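The mask-and-filter idea above could be sketched roughly as follows. This is only an illustration, not what pandas does: the helper name equal_ignoring_na_positions is hypothetical, and it assumes 1-D inputs whose pd.NA entries must sit at the same positions on both sides.

```python
import numpy as np
import pandas as pd


def equal_ignoring_na_positions(left, right):
    """Hypothetical helper (not a pandas API): compare object-dtype values,
    handling pd.NA separately from the remaining entries."""
    left = np.asarray(left, dtype=object)
    right = np.asarray(right, dtype=object)
    if left.shape != right.shape:
        return False

    # Use an identity check: `x == pd.NA` returns pd.NA, not a bool.
    left_na = np.array([x is pd.NA for x in left], dtype=bool)
    right_na = np.array([x is pd.NA for x in right], dtype=bool)

    # pd.NA must be present at the same positions on both sides.
    if not np.array_equal(left_na, right_na):
        return False

    # Compare what is left in plain Python, so numpy never has to coerce
    # a pd.NA comparison result to bool. Note that NaN and NaT compare
    # unequal to themselves, so this sketch reports them as mismatches.
    return all(a == b for a, b in zip(left[~left_na], right[~right_na]))
```

For example, equal_ignoring_na_positions([pd.NA, 1], [pd.NA, 1]) is true, while moving the pd.NA to a different position on one side makes it false.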

Another quick short-term workaround would be to wrap that specific line in assert_index_equal in a try/except, because the line that fails only builds a more informative error message; it is not the actual check of whether the indexes are equal.
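A minimal sketch of that try/except workaround, assuming the diagnostic can simply be skipped when the comparison does not produce an array. The function percent_different is a hypothetical stand-in for the failing line in the traceback, not the actual pandas code.

```python
import numpy as np


def percent_different(left_values, right_values):
    """Hypothetical stand-in for the diagnostic line in assert_index_equal.

    Returns the percentage of mismatching positions, or None when the
    elementwise comparison does not yield an array (assumed fallback).
    """
    try:
        mismatches = (left_values != right_values).astype(int)
    except (AttributeError, TypeError):
        # AttributeError: the comparison collapsed to a plain bool, as in
        # the traceback in the OP. TypeError: a comparison result could not
        # be coerced to bool. Either way, only the extra detail for the
        # error message is lost, not the equality check itself.
        return None
    return np.sum(mismatches) * 100.0 / len(left_values)
```

A caller would then fall back to a generic "values are different" message whenever None comes back.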

@mroeschke mroeschke added Bug Testing pandas testing functions or related to the test suite labels Jul 28, 2021
@jbrockmendel jbrockmendel added the NA - MaskedArrays Related to pd.NA and nullable extension arrays label Dec 17, 2021
@mroeschke
Member

Looks like the comparison is a little better now. This could use a test.

In [6]: idx1 = pd.Index([pd.NA, np.datetime64("nat")])
   ...: idx2 = pd.Index([pd.NA, pd.NaT])
   ...: tm.assert_index_equal(idx1, idx2)

AssertionError: Index are different

Attribute "inferred_type" are different
[left]:  mixed
[right]: datetime

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug labels Jul 1, 2022
@MrShevan
Contributor

MrShevan commented Sep 1, 2022

Hi, @mroeschke! If I understand correctly, the bug is fixed, and what is needed to close this issue is a test checking that AssertionError: Index are different is raised for the inputs idx1 = pd.Index([pd.NA, np.datetime64("nat")]) and idx2 = pd.Index([pd.NA, pd.NaT]). Could I take it? :)

@mroeschke
Member

Correct, @MrShevan. A unit test will need to be added to check that the code example still raises an exception.
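A rough sketch of what such a test might look like, assuming pytest and the tm alias for pandas._testing used earlier in this thread. The test name is made up for illustration, and the match pattern only covers the first line of the message shown above.

```python
import numpy as np
import pandas as pd
import pandas._testing as tm
import pytest


def test_assert_index_equal_na_with_nat_sentinels():
    # Hypothetical test name. Inputs are taken from the example in this
    # issue: pd.NA paired with a numpy NaT on one side, pd.NaT on the other.
    idx1 = pd.Index([pd.NA, np.datetime64("nat")])
    idx2 = pd.Index([pd.NA, pd.NaT])

    # The assert should fail with a regular AssertionError rather than the
    # AttributeError from the original report.
    with pytest.raises(AssertionError, match="Index are different"):
        tm.assert_index_equal(idx1, idx2)
```

Matching only the leading "Index are different" keeps the test from depending on the exact inferred_type wording.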
