Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Regression in assert_frame_equal check_dtype for extension dtypes #32747

Closed
jorisvandenbossche opened this issue Mar 16, 2020 · 2 comments · Fixed by #32757
Closed

BUG: Regression in assert_frame_equal check_dtype for extension dtypes #32747

jorisvandenbossche opened this issue Mar 16, 2020 · 2 comments · Fixed by #32757
Labels
Regression Functionality that used to work in a prior pandas version Testing pandas testing functions or related to the test suite
Milestone

Comments

@jorisvandenbossche
Copy link
Member

Consider this small example of two DataFrames, one with an Int64 extension dtype, the other with the same values but object dtype:

df1 = pd.DataFrame({'a': pd.array([1, 2, 3], dtype="Int64")}) 
df2 = df1.astype(object)   

With pandas 1.0.1, this passes assert_frame_equal with the check_dtype=False:

In [5]: pd.testing.assert_frame_equal(df1, df2)  
...
Attribute "dtype" are different
[left]:  Int64
[right]: object

In [6]: pd.testing.assert_frame_equal(df1, df2, check_dtype=False)  

but with master (since #32570, see my comment there, cc @jbrockmendel), this fails:

In [2]: pd.testing.assert_frame_equal(df1, df2, check_dtype=False)   
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-2-d2d792202db1> in <module>
----> 1 pd.testing.assert_frame_equal(df1, df2, check_dtype=False)

~/scipy/pandas/pandas/_testing.py in assert_frame_equal(left, right, check_dtype, check_index_type, check_column_type, check_frame_type, check_less_precise, check_names, by_blocks, check_exact, check_datetimelike_compat, check_categorical, check_like, obj)
   1378                 check_datetimelike_compat=check_datetimelike_compat,
   1379                 check_categorical=check_categorical,
-> 1380                 obj=f'{obj}.iloc[:, {i}] (column name="{col}")',
   1381             )
   1382 

~/scipy/pandas/pandas/_testing.py in assert_series_equal(left, right, check_dtype, check_index_type, check_series_type, check_less_precise, check_names, check_exact, check_datetimelike_compat, check_categorical, check_category_order, obj)
   1177         )
   1178     elif is_extension_array_dtype(left.dtype) or is_extension_array_dtype(right.dtype):
-> 1179         assert_extension_array_equal(left._values, right._values)
   1180     elif needs_i8_conversion(left.dtype) or needs_i8_conversion(right.dtype):
   1181         # DatetimeArray or TimedeltaArray

~/scipy/pandas/pandas/_testing.py in assert_extension_array_equal(left, right, check_dtype, check_less_precise, check_exact)
   1017     """
   1018     assert isinstance(left, ExtensionArray), "left is not an ExtensionArray"
-> 1019     assert isinstance(right, ExtensionArray), "right is not an ExtensionArray"
   1020     if check_dtype:
   1021         assert_attr_equal("dtype", left, right, obj="ExtensionArray")

AssertionError: right is not an ExtensionArray
@jorisvandenbossche jorisvandenbossche added Testing pandas testing functions or related to the test suite Regression Functionality that used to work in a prior pandas version labels Mar 16, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.1 milestone Mar 16, 2020
@dsaxton
Copy link
Member

dsaxton commented Mar 16, 2020

@jorisvandenbossche I may be misunderstanding this function, but shouldn't these be and and not or statements if we're to respect check_dtype?

https://github.com/pandas-dev/pandas/blob/master/pandas/_testing.py#L1172
https://github.com/pandas-dev/pandas/blob/master/pandas/_testing.py#L1162

If so I think this might also be a bug:

import pandas as pd
import pandas._testing as tm

left = pd.Series([pd.Interval(0, 1)], dtype="interval")
right = left.astype(object)

tm.assert_series_equal(left, right, check_dtype=False)
# AssertionError: IntervalArray Expected type <class 'pandas.core.arrays.interval.IntervalArray'>, found <class 'pandas.core.arrays.numpy_.PandasArray'> instead

@jorisvandenbossche
Copy link
Member Author

Yes, that was also my conclusion at #32570 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Regression Functionality that used to work in a prior pandas version Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants