assert_frame_equal fails when dtype is categorical #12108

Closed
aggFTW opened this issue Jan 21, 2016 · 8 comments
Labels
Dtype Conversions (unexpected or buggy dtype conversions), Usage Question

Comments

@aggFTW

aggFTW commented Jan 21, 2016

assert_frame_equal(desired_df, df)
  File "C:\Anaconda\lib\site-packages\pandas\util\testing.py", line 1049, in assert_frame_equal
    obj='DataFrame.iloc[:, {0}]'.format(i))
  File "C:\Anaconda\lib\site-packages\pandas\util\testing.py", line 927, in assert_series_equal
    assert_attr_equal('dtype', left, right)
  File "C:\Anaconda\lib\site-packages\pandas\util\testing.py", line 721, in assert_attr_equal
    result = left_attr == right_attr
TypeError: data type not understood
@jreback
Contributor

jreback commented Jan 21, 2016

Show pd.show_versions() and a complete copy-pastable example

@aggFTW
Author

aggFTW commented Jan 21, 2016

Show versions:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.1
nose: 1.3.7
pip: 7.1.2
setuptools: 19.2
Cython: 0.22.1
numpy: 1.10.1
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 4.1.0-dev
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: None
Jinja2: None

Example:

import pandas as pd
import numpy as np
from pandas.util.testing import assert_frame_equal

def coerce_pandas_df_to_numeric_datetime(df):
    """Coerce each column in place to datetime, numeric, or category."""
    for column_name in df.columns:
        coerced = False

        # First try to parse object (string) columns as datetimes.
        if not coerced and df[column_name].dtype == np.dtype("object"):
            try:
                df[column_name] = pd.to_datetime(df[column_name], errors="raise")
                coerced = True
            except Exception:
                pass

        # Then try to parse the remaining object columns as numbers.
        if not coerced and df[column_name].dtype == np.dtype("object"):
            try:
                df[column_name] = pd.to_numeric(df[column_name], errors="raise")
                coerced = True
            except Exception:
                pass

        # Finally, treat any uncoerced low-cardinality column as categorical.
        if not coerced and df[column_name].nunique() < 20:
            df[column_name] = df[column_name].astype('category')
            coerced = True

records = [{u'buildingID': 0, u'date': u'6/1/13', u'temp_diff': u'12'},
           {u'buildingID': 1, u'date': u'6/1/13', u'temp_diff': u'0adsf'}]
desired_df = pd.DataFrame(records)
desired_df["date"] = pd.to_datetime(desired_df["date"])

df = pd.DataFrame(records)
coerce_pandas_df_to_numeric_datetime(df)

# Fails: buildingID is int64 in desired_df but category in df.
assert_frame_equal(desired_df, df)

@jreback
Contributor

jreback commented Jan 21, 2016

This is correct. These frames are not exactly equal, so an exception is raised.

What exactly are you trying to do?

Usually one just wants to know whether two frames are equal, which DataFrame.equals reports as a boolean:

In [13]: desired_df.equals(df)
Out[13]: False

In [14]: desired_df.equals(desired_df)
Out[14]: True

In [15]: df.equals(df)
Out[15]: True
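
When equals returns False, it can help to see which columns disagree on dtype. The following is a minimal sketch, assuming a pandas version that provides pandas.api.types.is_dtype_equal; the frames left and right are made up for illustration and are not the frames from this issue.

```python
import pandas as pd
from pandas.api.types import is_dtype_equal

# Hypothetical frames: identical values, but one column is categorical.
left = pd.DataFrame({"buildingID": [0, 1], "temp_diff": ["12", "0adsf"]})
right = left.copy()
right["buildingID"] = right["buildingID"].astype("category")

# equals() also compares dtypes, so this is False despite equal values.
frames_equal = left.equals(right)

# is_dtype_equal compares dtypes without raising on mixed dtype kinds.
mismatched = [
    name
    for name in left.columns
    if not is_dtype_equal(left[name].dtype, right[name].dtype)
]

print(frames_equal)
print(mismatched)
```

Comparing dtypes through is_dtype_equal rather than == is a deliberate choice here: it avoids depending on how NumPy handles a comparison against a pandas-only dtype.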

@jreback
Contributor

jreback commented Jan 21, 2016

btw, a more general idiom is to do the following for these types of soft-conversions

In [25]: desired_df.select_dtypes(include=['object']).apply(lambda x: pd.to_datetime(x, errors='ignore'))
Out[25]: 
        date temp_diff
0 2013-06-01        12
1 2013-06-01     0adsf

In [26]: desired_df.select_dtypes(include=['object']).apply(lambda x: pd.to_datetime(x, errors='ignore')).dtypes
Out[26]: 
date         datetime64[ns]
temp_diff            object
dtype: object
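
The same soft conversion can also be sketched with an explicit try/except, which does not rely on errors='ignore'. This is only an illustration: the helper name soft_to_datetime, the sample frame, and the fixed format string "%m/%d/%y" are assumptions chosen to match the dates in the example above.

```python
import pandas as pd


def soft_to_datetime(column):
    """Return the column parsed as datetimes, or unchanged if parsing fails."""
    try:
        # Assumed format; a value such as "0adsf" does not match and raises.
        return pd.to_datetime(column, format="%m/%d/%y")
    except (ValueError, TypeError):
        return column


# Hypothetical frame mirroring the string columns from the example.
frame = pd.DataFrame(
    {"date": ["6/1/13", "6/1/13"], "temp_diff": ["12", "0adsf"]}
)

# apply() passes each column to the helper as a Series.
converted = frame.apply(soft_to_datetime)
print(converted.dtypes)
```

Passing an explicit format keeps the parse strict, so a column like temp_diff is left alone instead of being partially interpreted.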

jreback added the Dtype Conversions and Usage Question labels Jan 21, 2016
jreback closed this as completed Jan 21, 2016
@aggFTW
Author

aggFTW commented Jan 21, 2016

Hi @jreback,

Thanks for addressing the question. We are trying to coerce some strings into those 3 types as appropriate in a semi-intelligent fashion. Thanks for the idiom you pasted above.

I know that it should have thrown for the example I pasted, but I do think there's a bug here, as it was not at all clear that it had failed for legitimate reasons. My point is that a TypeError should be raised only when I pass an argument of one type to a method that expects a different type. Here, an AssertionError should be raised instead.

@jreback
Contributor

jreback commented Jan 21, 2016

This has already been changed slightly in master. In 0.18 (releasing soon) it will give an error like this:

In [7]: assert_frame_equal(desired_df,df)
AssertionError: Attributes are different

Attribute "dtype" are different
[left]:  int64
[right]: category
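
A test can rely on that behaviour by catching AssertionError. The sketch below assumes a pandas version that exposes assert_frame_equal under pandas.testing (this thread imports it from pandas.util.testing); the frames expected and actual are hypothetical, and the exact message wording may vary between versions.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical frames: same values, int64 versus category.
expected = pd.DataFrame({"buildingID": [0, 1]})
actual = expected.copy()
actual["buildingID"] = actual["buildingID"].astype("category")

try:
    assert_frame_equal(expected, actual)
except AssertionError as err:
    # The dtype mismatch is reported as an assertion failure.
    message = str(err)
else:
    message = None

print(message)
```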

@aggFTW
Author

aggFTW commented Jan 22, 2016

That would close this 😃
Thanks!

@MaruvadaKameswaraRao

MaruvadaKameswaraRao commented Jul 11, 2018

Hi, I am trying to compare two DataFrames: one holds data from a Hive database and the other holds data from Oracle. The problem is that in Hive all column data types are strings, so the DataFrame reads them as object, whereas Oracle has int columns, which the DataFrame reads as int64. I am getting the following error:
Attribute "dtype" are different
[left]: object
[right]: int64
Can you please help me here?
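
One possible approach, offered only as a sketch, is to cast the string-typed frame to the dtypes of the other frame before comparing. The frames hive_df and oracle_df below are hypothetical stand-ins for the two query results, and the cast assumes every string value is a valid integer.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical stand-ins: Hive returns strings, Oracle returns integers.
hive_df = pd.DataFrame({"id": ["1", "2"], "qty": ["10", "20"]})
oracle_df = pd.DataFrame({"id": [1, 2], "qty": [10, 20]})

# Cast each Hive column to the dtype of the matching Oracle column.
aligned = hive_df.astype(oracle_df.dtypes.to_dict())

# With matching dtypes, the comparison checks the values themselves.
assert_frame_equal(aligned, oracle_df)
print(aligned.dtypes)
```

If a column contains a value that cannot be converted, astype raises, which surfaces bad data instead of hiding it.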
