assert_almost_equal fails with KeyError on non-integer column names #10013

Closed
dhj-io opened this issue Apr 29, 2015 · 4 comments
Labels
Testing pandas testing functions or related to the test suite Usage Question

Comments

@dhj-io

dhj-io commented Apr 29, 2015

import pandas as pd

df = pd.DataFrame({0:'b'}, index = ['c'])
pd.util.testing.assert_almost_equal(df, df)  # SUCCESS

df = pd.DataFrame({'a':'b'}, index = ['c'])
pd.util.testing.assert_almost_equal(df, df)  # FAIL KeyError

Top of the trace stack is:

  • pandas\hashtable.pyd in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12310)()
  • pandas\hashtable.pyd in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12261)()
  • KeyError: 0

I am sure this used to work without any problem, because I have unit tests that depend on it.

I tried uninstalling and re-installing pandas and the bug is still present.

I couldn't find a similar bug searching for get_item, assert_almost_equal or hashtable.

@jreback
Contributor

jreback commented Apr 29, 2015

This is not specified to work at all, and may only work accidentally. assert_almost_equal is essentially a recursive list-like/scalar comparator; it doesn't know about anything else.
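To illustrate why a recursive list-like/scalar comparator would trip over a DataFrame, here is a simplified sketch. It is not the pandas source; `naive_almost_equal` and its `decimal` parameter are hypothetical names used only for illustration. The assumption is that such a comparator walks list-likes by position with `a[i]`, which on a DataFrame is a column lookup by label.

```python
import pandas as pd


def naive_almost_equal(a, b, decimal=5):
    """Hypothetical recursive comparator: scalars are compared directly,
    list-likes are walked element by element using positional indexing."""
    if isinstance(a, str) or not hasattr(a, "__len__"):
        # Scalar case: floats get a tolerance, everything else must match exactly.
        if isinstance(a, float) or isinstance(b, float):
            assert abs(a - b) < 0.5 * 10 ** -decimal, (a, b)
        else:
            assert a == b, (a, b)
        return True
    # List-like case: recurse into a[0], a[1], ...
    assert len(a) == len(b)
    for i in range(len(a)):
        naive_almost_equal(a[i], b[i], decimal)
    return True


# Nested plain lists work as intended.
lists_ok = naive_almost_equal([1.0, [2.0, "x"]], [1.0000001, [2.0, "x"]])

# On a DataFrame, a[0] means "the column labelled 0", which does not
# exist when the only column is named 'a'.
df = pd.DataFrame({"a": "b"}, index=["c"])
caught = None
try:
    naive_almost_equal(df, df)
except KeyError as exc:
    caught = exc
```

Under that assumption, a frame whose column happens to be labelled `0` gets past the first lookup, which would explain why the integer-named case appeared to succeed.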

Use assert_frame_equal for testing; .equals(..) is the full-fledged comparison method on pandas objects.
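A minimal sketch of the two suggested alternatives, applied to the frame from the report. It assumes a pandas version that exposes the public `pandas.testing` module; at the time of this thread the function lived under `pandas.util.testing`.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

left = pd.DataFrame({"a": "b"}, index=["c"])
right = left.copy()

# Raises AssertionError on any mismatch, returns None otherwise.
assert_frame_equal(left, right)

# Returns a plain boolean instead of raising.
same = left.equals(right)

# A frame with a different value fails both checks.
changed = pd.DataFrame({"a": "x"}, index=["c"])
differs = not left.equals(changed)
try:
    assert_frame_equal(left, changed)
    raised = False
except AssertionError:
    raised = True
```

`assert_frame_equal` is the better fit inside a test suite because its failure message describes what differs, while `.equals` suits ordinary program logic.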

@jreback jreback closed this as completed Apr 29, 2015
@jreback jreback added Testing pandas testing functions or related to the test suite Usage Question labels Apr 29, 2015
@dhj-io
Author

dhj-io commented Apr 29, 2015

Thank you for the quick feedback! Just a few things:

  1. Can you direct me to the spec? I wasn't able to find any documentation on pandas.util.testing.assert_almost_equal and the python "help(assert_almost_equal)" is as follows: assert_almost_equal(...)
  2. Is the function supposed to be only for Series objects with floating point values? If DataFrames containing only floating point numbers are supposed to be evaluated then the same error occurs with a floating point number substituted for 'b'.
  3. Previous behavior (this may have been as early as 0.6) was to compare equality by the Python standard except for floating point columns. These columns assumed the behavior of numpy.testing.assert_almost_equal, which allows precision to be specified in decimal places. Is there a testing function that compares everything normally but lets me specify the precision when comparing floating point numbers? assert_frame_equal with check_less_precise looks promising, but that is a boolean specification and not a "number of decimal points" specification.
  4. If a test function doesn't exist where the assert_almost_equal behavior applies only to float comparisons then I will need to write one for myself (in python). I can send a pull request if it might be of use to others.
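The kind of helper described in points 3 and 4 could look roughly like the sketch below. `assert_frame_equal_decimal` is a hypothetical name, not a pandas function; it assumes unique column names and a pandas version with the public `pandas.testing` module. Float columns are checked with `numpy.testing.assert_array_almost_equal` to a given number of decimals, and everything else is compared exactly.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal, assert_index_equal


def assert_frame_equal_decimal(left, right, decimal=6):
    """Hypothetical helper: exact comparison for non-float columns,
    'decimal places' comparison for float columns."""
    # Labels and dtypes must match exactly.
    assert_index_equal(left.index, right.index)
    assert_index_equal(left.columns, right.columns)
    assert list(left.dtypes) == list(right.dtypes), "column dtypes differ"

    float_cols = [c for c in left.columns
                  if pd.api.types.is_float_dtype(left[c])]
    other_cols = [c for c in left.columns if c not in float_cols]

    # Non-float columns: strict equality.
    assert_frame_equal(left[other_cols], right[other_cols])

    # Float columns: numpy-style precision in decimal places.
    for col in float_cols:
        np.testing.assert_array_almost_equal(
            left[col].to_numpy(), right[col].to_numpy(),
            decimal=decimal, err_msg="column %r differs" % (col,))


left = pd.DataFrame({"name": ["x", "y"], "value": [1.0, 2.0]})
right = pd.DataFrame({"name": ["x", "y"], "value": [1.0000001, 2.0]})

# A difference of 1e-7 is accepted at 6 decimals...
assert_frame_equal_decimal(left, right, decimal=6)

# ...but rejected at 8 decimals.
try:
    assert_frame_equal_decimal(left, right, decimal=8)
    strict_failed = False
except AssertionError:
    strict_failed = True
```

Later pandas releases (1.1 and up) also accept `rtol` and `atol` arguments on `assert_frame_equal`, which give tolerance control without a custom helper, although as relative and absolute tolerances rather than a count of decimal places.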

@jreback
Contributor

jreback commented May 1, 2015

@dhj-io These are internal functions that we use for testing; there is no spec per se. You should use .equals if you are comparing things.

@jorisvandenbossche
Member

@dhj-io See also the issue at #9895 on providing some of the testing functions in a public testing module.
Feedback on which functions you would use/find useful is certainly welcome over there!

And if you want to know the exact behaviour of assert_almost_equal, I think there is only the code :-) https://github.com/pydata/pandas/blob/master/pandas/src/testing.pyx#L58 (no docstring, alas). But as you will see, assert_almost_equal is not meant to work with dataframes (it may have worked before by accident), so it will probably not be part of a public testing module.
But if assert_frame_equal is lacking in some ways for the precision issues, you can certainly provide some feedback/ideas on the issue.
