Empty dataframe not equal to another empty dataframe in 0.16.1 #10181

Closed
rekcahpassyla opened this issue May 21, 2015 · 5 comments · Fixed by #10188
Labels: API Design, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

@rekcahpassyla (Contributor)

I searched the issues list for "empty dataframe" but did not find anything matching.

In pandas 0.16.1, the result of adding two empty DataFrames does not compare equal, via .equals, to another empty DataFrame.

This code, in pandas 0.15.2, does what is expected:

import pandas as pd

pd.show_versions()

df1 = pd.DataFrame()

df2 = pd.DataFrame()

assert df1.equals(df1)

assert df2.equals(df1)

assert (df1+df1).equals(df2)

It fails in 0.16.1.

0.15.2 output:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.7.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_GB

pandas: 0.15.2
nose: 1.3.6
Cython: 0.21.1
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 3.1.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.0
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.7.2
lxml: 3.4.0
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None

0.16.1 output:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.16.1
nose: 1.3.6
Cython: 0.22
numpy: 1.9.2
scipy: 0.14.0
statsmodels: 0.6.1
IPython: 3.1.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.7.2
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.4
pymysql: None
psycopg2: None
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
c:\test_empty_add_0.16.1.py in <module>()
     11 assert df2.equals(df1)
     12
---> 13 assert (df1+df1).equals(df2)

AssertionError:

@jreback (Contributor) commented May 21, 2015

The .equals check was made stricter in 0.16.1, so it now catches this. This is a very degenerate case; if it can be fixed in an elegant way, then OK.

jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), API Design, and Difficulty Intermediate labels on May 21, 2015
jreback added this to the Next Major Release milestone on May 21, 2015
@rekcahpassyla (Contributor, Author)

It's these lines, where the number of blocks in each frame is compared, that return False.

Here's the output of my debugger at that point:

>>> len(self.blocks)
Out[1]: 1
>>> len(other.blocks)
Out[2]: 0
>>> self.blocks
Out[3]: (FloatBlock: [], 0 x 0, dtype: float64,)
>>> other.blocks
Out[4]: ()
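To make the debugger output concrete, here is a minimal pure-Python sketch of a block-based frame whose equals compares the axes first and then the number of blocks. ToyBlock and ToyManager are hypothetical stand-ins, not pandas' internal classes; they only model the kind of check that returns False above, where one side holds a single zero-length float64 block and the other holds none.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyBlock:
    """Hypothetical stand-in for a typed block: a dtype plus the column labels it holds."""
    dtype: str
    items: Tuple[str, ...]


@dataclass
class ToyManager:
    """Hypothetical stand-in for a block manager: two axes plus a list of blocks."""
    columns: Tuple[str, ...] = ()
    index: Tuple[int, ...] = ()
    blocks: List[ToyBlock] = field(default_factory=list)

    def equals(self, other: "ToyManager") -> bool:
        # Axes are compared first; two empty frames pass this step.
        if self.columns != other.columns or self.index != other.index:
            return False
        # A structure-aware check also compares the block count, so one
        # zero-length block is not the same as having no blocks at all.
        if len(self.blocks) != len(other.blocks):
            return False
        return all(a == b for a, b in zip(self.blocks, other.blocks))


empty = ToyManager()
summed = ToyManager(blocks=[ToyBlock("float64", ())])  # models (df1 + df1)
print(len(summed.blocks), len(empty.blocks))  # 1 0
print(summed.equals(empty))  # False
```

Both toy frames have identical (empty) axes, so the mismatch comes entirely from the block count, mirroring len(self.blocks) == 1 versus len(other.blocks) == 0.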

@jreback (Contributor) commented May 21, 2015

This is a symptom of where the comparison happens.

The real issue is that a zero-length FloatBlock is created in the first place.
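jreback's point is that the fix belongs where the arithmetic result is built, not in the comparison. The sketch below models that idea with hypothetical toy helpers (add_frames, blocks_naive, blocks_fixed are illustrative names, not pandas functions). add_frames is simplified to add only the columns both frames share, whereas pandas aligns on the union of columns. blocks_naive always wraps the result in a float64 block, reproducing the empty-block symptom; blocks_fixed only creates a block when it would hold at least one column.

```python
from typing import Dict, List, Tuple

# Hypothetical toy frame: column label -> list of float values.
ToyFrame = Dict[str, List[float]]
# Hypothetical block map: dtype name -> tuple of column labels in that block.
Blocks = Dict[str, Tuple[str, ...]]


def add_frames(left: ToyFrame, right: ToyFrame) -> ToyFrame:
    """Column-wise addition over the columns both frames share (toy semantics)."""
    shared = sorted(set(left) & set(right))
    return {col: [a + b for a, b in zip(left[col], right[col])] for col in shared}


def blocks_naive(frame: ToyFrame) -> Blocks:
    # Models the buggy path: the result always gets a float64 block,
    # even when there are no columns to put in it.
    return {"float64": tuple(frame)}


def blocks_fixed(frame: ToyFrame) -> Blocks:
    # Models the fix: only create a block when it holds at least one column.
    items = tuple(frame)
    return {"float64": items} if items else {}


empty: ToyFrame = {}
total = add_frames(empty, empty)
print(blocks_naive(total))  # {'float64': ()}
print(blocks_fixed(total))  # {}
print(blocks_fixed(total) == blocks_fixed(empty))  # True
```

With the fixed construction, empty + empty has the same (empty) block layout as a fresh empty frame, so a structure-aware equals no longer needs a special case.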

@rekcahpassyla (Contributor, Author)

Here's some background on why we need this functionality:

We have a container class that looks something like this (paraphrased, to avoid any issues with pasting actual working code from my employer):

import pandas as pd


class Container(object):

    def __init__(self, a, b):
        self.a = a
        self.b = b

    @classmethod
    def empty(cls):
        return cls(pd.DataFrame(), pd.DataFrame())

    def __eq__(self, other):
        return (type(other) is type(self)
                and self.a.equals(other.a)
                and self.b.equals(other.b))

    def __add__(self, other):
        return Container(self.a + other.a, self.b + other.b)

    def __radd__(self, other):
        return self.__add__(other)

    def __iadd__(self, other):
        return self.__add__(other)

This container class holds a group of results, which we want to be able to add to each other. (The real addition isn't as simple as in this implementation; otherwise we could obviously just use panels.)

We need something like this to work:

empty = Container.empty()


def f(c):
    # in reality, perform some operation on c
    # that can sometimes return an empty container
    return Container.empty()

def g(c):
    # in reality, perform some operation on c
    # that can sometimes return an empty container
    return Container.empty()

c = Container(pd.DataFrame([1],[1]), pd.DataFrame([2],[2]))

assert (f(c) + g(c)) == empty

In [85]: pd.DataFrame().blocks
Out[85]: {}

In [86]: (pd.DataFrame() + pd.DataFrame()).blocks
Out[86]:
{'float64': Empty DataFrame
 Columns: []
 Index: []}

@rekcahpassyla (Contributor, Author)

For what it's worth, I decided to have a go at a fix: see #10188, in case it is helpful.

jreback modified the milestones: 0.17.0, Next Major Release on May 21, 2015
jreback modified the milestones: 0.16.2, 0.17.0 on Jun 2, 2015
jreback added a commit that referenced this issue on Jun 7, 2015:
BUG: Adding empty dataframes should result in empty blocks #10181