Empty dataframe not equal to another empty dataframe in 0.16.1 #10181

rekcahpassyla · 2015-05-21T09:22:18Z

I searched the issues list for "empty dataframe" but did not find anything matching.

In pandas 0.16.1, the result of adding two empty DataFrames is not equal to another empty DataFrame.

This code, in pandas 0.15.2, does what is expected:

import pandas as pd

pd.show_versions()

df1 = pd.DataFrame()

df2 = pd.DataFrame()

assert df1.equals(df1)

assert df2.equals(df1)

assert (df1+df1).equals(df2)

In 0.16.1, the final assertion fails with an AssertionError.

0.15.2 output:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.7.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_GB

pandas: 0.15.2
nose: 1.3.6
Cython: 0.21.1
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 3.1.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.0
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.7.2
lxml: 3.4.0
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None

0.16.1 output:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.16.1
nose: 1.3.6
Cython: 0.22
numpy: 1.9.2
scipy: 0.14.0
statsmodels: 0.6.1
IPython: 3.1.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.7.2
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.4
pymysql: None
psycopg2: None
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
c:\test_empty_add_0.16.1.py in <module>()
     11 assert df2.equals(df1)
     12
---> 13 assert (df1+df1).equals(df2)

AssertionError:

jreback · 2015-05-21T13:17:20Z

The .equals check was made more correct in 0.16.1, so it now picks this up. This is a very degenerate case. If it can be fixed in an elegant way, then OK.

rekcahpassyla · 2015-05-21T13:43:13Z

It's these lines where the False is returned.

Here's the output of my debugger at that point:

>>> len(self.blocks)
Out[1]: 1
>>> len(other.blocks)
Out[2]: 0
>>> self.blocks
Out[3]: (FloatBlock: [], 0 x 0, dtype: float64,)
>>> other.blocks
Out[4]: ()

jreback · 2015-05-21T13:52:42Z

This is a symptom that shows up where the comparison happens.

The underlying issue is that a zero-length FloatBlock is created in the first place.
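The debugger output above can be modeled with a toy comparison. This is only a sketch and not pandas' internal code: `ToyBlock`, `ToyManager`, and `drop_empty_blocks` are hypothetical names invented for illustration. It shows why a single zero-size block makes two otherwise empty frames compare unequal, and why discarding such blocks would restore equality.

```python
from collections import namedtuple

# Hypothetical stand-in for an internal block: a dtype name plus a 2-D shape.
ToyBlock = namedtuple("ToyBlock", ["dtype", "shape"])


class ToyManager(object):
    """Toy model of a block container; not the real pandas BlockManager."""

    def __init__(self, blocks):
        self.blocks = tuple(blocks)

    def equals(self, other):
        # A block-count mismatch short-circuits to False, mirroring the
        # 1-block versus 0-block situation in the debugger output.
        if len(self.blocks) != len(other.blocks):
            return False
        return all(a == b for a, b in zip(self.blocks, other.blocks))

    def drop_empty_blocks(self):
        # Discard blocks that hold no elements at all.
        kept = [b for b in self.blocks if b.shape[0] * b.shape[1] > 0]
        return ToyManager(kept)


summed = ToyManager([ToyBlock("float64", (0, 0))])  # models (df1 + df1)
fresh = ToyManager([])                              # models pd.DataFrame()

print(summed.equals(fresh))                      # False
print(summed.drop_empty_blocks().equals(fresh))  # True
```

In this model the comparison itself is behaving reasonably; the mismatch comes from the addition producing a block that carries no data.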

rekcahpassyla · 2015-05-21T13:57:23Z

Here's some background on why we needed this functionality:

We have a container class that looks something like this (paraphrased, just to avoid any issues with pasting any actual working code from my employer)

class Container(object):

    def __init__(self, a, b):
        self.a = a
        self.b = b

    @classmethod
    def empty(cls):
        return cls(pd.DataFrame(), pd.DataFrame())

    def __eq__(self, other):
        return (type(other) is type(self)
                and self.a.equals(other.a)
                and self.b.equals(other.b))

    def __add__(self, other):
        return Container(self.a + other.a, self.b + other.b)

    def __radd__(self, other):
        return self.__add__(other)

    def __iadd__(self, other):
        return self.__add__(other)

This container class holds a grouping of results, which we want to be able to add to each other. (The addition isn't as simple as in this implementation, otherwise obviously we could just use panels).

We need to be able to have things like this work:

empty = Container.empty()


def f(c):
    # in reality, perform some operation on c
    # that can sometimes return an empty container
    return Container.empty()

def g(c):
    # in reality, perform some operation on c
    # that can sometimes return an empty container
    return Container.empty()

c = Container(pd.DataFrame([1],[1]), pd.DataFrame([2],[2]))

assert (f(c) + g(c)) == empty
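Until the arithmetic result stops carrying a zero-size block, one possible workaround for a container like this is to treat any two frames of shape (0, 0) as equal before falling back to `.equals`. The sketch below is an assumption-laden illustration: `frames_equal` is a hypothetical helper, not a pandas function, and it relies only on the `shape` attribute and `equals` method that a DataFrame provides.

```python
def frames_equal(left, right):
    """Compare two DataFrame-like objects, treating all 0 x 0 frames as equal.

    Hypothetical helper, not part of pandas. It only uses the `shape`
    attribute and the `equals` method of its arguments.
    """
    # Two frames with no rows and no columns are considered equal,
    # regardless of how their internal storage is laid out.
    if left.shape == (0, 0) and right.shape == (0, 0):
        return True
    # Otherwise defer to the regular comparison.
    return left.equals(right)
```

In the paraphrased Container above, `__eq__` could call `frames_equal(self.a, other.a)` and `frames_equal(self.b, other.b)` instead of calling `.equals` directly. The trade-off is that any dtype or index-type difference between two frames with no rows and no columns is ignored.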

In [85]: pd.DataFrame().blocks
Out[85]: {}

In [86]: (pd.DataFrame() + pd.DataFrame()).blocks
Out[86]:
{'float64': Empty DataFrame
 Columns: []
 Index: []}


rekcahpassyla · 2015-05-21T16:09:44Z

For what it's worth, I decided to have a go. #10188 in case it is helpful.


BUG: Adding empty dataframes should result in empty blocks #10181

jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), API Design, and Difficulty Intermediate labels May 21, 2015

jreback added this to the Next Major Release milestone May 21, 2015

rekcahpassyla added a commit to rekcahpassyla/pandas that referenced this issue May 21, 2015

BUG: Adding empty dataframes should result in empty blocks pandas-dev…

eb8ad50


rekcahpassyla mentioned this issue May 21, 2015

BUG: Adding empty dataframes should result in empty blocks #10181 #10188

Merged

jreback modified the milestones: 0.17.0, Next Major Release May 21, 2015

rekcahpassyla added a commit to rekcahpassyla/pandas that referenced this issue May 21, 2015

BUG: Adding empty dataframes should result in empty blocks pandas-dev…

9294f92


jreback modified the milestones: 0.16.2, 0.17.0 Jun 2, 2015

rekcahpassyla added a commit to rekcahpassyla/pandas that referenced this issue Jun 3, 2015

BUG: Adding empty dataframes should result in empty blocks pandas-dev…

5aaed1d


rekcahpassyla added a commit to rekcahpassyla/pandas that referenced this issue Jun 4, 2015

BUG: Adding empty dataframes should result in empty blocks pandas-dev…

4f1e43d


rekcahpassyla added a commit to rekcahpassyla/pandas that referenced this issue Jun 5, 2015

BUG: Adding empty dataframes should result in empty blocks pandas-dev…

d0ba41e


jreback closed this as completed in #10188 Jun 7, 2015

jreback added a commit that referenced this issue Jun 7, 2015

Merge pull request #10188 from rekcahpassyla/empty_df_add

d24dc4b

BUG: Adding empty dataframes should result in empty blocks #10181
