DataFrame.ix[idx, :] = value sets wrong values when idx is a MultiIndex and DataFrame.columns is also a MultiIndex #11372

Closed
rekcahpassyla opened this Issue Oct 19, 2015 · 6 comments

Contributor

rekcahpassyla commented Oct 19, 2015

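This code sets wrong values in pandas 0.17.0 but works in 0.15.2. In 0.17.0, after the assignment both rows selected by sub hold the first row of vals: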

import pandas as pd
import numpy as np

np.random.seed(1)

from itertools import product

from pandas.util.testing import assert_frame_equal

pd.show_versions()

idx = pd.MultiIndex.from_tuples(
    list(
        product(['A', 'B', 'C'], 
                pd.date_range('2015-01-01', '2015-04-01', freq='MS'))
    )
)

sub = pd.MultiIndex.from_tuples(
    [('A', pd.Timestamp('2015-01-01')), ('A', pd.Timestamp('2015-02-01'))]
)
# if cols = ['foo', 'bar', 'baz', 'quux'], there is no error. 
cols = pd.MultiIndex.from_tuples(
    list(
        product(['foo', 'bar'], 
                pd.date_range('2015-01-01', '2015-02-01', freq='MS'))
    )
)

test = pd.DataFrame(np.random.random((12, 4)), index=idx, columns=cols)
vals = pd.DataFrame(np.random.random((2, 4)), index=sub, columns=cols)
test.ix[sub, :] = vals

print test.ix[sub, :]
print vals

assert_frame_equal(test.ix[sub, :], vals)

Output with pandas 0.17.0:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.0
setuptools: 18.0.1
Cython: 0.22
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.1
pytz: 2015.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.7.3
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.7
pymysql: None
psycopg2: None
                    foo                   bar
             2015-01-01 2015-02-01 2015-01-01 2015-02-01
A 2015-01-01   0.287775   0.130029   0.019367   0.678836
  2015-02-01   0.287775   0.130029   0.019367   0.678836
                    foo                   bar
             2015-01-01 2015-02-01 2015-01-01 2015-02-01
A 2015-01-01   0.287775   0.130029   0.019367   0.678836
  2015-02-01   0.211628   0.265547   0.491573   0.053363
Traceback (most recent call last):
  File "c:\dev\code\sandbox\multiindex.py", line 41, in <module>
    assert_frame_equal(test.ix[sub, :], vals)
  File "c:\python\envs\pd017\lib\site-packages\pandas\util\testing.py", line 1028, in assert_frame_equal
    obj='DataFrame.iloc[:, {0}]'.format(i))
  File "c:\python\envs\pd017\lib\site-packages\pandas\util\testing.py", line 925, in assert_series_equal
    check_less_precise, obj='{0}'.format(obj))
  File "pandas\src\testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas\src\testing.c:3809)
  File "pandas\src\testing.pyx", line 147, in pandas._testing.assert_almost_equal (pandas\src\testing.c:2685)
  File "c:\python\envs\pd017\lib\site-packages\pandas\util\testing.py", line 798, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: DataFrame.iloc[:, 0] are different

DataFrame.iloc[:, 0] values are different (50.0 %)
[left]:  [0.287775338586, 0.287775338586]
[right]: [0.287775338586, 0.211628116]

Output with pandas 0.15.2:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_GB

pandas: 0.15.2
nose: 1.3.7
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.1
pytz: 2015.4
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 1.0.7
pymysql: None
psycopg2: None
                    foo                   bar           
             2015-01-01 2015-02-01 2015-01-01 2015-02-01
A 2015-01-01   0.287775   0.130029   0.019367   0.678836
  2015-02-01   0.211628   0.265547   0.491573   0.053363
                    foo                   bar           
             2015-01-01 2015-02-01 2015-01-01 2015-02-01
A 2015-01-01   0.287775   0.130029   0.019367   0.678836
  2015-02-01   0.211628   0.265547   0.491573   0.053363

rekcahpassyla Oct 19, 2015

Contributor

Indexing with a specific set of columns also gives the error:

Code sample:

import pandas as pd
import numpy as np

np.random.seed(1)

from itertools import product

from pandas.util.testing import assert_frame_equal

pd.show_versions()

idx = pd.MultiIndex.from_tuples(
    list(
        product(['A', 'B', 'C'], 
                pd.date_range('2015-01-01', '2015-04-01', freq='MS'))
    )
)
cols = pd.MultiIndex.from_tuples(
    list(
        product(['foo', 'bar'], 
                pd.date_range('2016-01-01', '2016-02-01', freq='MS'))
    )
)

# if cols = ['foo', 'bar', 'baz', 'quux'], there is no error. 

test = pd.DataFrame(np.random.random((12, 4)), index=idx, columns=cols)


subidx = pd.MultiIndex.from_tuples(
    [('A', pd.Timestamp('2015-01-01')), ('A', pd.Timestamp('2015-02-01'))]
)

subcols = pd.MultiIndex.from_tuples(
    [('foo', pd.Timestamp('2016-01-01')), ('foo', pd.Timestamp('2016-02-01'))]
)


vals = pd.DataFrame(np.random.random((2, 2)), index=subidx, columns=subcols)


test.ix[subidx, subcols] = vals

print test.ix[subidx, subcols]

print vals

assert_frame_equal(test.ix[subidx, subcols], vals)

Output with pandas 0.17.0:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.0
setuptools: 18.0.1
Cython: 0.22
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.1
pytz: 2015.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.7.3
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.7
pymysql: None
psycopg2: None
                    foo
             2016-01-01 2016-02-01
A 2015-01-01   0.287775   0.130029
  2015-02-01   0.287775   0.130029
                    foo
             2016-01-01 2016-02-01
A 2015-01-01   0.287775   0.130029
  2015-02-01   0.019367   0.678836
Traceback (most recent call last):
  File "c:\dev\code\sandbox\multiindex.py", line 48, in <module>
    assert_frame_equal(test.ix[subidx, subcols], vals)
  File "c:\python\envs\pd017\lib\site-packages\pandas\util\testing.py", line 1028, in assert_frame_equal
    obj='DataFrame.iloc[:, {0}]'.format(i))
  File "c:\python\envs\pd017\lib\site-packages\pandas\util\testing.py", line 925, in assert_series_equal
    check_less_precise, obj='{0}'.format(obj))
  File "pandas\src\testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas\src\testing.c:3809)
  File "pandas\src\testing.pyx", line 147, in pandas._testing.assert_almost_equal (pandas\src\testing.c:2685)
  File "c:\python\envs\pd017\lib\site-packages\pandas\util\testing.py", line 798, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: DataFrame.iloc[:, 0] are different

DataFrame.iloc[:, 0] values are different (50.0 %)
[left]:  [0.287775338586, 0.287775338586]
[right]: [0.287775338586, 0.0193669578703]

rekcahpassyla Oct 19, 2015

Contributor

(Deleted: I misread something, and my previous suggestion was not really a fix.)

jreback Oct 19, 2015

Contributor

Hmm, surprised that broke. There is not much testing on that sub-section, actually.

The issue is here: https://github.com/pydata/pandas/blob/master/pandas/core/indexing.py#L450

self._align_series is called on a sub-section of the frame, but the aligner looks at the object, decides it is dealing with a frame, and so gives back the wrong result.

So we could probably pass in an additional parameter that would determine this.
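For reference, the values the aligner is expected to give back can be sketched outside the indexing code. This is only an illustration, assuming the assignment is meant to align on labels (as the 0.15.2 output suggests); it uses the public reindex method and does not reproduce what indexing.py does internally. The names rows, sub, frame, vals, and aligned are made up for the example.

```python
import numpy as np
import pandas as pd

# MultiIndex rows and columns, shaped like the frames in the report.
rows = pd.MultiIndex.from_product(
    [["A", "B", "C"], pd.date_range("2015-01-01", "2015-04-01", freq="MS")]
)
cols = pd.MultiIndex.from_product(
    [["foo", "bar"], pd.date_range("2016-01-01", "2016-02-01", freq="MS")]
)
sub = rows[:2]  # ('A', 2015-01-01) and ('A', 2015-02-01)

rng = np.random.RandomState(1)
frame = pd.DataFrame(rng.random_sample((12, 4)), index=rows, columns=cols)
vals = pd.DataFrame(rng.random_sample((2, 4)), index=sub, columns=cols)

# Label-based alignment for the selection (sub, all columns): each selected
# row label picks up its own row of vals, so the two rows stay distinct.
aligned = vals.reindex(index=sub, columns=cols).to_numpy()

# The same data stored in reverse row order aligns to the same result,
# because the lookup goes by label rather than by position.
aligned_from_reversed = vals.iloc[::-1].reindex(index=sub, columns=cols).to_numpy()
```

The 0.17.0 output in the report differs from this: the first row of vals ends up in both selected rows.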

@jreback jreback added this to the 0.17.1 milestone Oct 19, 2015

@jreback jreback changed the title from `DataFrame.ix[idx, :] = value` sets wrong values when `idx` is a `MultiIndex` and `DataFrame.columns` is also a `MultiIndex` to DataFrame.ix[idx, :] = value sets wrong values when idx is a MultiIndex and DataFrame.columns is also a `MultiIndex` Oct 19, 2015

@jreback jreback changed the title from DataFrame.ix[idx, :] = value sets wrong values when idx is a MultiIndex and DataFrame.columns is also a `MultiIndex` to DataFrame.ix[idx, :] = value sets wrong values when idx is a MultiIndex and DataFrame.columns is also a MultiIndex Oct 19, 2015

rekcahpassyla Oct 20, 2015

Contributor

Since I've already got two test cases, I'd be happy to have a go if I can be pointed in the right direction. I'll start by looking at the history of indexing.py and following any referenced issues / PRs.

jreback Oct 20, 2015

Contributor

The pointer above is to the relevant issues.

The way to do this is to set up the test cases and the expected results (in test_indexing); they should fail before a fix. Then you can step through to see where to put a fix and go from there.
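A minimal sketch of what such test cases might look like, based on the two code samples earlier in this thread. It is not the test that ended up in test_indexing. The helper names make_frame and check_setitem are hypothetical, .loc is used in place of .ix (which later pandas releases removed), and assert_frame_equal is imported from pandas.testing rather than pandas.util.testing.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def make_frame():
    """Build a 12x4 frame with MultiIndex rows and columns, as in the report."""
    rows = pd.MultiIndex.from_product(
        [["A", "B", "C"], pd.date_range("2015-01-01", "2015-04-01", freq="MS")]
    )
    cols = pd.MultiIndex.from_product(
        [["foo", "bar"], pd.date_range("2016-01-01", "2016-02-01", freq="MS")]
    )
    rng = np.random.RandomState(1)
    return pd.DataFrame(rng.random_sample((12, 4)), index=rows, columns=cols)


def check_setitem(target, indexers, value):
    """Hypothetical helper: assign, read the same selection back, compare."""
    target.loc[indexers] = value
    assert_frame_equal(target.loc[indexers], value)


def test_setitem_all_columns():
    # First sample: MultiIndex row selection, all columns.
    frame = make_frame()
    subidx = frame.index[:2]  # ('A', 2015-01-01), ('A', 2015-02-01)
    rng = np.random.RandomState(2)
    vals = pd.DataFrame(
        rng.random_sample((2, 4)), index=subidx, columns=frame.columns
    )
    check_setitem(frame, (subidx, slice(None)), vals)


def test_setitem_subset_of_columns():
    # Second sample: MultiIndex row selection and MultiIndex column selection.
    frame = make_frame()
    subidx = frame.index[:2]
    subcols = frame.columns[:2]  # ('foo', 2016-01-01), ('foo', 2016-02-01)
    rng = np.random.RandomState(3)
    vals = pd.DataFrame(rng.random_sample((2, 2)), index=subidx, columns=subcols)
    check_setitem(frame, (subidx, subcols), vals)
```

Both functions use plain assertions, so they can be collected by a test runner or called directly. On a pandas build with the bug they would be expected to fail in assert_frame_equal, as in the tracebacks above.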

rekcahpassyla Oct 21, 2015

Contributor

OK, here is my first attempt: #11400

I added a test for #5206 as well, to check that I hadn't broken that existing functionality.

C:\dev\code\opensource\pandas-rekcahpassyla [multiindex_setitem +2 ~0 -0 !]> C:\python\envs\pandasdev\scripts\nosetests .\pandas\tests\test_indexing.py
...........................................................................................................................................
----------------------------------------------------------------------
Ran 139 tests in 68.094s

OK

I attempted to run the whole test suite, but test_max_ext_len (pandas.tests.test_msgpack.test_limits.TestLimits) eats up 4 GB of memory and causes Python to crash.
