Some items were not contained in blocks AssertionError #4032

Closed
miketkelly opened this issue Jun 25, 2013 · 4 comments · Fixed by #4043

@miketkelly

The code below confuses the DataFrame internals and raises an AssertionError on print df.values. The code looks contrived, but it's just a simplified version of something I was trying to do. It doesn't seem to matter whether any columns are actually renamed, so I simplified that as well, but the rename calls do contribute to the problem.

import pandas as pd
df = pd.DataFrame({'a': [1, 2],
                   'b': [3, 4],
                   'c': [5, 6]})

df = df.set_index(['a', 'b'])
df = df.unstack()
df = df.rename(columns={'a': 'd'})
df = df.reset_index()
print df._data
df = df.rename(columns={})
print df._data
print df.values

BlockManager
Items: MultiIndex
[(u'a', u''), (u'c', 3), (u'c', 4)]
Axis 1: Int64Index([0, 1], dtype=int64)
FloatBlock: [(c, 3), (c, 4)], 2 x 2, dtype float64
IntBlock: [(a, )], 1 x 2, dtype int64

BlockManager
Items: MultiIndex
[(u'a', u''), (u'c', 3), (u'c', 4)]
Axis 1: Int64Index([0, 1], dtype=int64)
FloatBlock: [(a, ), (c, 3)], 2 x 2, dtype float64
IntBlock: [(a, )], 1 x 2, dtype int64

Traceback (most recent call last):
  File "/home/mtk/bug.py", line 13, in <module>
    print df.values
  File "/Users/mtk/Source/pandas/pandas/core/frame.py", line 1779, in as_matrix
    return self._data.as_matrix(columns).T
  File "/Users/mtk/Source/pandas/pandas/core/internals.py", line 1513, in as_matrix
    mat = self._interleave(self.items)
  File "/Users/mtk/Source/pandas/pandas/core/internals.py", line 1549, in _interleave
    raise AssertionError('Some items were not contained in blocks')
AssertionError: Some items were not contained in blocks

>>> pd.__version__
'0.11.1.dev-8a242d2'
@jreback
Contributor

jreback commented Jun 25, 2013

The second rename should not be allowed and is probably what causes the issue (e.g. internal state gets messed up).

@hayd
Contributor

hayd commented Jun 25, 2013

I think rename({}) should be allowed, since you might rename with a generated dictionary, which could be empty. But I agree that's probably where the issue lies.
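To illustrate how a generated dictionary can end up empty, here is a minimal sketch in plain Python. The helper `build_rename_map` and the `aliases` table are hypothetical names invented for this example; they are not part of pandas.

```python
def build_rename_map(columns, aliases):
    """Keep only the alias entries whose old name is an existing column."""
    return {old: new for old, new in aliases.items() if old in columns}


columns = ["b", "c"]
aliases = {"a": "d"}  # hypothetical alias table; "a" is not a column here

# No alias applies, so the generated mapping is empty.
mapping = build_rename_map(columns, aliases)

# Applying an empty mapping leaves every name unchanged.
renamed = [mapping.get(name, name) for name in columns]

# With pandas, the same mapping would be passed as df.rename(columns=mapping).
```

An empty mapping is a natural no-op here, which is the case for allowing it in rename as well.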

@miketkelly
Author

Here's a simpler snippet that reproduces the problem:

import pandas as pd
df = pd.DataFrame({'b': [1.1, 2.2]})
df = df.rename(columns={})
df.insert(0, 'a', [1, 2])
df = df.rename(columns={})
print df.values

To be clear, this has nothing to do with passing an empty dict to rename. I know a little about the internals now, but not enough to fix this one. I believe the bug is in insert. The first rename just happens to cause ref_locs to be calculated on the blocks. The insert method then fails to update these ref_locs when a new column is inserted (block.ref_items is updated but block.ref_locs is not). The second rename causes block.items to be recalculated from the now-incorrect ref_locs. Things go downhill from there.

This is a regression from 0.11.0.
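The stale-cache mechanism described in this comment can be sketched with a toy model. This is not pandas code: `ToyBlock`, `buggy_insert`, and `items_from_ref_locs` are hypothetical names, and the real block classes are considerably more involved. The sketch only assumes what the comment states, namely that positions are cached lazily and that insert updates ref_items without touching the cache.

```python
class ToyBlock:
    """Toy stand-in for a block: it holds some item names and lazily
    caches their positions within the manager's full item list."""

    def __init__(self, items, ref_items):
        self.items = list(items)          # names this block actually holds
        self.ref_items = list(ref_items)  # all names known to the manager
        self._ref_locs = None             # cached positions, filled on demand

    @property
    def ref_locs(self):
        if self._ref_locs is None:
            self._ref_locs = [self.ref_items.index(n) for n in self.items]
        return self._ref_locs

    def items_from_ref_locs(self):
        """Rebuild the item names from the cached positions."""
        return [self.ref_items[loc] for loc in self.ref_locs]


def buggy_insert(block, loc, name):
    # Updates ref_items but leaves the cached positions untouched.
    block.ref_items.insert(loc, name)


block = ToyBlock(items=["b"], ref_items=["b"])
block.ref_locs                  # first "rename": the cache is populated as [0]
buggy_insert(block, 0, "a")     # ref_items becomes ["a", "b"]; cache is stale

# Second "rename": names are rebuilt from the stale positions.
stale_items = block.items_from_ref_locs()
```

The block still holds "b", but the rebuilt names say "a", which mirrors the FloatBlock wrongly listing (a, ) in the second BlockManager dump.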

@jreback
Contributor

jreback commented Jun 26, 2013

Pretty straightforward: _ref_locs (the indexer exposed as ref_locs) was not being cleared for a unique index (the non-unique case is handled separately), so any insert that was not at the end was failing.
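The general shape of such a fix, dropping a cached indexer whenever the items it refers to change, might look like the following. This is a toy sketch under stated assumptions, not the actual patch from #4043; `CachedLocs` and `insert_ref_item` are hypothetical names.

```python
class CachedLocs:
    """Toy holder of item names with a lazily cached position indexer."""

    def __init__(self, items, ref_items):
        self.items = list(items)
        self.ref_items = list(ref_items)
        self._ref_locs = None

    @property
    def ref_locs(self):
        if self._ref_locs is None:
            self._ref_locs = [self.ref_items.index(n) for n in self.items]
        return self._ref_locs

    def insert_ref_item(self, loc, name):
        self.ref_items.insert(loc, name)
        # Invalidate the cache so positions are recomputed on next access.
        self._ref_locs = None


blk = CachedLocs(items=["b"], ref_items=["b"])
before = list(blk.ref_locs)     # cache populated before the insert
blk.insert_ref_item(0, "a")     # insert at the front, not at the end
after = blk.ref_locs            # recomputed against the new ref_items
resolved = [blk.ref_items[i] for i in after]
```

An insert at the end would not shift any existing position, which is consistent with only non-end inserts failing.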
