
Error when updating dataframe with empty filter #9596

Closed
Sereger13 opened this issue Mar 5, 2015 · 9 comments · Fixed by #9934
Labels
Bug Error Reporting Incorrect or improved errors from pandas Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@Sereger13
Contributor

import pandas as pd

df = pd.DataFrame({'a': ['1', '2', '3'],
                   'b': ['11', '22', '33'],
                   'c': ['111', '222', '333']})
df.loc[df.b.isnull(), 'a'] = df.b

The filtering condition is always False (column b never contains nulls), yet the code above produces this error:
Array is not broadcastable to correct shape.

If, however, I change the dtype of column 'c' (which is not even referenced in the assignment!) from str to int,

import pandas as pd

df = pd.DataFrame({'a': ['1', '2', '3'],
                   'b': ['11', '22', '33'],
                   'c': [111, 222, 333]})  # Type changed to int
df.loc[df.b.isnull(), 'a'] = df.b

No errors are produced.

It would be nice to have consistent behaviour and no error at all. I could not find anything in the documentation saying that assigning through .loc with an empty filter is illegal.
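A minimal sketch of two ways to sidestep the error, assuming the intent is "copy b into a wherever b is null". The names `df_a` and `df_b` are illustrative, and neither variant has been checked against pandas 0.15.0 specifically; `Series.mask(cond, other)` replaces values where `cond` is True and leaves the rest untouched.

```python
import pandas as pd

df = pd.DataFrame({'a': ['1', '2', '3'],
                   'b': ['11', '22', '33'],
                   'c': ['111', '222', '333']})

# Rows where 'b' is null; all False for this frame.
mask = df['b'].isnull()

# Variant A: only call .loc when at least one row is selected.
df_a = df.copy()
if mask.any():
    df_a.loc[mask, 'a'] = df_a['b']

# Variant B: express the update without .loc; Series.mask swaps in the
# 'b' value only where mask is True, so an all-False mask is a no-op.
df_b = df.copy()
df_b['a'] = df_b['a'].mask(mask, df_b['b'])

print(df_a)
print(df_b)
```

With a mask that does select rows, both variants copy the matching b values into a; the guard in variant A only avoids the empty-selection case and does not change what a non-empty assignment does.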

@jreback
Contributor

jreback commented Mar 5, 2015

Please show the output of pd.show_versions().

This looks like an older bug that has already been fixed.

@Sereger13
Contributor Author

INSTALLED VERSIONS

commit: None
python: 2.7.5.final.0
python-bits: 32
OS: Linux
OS-release: 2.6.18-238.9.1.el5
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US

pandas: 0.15.0
nose: 1.3.0
Cython: 0.20
numpy: 1.7.1
scipy: 0.13.0
statsmodels: 0.5.0
IPython: 2.2.0
sphinx: 1.1.3
patsy: 0.2.1
dateutil: 1.5
pytz: 2013b
bottleneck: None
tables: 3.1.0
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.6.2
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: 0.5.5
lxml: None
bs4: 4.3.1
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.8.3
pymysql: None
psycopg2: None

@jreback jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves Error Reporting Incorrect or improved errors from pandas labels Mar 5, 2015
@jreback jreback added this to the 0.16.1 milestone Mar 5, 2015
@chrisgilmerproj
Contributor

I can reproduce the bug:

In [51]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: 3e7f21ceaf9b456ab3e7c2b76166d580e321bed1
python: 2.7.9.final.0
python-bits: 64
OS: Darwin
OS-release: 14.1.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.16.0-110-g3e7f21c
nose: 1.3.6
Cython: 0.22
numpy: 1.9.2
scipy: None
statsmodels: None
IPython: 3.1.0
sphinx: None
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.2
bottleneck: 1.0.0
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.3
openpyxl: 2.2.1
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.7.2
lxml: 3.4.2
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.9.9
pymysql: 0.6.6.None
psycopg2: None

Here's how I reproduced it with Python code in an IPython notebook:

df = pd.DataFrame({'a': ['1', '2', '3'], 
                   'b': ['11', '22', '33'], 
                   'c': ['111', '222', '333']})


print(df.loc[df.b.isnull(), 'a'])
print(df.b)
print('')

new_s = df.loc[df.b.isnull(), 'a']
print(new_s)

new_s = df.b
print(new_s)

Output:

Series([], Name: a, dtype: object)
0    11
1    22
2    33
Name: b, dtype: object

Series([], Name: a, dtype: object)
0    11
1    22
2    33
Name: b, dtype: object
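The two printouts above already show the mismatch behind the error: the left-hand selection has no rows while df.b has three. A small sketch of what aligning the right-hand Series onto the selected row labels would give, assuming .loc is meant to align a Series value on its index before setting (which is what the int-column case appears to do); `target_index` and `aligned` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'a': ['1', '2', '3'],
                   'b': ['11', '22', '33'],
                   'c': ['111', '222', '333']})

mask = df['b'].isnull()

# Row labels selected on the left-hand side of df.loc[mask, 'a'] = ...
target_index = df.index[mask.to_numpy()]

# Reindexing the right-hand Series on those labels keeps only matching
# rows, so an all-False mask leaves nothing to assign.
aligned = df['b'].reindex(target_index)

print(len(target_index), len(df['b']), len(aligned))
```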

Now to show the error:

df = pd.DataFrame({'a': ['1', '2', '3'], 
                   'b': ['11', '22', '33'], 
                   'c': ['111', '222', '333']})


print(df.loc[df.b.isnull(), 'a'])
print(df.b)
print('')

df.loc[df.b.isnull(), 'a'] = df.b

Output:

Series([], Name: a, dtype: object)
0    11
1    22
2    33
Name: b, dtype: object

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-55-1af2799a791a> in <module>()
      8 print('')
      9 
---> 10 df.loc[df.b.isnull(), 'a'] = df.b

/Users/cgilmer/Projects/pandas-cgilmer/pandas/core/indexing.pyc in __setitem__(self, key, value)
    116     def __setitem__(self, key, value):
    117         indexer = self._get_setitem_indexer(key)
--> 118         self._setitem_with_indexer(indexer, value)
    119 
    120     def _has_valid_type(self, k, axis):

/Users/cgilmer/Projects/pandas-cgilmer/pandas/core/indexing.pyc in _setitem_with_indexer(self, indexer, value)
    500 
    501             # actually do the set
--> 502             self.obj._data = self.obj._data.setitem(indexer=indexer, value=value)
    503             self.obj._maybe_update_cacher(clear=True)
    504 

/Users/cgilmer/Projects/pandas-cgilmer/pandas/core/internals.pyc in setitem(self, **kwargs)
   2473 
   2474     def setitem(self, **kwargs):
-> 2475         return self.apply('setitem', **kwargs)
   2476 
   2477     def putmask(self, **kwargs):

/Users/cgilmer/Projects/pandas-cgilmer/pandas/core/internals.pyc in apply(self, f, axes, filter, do_integrity_check, **kwargs)
   2449                                                  copy=align_copy)
   2450 
-> 2451             applied = getattr(b, f)(**kwargs)
   2452 
   2453             if isinstance(applied, list):

/Users/cgilmer/Projects/pandas-cgilmer/pandas/core/internals.pyc in setitem(self, indexer, value)
    603             # set
    604             else:
--> 605                 values[indexer] = value
    606 
    607             # coerce and try to infer the dtypes of the result

ValueError: shape mismatch: value array of shape (3,) could not be broadcast to indexing result of shape (0,)
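The last frame of the traceback is a plain NumPy fancy-indexing assignment, values[indexer] = value, reached with a length-3 value. A minimal sketch reproducing that step outside pandas, assuming the all-False mask reduces to an empty array of integer positions; the arrays are stand-ins, not the actual block internals.

```python
import numpy as np

# Stand-ins for the block values and the unaligned right-hand side.
values = np.array(['1', '2', '3'], dtype=object)
value = np.array(['11', '22', '33'], dtype=object)

# An all-False mask turned into integer positions selects nothing.
mask = np.array([False, False, False])
indexer = np.flatnonzero(mask)  # empty integer array

try:
    values[indexer] = value  # 3 values into a 0-element selection
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Trimming the right-hand side with the same indexer makes the shapes
# agree, so the assignment becomes a harmless no-op.
values[indexer] = value[indexer]
print(values)
```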

I can also verify the other behavior:

df = pd.DataFrame({'a': ['1', '2', '3'], 
                   'b': ['11', '22', '33'], 
                   'c': [111, 222, 333]}) # Type changed to int

print(df.loc[df.b.isnull(), 'a'])
print(df.b)
print('')

df.loc[df.b.isnull(), 'a'] = df.b

print('')
new_s = df.loc[df.b.isnull(), 'a']
print(new_s)
new_s = df.b
print(new_s)

Output:

Series([], Name: a, dtype: object)
0    11
1    22
2    33
Name: b, dtype: object


Series([], Name: a, dtype: object)
0    11
1    22
2    33
Name: b, dtype: object

@chrisgilmerproj
Contributor

This bug appears only when I assign df.b directly to the newly created Series object returned by the .loc selection. It does not appear when I have already assigned that Series object to a variable. Perhaps I need to know a bit more about Python internals here, but I think that assignment isn't working. Also, what is the use case for this particular assignment?

@chrisgilmerproj
Copy link
Contributor

Something I can't do:

pd.Series([], name='a', dtype=object) = df.b

Output:

  File "<ipython-input-73-2e69376bea41>", line 1
    pd.Series([], name='a', dtype=object) = df.b
SyntaxError: can't assign to function call
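In Python, df.loc[df.b.isnull(), 'a'] = df.b is not an assignment to the Series that df.loc[...] would return: subscript assignment calls df.loc.__setitem__(key, value) (the frame shown at the top of the traceback), which modifies df in place. Binding a name (new_s = ...) never calls anything on the old object, and a function call such as pd.Series(...) is not a valid assignment target, hence the SyntaxError. A minimal sketch with a hypothetical RecordingIndexer class standing in for df.loc:

```python
class RecordingIndexer:
    """Hypothetical stand-in for an indexer object such as df.loc."""

    def __init__(self):
        self.calls = []

    def __setitem__(self, key, value):
        # Python calls this for `obj[key] = value`; comma-separated items
        # inside the brackets arrive as a single tuple key.
        self.calls.append((key, value))


loc = RecordingIndexer()

# Subscript assignment -> loc.__setitem__(('mask', 'a'), 'rhs')
loc['mask', 'a'] = 'rhs'

# Plain name assignment only rebinds the name; nothing runs on `loc`.
new_s = loc
new_s = 'something else'

print(loc.calls)
```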

@chrisgilmerproj
Contributor

I'm going to pass on this one. I can't seem to gain traction on it.

@chrisgilmerproj
Contributor

Very cool. I never would have found that.

@Sereger13
Contributor Author

Thanks for fixing! Any idea when the version with the fix (0.16.1?) will be released?


3 participants