"ValueError: Cannot assign nan to integer series" when calling where on Series with non-unique index. #4550

Closed
jgehrcke opened this issue Aug 13, 2013 · 4 comments · Fixed by #4779
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves

@jgehrcke
Contributor

Create a Series with non-unique index:

>>> import pandas as pd
>>> pd.__version__
'0.12.0'
>>> s1 = pd.Series(range(3))
>>> s2 = pd.Series(range(3))
>>> comb = pd.concat([s1,s2])
>>> comb
0    0
1    1
2    2
0    0
1    1
2    2
dtype: int64

As of #4548 I cannot use comb[comb<2] += 10, so I tried working with where instead (according to http://pandas.pydata.org/pandas-docs/stable/indexing.html#where-and-masking, which states: "To guarantee that selection output has the same shape as the original data, you can use the where method in Series and DataFrame"). But even just calling where on comb fails:

>>> comb.where(comb < 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/projects/bioinfp_apps/Python-2.7.3/lib/python2.7/site-packages/pandas/core/series.py", line 745, in where
    ser._set_with(~cond, other)
  File "/projects/bioinfp_apps/Python-2.7.3/lib/python2.7/site-packages/pandas/core/series.py", line 886, in _set_with
    self._set_values(key, value)
  File "/projects/bioinfp_apps/Python-2.7.3/lib/python2.7/site-packages/pandas/core/series.py", line 904, in _set_values
    values[key] = _index.convert_scalar(values, value)
  File "index.pyx", line 547, in pandas.index.convert_scalar (pandas/index.c:9752)
  File "index.pyx", line 560, in pandas.index.convert_scalar (pandas/index.c:9639)
ValueError: Cannot assign nan to integer series
>>> 

Is this expected?

@jreback
Contributor

jreback commented Aug 13, 2013

yes this is expected; you cannot (ATM) change the dtype in place (from integer to float), which is what assigning a nan entails. Series act somewhat differently than DataFrames because they have a different super-class; this is slated to be fixed in 0.13.

This type of assignment is better supported in DataFrames (as is dealing with non-unique indices):

In [16]: df = DataFrame(dict(a = [0,1,2,0,1,2]))

In [17]: df
Out[17]: 
   a
0  0
1  1
2  2
3  0
4  1
5  2

In [18]: df[df['a']<2]
Out[18]: 
   a
0  0
1  1
3  0
4  1

In [19]: df[df['a']<2] += 2

In [20]: df
Out[20]: 
   a
0  2
1  3
2  2
3  2
4  3
5  2
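
For readers on a recent pandas release, the same conditional in-place update can be sketched with a boolean mask passed to .loc. This is only an illustration under that assumption, not what 0.12 required; the name mask is introduced here for clarity and does not appear in the session above.

```python
import pandas as pd

# Same data as in the session above.
df = pd.DataFrame({"a": [0, 1, 2, 0, 1, 2]})

# Boolean mask selecting the rows to update (hypothetical helper name).
mask = df["a"] < 2

# Add 2 only where the mask is True; the other rows are left untouched.
df.loc[mask, "a"] += 2

print(df["a"].tolist())  # [2, 3, 2, 2, 3, 2]
```

Because no NaN is introduced, the integer column does not need a dtype change here.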

@jgehrcke
Contributor Author

If it's expected, it's fine.

But honestly, from the perspective of a normal user, I don't see an "assignment" in the line comb.where(comb < 2) at all. Semantically, this line only reads and does not write. That makes it especially hard to see where the "data type change" would come from, so the error message is really surprising. Going by the docs, the user expects that something like s.where(s > 0) either works or fails with an understandable error message.

@jreback
Contributor

jreback commented Aug 13, 2013

where is not strictly an indexing operation; it is shape-preserving, which means nans are possibly introduced, which in turn means a dtype change (in this case) is required. I think the docs are pretty clear on this: http://pandas.pydata.org/pandas-docs/dev/indexing.html#where-and-masking.

Integer not supporting nan is a general problem (stemming from numpy), see here: http://pandas.pydata.org/pandas-docs/dev/gotchas.html#nan-integer-na-values-and-na-type-promotions

It is important to understand dtype changes; your operations could do unexpected things.
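
To make the dtype change concrete, here is a minimal sketch assuming a recent pandas release (it does not reproduce the 0.12.0 behavior reported above). The names masked and filled are illustrative only. It shows where upcasting to float when NaN has to be inserted, and avoiding NaN entirely when a replacement value is supplied.

```python
import pandas as pd

s = pd.Series(range(3))
comb = pd.concat([s, s])  # non-unique index: 0, 1, 2, 0, 1, 2

# Shape-preserving selection: entries failing the condition become NaN,
# so the integer data has to be upcast to float.
masked = comb.where(comb < 2)
print(masked.dtype)  # float64

# Supplying a replacement value means no NaN is needed at all.
filled = comb.where(comb < 2, -1)
print(filled.tolist())  # [0, 1, -1, 0, 1, -1]
```

The output keeps all six rows in both cases; only the values that fail the condition are replaced.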

@jreback
Contributor

jreback commented Aug 13, 2013

all that said, this is fixed in #3482 :)
