inplace where with complex column changes dtype to float #6345

Closed
hayd opened this issue Feb 13, 2014 · 11 comments · Fixed by #6347
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions
Milestone

Comments

@hayd
Contributor

hayd commented Feb 13, 2014

http://stackoverflow.com/questions/21766182/replacing-out-of-bounds-complex-values-in-a-pandas-dataframe/21766586#21766586

@hayd hayd added the Dtypes label Feb 13, 2014
@jreback
Contributor

jreback commented Feb 13, 2014

@hayd

df3.where(df3<5000,np.nan,inplace=True)

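is equivalent to (and is implemented like) boolean-mask assignment on the inverted condition, since where keeps the values where the condition is True and replaces the rest:

df3[~(df3<5000)] = np.nan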

There certainly could be a coercion bug in this, as it boils down to Block.where, which actually does the replacement and has to coerce the inputs and outputs (or at least attempt to).
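To make the correspondence between where and mask assignment concrete, here is a minimal sketch on a plain float frame. The frame and the names `cond`, `via_where` and `via_setitem` are made up for illustration and are not from the original report. Because where keeps entries where the condition is True, the matching mask assignment targets the negated condition.

```python
import numpy as np
import pandas as pd

# Illustrative data only; not the frame from the linked question.
df = pd.DataFrame({"a": [1.0, 6000.0], "b": [2.0, 3.0]})
cond = df < 5000

# where() keeps entries where cond is True and replaces the others.
via_where = df.where(cond, np.nan)

# Mask assignment overwrites entries where the mask is True,
# so it has to use the negated condition to match where().
via_setitem = df.copy()
via_setitem[~cond] = np.nan

print(via_where.equals(via_setitem))
```

Both paths leave the dtypes as float64 here; the bug in this issue only shows up once a complex column is involved.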

@jreback jreback added this to the 0.14.0 milestone Feb 13, 2014
@jreback jreback added the Bug label Feb 13, 2014
@jreback
Contributor

jreback commented Feb 13, 2014

@hayd if you have the test frame for this example, that would be great (or a simple one that reproduces it)

@hayd
Contributor Author

hayd commented Feb 13, 2014

Sure thing:

In [11]: df = DataFrame([[1+1j, 2], [5+1j, 4+1j]], columns=['a', 'b'])

In [12]: df[df.abs() >= 5] = np.nan

In [13]: df
Out[13]:
    a       b
0   1  (2+0j)
1 NaN  (4+1j)

In [14]: df.a.dtype  # expected complex128 (that of df.b.dtype)
Out[14]: dtype('float64')

expected =  DataFrame([[1+1j, 2], [np.nan, 4+1j]], columns=['a', 'b'])

The same thing happens with where:

In [15]: df.where(df <= 5)  # works correctly
Out[15]:
        a       b
0  (1+1j)  (2+0j)
1     NaN  (4+1j)

In [16]: df.where(df <= 5, inplace=True)

In [17]: df
Out[17]:
    a       b
0   1  (2+0j)
1 NaN  (4+1j)
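The reproduction above can be turned into a small dtype check. This is a sketch rather than the test that was actually added for the fix. It masks on `df.abs()` throughout, since ordering comparisons on complex values are not available in current NumPy, and the names `too_big`, `out_of_place` and `in_place` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1 + 1j, 2], [5 + 1j, 4 + 1j]], columns=["a", "b"])

# abs() of a complex frame is real-valued, so the comparison is well defined.
too_big = df.abs() >= 5

# Non-inplace path: where() returns a new frame.
out_of_place = df.where(~too_big)

# Mask assignment mutates the frame it is applied to (the inplace path).
in_place = df.copy()
in_place[too_big] = np.nan

# With the fix in place, column "a" should stay complex on both paths.
print(out_of_place.dtypes)
print(in_place.dtypes)
```

If column `a` comes back as float64 on the mask-assignment path, the coercion bug described here is present.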

@jreback
Contributor

jreback commented Feb 13, 2014

was basically a 1-line change (a long line!) hahah

@schodge

schodge commented Feb 14, 2014

Following up on a later comment I left on SO (was the original question-asker of the above link), the NaN assigned is NaN + 0j, which means taking np.imag() of the value gives you 0.0 instead of NaN. Would it make more sense for pandas to return np.nan + 0j*np.nan?

>>> a = np.nan + 0j
>>> a
(nan+0j)
>>> np.real(a)
array(nan)
>>> np.imag(a)
array(0.0)

@jreback
Contributor

jreback commented Feb 14, 2014

No; the nan represents a missing value, which is distinct from np.nan + 0j*np.nan. It should be ignored in operations on the frame. Can you try out an operation you are doing and confirm?

@schodge

schodge commented Feb 14, 2014

Using the frame df @hayd provided above: it works while you're operating on the DataFrame, but if you use .values, you get anomalous results operating on the NumPy array. I would expect the last line of output to be [1. nan], not [1. 0.].

df = DataFrame([[1+1j, 2], [5+1j, 4+1j]], columns=['a', 'b'])
df2 = df.where(df <= 5)
print df2['a'].real
print df2['a'].imag
temp = df2['a'].values
print temp
# Same results with np.real(temp) / np.imag(temp)
print temp.real
print temp.imag
#Output  
[  1.  nan]
[ 1.  0.]
[  1.+1.j  nan+0.j]
[  1.  nan]
[ 1.  0.]

@jreback
Contributor

jreback commented Feb 14, 2014

The point of where IS to introduce nan, by definition.

The value at (1, a) is missing, not a complex (or any other) value.

You probably want to fill it:

In [11]: df2.fillna(0+0j)
Out[11]: 
        a       b
0  (1+1j)  (2+0j)
1      0j  (4+1j)

[2 rows x 2 columns]

In [18]: df2.fillna(0+0j,inplace=True)

In [19]: temp = df2['a'].values

In [20]: print temp.real
[ 1.  0.]

In [21]: print temp.imag
[ 1.  0.]
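For readers on Python 3, the fillna suggestion can be restated roughly as follows. This is a sketch that builds the frame with the missing entry directly instead of going through where; the names `df2`, `filled` and `temp` are illustrative.

```python
import numpy as np
import pandas as pd

# Frame with one missing entry in a complex column (illustrative data).
df2 = pd.DataFrame({"a": [1 + 1j, np.nan], "b": [2 + 0j, 4 + 1j]})

# At the pandas level the entry is reported as missing.
print(df2["a"].isna().tolist())

# Filling with an explicit complex value keeps the column complex.
filled = df2.fillna(0 + 0j)
temp = filled["a"].to_numpy()

print(temp.real)
print(temp.imag)
```

After the fill, the real and imaginary parts of the replaced entry are whatever was passed to fillna, so downstream NumPy code no longer sees a half-nan value.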

@hayd
Contributor Author

hayd commented Feb 14, 2014

The default replacement value for where is nan, so this is expected/correct. You should be explicit about what you want to replace it with, e.g. .where(df <= 5, nan + 1j * nan).

@schodge

schodge commented Feb 14, 2014

OK, so long as this is the correct behavior. I find it odd that isnan(a) can be True but isnan(a.imag) False, but this appears to be the way Python works. It also only works in that direction: whenever I attempt to create (0+nanj) I wind up with nan+nanj, so it appears you can never have isnan(a.imag) True and isnan(a.real) False.

Thank you both for your help, on both sites.
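A short sketch of the complex-nan behaviour discussed above, using only plain Python and NumPy. The variable names are illustrative. One caveat to the observation about (0+nanj): it holds when the value is built by arithmetic, because multiplying 1j by nan also produces a nan real part, but the `complex(real, imag)` constructor sets each component directly.

```python
import math
import numpy as np

# nan plus a zero imaginary part: isnan is True, yet .imag is 0.0.
missing = np.nan + 0j
print(np.isnan(missing), missing.imag)

# Building (0 + nan*j) by arithmetic contaminates the real part too,
# because 1j * nan evaluates 0 * nan for the real component.
via_arithmetic = 0 + 1j * np.nan
print(math.isnan(via_arithmetic.real), math.isnan(via_arithmetic.imag))

# The constructor assigns the components directly, so the real part stays 0.0.
via_constructor = complex(0.0, np.nan)
print(via_constructor.real, math.isnan(via_constructor.imag))
```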

@hayd
Contributor Author

hayd commented Feb 15, 2014

I guess the distinction is between nan and inf; these concepts are different. That's why there is only one complex nan...

However, I expected this to work (nothing to do with pandas, though). Weird:

In [21]: 3 + 1j * -np.inf
Out[21]: (nan-infj)

3 participants