Description
I came across a case where setitem with a boolean mask on a mixed-dtype frame
raises an exception.
This is easily relaxed in the int/float case (the int columns are left alone or upcast as needed).
In [68]: df
Out[68]:
      0         1         2    3         4  y
35  NaN       NaN       NaN  NaN  0.342153  0
40  NaN  0.326323       NaN  NaN       NaN  0
43  NaN       NaN  0.290126  NaN       NaN  0
49  NaN  0.326323       NaN  NaN       NaN  0
50  NaN  0.391147       NaN  NaN       NaN  1
In [75]: df.dtypes
Out[75]:
0    float64
1    float64
2    float64
3    float64
4    float64
y      int64
This currently raises because the frame is mixed_type (this is easily fixed, and I think it should be,
since the IntBlock will upcast if needed)
In [72]: df[df>0.3] = 1
In [73]: df
Out[73]:
      0    1         2    3    4  y
35  NaN  NaN       NaN  NaN    1  0
40  NaN    1       NaN  NaN  NaN  0
43  NaN  NaN  0.290126  NaN  NaN  0
49  NaN    1       NaN  NaN  NaN  0
50  NaN    1       NaN  NaN  NaN  1
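For anyone who wants to try the int/float case, here is a minimal sketch on a smaller frame. The column name `a` and the values are made up for illustration, and it assumes a pandas version where boolean-frame setitem is allowed on mixed numeric dtypes.

```python
import numpy as np
import pandas as pd

# Small mixed int/float frame (illustrative values, not the frame above).
df = pd.DataFrame(
    {"a": [np.nan, 0.326323, 0.290126], "y": [0, 0, 1]},
    index=[35, 40, 43],
)

# Boolean frame with the same shape; NaN > 0.3 evaluates to False.
mask = df > 0.3

# setitem with the boolean frame: only entries where the mask is True are set.
df[mask] = 1
```

Entries where the mask is False, including the NaNs, are left as they were.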
What about a mixed type that involves non-numerics, though?
In [77]: df
Out[77]:
      0         1         2    3         4  y   foo
35  NaN       NaN       NaN  NaN  0.342153  0  test
40  NaN  0.326323       NaN  NaN       NaN  0  test
43  NaN       NaN  0.290126  NaN       NaN  0  test
49  NaN  0.326323       NaN  NaN       NaN  0  test
50  NaN  0.391147       NaN  NaN       NaN  1  test
In [78]: df.get_dtype_counts()
Out[78]:
float64    5
int64      1
object     1
Should this raise here, or just allow the non-numerics to 'work'?
I am leaning toward allowing a purely numeric frame to work (e.g. mixed int/float),
but raising on this last case (then it's explicit that you did something 'wrong').
Any opinions?
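In the meantime, one way to get "numeric columns only" behaviour by hand is to restrict the mask to the numeric columns with `select_dtypes`. This is only a sketch of a workaround, not what the proposed fix does; the small frame and the names `numeric_cols` and `numeric` are illustrative.

```python
import numpy as np
import pandas as pd

# Frame with float, int and string columns (illustrative values).
df = pd.DataFrame(
    {"a": [np.nan, 0.326323, 0.290126], "y": [0, 0, 1], "foo": ["test"] * 3},
    index=[35, 40, 43],
)

# Pick out the numeric columns so the comparison never touches the strings.
numeric_cols = df.select_dtypes(include="number").columns
numeric = df[numeric_cols]

# Replace values above 0.3 in the numeric part only, then write it back.
df[numeric_cols] = numeric.mask(numeric > 0.3, 1)
```

The `foo` column is never compared or assigned, so it comes through unchanged.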
Note that the getitem case works on mixed....
In [80]: df[df>0.3]
Out[80]:
      0          1    2    3          4    y   foo
35  NaN        NaN  NaN  NaN  0.3421533  NaN  test
40  NaN  0.3263232  NaN  NaN        NaN  NaN  test
43  NaN        NaN  NaN  NaN        NaN  NaN  test
49  NaN  0.3263232  NaN  NaN        NaN  NaN  test
50  NaN  0.3911472  NaN  NaN        NaN    1  test
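The getitem behaviour can be checked on a purely numeric frame with the sketch below. As far as I know, indexing with a boolean frame gives the same result as `DataFrame.where` with that mask, so the int column is upcast to float once NaN is introduced. The frame and the names `selected` and `same_as_where` are illustrative.

```python
import numpy as np
import pandas as pd

# Small mixed int/float frame (illustrative values).
df = pd.DataFrame(
    {"a": [np.nan, 0.326323, 0.290126], "y": [0, 0, 1]},
    index=[35, 40, 43],
)

mask = df > 0.3

# getitem with a boolean frame keeps entries where the mask is True
# and puts NaN everywhere else.
selected = df[mask]

# Compare against the explicit where() call on the same mask.
same_as_where = selected.equals(df.where(mask))
```

The original `df` is not modified here; only `selected` has the upcast `y` column.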
and this would preclude a pathological case (and maybe this is another bug) where
you can fillna this and it doesn't convert to float64, so it 'looks' numeric but actually isn't.
Note: I am letting this go through (e.g. only try the numeric case if the mixed type fails, more
for backward compatibility than anything else)
In [94]: df = DataFrame({"col1": [2, 5.0, 123, None],
....: "col2": [1, 2, 3, 4]}, dtype=object)
In [95]: df
Out[95]:
   col1  col2
0     2     1
1     5     2
2   123     3
3  None     4
In [96]: df.dtypes
Out[96]:
col1    object
col2    object
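A frame like this can be made genuinely numeric by converting explicitly. The sketch below uses `pd.to_numeric` column by column; that is my suggestion for checking the case, not something the fix relies on, and the name `converted` is illustrative.

```python
import pandas as pd

# Same construction as above: numeric-looking values stored as object.
df = pd.DataFrame(
    {"col1": [2, 5.0, 123, None], "col2": [1, 2, 3, 4]},
    dtype=object,
)

# Convert each column explicitly; None becomes NaN, which forces col1
# to a float dtype, while col2 becomes an integer dtype.
converted = df.apply(pd.to_numeric)
```

Checking `df.dtypes` before and after the conversion makes the "looks numeric but isn't" situation visible.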