handle nan values in DataFrame.update when overwrite=False #15593

Closed
pcluo opened this Issue Mar 6, 2017 · 5 comments

pcluo commented Mar 6, 2017

Code Sample

from pandas import DataFrame, date_range
df1 = DataFrame({'A': [1,None,3], 'B': date_range('2000', periods=3)})
df2 = DataFrame({'A': [None, 2, 3]})
df1.update(df2, overwrite=False)
df1

Problem description

I got "TypeError: invalid type promotion" when updating a DataFrame that has a datetime column; the second DataFrame doesn't have this column. The full traceback is below.

IMHO, the culprit is in DataFrame.update: the block that checks mask.all() should be moved outside the if block so that it applies to the overwrite=False case as well.

if overwrite:
    mask = isnull(that)

    # don't overwrite columns unnecessarily
    if mask.all():
        continue
else:
    mask = notnull(this)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in ()
      1 df1 = DataFrame({'A': [1,None,3], 'B': date_range('2000', periods=3)})
      2 df2 = DataFrame({'A': [None, 2, 3]})
----> 3 df1.update(df2, overwrite=False)
      4 df1
      5

C:\Users\pcluo\Anaconda3\lib\site-packages\pandas\core\frame.py in update(self, other, join, overwrite, filter_func, raise_conflict)
   3845
   3846             self[col] = expressions.where(mask, this, that,
-> 3847                                           raise_on_error=True)
   3848
   3849     # ----------------------------------------------------------------------

C:\Users\pcluo\Anaconda3\lib\site-packages\pandas\computation\expressions.py in where(cond, a, b, raise_on_error, use_numexpr)
    228
    229     if use_numexpr:
--> 230         return _where(cond, a, b, raise_on_error=raise_on_error)
    231     return _where_standard(cond, a, b, raise_on_error=raise_on_error)
    232

C:\Users\pcluo\Anaconda3\lib\site-packages\pandas\computation\expressions.py in _where_numexpr(cond, a, b, raise_on_error)
    151
    152     if result is None:
--> 153         result = _where_standard(cond, a, b, raise_on_error)
    154
    155     return result

C:\Users\pcluo\Anaconda3\lib\site-packages\pandas\computation\expressions.py in _where_standard(cond, a, b, raise_on_error)
    126 def _where_standard(cond, a, b, raise_on_error=True):
    127     return np.where(_values_from_object(cond), _values_from_object(a),
--> 128                     _values_from_object(b))
    129
    130

TypeError: invalid type promotion
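The proposed change can be sketched as a simplified, stand-alone version of the per-column loop in DataFrame.update. This is a hypothetical helper (update_sketch is not a pandas function) written against the current pandas API (reindex_like, isna/notna, Series.where); it assumes, as pandas' own update does, that other is first aligned to the calling frame. The only difference from the quoted code is that the mask.all() check runs for both values of overwrite, so a column in which nothing would change (such as B here, which has no missing values and is absent from df2) is skipped before any datetime/float combination happens.

```python
import pandas as pd


def update_sketch(df, other, overwrite=True):
    """Hypothetical, simplified per-column loop of DataFrame.update,
    with the 'nothing to do' check applied to both branches."""
    # Align other to df; columns missing from other become all-NaN floats.
    other = other.reindex_like(df)
    for col in df.columns:
        this = df[col]
        that = other[col]
        if overwrite:
            # Keep `this` wherever `other` has no value.
            mask = that.isna()
        else:
            # Keep `this` wherever it already has a value.
            mask = this.notna()
        # Hoisted check: if every position keeps `this`, skip the column,
        # so an untouched datetime column is never mixed with float NaN.
        if mask.all():
            continue
        df[col] = this.where(mask, that)
    return df


df1 = pd.DataFrame({'A': [1, None, 3], 'B': pd.date_range('2000', periods=3)})
df2 = pd.DataFrame({'A': [None, 2, 3]})
result = update_sketch(df1.copy(), df2, overwrite=False)
print(result)
```

With the frames from the report, the missing value in A is filled from df2 and the datetime column B is left untouched.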

jreback commented Mar 6, 2017

Yeah, this looks like a bug. .update() hasn't gotten much TLC; in fact, it should be completely reworked (see #3025, which covers fixing this method more generally).

So a short-term fix is OK if you want to push one.

jreback added this to the Next Major Release milestone Mar 6, 2017

mayukh18 commented Mar 8, 2017

@cluoren, are you working on this? If not, I can take it up; I've already looked into it.

pcluo commented Mar 8, 2017

@mayukh18 I just created a pull request. Thanks, though.

jreback modified the milestone: 0.20.0, Next Major Release Mar 8, 2017

jreback modified the milestone: 0.20.0, Next Major Release Mar 23, 2017

pcluo added seven commits to pcluo/pandas that referenced this issue: 39350e2, ee999f7 and e6b11bd on May 22, 2017, and c832fdd, d8bde84, be7a230 and 468a798 on May 24, 2017. Each is titled "BUG: handle nan values in DataFrame.update when overwrite=False (#15593)" with the message "add nan test for DataFrame.update" and "update whatsnew v0.20.2".

jreback modified the milestone: 0.20.2, Next Major Release May 24, 2017

jreback closed this in #16430 May 24, 2017

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue May 29, 2017

pcluo + TomAugspurger: BUG: handle nan values in DataFrame.update when overwrite=False (#15593) (#16430)

(cherry picked from commit 85080aa)
e735629

olizhu commented Jun 16, 2017

The NaN issue seems to be fixed in v0.20.2, but a similar problem remains when a NaT is present anywhere in the DataFrame.
For example, this raises an error:

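from pandas import DataFrame, to_datetime
# 'abc' cannot be parsed, so errors='coerce' turns it into NaT
df1 = DataFrame({'A': [1, None], 'B': [to_datetime('abc', errors='coerce'), to_datetime('2016-01-01')]})
df2 = DataFrame({'A': [2, 3]})
df1.update(df2, overwrite=False)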
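A short check of why the hoisted mask.all() guard does not cover this case, assuming update first aligns other with reindex_like as in the simplified loop: with overwrite=False the mask is "this is not null", and column B contains a NaT, so the mask is not all True and B is not skipped. After alignment, B in df2 is an all-NaN float column, so the datetime/float combination is reached again.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, None],
                    'B': [pd.to_datetime('abc', errors='coerce'),
                          pd.to_datetime('2016-01-01')]})
df2 = pd.DataFrame({'A': [2, 3]})

# Align df2 to df1; column B does not exist in df2,
# so it comes back as an all-NaN float column.
other = df2.reindex_like(df1)

# With overwrite=False the mask is "this is not null".
mask = df1['B'].notna()

print(mask.tolist())     # [False, True]
print(mask.all())        # False, so column B is NOT skipped
print(other['B'].dtype)  # float64
```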

TomAugspurger commented Jun 16, 2017

@olizhu could you search to see if we have an issue for that already (I don't recall seeing it before). If not, could you open a new issue for it?
