handle nan values in DataFrame.update when overwrite=False #15593

Closed
pcluo opened this Issue Mar 6, 2017 · 5 comments

5 participants

pcluo commented Mar 6, 2017

Code Sample

from pandas import DataFrame, date_range
df1 = DataFrame({'A': [1,None,3], 'B': date_range('2000', periods=3)})
df2 = DataFrame({'A': [None, 2, 3]})
df1.update(df2, overwrite=False)
df1

Problem description

I got a "TypeError: invalid type promotion" error when updating a DataFrame that has a datetime column. The second DataFrame doesn't have this column. The full traceback is below.

IMHO, the culprit is in DataFrame.update. The block checking mask.all should be outside the if block so that it also applies to the overwrite=False case.

                if overwrite:
                    mask = isnull(that)

                    # don't overwrite columns unecessarily
                    if mask.all():
                        continue
                else:
                    mask = notnull(this)
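
To illustrate the change proposed above, here is a minimal sketch of a per-column update loop with the mask.all() check moved out of the if branch. The function name update_sketch and its structure are hypothetical, written only for illustration; it is not the pandas source and not necessarily what the eventual fix looks like.

```python
import pandas as pd


def update_sketch(left, other, overwrite=True):
    """Hypothetical, simplified stand-in for DataFrame.update (not pandas code)."""
    # Align the other frame to the left frame; columns missing from
    # `other` come back entirely null.
    other = other.reindex_like(left)
    for col in left.columns:
        this = left[col]
        that = other[col]
        if overwrite:
            # Keep the existing value only where the new value is missing.
            mask = that.isna()
        else:
            # Keep the existing value wherever one is already present.
            mask = this.notna()
        # Check hoisted out of the branch: if every existing value is kept,
        # there is nothing to write, so the column is never touched.
        if mask.all():
            continue
        left[col] = this.where(mask, that)
    return left


df1 = pd.DataFrame({'A': [1, None, 3], 'B': pd.date_range('2000', periods=3)})
df2 = pd.DataFrame({'A': [None, 2, 3]})
update_sketch(df1, df2, overwrite=False)
print(df1)
```

With the check applied to both branches, the fully populated datetime column B is skipped, so it is never combined with the all-null column created by aligning df2.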
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in ()
1 df1 = DataFrame({'A': [1,None,3], 'B': date_range('2000', periods=3)})
2 df2 = DataFrame({'A': [None, 2, 3]})
----> 3 df1.update(df2, overwrite=False)
4 df1
5

C:\Users\pcluo\Anaconda3\lib\site-packages\pandas\core\frame.py in update(self, other, join, overwrite, filter_func, raise_conflict)
3845
3846 self[col] = expressions.where(mask, this, that,
-> 3847 raise_on_error=True)
3848
3849 # ----------------------------------------------------------------------

C:\Users\pcluo\Anaconda3\lib\site-packages\pandas\computation\expressions.py in where(cond, a, b, raise_on_error, use_numexpr)
228
229 if use_numexpr:
--> 230 return _where(cond, a, b, raise_on_error=raise_on_error)
231 return _where_standard(cond, a, b, raise_on_error=raise_on_error)
232

C:\Users\pcluo\Anaconda3\lib\site-packages\pandas\computation\expressions.py in _where_numexpr(cond, a, b, raise_on_error)
151
152 if result is None:
--> 153 result = _where_standard(cond, a, b, raise_on_error)
154
155 return result

C:\Users\pcluo\Anaconda3\lib\site-packages\pandas\computation\expressions.py in _where_standard(cond, a, b, raise_on_error)
126 def _where_standard(cond, a, b, raise_on_error=True):
127 return np.where(_values_from_object(cond), _values_from_object(a),
--> 128 _values_from_object(b))
129
130

TypeError: invalid type promotion
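
The traceback ends in np.where, which has to find a common dtype for its two value arguments even when the mask would select only one of them. The sketch below reproduces that failure in isolation. It assumes the column missing from the second frame is presented as float64 NaN after alignment, which the traceback suggests but does not show directly. The exact exception message differs between NumPy versions.

```python
import numpy as np

# A datetime column, like 'B' in the first frame.
dates = np.array(['2000-01-01', '2000-01-02', '2000-01-03'],
                 dtype='datetime64[ns]')
# Assumed stand-in for the same column in the second frame: all NaN, float64.
missing = np.array([np.nan, np.nan, np.nan])
# Every existing value is kept, yet np.where still has to unify the dtypes.
mask = np.array([True, True, True])

try:
    np.where(mask, dates, missing)
    promoted = True
except TypeError as exc:
    promoted = False
    print(type(exc).__name__, exc)
```

This is why skipping the column entirely when the mask is all True avoids the error: np.where is never asked to combine the two dtypes.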

jreback commented Mar 6, 2017

yeah this looks like a bug. .update() has not gotten much TLC. in fact this should be completely changed, see #3025 to generally fix this method.

So a short-term fix is ok if you'd want to push that.

@jreback jreback added this to the Next Major Release milestone Mar 6, 2017

mayukh18 commented Mar 8, 2017

@cluoren are you doing it? else I can take it up. I have looked into it already.

pcluo commented Mar 8, 2017

@mayukh18 just created a pull request. thx tho.

@jreback jreback modified the milestones: 0.20.0, Next Major Release Mar 8, 2017

@jreback jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017

pcluo added a commit to pcluo/pandas that referenced this issue May 22, 2017

BUG: handle nan values in DataFrame.update when overwrite=False #15593
add nan test for DataFrame.update
update whatsnew v0.20.2

pcluo added a commit to pcluo/pandas that referenced this issue May 22, 2017

BUG: handle nan values in DataFrame.update when overwrite=False (#15593)

add nan test for DataFrame.update

update whatsnew v0.20.2

pcluo added a commit to pcluo/pandas that referenced this issue May 24, 2017

BUG: handle nan values in DataFrame.update when overwrite=False (#15593)
add nan test for DataFrame.update

update whatsnew v0.20.2

@jreback jreback modified the milestones: 0.20.2, Next Major Release May 24, 2017

@jreback jreback closed this in #16430 May 24, 2017

jreback added a commit that referenced this issue May 24, 2017

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue May 29, 2017

TomAugspurger added a commit that referenced this issue May 30, 2017

stangirala added a commit to stangirala/pandas that referenced this issue Jun 11, 2017

olizhu commented Jun 16, 2017

The issue with NaN seems to be fixed in v0.20.2, but a similar problem still exists with NaT if it exists anywhere in the dataframe.
For example, this will throw an error:

from pandas import DataFrame, to_datetime
df1 = DataFrame({'A': [1, None], 'B': [to_datetime('abc', errors='coerce'), to_datetime('2016-01-01')]})
df2 = DataFrame({'A': [2, 3]})
df1.update(df2, overwrite=False)

TomAugspurger commented Jun 16, 2017

@olizhu could you search to see if we have an issue for that already (I don't recall seeing it before). If not, could you open a new issue for it?
