Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG/API: Make consistent API for tz-aware and tz-naive ops #11351

Closed
sinhrks opened this issue Oct 17, 2015 · 5 comments
Closed

BUG/API: Make consistent API for tz-aware and tz-naive ops #11351

sinhrks opened this issue Oct 17, 2015 · 5 comments
Labels
API Design Needs Discussion Requires discussion from core team before further action Timezones Timezone data dtype

Comments

@sinhrks
Copy link
Member

sinhrks commented Oct 17, 2015

Currently there are some inconsistent results. We should clarify the policy first.

import pandas as pd

s = pd.Series([pd.Timestamp('2011-01-01'), pd.NaT])
s
#0   2011-01-01
#1          NaT
# dtype: datetime64[ns]

1. coerced to object dtype

I prefer this behavior.

s.fillna(pd.Timestamp('2011-01-02', tz='Asia/Tokyo'))
#0          2011-01-01 00:00:00
#1    2011-01-02 00:00:00+09:00
# dtype: object

2. converted to GMT

s[1] = pd.Timestamp('2011-01-02', tz='Asia/Tokyo')
s
#0   2011-01-01 00:00:00
#1   2011-01-01 15:00:00
# dtype: datetime64[ns]

3. raise explicit error

This looks second best.

stz = pd.Series(pd.DatetimeIndex(['2012-01-01', '2012-01-02'], tz='Asia/Tokyo'))
stz
#0   2012-01-01 00:00:00+09:00
#1   2012-01-02 00:00:00+09:00
# dtype: datetime64[ns, Asia/Tokyo]

pd.Index(s).union(pd.Index(stz))
# TypeError: Cannot join tz-naive with tz-aware DatetimeIndex

4. not handled properly (bug)

s.append(stz)
# AttributeError: 'numpy.ndarray' object has no attribute 'tz_localize'

-> Split to #11455, closed by #12195.

@sinhrks sinhrks added API Design Timezones Timezone data dtype Needs Discussion Requires discussion from core team before further action labels Oct 17, 2015
@jorisvandenbossche
Copy link
Member

Note that also with other dtypes there are some of the same 'inconsistencies', casting when assigning and not with fillna:

In [36]: s = pd.Series([1.4, np.nan])

In [37]: s.fillna('3')
Out[37]:
0    1.4
1      3
dtype: object

In [38]: s[1] = '3'

In [39]: s
Out[39]:
0    1.4
1    3.0
dtype: float64

@jreback
Copy link
Contributor

jreback commented Oct 17, 2015

@jorisvandenbossche your last example is numpy conversion at work, we could fix it, but IMHO nor really worth it.

Option 1 is already selected, and reaffirmed in #11329

So the general rule for fillna/replace is that downcasting happens when possible where no loss of information occurs:

In [3]: s = pd.Series([pd.Timestamp('2011-01-01'), pd.NaT],dtype='object')

In [4]: s
Out[4]: 
0    2011-01-01 00:00:00
1                    NaT
dtype: object

In [5]: s.fillna(Timestamp('20130101'))
Out[5]: 
0   2011-01-01
1   2013-01-01
dtype: datetime64[ns]

All other cases are converted to object, e.g. when using mixed timezones for example (or naive and a tz), or mixed numbers/strings.

I will leave this issue open to discuss a possibility of using Option 3 on a specific case (explicity raising an error) (or having a raise_on_error flag to fillna/replace or somesuch.

using .fillna with a naive tz with a tz based Series (or vice-versa) should really just raise the error and not convert to object as this is almost certainly a user error, though its ok as we have it now.

[4] above

In [1]:  s = pd.Series([pd.Timestamp('2011-01-01',tz='US/Eastern'), pd.NaT])

In [2]: s
Out[2]: 
0   2011-01-01 00:00:00-05:00
1                         NaT
dtype: datetime64[ns, US/Eastern]

In [3]: s.fillna(Timestamp('20130101'))
Out[3]: 
0    2011-01-01 00:00:00-05:00
1          2013-01-01 00:00:00
dtype: object

In [4]: s.fillna(Timestamp('20130101',tz='US/Pacific'))
Out[4]: 
0    2011-01-01 00:00:00-05:00
1    2013-01-01 00:00:00-08:00
dtype: object

@jreback
Copy link
Contributor

jreback commented Oct 17, 2015

@sinhrks

[3] - we actually allow this and just convert things to object dtype, so that is a bug as well (again maybe we should have an option to these, then change the default to raise)
[4] is a bug, pls open a separate issue for that.

@jreback
Copy link
Contributor

jreback commented Jun 25, 2018

cc @mroeschke some of these might be closable as already fixed (future PR)

@mroeschke
Copy link
Member

Everything looks tested and fixed but [3], opened up #21671 for a more specific issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Needs Discussion Requires discussion from core team before further action Timezones Timezone data dtype
Projects
None yet
Development

No branches or pull requests

4 participants