Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to i8data for DatetimeIndex #24559

Open
TomAugspurger opened this issue Jan 2, 2019 · 67 comments

Comments

Projects
None yet
5 participants
@TomAugspurger
Copy link
Contributor

commented Jan 2, 2019

Master currently has an (undocumented) (maybe-) API-breaking change from 0.23.4 when passed integer values

0.23.4

In [2]: i8data = np.arange(5) * 3600 * 10**9

In [3]: pd.DatetimeIndex(i8data, tz="US/Central")
Out[3]:
DatetimeIndex(['1970-01-01 00:00:00-06:00', '1970-01-01 01:00:00-06:00',
               '1970-01-01 02:00:00-06:00', '1970-01-01 03:00:00-06:00',
               '1970-01-01 04:00:00-06:00'],
              dtype='datetime64[ns, US/Central]', freq=None)

Master

In [3]: pd.DatetimeIndex(i8data, tz="US/Central")
Out[3]:
DatetimeIndex(['1969-12-31 18:00:00-06:00', '1969-12-31 19:00:00-06:00',
               '1969-12-31 20:00:00-06:00', '1969-12-31 21:00:00-06:00',
               '1969-12-31 22:00:00-06:00'],
              dtype='datetime64[ns, US/Central]', freq=None)

Attempt to explain the behavior: In 0.23.4, passing an ndarray[i8] was equivalent to passing data.view("M8[ns]")

# 0.23.4
In [4]: pd.DatetimeIndex(i8data.view("M8[ns]"), tz="US/Central")
Out[4]:
DatetimeIndex(['1970-01-01 00:00:00-06:00', '1970-01-01 01:00:00-06:00',
               '1970-01-01 02:00:00-06:00', '1970-01-01 03:00:00-06:00',
               '1970-01-01 04:00:00-06:00'],
              dtype='datetime64[ns, US/Central]', freq=None)

On master, integer values are treated as unix timestamps, while M8[ns] values are treated as wall-times in the given timezone.

# master
In [4]: pd.DatetimeIndex(i8data.view("M8[ns]"), tz="US/Central")
Out[4]:
DatetimeIndex(['1970-01-01 00:00:00-06:00', '1970-01-01 01:00:00-06:00',
               '1970-01-01 02:00:00-06:00', '1970-01-01 03:00:00-06:00',
               '1970-01-01 04:00:00-06:00'],
              dtype='datetime64[ns, US/Central]', freq=None)

Reason for the change

There are four cases of interest:

In [4]: arr = np.arange(5) * 24 * 3600 * 10**9
In [5]: tz = 'US/Pacific'

In [6]: a = pd.DatetimeIndex(arr, tz=tz)
In [7]: b = pd.DatetimeIndex(arr.view('M8[ns]'), tz=tz)
In [8]: c = pd.DatetimeIndex._simple_new(arr, tz=tz)
In [9]: d = pd.DatetimeIndex._simple_new(arr.view('M8[ns]'), tz=tz)

In [10]: a
Out[10]: 
DatetimeIndex(['1970-01-01 00:00:00-08:00', '1970-01-02 00:00:00-08:00',
               '1970-01-03 00:00:00-08:00', '1970-01-04 00:00:00-08:00',
               '1970-01-05 00:00:00-08:00'],
              dtype='datetime64[ns, US/Pacific]', freq=None)

In [11]: b
Out[11]: 
DatetimeIndex(['1970-01-01 00:00:00-08:00', '1970-01-02 00:00:00-08:00',
               '1970-01-03 00:00:00-08:00', '1970-01-04 00:00:00-08:00',
               '1970-01-05 00:00:00-08:00'],
              dtype='datetime64[ns, US/Pacific]', freq=None)

In [12]: c
Out[12]: 
DatetimeIndex(['1969-12-31', '1970-01-01', '1970-01-02', '1970-01-03',
               '1970-01-04'],
              dtype='datetime64[ns, US/Pacific]', freq=None)

In [13]: d
Out[13]: 
DatetimeIndex(['1969-12-31', '1970-01-01', '1970-01-02', '1970-01-03',
               '1970-01-04'],
              dtype='datetime64[ns, US/Pacific]', freq=None)

In 0.23.4 we have a.equals(b) and c.equals(d) but no way to pass data in a way that was constructor-neutral. In master we now have a match c and d. At some point in the refactoring process we changed that, but off the top of my head I don't remember when or if this was the precise motivation or just a side-benefit.

BTW _simple_new was also way too much:

        if getattr(values, 'dtype', None) is None:
            # empty, but with dtype compat
            if values is None:
                values = np.empty(0, dtype=_NS_DTYPE)
                return cls(values, name=name, freq=freq, tz=tz,
                           dtype=dtype, **kwargs)
            values = np.array(values, copy=False)

        if is_object_dtype(values):
            return cls(values, name=name, freq=freq, tz=tz,
                       dtype=dtype, **kwargs).values
        elif not is_datetime64_dtype(values):
            values = _ensure_int64(values).view(_NS_DTYPE)

Was this documented?

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DatetimeIndex.html mentions that it's "represented internally as int64".

The (imprecise) type on data is "Optional datetime-like data"

I don't see anything in http://pandas.pydata.org/pandas-docs/stable/timeseries.html suggesting that integers can be passed to DatetimeIndex.

@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Jan 2, 2019

@TomAugspurger TomAugspurger referenced this issue Jan 2, 2019

Merged

REF: DatetimeLikeArray #24024

7 of 12 tasks complete
@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Jan 2, 2019

@jbrockmendel could you fill in the "Reason for the change" section? IIUC, it was to simplify DatetimeIndex._simple_new?

@mroeschke

This comment has been minimized.

Copy link
Member

commented Jan 2, 2019

Possibly related, I worked on this topic in #21216. Rational was that DatetimeIndex(ints, tz=tz) should behave similarly as Timestamp(int, tz=tz), and ints cannot necessarily represent wall times.

I have an open issue #20257 to document passing integers into Timestamp which, similarly to DatetimeIndex, will treat integers as epoch (unix) timestamps

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jan 2, 2019

@mroeschke thanks. IIRC consistency with to_datetime was also a consideration

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jan 2, 2019

IIRC consistency with to_datetime was also a consideration

to_datetime has no tz keyword, only utc=True, but for UTC wall time or unix epochs are the same?

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jan 2, 2019

The Timestamp / DatetimeIndex consistency is a good reason to change one of both (since they are inconsistent in 0.23.4).
And I think the unix epoch way makes sense for integer input. Although I think it can also be confusing that DatetimeIndex(int_array, ..) and DatetimeIndex(int_array.view('M8[ns]), ..) give different results.

Regarding"was this documented": it might not have been documented clearly, but it is long standing behaviour: either we think people don't use it / shouldn't use it (but then why bother changing it? Shouldn't we then rather deprecate the whole ability of passing integer data, instead of changing the behaviour?), or either people do use it, and this change will break that usage.
Of course, a break for a limited number of people might be worth the trade-off for a big win within our code base. But is this consistency between __new__ and _simple_new that important code-technical?

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jan 4, 2019

Any other thoughts / replies here?

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Jan 4, 2019

Although I think it can also be confusing that DatetimeIndex(int_array, ..) and DatetimeIndex(int_array.view('M8[ns]), ..) give different results.

I think this surprised me early on. That may have been because I was used to the old way; I'm not really sure.

Regarding"was this documented":

My intent there was "If this was documented before, then we definitely can't change it." I wasn't advocating "It wasn't documented, so we can change it.". Just "It wasn't documented, so we can maybe change it" :)

I don't really have an opinion on the technical merits of wall time vs. epochs. I don't think I know enough to vote one way or the other.

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jan 4, 2019

I think the overriding internal-consistency concern is the one @mroeschke reminded us of: this should behave like Timestamp constructor.

@TomAugspurger would "fixing" this behavior be difficult? Last time I tried something similar I got test_packers failures that I couldn't figure out.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Jan 4, 2019

Demo of the inconsistency on 0.23.4

In [9]: i8 = pd.Timestamp('2000', tz='CET').value

In [10]: pd.Timestamp(i8)
Out[10]: Timestamp('1999-12-31 23:00:00')

In [11]: pd.Timestamp(i8, tz="CET")
Out[11]: Timestamp('2000-01-01 00:00:00+0100', tz='CET')

In [12]: pd.DatetimeIndex(np.array([i8]))
Out[12]: DatetimeIndex(['1999-12-31 23:00:00'], dtype='datetime64[ns]', freq=None)

In [13]: pd.DatetimeIndex(np.array([i8]), tz="CET")
Out[13]: DatetimeIndex(['1999-12-31 23:00:00+01:00'], dtype='datetime64[ns, CET]', freq=None)

On master, Out[13] matches Out[11].

That consistency is certainly worth striving for.


If we wanted a graceful deprecation warning, then the DTI constructor would

  1. check for i8data & tz
  2. tell users to... what? What recourse would they have? If they want wall times (the previous behavior) then they can .view("dateime64[ns]") and pass that. If they want unix epochs (the future behavior) then they should ...?
  3. Update all of pandas to use the recommended behavior from 2 (which is TBD).

Putting that warning in place would give us an idea of how difficult this would be to change. I'll see what turns up. Since _simple_new already does the "right" (future) thing, it may not be too bad...

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jan 4, 2019

If they want unix epochs (the future behavior) then they should ...?

to_datetime ? (if it would finally have a tz option)

The other option could also be to deprecate passing integers, if we find it too confusing what it should result in.

@jbrockmendel jbrockmendel referenced this issue Jan 4, 2019

Closed

EA: revert treatment of i8values #24623

1 of 1 task complete
@mroeschke

This comment has been minimized.

Copy link
Member

commented Jan 4, 2019

Since tz effectively means tz_localize, in an ideal world I would only want integer data to only accept tz='UTC'/None. I think Timestamp(i8, tz=tz)'s localization to UTC then conversion to tz is a little too magic although it's the more appropriate behavior. Therefore I had thought the DatetimeIndex(i8_array, tz=tz) behavior was a bug, but since our policy wasn't set in documentation deprecation is appropriate.

For 2, if a user wants epoch timestamps (future behavior) they should use DatetimeIndex(i8_array, tz='UTC').tz_convert(tz) (or even to_datetime since it even has an origin parameter). I have a feeling we may rely on passing i8 into these constructors internally so deprecating the behavior may warrant a lot of refactoring down the line?

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jan 6, 2019

#24623 is now passing. PTAL.

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jan 6, 2019

@jbrockmendel it is not fully clear to me what #24623 exactly does related to the discussion above. Can you summarize it here?
(e.g. does it change back the behaviour of DatimeIndex? (but I thought you were opposed to that?) And I don't think it adds a deprecation warning? ..)

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jan 7, 2019

it is not fully clear to me what #24623 exactly does related to the discussion above. Can you summarize it here?

There are two central changes in #24623, both of which were the status quo before #24024.

  1. make DatetimeArray.__init__ behave like DatetimeIndex.__new__ and Timestamp.__new__ by treating int64 values as unix timestamps and datetime64 as wall-times.

  2. Do frequency validation in DatetimeArray.__init__, preventing users from creating invalid instances with the public constructor.

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jan 7, 2019

Then how is that related to this issue? This issue is about the change in behaviour of handling integer values in the DatetimeIndex constructor, not about the DatetimeArray constructor

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jan 7, 2019

Then how is that related to this issue?

If it isn't closely enough related for your tastes then feel free to ignore.

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jan 7, 2019

You wrote the PR, I think you are better placed to judge to what extent it is related to the discussion here ;)
It's just because you mentioned it here as if it is resolving this issue (but maybe I misunderstood that) that I am asking for clarification.

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jan 7, 2019

I'm actually currently getting side-tracked by trying to figure out exactly what is and isn't internally-consistent. It looks like 24623 makes i8 behavior consistent between DTA/DTI/Timestamp and M8[ns] behavior consistent between DTA/DTI, but breaks the existing consistency in master between DTA/Timestamp.

I'll try to put together a grid for these...

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jan 7, 2019

And an answer to the comment of @mroeschke

Since tz effectively means tz_localize, in an ideal world I would only want integer data to only accept tz='UTC'/None.

That sounds as a logical rule to me.

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jan 7, 2019

Since tz effectively means tz_localize, in an ideal world I would only want integer data to only accept tz='UTC'/None.

I'm not sure I understand this rule. Is the idea that the constructor would only be able to create tz-aware {n \choose k}(DTA/DTI/Timestamp) with UTC and them make the user do tz_convert as another step?

@mroeschke

This comment has been minimized.

Copy link
Member

commented Jan 7, 2019

In an ideal world I would prefer that since it's more explicit, but I think it would require deprecating Timestamp behavior with ints and tz.

I would be fine just propagating the current Timestamp behavior and more clearly documenting interger with tz arg behavior across the board since I think we would have to deprecate less?

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jan 7, 2019

I've made things more difficult here by voicing opinions while being misinformed. Apologies all around.

Recap/Double-Checking behavior from 0.23.4 and master. In all cases passing tz/dtype corresponding to US/Pacific.

ts = pd.Timestamp.now('US/Pacific').round('s')
dti = pd.DatetimeIndex([ts])
tz = dti.tz
dtype = dti.dtype

results = [
    pd.Timestamp(np.int64(ts.value), tz=ts.tz),
    pd.Timestamp(np.datetime64(ts.value, "ns"), tz=ts.tz),
    pd.Timestamp(float(ts.value), tz=ts.tz),
]
assert all(x == ts for x in results)   # <-- holds in both 0.23.4 and in master

args = [
    dti.asi8,
    dti.asi8.view('datetime64[ns]'),
    dti.asi8.astype(np.float64),
    dti.asi8.astype(object),
    dti.asi8.view('datetime64[ns]').astype(object),
    dti.asi8.astype(np.float64).astype(object)
]

news = [pd.DatetimeIndex(x, dtype=dtype) for x in args]
simples = [pd.DatetimeIndex._simple_new(x, tz=tz)
           for x in args if x.dtype == 'i8' or x.dtype == 'M8[ns]']
inits = [pd.core.arrays.DatetimeArray(x, dtype=dtype)
         for x in args if x.dtype == 'i8' or x.dtype == 'M8[ns]']  # <-- master only

master

>>> for x in news:
...     x[0]
... 
Timestamp('2019-01-07 12:28:45-0800', tz='US/Pacific')
Timestamp('2019-01-07 20:28:45-0800', tz='US/Pacific')
Timestamp('2019-01-07 20:28:45-0800', tz='US/Pacific')
Timestamp('2019-01-07 12:28:45-0800', tz='US/Pacific')
Timestamp('2019-01-07 12:28:45-0800', tz='US/Pacific')
Timestamp('2019-01-07 20:28:45-0800', tz='US/Pacific')

>>> for x in simples:
...     x[0]
... 
Timestamp('2019-01-07 12:28:45-0800', tz='US/Pacific')
Timestamp('2019-01-07 12:28:45-0800', tz='US/Pacific')

>>> for x in inits:
...     x[0]
... 
Timestamp('2019-01-07 12:28:45-0800', tz='US/Pacific')
Timestamp('2019-01-07 12:28:45-0800', tz='US/Pacific')

0.23.4

>>> for x in news:
...     x[0]
... 
Timestamp('2019-01-07 20:28:45-0800', tz='US/Pacific')
Timestamp('2019-01-07 20:28:45-0800', tz='US/Pacific')
Timestamp('2019-01-07 20:28:45-0800', tz='US/Pacific')
Timestamp('2019-01-07 20:28:45-0800', tz='US/Pacific')
Timestamp('2019-01-07 20:28:45-0800', tz='US/Pacific')
Timestamp('2019-01-07 20:28:45-0800', tz='US/Pacific')
>>> 
>>> for x in simples:
...     x[0]
... 
Timestamp('2019-01-07 12:28:45-0800', tz='US/Pacific')
Timestamp('2019-01-07 12:28:45-0800', tz='US/Pacific')

In both 0.23.4 and master, Timestamp.__new__ treats all i8, M8[ns], and f8 as unix timestamps.

In both 0.23.4 and master, both DTA.__init__ and DTI._simple_new treat both i8 and M8[ns] as unix timestamps.

In 0.23.4 DTI.__new__ treats all the cases as wall-times.

In master, DTI.__new__ treats i8, i8+object, and M8[ns]+object as unix timestamps, and the other cases as wall times.

In master, DTA.__init__ treats both i8 and M8[ns] as unix timestamps.


I think there is a consensus that the preferred behavior is for all of these to match Timestamp, right? So the options for DTI are:

  1. revert to 0.23.4 behavior and issue warnings that this behavior is going to change.
  2. keep the half-and-half behavior, issue warnings that more is going to change, document that some has changed
  3. move all cases to match Timestamp behavior, document the change, possibly issue warnings(?)

And the options for DTA:
a) if option 3 is chosen for DTI, then DTA is fine as-is.
b) otherwise:
i) change DTA to match DTI, issue equivalent warnings and change them at the same time in the future
ii) live with them behaving differently.

The question I'd pose to the more senior folks: can we get away with option 3?

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Jan 7, 2019

I'm having some trouble following the samples in
#24559 (comment), but I think that your

I think there is a consensus that the preferred behavior is for all of these to match Timestamp, right?

is correct.

Can you clarify the difference between your option 1 and 3? Is the difference the handling of i8 values in DTI._simple_new?

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jan 7, 2019

Can you clarify the difference between your option 1 and 3? Is the difference the handling of i8 values in DTI._simple_new?

None of the options I have in mind touch _simple_new.

In option 1, DTI.__new__ behavior is reverted to 0.23.4, where both int64 and datetime64[ns] are treated as wall-times. This is the maximally-different-from-Timestamp behavior, but the minimal-user-facing-change option.

In option 3, DTI.__new__ is behavior changed to fully match Timestamp behavior (and DTA.__init__ behavior, for the subset of inputs that it supports). This would be the clear choice if backward-compatibility and deprecation cycles weren't a consideration.

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jan 9, 2019

Should we xfail these tests until we decide the behavior of passing ints into DTI?

The thing is there are 51 failing tests (fewer if we don't count parametrized tests separately)...

Wait.. everybody wait! I have good news!

https://github.com/pandas-dev/pandas/pull/21216/files#diff-87b716a95c390fb4d6ffe62b1626ce73R39

We've documented the DTI i8 behavior change already!

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jan 9, 2019

Wait... holy cow wait! We’ve documented the i8 behavior change! #21216

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jan 9, 2019

Yes, we documented the change, as we did it deliberately (although not fully, as it seems to indicate it did not accept integers before (which was the case of Int64Index, but not for int array). But we came back on that deciding we want a deprecation instead.

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jan 9, 2019

Either way, I'm calling this good enough that I'm going to get lunch instead of banging my head against these test failures.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Jan 9, 2019

@mroeschke

This comment has been minimized.

Copy link
Member

commented Jan 9, 2019

I think in theory #21216 should have changed the behavior of for int arrays as well despite me only mentioning Int64Index in the whatsnew.

Looks like I included tests for this case as well: https://github.com/pandas-dev/pandas/pull/21216/files#diff-f368eb7eb824bd248bf961916d6de7eaR492

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jan 9, 2019

I think in theory #21216 should have changed the behavior of for int arrays as well despite me only mentioning Int64Index in the whatsnew.

Yah, the fact that it is tested and documented makes me much more willing to accept the status quo.

Feel free to post a link to your WIP branch and I can take a look.

@TomAugspurger https://github.com/jbrockmendel/pandas/tree/i8revert2

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jan 9, 2019

I like option 4 or 5 (slight preference), don't reallly love 6

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jan 10, 2019

Regarding Option 5, I am -1 on that for the reason of wanting to keep the DatetimeArray simple (doing no work in the base case) and consistent for ints / datetime64. For the same reason that DatetimeIndex._simple_new now treats both ints and datetimes as UTC unix epochs (and I see DatetimeArray more similar to DatetimeIndex._simple_new as to DatetimeIndex).

But, I don't really like making DatetimeIndex(datetime64) to be unix epoch instead of wall time (as in Option 4), because that would make it inconsistent with things like pd.date_range(start_date, periods=N, tz=..) where the start_date is interpreted as wall time.

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jan 10, 2019

And in case it might be useful, the experiment I did earlier today: https://github.com/pandas-dev/pandas/compare/master...jorisvandenbossche:i8revert?expand=1 (only changed the DatetimeIndex constructor without fixing any of the consequential failures)

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Jan 10, 2019

Back to the deprecation warning. What's the expected behavior of Int64Index.astype('datetime64[ns, tz]')? That errored on 0.23.4. Should it just be the same as DatetimeIndex(Index, dtype='datetime64[ns, tz]')? Should it throw a warning?

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Jan 10, 2019

Assuming we eventually want the behavior for DTI that's currently on master, how's this for a deprecation warning?

The meaning of integer-dtype 'data' in DatetimeIndex when a timezone is 
specified is changing. Previously, these were viewed as datetime64[ns] values,
and represented the wall time *in the specified timezone*.

In the future, these will be viewed as datetime64[ns] values
and represent the wall time *in UTC*.

To accept the future behavior, use

    pd.to_datetime(integer_data, utc=True).tz_convert(tz)

To keep the previous behavior, use

    pd.to_datetime(integer_data).tz_localize(tz)
@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Jan 10, 2019

One more question: should tz='UTC' emit a warning? IIUC, the behavior of

pd.DatetimeIndex([1457537600000000000, 1458059600000000000], tz='UTC')

will always be the same.


Gotta take a break from this until this afternoon. WIP at https://github.com/TomAugspurger/pandas/tree/revert-i8

That's based off @jorisvandenbossche's start. Should be easy to swap in @jbrockmendel's changes to _from_sequence if we prefer that to the extra warn parameter.

@mroeschke

This comment has been minimized.

Copy link
Member

commented Jan 10, 2019

I think Int64Index.astype('datetime64[ns, tz]') should raise the same warning as DatetimeIndex(Index, dtype='datetime64[ns, tz]') as Index(int64s, dtype='datetime64[ns, tz]')

I would raise on UTC even thought the outcome is the same just to acclimate the user to the fact that passing any tz with ints will change behavior.

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jan 10, 2019

Nice deprecation warning! (I would maybe only mention something regarding the reason for the change? integers seen as unix epoch which is defined to be UTC? (although its nanoseconds so not exactly unix epochs))

One more question: should tz='UTC' emit a warning?

Since it stays the same, ideally no, I would say

What's the expected behavior of Int64Index.astype('datetime64[ns, tz]')

Ideally the new behaviour IMO (although that might make the implementation more complicated?)
(if we are sure about the new behaviour of course .. )

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jan 10, 2019

I would raise on UTC even thought the outcome is the same just to acclimate the user to the fact that passing any tz with ints will change behavior.

Hmm, yes, that might be a good argument (although I think, as a user, I would have the reflex to complain about it: "I already specify UTC!")

@jsexauer jsexauer referenced this issue Jan 10, 2019

Open

DEPR: deprecations from prior versions #6581

0 of 94 tasks complete
@mroeschke

This comment has been minimized.

Copy link
Member

commented Jan 10, 2019

Fair enough. Yeah from a user perspective, it may feel like a false positive warning. My opinion about warning with UTC isn't that strong so I'd be fine not emitting a warning in that case.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Jan 10, 2019

FWIW, I'm ambivalent about warning in the UTC case. Either way has downsides.

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jan 10, 2019

If long term we want option 5, do we need to get the deprecation for Timestamp behavior in for 0.24.0?

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Jan 10, 2019

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jan 21, 2019

anything more on this for 0.24.0?

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Jan 21, 2019

I think we're good for 0.24.0.

We do need to settle on a desired long-term behavior though.

@TomAugspurger TomAugspurger modified the milestones: 0.24.0, 0.25.0 Jan 21, 2019

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Feb 1, 2019

How close are we to consensus on this? It looks like @mroeschke, @jreback, and I have expressed preference for options 4 or 5, with a slight preference towards 5. @jorisvandenbossche has expressed a preference for option 6.

Anyone else want to weigh in?

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jun 9, 2019

@TomAugspurger @jorisvandenbossche @mroeschke @jreback I'd like to settle on a desired long-term behavior and start the appropriate deprecation warnings before 0.25.0. How far are we from consensus?

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jun 9, 2019

@jbrockmendel can u do. short summary of this and just confirm that 0.24.2
and master have the same behavior

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Jul 2, 2019

We discussed this on the call. IIRC someone volunteered / was volunteered to summarize, and maybe close this issue? (was I the one to volunteer?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.