Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timestamp(foo) vs to_datetime(foo) #17697

Closed
jbrockmendel opened this issue Sep 27, 2017 · 11 comments

Comments

Projects
None yet
4 participants
@jbrockmendel
Copy link
Member

commented Sep 27, 2017

I assumed that these should be the same:

In [2]: pd.to_datetime("2015-11-18 15:30:00+05:30")
Out[2]: Timestamp('2015-11-18 10:00:00')

In [3]: pd.Timestamp("2015-11-18 15:30:00+05:30")
Out[3]: Timestamp('2015-11-18 15:30:00+0530', tz='pytz.FixedOffset(330)')

Is the mismatch intentional?

@jreback

This comment has been minimized.

Copy link
Contributor

commented Sep 27, 2017

xref to #13712

these I suppose should parse the same. but this is quite tricky. should these be given pytz or dateutil stamps?

we prob have a hardcoding somewhere.

also .to_datetime doesn't accept tz atm so it can only parse to UTC

@sinhrks sinhrks added the Timezones label Sep 27, 2017

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Sep 28, 2017

should these be given pytz or dateutil stamps?

I'm partial to dateutil, but will be happy either way.

we prob have a hardcoding somewhere.

It looks like tslib.array_to_datetime and tslib.convert_str_to_tsobject both call _string_to_dts but do different post-processing. Need to do some tinkering, but I suspect that unifying these will fix this.

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Sep 28, 2017

I agree it would be nice they give the same result, but I am not sure the fixed offset behaviour of Timestamp is that useful.

This is also not only the behaviour of to_datetime on scalar input, but also for parsing sequences of strings (and thus also of read_csv, etc). Changing this may break a lot of people's code.

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Sep 28, 2017

but I am not sure the fixed offset behaviour of Timestamp is that useful.

Do you mean you would prefer something other than FixedOffset, or that you prefer the tz-naive output?

This is also not only the behaviour of to_datetime on scalar input, but also for parsing sequences of strings (and thus also of read_csv, etc). Changing this may break a lot of people's code.

Fair point. How about a proof of concept that unifies the implementations and we'll see if that causes any test failures, then reevaluate.

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Sep 30, 2017

This came up when I was looking at tests.scalar.test_timestamp. A handful of tests use e.g. stamp = date_range(...)[0] instead of just stamp = Timestamp(...). In these particular cases they turn out to be equivalent. Any objection to

a) changing these existing tests to use the Timestamp constructor and
b) adding TestEquivalency tests to assert that date_range(...)[0] == Timestamp(...) in the relevant cases?

The idea here is that tslib/tslibs would be easier to test/maintain if the tests.scalar tests were self-contained.

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Oct 1, 2017

Any objection to changing these existing tests to use the Timestamp constructor and

As long as there are other tests for date_range that checks those things (eg the different types of timezones), then I have no objection

adding TestEquivalency tests to assert that date_range(...)[0] == Timestamp(...) in the relevant cases?

do you mean to_datetime instead of date_range ? (as that was the original issue)

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Oct 1, 2017

do you mean to_datetime instead of date_range ? (as that was the original issue)

In this case the tests I'm referring to specifically use date_range. Under the hood date_range calls DatetimeIndex which calls to_datetime, so for the purposes of scalar-izing tests, I think of them as being in the same bucket.

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Oct 30, 2017

The more I think about it, the more unsettling the behavior of to_datetime is.

>>> pd.to_datetime('2017-10-30 09:53:15-08:00')
Timestamp('2017-10-30 17:53:15')

strikes me as distinctly less correct than returning a tz-aware Timestamp. Is it just me?

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Oct 30, 2017

You are correct, but still the question is: what to return instead and how much user code will this break?

Do you mean you would prefer something other than FixedOffset, or that you prefer the tz-naive output?

I don't really know, I only know that often I don't want a pytz.FixedOffset.
Maybe returning a tz-aware datetime, but localized to UTC would make more sense as opposed to the current UTC but tz-naive result. Although that is something you can already obtain with the utc keyword:

In [11]: pd.to_datetime('2017-10-30 09:53:15-08:00')
Out[11]: Timestamp('2017-10-30 17:53:15')

In [12]: pd.to_datetime('2017-10-30 09:53:15-08:00', utc=True)
Out[12]: Timestamp('2017-10-30 17:53:15+0000', tz='UTC')
@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Oct 31, 2017

what to return instead [...]

Seems like 3 cases:
a) all tzaware with same tz --> DatetimeTZIndex (or whatever its called)
b) all naive --> DatetimeIndex
c) mixed --> object-dtype Index/Series with each entry correct

Implementing this will be a minor hassle, but I'm bothered enough by this behavior to volunteer.

[...] and how much user code will this break?

That I have no idea. How would we go about figuring this out?

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Nov 24, 2017

Two more mismatched cases: "today" and "now". to_datetime goes through the iso8601 parser, while Timestamp goes through the datetime.now/datetime.today calls:

>>> pd.Timestamp("now")
Timestamp('2017-11-24 08:15:35.366420')
>>> pd.to_datetime("now")
Timestamp('2017-11-24 16:15:39')
>>> pd.Timestamp("today")
Timestamp('2017-11-24 08:15:42.678363')
>>> pd.to_datetime("today")
Timestamp('2017-11-24 00:00:00')
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.