ENH: Allow to reset tz from DatetimeIndex holding time representation #7812

Closed
sinhrks opened this issue Jul 20, 2014 · 20 comments · Fixed by #7852

Comments

@sinhrks
Member

sinhrks commented Jul 20, 2014

There seems to be no easy way to remove the tz from a tz-aware DatetimeIndex while keeping the displayed (local) timestamps. What is required is the inverse operation of tz_localize.

import pandas as pd

idx = pd.date_range('2014-01-01 09:00', '2014-01-01 20:00', freq='H', tz='Asia/Tokyo')
idx
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2014-01-01 09:00:00+09:00, ..., 2014-01-01 20:00:00+09:00]
# Length: 12, Freq: H, Timezone: Asia/Tokyo

# What I want to do is:
idx.tz_reset()
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2014-01-01 09:00:00, ..., 2014-01-01 20:00:00]
# Length: 12, Freq: H, Timezone: None

# This raises TypeError
idx.tz_localize(None)
# TypeError: Already tz-aware, use tz_convert to convert.

# This changes the timestamps to UTC
idx.tz_convert(None)
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2014-01-01 00:00:00, ..., 2014-01-01 11:00:00]
# Length: 12, Freq: H, Timezone: None

In my case, there are several globally distributed data sources, each recorded in its local time zone. I want to merge them and analyze them based on local subjective times (business hours, etc.).
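As a rough illustration of that use case, here is a minimal sketch. It assumes a pandas version in which tz_localize(None) is accepted on tz-aware data (the behavior requested in this issue); the two series and their values are made up for the example.

```python
import pandas as pd

# Two hypothetical data sources, each recorded in its own local time zone.
tokyo = pd.Series(
    [1, 2, 3],
    index=pd.date_range("2014-01-01 09:00", periods=3, freq="h", tz="Asia/Tokyo"),
)
new_york = pd.Series(
    [10, 20, 30],
    index=pd.date_range("2014-01-01 09:00", periods=3, freq="h", tz="America/New_York"),
)

# Drop the time zone but keep the wall-clock time, so that 09:00 in Tokyo
# lines up with 09:00 in New York.
tokyo_local = tokyo.tz_localize(None)
new_york_local = new_york.tz_localize(None)

# Both indexes are now tz-naive and identical, so the columns align row by row.
combined = pd.DataFrame({"tokyo": tokyo_local, "new_york": new_york_local})
print(combined)
```

After this step the rows are keyed by local business hour rather than by absolute instant, which is the kind of comparison described above.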

@sinhrks changed the title from "ENH: Add DatetimeIndex.tz_reset" to "ENH: Allow to reset tz from DatetimeIndex holding time representation" on Jul 20, 2014
@rockg
Contributor

rockg commented Jul 21, 2014

+1. What if we support idx.tzinfo=None and that operates appropriately on the underlying data?

@jreback
Contributor

jreback commented Jul 21, 2014

I agree with @rockg (maybe idx.tz = None though).

I don't think this is very common, so I don't think a full function is necessary (though I guess it can't hurt).

@jreback jreback added this to the 0.15.0 milestone Jul 21, 2014
@nehalecky
Contributor

+1 : I suspect this is more common than imagined. :)

@jreback
Contributor

jreback commented Jul 21, 2014

@nehalecky hahh!

ok, so method and setter?

how do you do this now?

DatetimeIndex(idx_with_tz.tz_convert('UTC'),tz=None) ?

@sinhrks
Member Author

sinhrks commented Jul 21, 2014

The initial idea could be generalized by adding a force option to tz_localize.

idx = pd.date_range('2014-01-01 09:00', '2014-01-01 20:00', freq='H', tz='Asia/Tokyo')
idx
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2014-01-01 09:00:00+09:00, ..., 2014-01-01 20:00:00+09:00]
# Length: 12, Freq: H, Timezone: Asia/Tokyo

# TypeError by default
idx.tz_localize(None)
# TypeError: Already tz-aware, use tz_convert to convert.

idx.tz_localize(None, force=True)
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2014-01-01 09:00:00, ..., 2014-01-01 20:00:00]
# Length: 12, Freq: H, Timezone: None

# can be used to set different timezones with same representation
idx.tz_localize('US/Eastern', force=True)
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2014-01-01 09:00:00-05:00, ..., 2014-01-01 20:00:00-05:00]
# Length: 12, Freq: H, Timezone: US/Eastern
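Note that force=True is only a proposal at this point in the thread, not an existing keyword. As a sketch of how the same result might be reached without it, assuming a pandas version where tz_localize(None) is accepted on a tz-aware index, the re-localization can be written as two steps:

```python
import pandas as pd

idx = pd.date_range("2014-01-01 09:00", "2014-01-01 20:00", freq="h", tz="Asia/Tokyo")

# Step 1: drop the time zone, keeping the wall-clock times (09:00 ... 20:00).
naive = idx.tz_localize(None)

# Step 2: attach a different time zone to the same wall-clock times.
relocalized = naive.tz_localize("US/Eastern")

print(relocalized[0])  # wall-clock 09:00, now interpreted as US/Eastern
```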

@jreback
Contributor

jreback commented Jul 21, 2014

what about just allowing None as a valid argument to tz_localize (and/or tz_convert)?

though this may be more confusing

@rockg
Contributor

rockg commented Jul 21, 2014

I think None is a great idea and I don't think it's too confusing.

@jreback
Contributor

jreback commented Jul 21, 2014

should this work on only one of tz_convert/tz_localize?

side issue: do we support setting a tz directly? (which would convert/localize as appropriate)

@rockg
Contributor

rockg commented Jul 21, 2014

I was going to say just tz_localize before but it probably does make more sense with tz_convert (converting a tz to a non-tz). I think supporting both is okay. I don't think we support directly setting tz except through tz_localize.

@jreback
Contributor

jreback commented Jul 21, 2014

ok, so:

  • need to figure out in which cases the currently raised error is actually useful (e.g. wesm must have put it there for some reason, maybe just to remind the user)?
  • raise NotImplementedError on directly setting tz if we are not going to allow it (I actually think it's ok, maybe just a call to tz_convert(tz))

@nehalecky
Contributor

@jreback, thanks, and yeah, tz-naive DatetimeIndexes allow comparing time series that follow the typical time-of-day driven patterns that humans (and other phenomena such as weather) tend to follow. I work with a lot of this type of data, so it would be great to see pandas allow flexibility when performing comparisons across multi-tz datasets. :)

At first, I brute-forced stripping the tz info from the underlying timestamps via .apply; then I got smart and saw that simply recreating the index (like you've suggested) was more performant. I've actually found this to be the case when wanting to modify other pandas objects as well.

+1 on the .tz_convert(None), I actually think that is what I tried to call the first time I was attempting to do this.

@sinhrks
Member Author

sinhrks commented Jul 24, 2014

Hmm, both tz_localize and tz_convert "convert" something, so the actual behavior should be considered.

  • tz_localize "converts" the timezone while keeping the time representation (it holds the subjective time and changes the absolute time).
  • tz_convert "converts" the timezone while keeping the UTC time (it holds the absolute time and changes the subjective time).

The required operation is "hold the subjective time and reset the timezone", so I feel tz_localize(None) is more natural. If we used tz_convert(None) for this operation, tz_convert would be inconsistent: it would sometimes convert the subjective time and at other times the absolute time.
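The distinction between the two methods can be sketched on a single Timestamp. This is an illustrative example, assuming a pandas version where tz_localize(None) is accepted on tz-aware objects; the variable names are made up for the example.

```python
import pandas as pd

ts = pd.Timestamp("2014-01-01 09:00", tz="Asia/Tokyo")

# tz_convert keeps the absolute instant and changes the wall-clock reading.
converted = ts.tz_convert("US/Eastern")
same_instant = converted == ts  # compares the underlying UTC instants

# Dropping and re-attaching a time zone via tz_localize keeps the wall-clock
# reading and changes the absolute instant.
relocalized = ts.tz_localize(None).tz_localize("US/Eastern")
shift = relocalized - ts  # how far the absolute instant moved

print(converted, same_instant)
print(relocalized, shift)
```

Here converted still refers to the same moment as ts but reads as the previous evening in US/Eastern, while relocalized still reads 09:00 but refers to a moment 14 hours later.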

@jreback
Contributor

jreback commented Jul 24, 2014

@rockg ?

@jreback
Contributor

jreback commented Jul 24, 2014

my2c:

I think we should do just tz_localize. It's an explicit, user-initiated operation saying: hey, I have an index/timestamp with a tz and I want to remove the tz. And it's the inverse operation of tz_localize(tz), which takes a naive object and gives it a timezone.
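The "inverse operation" reading can be checked with a short round trip. This is a sketch only, assuming a pandas version in which tz_localize(None) on a tz-aware index is supported:

```python
import pandas as pd

naive = pd.date_range("2014-01-01 09:00", periods=12, freq="h")

# Naive -> aware: attach a time zone to the wall-clock times.
aware = naive.tz_localize("Asia/Tokyo")

# Aware -> naive: remove the time zone, keeping the wall-clock times.
restored = aware.tz_localize(None)

print(restored.equals(naive))
```

One caveat: going from naive back to aware is not always unambiguous. Around DST transitions a wall-clock time can occur twice or not at all, and tz_localize(tz) raises by default in those cases.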

@rockg
Contributor

rockg commented Jul 24, 2014

I'm fine with just tz_localize

@jreback
Contributor

jreback commented Jul 28, 2014

in #7852

tz_convert(None) will drop the timezone ONLY if it's UTC.

is this confusing / a problem / ok? (I guess as opposed to just removing it no matter what tz it is)

@sinhrks
Member Author

sinhrks commented Jul 28, 2014

Ah, my explanation gave the wrong impression. What I meant is that tz_convert(None) works for ALL timezones: it removes the timezone after converting to UTC time.

ts = pd.Timestamp('2014-08-01', tz='US/Eastern')
ts
# 2014-08-01 00:00:00-04:00

ts.tz_convert(None)
# 2014-08-01 04:00:00

ts.tz_localize(None)
# 2014-08-01 00:00:00

@jreback
Contributor

jreback commented Jul 28, 2014

ok, maybe show this example to make this very clear (I haven't looked at the doc updates yet).

@nehalecky
Contributor

Nice work @sinhrks. Quick thought: the examples could convey the functionality more clearly if, in addition to the examples above, they included a sample of tz-aware timestamps that weren't exact UTC hour offsets, e.g.,

In [25]: t = pd.Timestamp('2014-01-01 10:00', tz='Asia/Tokyo')

In [26]: t.tz_convert(None)
Out[26]: Timestamp('2014-01-01 01:00:00')

In [27]: t.tz_localize(None)
Out[27]: Timestamp('2014-01-01 10:00:00')

Other than that, I think this looks grand. :)

@sinhrks
Member Author

sinhrks commented Jul 29, 2014

Thanks, that looks clearer. The docs in #7852 use a US timezone and hours that differ from the UTC offset, so there should be no problem.
