Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timezone dropped when dataframes are joined after resampling #13783

Closed
giogix2 opened this issue Jul 25, 2016 · 10 comments

Comments

Projects
None yet
3 participants
@giogix2
Copy link

commented Jul 25, 2016

I've already explained the problem link.
When I merge 2 dataframes, using join() or concat() (I didn't try with merge) and one of these has been previously resampled, averaging the values each 15 minutes, i loose the timezone

Code Sample, a copy-pastable example if possible

df1
2016-07-05 11:30:00+01:00    -100.81
2016-07-05 11:31:00+01:00    -99.34
2016-07-05 11:32:00+01:00    -95.09
..............
..............
2016-07-05 11:45:00+01:00    -83.62
2016-07-05 11:46:00+01:00    -1.57
2016-07-05 11:47:00+01:00    21.01
..............
df2['column1']
2016-07-05 11:30:00+01:00    -79,45
2016-07-05 11:45:00+01:00    -51.11
2016-07-05 12:00:00+01:00    -12.67
2016-07-05 12:15:00+01:00    15.21
..........
print df1.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:31:00+01:00',
               '2016-07-05 11:32:00+01:00', '2016-07-05 11:33:00+01:00',
                ...
               '2016-07-19 14:30:00+01:00', '2016-07-19 14:31:00+01:00'],
               dtype='datetime64[ns, Europe/London]', length=1358, freq=None)

print df2.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:45:00+01:00',
               '2016-07-05 12:00:00+01:00', '2016-07-05 12:15:00+01:00',
                ...
               '2016-07-19 14:30:00+01:00', '2016-07-19 14:45:00+01:00'],
               dtype='datetime64[ns, Europe/London]', length=1358, freq=None)

df1 = df1.resample('15Min').mean()

print df1.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:45:00+01:00',
           '2016-07-05 12:00:00+01:00', '2016-07-05 12:15:00+01:00',
            ...
           '2016-07-19 14:30:00+01:00'],
           dtype='datetime64[ns, Europe/London]', length=1358, freq='15T')

df = pd.concat([df1, df2], axis=1)

print df.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:45:00+01:00',
               '2016-07-05 12:00:00+01:00', '2016-07-05 12:15:00+01:00',
                ...
               '2016-07-19 14:30:00+01:00'],
               dtype='datetime64[ns, UTC]', length=1358, freq='15T')

Expected Output

print df.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:45:00+01:00',
               '2016-07-05 12:00:00+01:00', '2016-07-05 12:15:00+01:00',
                ...
               '2016-07-19 14:30:00+01:00'],
               dtype='datetime64[ns, Europe/London]', length=1358, freq='15T')

The output looses the timezone and it comes back to 'UTC'.
I think the problem is related to this issue, where the same problem comes out when aggregating.

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-28-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 24.0.1
Cython: None
numpy: 1.11.1
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: 4.2.1
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jul 25, 2016

pls show a copy-paste reproducible example, simple examples work just fine in 0.18.1 (and master).

In [4]: df1 = pd.DataFrame({'A' : [1,2,3]}, index=pd.date_range('20160101',periods=3,tz='Europe/London', freq='15T'))

In [5]: df2 = pd.DataFrame({'B' : [4,5,6]}, index=pd.date_range('20160102',periods=3,tz='Europe/London', freq='15T'))

In [6]: pd.concat([df1, df2], axis=1).index
Out[6]: DatetimeIndex(['2016-01-01 00:00:00+00:00', '2016-01-01 00:15:00+00:00', '2016-01-01 00:30:00+00:00', '2016-01-02 00:00:00+00:00', '2016-01-02 00:15:00+00:00', '2016-01-02 00:30:00+00:00'], dtype='datetime64[ns, Europe/London]', freq=None)

In [7]: pd.concat([df1, df2], axis=1)
Out[7]: 
                             A    B
2016-01-01 00:00:00+00:00  1.0  NaN
2016-01-01 00:15:00+00:00  2.0  NaN
2016-01-01 00:30:00+00:00  3.0  NaN
2016-01-02 00:00:00+00:00  NaN  4.0
2016-01-02 00:15:00+00:00  NaN  5.0
2016-01-02 00:30:00+00:00  NaN  6.0
@jreback

This comment has been minimized.

Copy link
Contributor

commented Jul 25, 2016

might be related to #7795 but only tangentially

@giogix2

This comment has been minimized.

Copy link
Author

commented Jul 25, 2016

Here's a copy paste reproducible example:

import pandas as pd

df1 = pd.DataFrame({'A' : [1,2,3,4,5,6,7,8,9,10]}, index=pd.date_range('20160101', periods=10, freq='1T'))
df2 = pd.DataFrame({'B' : [1,2,3,4,5]}, index=pd.date_range('20160101', periods=5, freq='2T'))

times = df1.index.tz_localize('UTC').tz_convert('Europe/London')
df1 = df1.set_index(times)
times = df2.index.tz_localize('UTC').tz_convert('Europe/London')
df2 = df2.set_index(times)

df1 = df1.resample('2T').mean()
df = df2.join(df1)
print df1.index

The output is:

DatetimeIndex(['2016-01-01 00:00:00+00:00', '2016-01-01 00:02:00+00:00',
               '2016-01-01 00:04:00+00:00', '2016-01-01 00:06:00+00:00',
               '2016-01-01 00:08:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='2T')

Expected Output

DatetimeIndex(['2016-01-01 00:00:00+00:00', '2016-01-01 00:02:00+00:00',
               '2016-01-01 00:04:00+00:00', '2016-01-01 00:06:00+00:00',
               '2016-01-01 00:08:00+00:00'],
              dtype='datetime64[ns, Europe/London]', freq='2T')

Should I directly edit the main post I wrote?

@giogix2

This comment has been minimized.

Copy link
Author

commented Jul 25, 2016

Anyway it looks to me exactly the same problem you've linked (#7795).

@sinhrks

This comment has been minimized.

Copy link
Member

commented Jul 25, 2016

This looks work on current master? Though tests are needed... (I'm going to cover it on #13660)

@sinhrks

This comment has been minimized.

Copy link
Member

commented Jul 25, 2016

ah example is incorrect. maybe:

df1 = df1.resample('2T').mean()
df = df2.join(df1)
print df.index
# DatetimeIndex(['2016-01-01 00:00:00+00:00', '2016-01-01 00:02:00+00:00',
#                '2016-01-01 00:04:00+00:00', '2016-01-01 00:06:00+00:00',
#                '2016-01-01 00:08:00+00:00'],
#               dtype='datetime64[ns, UTC]', freq='2T')

@sinhrks

This comment has been minimized.

Copy link
Member

commented Jul 25, 2016

This is not fixed #13660 and looks different from #7795. This is for join, not concat. Fixing the issue title.

@sinhrks sinhrks changed the title Timezone dropped when dataframes are concatenated after resampling Timezone dropped when dataframes are joined after resampling Jul 25, 2016

@giogix2

This comment has been minimized.

Copy link
Author

commented Jul 26, 2016

The point is, as I said in the main post, that this problem happens also with the concat().
Here's the code with the concat():

import pandas as pd

df1 = pd.DataFrame({'A' : [1,2,3,4,5,6,7,8,9,10]}, index=pd.date_range('20160101', periods=10, freq='1T'))
df2 = pd.DataFrame({'B' : [1,2,3,4,5]}, index=pd.date_range('20160101', periods=5, freq='2T'))

times = df1.index.tz_localize('UTC').tz_convert('Europe/London')
df1 = df1.set_index(times)
times = df2.index.tz_localize('UTC').tz_convert('Europe/London')
df2 = df2.set_index(times)

df1 = df1.resample('2T').mean()
df = pd.concat([df1, df2])
print df.index

The output is the same as with the join():

DatetimeIndex(['2016-01-01 00:00:00+00:00', '2016-01-01 00:02:00+00:00',
               '2016-01-01 00:04:00+00:00', '2016-01-01 00:06:00+00:00',
               '2016-01-01 00:08:00+00:00', '2016-01-01 00:00:00+00:00',
               '2016-01-01 00:02:00+00:00', '2016-01-01 00:04:00+00:00',
               '2016-01-01 00:06:00+00:00', '2016-01-01 00:08:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)

Should this be reported in another "issue"?

@sinhrks sinhrks removed the Can't Repro label Jul 26, 2016

@sinhrks

This comment has been minimized.

Copy link
Member

commented Jul 26, 2016

No new issue is needed, as concat is being fixed by #13660. Nice if you can test the branch.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Nov 29, 2017

I believe this issue is fixed, someone want to do a PR with a validation test?

@jreback jreback added this to the 0.22.0 milestone Nov 29, 2017

@jreback jreback modified the milestones: 0.23.0, Next Major Release Apr 14, 2018

@mroeschke mroeschke referenced this issue Jun 29, 2018

Merged

TST/CLN: Old timezone issues PT3 #21674

6 of 6 tasks complete

@jreback jreback modified the milestones: Next Major Release, 0.24.0 Jul 2, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.