DataFrame() returns an empty DataFrame, if DatetimeIndex with timezone-info and column label are passed #19157

Closed
JQGoh opened this Issue Jan 9, 2018 · 2 comments

@JQGoh
Contributor

JQGoh commented Jan 9, 2018

Problem description

Passing a DatetimeIndex to DataFrame() behaves differently depending on whether the DatetimeIndex is timezone-naive or timezone-aware. The problem only occurs when the columns argument is also provided.

Input of Example 1: DatetimeIndex with timezone-info.
import pandas as pd
start_dt2 = pd.to_datetime('20180101T10:00:00')
start_dt2 = start_dt2.tz_localize('Asia/Singapore')
end_dt2 = pd.to_datetime('20180101T10:03:00')
end_dt2 = end_dt2.tz_localize('Asia/Singapore')
date_range2 = pd.date_range(start_dt2, end_dt2, freq='T')
print(date_range2, '\n')

df2 = pd.DataFrame(date_range2, columns=['timestamps'])
print(df2)
Output of Example 1: an empty DataFrame.
DatetimeIndex(['2018-01-01 10:00:00+08:00', '2018-01-01 10:01:00+08:00',
               '2018-01-01 10:02:00+08:00', '2018-01-01 10:03:00+08:00'],
              dtype='datetime64[ns, Asia/Singapore]', freq='T') 

Empty DataFrame
Columns: [timestamps]
Index: [] 
Input of Example 2: DatetimeIndex is timezone-naive.
start_dt = pd.to_datetime('20180101T10:00:00')
end_dt = pd.to_datetime('20180101T10:03:00')
date_range1 = pd.date_range(start_dt, end_dt, freq='T')
print(date_range1, '\n')

df1 = pd.DataFrame(date_range1, columns=['timestamps'])
print(df1)
Output of Example 2: DataFrame returned.
DatetimeIndex(['2018-01-01 10:00:00', '2018-01-01 10:01:00',
               '2018-01-01 10:02:00', '2018-01-01 10:03:00'],
              dtype='datetime64[ns]', freq='T') 

           timestamps
0 2018-01-01 10:00:00
1 2018-01-01 10:01:00
2 2018-01-01 10:02:00
3 2018-01-01 10:03:00 

Both examples above pass the argument columns=['timestamps']. However, if the timezone-aware DatetimeIndex is instead passed as a value in a dictionary, DataFrame() returns a non-empty DataFrame, just as in Example 2.

Input of Example 3: DatetimeIndex with timezone-info, passed as part of a dictionary.
start_dt3 = pd.to_datetime('20180101T10:00:00')
start_dt3 = start_dt3.tz_localize('Asia/Singapore')
end_dt3 = pd.to_datetime('20180101T10:03:00')
end_dt3 = end_dt3.tz_localize('Asia/Singapore')
date_range3 = pd.date_range(start_dt3, end_dt3, freq='T')
print(date_range3, '\n')

df3 = pd.DataFrame({"timestamps": date_range3})
print(df3)
Output of Example 3: DataFrame returned.
DatetimeIndex(['2018-01-01 10:00:00+08:00', '2018-01-01 10:01:00+08:00',
               '2018-01-01 10:02:00+08:00', '2018-01-01 10:03:00+08:00'],
              dtype='datetime64[ns, Asia/Singapore]', freq='T') 

                 timestamps
0 2018-01-01 10:00:00+08:00
1 2018-01-01 10:01:00+08:00
2 2018-01-01 10:02:00+08:00
3 2018-01-01 10:03:00+08:00

Question

Shouldn't Example 1 return a non-empty DataFrame that matches the output of Example 3? I would expect the output to be consistent regardless of whether the DatetimeIndex is timezone-aware or timezone-naive. Thank you.
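Until the constructor behaves consistently, Example 3 suggests a possible workaround: avoid passing the timezone-aware index positionally together with columns. The sketch below is only an illustration under that assumption. It uses freq='min' (the same one-minute frequency as 'T'), and the Series-based variant is an additional assumption that is not part of the examples above.

```python
import pandas as pd

# Same four timezone-aware timestamps as in Example 1.
idx = pd.date_range('2018-01-01 10:00:00', periods=4, freq='min',
                    tz='Asia/Singapore')

# Workaround 1 (taken from Example 3): pass the index as a dict value.
df_dict = pd.DataFrame({'timestamps': idx})

# Workaround 2 (an assumption, not from the report): wrap the index in a
# named Series first, then convert that Series to a one-column frame.
df_series = pd.Series(idx, name='timestamps').to_frame()

print(df_dict)
print(df_series)
```

Both variants name the column explicitly, so neither relies on the columns argument that triggers the empty result in Example 1.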

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.9.5
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1
None

@chris-b1


Contributor

chris-b1 commented Jan 9, 2018

Yep, this looks buggy - #13407 may be related. PR would be welcome! I'd suggest walking through the construction path, starting around here:

elif isinstance(data, (np.ndarray, Series, Index)):
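When walking through that path, a small consistency check may help confirm whether a change fixes the report. The following is a hedged sketch, not code from pandas itself: constructors_agree is a hypothetical helper name, and it only compares the positional-plus-columns path against the dict path from the examples above.

```python
import pandas as pd


def constructors_agree(index, label='timestamps'):
    """Return True if both construction paths produce equal frames.

    Hypothetical helper for illustration; not part of the pandas API.
    """
    positional = pd.DataFrame(index, columns=[label])  # path from Examples 1 and 2
    from_dict = pd.DataFrame({label: index})           # path from Example 3
    return positional.equals(from_dict)


naive = pd.date_range('2018-01-01 10:00:00', periods=4, freq='min')
aware = naive.tz_localize('Asia/Singapore')

print('naive:', constructors_agree(naive))
print('aware:', constructors_agree(aware))
```

On an installation showing the behavior reported here, the timezone-aware case would be expected to print False, because the positional path yields an empty frame.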

@chris-b1 chris-b1 added this to the Next Major Release milestone Jan 9, 2018

@JQGoh


Contributor

JQGoh commented Jan 21, 2018

@chris-b1 Could you help review the changes? I am new to contributing to pandas and would appreciate your feedback on my approach.

@jreback jreback modified the milestones: Next Major Release, 0.23.0 Jan 21, 2018
