Series (but not DataFrame) combine_first() loses timezone information #21469

Closed
Liam3851 opened this issue Jun 13, 2018 · 4 comments

@Liam3851
Contributor

commented Jun 13, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd

dts1 = pd.date_range('20150101', '20150105', tz='UTC')
df1 = pd.DataFrame({'DATE': dts1})
dts2 = pd.date_range('20150103', '20150105', tz='UTC')
df2 = pd.DataFrame({'DATE': dts2})
df = df1.combine_first(df2)
df.DATE[0].tz  # returns <UTC>; the DataFrame case was fixed by #10567

ser = df1['DATE'].combine_first(df2['DATE'])
ser[0].tz  # returns None, should be <UTC> as above

Problem description

Calling Series.combine_first on two tz-localized datetime Series returns a non-localized Series.

#10567 handled the case of DataFrame.combine_first on DataFrames with tz-aware datetime columns. Oddly, the fix does not extend to Series. The behavior is the same under at least 0.19.2 and latest master, so it appears the Series case was never covered by #10567.
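A dtype check makes the problem easier to spot than inspecting a single element. The snippet below is a minimal sketch, not a fix: it reports whether the combined Series is still tz-aware and, assuming the original data really was UTC, re-attaches the timezone with pd.to_datetime(..., utc=True). The variable names lost_tz and restored are illustrative only.

```python
import pandas as pd

s1 = pd.Series(pd.date_range("20150101", "20150105", tz="UTC"), name="DATE")
s2 = pd.Series(pd.date_range("20150103", "20150105", tz="UTC"), name="DATE")

combined = s1.combine_first(s2)

# On affected versions the dtype is tz-naive; on versions with a fix it stays tz-aware.
lost_tz = not isinstance(combined.dtype, pd.DatetimeTZDtype)
print("dtype:", combined.dtype, "| tz lost:", lost_tz)

# Workaround sketch (assumption: the inputs were UTC, so naive values are UTC wall times).
# utc=True localizes naive data to UTC and leaves already-UTC data unchanged.
restored = pd.to_datetime(combined, utc=True)
print("restored dtype:", restored.dtype)
```

The workaround is only safe when both inputs share the same timezone; for other zones the result would need a further dt.tz_convert call.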

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: 576d5c6
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.0.dev0+103.g576d5c6b7
pytest: 3.6.0
pip: 10.0.1
setuptools: 39.2.0
Cython: 0.28.3
numpy: 1.14.2
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.6
IPython: 6.4.0
sphinx: 1.7.5
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.5
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.8
pymysql: 0.8.1
psycopg2: None
jinja2: 2.10
s3fs: 0.1.5
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None

@gfyoung

Member

commented Jun 13, 2018

I'm +1 for consistency. Investigation and PR are welcome!

@Liam3851

Contributor Author

commented Jun 14, 2018

Hmm, it looks like this might be the result of a more general issue with Series.where when the other argument is a Series: we lose type info. combine_first appears to delegate its implementation to where's internals.

dts1 = pd.date_range('20150101','20150105',tz='UTC')  
df1 = pd.DataFrame({'date':dts1})                     
dts2 = pd.date_range('20150103','20150107',tz='UTC')  
df2 = pd.DataFrame({'date':dts2})                     
df1.date.where(df1.date < df1.date[3], df2.date)

Out[42]:
0    2015-01-01 00:00:00+00:00
1    2015-01-02 00:00:00+00:00
2    2015-01-03 00:00:00+00:00
3          1420502400000000000
4          1420588800000000000
Name: date, dtype: object
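The two integers in the output above are the nanosecond-since-epoch values of the timestamps that should have been taken from the second Series. The sketch below decodes one of them and re-runs the where call directly on Series, reporting the result dtype and the Python types of the values so that it behaves sensibly on both affected and fixed pandas versions. The names decoded and value_types are illustrative.

```python
import pandas as pd

# The raw integer from the output above, interpreted as nanoseconds since the epoch (UTC).
decoded = pd.Timestamp(1420502400000000000, tz="UTC")
print(decoded)  # 2015-01-06 00:00:00+00:00

s1 = pd.Series(pd.date_range("20150101", "20150105", tz="UTC"), name="date")
s2 = pd.Series(pd.date_range("20150103", "20150107", tz="UTC"), name="date")

# Keep s1 where the condition holds, otherwise take the aligned value from s2.
result = s1.where(s1 < s1.iloc[3], s2)

# A tz-aware dtype with a single value type means the type info survived;
# an object dtype mixing Timestamp and int reproduces the report.
value_types = {type(v).__name__ for v in result}
print(result.dtype, value_types)
```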
@Liam3851

Contributor Author

commented Jun 14, 2018

Actually, it looks like the where issue might be only tangentially related. Series.combine_first refers to pd.core.common._where_compat, but despite the name, _where_compat is not referenced in where.

@mroeschke

Member

commented Jun 20, 2018

If I were to guess, this may be a problem in the where method defined in the SingleBlockManager:

def where(self, other, cond, align=True, errors='raise',

In general, the data is operated on as NumPy arrays, so tz information is discarded (and not appropriately considered when the data is re-merged).
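The loss described here can be seen without touching any pandas internals. The following is a small illustration under stated assumptions, not a description of what the block manager actually does: Series.values on a tz-aware Series yields a plain NumPy datetime64 array holding UTC times with no timezone, so any code that works on that array has to re-attach the timezone itself. The America/New_York zone and the names raw and roundtrip are chosen only for the example.

```python
import numpy as np
import pandas as pd

aware = pd.Series(pd.date_range("20150101", periods=3, tz="America/New_York"))

# NumPy has no tz-aware dtype: .values converts to UTC and drops the timezone.
raw = aware.values
print(raw.dtype, raw[0])

# Rebuilding a Series from the raw array requires restoring the tz explicitly:
# first mark the values as UTC, then convert back to the original zone.
roundtrip = (
    pd.Series(raw, index=aware.index)
    .dt.tz_localize("UTC")
    .dt.tz_convert(aware.dt.tz)
)
print(roundtrip.dtype)
```

If the tz_localize and tz_convert steps are skipped, the rebuilt Series is tz-naive, which matches the symptom reported for combine_first.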
