Timezone lost on DataFrame assignments with realignment #12981

Closed
ajenkins-cargometrics opened this issue Apr 25, 2016 · 3 comments

@ajenkins-cargometrics
Contributor

commented Apr 25, 2016

Starting from pandas 0.17, certain assignments to DataFrames cause offset-aware datetime columns to be converted to offset-naive columns. Specifically, it seems that if any data realignment is required when assigning the RHS to a slice of the DataFrame, the timezone info is lost. Here's an example:

from __future__ import print_function
import pandas

print("Pandas version:", pandas.__version__)

start = pandas.Timestamp('2015-01-01', tz='utc')
df = pandas.DataFrame({'dates': pandas.date_range(start, periods=3)})

print("Before assignment")
print(df['dates'])

# Shuffle column and reassign, causing RHS to need to be realigned on assignment
df['dates'] = df['dates'][[1,0,2]]

print("\nAfter assignment")
print(df['dates'])

The output I'd expect, which is what I get from pandas 0.16.2, is:

Pandas version: 0.16.2
Before assignment
0    2015-01-01 00:00:00+00:00
1    2015-01-02 00:00:00+00:00
2    2015-01-03 00:00:00+00:00
Name: dates, dtype: object

After assignment
0    2015-01-01 00:00:00+00:00
1    2015-01-02 00:00:00+00:00
2    2015-01-03 00:00:00+00:00
Name: dates, dtype: object

However, when I run this with pandas 0.18.0, the timezone info is lost after the assignment:

Pandas version: 0.18.0
Before assignment
0   2015-01-01 00:00:00+00:00
1   2015-01-02 00:00:00+00:00
2   2015-01-03 00:00:00+00:00
Name: dates, dtype: datetime64[ns, UTC]

After assignment
0   2015-01-01
1   2015-01-02
2   2015-01-03
Name: dates, dtype: datetime64[ns]

It seems the custom timezone-aware dtype that pandas started using for timezone-aware time series in 0.17.x doesn't get correctly propagated in this operation.
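For anyone stuck on an affected release, here is a minimal workaround sketch. The helper name `assign_preserving_tz` is hypothetical and not part of pandas. It assumes that the naive values left behind after the assignment are UTC wall times, and that an already-aligned assignment does not trigger the problem; I have not verified either assumption on 0.17.x or 0.18.0. On versions where the dtype survives, the helper does nothing extra.

```python
import pandas as pd


def assign_preserving_tz(df, column, values):
    """Assign `values` to df[column]; re-attach the timezone if it was dropped.

    Hypothetical helper for illustration, not part of pandas.
    """
    tz = getattr(values.dtype, "tz", None)  # None for naive or non-datetime data
    df[column] = values  # may require realignment against df.index
    if tz is not None and getattr(df[column].dtype, "tz", None) is None:
        # Assumption: the naive values left behind are UTC wall times.
        df[column] = df[column].dt.tz_localize("UTC").dt.tz_convert(tz)
    return df


start = pd.Timestamp("2015-01-01", tz="UTC")
df = pd.DataFrame({"dates": pd.date_range(start, periods=3)})
expected = df["dates"].copy()

# Reordered RHS, so the assignment has to realign on the index.
df = assign_preserving_tz(df, "dates", df["dates"].loc[[1, 0, 2]])
print(df["dates"].dtype)
```

The check compares the dtype after the assignment rather than the pandas version number, so the same code can stay in place after upgrading.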

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.9.0
Cython: None
numpy: 1.11.0
scipy: 0.15.1
statsmodels: None
xarray: None
IPython: 3.1.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.3
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.4.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: None
boto: None
@jreback

Contributor

commented Apr 25, 2016

yep looks like a bug. pull-requests are welcome!

@ajenkins-cargometrics

Contributor Author

commented Apr 25, 2016

After a little digging, I believe I've found the fix. In DataFrame._sanitize_column, there is a statement that accesses the values property where it should access _values. This statement:

value = value.reindex(self.index).values

should be

value = value.reindex(self.index)._values

The values property returns a numpy array, which loses the custom dtype, whereas _values returns a DatetimeIndex, which preserves the dtype. I'll submit a PR.
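The difference between the two accessors can be seen on a standalone Series. This is only a sketch: `_values` is a private attribute, and the type it returns has changed across pandas versions (recent releases return a datetime array rather than an index), so the example checks the dtype rather than the container type.

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.date_range("2015-01-01", periods=3, tz="UTC"), name="dates")

plain = s.values      # public accessor: a NumPy datetime64 array, no timezone
internal = s._values  # private accessor: keeps the timezone-aware dtype

print(type(plain).__name__, plain.dtype)
print(type(internal).__name__, internal.dtype)

# Rebuilding a Series from each shows what an assignment would end up storing.
print(pd.Series(plain).dtype)
print(pd.Series(internal).dtype)
```

Because a NumPy array has no way to carry the timezone, anything built from `.values` comes back naive, which matches the symptom reported above.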

@jreback

Contributor

commented Apr 25, 2016

jreback modified the milestones: 0.18.1, 0.18.2 on Apr 25, 2016

jreback closed this in cc67b72 on Apr 26, 2016

nps added a commit to nps/pandas that referenced this issue May 17, 2016

BUG: Preserve timezone in unaligned assignments
closes pandas-dev#12981

Author: ajenkins-cargometrics <ajenkins@cargometrics.com>

Closes pandas-dev#12982 from ajenkins-cargometrics/GH12981 and squashes the following commits:

6689f57 [ajenkins-cargometrics] TST: Add test with mask on LHS for test_setitem_with_unaligned_tz_aware_datetime_column
1347398 [ajenkins-cargometrics] BUG: Preserve timezone in unaligned assignments
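
The two cases named in the commits above (plain column assignment, and assignment through a mask on the LHS) can be checked with a short regression sketch. This is written in the spirit of the test named in the commit message, not copied from the merged PR; the function name `check_unaligned_tz_aware_assignment` is made up for this example.

```python
import pandas as pd
from pandas.testing import assert_series_equal


def check_unaligned_tz_aware_assignment():
    """Raise AssertionError if a realigned assignment drops the timezone."""
    column = pd.Series(
        pd.date_range("2015-01-01", periods=3, tz="UTC"), name="dates"
    )
    shuffled = column.loc[[1, 0, 2]]  # same values, index order differs

    # Case 1: assign the whole column; pandas realigns on the index.
    df = pd.DataFrame({"dates": column})
    df["dates"] = shuffled
    assert_series_equal(df["dates"], column)

    # Case 2: assign through a row selection on the left-hand side.
    df = pd.DataFrame({"dates": column})
    df.loc[[0, 1, 2], "dates"] = shuffled
    assert_series_equal(df["dates"], column)
    return df


result = check_unaligned_tz_aware_assignment()
print(result["dates"].dtype)
```

`assert_series_equal` compares dtypes as well as values, so a column that silently became naive would fail the check.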