Starting with pandas 0.17, certain assignments to DataFrames cause offset-aware datetime columns to be converted to offset-naive columns. Specifically, it seems that if any data realignment is required when assigning the RHS to a slice of the DataFrame, the timezone info is lost. Here's an example:
from __future__ import print_function
import pandas
print("Pandas version:", pandas.__version__)
start = pandas.Timestamp('2015-01-01', tz='utc')
df = pandas.DataFrame({'dates': pandas.date_range(start, periods=3)})
print("Before assignment")
print(df['dates'])
# Shuffle the column and reassign it, so the RHS must be realigned on assignment
df['dates'] = df['dates'][[1, 0, 2]]
print("\nAfter assignment")
print(df['dates'])
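The reproduction above can be turned into a small regression check. This is a sketch, assuming a recent pandas release in which the bug is fixed; it uses pd.DatetimeTZDtype, the public name current pandas gives the timezone-aware dtype, and the variable name expected is illustrative only.

```python
import pandas as pd

start = pd.Timestamp('2015-01-01', tz='utc')
df = pd.DataFrame({'dates': pd.date_range(start, periods=3)})
expected = df['dates'].copy()

# Shuffle the column; assignment realigns it back onto df.index by label
df['dates'] = df['dates'][[1, 0, 2]]

# The timezone-aware dtype should survive the realigning assignment...
assert isinstance(df['dates'].dtype, pd.DatetimeTZDtype)
assert str(df['dates'].dt.tz) == 'UTC'
# ...and label alignment should restore the original order of the values
assert df['dates'].equals(expected)
print("dtype after assignment:", df['dates'].dtype)
```

On an affected version (0.17.x/0.18.0, per this report) the first assertion would fail, because the column comes back as a naive datetime64 column.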
The output I'd expect, which is what I get from pandas 0.16.2, is:
It seems the custom timezone-aware dtype (datetime64[ns, tz]) that pandas started using for timezone-aware time series in 0.17.x isn't propagated correctly by this operation.
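To show what "custom dtype" means here, the sketch below compares a tz-aware column with a naive one. It is written against current pandas, where the extension dtype is exposed publicly as pd.DatetimeTZDtype. That public name may differ from what the 0.17-era internals used.

```python
import pandas as pd

aware = pd.Series(pd.date_range('2015-01-01', periods=3, tz='UTC'))
naive = pd.Series(pd.date_range('2015-01-01', periods=3))

# A tz-aware column uses pandas' extension dtype, which carries the timezone
assert isinstance(aware.dtype, pd.DatetimeTZDtype)
print(aware.dtype, aware.dtype.tz)

# A naive column uses a plain NumPy datetime64 dtype, which has no tz slot
assert not isinstance(naive.dtype, pd.DatetimeTZDtype)
assert naive.dtype.kind == 'M'
print(naive.dtype)
```

Because NumPy's datetime64 has nowhere to store a timezone, any code path that falls back to a raw NumPy array drops it.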
After a little digging, I believe I've found the fix. In DataFrame._sanitize_column there is a statement that accesses the values property where it should access _values. This statement:
value = value.reindex(self.index).values
should be
value = value.reindex(self.index)._values
The values property returns a NumPy array, which loses the custom dtype, whereas _values returns a DatetimeIndex, which preserves it. I'll submit a PR.
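The difference between the two accessors can be sketched as follows. This is an approximation, not the code from DataFrame._sanitize_column. It uses the public Series.array accessor from later pandas releases in place of the private _values, assuming both return the same tz-aware backing array in current pandas. The names via_values and via_array are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.date_range('2015-01-01', periods=3, tz='UTC'))
shuffled = s[[1, 0, 2]]

# Mimic the realignment step: reindex the RHS onto the target index
realigned = shuffled.reindex(s.index)

# .values: a plain NumPy datetime64 array (converted to UTC), tz dropped
via_values = realigned.values
# .array: the backing extension array, which keeps the tz-aware dtype
via_array = realigned.array

assert isinstance(via_values, np.ndarray)
assert not isinstance(via_values.dtype, pd.DatetimeTZDtype)
assert via_array.dtype == s.dtype
print(type(via_values).__name__, via_values.dtype)
print(type(via_array).__name__, via_array.dtype)
```

Storing via_values in the frame would yield a naive column, which matches the symptom in this report. Storing via_array keeps the timezone.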
For reference, the timezone info is also lost after the assignment when running this with pandas 0.18.0.