BUG: DataFrame with tz-aware data and max(axis=1) returns NaN #10390
Hello! Hijacking this issue as I've also verified this behaviour (it took a while to discover after upgrading to 0.19.0 and finding some odd dropping of timezones; see #14524, which is a duplicate of #13905). This behaviour was previously masked in my program because pandas 0.18.1 was dropping the timezones from all relevant columns before I performed this step. After upgrading to 0.19.0, half the operations I was performing stopped dropping timezones, leading to a mismatch between tz-aware and tz-naive timestamps, which I've been chasing down the rabbit hole for a couple of days now.
I've verified that this is present in pandas 0.18.1 and 0.19.0.
From some stepping through of the code, this looks like a potential problem with the numpy implementations of
This issue has meant that I've been forced to roll back to 0.18.1 to use the drop timezone bug in order to make the
A small, complete example of the issue
import pandas as pd

df = pd.DataFrame(pd.date_range(start=pd.Timestamp('2016-01-01 00:00:00+00'), end=pd.Timestamp('2016-01-01 23:59:59+00'), freq='H'))
df.columns = ['a']

# if using pandas 0.19.0 to test, ensure that this is a series of timedeltas instead of a single - we want b and c to be tz-naive.
df['b'] = df.a.subtract(pd.Timedelta(seconds=60*60))

df[['a', 'b']].max()  # This is fine, produces two numbers
df[['a', 'b']].max(axis=1)  # This is not fine, produces a correctly sized series of NaN

# if using pandas 0.19.0 to test, ensure that this is a series of timedeltas instead of a single - we want b and c to be tz-naive.
df['c'] = df.a.subtract(pd.Timedelta(seconds=60))

df[['b', 'c']].max(axis=1)  # This is fine, produces correctly sized series of valid timestamps without timezone
df[['a', 'b']].T.max()  # produces an empty series.
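
For anyone hitting the same tz-aware/tz-naive mix, here is a minimal sketch of one possible workaround: convert every tz-aware column to tz-naive UTC before taking the row-wise max. This is not the fix adopted by pandas, and the helper name `rowwise_max_naive` is made up for illustration. It assumes a pandas version that exposes `pd.DatetimeTZDtype` at the top level, and that treating the naive columns as UTC wall-clock times is acceptable for your data.

```python
import pandas as pd


def rowwise_max_naive(frame):
    """Row-wise max after converting tz-aware columns to tz-naive UTC.

    Hypothetical helper, not part of pandas.
    """
    converted = {}
    for name in frame.columns:
        col = frame[name]
        if isinstance(col.dtype, pd.DatetimeTZDtype):
            # Normalise to UTC, then drop the timezone so every column
            # shares the same tz-naive datetime dtype.
            col = col.dt.tz_convert("UTC").dt.tz_localize(None)
        converted[name] = col
    return pd.DataFrame(converted).max(axis=1)


# 'a' is tz-aware (UTC), 'b' is tz-naive.
aware = pd.Series(pd.to_datetime(
    ["2016-01-01 00:00", "2016-01-01 01:00", "2016-01-01 02:00"], utc=True))
naive = pd.Series(pd.to_datetime(
    ["2016-01-01 00:30", "2016-01-01 00:30", "2016-01-01 02:30"]))
df = pd.DataFrame({"a": aware, "b": naive})

result = rowwise_max_naive(df)
print(result)
```

The result is a tz-naive series, so the timezone has to be re-attached afterwards (for example with `result.dt.tz_localize("UTC")`) if downstream code expects tz-aware values.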