DataFrame.values not a 2D-array when constructed from timezone-aware datetimes #13407

aburgm opened this Issue Jun 9, 2016 · 4 comments


commented Jun 9, 2016

When a DataFrame column is constructed from timezone-aware datetime objects, its values attribute returns a pandas.DatetimeIndex instead of a 2D numpy array. This is problematic because the datetime index does not support all operations that a numpy array does.

Code Sample, a copy-pastable example if possible

import datetime
import dateutil
import pandas
import numpy as np
df = pandas.DataFrame()
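df['Time'] = [datetime.datetime(2015,1,1,tzinfo=dateutil.tz.tzutc())]  # tz-aware UTC value; the tzinfo argument is assumed, as the line was truncated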
df.dropna(axis=0) # raises ValueError: 'axis' entry is out of bounds

Also, print df.values returns DatetimeIndex(['2015-01-01'], dtype='datetime64[ns, UTC]', freq=None).

Expected Output

The df.dropna call should be a no-op.

Compare this to the case when constructed using df['Time'] = [datetime.datetime(2015,1,1)]. In that case, df.dropna works as expected, and df.values is array([['2014-12-31T16:00:00.000000000-0800']], dtype='datetime64[ns]').
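For reference, a minimal sketch of the expected behavior described above, assuming a pandas release in which this issue is fixed (the thread later milestones it for 0.23.0). The use of `datetime.timezone.utc` is an assumption standing in for whatever tz info the report used, and `check_values_shape` is a hypothetical helper, not part of pandas.

```python
import datetime

import numpy as np
import pandas as pd


def check_values_shape(df):
    """Return (is_ndarray, ndim) for df.values (hypothetical helper)."""
    values = df.values
    return isinstance(values, np.ndarray), values.ndim


# Assumption: a single column holding one timezone-aware (UTC) datetime.
df = pd.DataFrame(
    {"Time": [datetime.datetime(2015, 1, 1, tzinfo=datetime.timezone.utc)]}
)

is_ndarray, ndim = check_values_shape(df)

# With no missing values present, dropna should leave the frame unchanged.
dropped = df.dropna(axis=0)
```

On an affected version (such as the 0.18.1 reported here), `df.values` would instead be a 1-D `DatetimeIndex` and the `dropna` call would raise.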

output of pd.show_versions()

commit: None
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: None
pip: 8.0.2
setuptools: 20.1.1
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.1
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

commented Jun 9, 2016

This must be shortcutting on the 1-d case in lcd_types. If you have an additional column it will work.

In [10]: df['foo'] = 'bar'

In [11]: df.values
Out[11]: array([[Timestamp('2015-01-01 00:00:00+0000', tz='UTC'), 'bar']], dtype=object)

Note that using .values is pretty inefficient, and numpy does not support these extended dtypes.
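A small sketch of the mixed-column case shown above, assuming a UTC-aware value built with `datetime.timezone.utc` (an assumption, not taken from the report). With a second, non-datetime column, `.values` has to fall back to a 2-D object array.

```python
import datetime

import pandas as pd

# Assumption: one tz-aware datetime column plus one string column.
df = pd.DataFrame(
    {"Time": [datetime.datetime(2015, 1, 1, tzinfo=datetime.timezone.utc)]}
)
df["foo"] = "bar"

values = df.values

# Mixed dtypes are combined into a single object-dtype array,
# with one row and two columns.
shape = values.shape
dtype = values.dtype
```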


commented Jun 9, 2016

This statement is nonsensical, as numpy barely understands timezones. And an Index API is a superset of 1-d numpy operations.

This is problematic because the datetime index does not support all operations that a numpy array does.

@jreback jreback added this to the Next Major Release milestone Jun 9, 2016


commented Jun 9, 2016

Agreed; I did not phrase that properly. What I meant is that some operations on a pandas dataframe, such as dropna along the 0-axis, fail, apparently because the index is a 1-D structure where a 2-D structure was expected.
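To illustrate the kind of failure described here, a generic numpy sketch (not the actual pandas code path, which this thread does not show): reducing along axis 1 works on a 2-D array but fails on a 1-D one. `reduce_along_axis_1` is a hypothetical helper.

```python
import numpy as np


def reduce_along_axis_1(arr):
    """Sum along axis 1, returning the exception name if the axis is invalid."""
    try:
        return np.sum(arr, axis=1)
    except ValueError as exc:
        # Recent numpy raises AxisError, a ValueError subclass; older
        # releases raised a plain ValueError ("'axis' entry is out of bounds").
        return type(exc).__name__


one_d = np.arange(3)           # shape (3,), like a 1-D index
two_d = one_d.reshape(-1, 1)   # shape (3, 1), like a single-column frame

result_2d = reduce_along_axis_1(two_d)
result_1d = reduce_along_axis_1(one_d)
```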


commented Feb 11, 2018

Spent a long time debugging this today. It is very surprising to have dropna throw an AxisError exception in numpy when the column happens to be datetime64[ns, UTC] but not when it's datetime64[ns].

As a workaround, I believe users can use df[df.column_name.notnull()] (as described on StackOverflow) instead of dropna(subset=['column name']).
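A minimal sketch of that workaround, assuming a tz-aware column named `Time` (the column name and sample data are made up for illustration). The boolean mask is built on the column itself, so the call to `dropna` is avoided entirely.

```python
import pandas as pd

# Assumption: a UTC-aware datetime column with one missing value (NaT).
df = pd.DataFrame({"Time": pd.to_datetime(["2015-01-01", None], utc=True)})

# Keep only rows where the column is not null, instead of calling
# df.dropna(subset=["Time"]).
mask = df["Time"].notnull()
filtered = df[mask]
```

The filtered frame keeps the column's timezone-aware dtype.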

@jreback jreback modified the milestones: Next Major Release, 0.23.0 Mar 25, 2018

jreback added a commit that referenced this issue Mar 25, 2018

@TomAugspurger TomAugspurger referenced this issue Dec 12, 2018


REF: DatetimeLikeArray #24024

7 of 12 tasks complete