BUG: Pandas cannot create DataFrame from Numpy Array of TimeStamps #13287

Closed
jameskelleher opened this issue May 25, 2016 · 3 comments

Comments

@jameskelleher
commented May 25, 2016

I have the following array of Timestamps:

import numpy as np
import pandas as pd
from pandas import Timestamp

ts_array = np.array([[Timestamp('2016-05-02 15:50:00+0000', tz='UTC', offset='5T'),
        Timestamp('2016-05-02 15:50:00+0000', tz='UTC', offset='5T'),
        Timestamp('2016-05-02 15:50:00+0000', tz='UTC', offset='5T')],
       [Timestamp('2016-05-02 17:10:00+0000', tz='UTC', offset='5T'),
        Timestamp('2016-05-02 17:10:00+0000', tz='UTC', offset='5T'),
        Timestamp('2016-05-02 17:10:00+0000', tz='UTC', offset='5T')],
       [Timestamp('2016-05-02 20:25:00+0000', tz='UTC', offset='5T'),
        Timestamp('2016-05-02 20:25:00+0000', tz='UTC', offset='5T'),
        Timestamp('2016-05-02 20:25:00+0000', tz='UTC', offset='5T')]], dtype=object)

I can't create a DataFrame from this array using the DataFrame constructor:

pd.DataFrame(ts_array)
Traceback (most recent call last):
  File "/Users/jkelleher/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-46-ae20c6b6248f>", line 1, in <module>
    pd.DataFrame(ts_array)
  File "/Users/jkelleher/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 255, in __init__
    copy=copy)
  File "/Users/jkelleher/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 432, in _init_ndarray
    return create_block_manager_from_blocks([values], [columns, index])
  File "/Users/jkelleher/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 3986, in create_block_manager_from_blocks
    mgr = BlockManager(blocks, axes)
  File "/Users/jkelleher/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 2591, in __init__
    (block.ndim, self.ndim))
AssertionError: Number of Block dimensions (1) must equal number of axes (2)

I can create the DataFrame from the array using from_records:

ts_df = pd.DataFrame.from_records(ts_array)

However, when I attempt to transpose this DataFrame, I wind up with the same AssertionError as before.

AssertionError: Number of Block dimensions (1) must equal number of axes (2)

If I convert the Timestamps to datetime objects, the error persists. Converting them to numpy datetime64 values with to_datetime64(), however, fixes the problem.

dt64_array = np.array([[ts.to_datetime64() for ts in sublist] for sublist in ts_array])
pd.DataFrame(dt64_array)
Out[56]: 
                    0                   1                   2
0 2016-05-02 15:50:00 2016-05-02 15:50:00 2016-05-02 15:50:00
1 2016-05-02 17:10:00 2016-05-02 17:10:00 2016-05-02 17:10:00
2 2016-05-02 20:25:00 2016-05-02 20:25:00 2016-05-02 20:25:00
pd.DataFrame(dt64_array).transpose()
Out[57]: 
                    0                   1                   2
0 2016-05-02 15:50:00 2016-05-02 17:10:00 2016-05-02 20:25:00
1 2016-05-02 15:50:00 2016-05-02 17:10:00 2016-05-02 20:25:00
2 2016-05-02 15:50:00 2016-05-02 17:10:00 2016-05-02 20:25:00

Though I found a suitable workaround, I feel that pandas should be able to construct and operate on DataFrames of Timestamps as easily as on other objects.
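
Note that the datetime64 workaround above prints naive values: the +00:00 suffix is gone from the output. A minimal sketch of one way to put UTC back after the conversion is shown here. It assumes a recent pandas release rather than the 0.18.1 reported in this issue, it omits the offset='5T' argument, and the names naive, naive_df and utc_df are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the 3x3 object array from the report (without the old offset argument).
ts_array = np.array(
    [[pd.Timestamp("2016-05-02 15:50:00", tz="UTC")] * 3,
     [pd.Timestamp("2016-05-02 17:10:00", tz="UTC")] * 3,
     [pd.Timestamp("2016-05-02 20:25:00", tz="UTC")] * 3],
    dtype=object,
)

# Workaround from the report: convert every element to numpy datetime64.
# This yields a naive (timezone-free) datetime64 array.
naive = np.array([[ts.to_datetime64() for ts in row] for row in ts_array])
naive_df = pd.DataFrame(naive)

# Re-attach UTC one column at a time, since the values were UTC to begin with.
utc_df = pd.DataFrame(
    {col: naive_df[col].dt.tz_localize("UTC") for col in naive_df.columns}
)

print(utc_df.dtypes)
```

Localizing column by column keeps each column as a proper timezone-aware datetime column instead of a column of Python objects.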

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 20.3
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.1
statsmodels: 0.8.0.dev0+970e99e
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None
@jreback

Contributor

commented May 25, 2016

In [4]: DataFrame.from_records(ts_array)
Out[4]: 
                          0                         1                         2
0 2016-05-02 15:50:00+00:00 2016-05-02 15:50:00+00:00 2016-05-02 15:50:00+00:00
1 2016-05-02 17:10:00+00:00 2016-05-02 17:10:00+00:00 2016-05-02 17:10:00+00:00
2 2016-05-02 20:25:00+00:00 2016-05-02 20:25:00+00:00 2016-05-02 20:25:00+00:00

I suppose it's a bug, but you are going about this the wrong way: building a 2-d numpy array of Timestamps (which is completely inefficient) and THEN creating a frame.
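
The comment does not spell out the preferred route. One possible reading, sketched here under the assumption that the timestamps are available as strings and that a recent pandas release is in use, is to build datetime-typed columns first and pass those to the constructor, so that no 2-d object array is ever created. The names times and df are illustrative only.

```python
import pandas as pd

# Parse the three distinct timestamps once, directly into a UTC DatetimeIndex.
times = pd.to_datetime(
    ["2016-05-02 15:50:00", "2016-05-02 17:10:00", "2016-05-02 20:25:00"],
    utc=True,
)

# Build the frame column by column; each column is a timezone-aware
# datetime column rather than a column of Python objects.
df = pd.DataFrame({0: times, 1: times, 2: times})

print(df.dtypes)
```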

@jreback

Contributor

commented May 25, 2016

yeah these are stored internally in a different way, so I guess .T is broken on these types of things.

@jreback added this to the Next Major Release milestone May 25, 2016

@jreback

Contributor

commented May 25, 2016

If you want to step through this and submit a PR, have at it.
