PerformanceWarning: what is actually the problem I can change? #3622

jankatins · 2013-05-16T13:27:47Z

I get several PerformanceWarnings when I store my dataframe in a hdfstore:

C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->mixed,key->axis0]

  warnings.warn(ws, PerformanceWarning)
C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->mixed,key->block0_values]

  warnings.warn(ws, PerformanceWarning)
C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->unicode,key->block0_items]

What I can't tell from this is which column causes these problems; at least I don't have any "block0" columns :-) It would be nice if these warnings gave some indication of what I can actually do about them.

jreback · 2013-05-16T13:33:31Z

You are storing in the default store format (meaning not a Table), which means that PyTables is pickling some of the data. There are several options: split the data out into separate nodes (the node holding the pickled data will still have the warning, but the rest will be faster), or save it as a Table (which should support it a little better). Can you show me a sample of the data and df.dtypes?
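As a rough sketch of the first option, the numeric columns can be separated from the remaining columns before storing, so that only one node ends up pickled. The column names and store keys below are made up for illustration, and the actual store calls are left as comments because they need PyTables installed.

```python
import pandas as pd

# Toy frame standing in for the real data (column names are hypothetical).
df = pd.DataFrame({
    "title": ["Journal A", "Journal B", "Journal C"],
    "snip": [1.2, 0.8, 2.5],
    "articles_count": [10, 20, 30],
})

# Numeric columns map directly to c-types; everything else may get pickled.
numeric_part = df.select_dtypes(include="number")
other_part = df.select_dtypes(exclude="number")

print(list(numeric_part.columns))
print(list(other_part.columns))

# Each part would then go to its own node (keys are assumptions):
# store["journals_numeric"] = numeric_part
# store["journals_text"] = other_part
```

With this split, only the node holding the text columns would be expected to trigger the warning.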

jreback · 2013-05-16T14:51:54Z

also...update to master, I just added #3623 which should make the warnings slightly more informative

jankatins · 2013-05-17T07:53:18Z

Here is some code which produces these warnings:

import numpy as np
import pandas

from data_names import (hdf_store_name, hdf_aaa, csv_aaa)
aaa = pandas.read_csv(csv_aaa, encoding="iso-8859-15", skiprows=0, sep=";", dtype={"zz id": np.int32})
[... some data cleaning...]

# Open and close once first, because there were some errors when the HDF store was
# initially created and immediately written to. Not sure if that is necessary anymore.
store = pandas.HDFStore(hdf_store_name)
store.close()
store = pandas.HDFStore(hdf_store_name)
store[hdf_aaa] = aaa
store.close()

C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->mixed,key->axis0]

  warnings.warn(ws, PerformanceWarning)
C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->unicode,key->block0_items]

  warnings.warn(ws, PerformanceWarning)
C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->mixed,key->block2_values]

  warnings.warn(ws, PerformanceWarning)
C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->unicode,key->block2_items]

  warnings.warn(ws, PerformanceWarning)

aaa.dtypes

title     object
a         object
b        float64
c        float64
d        float64
e        float64
f        float64
g        float64
h         object
i         object
j          int32
k          int32
l          int32
m          int32
n          int32
o          int32
p          int32
dtype: object

The objects are strings of variable length (some are paragraph length).

Performance is not a problem (seconds, or even less than a second, for my biggest data file, which has ~300k rows), so I don't mind the time it takes, just the warnings, which make my IPython notebook longer and the important parts harder to read.

jreback · 2013-05-17T11:00:41Z

the open/close twice should not be necessary

can u post

df._data.blocks?

jreback · 2013-05-17T11:02:26Z

not sure if u can but would help if u post your data file (a link on say Dropbox)
can do privately if u want

jreback · 2013-05-17T14:31:03Z

are some of your object columns actually unicode? this could definitely trigger this

jankatins · 2013-05-17T21:10:35Z

print journals._data.blocks
[FloatBlock: [SNIP2_2009, SJR2_2009, SNIP2_2010, SJR2_2010, SNIP2_2011, SJR2_2011], 6 x 32059, dtype float64, IntBlock: [sjr2_2011_top10_overall, sjr2_2011_top10_nano, sjr2_2011_top10_business, sjr2_2011_top10_BusinessManagementAccounting, sjr2_2011_top10_MaterialsScience, articles_count, sjr2_2011_top10], 7 x 32059, dtype int32, ObjectBlock: [title, ISSN, BusinessManagementAccounting, MaterialsScience], 4 x 32059, dtype object]
type(journals.iloc[0,0]) # This is the "title" column
unicode
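The single-cell check above can be extended to every non-numeric column. The following is a small sketch (the helper name and the toy data are invented here) that lists the Python types found in each such column, which makes unicode or mixed columns easy to spot. On Python 3 text shows up as str rather than unicode.

```python
import pandas as pd

def value_types_by_column(frame):
    """Return {column name: sorted type names} for the non-numeric columns."""
    report = {}
    for name in frame.select_dtypes(exclude="number").columns:
        # Collect the distinct Python types of the values in this column.
        report[name] = sorted({type(value).__name__ for value in frame[name]})
    return report

# Toy data; the "ISSN" column is deliberately mixed (strings and a float).
toy = pd.DataFrame({
    "title": ["Journal A", "Journal B", "Journal C"],
    "ISSN": ["1234-5678", 1.5, "8765-4321"],
    "articles_count": [10, 20, 30],
})
types_found = value_types_by_column(toy)
print(types_found)
```

A column that reports more than one type is a likely candidate for the "mixed" inferred type named in the warning.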

jreback · 2013-05-17T21:18:09Z

Try getting rid of the unicode

In [27]: x = 'foo'

In [28]: type(x)
Out[28]: str

In [29]: type(x.decode('utf-8'))
Out[29]: unicode

you may need something like

# encode() turns unicode into a plain byte str on Python 2
df['column_with_unicode'] = df['column_with_unicode'].apply(lambda x: x.encode('utf-8'))

FYI very soon (with the release of PyTables 3.0) I think we will be able to support unicode

jankatins · 2013-05-21T08:24:14Z

Then I will simply wait until that happens. Right now the performance is no problem, just the annoying warnings :-)

jreback · 2013-05-21T10:24:26Z

the warning is just to alert the user that u r basically pickling those fields rather than storing them in a c-type
u can filter the warnings as well

import warnings
import pandas
warnings.filterwarnings('ignore', category=pandas.io.pytables.PerformanceWarning)
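The filter above applies globally for the rest of the session. If only one write should be silenced, a scoped variant with the standard library's warnings.catch_warnings may be preferable. The sketch below uses a stand-in warning class and a placeholder write function (both hypothetical) so that it runs without pandas or PyTables; in real code the category would be the pandas PerformanceWarning class.

```python
import warnings

class PerformanceWarning(Warning):
    """Stand-in for the pandas warning class (assumption, for illustration)."""

def noisy_write():
    # Hypothetical placeholder for a call such as store[key] = df.
    warnings.warn("PyTables will pickle object types", PerformanceWarning)
    return "written"

def quiet_write():
    # The ignore filter is only active inside this with-block.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=PerformanceWarning)
        return noisy_write()

# Record everything that still gets through, to show the difference.
with warnings.catch_warnings(record=True) as seen:
    warnings.simplefilter("always")
    result = quiet_write()  # warning suppressed inside quiet_write
    noisy_write()           # warning emitted as usual

print(result, len(seen))
```

The previous filter state is restored when the with-block exits, so other warnings in the notebook stay visible.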

jreback · 2013-05-21T11:39:11Z

closing for now, @JanSchulz reopen/new issue if you have questions/concerns

jetpackdata · 2015-07-29T23:33:38Z

Hi @jreback, I'm on PyTables 3 (tables==3.2.0) and am still facing the same issue as @JanSchulz: warnings when I try to save my 'df' as 'h5'. My data frame does contain unicode. Is there anything I can do to avoid them?

jreback · 2015-07-29T23:58:24Z

make sure you are storing with format='table'

py3 handles the Unicode

pls show code and version if this doesn't work

nguyenvulong · 2019-05-18T04:55:34Z

I found a weird case: when I ran the same command a second time, the warning disappeared:

PerformanceWarning:
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->mixed,key->block0_values]

f.to_hdf("dataset_test.h5", key="test")

P.S. I ran it in interactive mode, version: python==3.6.7, pandas==0.23.4
P.P.S Hmm I guess this is its behavior. Not sure though.

jreback closed this as completed May 21, 2013

celsopneto mentioned this issue Feb 20, 2018

PerformanceWarning on _drop_axis() #19799

Closed

jankislinger mentioned this issue Sep 16, 2019

PerformanceWarning: misleading warning for all columns #28460

Open
