PerformanceWarning: what is actually the problem I can change? #3622

Closed
jankatins opened this issue May 16, 2013 · 14 comments

@jankatins
Contributor

I get several PerformanceWarnings when I store my dataframe in a hdfstore:

C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->mixed,key->axis0]

  warnings.warn(ws, PerformanceWarning)
C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->mixed,key->block0_values]

  warnings.warn(ws, PerformanceWarning)
C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->unicode,key->block0_items]

What I can't get from this is which column causes these problems; at least I don't have any "block0" columns :-) It would be nice if these warnings gave me an indication of what I can actually do about them.

@jreback
Contributor

jreback commented May 16, 2013

You are storing Stores (meaning not a Table), which means that PyTables is pickling some of the data. There are several options: split the data out into separate nodes (the node holding the object data will still raise the warning, but the rest will be faster), or save it as a Table (which should support it a little better). Can you show me a sample of the data and df.dtypes?

@jreback
Contributor

jreback commented May 16, 2013

also...update to master, I just added #3623 which should make the warnings slightly more informative

@jankatins
Contributor Author

Here is some code which produces these warnings:

import numpy as np
import pandas

from data_names import (hdf_store_name, hdf_aaa, csv_aaa)
aaa = pandas.read_csv(csv_aaa, encoding="iso-8859-15", skiprows=0, sep=";", dtype={"zz id": np.int32})
[... some data cleaning...]

# Open and close once first, because there were some errors when the HDF store was initially created and
# immediately written to. Not sure if that is still necessary.
store = pandas.HDFStore(hdf_store_name)
store.close()
store = pandas.HDFStore(hdf_store_name)
store[hdf_aaa] = aaa
store.close()

C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->mixed,key->axis0]

  warnings.warn(ws, PerformanceWarning)
C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->unicode,key->block0_items]

  warnings.warn(ws, PerformanceWarning)
C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->mixed,key->block2_values]

  warnings.warn(ws, PerformanceWarning)
C:\portabel\Python27\lib\site-packages\pandas\io\pytables.py:1788: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->unicode,key->block2_items]

  warnings.warn(ws, PerformanceWarning)

aaa.dtypes

title    object
a        object
b        float64
c        float64
d        float64
e        float64
f        float64
g        float64
h        object
i        object
j        int32
k        int32
l        int32
m        int32
n        int32
o        int32
p        int32
dtype: object

The objects are strings of variable length (some are paragraph length).

Performance is not a problem (on the order of seconds, or less than a second, even for my biggest data file, which has ~300k rows), so I don't mind the time it takes, just the warnings, which make my IPython notebook longer and make the important parts harder to read.

@jreback
Contributor

jreback commented May 17, 2013

the open/close twice should not be necessary

can u post

df._data.blocks?

@jreback
Contributor

jreback commented May 17, 2013

not sure if u can but would help if u post your data file (a link on say Dropbox)
can do privately if u want

@jreback
Contributor

jreback commented May 17, 2013

are some of your object columns actually unicode? this could definitely trigger this

@jankatins
Contributor Author

print journals._data.blocks
[FloatBlock: [SNIP2_2009, SJR2_2009, SNIP2_2010, SJR2_2010, SNIP2_2011, SJR2_2011], 6 x 32059, dtype float64, IntBlock: [sjr2_2011_top10_overall, sjr2_2011_top10_nano, sjr2_2011_top10_business, sjr2_2011_top10_BusinessManagementAccounting, sjr2_2011_top10_MaterialsScience, articles_count, sjr2_2011_top10], 7 x 32059, dtype int32, ObjectBlock: [title, ISSN, BusinessManagementAccounting, MaterialsScience], 4 x 32059, dtype object]
type(journals.iloc[0,0]) # This is the "title" column
unicode
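
The `block0_values` style keys in the warning refer to internal blocks, not to columns, so one way to see which columns are involved is to list the object-dtype columns and the type pandas infers for their values. The following is a minimal sketch, not part of pandas' HDF code: `object_column_report` is a hypothetical helper name and the small frame is made-up illustration data. It assumes `pandas.api.types.infer_dtype`, whose labels resemble the `inferred_type` shown in the warning.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype


def object_column_report(df):
    """Map each object-dtype column to the type pandas infers from its values.

    Hypothetical helper for illustration only.
    """
    report = {}
    for name in df.columns:
        col = df[name]
        if col.dtype == object:
            # skipna=False so that missing values count toward "mixed"
            report[name] = infer_dtype(col, skipna=False)
    return report


# Made-up data: two object columns (one containing a NaN) and one float column.
df = pd.DataFrame({
    "title": pd.Series(["a", "b", "c"], dtype=object),
    "issn": pd.Series(["x", np.nan, "z"], dtype=object),
    "score": [1.0, 2.0, 3.0],
})
report = object_column_report(df)
print(report)
```

Columns reported as `mixed` are the likeliest candidates for cleaning (for example, filling missing values) before writing to the store.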

@jreback
Contributor

jreback commented May 17, 2013

Try getting rid of the unicode

In [27]: x = 'foo'

In [28]: type(x)
Out[28]: str

In [29]: type(x.decode('utf-8'))
Out[29]: unicode

you may need something like this (on Python 2, encode turns unicode back into str)

df['column_with_unicode'] = df['column_with_unicode'].apply(lambda x: x.encode('utf-8'))

FYI very soon (with the release of PyTables 3.0) I think we will be able to support unicode
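
For readers on Python 3, where the session above does not run as written (`str` has no `decode` method), the same distinction is spelled `str` versus `bytes`. A minimal sketch with made-up sample text, using only built-in methods:

```python
# Python 3: str is text, bytes is the encoded form.
text = "caf\u00e9"                 # made-up sample text with one non-ASCII character
raw = text.encode("utf-8")         # str -> bytes
round_trip = raw.decode("utf-8")   # bytes -> str

print(type(text).__name__, type(raw).__name__, type(round_trip).__name__)
```

The direction is the point: `encode` goes from text to bytes and `decode` goes from bytes back to text.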

@jankatins
Contributor Author

Then I will simply wait until that happens. Right now the performance is no problem, just the annoying warnings :-)

@jreback
Contributor

jreback commented May 21, 2013

the warning is just to alert the user that u r basically pickling those fields rather than storing them in a c-type
u can filter the warnings as well

import warnings
import pandas
warnings.filterwarnings('ignore', category=pandas.io.pytables.PerformanceWarning)
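
If silencing the warning for the whole session feels too broad, the filter can be limited to the write itself with `warnings.catch_warnings()`. This is a minimal sketch under some assumptions: `noisy_write` is a hypothetical stand-in for the real `store[key] = df` call so the example runs without an HDF file, `record=True` is used only to show what still gets through, and the category is imported from `pandas.errors`, where recent pandas versions expose it.

```python
import warnings

from pandas.errors import PerformanceWarning


def noisy_write():
    # Hypothetical stand-in for `store[key] = df`; it emits the same
    # warning category that the HDF writer uses.
    warnings.warn("PyTables will pickle object types", PerformanceWarning)
    return "written"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # show everything by default
    warnings.filterwarnings("ignore", category=PerformanceWarning)
    result = noisy_write()                   # silenced
    warnings.warn("unrelated", UserWarning)  # still recorded

print(result, [w.category.__name__ for w in caught])
```

Outside the `with` block the previous warning filters are restored, so other code still reports its `PerformanceWarning`s.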

@jreback
Contributor

jreback commented May 21, 2013

closing for now, @JanSchulz reopen/new issue if you have questions/concerns

jreback closed this as completed May 21, 2013

@jetpackdata

Hi @jreback, I'm on PyTables 3 (tables==3.2.0) and am still facing the same issue as @JanSchulz: warnings when I try to save my 'df' as 'h5'. My data frame does contain unicode. Is there anything I can do to avoid them?

@jreback
Contributor

jreback commented Jul 29, 2015

make sure you are storing with format='table'

py3 handles the Unicode

pls show code and version if this doesn't work

@nguyenvulong

nguyenvulong commented May 18, 2019

I found a weird case: when I ran the same command a second time, the warning disappeared:

PerformanceWarning:
your performance may suffer as PyTables will pickle object types that it cannot map
directly to c-types [inferred_type->mixed,key->block0_values]

f.to_hdf("dataset_test.h5", key="test")

P.S. I ran it in interactive mode, version: python==3.6.7, pandas==0.23.4
P.P.S Hmm I guess this is its behavior. Not sure though.
