why I cannot save the DataFrame to pickle? #12712

songhuiming · 2016-03-24T16:36:54Z

Code Sample, a copy-pastable example if possible

bbc.to_pickle(r'/home/hsong01/work/dataMart/pcbasel/data/bbc_ifrs9')

Expected Output

SystemError Traceback (most recent call last)
in ()
----> 1 bbc.to_pickle(r'/home/hsong01/work/dataMart/pcbasel/data/bbc_ifrs9')

/home/hsong01/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in to_pickle(self, path)
992 """
993 from pandas.io.pickle import to_pickle
--> 994 return to_pickle(self, path)
995
996 def save(self, path): # TODO remove in 0.14

/home/hsong01/anaconda/lib/python2.7/site-packages/pandas/io/pickle.pyc in to_pickle(obj, path)
12 """
13 with open(path, 'wb') as f:
---> 14 pkl.dump(obj, f, protocol=pkl.HIGHEST_PROTOCOL)
15
16

SystemError: error return without exception set

output of `pd.show_versions()`

In [29]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-573.8.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.16.2
nose: 1.3.7
Cython: 0.22.1
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 3.2.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: None


jreback · 2016-03-24T16:40:58Z

You are doing something pretty odd in your frame. Show the output of:

df.info(), df.head()

songhuiming · 2016-03-24T17:02:00Z

In [43]: bbc.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 12691264 entries, 0 to 12691263
Data columns (total 62 columns):
reporting_date object
basel_credit_asset_class object
borrower_risk_rating_model_code object
borrower_risk_rating_system_code object
borrower_risk_rating_model_desc object
pd_rr_score float64
ubn object
province_of_residence object
effective_maturity float64
osfi_exposure_type object
notional_amount_currency object
outstanding_amount float64
outstanding_amount_currency object
post_crmt_ead_offbs float64
post_crmt_ead_onbs float64
pre_crmt_lgd float64
post_crmt_lgd float64
post_crmt_pd float64
guarantor_risk_rating_code float64
pre_crmt_ead_offbs float64
source_system_id object
rec_typ_cd object
rec_key int64
annual_sales_currency object
total_annual_sales float64
canada_standard_industry int64
bank_legal_entity int64
level_14 object
type_of_product object
responsibility_center_level14 object
responsibility_center_level54 object
responsibility_center_level60 object
responsibility_center_level60_nm object
canadian_industry_classification int64
net_charge_off_amount_ytd float64
gross_charge_off_amount_ytd float64
recovery_amount_ytd float64
facility_maturity_date object
uncond_cancel_ind object
loan_product_code object
loan_product_type object
loan_product_name object
general_ledger_account_number float64
bmo_transit_code int64
scheduled_payment float64
payment_frequency object
amortization_term_in_months float64
effective_maturity_in_days float64
loan_interest_rate float64
nrrs_fac_identifier_assigned int64
nrrs_facility_origination_date object
nrrs_facility_commitment_amount_ float64
nrrs_date_was_approved object
nrrs_orig_os_amt float64
nrrs_rt_spread_rt float64
nrrs_nr_orig_risk_rtg_cd float64
responsibility_center_level58 object
authorized_amount float64
authorized_amount_curr object
fac_prim_typ_cd object
osfi_rsk_prod_grp_ds object
legal_name object
dtypes: float64(24), int64(6), object(32)
memory usage: 6.0+ GB

jreback · 2016-03-24T17:05:34Z

I would guess that this simply cannot be saved because it's too big; pickle has some limits. You are much better off either saving to a database or using to_hdf (or one of the many other IO routines; to_msgpack might work as well).

However, if your objects are not strings, then it might not work at all (and would be highly inefficient in any event).

kawochen · 2016-03-24T17:42:11Z

http://bugs.python.org/issue11564
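
The linked Python issue appears to concern pickle's handling of very large objects, which would fit a frame reporting 6.0+ GB of memory. If pickle has to be used anyway, one possible workaround is to write the data in smaller pieces so that no single `pkl.dump` call sees the whole object. The following is a minimal sketch using only the standard library; the helper names `dump_chunks` and `load_chunks` are hypothetical and are not part of pandas.

```python
import pickle


def dump_chunks(chunks, path, protocol=pickle.HIGHEST_PROTOCOL):
    """Write each chunk as its own pickle record, back to back, in one file.

    `chunks` is any iterable of picklable objects (hypothetical helper,
    not a pandas function). Returns the number of records written.
    """
    count = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            pickle.dump(chunk, f, protocol=protocol)
            count += 1
    return count


def load_chunks(path):
    """Yield the pickled records from `path` in the order they were written."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                # End of file reached: no more records to read.
                break
```

With a DataFrame, this might be used as `dump_chunks((bbc.iloc[i:i + 1000000] for i in range(0, len(bbc), 1000000)), path)` and read back with `pd.concat(list(load_chunks(path)))`. The chunk size of one million rows is an arbitrary assumption and would need tuning so that each piece stays well below the size at which the error appears.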

jreback closed this as completed Mar 24, 2016

jreback added the Compat (pandas objects compatibility with Numpy or Python functions) label Mar 24, 2016
