why I cannot save the DataFrame to pickle? #12712

Closed
songhuiming opened this issue Mar 24, 2016 · 4 comments
Labels
Compat pandas objects compatability with Numpy or Python functions

Comments

@songhuiming

Code Sample, a copy-pastable example if possible

bbc.to_pickle(r'/home/hsong01/work/dataMart/pcbasel/data/bbc_ifrs9')

Expected Output


SystemError Traceback (most recent call last)
in ()
----> 1 bbc.to_pickle(r'/home/hsong01/work/dataMart/pcbasel/data/bbc_ifrs9')

/home/hsong01/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in to_pickle(self, path)
992 """
993 from pandas.io.pickle import to_pickle
--> 994 return to_pickle(self, path)
995
996 def save(self, path): # TODO remove in 0.14

/home/hsong01/anaconda/lib/python2.7/site-packages/pandas/io/pickle.pyc in to_pickle(obj, path)
12 """
13 with open(path, 'wb') as f:
---> 14 pkl.dump(obj, f, protocol=pkl.HIGHEST_PROTOCOL)
15
16

SystemError: error return without exception set

output of pd.show_versions()

In [29]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-573.8.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.16.2
nose: 1.3.7
Cython: 0.22.1
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 3.2.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: None

@jreback
Contributor

jreback commented Mar 24, 2016

You are doing something pretty odd in your frame. Please show the output of:

df.info(), df.head()

@songhuiming
Author

In [43]: bbc.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 12691264 entries, 0 to 12691263
Data columns (total 62 columns):
reporting_date object
basel_credit_asset_class object
borrower_risk_rating_model_code object
borrower_risk_rating_system_code object
borrower_risk_rating_model_desc object
pd_rr_score float64
ubn object
province_of_residence object
effective_maturity float64
osfi_exposure_type object
notional_amount_currency object
outstanding_amount float64
outstanding_amount_currency object
post_crmt_ead_offbs float64
post_crmt_ead_onbs float64
pre_crmt_lgd float64
post_crmt_lgd float64
post_crmt_pd float64
guarantor_risk_rating_code float64
pre_crmt_ead_offbs float64
source_system_id object
rec_typ_cd object
rec_key int64
annual_sales_currency object
total_annual_sales float64
canada_standard_industry int64
bank_legal_entity int64
level_14 object
type_of_product object
responsibility_center_level14 object
responsibility_center_level54 object
responsibility_center_level60 object
responsibility_center_level60_nm object
canadian_industry_classification int64
net_charge_off_amount_ytd float64
gross_charge_off_amount_ytd float64
recovery_amount_ytd float64
facility_maturity_date object
uncond_cancel_ind object
loan_product_code object
loan_product_type object
loan_product_name object
general_ledger_account_number float64
bmo_transit_code int64
scheduled_payment float64
payment_frequency object
amortization_term_in_months float64
effective_maturity_in_days float64
loan_interest_rate float64
nrrs_fac_identifier_assigned int64
nrrs_facility_origination_date object
nrrs_facility_commitment_amount_ float64
nrrs_date_was_approved object
nrrs_orig_os_amt float64
nrrs_rt_spread_rt float64
nrrs_nr_orig_risk_rtg_cd float64
responsibility_center_level58 object
authorized_amount float64
authorized_amount_curr object
fac_prim_typ_cd object
osfi_rsk_prod_grp_ds object
legal_name object
dtypes: float64(24), int64(6), object(32)
memory usage: 6.0+ GB
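
The summary above lists 32 object columns, and `info()` does not say what Python types those columns actually hold. A minimal sketch (not from the thread) for checking that is below; `object_column_types` is a hypothetical helper name, and the small frame stands in for `bbc`. On a frame of this size it would probably be wise to run it on a sample of rows first.

```python
import pandas as pd


def object_column_types(df):
    """Map each object-dtype column to the sorted type names of its non-null values."""
    summary = {}
    for name in df.columns:
        if df[name].dtype != object:
            continue  # numeric and other non-object columns are skipped
        values = df[name].dropna()
        summary[name] = sorted({type(v).__name__ for v in values})
    return summary


# Toy stand-in for the real frame (assumption: the real data is not available here).
example = pd.DataFrame({
    "only_strings": pd.Series(["a", "b", None], dtype=object),
    "mixed": pd.Series(["a", 1.5, None], dtype=object),
    "numbers": [1, 2, 3],
})
print(object_column_types(example))
```

A column that reports anything other than `str` holds mixed or non-string values and is worth a closer look.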

@jreback
Contributor

jreback commented Mar 24, 2016

I would guess that this simply cannot be saved because it's too big; pickle has some limits. You are much better off either saving to a database or using to_hdf (or one of the many other IO routines; to_msgpack might work as well).

However, if the values in your object columns are not strings, then that might not work at all (and would be highly inefficient in any event).
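
If the failure really is a size limit on a single pickle, as guessed above, one workaround that stays with pickle is to write the frame in row chunks. This is a rough sketch under that assumption, not something proposed in the thread; `to_pickle_chunks`, `read_pickle_chunks`, the file naming, and the default chunk size are all hypothetical, and it targets Python 3 (`os.makedirs(..., exist_ok=True)` is not available on the Python 2.7 shown in the report).

```python
import os

import pandas as pd


def to_pickle_chunks(df, directory, rows_per_chunk=1000000):
    """Write df as several smaller pickle files and return their paths in order."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    # max(len(df), 1) makes an empty frame still produce one (empty) chunk.
    for i, start in enumerate(range(0, max(len(df), 1), rows_per_chunk)):
        path = os.path.join(directory, "part_{:05d}.pkl".format(i))
        df.iloc[start:start + rows_per_chunk].to_pickle(path)
        paths.append(path)
    return paths


def read_pickle_chunks(paths):
    """Load the chunk files and stitch them back into a single frame."""
    return pd.concat([pd.read_pickle(p) for p in paths])
```

Usage would look like `paths = to_pickle_chunks(bbc, "some_directory")` followed later by `read_pickle_chunks(paths)`; the right chunk size depends on how wide the rows are.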

@jreback jreback closed this as completed Mar 24, 2016
@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Mar 24, 2016
@kawochen
Contributor
