
df.to_json segfaults with categorical column types #10778

Closed
blink1073 opened this issue Aug 9, 2015 · 5 comments
Labels: Categorical (Categorical Data Type), IO JSON (read_json, to_json, json_normalize)
Milestone: Next Major Release

Comments

@blink1073

This code from the Categorical tutorial:

df = pd.DataFrame({"A":["a","b","c","a"]})
df["B"] = df["A"].astype('category')

It segfaults when df.to_json is called. Tested on:

Python 3.4.3 |Anaconda 2.3.0 (x86_64)| (default, Mar  6 2015, 12:07:41)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
$ conda list  | grep pandas
pandas                    0.16.2               np19py34_0
@blink1073 (Author)

If I convert using df['B'] = df['B'].astype(str), then df.to_json works fine.
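The workaround can be sketched as follows (on pandas versions where the bug is fixed, to_json works either way; on affected 0.16.x versions, the str cast avoids the crashing code path):

```python
import pandas as pd

# Reproduce the DataFrame from the report.
df = pd.DataFrame({"A": ["a", "b", "c", "a"]})
df["B"] = df["A"].astype("category")

# Cast the categorical column back to plain strings before calling
# to_json, sidestepping the categorical serialization path entirely.
df["B"] = df["B"].astype(str)
result = df.to_json()
```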

@jreback (Contributor)

jreback commented Aug 10, 2015

xref #10321

This has not been implemented yet. Pull requests are welcome.

@jreback jreback added IO JSON read_json, to_json, json_normalize Categorical Categorical Data Type labels Aug 10, 2015
@jreback jreback added this to the Next Major Release milestone Aug 10, 2015
@bbirand

bbirand commented Dec 18, 2015

👍

@Timusan

Timusan commented Jan 30, 2016

My apologies if this is simply extra noise, but I think I came across the same issue in a slightly different situation; perhaps this gives more insight into it.

For me this happens when trying to convert a DataFrame with binned data to JSON.

The kernel reports back the following:

Jan 30 18:18:04 debian kernel: [ 1632.371968] python[24794]: segfault at ffffffffffffffff ip 00007f58f570655d sp 00007ffe13a8c410 error 7 in multiarray.so[7f58f55bf000+1d9000]

Version info:

  • Linux Debian Jessie x86_64
  • Python 2.7.10 (GCC 4.9.2)
  • NumPy 1.10.4
  • Pandas 0.17.1

Greatly simplified piece of code which reproduces the error:

import pandas

def calculate_age_dist(df):
    # Setup the bins to divide. Setting a final bin
    # of 200 to mimic 60+ behavior.
    age_ranges = [0,2,10,20,30,40,50,60,200]
    age_ranges_labels = ['0-1', '2-9', '10-19', '20-29',
                         '30-39', '40-49', '50-59', '60+']

    # Divide the data into the bins. Using "right=False" to
    # exclude the upper range (so 30 is only included in bin 30-40
    # and not in bin 20-30).
    df['age_cat'] = pandas.cut(x = df['age'],
                               bins = age_ranges,
                               right = False,
                               labels = age_ranges_labels)

    # Leave only the top ten sports per age group
    # and convert the whole back to a DataFrame.
    age_dist = df.groupby('age_cat').head(10).reset_index(drop=True)

    return age_dist

# The data.
df = pandas.DataFrame(data = {'age': [55,20,30,40,50,60,65,23],
                              'sport': ['soccer', 'baseball', 'soccer', 'football',
                                        'swimming', 'tennis', 'swimming', 'baseball']},
                      columns = ['age', 'sport'])

# The call which causes the segfault.
print(calculate_age_dist(df).to_json())

Converting the "age_cat" column with df['age_cat'] = df['age_cat'].astype(str) does indeed circumvent the segfault.
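A more general version of that workaround (decategorize is a hypothetical helper name, not a pandas API) casts every categorical column to str before serializing:

```python
import pandas as pd

def decategorize(frame):
    # Return a copy with every categorical column cast to plain strings,
    # so to_json never hits the categorical serialization path.
    out = frame.copy()
    for col in out.select_dtypes(include=["category"]).columns:
        out[col] = out[col].astype(str)
    return out

# Example with binned data, as in the comment above.
ages = pd.DataFrame({"age": [1, 15, 35]})
ages["age_cat"] = pd.cut(ages["age"], bins=[0, 10, 20, 200],
                         labels=["0-9", "10-19", "20+"], right=False)
json_str = decategorize(ages).to_json()
```

The original frame is left untouched; only the copy handed to to_json loses its categorical dtype.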

@annacnev

Apologies for the noise. I came across this issue two weeks ago while converting a JSON file to a pandas DataFrame using pd.DataFrame.from_records. I had pulled JSON from multiple webpages into one large file, which I was parsing and splitting up in a loop with that function. One of the webpages had failed to load and went into the larger file as an empty list, unbeknownst to me. When that empty list was read into pandas, it caused the segfault. It took me forever to figure that out; eventually adding a try/except to my code solved the issue.

Not sure if this is the appropriate place to post this, but it was the first forum that came up when I googled "seg fault json pandas", and I wanted to put it out there in case someone runs into the same problem in the future and can save some time.

On the development side, just a recommendation: it would be nice if from_records raised a clear error when it encounters an empty list, because that segfault had me messing with my Python framework thinking something was very wrong.
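Until something along those lines lands, a small wrapper (safe_from_records is a hypothetical helper, not a pandas API) can turn the silent crash into a clear error:

```python
import pandas as pd

def safe_from_records(records):
    # Fail loudly on an empty payload instead of letting it reach
    # pandas internals, where on affected versions it could segfault.
    if len(records) == 0:
        raise ValueError("from_records got an empty record list")
    return pd.DataFrame.from_records(records)

df = safe_from_records([{"a": 1, "b": 2}, {"a": 3, "b": 4}])
```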


5 participants