Ingestion of a wide dataset fails and crashes the Python process #2382

Closed
ashrith opened this issue Mar 12, 2020 · 2 comments · Fixed by #2383
ashrith commented Mar 12, 2020

What am I trying to do?
Loading a wide dataset with 953 columns and 20 rows into Python datatable.

What happens?
The python process crashes when uploading this dataset.

What artifacts of this problem are available?

  1. Crash report from the system: error.txt

  2. Sample dataset: testcase.txt

st-pasha added the segfault label (Severe bugs that lead to crashes / seg.faults / process termination) Mar 12, 2020
st-pasha added this to To Do in fread via automation Mar 12, 2020
st-pasha added this to the Release 0.11.0 milestone Mar 12, 2020
st-pasha commented

The error happens not during reading, but when writing the input into a CSV file:

25  _datatable.cpython-36m-darwin.so    0x000000010d4c2c81 py::Frame::to_csv(py::PKArgs const&) + 2209

The corresponding Python snippet:

    for i in range(0, df.nrows, chunk_size):
        chunk = df[i:i + chunk_size, :]
        csv_chunk = chunk.to_csv(quoting="all")    # crashes
        data = csv_chunk.encode('utf8')
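
For readers without datatable at hand, the chunking pattern in the snippet above can be sketched with the standard library alone. This is only an illustration of the loop's shape, not the reporter's code: `iter_csv_chunks`, `header` and `rows` are hypothetical names, and the sketch assumes each chunk repeats the header line.

```python
import csv
import io


def iter_csv_chunks(header, rows, chunk_size):
    """Yield UTF-8 encoded CSV text for successive slices of `rows`.

    Every field is quoted, mirroring quoting="all" in the snippet above.
    """
    for i in range(0, len(rows), chunk_size):
        buf = io.StringIO()
        writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
        writer.writerow(header)                    # header repeated per chunk (assumption)
        writer.writerows(rows[i:i + chunk_size])   # the slice for this chunk
        yield buf.getvalue().encode("utf8")


# Small stand-in data: 20 rows, 3 columns.
header = ["C0", "C1", "C2"]
rows = [[True, False, True]] * 20
chunks = list(iter_csv_chunks(header, rows, chunk_size=8))
```

With 20 rows and a chunk size of 8, the generator produces three chunks holding 8, 8 and 4 data rows.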

st-pasha commented

MRE:

>>> import datatable as dt
>>> DT = dt.Frame([[True] * 20] * 200)
>>> out = DT.to_csv(quoting='all')
python(20974,0x7fffb5638380) malloc: *** error for object 0x7f98e7830e00: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
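
Because the MRE aborts the interpreter that runs it, one way to reproduce it safely is to run it in a child process and inspect the exit status. The following is a minimal sketch under that assumption; `run_isolated` is a hypothetical helper, and `os.abort()` is used only as a stand-in for the crash so the example does not depend on datatable being installed.

```python
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run `code` in a fresh Python interpreter and return the CompletedProcess."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# Stand-in for a crashing snippet: os.abort() terminates the child abnormally.
crashed = run_isolated("import os; os.abort()")
# A well-behaved snippet exits with status 0.
healthy = run_isolated("print('ok')")

# With an affected datatable build installed, the MRE could be passed instead.
MRE = (
    "import datatable as dt\n"
    "DT = dt.Frame([[True] * 20] * 200)\n"
    "DT.to_csv(quoting='all')\n"
)
```

A non-zero `returncode` from `run_isolated(MRE)` would indicate that the child crashed, while the parent process keeps running.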

fread automation moved this from To Do to Done Mar 12, 2020
st-pasha added a commit that referenced this issue Mar 12, 2020
The error was due to not properly accounting for space needed to write the quote marks.
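
The arithmetic behind this kind of bug can be sketched as follows. This is not datatable's actual implementation; `row_buffer_bound` is a hypothetical helper, the 5-character field width is an assumption, and the sketch ignores escaping of quote characters inside fields.

```python
def row_buffer_bound(field_widths, quote_all=False, sep_len=1, newline_len=1):
    """Upper bound on the number of characters needed to write one CSV row."""
    ncols = len(field_widths)
    if ncols == 0:
        return newline_len
    size = sum(field_widths)          # the field contents themselves
    size += sep_len * (ncols - 1)     # separators between fields
    size += newline_len               # line terminator
    if quote_all:
        size += 2 * ncols             # opening and closing quote for every field
    return size


# 200 columns, assuming each value takes at most 5 characters (e.g. "False").
widths = [5] * 200
unquoted = row_buffer_bound(widths)
quoted = row_buffer_bound(widths, quote_all=True)
shortfall = quoted - unquoted

# Cross-check against an actual fully quoted row of the same shape.
actual = len(",".join(['"False"'] * 200) + "\n")
```

A buffer sized with the unquoted bound falls short by two characters per column once every field is quoted, so the shortfall grows with the width of the frame.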

Closes #2382