Segfault in version 1.0.1, read_parquet after creating a clickhouse odbc connection #31981

Closed
mvcalder-xbk opened this issue Feb 14, 2020 · 5 comments
Labels: Needs Info (Clarification about behavior needed to assess issue)

@mvcalder-xbk

mvcalder-xbk commented Feb 14, 2020

We started getting segfaults after upgrading pandas and tracked them down to an interaction with opening ODBC connections. The specific ODBC driver is the ClickHouse driver, built in Dec19 from:

https://github.com/ClickHouse/clickhouse-odbc
commit 08b252b93fc771fab607fb973443543479b3d972

The core dump is reproducible with the example below:

Code Sample, a copy-pastable example if possible

import pyodbc
import pandas as pd

con_str = "Driver=libclickhouseodbc.so;url=http://clickhouse/query;timeout=600"
with pyodbc.connect(con_str, autocommit=True) as con:
    pass

df = pd.DataFrame({'A': [1,1,1], 'B': ['a', 'b', 'c']})
df.to_parquet('/tmp/foo.pq')
# This line core dumps:
pd.read_parquet('/tmp/foo.pq')

Problem description

In the code above, merely creating the ClickHouse ODBC connection (without using it) causes the process to abort with a core dump when the parquet file is read:

terminate called after throwing an instance of 'std::bad_cast'
  what():  std::bad_cast
Aborted (core dumped)

This does not happen in pandas 0.25.3 but does in pandas 1.0.1.

Expected Output

No core dump.

Odbc driver dependencies

        linux-vdso.so.1 (0x00007ffe02bee000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f246c269000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f246c061000)
        libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f246be57000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f246ba66000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f246cd89000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f246b862000)
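
The listing above has no libstdc++.so.6 entry, even though the crash ends up inside the system libstdc++. Whether that means the driver carries its own copy of the C++ runtime is an inference, not something confirmed in this thread. A minimal sketch for making the check repeatable, assuming `ldd` is available; `ldd_output`, `parse_ldd` and `links_against` are hypothetical helper names:

```python
import re
import subprocess

# Matches "name => /path (0xaddr)" as well as "name (0xaddr)".
# Lines such as "libfoo.so => not found" have no address and are skipped.
LDD_LINE = re.compile(
    r"^\s*(?P<name>\S+)(?:\s+=>\s+(?P<path>\S+))?\s+\((?P<addr>0x[0-9a-fA-F]+)\)\s*$"
)


def ldd_output(library_path):
    """Return the raw text printed by `ldd` for a shared library (Linux only)."""
    result = subprocess.run(
        ["ldd", library_path], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_ldd(output):
    """Map each dependency name to its resolved path (None if ldd shows no path)."""
    deps = {}
    for line in output.splitlines():
        match = LDD_LINE.match(line)
        if match:
            deps[match.group("name")] = match.group("path")
    return deps


def links_against(deps, prefix):
    """True if any dependency name starts with the given prefix."""
    return any(name.startswith(prefix) for name in deps)
```

For example, `links_against(parse_ldd(ldd_output("libclickhouseodbc.so")), "libstdc++")` would report whether the driver links the shared C++ runtime dynamically; the path to the driver is an assumption and needs to be adjusted to the local install.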

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1062.4.3.el7.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.0
pip : 9.0.1
setuptools : 45.2.0
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 5.1.2
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 0.999999999
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.2
numexpr : None
odfpy : None
openpyxl : 3.0.2
pandas_gbq : None
pyarrow : 0.15.1.dev539+g8cf0c8e0a
pytables : None
pytest : 5.3.2
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.11
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : None

@WillAyd
Member

WillAyd commented Feb 14, 2020

Do you have the same issue with a stable version of pyarrow? We don't have much C++ internally, so I think the root cause might be elsewhere.
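
One way to test that would be to take pandas out of the picture and run the same steps with pyarrow alone. The sketch below is an illustration, not something posted in this thread: it reuses the connection string and file path from the report, and runs the script in a child interpreter so that an abort shows up as an exit status instead of killing the session. `run_isolated` and `PYARROW_ONLY_REPRO` are hypothetical names.

```python
import subprocess
import sys

# Same steps as the original report, but writing and reading through
# pyarrow directly. Assumes pyodbc, pyarrow and the ClickHouse ODBC
# driver are installed in the environment under test.
PYARROW_ONLY_REPRO = """
import pyodbc
import pyarrow as pa
import pyarrow.parquet as pq

con_str = "Driver=libclickhouseodbc.so;url=http://clickhouse/query;timeout=600"
with pyodbc.connect(con_str, autocommit=True):
    pass

table = pa.Table.from_pydict({"A": [1, 1, 1], "B": ["a", "b", "c"]})
pq.write_table(table, "/tmp/foo.pq")
pq.read_table("/tmp/foo.pq")
"""


def run_isolated(source):
    """Run Python source in a child interpreter and return its exit status.

    On POSIX a negative value means the child was killed by that signal
    number, e.g. an abort would be reported as a negative status.
    """
    completed = subprocess.run([sys.executable, "-c", source])
    return completed.returncode


# Example: status = run_isolated(PYARROW_ONLY_REPRO)
```

If the child still aborts with no pandas import involved, that would point at pyarrow or the driver rather than pandas.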

WillAyd added the Needs Info label Feb 14, 2020
@jreback
Contributor

jreback commented Feb 15, 2020

This is from pyarrow.

Please open an issue there.

@jorisvandenbossche
Member

> This does not happen in pandas 0.25.3 but does in pandas 1.0.1.

Are you using the same pyarrow version in both environments?
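
A quick way to compare the two environments is to print the installed versions side by side. This is a minimal sketch, assuming Python 3.8 or newer for `importlib.metadata` (the report above uses 3.6, where `pd.show_versions()` is the alternative); `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string}, with None for packages not installed.

    Reads package metadata only, so no native libraries are loaded.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Example: print(installed_versions(["pandas", "pyarrow", "pyodbc"]))
```

Running it in both environments and diffing the output shows whether anything besides pandas changed.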

@mvcalder-xbk
Author

mvcalder-xbk commented Feb 18, 2020

Yes, this is the same pyarrow version in both cases. I'll open an issue there. Thank you.

Arrow JIRA ticket.

Clickhouse ODBC Issue.

Stacktrace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff7a24801 in __GI_abort () at abort.c:79
#2  0x00007ffff63c1957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff63c7ab6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff63c7af1 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff63c7d24 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff63c6a52 in __cxa_bad_cast () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff64131ec in std::__cxx11::collate<char> const& std::use_facet<std::__cxx11::collate<char> >(std::locale const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007fffbe4b8279 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::__cxx11::regex_traits<char>::transform_primary<char const*>(char const*, char const*) const () from /usr/local/lib/libparquet.so.100
#9  0x00007fffbe4bd71c in std::__detail::_BracketMatcher<std::__cxx11::regex_traits<char>, false, false>::_M_ready() () from /usr/local/lib/libparquet.so.100
#10 0x00007fffbe4bda9e in void std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_insert_character_class_matcher<false, false>() () from /usr/local/lib/libparquet.so.100
#11 0x00007fffbe4c0569 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom() () from /usr/local/lib/libparquet.so.100
#12 0x00007fffbe4c0ad8 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() () from /usr/local/lib/libparquet.so.100
#13 0x00007fffbe4c0a43 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() () from /usr/local/lib/libparquet.so.100
#14 0x00007fffbe4c0d1c in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction() () from /usr/local/lib/libparquet.so.100
#15 0x00007fffbe4c1469 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syntax_option_type) () from /usr/local/lib/libparquet.so.100
#16 0x00007fffbe4a93d1 in parquet::ApplicationVersion::ApplicationVersion(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/local/lib/libparquet.so.100
#17 0x00007fffbe4c1c03 in parquet::FileMetaData::FileMetaDataImpl::FileMetaDataImpl(void const*, unsigned int*, std::shared_ptr<parquet::Decryptor> const&) () from /usr/local/lib/libparquet.so.100
#18 0x00007fffbe4a9e62 in parquet::FileMetaData::FileMetaData(void const*, unsigned int*, std::shared_ptr<parquet::Decryptor> const&) () from /usr/local/lib/libparquet.so.100
#19 0x00007fffbe4a9ec2 in parquet::FileMetaData::Make(void const*, unsigned int*, std::shared_ptr<parquet::Decryptor> const&) () from /usr/local/lib/libparquet.so.100
#20 0x00007fffbe48acaf in parquet::SerializedFile::ParseUnencryptedFileMetadata(std::shared_ptr<arrow::Buffer> const&, long, long, std::shared_ptr<arrow::Buffer>*, unsigned int*, unsigned int*) () from /usr/local/lib/libparquet.so.100
#21 0x00007fffbe492d75 in parquet::SerializedFile::ParseMetaData() () from /usr/local/lib/libparquet.so.100
#22 0x00007fffbe48d8f8 in parquet::ParquetFileReader::Contents::Open(std::shared_ptr<arrow::io::RandomAccessFile>, parquet::ReaderProperties const&, std::shared_ptr<parquet::FileMetaData>) () from /usr/local/lib/libparquet.so.100
#23 0x00007fffbe48e598 in parquet::ParquetFileReader::Open(std::shared_ptr<arrow::io::RandomAccessFile>, parquet::ReaderProperties const&, std::shared_ptr<parquet::FileMetaData>) () from /usr/local/lib/libparquet.so.100
#24 0x00007fffbe3a89bd in parquet::arrow::FileReaderBuilder::Open(std::shared_ptr<arrow::io::RandomAccessFile>, parquet::ReaderProperties const&, std::shared_ptr<parquet::FileMetaData>) () from /usr/local/lib/libparquet.so.100
#25 0x00007fffbe7dc348 in __pyx_pf_7pyarrow_8_parquet_13ParquetReader_2open(__pyx_obj_7pyarrow_8_parquet_ParquetReader*, _object*, int, _object*, __pyx_obj_7pyarrow_8_parquet_FileMetaData*, int) ()
   from /usr/local/lib/python3.6/dist-packages/pyarrow-0.15.1.dev539+g8cf0c8e0a-py3.6-linux-x86_64.egg/pyarrow/_parquet.cpython-36m-x86_64-linux-gnu.so
#26 0x00007fffbe7dcbc9 in __pyx_pw_7pyarrow_8_parquet_13ParquetReader_3open(_object*, _object*, _object*) () from /usr/local/lib/python3.6/dist-packages/pyarrow-0.15.1.dev539+g8cf0c8e0a-py3.6-linux-x86_64.egg/pyarrow/_parquet.cpython-36m-x86_64-linux-gnu.so
#27 0x000000000050ac25 in _PyCFunction_FastCallDict (kwargs=<optimized out>, nargs=<optimized out>, args=<optimized out>, func_obj=<built-in method open of pyarrow._parquet.ParquetReader object at remote 0x7fffbfc6b938>) at ../Objects/methodobject.c:231
#28 _PyCFunction_FastCallKeywords (kwnames=<optimized out>, nargs=<optimized out>, stack=<optimized out>, func=<optimized out>) at ../Objects/methodobject.c:294
#29 call_function.lto_priv () at ../Python/ceval.c:4851
#30 0x000000000050d390 in _PyEval_EvalFrameDefault () at ../Python/ceval.c:3351
#31 0x0000000000508245 in PyEval_EvalFrameEx (throwflag=0, f=
    Frame 0x142a818, for file /usr/local/lib/python3.6/dist-packages/pyarrow-0.15.1.dev539+g8cf0c8e0a-py3.6-linux-x86_64.egg/pyarrow/parquet.py, line 137, in __init__ (self=<ParquetFile(reader=<pyarrow._parquet.ParquetReader at remote 0x7fffbfc6b938>) at remote 0x7fffc4b68cc0>, source='/tmp/foo.pq', metadata=None, common_metadata=None, read_dictionary=None, memory_map=False, buffer_size=0)) at ../Python/ceval.c:754
#32 _PyEval_EvalCodeWithName.lto_priv.1836 () at ../Python/ceval.c:4166
#33 0x0000000000509642 in _PyFunction_FastCallDict () at ../Python/ceval.c:5075
#34 0x0000000000595311 in _PyObject_FastCallDict (kwargs={'metadata': None, 'memory_map': False, 'read_dictionary': None, 'common_metadata': None, 'buffer_size': 0}, nargs=2, args=0x7fffffffc430, func=<function at remote 0x7fffbfc5e378>)
    at ../Objects/abstract.c:2310
#35 _PyObject_Call_Prepend (kwargs={'metadata': None, 'memory_map': False, 'read_dictionary': None, 'common_metadata': None, 'buffer_size': 0}, args=<optimized out>, obj=<optimized out>, func=<function at remote 0x7fffbfc5e378>) at ../Objects/abstract.c:2373
#36 method_call.lto_priv () at ../Objects/classobject.c:314
#37 0x000000000054a6ff in PyObject_Call (kwargs={'metadata': None, 'memory_map': False, 'read_dictionary': None, 'common_metadata': None, 'buffer_size': 0}, args=('/tmp/foo.pq',), func=<method at remote 0x7ffff7f67fc8>) at ../Objects/abstract.c:2261
#38 slot_tp_init () at ../Objects/typeobject.c:6420
#39 0x0000000000551b81 in type_call.lto_priv () at ../Objects/typeobject.c:915
---Type <return> to continue, or q <return> to quit---
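
Frames #7 and #15 show the std::bad_cast coming from std::use_facet while libparquet builds a std::regex, which takes a std::locale. One thing that might be worth checking, as an assumption rather than a confirmed diagnosis, is whether opening the ODBC connection changes the process locale. The sketch below only sees the C locale as reported by `setlocale`; a C++ global locale changed inside a shared library may not show up here. `locale_snapshot` and `changed_categories` are hypothetical helper names.

```python
import locale

# LC_MESSAGES is not available on every platform, hence the getattr below.
_CATEGORY_NAMES = (
    "LC_CTYPE",
    "LC_COLLATE",
    "LC_NUMERIC",
    "LC_TIME",
    "LC_MONETARY",
    "LC_MESSAGES",
)


def locale_snapshot():
    """Return the current C locale setting for each available category."""
    snapshot = {}
    for name in _CATEGORY_NAMES:
        category = getattr(locale, name, None)
        if category is not None:
            # Passing None queries the setting without changing it.
            snapshot[name] = locale.setlocale(category, None)
    return snapshot


def changed_categories(before, after):
    """Return {category: (old, new)} for every category that differs."""
    return {
        name: (before[name], after.get(name))
        for name in before
        if before[name] != after.get(name)
    }


# Example:
#   before = locale_snapshot()
#   ... open and close the pyodbc connection here ...
#   print(changed_categories(before, locale_snapshot()))
```

An empty result would not rule out a locale problem, for the reason given above, but a non-empty one would be a concrete lead to attach to the Arrow and ClickHouse ODBC reports.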

WillAyd closed this as completed Feb 20, 2020
@jorisvandenbossche
Member

For reference, the arrow issue is here: https://issues.apache.org/jira/browse/ARROW-7873
