
LOAD queries starve SELECT queries (and mapdql) when the total number of concurrent queries is greater than or equal to the number of CPU cores #95

Closed
fleapapa opened this issue Sep 25, 2017 · 9 comments

Comments

@fleapapa
Contributor

fleapapa commented Sep 25, 2017

For issue #82, I wrote a script to simulate multiple concurrent writers issuing LOAD queries and multiple concurrent readers issuing SELECT queries.

The test dataset is a small subset of the NY taxicab trip data, consisting of 12 CSV files.

In the script, two constants, NWRITER and NREADER, set the number of threads running LOAD queries and SELECT queries, respectively.
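A script of this shape can be sketched as follows. This is a hypothetical harness, not the author's trip-mapd-load.py: `load` and `select` are placeholder callables standing in for whatever client calls issue the LOAD and SELECT queries, and `run_harness` and `deadline` are illustrative names. It starts the writer and reader threads and reports which ones are still running after a shared deadline, which is one way to detect the starvation described below.

```python
import threading
import time
from typing import Callable, List


def run_harness(nwriter: int, nreader: int,
                load: Callable[[int], object],
                select: Callable[[int], object],
                deadline: float = 5.0) -> List[str]:
    """Hypothetical harness: start `nwriter` LOAD threads and `nreader`
    SELECT threads, then return the names of threads still running after
    `deadline` seconds (i.e. the ones that appear starved)."""
    threads = [threading.Thread(target=load, args=(i,), name=f"writer-{i}", daemon=True)
               for i in range(nwriter)]
    threads += [threading.Thread(target=select, args=(i,), name=f"reader-{i}", daemon=True)
                for i in range(nreader)]
    for t in threads:
        t.start()

    # One overall deadline shared by all joins, not `deadline` per thread.
    end = time.monotonic() + deadline
    for t in threads:
        t.join(max(0.0, end - time.monotonic()))
    return [t.name for t in threads if t.is_alive()]


# Usage with placeholder callables: each "reader" blocks until released.
release = threading.Event()
stuck = run_harness(1, 2, load=lambda i: None,
                    select=lambda i: release.wait(), deadline=0.2)
print(stuck)  # ['reader-0', 'reader-1']
release.set()
```

The threads are daemons so that a client call that truly hangs does not keep the interpreter alive after the harness returns.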

A symptom is observed when running the script on my 7-core Ubuntu 16.04 VM: if NWRITER + NREADER >= 7, all readers are starved from the very beginning and none of their SELECT queries returns a result. At the same time, any newly launched mapdql client is also locked out.

That is, reader lock-out occurs when [NWRITER, NREADER] = [1, 6], [2, 5], [3, 4], [4, 3], [5, 2], or [6, 1].
Lock-out does not occur when NWRITER + NREADER < 7.

Interestingly, it is always the readers that are locked out.

Is there any compile-time or runtime parameter that can be set to work around this constraint?

@andrewseidl
Contributor

andrewseidl commented Sep 25, 2017

Try --tthreadpool-size. On a single-GPU machine you should be able to set this as high as 180 or slightly higher (not that you'd really want to with only 7 cores).

There is a limit on the size of the Thrift server's thread pool, defaulting to 8 threads. This server-side Thrift limit is not desirable, but it is currently required to prevent some crashes we came across when using CUDA + OpenGL from multiple threads. We have a more appropriate workaround planned (moving rendering to a single thread), but no ETA at the moment.

@fleapapa
Contributor Author

fleapapa commented Sep 25, 2017

Thanks. It works after enlarging the pool to allow 9 threads.

I am curious why 7 (= NWRITER + NREADER) threads won't work when the pool allows 8 threads, why only reader threads are starved (from the very beginning rather than partway through), and why ALL readers are starved rather than only one (e.g., the last one).

Any hint will be great.

@fleapapa
Contributor Author

fleapapa commented Sep 29, 2017

Here is more information about this issue, especially about "ALL readers are starved".

Steps to reproduce the issue:

  1. In one window, start the MapD server with "bin/mapd_server --tthreadpool-size 4". (The small pool size of 4 simplifies the test.)
  2. In another window, run the test script with "python trip-mapd-load.py -w 1 -r 2 -c", which creates a table, forks one writer thread to LOAD the table from some CSV files, and forks two reader threads to run SELECT queries against the table.

The table was created and the writer thread started loading data, but both readers were blocked completely. Meanwhile, running the mapdql shell in the other window also blocked; it never reached the prompt.

Then I opened one more window and attached gdb to mapd_server. Below are the thread list and the backtrace of one Thrift handler thread processing a SELECT query on the mapd_server side. Apparently both SELECT query strings had been accepted by the Calcite client and had probably been submitted to the Calcite parser, yet the thread stayed blocked in __libc_recv() indefinitely, waiting for Calcite's reply.

    (gdb) info thr
      Id   Target Id         Frame
    * 1    Thread 0x7f49ebfc6300 (LWP 10404) "mapd_server" 0x00007f49e860598d in pthread_join (threadid=139955065657088, thread_return=0x0) at pthread_join.c:90
      2    Thread 0x7f49ebfbc700 (LWP 10425) "mapd_server" 0x00007f49e832e70d in poll () at ../sysdeps/unix/syscall-template.S:84
      3    Thread 0x7f49d4294700 (LWP 10426) "mapd_server" 0x00007f49e860d87f in __libc_recv (fd=38, buf=buf@entry=0x7f49c0010e90, n=n@entry=512, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:28
      4    Thread 0x7f49d4193700 (LWP 10427) "mapd_server" 0x00007f49e860598d in pthread_join (threadid=139954320836352, thread_return=0x0) at pthread_join.c:90
      5    Thread 0x7f49d4092700 (LWP 10428) "mapd_server" 0x00007f49e860d87f in __libc_recv (fd=37, buf=buf@entry=0x7f49b4010fa0, n=n@entry=512, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:28
      6    Thread 0x7f49d3f91700 (LWP 10429) "mapd_server" 0x00007f49e832e70d in poll () at ../sysdeps/unix/syscall-template.S:84
      7    Thread 0x7f49d3790700 (LWP 10430) "mapd_server" 0x00007f49e832e70d in poll () at ../sysdeps/unix/syscall-template.S:84
      8    Thread 0x7f49a7940700 (LWP 10601) "mapd_server" Importer_NS::is_eol (p=@0x7f49b2ec77b2: 57 '9', line_delims="\n\r\n") at /data/github/mapd-core/Import/Importer.cpp:127
    (gdb) thr 3
    [Switching to thread 3 (Thread 0x7f49d4294700 (LWP 10426))]
    #0  0x00007f49e860d87f in __libc_recv (fd=38, buf=buf@entry=0x7f49c0010e90, n=n@entry=512, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:28
    28	../sysdeps/unix/sysv/linux/x86_64/recv.c: No such file or directory.
    (gdb) bt
    #0  0x00007f49e860d87f in __libc_recv (fd=38, buf=buf@entry=0x7f49c0010e90, n=n@entry=512, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:28
    #1  0x00007f49e9f692e1 in recv (__flags=0, __n=512, __buf=<optimized out>, __fd=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/socket2.h:44
    #2  apache::thrift::transport::TSocket::read (this=0x7f49c0010d20, buf=0x7f49c0010e90 "x", len=512) at src/thrift/transport/TSocket.cpp:559
    #3  0x00007f49e9f76bd4 in apache::thrift::transport::TTransport::read (len=<optimized out>, buf=<optimized out>, this=<optimized out>) at ./src/thrift/transport/TTransport.h:105
    #4  apache::thrift::transport::TBufferedTransport::readSlow (this=0x7f49c0010e30, buf=0x7f49d4291820 "P\030)\324I\177", len=4) at src/thrift/transport/TBufferTransports.cpp:53
    #5  0x000000000117137f in apache::thrift::transport::TBufferBase::read (this=0x7f49c0010e30, buf=0x7f49d4291820 "P\030)\324I\177", len=4) at /usr/local/mapd-deps/lib/../include/thrift/transport/TBufferTransports.h:71
    #6  0x00000000011743a0 in apache::thrift::transport::readAll<apache::thrift::transport::TBufferBase> (trans=..., buf=0x7f49d4291820 "P\030)\324I\177", len=4) at /usr/local/mapd-deps/lib/../include/thrift/transport/TTransport.h:41
    #7  0x00000000011713ff in apache::thrift::transport::TBufferBase::readAll (this=0x7f49c0010e30, buf=0x7f49d4291820 "P\030)\324I\177", len=4) at /usr/local/mapd-deps/lib/../include/thrift/transport/TBufferTransports.h:84
    #8  0x00000000011719c9 in apache::thrift::transport::TBufferedTransport::readAll (this=0x7f49c0010e30, buf=0x7f49d4291820 "P\030)\324I\177", len=4) at /usr/local/mapd-deps/lib/../include/thrift/transport/TBufferTransports.h:264
    #9  0x0000000001189201 in apache::thrift::transport::TVirtualTransport<apache::thrift::transport::TBufferedTransport, apache::thrift::transport::TBufferBase>::readAll_virt (this=0x7f49c0010e30, buf=0x7f49d4291820 "P\030)\324I\177", len=4) at /usr/local/mapd-deps/lib/../include/thrift/transport/TVirtualTransport.h:92
    #10 0x000000000116db8d in apache::thrift::transport::TTransport::readAll (this=0x7f49c0010e30, buf=0x7f49d4291820 "P\030)\324I\177", len=4) at /usr/local/mapd-deps/lib/../include/thrift/transport/TTransport.h:121
    #11 0x0000000001190067 in apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport, apache::thrift::protocol::TNetworkBigEndian>::readI32 (this=0x7f49c00008f0, i32=@0x7f49d4291864: 32585) at /usr/local/mapd-deps/lib/../include/thrift/protocol/TBinaryProtocol.tcc:373
    #12 0x000000000118f808 in apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport, apache::thrift::protocol::TNetworkBigEndian>::readMessageBegin (this=0x7f49c00008f0, name="", messageType=@0x7f49d429193c: 0, seqid=@0x7f49d4291938: 0) at /usr/local/mapd-deps/lib/../include/thrift/protocol/TBinaryProtocol.tcc:205
    #13 0x000000000118ecb0 in apache::thrift::protocol::TVirtualProtocol<apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport, apache::thrift::protocol::TNetworkBigEndian>, apache::thrift::protocol::TProtocolDefaults>::readMessageBegin_virt (this=0x7f49c00008f0, name="", messageType=@0x7f49d429193c: 0, seqid=@0x7f49d4291938: 0) at /usr/local/mapd-deps/lib/../include/thrift/protocol/TVirtualProtocol.h:403
    #14 0x000000000116e118 in apache::thrift::protocol::TProtocol::readMessageBegin (this=0x7f49c00008f0, name="", messageType=@0x7f49d429193c: 0, seqid=@0x7f49d4291938: 0) at /usr/local/mapd-deps/lib/../include/thrift/protocol/TProtocol.h:436
    #15 0x00000000029995dc in CalciteServerClient::recv_process (this=0x7f49c00119d0, _return=...) at /data/github/mapd-core/gen-cpp/CalciteServer.cpp:1042
    #16 0x0000000002999303 in CalciteServerClient::process (this=0x7f49c00119d0, _return=..., user="mapd", passwd="SNM6gD5CyJV7o7IQILbRf1DJjG1OGiqX", catalog="mapd", sql_text="select count(*) from (select max(trip_time_in_secs) from trip1 where trip_time_in_secs > 10 group by trip_time_in_secs);", legacySyntax=true, isexplain=false) at /data/github/mapd-core/gen-cpp/CalciteServer.cpp:1013
    #17 0x00000000017b1d3d in Calcite::<lambda()>::operator()(void) const (__closure=0x7f49d4291b60) at /data/github/mapd-core/Calcite/Calcite.cpp:255
    #18 0x00000000017b27cd in measure<std::chrono::duration<long, std::ratio<1l, 1000l> > >::execution<Calcite::processImpl(const Catalog_Namespace::SessionInfo&, std::__cxx11::string, bool, bool)::<lambda()> >(Calcite::<lambda()>) (func=...) at /data/github/mapd-core/Shared/measure.h:28
    #19 0x00000000017b1fca in Calcite::processImpl (this=0x5273b40, session_info=..., sql_string="select count(*) from (select max(trip_time_in_secs) from trip1 where trip_time_in_secs > 10 group by trip_time_in_secs);", legacy_syntax=true, is_explain=false) at /data/github/mapd-core/Calcite/Calcite.cpp:257
    #20 0x00000000017b14b9 in Calcite::process (this=0x5273b40, session_info=..., sql_string="select count(*) from (select max(trip_time_in_secs) from trip1 where trip_time_in_secs > 10 group by trip_time_in_secs);", legacy_syntax=true, is_explain=false) at /data/github/mapd-core/Calcite/Calcite.cpp:189
    #21 0x0000000001269503 in MapDHandler::parse_to_ra (this=0x52cc880, query_str="select count(*) from (select max(trip_time_in_secs) from trip1 where trip_time_in_secs > 10 group by trip_time_in_secs);", session_info=...) at /data/github/mapd-core/ThriftHandler/MapDHandler.cpp:2231
    #22 0x000000000126796e in MapDHandler::<lambda()>::<lambda()>::operator()(void) const (__closure=0x7f49d4292050) at /data/github/mapd-core/ThriftHandler/MapDHandler.cpp:2101
    #23 0x000000000126b726 in measure<std::chrono::duration<long, std::ratio<1l, 1000l> > >::execution<MapDHandler::sql_execute_impl(TQueryResult&, const Catalog_Namespace::SessionInfo&, const string&, bool, const string&, ExecutorDeviceType, int32_t)::<lambda()>::<lambda()> >(MapDHandler::<lambda()>::<lambda()>) (func=...) at /data/github/mapd-core/QueryEngine/../Shared/measure.h:28
    #24 0x0000000001267b8f in MapDHandler::<lambda()>::operator()(void) const (__closure=0x7f49d4292b50) at /data/github/mapd-core/ThriftHandler/MapDHandler.cpp:2101
    #25 0x000000000126b8ab in measure<std::chrono::duration<long, std::ratio<1l, 1000l> > >::execution<MapDHandler::sql_execute_impl(TQueryResult&, const Catalog_Namespace::SessionInfo&, const string&, bool, const string&, ExecutorDeviceType, int32_t)::<lambda()> >(MapDHandler::<lambda()>) (func=...) at /data/github/mapd-core/QueryEngine/../Shared/measure.h:28
    #26 0x0000000001268e15 in MapDHandler::sql_execute_impl (this=0x52cc880, _return=..., session_info=..., query_str="select count(*) from (select max(trip_time_in_secs) from trip1 where trip_time_in_secs > 10 group by trip_time_in_secs);", column_format=true, nonce="", executor_device_type=ExecutorDeviceType::CPU, first_n=-1) at /data/github/mapd-core/ThriftHandler/MapDHandler.cpp:2091
    #27 0x0000000001258775 in MapDHandler::sql_execute (this=0x52cc880, _return=..., session="SNM6gD5CyJV7o7IQILbRf1DJjG1OGiqX", query_str="select count(*) from (select max(trip_time_in_secs) from trip1 where trip_time_in_secs > 10 group by trip_time_in_secs);", column_format=true, nonce="", first_n=-1) at /data/github/mapd-core/ThriftHandler/MapDHandler.cpp:526
    #28 0x00000000011d2969 in MapDProcessor::process_sql_execute (this=0x52eb380, seqid=0, iprot=0x7f49cc002c10, oprot=0x7f49cc002c80, callContext=0x0) at /data/github/mapd-core/gen-cpp/MapD.cpp:15742
    #29 0x00000000011c9a41 in MapDProcessor::dispatchCall (this=0x52eb380, iprot=0x7f49cc002c10, oprot=0x7f49cc002c80, fname="sql_execute", seqid=0, callContext=0x0) at /data/github/mapd-core/gen-cpp/MapD.cpp:14752
    #30 0x000000000116ecce in apache::thrift::TDispatchProcessor::process (this=0x52eb380, in=..., out=..., connectionContext=0x0) at /usr/local/mapd-deps/lib/../include/thrift/TDispatchProcessor.h:121
    #31 0x00007f49e9f785c2 in apache::thrift::server::TConnectedClient::run (this=0x7f49cc002cf0) at src/thrift/server/TConnectedClient.cpp:62
    #32 0x00007f49e9f44a94 in apache::thrift::concurrency::ThreadManager::Task::run (this=0x7f49cc002db0) at src/thrift/concurrency/ThreadManager.cpp:196
    #33 apache::thrift::concurrency::ThreadManager::Worker::run (this=0x52f3bf0) at src/thrift/concurrency/ThreadManager.cpp:311
    #34 0x00007f49e9f84833 in apache::thrift::concurrency::PthreadThread::threadMain (arg=0x52f41d0) at src/thrift/concurrency/PosixThreadFactory.cpp:208
    #35 0x00007f49e86046ba in start_thread (arg=0x7f49d4294700) at pthread_create.c:333
    #36 0x00007f49e833a3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
    (gdb)

Interestingly, this starvation only happens immediately after table creation in the same Python session of the script. For example, if the table already exists and the script runs without the '-c' option to re-create it, no threads are blocked and everything runs as expected.

It seems like a corner case, but I am curious what runtime impact the table creation may have on later SELECT queries.

@fleapapa
Contributor Author

fleapapa commented Sep 30, 2017

I've attached a tcpdump capture of the test in the previous comment. It shows that the Calcite client sent two requests ('process' and 'get tables') but received no Thrift response from the Calcite server other than TCP ACKs.

I've also attached another tcpdump capture taken when the script ran without creating the table. In this capture, the Calcite server replied to each SELECT query with a 2613-byte Thrift response.
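To tell calls from replies in captures like these, one can decode the Thrift message header at the start of each payload. A minimal sketch, assuming the strict TBinaryProtocol over an unframed (buffered) transport, which is what the backtrace above suggests (TBinaryProtocolT, TBufferedTransport); with a framed transport, a 4-byte length prefix would have to be skipped first. `read_message_begin` is an illustrative name, not a Thrift API. Note that frames #4 to #11 of the backtrace are blocked reading the first 4 bytes (len=4) of exactly this header, i.e. no part of Calcite's reply had arrived.

```python
import struct
from typing import Tuple

VERSION_MASK = 0xFFFF0000
VERSION_1 = 0x80010000
MESSAGE_TYPES = {1: "CALL", 2: "REPLY", 3: "EXCEPTION", 4: "ONEWAY"}


def read_message_begin(payload: bytes) -> Tuple[str, str, int]:
    """Decode (name, message type, seqid) from the start of a strict
    TBinaryProtocol message. Assumes no TFramedTransport length prefix."""
    # First i32 (big-endian): protocol version in the high 16 bits,
    # message type in the low byte.
    (word,) = struct.unpack_from("!I", payload, 0)
    if (word & VERSION_MASK) != VERSION_1:
        raise ValueError("not a strict TBinaryProtocol message header")
    mtype = MESSAGE_TYPES.get(word & 0xFF, "UNKNOWN")
    # Next: i32 name length, the UTF-8 method name, then the i32 sequence id.
    (name_len,) = struct.unpack_from("!i", payload, 4)
    name = payload[8:8 + name_len].decode("utf-8")
    (seqid,) = struct.unpack_from("!i", payload, 8 + name_len)
    return name, mtype, seqid


# Build a CALL header for the 'process' method (as in CalciteServerClient::process).
header = struct.pack("!Ii", VERSION_1 | 1, len(b"process")) + b"process" + struct.pack("!i", 0)
print(read_message_begin(header))  # ('process', 'CALL', 0)
```

A request whose matching REPLY header never appears in the capture is one that the peer has not answered.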

@dwayneberry
Copy link
Contributor

The issue here is that the Calcite server requires access to the mapd-core server to get the database metadata.

If all the connections to the server are exhausted, the Calcite server will sit and wait for a connection to become available.

The Calcite server only connects to the mapd-core server when it first lazily loads the metadata for the tables a query runs against. This explains why you sometimes see the issue and sometimes do not, depending on whether the Calcite server has already seen the metadata.
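The mechanism described above behaves like a classic thread-pool starvation deadlock. The sketch below is a simplified model, not MapD or Thrift code: a Python `ThreadPoolExecutor` with `pool_size` workers stands in for the Thrift server's thread pool (in the backtrace above, each handler runs `TConnectedClient::run` on a `ThreadManager` worker, so each open connection appears to occupy one worker), and `get_tables` is a hypothetical stand-in for Calcite's metadata callback. The `all_connected` event assumes every client is queued before the first callback is issued, which corresponds to the cold-metadata case right after table creation.

```python
import concurrent.futures as cf
import threading


def simulate(pool_size: int, n_clients: int, timeout: float = 0.5) -> bool:
    """Simplified model (not MapD/Thrift code): return True if every client
    query completes, False if any handler starved waiting for its callback."""
    pool = cf.ThreadPoolExecutor(max_workers=pool_size)
    all_connected = threading.Event()

    def get_tables() -> list:
        # Hypothetical stand-in for the metadata callback from Calcite.
        return ["trip1"]

    def handle_select() -> list:
        # Each client holds one pool worker while its query is in flight.
        # Wait until every client has been queued (cold metadata cache case).
        all_connected.wait()
        # Parsing needs metadata, served by another worker from the SAME pool.
        callback = pool.submit(get_tables)
        return callback.result(timeout=timeout)

    futures = [pool.submit(handle_select) for _ in range(n_clients)]
    all_connected.set()

    ok = True
    for f in futures:
        try:
            f.result()
        except cf.TimeoutError:
            ok = False  # no free worker ever picked up this callback
    pool.shutdown(wait=True)
    return ok


print(simulate(pool_size=4, n_clients=3))  # True: a spare worker serves callbacks
print(simulate(pool_size=4, n_clients=4))  # False: every worker waits on a callback
```

In this model the outcome depends only on whether a worker is still free when the callback arrives, which is consistent with the problem disappearing once --tthreadpool-size was raised. In the real server, other open connections (for example a mapdql session) also hold workers, so the failure threshold can sit below the nominal pool size.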

This connection limit ultimately stems from an issue in the NVIDIA driver, where render sessions use some kind of thread-local storage that fails after "too many" threads have been instantiated.

We are fixing this by moving the render threads to a managed threadpool, so the limitation on the number of sessions will be removed.

In the short term, if you want many threads, you should use HTTP protocol connections to the web server port (9092 by default) as your mapdql transport, e.g.:

bin/mapdql --http --port 9092 -p password

There is some overhead in using the HTTP protocol, but it will avoid your thread starvation issue. That said, it appears you may just be testing a corner case at the moment anyway.

@fleapapa
Contributor Author

fleapapa commented Sep 30, 2017

@dwayneberry

When both mapd_server and my script were stuck, I tried mapdql with HTTP port 9092. It didn't hang, but it dumped core as shown below.

$ bin/mapdql mapd     -u $MAPD_USERNAME     -p $MAPD_PASSWORD --http --port 9092
Thrift: Sat Sep 30 00:52:39 2017 TSocket::open() connect() <Host: localhost Port: 9092>Connection refused
terminate called after throwing an instance of 'apache::thrift::transport::TTransportException'
  what():  connect() failed: Connection refused
Aborted (core dumped)

Yes, but I'm still curious what minimal change to mapd_server or the Calcite server would let the script run to completion. Although it is a corner case, it does not seem an unusual "unit test" scenario: create a table and immediately fork multiple threads to read and write it.

@fleapapa
Copy link
Contributor Author

Corrected. After interleaving the forking of threads with a delay, both reader threads got SELECT results :)

@fleapapa
Copy link
Contributor Author

@dwayneberry,

One thing I'm still confused about (stubborn :) is that in the previously attached 1.cap file, after the Calcite server received the SELECT requests (msgs 1 and 3), it could send "get_tables" requests back to mapd_server (msgs 11 and 12), which means Calcite had no problem connecting back to mapd_server. If that is the case, the issue seems to be kicked back to mapd_server, and the question becomes: why didn't mapd_server reply to Calcite's "get_tables" requests?

@fleapapa
Copy link
Contributor Author

@dwayneberry
Sorry, I finally got it. The crux of the issue is not the number of "connections" but the number of "threads". Please disregard the last question.

anton-malakhov pushed a commit to anton-malakhov/omniscidb that referenced this issue Oct 26, 2020
Exceptions in dbe - to catch properly #1884

3 participants