"Too many simultaneous queries" error after upgrading to 0.12 #163

Closed
spoofedpacket opened this issue Jun 18, 2021 · 6 comments

@spoofedpacket

I was testing out graphite-clickhouse 0.12 on a ClickHouse 20.3.17.173 installation. After a few minutes, queries started failing (no data returned).

Looking at the ClickHouse logs, it was reporting a "Too many simultaneous queries" error:

```
2021.06.16 11:16:24.804066 [ 75359 ] {11a31aae1c21ac323ffc2531c5695489::9dd8ab24d310a5f3} <Error> HTTPHandler: Code: 202, e.displayText() = DB::Exception: Too many simultaneous queries. Maximum: 100, Stack trace (when copying this message, always include the lines below):

0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0xbc0eb8c in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x5033559 in /usr/bin/clickhouse
2. ? @ 0x4c489fd in /usr/bin/clickhouse
3. ? @ 0x8ce74a8 in /usr/bin/clickhouse
4. DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>) @ 0x8ce8147 in /usr/bin/clickhouse
5. DB::HTTPHandler::processQuery(DB::Context&, Poco::Net::HTTPServerRequest&, HTMLForm&, Poco::Net::HTTPServerResponse&, DB::HTTPHandler::Output&) @ 0x509940e in /usr/bin/clickhouse
6. DB::HTTPHandler::handleRequest(Poco::Net::HTTPServerRequest&, Poco::Net::HTTPServerResponse&) @ 0x509c77e in /usr/bin/clickhouse
7. Poco::Net::HTTPServerConnection::run() @ 0x9ccb30b in /usr/bin/clickhouse
8. Poco::Net::TCPServerConnection::start() @ 0x9cc7ee7 in /usr/bin/clickhouse
9. Poco::Net::TCPServerDispatcher::run() @ 0x9cc82dd in /usr/bin/clickhouse
10. Poco::PooledThread::run() @ 0xbc7d86f in /usr/bin/clickhouse
11. Poco::ThreadImpl::runnableEntry(void*) @ 0xbc7aad8 in /usr/bin/clickhouse
12. ? @ 0xbc7c209 in /usr/bin/clickhouse
13. start_thread @ 0x7dd5 in /usr/lib64/libpthread-2.17.so
14. clone @ 0xfdead in /usr/lib64/libc-2.17.so
 (version 20.3.17.173)
```

The same errors also bubble up in graphite-clickhouse's own logs:

"error": "clickhouse response status 500: Code: 202, e.displayText() = DB::Exception: Too many simultaneous queries. Maximum: 100 (version 20.3.17.173)\n"

At the time ClickHouse was handling about 1.2k queries/min (as reported by the ClickHouse.ProfileEvents.Query metric), which is pretty normal load for the system I'm working with.

I also tested with internal-aggregation both enabled and disabled; it didn't have any effect. Rolling back to graphite-clickhouse 0.11 resolved the issue.

Any insights appreciated! I'm keen to upgrade to 0.12 to take advantage of the internal-aggregation features.

Thanks.

@Felixoid
Collaborator

Hey hey.

This looks a little strange to me. Can you try master to confirm whether the issue is still there?

In any case, the easiest workaround would be to increase the ClickHouse setting max_concurrent_queries; try 150.
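
For reference, a minimal sketch of where that setting lives, assuming a stock server config layout (the drop-in file name below is just an example, 20.3 still uses the yandex root element, and a config reload or restart may be needed):

```xml
<!-- e.g. /etc/clickhouse-server/config.d/max_concurrent_queries.xml (hypothetical drop-in file) -->
<!-- Server-wide cap on concurrently running queries; the default is 100. -->
<yandex>
    <max_concurrent_queries>150</max_concurrent_queries>
</yandex>
```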

@Felixoid
Collaborator

Hey Robert @spoofedpacket, is there any news regarding the issue?

@spoofedpacket
Author

@Felixoid Thanks for the response. I noticed from the logs that the majority of failed queries related to the graphite_index table, so I decided to try something:

  • The graphite_index table has grown quite large over time. Metric paths that have stopped updating weren't being deleted from the table, so there was about a year's worth of stale data in there.
  • It was also partitioned with a YYYYMMDD partition key, which led to it having over 400 partitions - I'm guessing that number would have grown forever if left alone.
  • I dropped and recreated the table, this time with a monthly partition key (a sketch of the recreated table follows this list), and let metrics feed in for a few days.
  • There's also a cron in place now to purge metric paths that haven't been updated in 2 weeks, based on this discussion: Don't list old metric names? #83 (comment)
  • After this, I started sending queries again for several hours and the error did not reappear.
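
The sketch of the recreated index table, assuming the stock graphite-clickhouse example schema (the column names, engine, and ordering key below come from the project's docs and may differ from the real table); the only intended change is the monthly partition key:

```sql
-- Assumed stock graphite-clickhouse index schema; the partition key is
-- switched from daily (toYYYYMMDD(Date)) to monthly (toYYYYMM(Date)).
CREATE TABLE graphite_index (
    Date    Date,
    Level   UInt32,
    Path    String,
    Version UInt32
) ENGINE = ReplacingMergeTree(Version)
PARTITION BY toYYYYMM(Date)
ORDER BY (Level, Path, Date);
```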

I'm thinking this is something peculiar to my setup, due to the very large index table at the time. Happy to close the issue for now; if it reoccurs I'll try bumping max_concurrent_queries to 150.

Thanks again!

@Felixoid
Collaborator

There's actually another way to solve the issue. You can try the index-use-daily = false parameter; it can be more efficient for your setup.

To prevent the daily data from being written in the first place, there's a setting in carbon-clickhouse, but I'm unhappy with its name (disable-daily-index = true), so it will be changed in the next config refactoring in a couple of weeks.
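
Roughly, the two settings look like this; the section and table names below follow the usual example configs and are assumptions, so check the docs for the versions in use:

```toml
# graphite-clickhouse config (reader side): query only the non-daily part of the index.
[clickhouse]
index-table = "graphite_index"
index-use-daily = false
```

```toml
# carbon-clickhouse config (writer side): stop writing the daily index rows.
[upload.graphite_index]
type = "index"
table = "graphite_index"
disable-daily-index = true
```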

@spoofedpacket
Author

Oh, I didn't know about that option in carbon-clickhouse. I'll give both a try; thanks for the tip!

@Felixoid
Collaborator

I'm closing this; feel free to reopen it if new questions come up.
