
Infinite inserting guarantees #17

Closed
let4be opened this issue Jun 27, 2021 · 13 comments

@let4be

let4be commented Jun 27, 2021

I was looking at the code and I have a question:
Do we have a guarantee that nothing will be sent to ClickHouse *before* we actually invoke inserter.commit?

I'm doing a bunch of inserter.write calls, and due to the way I handle errors (with backoff) there may be a situation where I accumulate quite a lot of records, write them all with inserter.write, and only then call inserter.commit.

From what I see in the code, inserter.write actually calls this if the buffer grows bigger than some threshold:

self.send_chunk_if_exceeds(MIN_CHUNK_SIZE).await?;

which essentially leads to writing into a channel (the hyper client's request body).

Now, if I understand correctly, nothing should be sent over the wire before we actually invoke inserter.end/inserter.commit (which closes the channel).

Am I correct in my line of thought?

It's just that I'm experiencing some rather weird bugs under high load with ClickHouse, where all my writers get locked. The throughput can get quite high (~30k writes per second), but I'm writing in chunks (using the infinite inserter) from several green threads to rule out starvation, and the database has been switched to the Memory table engine to ensure it's not an IO problem.

Thanks!

@loyd
Owner

loyd commented Jun 27, 2021

Thanks for the question! I think I should add more details to the documentation.

Do we have a guarantee nothing will be send to clickhouse -before- we actually invoke inserter.commit?

Actually, no.

ClickHouse does nothing with a query until the whole request is received, so there is almost no difference between sending a whole INSERT at once and splitting it into chunks. "Almost" because it affects query_duration_ms in query_log. I don't know whether this behavior is still present, but it used to be like this.

Splitting the request into chunks allows using the network more efficiently, because the data is sent earlier. Yes, the kernel has a network buffer (tcp_wmem), but it varies between installations and can be small enough to make a difference, especially in high-load cases.

However, splitting leads to more syscalls (more write()) and allocations (more BytesMut::with_capacity()). So, after benchmarking (benches/insert.rs and proprietary tests against a real database) I found that 128 KiB is the most appropriate buffer size, although this may change in the future once io_uring spreads.
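The write path described here (buffer serialized rows, flush once a size threshold is crossed) can be sketched in plain Rust. The names mirror the crate's, but this is an illustration, not the crate's actual internals:

```rust
// Illustrative sketch of size-threshold buffering; not the crate's real code.
const MIN_CHUNK_SIZE: usize = 128 * 1024; // 128 KiB, per the benchmarks above

struct ChunkedBuffer {
    buf: Vec<u8>,
    sent_chunks: usize, // stands in for handing bytes to the hyper body channel
}

impl ChunkedBuffer {
    fn new() -> Self {
        Self { buf: Vec::with_capacity(MIN_CHUNK_SIZE), sent_chunks: 0 }
    }

    // Append one serialized row, then flush if the threshold is exceeded.
    fn write(&mut self, row: &[u8]) {
        self.buf.extend_from_slice(row);
        self.send_chunk_if_exceeds(MIN_CHUNK_SIZE);
    }

    // Hand the accumulated bytes to the transport once they reach `threshold`.
    fn send_chunk_if_exceeds(&mut self, threshold: usize) {
        if self.buf.len() >= threshold {
            self.sent_chunks += 1; // the real code sends into the request body here
            self.buf.clear();
        }
    }
}
```

The key point is that flushing happens inside write(), so bytes can leave the process long before the INSERT is finished.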

What inserter.commit() actually guarantees is that the INSERT cannot end until commit() or end() is called. However, a chunk can be sent on any write() call without ending the active insert. If an error occurs before the INSERT is successfully ended, the request is aborted (in the Drop impl of Insert). In other words, all rows written between commit() calls are sent in one INSERT statement. If they have the same partition and the row count is less than max_insert_block_size, they will be placed in one part, which CH writes to disk atomically.

Note that commit() returns the number of inserted rows, so it can be used to remove saved rows from local storage or memory in order to get delivery guarantees.

Also, if you don't need any kind of atomicity (for instance, you're writing metrics, not business entities), you can just call commit() after every write().

It's just that I'm experiencing some rather weird bugs under high load with ClickHouse, where all my writers get locked; the throughput can get quite high

Can you provide more information here? Are you writing in parallel from many tasks? 30k per second seems too low. What's the size of a row? What are the configuration and version of the CH server?

@let4be
Author

let4be commented Jun 27, 2021

Thanks for such a detailed answer! It makes more sense now.

Can you provide more information here? Are you writing in parallel from many tasks? 30k per second seems too low. What's the size of a row? What are the configuration and version of the CH server?

I'm writing in parallel from 3 green threads (each receives data over a Flume async channel). Rows are usually small, with the biggest hitter having the following table structure:

CREATE TABLE metrics_task (
    host String,
    url String,
    created_date Date DEFAULT now(),
    created_at DateTime,

    term UInt8,
    term_by_filter String,
    term_by_name String,
    term_kind String,
    term_reason String,

    error UInt8,
    error_known String,

    status_ok UInt8,
    status_code UInt16,
    wait_time_ms UInt32,
    status_time_ms UInt32,

    load_ok UInt8,
    load_time_ms UInt32,
    write_size_b UInt32,
    read_size_b UInt32,

    follow_ok UInt8,
    parse_time_micro UInt32
) ENGINE = Memory;

I will add more logging on my side and check what's going on with those locks; I'm not yet sure where the "locking" occurs (I don't think it's an actual deadlock).

I'm thinking of wrapping all calls to this library in tokio::time::timeout to detect possibly lagging calls.

@let4be
Author

let4be commented Jun 27, 2021

The current writer logic is in https://github.com/let4be/crusty/blob/80a35d4168f389a925250e3be0629b10bb4166ce/main/src/clickhouse_utils.rs

I'd be grateful if you could take a brief look to check whether I'm using the library correctly.

@let4be
Author

let4be commented Jun 27, 2021

Caught it again. None of my safeguards with timeouts worked, meaning it's a deadlock somewhere...
ClickHouse spams errors:

2021.06.27 18:39:28.389217 [ 146 ] {7d2575c8-888c-43b1-9d78-acce63d9e1b3} <Debug> executeQuery: (from 172.16.0.1:53022, using production parser) INSERT INTO metrics_queue(`host`,`name`,`name_index`,`updated_at`,`len`) FORMAT RowBinary 
2021.06.27 18:39:28.389333 [ 146 ] {7d2575c8-888c-43b1-9d78-acce63d9e1b3} <Trace> ContextAccess (default): Access granted: INSERT(host, name, name_index, updated_at, len) ON crusty.metrics_queue
2021.06.27 18:39:28.389332 [ 141 ] {} <Trace> HTTPHandler-factory: HTTP Request for HTTPHandler-factory. Method: POST, Address: 172.16.0.1:53026, User-Agent: (none), Content Type: , Transfer Encoding: chunked, X-Forwarded-For: (none)
2021.06.27 18:39:28.389427 [ 141 ] {} <Trace> DynamicQueryHandler: Request URI: /?database=crusty&query=INSERT+INTO+metrics_queue%28%60host%60%2C%60name%60%2C%60name_index%60%2C%60updated_at%60%2C%60len%60%29+FORMAT+RowBinary
2021.06.27 18:39:28.389868 [ 146 ] {7d2575c8-888c-43b1-9d78-acce63d9e1b3} <Debug> DynamicQueryHandler: Done processing query
2021.06.27 18:39:28.389901 [ 146 ] {7d2575c8-888c-43b1-9d78-acce63d9e1b3} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
2021.06.27 18:39:28.390038 [ 193 ] {37b4da0d-8a3f-4054-ad6d-9345655616ab} <Debug> executeQuery: (from 172.16.0.1:53020, using production parser) INSERT INTO metrics_queue(`host`,`name`,`name_index`,`updated_at`,`len`) FORMAT RowBinary 
2021.06.27 18:39:28.390120 [ 193 ] {37b4da0d-8a3f-4054-ad6d-9345655616ab} <Trace> ContextAccess (default): Access granted: INSERT(host, name, name_index, updated_at, len) ON crusty.metrics_queue
2021.06.27 18:39:28.390139 [ 146 ] {} <Error> ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 107, e.displayText() = Net Exception: Socket is not connected, Stack trace (when copying this message, always include the lines below):

0. Poco::Net::SocketImpl::error(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x11d44aae in /usr/bin/clickhouse
1. Poco::Net::SocketImpl::peerAddress() @ 0x11d46d26 in /usr/bin/clickhouse
2. DB::HTTPServerRequest::HTTPServerRequest(DB::Context const&, DB::HTTPServerResponse&, Poco::Net::HTTPServerSession&) @ 0xf6a0a34 in /usr/bin/clickhouse
3. DB::HTTPServerConnection::run() @ 0xf69f8a4 in /usr/bin/clickhouse
4. Poco::Net::TCPServerConnection::start() @ 0x11d4bb0f in /usr/bin/clickhouse
5. Poco::Net::TCPServerDispatcher::run() @ 0x11d4d521 in /usr/bin/clickhouse
6. Poco::PooledThread::run() @ 0x11e83c49 in /usr/bin/clickhouse
7. Poco::ThreadImpl::runnableEntry(void*) @ 0x11e7faaa in /usr/bin/clickhouse
8. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
9. __clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
 (version 21.3.13.9 (official build))
2021.06.27 18:39:28.390172 [ 102 ] {} <Trace> HTTPHandler-factory: HTTP Request for HTTPHandler-factory. Method: POST, Address: 172.16.0.1:53028, User-Agent: (none), Content Type: , Transfer Encoding: chunked, X-Forwarded-For: (none)
2021.06.27 18:39:28.390233 [ 102 ] {} <Trace> DynamicQueryHandler: Request URI: /?database=crusty&query=INSERT+INTO+metrics_queue%28%60host%60%2C%60name%60%2C%60name_index%60%2C%60updated_at%60%2C%60len%60%29+FORMAT+RowBinary
2021.06.27 18:39:28.390606 [ 193 ] {37b4da0d-8a3f-4054-ad6d-9345655616ab} <Debug> DynamicQueryHandler: Done processing query
2021.06.27 18:39:28.390628 [ 193 ] {37b4da0d-8a3f-4054-ad6d-9345655616ab} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
2021.06.27 18:39:28.509345 [ 193 ] {} <Trace> HTTPHandler-factory: HTTP Request for HTTPHandler-factory. Method: POST, Address: 172.16.0.1:53020, User-Agent: (none), Content Type: , Transfer Encoding: chunked, X-Forwarded-For: (none)
2021.06.27 18:39:28.509373 [ 145 ] {70b9dca4-9fce-4ff3-8487-75999df98b5b} <Debug> executeQuery: (from 172.16.0.1:53018, using production parser) INSERT INTO metrics_db(`host`,`created_at`,`table_name`,`label`,`took_ms`,`since_last_ms`,`items`) FORMAT RowBinary 
2021.06.27 18:39:28.509439 [ 193 ] {} <Trace> DynamicQueryHandler: Request URI: /?database=crusty&query=INSERT+INTO+metrics_db%28%60host%60%2C%60created_at%60%2C%60table_name%60%2C%60label%60%2C%60took_ms%60%2C%60since_last_ms%60%2C%60items%60%29+FORMAT+RowBinary
2021.06.27 18:39:28.509511 [ 145 ] {70b9dca4-9fce-4ff3-8487-75999df98b5b} <Trace> ContextAccess (default): Access granted: INSERT(host, created_at, table_name, label, took_ms, since_last_ms, items) ON crusty.metrics_db
2021.06.27 18:39:28.510492 [ 145 ] {70b9dca4-9fce-4ff3-8487-75999df98b5b} <Debug> DynamicQueryHandler: Done processing query
2021.06.27 18:39:28.510519 [ 145 ] {70b9dca4-9fce-4ff3-8487-75999df98b5b} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
2021.06.27 18:39:28.510764 [ 145 ] {} <Error> ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 107, e.displayText() = Net Exception: Socket is not connected, Stack trace (when copying this message, always include the lines below):

0. Poco::Net::SocketImpl::error(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x11d44aae in /usr/bin/clickhouse
1. Poco::Net::SocketImpl::peerAddress() @ 0x11d46d26 in /usr/bin/clickhouse
2. DB::HTTPServerRequest::HTTPServerRequest(DB::Context const&, DB::HTTPServerResponse&, Poco::Net::HTTPServerSession&) @ 0xf6a0a34 in /usr/bin/clickhouse
3. DB::HTTPServerConnection::run() @ 0xf69f8a4 in /usr/bin/clickhouse
4. Poco::Net::TCPServerConnection::start() @ 0x11d4bb0f in /usr/bin/clickhouse
5. Poco::Net::TCPServerDispatcher::run() @ 0x11d4d521 in /usr/bin/clickhouse
6. Poco::PooledThread::run() @ 0x11e83c49 in /usr/bin/clickhouse
7. Poco::ThreadImpl::runnableEntry(void*) @ 0x11e7faaa in /usr/bin/clickhouse
8. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
9. __clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
 (version 21.3.13.9 (official build))
2021.06.27 18:39:28.621211 [ 148 ] {494f0551-dc8a-4871-ac26-c267afe9d69a} <Debug> executeQuery: (from 172.16.0.1:53014, using production parser) INSERT INTO metrics_db(`host`,`created_at`,`table_name`,`label`,`took_ms`,`since_last_ms`,`items`) FORMAT RowBinary 
2021.06.27 18:39:28.621343 [ 148 ] {494f0551-dc8a-4871-ac26-c267afe9d69a} <Trace> ContextAccess (default): Access granted: INSERT(host, created_at, table_name, label, took_ms, since_last_ms, items) ON crusty.metrics_db
2021.06.27 18:39:28.621658 [ 148 ] {494f0551-dc8a-4871-ac26-c267afe9d69a} <Debug> executeQuery: Query pipeline:
NullAndDoCopy
 InputStreamFromASTInsertQuery

2021.06.27 18:39:28.621943 [ 145 ] {} <Trace> HTTPHandler-factory: HTTP Request for HTTPHandler-factory. Method: POST, Address: 172.16.0.1:53030, User-Agent: (none), Content Type: , Transfer Encoding: chunked, X-Forwarded-For: (none)
2021.06.27 18:39:28.622020 [ 145 ] {} <Trace> DynamicQueryHandler: Request URI: /?database=crusty&query=INSERT+INTO+metrics_db%28%60host%60%2C%60created_at%60%2C%60table_name%60%2C%60label%60%2C%60took_ms%60%2C%60since_last_ms%60%2C%60items%60%29+FORMAT+RowBinary
2021.06.27 18:39:28.622319 [ 148 ] {494f0551-dc8a-4871-ac26-c267afe9d69a} <Information> executeQuery: Read 1 rows, 65.00 B in 0.00104886 sec., 953 rows/sec., 60.52 KiB/sec.
2021.06.27 18:39:28.622481 [ 148 ] {494f0551-dc8a-4871-ac26-c267afe9d69a} <Debug> DynamicQueryHandler: Done processing query
2021.06.27 18:39:28.622499 [ 148 ] {494f0551-dc8a-4871-ac26-c267afe9d69a} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
2021.06.27 18:39:28.622745 [ 148 ] {} <Error> ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 107, e.displayText() = Net Exception: Socket is not connected, Stack trace (when copying this message, always include the lines below):

0. Poco::Net::SocketImpl::error(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x11d44aae in /usr/bin/clickhouse
1. Poco::Net::SocketImpl::peerAddress() @ 0x11d46d26 in /usr/bin/clickhouse
2. DB::HTTPServerRequest::HTTPServerRequest(DB::Context const&, DB::HTTPServerResponse&, Poco::Net::HTTPServerSession&) @ 0xf6a0a34 in /usr/bin/clickhouse
3. DB::HTTPServerConnection::run() @ 0xf69f8a4 in /usr/bin/clickhouse
4. Poco::Net::TCPServerConnection::start() @ 0x11d4bb0f in /usr/bin/clickhouse
5. Poco::Net::TCPServerDispatcher::run() @ 0x11d4d521 in /usr/bin/clickhouse
6. Poco::PooledThread::run() @ 0x11e83c49 in /usr/bin/clickhouse
7. Poco::ThreadImpl::runnableEntry(void*) @ 0x11e7faaa in /usr/bin/clickhouse
8. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
9. __clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
 (version 21.3.13.9 (official build))
2021.06.27 18:39:28.722631 [ 156 ] {50d87ae6-30e3-49c8-8abb-abd9b0c5cd87} <Debug> executeQuery: (from 172.16.0.1:52942, using production parser) INSERT INTO metrics_db(`host`,`created_at`,`table_name`,`label`,`took_ms`,`since_last_ms`,`items`) FORMAT RowBinary 
2021.06.27 18:39:28.722768 [ 156 ] {50d87ae6-30e3-49c8-8abb-abd9b0c5cd87} <Trace> ContextAccess (default): Access granted: INSERT(host, created_at, table_name, label, took_ms, since_last_ms, items) ON crusty.metrics_db
2021.06.27 18:39:28.723106 [ 156 ] {50d87ae6-30e3-49c8-8abb-abd9b0c5cd87} <Debug> executeQuery: Query pipeline:
NullAndDoCopy
 InputStreamFromASTInsertQuery

2021.06.27 18:39:28.723164 [ 148 ] {} <Trace> HTTPHandler-factory: HTTP Request for HTTPHandler-factory. Method: POST, Address: 172.16.0.1:53032, User-Agent: (none), Content Type: , Transfer Encoding: chunked, X-Forwarded-For: (none)
2021.06.27 18:39:28.723253 [ 148 ] {} <Trace> DynamicQueryHandler: Request URI: /?database=crusty&query=INSERT+INTO+metrics_db%28%60host%60%2C%60created_at%60%2C%60table_name%60%2C%60label%60%2C%60took_ms%60%2C%60since_last_ms%60%2C%60items%60%29+FORMAT+RowBinary
2021.06.27 18:39:28.723688 [ 156 ] {50d87ae6-30e3-49c8-8abb-abd9b0c5cd87} <Information> executeQuery: Read 2 rows, 128.00 B in 0.000999408 sec., 2001 rows/sec., 125.07 KiB/sec.
2021.06.27 18:39:28.723842 [ 156 ] {50d87ae6-30e3-49c8-8abb-abd9b0c5cd87} <Debug> DynamicQueryHandler: Done processing query
2021.06.27 18:39:28.723860 [ 156 ] {50d87ae6-30e3-49c8-8abb-abd9b0c5cd87} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.

I see thousands of "Socket is not connected" errors in the log:

root@fed02341b7d7:/# ls -la /var/log/clickhouse-server/clickhouse-server.err.log
-rw-r----- 1 clickhouse clickhouse 2918141 Jun 27 18:49 /var/log/clickhouse-server/clickhouse-server.err.log
root@fed02341b7d7:/# clickhouse-client
ClickHouse client version 21.3.13.9 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 21.3.13 revision 54447.

fed02341b7d7 :) select version()

SELECT version()

Query id: b2b51280-3140-42c9-865f-6f60d14b9741

┌─version()─┐
│ 21.3.13.9 │
└───────────┘

1 rows in set. Elapsed: 0.003 sec. 

fed02341b7d7 :)

@let4be
Author

let4be commented Jun 27, 2021

It lagged for a while, and then I saw a bunch of errors in my logs from the timeout safeguard I put in:

Jun 27 18:52:37.365 ERROR task{name="Crusty::go"}:task{name="Crusty::clickhouse_writer"}:task{name="{{closure}}::{{closure}}"}: Finished with error=timeout during inserter.commit

Caused by:
    deadline has elapsed elapsed=836.003806222s
Jun 27 18:52:37.398 ERROR task{name="Crusty::go"}:task{name="Crusty::clickhouse_writer"}:task{name="{{closure}}::{{closure}}"}: Finished with error=timeout during inserter.commit

Caused by:
    deadline has elapsed elapsed=248.673025167s
Jun 27 18:52:37.427 ERROR task{name="Crusty::go"}:task{name="Crusty::clickhouse_writer"}:task{name="{{closure}}::{{closure}}"}: Finished with error=timeout during inserter.write

Caused by:
    deadline has elapsed elapsed=419.983367626s
Jun 27 18:52:41.922  WARN task{name="Crusty::go"}:task{name="Crusty::domain_resolver_aggregator"}: Domains discarded as overflow(over resolver's capacity): 75441
Jun 27 18:52:57.299 ERROR task{name="Crusty::go"}:task{name="Crusty::clickhouse_writer"}:task{name="{{closure}}::{{closure}}"}: Finished with error=timeout during inserter.commit

Caused by:
    deadline has elapsed elapsed=268.574666337s
Jun 27 18:52:57.303 ERROR task{name="Crusty::go"}:task{name="Crusty::clickhouse_writer"}:task{name="{{closure}}::{{closure}}"}: Finished with error=timeout during inserter.commit

Caused by:
    deadline has elapsed elapsed=383.189448827s
Jun 27 18:52:57.305  WARN task{name="Crusty::go"}:task{name="Crusty::domain_resolver_aggregator"}: Domains discarded as overflow(over resolver's capacity): 14552
Jun 27 18:52:57.342 ERROR task{name="Crusty::go"}:task{name="Crusty::clickhouse_writer"}:task{name="{{closure}}::{{closure}}"}: Finished with error=error during inserter.commit

Caused by:
    0: network error: channel closed
    1: channel closed elapsed=19.296903926s
Jun 27 18:53:16.973  WARN task{name="Crusty::go"}:task{name="Crusty::domain_resolver_aggregator"}: Domains discarded as overflow(over resolver's capacity): 49835
Jun 27 18:53:16.999 ERROR task{name="Crusty::go"}:task{name="Crusty::clickhouse_writer"}:task{name="{{closure}}::{{closure}}"}: Finished with error=error during inserter.commit

Caused by:
    0: network error: connection closed before message completed
    1: connection closed before message completed elapsed=875.63720069s
Jun 27 18:53:17.072 ERROR task{name="Crusty::go"}:task{name="Crusty::clickhouse_writer"}:task{name="{{closure}}::{{closure}}"}: Finished with error=error during inserter.write

Caused by:
    0: network error: channel closed
    1: channel closed elapsed=548.816135738s
Jun 27 18:53:17.079 ERROR task{name="Crusty::go"}:task{name="Crusty::clickhouse_writer"}:task{name="{{closure}}::{{closure}}"}: Finished with error=error during inserter.write

Caused by:
    0: network error: channel closed
    1: channel closed elapsed=18.459984505s
Jun 27 18:53:17.108 ERROR task{name="Crusty::go"}:task{name="Crusty::clickhouse_writer"}:task{name="{{closure}}::{{closure}}"}: Finished with error=error during inserter.write

Caused by:
    0: network error: channel closed
    1: channel closed elapsed=18.489463764s
Jun 27 18:53:20.896  WARN task{name="Crusty::go"}:task{name="Crusty::domain_resolver_aggregator"}: Domains discarded as overflow(over resolver's capacity): 67314
root@ip-172-32-198-52:/home/admin# 

Do we need to properly configure the hyper HTTP client timeout? I don't see any timeout configuration code in this library...

Those elapsed=* messages might be confusing: in this context they're not related to the timeout but to the whole task duration.
One thing is for sure, though: my timeout code, which was configured for 10 seconds, did not trigger in time... and then after several minutes it spewed out those error messages.

@let4be
Author

let4be commented Jun 28, 2021

The error spam in ClickHouse ("Socket is not connected") could be because we never read the response body.

@loyd
Owner

loyd commented Jun 28, 2021

I don't see any timeout configuration code in this library...

Yep, I decided that the default TCP keepalive should be used, but now I think that's a bad idea (it's usually too large); I'm going to set one.

The error spam in ClickHouse ("Socket is not connected") could be because we never read the response body.

I've checked our logs and found the same errors. Interestingly, they didn't exist before; maybe something changed on the other side.
I'll check this assumption, thanks.

@let4be
Author

let4be commented Jun 28, 2021

I've checked our logs and found the same errors. Interestingly, they didn't exist before; maybe something changed on the other side.
I'll check this assumption, thanks.

When you don't read data from the socket, the data is still sitting there, and hyper uses a connection pool internally. I think what may really be happening is that each request+response permanently spoils a connection, making hyper open another one and another one while killing existing connections...

@let4be
Author

let4be commented Jun 28, 2021

I just attempted a fix in my fork and it helped (though for some reason the benches did not compile, so I nuked them): no more errors in the log.
let4be@18ed8e9

I have yet to check whether this actually fixed the other issue with hanging writers.

@let4be
Author

let4be commented Jun 28, 2021

Unfortunately this did not resolve the original issue: writers still appear to get locked after a while.
ClickHouse itself is very much alive, responding to requests OK, and the error logs are clear...

@let4be
Author

let4be commented Jun 28, 2021

Though now that the log is clear of unrelated error spam, I found this:

root@cfc268c9ca1d:/# cat /var/log/clickhouse-server/*err*
2021.06.28 12:30:36.849792 [ 52 ] {} <Warning> Access(local directory): File /var/lib/clickhouse/access/users.list doesn't exist
2021.06.28 12:30:36.849816 [ 52 ] {} <Warning> Access(local directory): Recovering lists in directory /var/lib/clickhouse/access/
2021.06.28 12:30:36.898797 [ 52 ] {} <Warning> Application: Listen [0.0.0.0]:8123 failed: Poco::Exception. Code: 1000, e.code() = 98, e.displayText() = Net Exception: Address already in use: 0.0.0.0:8123 (version 21.3.13.9 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2021.06.28 12:30:36.898926 [ 52 ] {} <Warning> Application: Listen [0.0.0.0]:9000 failed: Poco::Exception. Code: 1000, e.code() = 98, e.displayText() = Net Exception: Address already in use: 0.0.0.0:9000 (version 21.3.13.9 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2021.06.28 12:30:36.899046 [ 52 ] {} <Warning> Application: Listen [0.0.0.0]:9009 failed: Poco::Exception. Code: 1000, e.code() = 98, e.displayText() = Net Exception: Address already in use: 0.0.0.0:9009 (version 21.3.13.9 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2021.06.28 12:30:36.899162 [ 52 ] {} <Warning> Application: Listen [0.0.0.0]:9004 failed: Poco::Exception. Code: 1000, e.code() = 98, e.displayText() = Net Exception: Address already in use: 0.0.0.0:9004 (version 21.3.13.9 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2021.06.28 12:30:39.095345 [ 100 ] {} <Warning> Application: Listen [::]:8123 failed: Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = DNS error: EAI: Address family for hostname not supported (version 21.3.13.9 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2021.06.28 12:30:39.095499 [ 100 ] {} <Warning> Application: Listen [::]:9000 failed: Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = DNS error: EAI: Address family for hostname not supported (version 21.3.13.9 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2021.06.28 12:30:39.095644 [ 100 ] {} <Warning> Application: Listen [::]:9009 failed: Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = DNS error: EAI: Address family for hostname not supported (version 21.3.13.9 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2021.06.28 12:30:39.095793 [ 100 ] {} <Warning> Application: Listen [::]:9004 failed: Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = DNS error: EAI: Address family for hostname not supported (version 21.3.13.9 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2021.06.28 12:30:39.985299 [ 102 ] {} <Error> ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 107, e.displayText() = Net Exception: Socket is not connected, Stack trace (when copying this message, always include the lines below):

0. Poco::Net::SocketImpl::error(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x11d44aae in /usr/bin/clickhouse
1. Poco::Net::SocketImpl::peerAddress() @ 0x11d46d26 in /usr/bin/clickhouse
2. DB::HTTPServerRequest::HTTPServerRequest(DB::Context const&, DB::HTTPServerResponse&, Poco::Net::HTTPServerSession&) @ 0xf6a0a34 in /usr/bin/clickhouse
3. DB::HTTPServerConnection::run() @ 0xf69f8a4 in /usr/bin/clickhouse
4. Poco::Net::TCPServerConnection::start() @ 0x11d4bb0f in /usr/bin/clickhouse
5. Poco::Net::TCPServerDispatcher::run() @ 0x11d4d521 in /usr/bin/clickhouse
6. Poco::PooledThread::run() @ 0x11e83c49 in /usr/bin/clickhouse
7. Poco::ThreadImpl::runnableEntry(void*) @ 0x11e7faaa in /usr/bin/clickhouse
8. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
9. __clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
 (version 21.3.13.9 (official build))
2021.06.28 12:35:29.811569 [ 169 ] {d2e74470-8eb9-4510-93fb-6a3ce5ba82d5} <Error> DynamicQueryHandler: Code: 3, e.displayText() = DB::Exception: Unexpected end of file while reading chunk header of HTTP chunked data, Stack trace (when copying this message, always include the lines below):

0. DB::HTTPChunkedReadBuffer::readChunkHeader() @ 0xf6a23a4 in /usr/bin/clickhouse
1. DB::HTTPChunkedReadBuffer::nextImpl() @ 0xf6a263b in /usr/bin/clickhouse
2. DB::wrapReadBufferReference(DB::ReadBuffer&)::ReadBufferWrapper::nextImpl() @ 0xe4c6f0c in /usr/bin/clickhouse
3. DB::ConcatReadBuffer::nextImpl() @ 0xe713a4e in /usr/bin/clickhouse
4. DB::LimitReadBuffer::nextImpl() @ 0x859299c in /usr/bin/clickhouse
5. DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>) @ 0xeef76db in /usr/bin/clickhouse
6. DB::HTTPHandler::processQuery(DB::Context&, DB::HTTPServerRequest&, DB::HTMLForm&, DB::HTTPServerResponse&, DB::HTTPHandler::Output&, std::__1::optional<DB::CurrentThread::QueryScope>&) @ 0xf6156fa in /usr/bin/clickhouse
7. DB::HTTPHandler::handleRequest(DB::HTTPServerRequest&, DB::HTTPServerResponse&) @ 0xf61988e in /usr/bin/clickhouse
8. DB::HTTPServerConnection::run() @ 0xf69fb2f in /usr/bin/clickhouse
9. Poco::Net::TCPServerConnection::start() @ 0x11d4bb0f in /usr/bin/clickhouse
10. Poco::Net::TCPServerDispatcher::run() @ 0x11d4d521 in /usr/bin/clickhouse
11. Poco::PooledThread::run() @ 0x11e83c49 in /usr/bin/clickhouse
12. Poco::ThreadImpl::runnableEntry(void*) @ 0x11e7faaa in /usr/bin/clickhouse
13. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
14. __clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
 (version 21.3.13.9 (official build))
2021.06.28 12:35:29.811898 [ 169 ] {d2e74470-8eb9-4510-93fb-6a3ce5ba82d5} <Error> DynamicQueryHandler: Cannot send exception to client: Code: 3, e.displayText() = DB::Exception: Unexpected end of file while reading chunk header of HTTP chunked data, Stack trace (when copying this message, always include the lines below):

0. DB::HTTPChunkedReadBuffer::readChunkHeader() @ 0xf6a23a4 in /usr/bin/clickhouse
1. DB::HTTPChunkedReadBuffer::nextImpl() @ 0xf6a263b in /usr/bin/clickhouse
2. DB::HTTPHandler::trySendExceptionToClient(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, DB::HTTPServerRequest&, DB::HTTPServerResponse&, DB::HTTPHandler::Output&) @ 0xf618c42 in /usr/bin/clickhouse
3. DB::HTTPHandler::handleRequest(DB::HTTPServerRequest&, DB::HTTPServerResponse&) @ 0xf619fb5 in /usr/bin/clickhouse
4. DB::HTTPServerConnection::run() @ 0xf69fb2f in /usr/bin/clickhouse
5. Poco::Net::TCPServerConnection::start() @ 0x11d4bb0f in /usr/bin/clickhouse
6. Poco::Net::TCPServerDispatcher::run() @ 0x11d4d521 in /usr/bin/clickhouse
7. Poco::PooledThread::run() @ 0x11e83c49 in /usr/bin/clickhouse
8. Poco::ThreadImpl::runnableEntry(void*) @ 0x11e7faaa in /usr/bin/clickhouse
9. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
10. __clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
 (version 21.3.13.9 (official build))
2021.06.28 12:35:29.812792 [ 169 ] {} <Error> ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 32, e.displayText() = I/O error: Broken pipe, Stack trace (when copying this message, always include the lines below):

0. Poco::Net::SocketImpl::error(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x11d444fd in /usr/bin/clickhouse
1. Poco::Net::SocketImpl::sendBytes(void const*, int, int) @ 0x11d45bea in /usr/bin/clickhouse
2. Poco::Net::StreamSocketImpl::sendBytes(void const*, int, int) @ 0x11d4ab76 in /usr/bin/clickhouse
3. Poco::Net::HTTPSession::write(char const*, long) @ 0x11d1bd53 in /usr/bin/clickhouse
4. Poco::Net::HTTPChunkedIOS::~HTTPChunkedIOS() @ 0x11d05c50 in /usr/bin/clickhouse
5. Poco::Net::HTTPChunkedOutputStream::~HTTPChunkedOutputStream() @ 0x11d066be in /usr/bin/clickhouse
6. DB::HTTPServerConnection::run() @ 0xf69fc99 in /usr/bin/clickhouse
7. Poco::Net::TCPServerConnection::start() @ 0x11d4bb0f in /usr/bin/clickhouse
8. Poco::Net::TCPServerDispatcher::run() @ 0x11d4d521 in /usr/bin/clickhouse
9. Poco::PooledThread::run() @ 0x11e83c49 in /usr/bin/clickhouse
10. Poco::ThreadImpl::runnableEntry(void*) @ 0x11e7faaa in /usr/bin/clickhouse
11. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
12. __clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
 (version 21.3.13.9 (official build))
2021.06.28 12:36:09.513092 [ 142 ] {1e8d24cc-a677-4614-a78f-344017ccb536} <Error> DynamicQueryHandler: Code: 3, e.displayText() = DB::Exception: Unexpected end of file while reading chunk header of HTTP chunked data, Stack trace (when copying this message, always include the lines below):

0. DB::HTTPChunkedReadBuffer::readChunkHeader() @ 0xf6a23a4 in /usr/bin/clickhouse
1. DB::HTTPChunkedReadBuffer::nextImpl() @ 0xf6a263b in /usr/bin/clickhouse
2. DB::wrapReadBufferReference(DB::ReadBuffer&)::ReadBufferWrapper::nextImpl() @ 0xe4c6f0c in /usr/bin/clickhouse
3. DB::ConcatReadBuffer::nextImpl() @ 0xe713a4e in /usr/bin/clickhouse
4. DB::LimitReadBuffer::nextImpl() @ 0x859299c in /usr/bin/clickhouse
5. DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>) @ 0xeef76db in /usr/bin/clickhouse
6. DB::HTTPHandler::processQuery(DB::Context&, DB::HTTPServerRequest&, DB::HTMLForm&, DB::HTTPServerResponse&, DB::HTTPHandler::Output&, std::__1::optional<DB::CurrentThread::QueryScope>&) @ 0xf6156fa in /usr/bin/clickhouse
7. DB::HTTPHandler::handleRequest(DB::HTTPServerRequest&, DB::HTTPServerResponse&) @ 0xf61988e in /usr/bin/clickhouse
8. DB::HTTPServerConnection::run() @ 0xf69fb2f in /usr/bin/clickhouse
9. Poco::Net::TCPServerConnection::start() @ 0x11d4bb0f in /usr/bin/clickhouse
10. Poco::Net::TCPServerDispatcher::run() @ 0x11d4d521 in /usr/bin/clickhouse
11. Poco::PooledThread::run() @ 0x11e83c49 in /usr/bin/clickhouse
12. Poco::ThreadImpl::runnableEntry(void*) @ 0x11e7faaa in /usr/bin/clickhouse
13. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
14. __clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
 (version 21.3.13.9 (official build))
[the same three errors — "Unexpected end of file while reading chunk header of HTTP chunked data", "Cannot send exception to client", and "I/O error: Broken pipe" — repeat for the subsequent requests; duplicate stack traces omitted]

Those error messages appear in the clickhouse log at roughly the same time I see the problem with my writers.

@let4be
Author

let4be commented Jun 29, 2021

The original issue is most likely related to system/kernel/network configuration, as it appears only when I push the networking stack to its breaking point.
The software I'm using this clickhouse library in is a broad web crawler running on a beefy machine with a 25 gbit/sec outbound link; when I push it to 60k+ open connections I start to see issues with db connections.

Thing is, the culprit is really hard to find (or I am being blind and missing something obvious). Among the obvious stuff, I

  • increased available file descriptors to 1 million (/proc/sys/fs/nr_open and via ulimit -n)
  • increased net.ipv4.ip_local_port_range to 4096 65535
  • set net.ipv4.tcp_tw_reuse = 1
  • am also using several outbound public IP addresses (configured via ifconfig aliasing; verified that it works with curl)

Yesterday I was trying other settings in sysctl.conf and stumbled upon net.ipv4.tcp_mem; increasing it initially helped alleviate the issue (I saw no more hard freezes of writers), but then I failed to reproduce that on another system... :\
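For reference, the tuning above can be sketched as sysctl/ulimit commands. The file-descriptor and port-range values mirror what I listed; the net.ipv4.tcp_mem numbers below are placeholders, not the exact values I used:

```shell
# raise the kernel-wide fd ceiling, then the per-process limit
sysctl -w fs.nr_open=1048576
ulimit -n 1048576

# widen the ephemeral port range and allow reuse of TIME_WAIT sockets
sysctl -w net.ipv4.ip_local_port_range="4096 65535"
sysctl -w net.ipv4.tcp_tw_reuse=1

# example tcp_mem bump (pages: min / pressure / max) -- tune to your RAM
sysctl -w net.ipv4.tcp_mem="1048576 1572864 2097152"
```

To make the changes survive a reboot, put the equivalent `key = value` lines in /etc/sysctl.conf and the fd limit in /etc/security/limits.conf.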

@let4be
Author

let4be commented Jun 29, 2021

I resolved the original issue... somehow, while "optimizing", I had configured the tokio runtime with max_blocking_threads = 1.
This library uses hyper in its stock configuration, and hyper's default DNS resolver is blocking, so name resolution runs on a blocking thread — of which I had exactly one (my web crawler runs on a separate tokio runtime, and even if it didn't, it uses an async rust-based dns resolver; i.e. I use 2 runtimes).
Sorry for the trouble, I guess this issue can be closed :)
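To illustrate, here is a minimal sketch of the misconfiguration, assuming tokio's `runtime::Builder` API: a multi-thread runtime whose blocking pool is capped at one thread, so anything dispatched via `spawn_blocking` (which is how hyper's default getaddrinfo-based resolver does DNS lookups) queues behind a single thread:

```rust
use tokio::runtime::Builder;

fn main() {
    // The "optimization" that caused the stalls: only ONE blocking thread.
    // hyper's default DNS resolver runs lookups on the blocking pool, so
    // concurrent requests needing fresh DNS resolution serialize here.
    let rt = Builder::new_multi_thread()
        .enable_all()
        .max_blocking_threads(1)
        .build()
        .expect("failed to build tokio runtime");

    rt.block_on(async {
        // clickhouse inserts issued here appear "frozen" whenever a slow
        // DNS lookup occupies the sole blocking thread
    });
}
```

Dropping the `max_blocking_threads(1)` call (the default cap is much higher) removes the bottleneck.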

Configurable timeouts would be nice to have, though.

@let4be let4be closed this as completed Jun 29, 2021