Native stack: cannot find sent record in _tcbs. #750

Open
WJTian opened this issue May 25, 2020 · 9 comments

Comments

@WJTian

WJTian commented May 25, 2020

When I benchmark the ceph crimson messenger using perf_crimson_msgr, the client uses the native network stack with 1 job (i.e. shard 1 sends the TCP SYN packet). The returning SYN+ACK packet may be hashed to another shard, which has no corresponding record in the _tcbs structure. This causes the TCP handshake to fail.

The tcpdump output is:
15:21:47.400523 IP 192.168.122.111.53010 > 192.168.122.122.distinct: Flags [S], seq 2210197188, win 29200, options [mss 1460,wscale 7,eol], length 0
15:21:47.400735 IP 192.168.122.122.distinct > 192.168.122.111.53010: Flags [S.], seq 2180376672, ack 2210197189, win 29200, options [mss 1460,wscale 7,eol], length 0
15:21:47.401020 IP 192.168.122.111.53010 > 192.168.122.122.distinct: Flags [R.], seq 1, ack 1, win 0, length 0
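
For illustration only, here is a hypothetical sketch of the failure mode (invented names, not the actual Seastar code): each shard owns its own TCB table, and an inbound segment is only looked up on the shard the RSS hash steered it to.

// Hypothetical sketch of the failure mode; not Seastar's actual code.
// Each shard keeps its own TCB table, and an inbound segment is only
// looked up on the shard that the RSS hash steered it to.
#include <cstdint>
#include <map>
#include <tuple>

struct connid {
    uint32_t local_ip, foreign_ip;
    uint16_t local_port, foreign_port;
    bool operator<(const connid& o) const {
        return std::tie(local_ip, foreign_ip, local_port, foreign_port)
             < std::tie(o.local_ip, o.foreign_ip, o.local_port, o.foreign_port);
    }
};

struct tcb;                                  // per-connection state, elided
thread_local std::map<connid, tcb*> _tcbs;   // one table per shard

void on_received_segment(const connid& id /*, const segment& seg */) {
    auto it = _tcbs.find(id);
    if (it == _tcbs.end()) {
        // The SYN was sent from shard 1, but the SYN+ACK was hashed to a
        // different shard whose table has no entry, so the segment looks
        // like it belongs to no connection and is answered with a reset,
        // i.e. the "Flags [R.]" seen in the capture above.
        // send_rst(id);
        return;
    }
    // it->second->handle(seg);
}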

@avikivity
Member

avikivity commented May 25, 2020

This can happen if the NIC uses a hash function different from what we think it is.

What NIC are you using?

Please test with --smp 1, just to validate.

@WJTian
Author

WJTian commented May 25, 2020

The NIC I use is Mellanox ConnectX-4 Lx.
perf_crimson_msgr needs at least 2 threads: one for the main thread and the other for the job. With --smp 2, the probability of a successful TCP handshake becomes much higher (around 50%, I estimate), which would be consistent with the returning SYN+ACK being steered to one of the two shards essentially at random.

@avikivity
Member

avikivity commented May 25, 2020

Try changing if (smp::count > 1) to if (false) in dpdk_device::init_port_start(). If it works, we know it's a hash function mismatch.
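
For context, that condition guards the port's multi-queue (RSS) configuration. The block below is only an approximate paraphrase (standard DPDK names of that era, not the verbatim Seastar source); the point of the experiment is that forcing the condition to false leaves the port in single-queue mode, so no hardware hash steering happens.

// Approximate paraphrase of the RSS-related part of dpdk_device::init_port_start();
// not the verbatim Seastar source.  With more than one shard the port is put into
// RSS mode so inbound packets are hash-steered across the RX queues.
if (smp::count > 1) {                        // change to `if (false)` for the experiment
    port_conf.rxmode.mq_mode = ETH_MQ_RX_RSS;             // hash-based RX steering
    port_conf.rx_adv_conf.rss_conf.rss_hf = ETH_RSS_TCP;  // assumed hash fields
} else {
    port_conf.rxmode.mq_mode = ETH_MQ_RX_NONE;            // single RX queue, no steering
}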

/cc @vladzcloudius

@WJTian
Author

WJTian commented May 25, 2020

Changing if (smp::count > 1) to if (false) in dpdk_device::init_port_start() does not work, and the client always aborts:

Aborting on shard 1.
Backtrace:
  0x0000000000b94768
  0x0000000000b50621
  0x0000000000b508ed
  0x0000000000b509b2
  /lib64/libpthread.so.0+0x000000000000f5df
  /lib64/libc.so.6+0x00000000000351f6
  /lib64/libc.so.6+0x00000000000368e7
  0x000000000060fbbf
  0x000000000060fc1a
  0x00000000006653cf
  0x0000000000b4bda8
  0x0000000000b4c0c7
  0x0000000000b7bb35
  0x0000000000b865fb
  0x0000000000b4480d
  /lib64/libpthread.so.0+0x0000000000007e24
  /lib64/libc.so.6+0x00000000000f834c

seastar-addr2line decodes the backtrace as:
[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /home/tianwenjie/ceph/ceph/src/seastar/include/seastar/util/backtrace.hh:56
seastar::backtrace_buffer::append_backtrace() at /home/tianwenjie/ceph/ceph/src/seastar/src/core/reactor.cc:741
 (inlined by) print_with_backtrace at /home/tianwenjie/ceph/ceph/src/seastar/src/core/reactor.cc:762
seastar::print_with_backtrace(char const*) at /home/tianwenjie/ceph/ceph/src/seastar/src/core/reactor.cc:769
sigabrt_action at /home/tianwenjie/ceph/ceph/src/seastar/src/core/reactor.cc:3473
 (inlined by) operator() at /home/tianwenjie/ceph/ceph/src/seastar/src/core/reactor.cc:3455
 (inlined by) _FUN at /home/tianwenjie/ceph/ceph/src/seastar/src/core/reactor.cc:3451
__ftrylockfile at :?
__GI___open_catalog at :?
__sigblock at :?
ceph::__ceph_assert_fail(char const*, char const*, int, char const*) at /home/tianwenjie/ceph/ceph/src/crimson/common/assert.cc:27
ceph::__ceph_assert_fail(ceph::assert_data const&) at /home/tianwenjie/ceph/ceph/src/crimson/common/assert.cc:14
operator() at /home/tianwenjie/ceph/ceph/src/tools/crimson/perf_crimson_msgr.cc:374
 (inlined by) apply at /home/tianwenjie/ceph/ceph/src/seastar/include/seastar/core/apply.hh:36
 (inlined by) apply<(anonymous namespace)::run((anonymous namespace)::perf_mode_t, const (anonymous namespace)::client_config&, const (anonymous namespace)::server_config&)::test_state::Client::connect_wait_verify(const entity_addr_t&)::<lambda(auto:72&)> [with auto:72 = (anonymous namespace)::run((anonymous namespace)::perf_mode_t, const (anonymous namespace)::client_config&, const (anonymous namespace)::server_config&)::test_state::Client]::<lambda()> > at /home/tianwenjie/ceph/ceph/src/seastar/include/seastar/core/apply.hh:44
 (inlined by) apply<(anonymous namespace)::run((anonymous namespace)::perf_mode_t, const (anonymous namespace)::client_config&, const (anonymous namespace)::server_config&)::test_state::Client::connect_wait_verify(const entity_addr_t&)::<lambda(auto:72&)> [with auto:72 = (anonymous namespace)::run((anonymous namespace)::perf_mode_t, const (anonymous namespace)::client_config&, const (anonymous namespace)::server_config&)::test_state::Client]::<lambda()> > at /home/tianwenjie/ceph/ceph/src/seastar/include/seastar/core/future.hh:1647
 (inlined by) operator() at /home/tianwenjie/ceph/ceph/src/seastar/include/seastar/core/future.hh:1226
 (inlined by) run_and_dispose at /home/tianwenjie/ceph/ceph/src/seastar/include/seastar/core/future.hh:504
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /home/tianwenjie/ceph/ceph/src/seastar/src/core/reactor.cc:2151
seastar::reactor::run_some_tasks() at /home/tianwenjie/ceph/ceph/src/seastar/src/core/reactor.cc:2566
seastar::reactor::run_some_tasks() at /home/tianwenjie/ceph/ceph/src/seastar/src/core/reactor.cc:2549
 (inlined by) seastar::reactor::run() at /home/tianwenjie/ceph/ceph/src/seastar/src/core/reactor.cc:2721
seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::{lambda()#3}::operator()() const at /home/tianwenjie/ceph/ceph/src/seastar/src/core/reactor.cc:3888
std::function<void ()>::operator()() const at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/std_function.h:690
 (inlined by) seastar::posix_thread::start_routine(void*) at /home/tianwenjie/ceph/ceph/src/seastar/src/core/posix.cc:60
start_thread at pthread_create.c:?

@avikivity
Member

This looks like a ceph failure, not seastar. Of course it can be caused by a seastar bug, but it's not possible for me to diagnose ceph assertion failures.

Maybe you can try reproducing the problem with seastar's httpd (and then trying the dpdk.cc change).

@WJTian
Author

WJTian commented May 25, 2020

Actually, the ceph assertion failure is caused by the TCP connection failing at line 374, and it is the same error as in the previous test, where the SYN+ACK packet was hashed to the wrong shard:

364       seastar::future<> connect_wait_verify(const entity_addr_t& peer_addr) {
365         return container().invoke_on_all([peer_addr] (auto& client) {
366           // start clients in active cores (#1 ~ #jobs)
367           if (client.is_active()) {
368             mono_time start_time = mono_clock::now();
369             client.active_conn = client.msgr->connect(peer_addr, entity_name_t::TYPE_OSD);
370             // make sure handshake won't hurt the performance
371             return seastar::sleep(1s).then([&client, start_time] {
372               if (client.conn_stats.connected_time == mono_clock::zero()) {
373                 logger().error("\n{} not connected after 1s!\n", client.lname);
374                 ceph_assert(false);
375               }
376               client.conn_stats.connecting_time = start_time;
377             });
378           }
379           return seastar::now();
380         });
381       }

In short, changing only if (smp::count > 1) to if (false) in dpdk_device::init_port_start() does not work. Maybe more changes are needed to verify that it's a hash function mismatch?

@avikivity
Member

client connect()s have extra hashing logic:

template <typename InetTraits>
auto tcp<InetTraits>::connect(socket_address sa) -> connection {
    uint16_t src_port;
    connid id;
    auto src_ip = _inet._inet.host_address();
    auto dst_ip = ipv4_address(sa);
    auto dst_port = net::ntoh(sa.u.in.sin_port);

    do {
        src_port = _port_dist(_e);
        id = connid{src_ip, dst_ip, src_port, dst_port};
    } while (_inet._inet.netif()->hw_queues_count() > 1 &&
             (_inet._inet.netif()->hash2cpu(id.hash(_inet._inet.netif()->rss_key())) != this_shard_id()
              || _tcbs.find(id) != _tcbs.end()));

So we may be using the wrong hash function here.

Please try with apps/seawreck, with smp=1 and smp>2, to validate that this is the problem.
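
For a quick independent check, one can compute the Toeplitz RSS hash of the observed 4-tuple by hand and see which RX queue it would select for the inbound SYN+ACK. Below is a minimal standalone sketch; it is not Seastar's hash2cpu(), the 40-byte key is the widely used Microsoft RSS test key, and the queue count of 2 is only an assumption. Substitute whatever key and queue count are actually in use (e.g. the key reported by ethtool -x).

// Standalone Toeplitz RSS hash check; NOT Seastar's hash2cpu().  The key below
// is the common Microsoft RSS test key and is an assumption: use the key that
// is actually programmed into the NIC / native stack.
#include <cstdint>
#include <cstdio>
#include <cstddef>

static const uint8_t rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

static uint32_t toeplitz(const uint8_t* key, size_t key_len,
                         const uint8_t* data, size_t data_len) {
    // 64-bit sliding window over the key; the top 32 bits are the key slice
    // aligned with the input bit currently being processed.
    uint64_t window = 0;
    for (size_t i = 0; i < 8; ++i) {
        window = (window << 8) | (i < key_len ? key[i] : 0);
    }
    uint32_t hash = 0;
    for (size_t i = 0; i < data_len; ++i) {
        for (int bit = 7; bit >= 0; --bit) {
            if (data[i] & (1u << bit)) {
                hash ^= uint32_t(window >> 32);
            }
            window <<= 1;
        }
        if (i + 8 < key_len) {
            window |= key[i + 8];   // refill the 8 key bits shifted out above
        }
    }
    return hash;
}

int main() {
    // Inbound SYN+ACK from the capture above: src 192.168.122.122, dst
    // 192.168.122.111, dst port 53010.  The numeric server port (printed as
    // "distinct" by tcpdump) is not in the capture; substitute the real value.
    const uint16_t server_port = 0;     // TODO: actual server port
    const uint16_t client_port = 53010;
    const uint8_t tuple[12] = {
        192, 168, 122, 122,                               // source IP (server)
        192, 168, 122, 111,                               // destination IP (client)
        uint8_t(server_port >> 8), uint8_t(server_port),  // source port, big endian
        uint8_t(client_port >> 8), uint8_t(client_port),  // destination port, big endian
    };
    uint32_t h = toeplitz(rss_key, sizeof(rss_key), tuple, sizeof(tuple));
    // Real hardware indexes an indirection table with the low bits of the hash;
    // a plain modulo is a simplification that is close enough for a sanity check.
    unsigned queues = 2;                // assumed number of RX queues (one per shard)
    std::printf("hash=0x%08x -> queue %u of %u\n",
                (unsigned)h, (unsigned)(h % queues), queues);
    return 0;
}

If the queue computed for the inbound SYN+ACK is not the shard that issued the connect() (shard 1 in this test), then the hardware steering disagrees with what the source-port selection loop assumed, i.e. a hash function or key mismatch.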

@vladzcloudius
Contributor

vladzcloudius commented May 28, 2020

Actually, the ceph assertion failure is caused by the TCP connection failing at line 374, and it is the same error as in the previous test, where the SYN+ACK packet was hashed to the wrong shard:

364       seastar::future<> connect_wait_verify(const entity_addr_t& peer_addr) {
365         return container().invoke_on_all([peer_addr] (auto& client) {
366           // start clients in active cores (#1 ~ #jobs)
367           if (client.is_active()) {
368             mono_time start_time = mono_clock::now();
369             client.active_conn = client.msgr->connect(peer_addr, entity_name_t::TYPE_OSD);
370             // make sure handshake won't hurt the performance
371             return seastar::sleep(1s).then([&client, start_time] {
372               if (client.conn_stats.connected_time == mono_clock::zero()) {
373                 logger().error("\n{} not connected after 1s!\n", client.lname);
374                 ceph_assert(false);
375               }
376               client.conn_stats.connecting_time = start_time;
377             });
378           }
379           return seastar::now();
380         });
381       }

In short, changing only if (smp::count > 1) to if (false) in dpdk_device::init_port_start() does not work. Maybe more changes are needed to verify that it's a hash function mismatch?

@WJTian In order to see why the code above doesn't work I'll need to see the whole thing. A link to a github repo + branch would do nicely.

As @avikivity has already mentioned, you can see how a TCP client and server may be implemented by looking at the httpd (server) and seawreck (client) demo apps.

I tested them not long ago with DPDK (+native stack) and I definitely used a multi-queue/multi-shard configuration (I played with both ena and experimental user-virtio backends).

So, there is a good chance that our TCP code is healthy. Although I don't deny for a second that there is always a chance for a bug... ;)

There were issues with a reactor backend however: I had to use --reactor-backend epoll instead of the default linux-aio one.

@WJTian
Author

WJTian commented Jun 18, 2020

I just remembered that the device I used is a tap device rather than a DPDK one (the physical NIC is a Mellanox ConnectX-4 Lx), so modifying dpdk_device::init_port_start() should not have any effect. Any ideas on how to validate that this is a hash function mismatch for a tap device?
