Avoid kernel_recvmsg() #21

Closed
fridex opened this issue Apr 23, 2016 · 3 comments

Comments

@fridex
Member

fridex commented Apr 23, 2016

The current implementation uses kernel_recvmsg() for receiving records. This function copies data from the skbuff into the passed vector (see [1] for TCP, [2] for UDP), so it would be nice to avoid it.

  • When the underlying protocol is TCP, tcp_read_sock() could be used. Unfortunately, tcp_read_sock() does not support peeking (see [3]), which the current AF_KTLS design requires.
  • When the underlying protocol is UDP, there is currently (AFAIK) no such copy-less logic that could be reused.

EDIT: we could consider operating directly on the skbuff.

[1] http://lxr.free-electrons.com/source/net/ipv4/tcp.c#L1830
[2] http://lxr.free-electrons.com/source/net/ipv4/udp.c#L1392
[3] http://lxr.free-electrons.com/source/net/ipv4/tcp.c#L1485
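The issue does not spell out why AF_KTLS needs to peek. One plausible reason (an assumption, not taken from the AF_KTLS code) is that a TLS receiver has to look at the 5-byte record header to learn the record length, and must leave the bytes queued if the whole record has not arrived yet. A minimal user-space sketch of that check, using hypothetical helpers `tls_record_size()` and `tls_record_complete()`:

```c
#include <stddef.h>
#include <stdint.h>

/* TLS record header (RFC 5246, Section 6.2.1): 1-byte content type,
 * 2-byte protocol version, 2-byte big-endian fragment length. */
#define TLS_HEADER_LEN 5

/* Hypothetical helper: given bytes obtained by peeking (i.e. not yet
 * consumed from the socket), return the total size of the first TLS
 * record, or 0 if the header itself has not fully arrived. */
static size_t tls_record_size(const uint8_t *peeked, size_t peeked_len)
{
    if (peeked_len < TLS_HEADER_LEN)
        return 0;
    size_t payload = ((size_t)peeked[3] << 8) | peeked[4];
    return TLS_HEADER_LEN + payload;
}

/* Hypothetical helper: the record may be consumed only once all of its
 * bytes are queued; until then the peeked bytes must stay in place. */
static int tls_record_complete(const uint8_t *peeked, size_t peeked_len)
{
    size_t need = tls_record_size(peeked, peeked_len);
    return need != 0 && peeked_len >= need;
}
```

For example, the 7 bytes `17 03 03 00 02 'h' 'i'` form one complete application_data record with a 2-byte payload, while the first 6 bytes alone do not. For UDP, DTLS uses a 13-byte record header, which this sketch does not handle.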

@fridex
Member Author

fridex commented Apr 25, 2016

I think the best approach would be to:

  • extend tcp_read_sock() to support the MSG_PEEK flag
  • introduce udp_read_sock() with MSG_PEEK support for UDP

Using skbuffs directly is not nice, since there should be appropriate operations on UDP/TCP sockets that encapsulate such logic (making it possible to reuse these operations in other parts of the kernel).
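To illustrate the proposal, here is a user-space model (not kernel code) of a read_sock()-style reader with a peek flag. In the kernel, tcp_read_sock(sk, desc, recv_actor) hands each queued skb to an sk_read_actor_t callback that works on the data in place instead of copying it into a vector. The names `model_skb`, `model_queue`, `model_read_sock()` and `budget_actor()` are hypothetical and only mimic that shape; with `peek` set, the actor still sees the data but the queue position is left untouched, which is the MSG_PEEK behaviour the proposal asks for.

```c
#include <stddef.h>

/* Hypothetical user-space model of a receive queue, NOT the kernel API. */
struct model_skb { const unsigned char *data; size_t len; };
struct model_queue {
    struct model_skb *skbs;
    size_t nr;    /* number of queued buffers */
    size_t head;  /* first buffer not fully consumed */
    size_t off;   /* bytes already consumed from skbs[head] */
};

/* Actor loosely modelled on sk_read_actor_t: it sees the bytes in place
 * (no copy) and returns how many of them it used. */
typedef size_t (*model_actor_t)(void *arg, const unsigned char *data, size_t len);

/* Feed queued data to the actor until it stops taking bytes. When peek is
 * non-zero, nothing is dequeued, so the same bytes can be read again. */
static size_t model_read_sock(struct model_queue *q, model_actor_t actor,
                              void *arg, int peek)
{
    size_t idx = q->head, off = q->off, total = 0;

    while (idx < q->nr) {
        const struct model_skb *skb = &q->skbs[idx];
        size_t avail = skb->len - off;
        size_t used = actor(arg, skb->data + off, avail);

        total += used;
        if (used < avail) {  /* actor is done; remember partial position */
            off += used;
            break;
        }
        idx++;               /* buffer fully used, move to the next one */
        off = 0;
    }
    if (!peek) {             /* commit the new position only when consuming */
        q->head = idx;
        q->off = off;
    }
    return total;
}

/* Example actor: accept at most `left` bytes in total. A real actor would
 * parse or forward the data in place instead of ignoring it. */
struct budget { size_t left; };
static size_t budget_actor(void *arg, const unsigned char *data, size_t len)
{
    struct budget *b = arg;
    size_t n = len < b->left ? len : b->left;

    (void)data;
    b->left -= n;
    return n;
}
```

Peeking 7 bytes from a queue holding "hello" and "world" reports 7 bytes but leaves the queue as it was; repeating the call without peek consumes the first buffer and 2 bytes of the second.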

@fridex
Member Author

fridex commented May 9, 2016

Running the "splice echo time" scenario (a simple ping-pong with the server [1]) for 2 seconds with:

splice(ksd, NULL, pipe, NULL, 1400, 0);  /* ksd -> pipe */
splice(pipe, NULL, ksd, NULL, 1400, 0);  /* pipe -> ksd */

With MTU 1400, I am getting the following results:

  • 44.24% of total time spent in kernel_sendmsg()
    • 38.28% of total time spent in tcp_push (the actual sending)
    • 1.15% of total time spent allocating socket buffers (skb_stream_alloc_skb)
    • approx. 2% on copying from the kernel vector (copy_from_iter, memcpy_erms)
  • 33.14% of total time spent in tls_splice_read
    • 13.14% of total time spent in kernel_recvmsg
    • approx. 2% on copying and allocation (skb_copy_datagram_iter, copy_page_to_iter)

With MTU 16000, I am getting the following results:

  • 22.29% of total time spent in kernel_sendmsg()
    • 16.30% of total time spent in tcp_push (the actual sending)
    • 0.69% of total time spent allocating socket buffers (skb_stream_alloc_skb)
    • 3.03% on copying from the kernel vector (copy_from_iter, memcpy_erms)
  • 42.25% of total time spent in tls_splice_read
    • 9.02% of total time spent in kernel_recvmsg
    • 4.02% on copying and allocation (skb_copy_datagram_iter, copy_page_to_iter)

Ideally we could save:

  • for 1400 MTU:
    • approx. 2% by avoiding kernel_recvmsg()
    • approx. 3.15% by avoiding kernel_sendmsg()
  • for 16000 MTU:
    • 3.72% by avoiding kernel_sendmsg()
    • 4.02% by avoiding kernel_recvmsg()
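The "ideal" savings are sums of the copy and allocation shares from the perf profiles: for kernel_sendmsg() the skb allocation plus the copy from the kernel vector, and for kernel_recvmsg() the copy-and-allocation share. A small sketch reproducing that arithmetic (the struct and function names are illustrative only, and the 2% entries for MTU 1400 are approximate in the measurements):

```c
/* Shares of total time (in percent) taken from the perf results above. */
struct profile {
    double send_alloc;       /* skb_stream_alloc_skb */
    double send_copy;        /* copy_from_iter, memcpy_erms */
    double recv_copy_alloc;  /* skb_copy_datagram_iter, copy_page_to_iter */
};

/* Upper bound on what avoiding kernel_sendmsg() could save:
 * socket buffer allocation plus the copy from the kernel vector. */
static double send_savings(const struct profile *p)
{
    return p->send_alloc + p->send_copy;
}

/* Upper bound on what avoiding kernel_recvmsg() could save:
 * the copy (and allocation) into the passed vector. */
static double recv_savings(const struct profile *p)
{
    return p->recv_copy_alloc;
}
```

With the MTU 1400 numbers this gives 1.15 + 2 = 3.15% for kernel_sendmsg() and 2% for kernel_recvmsg(); with the MTU 16000 numbers, 0.69 + 3.03 = 3.72% and 4.02%.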

We have to consider the additional logic within kernel_sendmsg() and kernel_recvmsg() (locking, ...). Using kernel_sendpage() and tcp_read_sock() (or the proposed udp_read_sock()) instead may involve different logic, which could have a positive or negative impact as well.

perf reports that context switches are not expensive at all (0.30% of total time).

related: https://github.com/fridex/af_ktls/issues/22

[1] https://github.com/fridex/af_ktls-tool/blob/master/action.c#L795

@djwatson
Member

Fixed by #62.
